Lecture 7
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
February 3rd, 2023
Clone your ae-06 project in RStudio
Lab 2 due Tuesday 2-7
Don’t work on Lab-2 during Lab-3
Groups are coming after Exam 1.
– Data Wrangling Practice
– Exam 1 - Feb 10th
– Friday Review (Survey Coming on Sakai)
– ggplot practice
– Practice re-creating graphs
– Understand pivot_longer
– Understand pivot_wider
– In ae-06 project, open up ae-05-fisheries.qmd.
– Run every code chunk up until: Warm up starts here!
– Recreate the plot below using the fisheries_summary data set.
– Wide data contains values that do not repeat in the first column
– Long data contains values that do repeat in the first column
– Which have we typically used to create plots in this class?
There are three interrelated rules that make a dataset tidy:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.
– Sometimes, data are not in this format…
– pivot_longer
– pivot_wider
– Making tables for quick comparison / display purposes
– names_to
– values_to
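A minimal sketch of how names_to and values_to work in pivot_longer(). The small table and the names country, year, and catch are made up for illustration; they are not the fisheries data from the application exercise.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per country, one column per year
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(15, 25)
)

long <- pivot_longer(
  wide,
  cols      = -country,  # pivot every column except country
  names_to  = "year",    # new column that stores the old column names
  values_to = "catch"    # new column that stores the old cell values
)

long
```

Each country now appears once per year, so the country values repeat in the first column: the data are long.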
When pivoting longer, variable names that turn into values are characters by default. If you need them in another format, you must make that transformation explicitly, which you can do within the pivot_longer() function.
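One way to make that transformation inside pivot_longer() is the names_transform argument, sketched below. The data and the column names country, year, and catch are hypothetical, chosen only to show the idea.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: the year columns have names like "2019"
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(15, 25)
)

long <- pivot_longer(
  wide,
  cols            = -country,
  names_to        = "year",
  values_to       = "catch",
  # Convert the new year column from character to integer
  names_transform = list(year = as.integer)
)

class(long$year)
```

Without names_transform, year would be a character column, which matters if you later want to treat year as a number on a plot axis.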
You can tweak a plot forever, but at some point the tweaks are likely not very productive. However, you should always be critical of defaults (however pretty they might be) and see if you can improve the plot to better portray your data / results / what you want to communicate.
pivot_wider() makes data sets wider by increasing the number of columns and reducing the number of rows. It has the opposite interface to pivot_longer(): we need to provide the existing column that holds the values (values_from) and the existing column that holds the new column names (names_from).
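A short sketch of pivot_wider() with names_from and values_from. The table and the names country, year, and catch are hypothetical and stand in for whatever long data you are working with.

```r
library(tidyr)
library(tibble)

# Hypothetical long data: one row per country-year combination
long <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(2019, 2020, 2019, 2020),
  catch   = c(10, 15, 20, 25)
)

wide <- pivot_wider(
  long,
  names_from  = year,   # existing column whose values become column names
  values_from = catch   # existing column whose values fill the new cells
)

wide
```

The result has one row per country and one column per year, a layout that is often easier to read as a quick comparison table.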