Lecture 9
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
February 10th, 2023
– Clone exam review repo
– This is an individual exam
– With the exception of major emergencies, late submissions will not be accepted. A last-minute technical issue is not a major emergency.
– Turn in via PDF. If you fail to do so, we will grade your latest commit and issue a penalty.
– Include appropriate labels, titles, etc. when making any plot
– Clarification questions are welcome. Debugging is not.
– Cite any code you obtain outside of the course materials
– Pull, commit, and push often (after every question)
– Look at what’s rendered!
– group_by
– mutate
– summarize
– Pivots
– Joins
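As a quick refresher on the first three verbs, here is a minimal sketch that uses a hypothetical toy tibble `phones`, which is not a course data set. `mutate()` adds a column row by row. `group_by()` followed by `summarize()` collapses each group to one row. `group_by()` followed by `mutate()` keeps every row but computes within each group.

```r
library(dplyr)

# Hypothetical toy data (not from the course): mobile subscriptions by country and year
phones <- tibble(
  Country = c("A", "A", "B", "B"),
  Year    = c(2019, 2020, 2019, 2020),
  Mobile  = c(10, 14, 5, 9)
)

# mutate() adds a new column; group_by() + summarize() returns one row per Country
phones_summary <- phones |>
  mutate(Mobile_k = Mobile * 1000) |>
  group_by(Country) |>
  summarize(mean_mobile = mean(Mobile), total_k = sum(Mobile_k))

# group_by() + mutate() keeps all 4 rows; sum(Mobile) is computed within each Country
phones_share <- phones |>
  group_by(Country) |>
  mutate(share = Mobile / sum(Mobile)) |>
  ungroup()
```

Compare the row counts. `phones_summary` has one row per country, and `phones_share` still has one row per country-year.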
– Relationship Discussion
– Join (2 data sets) vs Pivot (1 data set)
– Many ways to join data sets vs Two ways to pivot data
– In the exam-review.qmd, join our two fake data sets using left_join, right_join, and full_join. Take note of the differences in the resulting output.
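The sketch below shows the kind of differences to look for. It uses two small hypothetical tibbles, `students` and `grades`, as stand-ins. They are not the fake data sets in exam-review.qmd. The tables share the key `id`, but each table has one id that the other does not.

```r
library(dplyr)

# Hypothetical stand-ins for the two fake data sets
students <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
grades   <- tibble(id = c(2, 3, 4), grade = c("A", "B", "C"))

# left_join: keeps every row of the left table (students); id 1 gets an NA grade
left_out <- left_join(students, grades, by = "id")

# right_join: keeps every row of the right table (grades); id 4 gets an NA name
right_out <- right_join(students, grades, by = "id")

# full_join: keeps every id that appears in either table (4 rows)
full_out <- full_join(students, grades, by = "id")
```

The only thing that changes between the three joins is which table's unmatched rows survive. Unmatched rows are kept and padded with NA.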
With a wide structure, each person (observational unit) has one observation (row), and each measurement gets its own column. With a long structure, each person (observational unit) has multiple observations, with one measurement per row.
– Which column's values should become the new column names? (names_from)
– Which column's values should fill those new columns? (values_from)
babies |>
  pivot_wider(
    names_from = Year,
    values_from = Mobile
  )
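The babies data is not shown here, so the following is a self-contained sketch. It runs the same pivot_wider() call on a hypothetical long-format tibble, `babies_long`, that has the assumed columns Country, Year, and Mobile.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format stand-in for babies: one row per Country-Year pair
babies_long <- tibble(
  Country = c("A", "A", "B", "B"),
  Year    = c(2019, 2020, 2019, 2020),
  Mobile  = c(10, 14, 5, 9)
)

babies_wide <- babies_long |>
  pivot_wider(
    names_from = Year,    # each distinct Year becomes a new column name
    values_from = Mobile  # Mobile values fill those new columns
  )
```

The result has one row per country and the columns Country, `2019`, and `2020`. The new column names are strings, so you refer to them with backticks.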
– Which columns should be stacked into values? (cols)
– What should we name the new column that holds the old column names? (names_to)
– What should we name the new column for the “left over” values?
babies |>
  pivot_longer(
    cols = !Country,
    names_to = "Year",
    values_to = "Mobile"
  )
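This sketch runs pivot_longer() on a hypothetical wide-format tibble, `babies_wide`, that has one column per year. It stands in for the real babies data. Note that names_to and values_to take quoted strings, because they name columns that do not exist yet.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format stand-in for babies: one column per year
babies_wide <- tibble(
  Country = c("A", "B"),
  `2019`  = c(10, 5),
  `2020`  = c(14, 9)
)

babies_long <- babies_wide |>
  pivot_longer(
    cols = !Country,      # stack every column except Country
    names_to = "Year",    # old column names go here (as character)
    values_to = "Mobile"  # old cell values go here
  )
```

The result has 4 rows, one per Country-Year pair. Year is stored as character, which is why names_transform is useful.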
– pivot_longer arguments:
names_transform - changes the type of the new names_to column (e.g., character to integer)
values_drop_na - drops rows whose value in the values_to column is NA
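Both arguments appear together in this minimal sketch. The hypothetical `babies_wide` tibble has one missing value, for country B in 2019. names_transform converts Year to integer, and values_drop_na removes the row where Mobile would be NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with one missing value (B, 2019)
babies_wide <- tibble(
  Country = c("A", "B"),
  `2019`  = c(10, NA),
  `2020`  = c(14, 9)
)

babies_long <- babies_wide |>
  pivot_longer(
    cols = !Country,
    names_to = "Year",
    values_to = "Mobile",
    names_transform = list(Year = as.integer),  # Year becomes integer instead of character
    values_drop_na = TRUE                       # drop the row whose Mobile value is NA
  )
```

Without values_drop_na = TRUE, the result would have 4 rows, one of them with Mobile = NA.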
– Resources
How we talk about graphs…