Lecture 6
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
February 1st, 2023
– Clone ae-05
– Make sure you are keeping up with the Preparation Videos
– If you need to turn HW in late, see late policy
– If you have extenuating circumstances (see syllabus), contact Ed.
Videos
– Requesting videos for missed classes
Homework + Labs
– Late work policy
– Drop 1
Exam
– February 10th
– Take home
– Open Notes / Internet / etc
– Coding + Short answer questions
– Extension questions
– Can NOT be late
– Pull -> Commit -> Push after every question
Updates to deadlines
– More time for labs (1 week: Tuesday -> Tuesday)
– Homework (1 week: Wednesday -> Wednesday)
– Subject to change around Exam time / holiday / etc
– A lot of errors happen when coding (and that’s okay)
Glimpse mtcars if you need to refamiliarize yourself with these data
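library(dplyr)  # provides mutate(), group_by(), and summarize()
# Overall mean mpg across all cars
mtcars |>
  summarize(mean_mpg = mean(mpg))
---------------------------------
# Mean mpg within each number of cylinders
mtcars |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))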
– Understand join functions
– Join multiple data frames
Messy data
– The sheer volume of information is sometimes referred to as “messy” data, because it’s hard to make sense of it all.
Data merging is the process of combining two or more data sets into a single data set. This is most often necessary when you have raw data stored in multiple files, worksheets, or data tables that you want to analyze together.
– Left Join
– Inner Join
– Right Join
– Full Join
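The four joins listed above differ in which rows they keep when a key appears in only one of the two data frames. Here is a minimal sketch using dplyr; the data frames `students` and `scores` and their columns are made up for illustration and are not from ae-05.

```r
library(dplyr)

# Hypothetical toy data: id 1 appears only in students, id 4 only in scores
students <- tibble(
  id   = c(1, 2, 3),
  name = c("Ana", "Ben", "Cai")
)
scores <- tibble(
  id    = c(2, 3, 4),
  score = c(90, 85, 70)
)

# Left join: keep every row of students; score is NA where there is no match
left <- left_join(students, scores, by = "id")

# Inner join: keep only ids found in both data frames
inner <- inner_join(students, scores, by = "id")

# Right join: keep every row of scores; name is NA where there is no match
right <- right_join(students, scores, by = "id")

# Full join: keep every id found in either data frame
full <- full_join(students, scores, by = "id")

# Compare how many rows each join keeps
c(left = nrow(left), inner = nrow(inner), right = nrow(right), full = nrow(full))
```

Comparing the row counts and looking for the `NA` values is a quick way to see which rows each join kept.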
Clone ae-05
– This is important! Data are messy!
– Think carefully about the join you use
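One way to think carefully about a join is to check which rows have no match before deciding which join to use. The sketch below uses dplyr's `anti_join()`, which goes a step beyond the four joins listed in this lecture; the `orders` and `customers` data frames are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: customer 5 placed an order but is missing from customers
orders <- tibble(
  customer_id = c(1, 2, 5),
  amount      = c(20, 35, 15)
)
customers <- tibble(
  customer_id = c(1, 2, 3),
  city        = c("Durham", "Raleigh", "Cary")
)

# Rows of orders with no matching customer; an inner join would drop these
unmatched <- anti_join(orders, customers, by = "customer_id")

# Rows of orders that do have a matching customer
kept <- inner_join(orders, customers, by = "customer_id")

unmatched
```

If `unmatched` is not empty, an inner join loses those rows without any warning, while a left join keeps them with `NA` in the columns that come from `customers`.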