Working with multiple data frames

Lecture 6

Dr. Elijah Meyer

Duke University
STA 199 - Spring 2023

February 1st, 2023


– A lot of errors happen when coding (and that’s okay)

Warm Up

Glimpse mtcars if you need to refamiliarize yourself with these data

library(dplyr)

mtcars |>
  summarize(mean_mpg = mean(mpg))

mtcars |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

Warm Up

mtcars |>
  summarize(mean_mpg = mean(mpg))
  mean_mpg
1 20.09062
mtcars |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))
# A tibble: 3 × 2
  cyl   mean_mpg
  <fct>    <dbl>
1 4         26.7
2 6         19.7
3 8         15.1


– Understand join functions

– Join multiple data frames


Messy data

– The sheer volume of information is sometimes referred to as “messy” data, because it’s hard to make sense of it all.


Joining datasets

Data merging is the process of combining two or more datasets into a single dataset. Most often, this is necessary when raw data are stored in multiple files, worksheets, or data tables that you want to analyze together.
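As a minimal sketch of what combining two tables looks like in practice, the example below builds two small made-up data frames and merges them on the column they share. The table names (`students`, `scores`), the column names, and the values are all hypothetical and are not taken from the lecture data.

```r
library(dplyr)

# Hypothetical example data: two small tables that share the key column `id`
students <- tibble(
  id   = c(1, 2, 3),
  name = c("Ana", "Ben", "Chloe")
)
scores <- tibble(
  id    = c(1, 2, 3),
  score = c(90, 85, 77)
)

# Columns that appear in both tables are the candidate join keys
shared_keys <- intersect(names(students), names(scores))

# Combine the two tables into one, matching rows on the shared key
combined <- left_join(students, scores, by = shared_keys)
combined
```

Checking which columns the tables have in common before joining is one simple way to confirm that the key you intend to use actually exists in both.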

Joining datasets

– Left Join

– Inner Join

– Right Join

– Full Join
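To make the difference between the four joins concrete, here is a small sketch using the dplyr join functions on two made-up tables. The tables `x` and `y` and their values are hypothetical; they are chosen so that only some `id` values appear in both.

```r
library(dplyr)

# Hypothetical tables: x and y overlap on id 2 and 3 only
x <- tibble(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(id = c(2, 3, 4), y_val = c("B", "C", "D"))

left  <- left_join(x, y, by = "id")   # all rows of x; id 1 gets NA for y_val
inner <- inner_join(x, y, by = "id")  # only ids found in both tables: 2 and 3
right <- right_join(x, y, by = "id")  # all rows of y; id 4 gets NA for x_val
full  <- full_join(x, y, by = "id")   # every id from either table: 1, 2, 3, 4

# Compare how many rows each join keeps
c(left = nrow(left), inner = nrow(inner), right = nrow(right), full = nrow(full))
```

The joins differ only in which unmatched rows they keep, so comparing row counts is a quick way to see which one you have run.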


Recap of AE

– This is important! Data are messy!

– Think carefully about the join you use
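One way to think carefully about a join is to check which rows fail to match before committing to it. The sketch below uses `anti_join()` from dplyr for that purpose; the `orders` and `customers` tables and their contents are hypothetical and are not the data from the application exercise.

```r
library(dplyr)

# Hypothetical tables: one order refers to a customer that is not listed
orders <- tibble(
  order_id    = c(101, 102, 103),
  customer_id = c(1, 2, 5)
)
customers <- tibble(
  customer_id = c(1, 2, 3),
  name        = c("Ana", "Ben", "Chloe")
)

# Orders with no matching customer: kept with NA by a left join,
# dropped entirely by an inner join
orders_unmatched <- anti_join(orders, customers, by = "customer_id")

# Customers with no orders: they only appear in a right or full join
customers_unmatched <- anti_join(customers, orders, by = "customer_id")

orders_unmatched
customers_unmatched
```

If both results are empty, the join type matters little for these tables; if not, the choice of join decides whether those rows are kept with missing values or silently dropped.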