Below, left_join coffee2 to coffee1. Comment on how these two data sets were joined together. Hint: You may need to use the by argument in the left_join function.
coffee1 |> left_join(coffee2, by = c("Month" = "month"))
# A tibble: 5 × 4
  Month     Coffee_Shop Drinks_Sold Special
  <chr>     <chr>             <dbl> <chr>
1 July      Starbucks             3 Half-Off
2 July      Starbucks             2 Half-Off
3 August    ThePerk               6 Free Drink
4 August    ThePerk               5 Free Drink
5 September Starbucks             1 <NA>
Same thing different way:
We used the by argument in the left_join function because the key column is named differently in the two data sets (Month in coffee1, month in coffee2). Run the code below and compare it to the output above. Is it the same or different?
# A tibble: 5 × 4
  Month  Coffee_Shop Drinks_Sold Special
  <chr>  <chr>             <dbl> <chr>
1 July   Starbucks             3 Half-Off
2 July   Starbucks             2 Half-Off
3 August ThePerk               6 Free Drink
4 August ThePerk               5 Free Drink
5 June   <NA>                 NA Free Drink
left_join - coffee2 gets joined to coffee1 by Month. Wherever Month has a match, the information from the other columns of coffee2 is added. Only the rows of coffee1 are kept; rows with no match get NA in the added columns.
right_join - the opposite of left_join. Keeps all rows of y and adds the matching columns of x.
full_join - keeps all rows of both x and y, filling in NA wherever there is no match.
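To see the three joins side by side, here is a minimal sketch. The contents of coffee1 and coffee2 below are a reconstruction inferred from the printed output above (the row order of coffee2 is a guess), so treat them as an assumption rather than the actual data. The name keys is only an illustrative helper.

```r
library(dplyr)

# Assumed reconstruction of the two data sets, inferred from the output above.
coffee1 <- tibble(
  Month       = c("July", "July", "August", "August", "September"),
  Coffee_Shop = c("Starbucks", "Starbucks", "ThePerk", "ThePerk", "Starbucks"),
  Drinks_Sold = c(3, 2, 6, 5, 1)
)
coffee2 <- tibble(
  month   = c("July", "August", "June"),
  Special = c("Half-Off", "Free Drink", "Free Drink")
)

# The key column is named differently in each data set, so map one to the other.
keys <- c("Month" = "month")

left  <- coffee1 |> left_join(coffee2, by = keys)   # keeps every row of coffee1
right <- coffee1 |> right_join(coffee2, by = keys)  # keeps every row of coffee2
full  <- coffee1 |> full_join(coffee2, by = keys)   # keeps every row of both

# Compare how many rows each join returns.
c(left = nrow(left), right = nrow(right), full = nrow(full))
```

With this assumed data, September appears only in the left and full joins, and June appears only in the right and full joins.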
Summary Statistics
In this short activity, we will use the Orange data set built into R. Please run ?Orange to learn more.
Below, please complete the following:
Calculate the mean circumference of each tree.
Create a new variable called old to indicate when the tree became over 1000 days old (age in Orange is recorded in days). Use the value Yes if the measurement is over 1000, and No if it is not. Hint: A way to answer this involves using if_else
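One possible approach to the two tasks is sketched below. It assumes that "the measurement" refers to the age column. Converting Orange with as_tibble is only a convenience for printing, and the names orange, tree_means, mean_circumference, and orange_old are illustrative choices, not part of the activity.

```r
library(dplyr)

# Orange ships with R; converting to a tibble is optional but prints more cleanly.
orange <- as_tibble(Orange)

# Mean circumference for each tree: one row per Tree.
tree_means <- orange |>
  group_by(Tree) |>
  summarize(mean_circumference = mean(circumference))

# Flag each measurement taken when the tree's age was over 1000.
orange_old <- orange |>
  mutate(old = if_else(age > 1000, "Yes", "No"))

tree_means
orange_old
```

if_else requires both outcomes to be the same type, which is why Yes and No are both given as character strings.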