Lecture 3
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
January 20, 2023
– Go to the course GitHub org and find your ae-02-s23 repo (the repo name will be suffixed with your GitHub name).
– Clone the repo in your container and open the Quarto document in the repo.
– Are you on Slack?
Due Dates + Turn In
– AEs due Saturday and Monday (11:59 PM) - GitHub
– Labs due Fri (11:59) - Gradescope
– HWs due Tue (11:59) - Gradescope
– render, commit, and push
If you made any changes since the last render, render again to get the final version of the AE.
Check the box next to each document in the Git tab (this is called “staging” the changes). Commit the changes you made using a simple and informative message.
Use the green arrow to push your changes to your repo on GitHub.
Check your repo on GitHub and see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!
Prepare Material
R4DS: Chp 2 - Data visualization - Sections 2.1 and 2.4
Create plots!
– Understand geoms
– Scatterplots, boxplots, histograms, etc
– Practice with the fundamentals of ggplot
– Let the types of variables dictate the plot
– Informative title
– Axes should be labeled
– Careful consideration of aesthetic choices (like color)
You have a data set you want to work with…
mtcars
You want to create a visualization. The first thing we need to do is set up the canvas…
mtcars |>
ggplot()
mtcars |>
  ggplot(
    aes(x = variable.name, y = variable.name)
  )
aes: describe how variables in the data are mapped to your canvas
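As a minimal sketch of the mapping step, the placeholder variable.name can be swapped for real columns. The choice of wt and mpg from the built-in mtcars data is an assumption made for illustration; the slides do not say which variables to use.

```r
library(ggplot2)

# Map car weight (wt) to the x-axis and miles per gallon (mpg) to the y-axis.
# wt and mpg are columns of mtcars, chosen here only as an example.
canvas <- mtcars |>
  ggplot(aes(x = wt, y = mpg))

# Printing the object draws the canvas: the axes and scales appear,
# but no data are shown because no geom has been added yet.
canvas
```

The canvas knows which variables go where, but it still needs a geom before any data are drawn.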
+ means “and”
When working with ggplot functions, we add to our canvas using +.
mtcars |>
  ggplot(
    aes(x = variable.name, y = variable.name)
  ) +
  geom_point()
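A runnable version of this pattern might look like the sketch below. The variables wt and mpg are again assumed for illustration. Note that the + sits at the end of a line: if a line starts with +, R treats the previous line as a complete expression and the geom is not added.

```r
library(ggplot2)

# Canvas "and" points: a scatterplot of miles per gallon against weight.
scatter <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +  # the + ends this line so R keeps reading
  geom_point()

scatter
```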
– Two quantitative variables
– One quantitative variable
– One categorical variable
– Two categorical variables
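One possible pairing of geoms with these four cases is sketched below. The specific geoms and mtcars columns are illustrative choices, not the only reasonable ones; cyl and am are stored as numbers in mtcars, so factor() is used to treat them as categories.

```r
library(ggplot2)

# Two quantitative variables: scatterplot
p_two_quant <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()

# One quantitative variable: histogram
p_one_quant <- mtcars |>
  ggplot(aes(x = mpg)) +
  geom_histogram(bins = 10)

# One categorical variable: bar chart of counts per category
p_one_cat <- mtcars |>
  ggplot(aes(x = factor(cyl))) +
  geom_bar()

# Two categorical variables: stacked bar chart, filled by the second variable
p_two_cat <- mtcars |>
  ggplot(aes(x = factor(cyl), fill = factor(am))) +
  geom_bar()
```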
https://ggplot2.tidyverse.org/reference/
A geom is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. Scatterplots break the trend; they use the point geom.
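To see that the geom is what defines the plot type, the same canvas can be reused with different geoms. This is a small sketch with assumed variables (cyl and mpg from mtcars); the object names are hypothetical.

```r
library(ggplot2)

# One canvas: cylinders (treated as categories) against miles per gallon.
canvas <- mtcars |>
  ggplot(aes(x = factor(cyl), y = mpg))

# Same data and mapping, different geometrical objects.
box_plot   <- canvas + geom_boxplot()  # boxplot geom summarizes each group
point_plot <- canvas + geom_point()    # point geom draws every car

box_plot
point_plot
```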
These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data were imported directly from the Environmental Data Initiative (EDI) Data Portal, and are available for use by CC0 license (“No Rights Reserved”) in accordance with the Palmer Station Data Policy.
Pick geoms based on data types.
Adjust graphs with arguments to make them more appropriate for the data
Take control of your labels
Use color to your advantage. https://ggplot2.tidyverse.org/reference/ggtheme.html & https://ggplot2.tidyverse.org/reference/scale_viridis.html
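The four takeaways above can be combined in one plot, roughly as follows. The variables, label text, theme, and palette are assumptions chosen for illustration; theme_minimal() and scale_color_viridis_d() are among the options documented at the two links.

```r
library(ggplot2)

final_plot <- mtcars |>
  ggplot(aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.7) +  # arguments change how the geom is drawn
  labs(
    title = "Fuel efficiency by car weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  scale_color_viridis_d() +            # discrete viridis color palette
  theme_minimal()                      # one of the built-in complete themes

final_plot
```

Setting size and alpha inside geom_point() but outside aes() applies one fixed value to every point, whereas color inside aes() varies with the data.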