Lecture 14
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
March 1st, 2023
– Clone ae-13
– DataFest: a data analysis competition where teams of up to five students tackle a large, complex, surprise dataset over a weekend
– DataFest is a great opportunity to gain the kind of experience employers are looking for
– Each team gives a brief presentation of its findings, judged by a panel of faculty and professionals from a variety of fields.
– HW-3 due Wednesday, March 8th (11:59)
– Lab 4 due Tuesday, March 7th (11:59)
– Project proposal due Friday, March 10th (11:59)
— One submission per team; add every team member to it
– The final project for this class will consist of analysis on a dataset of your own choosing.
– The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a novel dataset in a meaningful way.
– A project proposal with three dataset ideas.
– A reproducible project writeup of your analysis, with one required draft along the way.
– Formal peer review on another team’s project.
– A presentation with slides.
You will not be submitting anything on Gradescope for the project. Submission will happen on GitHub and feedback will be provided as GitHub issues (more on this in a future lab) that you need to engage with and close.
You have a team project repo to work in. Together, the documents in your GitHub repo make up a webpage for your project. To create the webpage, go to the Build tab in RStudio and click Render Website.
– 3 datasets that you find interesting
– One research question for each
– Relevant literature
– A glimpse of the data
More with modeling
– Model with multiple predictors
What is the difference between an interaction model and an additive model?
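As a sketch, assume one numeric predictor \(x\) (e.g., flipper length) and, for simplicity, a two-level categorical predictor coded as a 0/1 indicator \(d\) (island actually has three levels, so R would create two indicators). The two models can then be written as:

\[
\text{Additive: } \hat{y} = b_0 + b_1 x + b_2 d
\]
\[
\text{Interaction: } \hat{y} = b_0 + b_1 x + b_2 d + b_3 \, x d
\]

In the additive model the slope for \(x\) is \(b_1\) in both groups, so the fitted lines are parallel. In the interaction model the slope is \(b_1\) when \(d = 0\) and \(b_1 + b_3\) when \(d = 1\), so each group can have its own slope.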


– Start ae-13
– Fit the interaction model between island, flipper length, and body mass in R
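The ae itself is done in R, where a formula such as `y ~ x * group` expands to both main effects plus their interaction. To illustrate what such a fit computes, here is a minimal Python sketch, assuming hypothetical noiseless data with one numeric predictor `x` and a two-group factor (`"A"`, `"B"`). It builds the design columns \(1, x, d, x d\) and solves the least-squares normal equations. The helper names (`solve`, `fit_ols`, `interaction_design`) are illustrative only, not part of the course materials.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix built from copies, so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def interaction_design(x, group):
    """Design rows [1, x, d, x*d], where d = 1 if group == "B" else 0 (hypothetical coding)."""
    rows = []
    for xi, g in zip(x, group):
        d = 1.0 if g == "B" else 0.0
        rows.append([1.0, xi, d, xi * d])
    return rows


# Hypothetical data generated exactly from y = 1 + 2x + 3d + 4xd.
x = [1, 2, 3, 4, 1, 2, 3, 4]
group = ["A"] * 4 + ["B"] * 4
y = [3, 5, 7, 9, 10, 16, 22, 28]

b0, b1, b2, b3 = fit_ols(interaction_design(x, group), y)
print(f"slope in A: {b1:.2f}, slope in B: {b1 + b3:.2f}")
# slope in A: 2.00, slope in B: 6.00
```

Solving the normal equations directly is fine for a tiny example like this, but it is less numerically stable than the QR decomposition that R's `lm()` uses.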
– \(R^2\): how much of the variability in Y is explained by our model
– Variability: “how spread out your data are”
\(R^2 = \frac{\text{SS}_{\text{Total}} - \text{SS}_{\text{Residual}}}{\text{SS}_{\text{Total}}}\)
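A minimal Python sketch of the formula above, assuming hypothetical observed values `y` and fitted values `y_hat` (the function name `r_squared` is illustrative):

```python
def r_squared(y, y_hat):
    """R^2 = (SS_Total - SS_Residual) / SS_Total."""
    mean_y = sum(y) / len(y)
    # Total spread of y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Spread left over after the model's fitted values.
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return (ss_total - ss_residual) / ss_total


# Hypothetical values: SS_Total = 5, SS_Residual = 1, so R^2 = 4 / 5.
print(r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
# 0.8
```

Here the model accounts for 4 of the 5 units of total squared spread, so it explains 80% of the variability in `y`.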
– Models can reveal patterns not evident in graphs
– BUT … they can also impose structures that are not really there
– General Rule:
— Be a skeptic
— Fit an appropriate model
– SLR vs MLR
– Additive vs Interaction
– Predict
– Interpret
– How do we choose our “best” model?
– What if the response variable is categorical?
