Lecture 18
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
March 22nd, 2023
– Clone ae-17
– HW 4 coming today (due in 1 week)
– Lab 7 Due next Tuesday
– Project proposal feedback by Friday
– Peer Review Survey
– Each of you should have received an invite email (reach out to me if you did not)
– Due March 29th at 11:59
– Answers will be private
– Encouraged to have conversations with groupmates if things are not going well
– Accountability
– Part of your grade
– Feedback can be seen in the **Issues** tab
– The expectation is that you respond to the feedback on the data set you choose for your project
– See lab 7 on how to respond + close issues created by myself and lab leaders
– Project Draft Report is Due April 7th
– Own your work vs “that part wasn’t mine”
– Research Question
— Can you answer it?
– Introduction
— Do not copy + paste the description given on the website
— Do some digging
— Write a brief description of the observations.
— Address ethical concerns about the data, if any.
– Really impressed with mini literature review
– Creative + interesting research questions
– I look forward to what you do next!
– Introduction and data
– Methodology
– Results
– W+F
— Update the About section
— Respond to all issues from chosen data set
— Everyone in your group must have at least 3 commits
\(\checkmark\) Data Viz
\(\checkmark\) Probability
\(\checkmark\) Modeling Data
– Estimating population parameters (confidence intervals)
– Testing population parameters (hypothesis testing)
– Quick run-through of ROC curves
– What is bootstrapping?
– What is a confidence interval?
A plot of a model’s true positive rate against its false positive rate across classification thresholds, based on a testing data set
– Used for model evaluation (AUC)
– Can be used to help pick a threshold to make decisions
Predict spam based on the number of exclamation points + the winner variable
AUC ~ 0.650
library(tidymodels)   # roc_curve() comes from the yardstick package

email_pred |>
  roc_curve(
    truth = spam,
    .pred_1,
    event_level = "second"   # the second level of spam ("1") is the success
  )
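To turn the curve into a single evaluation number and to actually draw it, we can follow roc_curve() with roc_auc() and autoplot(). A minimal sketch, reusing the same email_pred predictions, spam outcome, and .pred_1 probability column from above:

# AUC: area under the ROC curve, a single-number summary of model performance
email_pred |>
  roc_auc(
    truth = spam,
    .pred_1,
    event_level = "second"
  )

# Draw the ROC curve; each point on the curve corresponds to one threshold,
# which is how the curve can help us pick a cutoff for making decisions
email_pred |>
  roc_curve(
    truth = spam,
    .pred_1,
    event_level = "second"
  ) |>
  autoplot()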
– Confidence intervals are a range of plausible values for our population parameter
— we almost always don’t actually know what this value is… but we want to!
By the end of class, we will
– know how to calculate confidence intervals using bootstrap methods
– understand what bootstrap methods are
– understand how the level of confidence influences our confidence interval
– understand how to interpret a confidence interval in the context of the problem
Bootstrap methods are simulation methods
Variability (uncertainty): how spread out your data are
We can use this variability to identify plausible values around our parameter of interest
This range of values is called a confidence interval, and it’s different from a probability
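Looking ahead to the application exercise, here is a minimal sketch of a bootstrap confidence interval using the infer package; the data frame my_data and the numeric variable x are hypothetical placeholders, so swap in whatever data you are working with:

library(infer)

set.seed(1234)   # make the resampling reproducible

# Resample the observed data with replacement 1000 times and
# compute the sample mean of each bootstrap sample
boot_dist <- my_data |>                 # my_data / x are hypothetical placeholders
  specify(response = x) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# A 95% confidence interval: the middle 95% of the bootstrap means
boot_dist |>
  get_confidence_interval(level = 0.95, type = "percentile")

Raising the level of confidence (say, from 95% to 99%) keeps more of the bootstrap distribution, so the interval gets wider; lowering it makes the interval narrower.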