Lecture 1
Dr. Elijah Meyer
Duke University
STA 199 - Spring 2023
January 13, 2023
Get organized
Please share with your neighbors:
“Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.”
Learn to explore, visualize, and analyze data in a reproducible and shareable manner
Gain experience in data wrangling, exploratory data analysis, predictive modeling, and data visualization
Work on problems and case studies inspired by and based on real-world questions and data
Learn to effectively communicate results through written assignments and final project presentation
– Fundamentals of R
– Data visualization
– Web scraping
– Version control with GitHub
– Reproducible reports with Quarto
– Regression
– Statistical inference
Warm up question
Mix of lecture and interaction
Homework: Individual assignments combining conceptual and computational skills.
Labs: Individual or team assignments focusing on computational skills.
Exams: Two take-home exams.
Final Project: Team project presented during the final exam period.
Application Exercises: Exercises worked on during the live lecture session.
Statistics Experiences: Engage with statistics outside of the classroom and reflect on your experience.
Focus on computing using R tidyverse syntax
Apply concepts from lecture to case study scenarios
Work on labs individually or in teams of 3 to 4
R for Data Science by Grolemund & Wickham (2nd ed. O’Reilly)
Introduction to Modern Statistics by Çetinkaya-Rundel & Hardin (1st ed. OpenIntro)
GitHub, Inc., is an Internet hosting service for software development and version control.
Please do this before the Getting to Know You Survey
Go to https://github.com/, and create an account (unless you already have one).
Some tips from Happy Git with R.
– Incorporate your actual name!
– Reuse your username from other contexts if you can, e.g., Twitter or Slack.
– Pick a username you will be comfortable revealing to your future boss.
– Be as unique as possible in as few characters as possible. Shorter is better than longer.
– Avoid words with special meaning in programming (e.g. NA).
https://slack.com/get-started#/createnew
– Reserve a STA198-199 RStudio container
– Go to https://cmgr.oit.duke.edu/containers
– Click Reserve Container for the STA198-199 container
– We’ll start talking about the computing toolkit
– Watch videos for Wednesday
– Complete Getting to Know You Survey (by Monday)
Please bring a laptop to the next class if you are able!