The Measles Project

Proposal

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.2.2
Warning: package 'tidyr' was built under R version 4.2.2
Warning: package 'readr' was built under R version 4.2.2
Warning: package 'purrr' was built under R version 4.2.2

Data 1

Introduction and data

  • Identify the source of the data.

    • CDC
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • Yearly, CDC survey
  • Write a brief description of the observations.

    • Total, state, county, diabetes percentage, and overall SVI (social vulnerability index); each row is one county.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    • Which areas are most affected by diabetes, and how does the food insecurity of those areas affect diabetes rates?
  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    • We believe that as social vulnerability increases, diabetes rates will also increase.
  • Identify the types of variables in your research question. Categorical? Quantitative?

    • Diabetes percentage, SVI: quantitative

    • Area: Categorical
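
The hypothesis above could be checked with a correlation and a simple linear fit. The following is only a minimal sketch: it uses the first five rows visible in the glimpse of the data, so the numbers are illustrative rather than a result, and the object names (`toy`, `svi_fit`) are hypothetical.

```r
library(tibble)

# First five rows as shown in the glimpse output; column names mirror the CSV.
toy <- tibble(
  `Diagnosed Diabetes Percentage` = c(9.5, 8.4, 13.5, 10.2, 10.5),
  `Overall SVI` = c(0.4354, 0.2162, 0.9959, 0.6003, 0.4242)
)

# Pearson correlation between the two quantitative variables
svi_diabetes_cor <- cor(
  toy$`Overall SVI`,
  toy$`Diagnosed Diabetes Percentage`
)

# Simple linear model: diabetes percentage as a function of SVI
svi_fit <- lm(`Diagnosed Diabetes Percentage` ~ `Overall SVI`, data = toy)
slope <- unname(coef(svi_fit)[2])
```

A positive `slope` on the full data set would be consistent with the hypothesis, though it would not by itself show that vulnerability causes diabetes.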

Literature

  • Find one published credible article on the topic you are interested in researching.

    • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8037472/
  • Provide a one paragraph summary about the article.

    • Food insecurity is a vicious cycle that leads to further food insecurity. Healthcare providers must work to help alleviate the issues found in communities and patients who are more socially vulnerable. The vulnerability causes poor diet, which causes diabetes, which increases health complications, which increases healthcare costs, which further increases vulnerability.
  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

    • The research question will more specifically find the target areas where vulnerability and diabetes are significantly higher than average. This will give a good estimate of what areas are most affected.

Glimpse of data

Diabetes_data <- read_csv("data/DiabetesAtlasData.csv")
Rows: 3141 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): County_FIPS, County, State
dbl (3): Year, Diagnosed Diabetes Percentage, Overall SVI

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(Diabetes_data)
Rows: 3,141
Columns: 6
$ Year                            <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 20…
$ County_FIPS                     <chr> "01001", "01003", "01005", "01007", "0…
$ County                          <chr> "Autauga County", "Baldwin County", "B…
$ State                           <chr> "Alabama", "Alabama", "Alabama", "Alab…
$ `Diagnosed Diabetes Percentage` <dbl> 9.5, 8.4, 13.5, 10.2, 10.5, 9.4, 10.9,…
$ `Overall SVI`                   <dbl> 0.4354, 0.2162, 0.9959, 0.6003, 0.4242…
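
To locate areas where both diabetes and vulnerability are above average, one option is to flag counties that exceed the mean on each variable. This is a rough sketch using the first four rows visible in the glimpse above. Comparing against the mean is an assumption made here for illustration; it is not a test of statistical significance.

```r
library(dplyr)
library(tibble)

# First four counties as shown in the glimpse output
toy_counties <- tibble(
  County_FIPS = c("01001", "01003", "01005", "01007"),
  `Diagnosed Diabetes Percentage` = c(9.5, 8.4, 13.5, 10.2),
  `Overall SVI` = c(0.4354, 0.2162, 0.9959, 0.6003)
)

# Keep counties that are above the mean on both variables
flagged <- toy_counties %>%
  mutate(
    high_diabetes = `Diagnosed Diabetes Percentage` >
      mean(`Diagnosed Diabetes Percentage`, na.rm = TRUE),
    high_svi = `Overall SVI` > mean(`Overall SVI`, na.rm = TRUE)
  ) %>%
  filter(high_diabetes, high_svi)
```

On the full data, the same pipeline could be grouped by `State` first to summarise which states contain the most flagged counties.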

Data 2

Introduction and data

  • Identify the source of the data.

    HealthData.gov

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    On a yearly basis, reports of child mistreatment, abuse, and neglect are collected from states that agree to share this information with the National Child Abuse and Neglect Data System. These reports include the type of mistreatment and demographic and geographic information, among other details.

  • Write a brief description of the observations.

    The variables in this data set are state, year, type of report (fatalities, child victim), measure (i.e. the way the data was measured: rate of child fatalities per 1000, number of reports), characteristic label (how it was filed, where it was reported), and value (number of children/cases). Several of these variables bundle more than one kind of observation into a single column. For example, each type of report has its own values, states, and characteristics, so the table may need to be reshaped with a pivot.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    Since most reports were filed in 2015: Is there a correlation between the type of report filed and the state in which it was reported? Do the measures agree (i.e., does the rate of child fatalities per 1000 show a positive correlation if the number of reports per state shows a positive correlation)? This question is important because child neglect and abuse are immensely prevalent in the United States, and by understanding in which states they are most common, we could narrow down solutions for reducing child neglect in America. There is no specific target audience, since this analysis affects everyone.

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    Our hypothesis is that there is a correlation between the type of report filed and the state, and specifically that Florida will have the highest number of reported child fatalities (assuming that the rate and the number of reports show similar results). The research topic also includes a comparison between the types of measure (rate of child fatalities versus number of reports) to identify whether these data agree with one another.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    Type of report filed: categorical

    State: categorical

    Measure: Quantitative

Literature

  • Find one published credible article on the topic you are interested in researching.

    https://www.acf.hhs.gov/cb/report/child-maltreatment-2019

  • Provide a one paragraph summary about the article.

    This article considers each state's policies regarding abuse and neglect investigation. As our research question aims to do, it also breaks the data down by state and gives specific counts of help services and resources, reported cases (where allowed), perpetrator penalties, and child fatalities.

  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

    The aforementioned article focuses more on the legal side of child abuse and neglect. Our research proposes a different lens: we identify which of these states (disregarding their bills and policies), if any, has significantly higher rates of a specific type of report, so it is more specific about the type of neglect or abuse. We can also consider the legal side if it is deemed important to our analysis.

    A direct link to the data (that the research question will be based on) is here: https://healthdata.gov/dataset/National-Child-Abuse-and-Neglect-Data-System-NCAND/dpz9-kecg

Glimpse of data

proposal2 <- read.csv("https://us-dhhs-aa.s3.us-east-2.amazonaws.com/ej22-ej5b_2021-02-24T23-11-54.csv")
proposal2b <- read.csv("https://us-dhhs-aa.s3.us-east-2.amazonaws.com/ej22-ej5b_2021-02-24T23-29-27.csv")
proposal2c <- read.csv("https://us-dhhs-aa.s3.us-east-2.amazonaws.com/ej22-ej5b_2021-03-12T18-17-17.csv")
glimpse(proposal2)
Rows: 5,003
Columns: 7
$ State                <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alas…
$ Year                 <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
$ Table                <chr> "Child Fatalities by Submission Type", "Child Fat…
$ Measure              <chr> "Child fatalities by submission type", "Child fat…
$ Characteristic.Label <chr> "Reported in the Child File", "Reported in the Ag…
$ Format               <chr> "Number", "Number", "Number", "Rate", "Number", "…
$ Value                <dbl> 13.000000, 0.000000, 13.000000, 1.178074, 5.00000…
glimpse(proposal2b)
Rows: 5,003
Columns: 7
$ State                <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alas…
$ Year                 <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
$ Table                <chr> "Child Fatalities by Submission Type", "Child Fat…
$ Measure              <chr> "Child fatalities by submission type", "Child fat…
$ Characteristic.Label <chr> "Reported in the Child File", "Reported in the Ag…
$ Format               <chr> "Number", "Number", "Number", "Rate", "Number", "…
$ Value                <dbl> 13.000000, 0.000000, 13.000000, 1.178074, 5.00000…
glimpse(proposal2c)
Rows: 5,003
Columns: 7
$ State                <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alas…
$ Year                 <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
$ Table                <chr> "Child Fatalities by Submission Type", "Child Fat…
$ Measure              <chr> "Child fatalities by submission type", "Child fat…
$ Characteristic.Label <chr> "Reported in the Child File", "Reported in the Ag…
$ Format               <chr> "Number", "Number", "Number", "Rate", "Number", "…
$ Value                <dbl> 13.000000, 0.000000, 13.000000, 1.178074, 5.00000…
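
The glimpse shows that `Format` mixes counts ("Number") and rates ("Rate") in a single `Value` column, so comparing the two measures requires putting them side by side. The sketch below does this with `pivot_wider()` and then uses a Spearman rank correlation to check whether the two measures order the states the same way. The states and values are hypothetical placeholders, not figures from the data set, and the real table would first need filtering to a single `Table`, `Measure`, and `Characteristic.Label` so that each state has exactly one row per format.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long-format data: one "Number" row and one "Rate" row per state
toy_reports <- tibble(
  State = rep(c("State A", "State B", "State C"), each = 2),
  Format = rep(c("Number", "Rate"), times = 3),
  Value = c(13, 1.2, 40, 2.5, 5, 2.7)
)

# One row per state, with Number and Rate as separate columns
wide_reports <- toy_reports %>%
  pivot_wider(names_from = Format, values_from = Value)

# Rank correlation: values near 1 mean the two measures agree on the ordering
measure_agreement <- cor(
  wide_reports$Number,
  wide_reports$Rate,
  method = "spearman"
)
```

In this toy example the two measures disagree (the state with the most reports does not have the highest rate), which is the kind of mismatch the research question is meant to detect.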

Data 3

Introduction and data

  • Identify the source of the data.

TidyTuesday

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

This repository contains immunization rate data for schools across the U.S., as compiled by The Wall Street Journal. The dataset includes the overall and MMR-specific vaccination rates for 46,412 schools in 32 states, as used in “What’s the Measles Vaccination Rate at Your Child’s School?”

  • Write a brief description of the observations.

It includes geographic and school-level data pertaining to measles vaccinations, including the vaccination rate and the types of schools included in the study.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

How does the measles vaccination rate differ across regions, school districts, and types of schools? We can also consider factors such as enrollment, year, and whether the district is rural or urban.

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

Identify areas of the U.S. that have high measles vaccination rates and further understand how these rates are influenced by factors including wealth, as measured by private vs. public schools.

Updated: In 2018, how does the vaccination rate for measles differ across regions in the US? How do school districts in regions with high vaccination rates vs. low vaccination rates differ (public v. private, enrollment rate, type of geographical region - urban, rural)?

  • Identify the types of variables in your research question. Categorical? Quantitative?

Quantitative - vaccination rates

Categorical - school type (private or public), county, and state

Literature

  • Find one published credible article on the topic you are interested in researching.

https://www.wsj.com/graphics/school-measles-rate-map/

  • Provide a one paragraph summary about the article.

The article from the Wall Street Journal presents a map of measles vaccination rates in schools across the United States, showing that some schools have dangerously low vaccination rates, which increases the risk of measles outbreaks. The data is based on information from the 2018-2019 school year and includes public, private, and charter schools. According to the map, many of the schools with the lowest vaccination rates are in states such as Idaho, Utah, Colorado, and Oregon. The article also notes that measles cases have been on the rise in recent years, and emphasizes the importance of vaccination in preventing the spread of the highly contagious virus.

  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

My research is directly related to the topic of the article because the article discusses the data included in the CSV.

Glimpse of data

measles <- readr::read_csv('data/measles.csv')
Rows: 66113 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): state, year, name, type, city, county
dbl (8): index, enroll, mmr, overall, xmed, xper, lat, lng
lgl (2): district, xrel

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(measles)
Rows: 66,113
Columns: 16
$ index    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15, 16…
$ state    <chr> "Arizona", "Arizona", "Arizona", "Arizona", "Arizona", "Arizo…
$ year     <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19", "2018-…
$ name     <chr> "A J Mitchell Elementary", "Academy Del Sol", "Academy Del So…
$ type     <chr> "Public", "Charter", "Charter", "Charter", "Charter", "Public…
$ city     <chr> "Nogales", "Tucson", "Tucson", "Phoenix", "Phoenix", "Phoenix…
$ county   <chr> "Santa Cruz", "Pima", "Pima", "Maricopa", "Maricopa", "Marico…
$ district <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ enroll   <dbl> 51, 22, 85, 60, 43, 36, 24, 22, 26, 78, 78, 35, 54, 54, 34, 5…
$ mmr      <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ overall  <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -…
$ xrel     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ xmed     <dbl> NA, NA, NA, NA, 2.33, NA, NA, NA, NA, NA, NA, 2.86, NA, 7.41,…
$ xper     <dbl> NA, NA, NA, NA, 2.33, NA, 4.17, NA, NA, NA, NA, NA, NA, NA, N…
$ lat      <dbl> 31.34782, 32.22192, 32.13049, 33.48545, 33.49562, 33.43532, 3…
$ lng      <dbl> -110.9380, -110.8961, -111.1170, -112.1306, -112.2247, -112.1…
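
Before comparing vaccination rates across school types, the data likely needs light cleaning. The glimpse shows `-1` in `overall` and repeated `index` values, which suggests a missing-value sentinel and duplicated rows; both readings are assumptions here and should be checked against the data documentation. The sketch below uses hypothetical toy rows (not values from the CSV) to show one way to handle both and then summarise `mmr` by state and school type.

```r
library(dplyr)
library(tibble)

# Hypothetical rows: one exact duplicate (index 3) and one -1 sentinel
toy_measles <- tibble(
  index = c(1, 2, 3, 3, 4),
  state = rep("Arizona", 5),
  type = c("Public", "Charter", "Charter", "Charter", "Public"),
  mmr = c(100, 100, 90, 90, -1)
)

mmr_by_type <- toy_measles %>%
  distinct() %>%                      # drop exact duplicate rows
  mutate(mmr = na_if(mmr, -1)) %>%    # assumption: -1 means "not reported"
  group_by(state, type) %>%
  summarise(
    mean_mmr = mean(mmr, na.rm = TRUE),
    n_reported = sum(!is.na(mmr)),    # schools with a usable MMR rate
    .groups = "drop"
  )
```

Reporting `n_reported` next to the mean makes it easier to see when a group's average rests on very few schools.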