The Measles Project

Report

Introduction and Data

Motivation and Context

This project was motivated by the measles dataset from TidyTuesday. We were interested in how the dataset categorizes schools, so our team imported the data into R and used those categories to guide our plots and questions. The data, as reported in the Wall Street Journal, underlie a map of measles, mumps, and rubella (MMR) vaccination rates in schools across the United States. The map shows that some schools have dangerously low MMR vaccination rates, which increases the risk of measles outbreaks. The data are based on the 2018-2019 school year and include public, private, and charter schools. According to the map, many of the schools with the lowest vaccination rates are in states such as Idaho, Utah, Colorado, and Oregon. The article also notes that measles cases have been rising in recent years and emphasizes the importance of vaccination in preventing the spread of the highly contagious virus. From this narrative and the raw data, we formed our hypothesis and research question.

Research question: Are schools’ vaccination rates influenced by the type of school, the state it is located in, and its enrollment numbers? If so, can valid predictions of vaccination rates be made from these variables?

Hypothesis: It is expected that private schools with higher enrollment numbers will have higher rates of MMR vaccination. It is also expected that states with a high concentration of these schools will have higher vaccination rates.

The Data:

Quantitative - vaccination rates (mmr, overall) and enrollment numbers (enroll)

Categorical - school type (for example public, private, or charter), county, and state

Literature Review

The Wall Street Journal article presents a map of measles vaccination rates in schools across the United States, showing that some schools have dangerously low vaccination rates, which increases the risk of measles outbreaks. The data are based on the 2018-2019 school year and include public, private, and charter schools. According to the map, many of the schools with the lowest vaccination rates are in states such as Idaho, Utah, Colorado, and Oregon. The article also notes that measles cases have been rising in recent years and emphasizes the importance of vaccination in preventing the spread of the highly contagious virus. Our research is directly related to the article because the article reports on the same data contained in our CSV file. Additionally, we aim to fit a model that predicts MMR vaccination rates from the variables being tested: public vs. private status, enrollment, and geographic region status (urban vs. rural).

Methodology

Visualizations and Summary Statistics

Rows: 66,113
Columns: 16
$ index    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15, 16…
$ state    <chr> "Arizona", "Arizona", "Arizona", "Arizona", "Arizona", "Arizo…
$ year     <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19", "2018-…
$ name     <chr> "A J Mitchell Elementary", "Academy Del Sol", "Academy Del So…
$ type     <chr> "Public", "Charter", "Charter", "Charter", "Charter", "Public…
$ city     <chr> "Nogales", "Tucson", "Tucson", "Phoenix", "Phoenix", "Phoenix…
$ county   <chr> "Santa Cruz", "Pima", "Pima", "Maricopa", "Maricopa", "Marico…
$ district <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ enroll   <dbl> 51, 22, 85, 60, 43, 36, 24, 22, 26, 78, 78, 35, 54, 54, 34, 5…
$ mmr      <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ overall  <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -…
$ xrel     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ xmed     <dbl> NA, NA, NA, NA, 2.33, NA, NA, NA, NA, NA, NA, 2.86, NA, 7.41,…
$ xper     <dbl> NA, NA, NA, NA, 2.33, NA, 4.17, NA, NA, NA, NA, NA, NA, NA, N…
$ lat      <dbl> 31.34782, 32.22192, 32.13049, 33.48545, 33.49562, 33.43532, 3…
$ lng      <dbl> -110.9380, -110.8961, -111.1170, -112.1306, -112.2247, -112.1…

Figure 1. Each data point represents a school, and its color represents the reported MMR vaccination rate at that school: dark blue indicates a low (or zero) vaccination rate, and dark red indicates a rate at or near 100%. This visualization is used to 1) understand the variability of vaccination rates in schools across different states, and 2) help us draw valid comparisons between states with substantially different MMR vaccination rates.

Disclaimer: Points located in Asia were removed because this study focuses only on data from the US. It is unknown why those coordinates appeared in the data in the first place.

Warning in state == c("Illinois", "Massachusetts", "Pennsylvania", "Arkansas", :
longer object length is not a multiple of shorter object length

Figure 2. This plot shows the three states with the highest and the three states with the lowest mean MMR vaccination rates among the states sampled. The states with the lowest reported MMR vaccination rates are Arkansas, Washington, and Minnesota. The states with the highest reported MMR vaccination rates are Massachusetts, Illinois, and Pennsylvania.
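
The recycling warning printed before Figure 2 ("longer object length is not a multiple of shorter object length") usually arises when a column is compared to a vector of several values with `==`. The sketch below uses a small hypothetical data frame (the `schools` and `keep` names and all values are illustrative, not the project's code) to show how `==` can silently drop matching rows and how `%in%` avoids that. If the Figure 2 filter was written this way, the plotted means may be based on only a subset of each state's schools.

```r
# Hypothetical toy data; only the state names are taken from Figure 2.
schools <- data.frame(
  state = c("Illinois", "Arkansas", "Ohio", "Massachusetts", "Arkansas", "Illinois"),
  mmr   = c(97, 80, 92, 98, 81, 96)
)
keep <- c("Illinois", "Massachusetts", "Arkansas")

# `==` recycles `keep` along the column, so each row is compared with only
# one element of `keep`. Rows that belong to a wanted state can be dropped.
# (R warns only when the two lengths are not multiples of each other.)
by_equals <- schools[schools$state == keep, ]

# `%in%` tests every row against the whole set of wanted states.
by_in <- schools[schools$state %in% keep, ]

nrow(by_equals)  # 1
nrow(by_in)      # 5
```

With `%in%`, all five rows from the three wanted states are kept; with `==`, only the first row survives.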

# A tibble: 21 × 3
   state           max   min
   <chr>         <dbl> <dbl>
 1 Arizona       100    15.4
 2 Arkansas       96.1  17.2
 3 California     99     1  
 4 Colorado      100    16.2
 5 Connecticut   100    67.9
 6 Illinois      100    10.4
 7 Maine         100    38.5
 8 Massachusetts 100     3  
 9 Minnesota     100    20  
10 Missouri       99     1  
# … with 11 more rows
# A tibble: 682 × 3
# Groups:   county, state [662]
   county         state           mmr
   <chr>          <chr>         <dbl>
 1 Los Angeles    California     1   
 2 San Bernardino California     1   
 3 <NA>           Missouri       1   
 4 Montgomery     Ohio           2.78
 5 Berkshire      Massachusetts  3   
 6 Seneca         New York       3.2 
 7 Flathead       Montana        9.43
 8 Snohomish      Washington    10   
 9 Cook           Illinois      10.4 
10 Mohave         Arizona       15.4 
# … with 672 more rows

Figure 3. This graph shows the counties with the highest and lowest MMR vaccination rates in each state. Several states have multiple counties with a 100% MMR vaccination rate, but few to none have counties tied at the lowest rate. Schools with no available MMR rate were removed.

Warning: Removed 7411 rows containing missing values (`geom_point()`).

Figure 4. No data was reported for BOCES and nonpublic schools. This visualization does not support the idea that enrollment numbers influence reported MMR vaccination rates, though it suggests that public schools and other school types (excluding BOCES and nonpublic schools) with high enrollment tend to report higher MMR vaccination rates.

Figure 5.

State highest vs. lowest (county x, county y)

\(H_0: \mu_{public}-\mu_{private}=0\): the difference in mean vaccination rate between public and private schools is 0.

\(H_A: \mu_{public}-\mu_{private}\ne0\): the difference in mean vaccination rate between public and private schools is not 0.

# A tibble: 2 × 2
  type        m
  <chr>   <dbl>
1 Private  62.4
2 Public   91.8
Warning: Please be cautious in reporting a p-value of 0. This result is an
approximation based on the number of `reps` chosen in the `generate()` step. See
`?get_p_value()` for more information.
# A tibble: 1 × 1
  p_value
    <dbl>
1       0

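We reject the null hypothesis: the simulated p-value is approximately 0, which is well below .05. The data provide strong evidence that the mean vaccination rate differs between public and private schools, and the sample means (91.8% for public vs. 62.4% for private) indicate that the rate is higher in public schools.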
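
As a rough illustration of the simulation-based test and of why a p-value of 0 should be reported as an upper bound, here is a minimal base-R permutation sketch. It is not the code used for this report: the data are simulated, and the names `rates`, `type`, `diff_means`, and `reps` are hypothetical.

```r
set.seed(1)

# Hypothetical data, not the project's: 40 simulated schools per type.
rates <- c(rnorm(40, mean = 92, sd = 5), rnorm(40, mean = 62, sd = 15))
type  <- rep(c("Public", "Private"), each = 40)

# Test statistic: difference in mean rate, public minus private.
diff_means <- function(y, g) mean(y[g == "Public"]) - mean(y[g == "Private"])
observed <- diff_means(rates, type)

# Null distribution: shuffle the type labels and recompute the statistic.
reps <- 1000
null_diffs <- replicate(reps, diff_means(rates, sample(type)))

# Two-sided p-value: share of shuffled differences at least as extreme.
p_raw <- mean(abs(null_diffs) >= abs(observed))

# With a finite number of shuffles, a raw value of 0 only means p < 1 / reps.
p_report <- if (p_raw == 0) sprintf("p < %g", 1 / reps) else sprintf("p = %.3f", p_raw)
p_report
```

Increasing `reps` tightens the bound that can be reported when no shuffled difference is as extreme as the observed one.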

Models & Prediction

In order to fit the scope of our hypothesis we chose to include the following variables for our models: type, enroll, and state.

term estimate std.error statistic p.value
(Intercept) 83.5755081 2.6393600 31.6650653 0.0000000
typeKindergarten 4.0497044 7.0295542 0.5760969 0.5645556
typePrivate 4.4187705 4.8400286 0.9129637 0.3612716
typePublic 8.8688080 3.2505490 2.7284031 0.0063692
enroll 0.0596794 0.0313213 1.9053931 0.0567420
stateCalifornia -14.0644653 1.9348712 -7.2689412 0.0000000
stateColorado 7.0629988 6.4687726 1.0918607 0.2749063
stateOhio -3.9569538 2.0626690 -1.9183658 0.0550776
stateUtah 1.2760663 2.7093736 0.4709821 0.6376582
typeKindergarten:enroll -0.0425918 0.0942434 -0.4519337 0.6513212
typePrivate:enroll -0.0307656 0.0909268 -0.3383558 0.7350983
typePublic:enroll -0.0380893 0.0392166 -0.9712534 0.3314328
typeKindergarten:stateCalifornia NA NA NA NA
typePrivate:stateCalifornia -57.7253609 4.5301548 -12.7424697 0.0000000
typePublic:stateCalifornia NA NA NA NA
typeKindergarten:stateColorado NA NA NA NA
typePrivate:stateColorado NA NA NA NA
typePublic:stateColorado NA NA NA NA
typeKindergarten:stateOhio NA NA NA NA
typePrivate:stateOhio 6.1547397 4.6702823 1.3178517 0.1875670
typePublic:stateOhio NA NA NA NA
typeKindergarten:stateUtah NA NA NA NA
typePrivate:stateUtah 1.0105953 6.9539291 0.1453272 0.8844538
typePublic:stateUtah NA NA NA NA
enroll:stateCalifornia 0.1440564 0.0239135 6.0240637 0.0000000
enroll:stateColorado -0.0284074 0.0881911 -0.3221122 0.7473708
enroll:stateOhio 0.0154724 0.0254120 0.6088610 0.5426229
enroll:stateUtah -0.0183839 0.0238506 -0.7707944 0.4408371
typeKindergarten:enroll:stateCalifornia NA NA NA NA
typePrivate:enroll:stateCalifornia 1.0331692 0.0898724 11.4959624 0.0000000
typePublic:enroll:stateCalifornia NA NA NA NA
typeKindergarten:enroll:stateColorado NA NA NA NA
typePrivate:enroll:stateColorado NA NA NA NA
typePublic:enroll:stateColorado NA NA NA NA
typeKindergarten:enroll:stateOhio NA NA NA NA
typePrivate:enroll:stateOhio 0.0148218 0.0933866 0.1587143 0.8738954
typePublic:enroll:stateOhio NA NA NA NA
typeKindergarten:enroll:stateUtah NA NA NA NA
typePrivate:enroll:stateUtah 0.0048797 0.0921312 0.0529648 0.9577604
typePublic:enroll:stateUtah NA NA NA NA

The interactive model has an adjusted r-squared of 0.4717082.

term estimate std.error statistic p.value
(Intercept) 81.1839489 1.5657184 51.850927 0.0000000
typeKindergarten -13.0939027 5.3394305 -2.452303 0.0142022
typePrivate -18.8499552 1.7724264 -10.635113 0.0000000
typePublic 7.8614600 1.7206765 4.568819 0.0000049
enroll 0.0922677 0.0027396 33.679627 0.0000000
stateCalifornia -5.8650145 0.7639301 -7.677423 0.0000000
stateColorado 21.0127953 5.0706937 4.143969 0.0000343
stateOhio 4.4453417 0.8578398 5.182018 0.0000002
stateUtah -37.5748208 1.6469356 -22.814992 0.0000000

The additive model has an adjusted r-squared of 0.2430603.

The higher adjusted r squared for the interactive model, over the additive one, suggests that the interactive model has a better overall fit. We will use the interactive model as a baseline to test the prediction strength of type and state as well as enroll and state.
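
The comparison between an additive and an interactive specification can be sketched with base R's `lm()`. The data below are simulated with a built-in type-by-state effect, so the numbers are purely illustrative; `d`, `fit_add`, and `fit_int` are hypothetical names and this is not the report's modeling code.

```r
set.seed(42)
n <- 400

# Hypothetical data with a deliberate type-by-state interaction.
d <- data.frame(
  type   = sample(c("Private", "Public"), n, replace = TRUE),
  state  = sample(c("Arizona", "California"), n, replace = TRUE),
  enroll = round(runif(n, 10, 200))
)
d$mmr <- 90 + 0.02 * d$enroll -
  25 * (d$type == "Private" & d$state == "California") +
  rnorm(n, sd = 5)

# Additive model: each predictor shifts the rate independently.
fit_add <- lm(mmr ~ type + state + enroll, data = d)

# Interactive model: `*` adds all two-way and three-way interactions.
fit_int <- lm(mmr ~ type * state * enroll, data = d)

# Adjusted R-squared penalizes the extra interaction terms.
adj_r2 <- c(
  additive    = summary(fit_add)$adj.r.squared,
  interactive = summary(fit_int)$adj.r.squared
)
round(adj_r2, 3)
```

Because adjusted R-squared penalizes additional terms, a higher value for the interactive fit suggests the interactions explain real variation rather than only adding parameters.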

term estimate std.error statistic p.value
(Intercept) 86.9521985 3.8643422 22.5011643 0.0000000
typeCharter -18.0127782 4.1595070 -4.3305079 0.0000149
typeKindergarten -59.2164703 7.0297642 -8.4236781 0.0000000
typeNonpublic -79.8734621 4.3675080 -18.2881088 0.0000000
typePrivate -58.1724217 4.2250794 -13.7683616 0.0000000
typePublic 0.0265396 3.7743600 0.0070316 0.9943897
stateCalifornia 5.8082822 0.8580983 6.7687840 0.0000000
stateColorado 66.3421280 5.8348223 11.3700340 0.0000000
stateConnecticut 1.8451364 1.3184303 1.3994949 0.1616752
stateMassachusetts -8.1917346 1.1301765 -7.2481903 0.0000000
stateNew York 11.8009930 1.0125877 11.6542917 0.0000000
stateOhio 1.1039465 0.9954669 1.1089735 0.2674507
stateUtah 8.2070762 1.3535592 6.0633299 0.0000000
typeCharter:stateCalifornia NA NA NA NA
typeKindergarten:stateCalifornia NA NA NA NA
typeNonpublic:stateCalifornia NA NA NA NA
typePrivate:stateCalifornia 11.9448529 1.9742284 6.0503905 0.0000000
typePublic:stateCalifornia NA NA NA NA
typeCharter:stateColorado NA NA NA NA
typeKindergarten:stateColorado NA NA NA NA
typeNonpublic:stateColorado NA NA NA NA
typePrivate:stateColorado NA NA NA NA
typePublic:stateColorado NA NA NA NA
typeCharter:stateConnecticut NA NA NA NA
typeKindergarten:stateConnecticut NA NA NA NA
typeNonpublic:stateConnecticut NA NA NA NA
typePrivate:stateConnecticut NA NA NA NA
typePublic:stateConnecticut NA NA NA NA
typeCharter:stateMassachusetts NA NA NA NA
typeKindergarten:stateMassachusetts NA NA NA NA
typeNonpublic:stateMassachusetts NA NA NA NA
typePrivate:stateMassachusetts -11.0448323 2.3538157 -4.6923097 0.0000027
typePublic:stateMassachusetts NA NA NA NA
typeCharter:stateNew York NA NA NA NA
typeKindergarten:stateNew York NA NA NA NA
typeNonpublic:stateNew York NA NA NA NA
typePrivate:stateNew York 48.2612529 2.0563162 23.4697631 0.0000000
typePublic:stateNew York NA NA NA NA
typeCharter:stateOhio NA NA NA NA
typeKindergarten:stateOhio NA NA NA NA
typeNonpublic:stateOhio NA NA NA NA
typePrivate:stateOhio 46.1414052 2.1342507 21.6194867 0.0000000
typePublic:stateOhio NA NA NA NA
typeCharter:stateUtah NA NA NA NA
typeKindergarten:stateUtah NA NA NA NA
typeNonpublic:stateUtah NA NA NA NA
typePrivate:stateUtah 55.5443591 4.9556399 11.2083123 0.0000000
typePublic:stateUtah NA NA NA NA

The interactive model with type and state has an adjusted r-squared of 0.3436327.

term estimate std.error statistic p.value
(Intercept) 89.5438570 1.1544104 77.5667482 0.0000000
enroll 0.0423208 0.0145255 2.9135464 0.0035751
stateArkansas -7.6186390 2.1789129 -3.4965321 0.0004718
stateCalifornia -30.1605807 1.1804534 -25.5499969 0.0000000
stateColorado 5.0028329 1.2999912 3.8483590 0.0001191
stateFlorida -90.5438570 1.3692136 -66.1283651 0.0000000
stateIllinois 7.0682630 1.2042257 5.8695498 0.0000000
stateIowa -90.5438570 1.4044800 -64.4678864 0.0000000
stateMaine 2.6125077 1.4469990 1.8054661 0.0710080
stateMichigan -90.5438570 1.2993181 -69.6856736 0.0000000
stateMinnesota -2.4557221 1.3205797 -1.8595789 0.0629512
stateMontana 6.4715249 1.4384919 4.4988260 0.0000068
stateNew Jersey -90.5438570 1.2828797 -70.5786020 0.0000000
stateNorth Carolina -90.5438570 1.3229762 -68.4395224 0.0000000
stateNorth Dakota 1.7958954 1.8362555 0.9780205 0.3280691
stateOhio 0.1672697 1.2546708 0.1333176 0.8939428
stateOregon -3.4410777 1.5895466 -2.1648171 0.0304066
statePennsylvania 6.8126466 1.3260408 5.1375843 0.0000003
stateRhode Island -90.5438570 2.2835395 -39.6506632 0.0000000
stateSouth Dakota 0.7915256 1.5966733 0.4957342 0.6200842
stateTennessee -90.5438570 1.5208128 -59.5364902 0.0000000
stateUtah 3.7272339 1.8471963 2.0177790 0.0436198
stateVermont -2.3743595 1.6433668 -1.4448141 0.1485166
enroll:stateArkansas -0.0450703 0.0148989 -3.0250757 0.0024870
enroll:stateCalifornia 0.2984087 0.0147744 20.1977274 0.0000000
enroll:stateColorado -0.0505891 0.0165938 -3.0486774 0.0022998
enroll:stateFlorida -0.0423208 0.0163123 -2.5944179 0.0094780
enroll:stateIllinois -0.0400754 0.0145499 -2.7543424 0.0058832
enroll:stateIowa -0.0423208 0.0147168 -2.8756723 0.0040334
enroll:stateMaine -0.0402012 0.0149972 -2.6805868 0.0073518
enroll:stateMichigan -0.0423208 0.0167496 -2.5266789 0.0115178
enroll:stateMinnesota 0.0233447 0.0169198 1.3797286 0.1676766
enroll:stateMontana -0.0497164 0.0146853 -3.3854616 0.0007112
enroll:stateNew Jersey -0.0423208 0.0167638 -2.5245291 0.0115885
enroll:stateNorth Carolina -0.0423208 0.0168044 -2.5184415 0.0117907
enroll:stateNorth Dakota -0.0063723 0.0315009 -0.2022904 0.8396905
enroll:stateOhio -0.0143150 0.0159291 -0.8986746 0.3688305
enroll:stateOregon 0.0749675 0.0220176 3.4048907 0.0006624
enroll:statePennsylvania -0.0347961 0.0164128 -2.1200549 0.0340065
enroll:stateRhode Island -0.0423208 0.0356664 -1.1865719 0.2354024
enroll:stateSouth Dakota 0.0464033 0.0268801 1.7263097 0.0842981
enroll:stateTennessee -0.0423208 0.0186132 -2.2737032 0.0229882
enroll:stateUtah -0.0383824 0.0147634 -2.5998297 0.0093298
enroll:stateVermont -0.0143949 0.0154172 -0.9336891 0.3504690

The interactive model with enroll and state has an adjusted r-squared of 0.8710007.

Warning in predict.lm(object = object$fit, newdata = new_data, type =
"response"): prediction from a rank-deficient fit may be misleading
# A tibble: 1 × 1
  .pred
  <dbl>
1  94.6

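Using the interactive model, we predicted a 94.6% MMR vaccination rate for a public school in Arizona with an enrollment of 100 students. The same model can be applied to other schools by changing the input values, although the rank-deficient fit (many interaction terms could not be estimated) means that predictions for some type and state combinations may be misleading.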
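
The 94.6% prediction can be reproduced by hand from the interactive model's coefficient table. The sketch below assumes that Arizona and Charter are the reference levels (neither appears as a term in the table), so only the intercept, `typePublic`, `enroll`, and `typePublic:enroll` estimates contribute. The variable names are hypothetical.

```r
# Estimates copied from the interactive model's coefficient table.
intercept          <- 83.5755081
type_public        <- 8.8688080
enroll_slope       <- 0.0596794
type_public_enroll <- -0.0380893

# Assumption: Arizona is the reference state, so no state terms apply.
enroll <- 100

# Prediction = intercept + type shift + (base slope + type-specific slope) * enroll
pred <- intercept + type_public + (enroll_slope + type_public_enroll) * enroll
round(pred, 1)  # 94.6
```

Changing `enroll`, or adding the relevant state and interaction estimates, gives predictions for other schools, provided the needed terms were actually estimated (not `NA`) in the table.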

Results

Figure 1 - This visualization addresses the first part of the research question by showing which states have high MMR vaccination rates across schools (such as Ohio) and which have low rates (such as Florida). Ohio and Florida show little variability across schools, whereas California shows a lot. Overall, the map shows how MMR vaccination rates differ across regions of the US.

Figure 2 - To explore whether there is a relationship between urban or rural status and MMR vaccination rate, we first plotted the states with the highest (Massachusetts, Illinois, and Pennsylvania) and lowest (Arkansas, Washington, and Minnesota) mean MMR vaccination rates. On its own, this does not show any relationship between urban or rural status and MMR vaccination rate, so we referenced information on the 2010 U.S. Decennial Census provided by Iowa State University \(^3\). The percentage of each state’s total population living in urban areas was: Massachusetts (92%), Illinois (88.5%), Pennsylvania (78.7%), Arkansas (56.2%), Washington (84.1%), and Minnesota (73.3%). There appears to be, at most, a weak relationship between a state’s urban share and its MMR vaccination rate. For instance, Arkansas has by far the lowest mean MMR vaccination rate (80.49%) compared with the states with the highest mean rates (>95%), and it also has a considerably lower percentage of its population in urban areas. Massachusetts and Illinois have comparably higher urban percentages and the highest mean MMR vaccination rates, 97.04% and 97.39%, respectively. However, states like Washington weaken the pattern by having a high urban percentage but a comparably low mean MMR vaccination rate (89.3%).

Figure 3 - The maximum and minimum average vaccination rates by county showed that most states had a maximum vaccination rate of 99% or 100%. Minimum vaccination rates, on the other hand, ranged from 1% to 96%. States often had several counties tied at the maximum. One noteworthy finding is that California had several counties all at 99%, which is odd. It is very likely that some of the results were influenced by the removal of NA values, and the minimum for Missouri comes from a school whose county is missing.

Figure 4 - For Charter, Kindergarten, and Private schools, the reported vaccination rates differ widely among schools with no new enrollments. Public schools show a little more spread, but no strong correlation either. Public schools with 0 new enrollments may report high vaccination rates or none at all, but in general high enrollment rarely coincides with low vaccination rates (this also applies to Charter, Kindergarten, and Private schools). This visualization addresses whether enrollment numbers have an effect on vaccination rates; we determined that enrollment is not a determining factor.

Figure 5 - The graph of average vaccination rates for private and public schools shows a higher average rate for public schools. This is possibly due to local government rules and control over public schools, under which students may be required to get vaccinated, whereas in private schools vaccination may have been more a matter of choice and outside government control. The hypothesis test showed a significant difference in vaccination rate between public schools and private schools (p < .05). This result may not be wholly accurate because much of the private school data was removed, but we assume the difference can be attributed to a lack of vaccine mandates in private schools.

The additive model yielded an adjusted r-squared of 0.243 using the type, enroll, and state variables. This was considerably lower than the interactive model’s adjusted r-squared of 0.47. Therefore, we continued to use the interactive model as a framework for testing the predictive strength of the other variables. We tested enrollment and type by constructing two separate models, one with state and enroll and the other with state and type. The model with state and enroll had a far higher adjusted r-squared (0.87) than the model with state and type (0.34). Despite the higher adjusted r-squared, our question asked whether state, enrollment number, and type of school together would be good predictors, so we chose the original interactive model for the prediction. We are not certain why the state and enroll model fits so much better, but its coefficient table offers one likely explanation. For Florida, Iowa, Michigan, New Jersey, North Carolina, Rhode Island, and Tennessee, the state estimate (-90.5438570) and the enroll:state estimate (-0.0423208) exactly cancel the intercept (89.5438570) and the enroll slope (0.0423208), leaving a fitted value of -1 at any enrollment. This suggests that the -1 placeholder values were still present for those states when this model was fit, so its adjusted r-squared may largely reflect the separation of reporting from non-reporting states. This finding warrants further investigation in future studies.

Discussion

Summary: From our analyses, we learned that whether students attend a private or public school does affect, at least to some extent, how high vaccination rates are. In addition, and in answer to our research question, state, type of school, and enrollment number provide reasonable predictions of reported vaccination rates, with enrollment and state being the best predictors. In essence, all of our factors are associated with schools’ vaccination rates.

Ethical considerations: Individual vaccination records are sensitive health information. When held by health care providers they are protected under the Health Insurance Portability and Accountability Act (HIPAA) and generally cannot be shared without patient consent. The data used here are aggregated at the school level, which reduces but does not eliminate privacy concerns, especially for very small schools. In certain areas, schools are not required to report vaccine data at all. The data may also be used to discriminate unjustly against certain areas. Not all parts of the US benefit equally from our research: some areas may also be more or less willing to reveal their vaccination information, so the study may not represent our target population equally.

Limitations: We saw negative values (such as -1) in the dataset, but we do not know exactly what they mean. We speculate that they indicate schools that did not report vaccination rates.
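
One way to keep such placeholder values out of summaries and models is to recode them as missing before analysis. The sketch below assumes, as speculated above, that negative rates mean "not reported". The data frame is a hypothetical stand-in that reuses the `mmr` and `overall` column names from the dataset.

```r
# Hypothetical rows; -1 mirrors the placeholder seen in the rate columns.
schools <- data.frame(
  name    = c("A", "B", "C", "D"),
  mmr     = c(100, -1, 95.5, -1),
  overall = c(-1, -1, 93, 88)
)

# Assumption: a negative rate means "not reported", so recode it to NA.
rate_cols <- c("mmr", "overall")
schools[rate_cols] <- lapply(schools[rate_cols], function(x) replace(x, x < 0, NA))

# Summaries then need na.rm = TRUE; lm() drops incomplete rows by default.
mean(schools$mmr, na.rm = TRUE)  # 97.75
```

Recoding before modeling prevents the placeholder from being treated as a real vaccination rate of -1%.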

Some states do not require schools to report MMR vaccination data, thus resulting in a lack of information for a few states. This results in inconclusive data from those states, and they will not be considered for this research analysis.

While our research question seeks to understand the influence of geographic region status (urban vs rural), the dataset collected does not contain information on the geographic status where each school is located. While we pooled external information drawn from the 2010 US Decennial Census, the analysis derived might not accurately reflect the relationship between these two variables.

Future work: Our study had a narrowly defined scope. Future work could examine vaccination rates in additional states or outside of the US. Testing other variables in the hope of building a stronger predictive model is another route. Finally, this data relies mostly on reported vaccination rates from 2018-19, so other (perhaps more current) years should be considered before drawing firm conclusions. Comparing this data with measles outbreaks or case counts would also be an interesting way to understand the effect that school vaccination rates have on measles cases.

References

  1. Moriarty, D., Umlauf, T., & Abbott, B. (2019). U.S. Measles Vaccination Data [Data set]. https://github.com/WSJ/measles-data

  2. Moriarty, D., Umlauf, T., & Abbott, B. (2019). What’s the measles vaccination rate at your child’s school? The Wall Street Journal. https://www.wsj.com/graphics/school-measles-rate-map/

  3. “Urban Percentage of the Population for States, Historical.” Iowa State University | Iowa Community Indicators Program, Iowa State University, www.icip.iastate.edu/tables/population/urban-pct-states.