Project title

Proposal

library(tidyverse)

Data 1

Introduction and data

Identify the source of the data

Source: North Carolina Department of Public Instruction (government)

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The North Carolina Department of Public Instruction collected the data in 2016-2017 from standardized tests, such as the EOG and the ACT, administered in North Carolina public schools.

Write a brief description of the observations.

Each observation is one standardized test subject, at one categorized level of performance, for a public school in North Carolina. The variables for each observation include school name, school code, district name, and the percentage of students by gender and race. Each observation also has variables for the percentage of students who are Economically Disadvantaged (EDS), Limited English Proficient (LEP), Students with Disabilities (SWD), and Academically or Intellectually Gifted (AIG).

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

Main question: How do AIG (Academically or Intellectually Gifted) students in Durham Public Schools perform depending on the subject category of the standardized test?

In Durham Public Schools, is there a correlation between performance on a standardized test for a specific subject and identifiers such as race, gender, and socioeconomic status?

Does standardized test performance vary by county/school district in Durham?

A description of the research topic along with a concise statement of your hypotheses on this topic.

We are interested in finding out whether AIG performance in Durham Public Schools varies consistently by subject. More specifically, do most AIG students in Durham Public Schools tend to perform better in one subject than in another? We hypothesize that, because Western society considers math proficiency to be linked to intelligence, a greater percentage of designated AIG students in Durham Public Schools will perform at a higher level on math standardized tests than on reading and other categories.

Identify the types of variables in your research question. Categorical? Quantitative?

In order to answer our research question, we will need to look at a combination of categorical and quantitative variables. Performance level on standardized tests is a categorical variable with values such as “College and Career Ready” and “Grade Level Proficient.” Subject is also a categorical variable. Percentage of AIG students will be a quantitative variable.
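
As a rough sketch of how these variable types fit together, the base R snippet below averages the AIG percentage by subject within one performance standard. The values and the subject labels ("Math", "Reading") are invented for illustration and are not taken from the NCDPI file; only the roles of the columns (a categorical subject, a categorical standard, a quantitative AIG percentage) follow the description above.

```r
# Toy data: invented values, NOT taken from the NCDPI file.
toy <- data.frame(
  Subject  = c("Math", "Math", "Reading", "Reading"),
  Standard = rep("College and Career Ready", 4),
  AIG      = c(80, 90, 70, 80)
)

# Keep one level of the categorical performance standard, then average
# the quantitative AIG percentage within each categorical subject.
ccr <- subset(toy, Standard == "College and Career Ready")
aig_by_subject <- aggregate(AIG ~ Subject, data = ccr, FUN = mean)
print(aig_by_subject)
```

With the real data, the same pattern would be applied after filtering to Durham Public Schools and converting the AIG column to numeric.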

Literature

NC School Performance (2016-2017)

Ford, J. E., & Triplett, N. (2019, August 19). E(race)ing Inequities | How race influences who is designated ‘gifted’ in North Carolina. EducationNC. Retrieved March 7, 2023, from https://www.ednc.org/eraceing-inequities-the-state-of-racial-equity-in-north-carolina-public-schools/

Ford and Triplett sought to understand racial disparities in Academically or Intellectually Gifted (AIG) designations among public school students in North Carolina. They found that Asian and White students were overrepresented among students designated AIG in Math, and that Asian, White, and Multiracial students were overrepresented among students designated AIG in Reading. Overall, Black students and American Indian students were underrepresented among students designated as academically or intellectually gifted. Based on their proportion of the student body, there should be 6,200 more Black students designated as AIG in Math. Ford and Triplett chose to conduct this analysis because of the positive impact of gifted and talented programs on students’ plans and career success after high school. With low representation of students of color in the academically or intellectually gifted student population, students of color are likely being disadvantaged in post-high school education and career advancement.

Our research also uses North Carolina educational data on academically or intellectually gifted students, but rather than studying these statistics through the lens of racial differences, we hope to analyze whether students designated AIG in North Carolina tend to hold designations in certain categories, such as math or reading, over others. With this question, we hope to dig deeper into whether there is a bias toward designating students who thrive in a particular academic area, such as math, as academically or intellectually gifted overall.

Glimpse of data

school <- read.csv("data/school.csv", sep = ",", header = TRUE)

glimpse(school)

Rows: 1,623
Columns: 19
$ District.Name                                       <chr> "Durham Public Sch…
$ School.Code                                         <int> 320, 320, 320, 320…
$ School.Name                                         <chr> "Durham Public Sch…
$ SBE.District                                        <chr> "North Central", "…
$ Subject                                             <chr> "All EOG/EOC Subje…
$ Standard..CCR...Level.4...5..GLP...Level.3...Above. <chr> "College and Caree…
$ All.Students                                        <chr> "37.4", "40", "39.…
$ Female                                              <chr> "39.5", "44", "41.…
$ Male                                                <chr> "35.3", "35.9", "3…
$ American.Indian                                     <chr> "36.4", "33.3", "*…
$ Asian                                               <chr> "67", "76.6", "84.…
$ Black                                               <chr> "26.7", "29.9", "2…
$ Hispanic                                            <chr> "28.7", "30.4", "2…
$ Two.or.More.Races                                   <chr> "49.3", "48.4", "5…
$ White                                               <chr> "73.6", "75.5", "7…
$ EDS                                                 <chr> "24.1", "26.6", "2…
$ LEP                                                 <chr> "8.8", "<5", "<5",…
$ SWD                                                 <chr> "12.6", "9.8", "9.…
$ AIG                                                 <chr> "83.7", "82.7", "8…
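
The glimpse shows that every percentage column was read as <chr>, and that some entries are markers such as "*" or "<5" rather than numbers. A minimal sketch of one way to convert these columns is below. `parse_pct` is a hypothetical helper, and treating every non-numeric marker as missing is our assumption, not something documented by NCDPI. The three values per column are the ones visible in the glimpse above.

```r
# Hypothetical helper: convert percentage strings to numeric.
# Assumption: markers such as "*" and "<5" are treated as missing (NA).
parse_pct <- function(x) {
  suppressWarnings(as.numeric(x))
}

# First three visible values of two columns from the glimpse above.
toy <- data.frame(
  American.Indian = c("36.4", "33.3", "*"),
  LEP             = c("8.8", "<5", "<5"),
  stringsAsFactors = FALSE
)

# Apply the helper to every percentage column at once.
pct_cols <- c("American.Indian", "LEP")
toy[pct_cols] <- lapply(toy[pct_cols], parse_pct)
str(toy)
```

Dropping "<5" loses the information that the value is small but nonzero; an alternative would be to keep a separate indicator column for censored entries.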


Data 2

Introduction and data

Identify the source of the data.

Dataset: Hansen, Eric R.; Treul, Sarah A., 2023, “Replication Data for: Primary Barriers to Working Class Representation”, https://doi.org/10.7910/DVN/YGT1KX, Harvard Dataverse, V1, UNF:6:LlOCRMLIvjECWBDwtT+Xdw== [fileUNF]. Codebook: Treul_Hansen_Codebook.pdf [fileName].

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data set accompanies an article published in Political Research Quarterly (2023). The authors are Eric Hansen, an assistant professor of political science at Loyola University Chicago, and Sarah A. Treul, an associate professor of political science at UNC. The authors collected and analyzed novel data describing the occupational background of all candidates who competed in U.S. House primaries between 2008 and 2016.

Write a brief description of the observations.

The dataset shows how well candidates perform in U.S. House primary elections. Each observation represents a candidate who competed in the U.S. House primaries between 2008 and 2016. Each row includes a variety of factors related to election results, such as vote share, fundraising, incumbency status, education level, ideology, number of opponents, and political affiliation.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

Among the working class candidates who won their elections, what are the two most prominent factors that contribute to higher vote shares? Are these two factors intertwined, or do they influence vote share independently?

A description of the research topic along with a concise statement of your hypotheses on this topic.

The dataset is designed to identify factors that influence candidate performance, and more specifically working class candidates’ performance, in U.S. House primary elections. Instead of looking at the general picture, we want to focus on candidates who won their elections and pin down the two most important factors. This research question is important because working class people often have a harder time gaining representation at higher levels of government. We hope this project can give future working class candidates insight into what they need to pay extra attention to, which could in turn lead to more equality and equity in government.

We hypothesize that incumbent status and the amount of money contributed to the candidate’s campaign are the two most influential factors in producing a larger vote share for working class candidates. We also hypothesize that these two variables interact: more money is put into the campaign when the candidate is an incumbent. A quality nonincumbent candidate would also put in more money than a nonquality nonincumbent candidate.

Identify the types of variables in your research question. Categorical? Quantitative?

The dependent variable is vote share, which is a quantitative variable. The two independent variables we currently hypothesize to be the most influential are incumbent status and the amount of money contributed to the candidate’s campaign. Incumbent status is a categorical variable (0: nonquality nonincumbent, 1: quality nonincumbent, 2: incumbent), and the amount of money is a quantitative variable.
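
One way to examine whether a categorical predictor and a quantitative predictor act together is a linear model with an interaction term, compared against a model with main effects only. The sketch below uses simulated data; the column names `status`, `log_money`, and `vote_share` are hypothetical stand-ins and are not the names used in the replication file.

```r
# Simulated toy data: hypothetical column names, invented values.
set.seed(1)
n <- 300
toy <- data.frame(
  status = factor(sample(0:2, n, replace = TRUE), levels = 0:2,
                  labels = c("nonquality", "quality", "incumbent")),
  log_money = rnorm(n, mean = 10, sd = 2)
)
toy$vote_share <- 0.05 * toy$log_money +
  0.10 * (toy$status == "incumbent") +
  rnorm(n, sd = 0.1)

# Main-effects model: status and money each shift vote share separately.
fit_main <- lm(vote_share ~ status + log_money, data = toy)

# Interaction model: the slope for money may differ by status.
fit_int <- lm(vote_share ~ status * log_money, data = toy)

# F-test of whether the interaction terms improve the fit.
print(anova(fit_main, fit_int))
```

Because vote share is bounded between 0 and 1, a plain linear model is only a first approximation; the final analysis might need a transformation or a different model family.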

Literature

Primary Barriers to Working Class Representation

Treul, S. A., & Hansen, E. R. (2023). Primary Barriers to Working Class Representation. Political Research Quarterly, 0(0). https://doi.org/10.1177/10659129231154914

This research article was co-written by Sarah Treul, who co-published the Primary Barriers to Working Class Representation dataset that we hope to use in our research. The article asks how working class candidates fare in primary elections, given that working class candidates are a minority on the ballot compared to white-collar candidates. It analyzes the occupational backgrounds of primary election candidates for the U.S. House of Representatives from 2008 to 2016 alongside election results. Treul and Hansen found that from 2008 to 2016, working class candidates were 31% less likely to win elections and tended to receive 24% fewer votes than white-collar candidates. In their analysis, Treul and Hansen suggest that lower success in primary elections for working class candidates may be due to lower funding and negative bias among party leaders towards working class candidates.

This article uses the Primary Barriers to Working Class Representation dataset to analyze election results based on whether the candidates were working class and then analyzes factors that affected why working class candidates did not win elections. Our research would seek to understand how working class candidates overcame barriers to winning primary elections by analyzing factors that were outliers among candidates that did win the election.

Glimpse of data

# Read the tab-delimited replication file
candidate <- read.table("data/Treul_Hansen_Replication_Data.tab", 
                        sep = "\t", header = TRUE)
# Save a CSV copy; write.csv() always writes column names, so col.names is not set
write.csv(candidate, file = "data/candidate.csv")

glimpse(candidate)

Rows: 7,869
Columns: 24
$ worker                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
$ worker_current_recode <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ state_abbrev          <chr> "IN", "WA", "NH", "MI", "MD", "MN", "TN", "WV", …
$ raceid                <int> 201014091, 200847089, 201029010, 201222021, 2010…
$ year                  <int> 2010, 2008, 2010, 2012, 2010, 2016, 2014, 2008, …
$ dem_vote              <dbl> 0.4470, 0.4720, 0.4400, 0.3590, 0.6500, 0.3427, …
$ candnumber            <int> 5, 6, 8, 1, 3, 3, 4, 1, 5, 3, 3, 6, 2, 2, 7, 1, …
$ cand_party            <chr> "Democrat", "Third-Party", "Republican", "Democr…
$ candvotes             <int> 4820, 2116, 653, 1, 5479, 4402, 4434, 15919, 343…
$ tvotes                <int> 54552, 152726, 65283, 1, 53352, 9566, 30183, 159…
$ candpct               <dbl> 0.088356100, 0.013854877, 0.010002600, 1.0000000…
$ winwin                <int> 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, …
$ quality_cand          <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ qualnumber            <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 2, 0, …
$ openseat              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
$ PartyDonors1          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ college               <dbl> 0.1172, 0.2649, 0.2080, 0.1564, 0.1818, 0.2096, …
$ dist_income           <dbl> 0.42314, 0.82671, 0.61948, 0.46584, 0.83020, 0.7…
$ total_primary         <dbl> 0, 0, 0, 0, 0, 0, NA, 0, 0, NA, 0, 0, 0, 0, 0, 0…
$ third                 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
$ gop                   <dbl> 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, …
$ ideo_extreme          <dbl> NA, NA, NA, NA, 1.119, NA, NA, NA, NA, NA, NA, N…
$ qualbin               <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ former_worker         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
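
Since the research question focuses on working class candidates who won, a first step would be to subset the data. The sketch below assumes that `worker == 1` marks a working class candidate, that `winwin == 1` marks a primary winner, and that `candpct` is the candidate's vote share; these readings are our assumptions and should be checked against the codebook. The rows are invented for illustration.

```r
# Toy rows with invented values; column names follow the glimpse above.
toy <- data.frame(
  worker  = c(1, 1, 0, 0, 1),
  winwin  = c(1, 0, 1, 0, 1),
  candpct = c(0.6, 0.2, 0.7, 0.1, 0.5)
)

# Assumption: worker == 1 is working class, winwin == 1 is a winner.
winners <- subset(toy, worker == 1 & winwin == 1)

# Number of working class winners and their average vote share.
n_winners <- nrow(winners)
mean_share <- mean(winners$candpct)
print(c(n = n_winners, mean_share = mean_share))
```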

Data 3

Introduction and data

Identify the source of the data.

The data was downloaded from the Gun Violence Archive’s website. GVA is a not-for-profit corporation formed in 2013 to provide free online public access to accurate information about gun-related violence in the United States. We obtained the data set by searching the database and filtering by date (1/1/2023-2/28/2023) and by incident characteristics (contains shot with wounded or injuries).

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The Gun Violence Archive (GVA) collects data on gun violence and crime incidents from 7,500 sources daily and publishes it at gunviolencearchive.org. The source for each observation in our dataset can be found in the operations section of this webpage: https://www.gunviolencearchive.org/query/f794465e-e174-4ccc-9692-88dd7f0bb7ab GVA collects and updates the data on a daily basis. The data set we use was collected during January and February 2023, and the data sources were last verified on March 05, 2023.

Write a brief description of the observations.

This data set summarizes gun violence incidents in the United States in which people were shot and wounded or injured during the first two months of 2023 (1/1/2023 - 2/28/2023). There are a total of 4,261 observations and 8 columns (7 columns apart from the “operations” section) in the data set. Each observation represents an incident. The columns specify the incident ID, date, state, city or county, address, the number of people killed, and the number of people injured.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

How do time and location (independently or jointly) affect the frequency of gunshot incidents and the rate of death vs. injury?

A description of the research topic along with a concise statement of your hypotheses on this topic.

Our research topic focuses on analyzing the influence of time and space factors on the risk and severity of gunshot incidents in the U.S. We are interested in this topic because gun violence incidents are not uncommon in or around universities, and we have heard about many recent gunshot incidents at universities, such as the one that just happened at MSU. By analyzing the time and space factors, we hope to inform the government and the public about the risk of gunshot incidents. Governments might be able to make informed efforts, such as strengthening security measures at particular times and in particular locations, while people choosing where to live or study could make more informed decisions based on safety.

Identify the types of variables in your research question. Categorical? Quantitative?
- Incident Date - categorical, <chr>, in the format “<day>-<month>-<year>” (for example, “28-Feb-23”)
- State - categorical, <chr>, the name of the state
- City or County - categorical, <chr>, the name of the city or county
- Address - categorical, <chr>, the place the incident takes place
- # Killed - quantitative, <int>, the number of people killed in the incident
- # Injured - quantitative, <int>, the number of people injured in the incident
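
Because Incident Date is stored as text, it has to be parsed before any time-based analysis. The sketch below shows one possible approach in base R, assuming the "28-Feb-23" format seen in the data and an English locale for the month abbreviation. The first date comes from the data; the two "01-Jan-23" entries are hypothetical.

```r
# One real value from the data plus two hypothetical dates in the same format.
incident_date <- c("28-Feb-23", "01-Jan-23", "01-Jan-23")

# Parse day-month-year text; %b (month abbreviation) depends on the locale.
parsed <- as.Date(incident_date, format = "%d-%b-%y")

# Day of week as an integer: 0 = Sunday, ..., 6 = Saturday.
wday <- as.POSIXlt(parsed)$wday
day_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_name <- factor(day_levels[wday + 1], levels = day_levels)

# Count incidents per day of week.
incidents_by_day <- table(day_name)
print(incidents_by_day)
```

The same parsed dates could also be grouped by week or month to look at frequency over time.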

Literature

Gun Violence Archive

Johnson, B. T., Sisti, A., Bernstein, M., Chen, K., Hennessy, E. A., Acabchuk, R. L., & Matos, M. (2021). Community-level factors and incidence of gun violence in the United States, 2014–2017. Social Science & Medicine, 280. https://doi.org/10.1016/j.socscimed.2021.113969

This article examines community and state-level factors that affect gun violence in the United States. To gain a better understanding of gun violence as a public health issue, the study looked at county and state-level predictors of gun violence rates and gun-related casualty rates. Data sources for this study included Gun Violence Archive data from 2014-2017, which comes from the same archive as our dataset but covers an earlier period. The study found that community and state-level factors do affect gun violence. Specifically, gun violence is higher in counties that have both high median incomes and high levels of poverty. However, poverty levels did not appear to be correlated with gun violence rates in counties with low median incomes. Factors such as institutional racism and police presence may influence the associations between community-level factors and gun violence.

Our research will use the Gun Violence Archive, but rather than looking at community and state-level factors that affect gun violence, we will analyze the time and location of gun violence incidents and how these factors relate to the level of injury and the death rate. We do not expect time or location to directly cause deaths, but they may be correlated with death rates. For example, if fewer ambulances are operating in a county at a particular time of day, that could lead to higher death rates from gun violence.

Glimpse of data

gunshot_incidents <- read.csv("data/gunshot_incidents.csv", 
                              sep = ",", header = TRUE)

glimpse(gunshot_incidents)

Rows: 4,261
Columns: 8
$ Incident.ID    <int> 2536076, 2535958, 2535944, 2535947, 2535541, 2535883, 2…
$ Incident.Date  <chr> "28-Feb-23", "28-Feb-23", "28-Feb-23", "28-Feb-23", "28…
$ State          <chr> "Missouri", "Missouri", "Virginia", "Missouri", "Tennes…
$ City.Or.County <chr> "Kansas City", "Kansas City", "Richmond", "Saint Louis …
$ Address        <chr> "2300 block of Blue Ridge Blvd", "3100 block of E 51st …
$ X..Killed      <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ X..Injured     <int> 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1…
$ Operations     <chr> "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",…
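
To compare the rate of death vs. injury across locations, one option is to total the two counts by state and compute the share of victims who were killed. The sketch below uses the first four rows visible in the glimpse above; `death_share` is a hypothetical name, and defining the rate as killed / (killed + injured) is our assumption about how to measure it.

```r
# First four rows visible in the glimpse above.
toy <- data.frame(
  State      = c("Missouri", "Missouri", "Virginia", "Missouri"),
  X..Killed  = c(1, 0, 0, 0),
  X..Injured = c(3, 1, 2, 1)
)

# Total killed and injured per state.
by_state <- aggregate(cbind(X..Killed, X..Injured) ~ State,
                      data = toy, FUN = sum)

# Assumed definition: share of all victims who were killed.
by_state$death_share <- by_state$X..Killed /
  (by_state$X..Killed + by_state$X..Injured)
print(by_state)
```

States with few incidents will have unstable shares, so the full analysis would probably need a minimum incident count or interval estimates.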