Project title

Proposal

library(tidyverse)

Data 1

Introduction and data

  • Identify the source of the data.

This data set is called FIFA 2021 Complete Player Dataset from Aayush Mishra on fifaindex.com.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data set was originally collected 3 years ago. This data is from the source fifaindex.com by the use of scraping and chrome extensions.

  • Write a brief description of the observations.

Each observation is a soccer player from the video game FIFA 2021, and the data includes this individual player’s name, nationality, position, overall rating, age, hits, potential, and team.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

How does a player’s nationality and club team, correlate to their current rating and potential?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

I find it very intriguing to explore how a player’s nationality and club team affects their current rating and potential. For example, do some teams have player’s with low ratings but a very high potential rating in the future, and do some teams primarily have players with no potential room to grow? Also do certain countries have higher rated players than other countries, or do they have players with a larger amount of potential to improve? I hypothesize that the largest teams like Real Madrid or Barcelona will have players with a higher rating but no significantly greater potential growth. While smaller teams with a developmental academy like Ajax or Lisbon will have players with a lower rating but a higher level of potential growth.

  • Identify the types of variables in your research question. Categorical? Quantitative?

Nationality: Categorical

Club team: Categorical

Rating: Quantitative

Potential: Quantitative

Literature

  • Find one published credible article on the topic you are interested in researching.

https://fifacareermodetips.com/guides/understanding-potential/

  • Provide a one paragraph summary about the article.

The article “Understanding Potential” from FIFA Career Mode Tips provides an overview of player potential in FIFA. The article explains the concept of potential as a measure of a player’s maximum possible rating in the future, based on their current attributes and age. The article also covers the different types of potential ratings and how they are calculated, including fixed potential, range potential, and dynamic potential.

  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

The article gives me a better understanding of my research question as I learned to think of potential as a player’s predicted or peak overall rating. This provided me the awesome idea of creating a stacked bar plot of extra potential over current rating to show the range certain players can grow and make a beneficial visualization. The article also provides examinations of examples of certain player’s potential which gives me a better understanding of how I could use this in my project.

Glimpse of data

Fifa_21 <- read.csv(file = 'data/FIFA-NEW.csv', header = T)
glimpse(Fifa_21)
Rows: 6,124
Columns: 10
$ X           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ player_id   <int> 158023, 20801, 190871, 203376, 200389, 192985, 188545, 183…
$ name        <chr> "Lionel Messi", "Cristiano Ronaldo", "Neymar Jr", "Virgil …
$ nationality <chr> "Argentina", "Portugal", "Brazil", "Netherlands", "Sloveni…
$ position    <chr> "ST|CF|RW", "ST|LW", "CAM|LW", "CB", "GK", "CM|CAM", "ST",…
$ overall     <int> 94, 93, 92, 91, 91, 91, 91, 91, 90, 90, 90, 90, 90, 89, 79…
$ age         <int> 33, 35, 28, 29, 27, 29, 31, 29, 27, 28, 28, 28, 32, 21, 25…
$ hits        <int> 299, 276, 186, 127, 47, 119, 89, 66, 53, 94, 76, 68, 50, 2…
$ potential   <int> 94, 93, 92, 92, 93, 91, 91, 91, 91, 90, 90, 93, 90, 95, 83…
$ team        <chr> "FC Barcelona ", "Juventus ", "Paris Saint-Germain ", "Liv…

Data 2

Introduction and data

  • Identify the source of the data

    • The dataset “Proportion of people expressing discriminatory attitudes towards people living with HIV, 2014-2021” is sourced from UNICEF DATA.

    • https://data.unicef.org/resources/dataset/hiv-aids-statistical-tables/

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • Data was originally collected by unicef from the year 2000 to 2022 through nationally representative population-based surveys.
  • Write a brief description of the observations.

    • Each observation includes the percentage of people expressing discriminatory attitudes toward people living with HIV. These observations are broken down by the country, global region, and year that the observation data was collected in, along with observations based on certain age groups, sexes, and socioeconomic status.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    • Worldwide, are there regional differences for the proportion of individuals with discriminatory attitudes towards those living with HIV, how does socioeconomic/residential status impact the attitudes individuals hold towards those with HIV, and how has this changed over time?
  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    • This research would look at how different factors of life, from age to sex, wealth to the region an individual lives in influences the attitude one has towards those with HIV.

    • We hypothesize that each region will have a different average of negative opinion, based on the culture of the region.

    • In addition, we predict that samples coming from populations of higher wealth, higher levels of education on HIV, and younger populations will have lower proportions of negativity; and that over time levels of negativity have decreased as HIV has become less stigmatized.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    • Sex, Wealth Quintile, Education, and Residence are all Categorical variables.

    • The proportion of individuals with discriminatory attitudes towards people with HIV is a Quantitative variable.

Literature

  • Find one published credible article on the topic you are interested in researching.

    • article: Discriminatory attitudes towards people living with HIV declining in some regions, rebounding in others

    • link: https://www.unaids.org/en/resources/presscentre/featurestories/2021/january/20210125_discriminatory-attitudes

    • from: The Joint United Nations Programme on HIV/AIDS

  • Provide a one paragraph summary about the article.

    • Among 151 countries contributing to UNAIDS study on discriminatory attitudes towards those with HIV, UNAIDS is observing great strides in East and South Africa towards destigmatizing HIV. While this may be the case in this region, other regions still criminalize HIV exposure and transmission, inhibiting the control of the epidemic and further stigmatizing the disease. In the most recent set of data, over 50% of reporting countries returned data that showed the majority of their younger (15-49 y/o) population still holds discriminatory attitudes towards those with AIDS.
  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

    • My research question aims to look at the regions with high and low rates of discriminatory attitudes and see if, regardless of current levels, globally these trends are decreasing. UNAIDS does not look at the impact of socioeconomic status, age, sex, or education about AIDS in how they categorize their data. I hope that by looking at these factors we can better understand where negative attitudes are originating so as to determine where education would be most necessary and effective in destigmatizing HIV.

Glimpse of data

HIV_data <- read.csv("data/HIV_Discrimination_2022.csv", header = TRUE)

glimpse(HIV_data)
Rows: 1,335
Columns: 11
$ ISO3            <chr> "BGD", "BGD", "BGD", "BGD", "BGD", "BGD", "BGD", "BGD"…
$ Country         <chr> "Bangladesh", "Bangladesh", "Bangladesh", "Bangladesh"…
$ UNICEF.Region   <chr> "South Asia", "South Asia", "South Asia", "South Asia"…
$ Indicator       <chr> "Per cent of people expressing discriminatory attitude…
$ Data.Source     <chr> "Multiple Indicator Cluster Survey 2019", "Multiple In…
$ Year            <int> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, …
$ Sex             <chr> "Female", "Female", "Female", "Female", "Female", "Fem…
$ Age             <chr> "15-19", "15-24", "15-24", "15-24", "15-24", "15-24", …
$ DISAGG_CATEGORY <chr> "Age", "Education", "Education", "Education", "Educati…
$ DISAGG          <chr> "15-19", "higher secondary+", "pre-primary or none", "…
$ Value           <dbl> 42.6, 28.9, 56.3, 56.8, 48.0, 44.0, 34.5, 52.7, 47.5, …

Data 3

Introduction and data

  • Identify the source of the data.

    This data comes from the Tidy Tuesday repositories on GitHub via the site “Data is Plural.”

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    It comes from Lynn Fisher who collected data from upwards of 630 movies, calculating the difference between the love interests’ ages. This was collected/published around February 7th, 2018.

  • Write a brief description of the observations.

    The observations are fairly simple to follow. They list the movie name, director, release year, and then the age difference before having columns for what number couple each one is (only matters if there are multiple in the movie), their names, their characters’ names, their birth dates, and then their ages.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    Does the age difference of actors in movies decrease as the release date becomes more modern?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    This would explore the culture of Hollywood, especially as it tended to have a history of huge power dynamics among older males and younger females. For this question’s significance on the male/female power dynamic in Hollywood, it would make more sense to only view the couples with a male and a female, although there are LGBTQ+ couple representations present in the data set. Based on intuition, I would believe that, generally, the age gap lessens as the movies become newer.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    The variables are quantitative for the age gap and categorical for the gender.

Literature

  • Find one published credible article on the topic you are interested in researching.

    https://www.theguardian.com/film/2022/feb/06/move-over-silver-foxes-hollywood-gets-to-grips-with-the-age-gap

  • Provide a one paragraph summary about the article.

    The article starts off by addressing the issue of casting older males with younger females and how it has been prevalent for so long in Hollywood, but is slowly facing more criticism and opposition. Filmmakers have started to approach this issue in different ways. They have flipped the roles with an old female actress with a younger male romantic interest. Others have decided not to beat around the bush, purposely casting an older actor with a younger actress to show the truth of the power imbalance. Another issue the article explores is how male actors seem to have a longer time in the industry than female actresses. All of these issues were extremely interesting to read about, especially seeing how different people and films in the industry have responded.

  • In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.

    Our research will provide concrete data on whether the uncomfortable age gaps between male and female romantic interests in movies has actually been waning over time. This will tell us if criticism within the industry and from viewers has actually been effective in making a change in this issue.

Glimpse of data

AgeGaps <- read.csv(file = 'data/age_gaps.csv')
glimpse(AgeGaps)
Rows: 1,155
Columns: 13
$ movie_name         <chr> "Harold and Maude", "Venus", "The Quiet American", …
$ release_year       <int> 1971, 2006, 2002, 1998, 2010, 1992, 2009, 1999, 199…
$ director           <chr> "Hal Ashby", "Roger Michell", "Phillip Noyce", "Joe…
$ age_difference     <int> 52, 50, 49, 45, 43, 42, 40, 39, 38, 38, 36, 36, 35,…
$ couple_number      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ actor_1_name       <chr> "Ruth Gordon", "Peter O'Toole", "Michael Caine", "D…
$ actor_2_name       <chr> "Bud Cort", "Jodie Whittaker", "Do Thi Hai Yen", "T…
$ character_1_gender <chr> "woman", "man", "man", "man", "man", "man", "man", …
$ character_2_gender <chr> "man", "woman", "woman", "woman", "man", "woman", "…
$ actor_1_birthdate  <chr> "1896-10-30", "1932-08-02", "1933-03-14", "1930-09-…
$ actor_2_birthdate  <chr> "1948-03-29", "1982-06-03", "1982-10-01", "1975-11-…
$ actor_1_age        <int> 75, 74, 69, 68, 81, 59, 62, 69, 57, 77, 59, 56, 65,…
$ actor_2_age        <int> 23, 24, 20, 23, 38, 17, 22, 30, 19, 39, 23, 20, 30,…