Does the age difference between actors in movie relationships decrease as the release date becomes more modern?
THAN: Report
Introduction and Data:
Our data comes from the Tidy Tuesday repositories on GitHub via the site “Data is Plural.”
The data set, Hollywood Age Gap, was compiled by Lynn Fisher and records the age difference between actors playing couples in Hollywood movies, a measure that reflects the male-female power dynamic in the movie industry.
Hollywood Age Gap can be accessed at https://hollywoodagegap.com/.
Hollywood Age Gap has information on upwards of 630 movies released between 1935 and 2022; the dataset was most recently updated on February 14th, 2023.
Hollywood Age Gap, while mostly compiled by Fisher, is open source, so anyone can submit data for consideration, contingent on three rules:
- The actors are at least 17 years old
- The actors are actually love interests in the film
- The characters cannot be animated
Variables in the data set include:
movie_name: the name of the movie
release_year: the year the movie was released
director: the movie’s director
actor_1_name/actor_2_name: the names of the actors whose characters are in a relationship
actor_1_birthdate/actor_2_birthdate: the birthdates of the actors
actor_1_age/actor_2_age: the ages of the actors when the movie was released
age_difference: the difference between the ages of the two actors
Additional variables that were added include:
age_gap_cat: whether the age difference was greater than 15 years, or 15 years or less.
- 15 years was chosen because beyond 15 years the age gap is large enough for the older partner to be the parent of the younger partner.
older_partner_gender: the gender of the older partner in the relationship
decades: what decade the film was released in
era: whether the film was released pre-1950, between 1950 and 1975, between 1975 and 2000, or post-2000.
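The derived variables above can be sketched in a few lines. This is a minimal Python illustration, not the code used for the report; the function names and category labels are assumptions, and because the report does not say which era the boundary years 1975 and 2000 fall into, the sketch places both in the 1975-2000 era. older_partner_gender is omitted because it depends on gender information not listed among the variables above.

```python
# Illustrative helpers; the function names and category labels are assumptions,
# not the report's actual code.

def age_gap_cat(age_difference):
    """Return the age gap category using the report's 15-year cutoff."""
    return "over 15 years" if age_difference > 15 else "15 years or less"


def decade(release_year):
    """Return the release decade as a label such as '1990s'."""
    return f"{release_year // 10 * 10}s"


def era(release_year):
    """Return the release era.

    Assumption: 1975 and 2000 are both placed in the 1975-2000 era, since the
    report does not say which era the boundary years belong to.
    """
    if release_year < 1950:
        return "pre-1950"
    if release_year < 1975:
        return "1950-1975"
    if release_year <= 2000:
        return "1975-2000"
    return "post-2000"


# Hypothetical film: released in 1997 with a 16-year age difference.
example = (age_gap_cat(16), decade(1997), era(1997))
```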
Research Question:
Does the age difference between actors in movie relationships decrease as the movie’s release date becomes more modern?
Our research will attempt to determine whether the uncomfortable age gaps between male and female romantic interests in movies have been waning over time. This will tell us whether criticism within the industry and from viewers has been effective in bringing about change on this issue.
In addition, we will explore whether the male or female member of the couple tends to be older in cases where an age gap is present.
Other things to explore include the trends for specific actors and actresses who may have experienced both sides of the age gap, as well as how this trend plays out in movies with multiple couples.
Exploratory Data Analysis

# A tibble: 2 × 2
older_partner_gender count
<chr> <int>
1 Female 215
2 Male 917
# A tibble: 2 × 2
older_partner_gender mean_gap
<chr> <dbl>
1 Female 4.73
2 Male 11.7
The scatter plot above visualizes the age difference between actors in movie relationships over time. The observations are colored based on the gender of the older partner. The purple best fit line shows the trend of age gaps regardless of gender.
From the plot and summary statistics, we observe that in cases where an age gap is present, the older partner tends to be male (917 observations where the older partner was male, compared to 215 where the older partner was female). In cases where the older partner is female, the age difference also tends to be smaller than for couples where the older partner is male (a mean of 4.73 years compared to 11.7 years).
Additionally, the majority of the observations fall in the last 40 years, and there are few observations from the 1930s and 1940s, which we attribute to the Hollywood film industry still being in its early years.
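The two summary tables above are a grouped count and a grouped mean. A minimal Python sketch of that computation follows; the rows are invented toy data (not drawn from Hollywood Age Gap), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_gender(rows):
    """Group rows by older_partner_gender and return count and mean gap."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["older_partner_gender"]].append(row["age_difference"])
    return {
        gender: {"count": len(gaps), "mean_gap": mean(gaps)}
        for gender, gaps in groups.items()
    }


# Toy rows for illustration only; these are NOT values from the dataset.
toy_rows = [
    {"older_partner_gender": "Male", "age_difference": 12},
    {"older_partner_gender": "Male", "age_difference": 20},
    {"older_partner_gender": "Male", "age_difference": 4},
    {"older_partner_gender": "Female", "age_difference": 3},
    {"older_partner_gender": "Female", "age_difference": 5},
]

summary = summarize_by_gender(toy_rows)
```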

The bar chart above groups movie observations into the decade that they were released. The plot shows that the proportion of films containing age gaps of over 15 years decreases over time.
In the 1940s, over 80% of films had a large age gap of over 15 years. This proportion could be attributed to World War II, which resulted in younger men enlisting in the military rather than starring in Hollywood films.
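The proportions behind a chart like this come from counting, per decade, how many observations exceed the 15-year cutoff. The sketch below shows one way to do that in Python; the film list is invented toy data and the function name is an assumption.

```python
from collections import Counter


def large_gap_share_by_decade(films, threshold=15):
    """Return {decade: share of films with an age gap above threshold}.

    films is an iterable of (release_year, age_difference) pairs.
    """
    totals, large = Counter(), Counter()
    for release_year, age_difference in films:
        decade = release_year // 10 * 10
        totals[decade] += 1
        if age_difference > threshold:
            large[decade] += 1
    return {decade: large[decade] / totals[decade] for decade in sorted(totals)}


# Toy (release_year, age_difference) pairs; NOT values from the dataset.
toy_films = [(1942, 20), (1947, 18), (1949, 9),
             (2005, 3), (2008, 17), (2011, 6), (2015, 2)]

shares = large_gap_share_by_decade(toy_films)
```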

The box plot above visualizes the age differences in films by the five directors most represented in Hollywood Age Gap. Compared to the mean age difference of heterosexual couples in the dataset (\(\mu = 10.4\)), some directors, like Woody Allen and John Glen, tend to produce movies with larger than average age gaps, whereas Mike Newell’s movies contain smaller than average age gaps.

Unlike the age gaps for heterosexual couples, the age gaps shown in films depicting same-sex couples have increased over time. However, the dataset contains only 23 observations of same-sex couples, the first of which appeared in 1997, so it is difficult to determine any true trend.
Literature Review
The attached literature describes a study that was conducted to determine how much older male leads were compared to their female partners. Following the requests from regular readers of the website, the publisher decided to compile a dataset of 422 romantic films in order to answer this question.
In filtering films to include in the study, the following criteria were applied:
- the film had to have grossed 1 million dollars
- only heterosexual couples were evaluated (the study was focused on revealing gender inequality)
- there could only be one clear main couple
The ages of the actors, rather than those of their characters, were measured (ages on the first day of screening). This study took observations of films spanning 30 years (1984-2014). Over this time period, the average age gap was approximately 4.5 years, with the largest average gap in 1984 (men were 10.4 years older on average) and the smallest in 1988 (men were 0.6 years older). At no point in the 30-year time frame were female leads older on average than their male counterparts.
It is worth noting that the age gap was smaller in films directed by women during the time period (12% of all films studied). The study is very similar to ours in that it uses data to bring to light gender inequality in the Hollywood film industry from the end of the 20th century to the present.
Methodology
Since same-sex couples have only begun to be shown in film in the last couple of decades, we will filter the data set so that it only contains information about movies with heterosexual couples.
We predict that over time the mean age gap between couples has decreased and that the proportion of movies with large age gaps today is less than it was in the past.
We believe this change could be attributed to criticism within the film industry.
We will look at individual 25-year eras of film production, estimating the true mean age gap in films produced pre-1950, in 1950-1975, in 1975-2000, and post-2000. By comparing confidence intervals produced either by bootstrapping or by the CLT (depending on whether the data meet the CLT requirements), we can better understand how age gaps changed between eras.
We will also fit a linear regression model to the data, to see how well the year a film was released, the gender of the older partner, and the director of the film can predict the age gap found in the film. We will look at the r-squared of this model to gauge its strength and analyze the significance of the results with regard to our hypothesis (that age gaps decrease as the release year becomes more modern), and to better understand what factors contribute to the age gap portrayed in a film. To test the theory that the director could play a role in the age gaps portrayed, we will fit the model to the observations from the five most common directors in the data set.
We will create a logistic regression model to understand the connection between release year and the likelihood that a film will contain an age gap greater than 15 years. We will then create an ROC curve and look at the area under the curve to analyze the accuracy of our model.
We predict that the age difference is greater in couples where the older partner is male.
To investigate whether the age gap for couples where the older partner is male is greater than the age gap for couples where the older partner is female, we will perform a hypothesis test on the difference in mean age differences.
Results
Bootstrapping and CLT of Mean Age Gaps
From the EDA scatterplot of age gaps in heterosexual couples, we observe that all three best fit lines are trending negatively, indicating a decrease in the age gaps over time. This observation will be explored further in the next section through the use of bootstrapping and CLT (based on sample size) to find the true mean of different eras in time.
In order to use the Central Limit Theorem, the sample size must be greater than 30 and the observations must be independent. We treat Hollywood Age Gap as a random sample of films, which meets the independence requirement.
Pre-1950: \(n = 19 < 30\)
1950-1975: \(n = 87 > 30\)
1975-2000: \(n = 283 > 30\)
Post-2000: \(n = 766 > 30\)
Based on sample size, we can use the CLT for all eras besides pre-1950, where we will use bootstrapping.
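The two interval methods can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the report's code: the bootstrap uses the percentile method, the CLT interval uses a normal quantile (a t quantile would be slightly wider), the function names are hypothetical, and the 19 toy values are invented rather than taken from the pre-1950 films.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_ci(sample, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(reps))
    alpha = 1 - level
    lower = boot_means[int(alpha / 2 * reps)]
    upper = boot_means[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


def clt_ci(sample, level=0.95):
    """Normal-approximation (CLT) confidence interval for the mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


# Toy sample of 19 age gaps for illustration; NOT the pre-1950 data.
toy_gaps = [25, 12, 18, 9, 30, 14, 21, 7, 16, 23,
            11, 19, 27, 13, 8, 20, 15, 22, 17]

boot_lower, boot_upper = bootstrap_ci(toy_gaps)
clt_lower, clt_upper = clt_ci(toy_gaps)
```

With a sample this small the two intervals can differ noticeably, which is the reason for preferring the bootstrap when \(n < 30\).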
pre-1950
# A tibble: 1 × 2
lower upper
<dbl> <dbl>
1 12.9 21.1
We can say with 95% confidence that the true mean age gap in pre-1950 films was between 12.9 and 21.1 years.
1950-1975
[1] 12.32864 16.38400
We can say with 95% confidence that the true mean age gap in films released between 1950 and 1975 was between 12.3 and 16.4 years.
1975-2000
[1] 13.30298 15.40967
We can say with 95% confidence that the true mean age gap in films released between 1975 and 2000 was between 13.3 and 15.4 years.
post-2000
[1] 8.901113 10.020558
We can say with 95% confidence that the true mean age gap in films released after 2000 was between 8.9 and 10.0 years.
Analysis of Confidence Intervals

We can observe that over time, the upper bound of the interval has consistently decreased, and the lower bound (besides in the 1975-2000 era) has also decreased, indicating a decrease in the true mean of age gaps over time. This is consistent with what we observed in the scatterplot of age gaps in heterosexual movie couples.
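One way to read these intervals more strictly is to check which of them overlap. The short Python sketch below does this with the bounds reported above; the helper name is hypothetical. Under this check, only the post-2000 interval is separated from the others, while the three earlier intervals overlap one another, so the clearest evidence of a decrease is between the pre-2000 eras and the post-2000 era.

```python
# Confidence interval bounds copied from the results reported above.
intervals = {
    "pre-1950": (12.9, 21.1),
    "1950-1975": (12.32864, 16.38400),
    "1975-2000": (13.30298, 15.40967),
    "post-2000": (8.901113, 10.020558),
}


def overlaps(a, b):
    """Return True when intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Eras whose interval overlaps the post-2000 interval (hypothetical helper use).
overlap_with_post_2000 = [
    name for name, bounds in intervals.items()
    if name != "post-2000" and overlaps(bounds, intervals["post-2000"])
]
```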
Linear Regression Model: Age Difference in Relation to Release Year, Director, and the Older Partner’s Gender
[1] 0.3126371
parsnip model object
Call:
stats::lm(formula = age_difference ~ release_year + director +
older_partner_gender, data = data)
Coefficients:
(Intercept) release_year directorJohn Glen
269.7868 -0.1302 -2.3018
directorMartin Scorsese directorMike Newell directorSteven Spielberg
-5.6559 -10.1880 -5.1004
older_partner_genderMale
10.8733
\(\widehat{\text{Age Gap}} = 269.7868 - 0.1302 \times year - 2.3018 \times G - 5.6559 \times Sc - 10.1880 \times N - 5.1004 \times Sp + 10.8733 \times Male\)
\(G:\) {1 if director is John Glen; 0 if not}
\(Sc:\) {1 if director is Martin Scorsese; 0 if not}
\(N:\) {1 if director is Mike Newell; 0 if not}
\(Sp:\) {1 if director is Steven Spielberg; 0 if not}
Woody Allen is the baseline director: all four director indicators are 0 if the director is Woody Allen
\(Male:\) {1 if older partner is Male; 0 if Female}
Our linear regression model with release year, director name, and the gender of the older partner explains roughly 31.3% of the variation in age gaps (\(R^2 = 0.313\)), so these three predictors leave most of the variation unexplained. However, the coefficient on release year is negative (the predicted age gap falls by about 0.13 years for each later release year, holding director and the older partner’s gender constant), supporting our hypothesis that the age difference between actors in movie relationships decreases as the release date becomes more modern. The director coefficients also indicate that typical age gaps differ between directors.
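To make the fitted equation concrete, the sketch below plugs values into it. The coefficients are copied from the model output above; the function and constant names are hypothetical, and the example films are made up. For a film released in 2000 with an older male partner, the model predicts an age gap of about 20.3 years for Woody Allen and about 10.1 years for Mike Newell.

```python
# Coefficients copied from the fitted model output above.
INTERCEPT = 269.7868
YEAR_SLOPE = -0.1302
OLDER_MALE_EFFECT = 10.8733
DIRECTOR_EFFECT = {
    "Woody Allen": 0.0,  # baseline director: no indicator is switched on
    "John Glen": -2.3018,
    "Martin Scorsese": -5.6559,
    "Mike Newell": -10.1880,
    "Steven Spielberg": -5.1004,
}


def predict_age_gap(release_year, director, older_partner_is_male):
    """Predicted age gap (years) from the fitted linear model."""
    return (
        INTERCEPT
        + YEAR_SLOPE * release_year
        + DIRECTOR_EFFECT[director]
        + (OLDER_MALE_EFFECT if older_partner_is_male else 0.0)
    )


# Hypothetical films, used only to illustrate the arithmetic.
allen_2000 = predict_age_gap(2000, "Woody Allen", True)
newell_2000 = predict_age_gap(2000, "Mike Newell", True)
```

The model only covers the five directors it was fitted on, so predictions for any other director are outside its scope.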
Logistic Regression Model: Probability of a Large Age Gap

\(\log\left(\frac{p}{1-p}\right) = -43.61794 + 0.02238 \times release\_year\)
Here \(p\) is the probability that a film has a small age gap (15 years or less). Our estimated logistic regression model with release year shows that as the release year becomes more modern, the odds of a film having a small age gap (as opposed to an age gap of over 15 years) increase.
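The log-odds equation can be turned into probabilities with the inverse logit. The sketch below uses the reported coefficients and assumes, based on the interpretation above, that \(p\) is the probability of a small age gap; the function and variable names are hypothetical. Under that assumption the model gives a probability of roughly 0.51 for a film released in 1950 and roughly 0.76 for one released in 2000, and each additional decade multiplies the odds of a small gap by about 1.25.

```python
import math

# Coefficients copied from the fitted logistic regression above.
INTERCEPT = -43.61794
YEAR_SLOPE = 0.02238


def prob_small_gap(release_year):
    """Predicted probability of a small age gap (assumed meaning of p)."""
    log_odds = INTERCEPT + YEAR_SLOPE * release_year
    return 1 / (1 + math.exp(-log_odds))  # inverse logit


# Multiplicative change in the odds for a film released ten years later.
odds_ratio_per_decade = math.exp(10 * YEAR_SLOPE)

p_1950 = prob_small_gap(1950)
p_2000 = prob_small_gap(2000)
```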

# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.583
The ROC curve for our logistic regression model has an AUC of 0.583. Because \(0.583 > 0.5\), the model predicts the age gap category better than flipping a coin (50-50), but only modestly: release year alone is a weak predictor of whether a film has a large age gap.
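AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counting as one half. The Python sketch below computes it directly from that definition; it is an illustration with invented labels and scores, not the routine used to produce the estimate above, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels are 1 for the positive class and 0 for the negative class;
    tied scores count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


# Toy labels and predicted scores for illustration; NOT model output.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.2]

toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

A model whose scores carry no information ranks about half of the pairs correctly, which is why 0.5 is the coin-flip baseline.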
Hypothesis Testing for Difference in Means
From the initial scatter plot in EDA that visualizes the gender of the older partner for each observation with respect to movie release year and age difference, we observe that in cases where the older partner is female, the age difference tends to be less than the age difference for couples where the older partner is male. We will investigate this observation by performing a hypothesis test on the difference in mean age difference between couples where the older partner is male and those where the older partner is female.
Null Hypothesis: the true mean age gap for couples where the older partner is male is the same as the true mean age gap for couples where the older partner is female.
\(H_0: \mu_m - \mu_f = 0\)
Alternative Hypothesis: the true mean age gap for couples where the older partner is male is greater than the true mean age gap for couples where the older partner is female.
\(H_a: \mu_m - \mu_f > 0\)
# A tibble: 2 × 2
older_partner_gender mean_age
<chr> <dbl>
1 Female 4.73
2 Male 11.7
\(\bar{x}_m - \bar{x}_f = 11.7 - 4.73 = 6.97\)
Warning: Please be cautious in reporting a p-value of 0. This result is an
approximation based on the number of `reps` chosen in the `generate()` step. See
`?get_p_value()` for more information.
# A tibble: 1 × 1
p_value
<dbl>
1 0
With a p-value \(< 0.001\), which is less than \(\alpha = 0.05\), we reject the null hypothesis that the true mean age gap for couples where the older partner is male is the same as the true mean age gap for couples where the older partner is female. There is strong evidence for the alternative hypothesis that the true mean age gap for couples where the older partner is male is greater than the true mean age gap for couples where the older partner is female.
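A simulation-based test of this kind can be sketched as a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The Python sketch below is an illustration under that assumption, with invented toy data and hypothetical names. It also shows why a simulated p-value of 0 is best reported as "less than 1/reps": it only means that none of the shuffles reached the observed difference.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, reps=2000, seed=1):
    """One-sided permutation test for mean(group_a) - mean(group_b) > 0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        shuffled_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if shuffled_diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps


# Toy age gaps for illustration only; NOT values from the dataset.
toy_male_older = [12, 18, 9, 15, 22, 11, 14, 20]
toy_female_older = [3, 5, 2, 6, 4, 1, 7, 5]

observed_diff, p_value = permutation_p_value(toy_male_older, toy_female_older)
```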
Discussion
As the release dates of Hollywood films become more modern, the age difference between actors whose characters are in relationships has decreased.
- The proportion of films with age gaps of over 15 years has also consistently decreased. A logistic regression on release year alone predicts the age gap category somewhat better than chance (AUC of 0.583).
A linear additive model that predicts the age difference from release year, director, and the gender of the older partner explains roughly 31.3% of the variation in age gaps.
It is significantly more common for the older partner in couples with an age gap to be male.
In the future we hope that more observations from earlier years, as well as films including same-sex couples, are added to Hollywood Age Gap so that more accurate and comprehensive data analysis can be performed.
We would also like to look at data that includes films from other regions of the world with large film industries, such as Europe or India, in order to see whether any cultural nuances appear, currently or historically, in the portrayal of couples.
In addition to accounting for the progression of time, we could pursue further research in which we facet the data by genre in order to observe any disparities in couple age difference.
Citations
Fisher, Lynn. “Hollywood Age Gap.” Hollywood Age Gap, 2023. https://hollywoodagegap.com/.
“Study: How Much Older Are Male Leads in Romantic Films than Their Female Co-Stars?” Women and Hollywood, June 1, 2015. https://womenandhollywood.com/study-how-much-older-are-male-leads-in-romantic-films-than-their-female-co-stars-43ddef908f19/.