Suggested Answers: More Models

Packages

Note - if you don’t have the plotly or widgetframe package, you can install these before running library by writing the following code in the console:

install.packages(“plotly”) install.packages(“widgetframe”)

These packages are not necessary to finish the AE.

Interaciton Model

– Fit the interaction model that models body mass by island and flipper length. Display the summary output.

int_model <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ flipper_length_mm * island, data = penguins)

– Below, write out the estimate regression equation below.

\(\widehat{body\_mass\_g} = -5464 + 48.5*flipper + 3551*Dream + 3218*Torgersen - 19.4*flipper*Dream - 17.4*flipper*Torgersen\)

{1 if Dream; 0 else} {1 if Torgersen; 0 else}

– Now, write out the estimated regression equation for each island. Show how the slopes by island.

Biscoe: \(\widehat{body\_mass} = -5464 + 48.5*flipper}\)

Dream: \(\widehat{body\_mass} = (-5464 + 3551) + (48.5 - 19.7)*flipper\)

Torgersen: \(\widehat{body\_mass} = (-5464 + 3218) + (48.5 - 17.4)*flipper\)

Now, name your interaction model int_model and remove the tidy() from the pipeline. Use this named model to make this prediction below.

– Predict the body mass of a penguin with a flipper length of 200 on the Dream island

predict(int_model, data.frame(flipper_length_mm = 200, island = "Dream"))
# A tibble: 1 × 1
  .pred
  <dbl>
1 3915.

How does the picture change if our two explanatory variables are quantitative?

In this example, let’s explore body mass, and it’s relationship to bill length and flipper length.

– Brainstorm, how could we visualize this?

quanplot <- plot_ly(penguins, 
                    x = ~ flipper_length_mm, y = ~ body_mass_g, z = ~bill_length_mm,
                    marker = list(size = 3, color = "lightgray" , alpha = 0.5, 
                                  line = list(color = "gray" , width = 2))) |>
                      add_markers() |>
                      plotly::layout(scene = list(
                        xaxis = list(title = "Flipper (mm)"),
                        yaxis = list(title = "Bill (mm)"), 
                        zaxis = list(title = "Body Mass (g)")
                      )) |>
                    config(displayModeBar = FALSE)
                  frameWidget(quanplot)

– Now, fit the additive model between flipper length, bill length, and body mass below.

add_model <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ flipper_length_mm + bill_length_mm, data = penguins) 

– Interpret bill length in the context of the problem.

“Holding all other variables constant” “Estimate a mean change in Y” “For a one unit increase in X…..”

Holding flipper length constant, for a one mm increase in bill length, we estimate a mean change in body mass of 6.05 grams.

– Finally, fit the interaction model between flipper length, bill length, and body mass.

int_model2 <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ flipper_length_mm * bill_length_mm, data = penguins) 

body mass’s relationship with flipper length depends on the penguins bill length

– Lastly, calculate the R-squared (coefficient of determination) for each model fit today. Interpret the largest R-squared value in the context of the problem.

glance(int_model)$r.squared #let's interpret this one
[1] 0.7857486
glance(add_model)$r.squared
[1] 0.7599577
glance(int_model2)$r.squared
[1] 0.7694049

Our interaction model with flipper length and island explains roughly 79% of the variation in body mass.

What really is R-squared? (See slides)

Model Selection

How can we choose between the models that we fit? Which one is the best?

Adjusted r-squared and AIC are two common methods (see ae-14)

Extra Practice (Changing the default intercept term)

When we modeled body mass by island when learning SLR. The default was to have the Biscoe island as the baseline (intercept) term. What if we want to change that?

There are many ways to do this. Using mutate and factor, change the intercept term to be the Dream island. Note how the estimates change in the model as a result.

linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ island *flipper_length_mm, data = penguins) |>
  tidy()
# A tibble: 6 × 5
  term                              estimate std.error statistic  p.value
  <chr>                                <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                        -5464.     431.      -12.7  2.51e-30
2 islandDream                         3551.     969.        3.66 2.89e- 4
3 islandTorgersen                     3218.    1680.        1.92 5.62e- 2
4 flipper_length_mm                     48.5      2.05     23.7  1.66e-73
5 islandDream:flipper_length_mm        -19.4      4.94     -3.93 1.03e- 4
6 islandTorgersen:flipper_length_mm    -17.4      8.73     -1.99 4.69e- 2
penguins <- penguins |>
  mutate(island = factor(island, levels = c("Dream" , "Biscoe", "Torgersen")))

linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ island * flipper_length_mm, data = penguins) |>
  tidy()
# A tibble: 6 × 5
  term                              estimate std.error statistic  p.value
  <chr>                                <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                       -1913.      868.      -2.20  2.82e- 2
2 islandBiscoe                      -3551.      969.      -3.66  2.89e- 4
3 islandTorgersen                    -333.     1841.      -0.181 8.57e- 1
4 flipper_length_mm                    29.1       4.49     6.49  3.12e-10
5 islandBiscoe:flipper_length_mm       19.4       4.94     3.93  1.03e- 4
6 islandTorgersen:flipper_length_mm     1.99      9.60     0.208 8.36e- 1