Data Analysis: How to Interpret Regression Summary Plots (in R)

I have always wanted a reference guide for interpreting the four diagnostic plots R produces when you call plot(lm(...)). Here it is!

Definitions & Notation

Fitted, $\hat{y_{i}}$
Residuals, $e_{i}$ = $observed_{i} - fitted_{i}$ = $y_{i} - \hat{y_{i}}$
Standardized Residuals, $r_{i}$: residual divided by estimate of its standard deviation
$r_{i} = \frac{e_{i}}{\sqrt{MSE * (1 - h_{ii})}}$
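Leverage, $h_{ii}$: "leverage of the ith data point", the ith diagonal element of the hat matrix. It measures how far the observation's predictor values lie from those of the other observations, and therefore how much potential that observation has to pull the fitted line toward itself. Not fully defined here; it can be accessed from a fitted model in R via the function hatvalues. For more info see [1].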
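
To make the notation concrete, here is a minimal sketch that computes each quantity by hand and compares it with R's own accessors. The built-in mtcars dataset and the model mpg ~ wt + hp are stand-ins chosen only so the snippet runs without extra packages; they are not part of the examples in this guide.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)

y_hat <- fitted(fit)                  # fitted values
e     <- mtcars$mpg - y_hat           # residuals: observed - fitted
h     <- hatvalues(fit)               # leverage h_ii
mse   <- sum(e^2) / df.residual(fit)  # MSE = SSE / (n - number of coefficients)

# Standardized residuals from the formula above
r_manual <- e / sqrt(mse * (1 - h))

# Both comparisons should report TRUE
all.equal(e, residuals(fit))
all.equal(r_manual, rstandard(fit))
```

Because $h_{ii}$ sits inside the denominator, a high-leverage point gets a larger standardized residual than its raw residual alone would suggest.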

Residuals vs. Fitted

Using the Plot

X Axis: Fitted Values, $\hat{y_{i}}$
Y Axis: Residuals, $e_{i}$
What it Shows / How to Use:
- Shows whether linearity holds (i.e. mean of residual ~ 0 over whole x axis) --> visually, red line is near dashed line
- Shows whether homoskedasticity holds (spread of residuals should be approximately the same over whole x axis)
- Shows potential outliers (by default R labels the three most extreme points, whether or not they are true outliers)
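
A small sketch of how this plot can be drawn on its own, and roughly rebuilt by hand, may help when reading it. It assumes the built-in mtcars data and an arbitrary model as stand-ins; the hand-built smoother uses lowess with default settings, so the red line may differ slightly from the one plot(lm(...)) draws.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw only the first of the four diagnostic plots
plot(fit, which = 1)

# Roughly the same picture by hand
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                                   # dashed reference line at zero
lines(lowess(fitted(fit), residuals(fit)), col = "red")  # smoothed trend of the residuals
```

For a model with an intercept the residuals always average to zero overall, so the check is whether the smoothed line stays near zero locally, across the whole range of fitted values.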

Examples

Example 1: Visual Pass vs. Fail (source: [2])

Example 2: Seemingly quadratic pattern w/ some pretty extreme outliers (source: [3])

library(mlbench)      # provides the BostonHousing dataset
data(BostonHousing)
# Fit the model and draw the diagnostic plots
plot(lm(medv ~ crim + rm + tax + lstat, data = BostonHousing))

Q-Q Plot

Using the Plot

X Axis: Theoretical Quantiles - quantiles calculated for normal distribution (assuming normal here...can do Q-Q Plot for other distributions)
Y Axis: Standardized Residuals - the sample data, sorted in ascending order, so that each value is plotted against the theoretical quantile of the same rank
What it Shows / How to Use:
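- Reference Line - this is what we compare the scatterplot against. R draws it through the first and third quartiles; for standardized residuals that are roughly normal it lies close to Y = X. The plot follows the line if the Y Axis & X Axis quantiles are from the same distribution
- Points that curve away from the line at the ends suggest tails heavier or lighter than a normal distribution's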
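
The construction described above can be sketched by hand as follows. The mtcars model is again only an assumed stand-in, and the variable names are illustrative.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)
rs  <- rstandard(fit)
n   <- length(rs)

theoretical <- qnorm(ppoints(n))  # normal quantiles at n evenly spaced probabilities
sample_q    <- sort(unname(rs))   # standardized residuals in ascending order

plot(theoretical, sample_q,
     xlab = "Theoretical Quantiles", ylab = "Standardized residuals")
qqline(rs, lty = 3)               # reference line through the first and third quartiles

# qqnorm() computes the same coordinates; plot.it = FALSE returns them without drawing
qq <- qqnorm(rs, plot.it = FALSE)
```

To get just this panel from a fitted model, plot(fit, which = 2) can be used instead.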

Examples

Likely Normal (Pass) vs. Likely Non-Normal (Fail) (source: [2])

Scale-Location Plot

Using the Plot

X Axis: Fitted Values, $\hat{y_{i}}$
Y Axis: Square Root of the Absolute Standardized Residuals, $\sqrt{|r_{i}|}$
What it Shows / How to Use:
- Verify red line is ~ horizontal (shows homoskedasticity; use a hypothesis test such as the Breusch-Pagan test in the examples below if needed)
- Verify no clear pattern among residuals - should be randomly scattered around red line with equal variability at all fitted values
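
As a rough sketch, the y-axis values can be computed directly from the standardized residuals. The absolute value is taken before the square root, since a standardized residual can be negative. The mtcars model is an assumed stand-in, and the lowess smoother only approximates the red line R draws.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw only the Scale-Location plot
plot(fit, which = 3)

# Roughly the same picture by hand
y_axis <- sqrt(abs(rstandard(fit)))
plot(fitted(fit), y_axis,
     xlab = "Fitted values", ylab = "sqrt(|Standardized residuals|)")
lines(lowess(fitted(fit), y_axis), col = "red")  # should be roughly flat under homoskedasticity
```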

Examples

Example 1: Visual Pass vs. Fail (source: [2])

Example 2: Boston dataset from above (source: [4])

Taking the Boston Housing model from Example 2 of the Residuals vs. Fitted section, we can formally test homoskedasticity via a Breusch-Pagan test.

(Here the null hypothesis is homoskedasticity, and with p ~ 0 we reject it)

library(lmtest)  # provides bptest()
model <- lm(medv ~ crim + rm + tax + lstat, data = BostonHousing)
bptest(model)

    studentized Breusch-Pagan test

data:  model
BP = 30.934, df = 4, p-value = 3.158e-06
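
For intuition about what the test statistic measures, here is a sketch of the studentized (Koenker) form computed by hand in base R: regress the squared residuals on the original predictors and take n times the R-squared of that auxiliary regression. It assumes this is the variant behind the "studentized Breusch-Pagan test" header above, and it uses the built-in mtcars data as a stand-in, so its numbers are unrelated to the Boston Housing output.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)

# Auxiliary regression: squared residuals on the original predictors
e2  <- residuals(fit)^2
aux <- lm(e2 ~ wt + hp, data = mtcars)

bp_stat <- n * summary(aux)$r.squared  # n * R^2 of the auxiliary regression
df      <- length(coef(aux)) - 1       # number of predictors, excluding the intercept
p_value <- pchisq(bp_stat, df = df, lower.tail = FALSE)

c(BP = bp_stat, df = df, p.value = p_value)
```

A large statistic means the predictors explain a noticeable share of the variation in the squared residuals, which is evidence against constant variance.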

Residuals vs. Leverage Plot

Using the Plot

X Axis: Leverage, $h_{ii}$
Y Axis: Standardized Residuals, $r_{i}$
What it Shows / How to Use
- Lets us identify influential observations in the regression model
- If any point falls outside the Cook's distance contours (red dashed lines, typically labeled 0.5, 1.0, ...), i.e. toward the upper-right or lower-right corner of the plot, then consider that an influential observation and investigate it further
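
The contours can be drawn on this plot because Cook's distance depends only on the two plotted quantities and the number of coefficients. A minimal sketch follows, assuming the built-in mtcars data as a stand-in; the 0.5 cutoff mirrors the first contour label and is a rule of thumb, not a formal test.

```r
# Illustrative model on a built-in dataset (an assumption, not from this guide)
fit <- lm(mpg ~ wt + hp, data = mtcars)

r <- rstandard(fit)      # standardized residuals (y axis)
h <- hatvalues(fit)      # leverage (x axis)
p <- length(coef(fit))   # number of estimated coefficients

# Cook's distance from standardized residual and leverage
cooks_manual <- (r^2 / p) * (h / (1 - h))
all.equal(cooks_manual, cooks.distance(fit))  # should report TRUE

# Observations beyond the 0.5 contour, if any
which(cooks.distance(fit) > 0.5)

# Draw only the Residuals vs. Leverage plot
plot(fit, which = 5)
```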

Examples

Visual Pass vs. Fail (source: [2])

References

[1] Penn State STAT 462: Applied Regression Analysis, "9.2 - Using Leverages to Help Identify Extreme X Values", https://online.stat.psu.edu/stat462/node/171/, accessed 2022-09-18

[2] University of Virginia Library Website, "Understanding Diagnostic Plots for Linear Regression Analysis", https://data.library.virginia.edu/diagnostic-plots/, accessed 2022-09-18

[3] Moreno, Alexander; Boosted ML: Articles on Statistics and Machine Learning for Healthcare, "Linear Regression Plots: Fitted vs Residuals", https://boostedml.com/2019/03/linear-regression-plots-fitted-vs-residuals.html, accessed 2022-09-18

[4] Moreno, Alexander; Boosted ML: Articles on Statistics and Machine Learning for Healthcare, "The Scale Location Plot: Interpretation in R", https://boostedml.com/2019/03/linear-regression-plots-scale-location-plot.html, accessed 2022-09-18
