PERSONAL - Applied Regression

Recap: Basic Statistics

Introduction

Simple Linear Regression

Source           | d.f. | SS (Sum of Squares)                           | MS (Mean Square)       | F
Regression       | 1    | SSR = \sum_{i=1}^n (\hat y_i - \overline y)^2 | MSR = SSR              | F = MSR/MSE
Residual (Error) | n-2  | SSE = \sum_{i=1}^n (y_i - \hat y_i)^2         | MSE = SSE/(n-2) = s^2  |
Total            | n-1  | SST = \sum_{i=1}^n (y_i - \overline y)^2      |                        |
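The table's quantities can be checked by hand; the sketch below (simulated data, illustrative names `b0`, `b1`) computes all three sums of squares and verifies the decomposition SST = SSR + SSE.

```python
import numpy as np

# Simple linear regression ANOVA from scratch on simulated data.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)

# Least-squares estimates of slope and intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SSR = np.sum((yhat - y.mean()) ** 2)  # regression SS, df = 1
SSE = np.sum((y - yhat) ** 2)         # residual SS,   df = n - 2
SST = np.sum((y - y.mean()) ** 2)     # total SS,      df = n - 1

MSR = SSR / 1
MSE = SSE / (n - 2)                   # = s^2, the residual variance estimate
F = MSR / MSE                         # compare against F(1, n - 2)
```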

Recap: Matrix Algebra

Operations and Special Types

Simple Regression (Matrix)

Multiple Regression

Source           | d.f.  | SS (Sum of Squares)                           | MS (Mean Square)         | F
Regression       | k     | SSR = \sum_{i=1}^n (\hat y_i - \overline y)^2 | MSR = SSR/k              | F = MSR/MSE
Residual (Error) | n-k-1 | SSE = \sum_{i=1}^n (y_i - \hat y_i)^2         | MSE = SSE/(n-k-1) = s^2  |
Total            | n-1   | SST = \sum_{i=1}^n (y_i - \overline y)^2      |                          |
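The same quantities in matrix form, as a sketch on simulated data with k = 3 covariates; the design matrix X includes an intercept column, and the names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
Z = rng.normal(size=(n, k))                   # k covariates
y = 1.0 + Z @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), Z])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
yhat = X @ beta

SSR = np.sum((yhat - y.mean()) ** 2)          # df = k
SSE = np.sum((y - yhat) ** 2)                 # df = n - k - 1
SST = np.sum((y - y.mean()) ** 2)             # df = n - 1

MSR = SSR / k
MSE = SSE / (n - k - 1)                       # = s^2
F = MSR / MSE                                 # compare against F(k, n - k - 1)
```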

Specification

Model Diagnostics

Lack of Fit

Model Selection

forward selection

INIT
  M = intercept-only model
  P = all covariates
  
REPEAT
  IF P empty STOP
  ELSE
    calculate AIC for sizeof(P) candidate models, each formed by adding one covariate from P to M
    IF all AICs > AIC(M) STOP
    ELSE
      update M by adding the covariate whose addition gave the minimum AIC
      remove that covariate from P
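A minimal executable version of the pseudocode above, assuming a Gaussian OLS model so that AIC reduces (up to an additive constant) to n·log(SSE/n) + 2p; the names `ols_aic` and `forward_select` are illustrative.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of a Gaussian OLS fit, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    return n * np.log(sse / n) + 2 * p

def forward_select(X, y):
    n, k = X.shape
    M = []                        # indices of covariates in the current model
    P = list(range(k))            # pool of remaining candidates
    design = lambda cols: np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    best = ols_aic(design(M), y)  # intercept-only model
    while P:
        # AIC of each one-covariate extension of M
        aic, j = min((ols_aic(design(M + [j]), y), j) for j in P)
        if aic >= best:           # no addition improves AIC: stop
            break
        M.append(j)
        P.remove(j)
        best = aic
    return M
```

With strong true effects on a few columns, the loop adds those columns first and stops once the AIC penalty of 2 per parameter outweighs the remaining drop in SSE.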
backward elimination

INIT
  M = model with all covariates
  P = all covariates
  
REPEAT
  IF P empty STOP
  ELSE
    calculate AIC for sizeof(P) candidate models, each formed by deleting one covariate in P from M
    IF all AICs > AIC(M) STOP
    ELSE
      update M by deleting the covariate whose removal gave the minimum AIC
      remove that covariate from P
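The mirror image of the forward pass, again as a sketch under the Gaussian OLS assumption (AIC up to a constant is n·log(SSE/n) + 2p; `ols_aic` and `backward_eliminate` are illustrative names).

```python
import numpy as np

def ols_aic(X, y):
    """AIC of a Gaussian OLS fit, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    return n * np.log(sse / n) + 2 * p

def backward_eliminate(X, y):
    n, k = X.shape
    M = list(range(k))            # start from the full model
    design = lambda cols: np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    best = ols_aic(design(M), y)
    while M:
        # AIC of each model with one covariate of M deleted
        aic, j = min((ols_aic(design([c for c in M if c != j]), y), j) for j in M)
        if aic >= best:           # no deletion improves AIC: stop
            break
        M.remove(j)
        best = aic
    return M
```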
stepwise regression

INIT
  M = intercept-only model OR full model
  e = small threshold
  
REPEAT UNTIL STOP
  do a forward step on M
  do a backward step on M
  
# For both steps, the differences in AIC need to be
# greater than e for the selection to go forward, otherwise
# the changes can keep undoing each other.
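One way to flesh out this sketch, again assuming Gaussian OLS AIC; the threshold `e` implements the comment above, accepting a step only when it improves AIC by more than `e` so the forward and backward steps cannot keep undoing each other. Names are illustrative.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of a Gaussian OLS fit, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    return n * np.log(sse / n) + 2 * p

def stepwise(X, y, e=1e-6):
    n, k = X.shape
    M, P = [], list(range(k))     # start from the intercept-only model
    design = lambda cols: np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    best = ols_aic(design(M), y)
    improved = True
    while improved:               # STOP once neither step improves AIC by > e
        improved = False
        if P:                     # forward step: try adding one covariate
            aic, j = min((ols_aic(design(M + [j]), y), j) for j in P)
            if best - aic > e:
                M.append(j); P.remove(j); best = aic; improved = True
        if M:                     # backward step: try deleting one covariate
            aic, j = min((ols_aic(design([c for c in M if c != j]), y), j) for j in M)
            if best - aic > e:
                M.remove(j); P.append(j); best = aic; improved = True
    return M
```

Because each accepted step strictly lowers AIC by more than `e`, the loop is guaranteed to terminate.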

Nonlinear Regression

Time Series Models

Logistic Regression

Poisson Regression

Linear Mixed Effects Models

Statistical Learning (Machine Learning)

Prediction Methods

Statistical Decision Theory

Categorical Data


Summary by Flavius Schmidt, ge83pux, 2025.
https://home.cit.tum.de/~scfl/
Images from Wikimedia.