Panel Data Models

Example Exercise: Grunfeld Investment data

The data cover 11 large US manufacturing firms observed annually from 1935 to 1954 (20 years per firm, 220 observations in total).

Your Task: Estimate and compare several types of panel data models.

This code is based on the paper: Croissant, Y., Millo, G. (2008). Panel Data Econometrics in R: The plm Package. Journal of Statistical Software, 27(2).

Data

Variables:

invest: Gross investment, defined as additions to plant and equipment plus maintenance and repairs in millions of dollars deflated by the implicit price deflator of producers’ durable equipment (base 1947);
value: Market value of the firm, defined as the price of common shares at December 31 (base 1947);
capital: Stock of plant and equipment, defined as the accumulated sum of net additions to plant and equipment deflated by the implicit price deflator for producers’ durable equipment (base 1947);
firm: Name of the firm (American manufacturing firms);
year: Year of data;
firmcod: Numeric code that identifies each firm.

Let’s begin!

Import libraries

library(readxl)  # read Excel files
library(skimr)   # summary statistics
library(foreign) # read data files written by other statistical software
library(plm)     # panel data models and Lagrange multiplier test

Import dataset

data <- read_excel("Data/Grunfeld_data.xlsx")
df <- data.frame(data)

Take a first look at your data

invest	value	capital	firm	year	firmcod
317.6	3078.5	2.8	General Motors	1935	6
391.8	4661.7	52.6	General Motors	1936	6
410.6	5387.1	156.9	General Motors	1937	6
257.7	2792.2	209.2	General Motors	1938	6
330.8	4313.2	203.4	General Motors	1939	6
461.2	4643.9	207.2	General Motors	1940	6
512.0	4551.2	255.2	General Motors	1941	6
448.0	3244.1	303.7	General Motors	1942	6
499.6	4053.7	264.1	General Motors	1943	6
547.5	4379.3	201.6	General Motors	1944	6
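
A listing like the one above is what a call such as head(df, 10) would print. As a minimal sketch that does not depend on the Excel file, the snippet below rebuilds the first three rows shown above in a small data frame (gm is a hypothetical name used only here) and inspects it with head() and str().

```r
# Hypothetical mini data frame: the first three rows of the table above
gm <- data.frame(
  invest  = c(317.6, 391.8, 410.6),
  value   = c(3078.5, 4661.7, 5387.1),
  capital = c(2.8, 52.6, 156.9),
  firm    = rep("General Motors", 3),
  year    = c(1935, 1936, 1937),
  firmcod = rep(6, 3),
  stringsAsFactors = FALSE  # keep firm as character, as after import
)

head(gm, 10)  # prints up to the first 10 rows
str(gm)       # column types: firm is character, year is numeric
```

At this stage firm is plain text and year is a number, which is why both are converted to factors later on.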

Prepare your data

Take out the “firmcod” variable from the dataset

df$firmcod <- NULL # another way of removing a variable

Factor categorical variables

firm is a nominal categorical variable and should be treated as such in the modelling process. In this example, year is also treated as a categorical (ordinal) variable rather than a continuous one.

df$firm <- factor(df$firm)
df$year <- factor(df$year, ordered = TRUE)
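
To see what the two conversions do, here is a small sketch on toy vectors (illustrative values only, not the full dataset; firm_f and year_f are names made up for this example). It shows that a plain factor has sorted but unordered levels, while an ordered factor also supports comparisons.

```r
# Toy vectors, not the real data
firm <- c("General Motors", "Chrysler", "General Motors")
year <- c(1936, 1935, 1937)

firm_f <- factor(firm)                  # nominal: levels sorted alphabetically
year_f <- factor(year, ordered = TRUE)  # ordinal: levels sorted numerically

levels(firm_f)
levels(year_f)
is.ordered(firm_f)     # FALSE: no ranking between firms
is.ordered(year_f)     # TRUE: years have a natural order
year_f[1] > year_f[2]  # comparisons work only on ordered factors
```

Keep in mind that, with R's default settings, ordered factors enter a regression through polynomial contrasts rather than dummy variables, so the choice of ordered = TRUE affects how year coefficients would be reported if year were added to a model.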

Take a look at the summary of your data. Do you see how the categorical variables are summarized differently from the numeric ones?

summary(df)

##      invest            value            capital                      firm    
##  Min.   :   0.93   Min.   :  30.28   Min.   :   0.8   American Steel   : 20  
##  1st Qu.:  27.38   1st Qu.: 160.32   1st Qu.:  67.1   Atlantic Refining: 20  
##  Median :  52.37   Median : 404.65   Median : 180.1   Chrysler         : 20  
##  Mean   : 133.31   Mean   : 988.58   Mean   : 257.1   Diamond Match    : 20  
##  3rd Qu.:  99.78   3rd Qu.:1605.92   3rd Qu.: 344.5   General Electric : 20  
##  Max.   :1486.70   Max.   :6241.70   Max.   :2226.3   General Motors   : 20  
##                                                       (Other)          :100  
##       year    
##  1935   : 11  
##  1936   : 11  
##  1937   : 11  
##  1938   : 11  
##  1939   : 11  
##  1940   : 11  
##  (Other):154

Ordinary least squares model

First, run an ordinary least squares (OLS) model without the firm variable. Its results serve as a baseline for comparison with the panel data models.

mlr = lm(invest ~ value + capital, data = df)
summary(mlr)

## 
## Call:
## lm(formula = invest ~ value + capital, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -290.33  -25.76   11.06   29.74  377.94 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -38.410054   8.413371  -4.565 8.35e-06 ***
## value         0.114534   0.005519  20.753  < 2e-16 ***
## capital       0.227514   0.024228   9.390  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 90.28 on 217 degrees of freedom
## Multiple R-squared:  0.8179, Adjusted R-squared:  0.8162 
## F-statistic: 487.3 on 2 and 217 DF,  p-value: < 2.2e-16
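
To make the coefficients concrete, the sketch below plugs one observation from the data listing (General Motors, 1936) into the fitted equation, using the estimates copied from the output above. The helper predict_invest is a hypothetical name introduced only for this illustration; with the fitted model at hand, predict(mlr, newdata = ...) would do the same job.

```r
# Coefficients copied from the summary(mlr) output above
b <- c(intercept = -38.410054, value = 0.114534, capital = 0.227514)

# Hypothetical helper: fitted investment for a given value and capital
predict_invest <- function(value, capital) {
  unname(b["intercept"] + b["value"] * value + b["capital"] * capital)
}

# General Motors, 1936: value = 4661.7, capital = 52.6, observed invest = 391.8
fitted_gm <- predict_invest(4661.7, 52.6)
fitted_gm          # fitted investment in millions of dollars
391.8 - fitted_gm  # residual for this observation
```

Because this model ignores which firm an observation belongs to, every firm shares the same intercept and slopes. That restriction is what the panel data models relax.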