Clustered Standard Errors with data containing NAs

admin

Administrator
Staff member
I'm unable to cluster standard errors using R and guidance based on this <a href="http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/" rel="nofollow">post</a>. The cl function returns the error:

Code:
Error in tapply(x, cluster1, sum) : arguments must have same length

After reading up on
Code:
tapply
I'm still not sure why my cluster argument is the wrong length, and what is causing this error.

Here is a link to the data set that I'm using.

<a href="https://www.dropbox.com/s/y2od7um9pp4vn0s/Ec 1820 - DD Data with Controls.csv" rel="nofollow">https://www.dropbox.com/s/y2od7um9pp4vn0s/Ec 1820 - DD Data with Controls.csv</a>

Here is the R code:

Code:
# read in data
charter&lt;-read.csv(file.choose())
View(charter)
colnames(charter)

# standardize NAEP scores
charter$naep.standardized &lt;- (charter$naep - mean(charter$naep, na.rm=T))/sd(charter$naep, na.rm=T)

# change NAs in year.passed column to 2014
charter$year.passed[is.na(charter$year.passed)]&lt;-2014

# Add column with indicator for in treatment (passed legislation)
charter$treatment&lt;-ifelse(charter$year.passed&lt;=charter$year,1,0)

# fit model
charter.model&lt;-lm(naep ~ factor(year) + factor(state) + treatment, data = charter)
summary(charter.model)
# account for clustered standard errors by state
cl(dat=charter, fm=charter.model, cluster=charter$state)

# accounting for controls
charter.model.controls&lt;-lm(naep~factor)

# clustered standard errors
# ---------

# function that calculates clustered standard errors
# source: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/
cl   &lt;- function(dat, fm, cluster){
  require(sandwich, quietly = TRUE)
  require(lmtest, quietly = TRUE)
  M &lt;- length(unique(cluster))
  N &lt;- length(cluster)
  K &lt;- fm$rank
  dfc &lt;- (M/(M-1))*((N-1)/(N-K))
  print(K)
  uj  &lt;- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
  vcovCL &lt;- dfc*sandwich(fm, meat=crossprod(uj)/N)
  coeftest(fm, vcovCL) 
}

# calculate clustered standard errors 
cl(charter, charter.model, charter$state)

The inner workings of the function are a little over my head.