I am working through Andy Field's textbook Discovering Statistics Using R, and on p. 899 I reached a point where my R code did not return the results shown in the book.
Specifically, the example fits a multilevel linear model to repeated measures data, building up to the final model through a sequence of simpler models that each add one parameter.
My issue stems from converting the Time variable from a factor to a numeric. The conversion itself is straightforward, but I coded the four Time levels in months, c(0,6,12,18), because the data file uses names such as "Satisfaction_6_Months" and "Satisfaction_12_Months".
The text (actually the online errata) instead codes the four Time levels as c(0,1,2,3). I expected this difference in Time coding to have little impact on the results, but I was quite wrong.
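To make that expectation concrete, here is a minimal sketch on simulated data (not the book's data; the names `time_index`, `time_months` and `y` are made up for illustration). In an ordinary linear model, multiplying the predictor by 6 divides the slope by 6 and leaves the log-likelihood unchanged, which is what I assumed would carry over to the multilevel models.

```r
# Simulated data, purely illustrative (not the Honeymoon Period data)
set.seed(1)
time_index  <- rep(0:3, times = 50)   # coding used in the errata: 0, 1, 2, 3
time_months <- time_index * 6         # month-wise coding: 0, 6, 12, 18
y <- 6 - 0.5 * time_index + rnorm(length(time_index))

fit_index  <- lm(y ~ time_index)
fit_months <- lm(y ~ time_months)

# The slope is rescaled by the factor of 6 ...
slope_ratio <- unname(coef(fit_index)[2] / coef(fit_months)[2])
print(slope_ratio)

# ... but the log-likelihood of the two fits is the same
same_loglik <- isTRUE(all.equal(as.numeric(logLik(fit_index)),
                                as.numeric(logLik(fit_months))))
print(same_loglik)
```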
The code below demonstrates that the final line of the anova output (the ARModel row) differs substantially between these two codings.
I am trying to understand 1) why this difference is observed at all, 2) why it is only observed at this last step of the model-building exercise, and 3) what it means in practice, where I might have good reason to adopt a month-wise coding.
First, run the full script to replicate the results. Then, uncomment the indicated line and run again.
    library(reshape2)
    library(nlme)  # provides gls(), lme() and corAR1()

    data_url <- "https://studysites.sagepub.com/dsur/study/DSUR%20Data%20Files/Chapter%2019/Honeymoon%20Period.dat"
    satisfactionData <- read.delim(data_url, header = TRUE)

    # reshape from wide to long format
    restructuredData <- melt(satisfactionData,
                             id.vars = c("Person", "Gender"),
                             measure.vars = c("Satisfaction_Base", "Satisfaction_6_Months",
                                              "Satisfaction_12_Months", "Satisfaction_18_Months"))
    names(restructuredData) <- c("Person", "Gender", "Time", "Life_Satisfaction")

    # convert the Time factor to numeric codes 0, 1, 2, 3
    restructuredData$Time <- as.numeric(restructuredData$Time) - 1

    # ***************************************************************
    # On second pass, uncomment the line below and rerun whole script
    # ***************************************************************
    # restructuredData$Time <- restructuredData$Time * 6

    # create models one parameter at a time
    intercept <- gls(Life_Satisfaction ~ 1, data = restructuredData,
                     method = "ML", na.action = na.exclude)
    randomIntercept <- lme(Life_Satisfaction ~ 1, data = restructuredData,
                           random = ~1|Person, method = "ML",
                           na.action = na.exclude, control = list(opt = "optim"))
    timeRI <- update(randomIntercept, .~. + Time)
    timeRS <- update(timeRI, random = ~Time|Person)
    ARModel <- update(timeRS, correlation = corAR1(0, form = ~Time|Person))

    anova(intercept, randomIntercept, timeRI, timeRS, ARModel)

    # Pass #1 - with Time as 0,1,2,3
    #                 Model df      AIC      BIC     logLik   Test   L.Ratio p-value
    # intercept           1  2 2064.053 2072.217 -1030.0263
    # randomIntercept     2  3 1991.396 2003.642  -992.6978 1 vs 2  74.65704  <.0001
    # timeRI              3  4 1871.728 1888.057  -931.8642 2 vs 3 121.66714  <.0001
    # timeRS              4  6 1874.626 1899.120  -931.3131 3 vs 4   1.10224  0.5763
    # ARModel             5  7 1872.891 1901.466  -929.4453 4 vs 5   3.73564  0.0533

    # Pass #2 - with Time as 0,6,12,18
    #                 Model df      AIC      BIC     logLik   Test   L.Ratio p-value
    # intercept           1  2 2064.053 2072.217 -1030.0263
    # randomIntercept     2  3 1991.396 2003.642  -992.6978 1 vs 2  74.65704  <.0001
    # timeRI              3  4 1871.728 1888.057  -931.8642 2 vs 3 121.66714  <.0001
    # timeRS              4  6 1874.627 1899.120  -931.3135 3 vs 4   1.10151  0.5765
    # ARModel             5  7 1876.627 1905.203  -931.3135 4 vs 5   0.00001  0.9978
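One thing that may be relevant to question 1: in a textbook AR(1) process, the correlation between two observations is phi raised to the absolute difference between their time values. The sketch below computes that implied correlation for both codings using base R only. `ar1_corr` is a helper I made up for illustration, the value of `phi` is arbitrary, and whether `corAR1` in nlme handles a non-consecutive Time covariate in exactly this way is an assumption I have not verified.

```r
# Hypothetical helper: AR(1) correlation matrix implied by a set of time codes,
# using the textbook form corr(i, j) = phi^|t_i - t_j|
ar1_corr <- function(times, phi) {
  outer(times, times, function(a, b) phi^abs(a - b))
}

phi <- -0.3  # arbitrary illustrative value, not an estimate from the data

corr_index  <- ar1_corr(c(0, 1, 2, 3), phi)
corr_months <- ar1_corr(c(0, 6, 12, 18), phi)

# Correlation between adjacent measurements under each coding
adjacent_index  <- corr_index[1, 2]   # phi^1
adjacent_months <- corr_months[1, 2]  # phi^6
print(c(adjacent_index, adjacent_months))
```

With a negative phi, adjacent measurements are negatively correlated under the 0,1,2,3 coding but positively (and very weakly) correlated under the 0,6,12,18 coding. If nlme does work this way, the two codings would not describe the same correlation structure, which might be connected to the ARModel row being the only one that changes.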