Two-way Repeated Ordinal ANOVA with CLMM (2024)

A two-way repeated ordinal analysis of variance can addressan experimental design with two independent variables, each of which is afactor variable, plus a blocking variable. The main effect of each independentvariable can be tested, as well as the effect of the interaction of the twofactors.

The example here looks at students’ knowledge scores forthree speakers across two different times. Because we know which student haseach score at each time, we will use the Student as a blocking variable,with the suspicion that one student might have consistently lower knowledgethan another student. If we hadn’t recorded this information about students,we could use two-way ordinal regression.

As a matter of practical interpretation, we may not careabout the absolute scores for each of the three speakers, only if the knowledgescores for students increased from Time 1 to Time 2. At Time1, students’ knowledge scores are lower for Piglet that for the other twospeakers. This isn’t the fault of Piglet; he is just speaking about a topicthat the students have little knowledge of.

The clmm function can specify more complex modelswith multiple independent variables of different types, but this book will notexplore more complex examples.

Post-hoc analysis to determine which groups are differentcan be conducted on each significant main effect and on the interaction effectif it is significant.

Packages used in this chapter

The packages used in this chapter include:

• psych

• FSA

• lattice

• ggplot2

• ordinal

• car

• RVAideMemoire

• emmeans

• multcomp

• rcompanion

The following commands will install these packages if theyare not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(lattice)){install.packages("lattice")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(rcompanion)){install.packages("rcompanion")}

Two-way repeated ordinal regression example

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Speaker Student Time Likert
Pooh a 1 3
Pooh b 1 5
Pooh c 1 4
Pooh d 1 4
Pooh e 1 4
Pooh f 1 3
Pooh g 1 4
Pooh h 1 4

Pooh a 2 4
Pooh b 2 5
Pooh c 2 5
Pooh d 2 5
Pooh e 2 5
Pooh f 2 4
Pooh g 2 5
Pooh h 2 5

Piglet i 1 1
Piglet j 1 2
Piglet k 1 1
Piglet l 1 1
Piglet m 1 1
Piglet n 1 2
Piglet o 1 3
Piglet p 1 1

Piglet i 2 2
Piglet j 2 4
Piglet k 2 2
Piglet l 2 1
Piglet m 2 2
Piglet n 2 2
Piglet o 2 4
Piglet p 2 2

Eeyore q 1 4
Eeyore r 1 5
Eeyore s 1 4
Eeyore t 1 4
Eeyore u 1 4
Eeyore v 1 4
Eeyore w 1 4
Eeyore x 1 4

Eeyore q 2 5
Eeyore r 2 4
Eeyore s 2 4
Eeyore t 2 4
Eeyore u 2 3
Eeyore v 2 4
Eeyore w 2 4
Eeyore x 2 4
")

### Change Time into a factor variable

Data$Time = factor(Data$Time)

### Order levels of the factor; otherwise R will alphabetize them

Data$Speaker = factor(Data$Speaker,
levels=unique(Data$Speaker))

### Create a new variable which is the likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,
ordered = TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

Summarize data treating Likert scores as factors

xtabs( ~ Time + Likert.f + Speaker,
data = Data)

Likert.f
Time 1 2 3 4 5
1 0 0 2 5 1
2 0 0 0 2 6

Bar plots by group

Note that these plots do not take into the effect of blockingvariable.

library(lattice)

histogram(~ Likert.f | Speaker,
data=Data,
layout=c(1,3) # columns and rows ofindividual plots
)

library(lattice)

histogram(~ Likert.f | Time,
data=Data,
layout=c(1,2) # columns and rows ofindividual plots
)

library(lattice)

histogram(~ Likert.f | Time + Speaker,
data=Data,
layout=c(2,3) # columns and rows ofindividual plots
)

Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Speaker + Time,
data=Data,
digits=3)

Speaker Time n nvalid mean sd min Q1 median Q3 max percZero
1 Pooh 1 8 8 3.875 0.641 3 3.75 4 4.0 5 0
2 Piglet 1 8 8 1.500 0.756 1 1.00 1 2.0 3 0
3 Eeyore 1 8 8 4.125 0.354 4 4.00 4 4.0 5 0
4 Pooh 2 8 8 4.750 0.463 4 4.75 5 5.0 5 0
5 Piglet 2 8 8 2.375 1.061 1 2.00 2 2.5 4 0
6 Eeyore 2 8 8 4.000 0.535 3 4.00 4 4.0 5 0

Plot Likert scores before and after

Since we are most interested in pairing scores from Time1 to Time 2, we can plot the data in a way that pairs points.

Points above and to left of the blue line indicate cases inwhich the score for Time 2 is greater than that for Time 1.

Note that the data must be ordered by Student and Speakerfor this code to plot the data correctly. That is, the first observation for Time==1must be the same student and speaker as the first observation for Time==2.

Also note that the points on the plot are jittered so thatmultiple points with the same values will be visible.

Time.1 = Data$Likert[Data$Time==1]
Time.2 = Data$Likert[Data$Time==2]
Speaker = Data$Speaker[Data$Time==2]

plot(Time.1, jitter(Time.2),
pch = 16,
cex = 1,
col = Speaker,
xlab="Time 1",
ylab="Time 2")

abline(0,1, col="blue", lwd=2)

legend('bottomright',
legend = unique(Speaker),
col = 1:3,
cex = 1,
pch = 16)

Produce interaction plot with medians and confidenceintervals

The following plot will show the interaction between Speakerand Time. The median for each Speaker x Time combination isplotted, and the confidence intervals for each are shown with error bars. Inthis case some of the medians and confidence limits are the same values, sosome error bars are not visible.

Be aware that confidence intervals determined by bootstrapfor medians may not be appropriate with discrete data, like these Likert-itemdata, or with a small sample size.

The plot makes it easy to see the change in median score foreach speaker from Time 1 to Time 2.

library(rcompanion)

Sum = groupwiseMedian(Likert ~ Speaker + Time,
data = Data,
conf = 0.95,
R = 5000,
percentile = TRUE,
bca = FALSE,
digits = 3)
Sum

library(ggplot2)

pd = position_dodge(.2)

ggplot(Sum, aes(x=Time,
y=Median,
color=Speaker)) +
geom_errorbar(aes(ymin=Percentile.lower,
ymax=Percentile.upper),
width=.2, size=0.7, position=pd) +
geom_point(shape=15, size=4, position=pd) +
theme_bw() +
theme(axis.title = element_text(face = "bold")) +

ylab("Median Likert score")

Two-way repeated ordinal regression

In the model notation in the clmm function, here, Likert.fis the dependent variable and Speaker and Time are theindependent variables. The term Time:Spreaker adds the interactioneffect of these two independent variables to the model. Student is usedas a blocking variable and is entered as a random variable. The data=option indicates the data frame that contains the variables. For the meaningof other options, see ?clmm.

Here the threshold = "equidistant" option isused in order to avoid errors for this data set. This option does not need tobe used routinely.

Optional technical note

In the model below Student is treated as a simplerandom blocking variable, specified by (1|Student). Note that in thedata, students were lettered a–x straight through the Speakers.If the students had been lettered a–h for each Speaker,but there were actually 24 students, we would need to tell the model that Studentwas nested within Speaker, by specifying that the random effect was (1|Speaker/Student),or that the random effect of interest was the interaction of Speaker andStudent, by specifying (1|Speaker:Student). This second specificationis equivalent to that in the code below, with the original data, and willproduce the same results.

I consider it a good practice to assign unique individualswith unique labels. This practice is not always followed, but following thepractice will avoid mis-specifying models. In this case with 24 students, ifthey had been lettered a–h for each Speaker, and theoriginal model with the (1|Student) effect had been used, the resultswould be wrong because the model would treat Pooh’s student a as thesame individual as Piglet’s student a.

Define model and conduct analysis of deviance

library(ordinal)

model = clmm(Likert.f ~ Time + Speaker + Time:Speaker + (1|Student),
data = Data,
threshold = "equidistant")

anova(model, type="II")

Error in eval(predvars, data, env) : object 'Likert.f' not found

### I don’t know why this error occurs

library(car)

library(RVAideMemoire)

Anova.clmm(model,
type = "II")

Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Time 9.774 1 0.00177 **
Speaker 34.485 2 3.249e-08 ***
Time:Speaker 28.202 2 7.517e-07 ***

p-value for model and pseudo R-squared

In order to use the nagelkerke function for a clmmmodel object, a null model has to be specified explicitly. (This is not thecase with a clm model object).

We want the null model to have no fixed effects except foran intercept, indicated with a 1 on the right side of the ~. Butwe also may want to include the random effects, in this case (1 | Student).

We can compare our full model against a model with anoverall intercept and an intercept for each student.

Be sure that the threshold option in model.nullis the same as in the original model.

model.null =clmm(Likert.f ~ 1 + (1 | Student),
data = Data,
threshold = "equidistant")

library(rcompanion)

nagelkerke(fit = model,
null = model.null)

$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.544198
Cox and Snell (ML) 0.787504
Nagelkerke (Cragg and Uhler) 0.836055

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-5 -37.172 74.344 1.275e-14

Another approach to determining the p-value and pseudoR-squared for a clmm model is comparing the model to a null model withonly an intercept and neither the fixed nor the random effects.

model.null.2= clm(Likert.f ~ 1,
data = Data,
threshold = "equidistant")

library(rcompanion)

nagelkerke(fit = model,
null = model.null.2)

$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.587734
Cox and Snell (ML) 0.842667
Nagelkerke (Cragg and Uhler) 0.880526

$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-6 -44.385 88.771 5.4539e-17

Determine the effect of the random variable

The effect of the random variable (Student, in thiscase) can be determined by comparing the full model to a model with only thefixed effects (Speaker, Time, and Speaker:Time in this case).

First, model.fixed is defined, and then the two models are passed to theanova function.

Be sure that the threshold option in model.fixedis the same as in the original model.

model.fixed = clm(Likert.f ~ Time + Speaker + Time:Speaker,
data = Data,
threshold = "equidistant")

anova(model,
model.fixed)

Likelihood ratio tests of cumulative link models:

no.par AIC logLik LR.stat df Pr(>Chisq)
model.fixed 7 101.197 -43.598
model 8 78.268 -31.134 24.928 1 5.95e-07 ***

Post-hoc tests

For clmm model objects, the emmean, SE, LCL,and UCL values should be ignored, unless specific options in emmeansare selected. See ?emmeans::models for more information.

Because the interaction term in the model was significant,the group separations for the interaction is explored.

library(emmeans)

marginal = emmeans(model,
~ Speaker + Time)
pairs(marginal,
adjust="tukey")

library(multcomp)

cld(marginal, Letters=letters)

Speaker Time emmean SE df asymp.LCL asymp.UCL .group
Piglet 1 -36.08909 0.002735251 NA -36.09629 -36.08190 a
Piglet 2 -17.88982 0.003252687 NA -17.89838 -17.88127 b
Eeyore 2 17.46501 2.047095879 NA 12.07902 22.85100 c
Pooh 1 18.19627 0.002535027 NA 18.18960 18.20294 c
Eeyore 1 19.30751 2.030625019 NA 13.96486 24.65016 c
Pooh 2 36.72566 0.002910484 NA 36.71800 36.73332 d

Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05

NOTE: Compact letter displays can be misleading
because they show NON-findings rather than findings.
Consider using 'pairs()', 'pwpp()', or 'pwpm()' instead.

### Groups sharing a letter in .group are notsignificantly different

### Remember to ignore “emmean”, “LCL”, “LCL”, and“UCL” with CLMM

Focusing on meaningful comparisons

Because we wanted to focus on the comparing each speaker at Time1and Time 2, let’s rearrange the emmeans cld table to focuson these comparisons.

With the results in this format, it is easy to see that the scores for bothPooh and Piglet improved from Time 1 to Time 2, but that the scores for poorEeyore did not.

Speaker Time emmean SE df asymp.LCL asymp.UCL .group
Pooh 1 18.19627 0.002535027 NA 18.18960 18.20294 c
Pooh 2 36.72566 0.002910484 NA 36.71800 36.73332 d

Speaker Time emmean SE df asymp.LCL asymp.UCL .group
Piglet 1 -36.08909 0.002735251 NA -36.09629 -36.08190 a
Piglet 2 -17.88982 0.003252687 NA -17.89838 -17.88127 b

Speaker Time emmean SE df asymp.LCL asymp.UCL .group
Eeyore 1 19.30751 2.030625019 NA 13.96486 24.65016 c
Eeyore 2 17.46501 2.047095879 NA 12.07902 22.85100 c

Interaction plot with group separation letters

Group separation letters can be added manually to the plotof medians.

Check model assumptions

At the time of writing, the nominal_test and scale_testfunctions don’t work with clmm model objects, so we will define a similar modelwith clm and no random effects, and test the proportional oddsassumption on this model.

model.clm = clm(Likert.f ~ Time + Speaker + Time:Speaker + Student,
data = Data,
threshold = "equidistant")

Warning message:
(1) Hessian is numerically singular: parameters are not uniquely determined
In addition: Absolute convergence criterion was met, but relative criterion wasnot met.

### The fitting of this model failed. We’ll trya model without student.

model.clm = clm(Likert.f ~ Time + Speaker + Time:Speaker,
data = Data,
threshold = "equidistant")

nominal_test(model.clm)

Tests of nominal effects

formula: Likert.f ~ Time + Speaker + Time:Speaker
Df logLik AIC LRT Pr(>Chi)
<none> -43.598 101.197
Time 1 -43.149 102.298 0.8983 0.34325
Speaker 2 -40.053 98.106 7.0903 0.02886 *
Time:Speaker 5 -39.393 102.786 8.4110 0.13499

### Note the significant result for speaker inthis test.

scale_test(model.clm)

Warning messages:
1: (-1) Model failed to converge with max|grad| = 2.26258e-05 (tol = 1e-06)
In addition: iteration limit reached

### This test failed. It would be possible toadjust the fitting parameters,
### and try to get the model to converge.

Two-way Repeated Ordinal ANOVA with CLMM (2024)

Optional technical note

Define model and conduct analysis of deviance

p-value for model and pseudo R-squared

Determine the effect of the random variable

Focusing on meaningful comparisons

Check model assumptions

References