Part 1: Measurement Invariance

Peter F. Halpin

Overview of Workshop

  • Part 1. Intro + factor analysis + MI
  • Part 2. IRT + DIF
  • Part 3. Robust scaling + DIF + DTF
  • Abbreviations

    • DIF = differential item functioning
    • DTF = differential test functioning
    • IRT = item response theory
    • MI = measurement invariance

Overview of Part 1

  • General definitions of MI and DIF
  • Factor model for categorical data
  • “Levels” of MI defined for categorical data
  • Testing MI by comparing models (chi-square difference tests)
  • Worked example

Organization

MI and DIF in General

  • First goal: Define the main issues

Intuitive definitions (Bauer, 2017)

  • Measurement invariance (MI): An assessment performs the same across different groups of respondents

  • Differential item functioning (DIF): An item performs differently across different groups of respondents

  • Relation:

    • If MI holds, no items exhibit DIF
    • If MI does not hold, at least one item exhibits DIF

Examples (Curley & Schmitt, 1993)

A general framework

  • To discuss MI / DIF in general, represent psychometric models using definition of marginal distribution

\[p(\mathbf{x}) = \int p(\mathbf{x} \mid \eta) \, p(\eta) \, d\eta\]

  • \(\mathbf{x} = [X_1, X_2, \dots, X_J]\): observed variables (assessment items)
  • \(\eta\): latent variable (trait, factor, construct)
  • \(p\): probability distribution (mass, density)

A general framework

  • Different assumptions define different models (e.g., Holland & Rosenbaum, 1986)

\[p(\mathbf{x}) = \int p(\mathbf{x} \mid \eta) \, p(\eta) \, d\eta\]

  • Factor analysis: \(\mathbf{x}\) and \(\eta\) are normally distributed with \(\mathbf{x} = \mathbf{\nu} + \Lambda \eta + \mathbf{\epsilon}\)
  • IRT: \(\mathbf{x}\) multinomial and \(\eta\) is normal
  • ….

A general framework

  • This representation shows that there are two parts in a psychometric model

\[p(\mathbf{x}) = \int {p(\mathbf{x} \mid \eta)} \, {p(\eta)} \, d\eta\]

  • \({\text{The measurement model: } p(\mathbf{x} \mid \eta)}\)
  • \({\text{The population model: } p(\eta) }\)
  • Helpful for understanding MI / DIF

The measurement model

  • The conditional distribution \(p(\mathbf{x} \mid \eta)\) relates the observed data to the latent trait

  • Typically assume conditional independence

\[p(\mathbf{x} \mid \eta) = \prod_j p(X_j \mid \eta) \]

  • The correlations among the items are explained by the latent trait only
    • “The items measure the construct”
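Conditional independence can be illustrated with a small simulation (hypothetical data, not from the workshop): two binary items driven by the same latent trait are correlated marginally, but nearly uncorrelated once we hold the trait approximately constant.

```r
# Two binary items that depend on eta only (hypothetical parameters)
set.seed(1)
n   <- 50000
eta <- rnorm(n)                          # latent trait
x1  <- rbinom(n, 1, pnorm(1.2 * eta))   # item 1
x2  <- rbinom(n, 1, pnorm(1.2 * eta))   # item 2

cor(x1, x2)                              # marginal correlation: clearly positive
band <- abs(eta) < 0.1                   # condition on eta near 0
cor(x1[band], x2[band])                  # conditional correlation: near zero
```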

The population model

  • The distribution of the latent trait \(p(\eta)\) describes how the target construct is distributed

  • Psychometric models require that we set the scale of the latent trait

    • e.g., set \(E[\eta] = 0\) and \(V[\eta] = 1\)
  • This will turn out to be a complicated aspect of MI / DIF

    • Different levels of MI allow for different parameters of the population to be estimated

MI

  • Let \(W\) be any other variable
    • Often gender, race, but could be anything

\[p(\mathbf{x} \mid {W}) = \int \prod_j p(X_j \mid \eta, {W}) \, p(\eta \mid {W}) \, d\eta\]

  • MI: \(p(X_j \mid \eta, {W}) = p(X_j \mid \eta)\) for all \(j\)
  • Measurement model does not depend on \(W\)
  • “The measure is not biased with respect to \(W\)”

Implications of MI

  • The marginal model with MI:

\[p(\mathbf{x} \mid {W}) = \int \prod_j p(X_j \mid \eta) \, p(\eta \mid {W}) \, d\eta\]

  • If groups differ in their observed scores, this must be because they differ on the latent trait

\[ p(\mathbf{x} \mid W) \neq p(\mathbf{x}) \rightarrow p(\eta \mid W) \neq p(\eta)\]

DIF

\[p(\mathbf{x} \mid {W}) = \int \prod_j p(X_j \mid \eta, {W}) \, p(\eta \mid {W}) \, d\eta\]

  • DIF: \(p(X_j \mid \eta, {W}) \neq p(X_j \mid \eta)\) for item \(j\)
    • Measurement model does depend on \(W\) for some items
    • This is just the opposite of MI
    • Sometimes called “measurement bias”

Implications of DIF

\[p(\mathbf{x} \mid {W}) = \int \prod_j p(X_j \mid \eta, {W}) \, p(\eta \mid {W}) \, d\eta\]

  • If groups differ in their observed scores, this could be because:
      1. the population model differs over groups
      2. the measurement model differs over groups
      3. both

Why is DIF a problem?

\[p(\mathbf{x} \mid {W}) = \int \prod_j p(X_j \mid \eta, {W}) \, p(\eta \mid {W}) \, d\eta\]

  • For technical reasons, we cannot estimate this model if all items exhibit DIF
    • “Circular nature of DIF”
    • Will discuss more in Part 2
    • For now, just the implications…

Why is DIF a problem?

  • When we compute and report test scores using \(\mathbf{x}\), we are implicitly assuming that items do not exhibit DIF (i.e., are not biased)

  • If this assumption is mistaken:

    • Individuals’ test scores may be biased
    • Estimates of group differences based on observed test scores may be biased
    • Estimates of impact using latent variable models may be biased

What about partial MI?

  • Partial MI means that some but not all items exhibit DIF
    • Keeping biased items can be OK in some research settings
  • But the usual goal of DIF analysis is to remove any items with DIF
    • i.e., the goal is full MI, not partial MI
    • This is still the standard approach in test development – remove items with DIF before reporting scores

What about different models?

  • We have just seen how to define MI / DIF in general

  • However

    • MI was developed in the factor analysis literature
    • DIF was developed in the IRT literature
    • See Thissen (2023) for a historical review

What about different models?

  • Models, tests, conventions, and software for MI and DIF differ due to historical reasons

  • We can approach both MI and DIF using either model, but it is currently easier to go with the traditional distinctions

    • Factor analysis software makes testing MI easy!
    • IRT software makes testing DIF easy!
    • You can switch this up, but it requires (a bit) more work

Broad comparison between models

Feature                          Factor analysis                    IRT
Dimensionality of latent trait   Multidimensional                   Unidimensional (traditionally!)
Treatment of categorical data    Latent response variables          Item response functions
Model estimation                 Polychoric correlations (WLS)      Maximum likelihood
Model parameterization           General, many “extra” parameters   Specific, only parameters used in a given model
Main visualization               Path diagram                       Item response functions

Summary

  • MI / DIF are about the measurement model
    • We want to make sure measurement does not depend on, e.g., a person’s gender
  • Impact is about the population model
    • There may or may not be group differences on the target construct
  • Without (partial) MI, we cannot know if observed differences are due to measurement bias, “true” differences on the target construct, or both

Factor Analysis for Categorical Data

  • Focus on unidimensional models

Factor model

  • For continuous observed variables

\[ X_j = \nu_j + \lambda_j \eta + \epsilon_j \]

  • Assumptions
    • \(\eta \sim N(\kappa, \phi) \quad \; \; \,\text {and} \quad \epsilon_j \sim N(0, \theta_j)\)
    • \(\text{cov}[\epsilon_j, \eta] = 0 \quad \text {and} \quad \text{cov}[\epsilon_j, \epsilon_{k}] = 0\)
  • Doesn’t work when \(X_j\) is categorical!
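The continuous model can be sketched with simulated data (hypothetical values for \(\lambda_j\) and \(\theta_j\)); under the assumptions above, the implied covariance between items \(j\) and \(k\) is \(\lambda_j \lambda_k \phi\).

```r
# Simulate X_j = nu_j + lambda_j * eta + eps_j with nu_j = 0, eta ~ N(0, 1)
set.seed(2)
n      <- 100000
lambda <- c(0.8, 0.6, 0.7)               # hypothetical loadings
theta  <- c(0.36, 0.64, 0.51)            # residual variances (unit item variances)
eta    <- rnorm(n)                       # kappa = 0, phi = 1
X      <- sapply(1:3, function(j)
  lambda[j] * eta + rnorm(n, 0, sqrt(theta[j])))

cov(X[, 1], X[, 2])                      # ~ lambda_1 * lambda_2 = 0.8 * 0.6 = 0.48
```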

Latent response variables (LRVs)

  • Factor analysis deals with categorical data by introducing a new type of latent variable

  • For each categorical observed variable \(X_j\), we assume there exists a latent response variable \(X^*_j\)

    • Not a variable of substantive interest, just a mathematical convenience!!

    • Illustrations on following slides

LRVs: Why?

  • Benefit: can do factor analysis “as usual” with \(X^*_j\)

    • When life gives you lemons, make lemonade
    • When life gives you categorical data, make continuous data
  • Cost: introduces new variables \(X^*_j\) (and their parameters) that don’t mean anything

    • Will be a bit of a nuisance later on

LRVs: How They Work

Figure: Wirth & Edwards, 2007

Two main ideas

  • First idea: thresholding a latent response variable
    • Assumes LRVs are normally distributed
    • Allows us to deal with categorical data
  • Second idea: tetrachoric and polychoric correlations
    • Assumes pairs of LRVs are bivariate normal
    • Allows us to estimate factor model

Thresholding


Tetrachoric correlation

  • Correlation of observed responses: Phi-coefficient
  • Correlation of latent responses: Tetrachoric correlation

Polychoric correlation

  • Correlation of observed items: Spearman, …
  • Correlation of latent responses: Polychoric correlation
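A minimal sketch with simulated 3-category items (hypothetical data): lavaan’s lavCor() estimates the polychoric correlation of the LRVs, which recovers the latent correlation that the observed-score (Spearman) correlation attenuates.

```r
library(lavaan)

# Two 3-category items created by thresholding continuous LRVs
set.seed(3)
n    <- 2000
eta  <- rnorm(n)
lrv1 <- 0.7 * eta + rnorm(n, 0, sqrt(1 - 0.49))  # unit-variance LRVs
lrv2 <- 0.6 * eta + rnorm(n, 0, sqrt(1 - 0.36))
d <- data.frame(x1 = cut(lrv1, c(-Inf, -0.5, 0.5, Inf), labels = FALSE),
                x2 = cut(lrv2, c(-Inf, -0.5, 0.5, Inf), labels = FALSE))

lavCor(d, ordered = c("x1", "x2"))   # polychoric, ~ 0.7 * 0.6 = 0.42
cor(d$x1, d$x2, method = "spearman") # attenuated observed-score correlation
```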

Summary

  • LRVs are used to deal with categorical data in factor analysis
  • The correlations between the LRVs are modeled, rather than modeling the categorical data directly
    • These are called tetrachoric and polychoric correlations
  • All this is done for mathematical convenience!
    • LRVs don’t (usually) represent substantive concepts
    • They don’t show up in IRT, which is a main difference between models

Back to the Factor Model …

Factor model for categorical data

  • Step 1: Assume categorical variables \(X_j\) with \(c = 1, \dots, C\) categories arise from thresholding an LRV

\[ X_j = \left\{ \begin{array}{ccc} 1 & if & -\infty < X_j^* \leq \tau_{j1} \\ 2 & if & \tau_{j1} < X_j^* \leq \tau_{j2} \\ ... & & \\ C & if & \tau_{j,C-1} < X_j^* \leq \infty \end{array} \right.\]

  • The parameters \(\tau_j = [\tau_{j1}, \dots, \tau_{j,C-1}]\) are called the item thresholds
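Step 1 can be sketched directly in R with cut(), using hypothetical thresholds \(\tau_j = (-0.5, 0.8)\) and a standard-normal LRV (as in the Delta parameterization):

```r
# Thresholding a standard-normal LRV into a 3-category item
set.seed(4)
xstar <- rnorm(10000)                        # LRV, X*_j ~ N(0, 1)
tau   <- c(-0.5, 0.8)                        # hypothetical thresholds
x     <- cut(xstar, breaks = c(-Inf, tau, Inf), labels = FALSE)

table(x) / length(x)   # observed category proportions
pnorm(tau)             # model-implied cumulative proportions P(X_j <= c)
```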

Factor model for categorical data

  • Step 2: Factor model for the LRVs

\[ X^*_j = \nu_j + \lambda_j \eta + \epsilon_j \]

  • Assumptions
    • \(\eta \sim N(\kappa, \phi) \quad \; \; \,\text {and} \quad \epsilon_j \sim N(0, \theta_j)\)
    • \(\text{cov}[\epsilon_j, \eta] = 0 \quad \text {and} \quad \text{cov}[\epsilon_j, \epsilon_{k}] = 0\)
  • Same as continuous model, but now for the LRVs

Model identification

  • Model identification for categorical data is complicated

  • LRVs introduce a lot of parameters we cannot actually estimate

  • Short version: In the single-group case, the only parameters we can estimate are

    • the factor loadings \(\lambda_j\)
    • the thresholds \(\tau_j\)

Model identification

  • It gets more complicated when testing for MI

  • We can estimate some of the excluded parameters, but different authors use different approaches

  • So, to be prepared for MI, it helps to go through the long version of this problem for the single-group case …

Model identification

  • Standardize the latent trait as usual: \(\eta \sim N(0, 1)\)
    • e.g., set \(\kappa = 0\) and \(\phi = 1\)
    • Alternatively, can fix one factor loading to 1 and one intercept to 0
  • For the LRVs, we must also set their scale, and there are two ways to do this
    • “Delta parameterization” - recommended for interpretation, default in lavaan
    • “Theta parameterization” - can simplify estimation, won’t discuss much

Delta parameterization

  • Standardize the LRVs as \(X_j^* \sim N(0, 1)\)
    • i.e., set \(\mu_j = 0\) and \(\sigma^2_j = 1\)
    • “Delta” is defined as \(\Delta = 1 / \sigma_j\), so equivalent to setting \(\Delta = 1\)
  • Equivalent to standardizing continuous data
    • Factor loadings can be interpreted as correlations
    • Thresholds can be interpreted as z-scores

Implications of Delta parameterization

  • Setting \(\mu_j = 0\) implies the intercepts are also zero:

\[\mu_j = 0 = \nu_j + \lambda_j \kappa = \nu_j + \lambda_j (0) \]

  • So, \(\nu_j = 0\)

  • Implication: the intercepts of LRVs \(\nu_j\) cannot be estimated (fixed to zero)

Implications of Delta parameterization

  • Setting \(\sigma^2_j = 1\) implies the value of the residual variances

\[\sigma^2_j = 1 = \lambda_j^2 \phi + \theta_j = \lambda_j^2 (1) + \theta_j\]

  • So \(\theta_j = 1 - \lambda_j^2\)

  • Implication: the residual variance of the factor model cannot be estimated (fixed to \(1 - \lambda_j^2\))
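As a quick numeric check of these constraints: if, say, \(\lambda_j = 0.6\), then

\[\theta_j = 1 - \lambda_j^2 = 1 - 0.36 = 0.64,\]

so the implied LRV variance is \(\sigma^2_j = \lambda_j^2 \phi + \theta_j = 0.36 + 0.64 = 1\), as required.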

Estimation in lavaan: Code

library(lavaan)
dat <- read.csv("cint_data.csv")

# Model syntax
mod1 <- 'depression =~ cint1 + cint2 + cint4 + cint11 + 
                       cint27 + cint28 + cint29 + cint30'

# Fit model
fit.delta <- cfa(mod1, 
                 data = dat, 
                 std.lv = TRUE,  # standardize latent variable
                 ordered = TRUE) # data are ordered
                 
# Print model summary
summary(fit.delta)

Estimation in lavaan: Output


Summary of Delta parameterization

  • The only parameters we estimate are factor loadings \(\lambda_j\) and the thresholds \(\tau_{jc}\)

  • The latent trait is standardized: \(\eta \sim N(0, 1)\)

  • The LRVs are standardized: \(X_j^* \sim N(0, 1)\)

    • Same as setting \(\Delta_j = 1/ \sigma_j = 1\)
  • The intercepts are fixed, \(\nu_j = 0\)

  • The residual variances are fixed, \(\theta_j = 1 - \lambda_j^2\)

    • In some MI models, can estimate \(\Delta_j\), but \(\theta\) is still fixed!

Theta parameterization

  • Instead of setting \(\sigma^2_j = 1\), set the residual variance \(\theta_j = 1\)

  • Implies variance of \(X_j^*\) is fixed to \(\sigma^2_j = \lambda_j^2 + 1\)

  • So the Delta parameter is fixed to

\[ \Delta_j = 1 / \sigma_j = 1 / \sqrt{\lambda_j^2 + 1}\]

  • Model interpretation is more complicated since \(\sigma^2_j \neq 1\)
  • See coding notes for example

Summary

  • Factor model for categorical data uses LRVs

    • Step 1. Represent categorical data using LRV
    • Step 2. Factor analyze LRVs
  • Convenient trick! But introduces many parameters that we can’t estimate

  • The only parameters we can estimate are the factor loadings and thresholds

  • In Delta, these are easy to interpret!

  • (In Theta, not easy to interpret)

Measurement Invariance

Recap of MI

  • We want our measurement model to be the same across groups

  • This will ensure that any group differences on the observed data \(\bf x\) are due only to difference on the target construct \(\eta\) (i.e., impact)

  • Important for ensuring unbiased comparisons between groups

Measurement model parameters

  • For groups \(g = 1, 2, ..., G\)

  • The factor loadings, \(\lambda_{jg}\)

  • The item thresholds, \(\tau_{jg} = [\tau_{j1g}, \tau_{j2g}, \dots, \tau_{j,C-1,g}]\)

  • We want to test if these are equal over groups:

\[\lambda_{j1} = \lambda_{j2} = \dots = \lambda_{jG} \]

\[\tau_{jc1} = \tau_{jc2} = \dots = \tau_{jcG} \]

Population model parameters

  • In a single group, we had to standardize \(\eta \sim N(0, 1)\) to estimate the model

  • In multiple groups, this approach is problematic

  • e.g., if we set the mean of the factor to be 0 in each group:

\[\kappa_1 = \kappa_2 = \dots = \kappa_G = 0\]

  • We are asserting that all groups have the same mean on the latent trait – this is not an “arbitrary” constraint on the model!

Population model parameters

  • MI allows us to estimate the population model parameters (see Muthén & Asparouhov, 2002; Millsap & Yun-Tein, 2004)

  • In fact, the goal of MI can be interpreted in terms of placing sufficient constraints on the model to estimate impact

    • More on this soon when we talk about “levels” of MI
  • Even with MI, still need to standardize \(\eta \sim N(0,1)\) in one group, called the reference group

Nuisance parameters

  • What about the LRV parameters?
    • \(X_j^* \sim N(\mu_j, \sigma^2_j)\)
    • The intercepts, \(\nu_j\)
    • The residual variances, \(\theta_j\)
  • These are technically part of the measurement model
  • With MI, we can estimate either the LRV variances \(\sigma^2_j\) (Delta) or the residual variances \(\theta_j\) (Theta)
    • Most software will do this by default …

Summary

  • The measurement parameters:
    • The factor loadings, \(\lambda_{jg}\)
    • The item thresholds, \(\tau_{jg} = [\tau_{j1g}, \tau_{j2g}, \dots, \tau_{j,C-1,g}]\)
  • The population parameters:
    • \(\eta \sim N(\kappa_g, \phi_g)\)
  • The nuisance (LRV) parameters:
    • \(X_j^* \sim N(\mu_j, \sigma^2_j)\); intercepts: \(\nu_j\); residual variances: \(\theta_j\)

Levels of Measurement Invariance

  • configural, weak, metric, scalar, strong, strict, …

There are lots of versions…

Table: Thissen, 2023

Summary of levels: Configural invariance

  • Measurement model: Same factor pattern over groups (which items go with which factors)
  • Population model: Not sufficient to estimate impact on any parameter
  • Not usually interpreted, but is a basis for testing other models

Summary of levels: Weak / metric invariance

  • Measurement model: All factor loadings are equal over groups
  • Population model: Sufficient to estimate impact on factor (co-) variances
  • Can serve as basis for multi-group structural equation modeling (without mean structure)

Summary of levels: Strong / scalar invariance

  • Measurement model: All factor loadings and thresholds are equal over groups
  • Population model: Sufficient to estimate impact on factor (co-) variances and means
  • Considered acceptable for comparing groups on observed test scores

Summary of levels: Strict invariance

  • Measurement model: All factor loadings, thresholds, and residual variances are equal over groups
  • Population model: Sufficient to estimate impact on factor (co-) variances and means
  • Ensures test scores are equally reliable in both groups
  • Note: some issues distinguishing strong and strict MI with categorical data (we will see this soon)

The configural model: Recap

  • Measurement model: Same factor pattern over groups (which items go with which factors)
  • Population model: Not sufficient to estimate impact on any parameter
  • Not usually interpreted, but is a basis for testing other models

Configural model: Code

# Model (same as above)
mod1 <- ' depression =~ cint1 + cint2 + cint4 + cint11 + 
                        cint27 + cint28 + cint29 + cint30'
# Fit model
fit.config <- cfa(mod1, 
                  data = dat, 
                  std.lv = TRUE,  
                  ordered = TRUE, 
                  group = "cfemale") # <--- new 
                  
# Print model summary
summary(fit.config)

Configural model: Output

Weak / metric invariance: Recap

  • Measurement model: All factor loadings are equal over groups
  • Population model: Sufficient to estimate impact on factor (co-) variances
  • This model is more exciting in multidimensional settings, when we are interested in the covariance matrix of factors, not just the variance of a single factor

Weak / metric invariance: Code

# Fit model
fit.weak <- cfa(mod1, 
                data = dat, 
                std.lv = TRUE,  
                ordered = TRUE, 
                group = "cfemale",
                group.equal = "loadings") # <--- new 
                  
# Print model summary
summary(fit.weak)

Weak / metric invariance: Output

Comparing the models

  • Nested CFA models can be compared using their chi-square statistics (e.g., Satorra & Bentler, 2001)

  • Two models are nested if one can be obtained from the other by setting some parameters to fixed values

  • Let “A” denote the larger model and “B” denote the smaller model

  • Define: \(\chi^2_\text{DIFF} = \chi^2_\text{B} - \chi^2_\text{A} \quad \text{and} \quad df_\text{DIFF} = df_\text{B} - df_\text{A}\)

  • Then \(\chi^2_\text{DIFF}\) has a central chi-square distribution with \(df_\text{DIFF}\) degrees of freedom when the constrained model is true
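The definition can be sketched with hypothetical chi-square statistics (for the standard, unscaled test; lavTestLRT() applies a robust correction to fitted lavaan models):

```r
# Unscaled chi-square difference test with hypothetical statistics
chisq_A <- 37.5; df_A <- 40   # model A: larger (less constrained)
chisq_B <- 49.6; df_B <- 47   # model B: smaller (constrained)

chisq_diff <- chisq_B - chisq_A
df_diff    <- df_B - df_A
p <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
c(chisq_diff = chisq_diff, df_diff = df_diff, p = round(p, 3))
```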

Comparing the models: Code

lavTestLRT(fit.config, fit.weak)

Scaled Chi-Squared Difference Test (method = "satorra.2000")

lavaan NOTE:
    The "Chisq" column contains standard test statistics, not the
    robust test that should be reported per model. A robust difference
    test is a function of two standard (not robust) statistics.
 
           Df AIC BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
fit.config 40         37.534                                
fit.weak   47         57.126     12.091       7    0.09761 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Do not reject weak invariance using \(\alpha = .05\)
  • 8 factor loadings constrained, estimated 1 variance for the latent trait, so \(df = 8 - 1 = 7\)

Summary of example

  • Weak invariance with respect to gender was satisfied
    • Can compare variance of latent trait over groups
  • In the example, depression was slightly more variable for females
    • Males: Est = 1.000; SE = NA
    • Females: Est = 1.263; SE = 0.120
  • To test homogeneity of variance, see coding examples

Strong / scalar invariance: Recap

  • Measurement model: All factor loadings and thresholds are equal over groups

  • Population model: Sufficient to estimate impact on factor (co-) variances and means

  • Considered acceptable for comparing groups on observed test scores

  • Wrinkle with categorical data: with strong invariance, we can also estimate the variances of the LRVs

    • Most software will do this by default

Strong / scalar invariance: Code

# Fit model
fit.strong <- cfa(mod1, 
                  data = dat, 
                  std.lv = TRUE,  
                  ordered = TRUE, 
                  group = "cfemale",
                  group.equal = c("loadings", "thresholds")) # <--- new 
                  
# Print model summary
summary(fit.strong)

Strong / scalar invariance: Output

Comparing models

lavTestLRT(fit.config, fit.weak, fit.strong)

Scaled Chi-Squared Difference Test (method = "satorra.2000")

lavaan NOTE:
    The "Chisq" column contains standard test statistics, not the
    robust test that should be reported per model. A robust difference
    test is a function of two standard (not robust) statistics.
 
           Df AIC BIC   Chisq Chisq diff Df diff Pr(>Chisq)    
fit.config 40          37.534                                  
fit.weak   47          57.126     12.091       7    0.09761 .  
fit.strong 62         112.075     63.910      15    5.3e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Reject strong invariance using \(\alpha = .05\)
  • \(8 \times 3\) thresholds constrained, but estimated mean of the latent trait and 8 \(\Delta\) parameters, so \(df = 24 - 1 - 8 = 15\)

Summary of example

  • Strong invariance with respect to gender was not satisfied

  • In the example, depression was much higher on average for females

    • Males: Est = 0.000; SE = NA
    • Females: Est = .450; SE = 0.086
  • .45 SD difference between groups (SD \(= 1\) for males)

  • But, we do not know if this is due to measurement bias, impact, or both (because the model was rejected)

Next Steps

Summary

  • We have seen how to test MI using factor analysis for categorical data

  • In our example, we found that the CINT assessment satisfied metric but not scalar invariance

    • Implication – comparisons over gender may reflect measurement bias, impact, or both
  • Next, we consider how to find items that exhibit DIF

    • Removing these items from the assessment will ensure mean comparisons on the CINT are unbiased and fair with respect to gender!

What we have done so far

  • MI and DIF in general
  • Factor analysis with categorical data
  • Testing MI using factor analysis
    • Configural, weak, strong
    • See Appendix and coding example for strict invariance
  • Illustrated methods using an example

What we will do next

  • Switch perspectives to IRT
  • IRT with binary and categorical data
  • Testing DIF using IRT
  • Illustrate methods using an example

References

Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526.

Curley, W. E., & Schmitt, A. P. (1993). Revising SAT®-Verbal items to eliminate differential item functioning. ETS Research Report Series, 1993(2), i–18.

Holland, P. W. & Rosenbaum. P. R. (1986). Conditional Association and Unidimensionality in Monotone Latent Variable Models. The Annals of Statistics, 14(4), 1523–1543.

Millsap, R. E., & Yun-Tein, J. (2004). Assessing Factorial Invariance in Ordered-Categorical Measures. Multivariate Behavioral Research, 39(3), 479–515.

Muthén, B., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus.

Satorra, A., & Bentler, P. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507–514.

Wu, H., & Estabrook, R. (2016). Identification of Confirmatory Factor Analysis Models of Different Levels of Invariance for Ordered Categorical Outcomes. Psychometrika, 81(4), 1014–1045.

Appendix

Strict invariance: Recap

  • Measurement model: All factor loadings, thresholds, and residual variances are equal over groups
  • Population model: Sufficient to estimate impact on factor (co-) variances and means
  • Ensures test scores are equally reliable in both groups

Strict vs strong invariance

  • In factor analysis for continuous data, strict invariance is rarely tested

    • It is unnecessary for estimating impact or comparing groups on observed scores
  • With categorical data: Should variances (“Deltas”) of LRVs be treated as real parameters?

    • I don’t think so; see supplementary material for other opinions
  • If we want to ignore the LRV variances, then we should use strict rather than strong MI

  • Easier to do with the Theta parameterization; requires some new code in lavaan …

  • In our example, it won’t make a difference since we already rejected strong invariance

Strict invariance: Code

# Model syntax to constrain Delta = 1 in both groups
mod.strict <- 
  'depression =~ cint1 + cint2 + cint4 + cint11 + 
                 cint27 + cint28 + cint29 + cint30
                 
  cint1 ~*~ c(1, 1)*cint1
  cint2 ~*~ c(1, 1)*cint2
  cint4 ~*~ c(1, 1)*cint4
  cint11 ~*~ c(1, 1)*cint11
  cint27 ~*~ c(1, 1)*cint27
  cint28 ~*~ c(1, 1)*cint28
  cint29 ~*~ c(1, 1)*cint29
  cint30 ~*~ c(1, 1)*cint30'

fit.strict <- cfa(mod.strict, # <-- new
                  data = dat, 
                  std.lv = TRUE,  
                  ordered = TRUE, 
                  group = "cfemale",
                  group.equal = c("loadings", "thresholds")) 

summary(fit.strict)

Strict invariance: Code

lavTestLRT(fit.config, fit.weak, fit.strict)

Scaled Chi-Squared Difference Test (method = "satorra.2000")

lavaan NOTE:
    The "Chisq" column contains standard test statistics, not the
    robust test that should be reported per model. A robust difference
    test is a function of two standard (not robust) statistics.
 
           Df AIC BIC   Chisq Chisq diff Df diff Pr(>Chisq)    
fit.config 40          37.534                                  
fit.weak   47          57.126     12.091       7    0.09761 .  
fit.strict 70         121.781     73.542      23  3.411e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Note: \(8 \times 3\) thresholds constrained, but estimated mean of the latent trait so \(df = 24 - 1 = 23\)

  • I think this is the correct \(df\) for this comparison