Part 2: Differential Item Functioning

Peter F. Halpin

Overview of Workshop

  • Part 1. Intro + factor analysis + MI
  • Part 2. IRT + DIF \({\color{green}\leftarrow}\)
  • Part 3. Robust scaling + DIF + DTF

Overview of Part 2

  • Review the goals of DIF analysis
  • IRT binary data (2PL)
  • IRT for ordered categorical data (GRM)
    • GRM is the IRT analogue of categorical factor analysis
  • Testing DIF using the likelihood ratio test
    • Selecting anchor items
  • Worked example

Organization

Goals of DIF analysis

  • Item-by-item quality assurance

Recap

  • Psychometric models posit two non-mutually exclusive explanations of why the distribution of test scores may differ over groups of respondents

    1. Impact: the groups differ on the trait being measured

    2. DIF: the measure is biased with respect to group membership

  • The goal of DIF analysis is to detect biased items without making assumptions about impact

DIF as a follow up to MI?

  • In factor analysis framework
    • MI: testing the hypothesis of “no DIF” over all items
    • DIF: procedures for following-up rejection of MI
      • Also called partial MI
      • see semTools::partialInvariance()

DIF as a follow up to MI?

  • In IRT framework
    • The goal of DIF is to make item-level decisions (keep, revise, omit)
    • Levels of MI don’t translate directly into item level decisions
    • So DIF analysts usually skip MI (see Thissen, 2023)

Big picture

  • Generic measurement model: Weak / metric MI

Big picture

  • Generic measurement model: Strong / scalar MI

Big picture

  • Generic measurement model: Strict MI

Big picture

  • Generic measurement model: DIF proceeds item-by-item


Summary

  • MI and DIF are about the same issue
  • But they approach the issue differently
    • MI: invariance of parameters over groups
    • DIF: invariance of items over groups
  • Conceptually, could be combined into one “grand theory”
  • In practice, different models, different software, different research settings, different traditions …

IRT

  • Binary data, then ordered categorical data

IRT

  • There are many IRT models
  • We will focus on two models analogous to factor analysis for categorical data
  • In terms of DIF analysis, the same principles apply regardless of which model is used

Note on terminology

  • IRT was developed in the context of educational testing
  • Much of the model terminology reflects this context
    • e.g., we talk about the difficulty of items, the ability of respondents
  • Different terminology is more suitable in other settings
    • e.g., the severity of symptoms, the depression of respondents
  • Common to use different terminology depending on the context

Note on math symbols

  • In IRT, usually the letter “\(\theta\)” denotes the latent trait (ability)

    • I will use this terminology to be consistent with IRT software
  • Like factor analysis, we usually assume \(\theta \sim N(0, 1)\) when considering a single group or population

    • Will address how to estimate impact later

The 2-parameter logistic (2PL) model

  • The 2PL is applicable to binary item responses, e.g.,
    • correct / incorrect
    • yes / no
    • endorsed / not endorsed
  • Also a building block for models with > 2 ordered response categories

Item response functions (IRFs)

  • It is customary to present IRT models in terms of the measurement model for each item: \(p(X_j \mid \theta)\)

  • For binary data:

\[ P_j(\theta) = \text{Prob}(X_j = 1 \mid \theta)\]

  • This is called the item response function (IRF)

  • It describes how the probability of endorsing an item depends on the level of the trait being measured

The 2PL IRF

\[\begin{equation} P_j(\theta) = \frac{\exp(a_j (\theta - b_j))} {1 + \exp(a_j (\theta - b_j))} \end{equation}\]

  • This is just the logistic function from logistic regression
  • \(a_j\) is called the item discrimination parameter
    • Slope of the logistic regression, similar to factor loadings
  • \(b_j\) is called the item difficulty parameter
    • Re-scaled intercept
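
To make the formula concrete, here is a minimal base-R sketch of the 2PL IRF with hypothetical parameter values (a = 1.5, b = 0 and a = 0.8, b = 1):

# 2PL IRF for hypothetical item parameters
irf_2pl <- function(theta, a, b) {
  plogis(a * (theta - b))   # logistic function: exp(x) / (1 + exp(x))
}

theta <- seq(-4, 4, by = 0.1)
plot(theta, irf_2pl(theta, a = 1.5, b = 0), type = "l",
     xlab = "theta", ylab = "P(theta)")
lines(theta, irf_2pl(theta, a = 0.8, b = 1), lty = 2)  # flatter, "harder" item

# At theta = b, the probability of endorsement is 1/2
irf_2pl(theta = 0, a = 1.5, b = 0)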

2PL IRF examples

Interpretation of model parameters

  • The parameter \(b_j\) is called the item difficulty

  • Note that \(\theta = b_j\) implies

\[ P_j(b_j) = \frac{\exp(a_j (0))} {1 + \exp(a_j (0))} = 1/2 \]

  • So, item difficulty is the value of \(\theta\) at which the probability of endorsing the item is equal to 1/2

  • Respondents with ability above the difficulty level of the item have probability > 1/2 of answering the item correctly, and conversely

Interpretation of model parameters

  • Difficulty is the level of the trait (\(\theta\)) required to have probability ≥ 1/2 of endorsing the item

Interpretation of model parameters

  • The parameter \(a_j > 0\) is called the item discrimination

  • Items with higher discrimination have steeper slopes and a stronger relationship to the latent trait (writing \(Q_j(\theta) = 1 - P_j(\theta)\)):

\[ \frac{\partial}{\partial \theta} P_j(\theta) = a_j P_j(\theta) Q_j(\theta) \]

  • Rate of change of IRF is higher for more discriminating items

Interpretation of model parameters

  • If we again plug-in \(\theta = b_j\), we have

\[ \frac{\partial}{\partial \theta} P_j(b_j) = a_j /4 \]

  • For values “close to” the difficulty parameter, the discrimination parameter is proportional to the slope

Interpretation of model parameters

  • Discrimination is (proportional to) the slope of the curve where it intersects \(P(\theta) = 1/2\).

Summary

  • The 2PL IRT model:
    • Is used to model the probability of endorsing a binary item
    • The difficulty parameter describes the level of the target construct at which probability of endorsement = 1/2
    • The discrimination parameter describes how strongly each item is related to the target construct

Other IRT Concepts

  • Not required for DIF analysis, but helpful for understanding the theory

Information in IRT

  • Let \(\hat \theta\) denote the maximum likelihood estimate (MLE) of \(\theta\)

  • In practical terms, \(\hat \theta\) is the “best” estimate of the trait we can obtain from an assessment

  • The standard error of the MLE, \(SE(\hat \theta)\), describes how precisely we can estimate \(\theta\)

  • Information is defined as the precision of the MLE, \(1/SE(\hat \theta)^2\)

    • Not easy to interpret, but larger values mean more precise estimates

Information in IRT

  • One of the main contributions of IRT is to model how information depends on the parameters of test items

  • This provides a good theory for test development! It tells us how to build tests with a desired level of precision / information

  • We address information here for completeness but it is not required for DIF analysis

Item information function (IIF)

  • The item information function (IIF) is the precision that results when estimating the latent trait using a single item

  • In practice, we would never use only a single item on a test

  • But, we can build up the information function of the entire test from that of each individual item

  • So, we start with the IIF and then use that to get the test information function (TIF)

Item information function (IIF)

For the 2PL, the IIF is:

\[\begin{equation} I_j(\theta) = a_j^2 P_j(\theta) Q_j(\theta) \end{equation}\]

  • Info increases with \(a_j\)
    • Items that are more strongly related to the latent trait (more discriminating) provide more information about the trait
  • The maximum of the IIF for each item occurs when \(\theta = b_j\)
    • Each item provides the most information at its difficulty level

IIFs

  • The location of the peak is \(b_j\); the height of the peak is \(a_j^2/4\)
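
A quick numerical check of this claim, using the hypothetical values a = 1.5 and b = 0.5 (so the peak should be at 0.5 with height \(1.5^2/4 = 0.5625\)):

# IIF of the 2PL: a^2 * P * Q
iif_2pl <- function(theta, a, b) {
  P <- plogis(a * (theta - b))
  a^2 * P * (1 - P)
}

theta <- seq(-4, 4, by = 0.01)
info <- iif_2pl(theta, a = 1.5, b = 0.5)
theta[which.max(info)]   # peak location, approximately b = 0.5
max(info)                # peak height, approximately a^2 / 4 = 0.5625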

Test information function (TIF)

The TIF is obtained by summing the information functions of all of the items on a test

\[ I(\theta) = \sum_{j = 1}^{J} I_j(\theta) \]

  • This result follows directly from the conditional independence assumption
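
For example, a sketch of the TIF for a hypothetical three-item test (made-up parameter values):

# TIF = sum of the IIFs over items
iif_2pl <- function(theta, a, b) {
  P <- plogis(a * (theta - b))
  a^2 * P * (1 - P)
}

theta <- seq(-4, 4, by = 0.1)
a <- c(1.2, 1.5, 0.9)   # hypothetical discriminations
b <- c(-1, 0, 1)        # hypothetical difficulties
tif <- rowSums(sapply(seq_along(a), function(j) iif_2pl(theta, a[j], b[j])))
plot(theta, tif, type = "l", xlab = "theta", ylab = "Test information")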

TIF

  • Useful for comparing different tests, but values not easily interpreted…

Reliability

  • The TIF can be converted into a reliability function for the total score (see Nicewander, 2018):

\[R(\theta) = \frac{I(\theta)}{1 + I(\theta)}\]

  • Averaging this function over values of \(\theta\) provides a “marginal” reliability coefficient
    • Interpreted as the proportion of variability in the total score that is associated with the trait (like Cronbach alpha)
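
A sketch of this conversion for the same hypothetical three-item test, with one simple way to average \(R(\theta)\) over a discretized N(0, 1) grid:

# Conditional and marginal reliability from the TIF
iif_2pl <- function(theta, a, b) {
  P <- plogis(a * (theta - b))
  a^2 * P * (1 - P)
}

a <- c(1.2, 1.5, 0.9); b <- c(-1, 0, 1)   # hypothetical item parameters
theta <- seq(-4, 4, by = 0.01)
tif <- rowSums(sapply(seq_along(a), function(j) iif_2pl(theta, a[j], b[j])))
rxx <- tif / (1 + tif)                    # conditional reliability R(theta)

# Average R(theta) over theta ~ N(0, 1) to get a "marginal" reliability coefficient
w <- dnorm(theta) / sum(dnorm(theta))
sum(w * rxx)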

Reliability

Marginal reliability of the total score:

[1] 0.8148199

Summary

  • A central concept in IRT is (Fisher) information, which is the precision with which the target construct can be estimated

  • Item information functions describe the information provided by each item

  • Test information is the sum of the items’ information

  • Reliability of the total score can be derived from the test information

  • Information is useful for comparing different items / tests but reliability is easier to interpret

The Graded Response Model (GRM)

Ordered categorical data

  • Let the item response \(X_j\) take on values \(c \in \{1, \dots, C\}\), where \(C\) is the number of categories for the item

    • e.g., if an item has 4 possible response categories, \(C = 4\) and \(c = 1, \dots, 4\) are the 4 response categories.
  • It doesn’t matter what we label the categories as long as they are ordered

  • For CINT: never (0), rarely (1), sometimes (2), or almost always (3)

Item response functions

The cumulative response function is the probability of endorsing category \(c\) or higher, conditional on \(\theta\)

\[ P_{jc}(\theta) = \text{Prob} (X_j \geq c \mid \theta) \]

  • By definition, \(P_{j1}(\theta) = 1\), \(P_{j,C+1}(\theta) = 0\), and

\[\text{Prob} (X_j = c \mid \theta) = P_{jc}(\theta) - P_{j, c+1}(\theta)\]

  • \(\text{Prob} (X_j = c \mid \theta)\) are called the item category response functions (ICRFs)

GRM

  • GRM assumes a 2PL model for the cumulative response functions

\[ P_{jc}(\theta) = \frac{\exp(a_j (\theta - b_{jc}))} {1 + \exp(a_j (\theta - b_{jc}))} \]

  • Each item has only one discrimination (proportional odds assumption)

  • Each response category has its own difficulty parameter, now called a “threshold” parameter

GRM

  • From the cumulative response function, derive the ICRFs

\[\begin{align} \text{Prob} (X_j = 1 \mid \theta) & = 1 - P_{j2}(\theta) \\ \text{Prob} (X_j = 2 \mid \theta) & = P_{j2}(\theta) - P_{j3}(\theta) \\ ... \\ \text{Prob} (X_j = C \mid \theta) & = P_{jC}(\theta) - 0 \end{align}\]

  • These give the probability of endorsing each category

  • Note that this reduces to 2PL for \(C = 2\)
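
Before turning to software, a hand-rolled sketch for one hypothetical item with \(a = 1.5\) and thresholds \((-1, 0, 1)\); it also shows that the ICRFs sum to one and collapse to the 2PL when there is a single threshold:

# GRM ICRFs built from 2PL cumulative response functions
grm_probs <- function(theta, a, b) {              # b = vector of C - 1 thresholds
  P_star <- c(1, plogis(a * (theta - b)), 0)      # cumulative: P_j1 = 1, ..., P_j(C+1) = 0
  -diff(P_star)                                   # ICRFs: Prob(X_j = c) = P_jc - P_j(c+1)
}

grm_probs(theta = 0, a = 1.5, b = c(-1, 0, 1))        # four category probabilities
sum(grm_probs(theta = 0, a = 1.5, b = c(-1, 0, 1)))   # sums to 1
grm_probs(theta = 0, a = 1.5, b = 0)                  # one threshold: reduces to the 2PL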

GRM: Example

library(mirt)

# Load data and separate depression items
cint <- read.csv("cint_data.csv")
depression_names <- c("cint1", "cint2", "cint4", "cint11", 
                      "cint27", "cint28", "cint29", "cint30")
depression_items <- cint[, depression_names]

# Run GRM model
grm <- mirt(depression_items, 
            itemtype = "graded")

# per item plots 
itemplot(grm,  
         item = 1, 
         type = "threshold",  
         main = "Cumulative response functions")

itemplot(grm, 
         item = 1, 
         type = "trace", 
         main = "Category response functions")

# Plotting all test items 
plot(grm, 
     type = "itemscore",  
     main = "Expected item score functions", 
     facet = F)

GRM: Example

  • Plots for CINT.1 (“Feels sad or depressed”)

  • The cumulative response functions (top) are not usually reported, shown here to illustrate the 2PL assumption

  • The ICRFs (bottom) are usually reported; they show the probability of endorsing each category

  • Values of item thresholds are shown by dashed vertical lines

    • “difficulties” in the cumulative response functions
    • “thresholds” or category boundary in ICRF

GRM: Example

  • Plotting the expected item score is a way of simplifying presentation of the entire assessment

  • It shows how the expected response (0-3) depends on the measured trait

  • Computed as \(\sum_c c\times \text{Prob} (X_j = c \mid \theta)\)

  • Curves further to the left correspond to “easier” items, in the sense that higher item scores are expected at lower levels of depression
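
A sketch of this computation for the first item, assuming the grm object fit in the earlier mirt chunk:

# Expected item score for item 1, computed from its ICRFs
library(mirt)
theta <- seq(-3, 3, by = 0.1)
item1 <- extract.item(grm, 1)
probs <- probtrace(item1, matrix(theta))   # Prob(X_j = c | theta), one column per category
escore <- as.vector(probs %*% (0:3))       # sum_c c * Prob(X_j = c | theta), categories scored 0-3
plot(theta, escore, type = "l", xlab = "theta", ylab = "Expected item score")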

GRM: Example

coef(grm, IRTpars = T, simplify = T)
$items
           a     b1     b2    b3
cint1  1.612 -1.622 -0.011 1.578
cint2  1.286 -1.049  0.306 1.665
cint4  1.129 -1.879 -0.195 1.758
cint11 1.219 -1.688 -0.402 1.608
cint27 1.692 -0.594  0.279 1.404
cint28 1.213 -1.292 -0.001 1.641
cint29 1.351 -0.004  0.889 2.217
cint30 1.155 -0.630  0.435 1.910

$means
F1 
 0 

$cov
   F1
F1  1
  • Usually plots are presented to summarize models, rather than coefficient tables

  • IIFs, TIFs, and reliability are provided in the Appendix

Summary

  • GRM is widely used for ordered categorical responses

  • The cumulative response functions are modeled using 2PL

    • Each item has the same discrimination for all categories (proportional odds assumption)
    • Each item has a different threshold parameter for each category
  • The ICRFs are derived from the cumulative response functions

    • These give the probability of endorsing each category

DIF

Overview

  • The goal of DIF analysis is to detect biased items without making assumptions about impact

  • So, we want to test whether item parameters differ over groups

  • There are lots of ways to do this, but for the example we will focus on the likelihood ratio test for nested models

    • Same approach as used for factor analysis
  • If an item exhibits DIF, it should be investigated (e.g., revised or omitted)

  • Sounds simple enough, but…

DIF is two interrelated problems

  • The more obvious problem: Infer whether item parameters differ as a function of some external variable(s)

  • For illustrative purposes, consider Lord’s (Wald) test for the difficulty parameter of the 2PL in groups \(g = 0, 1\)

\[ z_j = \frac{\hat b_{1j} - \hat b_{0j}} {\sqrt{\text{var}(\hat b_{1j}) + \text{var}(\hat b_{0j})}}\]

  • (We will use the likelihood ratio test later, but Lord’s test is simple to interpret)
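
For example, a sketch of Lord’s test with made-up difficulty estimates and standard errors (hypothetical values, not from the CINT data):

# Lord's (Wald) test for one item's difficulty parameter
b_hat <- c(ref = 0.30, comp = 0.65)   # hypothetical estimates in the two groups
se_b  <- c(ref = 0.08, comp = 0.10)   # their standard errors
z_j <- (b_hat["comp"] - b_hat["ref"]) / sqrt(sum(se_b^2))
unname(z_j)                           # compare to N(0, 1) critical values
2 * pnorm(-abs(z_j))                  # two-sided p-value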

DIF is two interrelated problems

  • The less obvious problem: IRT models are identified only up to a linear transformation of the latent trait

  • This means that the item parameters and latent trait can be linearly transformed without changing the IRFs

  • Let \(\theta^∗ = A\theta + B\), \(b^∗_j = A b_j + B\), and \(a^*_j = a_j/A\):

\[\begin{align} \text{logit} (P_j(\theta)) & = a^*_j(\theta^* - b^*_j) \\ & = \frac{a_j}{A}(A\theta + B - (A b_j + B)) \\ & = a_j(\theta - b_j) \end{align}\]

  • This is the technical reason that we need to set the scale of the latent trait when estimating psychometric models

  • Setting \(\theta \sim N(0, 1)\) implies that \(A = 1\) and \(B = 0\), which solves the problem
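
A quick numeric check of this indeterminacy, with arbitrary values for \(\theta\), \(a\), \(b\), \(A\), and \(B\):

# The IRF is unchanged when theta and the item parameters are transformed together
irf_2pl <- function(theta, a, b) plogis(a * (theta - b))

theta <- 0.7; a <- 1.4; b <- -0.3
A <- 2; B <- 5
irf_2pl(theta, a, b)                        # original parameterization
irf_2pl(A * theta + B, a / A, A * b + B)    # transformed: identical probability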

Implications of scaling for Lord’s test

  • If we standardize the latent trait separately in each group (i.e., impose the same mean and variance, ignoring impact), this has implications for testing model parameters

  • The scale transformations are

\[\begin{equation} \notag \theta^*_1 = \sigma_0 \left(\frac{\theta_1 - \mu_1}{\sigma_1}\right) + \mu_0 \quad \text{and} \quad b^*_{1i} = \sigma_0 \left(\frac{b_{1i} - \mu_1}{\sigma_1}\right) + \mu_0 \end{equation}\]

Plugging the rescaled item parameters into Lord’s test

\[\begin{align} \label{dstar}\notag z^*_i = \frac{\hat b^*_{1i} - \hat b_{0i}} {\sqrt{\text{var}(\hat b^*_{1i}) + \text{var}(\hat b_{0i})}} = \frac{\frac{\sigma_0}{\sigma_1} \left(\hat b_{1i} - \mu_1 \right) + \mu_0 - \hat b_{0i}} {\sqrt{\frac{\sigma^2_0}{\sigma^2_1} \text{var}(\hat b_{1i}) + \text{var}(\hat b_{0i})}} \end{align}\]

Conclusion: If there is impact on either the mean or the variance of the latent trait, Lord’s test is biased

How do we solve the scaling problem?

  • Step 1. Arbitrarily scale the latent trait in the “reference group”
    • Warranted because IRT models for a single group are identified only up to a linear transformation of the latent trait
  • Step 2. Assume that (some of) the item parameters are equal over groups
    • These items are called anchors
    • This suffices to scale the latent trait in the comparison group
    • e.g., set \(b_{0i} = b_{1i}\) for at least 2 items and solve for \(\mu_1\) and \(\sigma_1\) using the equations on the previous slide (see the sketch below)
  • Conclusion: We need to know the items without DIF (anchors) to scale the latent trait
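
A toy illustration of Step 2 with hypothetical anchor difficulties, taking the reference group to be scaled so that \(\mu_0 = 0\) and \(\sigma_0 = 1\); equating the two anchors gives two equations in \(\mu_1\) and \(\sigma_1\):

# Solve b_0i = (b_1i - mu_1) / sigma_1 for two anchor items
b0 <- c(-0.50, 0.80)   # hypothetical anchor difficulties, reference-group scale
b1 <- c(-0.20, 1.36)   # the same anchors as estimated in the comparison group
sigma1 <- (b1[2] - b1[1]) / (b0[2] - b0[1])   # = 1.2
mu1    <- b1[1] - sigma1 * b0[1]              # = 0.4
c(mu1 = mu1, sigma1 = sigma1)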

The “circular nature” of DIF

  • The problem just described has been referred to as the circular nature of DIF (Angoff, 1982)

    • We want to compare the value of model parameters over groups

    • To do this we must scale the latent trait in both groups

    • To scale the latent trait, we must assume some model parameters are equal over groups

    • But this is what we wanted to test in the first place!

Anchor items

  • In practice, the problem is resolved by choosing an “anchor set” of items

  • Anchors are items that we treat as DIF-free when testing other items for DIF

  • There are many strategies, heuristics, etc. for choosing anchors

  • These are all flawed – anchor item selection is a limitation of traditional methods for DIF analysis

    • More on this in the last part of this workshop

Two-stage purification and refinement

  • One widely used approach to choosing anchors is called purification and refinement

    • Stage 1. Test each item assuming every other item is an anchor. Before starting stage two, remove any item with DIF from the anchor set (“purification”)

    • Stage 2. Test the items without DIF again, using the purified anchor set

  • Can repeat as desired

  • Procedure is exploratory, involves many tests of DIF, fitting models with different restrictions, …

Testing DIF with the Likelihood Ratio (LR) Test

The LR test

The LR test uses a multi-group IRT model to test whether the parameters of an item differ over groups

  • This test is applicable to any IRT model
    • Will focus on 2PL for simplicity, but illustration will use GRM
    • This approach is subject to the concerns about anchor item selection mentioned above
    • We need to choose a set of items that are considered not to have DIF when testing which items do have DIF

LR test for 2PL

Write the 2PL in two groups as follows:

\[\begin{align} \text{Reference group: } & \text{logit} (P_{j0}(\theta)) = a_{j0}(\theta - b_{j0}) \quad \text{ with } \theta \sim N(0, 1) \\ \text{Comparison group: } & \text{logit} (P_{j1}(\theta)) = a_{j1}(\theta - b_{j1}) \quad \text{ with } \theta \sim N(\mu, \sigma) \\ \end{align}\]

  • The second subscript on the IRFs and item parameters indicates the reference group (0) or the comparison group (1)

  • In the reference group, we scale the latent trait arbitrarily

    • Usually, standardized to have \(E(\theta) = 0\) and \(V(\theta) = 1\)
  • In the comparison group we estimate the mean and the variance of the latent trait

    • The rationale for this setup was discussed above when addressing the multi-group scaling problem

LR test for 2PL

  • In order to apply the LR test, we estimate the following two models

  • Model 1: The nested (smaller) model is obtained by setting all item parameters equal across groups

\[a_{j0} = a_{j1} = a_{j} \quad \text{ and } \quad b_{j0} = b_{j1} = b_j \quad \text{for all } \quad j = 1 \dots J\]

  • Same as strong invariance in factor analysis

  • Model 2: The nesting (larger) model is obtained by allowing the parameters of the focal item to vary across groups:

\[a_{j0} \neq a_{j1} \quad \text{ and } \quad b_{j0} \neq b_{j1} \quad \text{for the focal item } j = j^* \]

  • Note that we are not requiring the parameters to be unequal; they may be equal or unequal, and we simply allow them to be estimated freely in each group

  • Software automates the fitting of these item-by-item models

LR test for 2PL

  • The LR test then proceeds by comparing the likelihood of the nested model to that of the nesting model

  • When the constraints imposed by the nested model are valid (i.e., if there is no DIF on the item), the test statistic has a chi-square distribution with degrees of freedom equal to the number of constrained parameters

  • If the LR test of DIF is significant, we conclude that the item is biased

  • If not, then we conclude that the item is not biased

Strong invariance: Code

  • Step 1. Estimate a model in which item slopes and intercepts are invariant over groups (strong invariance)
# Groups need to be a factor 
gender <- factor(cint$cfemale)

# Invariance constraints used by mirt
strong.invariance <- c("free_mean", "free_var", "slopes", "intercepts")

# Estimate model (can request SE using SE = T)
strong.mod <- multipleGroup(depression_items, 
                            group = gender, 
                            itemtype = "graded",
                            invariance = strong.invariance)

# View output
coef(strong.mod, IRTpars = T, simplify = T)
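
Before automating things with DIF() on the next slides, here is a hand-rolled sketch of the LR test for a single focal item (cint1), reusing depression_items, gender, and strong.mod from above; the object names anchor_items and freed.cint1 are introduced here only for illustration:

# Free the focal item's parameters, keep the other items as anchors,
# and compare to the fully constrained model
anchor_items <- setdiff(names(depression_items), "cint1")
freed.cint1 <- multipleGroup(depression_items,
                             group = gender,
                             itemtype = "graded",
                             invariance = c("free_mean", "free_var", anchor_items))

# LR test: chi-square with 4 df (one slope + three thresholds for cint1)
anova(strong.mod, freed.cint1)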

Testing DIF using the LR test: Output

Testing DIF using the LR test: Code

  • Step 2. Run DIF analysis (without purification)
DIF(strong.mod, 
    which.par = c("a1", "d1", "d2", "d3"), # <- mirt notation
    scheme = "drop")  # <- drop item constraints
       groups converged     AIC   SABIC      HQ    BIC     X2 df     p
cint1     0,1      TRUE   3.113   9.344  10.370 22.047  4.887  4 0.299
cint2     0,1      TRUE   4.692  10.923  11.949 23.626  3.308  4 0.508
cint4     0,1      TRUE   3.169   9.400  10.425 22.102  4.831  4 0.305
cint11    0,1      TRUE   1.948   8.178   9.204 20.881  6.052  4 0.195
cint27    0,1      TRUE   0.629   6.860   7.886 19.563  7.371  4 0.118
cint28    0,1      TRUE   3.090   9.321  10.347 22.024   4.91  4 0.297
cint29    0,1      TRUE -23.411 -17.180 -16.154 -4.477 31.411  4     0
cint30    0,1      TRUE  -9.195  -2.964  -1.939  9.738 17.195  4 0.002

Testing DIF using the LR test: Code

  • Step 2. Run DIF analysis (with purification)
DIF(strong.mod, 
    which.par = c("a1", "d1", "d2", "d3"), 
    scheme = "drop_sequential", #<- different scheme
    seq_stat = .05,  # <- Type I Error rate for DIF
    max_run = 2) # <- two stages only

Checking for DIF in 6 more items
Computing final DIF estimates...
       groups converged     AIC   SABIC      HQ    BIC     X2 df     p
cint29    0,1      TRUE -18.863 -12.632 -11.606  0.071 26.863  4     0
cint30    0,1      TRUE  -4.647   1.584   2.610 14.286 12.647  4 0.013

Summary of example

  • DIF analysis identified two items that were biased with respect to gender

    • CINT 29: “Go to their room and cry”
    • CINT 30: “Feel restless and walk around”
  • More questions:

    • Direction and size of the effect?
    • Does DIF affect conclusions about impact?
  • One way to investigate these questions:

    • Fit a model that allows items with DIF to vary over groups

Follow-up analyses: Code

  • Fit the partial invariance model
# Invariance constraints
partial.invariance <- c("free_mean", "free_var", 
                        "cint1", "cint2", "cint4", "cint27", "cint28")

# Estimate model
partial.mod <- multipleGroup(depression_items, 
                             group = gender, 
                             itemtype = "graded",
                             invariance = partial.invariance)


# Plot IRFs of biased items
itemplot(partial.mod, type = "score", item = "cint29", main = "CINT 29")
itemplot(partial.mod, type = "score", item = "cint29", main = "CINT 30")

# Examine parameter estimates
coef(partial.mod, IRTpars = T, simplify = T)

Follow-up analyses: Output

  • On both items, females were expected to report higher scores than males, even if they had the same level of depression

Follow-up analyses: Output

Summary of example

  • DIF analysis identified two items that were biased with respect to gender

    • CINT 29: Go to their room and cry
    • CINT 30: Feel restless and walk around
  • Females were expected to report higher scores than males, even if they had the same level of depression

  • Gender differences on depression changed when items with DIF were allowed to vary over groups (partial invariance)

    • Mean differences were reduced by about 0.1 SD
    • Variance in females’ scores was reduced as well
  • A limitation of current methods is that we cannot directly test whether DIF affects conclusions about impact

    • More on this topic in part 3

Next Steps

Summary

  • We have seen how to test for DIF using IRT models (2PL and GRM)

  • In our example, we found two items biased with respect to gender

    • These items could be considered for revision or removal
  • We discussed two limitations of DIF analysis

    • Choice of anchor items
    • No direct test of whether DIF affects impact

What we will do next

  • New procedures for addressing DIF and DTF

  • Do not require selection of anchor items

  • Guaranteed to work if < 50% of items exhibit DIF

    • Diagnostics available for greater proportions of items
  • Can be used to test whether DIF affects impact (without having to first test for DIF in each item!)

  • Easy to implement

References

Angoff, W. (1982). Use of difficulty and discrimination indices for detecting item bias. In R. Berk (Ed.), Handbook of Methods for Detecting Test Bias (pp. 96–116). The Johns Hopkins Press.

Nicewander, W. A. (2018). Conditional reliability coefficients for test scores. Psychological Methods, 23(2), 351–362. https://doi.org/10.1037/met0000132

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 67–113). Lawrence Erlbaum Associates.

Appendix

Example: IIFs

plot(grm, type = "infotrace", theta_lim = c(-3,3), lwd = 2)

Example: TIFs

plot(grm, type = "info", theta_lim = c(-3,3), lwd = 2)

Example: Reliability

plot(grm, type = "rxx", theta_lim = c(-3,3), lwd = 2)
marginal_rxx(grm)
[1] 0.7944443

Example: Item fit

itemfit(grm)
    item   S_X2 df.S_X2 RMSEA.S_X2 p.S_X2
1  cint1 37.080      39      0.000  0.558
2  cint2 51.364      44      0.014  0.207
3  cint4 40.499      43      0.000  0.580
4 cint11 59.168      43      0.021  0.051
5 cint27 81.114      40      0.035  0.000
6 cint28 41.579      43      0.000  0.533
7 cint29 55.568      43      0.019  0.095
8 cint30 39.844      46      0.000  0.727