robustDIF Technical Notes
Peter F Halpin
2026-04-23
These notes outline an updated version of the robust DIF procedure
described in Halpin (2022) that is now implemented by the
robustDIF package.
Assume two groups of respondents with sample sizes $N_0$ and $N_1$, and let $N = N_0 + N_1$. Also let $Y_i$ denote item-level statistics derived from the parameter estimates of items $i = 1, \dots, m$. The asymptotic arguments presented below assume that $N_0$ and $N_1$ go to infinity at the same rate but that the number of items $m$ remains fixed and finite.
The $Y_i$ are chosen so that $Y_i \xrightarrow{p} \theta$ under the null hypothesis of no DIF on any item. In this set-up, DIF means that $Y_i$ converges in probability to a fixed value other than $\theta$. Some specific choices of $Y_i$ are detailed in Halpin (2024, 2025) and Halpin & Gilbert (2025). The notation $\operatorname{var}(Y_i)$ is used to denote the finite sample variance of $Y_i$, and similarly for other statistics.
The robust DIF procedure can be seen as solving two interrelated problems. First, it provides an M-estimator of $\theta$ that is highly robust to DIF. Second, it provides a procedure for flagging items with DIF, which happens automatically as a by-product of estimating $\theta$. Standard Wald tests of DIF are also available.
Halpin (2022) used the (unstated) assumption that the covariance matrix of the $Y_i$ is diagonal and attempted to combine efficiency and robustness in a way that did not clearly separate these two opposing considerations. These notes address these shortcomings and implement a new, simpler, and more general version of the robust DIF procedure. The analytical details are stated so that they apply to any bounded, redescending loss function, although for computation the focus is on Tukey's bisquare. Differences from Halpin (2022) are pointed out as they arise.
The Updated Robust DIF Procedure
Defining the Estimator
The robust estimator $\hat\theta$ can be defined in three related ways. Let $u_i = (Y_i - \theta)/s_i$, where the $s_i > 0$ are item-specific scaling factors to be chosen subsequently. The three definitions of $\hat\theta$ are as follows.
The minimizing argument of the loss function:
$$\hat\theta = \arg\min_{\theta} \sum_{i=1}^m \rho(u_i).$$
This definition is useful for deriving results about the robustness of $\hat\theta$. For redescending loss functions there exist constants $k$ and $c$ such that $\rho(u) = c$ whenever $|u| \geq k$. It is usual to scale $\rho$ so that $c = 1$. The constant $k$ is treated as a tuning parameter that serves to identify outliers (i.e., items with DIF).
The solution to the estimating equation:
$$\sum_{i=1}^m \frac{1}{s_i}\,\psi(u_i) = 0,$$
where $\psi = \rho'$. The influence function $\psi$ is important for obtaining the variance of $\hat\theta$.
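A weighted mean, obtained by defining the weights $w_i = w(u_i) = \psi(u_i)/u_i$ and substituting these into the estimating equation to get
$$\hat\theta = \frac{\sum_{i=1}^m w_i Y_i / s_i^2}{\sum_{j=1}^m w_j / s_j^2}.$$
Because the $w_i$ depend on $\hat\theta$, this is a fixed-point equation rather than a closed-form expression. By convention, $w(0) = \lim_{u \to 0} \psi(u)/u = \psi'(0)$, to avoid division by zero. Also note that, when $|u_i| \geq k$, $\psi(u_i) = 0$ and hence $w_i = 0$, so that outliers (as defined by $k$) are “redescended” to zero. The weighted mean is useful for computation via iteratively re-weighted least squares (IRLS). Especially for redescending loss functions, IRLS can be much more stable than Newton-based methods that solve the estimating equation directly (because the derivative $\sum_i \psi'(u_i)/s_i^2$ can approach zero, leading the Newton steps to diverge).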
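To make the three definitions concrete, here is a minimal Python sketch of Tukey's bisquare in its loss, influence-function, and weight forms, assuming the loss is scaled so that $c = 1$. The function names `bisquare_rho`, `bisquare_psi`, and `bisquare_weight` are hypothetical and are not part of the robustDIF API.

```python
def bisquare_rho(u: float, k: float) -> float:
    """Tukey bisquare loss, scaled so that rho(u) = 1 whenever |u| >= k."""
    if abs(u) >= k:
        return 1.0
    return 1.0 - (1.0 - (u / k) ** 2) ** 3


def bisquare_psi(u: float, k: float) -> float:
    """Influence function psi = rho'; it redescends to exactly zero for |u| >= k."""
    if abs(u) >= k:
        return 0.0
    return (6.0 / k ** 2) * u * (1.0 - (u / k) ** 2) ** 2


def bisquare_weight(u: float, k: float) -> float:
    """IRLS weight w(u) = psi(u) / u, in closed form so that w(0) = psi'(0)."""
    if abs(u) >= k:
        return 0.0
    return (6.0 / k ** 2) * (1.0 - (u / k) ** 2) ** 2


if __name__ == "__main__":
    k = 1.96
    for u in (0.0, 1.0, 2.5):
        print(u, bisquare_rho(u, k), bisquare_psi(u, k), bisquare_weight(u, k))
```

Because the constant $6/k^2$ multiplies every weight, it cancels in the normalized weights of the weighted-mean form, so implementations often drop it.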
Choosing the Scaling Factors
The scaling factors are required to ensure that $\hat\theta$ is equivariant under re-scaling of the $Y_i$. In conventional applications, the $Y_i$ are “raw” data points and the scaling factors are chosen to be an ancillary estimate of the scale of the $Y_i$ (e.g., the median absolute deviation or MAD). In this situation, the scaling factors are constant over items ($s_i = s$ for all $i$), so they factor out of the estimating equation and cancel out in the numerator and denominator of the normalized weights $(w_i/s_i^2) / \sum_j (w_j/s_j^2)$.
In the present application, item-specific scaling factors are available because we can derive $\Sigma$ (the asymptotic covariance matrix of the $Y_i$ under the null hypothesis of no DIF) by applying the Delta method to the item parameter estimates. As shown above, this somewhat complicates the relationship among the different definitions of $\hat\theta$, because it is now important to keep track of how the item-specific scaling factors appear in each definition. However, the item-specific scaling factors are worth this additional complication for the following three reasons, all of which were noted in Halpin (2022).
First, obtaining item-specific scaling factors analytically from $\Sigma$ means that we no longer require an ancillary estimate based on the scale of the realized values of the $Y_i$. This is important because it makes the resulting estimator highly robust to DIF. This is also the main detail that separates the proposed approach from the one considered (and dismissed) by Stocking and Lord (1984). Stocking and Lord used a common scaling factor based on the MAD of the $Y_i$. However, the MAD has a breakdown point of 1/4, so any estimator of $\theta$ that uses it will break down if 25% or more of the items exhibit DIF (see Huber and Ronchetti, 2009, chapter 6). By contrast, $\Sigma$ does not depend directly on the scale of the realized values of the $Y_i$; it can be computed directly from the item parameter estimates. The overall result is that the robustness of the resulting M-estimator no longer depends on that of an ancillary scaling factor.
Second, the expression for $\Sigma$ can depend on the values of the $Y_i$. Thus, the estimate of $\Sigma$ may yet be contaminated by DIF, leading to the potential for “masking”. Although this problem is not as severe as when using an ancillary estimate like the MAD, it is still a potential concern. This problem can be avoided as follows. The null hypothesis that item $i$ does not exhibit DIF gives $Y_i \xrightarrow{p} \theta$. This motivates substituting $\tilde\theta$ for $Y_i$ when estimating $\Sigma$, where $\tilde\theta$ is a consistent, high-breakdown estimate of $\theta$ (e.g., the median of the $Y_i$). The overall result is a plug-in estimator of $\Sigma$ that is robust to DIF.
Third, using item-specific scaling factors implies that we can downweight items with DIF at the desired asymptotic false positive rate during IRLS-based estimation. For example, if we choose $s_i = \sqrt{\operatorname{var}(Y_i)}$ and $k$ as the $1 - \alpha/2$ quantile of $N(0, 1)$, then items are down-weighted to zero once $Y_i$ lies outside the $(1 - \alpha) \times 100\%$ confidence interval (CI) $\hat\theta \pm k\, s_i$ centered at $\hat\theta$. In this way, DIF detection arises as a by-product of robust scaling.
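A minimal Python sketch of this flagging rule, assuming $s_i = \sqrt{\operatorname{var}(Y_i)}$ and $k$ equal to the $1 - \alpha/2$ standard normal quantile. The helper names and the numbers in the example are hypothetical.

```python
from statistics import NormalDist


def tuning_constant(alpha: float) -> float:
    """k = (1 - alpha/2) quantile of N(0, 1), i.e., the two-sided critical value."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def zero_weight_items(y, se, theta_hat, alpha=0.05):
    """Indices i with |Y_i - theta_hat| >= k * s_i, i.e., Y_i outside the
    (1 - alpha) interval theta_hat +/- k * s_i, where the bisquare weight is zero."""
    k = tuning_constant(alpha)
    return [i for i, (yi, si) in enumerate(zip(y, se)) if abs(yi - theta_hat) >= k * si]


y = [0.02, -0.05, 0.10, 0.60, 0.01]   # hypothetical item-level statistics Y_i
se = [0.05, 0.06, 0.08, 0.07, 0.05]   # hypothetical s_i = sqrt(var(Y_i))
print(round(tuning_constant(0.05), 2))          # 1.96
print(zero_weight_items(y, se, theta_hat=0.0))  # [3]
```

Lowering $k$ (i.e., raising $\alpha$) widens the set of items that receive zero weight, which is the trade-off the tuning parameter controls.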
Halpin (2022) chose the scaling factors $s_i$ based on a (flawed) argument about efficiency in the absence of DIF, which also complicated the choice of the tuning parameter $k$. The argument was flawed because (a) it was based on the unstated assumption that $\Sigma$ is diagonal, which is not true for many IRT estimators, and (b) it did not account for the scaling factors $1/s_i$ that appear outside the influence function in the estimating equation (see the second definition of $\hat\theta$ in the previous section).
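To address these issues, the robust DIF estimator has been updated to use $s_i = \sqrt{\operatorname{var}(Y_i)}$ with tuning parameter $k$ based on the asymptotic CI rationale outlined above. This approach ignores sampling variation in $\hat\theta$ when downweighting items with DIF. A more accurate downweighting procedure could instead be based on $\operatorname{var}(Y_i - \hat\theta)$. Since $Y_i$ and $\hat\theta$ are positively correlated, $\operatorname{var}(Y_i - \hat\theta) = \operatorname{var}(Y_i) + \operatorname{var}(\hat\theta) - 2\operatorname{cov}(Y_i, \hat\theta)$ generally differs from $\operatorname{var}(Y_i)$. Thus, the flagging procedure based on $\operatorname{var}(Y_i)$ is somewhat anti-conservative. To address this issue, it is recommended to compute item-by-item tests of DIF using a standard Wald test following estimation of $\theta$:
$$z_i = \frac{Y_i - \hat\theta}{\sqrt{\operatorname{var}(Y_i - \hat\theta)}}.$$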
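A minimal Python sketch of the item-level Wald test, assuming $\operatorname{var}(Y_i - \hat\theta)$ is computed from its asymptotic null distribution, $(\mathbf{e}_i - \mathbf{v})^\top \Sigma (\mathbf{e}_i - \mathbf{v})$, where $\mathbf{v}$ holds the precision weights $v_i \propto 1/\operatorname{var}(Y_i)$. The function names, the covariance matrix, and the value of $\hat\theta$ below are hypothetical.

```python
import math
from statistics import NormalDist


def precision_weights(sigma):
    """v_i = (1 / Sigma_ii) / sum_j (1 / Sigma_jj)."""
    inv = [1.0 / sigma[i][i] for i in range(len(sigma))]
    total = sum(inv)
    return [x / total for x in inv]


def residual_variance(sigma, i):
    """(e_i - v)^T Sigma (e_i - v): asymptotic null variance of Y_i - theta_hat."""
    v = precision_weights(sigma)
    d = [(1.0 if j == i else 0.0) - v[j] for j in range(len(v))]
    return sum(d[a] * sigma[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))


def wald_test(y, sigma, theta_hat, i):
    """Wald statistic z_i = (Y_i - theta_hat) / sd(Y_i - theta_hat) and two-sided p-value."""
    z = (y[i] - theta_hat) / math.sqrt(residual_variance(sigma, i))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


sigma = [[0.01 if a == b else 0.0 for b in range(4)] for a in range(4)]  # hypothetical Sigma
y = [0.00, 0.02, -0.01, 0.50]                                            # hypothetical Y_i
for i in range(len(y)):
    print(i, wald_test(y, sigma, theta_hat=0.0, i=i))
```

Passing a non-diagonal `sigma` requires no code changes, because the quadratic form already includes the off-diagonal terms.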
Note that modifying the estimator to instead use $s_i = \sqrt{\operatorname{var}(Y_i - \hat\theta)}$ is possible (see Halpin), but this complicates the derivation of its asymptotic distribution. In practice, these complications do not appear to be worth the trouble, as there is little change in finite sample performance when using the simpler approach outlined above.
The Asymptotic Distribution of $\hat\theta$
Halpin (2022) obtained the asymptotic distribution of $\hat\theta$ using the Delta method and the implicit function theorem. The derivation is recapitulated here for general (i.e., non-diagonal) $\Sigma$, and the results presented in Halpin (2022) are seen to follow from the assumption that $\Sigma$ is diagonal.
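The estimator $\hat\theta$ is implicitly defined as the solution to the estimating equation
$$\Psi(\theta, \mathbf{Y}) = \sum_{i=1}^m \frac{1}{s_i}\,\psi\!\left(\frac{Y_i - \theta}{s_i}\right) = 0.$$
Let the asymptotic distribution of $\mathbf{Y} = (Y_1, \dots, Y_m)^\top$ be denoted as
$$\mathbf{Y} \;\dot\sim\; N(\boldsymbol{\mu}, \Sigma),$$
where $\dot\sim$ means “approximately distributed as” for large $N_0$ and $N_1$, with the null hypothesis of no DIF leading to $\mu_i = \theta$ for all $i$, i.e., $\boldsymbol{\mu} = \theta\mathbf{1}$. Also let $\theta^*$ be defined as any solution to the population estimating equation $\mathbb{E}[\Psi(\theta, \mathbf{Y})] = 0$. There may be multiple local solutions when using a redescending loss function, and the asymptotic results described here apply to any local solution. In practice, local minima can be diagnosed by plotting $\sum_i \rho(u_i)$ over a grid of $\theta$ values.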
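A minimal Python sketch of this grid diagnostic, assuming the bisquare loss (scaled so that $c = 1$) with $k = 1.96$ and hypothetical data in which the items form two equally sized clusters. In that situation the objective $\sum_i \rho(u_i)$ has a local minimum near each cluster, which is exactly what the plot is meant to reveal. All names are hypothetical.

```python
def bisquare_rho(u, k):
    """Tukey bisquare loss scaled so that rho(u) = 1 whenever |u| >= k."""
    return 1.0 if abs(u) >= k else 1.0 - (1.0 - (u / k) ** 2) ** 3


def objective(theta, y, s, k=1.96):
    """sum_i rho((Y_i - theta) / s_i), the loss minimized by theta_hat."""
    return sum(bisquare_rho((yi - theta) / si, k) for yi, si in zip(y, s))


def grid_local_minima(y, s, lo, hi, n=2001, k=1.96):
    """Grid points where the objective is strictly below both neighbours."""
    grid = [lo + (hi - lo) * j / (n - 1) for j in range(n)]
    vals = [objective(t, y, s, k) for t in grid]
    return [grid[j] for j in range(1, n - 1)
            if vals[j] < vals[j - 1] and vals[j] < vals[j + 1]]


y = [0.00, 0.01, -0.01, 0.50, 0.51, 0.49]  # hypothetical: two clusters of items
s = [0.05] * 6                             # hypothetical s_i
print(grid_local_minima(y, s, lo=-0.2, hi=0.7))
```

Strict inequalities are used so that the flat regions where every $|u_i| \geq k$ (and the objective equals $m$) are not reported as minima.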
The following assumptions are used:
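- A1: $\psi$ is continuously differentiable.
- A2: $\psi$ is odd (i.e., $\psi(-u) = -\psi(u)$).
- A3: $\psi'(0) \neq 0$.
- A4: $\sum_{i=1}^m \psi'(u_i^*)/s_i^2 \neq 0$, where $u_i^* = (\mu_i - \theta^*)/s_i$.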
A1 allows the Delta method to be applied to $\hat\theta$. A2 ensures that $\theta$ is a solution to $\mathbb{E}[\Psi(\theta, \mathbf{Y})] = 0$ under the null hypothesis. A3 and continuity (A1) imply that the population estimating equation is monotone in a neighborhood of $\theta$ and hence that $\theta$ is a locally unique solution. A4 is required by the implicit function theorem. Under the null hypothesis, A3 implies A4, but in general the two assumptions are distinct.
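Applying the Delta method gives (using A1)
$$\hat\theta \;\dot\sim\; N\!\left(\theta^*, \nabla g^\top \Sigma\, \nabla g\right),$$
where $\hat\theta = g(\mathbf{Y})$ expresses the estimator as an implicit function of $\mathbf{Y}$, and the gradient $\nabla g = (\partial g/\partial Y_1, \dots, \partial g/\partial Y_m)^\top$ is evaluated at $(\theta^*, \boldsymbol{\mu})$.
The gradient of $g$ is obtained from the implicit function theorem (using A4):
$$\frac{\partial g}{\partial Y_i} = -\frac{\partial \Psi / \partial Y_i}{\partial \Psi / \partial \theta}.$$
Evaluating the partial derivatives gives
$$\frac{\partial \Psi}{\partial Y_i} = \frac{1}{s_i^2}\,\psi'(u_i)$$
and
$$\frac{\partial \Psi}{\partial \theta} = -\sum_{j=1}^m \frac{1}{s_j^2}\,\psi'(u_j).$$
Therefore the gradient has elements
$$\frac{\partial g}{\partial Y_i} = \frac{\psi'(u_i^*)/s_i^2}{\sum_{j=1}^m \psi'(u_j^*)/s_j^2}, \qquad u_i^* = \frac{\mu_i - \theta^*}{s_i}.$$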
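The gradient formula can be checked numerically. The Python sketch below assumes the bisquare with $k = 1.96$ and hypothetical values of $Y_i$ and $s_i$. It solves the estimating equation by IRLS, perturbs each $Y_i$, and compares the finite-difference derivative of $\hat\theta$ with the implicit-function-theorem expression evaluated at the observed $\mathbf{Y}$ (standing in for $(\theta^*, \boldsymbol{\mu})$). All names are hypothetical.

```python
import statistics

K = 1.96  # assumed bisquare tuning constant

# psi(u) = u * (1 - (u/K)^2)^2 for |u| < K and 0 otherwise: the bisquare
# influence function without the constant 6/K^2, which cancels in the IRLS
# weights and in the ratio that defines the gradient.


def weight(u):
    """w(u) = psi(u) / u, with w(0) = psi'(0) = 1."""
    return (1.0 - (u / K) ** 2) ** 2 if abs(u) < K else 0.0


def dpsi(u):
    """Derivative psi'(u) of the influence function above."""
    return (1.0 - (u / K) ** 2) * (1.0 - 5.0 * (u / K) ** 2) if abs(u) < K else 0.0


def solve_theta(y, s, tol=1e-14, max_iter=1000):
    """Solve sum_i psi((Y_i - theta)/s_i)/s_i = 0 by IRLS from a median start."""
    theta = statistics.median(y)
    for _ in range(max_iter):
        w = [weight((yi - theta) / si) for yi, si in zip(y, s)]
        num = sum(wi * yi / si ** 2 for wi, yi, si in zip(w, y, s))
        new = num / sum(wi / si ** 2 for wi, si in zip(w, s))
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


def ift_gradient(y, s, theta):
    """Implicit-function-theorem gradient: (psi'(u_i)/s_i^2) / sum_j psi'(u_j)/s_j^2."""
    d = [dpsi((yi - theta) / si) / si ** 2 for yi, si in zip(y, s)]
    total = sum(d)
    return [di / total for di in d]


def fd_gradient(y, s, h=1e-6):
    """Central finite differences of theta_hat with respect to each Y_i."""
    grad = []
    for i in range(len(y)):
        up, down = list(y), list(y)
        up[i] += h
        down[i] -= h
        grad.append((solve_theta(up, s) - solve_theta(down, s)) / (2.0 * h))
    return grad


y = [0.00, 0.05, -0.03, 0.02, 0.60]  # hypothetical Y_i; the last item has large DIF
s = [0.05, 0.06, 0.04, 0.05, 0.07]   # hypothetical s_i
theta_hat = solve_theta(y, s)
print(ift_gradient(y, s, theta_hat))
print(fd_gradient(y, s))
```

The item with large DIF has $\psi'(u_i) = 0$, so its gradient element is zero: small perturbations of a fully downweighted $Y_i$ do not move $\hat\theta$.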
The foregoing results provide the general (i.e., non-null) distribution of .
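Next we consider the null distribution. First we show that $\theta^* = \theta$ and then derive $\operatorname{var}(\hat\theta)$. The null hypothesis is that $\boldsymbol{\mu} = \theta\mathbf{1}$. This implies that the standardized residuals $u_i = (Y_i - \theta)/s_i$ are symmetrically distributed about zero. Combined with the assumption that $\psi$ is odd (A2), this gives $\mathbb{E}[\Psi(\theta, \mathbf{Y})] = 0$ by the following argument:
$$\mathbb{E}[\Psi(\theta, \mathbf{Y})] = \sum_{i=1}^m \frac{1}{s_i}\,\mathbb{E}[\psi(u_i)] = \sum_{i=1}^m \frac{1}{s_i}\,\mathbb{E}[\psi(-u_i)] = -\sum_{i=1}^m \frac{1}{s_i}\,\mathbb{E}[\psi(u_i)].$$
The second equality follows from the symmetry of $u_i$ about zero and the third from A2. The chain of equalities shows that $\mathbb{E}[\Psi(\theta, \mathbf{Y})]$ equals its own negative, so $\mathbb{E}[\Psi(\theta, \mathbf{Y})] = 0$ and $\theta^* = \theta$. Together A1 and A3 ensure that this is a locally unique solution.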
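To obtain the null variance, we evaluate the gradient at $\theta^* = \theta$ and $\boldsymbol{\mu} = \theta\mathbf{1}$, where $u_i^* = 0$ for all $i$:
$$\frac{\partial g}{\partial Y_i} = \frac{\psi'(0)/s_i^2}{\sum_{j=1}^m \psi'(0)/s_j^2} = \frac{1/s_i^2}{\sum_{j=1}^m 1/s_j^2}.$$
The second equality follows from A3, which implies that $\psi'(0)$ is a non-zero constant that factors out of the numerator and denominator.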
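Finally, using $s_i^2 = \operatorname{var}(Y_i)$, the gradient becomes a vector of precision weights. Letting $\mathbf{v} = (v_1, \dots, v_m)^\top$, with $v_i = \dfrac{1/\operatorname{var}(Y_i)}{\sum_{j=1}^m 1/\operatorname{var}(Y_j)}$, denote the vector of precision weights, we can write the asymptotic null distribution of $\hat\theta$ as
$$\hat\theta \;\dot\sim\; N\!\left(\theta, \mathbf{v}^\top \Sigma\, \mathbf{v}\right).$$
Under the additional assumption that $\Sigma$ is diagonal, the resulting expression for the null variance of $\hat\theta$ is
$$\operatorname{var}(\hat\theta) = \sum_{i=1}^m v_i^2 \operatorname{var}(Y_i) = \frac{1}{\sum_{i=1}^m 1/\operatorname{var}(Y_i)}.$$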
This is the result given in part (a) of Theorem 1 in Halpin (2022).
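The effect of dropping the diagonal assumption can be seen numerically. This minimal Python sketch, using a hypothetical $3 \times 3$ covariance matrix, computes the null variance $\mathbf{v}^\top \Sigma \mathbf{v}$ and compares it with the diagonal formula $1/\sum_i 1/\operatorname{var}(Y_i)$; the two agree only when the off-diagonal elements of $\Sigma$ are zero. The function names are hypothetical.

```python
def precision_weights(sigma):
    """v_i = (1 / Sigma_ii) / sum_j (1 / Sigma_jj)."""
    inv = [1.0 / sigma[i][i] for i in range(len(sigma))]
    total = sum(inv)
    return [x / total for x in inv]


def null_var_theta(sigma):
    """Asymptotic null variance v^T Sigma v for general (non-diagonal) Sigma."""
    v = precision_weights(sigma)
    m = len(v)
    return sum(v[a] * sigma[a][b] * v[b] for a in range(m) for b in range(m))


def null_var_theta_diagonal(sigma):
    """Diagonal-Sigma special case: 1 / sum_i (1 / var(Y_i))."""
    return 1.0 / sum(1.0 / sigma[i][i] for i in range(len(sigma)))


diag = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.02]]      # hypothetical
corr = [[0.04, 0.005, 0.0], [0.005, 0.01, 0.0], [0.0, 0.0, 0.02]]  # adds one covariance
print(null_var_theta(diag), null_var_theta_diagonal(diag))  # agree (up to rounding)
print(null_var_theta(corr), null_var_theta_diagonal(corr))  # differ
```

With the positive covariance between the first two items, the diagonal formula understates the null variance of $\hat\theta$.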
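A similar argument gives the asymptotic null distribution of $Y_i - \hat\theta$ as
$$Y_i - \hat\theta \;\dot\sim\; N\!\left(0, (\mathbf{e}_i - \mathbf{v})^\top \Sigma\, (\mathbf{e}_i - \mathbf{v})\right),$$
where $\mathbf{e}_i$ is the $i$-th column of the $m \times m$ identity matrix. Under the additional assumption that $\Sigma$ is diagonal, the resulting expression for the variance is
$$\operatorname{var}(Y_i - \hat\theta) = \operatorname{var}(Y_i) - \frac{1}{\sum_{j=1}^m 1/\operatorname{var}(Y_j)}.$$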
This is the result given in part (b) of Theorem 1 in Halpin (2022).
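The diagonal simplification follows from expanding the quadratic form. Writing $\sigma_i^2 = \operatorname{var}(Y_i)$ and $V = \left(\sum_j \sigma_j^{-2}\right)^{-1}$, so that $v_i = V/\sigma_i^2$, a short derivation for diagonal $\Sigma$ is:

$$
\begin{aligned}
(\mathbf{e}_i - \mathbf{v})^\top \Sigma\, (\mathbf{e}_i - \mathbf{v})
  &= \sigma_i^2 - 2 v_i \sigma_i^2 + \sum_{j=1}^m v_j^2 \sigma_j^2 \\
  &= \sigma_i^2 - 2V + V^2 \sum_{j=1}^m \sigma_j^{-2} \\
  &= \sigma_i^2 - V.
\end{aligned}
$$

Equivalently, $\operatorname{cov}(Y_i, \hat\theta) = v_i \sigma_i^2 = V = \operatorname{var}(\hat\theta)$, so under these assumptions $\operatorname{var}(Y_i - \hat\theta) = \operatorname{var}(Y_i) - \operatorname{var}(\hat\theta)$.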
Halpin (2022) used the same overall approach to compare two estimates of $\theta$: the unweighted mean of the $Y_i$ and the robust estimator outlined above.
Implementation via IRLS
The robust DIF estimator can be computed using IRLS based on the weighted mean definition given above. The IRLS algorithm is as follows:
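1. Initialize $\hat\theta^{(0)}$ (e.g., with the median of the $Y_i$) and set the iteration counter $t = 0$.
2. Compute the standardized residuals $u_i^{(t)} = (Y_i - \hat\theta^{(t)})/s_i$.
3. Compute the weights $w_i^{(t)} = w(u_i^{(t)})$ with $w(u) = \psi(u)/u$.
4. Update $\hat\theta$ with $\hat\theta^{(t+1)} = \dfrac{\sum_{i=1}^m w_i^{(t)} Y_i / s_i^2}{\sum_{j=1}^m w_j^{(t)} / s_j^2}$.
5. If $|\hat\theta^{(t+1)} - \hat\theta^{(t)}| < \epsilon$ for a small tolerance $\epsilon$, stop; else set $t = t + 1$ and return to step 2.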
Once the algorithm has converged, set $\hat\theta = \hat\theta^{(t+1)}$. Item-level DIF tests can be conducted using the Wald test statistic given above or the multi-parameter variant given in Halpin (2022).
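The algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the robustDIF implementation: it assumes the bisquare weight (dropping the constant $6/k^2$, which cancels in the normalized weights), $k = 1.96$, a median start, and a fixed tolerance. `irls_theta` and the example data are hypothetical.

```python
import statistics


def bisquare_weight(u, k):
    """Bisquare weight up to a constant factor; zero once |u| >= k."""
    return (1.0 - (u / k) ** 2) ** 2 if abs(u) < k else 0.0


def irls_theta(y, s, k=1.96, tol=1e-10, max_iter=100):
    """Return (theta_hat, weights) from IRLS; items with weight 0 are flagged."""
    theta = statistics.median(y)                           # step 1: high-breakdown start
    for _ in range(max_iter):
        u = [(yi - theta) / si for yi, si in zip(y, s)]    # step 2: standardized residuals
        w = [bisquare_weight(ui, k) for ui in u]           # step 3: weights
        den = sum(wi / si ** 2 for wi, si in zip(w, s))
        if den == 0.0:
            raise ValueError("every item received zero weight")
        new = sum(wi * yi / si ** 2 for wi, yi, si in zip(w, y, s)) / den  # step 4
        if abs(new - theta) < tol:                         # step 5: convergence check
            return new, w
        theta = new
    raise RuntimeError("IRLS did not converge")


y = [0.01, -0.02, 0.03, 0.00, 0.80]  # hypothetical Y_i; the last item has DIF
s = [0.05] * 5                       # hypothetical s_i
theta_hat, w = irls_theta(y, s)
print(theta_hat, [i for i, wi in enumerate(w) if wi == 0.0])
```

Raising an error when every weight is zero is a design choice for this sketch; it signals that the starting value or $k$ is inappropriate rather than silently dividing by zero.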
References
Halpin, P. F. (2022). Differential Item Functioning Via Robust Scaling. arXiv preprint. https://arxiv.org/abs/2207.04598. Published in Psychometrika in 2024 under the same title.
Halpin, P. F. (2024). Differential Test Functioning Via Robust Scaling. arXiv preprint. https://arxiv.org/abs/2409.03502.
Halpin, P. F., & Gilbert, J. (2025). Testing Whether Reported Treatment Effects Are Unduly Influenced by Item-Level Heterogeneity. PsyArXiv preprint. https://doi.org/10.31234/osf.io/9ru45_v1