Introduction

Nonprobability survey samples, whose members do not have known probabilities of sample inclusion, are everywhere and have considerable potential for bias (see Baker et al., 2013, and the references therein). It has become popular to attempt to remove the bias of an estimate derived from a nonprobability sample by first combining that sample, denoted here by \(S_{0}\), with a probability sample \(S_{1}\) that covers the same population \(U\) but shares no members with \(S_{0}\). After that, one estimates the probability \(\gamma_{k}\) that a population unit \(k\) in the blended sample \(S=S_{0} \cup S_{1}\) was originally from the nonprobability sample as a function of a vector of covariates \(\mathbf{z}_{k}\) available for members of both samples (when used here, "sample" always refers to a respondent sample).

Valliant and Dever (2011) suggest that the inverse of this estimated probability (which they, following much of the literature, call a "propensity") can be used as a quasi-probability sampling weight either directly, \(w_{k}=1 / \widehat{\gamma}_{k}\), or indirectly after some form of poststratification. For example, Lee (2006) proposes sorting the blended sample by \(\widehat{\gamma}_{k}\) values, then breaking the sample into cells of nearly equal size, and finally assigning the weight \(w_{k}=\widehat{N}_{1 c} / n_{0 c}\), where \(\widehat{N}_{1 c}\) is the population size of the cell \(c\) containing nonprobability-sampled unit \(k\) as estimated from the probability sample, and \(n_{0 c}\) is the number of members of \(S_{0}\) in \(c\).
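To make Lee's cell-based weighting concrete, here is a minimal sketch in Python. It is our own illustration, not code from Lee (2006); the function name and array conventions are ours, and we assume the propensities \(\widehat{\gamma}_{k}\) have already been estimated on the blended sample.

```python
import numpy as np

def lee_cell_weights(gamma_hat, from_nonprob, d1, n_cells=5):
    """Propensity-cell weights in the spirit of Lee (2006): a sketch.

    gamma_hat    : estimated Pr(k in S0 | k in S) for each blended-sample unit
    from_nonprob : boolean array, True when the unit came from S0
    d1           : probability-sample design weights (0 for the S0 units)
    Returns w_k = N-hat_1c / n_0c for the S0 units and NaN for the S1 units.
    """
    # break the blended sample into n_cells cells of nearly equal size
    # using the interior quantiles of the sorted gamma-hat values
    edges = np.quantile(gamma_hat, np.linspace(0.0, 1.0, n_cells + 1)[1:-1])
    cells = np.searchsorted(edges, gamma_hat, side="right")
    w = np.full(gamma_hat.shape, np.nan)
    for c in range(n_cells):
        in_c = cells == c
        N1c = d1[in_c & ~from_nonprob].sum()   # estimated cell population size
        n0c = (in_c & from_nonprob).sum()      # number of S0 members in the cell
        w[in_c & from_nonprob] = N1c / n0c
    return w
```

The default of five cells echoes the common practice of quintile grouping in propensity stratification.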

Although estimating \(\gamma_{k}=\operatorname{Pr}\left(k \in S_{0} \mid k \in S ; \mathbf{z}_{k}\right)\) by fitting a logistic regression on the unweighted blended sample is often treated as an estimate for \(\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)\), Robbins and colleagues (2021) argue that a better estimate for the quasi-probability that \(k\) is in the nonprobability sample when \(S_{0} \cap S_{1}=\varnothing\) is

\[ p_{0 k}=\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)=\pi_{k} \widehat{\gamma}_{k} /\left(1-\widehat{\gamma}_{k}\right) \]

where \(\pi_{k}\) is the probability that \(k\) is chosen for the probability sample (which can include an adjustment for unit nonresponse when needed). It is assumed that \(\pi_{k}\) is known even for members of the population that are not in \(S_{1}\), in particular for the members of \(S_{0}\).

To see how \(p_{0 k}=\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)\) is derived, start with

\[ \pi_{k}=\operatorname{Pr}\left(k \in S_{1} \mid k \in S ; \mathbf{z}_{k}\right) \times \operatorname{Pr}\left(k \in S \mid \mathbf{z}_{k}\right) \text {, and } \]

\(\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)=\operatorname{Pr}\left(k \in S_{0} \mid k \in S ; \mathbf{z}_{k}\right) \times \operatorname{Pr}\left(k \in S \mid \mathbf{z}_{k}\right)\),

then solve for \(\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)\): dividing the second equation by the first and noting that \(\operatorname{Pr}\left(k \in S_{1} \mid k \in S ; \mathbf{z}_{k}\right)=1-\gamma_{k}\) while \(\operatorname{Pr}\left(k \in S_{0} \mid k \in S ; \mathbf{z}_{k}\right)=\gamma_{k}\) yields \(p_{0 k}=\pi_{k} \gamma_{k} /\left(1-\gamma_{k}\right)\). Elliott and Valliant (2017) make a similar point, suggesting that a more sophisticated method could be used to estimate \(\gamma_{k}\). Robbins and colleagues (2021) offer methods for weighting the blended sample, but for now, we assume there are survey items of interest collected in the nonprobability sample but not in the probability sample, so that quasi-probability weights for those items are needed only for the members of \(S_{0}\).
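As an illustration of this quasi-weighting, the following minimal sketch (ours, not code from Robbins et al., 2021) fits the unweighted logistic regression on the blended sample and converts each \(\widehat{\gamma}_{k}\) into \(p_{0 k}\) and a quasi-probability weight for the members of \(S_{0}\); the names and array conventions are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def quasi_weights(Z, in_s0, pi):
    """Weights w_k = 1/p_0k with p_0k = pi_k * gamma-hat_k / (1 - gamma-hat_k).

    Z     : (n, p) covariate matrix z_k for the blended sample S = S0 U S1
    in_s0 : (n,) array of 1s (unit came from S0) and 0s (unit came from S1)
    pi    : (n,) known probabilities of selection into the probability sample
    """
    # gamma-hat_k = Pr(k in S0 | k in S; z_k), fit on the unweighted blended
    # sample; penalty=None requests an unregularized fit (scikit-learn >= 1.2)
    gamma = LogisticRegression(penalty=None).fit(Z, in_s0).predict_proba(Z)[:, 1]
    p0 = pi * gamma / (1.0 - gamma)        # quasi-probability of S0 membership
    return 1.0 / p0[in_s0 == 1]            # weights needed only for S0 members
```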

There are two critical assumptions underlying the use of \(p_{0 k}\). One is that the probability and nonprobability samples have no members in common. This can be assured by removing any member of \(S_{1}\) from the nonprobability sample. The other is that \(\operatorname{Pr}\left(k \in S_{0} \mid k \in S ; \mathbf{z}_{k}\right)\) can be modeled, whether by a logistic function (as in Robbins et al., 2021) or some other functional form (as suggested by Elliott & Valliant, 2017). We believe that it is more reasonable to assume that \(\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)\) itself can be modeled by a logistic (or some other) function, whether or not \(S_{0} \cap S_{1}=\varnothing\).

We first describe in general terms how that assumption can be used to generate quasi-probability weights for a nonprobability sample given either population totals for the components of \(\mathbf{z}_{k}\) or their probability-sample-estimated analogues. In our setup, when the population total is known for a component of \(\mathbf{z}_{k}\), that component needs to be collected only on the nonprobability sample, not on the probability sample.

Using the WTADJX procedure in SUDAAN 11, we then show how to estimate population means for variables collected from the nonprobability sample alone and for variables of interest collected from a blended sample comprising a nonprobability and a probability sample drawn from the same population, when there are auxiliary variables collected on both samples that can be used to reduce or remove the biases of the estimates for the variables of interest.

This methodology was applied to a stratified simple random probability sample blended with a nonprobability sample drawn from the same population. The methodology can be used to assess the potential for bias in estimates based on the blended sample.

Solving a Calibration Equation

The model

\[ \operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)=\left[1+\exp \left(\mathbf{z}_{k}^{T} \mathbf{g}\right)\right]^{-1} \tag{1} \]

is a selection model. If correctly specified, it models the probability that \(k \in U\) is included in the nonprobability sample \(S_{0}\) (which can involve self-selection and response) as a logistic function of the vector \(\mathbf{z}_{k}\) with unknown parameter vector \(\mathbf{g}\). Kott (2019) points out that this selection model can often be estimated by solving a calibration equation when each component of the population total \(\mathbf{T}_{\mathbf{z}}=\sum_{k \in U} \mathbf{z}_{k}\) is either known or consistently estimated from a probability sample, which itself can have been weighted to account for unit nonresponse (here, a "consistent estimate" computed from a probability sample converges to the population parameter it estimates as the probability-sample and population sizes grow arbitrarily large).

A calibration equation that can be used to estimate \(\mathbf{g}\) in equation (1) is

\[ \begin{equation*} \sum_{k \in S_{0}}\left[1+\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\right] \mathbf{z}_{k}=\widehat{\mathbf{T}}_{\mathbf{z}} \tag{2} \end{equation*} \]

where each component of \(\widehat{\mathbf{T}}_{\mathbf{z}}\) is either assumed to be a known population quantity or a consistent estimate from a probability sample. The solution for \(\widehat{\mathbf{g}}\) in equation (2), when it exists, can usually be found using Newton's method, as Deville and colleagues (1993) show. That approach has been programmed into the SUDAAN 11® routines WTADJUST and WTADJX (RTI International, 2012), into the R routines 'calib' and 'gencalib' in the 'sampling' package (Tillé & Matei, 2023), and elsewhere. We will describe how to use SUDAAN's WTADJX for our purposes later. Other software packages could be used in a similar manner.

When a solution to equation (2) exists (and we will assume here that it does), \(\widehat{\mathbf{g}}\) is a consistent estimator for \(\mathbf{g}\) under mild conditions we assume to hold. The quasi-probability weight for \(k \in S_{0}\) is then

\[ \begin{equation*} w_{k}=\left[1+\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\right] \tag{3} \end{equation*} \]
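Because equation (2) sets a \(p\)-vector of weighted sample totals equal to \(p\) targets, it can be solved directly by Newton's method. Here is a minimal sketch of such a solver (ours, not SUDAAN's or the 'sampling' package's implementation), assuming a solution exists:

```python
import numpy as np

def solve_calibration(Z, T_z, max_iter=50, tol=1e-10):
    """Solve the calibration equation (2) for g-hat by Newton's method.

    Z   : (n0, p) matrix of covariates z_k for the nonprobability sample S0
    T_z : (p,) vector of known or probability-sample-estimated totals T-hat_z
    Returns g-hat and the quasi-probability weights of equation (3).
    """
    g = np.zeros(Z.shape[1])                      # start at g = 0 (weights of 2)
    for _ in range(max_iter):
        expo = np.exp(Z @ g)
        F = Z.T @ (1.0 + expo) - T_z              # calibration residual
        if np.max(np.abs(F)) <= tol * np.max(np.abs(T_z)):
            break
        J = Z.T @ (Z * expo[:, None])             # Jacobian: sum exp(z'g) z z'
        g -= np.linalg.solve(J, F)                # Newton step
    return g, 1.0 + np.exp(Z @ g)                 # weights w_k of equation (3)
```

Production routines add safeguards (step control, bound checks) that this sketch omits.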

The theory supporting this use of an assumed selection model like that in equation (1), together with a calibration equation like (2) to estimate the selection model's parameters consistently and an equation like (3) to turn those estimates into quasi-probability weights (asymptotically equal to the inverses of the sample members' probabilities of inclusion in the nonprobability sample), is analogous to the quasi-random theory supporting the use and calibration-equation fitting of a selection model for the response/nonresponse mechanism in a probability sample. See, for example, Kott and Liao (2012).

In the nonresponse-adjustment setting of a probability sample, \(S_{1} \subset S_{1}^{*}\), the selection (response) model \(\operatorname{Pr}\left(k \in S_{1} \mid \mathbf{z}_{k} ; S_{1}^{*}\right)=\left[1+\exp \left(\mathbf{z}_{k}^{T} \mathbf{g}\right)\right]^{-1}\), where \(S_{1}^{*}\) is the probability sample before unit nonresponse, replaces equation (1), and \(w_{k}=\pi_{k}^{-1}\left[1+\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\right]\) replaces equation (3), where \(\pi_{k}\) is the probability that \(k\) has been chosen for \(S_{1}^{*}\).

The assumption that every member of the population has a probability of selection into the nonprobability sample equal to \(\operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{k}\right)=\left[1+\exp \left(\mathbf{z}_{k}^{T} \mathbf{g}\right)\right]^{-1}\) or any other monotonic differentiable function is a strong one.

An alternative justification for using equations (1) through (3) to create weights for estimating a total like \(Y=\sum_{k \in U} y_{k}\) from a nonprobability sample of \(y\)-values follows. Suppose each \(y_{k}\), \(k \in U\), behaves like a random variable with mean \(\mathbf{z}_{k}^{T} \boldsymbol{\beta}\) for some unknown parameter \(\boldsymbol{\beta}\), whether or not \(k\) is in the nonprobability sample. That is, selection is ignorable in expectation with respect to this prediction model, so called because the model predicts a value for \(y_{k}\) (Royall, 1970). Given \(\widehat{\mathbf{T}}_{\mathbf{z}}\) and assuming equation (2) has a solution, if either this prediction model or the selection model holds among the members of the population, estimating \(Y\) with \(\widehat{Y}=\sum_{k \in S_{0}} w_{k} y_{k}\) will be nearly unbiased in some sense (technically, \(\widehat{Y}\) is a predictor, not an estimator, for the random variable \(Y\) under the prediction model). See Kott and Liao (2012) for a proof of this assertion. A similarly "doubly robust" approach can be found in Chen and colleagues (2020).

When one or more components of \(\widehat{\mathbf{T}}_{\mathbf{z}}\) are consistently estimated from a probability sample, the near unbiasedness of \(\widehat{Y}\) requires combining probability-sampling inference (for \(\widehat{\mathbf{T}}_{\mathbf{z}}\)) with either the selection model or the prediction model (for \(\widehat{Y} \mid \widehat{\mathbf{T}}_{\mathbf{z}}\)).

Nevertheless, we call the former the selection-model framework and the latter the prediction-model framework.

Observe that \(\sum_{k \in S_{0}} w_{k} y_{k} / \sum_{k \in S_{0}} w_{k}\) is a nearly unbiased predictor for the population mean \(\bar{y}=\sum_{k \in U} y_{k} / \sum_{k \in U} 1\) when each \(y_{k}\), \(k \in U\), behaves like a random variable with mean \(\mathbf{z}_{k}^{T} \boldsymbol{\beta}\), and 1 is either a component of \(\mathbf{z}_{k}\) or a linear combination of the components of \(\mathbf{z}_{k}\).

As mentioned previously, when selected members of a probability sample \(S_{1}\) have design weights \(\left\{d_{k}\right\}\) before unit nonresponse, where \(d_{k}=\pi_{k}^{-1}\), we can weight the unit respondents with

\[ \begin{equation*} w_{k}=d_{k}\left[1+\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\right] \tag{4} \end{equation*} \]

when response is a logistic function of \(\mathbf{z}_{k}\) (which need not be the same as the vector in equation [1]), and \(\widehat{\mathbf{g}}\) (which likewise need not be the vector in equation [1]) satisfies the calibration equation \(\sum_{k \in S_{1}} d_{k}\left[1+\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\right] \mathbf{z}_{k}=\widehat{\mathbf{T}}_{\mathbf{z}}\).

Calibration weighting was originally proposed to reduce the standard error of an estimated total in the absence of nonresponse. It works when \(y_{k}\) can be approximated by a linear function of the components of \(\mathbf{z}_{k}\), and the weight-adjustment function within the square brackets of equation (4) is replaced by \(\exp \left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)\), where \(\widehat{\mathbf{g}}\) converges to \(\mathbf{0}\), and consequently the \(w_{k}\) converge to \(d_{k}\), as the probability sample grows arbitrarily large.

Both weight adjustments are special cases of the following more general weight-adjustment function:

\[ \begin{equation*} \alpha(\theta)=\frac{L+\exp (\theta)}{1+\exp (\theta) / U} \tag{5} \end{equation*} \]

where \([L, U]\) is the range of \(\alpha(\theta)\) and \(0 \leq L<U \leq \infty\). Software packages that do calibration weighting via what has been called "the logit transformation" in equation (5) allow the user to set \(L\) and \(U\). Some packages (like 'calib' in Tillé & Matei, 2023) are designed only for probability samples with full response and restrict \(L\) to a value less than 1. When used to adjust for unit nonresponse or nonprobability selection, however, the range of the implicitly estimated probability of unit response or selection is \([1 / U, 1 / L]\). Consequently, it is sensible to set \(L\) at 1 or greater.
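A direct transcription of equation (5) follows; this sketch is ours (WTADJUST and WTADJX have their own parameterizations):

```python
import numpy as np

def alpha(theta, L=1.0, U=np.inf):
    """The weight-adjustment ("logit") function of equation (5); range [L, U].

    With L=1 and U=inf this reduces to 1 + exp(theta), the adjustment of
    equations (2) and (3); with L=0 and U=inf it is exp(theta), the classical
    calibration adjustment discussed around equation (4).
    """
    e = np.exp(theta)
    return (L + e) / (1.0 + e / U)    # e/inf evaluates to 0, so U=inf works
```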

Calibration Weighting a Blended Sample

Suppose we have a probability sample and a nonprobability sample, both chosen from the same population. We denote them by \(S_{1}\) and \(S_{0}\), respectively. At first, suppose both collect a variable \(y_{k}\) with the intention of estimating its population mean. The probability sample is a stratified multistage sample, which may suffer from some unit nonresponse. If it is used by itself, a vector \(\mathbf{z}_{1 k}\) of variables with known population totals, including the constant 1, can be employed to generate calibration weights for the probability sample, rendering estimates of the population mean computed with those weights nearly unbiased both with respect to the selection model (probability sampling is a type of selection model) and with respect to the linear prediction model \(\mathrm{E}\left(y_{k}\right)=\mathbf{z}_{1 k}^{T} \boldsymbol{\beta}_{1}\). If there is any unit nonresponse, the selection model assumes that the probability of unit response is correctly specified by the inverse of a weight-adjustment function of the components of \(\mathbf{z}_{1 k}\), while the linear prediction model assumes unit nonresponse is ignorable in expectation.

Similarly, a vector \(\mathbf{z}_{0 k}\) of variables with known population totals, including the constant 1, can be used to generate calibration weights for the nonprobability sample, rendering estimates of the population mean computed with those weights nearly unbiased both with respect to the selection model, when the probability of selection into the nonprobability sample is correctly specified by the inverse of a weight-adjustment function of the components of \(\mathbf{z}_{0 k}\), and with respect to the linear prediction model \(\mathrm{E}\left(y_{k}\right)=\mathbf{z}_{0 k}^{T} \boldsymbol{\beta}_{0}\), assuming selection into \(S_{0}\) is ignorable in expectation.

Many of the components of \(\mathbf{z}_{1 k}\) and \(\mathbf{z}_{0 k}\) may coincide. We do not require that the two samples be disjoint, but they must be selected independently.

The WTADJUST procedure in SUDAAN can be used to create weights and estimate the population means as described previously, as long as the weight-adjustment function in equation (5) is used for both samples. WTADJUST allows \(L\) and \(U\) to differ across the members of a sample. Here, one can set values \(L_{1}\) and \(U_{1}\) for every member of \(S_{1}\) and values \(L_{0}\) and \(U_{0}\) for every member of \(S_{0}\). When \(U_{f}\), \(f=0\) or 1, is unspecified, it is treated as virtually infinite \(\left(10^{20}\right)\), and a finite centering parameter \(C_{f}\) needs to be added to WTADJUST for the members of \(S_{f}\), say \(\max \left\{1,2 L_{f}\right\}\), but the choice (as long as it is finite) has no impact on the results.

WTADJUST will also estimate standard errors that are nearly unbiased under the selection-model framework. Moreover, any linear combination of the two estimates is also a nearly unbiased estimator of the population mean and has a standard error that can be estimated (under the selection-model framework) using WTADJUST.

To this end, let \(S\) be the union of \(S_{1}\) and \(S_{0}\). A sample member may be in \(S\) twice, with each such member treated as two separate members of the blended sample \(S\). We treat the \(H\) design strata of \(S_{1}\) as the first \(H\) design strata of the blended sample \(S\) and the entire nonprobability sample as design stratum \(H+1\). For any \(k\) in \(S\), let \(\mathbf{z}_{k}^{T}=\left(\mathbf{z}_{1 k}^{T} \; \mathbf{z}_{0 k}^{T}\right)\), where all the components of \(\mathbf{z}_{1 k}\) are 0 when \(k\) is in \(S_{0}\) and all the components of \(\mathbf{z}_{0 k}\) are 0 when \(k\) is in \(S_{1}\). The \(L\) and \(U\) parameters are the same for each member of \(S_{1}\), and they are the same for each member of \(S_{0}\), but the former and latter pairs may differ.

Consider the following calibration equation, which can be used to create quasi-probability weights for any positive \(\lambda\) :

\[ \begin{gather*} \sum_{k \in S} d_{k} \alpha_{(k)}\left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right) \mathbf{z}_{k}=\binom{\lambda \sum_{k \in S_{1}} \pi_{k}^{-1} \alpha_{1}\left(\mathbf{z}_{1 k}^{T} \widehat{\mathbf{g}}_{1}\right) \mathbf{z}_{1 k}}{\sum_{k \in S_{0}} \alpha_{0}\left(\mathbf{z}_{0 k}^{T} \widehat{\mathbf{g}}_{0}\right) \mathbf{z}_{0 k}}=\binom{\lambda \mathbf{T}_{\mathbf{z}_{1}}}{\mathbf{T}_{\mathbf{z}_{0}}}=\mathbf{T}_{\mathbf{z}}(\lambda), \tag{6} \end{gather*} \]

where \(\alpha_{(k)}\left(\mathbf{z}_{k}^{T} \widehat{\mathbf{g}}\right)=\alpha_{f}\left(\mathbf{z}_{f k}^{T} \widehat{\mathbf{g}}_{f}\right)\) for \(k \in S_{f}\), \(\alpha_{f}\left(\mathbf{z}_{f k}^{T} \widehat{\mathbf{g}}_{f}\right)\) is the weight-adjustment function for \(S_{f}\) (\(f=0\) or 1), \(\widehat{\mathbf{g}}^{T}=\left(\widehat{\mathbf{g}}_{1}^{T} \; \widehat{\mathbf{g}}_{0}^{T}\right)\), and

\[ \begin{equation*} d_{k}=\delta_{k} \lambda \pi_{k}^{-1}+\left(1-\delta_{k}\right), \tag{7} \end{equation*} \]

\(\delta_{k}=1\) when \(k\) was originally from \(S_{1}\) and 0 otherwise (we labeled the weight on the left-hand side of this equation \(d_{k}\) for convenience, although it depends on the choice of \(\lambda\); a more formal label would be \(d_k^{(\lambda)}\)). Observe that the relative contribution of the probability sample when estimating \(\bar{y}\) is \(\lambda /(1+\lambda)\). WTADJUST estimates both \(\bar{y}\) with the weights implied by equation (6) and the standard error of that estimate.

WTADJUST has one glaring limitation, however. It cannot be used to estimate standard errors when the probability of selection into the nonprobability sample includes variables with unknown population totals that need to be estimated by the probability sample. For that, one needs WTADJX (or something like it; Chen et al., 2020, discuss another approach).

For our purposes, the equation for the quasi-weights in \(S\) using WTADJX is

\[ \begin{equation*} w_{k}=d_{k} \frac{L_{k}+\exp \left(\mathbf{x}_{k}^{T} \widehat{\mathbf{g}}\right)}{1+\exp \left(\mathbf{x}_{k}^{T} \widehat{\mathbf{g}}\right) / U_{k}}, \tag{8} \end{equation*} \]

where \(L_{k}=L_{1} \delta_{k}+L_{0}\left(1-\delta_{k}\right)\) and \(U_{k}=U_{1} \delta_{k}+\) \(U_{0}\left(1-\delta_{k}\right)\), and the model variable \(\mathbf{x}_{k}\) is a vector with the same number of components as the vector of values on which we are calibrating, such as the \(\mathbf{z}_{k}\) in equation (1). (When \(\mathbf{x}_{k}=\mathbf{z}_{k}\), WTADJUST can be used in place of WTADJX.)

Let \(\mathbf{q}_{k}\) denote a vector of additional variables included in the nonprobability sample’s selection model,

\[ \operatorname{Pr}\left(k \in S_{0} \mid \mathbf{z}_{0 k}, \mathbf{q}_{k}\right)=\frac{1+\exp \left(\mathbf{z}_{0 k}^{T} \mathbf{g}_{0}+\mathbf{q}_{k}^{T} \mathbf{g}_{\mathbf{q}}\right) / U_{0}}{L_{0}+\exp \left(\mathbf{z}_{0 k}^{T} \mathbf{g}_{0}+\mathbf{q}_{k}^{T} \mathbf{g}_{\mathbf{q}}\right)}, \]

but with unknown population totals that need to be estimated by the probability sample. With

\[ \begin{align*} \mathbf{x}_{k}^{T} & =\left(\mathbf{z}_{1 k}^{T} \;\; \mathbf{z}_{0 k}^{T} \;\; \left[1-\delta_{k}\right] \mathbf{q}_{k}^{T}\right), \text { and } \\ \mathbf{z}_{k}^{T} & =\left(\mathbf{z}_{1 k}^{T} \;\; \mathbf{z}_{0 k}^{T} \;\; \left\{1-\delta_{k}[1+1 / \lambda]\right\} \mathbf{q}_{k}^{T}\right), \tag{9} \end{align*} \]

WTADJX can be used to estimate the population mean \(\bar{y}\) by finding the \(\widehat{\mathbf{g}}\) satisfying:

\[ \begin{equation*} \sum_{k \in S} d_{k} \frac{L_{k}+\exp \left(\mathbf{x}_{k}^{T} \widehat{\mathbf{g}}\right)}{1+\exp \left(\mathbf{x}_{k}^{T} \widehat{\mathbf{g}}\right) / U_{k}} \mathbf{z}_{k}=\left(\begin{array}{c} \lambda \sum_{k \in U} \mathbf{z}_{1 k} \\ \sum_{k \in U} \mathbf{z}_{0 k} \\ \mathbf{0} \end{array}\right), \tag{10} \end{equation*} \]

where \(\mathbf{0}\) has as many components as \(\mathbf{q}_{k}\) (so that \(\sum_{k \in S_{0}} w_{k} \mathbf{q}_{k}=\lambda^{-1} \sum_{k \in S_{1}} w_{k} \mathbf{q}_{k}\), the probability-sample estimate of the population total of \(\mathbf{q}_{k}\)). To estimate the population total \(Y=\sum_{k \in U} y_{k}\), the quasi-weights in equation (8) need to be divided by \(1+\lambda\).
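The following sketch (ours, not the WTADJX implementation) assembles the pieces of equations (7) through (10) and solves the generalized calibration equation by Newton's method; the function names and array conventions are ours.

```python
import numpy as np

def alpha(theta, L, U):
    e = np.exp(theta)
    return (L + e) / (1.0 + e / U)                 # adjustment function, eq. (5)

def alpha_prime(theta, L, U):
    e = np.exp(theta)
    return e * (1.0 - L / U) / (1.0 + e / U) ** 2  # derivative of eq. (5)

def solve_blended(X, Zc, d, T, L, U, max_iter=100, tol=1e-10):
    """Newton solver for the generalized calibration equation (10): a sketch.

    X  : (n, p) model vectors x_k of equation (9)
    Zc : (n, p) calibration vectors z_k of equation (9)
    d  : (n,) starting weights d_k of equation (7)
    T  : (p,) right-hand side of equation (10)
    L, U : (n,) per-unit bounds L_k and U_k
    Returns the quasi-weights w_k of equation (8) and g-hat.
    """
    g = np.zeros(X.shape[1])
    for _ in range(max_iter):
        theta = X @ g
        w = d * alpha(theta, L, U)                 # candidate weights, eq. (8)
        F = Zc.T @ w - T                           # calibration residual
        if np.max(np.abs(F)) <= tol * (1.0 + np.max(np.abs(T))):
            return w, g
        J = Zc.T @ (X * (d * alpha_prime(theta, L, U))[:, None])
        g -= np.linalg.solve(J, F)
    raise RuntimeError("calibration equation (10) did not converge")
```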

One can vary the choice of \(\lambda\) to find the value that minimizes the standard error of the estimated mean. Recall that every choice of \(\lambda\) results in nearly unbiased estimation. Moreover, observe that given how the \(d_{k}\) in equation (7) and the \(\mathbf{z}_{k}\) in equation (9) are defined, the choice of \(\lambda\) has no impact on the \(\widehat{\mathbf{g}}\) satisfying equation (10).

After choosing \(\lambda\), the components of the vector on the right-hand side of equation (10) are known before sampling. Because of this, WTADJX can estimate the standard error of an estimated total or mean (under the selection-model framework), assuming the indicators of selection into the nonprobability sample are independent of each other. When there is unit nonresponse in the probability sample, an analogous assumption is made about the unit-response indicators.

Ignoring the finite population correction (as we will), the key to nearly unbiased variance estimation via linearization is the near (i.e., asymptotic) equality of \(\sum_{S} w_{k} e_{k}=\sum_{S} d_{k} \alpha\left(\mathbf{x}_{k}^{T} \widehat{\mathbf{g}}\right) e_{k}\) and \(\sum_{S} d_{k} \alpha\left(\mathbf{x}_{k}^{T} \mathbf{g}\right) e_{k}\), where

\[ \begin{equation*} e_{k}=y_{k}-\mathbf{z}_{k}^{T}\left[\sum_{S} d_{j} \alpha^{\prime}\left(\mathbf{x}_{j}^{T} \widehat{\mathbf{g}}\right) \mathbf{x}_{j} \mathbf{z}_{j}^{T}\right]^{-1} \sum_{S} d_{j} \alpha^{\prime}\left(\mathbf{x}_{j}^{T} \widehat{\mathbf{g}}\right) \mathbf{x}_{j} y_{j}. \tag{11} \end{equation*} \]

The inclusion of the \(\alpha^{\prime}\left(\mathbf{x}_{j}^{T} \widehat{\mathbf{g}}\right)\) terms in \(e_{k}\) allows us to avoid directly accounting for \(\widehat{\mathbf{g}}\) itself being an estimate in large-sample variance estimation. For more theoretical details on variance estimation for a calibrated estimator when \(\mathbf{x}_{k} \neq \mathbf{z}_{k}\), see Kott and Liao (2015).
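Under the same array conventions as the sketch following equation (10), and reusing its alpha_prime(), the residuals of equation (11) can be computed as follows; this is our illustration, not SUDAAN code.

```python
import numpy as np

def linearization_residuals(y, X, Zc, d, g, L, U):
    """Residuals e_k of equation (11) for linearization variance estimation."""
    a1 = d * alpha_prime(X @ g, L, U)             # d_j * alpha'(x_j' g-hat)
    M = X.T @ (Zc * a1[:, None])                  # sum_j d_j a'_j x_j z_j^T
    v = X.T @ (a1 * y)                            # sum_j d_j a'_j x_j y_j
    return y - Zc @ np.linalg.solve(M, v)         # e_k = y_k - z_k^T M^{-1} v
```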

When there are many variables for which one needs to estimate a population mean from the blended sample, the optimal \(\lambda\) will likely vary across the variables. Consequently, a compromise will be needed if one desires a single \(\lambda\) to be used for all variables.

Estimation from the Nonprobability Sample

Often \(y\)-values are collected from the nonprobability sample but not from the probability sample. We can treat whether (or not) an element of the blended sample was originally a member of the nonprobability sample as a class variable in WTADJX and then estimate \(\bar{y}\) and the standard error of that estimate with WTADJX. The estimates of \(\bar{y}\) based on the full blended sample and on the class defined by the probability sample will be missing, while the estimate for the class defined by the nonprobability sample will be a nearly unbiased estimate of \(\bar{y}\). A nearly unbiased estimate of its standard error will accompany it. The selection of \(\lambda\) in equation (6) does not matter; setting \(\lambda=1\) for simplicity is a straightforward choice.

An Example

Benoit-Bryan and Mulrow (2021) describe a simulated population of \(N=113{,}549\) individuals created from the Culture and Community in a Time of Crisis survey of behavior and attitudes before and during the Covid-19 crisis. Ten thousand stratified simple random probability samples of 1,000 persons each and 10,000 vaguely described nonprobability samples of 4,000 persons each were drawn from the population. There are no missing values in the data set as it now stands (in Mulrow, 2022), which allows for pure analyses of blending methodologies.

To demonstrate our methodology, we focus on a single probability sample and a single nonprobability sample. Our goal is to estimate population means for 14 survey variables of interest (chosen by Benoit-Bryan and Mulrow for a competition) using information from 32 not-of-interest (NOI) survey variables (chosen by us) as well as variables whose population means are known. The last group includes 9 region indicators, 3 levels of urbanization, a Hispanicity indicator, 6 race categories, 4 age categories, and 7 education-level categories.

Most of the survey variables of interest and NOI variables were yes/no (1/0). Two of the survey variables of interest were originally on a five-point Likert scale; each was replaced by four indicator variables, one for each level above the first (see below). Thus, we had for analytical purposes 20 variables of interest whose proportions of 1s we were trying to estimate.

The survey variables are described and given variable names (e.g., q7_22) below:

Variables of Interest

q7_22 Attended classical music in 2019

q10_1 Missed experiencing artwork, performances

q11_1 Offered online exhibitions or galleries

q25_11 Will see a play or musical when able in short term

q1_15 Participated in a live interactive event in past 30 days

q6_9 Want more fun in life

q11_4 Offered online materials or activities for kids

q10_3 Miss celebrating cultural heritage

q7_14 Attended community festival in 2019

q6_1 Want more hope in life

q1_6 Watch movie or tv series in past 30 days

q25_13 Will take art, music, or dance class when able in short term

The two five-level original variables of interest (and their replacements):

q17 During Covid, how important are arts and cultural organizations

q18 Before Covid, how important were arts and cultural organizations

_2 Slightly (e.g., q17 = 2 becomes q17_2 = 1)

_3 Moderately

_4 Important

_5 Very

NOI Variables

In 2019, did you attend or participate in . . .

q7_1 Art museum

q7_2 Children’s museum

q7_3 Art gallery/fair

q7_4 Botanical garden

q7_5 Zoo or aquarium

q7_6 Science or technology museum

q7_7 Natural history museum

q7_8 Public park

q7_9 Architectural tour

q7_10 Public/street art

q7_11 Film festival

q7_12 Music festival

q7_13 Performing arts festival

q7_15 Craft or design fair

q7_16 Read books/literature

q7_17 Food and drink experience

q7_18 Nonmusical play

q7_19 Musical

q7_20 Variety or comedy show

q7_21 Popular music

q7_23 Jazz music

q7_24 Opera

q7_25 World music

q7_26 Contemporary dance

q7_27 Ballet

q7_28 Regional dance

q7_29 Historic attraction/museum

q7_30 Television program

q7_31 Movies/film

q7_32 Library

q7_33 Cultural center

q7_34 Video games/online gaming

There were 18 strata in the stratified simple random probability sample. We assigned each member of that sample to Domain 1 and the members of the nonprobability sample to Domain 2 and to Stratum 19. Implicitly assuming \(\lambda=1\), both domains were at first calibrated separately to 30 population variables defined by region, urbanization, race, Hispanicity, age, and education level using WTADJUST with the weight-adjustment-function parameters in equation (5) set at \(L=0\) and \(U=10^{20}\) (virtually infinity).

An attempt at setting an initial weight of 1 with \(L=1\) for the nonprobability sample failed for technical reasons related to how the SUDAAN program runs rather than to the underlying theory. We instead set the initial weight for each member of the nonprobability sample at 25 (slightly less than \(N / n_{0}=113,549 / 4,000=28.38725\)) and \(L\) at \(1 / 25=0.04\). That is mathematically equivalent to assuming the probability of inclusion is a logistic function, because \(1+\exp \left(\mathbf{x}_{j}^{T} \mathbf{g}\right)=25\left[1 / 25+\exp \left(\mathbf{x}_{j}^{T} \mathbf{g}^{*}\right)\right]\) when the coefficients of \(\mathbf{g}\) and \(\mathbf{g}^{*}\), or their equivalents, differ only in their intercepts.

Table 1 displays differences in the estimated means (the proportions of 1s expressed as percentages) for the variables of interest computed from the probability and nonprobability samples separately, using WTADJUST for both estimates (the code is in the Appendix). These differences are sorted by ascending \(p\)-value.

Table 1. Differences in estimated domain means of variables of interest (in percentage points) using WTADJUST
Variable Difference t-value p-value
q7_22 -5.87071 -3.09817 0.00196
q17_5 -5.67015 -2.77586 0.00553
q17_4 5.19282 2.75408 0.00591
q6_9 5.42981 2.65408 0.00798
q1_15 -4.96485 -2.44432 0.01455
q10_1 -4.51152 -2.40899 0.01603
q18_5 -4.76654 -2.38859 0.01695
q18_3 2.11118 1.71476 0.08645
q18_4 2.19730 1.18603 0.23567
q25_13 -1.26753 -1.09874 0.27194
q1_6 1.21429 1.07647 0.28177
q11_1 -2.11664 -1.05061 0.29349
q18_2 0.49995 0.97542 0.32940
q17_2 0.61545 0.78579 0.43203
q10_3 -0.67250 -0.78239 0.43402
q25_11 -1.15565 -0.61314 0.53981
q11_4 -0.70367 -0.36134 0.71786
q6_1 0.63118 0.31783 0.75063
q17_3 -0.31811 -0.22483 0.82212
q7_14 -0.03445 -0.01706 0.98639

Because there are so many differences being measured (20), it is advisable to use a Bonferroni correction when assessing whether estimated differences are significant. This correction divides the significance level for the variables of interest by 20. If the difference with the smallest \(p\)-value among the variables of interest remains significant at some level, then the hypothesis of no difference between probability-sample and nonprobability-sample estimates for the variables of interest fails. The Bonferroni correction is known to be conservative, so it may be prudent to assess significance at the .1 level rather than at the conventional .05 level. With this in mind, the hypothesis of no differences between the estimates fails among the variables of interest \((.00196<.1 / 20=.005)\). It would fail at the Bonferroni-corrected .05 level as well.
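For instance, the Bonferroni check applied above amounts to comparing the smallest \(p\)-value in Table 1 with the divided significance level:

```python
# Bonferroni check on the smallest of the 20 p-values in Table 1
p_min, m = 0.00196, 20
print(p_min < 0.10 / m)   # True: reject "no difference" at the .1 level
print(p_min < 0.05 / m)   # True: it also fails at the .05 level
```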

Table 2 is analogous to Table 1, displaying the differences among the estimates for the NOI variables with the 10 smallest \(p\)-values (the code is in the Appendix). We took the four NOI variables with significant differences at the Bonferroni-corrected .1 level (\(p\)-value \(<.1 / 32\)), added them to the selection model for the nonprobability sample, and recomputed the estimates for the nonprobability sample using WTADJX. The revised differences between the probability-sample and nonprobability-sample estimates for the variables of interest are displayed in Table 3. The \(t\)- and \(p\)-values of these differences were computed, without loss of generality, setting \(\lambda=1\).

Table 2. The 10 most significant differences in estimated domain means of NOI variables using WTADJUST (those with the smallest p-values)
Variable Difference t-value p-value
Q7_28 -3.15237 -3.58028 0.00035
Q7_4 -6.97418 -3.45283 0.00056
Q7_5 -6.05481 -3.22138 0.00128
Q7_32 -6.15135 -3.07216 0.00214
Q7_24 -4.34578 -2.67472 0.00750
Q7_1 -4.42137 -2.51749 0.01185
Q7_27 -3.69502 -2.33730 0.01946
Q7_11 -3.42707 -2.27782 0.02278
Q7_13 -4.50724 -2.20741 0.02733
Q7_6 -3.90890 -2.09830 0.03593
Table 3. Revised differences in estimated domain means of variables of interest after using WTADJX
Variable Difference t-value p-value
q6_9 5.64438 2.75222 0.00594
q7_22 -4.92017 -2.59090 0.00960
q17_4 4.82479 2.54737 0.01088
q10_1 -4.12273 -2.18439 0.02898
q17_5 -4.40463 -2.14937 0.03165
q1_15 -4.23211 -2.09599 0.03613
q18_5 -3.68356 -1.85614 0.06349
q1_6 1.56700 1.37592 0.16891
q7_14 2.56980 1.31451 0.18874
q18_3 1.53855 1.23678 0.21623
q18_4 1.85522 1.00026 0.31723
q25_11 -1.67931 -0.88706 0.37509
q17_3 -1.04803 -0.72978 0.46556
q18_2 0.36094 0.69342 0.48808
q17_2 0.46723 0.59242 0.55360
q11_4 1.09904 0.57548 0.56499
q10_3 -0.44952 -0.52322 0.60084
q25_13 -0.60311 -0.52262 0.60126
q6_1 0.89665 0.45045 0.65240
q11_1 -0.78112 -0.38946 0.69695

Using a Bonferroni correction, the null hypothesis of no difference between the probability-sample estimates and the revised nonprobability-sample estimates among the variables of interest no longer fails at either the .1 level \((.00594>.1 / 20=.005)\) or the .05 level. This gives us some confidence that the revised nonprobability-sample estimates for the variables of interest, unlike the original ones, are nearly unbiased. That confidence is mitigated somewhat by the observation that 6 of the 20 differences in Table 3 have \(p\)-values less than .05. Even though the differences are not independent, we would have hoped to see only one difference with a \(p\)-value less than or equal to .05.

Some Concluding Remarks

In practice, users will not be able to compare their estimates with full-population statistics as could have been done here. Nevertheless, users will be able to assess whether calibrated estimates from their nonprobability sample differ significantly from analogous probability-sample estimates, as Tables 1 and 3 show. This remains true even if the method used here for choosing which NOI variables needed to be included in the calibration (beginning with a pared list of potential NOI variables without any interactions, followed by the use of Bonferroni-adjusted \(p\)-values to reduce those variables further) proves not to be generally successful.

One thing that we did not consider is variance estimation under the linear-prediction-model framework (which is dubious for binary variables like those in our example). When there are no estimated variable totals from the probability sample used in the calibration equation for the nonprobability sample, and one assumes the errors in the linear models for the probability and nonprobability samples are independent across sample members, the near independence of the residuals in equation (11) suggests the variance estimator developed in the text is nearly unbiased under the prediction-model framework as long as the probability and nonprobability samples are distinct. Distinctness is an unnecessary assumption under a selection-model framework. When it fails, a delete-a-group jackknife variance estimator (Kott, 2001) may be used, with any repeated sample member keeping the same group membership for both its appearances. When it does not fail, however, replacing \(\alpha\left(\mathbf{x}_{j}^{T} \widehat{\mathbf{g}}\right)\) with its derivative \(\alpha^{\prime}\left(\mathbf{x}_{j}^{T} \widehat{\mathbf{g}}\right)\) in equation (11) is unnecessary because \(\widehat{\mathbf{g}}\) is no longer an estimator for a selection-model parameter. Although the selection-model-based WTADJUST and WTADJX make this replacement when estimating variances, to our knowledge, analogous routines in R, such as those found in 'survey' (Lumley, 2023), do not.

We can show that the variance estimator remains nearly unbiased under the prediction-model framework when there are estimated variable totals from the probability sample used in the calibration equation for the nonprobability sample, although the revised prediction model assumes \(\mathrm{E}\left(y_{k}\right)\) is a linear function of the components of \(\mathbf{x}_{k}\) rather than \(\mathbf{z}_{k}\), as in Kott and Chang (2010) (and the expected value of each component of \(\mathbf{z}_{k}\) is a linear function of the components of \(\mathbf{x}_{k}\)). The proof of this assertion is beyond the scope of this endeavor.

RTI Press Associate Editor: Cynthia Bland