Measurement invariance is an important research topic for assessment in cross-population (e.g., cross-cultural) contexts (Cheung & Rensvold, 2002; Kline, 2015). The issue is crucial because, without measurement invariance, scores cannot be meaningfully compared across groups. The key question of measurement invariance rests on three sequential hypotheses, or invariance levels: (1) do multiple populations endorse the same measurement pattern for the latent constructs; (2) do multiple populations endorse the same psychological meanings for the latent constructs; and (3) do multiple populations endorse the same levels of the latent constructs (Kline, 2015; Vandenberg & Lance, 2000)? The hypotheses have different implications for measurement research, and each serves as the foundation for the next. They can be examined systematically by using multigroup confirmatory factor analysis (CFA). The purpose of the current article is to introduce the theoretical implications of measurement invariance and the corresponding analytic strategies, focusing on the three invariance conditions. One example is used to illustrate the theoretical and analytic points.
Theoretical Implication of Measurement Invariance
Measurement invariance addresses the key question of whether the measurement of latent constructs varies across multiple groups, with configural invariance, metric invariance, and scalar invariance as the most common conditions. The most common test theory used in evaluating measurement invariance is Classical Test Theory (CTT), which posits that observations of the latent construct (X) are composed of true scores of the latent construct (T) and random measurement error (E). Mathematically, the relation among these three measurement components can be written simply as X = T + E (Lord, Novick, & Birnbaum, 1968; Vandenberg & Lance, 2000). The measurement model of latent constructs in CTT therefore implies three important pieces of measurement information: (1) certain observations indicate latent constructs; (2) these observations indicate latent constructs in a systematic manner; and (3) latent constructs exhibit certain levels based on sample observations. Measurement invariance can accordingly be examined on three sequential levels: configural invariance, metric invariance, and scalar invariance (Kline, 2015; Vandenberg & Lance, 2000).
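The decomposition X = T + E can be made concrete with a small simulation. The sketch below is only an illustration under assumed values: the function name simulate_ctt, the normal distributions, and the standard deviations (1.0 for true scores, 0.5 for errors) are hypothetical choices, not quantities taken from the article. It shows how the share of observed-score variance attributable to true scores can be computed when T and E are independent.

```python
import random
import statistics


def simulate_ctt(n=10_000, true_sd=1.0, error_sd=0.5, seed=0):
    """Simulate observed scores X = T + E with independent T and E.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_ctt()

# Share of observed-score variance that is due to true scores.
true_share = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(f"share of observed variance due to true scores: {true_share:.2f}")
```

With these assumed standard deviations the population value of the share is 1 / (1 + 0.25) = 0.80, so the simulated value should land close to that figure.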
Configural invariance refers to the condition that the model of latent constructs being indicated by certain observations holds across multiple groups (Abrams et al., 2013; Vandenberg & Lance, 2000). When configural invariance is supported, the same latent construct can be indicated by the same manifest observations across groups (Vandenberg & Lance, 2000). However, it does not imply that the relation of latent constructs with manifest observations is equivalent across groups. Therefore, the next step is to examine the more constrained metric invariance.
Metric invariance refers to the condition that the relation of latent constructs with observed indicators holds across multiple groups (Abrams et al., 2013; Vandenberg & Lance, 2000). When metric invariance is supported, the same latent construct is represented by the same manifest observations in an equivalent manner across groups. In other words, the psychological meanings of the measured latent constructs are equivalent across groups (Vandenberg & Lance, 2000). This step warrants subsequent examination of whether the levels of latent constructs are invariant across groups (Abrams et al., 2013; Vandenberg & Lance, 2000). Without metric invariance, it is senseless to compare the means of latent constructs because they indicate psychologically different constructs.
Scalar invariance refers to the condition that the level of the compared latent construct holds across multiple groups (Abrams et al., 2013; Vandenberg & Lance, 2000). When scalar invariance is supported, it would be suggested that different groups could exhibit the same mean level of the same latent construct (Vandenberg & Lance, 2000). Otherwise, differential mean levels of the same latent construct across groups would be implied.
Measurement invariance serves as an important tool for addressing the cross-cultural validity issues raised by the multicultural movement in counseling psychology (Arnett, 2008; Sue & Sue, 1990). It has been widely acknowledged that psychological knowledge garnered from one cultural group cannot automatically be generalized to other cultural groups, as cultures differ appreciably in how individuals are expected to perceive and react to the world (Arnett, 2008; Sue & Sue, 1990). This critique particularly applies to counseling psychology, as the field often investigates culturally sensitive phenomena, such as mental health concerns. Therefore, a careful examination of cross-cultural measurement invariance is necessary to help counseling psychologists understand how psychological constructs are perceived across cultures.
Analytic Strategies of Measurement Invariance
One of the most common strategies to examine measurement invariance is multigroup confirmatory factor analysis (Kline, 2015). While confirmatory factor analysis examines whether the hypothesized measurement model fits the data well, multigroup confirmatory factor analysis can be used to compare the measurement model across groups. The progressive analytic strategy in multigroup CFA involves three steps, corresponding to the three invariance conditions of configural, metric, and scalar invariance (Kline, 2015; Vandenberg & Lance, 2000).
To examine configural invariance, the same measurement model can be examined separately for each group by using CFA. The fit of the models can be evaluated using the common criteria recommended by Hu and Bentler (1999): robust chi-square, Comparative Fit Index (CFI > .90), Root Mean Square Error of Approximation (RMSEA < .08), and Standardized Root Mean Square Residual (SRMR < .08). If the same measurement model fits the data well across groups, then configural invariance is supported.
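The configural step amounts to checking the same cut-offs in every group. A minimal sketch of that check follows, assuming the cut-offs stated above (CFI above .90, RMSEA and SRMR below .08). The names FitIndices, meets_cutoffs, and configural_supported are hypothetical helpers, and the two sets of fit values are invented for illustration rather than taken from any study.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    cfi: float
    rmsea: float
    srmr: float


def meets_cutoffs(fit, cfi_min=0.90, rmsea_max=0.08, srmr_max=0.08):
    """Return a per-index verdict against the stated cut-offs."""
    return {
        "CFI": fit.cfi > cfi_min,
        "RMSEA": fit.rmsea < rmsea_max,
        "SRMR": fit.srmr < srmr_max,
    }


def configural_supported(fits_by_group):
    """Configural invariance is supported only if every group passes every index."""
    return all(all(meets_cutoffs(fit).values()) for fit in fits_by_group.values())


# Hypothetical fit values for two groups (illustration only).
example_fits = {
    "group_a": FitIndices(cfi=0.95, rmsea=0.05, srmr=0.04),
    "group_b": FitIndices(cfi=0.85, rmsea=0.09, srmr=0.10),
}

for group, fit in example_fits.items():
    print(group, meets_cutoffs(fit))
print("configural invariance supported:", configural_supported(example_fits))
```

Returning a verdict per index, rather than a single flag, keeps it visible which criterion a poorly fitting group fails.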
The second step of invariance examination is to examine metric invariance by comparing two nested models: a baseline model and an invariance model. The baseline model allows the factor loadings to be freely estimated across multiple groups. The invariance model constrains the factor loadings to be equivalent across multiple groups. Differences between the two nested models are examined with the chi-square difference test (Muthén & Muthén, 2012) and the ΔCFI (Cheung & Rensvold, 2002; Meade, Johnson, & Braddy, 2008). A nonsignificant result of the chi-square difference test would indicate that the invariance model is a better representation of the data because it fits the data equivalently relative to the baseline model but is more parsimonious (Muthén & Muthén, 2012). In contrast, a significant result of the chi-square difference test would indicate that the baseline model is a better representation of the data, suggesting that the psychological meanings of the latent constructs vary across groups.
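When robust (scaled) chi-square values are used, the difference between two nested models is not obtained by simple subtraction. One commonly documented correction is the Satorra-Bentler scaled difference, which combines each model's scaled chi-square, scaling correction factor, and degrees of freedom. The sketch below assumes that this is the correction in use, which the text does not state explicitly. The function name and all input numbers are hypothetical, since no scaling correction factors are reported here.

```python
def scaled_chi2_difference(t_nested, c_nested, df_nested,
                           t_comparison, c_comparison, df_comparison):
    """Satorra-Bentler scaled chi-square difference for two nested models.

    t_*  : scaled (robust) chi-square values
    c_*  : scaling correction factors
    df_* : degrees of freedom
    The nested model is the more constrained (invariance) model; the
    comparison model is the less constrained (baseline) model.
    """
    df_diff = df_nested - df_comparison
    if df_diff <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (df_nested * c_nested - df_comparison * c_comparison) / df_diff
    if cd <= 0:
        raise ValueError("non-positive scaling correction; the test is not defined")
    # Scaled difference statistic, referred to a chi-square with df_diff df.
    trd = (t_nested * c_nested - t_comparison * c_comparison) / cd
    return trd, df_diff


# Hypothetical values for illustration only.
trd, df_diff = scaled_chi2_difference(
    t_nested=120.0, c_nested=1.20, df_nested=60,
    t_comparison=100.0, c_comparison=1.25, df_comparison=55,
)
print(f"scaled difference = {trd:.2f} on {df_diff} df")
```

The resulting statistic is then compared against a chi-square distribution with df_diff degrees of freedom, in the same way as an ordinary chi-square difference.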
However, the chi-square difference test has been shown to be highly sensitive to sample size and less sensitive to a lack of invariance than ΔCFI (Cheung & Rensvold, 2002; Meade et al., 2008). Simulation studies comparing multiple goodness-of-fit indices (e.g., chi-square, AIC, RMSEA, and CFI) have recommended ΔCFI because it is independent of model complexity and sample size, with a ΔCFI less than .01 indicating invariance (Cheung & Rensvold, 2002; Meade et al., 2008). Meade et al. (2008) have suggested that if ΔCFI indicates invariance and the sample size is greater than 200, any differences between groups are probably trivial and further analyses can proceed, even when the chi-square difference test is significant.
It is noteworthy that full metric invariance, where all latent constructs have the same psychological meanings across groups, could be rare in applied settings. In this case, partial metric invariance, where several of the latent constructs exhibit the same psychological meanings across groups, could warrant further examination of scalar invariance on those latent constructs.
The final step of invariance examination is to examine scalar invariance by comparing the means of the latent constructs. Essentially, this step can be conducted by using a nested-model comparison strategy similar to the one introduced in the previous step. In this case, the baseline model would allow the means to be freely estimated across multiple groups, and the invariance model would constrain the means to be equivalent across multiple groups. However, statistical programs can differ in their default specifications of the mean structure. We use Mplus as an example, as it is one of the most commonly used statistical packages for structural equation modeling. In Mplus, the latent means of the reference group are fixed at zero by default and the latent means of the other groups are freely estimated (Muthén & Muthén, 1998-2010). A significant mean in a compared group would therefore indicate that this group has a different level of the latent construct relative to the reference group. A practical suggestion is to always check the default settings of the statistical program being used by reading its manual or help documentation.
One Case Illustration
Measurement invariance provides critical information as to how the latent constructs under investigation are manifested cross-culturally, and it therefore fits closely with the vision and values of multicultural counseling research and practice (Arnett, 2008; Sue & Sue, 1990). The following case example (Xu, Hou, Tracey, & Zhang, 2016) illustrates how to apply and examine measurement invariance step by step in one research topic of counseling psychology. The data and method are drawn from the original study, and thus only a synopsis is provided. The analytic decision making and theoretical implications of each step are detailed. While the original study examined measurement invariance of all the factors, the current article reanalyzes the data and focuses primarily on one factor for the purpose of demonstration.
There has been an emerging proposition in vocational psychology that ambiguity is an inevitable component of the career decision-making process and that the ability to handle this ambiguity is therefore critical to career decision outcomes (Xu & Tracey, 2014, 2015a, 2015b). Xu and Tracey (2015b) proposed and demonstrated that ambiguity tolerance specific to career decision making (i.e., career decision ambiguity tolerance) is an important construct in career decision making, and they revealed a three-factor structure of preference, tolerance, and aversion in U.S. college students. However, the measurement invariance of this construct across cultures has not been established. The current example thus examines the measurement invariance of career decision ambiguity tolerance across Chinese and U.S. college students. It was hypothesized that Chinese students would have a lower mean level of both preference and aversion relative to U.S. students, with no differences on tolerance.
The multigroup CFA was conducted by comparing a Chinese sample and a U.S. sample. The Chinese sample consisted of 356 college students recruited from five universities, all of which are located in urban areas of mainland China. The U.S. sample consisted of 328 undergraduate students recruited from a southwest state university. Detailed demographic information for both groups can be found in Xu et al.’s (2016) original article. The 18-item Career Decision Ambiguity Tolerance Scale (CDAT; Xu & Tracey, 2015b) was used in Xu et al.’s (2016) study to measure people’s ambiguity tolerance in career decision making. It contains three subscales: preference, tolerance, and aversion. The reliability and validity of the CDAT have been reported to be satisfactory (Xu & Tracey, 2015b).
Configural Invariance
The configural invariance of the three-factor measurement model between China and the U.S. was first examined (see Table 1 for summarized results).
Table 1
| Model | χ² | df | CFI | RMSEA estimate | RMSEA 90% C.I. | SRMR |
|---|---|---|---|---|---|---|
| Chinese College (n = 356) | | | | | | |
| 3-factor original model | 462.89 | 132 | .77 | .08 | [.08, .09] | .09 |
| 2-factor final model | 127.32 | 53 | .92 | .06 | [.05, .08] | .06 |
| U.S. College (n = 328) | | | | | | |
| 3-factor original model | 245.70 | 132 | .86 | .07 | [.06, .08] | .08 |
| 2-factor final model | 86.07 | 53 | .97 | .04 | [.03, .06] | .05 |
As can be seen from the values of RMSEA (.08), SRMR (.09), and CFI (.77), the three-factor structure derived from U.S. college students was not a good representation of the CDAT structure in Chinese college students. The tolerance factor in the original structure had poor loadings on its last three items. Therefore, configural invariance was not supported for the full three-factor measurement model. Given this, the tolerance factor was dropped and only the two factors of preference and aversion were carried forward to the examination of metric invariance. We examined the structure of this two-factor model in the two cross-cultural samples. As can be seen from the values of RMSEA (.06 and .04), SRMR (.06 and .05), and CFI (.92 and .97), the two-factor structure fit the data well in both the Chinese sample and the U.S. sample. Thus, the factors of preference and aversion were supported in both Chinese and U.S. college students, which allowed for the subsequent examination of metric and scalar invariance.
Metric Invariance
Although the metric invariance of both CDAT-preference and CDAT-aversion was examined in the original study, for brevity the present case focuses only on CDAT-aversion in the subsequent examinations of metric and scalar invariance between Chinese and U.S. college students. As can be seen from the values of CFI (.93 and .99), RMSEA (.09 and .03), and SRMR (.04 and .03), the measurement model of CDAT-aversion fit the data adequately in both Chinese and U.S. college students. Table 2 summarizes the results of model-data fit for the two nested models examined for metric invariance.
We first specified the baseline model (Model a), where all factor loadings were freely estimated. As can be seen by the values of CFI (.94), RMSEA (.07), and SRMR (.04), Model a fit the data adequately. Based on Model a, we specified the invariant model (Model b), where all factor loadings across Chinese and U.S. college students were constrained to be equal. As can be seen by the values of CFI (.91), RMSEA (.08), and SRMR (.09), Model b fit the data adequately as well.
The corrected chi-square difference test indicated that Model b fit significantly worse than Model a, scaled Δχ²(5, N = 684) = 26.79, p < .05. In addition, the ΔCFI was greater than .01, indicating that the two models differ substantially in practical terms. Therefore, the metric invariance of CDAT-aversion was not supported between Chinese and U.S. college students, suggesting that the factor of aversion could have different psychological meanings for Chinese and U.S. college students.
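The two decision criteria used in this step can be retraced from the reported numbers. The sketch below takes the scaled difference statistic (26.79 on 5 degrees of freedom) and the CFI values of Model a (.94) and Model b (.91) as given, and applies the p < .05 and ΔCFI > .01 rules. The helper chi2_sf is a hypothetical name for a standard-library implementation of the chi-square upper-tail probability for integer degrees of freedom; it is an illustration, not the routine used in the original analysis.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1), where
    h = x / 2, starting from Q(1, h) = exp(-h) for even df or
    Q(1/2, h) = erfc(sqrt(h)) for odd df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Values reported in the text for the metric invariance comparison.
delta_chi2, delta_df = 26.79, 5
cfi_model_a, cfi_model_b = 0.94, 0.91

p_value = chi2_sf(delta_chi2, delta_df)
delta_cfi = cfi_model_a - cfi_model_b

chi2_flags_noninvariance = p_value < 0.05
cfi_flags_noninvariance = delta_cfi > 0.01

print(f"p = {p_value:.5f}, delta CFI = {delta_cfi:.2f}")
print("chi-square test flags non-invariance:", chi2_flags_noninvariance)
print("delta CFI flags non-invariance:", cfi_flags_noninvariance)
```

Keeping the two flags separate mirrors the reasoning in the text, where ΔCFI is consulted alongside the chi-square difference test because the latter is sensitive to sample size.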
Scalar Invariance
As the metric invariance of CDAT-aversion was not supported, comparing the mean of CDAT-aversion across Chinese and U.S. college students became senseless. However, we still proceeded to examine scalar invariance for the purpose of demonstration only.
With Chinese students set as the reference group (i.e., their latent mean fixed at zero), the latent mean of CDAT-aversion for U.S. students was not significantly different from zero (unstandardized estimate = .02, p > .05). The results thus suggested that Chinese and U.S. college students tend to experience an equivalent level of anxiety and avoidance tendency in career decision making. Again, it should be cautioned that such an interpretation relies on metric invariance, which was not supported in this case.
Conclusion
The illustrative study demonstrated that Chinese and U.S. college students perceive ambiguity tolerance differently at both the qualitative and quantitative levels. As the configural results show, Chinese students do not consistently endorse tolerance as an approach to dealing with ambiguity in career decision making, while U.S. students perceive tolerance as an approach independent of preference and aversion. The scalar results additionally suggested that students in both cultural contexts endorse a similar level of aversion to ambiguity in career decision making. The illustrative case thus shows how the psychometric properties of a measure can vary across cultures.
Over the past three decades, counseling psychology as a field has witnessed a growing need to investigate and examine counseling theory in its multicultural and international context (Arnett, 2008; Sue & Sue, 1990). Meanwhile, there has been a noticeable shift in research methodology toward quantitative methods in counseling psychology (e.g., Frazier, Tix, & Barron, 2004; Quintana & Maxwell, 1999). The quantitative method of measurement invariance introduced in this article reflects a promising fusion of the two trends, which makes precise cross-cultural comparison feasible.
Overall, measurement invariance is an important research topic when researchers wish to investigate whether the measurement model of the latent constructs (including pattern and magnitude) holds across groups and whether the mean levels of the latent constructs hold across groups. An equivalent measurement model is a prerequisite for subsequent mean comparison, as it is illogical to compare means when the means indicate different psychological constructs. While the present paper used a cross-cultural study to demonstrate a flexible and powerful statistical examination of measurement invariance, this analytic approach can be applied in any multigroup comparison, such as comparisons across gender, developmental stages, socioeconomic status, and national boundaries.