Salgado JOOP 2014 — The Validity of Ipsative and Quasi-Ipsative FC Personality Inventories
Practitioner points
Personality inventories have been widely used in personnel selection, but it was thought that their
predictive validity was small. We found that they are substantially more valid than was previously
thought.
The traditional opinion among researchers in I/O psychology is that single-stimulus personality
inventories (e.g., normative Likert-type scales) have superior predictive validity to FC personality
questionnaires (e.g., ipsative inventories), but our research findings suggest that this is not true for
quasi-ipsative inventories.
In comparison with ipsative and normative personality inventories, quasi-ipsative personality
inventories showed higher predictive validity regardless of occupational group.
Based on our results, we recommend the use of quasi-ipsative FC personality measures in personnel
selection decisions regardless of the occupational group being recruited for.
Several meta-analyses conducted over the last 20 years have shown that the Five-Factor
Model (FFM) of personality predicts a wide range of performance outcomes, including job
performance, training proficiency, counterproductive behaviours, accidents, job satis-
faction, leadership, and innovative behaviours in the workplace (Barrick & Mount, 1991;
Bartram, 2005; Clarke & Robertson, 2005; Feist, 1998; Hough, 1992; Hülsheger,
*Correspondence should be addressed to Jesus F. Salgado, Facultad de Relaciones Laborales, University of Santiago de
Compostela, Campus Vida, 15782 Santiago de Compostela, Spain (email: jesus.salgado@usc.es).
DOI:10.1111/joop.12098
Anderson, & Salgado, 2009; Hurtz & Donovan, 2000; Judge & Bono, 2001; Judge, Rodell,
Klinger, Simon, & Crawford, 2013; Salgado, 1997, 2002, 2003; Tett, Rothstein, & Jackson,
1991). Research has also demonstrated that the FFM is a robust framework for grouping
the large variety of personality measures developed within the various theoretical
approaches (Barrick & Mount, 1991; Hough, 1992; Hurtz & Donovan, 2000; Salgado,
1997; Tett et al., 1991). Across these meta-analytic efforts, conscientiousness and emotional stability were consistently found to be valid predictors of job performance for all occupations, whereas the other three personality dimensions were valid predictors for specific criteria and specific occupations (Barrick, Mount, & Judge, 2001).
All the meta-analyses mentioned above were carried out almost exclusively with
validity studies that used single-stimulus (SS) personality inventories, where individuals
evaluate one item at a time (e.g., Likert scales, yes/no, or true/false items), that is, inventories where individuals evaluate each item separately from the other items. Respondents therefore make absolute judgments about the extent to which the item describes their personality (Brown & Maydeu-Olivares, 2013).
In contrast to SS inventories, another type of personality inventory used in personnel selection is the forced-choice (FC) inventory. Tett, Christiansen, Robie, and Simonet (2011) found that 30% of companies used FC personality inventories. Despite the widespread use of this type of inventory among practitioners,
they have received relatively little attention from researchers in comparison with SS
inventories, in part due to the fact that many of them result in ipsative scores that produce
controversial statistical dilemmas (see, for instance, Bartram, 1996; Furnham, Steele, &
Pendleton, 1993; Johnson, Wood, & Blinkhorn, 1988; Saville & Willson, 1991; Tenopyr,
1988). Only recently has the validity of FC inventories begun to be investigated meta-
analytically (e.g., Bartram, 2005, 2007; Salgado & Tauriz, 2014).
The purpose of the present study is therefore to shed light on the validity of the FC
personality inventories, more specifically, on two important issues that have generally
been overlooked in existing research. The first is to examine whether occupational group
(i.e., job type) is a moderator of the validity of the two main types of FC inventories (i.e.,
ipsative and quasi-ipsative inventories). The second issue is to compare the overall results
of the current meta-analysis of ipsative and quasi-ipsative inventories with the validity of
SS inventories as reported in the previous meta-analyses of Barrick and Mount (1991),
Salgado (1997), and Hurtz and Donovan (2000). Our main contribution lies in highlighting
the role that these two issues play in the validity of personality inventories for predicting
job performance. A final issue examined is whether the type of score (ipsative vs. quasi-
ipsative) has effects on the validity of FC personality inventories.
The second type, purely ipsative scores, includes those measures that totally meet Clemans' (1966) criterion of ipsativity, according to
which ‘any score matrix is said to be ipsative when the sum of the scores obtained over the
attributes measured for each respondent is constant’. The Occupational Personality
Questionnaire (OPQ; SHL, 2006), the Edwards Personal Preferences Schedule (EPPS;
Edwards, 1957), and the Description en Cinq Dimensions (D5D; Rolland & Mogenet,
2001) are three good examples of ipsative inventories. The third type, quasi-ipsative
scores, includes measures that do not totally meet the criterion of pure ipsativity
suggested by Clemans (1966), because, for example, not all alternatives ranked by
respondents are scored or the scales have different numbers of items. The Gordon
Personal Profile-Inventory (GPP-I, Gordon, 1993) and the IPIP-MFC (Heggestad, Morrison,
Reeve, & McCloy, 2006) are two examples of quasi-ipsative inventories.
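To make the distinction concrete, the following minimal sketch (an illustration added for clarity, not part of the original scoring procedures; the function name is_purely_ipsative and the toy score matrices are hypothetical) checks Clemans' constant-sum criterion on small score matrices:

```python
import numpy as np

def is_purely_ipsative(score_matrix: np.ndarray) -> bool:
    """Clemans' (1966) criterion: the scores summed over the measured
    attributes equal the same constant for every respondent."""
    row_sums = score_matrix.sum(axis=1)
    return bool(np.allclose(row_sums, row_sums[0]))

# Fully ranked blocks (ranks 1-5 assigned to five scales): every row sums to 15
ranked = np.array([[5, 4, 3, 2, 1],
                   [1, 5, 2, 4, 3],
                   [3, 1, 5, 2, 4]])
print(is_purely_ipsative(ranked))   # True -> purely ipsative

# Scales scored with different numbers of items: row sums differ
partial = np.array([[4, 3, 1, 2, 0],
                    [2, 5, 0, 1, 1],
                    [3, 2, 2, 0, 1]])
print(is_purely_ipsative(partial))  # False -> at most quasi-ipsative
```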
Each score type has important metric characteristics. For example, in the case of
normative scoring, the scores of an individual are statistically dependent on other
individuals in the population and independent of other scores of the assessed individual
(e.g., scores in other attributes) (see Bartram, 1996; Clemans, 1966; Hicks, 1970).
Consequently, normative scoring allows the comparison of individuals or groups on each
measured variable (i.e., they are interindividual scores).
In the case of ipsative measurement, the scores in a variable are dependent on the level
of the individual in other variables that are assessed and statistically independent of the
scores of other individuals in the population. In other words, a high score in one attribute
(e.g., conscientiousness) is necessarily accompanied by a low score in another attribute
(e.g., extroversion) due to the statistical dependence between the two scores. Therefore,
ipsative scores allow comparison of the individual's level across variables (i.e., they are intra-individual scores), but they may be less appropriate for comparisons among
individuals (Cattell & Brennan, 1994; Clemans, 1966; Hicks, 1970). Additionally, ipsative
scores may have some characteristics which are seen as problematic from the
psychometric point of view, including negative correlations with other measures
(Meglino & Ravlin, 1998), inflated reliability coefficients (Johnson et al., 1988; Tenopyr,
1988; see also Bartram, 1996; and Thompson, Levitov, & Miederhoff, 1982; for a different
conclusion), and being inappropriate for factor analysis (Cattell & Brennan, 1994;
Cornwell & Dunlap, 1994; Dunlap & Cornwell, 1994; Meade, 2004).
According to Hicks (1970), Horn (1971), Gleser (1972), Gordon (1993), Cattell and
Brennan (1994), and Meade (2004), among others, quasi-ipsative scores are obtained if
any of the following conditions applies: (1) the summed attributes vary between
individuals over a certain range of score units of the tests, (2) the inventory yields attribute scores that do not sum to the same constant for all individuals, even if the scores in these tests possess properties in common with ipsative tests, and (3) the score elevation on one
attribute does not necessarily produce a score depression on other attributes. These
conditions for data to be quasi-ipsative may be achieved by means of several strategies. For
example, the following six strategies produce quasi-ipsativization: (1) individuals only
partially order the items, rather than ordering them completely; (2) scales have different
numbers of items; (3) not all alternatives ranked by respondents are scored; (4) scales are
scored differentially for individuals with different characteristics or involve different
normative transformations on the basis of respondent characteristics; (5) scored
alternatives are weighted differentially; and (6) the questionnaire has normative sections.
In view of this, quasi-ipsative scores share some psychometric characteristics with both
normative and purely ipsative scores. For example, they allow the comparison of
individuals and groups but simultaneously some degree of dependence can be found
among the scales of the questionnaire (Horn, 1971). Quasi-ipsative measures do not have
the psychometric limitations of ipsative scores (Cattell & Brennan, 1994; Hicks, 1970;
Horn, 1971).
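As an illustration of strategy (3) above, the following toy simulation (a hedged sketch; the two-alternative blocks with one unscored filler alternative are an assumption made for illustration, not a description of any specific inventory) shows that when only keyed alternatives are scored, the row sums vary across respondents, so Clemans' constant-sum criterion is no longer met:

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_blocks, n_scales = 200, 20, 5

# Each block pairs one keyed item (belonging to one of the five scales)
# with an unscored filler alternative; only keyed choices earn a point.
block_scale = rng.integers(n_scales, size=n_blocks)

scores = np.zeros((n_respondents, n_scales))
for i in range(n_respondents):
    for b in range(n_blocks):
        if rng.random() < 0.5:              # respondent picks the keyed alternative
            scores[i, block_scale[b]] += 1  # the filler alternative is never scored

row_sums = scores.sum(axis=1)
print(row_sums.min(), row_sums.max())  # sums differ across respondents,
                                       # so the constant-sum criterion fails
```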
One aim of the present meta-analysis is to examine and comprehensively report the moderator
effects of occupational group on the validity of FC inventories. Based on the previous
findings, we state the following hypothesis:
Hypothesis 1: The validity of personality measures will be moderated by occupational
category. More specifically, as measured by FC personality inventories,
Emotional Stability and Conscientiousness are valid predictors of job
performance in all occupational categories, Extraversion is a valid
predictor of job performance in sales and managerial occupations, and
Agreeableness is a valid predictor in customer service and health
occupations.
Hypothesis 2: The validity of quasi-ipsative personality inventories will be larger than the validity of ipsative personality inventories.
Hypothesis 3: The validity of SS inventories will be larger than the validity of quasi-ipsative and ipsative inventories.
Method
Literature search and coding of studies
Computer-based and manual literature searches were conducted to identify published
and unpublished studies carried out up until 2012. To cover the literature on FC
personality measures as comprehensively as possible, and to prevent any bias in the
inclusion of studies, we adopted a series of search strategies. First, we identified the most
popular FC inventories. They included, for example, the OPQ, EPPS, MBTI, D5D, Survey of
Interpersonal Values Inventory (SIV), and GPP-I. Second, the PsycInfo, Social Sciences
Citation Index, and ABI/Inform databases were searched to identify studies on the
relationship between FC measures and organizational criteria. Several keywords were
used for the computer-based literature search (e.g., ipsative, FC, ipsativity, job
performance), as well as the acronyms of the most popular FC personality inventories.
Third, Internet searches using Google were carried out systematically to look for articles,
unpublished manuscripts, and master’s and doctoral dissertations not included in the
most common databases. Fourth, a manual article-by-article search was carried out in a
number of top-tier journals, including Applied Psychology: An International Review,
Educational and Psychological Measurement, the European Journal of Work and
Organizational Psychology, Human Performance, International Journal of Selection
and Assessment, Journal of Applied Psychology, Journal of Business and Psychology,
Journal of Occupational and Organizational Psychology, Journal of Organizational
Behavior, and Personnel Psychology. Fifth, the reference sections of classic meta-analyses
were reviewed to identify articles not covered in our computer-based search. Sixth, we
contacted a number of researchers and asked for both published articles and unpublished
papers on the topic. Seventh, the technical manuals of the most popular FC personality inventories were examined to find validity coefficients. By means of these search strategies,
a preliminary database of 115 documents (i.e., articles, manuals, technical reports,
unpublished papers, dissertations, and so on) was established for further inspection. Of
these, 26 studies were excluded for various reasons: (1) some studies reported only the
significant correlations, (2) a number of studies only reported multiple correlation results,
(3) several of them did not report correlations or enough information to calculate the
effect size, and (4) several studies reported findings for the same data set. As a result, the present meta-analysis was conducted with 97 independent samples and a
total sample size of 18,593 individuals.
The next step was to classify the scales from the inventories into the Big Five
personality dimensions. A number of studies used a Big Five measure or estimates of the
Big Five (e.g., Bartram, 2007; McDaniel, Yost, Ludwick, Hense, & Hartman, 2004; Nyfield,
Gibbons, Baron, & Robertson, 1995; Robertson, Baron, Gibbons, MacIver, & Nyfield,
2000; SHL, 2006; Warr, Bartram, & Brown, 2005), and the coefficients of these studies
were used directly. With the rest of the studies, we used the following method for
classifying the coefficients within the FFM. First, an exhaustive description of the Big Five
was written and given to the coders (based on the definitions of the Big Five given by
Barrick & Mount, 1991; Hough, 1992; Hough & Ones, 2001; McCrae & Costa, 1990;
Salgado, 1997; among other sources). Next, a list and the definition of the personality
scales from each questionnaire were provided for each coder, with instructions to assign
each scale to the most appropriate factor. Furthermore, some studies reporting factor
analyses of the questionnaires were also used as a basis for the decision (e.g., Matthews,
Stanton, Graham, & Brimelow, 1990; McCrae & Costa, 1990; Piedmont, Costa, & McCrae,
1992; SHL, 2006) because these factor analyses were informative about the Five-Factor
structure of some FC inventories (e.g., OPQ, EPPS). Finally, we also checked the coding
list used by Ones (1993; Hough & Ones, 2001) and Salgado (2003). Two researchers
served as coders, working independently to code every study. One of them was a full
professor of work psychology with extensive experience in the area of personality at work
and meta-analysis. The second coder was a PhD student conducting research on this topic. If the coders agreed on a dimension, the scale was coded in that dimension. Disagreements (<10%) were resolved through discussion until the coders agreed on a dimension.
All the scales were assigned to a single dimension.
For each study, the following information was recorded, if available: (1) sample
characteristics, (2) occupation and related information, (3) personality measures used, (4)
criterion type, (5) reliability of personality measures, (6) criterion reliability, (7) range
restriction (RR) value or data for calculating this value, (8) statistics concerning the
relation between personality measures and criterion, and (9) correlation among the
personality measures when more than one was used. When a study contained conceptual
replications (i.e., two or more measures of the same construct were used in the same
sample), linear composites with unit weights for the components were formed. Linear
composites provide more construct-valid estimates than the use of the average correlation. As demonstrated by Warr, Bartram, and Brown (2005), the average validity
corrected with Mosier’s formula for composite reliability produces very accurate
estimates when the appropriate intercorrelations are used.
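For readers unfamiliar with these computations, a minimal sketch of the standard unit-weighted composite validity and Mosier composite reliability formulas is given below; the function names and the numerical values are illustrative only and are not taken from the studies in the database:

```python
import numpy as np

def composite_validity(r_xy, R):
    """Validity of a unit-weighted composite of standardized components.
    r_xy: correlations of each component with the criterion (length k).
    R:    k x k intercorrelation matrix of the components (unit diagonal)."""
    return np.sum(r_xy) / np.sqrt(np.sum(R))

def mosier_reliability(reliabilities, R):
    """Mosier's formula for the reliability of a unit-weighted composite."""
    R_true = np.array(R, dtype=float)
    np.fill_diagonal(R_true, reliabilities)   # replace 1s with component reliabilities
    return np.sum(R_true) / np.sum(R)

# Two conceptual replications of the same Big Five dimension in one sample
r_xy = np.array([0.20, 0.25])        # validities of the two measures
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])           # intercorrelation of the two measures
rel = np.array([0.80, 0.85])         # reliabilities of the two measures

print(round(composite_validity(r_xy, R), 3))   # ~0.252
print(round(mosier_reliability(rel, R), 3))    # ~0.891
```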
An important difference between this meta-analysis and previous meta-analyses is that
we used the type of FC measure (i.e., normative, ipsative, and quasi-ipsative) for grouping
the validity coefficients. This is especially relevant because different degrees of ipsativity
can result in different validity levels (Clemans, 1966; Hicks, 1970; Radcliffe, 1963). Using
Hicks’ (1970) taxonomy, we divided the personality inventories into three categories:
Purely ipsative, quasi-ipsative, and normative FC questionnaires. To classify the
questionnaires, each one was inspected in terms of the scoring method and the format
of items. Furthermore, we used the technical manuals of the questionnaires when
available, and other articles which were not relevant for this meta-analysis because they
did not include validity data but did include relevant information about the questionnaire
characteristics and scoring system. The initial agreement level of the coders was 95%, and
the disagreements were resolved through discussion until the coders agreed on a questionnaire category.
We classified the questionnaires as being purely ipsative if the sum of the scores obtained over the measured scales was constant. Examples of purely ipsative question-
naires in our database are the EPPS, the OPQ, and the D5D.
We classified FC inventories as quasi-ipsative if any of the strategies and alternatives
mentioned by Hicks (1970) applied. Examples of quasi-ipsative questionnaires in our
database are the GPP-I, the Self-Description Inventory (Ghiselli, 1954), the ESQ (Jackson,
2002), and Assessment Individual Motivation (AIM; Knapp, Heggestad, & Young, 2004;
White, 2002).
Next, we classified a questionnaire as normative FC if it yielded scores that possess the empirical properties of absolute measures. This is the case, for example, of the
inventories in which items representing different degrees of a personality dimension are
never paired with items representing another personality dimension. The MBTI and the
questionnaire of Need of Achievement (Fineman, 1975) are representative examples of
normative FC questionnaires. The list of the FC inventories used in the validity studies
included in the database can be obtained from the first author upon request.
Finally, we examined whether the nature of studies (published vs. unpublished) and
publication year were potential moderators of the validity size. With regard to the nature
of the studies, comparing the overall results of ipsative measures, we found the following results for published and unpublished studies: ES (.02 vs. .05), E (.06 vs. .06), O (.06 vs. .09), A (.00 vs. .09), and C (.09 vs. .11). With regard to the quasi-ipsative studies, the results for published and unpublished validities were: ES (.06 vs. .07), E (.14 vs. .14), A (.06 vs. .01), and C (.16 vs. .14). Therefore, we can confidently conclude that the nature (published/unpublished) of the studies has no practical effect on the overall results. With regard to the second potential moderator, publication year, we examined the correlation between validity size and publication year using the totality of coefficients. We found a very small correlation of .10, which indicates that publication year explains only 1% of the variance of the validity estimates. We also calculated the correlation between publication year and validity for each personality dimension. The correlations between year and validity size for the Big Five were .2, .15, .12, .18, and .08 for ES, E, O, A, and C, respectively; the average correlation was .06. Therefore, our conclusion is that publication year can be discarded as a moderator of the validity.
Meta-analytic procedure
The following step was to apply the psychometric meta-analysis method of Hunter and Schmidt (1990, 2004). Psychometric meta-analysis estimates how much of the observed variance
of findings across studies is due to artefactual errors. The artefacts considered here were
sampling error, criterion and predictor reliability, and indirect range restriction (IRR) in
personality scores. To correct the observed validity for these last three artefacts, the most
common strategy was to develop specific distributions for each of them. Some of these
artefacts reduce the correlations below their operational value (e.g., criterion reliability
and RR), and all of them produce artefactual variability in the observed validity (Carretta & Ree, 2000, 2001). In this meta-analysis, we report the weighted average of
observed validity, the variance and SD of observed validity, the operational validity, the
theoretical validity, the SD of the theoretical validity, and the percentage of variance
accounted for by artefactual errors. We also calculated the 90% credibility value (the lower limit of the 80% credibility interval) and the 95% confidence interval of ρ. The two intervals serve different purposes (Hunter & Schmidt, 2004; Judge & Bono, 2001). The confidence interval provides an estimate of the variability around the estimated mean ρ; that is, if the 95% confidence interval does not include zero, we can be 95% confident that ρ is nonzero. The credibility interval is an estimate of the variability of the individual correlations across studies, and its lower limit means that 90% of the individual correlations are equal to or greater than this value.
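The following bare-bones sketch (illustrative only; it omits the artefact-distribution corrections described above and uses common Hunter-Schmidt approximations for the sampling-error variance and the standard error of the mean, which may differ in detail from the original computations) shows how the weighted mean validity, residual SD, 90% credibility value, and 95% confidence interval are obtained:

```python
import numpy as np

def bare_bones_meta(rs, ns):
    """Bare-bones Hunter-Schmidt aggregation (no artifact corrections):
    weighted mean r, residual SD, % variance due to sampling error,
    90% credibility value, and a 95% confidence interval for the mean."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    k = len(rs)
    r_bar = np.sum(ns * rs) / np.sum(ns)
    var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (np.mean(ns) - 1)   # expected sampling-error variance
    sd_res = np.sqrt(max(var_obs - var_err, 0.0))          # residual SD of r
    pct_ve = 100 * min(var_err / var_obs, 1.0) if var_obs > 0 else 100.0
    cv_90 = r_bar - 1.282 * sd_res                         # lower limit of 80% credibility interval
    se_mean = np.sqrt(var_obs / k)
    ci_95 = (r_bar - 1.96 * se_mean, r_bar + 1.96 * se_mean)
    return r_bar, sd_res, pct_ve, cv_90, ci_95

# Illustrative coefficients and sample sizes, not values from the database
print(bare_bones_meta(rs=[0.10, 0.22, 0.15, 0.05], ns=[150, 220, 90, 310]))
```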
Predictor reliability
The reliability of the personality dimensions was estimated from the coefficients reported
in the studies included in the meta-analysis. As in previous meta-analyses, we used internal
consistency coefficients as estimates of reliability and a reliability distribution was
estimated for each personality dimension. We also examined the question of whether the
three types of FC scoring systems showed different levels of reliability, but they proved to
be very similar. For example, the average reliability estimates were .73, .81, and .81 for normative, ipsative, and quasi-ipsative FC inventories, respectively. Because ipsative and quasi-ipsative inventories showed the same average reliability, we pooled the coefficients for these two formats, created a distribution for each personality dimension, and used these distributions in the meta-analysis. Table 1 presents a summary of these artefact
distributions. Our estimates of the average reliability for the three types of the FC
inventories are very similar to the estimates used in previous meta-analyses with SS
personality inventories (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado,
1997, 2002, 2003). For example, our figures are very similar to the values found by Viswesvaran and Ones (2000) in their reliability generalization study of the
FFM. Consequently, the empirical evidence we were able to find appears to suggest that
the internal consistency of FC and SS personality inventories is practically the same. This is
an important finding, as some researchers had suggested that FC inventories would show a different level of reliability than SS inventories, with some authors claiming that in these cases the reliability is overestimated (inflated) (Johnson et al., 1988; Tenopyr, 1988). Our
findings agree with Bartram’s (1996) perspective on the reliability of ipsative measures.
However, it should be taken into account that, to be totally conclusive about whether SS
and FC personality inventories have equal internal consistency, further studies are
needed, specifically comparing the number of items. Typically, FC inventories have a much larger number of items than SS inventories. The mean and the SD of the reliability distributions served different purposes in this meta-analysis: the mean was used to correct the observed validity coefficients, and the SD was used to estimate the corrected standard deviations of the operational validity and ρ.
Criterion reliability
The studies included in our database used four types of measures of job performance: (1)
job performance ratings, (2) productivity data (e.g., sales), (3) training proficiency, and (4)
overall job performance. From the literature (e.g., Hunter, 1986; Nathan & Alexander,
1988; Ones, Viswesvaran, & Schmidt, 1993; Salgado & Moscoso, 1996; Salgado et al.,
2003; Schmidt & Zimmerman, 2004; Viswesvaran, Ones, & Schmidt, 1996), it is well
known that each type of job performance measure shows a different degree of reliability.
For example, Viswesvaran et al. (1996) found that the average inter-rater reliability of job
performance ratings was .52. Salgado et al. (2003) found exactly the same value with an
independent database. The reliability of productivity data was .80 in the meta-analysis by
Schmidt and Zimmerman (2004). For training proficiency, Salgado et al. (2003) found an
average reliability of .60 for the ratings of training performance and Hunter (1986)
reported a reliability of .80 for objective measures of training proficiency. In the current
study, the number of studies does not allow the calculation of separate meta-analyses for the triple combination (FC measure type × occupational group × job performance measure type), and not all studies provided information regarding job performance reliability. Therefore, we developed an empirical distribution of criterion reliability for each FC measure type × occupational group combination. To do this, we collapsed all studies into a single job performance measure and estimated the average reliability for the total. If the study used job performance ratings, the coefficient of interest was inter-rater reliability when a random-effects meta-analysis is performed (Hunter, 1986; Sackett, 2003; Schmidt & Hunter, 1996), because if this type of reliability is used in the correction for attenuation, it will correct most of the unsystematic errors in supervisor ratings (Hunter & Hirsh, 1987), although not all researchers agree with this point of view (e.g., Murphy & DeShon, 2000). We found 11 studies reporting inter-rater coefficients of job performance
ratings. The average coefficient was .52 (SD = 0.05) which is exactly the figure found by
Viswesvaran et al. (1996). For the studies using training success, we found two studies
reporting reliability. The average coefficient was .80 (SD = 0.09), which corresponds
with the estimate of Hunter (1986). For the studies using objective productivity measures,
seven studies reported reliability coefficients and the average reliability was .83, which is
practically the same as that found by Schmidt and Zimmerman (2004). Pooling together
the reliability coefficients and weighting for the number of studies using each criterion
type, we calculated the average reliability coefficient for job performance. This was .61
(SD = 0.13). Next, we estimated the criterion reliability to be used for each FC measure type × occupational group combination. To do this, we used the number of studies and the type of performance measures to calculate the average reliability within each
combination. These estimates appear in Table 2. As a whole, the average reliability
estimates of job performance were .62, .58, and .59 for the ipsative, quasi-ipsative, and normative personality measures, respectively. These values are in line with the meta-analyses of the
inter-rater reliability in existing personnel selection literature (i.e., Salgado & Moscoso,
1996; Salgado et al., 2003; Viswesvaran et al., 1996). Therefore, the differences in job
performance reliability across the types of FC inventories are minimal and without
practical effects.
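A minimal sketch of this study-count-weighted pooling is shown below; the counts are illustrative placeholders, not the exact tallies from the database:

```python
# Study-count-weighted pooling of criterion reliabilities by measure type
# (reliabilities as reported above; counts are hypothetical for illustration)
rel_by_type = {"ratings": (0.52, 30), "training": (0.80, 5), "productivity": (0.83, 10)}
pooled = sum(r * k for r, k in rel_by_type.values()) / sum(k for _, k in rel_by_type.values())
print(round(pooled, 2))  # weighted average criterion reliability
```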
RR distributions
The distributions for RR were based on the following three strategies: (a) some RR
values were obtained from the studies that reported both restricted and unrestricted
standard deviation data, (b) a second group of RR values was obtained using the
reported selection ratio (we applied the formula derived by Schmidt, Hunter, & Urry,
1976), and (c) another group of RR values was obtained using the SD reported in the studies (restricted SD) and, as the unrestricted SD, the SD reported in the manual of the specific inventory. This last strategy of using national norms for the SD is
warranted, as Ones and Viswesvaran (2003) found that the SDs of the personality
measures of job applicants are about 2–9% less than those based on the national
norms. In the present study, the number of RR values based on the national norms is
small. We obtained four norm-based RR values for EX, O, A, and C, and five values for ES. Additionally, we obtained 19 SR-based RR values for ES and A, 20 for EX,
16 for O, and 25 for C. Next, we compared the average RR values using national
norms with the average RR values obtained with the strategies ‘a’ and ‘b’ mentioned
above, and we did not find statistically significant differences. In the present case,
the differences were 6–10% less than those based on the national norms. Therefore,
the triple strategy produced a large number of RR estimates, and these were grouped
according to the personality dimensions. The average RRs (u) were .87 for emotional
stability, .90 for extraversion, .92 for openness to experience, .90 for agreeableness,
and .88 for conscientiousness. These RR values are very similar to the figures used in
previous meta-analyses (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000;
Salgado, 1997, 2003) and are in accordance with the observation by Schmidt, Shaffer,
and Oh (2008) that the RR of personality measures is remarkably smaller than the RR
found in the validity studies of cognitive ability tests. A summary of these
distributions appears in Table 3.
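The following sketch illustrates strategies (a)/(c) and (b). The selection-ratio computation uses the standard truncated-normal result that, on one reading, underlies the Schmidt, Hunter, and Urry (1976) approach; it should be treated as an approximation rather than the exact formula applied in the original studies:

```python
import numpy as np
from scipy.stats import norm

def u_from_sds(sd_restricted, sd_unrestricted):
    """Strategies (a)/(c): direct ratio of restricted to unrestricted SDs."""
    return sd_restricted / sd_unrestricted

def u_from_selection_ratio(p):
    """Strategy (b): u implied by top-down selection of a proportion p from a
    normally distributed applicant pool (truncated-normal approximation)."""
    c = norm.ppf(1 - p)             # cut score in z units
    lam = norm.pdf(c) / p           # mean of the selected group in z units
    return np.sqrt(1 - lam * (lam - c))

print(round(u_from_sds(9.0, 10.0), 2))          # 0.9
print(round(u_from_selection_ratio(0.30), 2))   # ~0.51 under direct truncation
```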
Meta-analytic software
We used a software program developed by Schmidt and Le (2004). This is the only
available software which includes recent advances to correct for IRR. According to
Hunter, Schmidt, and Le (2006), IRR is the most common case of RR and it is present in all
concurrent validity studies and in practically all predictive validity studies conducted in
personnel selection (some studies in military selection research are the exception).
Consequently, correction for direct range restriction (DRR), rather than IRR, results in an
underestimation of the operational and true validity coefficients and in an overestimation
of the true variance. In a series of studies, Schmidt and his colleagues have demonstrated
the effects of IRR correction on the validity of the Big Five (Schmidt, Oh, & Le, 2006;
Schmidt et al., 2008). They found that IRR produces slightly larger validity sizes in the case
of personality measures.
We are interested in the relationship between the Big Five and performance, both as
theoretical constructs and as operational predictors, and therefore, we will report both
the operational validity and the true correlation. In summary, we will correct the observed
validity for criterion reliability and IRR for obtaining the operational validity, and for
predictor unreliability for obtaining the true correlation. The observed variance will be
corrected for by artefactual errors: Sampling error, criterion and predictor reliability, and
IRR.
Results
We carried out two groups of meta-analyses, one for ipsative personality inventories and
another for quasi-ipsative FC personality measures. We also conducted a third group of
meta-analyses with the studies that used normative FC personality inventories. However,
due to the fact that the number of available studies for this category is very small, the
findings can only serve informative purposes. The results for the normative FC can be
consulted in Appendix B. The results of the meta-analyses of personality–occupation
combinations appear in Tables 4 and 5. The validity coefficients were pooled across
personality dimensions and occupations. In the two tables, from left to right, the first four
columns represent the number of independent coefficients (K), the total sample size (N), the observed validity weighted by the study sample size (rw), and the standard deviation of the observed validity (SDr). The next three columns show the observed validity corrected for measurement error in the criterion and IRR in the predictor (operational validity, rc), the fully corrected correlation (true validity, ρ), and the standard deviation of ρ. Finally, the last four columns are the percentage of variance explained by the four artefactual errors (sampling error, predictor reliability, criterion reliability, and IRR), the 90% credibility value (90%CV), and the lower and upper limits of the 95% confidence interval of ρ. We report both rc and ρ in these tables because they serve different purposes. Operational validity is the coefficient used for predicting the criterion in applied settings (e.g., making decisions about employees). True validity is the theoretical correlation between the personality dimension and the criterion; this coefficient is used for modelling the theoretical relationship between dependent and independent variables. Although we are interested in both coefficients, we will concentrate on ρ in our comments.
We were able to create nine occupational categories, using the Dictionary of
Occupational Titles (D.O.T.) (USES, 1991) as a reference. However, not all were available
for the three types of FC measures. In the following two subsections, we comment on the
results separately for ipsative and quasi-ipsative FC personality measures.
Table 4. Results of the meta-analyses of the validity of ipsative measures of Big Five for occupational
groups
Personality dimension K N rw SDr rc ρ SDρ %VE 90%CV CIU CIL
Emotional stability
Customer service 2 344 .04 0.03 .06 .07 0.00 100 .07 .18 .04
Managerial 21 4,222 .04 0.08 .07 .08 0.09 68 .03 .11 .05
Police 3 700 .04 0.05 .07 .08 0.00 100 .08 .15 .01
Sales occupations 4 401 .02 0.03 .03 .03 0.00 100 .03 .07 .13
Supervisory 4 423 .01 0.04 .01 .01 0.10 100 .01 .11 .09
Mean (across occupations) .03 0.07 .06 .07 0.07 94 .02 .10 .04
Extraversion
Customer service 2 344 .05 0.02 .08 .09 0.00 100 .09 .20 .02
Managerial 20 4,154 .09 0.09 .14 .15 0.08 68 .04 .18 .12
Police 3 700 .02 0.07 .03 .04 0.02 96 .01 .11 .03
Sales occupations 5 594 .08 0.02 .10 .11 0.00 100 .11 .19 .03
Supervisory 4 423 .04 0.08 .07 .07 0.00 100 .07 .03 .17
Mean (across occupations) .07 0.08 .11 .12 0.06 93 .04 .14 .10
Openness to experience
Customer service 2 344 .05 0.04 .07 .08 0.00 100 .08 .19 .03
Managerial 20 4,154 .05 0.08 .07 .08 0.07 72 .01 .11 .05
Police 3 700 .08 0.14 .11 .12 0.20 22 .13 .19 .05
Sales occupations 4 401 .02 0.15 .03 .03 0.15 44 .15 .07 .13
Supervisory 4 423 .01 0.06 .02 .02 0.00 100 .02 .12 .08
Mean (across occupations) .05 0.09 .06 .07 0.08 68 .02 .10 .04
Agreeableness
Managerial 20 4,154 .00 0.09 .00 .00 0.09 60 .11 .03 .03
Police 2 475 .03 0.02 .04 .05 0.00 100 .05 .14 .04
Sales occupations 5 594 .15 0.13 .20 .21 0.12 52 .05 .29 .13
Supervisory 4 423 .06 0.09 .09 .10 0.00 100 .10 .19 .01
Mean (across occupations) .02 0.09 .02 .03 0.08 78 .02 .06 .00
Conscientiousness
Customer service 3 551 .12 0.06 .19 .21 0.00 100 .21 .29 .13
Managerial 21 4,401 .08 0.10 .13 .14 0.12 54 .01 .17 .11
Police 3 700 .07 0.12 .12 .13 0.11 58 .00 .20 .06
Sales occupations 5 665 .18 0.06 .24 .27 0.00 100 .24 .34 .20
Supervisory 4 423 .08 0.09 .14 .16 0.00 100 .16 .25 .07
Mean (across occupations) .10 0.09 .13 .14 0.09 82 .02 .16 .12
Note. K, number of independent samples; N, total sample size; rw, observed validity; SDr, standard deviation of observed validity; rc, operational validity; ρ, validity corrected for criterion reliability and indirect range restriction in predictor; SDρ, standard deviation of ρ; %VE, percentage of variance accounted for by artefactual errors; 90%CV, 90% credibility value based on ρ; CIU, upper limit of the 95% confidence interval of ρ; CIL, lower limit of the 95% confidence interval of ρ.
Table 5. Results of the meta-analyses of the validity of quasi-ipsative measures of Big Five for occupational groups
Personality dimension K N rw SDr rc ρ SDρ %VE 90%CV CIU CIL
Emotional stability
Health 4 803 .05 0.20 .08 .09 0.35 13 .36 .16 .02
Managerial 10 1,528 .01 0.10 .02 .02 0.10 69 .10 .03 .07
Military 9 2,799 .09 0.08 .14 .16 0.08 60 .05 .20 .12
Sales occupations 4 472 .11 0.12 .18 .21 0.13 65 .04 .30 .12
Skilled 5 1,350 .30 0.19 .46 .51 0.26 21 .18 .55 .47
Supervisory 3 171 .37 0.07 .61 .68 0.00 100 .68 .76 .60
Mean (across occupations) .11 0.12 .17 .20 0.13 54 .03 .22 .18
Extraversion
Clerical 2 288 .14 0.21 .23 .25 0.33 17 .18 .36 .14
Health 3 773 .10 0.11 .16 .18 0.16 36 .03 .25 .11
Managerial 11 1,685 .21 0.05 .31 .34 0.00 100 .34 .38 .30
Military 10 3,007 .07 0.10 .10 .11 0.12 37 .05 .15 .07
Sales occupations 4 472 .05 0.20 .07 .08 0.29 23 .28 .17 .01
Skilled 4 1,018 .17 0.13 .25 .28 0.17 32 .06 .34 .22
Mean (across occupations) .07 0.10 .11 .12 0.12 41 .03 .14 .10
Openness to experience
Clerical 2 288 .27 0.24 .41 .44 0.34 14 .01 .53 .35
Health 3 773 .17 0.04 .27 .29 0.00 100 .29 .35 .23
Managerial 8 1,085 .21 0.06 .29 .32 0.00 100 .32 .37 .27
Military 6 748 .15 0.05 .21 .23 0.00 100 .23 .30 .16
Sales occupations 2 236 .11 0.09 .16 .17 0.00 100 .17 .29 .05
Mean (across occupations) .14 0.07 .20 .22 0.02 83 .19 .25 .19
Agreeableness
Clerical 2 288 .25 0.01 .40 .44 0.00 100 .44 .53 .25
Health 3 773 .17 0.07 .28 .31 0.00 100 .31 .37 .25
Managerial 8 1,085 .04 0.14 .06 .07 0.17 38 .15 .13 .01
Military 8 1,760 .05 0.12 .07 .07 0.07 32 .11 .12 .02
Sales occupations 2 236 .07 0.14 .11 .12 0.17 44 .10 .25 .01
Skilled 2 796 .28 0.03 .42 .45 0.00 100 .45 .51 .39
Mean (across occupations) .10 0.10 .15 .16 0.07 83 .07 .19 .13
Conscientiousness
Clerical 3 357 .16 0.09 .28 .31 0.00 100 .31 .40 .22
Customer service 2 398 .29 0.05 .45 .50 0.00 100 .50 .57 .43
Health 3 773 .21 0.09 .35 .40 0.05 88 .33 .46 .34
Managerial 9 1,132 .10 0.09 .15 .17 0.00 100 .17 .23 .11
Military 10 3,007 .12 0.10 .19 .21 0.12 46 .05 .24 .18
Sales occupations 4 472 .22 0.04 .35 .39 0.00 100 .39 .47 .31
Skilled 8 2,338 .43 0.09 .64 .71 0.00 100 .71 .73 .69
Supervisory 3 171 .09 0.10 .16 .18 0.00 100 .18 .33 .03
Mean (across occupations) .22 0.09 .34 .38 0.05 92 .32 .40 .36
Note. K, number of independent samples; N, total sample size; rw, observed validity; SDr, standard deviation of observed validity; rc, operational validity; ρ, validity corrected for criterion reliability and indirect range restriction in predictor; SDρ, standard deviation of ρ; %VE, percentage of variance accounted for by artefactual errors; 90%CV, 90% credibility value based on ρ; CIU, upper limit of the 95% confidence interval of ρ; CIL, lower limit of the 95% confidence interval of ρ.
For the quasi-ipsative inventories, the validity of conscientiousness ranged from .17 for managerial occupations to .71 for skilled labour occupations. All the 90%CVs were
positive and very different from zero, which is supporting evidence of validity
generalization for predicting job performance. On average, the validity of conscien-
tiousness was .38, a coefficient remarkably larger than the coefficients found in
previous meta-analyses. The validity for clerical, customer service, health care, sales, and skilled labour occupations was especially noteworthy, as the coefficients were larger than .30 in all cases. These results fully supported Hypothesis 1. Compared to the
average validity of ipsative measures of conscientiousness, the average validity of quasi-
ipsative inventories is nearly three times larger (.38 vs. .14). This finding fully
supported Hypothesis 2.
Emotional stability was a valid predictor of job performance for military, sales, skilled,
and supervisory occupations and showed validity generalization for these four occupa-
tions. The validity of emotional stability ranged from .02 to .68, which is consistent with
Hypothesis 1. The average ρ was .20, which is practically three times larger than the
validity of ipsative inventories of this personality dimension. Therefore, Hypothesis 2 was
also confirmed for emotional stability.
With regard to extraversion, this was a valid predictor for managerial and skilled
labour occupations and showed validity generalization in both cases. Extraversion
also showed a small but relevant validity coefficient for clerical and health care
occupations, but there was no evidence supporting validity generalization in these
cases, as the 90%CVs were negative. The values of ρ ranged from .28 (skilled
labour occupations) to .34 (managerial occupations), which is consistent with
Hypothesis 1. The average validity of the quasi-ipsative inventories was .12, which is
the same value found for the ipsative inventories. The average validity for the
occupational groups mentioned in Hypothesis 1 (i.e., managerial and sales) was .28.
Comparing this last value with the validity of the ipsative measures for the same
occupations, the quasi-ipsative inventories showed more than twice the validity of
the ipsative inventories. Therefore, Hypothesis 2 was also supported for extraver-
sion.
No hypothesis was advanced for openness to experience with regard to specific
occupational groups. However, we found that openness consistently predicted job
performance across all occupations, with coefficients ranging from -.44 to .32. Openness
predicted job performance negatively for clerical occupations and positively for health
care, managerial, military, and sales occupations. These results confirmed Hypothesis 1
about the moderator effects of the occupational group on the validity of personality
measures. Overall, the validity of the quasi-ipsative measures of openness was .20, which
is about three times larger than the validity of the ipsative measures. This finding
supported Hypothesis 2.
Finally, agreeableness was a valid predictor of job performance for clerical, health
care, and skilled labour occupations, with coefficients ranging from .31 to .45. The
result for health care occupations was predicted by Hypothesis 1. Agreeableness did
not predict job performance for managerial and sales occupations, which is
consistent with the findings of the meta-analyses by Barrick and Mount (1991),
Salgado (1997), and Hurtz and Donovan (2000). The validity coefficients ranged from
.07 to .45, supporting Hypothesis 1. The average validity was .16 for the totality of
the occupational groups. This value clearly contrasts with the average validity of
.03 found for the ipsative inventories. Consequently, Hypothesis 2 was also
confirmed for agreeableness.
Table 6. Comparison among the meta-analyses of the validity of SS and FC personality inventories
Variable B&M-91 (K N ρ) SAL-97 (K N ρ) H&D-00 (K N ρ) Average-SS (K N ρ) FC-QI (K N ρ) FC-IP (K N ρ)
ES 124 19,507 .08 32 3,877 .19 37 5,671 .14 193 29,055 .11 35 7,123 .20 34 6,090 .07
E 123 18,719 .13 30 3,806 .12 39 6,453 .10 192 28,978 .12 34 7,243 .12 34 6,215 .12
O 82 14,236 .04 18 2,722 .09 35 5,525 .07 135 22,483 .05 21 3,130 .22 33 6,022 .07
A 112 17,520 .07 26 3,466 .02 40 6,447 .13 178 27,433 .08 25 4,938 .16 31 5,646 .03
C 123 19,721 .22 24 3,295 .25 45 8,083 .22 192 31,103 .22 43 8,648 .38 36 6,740 .14
Note. K, number of coefficients; N, total sample size; ρ, theoretical validity; B&M-91, Barrick and Mount (1991); SAL-97, Salgado (1997); H&D-00, Hurtz and Donovan (2000); Average-SS, sample-weighted average validity of single-stimulus personality inventories; FC-QI, forced-choice quasi-ipsative personality inventories; FC-IP, forced-choice ipsative personality inventories.
Table 7. Comparison among the meta-analyses of SS, quasi-ipsative, and ipsative validities
Variable B&M-91 (K N ρ) SAL-97 (K N ρ) H&D-00 (K N ρ) Average-SS (K N ρ) FC-QI (K N ρ)
Manager
ES 55 10,324 .08 6 987 .12 4 495 .13 65 11,806 .08 10 1,528 .10
E 59 11,335 .18 6 987 .05 4 495 .13 69 12,817 .17 11 1,685 .34
O 37 7,611 .08 5 787 .03 4 495 .03 46 8,893 .07 8 1,085 .32
A 47 8,597 .10 6 987 .04 4 495 .04 57 10,079 .08 8 1,085 .07
C 52 10,058 .22 6 987 .16 4 495 .19 62 11,540 .21 9 1,132 .17
Sales
ES 19 2,486 .07 6 576 .07 7 799 .15 32 3,861 .06 4 472 .21
E 22 2,316 .15 6 576 .11 8 1,044 .16 36 3,936 .11 4 472 .08
O 12 1,566 .02 6 732 .04 18 2,298 .00 2 236 .17
A 16 2,344 .00 6 576 .02 8 959 .06 30 3,879 .02 2 236 .12
C 21 2,263 .23 6 576 .18 10 1,369 .29 37 4,208 .24 4 472 .39
Skilled
ES 26 3,694 .12 12 1,264 .25 11 1,874 .09 49 6,832 .14 5 1,350 .51
E 23 3,888 .01 11 1,209 .08 12 2,385 .01 46 7,482 .02 4 1,018 .28
O 16 3,219 .01 7 1,208 .17 11 1,874 .02 34 6,301 .03
A 28 4,585 .06 6 876 .05 12 2,385 .11 46 7,846 .07 2 796 .45
C 25 4,588 .21 8 1,264 .23 14 3,841 .17 47 9,693 .20 8 2,338 .71
Note. K, number of coefficients; N, total sample size; ρ, theoretical validity; B&M-91, Barrick and Mount (1991); SAL-97, Salgado (1997); H&D-00, Hurtz and Donovan (2000); Average-SS, sample-weighted average validity of single-stimulus personality inventories; FC-QI, forced-choice quasi-ipsative personality inventories.
For sales occupations, the quasi-ipsative inventories were also more valid than the SS inventories in the case of conscientiousness (.24 vs. .39). Finally, with regard to skilled jobs, the quasi-ipsative
inventories showed larger validity than the SS inventories for the four factors with
estimates of validity, that is emotional stability (.14 vs. .51), extraversion (.02 vs. .28),
agreeableness (.07 vs. .45), and conscientiousness (.20 vs. .71).
Discussion
This meta-analysis makes some unique contributions that should be noted. The first was to
examine meta-analytically the validity of the FFM as assessed with ipsative and quasi-
ipsative measures derived from FC personality inventories for predicting job performance
in nine occupational categories. Globally, quasi-ipsative FC measures of the five
personality factors proved to be valid predictors of job performance and showed validity
generalization for openness, agreeableness, and conscientiousness. With regard to
ipsative FC inventories, only extraversion and conscientiousness showed validity
generalization, although the validity size was very small.
Second, our findings show that occupation is a potent moderator of validity for both
ipsative and quasi-ipsative FC measures. This finding concurs with previous meta-analytic
findings which showed that occupational group moderated the validity of SS personality
inventories (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997, 1998a).
Therefore, the role of the occupation being assessed seems to be similar for SS and FC
personality inventories.
Third, across occupations, quasi-ipsative inventories of personality predict job
performance substantially better than ipsative FC inventories. This held true across the
five personality dimensions of the FFM. Globally, the validity of quasi-ipsative measures
was about three times larger than the ipsative equivalent for emotional stability, openness,
agreeableness, and conscientiousness. This is a substantial improvement and has
important ramifications for the design and use of personality inventories for employee
selection. In essence, our findings demonstrate unequivocally that a quasi-ipsative scaling
format produces substantially higher validity than the ipsative formats that have typically
been more widely used in the past.
Fourth, a unique contribution of this meta-analysis was to compare the validity of
ipsative and quasi-ipsative measures of personality with the validity of SS inventories. The
results of this comparison showed that SS inventories were better predictors of job
performance than ipsative FC inventories. Therefore, Hicks’ (1970) hypothesis that SS
inventories would show larger validity than ipsative inventories was supported in our
study. However, the quasi-ipsative inventories were better predictors than the SS ones,
contrary to the prediction of Hypothesis 3. The difference between quasi-ipsative and SS
inventories was especially remarkable in the case of conscientiousness. The estimate of
the average validity of conscientiousness from the independent meta-analyses of Barrick
and Mount (1991), Hurtz and Donovan (2000), and Salgado (1997) was .22 (cumulated
N = 30,999), while the validity found here for the quasi-ipsative measures of conscien-
tiousness was .38 pooled across occupations (N = 8,648). The quasi-ipsative measures of
emotional stability, openness, and agreeableness were also more valid than their
counterpart SS measures, although the magnitude of the validity was smaller. Therefore, as
a whole, this may be the most important finding of this meta-analytic effort, and it suggests
that quasi-ipsativity matters.
Fifth, an additional contribution was to show that openness to experience may be a
more important predictor of job performance than has been thought. Previous meta-
analyses generally agreed that openness to experience was a predictor of training success,
but openness was irrelevant for predicting job performance overall and across
occupations (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997, 2003).
The present findings suggest that, if openness to experience is assessed by quasi-ipsative
personality inventories, its validity may be relevant for a number of occupations, including
clerical, health, managerial, military, and sales occupations. Nevertheless, although the
global finding is relatively robust as it is based on 21 studies and 3,130 individuals, the
results for the specific occupations must be treated with caution as several validity
coefficients were estimated with only two or three samples. There is no clear explanation
for the differential effect found for openness to experience at present. We conjecture two
possible explanations. The first explanation is based on the possible relationship between
openness to experience and general mental ability (GMA). Previous research has shown
that the correlation between these two variables is around .22 when openness is assessed
with SS personality measures (e.g., Judge, Jackson, Shaw, Scott, & Rich, 2007). It is
possible that quasi-ipsative measures of openness may be loaded more strongly by GMA
than SS measures of openness to experience, and consequently, an indirect effect of GMA
might explain the criterion validity of openness. However, there is currently no
conclusive empirical evidence on this. The second potential explanation is based on the
possible relationship between openness and conscientiousness. The Big Five personality
dimensions are orthogonal when the factor space is examined but show moderate to large
correlations among them when raw scores are used for their estimation (see Costa & McCrae, 1992). Ones (1993; see also Ones et al., 1996) found that the correlation
between openness and conscientiousness was .06. However, recent research has
shown that this correlation may be substantially larger. Hogan, Barrett, and Hogan (2007),
using a very large data set of applicants, found that the observed correlation between
openness and conscientiousness was .31 and .36 (N = 5,266) in two evaluations of the
same applicants, 6 months apart. The correlations corrected for measurement error in
openness and conscientiousness are .43 and .48 (average correlation = .45). At present,
the correlation between quasi-ipsative measures of openness and conscientiousness
remains to be estimated. If the correlation in this last case was similar to the one found by
Hogan et al. (2007) for SS measures, then the validity of openness could be partially due to
conscientiousness. Nevertheless, with regard to this second conjecture, it should be borne in mind that other research finds that conscientiousness, agreeableness, and emotional stability tend to show moderate intercorrelations, as do extraversion and openness (e.g., Chang, Connelly, & Geeza, 2012). Future research is therefore needed into these and other explanatory alternatives.
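As a point of reference for the corrected correlations cited above, the correction for attenuation divides an observed correlation by the square root of the product of the two scale reliabilities. A minimal sketch, assuming illustrative reliabilities of about .72 for both scales (the article does not report the exact values used), is:

```python
# Correction for attenuation (disattenuation) of an observed correlation.
# The reliabilities below are illustrative assumptions, not values reported
# in the article, so the output only approximates the corrected figures above.
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct r_xy for measurement error in both variables."""
    return r_xy / math.sqrt(rel_x * rel_y)

if __name__ == "__main__":
    rel_openness, rel_conscientiousness = 0.72, 0.72  # assumed reliabilities
    for r_obs in (0.31, 0.36):
        r_corr = disattenuate(r_obs, rel_openness, rel_conscientiousness)
        print(f"observed r = {r_obs:.2f} -> corrected r = {r_corr:.2f}")
```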
A first question for future research concerns the relationship between GMA and FC personality measures, as the FC format may demand more cognitive resources than the SS format because respondents must compare the items included as alternatives. Therefore, based on the volume of cognitive demands, quasi-ipsative measures could show a higher correlation with GMA than SS measures. However, the empirical results are not conclusive. For example, in one experimental study, Vasilopoulos et al. (2006) found that GMA correlated .36 with FC measures of openness and conscientiousness in an applicant condition but not in an honest condition (.03 and .09, respectively). Converse et al. (2008) found that GMA correlated .01, .12, and .01 with the FC measures of time management, extraversion, and resilience, respectively. An implication of the potential relation between GMA and FC personality measures is that their incremental validity may be comparatively smaller than the incremental validity of SS inventories (Vasilopoulos et al., 2006). The potential relationships between GMA and FC inventories may also have implications for test fairness and adverse impact (Converse et al., 2008): the larger the correlation between GMA and FC inventories, the greater the potential for adverse impact.
Another relevant question is related to applicant reactions to FC measures. The meta-
analysis by Anderson, Salgado, and Hülsheger (2010) found that personality inventories are positively evaluated in general. However, this may not be the case for FC inventories. For example, Converse et al. (2008) found that the FC format affected self-reported test-taking ease, test-taking anxiety, and positive affect, and that applicants may react less positively to FC measures than to SS personality inventories. Future studies should
therefore examine applicant reactions to quasi-ipsative inventories.
A third research question is about the relationships among the Big Five as measured
with quasi-ipsative personality inventories. This question is relevant for estimating the validity of a composite of personality measures (e.g., a composite of emotional stability, conscientiousness, and agreeableness) and for comparing it with the validity of other personality compounds frequently used in selection decisions (i.e., integrity tests).
It would also be interesting to explore issues of incremental validity of one type of format
(e.g., SS inventories) over another (e.g., quasi-ipsative or normative inventories).
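To illustrate the kind of calculation such estimates would feed into, the validity of a unit-weighted composite follows from the individual validities and the scale intercorrelations through the standard composite-correlation formula. A minimal sketch with hypothetical values (not estimates from this meta-analysis):

```python
# Validity of a unit-weighted composite of standardized predictors:
# R = (sum of individual validities) / sqrt(k + 2 * sum of intercorrelations).
# All numbers below are hypothetical and only illustrate the calculation.
import math
from itertools import combinations

def composite_validity(validities, intercorrelations):
    """validities: list of r(scale, criterion); intercorrelations: dict {(i, j): r}."""
    k = len(validities)
    sum_r_ij = sum(intercorrelations[pair] for pair in combinations(range(k), 2))
    return sum(validities) / math.sqrt(k + 2 * sum_r_ij)

# Hypothetical composite of emotional stability, conscientiousness, and agreeableness
validities = [0.19, 0.30, 0.15]
intercorrelations = {(0, 1): 0.25, (0, 2): 0.30, (1, 2): 0.20}
print(round(composite_validity(validities, intercorrelations), 2))  # ~0.30
```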
There are two further research issues that future studies should examine. The first is to
achieve a better estimation of the validity of normative FC personality inventories with a
larger database. According to Hicks (1970), these should show greater validity than quasi-
ipsative and ipsative inventories. However, the number of studies included in our database precludes firm conclusions on this point. The second is to ascertain whether the FC format is more resistant to faking than the SS format. In theory, resistance to faking should be similar for the three types of FC personality inventories, as protection against faking is related to the format of the inventory rather than to the type of scores the format produces. Nevertheless, this prediction must be tested empirically before conclusions can be drawn.
Finally, recent psychometric work deserves mention: item response theory models now make it possible both to build FC inventories that yield normative scores and to recover normative scores from already existing ipsative and quasi-ipsative FC inventories (Brown & Maydeu-Olivares, 2013). These two advances may change some previously negative views of the validity of personality inventories and reinforce the more positive views of practitioners (e.g., Anderson, 2005).
Conclusion
In summary, quasi-ipsative FC measures of personality, mainly of conscientiousness, were valid predictors across all occupational categories. In general, their validities were about twice as large as those of ipsative FC personality measures. Furthermore, quasi-ipsativity appears to have positive effects on the predictive validity of other personality dimensions, such as openness, which until now had shown low efficiency for predicting job performance. This initial meta-analysis suggests that quasi-ipsative FC measures may be more valid predictors of job performance than SS personality inventories. Future research should be devoted to examining other aspects of ipsativity not studied in this meta-analysis, and new studies should be carried out for other occupations. Also, we recommend that new personality inventories be developed using the quasi-ipsative FC format.
Acknowledgements
The research reported in this article was partially supported by Grant PSI2011-27943 from the
Ministry of Economy and Competitiveness (Spain) to Jesús F. Salgado and by Grant IN-2012-095
from the Leverhulme Trust (U.K.) to Neil Anderson.
References
Articles with an asterisk are included in the meta-analysis
*Adkins, C. L., & Naumann, S. E. (2001). Situational constraints on the achievement-performance
relationship: A service sector study. Journal of Organizational Behavior, 22, 453–465. doi:10.
1002/job.96
Anderson, N. (2005). Relationships between practice and research in personnel selection: Does the
left hand know what the right is doing? In A. Evers, N. Anderson & O. Smit-Voskuyl (Eds.), The Blackwell handbook of personnel selection (pp. 1–24). Oxford, UK: Blackwell.
Anderson, N., Salgado, J. F., & Hülsheger, U. R. (2010). Applicant reactions in selection: Comprehensive meta-analysis into reaction generalization versus situational specificity.
International Journal of Selection and Assessment, 18, 291–304. doi:10.1111/j.1468-2389.
2010.00512.x
*Antler, L., Zaretsky, H. H., & Ritter, W. (1967). The practical validity of the Gordon Personal Profile
among United States and foreign medical residents. Journal of Social Psychology, 72, 257–263.
doi:10.1080/00224545.1967.9922323
*Balch, D. E. (1977). Personality trait differences between successful and non-successful police
recruits at a typical police academy and veteran police officers (Unpublished Doctoral
Dissertation). U.S. International University.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A
meta-analysis. Personnel Psychology, 44, 1–26. doi:10.1111/j.1744-6570.1991.tb00688.x
Barrick, M. R., Mount, M. K., & Judge, T. (2001). Personality and performance at the beginning of the
new millennium: What do we know and where do we go next? International Journal of
Selection and Assessment, 9, 9–30. doi:10.1111/1468-2389.00160
Bartram, D. (1996). The relationship between ipsatized and normative measures of personality.
Journal of Occupational and Organizational Psychology, 69, 25–39. doi:10.1111/j.2044-
8325.1996.tb00597.x
*Bartram, D. (2005). The great eight competencies: A criterion-centric approach to validation.
Journal of Applied Psychology, 90, 1185–1203. doi:10.1037/0021-9010.90.6.1185
*Bartram, D. (2007). Increasing validity with forced-choice criterion measurement formats.
International Journal of Selection and Assessment, 15, 263–272. doi:10.1111/j.1468-2389.
2007.00386.x
Bennett, M. (1977). Testing management theories cross-culturally. Journal of Applied Psychology,
62, 578–581. doi:10.1037/0021-9010.62.5.578
*Brown, A., & Bartram, D. (2009, April 2–4). Doing less but getting more: improving forced-choice
measures with IRT. Paper presented at the 24th Annual Conference of the Society for Industrial
and Organizational Psychology, New Orleans.
Brown, A., & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-
choice questionnaires. Psychological Methods, 18, 36–52. doi:10.1037/a0030641
Carretta, T., & Ree, M. J. (2001). Pitfalls of ability research. International Journal of Selection and
Assessment, 9, 325–335. doi:10.1111/1468-2389.00184
Carretta, T., & Ree, M. J. (2000). General and specific cognitive and psychomotor abilities in
personnel selection: The prediction of training and job performance. International Journal of
Selection and Assessment, 8, 227–236. doi:10.1111/1468-2389.00152
Cattell, R. B., & Brennan, J. (1994). Finding personality structure when ipsative measurements are
the unavoidable basis of the variables. American Journal of Psychology, 107, 261–274. doi:10.
2307/1423040
Chang, L., Connelly, B. S., & Geeza, A. A. (2012). Separating method factors and higher order traits of
the Big Five: A meta-analytic multitrait–multimethod approach. Journal of Personality and
Social Psychology, 102, 408–426. doi:10.1037/A0025559
Christiansen, N. D., Burns, G. N., & Montgomery, G. E. (2005). Reconsidering forced-choice item
formats for applicant personality assessment. Human Performance, 18, 267–307. doi:10.1207/
s15327043hup1803_4
Clarke, S., & Robertson, I. T. (2005). A meta-analytic review of the Big Five personality factors and
accidents involvement in occupational and non-occupational settings. Journal of Occupational
and Organizational Psychology, 78, 355–376. doi:10.1348/096317905X26183
Clemans, W. V. (1966). An analytical and empirical examination of some properties of ipsative
measures. Psychometric Monographs, 14. Richmond, VA: Psychometric Society.
*Clevenger, J., Pereira, G. M., Wiechman, D., Schmitt, N., & Schmidt-Harvey, V. (2001). Incremental
validity of situational judgment tests. Journal of Applied Psychology, 86, 410–417. doi:10.1037/
0021-9010.86.3.410
Converse, P. D., Oswald, F. L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (2008). Comparing
personality tests and warnings: Effects on criterion-related validity and test-taker reactions.
International Journal of Selection and Assessment, 16, 155–169. doi:10.1111/j.1468-2389.
2008.00420.x
*Conway, J. M. (2000). Managerial performance development constructs and personality correlates.
Human Performance, 13, 23–46. doi:10.1207/S15327043HUP1301_2
Cornwell, J. M., & Dunlap, W. P. (1994). On the questionable soundness of factoring ipsative data: A
response to Saville and Wilson (1991). Journal of Occupational and Organizational
Psychology, 67, 89–100. doi:10.1111/j.2044-8325.1994.tb00553.x
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-
Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment
Resources.
De Fruyt, F., & Salgado, J. F. (2003). Applied personality psychology: Lessons learned from the IWO
field. European Journal of Personality, 17, 123–131. doi:10.1002/per.486
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of
conscientiousness in the prediction of job performance: Examining the intercorrelations and the
incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57. doi:10.1037/
0021-9010.91.1.40
Dunlap, W. P., & Cornwell, J. M. (1994). Factor analysis of ipsative measures. Multivariate
Behavioral Research, 29, 115–126. doi:10.1207/s15327906mbr2901_4
Edwards, A. L. (1957). Manual for the Edwards Personal Preference Schedule. New York, NY:
Psychological Corporation.
Feist, G. J. (1998). A meta-analysis of personality in scientific and artistic creativity. Personality and
Social Psychology Review, 2, 290–309. doi:10.1207/s15327957pspr0204_5
*Fine, S., & Dover, S. (2005). Cognitive ability, personality, and low fidelity simulation measures in
predicting training performance among customer service representatives. Applied HRM
Research, 10, 103–106.
*Fineman, S. (1975). The work preference questionnaire: A measure of managerial need for
achievement. Journal of Occupational Psychology, 48, 11–32. doi:10.1111/j.2044-8325.1975.
tb00293.x
*Francis-Smythe, J., Tinline, G., & Allender, C. (2002). Identifying high potential police officers and
role characteristics. Paper presentation at the Division of Occupational Psychology Conference,
British Psychological Society, Bournemouth.
*Furnham, A. (1994). The validity of the SHL Customer Service Questionnaire (CSQ). International
Journal of Selection and Assessment, 2, 157–165. doi:10.1111/j.1468-2389.1994.tb00136.x
Furnham, A., Steele, H., & Pendleton, D. (1993). A psychometric evaluation of the Belbin Team-Role
Self-Perception Inventory. Journal of Occupational and Organizational Psychology, 66, 245–
257. doi:10.1111/j.2044-8325.1993.tb00535.x
*Furnham, A., & Stringfield, P. (1993). Personality and work performance: Myers-Briggs Type
Indicator correlates of managerial performance in two cultures. Personality and Individual
Differences, 14, 145–153. doi:10.1016/0191-8869(93)90184-5
García-Izquierdo, A. L. (2001). Validación orientada al criterio de procedimientos de selección de personal de oficio para el pronóstico del rendimiento laboral y formativo en el sector de la construcción [Criterion-oriented validation of personnel selection procedures for predicting training and job performance in the building trade industry] (Unpublished doctoral dissertation). University of Murcia, Spain.
Ghiselli, E. E. (1954). The forced-choice technique in self-description. Personnel Psychology, 7,
201–208. doi:10.1111/j.1744-6570.1954.tb01593.x
Gleser, L. J. (1972). On bounds for the average correlation between subtest scores in ipsatively
scored tests. Educational and Psychological Measurement, 32, 759–765. doi:10.1177/
001316447203200314
*Goffin, R. D., Jan, I., & Skinner, E. (2011). Forced-choice and conventional personality assessment:
Each may have unique value in pre-employment testing. Personality and Individual
Differences, 5, 840–844. doi:10.1016/j.paid.2011.07.012
*Gordon, L. V. (1993). Gordon Personal Profile-Inventory. Manual 1993 revision. San Antonio,
TX: Pearson educ.
*Graham, W. K., & Calendo, J. T. (1969). Personality correlates of supervisory ratings. Personnel
Psychology, 22, 483–487. doi:10.1111/j.1744-6570.1969.tb00349.x
*Grimsley, G., & Jarret, H. F. (1973). The relation of past managerial achievement to test measures
obtained in the employment situation: Methodology and results. Personnel Psychology, 26, 31–
48. doi:10.1111/j.1744-6570.1973.tb01115.x
*Guller, M. (2003). Predicting performance of law enforcement personnel using the candidate
and officer personnel survey and other psychological measures (Unpublished Doctoral
Dissertation). Seton Hall University.
Heggestad, E. D., Morrison, M., Reeve, C. L., & McCloy, R. A. (2006). Forced-choice assessments of
personality for selection: Evaluating issues of normative assessment and faking resistance.
Journal of Applied Psychology, 91, 9–24. doi:10.1037/0021-9010.91.1.9
Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative measures.
Psychological Bulletin, 74, 167–184. doi:10.1037/h0029780
Hogan, J., Barrett, P., & Hogan, R. (2007). Personality measurement, faking, and employment
selection. Journal of Applied Psychology, 92, 1270–1285. doi:10.1037/0021-9010.9
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job performance relations:
A socioanalytic perspective. Journal of Applied Psychology, 88, 100–112. doi:10.1037/0021-
9010.88.1.100
Horn, J. L. (1971). Motivation and dynamic calculus concepts from multivariate experiment. In R. B.
Cattell (Ed.), Handbook of multivariate experimental psychology (2nd printing, pp. 611–641).
Chicago, IL: Rand McNally.
Hough, L. M. (1992). The “Big Five” personality variables-construct confusion: Description versus
prediction. Human Performance, 5, 139–155. doi:10.1080/08959285.1992.9667929
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality
variables in industrial, work and organizational psychology. In N. R. Anderson, D. S. Ones, H. K.
Sinangil & C. Viswesvaran (Eds.), Handbook of industrial, work, and organizational
psychology. Vol 1: Personnel Psychology (pp. 233–276). London, UK: Sage.
*Hughes, J. L., & Dood, W. E. (1961). Validity versus stereotype: Predicting sales performance by
ipsative scoring of a personality test. Personnel Psychology, 15, 343–355. doi:10.1111/j.1744-
6570.1961.tb01241.x
*Hughes, G. L., & Prien, E. P. (1986). An evaluation of alternate scoring methods for the mixed
standard. Personnel Psychology, 39, 839–847. doi:10.1111/j.1744-6570.1986.tb00598.x
Hülsheger, U. R., Anderson, N., & Salgado, J. F. (2009). Team-level predictors of innovation at work:
A comprehensive meta-analysis spanning three decades of research. Journal of Applied
Psychology, 94, 1128–1145. doi:10.1037/a0015978
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance.
Journal of Vocational Behavior, 29, 340–362. doi:10.1016/0001-8791(86)90013-8
Hunter, J. E., & Hirsh, H. R. (1987). Applications of meta-analysis. In C. L. Cooper & I. T. Robertson
(Eds.), International review of industrial and organizational psychology (Vol. 2, pp. 321–
357). Chichester, UK: Wiley.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. Correcting error and bias in
research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis. Correcting error and bias in
research findings. (2nd ed.) Newbury Park, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for
meta-analysis methods and findings. Journal of Applied Psychology, 91, 594–612. doi:10.1037/
0021-9010.91.3.594
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The big five revisited.
Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869
*Iliescu, D., Ilie, A., & Ispas, D. (2011). Examining the criterion-related validity of the Employee
Screening Questionnaire: A three-sample investigation. International Journal of Selection and
Assessment, 19, 222–228. doi:10.1111/j.1468-2389.2011.00550.x
Jackson, D. N. (2002). Employee screening questionnaire manual. Port Huron, MI: Sigma
Assessment Systems.
*Jackson, D. N., Wroblewski, V. R., & Ashton, M. C. (2000). The impact of faking on employment
tests: Does forced choice offer a solution? Human Performance, 13, 371–388. doi:10.1207/
S15327043HUP1304_3
Johnson, C. E., Wood, R., & Blinkhorn, S. F. (1988). Spuriouser and spuriouser: The use of ipsative
personality tests. Journal of Occupational Psychology, 61, 153–162. doi:10.1111/j.2044-8325.
1988.tb00279.x
Judge, T., & Bono, J. E. (2001). Relationship of core self-evaluations traits – self-esteem, generalized
self-efficacy, locus of control, and emotional stability – with job satisfaction and job
performance: A meta-analysis. Journal of Applied Psychology, 86, 80–92. doi:10.1037/0021-
9010.86.1.80
Judge, T., Jackson, C. L., Shaw, J. C., Scott, B. A., & Rich, B. L. (2007). Self-efficacy and work-related
performance: The integral roles of individual differences. Journal of Applied Psychology, 92,
107–127. doi:10.1037/0021-9010.92.1.107
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical
Representations of the Five-Factor Model of personality in predicting job performance:
Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied
Psychology, 98, 875–925. doi:10.1037/a0033901
Knapp, D., Heggestad, E., & Young, M. (2004). Understanding and improving the Assessment of
Individual Motivation (AIM) in the Army’s GED Plus Program (Study Note 2004–03).
Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.
*Kriedt, P. H., & Dawson, R. I. (1961). Response set and the prediction of clerical job performance.
Journal of Applied Psychology, 45, 175–178. doi:10.1037/h0041918
*Kusch, R. I., Deller, J., & Albrecht, A. G. (2008, July 20–25). Predicting expatriate job performance:
using the normative NEO-PI-R or the ipsative OPQ32i? Paper presented at the 29th
International Congress of Psychology, Berlin, Germany.
*Lievens, F., Harris, M. M., Keer, E. V., & Bisqueret, C. (2003). Predicting cross-cultural training
performance: The validity of personality, cognitive ability, and dimensions measured by an
assessment center and a behavior description interview. Journal of Applied Psychology, 88,
476–489. doi:10.1037/0021-9010.88.3.476
Matthews, G., Stanton, N., Graham, N. C., & Brimelow, C. (1990). A factor analysis of the scales of the
occupational personality questionnaire. Personality and Individual Differences, 11, 591–596.
doi:10.1016/0191-8869(90)90042-P
McCloy, R. A., Heggestad, E. D., & Reeve, C. L. (2005). A silk purse from the sow’s ear: Retrieving
normative information from multidimensional forced-choice items. Organizational Research
Methods, 8, 222–248. doi:10.1177/1094428105275374
McCrae, R. R., & Costa, P. T. (1990). Personality in adulthood. New York, NY: Guilford Press.
*McDaniel, M. A., Yost, A. P., Ludwick, M. H., Hense, R. L., & Hartman, N. S. (2004, April).
Incremental validity of a situational judgment test. Paper presented at the 19th Annual
Conference of the Society for Industrial and Organizational Psychology, Chicago.
Meade, A. W. (2004). Psychometric problems and issues involved with creating and using ipsative
measures for selection. Journal of Occupational and Organizational Psychology, 77, 531–
552. doi:10.1348/0963179042596504
Meglino, B. M., & Ravlin, E. C. (1998). Individual values in organizations: Concepts, controversies,
and research. Journal of Management, 24, 351–389. doi:10.1177/014920639802400304
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007a).
Reconsidering the use of personality tests in personnel contexts. Personnel Psychology, 60,
683–729. doi:10.1111/1744-6570.2007.00089.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N.
(2007b). Are we getting fooled again? Coming to terms with limitations in the use of personality
tests for personnel selection. Personnel Psychology, 60, 1029–1049. doi:10.1111/j.1744-6570.
2007.00100.x
Murphy, K. R., & De Shon, R. (2000). Inter-rater correlations do not estimate the reliability of job
performance ratings. Personnel Psychology, 53, 873–900. doi:10.1111/j.1744-6570.2000.
tb02421.x
Myers, I. B., McCaulley, M. H., Quenk, N., & Hammer, A. (1998). MBTI handbook: A guide to the
development and use of the Myers-Briggs Type Indicator. (3rd ed.). Palo Alto, CA: Consulting
Psychologists Press.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic
investigation. Personnel Psychology, 41, 517–535. doi:10.1111/j.1744-6570.1988.tb00642.x
*Nelson, C. A. (2008). Job type as a moderator of the relationship between situational judgment
and personality (Unpublished doctoral dissertation). Capella University.
*Neuman, G. A. (1991). Autonomous work group selection. Journal of Business and Psychology, 6,
283–291. doi:10.1007/BF01126715
*Neuman, G. A., & Kickul, J. R. (1998). Organizational citizenship behaviors: Achievement
orientation and personality. Journal of Business and Psychology, 13, 263–279. doi:10.1023/
A:1022963108025
Nguyen, N. T., & McDaniel, M. A. (2000, April). Faking and forced-choice scales in applicant
screening: A meta-analysis. Paper presented at the 15th Annual Conference of the Society for
Industrial and Organizational Psychology, New Orleans, LA.
*Nyfield, G., Gibbons, P. J., Baron, H., & Robertson, I. (1995). The cross-cultural validity of
management assessment methods. Surrey, UK: Saville & Holdsworth.
Ones, D. S. (1993). The construct of integrity tests (Unpublished doctoral dissertation). University of
Iowa, Iowa City.
Ones, D. S., & Viswesvaran, C. (2003). Job-specific applicant pools and national norms for
personality scales: Implications for range-restriction corrections in validation research. Journal
of Applied Psychology, 88, 570–577. doi:10.1037/0021-9010.88.3.570
Ones, D. S., Viswesvaran, C., & Reiss, A. (1996). Role of social desirability in personality testing for
personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679. doi:10.1037/
0021-9010.81.6.660
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test
validities: Findings and implications for personnel selection and theories of job performance.
Journal of Applied Psychology, 78, 679–703.
*Perkins, A. M., & Corr, P. J. (2005). Can worriers be winners? The association between worrying and
job performance. Personality and Individual Differences, 38, 25–31. doi:10.1016/j.paid.2004.
03.008
Piedmont, R. L., Costa, P. T., & McCrae, R. R. (1992). An assessment of the Edwards Personal
Preference Schedule from the perspective of the Five-Factor Model. Journal of Personality
Assessment, 58, 67–78. doi:10.1207/s15327752jpa5801_6
Radcliffe, J. (1963). Some properties of ipsative score matrices and their relevance for some
current interest tests. Australian Journal of Psychology, 15, 1–11. doi:10.1080/
00049536308255468
*Robertson, I. T., Baron, H., Gibbons, P., MacIver, R., & Nyfield, G. (2000). Conscientiousness and
managerial performance. Journal of Occupational and Organizational Psychology, 73, 171–
180. doi:10.1348/096317900166967
*Robertson, I., Gibbons, P., Baron, H., MacIver, R., & Nyfield, G. (1999). Understanding management
performance. British Journal of Management, 10, 5–12. doi:10.1111/1467-8551.00107
*Rolland, J. P., & Mogenet, J. L. (2001). Système de description en cinq dimensions (D5D). Paris, France: Les Éditions du Centre de Psychologie Appliquée.
Rust, J. (1999). The validity of the Giotto integrity test. Personality and Individual Differences,
27, 755–768. doi:10.1016/S0191-8869(98)00277-3
Sackett, P. R. (2003). The status of validity generalization research: Key issues in drawing inferences
from cumulative research studies. In K. R. Murphy (Ed.), Validity generalization: A critical
review (pp. 91–114). Mahwah, NJ: Lawrence Erlbaum.
*Sackett, P. R., Gruys, M. L., & Ellingson, J. E. (1998). Ability–personality interactions when
predicting job performance. Journal of Applied Psychology, 83, 545–556. doi:10.1037/0021-
9010.83.4.545
*Salgado, J. F. (1991). Validity of the Preferences and Perception Inventory (PAPI) in a financial
service company (Unpublished technical report). Department of Social Psychology, University
of Santiago de Compostela.
Salgado, J. F. (1997). The five-factor model of personality and job performance in the European
Community. Journal of Applied Psychology, 82, 30–43. doi:10.1037/0021-9010.82.1.30
Salgado, J. F. (1998a). The Big Five personality dimensions and job performance in army and civil
occupations: A European perspective. Human Performance, 11, 271–288. doi:10.1080/
08959285.1998.9668034
Salgado, J. F. (1998b). Sample size in validity studies of personnel selection. Journal of Occupational
and Organizational Psychology, 71, 161–164. doi:10.1111/j.2044-8325.1998.tb00669.x
Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviors.
International Journal of Selection and Assessment, 10, 117–125. doi:10.1111/1468-2389.
00198
Salgado, J. F. (2003). Predicting job performance using FFM and non-FFM personality measures.
Journal of Occupational and Organizational Psychology, 76, 323–346. doi:10.1348/
096317903769647201
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., De Fruyt, F., & Rolland, J. P. (2003). A meta-
analytic study of GMA validity for different occupations in the European Community. Journal of
Applied Psychology, 88, 1068–1081. doi:10.1037/0021-9010.88.6.1068
Salgado, J. F., & Moscoso, S. (1996). Meta-analysis of the interrater reliability of job performance
ratings in validity studies of personnel selection. Perceptual and Motor Skills, 83, 1195–1201.
doi:10.2466/pms.1996.83.3f.1195
Salgado, J. F., & Tauriz, G. (2014). The Five-Factor Model, forced-choice personality inventories and
performance: A comprehensive meta-analysis of academic and occupational validity studies.
European Journal of Work and Organizational Psychology, 23, 3–30. doi:10.1080/
1359432X.2012.716198
*Saville, P., Sik, G., Nyfield, G., Hackston, J., & MacIver, R. (1996). A demonstration of the validity of
the Occupational Personality Questionnaire (OPQ) in the measurement of job competencies
across time and in separate organizations. Applied Psychology, 45, 243–262. doi:10.1111/j.
1464-0597.1996.tb00767.x
Saville, P., & Willson, E. (1991). The reliability and validity of normative and ipsative approaches
in the measurement of personality. Journal of Occupational Psychology, 64, 219–238.
doi:10.1111/j.2044-8325.1991.tb00556.x
*Schippmann, J. S., & Prien, E. P. (1989). An assessment of the contributions of general mental ability
and personality characteristics to management success. Journal of Business and Psychology, 3,
423–437. doi:10.1007/BF01020710
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26
research scenarios. Psychological Methods, 1, 199–223. doi:10.1037/1082-989X.1.2.199
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validation
studies. Journal of Applied Psychology, 61, 473–485. doi:10.1037/0021-9010.61.4.473
Schmidt, F. L., & Le, H. (2004). Software for the Hunter-Schmidt meta-analysis methods. Iowa City,
IA: Department of Management and Organizations, University of Iowa.
Schmidt, F. L., Oh, I.-S., & Le, H. (2006). Increasing the accuracy of corrections for range restriction:
Implications for selection procedure validities and other research practices. Personnel
Psychology, 59, 281–305. doi:10.1111/j.1744-6570.2006.00065.x
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased accuracy for range restriction corrections:
Implications for the role of personality and general mental ability in job and training
performance. Personnel Psychology, 61, 827–868. doi:10.1111/j.1744-6570.2008.00132.x
Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment
interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553–561.
doi:10.1037/0021-9010.89.3.553
Shen, W., Kiger, T. B., Davies, S. E., Rasch, R. L., Simon, K. M., & Ones, D. S. (2011). Samples in
applied psychology: Over a decade of research in review. Journal of Applied Psychology, 96,
1055–1064. doi:10.1037/a0023322
*SHL (2006). OPQ32 manual and user’s guide. Surrey, UK: Author.
*Slocum, Jr, J. W., & Hand, H. H. (1971). Prediction of job success and employee satisfaction for
executives and foremen. Training and Development Journal, 25, 28–36.
*Sommerfeld, D. (1997). Maintenance worker selection: High validity and low adverse impact.
Michigan municipal league. Employment Testing Consortium Project.
Stark, S., Chernyshenko, O., Drasgow, F., & Williams, B. (2006). Examining assumptions about item
responding in personality assessment: Should ideal point methods be considered for scale
development and scoring? Journal of Applied Psychology, 91, 25–39. doi:10.1037/0021-9010.
91.1.25
Tenopyr, M. (1988). Artifactual reliability of forced-choice scales. Journal of Applied Psychology,
73, 749–751. doi:10.1037/0021-9010.73.4.749
Tett, R. P., Christiansen, N. D., Robie, C., & Simonet, D. V. (2011, May 25–28). International
survey of personality test use: An American baseline. Paper presented at the 15th
Conference of the European Association of Work and Organizational Psychology, Maastricht,
The Netherlands.
Tett, R., Rothstein, M. G., & Jackson, D. J. (1991). Personality measures as predictors of
job performance: A meta-analytic review. Personnel Psychology, 44, 703–742. doi:10.1111/j.
1744-6570.1991.tb00696.x
Thompson, B., Levitov, J. E., & Miederhoff, P. A. (1982). Validity of the Rokeach value
survey. Educational and Psychological Measurement, 42, 899–905. doi:10.1177/
001316448204200325
United States Department of Labor (USES) (1991). Dictionary of occupational titles. (4th ed.)
Washington, DC: U.S. Government Printing Office.
Vasilopoulos, N. L., Cucina, J. M., Dyomina, N. V., Morewitz, C. L., & Reilly, R. R. (2006). Forced-
choice personality tests: A measure of personality and cognitive ability? Human Performance,
19, 175–199. doi:10.1207/s15327043hup1903_1
Viswesvaran, C., & Ones, D. (2000). Measurement error in “Big Five” personality assessment:
Reliability generalization across studies and measures. Educational and Psychological
Measurement, 60, 224–235. doi:10.1177/00131640021970475
Viswesvaran, C., Ones, D., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job
performance ratings. Journal of Applied Psychology, 81, 557–574. doi:10.1037/0021-9010.81.
5.557
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2002). The moderating influence of job performance
dimensions on convergence of supervisor and peer ratings of job performance: Unconfounding
construct-level convergence and rating difficulty. Journal of Applied Psychology, 87, 345–354.
doi:10.1037/0021-9010.87.2.345
Warr, P., Bartram, D., & Brown, A. (2005a). Big Five validity: Aggregation method matters. Journal of
Occupational and Organizational Psychology, 78, 377–386. doi:10.1348/096317905X53868
*Warr, P., Bartram, D., & Martin, T. (2005b). Personality and sales performance: Situational variation
and interactions between traits. International Journal of Selection and Assessment, 13, 87–91.
doi:10.1111/j.0965-075X.2005.00302.x
Whetzel, D. L., McDaniel, M. A., Yost, A., & Kim, N. (2010). Linearity of personality–performance
relationships: A large-scale examination. International Journal of Selection and Assessment,
18, 310–320. doi:10.1111/j.1468-2389.2010.00514.x
*White, L. A. (2002). A quasi-ipsative temperament measure for assessing future leaders. 44th
Annual Conference of the International Military Testing Association (p. 169).
*Willingham, W. W., & Ambler, R. K. (1963). The relation of the Gordon Personal Inventory to
several external criteria. Journal of Consulting Psychology, 27, 460. doi:10.1037/h0040658
*Willingham, W. W., Nelson, P., & O’Connor, W. (1958). A note on the behavioral validity of the
Gordon Personal Profile. Journal of Consulting Psychology, 22, 378. doi:10.1037/h0045815
*Witt, L. A., & Jones, J. W. (1999). Very particular people quit first. Paper presented at the Annual
Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Young, M., & Dulewicz, V. (2007). Relationship between emotional and congruent self-awareness
and performance in the British Royal Navy. Journal of Managerial Psychology, 22, 465–478.
Appendix A
References N Inventory Type Job Criterion ES EX O A C
Bartram (2007) S. Africa 68 OPQ I MAN JPR .02 .07 .02 .09 .04
Bartram (2007) USA 86 OPQ I MAN JPR .04 .25 .09 .18 .05
Bennett (1977) 45 SDI Q MAN JPR .40 – – – .05
Bennett (1977) 49 SDI Q MAN JPR .46 – – – .07
Brown and Bartram (2009)* 835 OPQ I MAN SFR .12 .13 .08 .01 .15
Christiansen et al. (2005) 60 IPIP Q MIX JPR – – – – .46
Christiansen et al. (2005) 62 IPIP Q MIX JPR – – – – .17
Clevenger, Pereira, Wiechman, 207 OPQ I CUS JPR – – – – .16
Schmitt, and Schmidt-Harvey (2001)
Conway (2000) 1,567 MBTI N MAN JPR – .06 .06 .06 .06
Fine and Dover (2005) 193 ICS I SO TRA – .06 – .02 –
Fineman (1975) 293 WPQ N MAN JPR – – – – .21
Fineman (1975) 246 WPQ N MAN SAL – – – – .20
Fineman (1975) 84 WPQ N MAN PRG – – – – .25
Francis-Smythe, Tinline, 225 OPQ I POL JPR .08 .08 .28 – .16
and Allender (2002)*
Furnham (1994) 176 CSQ I CUS CWB .01 .03 .00 – .06
Furnham (1994) 176 CSQ I CUS JPR .12 .10 .17 – .23
Furnham and Stringfield (1993) 222 MBTI N MAN JPR – .10 .09 .05 .01
Furnham and Stringfield (1993) 148 MBTI N MAN JPR – .05 .02 .02 .12
García-Izquierdo (2001) 84 OPQ I SKW TRA .04 .02 .26 – .28
Goffin, Jan, and Skinner (2011) 114 ESQ Q SKW CWB – – – – .21
Gordon (1993) T4.18 94 GPP Q MAN PRG .18 .28 .05 .17 .10
Guller (2003) DD 375 EPPS I POL JPR .06 .04 .00 .04 .01
Hughes and Dood (1961) 90 GPI I SO SLS .22 .17 – – .14
Hughes and Dood (1961) 90 GPI Q SO SLS .06 .08 – – .08
Hughes and Prien (1986) 49 GPI Q SKW JPR – – – .15 .03
Iliescu, Ilie, and Ispas (2011) 475 ESQ Q SKW CWB – – – – .43
Saville, Sik, Nyfield, Hackston, 440 OPQ I MAN JPR .03 .03 .07 .00 .02
and MacIver (1996)
Saville et al. (1996) 270 OPQ I MAN JPR .03 .12 .12 .02 .08
Schippmann and Prien (1989) 148 EPPS I MAN JPR .02 .05 .09 .05 .09
Schippmann and Prien (1989) 148 GPP + SDI Q MAN JPR .06 .15 .27 .12 .14
SHL (2006) – val19 79 OPQ I SUP JPR .06 .13 .11 .03 .24
SHL (2006) – val22 120 OPQ I MAN JPR .03 .06 .14 .05 .01
SHL (2006) – val30 114 OPQ I SO SLS .04 .10 .06 .28 .06
SHL (2006) – val32 36 OPQ I PRO TRA .05 .19 .03 .15 .20
Slocum and Hand (1971) 57 EPPS I MAN JPR .17 .07 .05 .02 .17
Slocum and Hand (1971) 37 EPPS I SUP JPR .05 .17 .05 .20 .02
Sommerfeld (1997) 332 GPI Q SKW JPR .01 – – – .32
Warr, Bartram, and Brown (2005), 119 CCSQ I SO SLS .03 .10 .10 .19 .26
Warr, Bartram, and Martin (2005)
Warr, Bartram, and Brown (2005), 78 CCSQ I SO SLS .04 .05 .17 .15 .20
Warr, Bartram, and Martin (2005)
Warr, Bartram, and Brown (2005), 90 CCSQ I SO SLS .04 .08 .25 .32 .21
Warr, Bartram, and Martin (2005)
Whetzel et al. (2010) 1,152 OPQ I PRO JPR .02 .07 .08 .04 .08
White (2002)* 613 AIM Q MIL JPR .06 .22 – .01 .20
White (2002)* 399 AIM Q MIL JPR .07 .06 – .01 .04
Willingham, Nelson, 1,039 GPI Q MIL TRA .05 .00 – – .03
and O’Connor (1958)
Willingham and Ambler (1963) 208 GPI Q MIL CWB – .17 – – .19
Witt and Jones (1999) 168 OPQ I CUS JPR .01 .07 .01 .01 .03
Young & Dulewicz (2007) 261 OPQ I MIL JPR .16 .14 .11 .17 .20
Note. CWB, counterproductive work behaviour; JPR, job performance rating; PRG, progress; PRR, peer rating; SAL, salary; SFR, self-rating; SLS, sales; TRA, training; I, ipsative; Q, quasi-ipsative; N, normative.
Table B1. Meta-analysis of the validity of normative FC measures of the FFM for managerial occupations
Managers
K N rw SDr rc ρ SDρ %VE 90%CV CIU CIL
Extraversion 3 1,937 .05 0.03 .08 .09 0.00 100 .09 .13 .05
Openness to experience 3 1,937 .06 0.02 .09 .10 0.00 100 .10 .14 .06
Agreeableness 3 1,937 .06 0.01 .09 .09 0.00 100 .09 .13 .05
Conscientiousness 6 2,560 .09 0.07 .15 .17 0.08 60 .07 .21 .13
Note. K, number of independent samples; N, total sample size; rw, observed validity; SDr, standard deviation of observed validity; rc, operational validity; ρ, validity corrected for criterion reliability and indirect range restriction in the predictor; SDρ, standard deviation of ρ; %VE, percentage of variance accounted for by artefactual errors; 90%CV, 90% credibility value based on ρ; CIU, upper limit of the 95% confidence interval of ρ; CIL, lower limit of the 95% confidence interval of ρ.
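For readers unfamiliar with the quantities defined in the table note, the following is a minimal, schematic sketch of a Hunter-Schmidt style aggregation: it computes the sample-size-weighted mean validity, the observed and sampling-error variances, the percentage of variance explained, and a validity corrected for criterion unreliability, together with a 90% credibility value. It is not the authors' exact procedure (which additionally corrects for indirect range restriction), and the input data and the .52 interrater reliability (a commonly used estimate for job performance ratings; Viswesvaran, Ones, & Schmidt, 1996) are purely illustrative.

```python
# Schematic Hunter-Schmidt style aggregation of validity coefficients.
# This is NOT the authors' exact procedure: it omits the correction for
# indirect range restriction and uses hypothetical input data.
import math

def bare_bones_meta(rs, ns, criterion_rel=0.52):
    """Return rw, SDr, %VE, rho (corrected for criterion unreliability), SDrho, 90%CV."""
    total_n = sum(ns)
    k = len(rs)
    r_bar = sum(r * n for r, n in zip(rs, ns)) / total_n            # rw
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    var_err = ((1 - r_bar ** 2) ** 2) / (total_n / k - 1)           # sampling-error variance
    pct_ve = 100.0 if var_obs == 0 else min(100.0, 100.0 * var_err / var_obs)
    var_res = max(0.0, var_obs - var_err)                           # residual (true) variance of r
    rho = r_bar / math.sqrt(criterion_rel)                          # correct mean for criterion unreliability
    sd_rho = math.sqrt(var_res) / math.sqrt(criterion_rel)
    cv90 = rho - 1.28 * sd_rho                                      # 90% credibility value
    return r_bar, math.sqrt(var_obs), pct_ve, rho, sd_rho, cv90

# Hypothetical example: three validity coefficients for one personality dimension
print(bare_bones_meta(rs=[0.12, 0.20, 0.16], ns=[150, 300, 250]))
```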