In this final blog, I want to look at the psychometric properties of ipsative measures and also look at the supporting evidence for ipsative tests.
As most of our readers are HR practitioners rather than statisticians, I will try to keep the psychometric critique relatively brief. However, the psychometric weaknesses of ipsative testing have been well reviewed, and for those interested I strongly suggest a thorough read of Meade (2004). In essence, the critiques operate at the level of factor structure and, as a corollary, at the level of reliability of measurement.
Factor analysing data from an ipsative tool is more complex than factor analysing normative data. The way it was done in Saville and Willson’s article (1991) was, in my opinion, artificial, and to quote Barrett:
“This (their) finding completely invalidates Saville and Willson’s (1991) and, by extension, Cronbach’s contention that a factor analysis can be reasonably implemented on ipsative data by simply dropping one score. The interpretation of factor analysis depends entirely on the weights of the variables after regression onto a number of underlying traits. Thus, unless the focus of a factor analysis was simply to determine the amount of variance accounted for by each factor, this procedure is quite insupportable. The choice of which scale to drop will dramatically affect the interpretation of the factor solution”.
In short, ipsative data does not lend itself well to factor analysis. Factor analysis, in turn, is the basis on which we determine construct validity (i.e. our understanding of the psychological phenomena we are hoping to measure). As a result, it is not surprising that the reliability of ipsative scales has consistently been shown to be lower than that of normative scales.
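To make the factor analysis problem concrete, here is a minimal sketch in Python. It assumes the simplest form of ipsatization (subtracting each person's own mean from their scale scores) and uses made-up, independent normative scores; the number of scales and respondents are hypothetical and not taken from any of the papers cited. It shows that, once scores are ipsatized, every row of the covariance matrix sums to zero, so one scale is fully determined by the others, and the average inter-scale correlation is pushed towards -1/(k-1) even though the underlying scales are unrelated.

```python
import math
import random

def ipsatize(scores):
    """Centre one person's scale scores on that person's own mean."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)
k, n = 5, 2000  # hypothetical: 5 scales, 2000 respondents

# Independent normative scores: the true inter-scale correlation is 0.
normative = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
ipsative = [ipsatize(person) for person in normative]

columns = list(zip(*ipsative))  # one tuple of scores per scale
cov = [[covariance(columns[i], columns[j]) for j in range(k)]
       for i in range(k)]

# Each row of the covariance matrix sums to (numerically) zero, so the
# matrix is singular: one scale carries no information of its own.
row_sums = [sum(row) for row in cov]

# The average off-diagonal correlation is pushed towards -1/(k-1).
corrs = [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
         for i in range(k) for j in range(k) if i != j]
mean_r = sum(corrs) / len(corrs)

print("row sums:", [round(s, 10) for s in row_sums])
print("mean off-diagonal r:", round(mean_r, 3),
      "expected about", round(-1 / (k - 1), 3))
```

This singularity is why a scale has to be dropped before a conventional factor analysis will even run, and why the choice of which scale to drop matters so much for the interpretation.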
In a famous article entitled "Spuriouser and Spuriouser: The Use of Ipsative Personality Tests", Johnson, Wood, and Blinkhorn (1988) re-stated the arguments for the abandonment of ipsative testing via questionnaire on psychometric grounds, and provided some empirical examples of the error-prone consequences of their use. This article was, perhaps, the strongest indictment of ipsative measurement until the more recent paper by Meade (2004).
Moreover, Hough and Ones (2001) make the issues very clear. The key issue is not reliability, factor analysis, or even what an ipsative test correlates with. You may be able to produce reliable results from ipsative questionnaires, but they are WITHIN PERSON RANKS; as soon as you compare two people’s results you are treading on dangerous ground. Between-person comparisons are necessary for selection whenever you have more than one candidate.
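A toy illustration of the within-person point, as a minimal Python sketch. The scale names and scores are invented for the example and do not come from any real questionnaire. Two candidates who differ markedly in their absolute (normative) level end up with exactly the same ipsative profile, because ranking within a person throws away the level.

```python
def rank_within_person(scores):
    """Rank one person's scales from 1 (lowest) to k (highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

# Hypothetical scales and normative scores (higher = more of the trait).
scales = ["drive", "detail", "sociability"]
candidate_a = [9, 8, 7]  # high on everything
candidate_b = [4, 3, 2]  # low on everything

ranks_a = rank_within_person(candidate_a)
ranks_b = rank_within_person(candidate_b)

for name, ra, rb in zip(scales, ranks_a, ranks_b):
    print(f"{name:12s} A rank {ra}   B rank {rb}")

# The ipsative profiles are identical although A outscores B on every scale.
print("identical ipsative profiles:", ranks_a == ranks_b)
```

On the ipsative profile alone there is no way to tell that candidate A scored higher than candidate B on every scale, which is exactly the comparison a selection decision needs.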
All of the rebuttals (to my knowledge) in defence of ipsative testing for use in selection come from one company, SHL. This is not surprising given that SHL has developed ipsative tests which it hopes to sell for selection. Their line of reasoning, as is often the case, rests on a good story: that the tests are equally valid and difficult to fake.
Despite the lack of independent support, the direct criticism, and a recent top-class paper using SHL data (Meade, 2004), it would be remiss not to tackle the points raised by Dave Bartram (SHL Director of Research) directly. In essence, they rest on the main premise that the key difference is the number of scales. This has been critiqued thoroughly by Paul Barrett, and much of what is cited below comes from direct postings and conversations between myself and Paul.
The first key defence of ipsative testing was published by Dave Bartram in 1996, in his pre-SHL Director of Research role as Professor at the University of Hull (unfortunately after Sean Hammond’s and my conference paper was given in January 1996). The paper reference and abstract are: Bartram, D. (1996) The relationship between ipsatized and normative measures of personality. Journal of Occupational & Organizational Psychology. Vol 69(1), Mar 1996, pp. 25-39.
Abstract: Presents a general expression for computing the relationships between normative scales and ipsative ones derived from them, based on the number of scales and the intercorrelations between the normative scales. The results obtained from various empirical and computer generated data sets were compared with those expected on the basis of the equations and a close correspondence was found. Expressions for computing the reliability of ipsatized scales and the reliability of ipsatized scale differences were also produced and the implications of these for profile analysis are discussed. It is noted that ipsatized measures are unreliable when the number of scales is less than about 10 or when the correlations between normative scales are greater than .30. This unreliability is increased by full ipsatization and by inequality of the variances of the normative scales from which the ipsatized scales are derived.
Now, this was a very well thought out study – using computer-generated data (N=2000) which allowed normative data to be reconstructed as ipsative – thus permitting a direct “head-to-head” comparison without worrying about confounding by social desirability. This paper really did put to rest the psychometrics part of the debate on ipsative vs. normative measures. The reason why every SHL employee does not have this paper indelibly stamped in their minds is because of several cautionary passages in the paper which do not mesh well with their sales message, one of which I quote below:
“These results show that ipsative and normative scales have a high degree of equivalence only when the normative scales are independent of one another [0.0 correlation between scales]. When there are correlations between the normative scales, the correlations between them and ipsative scales rapidly decrease. When the number of scales is large, reasonable levels of equivalence are only maintained for low levels of normative scale intercorrelation” (Pg. 30, Bartram 1996).
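The direction of this effect can be checked with a small simulation. The Python sketch below is not Bartram's general expression. It assumes the simplest possible case, which is my own simplification: k normative scales with equal variance and one common intercorrelation rho, ipsatized by subtracting each person's own mean. Under those assumptions the correlation between a normative scale and its ipsatized counterpart works out to sqrt((k-1)(1-rho)/k), and the simulation is compared against that value. The choice of k = 10 scales and 2000 simulated respondents is hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(k, rho, n=2000, seed=1):
    """Correlate scale 1 with its ipsatized version under intercorrelation rho."""
    rng = random.Random(seed)
    normative_1, ipsatized_1 = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # common factor that induces correlation rho
        person = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
                  for _ in range(k)]
        own_mean = sum(person) / k
        normative_1.append(person[0])
        ipsatized_1.append(person[0] - own_mean)
    return pearson(normative_1, ipsatized_1)

def expected(k, rho):
    """Value implied by the equal-variance, equal-correlation assumption."""
    return math.sqrt((k - 1) * (1 - rho) / k)

k = 10  # hypothetical number of scales
results = {rho: (simulate(k, rho), expected(k, rho)) for rho in (0.0, 0.3, 0.6)}

for rho, (sim, exp) in results.items():
    print(f"rho={rho:.1f}  simulated r={sim:.3f}  expected r={exp:.3f}")
```

In this simplified setting the normative-ipsative correlation falls steadily as the scales become more intercorrelated, which is the pattern the quoted passage describes.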
Quite by chance (or maybe not!), the Barrett et al. (1996) paper looking at the OPQ Normative Concept Model analysis was published, containing, on page 15, a histogram of the inter-scale correlations of the OPQ within a dataset of 2301 applicants. Of these inter-scale correlations, 64 out of 465 (about 14%) were greater than r=0.3 and 149 (about 32%) were greater than r=0.2. Obviously, the level of correlation between scales is low, but it is not 0.0.
The interesting feature of Bartram’s paper is that he shows that you can compute comparative ipsative scale reliabilities (albeit from a derived formula that uses the normative values to estimate ipsative values). It was left to Helen Baron (1996) (formerly of SHL) to conclude “However, for larger sets of scales (N~30) with low average intercorrelations, ipsative data seems to provide robust statistical results in reliability analysis, but not under factor analysis”. Thus, by her own admission, the factor structure of ipsative data is poor. This leaves the practitioner with little knowledge of what construct was actually measured. This is compounded, of course, by the fact that the items responded to are different in every case!
Saville and Willson (1991) responded to criticisms by attempting to demonstrate that ipsative tests manifest equal, if not superior, validity to normative tests. Using a novel, if somewhat ill-specified, computer-generated dataset, they showed that under certain conditions ipsative and normative tests will yield equivalent psychometric parameters.
In addition, they went on to show that, with certain real datasets, the expected statistical results from Johnson et al. (1988) were not observed. However, these conclusions have been challenged by Cornwell and Dunlap (1994), who carried out a re-analysis of the Saville and Willson data and found little support for their claims. The reality is that gains in validity have not been shown, and indeed the scores on ipsative and normative measures are often cited as comparable (Bartram, 2006). So, not only does the practitioner end up with a faulty measure, they do so for no comparative gain! Practical and robust are not mutually exclusive. To suggest otherwise is a classic red herring, implying that those who take measurement seriously are just pie in the sky. The complete opposite is true. Those interested in psychometrics are the people who want to see things done right so that the discipline goes forward.
The Issues in Summary
The key issue is that you cannot practice unless you understand what you are using. To again quote Paul Barrett:
“Yes, it is important to have a good bedside manner but this is secondary to knowing what medication to prescribe.”
Ipsative tests:
- Are a within-person measure, to be used for individual counselling, not for comparisons across people
- Have questionable psychometric properties
- Are not resistant to faking
- Have no demonstrable validity gains
- Are, in the main, supported by only one company, which has a vested interest in determining their usefulness. Their application is, therefore, more market-driven than science-driven.
We have a lot of psychological interventions prescribed by people who know little about what they are prescribing. At least with ipsative testing, we know what the medication can be prescribed for. The application of ipsative testing, a within-person measure, to selection is ill-advised, and it is time that this practice was eradicated once and for all, on the grounds that I/O psychology is truly a discipline guided by science and not by marketing whims.
So that this is not just ‘MY’ view, here are the references for those who want them:
Baron, H. (1996) Strengths and limitations of ipsative measurement. Journal of Occupational and Organisational Psychology, 69, 49-56.
Cattell, R.B. (1944) Psychological Measurement: ipsative, normative, and interactive. Psychological Review, 51, 292-303.
Clemans, W. V. (1966) An analytic and empirical investigation of some properties of ipsative measures. Psychometric Monographs, 14.
Closs, S.J. (1976) Ipsative vs normative interpretation of test scores or “What do you mean by like?”. Bulletin of the British Psychological Society, 29, 228-299
Cornwell, J. M. and Dunlap, W.P. (1994) On the questionable soundness of factoring ipsative data: a response to Saville and Willson. Journal of Occupational and Organisational Psychology, 67, 89-100.
Hicks, L.E. (1970) Some properties of ipsative, normative, and forced-choice normative measures. Psychological Bulletin, 74, 167-184.
Hough, L. and Furnham, A. (2003) Use of Personality Variables in Work Settings. In W. Borman, Ilgen, D.R., and Klimoski, R.J. (eds) Handbook of Psychology, Volume 12: Industrial and Organizational Psychology. New York, Wiley. (Chapter 5, pp 77-106)
Hough, L. and Ones, D. (2001) The Structure, Measurement, Validity, and Use of Personality Variables in Industrial, Work, and Organizational Psychology. Chapter 12 (pp 233-267) in N. Anderson, D. Ones, Sinangil, H., and Viswesvaran, C. (eds.) Handbook of Industrial, Work, and Organizational Psychology, Volume 1: Personnel Psychology. New York: Wiley.
Johnson, C. E., Wood, R., and Blinkhorn, S. F. (1988) Spuriouser and spuriouser: the use of ipsative personality tests. Journal of Occupational Psychology, 61, 153-162.
Martin, B.A., Bowen, C., and Hunt, S. (2002) How effective are people at faking on personality questionnaires? Personality & Individual Differences, 32(2), 247-256.
Saville, P. and Willson, E. (1991) The reliability and validity of normative and ipsative approaches in the measurement of personality. Journal of Occupational Psychology, 64, 219-238.
Schmit, M.J., and Ryan, A.M. (1993) The big five in personnel selection: factor structure in applicant and non-applicant populations. Journal of Applied Psychology, 78(6), 966-974.