STATISTICS Siegel
, 1957), pp. 13-19 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2685679 . Accessed: 17/10/2012 08:02
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to The American Statistician.
Power

When alternative statistical tests are available to treat data from a given research design, as is very often the case, it is necessary for the researcher to employ some rationale in choosing among them. The criterion most often suggested is that the researcher should choose the most powerful test. The power of a test is defined as the probability that the test will reject the null hypothesis when in fact it is false and should be rejected. That is,

Power = 1 - probability of a Type II error.

Thus, a statistical test is considered a good one if it has small probability of rejecting H0, the null hypothesis, when H0 is true, but a large probability of rejecting H0 when H0 is false.

However, there are considerations other than power which enter into the choice of a statistical test. One must consider the nature of the population from which the sample was drawn, and the kind of measurement which was employed in the operational definitions of the variables of the research. These matters also enter into determining which statistical test is optimal for analyzing a particular set of data.

It is suggested here that the choice among statistical tests which might be used with a given research design should be based on these three criteria:
1. The statistical model of the test should fit the conditions of the research.
2. The measurement requirement of the test should be met by the measures used in the research.
3. From among those tests with appropriate statistical models and appropriate measurement requirements, that test should be chosen which has greatest power-efficiency.

The Statistical Model

When we have asserted the nature of the population and the manner of sampling in the research, we have established a statistical model on the basis of which we may conduct a statistical test. The validity of the conclusion based on the statistical test depends on whether or not the conditions of the statistical model underlying the test are met. That is, the conclusion based on a statistical test carries a qualifier: "If the model used was correct, then we may conclude that . . ."

Sometimes the researcher is able to determine whether the conditions of a particular statistical model are met in his research, but more often he simply has to assume that they are met. Thus the conditions of the statistical
model of a test are often called the "assumptions" of the test.

The model underlying the common parametric tests, the t and F tests, imposes these conditions: (a) the observations must be independent, (b) the observations must be drawn from normally distributed populations, (c) these populations must have the same variance, or, in special cases, they must have a known ratio of variances, and (d) in the case of the analysis of variance, the means of these normal and homoscedastic populations must be linear combinations of effects due to columns and/or rows; that is, the effects must be additive. In addition, as we shall note further below, the nature of the t and F tests also imposes a measurement requirement: a test on the means imposes the requirement that the measures must be additive, i.e., numerical.

As we have already noted in the discussion of their title, the nonparametric tests are not based on a statistical model which specifies such restrictive conditions. In addition to assuming that the observations are independent, some nonparametric tests assume that the variable under study has underlying continuity. Moreover, the measurement requirement of nonparametric tests is weaker; as will be shown, most nonparametric tests require either ranking or classificatory measurement.

Compared to the models underlying parametric tests, the models underlying nonparametric tests are far less restrictive, and therefore the conclusions based on nonparametric inference are more general. When a parametric test, say the t test, is used for inference, we must preface our conclusions with a statement like "If the observations are truly numerical, and are drawn from normally distributed populations which are equal in
variance, then we may conclude that . . .", whereas when a nonparametric test is used for inference, we may say, "Regardless of the nature of the underlying populations, we may conclude that . . ." By the criterion of generality, then, the nonparametric tests are preferable to the parametric. By the single criterion of power, however, the parametric tests are superior, precisely because of the strength of their assumptions; with data for which the strong and extensive assumptions and requirements associated with the parametric tests are valid, these tests are most likely of all tests to reject H0 when H0 is false.

The American Statistician, June, 1957

Attracted by the power of parametric tests, and seeking to justify their use of these tests with their data, researchers have developed certain approaches in an attempt to determine whether the assumptions of the parametric tests are valid for their data. For example, in connection with the assumption that the scores are drawn from a normally distributed population, it is common practice to test the normality of the distribution of the scores in the sample by use of, say, the
χ² goodness of fit test. If this test does not lead to the rejection of H0, the researcher concludes that he may safely use tests whose statistical models pose the condition that the population must be normally distributed. At least two objections to this procedure may be raised: (a) it involves an attempt to "prove" the null hypothesis that the sample is from a normally distributed population; the statistical test is employed in order to enable the researcher to accept that H0, and (b) ambiguous and difficult situations arise when the obtained probability of deviations from normality as large as those observed in the sample is close to the arbitrarily set significance level. Similar objections may be raised to comparable attempts to justify the homoscedasticity assumption by attempting to "prove" the null hypothesis that the variances of the two or more samples do not differ.

When the investigator's test of his data indicates that the obtained sample of scores could well have been drawn from a population which is not normal, his earnest wish to justify the use of the most powerful test leads him to alter the distribution of scores. By a mathematical operation on the original scores, he "transforms" them so that the normality assumption becomes tenable. The question which must be raised in connection with such an attempt is this: Will the process of "normalizing" the distribution by altering the numerical values of the scores cause a distortion of the experimental effect under investigation? This is a question which the investigator may or may not be able to answer. If the process of transforming the scores has the effect of diminishing the experimental effect, then the investigator has placed himself in a paradoxical situation. The steps he has taken in order to justify the use of a statistical test which has maximum capacity to reject H0 when it should be rejected are steps which have reduced the sensitivity of the measurement and have thereby reduced the likelihood that H0 will be rejected when it should be.
That is, his efforts to gain power paradoxically result in a loss of power.
When the research involves the comparison of scores in two or more samples, and when a test of their variances renders the homoscedasticity assumption questionable, the procedure of transforming scores in order to justify that assumption is open to the same objection.

When we have reason to believe that the conditions of the parametric model are met in the data under analysis, then we should certainly choose a parametric statistical test, such as a t or F test, for analyzing those data, because of the power of parametric tests. But if the assumptions of the test are not met, then it is difficult if not impossible to say what is really the
power of the parametric test. It is even difficult to estimate the extent to which a probability statement
about the hypothesis is meaningful when that probability statement results from the inappropriate application of the test. Although some empirical evidence has been gathered to show that slight deviations in meeting the assumptions underlying parametric tests may not have radical effects on the obtained probability figure, there is as yet no general agreement as to what constitutes a "slight" deviation.

Measurement

The computation of any statistic involves performing certain manipulations of the research data. To compute a mean, for example, we perform the arithmetic manipulations of addition and division on the scores.

The manipulations to which observations may meaningfully be subjected depend on the sort of measurement which the observations represent. For example, if the observations represent non-numerical measurement, the computation of a mean to represent the central tendency of the observations introduces distortion.

Different kinds of statistical tests require different manipulations of the research observations, and therefore different statistical tests are useful in making inferences from data representing measurement of different strengths or levels.

Levels of measurement. In general, we can clearly define at least four distinct levels at which measurement may be achieved (12), when measurement is understood to mean the process of assigning symbols to observations in some consistent manner.

Measurement is weakest when the objects in the universe are simply partitioned into mutually exclusive classes. This system of classes constitutes a nominal or classificatory scale. Each class may be represented by a letter, a name, a number, or some other symbol. The only relation which holds in the nominal scale is the relation of equivalence, which holds between entities in the same class.
We use nominal scaling in identifying the fields of scholarly endeavor: we assign a scholar to a class in a nominal scale when we say he is a "physicist," "linguist," "biochemist," or "historian." Often numbers are used as the symbols in nominal scaling. The numbers on automobile plates and postal zone numbers are examples.

When the objects in the various classes of a scale stand in some kind of relation to one another, the scale is an ordinal or ranking scale. The fundamental difference between a nominal and an ordinal scale is that the ordinal scale incorporates not only the relation of equivalence (=) but also the relation "greater than" (>). Given a group of equivalence classes, if the relation > holds between some but not all pairs of classes, we have a partially ordered scale. If the relation > holds for all pairs of classes so that a complete rank ordering of classes arises, we have an ordinal scale. In the academic world, professorial positions stand in an ordinal relation to one another: professor > associate professor > assistant professor > instructor. The use of numbers to represent an ordinal relation is illustrated by Civil Service job classifications (GS12 > GS11 > GS10) and by street addresses.

When the distances between any two classes on a scale are known numerically, the classes fall on an interval scale. An interval scale has all the characteristics of an ordinal scale, and in addition has a common and constant unit of measurement which assigns a real number to all pairs of objects in the ordered set. On an interval scale, the ratio of any two intervals is independent of the unit of measurement and of the zero point, both of which are arbitrary. Our two scales to measure temperature, the Fahrenheit and centigrade scales, are both examples of interval scales.

When a scale has all the characteristics of an interval scale and in addition has a true zero point as its origin, it is called a ratio scale. On a ratio scale, the ratio of any two scale points is independent of the unit of measurement. We measure mass or weight in a ratio scale. The scale of ounces and pounds has a true zero point, as does the scale of grams. The ratio between any two weights is independent of the unit of measurement. For example, if we should determine the weights of two different objects not only in pounds but also in grams, we would find that the ratio of the two pound weights would be identical to the ratio of the two gram weights.

Permissible operations and appropriate statistics. The purpose of the preceding discussion of levels of measurement is to remind the reader that at different times we use typographically identical numbers to represent observations and coding procedures of widely varying strengths. The manipulations which may meaningfully be performed on a set of numbers depend on the strength of measurement which the numbers represent.
It is clear, for example, that while it is certainly meaningful to add two weights (when we combine the contents of a two-pound box of candy with the contents of a one-pound box we will indeed have three pounds of candy), there is no comparable simple or useful meaning to the sum of two automobile license numbers or the sum of two street addresses.

Each of the four levels of measurement has certain appropriate manipulations associated with it. In order to be able to make certain operations with numbers that have been assigned to observations, the structure of our method of mapping numbers (assigning scores) must be isomorphic to some numerical structure which includes these operations. If two systems are isomorphic, their structures are the same in the relations and operations they allow.
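The contrast between ratio and interval scales described above can be checked numerically. The following is a minimal sketch and not part of the original article: the function names are hypothetical, the pound-to-gram factor is the modern exact definition, and the centigrade-to-Fahrenheit conversion is the standard linear one. It uses the two candy boxes from the text and, as an assumed illustration, three centigrade readings.

```python
from fractions import Fraction

# Exact modern definition of the avoirdupois pound in grams (an assumption
# of this sketch; the article only says that pounds and grams share a zero).
GRAMS_PER_POUND = Fraction(45359237, 100000)


def pounds_to_grams(pounds):
    """Change of unit on a ratio scale: multiplication by a positive constant."""
    return pounds * GRAMS_PER_POUND


def centigrade_to_fahrenheit(degrees_c):
    """Change of unit and zero point on an interval scale: a linear map."""
    return Fraction(9, 5) * degrees_c + 32


# Ratio scale: the ratio of two weights does not depend on the unit.
box_a, box_b = Fraction(2), Fraction(1)  # the two-pound and one-pound boxes
ratio_lb = box_a / box_b
ratio_g = pounds_to_grams(box_a) / pounds_to_grams(box_b)

# Interval scale: ratios of intervals survive the change of scale,
# but ratios of the readings themselves do not.
c_low, c_mid, c_high = Fraction(0), Fraction(10), Fraction(30)
f_low, f_mid, f_high = (centigrade_to_fahrenheit(c) for c in (c_low, c_mid, c_high))

interval_ratio_c = (c_high - c_mid) / (c_mid - c_low)
interval_ratio_f = (f_high - f_mid) / (f_mid - f_low)
value_ratio_c = c_high / c_mid
value_ratio_f = f_high / f_mid

print("weight ratio (lb, g):", ratio_lb, ratio_g)
print("interval ratio (C, F):", interval_ratio_c, interval_ratio_f)
print("value ratio (C, F):", value_ratio_c, value_ratio_f)
```

Exact fractions are used so that the equalities hold without rounding error. The weight ratios and the interval ratios agree across units, while the ratio of two temperature readings changes with the scale, which is why statements such as "twice as hot" are not meaningful on an interval scale.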
In a nominal scale, the information may be equally well represented by any set of symbols, as long as the equivalence relation is preserved. That is, the nominal scale is unique up to a one-to-one transformation: the symbols designating the various classes in the scale may be changed or even exchanged, if this is done consistently and completely. Thus, a postal area may be rezoned, with blocks formerly in zone 5 now falling in zone 12, and those formerly in zone 8 now falling in zone 5, etc., and no information will be lost in the rezoning if it is accomplished consistently and thoroughly. With nominal data, the meaningful statistics are those whose information would be unchanged by a one-to-one transformation: frequency counts, the mode, etc. Under certain conditions, we can test hypotheses regarding the distribution of cases among classes by using statistical tests which use frequencies in unordered categories, i.e., enumerative data. The χ² tests are of this type. The most common measure of association for classificatory data is the contingency coefficient. All of these are nonparametric statistics.

The information in an ordinal scale may be equally well represented by any ordered set of symbols. That is, the ordinal scale is unique up to a monotonic transformation; any order-preserving transformation does not diminish the information it encodes. At present, a corporal wears two stripes and a sergeant wears three. These insignia denote that sergeant > corporal. This relation would be as well expressed if the corporal wore four stripes and the sergeant wore seven. The statistic most appropriate for describing the central tendency of scores in an ordinal scale is the median, for the median is not affected by changes of any scores which are above or below it as long as the number of scores above and below remains the same. With ordinal data, hypotheses can be tested by using that large group of nonparametric statistical tests which are sometimes called "order tests" or "ranking tests." Correlation coefficients based on rankings, e.g., those of Spearman and Kendall, are appropriate.

Some ranking tests assume that there is a continuum underlying the observed ranks. Such an assumption is frequently quite tenable even though the grossness of our measuring devices obscures the underlying continuity. For example, although we may classify college students at the end of their college careers in only three ranks (graduating with honors, graduating, and failing to graduate), underlying this ranking is a continuum of achievement in college. Data based on such a ranking could appropriately be subjected to a test which carried the assumption of underlying continuity.

If a variate is truly continuously distributed, then the probability of a tie between two observations is zero. However, tied ranks and tied scores frequently occur in research data. Tied scores are almost invariably a reflection of the lack of sensitivity of our measuring instruments, which fail to distinguish small differences which in fact do exist between the tied observations; they "exist" in the sense that a more sensitive measuring instrument would distinguish them. Therefore, even when ties are observed it may not be unreasonable to assume that a continuous distribution underlies the observations. Most nonparametric techniques incorporate a correction for tied observations.

In the case of an interval scale, any change in the numbers associated with the positions of the objects on the scale must preserve not only the ordering of the objects but also the relative differences between them. That is, an interval scale is unique up to a linear transformation. For example, although for a given heat the readings on our two temperature scales, centigrade and Fahrenheit, would differ, both scales contain the same amount and the same kind of information; they are linearly related. Although the two scales have a different zero point and a different unit of measurement, a reading on one scale can be transformed to the equivalent reading on the other by the linear transformation

F = (9/5) C + 32,

where F = number of degrees on the Fahrenheit scale and C = number of degrees on the centigrade scale. It can be shown that the ratios of temperature differences (intervals) are independent of the unit of measurement and of the zero point. Some readings of the same heat on the two scales are:

Centigrade     0    10    30    100
Fahrenheit    32    50    86    212

Notice that the ratio of the differences between temperature readings on one scale is equal to the ratio between the equivalent differences on the other scale. For example, on the centigrade scale the ratio of the differences between 30 and 10, and 10 and 0, is (30 - 10)/(10 - 0) = 2. For the comparable readings on the Fahrenheit scale, the ratio is (86 - 50)/(50 - 32) = 2.

The interval scale is a quantitative (numerical) scale and statistics which are obtained by treating scores as numbers (such as the mean, the standard deviation, the Pearson product-moment correlation coefficient, etc.) may appropriately be used to represent data based on interval scaling. Most parametric statistical tests, including the t and F tests, are applicable to such data.

A ratio scale is unique up to multiplication by a positive constant. That is, the ratios between any two numerical observations on the scale are preserved when the scale values are all multiplied by a positive constant, and thus such a transformation does not alter the information encoded in the scale. Any statistical test is usable when ratio measurement has been achieved. In addition to those statistics previously mentioned as being appropriate for data in an interval scale, with a ratio scale one may meaningfully use such statistics as the geometric mean and the coefficient of variation, statistics which require knowledge of the true zero point.

Table 1 summarizes the discussion which has been presented concerning the relation between the strength of measurement represented by the data and the statistics and statistical tests which are appropriate. Of course this presumes that the assumptions of the tests' statistical models are satisfied.

Table 1. Four Levels of Measurement and the Statistics Appropriate to Each Level

Scale      Defining Relations        Examples of Appropriate Statistics                        Appropriate Tests
Nominal    (1) Equivalence           Mode, frequency counts, contingency coefficient           Nonparametric tests
Ordinal    (1) Equivalence           Median, Spearman and Kendall rank correlations            Nonparametric tests
           (2) Order
Interval   (1) Equivalence           Mean, standard deviation, Pearson product-moment          Nonparametric and parametric tests
           (2) Order                 correlation
           (3) Ratio of intervals
Ratio      (1) Equivalence           Geometric mean, coefficient of variation                  Nonparametric and parametric tests
           (2) Order
           (3) Ratio of intervals
           (4) Ratio of values

Power-Efficiency

The researcher may find that the test which suits the level of measurement he has achieved and whose statistical model is appropriate to the conditions of his research is not the most powerful test available. Confronted by the dilemma posed by the contradictory outcomes of the criteria of power and appropriateness, the researcher may resolve the conflict by choosing the more appropriate test and then enlarging his sample in order to increase the power of that test. The assertion that a test with greater generality is usually weaker in the test of H0 than is a test restricted by many assumptions is generally true for any given sample size. But it may very well not be true in a comparison of two statistical tests which are applied to two samples of unequal size. That is, with samples of say 30, test A may be more powerful than test B. But test B may be more powerful with a sample of 30 than test A is with a sample of 20.
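The trade-off between generality and sample size can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions and is not taken from the article: it uses a one-sided z test with known variance as a stand-in for the more powerful parametric test A, the sign test as the more general test B, a hypothetical shift of 0.5 standard deviations in a normal population, and a 0.05 significance level. All function names are invented for the example.

```python
from math import comb
from statistics import NormalDist

ALPHA = 0.05   # assumed one-sided significance level
EFFECT = 0.5   # hypothetical true shift, in standard-deviation units


def z_test_power(n, effect=EFFECT, alpha=ALPHA):
    """Power of a one-sided z test (known variance) with n observations."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect * n ** 0.5)


def binomial_upper_tail(n, c, p):
    """Exact P(X >= c) for X distributed Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def sign_test_power(n, effect=EFFECT, alpha=ALPHA):
    """Exact power of a one-sided sign test of 'median = 0' with n observations."""
    # Smallest count of positive signs whose exact null probability is <= alpha.
    crit = next(c for c in range(n + 2) if binomial_upper_tail(n, c, 0.5) <= alpha)
    # Under the alternative, each observation is positive with this probability.
    p_positive = NormalDist().cdf(effect)
    return binomial_upper_tail(n, crit, p_positive)


for n in (20, 30, 60):
    print(n, round(z_test_power(n), 3), round(sign_test_power(n), 3))
```

Under these assumptions the sign test is weaker than the z test at equal sample size, yet with 60 cases it exceeds the power that the z test attains with 20 cases.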
The concept of power-efficiency is concerned with the amount of increase in sample size which is necessary to make test B as powerful as is test A with a given sample size. If test A is the most powerful known test of its type (when used with data which meet the conditions of its statistical model) and if test B is another test suitable for the same research design which is just as powerful with N_b cases as test A is with N_a cases, then

Power-efficiency of test B = (100) N_a / N_b per cent.

For example, if test B requires a sample of N_b = 30 cases to have the same power that test A has with N_a = 27 cases, then test B has power-efficiency of (100) 27/30 per cent, i.e., its power-efficiency is 90 per cent. This means that in order to equate the power of test A with the power of test B (when all the conditions of both tests are met by the data, and when test A is the more powerful), we need to draw 10 cases for test B for every 9 cases drawn for test A.

Relative to the t and F test, the nonparametric tests suitable for testing hypotheses analogous to those tested by the t and F tests vary in power-efficiency from 63 per cent to 100 per cent. The weaker ones are the tests which use classificatory data; for example, if scores suitable for treatment by the t test were split at their median and tested for differences in location by the median test, the power-efficiency of that test would be about 63 per cent. Many tests which use ranked data have power-efficiency around 95 per cent. The randomization tests, which are used when the scores have numerical meaning, have 100 per cent power-efficiency.

In considering these values, the reader is cautioned to remember that they compare the power of parametric and nonparametric tests when used with data for which the parametric tests are appropriate, i.e., when used with data which meet the assumptions and requirements of the statistical model for parametric tests. That is, when we say, for example, that the Mann-Whitney test has power-efficiency of about 95 per cent, we mean that when the Mann-Whitney test is used on two samples of scores which were independently drawn from normally distributed populations of equal variance, then with N = 40 it will reject H0 at the same level of significance that the t test will with N = 38. But if the two tests were both used with data from non-normal populations or with data from populations differing in variance, the Mann-Whitney test might very well reject H0 at a more stringent significance level than would a t test. Whitney (14) has shown that for some population distributions a nonparametric statistical test is clearly superior in power to a parametric one. With such data, we must rely on the inference based on the nonparametric test, for with such data a t test is inappropriate and therefore yields less meaningful results.

Advantages of Nonparametric Tests

At present, for a great many research designs, both parametric and nonparametric statistical tests are available.
We have suggested that the choice between the alternative tests suitable for a given research design should be based on three criteria: (a) the applicability of the statistical models on which the tests are based to the data of the research, (b) the level of measurement achieved in the research, and (c) the power-efficiency of each alternative test. In terms of these criteria, nonparametric tests have certain merits. The enumeration of these may serve as a summary of the arguments presented above, and will introduce certain additional considerations as well.
1. Probability statements obtained from most nonparametric statistical tests are exact probabilities (except in the case of large samples, for which excellent approximations are available), regardless of the shape of the population distribution from which the random sample was drawn. That is, a conclusion based on a nonparametric test does not carry stringent qualifiers, as does a conclusion based on a parametric test.
2. If samples as small as 6 are used, there is no alternative to using a nonparametric statistical test unless the nature of the population distribution is known exactly.
This is an advantage in pilot testing and in research with populations whose nature precludes the use of large samples (e.g., populations of persons having a rare form of illness).
3. There are suitable nonparametric statistical tests for treating samples made up of observations from several different populations. None of the parametric tests can handle such data unless seemingly unrealistic assumptions are made.
4. Nonparametric statistical tests are available to treat data which are inherently in ranks as well as scores which have merely the strength of ranks. In many fields of investigation, ordinal measurement is the strongest that usually can be achieved. This is the case, for example, in the behavioral sciences. Such data, as well as those for which only gross ordering (plus or minus, for example) can be achieved, can be treated by nonparametric methods, whereas they cannot be treated by parametric methods unless precarious, untestable, and perhaps unrealistic assumptions are made about the underlying distributions.
5. Nonparametric methods are available to treat classificatory data. No parametric technique applies to such data.
6. Nonparametric statistical tests are typically much easier to learn and to apply than are parametric tests.

The advantage which parametric tests hold over their nonparametric counterparts is, of course, that if all the assumptions of the parametric statistical model are in fact met in the research and if the measurement is of the required strength, then nonparametric statistical tests are wasteful of data. The degree of wastefulness in such cases is expressed by the power-efficiency of the nonparametric tests.

Table 2 indicates the variety of nonparametric tests which are now available, and shows the research design and the level of measurement for which each is useful. This list is by no means exhaustive, but an attempt has been made to include a diversity of tests and measures of association and to include those which are most commonly used.

To save space, citations for all tests are not given in this article. In most cases, Table 2 gives the names of the authors of the tests, and the reader may turn to (9) or (10) for references for these tests. The randomization tests were originated by Fisher; early work on their development was presented by Pitman and Welch, and more recently Kempthorne (4, 5) has made important contributions to them.
Included in Table 2 are citations for the two most recent tests.

The inclusion of several tests in the same category in Table 2 does not imply that the several tests are equivalent or interchangeable. For example, five tests are listed for use with k independent samples when ordinal measurement has been achieved. Each of these has a different application. The extension of the median test is useful when only incomplete ordering has been achieved, so that any observation may be classed either above or below the common median. The Kruskal-Wallis test is a more general test for data in which complete ordering has been achieved, and thus it is more powerful than the extension of the median test. Whitney's extension of the U test is not an analogue of the analysis of variance, as is the Kruskal-Wallis test, being a significance test for only three samples which tests the prediction that the three averages will occur in a specific order. The Jonckheere test is a test against ordered alternatives; the Mosteller technique tests whether one group has slipped significantly to the right of the others; and the k-sample runs test is sensitive to any sorts of differences among groups, not just differences in location.

Table 2. Nonparametric Statistical Tests and Measures of Correlation for Various Designs and Various Levels of Measurement

Nominal
  One-sample case: Binomial test; χ² test
  Two-sample case, related samples: McNemar test
  k-sample case, related samples or randomized blocks: Cochran Q test
  Independent samples: χ² test
  Nonparametric measure of correlation: Contingency coefficient

Ordinal
  One-sample case: Kolmogorov-Smirnov test; Runs test
  Two-sample case, related samples: Sign test; Wilcoxon test
  Two-sample case, independent samples: Festinger test; Kolmogorov-Smirnov test; Mann-Whitney U test; Median test; Moses test of extreme reactions; Wald-Wolfowitz runs test; White test; Wilcoxon test
  k-sample case, related samples or randomized blocks: Friedman test; Wilson test (15)
  k-sample case, independent samples: Extension of the median test; Jonckheere test; Kruskal-Wallis test; Mood runs test; Mosteller slippage test; Whitney extension of the U test
  Nonparametric measure of correlation: Cureton biserial rank correlation (1); Kendall rank correlation coefficient; Kendall partial rank correlation coefficient; Kendall coefficient of concordance; Moran multiple rank correlation; Spearman rank correlation coefficient; Olmstead-Tukey corner test

Interval
  Randomization test (entered under four of the designs)
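The exact probabilities claimed for nonparametric tests, and the randomization tests listed for interval data, can be illustrated with a short enumeration. This is a minimal sketch under stated assumptions, not the procedure of any cited author: the function name and the scores are hypothetical, the test is one-sided, and the statistic is the sum of the first sample, which for fixed sample sizes orders the possible splits in the same way as the difference between the two sample means.

```python
from fractions import Fraction
from itertools import combinations


def randomization_test(sample_a, sample_b):
    """One-sided exact randomization test for two independent samples.

    Under H0 every division of the pooled scores into groups of the
    observed sizes is equally likely. The returned value is the exact
    probability of a sum for group A at least as large as the one observed.
    """
    pooled = list(sample_a) + list(sample_b)
    observed = sum(sample_a)
    n_splits = 0
    n_as_extreme = 0
    # Enumerate every way of choosing which pooled scores belong to group A.
    for indices in combinations(range(len(pooled)), len(sample_a)):
        n_splits += 1
        if sum(pooled[i] for i in indices) >= observed:
            n_as_extreme += 1
    return Fraction(n_as_extreme, n_splits)


# Hypothetical integer scores, chosen so that every score in A exceeds every score in B.
group_a = [12, 15, 17]
group_b = [8, 9, 11]
print(randomization_test(group_a, group_b))
```

With three scores in each group there are 20 equally likely splits, and only the observed one gives group A a sum this large, so the exact one-sided probability is 1/20. No assumption about the shape of the population distribution enters the calculation, but the enumeration grows rapidly with sample size, which is why such tests are practical mainly for small samples.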
REFERENCES

1. Cureton, E. E., "Rank-Biserial Correlation," Psychometrika, 1956, 21, 287-290.
2. Dixon, W. J., and Massey, F. J., Introduction to Statistical Analysis. (2nd Ed.) New York: McGraw-Hill, 1957.
3. Fraser, D. A. S., Nonparametric Methods in Statistics. New York: Wiley, 1957.
4. Kempthorne, O., The Design and Analysis of Experiments. New York: Wiley, 1952.
5. Kempthorne, O., "The Randomization Theory of Experimental Inference," J. Amer. Statist. Assn., 1955, 50, 946-967.
6. Mood, A. M., Introduction to the Theory of Statistics. New York: McGraw-Hill, 1950.
7. Moses, L. W., "Non-Parametric Statistics for Psychological Research," Psychol. Bull., 1952, 49, 122-143.
8. Mosteller, F., and Bush, R. R., "Selected Quantitative Techniques." In G. Lindzey (Ed.), Handbook of Social Psychology. Vol. 1. Theory and Method. Cambridge, Mass.: Addison-Wesley, 1954. Pp. 289-334.
9. Savage, I. R., "Bibliography of Nonparametric Statistics and Related Topics," J. Amer. Statist. Assn., 1953, 48, 844-906.
10. Siegel, S., Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1956.
11. Smith, K., "Distribution-Free Statistical Methods and the Concept of Power Efficiency." In L. Festinger and D. Katz (Eds.), Research Methods in the Behavioral Sciences. New York: Dryden, 1953. Pp. 536-577.
12. Stevens, S. S., "On the Theory of Scales of Measurement," Science, 1946, 103, 677-680.
13. Tukey, J. W., The Simplest Signed-Rank Tests. Mimeographed Report No. 17, Statistical Research Group, Princeton University, 1949.
14. Whitney, D. R., "A Comparison of the Power of Non-Parametric Tests and Tests Based on the Normal Distribution under Non-Normal Alternatives." Unpublished doctor's dissertation, Ohio State University, 1948.
15. Wilson, K. V., "A Distribution-Free Test of Analysis of Variance Hypotheses," Psychol. Bull., 1956, 53, 96-101.