Measurement and Evaluation in Education - MA-Edu - ED-804 E - English - 21072017 PDF
EVALUATION IN EDUCATION
MA [Education]
Second Semester
EDCN 804E
[ENGLISH EDITION]
Authors:
Dr Harish Kumar and Santosh Kumar Rout: Units (1.2, 2.3-2.4, 2.5, 3.3, 4.8) © Dr Harish Kumar and Santosh Kumar Rout, 2016
Dr Manisha Dalabh: Units (1.3.1, 3.4-3.4.1, 3.4.3, 5.4-5.5) © Dr Manisha Dalabh, 2016
Dr Jasim Ahmad and Dr Aerum Khan: Units (1.4, 1.6, 2.2, 3.2, 4.3, 4.6) © Dr Jasim Ahmad and Dr Aerum Khan, 2016
JS Chandan: Unit (4.4, 4.6.1) © JS Chandan, 2016
CR Kothari: Unit (4.5, 4.7) © CR Kothari, 2016
Lokesh Koul: Unit (5.2, 5.3) © Lokesh Koul, 2016
Vikas Publishing House: Units (1.0-1.1, 1.3, 1.5, 1.7-1.11, 2.0-2.1, 2.4.1-2.4.2, 2.6-2.10, 3.0-3.1, 3.4.2, 3.5-3.9, 4.0-4.2, 4.9-4.13,
5.0-5.1, 5.6-5.10) © Reserved, 2016
Books are developed, printed and published on behalf of Directorate of Distance Education,
Tripura University by Vikas Publishing House Pvt. Ltd.
All rights reserved. No part of this publication, which is material protected by this copyright
notice, may be reproduced, transmitted, utilized or stored in any form or by any means now known
or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording
or by any information storage or retrieval system, without prior written permission from the DDE,
Tripura University & Publisher.
Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has
been obtained by its Authors from sources believed to be reliable and correct to the best of their
knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions
or damages arising out of use of this information and specifically disclaim any implied warranties of
merchantability or fitness for any particular use.
Unit - II
Major Tools and Techniques in Educational Evaluation: Different Types of Tests - Teacher-made vs. Standardized; Criterion-referenced vs. Norm-referenced Tests; Essential Qualities of a Good Measuring Instrument; Educational Tests; Measurement of Achievement - Construction of Achievement Tests and Standardization; Relative Merits and Demerits of Using Different Types of Test Items; Diagnostic Tests - Construction and Usefulness.
(Unit 2: Major Tools and Techniques in Educational Evaluation, Pages 47-73)
Unit - III
Acquaintance with Psychological Tests in the Areas of Intelligence, Attitude and Personality; Examination System - Current Strategies; Examination Reforms; Open Book Examination; Semester System.
(Unit 3: Psychological Testing, Pages 75-109)
Unit - IV
Statistical Treatment of Data: Frequency Distribution and Graphic Representation of Data; Measures of Central Tendency and Variability; Coefficient of Correlation by Rank Difference and Product Moment Methods; Percentile and Percentile Rank; Skewness and Kurtosis; Normal Probability Curve; Derived Scores (Z-score, Standard Score and T-score).
(Unit 4: Statistics in Measurement and Evaluation-I, Pages 111-209)
Unit - V
Reliability - Concept, Determining Factors, Methods of Determining Different Reliability Coefficients; Validity - Concept and Use, Types of Validity, Determination of Validity Coefficients, Relation between Validity and Reliability; Trends in Evaluation: Grading, Credit System, Cumulative Record Card; Computer in Evaluation.
(Unit 5: Statistics in Measurement and Evaluation-II, Pages 211-248)
INTRODUCTION
Quality of education has become very important in today's competitive environment.
There is definitely a need to adapt to change in educational processes in order
to improve. Educational measurement requires the establishment of a strong
feedback loop, with evaluation being a continuous process and not something left
until the end of the programme of study.
Measurement refers to the process by which the attributes or dimensions of
some physical object are determined. When used in the context of learning, it
would refer to applying a standard scale or measuring device to an object, series
of objects, events or conditions, according to practices accepted by those who
are skilled in the use of the device or scale. On the other hand, evaluation is a
complex and less understood term. Inherent in the idea of evaluation is ‘value’. It
involves engaging in some process that is designed to provide information that will
help us make a judgment about a given situation. Generally, any evaluation process
requires information about the situation in question.
This book Measurement and Evaluation in Education introduces the
concept of testing and evaluation to its readers. This book is crucial in explaining
how, even though different evaluation scales have been designed, the success is
not entirely determined by the efficiency of the test. More often than not, the right
test administered to the right individual makes all the difference. The way in which
tests are constructed and comprehended is crucial to the process of evaluation.
Different tests are designed to examine different attributes of intelligence, and their
evaluation is contingent on how well these tests have been understood. In other
words, the lucidity of a test, inclusive of the kind of questions, is an important
factor in determining its success and accuracy.
The book is divided into five units. It has been designed keeping in mind the
self-instruction mode (SIM) format and follows a simple pattern, wherein each
unit of the book begins with Introduction followed by Unit Objectives to the topic.
The content is then presented in a simple and easy-to-understand manner, and is
interspersed with Check Your Progress questions to test the understanding of the
topic by the students. A list of Questions and Exercises is also provided at the end
of each unit, and includes short-answer as well as long-answer questions. The Key
Terms and Summary sections are useful tools for students, meant for
effective recapitulation of the text.
Self-Instructional Material 1
UNIT 1 MEASUREMENT AND EVALUATION IN EDUCATION: OVERVIEW
Structure
1.0 Introduction
1.1 Unit Objectives
1.2 Concept of Measurement and Evaluation
1.3 Different Types of Measuring Scales
1.3.1 Need for Measurement and Evaluation in Education
1.4 Placement, Diagnostic, Formative and Summative Evaluation
1.5 Role of Teachers in an Evaluation Programme
1.6 Taxonomy of Educational Objectives
1.6.1 Specification of Objective Steps in Evaluation Process
1.7 Summary
1.8 Key Terms
1.9 Answers to ‘Check Your Progress’
1.10 Questions and Exercises
1.11 Further Reading
1.0 INTRODUCTION
Types of measurement
Measurement is of two types: (i) physical measurement and (ii) mental measurement
(also called psychological or educational measurement).
(i) Physical measurement: Physical measurement is the measurement of
objects that have an absolute existence. For example, we measure the height of
individuals, the weight of rice, etc. Here, we directly measure the height or
weight of an individual, and all the measuring tools of physical measurement
start from zero. Physical measurement is always accurate and quantitative,
and there are standard sets of tools for physical measurement all over the world.
(ii) Mental measurement: Mental measurement is also known as ‘educational
measurement’ or ‘psychological measurement’. It is always relative and
there is no absolute zero in case of mental measurement. For example, for
measuring the intelligence of a person, we have to take the help of intelligence
tests, which are subjective in nature. Through the person's responses, we can know
the level of intelligence of the person concerned. Mental measurement is
both qualitative and quantitative in nature, and there are no fixed tools for
such measurement i.e., the same set of tools may not be applied to different
types of persons.
The application of the principles of measurement in the field of education is
known as ‘educational measurement’. In the educational system, measurement is
the quantitative assessment of performance of the students in a given test. It can be
used to compare performance between different students and to indicate the strengths
and weaknesses of the students. It helps in classifying students into homogeneous
groups, in providing educational and vocational guidance, and in offering remedial measures
to low achievers. Measurement is a tool in the hands of educational psychologists
to study human behaviour. Educational psychologists take the help of different valid
and reliable psychological tests to know the level of different traits within an individual.
The different kinds of such tests are: intelligence test, achievement test, attitude
test, aptitude test, interest inventory, personality test, etc. The methods used for
these tests are: observation, interview, checklist, rating scale, examinations, cumulative
record card and anecdotal records etc.
In the teaching–learning situation, teachers should be competent enough to measure
the student’s achievement, intelligence, attitude, aptitude etc. To develop competency
among the teachers in educational measurement, Ebel has suggested the following
measures:
(i) Know how to administer a test properly, efficiently and fairly.
(ii) Know how to interpret test scores correctly and fully, but with recognition of
their limitations.
(iii) Know how to select a standardized test that will be effective in a particular
situation.
(iv) Know how to plan a test and write the test questions to be included in it.
(v) Know the educational uses as well as the limitations of educational tests.
(vi) Know the criteria by which the quality of a test should be judged and how to
secure evidence relating to these criteria.
Characteristics of a good measurement tool
To measure psychological traits with validity and reliability, the measuring
instrument or test should be free from personal errors, variable errors, constant
errors and interpretative errors. The important characteristics of a
good measuring tool are as follows:
(i) Should be valid: Validity of a test refers to its truthfulness. It refers to the
extent to which a test measures what it actually wishes to measure. Suppose
we want to know whether a Numerical Reasoning Test is valid. If it really
measures the reasoning ability, the test can be said to be valid.
(ii) Should be reliable: Reliability means the consistency of a measuring
instrument (how accurately it measures). It refers to the faithfulness of the
test. To express in a general way, if a measuring instrument measures
consistently, it is reliable. For example, a test is administered on English to the
students of class VI. In this test, Ram scores 50. After a few days, the same
test is administered and Ram scores 50. Here, the test is reliable because
there is consistency in the result.
(iii) Should be objective: Objectivity of a test refers to two aspects: (a) item
objectivity (i.e., objectivity of the items), and (b) scoring objectivity (i.e.,
objectivity of scoring). By ‘item objectivity’ we mean that each item of the
test must have a definite, single answer, so that if the answers are scored by different
examiners, the marks do not vary. Ambiguous questions, lack of proper
direction, double barrelled questions, questions with double negatives, essay-
type questions must be avoided because they lack objectivity. By ‘objectivity
of scoring’ we mean that by whomsoever scored, the test would fetch the
same score. Thus, mostly the objective-type questions should be framed to
maintain the objectivity of the test.
(iv) Should be usable and practicable: ‘Usability’ refers to the practicability of
the test. In the teaching–learning situation, by usability we mean the degree
to which the test (or the measuring tool) can be successfully used by teachers
and school administrators.
(v) Should be comprehensive and precise: The test must be comprehensive
and precise. It means that the items must be free from ambiguity. The directions
to test items must be clear and understandable. The directions for
administration and for scoring must be clearly stated so that a classroom
teacher can easily understand and follow them.
(vi) Should be easy in administering: If the directions for administration are
complicated, or if they involve much time and labour, users may be discouraged from adopting the test.
For example, Wechsler Adult Intelligence Scale (WAIS) is a good test, but its
administration is very difficult.
(vii) Should be economical: A measurement tool should not be too time-consuming.
The cost of the test must be reasonable so that schools and educational
institutions can afford to purchase and use it.
(viii) Should be easy in scoring: The scoring procedure of the test should be
clear and simple. The scoring directions and an adequate scoring key should be
provided to the scorer so that the test is easily scored.
(ix) Should be easily available: Some standardized tests are well-known all
over India, but they are not easily available. Such tests have less usability. It
is desirable that in order to be usable, the test must be readily and easily
available.
(x) Should have good and attractive get up/appearance: The quality of papers
used, typography and printing, letter size, spacing, pictures and diagrams
presented, its binding, space for pupil’s responses etc., need to be of very
good quality and attractive.
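The reliability described in point (ii) above, consistency of scores across two administrations of the same test, is commonly quantified as a test-retest correlation coefficient. The following is a minimal sketch in Python; the score lists are invented for illustration, echoing the example of Ram scoring 50 on both administrations.

```python
# Test-retest reliability estimated as the Pearson correlation between
# two administrations of the same test (illustrative scores only).
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first_attempt  = [50, 42, 38, 61, 55]   # scores on the first administration
second_attempt = [50, 40, 39, 63, 54]   # scores on the same test given later

reliability = pearson_r(first_attempt, second_attempt)
print(round(reliability, 3))  # close to 1.0, indicating a consistent test
```

A coefficient near 1.0 indicates that the instrument ranks the same examinees consistently across administrations; a low coefficient signals an unreliable test.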
Nature of Educational Measurement and Evaluation
Evaluation is an act or process that assigns ‘value’ to a measure. When we are
evaluating, we are making a judgment as to the suitability, desirability or value of a
thing. In the teaching–learning situation, evaluation is a continuous process and is
concerned with more than just the formal academic achievement of students.
Evaluation refers to the assessment of a student’s progress towards stated objectives,
the efficiency of the teaching and the effectiveness of the curriculum. Evaluation is
a broad concept dealing not just with the classroom examination system but also
with the cognitive, affective and psychomotor domains of students. The success
and failure of teaching depends upon teaching strategies, tactics and aids. Thus, the
evaluation approach improves the instructional procedure. Glaser’s basic model of
teaching refers to this step as a ‘feedback function’.
J.M. Bradfield defines evaluation as ‘the assignment of symbols to
phenomenon in order to characterize the worth or value of the phenomenon usually
with reference to some social, cultural and scientific standards’. Wright Stone stated,
‘evaluation is a relatively new technical term introduced to designate a more
comprehensive concept of measurement than is implied in conventional test and
examination’. Hanna defined evaluation as ‘the process of gathering and interpreting
evidence on change in the behaviour of all students as they progress through school’.
Evaluation takes place with the help of tests and measurements. In a classroom
situation, teachers first use classroom tests to evaluate students according to their
different traits. After getting the answer papers, teachers assign numerals to
them; this step is known as measurement. So measurement deals with
only quantitative description. After the measurement step, the teachers arrange
the students as first, second, third etc., according to their achievements. This step is
evaluation. So evaluation is a philosophical and subjective concept. It includes both
quantitative and qualitative descriptions, and value judgment.
Therefore, Evaluation = Quantitative Description (Measurement) and/or
Qualitative Description (Non-measurement) + Value Judgments.
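The sequence described above, scoring (measurement) followed by ranking and judgment (evaluation), can be sketched in a few lines of Python; the names, marks and pass mark here are invented purely for illustration.

```python
# Measurement: a quantitative score for each student (illustrative data).
marks = {"Ram": 50, "Sita": 72, "Amit": 64}

# Evaluation: ordering the measurements and attaching a value judgment.
ranked = sorted(marks.items(), key=lambda kv: kv[1], reverse=True)
for position, (name, score) in enumerate(ranked, start=1):
    # A hypothetical pass mark of 40 supplies the value judgment.
    judgment = "pass" if score >= 40 else "needs remedial help"
    print(position, name, score, judgment)
```

The numbers alone are the measurement; the ordering into first, second, third and the pass/fail judgment are what turn that measurement into an evaluation.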
Characteristics of evaluation
The characteristics of evaluation are as follows:
• It is a systematic process.
• It measures the effectiveness of learning experiences.
• It measures how far the instructional objectives have been achieved.
• It uses certain tools like tests, observation, interview etc.
• It is a continuous process.
• It is a subjective judgment.
• It is philosophical in nature.
• It includes quantitative description, qualitative description and value judgment.
• It gets data from measurement.
• It not only determines the magnitude, but also adds meaning to measurement.
• It involves values and purposes.
Evaluation from educational angle
Anything that needs to be evaluated has certain aims and objectives, and through
evaluation we assess how far these objectives have been fulfilled. From an educational
angle, we can evaluate many aspects which are the part and parcel of an educational
system such as:
(i) Evaluation of a school site (with reference to its location, building, hygienic
condition, strength of students and teachers etc.).
(ii) Evaluation of a school programme (school syllabus, cocurricular activities,
guidance programmes etc.).
(iii) Evaluation of teaching methods (with reference to aims, purposes, suitability
and efficacy).
(iv) Evaluation of total programme of instruction (with reference to cognitive,
affective and psychomotor domain).
(v) Evaluation of school administration; discipline, control, management and
organization.
(vi) Evaluation of textbooks and other course materials.
(vii) Evaluation of students’ growth etc.
The steps involved in an evaluation process take place in a hierarchy. These steps
are:
• Evaluating
• Planning of appropriate learning experiences
• Selecting appropriate teaching points
• Specification of desired student behaviour
• Identification and definition of specific objectives
• Identification and definition of general objectives
CHECK YOUR PROGRESS
1. Define measurement.
2. What do you understand by objectivity of a test?
3. How does J.M. Bradfield define evaluation?
(ii) Qualitative assessment
The qualitative assessment description implies observation by teachers and the records
maintained by them pertaining to various facets of student personality and performance
in the school. The systematic record of statements by teachers about the special
achievement or exceptional ability of students in non-scholastic areas, or about their
exceptional behaviour, is one illustration of the qualitative assessment description.
(iii) Teachers’ opinion
Teachers form opinions about the students on the basis of conclusions drawn by
them from quantitative measures and a qualitative description of the behaviour of
students. The opinion of a teacher arrived at on the basis of qualitative and quantitative
measurement gives a comprehensive picture of the students’ progress. Though it is
an evaluation in an informal setting, it conveys information about the students'
performance on some vital issues.
Importance of Continuous and Comprehensive Evaluation
It does not suffice in the modern era that one is good only in academics. There are
many opportunities for students who somehow could not perform satisfactorily
in academics. These activities need to be evaluated on a regular basis and are classified
into scholastic and non-scholastic areas. In order to check the level of achievement
and performance both in scholastic and non-scholastic areas, it is imperative that
continuous and comprehensive evaluation should be practised in schools.
Evaluation on a regular basis should be used to monitor individual progress
and to show the achievement level of each student in scholastic and non-scholastic
areas throughout the year. With the help of such an evaluation, it can be assessed
whether the objectives mentioned in the curriculum have been achieved or not. The
objectives are set in order to improve not only the cognitive domain in the students
but also the affective and psychomotor domains. This is so because these domains
are complementary to each other and the absence of any one of these can create a
sense of lack in the life of an individual.
The assessment of the achievement and performance in all the three domains
is ‘comprehensive’ evaluation. The students’ education is a continuous process which
is a result of their continuous learning both in formal and informal settings. They go
through various experiences such as teaching, learning, practical observations, self
study, etc. The expected behavioural changes in children are reinforced by their
direct involvement and self-learning. Objective records of such changes are required
to be maintained from time to time. They form a major part of continuous evaluation.
It is obvious that evaluation of behavioural modifications observed in the students’
personalities are to be undertaken continuously. These modifications may be related
to the development of intellect, emotions and practical skills.
The evaluation in school should aim at assessing these modifications, stated
in terms of behavioural objectives, continuously and comprehensively through formal
or informal techniques. The techniques to be used may be in the form of a written
examination or observation tests, group work or individual assignments. Provision
has to be made for their periodic evaluation. In the course of the teaching–learning
process, it is of paramount importance to aim for a continuous and comprehensive
evaluation that uses formal or informal assessment. These assessments may be
based on observations that take into account the pupil's overall progress,
made by asking pupils oral or written questions individually or in a group.
The national curriculum framework of school education prescribes social skills
and qualities that all children need to develop regardless of their academic aptitude.
In other words, even if the student is not able to compete academically, he or she
can explore other capabilities such as arts, humanities, sports, athletics, music,
etc. Excelling in any field can increase the students' confidence manifold and may
even inspire them to put in more efforts into the curricular studies. Students, with
the help of this revolutionary evaluation system, can finally understand that their
competition is with themselves and not with the others.
Not just children, even teachers benefit from this kind of evaluation, since
they are encouraged to take important decisions regarding the learning process,
its efficiency, quality assurance and accountability.
Continuous and comprehensive evaluation helps to empower teachers in the following
ways:
• They get pointers on how to improve the evaluation procedure.
• They can formulate ways to help students, through planned remedial steps,
master competencies in which they are falling behind.
• They can select and deploy media such as audio visual aids for teaching in
classrooms, especially subjects which cannot be understood purely with
theoretical learning.
To sum up, following are the reasons why continuous and comprehensive evaluation
is important to improve the overall education system in India:
• For making sure that learning in schools is not just about rote methods
• For ensuring that the examination system is more flexible and tests capabilities
of a wider range
• For making sure the curriculum enriches the individual as a whole besides
promoting academic learning
• For ensuring that classroom knowledge is relevant to life outside as well
1.4 PLACEMENT, DIAGNOSTIC, FORMATIVE AND SUMMATIVE EVALUATION
The goals of placement assessment are to determine for each student the
position in the instructional sequence and the mode of instruction that is most beneficial.
For example, the B.Ed. entrance test is conducted to give admission to students
in the B.Ed. course. This type of evaluation is called ‘placement evaluation’.
(iii) Diagnostic Evaluation
It is concerned with the persistent learning difficulties that are left unresolved by the
corrective prescriptions of formative assessment. It aims at identifying or diagnosing
the weaknesses of students in a given course of instruction. Diagnostic evaluation
involves the use of specially prepared diagnostic tests and various observational
techniques. The aim of diagnostic assessment is to determine the causes of persistent
learning problems of students and to formulate a plan for remedial action. When a
teacher finds that in spite of the use of various alternative methods and techniques,
the student still faces learning difficulties, he takes recourse to a detailed diagnosis.
This type of evaluation includes vision tests, hearing tests and other tests used to
determine how the student approaches a reading assignment; it also examines
whether the student relies on pictures, sound or context clues, skips over unfamiliar
words, etc.
(iv) Summative Evaluation
As the name indicates, summative evaluation is done at the end of a course, semester,
or a class or topic. It is meant to evaluate the quality of the final product and to find
out the extent to which the instructional objectives have been achieved.
No remedial teaching is given after summative evaluation. The process of
certification is done on the basis of the results of summative evaluation. Results of
this evaluation reflect the effectiveness of the curriculum transaction process.
Important examples of summative evaluation are annual exams, semester-end exams
and terminal exams. It is much more concerned with judging the final
product. The important tools of summative evaluation are achievement tests, rating
scales, project evaluation by experts, interviews, viva-voce examinations, etc. The
characteristic features of summative evaluation are as follows:
• This evaluation is conducted at the end of a topic, class, chapter, unit or
course of instruction.
• Evaluation results give the final progress of the students in a class, in a
topic, in a unit, in a course or in any educational programme.
• Summative evaluation results are used for preparing merit list, finalizing
position, taking decisions of pass/fail/promotion and awarding degrees or
diplomas.
Difference between formative and summative evaluation
Table 1.1 summarizes the differences between formative and summative evaluations.
Table 1.1 Difference between Formative and Summative Evaluations

Formative Evaluation:
1. Conducted during the process of teaching and learning, e.g., during the class, during the semester or a session.
2. Determines the level of achievement in a small task learned in a short duration.
3. Conducted regularly during the class, course or session.
4. Gives limited generalization.

Summative Evaluation:
1. Conducted at the end of the process of teaching and learning, e.g., at the end of the class, at the end of the semester, at the end of a session, etc.
2. Determines the level of achievement in a major task learned in a longer duration.
3. Conducted at the end of the course, session or programme.
External Evaluation
The evaluation procedure in which the evaluators or examiners are invited from
outside is called external evaluation. The teachers who are teaching a particular
group of students are not involved in the evaluation of their students. Hence, the
teachers and the examiners are different; they are never common. The teacher
who is teaching a particular group of students or class is not the examiner of the
same group of students. Their performances or achievements are judged or evaluated
by outside teachers.
External evaluation includes all board examinations, viva voce conducted by outside
experts, annual examinations in schools in which the paper comes from outside and
is not set by the teachers of the same school, etc.
The significance of external evaluation is that it eliminates the element of bias from
the evaluation system.
Internal Evaluation
Internal evaluation is the evaluation in which the teacher and the examiner are the
same person. The same teacher teaches a particular subject and himself or herself
sets the paper and evaluates the achievement of the students. No external expert is
invited in this type of evaluation.
Class tests, unit tests, weekly tests, monthly tests, quarterly tests, etc. are examples
of internal evaluation.
Significance
• It helps in finding out students’ progress.
• Remedial teaching can be organized for weaker students.
• Teachers themselves come to know about their own and their students' weak areas.
• It helps in building a strong foundation for students.
CHECK YOUR PROGRESS
3. The interests children develop.
4. The attitudes children manifest.
If education imparted is effective, then the child will behave differently from
the way he did before he came to school. The pupil knows something of which he
was ignorant before. He understands something which he did not understand before.
He can solve problems he could not solve before. He can do something which he
could not do before. He revises his attitudes towards things in desirable ways.
These objectives must involve points of information, the skills and attitudes to
be developed and interests that could be created through the particular topic or
subject taken up for work in the classroom. A statement of classroom objectives:
1. Serves as a basis for the classroom procedures that should provide for suitable
experiences to the children.
2. Serves as a guide in seeking evidence to determine the extent to which the
classroom work has accomplished what it set out to do.
A learning experience is not synonymous with the content of instruction or
what the teacher does. Learning results from the active reaction of the pupil to the
stimulus situation which the teacher creates in the class. A pupil learns what he
does. He is an active participant in what goes on in the class. Changes in a pupil’s
way of thinking and developing concepts, attitudes and interests have to be brought
about gradually. No single experience will result in the change. Many experiences,
one reinforcing another, will have to be provided. They may have to be repeated in
increasing complexity or levels in meaningful sequence extended over a period of
time. A cumulative effect of such experiences will evoke the desired change of
behaviour with reference to a specific objective.
The following considerations will be useful in the selection of such experiences:
1. Are they directly related to goals?
2. Are they meaningful and satisfying to the learners?
3. Are they appropriate to the maturity of the learners?
Why to Assess
1. To assess the achievement of the students. The abilities and the
achievements of the students must be assessed. Examinations are conducted
to discover whether or not the child has been able to acquire a certain amount
of knowledge or skill.
2. To measure personality. They are used to test the power of clear thinking,
quickness of mind, calmness and perseverance.
3. To measure the efficiency of teachers and of the school. They provide
a suitable occasion for the authorities to measure the efficiency of the teachers.
The efficiency of the institution is also measured through the examinations.
They provide a proper occasion to the teachers to know whether or not their
methods of teaching are appropriate.
4. To help in diagnosis. They help parents to know the progress of their
children. They help to discover the specific weak points of an individual or
class and thus give an opportunity to the teachers as well as to the taught to
remove these defects.
5. To act as an incentive. Stimulation to work hard is provided to the students
through the institution of examinations. Some objectives are placed before
the students, and for the realization of those objectives the students develop
the habit of constant hard work.
6. To help in prognosis. The examinations have a prognostic value as well.
With this device, the aptitudes of the students are determined.
7. To provide uniformity of standard. The external examinations help address
the problem of uniformity of standards attained by the students of different
institutions.
8. To help in guidance. They facilitate the work of grouping individuals for the
purpose of teaching by bringing those together who have more or less the
same attainment.
9. To measure fitness for admission to higher courses. They are designed
to determine the capacity and fitness of the candidate to pursue higher courses
of general or professional study or training. Examinations which serve this
purpose are called entrance or qualifying examinations.
10. To help in selection by competition. The examinations are also conducted
to select the best candidates for appointment to public service or for award of
prizes and scholarships.
11. Study of every subject. The students study all the subjects, including what
are known as 'dull' ones, for fear of examinations.
12. Parents’ point of view. Examinations provide an opportunity to the parents
to know where their wards stand and whether money spent on them is being
utilized properly.
13. To certify competency. A technological society needs qualified doctors,
engineers, scientists and so on, and the examination system certifies their
competence, even if inadequately.
14. To link schools with the world. Examinations fulfill an important role in
linking schools with each other and with the world.
The three domains are not divided into entirely watertight compartments. Each
of the three domains is interrelated, and achievement in one domain influences
achievement in the others. However, all domains develop in a hierarchical
order and together constitute the whole personality of an individual. If an individual
develops all three domains in any area of study, he or she has a well-rounded
personality; if any one domain is underdeveloped, his or her personality has not
been fully groomed. 'Hierarchical' means that learning at a higher level
in any domain depends entirely on having acquired the prerequisite knowledge and
skills at the lower levels. The three domains may be represented briefly as:
1. Cognitive domain (about knowing: intellectual aspects of the personality of
an individual)
2. Affective domain (about attitudes, feelings, interests, values and beliefs of an
individual)
3. Psychomotor domain (about doing: skill aspects of the personality of an
individual)
All the three domains have their own taxonomy or classification. In each domain,
the levels of expertise or abilities are arranged in order of increasing complexity, i.e.,
in a hierarchical order of difficulty. Learning outcomes that require higher levels
of expertise call for more effective teaching and more sophisticated classroom
techniques and methods of teaching.
Dimensions of learning form a framework of learning focused on instructional
planning that keeps in view cognition (the awareness part) and
learning in practical classroom situations. This framework serves three major purposes.
These are as follows:
1. It provides a process for planning and delivering curriculum and instruction
that integrates much of the research on effective teaching and learning.
2. It offers a way of integrating the major instructional models by showing how
they are connected and where the overlaps occur.
3. It provides a framework for organizing, describing and developing research-
based teaching strategies that engage students in the types of thinking that
can lead to meaningful learning.
The following five aspects of learning should be considered while finalizing
curriculum, instruction and assessment:
• Attitudes and perceptions about learning
• Using knowledge meaningfully
• Identifying productive habits of the mind
• Acquiring and integrating knowledge
• Extending and refining knowledge
I. Cognitive Domain
Cognitive domain includes those objectives of education which attempt to develop
mental faculties or intellectual abilities, i.e., the ability of knowing, understanding,
thinking and problem solving. It develops our factual knowledge, conceptual
understanding and all levels (lower, middle and higher) of thinking. It covers the
whole range of mental abilities and mental operations.
Classification of cognitive domain
The classification of the cognitive domain was done in 1956 by Bloom and is commonly
known as Bloom's Taxonomy of the Cognitive Domain (Bloom et al., 1956). Cognitive
taxonomy or classification of cognitive domain has knowledge-based goals. This
domain has been classified into six major categories which are arranged in a
hierarchical order based on the levels of complexities of cognitive or mental or
intellectual tasks or operations. These are arranged from simple to complex and
concrete to abstract starting from category one to category six, respectively.
1. Knowledge: This is the first and the lowest level of the cognitive domain. It
represents memory and constitutes recall and recognition of various facts,
concepts, principles, theories and laws of physical science. Nothing is added
or deleted in this category; we simply recall and recognize things. In the
revised Bloom's taxonomy, this category has been given the new name 'remembering'.
Example: The symbol of iron is ‘Fe’.
The chemical formula of sulphuric acid is ‘H2SO4’.
Laws of motion were given by Isaac Newton.
A magnet has two poles: N (north) and S (south).
2. Understanding: This is the second level of the cognitive domain and develops
only after the first category, i.e., knowledge or remembering, has developed
in a particular area of study, such as physical science. Learners are expected
to go beyond the level of recall and recognition. After having developed
understanding of a topic, learners become capable of doing the
following major tasks, which in turn indicates that they have acquired
the level of understanding of the given topic:
(i) Translate, summarize or define the acquired knowledge in their own
words.
(ii) Describe, elaborate, extrapolate and explain natural phenomena or
events or process or method, etc.
(iii) Interpret the acquired information or knowledge in their own way and
give their own examples. They can discriminate or differentiate between
two or many objects or concepts. Classify and categorize various objects
into groups on the basis of some criteria. Verify and generalize facts
and concepts.
Example: After having understood the structure of the atom, learners not only
recall protons, electrons and neutrons but also describe the structure of an
atom. Learners can now also explain why an atom is neutral, with the help of
the charge carried by each fundamental particle and the numbers of all three
particles in an atom.
3. Application: After having acquired knowledge and understanding levels of
any topic (may be a fact, concept, principle, theory or law), learners should
be able to apply them in their day-to-day lives. Application of any concept,
principle, theory or law in daily life and solving problems of varied nature is
impossible without its knowledge and understanding. Unless the learner is
able to apply whatever knowledge and understanding he or she has acquired,
it has no meaning at all and indicates that the learner has not understood
the content properly. By applying or implementing the gained knowledge and
understanding of various contents, one can solve many problems of daily life
under concrete and abstract situations.
Example: If learners know and understand the importance of natural
resources, the underground water crisis, the electricity supply and demand relationship
and other such problems of daily life, they will take care of these things in
their day-to-day life and, by applying this understanding, will try to minimize
wastage of water and electricity in their homes, schools and society through proper
and judicious use of these things.
4. Analysis: This is the fourth and a higher-level category of cognitive abilities. At
this stage, learners develop the potential to analyse and break down the whole
into its various components or constituents and to detect the relationship and
organization of those components. Learners develop the ability to break
a law or theory into the various inherent facts, concepts and principles on the
basis of which that theory or law has been created or proposed.
Example: Learners are taught about the laws of motion. Suppose they know
and understand the third law of motion which states, ‘to every action there is
an equal and opposite reaction’. They have also developed the ability to apply
this knowledge and understanding in their daily lives. If the analytical ability
has developed, they will be able to analyse this law in various likely situations.
They will also be able to describe each of its concepts, such as
action and reaction. One can analyse anything if he or she has knowledge
and understanding of that thing and also has the potential to apply it. In the
process of analysis, three tasks are performed in general. These are as follows:
(a) Analysis of elements or constituents making the whole
(b) Analysis of relationship among various constituents
(c) Analysis of the organizational patterns of the constituents
5. Synthesis: This is the process of putting together various constituents to
make a whole. This is a higher level thinking ability and is complex in nature,
which involves the creation of a new pattern or structure by manipulating
various constituents. It has the elements of creativity attached with it.
Development of creative personality requires this level of cognition to be
achieved by the learners. All creative people have this ability in common.
Synthesis involves the following three things:
• Development of a unique communication
• Development of a plan, procedure or proposed set of operations
• Development of a set of abstract relations
6. Evaluation: This is the process of judgment about the worth or value of a
process or a product. It includes all the content, i.e., facts, concepts, principles,
theories and laws of physical sciences. It is the highest and the most complex
level of cognitive ability and involves all the five categories discussed earlier.
It is a quantitative as well as a qualitative process. It leads to the development
of decision-making ability among the learners and involves judgment in terms
of internal as well as external criteria.
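The hierarchical ordering of these six categories can be made concrete with a small data structure. The sketch below is illustrative only (it is not part of the original text): it encodes the rule, stated earlier, that mastery at any level presupposes having acquired all the lower levels first.

```python
# Illustrative sketch: Bloom's six cognitive levels stored in their
# hierarchical order, simple to complex, with a helper that reflects the
# prerequisite rule (mastery at a level presupposes all lower levels).
BLOOM_LEVELS = [
    "knowledge",      # 1. recall and recognition
    "understanding",  # 2. translate, interpret, explain
    "application",    # 3. use in concrete and abstract situations
    "analysis",       # 4. break the whole into components
    "synthesis",      # 5. combine parts into a new whole
    "evaluation",     # 6. judge worth against criteria
]

def prerequisites(level: str) -> list[str]:
    """Return the lower levels a learner must master before `level`."""
    return BLOOM_LEVELS[:BLOOM_LEVELS.index(level)]
```

For instance, `prerequisites("application")` returns `["knowledge", "understanding"]`, mirroring the claim that application is impossible without knowledge and understanding.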
Assessment of cognitive learning
Cognitive learning is defined as the acquisition of knowledge and skills through mental
processes. Cognition includes the representation of physical objects and events and other
information processing.
Cognitive learning is a result of listening, watching, touching or
experiencing. Cognitive learning is mainly awareness of the environment,
the environment that provides some meaning to acquired knowledge.
After these activities, it is important to process and remember the information.
No motor movement is needed in cognitive learning; the learner is active only in
a cognitive way. Cognitive learning includes symbols, values, beliefs and norms.
When we say the word ‘learning’, we usually mean ‘to think using the brain’.
This basic concept of learning is the main viewpoint in the Cognitive Learning Theory
(CLT). The theory has been used to explain mental processes as they are influenced
by both intrinsic and extrinsic factors, which eventually bring about learning in an
individual.
CLT implies that the different processes concerning learning can be explained
by analysing the mental processes first. It posits that with effective cognitive
processes, learning is easier and new information can be stored in the memory for a
long time. On the other hand, ineffective cognitive processes result in learning
difficulties that can be seen anytime during the lifetime of an individual.
Insight and motivation
In CLT, motivation and purpose are much the same. The learner’s goal is the end
result he anticipates and desires. The goal controls the behaviour of the learner. The
teacher's most important responsibility is to help the learner find worthwhile goals
that are clear and realistic. Teachers should recognize and use the prominence of
social and achievement motives in school learning. The teacher must know
what is familiar to students and then introduce elements of novelty, but not
too rapidly. A good teacher must pace his presentation to maintain interest and
attention in learning.
The teacher's management of conflicting motives may be an important factor
in students' success.
Guiding exploration and action
Cognitive theorists in the teaching of reading begin with thoughts that are interesting
and understandable to the learner. In every type of instruction, we start with meaningful
wholes.
• An attempt is made to focus attention on the elements and relationships that
determine the correct response.
• Teachers’ guidance must match the students’ level of thought or ways of
working. If a student has not advanced above the level of concrete thinking,
information presented symbolically will not help him.
• The teacher can help students to find purpose or order in learning.
The formation of concepts may be regarded as the organization of experience.
The teacher’s role is to use appropriate means to clarify the critical features of both
old and new experiences.
Thinking skills required in cognitive learning
The following thinking skills are required in cognitive learning:
1. Convergent thinking
Convergent thinking is a term introduced by the American psychologist Joy Paul
Guilford. It generally means the ability to give the 'correct' answer to standard
questions that do not require the utilization of much creative faculty. Divergent and
convergent thinking skills are both important aspects of intelligence, problem solving
and critical thinking. Bringing facts and data together from various sources and then
applying logic and knowledge to solve the problem to achieve an objective is known
as convergent thinking.
Assessing convergent thinking: Convergent thinking can be assessed by
administering standard IQ (Intelligence Quotient) tests, by various recognition or
knowledge tests, by logical discussions and by giving problems to the students.
2. Divergent thinking
Divergent thinking is thinking outwardly instead of inwardly. It is the ability to develop
original and unique ideas and then come up with a solution to a problem or achieve
an objective.
The goal of divergent thinking is to generate many different ideas about a
topic in a short period of time. It involves breaking a topic down into its various
component parts in order to gain insight into the various aspects of the topic. Divergent
thinking typically occurs in a spontaneous, free-flowing manner, such that the ideas
are generated in a random, unorganized fashion.
Self-analysis
The following questions may help to find out the potential of a learner:
• What are my activities during a normal day?
• What do I know about?
• How do I spend my time?
• What are my areas of expertise?
• What would I like to change in my world or life?
• What are my strongest beliefs and values?
• What am I studying in school?
• What bothers me?
• What do I like? What are my hobbies? What are my interests?
Topic analysis
The following questions may help in refining a large topic into a specific, focused
one:
• What are the most important aspects of some specific things?
• What are the effects of a thing?
• How has an object changed? Why are those changes important?
• What are the different aspects of anything you can think of?
• What are the smaller parts that comprise an object?
• What do I know about a thing?
• What suggestions or recommendations can be made about that thing?
• Is something good or bad? Why?
Techniques to stimulate divergent thinking
The following are some of the techniques used to stimulate divergent thinking:
• Brainstorming: This is a technique which involves generating a list of ideas
in a creative, but unstructured manner. The goal of brainstorming is to generate
as many ideas as possible in a short period of time. All ideas are recorded
during the brainstorming process.
• Recording ideas: By recording ideas, one can create a collection of thoughts
on various subjects that later becomes a source book of ideas.
• Free writing: The idea is to write down whatever comes to mind about the topic,
without stopping. This can help generate a variety of thoughts about a topic in
a short period of time.
• Mind or subject mapping: This involves translating brainstormed ideas into
the form of a visual map or picture.
3. Critical thinking
Critical thinking is described as reasonable reflective thinking focused on what to
believe and what not to believe. It is that mode of thinking — about any subject,
content, or problem — in which the thinker improves the quality of his or her thinking
by skillfully taking charge of the structures inherent in thinking and imposing intellectual
standards upon them. Michael Scriven, a British polymath and academic, and Richard
Paul, an internationally recognized authority on critical thinking, believe that critical
thinking is the intellectually disciplined process of:
• Analysing the situation
• Synthesizing two or more pieces of information
• Applying knowledge
• Active and skilful conceptualization
• Evaluating gathered information based on certain universal intellectual values
Characteristics of critical thinking: It comprises various modes of thinking,
such as scientific thinking, mathematical thinking, historical thinking, anthropological
thinking, economic thinking, moral thinking and philosophical thinking. The following
are its four characteristics:
1. It is self-guided, self-disciplined thinking which attempts to reason at the highest
level of quality.
2. Critical thinking of any kind is never universal in any individual; everyone is
subject to episodes of irrational thought.
3. It is affected by motivation.
4. Critical thinking can be seen as having two components: (a) a set of information
and belief generating and processing skills, and (b) the habit, based on
intellectual commitment, of using those skills to guide behaviour.
Importance of critical thinking skills: Critical thinking skills are essential
to:
• Learn how to approach problems logically and confidently
• Balance using both the right and left sides of the brain
• Make wise decisions in life
• Put oneself on the path to knowledge
The list of core critical thinking skills includes observation, interpretation,
analysis, inference, evaluation, explanation and meta-cognition. There is a reasonable
level of consensus among experts that an individual or group engaged in strong
critical thinking gives due consideration to the procedure involved.
Critical thinking calls for the ability to:
• Recognize problems
• Find workable means to solve those problems
• Comprehend and use the text with accuracy and clarity
• Interpret data to evaluate arguments
• Recognize unsaid assumptions
• Understand the importance of steps in problem solving
• Collect relevant information
• Recognize the existence (or non-existence) of logical relationships between
propositions
• Draw conclusions
• Cross-check conclusions
4. Problem solving
Problem solving is a mental process that involves discovering, analysing and solving
problems. The ultimate goal of problem-solving is to overcome obstacles and find a
solution that best resolves the issue.
Formalized learning theory developed in the late 1930s, when proponents of various
approaches attempted to build their own theories to explain the problems of learning.
A theory of learning cannot be defined to satisfy all interested persons. We can
quote the definition of a theory as ‘a provisional explanatory proposition or set of
propositions, concerning some natural phenomena and consisting of symbolic
representations of: (a) the observed relationships among independent and dependent
variables, (b) the mechanisms or structures presumed to underlie such relationships,
or (c) inferred relationships and underlying mechanisms intended to account for
observed data in the absence of any direct empirical manifestations of the
relationships’ (Learning Theories edited by Melvin H. Marx).
Approaches to problem solving
Traditionally, two different approaches have been mentioned by psychologists,
adhering to two families of learning theories: (a) cognitive field theory, and
(b) stimulus-response theory.
Cognitive field theory emphasizes the importance of perceiving the total situation
and the relationships among its components, and of restructuring the cognitive field.
The German psychologist and phenomenologist Wolfgang Köhler conducted his classical
experiments on a chimpanzee named Sultan to study the process of problem solving
in animals. From his study on problem solving, he proposed that the solution of a
problem is arrived at all of a sudden, after some initial efforts by the individual.
Many studies conducted on children and adults confirm that the solution of a problem
is reached, all of a sudden, through insight into the situation.
The second point of view has been advanced by stimulus-response theorists
who emphasize the importance of trial and error. They hold that a problem is solved
through a gradual process of elimination of errors and putting together correct
responses. There has been considerable controversy as regards the superiority of
one approach over the other as an interpretation of problem solving. Some
psychologists are of the opinion that the cognitive field approach is most effective
for solving problems which require higher mental processes, while the stimulus-response
approach is effective for solving simple problems.
To resolve the controversy between the cognitive and stimulus-response theorists'
approaches, the American psychologist Harry Harlow (1959) proposed a third explanation.
His approach is more realistic and rational in nature. He conducted a series of
experiments on monkeys and human subjects of low mental abilities. He presented
his human subjects with simple problems of discrimination. He observed that in the
beginning his subjects showed trial-and-error behaviour in solving a series of problems,
but when new problems of a similar type were later presented to the subjects
for the first time, they made the correct discrimination at once. This later stage
appears to be insightful learning, that is, suddenly getting the problem solved. According to Harlow,
the underlying assumption is that in the previous trial and error learning, the subjects
have learned ‘how to learn’. They acquired what he called a learning set. They
acquired a method of learning that transferred positively to other problem situations
of similar type.
Harlow says, ‘Generalizing broadly to human behaviour, we hold that original
learning within an area is difficult and frustrating, but after mastery of the basic
facts, learning within the same area becomes simple and effortless.’
The steps in problem solving
In order to correctly solve a problem, it is important to follow a series of steps. Many
researchers refer to this as the problem-solving cycle, which includes developing
strategies and organizing knowledge. The following are the steps required in the
process of problem solving:
1. Identifying the problem: While this sounds like the simplest thing to do, it
actually is not. It is very common for people to identify the problem incorrectly,
which obviously renders all subsequent efforts useless.
2. Defining the problem: Once the problem has been correctly discovered,
the next step is to define it. Defining the problem carefully offers solutions
and insights into the problem.
3. Forming a strategy: Next, a strategy must be devised in keeping with the
individual's mindset, attitude, experience and available resources.
4. Organizing information: The strategy should ideally lead to a solution, but
first the information needs to be organized in a coherent manner, such that it
leads naturally to the best possible solution.
5. Allocating resources: Resources should be allocated according to the
importance of the problem, so that the available resources can be used effectively
to find a solution.
Problem solving techniques
These techniques are usually called problem-solving strategies. These are as follows:
• Brainstorming: Suggesting a large number of solutions or ideas and combining
and developing them until an effective solution is found
• Abstraction: Solving the problem in a model of the system before applying it
to the real system
• Assessing the output and interactions of an entire system
• Lateral thinking: Approaching solutions indirectly and creatively
• Divide and conquer: Breaking down a large, complex problem into smaller,
solvable problems
• Employing existing ideas or adapting existing solutions to similar problems
• Analogy: Using a solution that solved an analogous problem
• Hypothesis testing: Assuming a possible explanation to the problem and
trying to prove one’s perspective
• Synthesizing seemingly non-matching characteristics of different objects into
something new
• Transforming the problem into another problem for which solutions exist
• Eliminating the cause of the problem
• Testing possible solutions until the right one is found
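The last strategy, testing possible solutions until the right one is found, is often called generate-and-test. The sketch below is illustrative only; the candidate range and the test used are invented for the example.

```python
# Illustrative sketch of the generate-and-test strategy: try candidate
# solutions in turn and keep the first one that passes the test.
def generate_and_test(candidates, is_solution):
    """Return the first candidate satisfying `is_solution`, else None."""
    for candidate in candidates:
        if is_solution(candidate):
            return candidate
    return None  # no candidate solved the problem

# Example (invented): find the first number whose square exceeds 50.
answer = generate_and_test(range(1, 20), lambda n: n * n > 50)  # → 8
```

The same skeleton also fits hypothesis testing: each candidate is a possible explanation, and `is_solution` is the attempt to confirm it.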
5. Decision making
Decision making can be regarded as the mental processes resulting in the selection
of a course of action among several alternative scenarios. The end result of each
decision making process is a final selection. The output can be an action or a suggestion.
Steps in decision making
The following are the steps that are to be followed in the decision-making process:
• Objectives are to be established first.
• Objectives must be classified and placed in the order of importance.
• Alternative actions must be developed.
• The alternatives must be evaluated.
• A tentative decision can be made.
• The tentative decision is evaluated and analysed.
• A few further steps result in a decision model, which can be used to
determine an optimal plan.
• In a situation of conflict, role-playing can be used.
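One simple way to turn objectives ranked by importance and evaluated alternatives into a decision model is a weighted scoring matrix. The sketch below is illustrative only; the objectives, weights and alternative scores are invented for the example.

```python
# Illustrative decision model: a weighted scoring matrix. Objectives are
# weighted by importance; each alternative is scored against each objective.
weights = {"cost": 3, "quality": 2, "speed": 1}   # order of importance

alternatives = {
    "plan_a": {"cost": 4, "quality": 3, "speed": 5},
    "plan_b": {"cost": 5, "quality": 4, "speed": 2},
}

def total_score(scores):
    """Weight each objective's score by its importance and sum them."""
    return sum(weights[obj] * s for obj, s in scores.items())

# The optimal plan is the alternative with the highest weighted total.
best = max(alternatives, key=lambda a: total_score(alternatives[a]))
```

Here `plan_a` totals 3*4 + 2*3 + 1*5 = 23 and `plan_b` totals 25, so the model selects `plan_b`.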
Each step in the decision-making process may involve social, cognitive and
cultural problems. It has been suggested that becoming more aware of these obstacles
allows one to better anticipate and overcome them. The following steps can help
in establishing good decision making:
• Creating and nurturing the relationships, norms and procedures that will
influence how problems are understood and communicated.
• Recognizing that a problem exists.
• Identifying an explanation for the problem and evaluating it.
• Finding the most justified response among the many possible responses.
• Following through with action that supports the more justified decision.
Integrity is supported by the ability to overcome distractions and obstacles,
and by developing implementation skills and ego strength.
Decision-making stages
There are four stages involved in all group decision making. These stages,
sometimes called phases, are important for the decision-making process to begin.
They were developed by the communication researcher B. Aubrey Fisher. The four
stages are as follows:
1. Orientation stage: This is the introductory stage, when members meet for the
first time and get to know each other.
2. Conflict stage: Once group members become familiar with each other,
disputes, little fights and arguments take place. Nevertheless, group members
eventually work it out.
3. Emergence stage: The group begins to clear up vague opinions by talking
about them.
4. Reinforcement stage: Members finally make a decision, while justifying to
themselves that it was the right decision.
II. Affective Domain and Formulation of Specific Objectives
The affective domain of Bloom’s taxonomy of educational objectives is related to
the development of emotions, values and attitudes, and of those aspects
of personality which are influenced more by the heart than by the mind. It also includes the
development of interests, appreciation, feelings, likes and dislikes towards something.
Classification of Affective Domain
The classification of this domain was done by American educational psychologists
D. R. Krathwohl, B. S. Bloom and B. B. Masia in 1964. The categories in this
domain are also arranged hierarchically from the lowest to the highest level of
complexity.
1. Receiving: This is the ability, inclination and readiness of learners to receive
information. It requires attention, awareness, listening, seeing and willingness
on the part of the learners. These are preconditions of learning, personality
development, and imbibing culture and values. It needs sensitization of learners
to stimuli, phenomena or environment. On the whole, learners should be made
receptive in their habit and attitude. Whatever you want learners to learn, you
should make them receptive toward those things.
Examples:
• Reading newspapers, magazines, journals, books, reports, etc., of interest
to the learner
• Watching news, shows, reports, programmes as per interest
• Listening patiently and attentively to teachers, parents, seniors, friends
and more experienced persons
• Having curiosity to learn from various sources
2. Responding: This is the second level objective under the affective domain.
Learners are required to be responsive along with being receptive; otherwise
it will not serve the purpose. Responding behaviour reflects that the learners
are receiving or trying to receive. Continuity in attention and motivation
behaviour (receiving) leads to the development of responding behaviour.
This category of ability is represented by interest, which is the tendency to
respond to a particular object or event or situation. This creates the way for
two-way communication and facilitates the process of teaching and learning.
Students ‘listen’ to the teachers attentively and ‘respond’ to them to give their
reflection and share their experiences.
Examples:
• Response of students in class
• Interaction of students with teachers, friends and seniors on various
issues or problems
• Visit to clubs, libraries, museums and other knowledge resource centres
• Participation in various activities, competitions, seminars, conferences,
crosswords and other such programmes
3. Valuing: During the cyclic process of receiving and responding, learners are
automatically inclined towards making value judgments regarding the things
they are concerned with. These things may be an object, an event, an idea, a
rule, a ritual, a set norm or any traditional or modern aspect of our culture.
Through the process of valuing, individuals set guidelines for regulating their
own behaviour. Character formation or value inculcation in the growing
generation is done through the following three sequential steps:
(a) Value acceptance
(b) Value preference
(c) Value commitment
Example: A class is taught by several teachers. All teachers practice various
values, some of which are common and some unique to individual teachers.
Students attend their classes and interact with them. They observe and analyse
the various values being practiced by their teachers. Through regular observation
and analysis, students develop their own values based on their preference,
acceptance and commitment.
4. Organization: Through the process discussed above, students absorb various
values from their teachers, parents and society. They analyse the values
absorbed from different sources and finally construct a relatively enduring value
system through the synthesis and organization of values into a balanced pattern
of conduct and behaviour. This leads to the development of a set value structure,
or philosophy of life, for every individual. It assists individuals in making
decisions about conduct in real-life situations and in forming opinions on major
issues of social and personal concern.
5. Characterization of values or value complex: This is the highest-level
category of objectives under the affective domain. At this level, individuals
develop a set of values, attitudes and beliefs for themselves that build their
character and give shape to their philosophy and personality. This process
continues throughout life, resulting in shifts in the preference for various
values depending on situation, age and experience.
Assessment of Affective Learning
The affective domain given by Krathwohl, Bloom and Masia includes the manner in
which we deal with things emotionally, such as feelings, values, appreciation,
enthusiasm, motivations and attitudes. The five major categories are listed from the
simplest behaviour to the most complex in Table 1.2.
Table 1.2 Five Major Categories Describing the Affective Domain
Attitude and Values
‘An attitude can be defined as a positive or negative evaluation of people, objects,
events, activities, ideas, or just about anything in your environment.’
—Philip George Zimbardo, 1999
In the opinion of American sociologist Read Bain, attitude is ‘the relatively
stable overt behaviour of a person which affects his status’. An attitude is a state of
mind or a feeling or disposition. It is important to have a positive attitude about work.
Values
Affective learning also involves internalizing a set of values expressed in behaviour.
Teachers affect the values and behaviour in a student by setting examples by their
own behaviour.
Attitude formation
Attitudes are expected to change as a function of experience, environment and
education. American psychologist Abraham Tesser has argued that hereditary
variables may also affect attitudes.
Measurement of attitude
A number of techniques for measuring attitudes are in use. However, they all suffer
from different kinds of limitations. Largely, the different types of techniques focus
on the components of attitudes, namely the cognitive, the affective and the behavioural
components. The two basic categories that attitude measurement methods can be
divided into are as follows:
1. Direct measurement, such as Likert scale
2. Indirect measurement, such as projective techniques
Direct observation
This is a simple and logical method which records the behaviour patterns of people
under study. This method is widely used for various purposes. However, even if the
individuals to be studied are easily accessible, observing the behaviour of a large
sample of individuals is not practically feasible.
Direct questioning
This method involves asking pre-set questions on certain topics on which the
individual’s behaviours are to be evaluated. While it seems like the most straightforward
approach to simply ask questions to test attitude, the results may not be accurate
because an individual may try to hide his or her real opinions and attitudes.
Some other approaches
In projective techniques, attitude gauging objects are hidden and results are interpreted
on the basis of pre-set criteria. While this technique overcomes some limitations of
the direct observation technique, the projective technique falls short when it comes
to objective and reliable interpretation of data.
Self-Instructional Material 33
Thurstone scale
The first formal method for measuring attitude was formulated in 1928 by Louis
Leon Thurstone, a US pioneer in the fields of psychometrics and psychophysics.
His objective was to measure people’s attitudes regarding religion. The Thurstone
scale contains statements regarding the issue in question, and every statement is
assigned a numerical value based on the value the evaluator considers it to have.
When people have selected the statements they endorse, the values are added
up and the average is calculated, which corresponds to a particular attitude.
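The scoring just described can be sketched in Python. The statements and the scale values attached to them below are invented for illustration; they are not items from an actual Thurstone instrument, where the values would come from a panel of judges.

```python
# Hypothetical Thurstone-style scoring: each statement carries a scale
# value (assigned in advance by judges); a respondent's attitude score
# is the mean of the values of the statements he or she endorses.

SCALE_VALUES = {  # statement -> judged scale value (illustrative only)
    "Religion gives meaning to my life": 9.2,
    "Religious practice is a useful habit": 6.8,
    "Religion is irrelevant to daily life": 2.1,
}

def thurstone_score(endorsed):
    """Mean scale value of the statements the respondent agreed with."""
    values = [SCALE_VALUES[s] for s in endorsed]
    return sum(values) / len(values)

score = thurstone_score([
    "Religion gives meaning to my life",
    "Religious practice is a useful habit",
])
print(round(score, 2))  # mean of 9.2 and 6.8 -> 8.0
```

A higher mean indicates a more favourable attitude towards the issue, since the judged scale values run from unfavourable to favourable.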
Likert scale
Rensis Likert introduced the Likert scale of attitudes in A Technique for the Measurement
of Attitudes. The scale covers a range of attitudes, from extremely positive through
neutral to extremely negative. This scale also consists of statements, and the subjects
are asked to express their opinion regarding each statement on a five-point scale.
Once each statement has been assigned a numerical value, the values are again added
up and the mean is calculated. This kind of scale is often used in career assessment
programmes to gauge a learner’s interests and tendencies, so that learners can be
helped to select the right career path.
The Likert scale is applied in the form of questionnaires. A Likert scale
questionnaire would contain a statement, which would need to be evaluated by
the individual on the basis of the following kind of responses:
(a) Strongly disagree
(b) Disagree
(c) Neither disagree nor agree
(d) Agree
(e) Strongly agree
The individual will tick or circle one response for each statement or question.
Each statement/question and its possible responses are together known as a Likert
item. A Likert scale is, in turn, a sum of responses to multiple Likert items. A Likert
scale is considered a ‘balanced’ form of attitude testing because it contains an equal
number of positive and negative options.
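The summation of Likert items into a scale score can be sketched as follows. The item wordings are hypothetical, and the reverse-coding of negatively worded statements is an assumption on my part (a common convention for balanced scales, though the text above does not prescribe it):

```python
# Hypothetical Likert-scale scoring: responses are coded 1-5
# (Strongly disagree = 1 ... Strongly agree = 5). Negatively worded
# items are reverse-coded so every item points the same way, then the
# item scores are summed -- hence the name 'summative scale'.

RESPONSES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither disagree nor agree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def likert_score(items):
    """items: list of (response_label, is_negatively_worded) pairs."""
    total = 0
    for label, negative in items:
        value = RESPONSES[label]
        if negative:              # reverse-code: 1<->5, 2<->4, 3 stays
            value = 6 - value
        total += value
    return total

answers = [
    ("Agree", False),             # scores 4
    ("Strongly disagree", True),  # reversed, scores 5
    ("Neither disagree nor agree", False),  # scores 3
]
print(likert_score(answers))  # 12
```

For a forced-choice questionnaire, the neutral label would simply be dropped from the response coding, leaving a four-point scale.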
In some cases of Likert scale questionnaires, the neutral response, such as
‘neither disagree nor agree’, is removed. Such an evaluation method is known as
the forced choice method, because the individual cannot remain neutral on any
statement. The forced choice method is used when it is thought that the individual
might select the neutral response to avoid controversy or hide ignorance.
Generally, three kinds of common biases may render Likert scale
questionnaires unreliable. These are as follows:
• Central tendency bias: In a professional setting especially, the individual
may be reluctant to admit strong feelings about an issue.
• Acquiescence bias: The individual may feel obliged to agree with the
presented statement.
• Social desirability bias: The individual may select an option to preserve
self-image or to be popular.
Once the respondent hands in the completed questionnaire, the responses may
be evaluated individually or in grouped form, depending on the pattern being studied.
When studied in grouped form, the items provide a score, which can be categorized
and evaluated. The Likert scale is therefore also sometimes known as the summative scale.
Interest and its measurement
The purpose of finding out the interests of students is to help them attain the
careers of their dreams. An interest inventory helps teachers know the interest
areas of their students so that the students can be encouraged and given various
opportunities to grow in a particular field.
Thus, an interest inventory is often used in career assessment. The goal of
this assessment is to give insight into the students’ interests, so that they may face
less difficulty in deciding on an appropriate career choice for themselves. It is also
frequently used for educational guidance as one of the most popular career
assessment tools. The first such test was developed in 1927 by psychologist
E. K. Strong to help people leaving military jobs to find suitable work.
Supporting students
Prior to selecting a career, students need to identify the right path for themselves.
This can be done through an assessment, which would help them get an insight into
their own interests, preferences and personal styles. Analysing these aspects will
direct them into identifying the right courses, jobs, internships and activities that are
suitable for them.
Self-concept and its assessment
Self-concept defines how we assess ourselves as individuals or what we think of
ourselves. Self-concept is a commonly used term. There are two aspects to the
development of self-concept. These are as follows:
1. The existential self: This aspect of the self can be defined as ‘the sense of
being separate and distinct from others and the awareness of the constancy
of the self’ (Bee 1992).
2. The categorical self: Once a child realizes that he has a distinct identity,
he gradually becomes aware of the world around him and his own place in the
world. He starts relating to the world and starts thinking of himself as more
than his physical characteristics, such as hair colour or height. Finally, he
becomes aware of the fact that others perceive him in a certain way, which
may or may not be similar to how he perceives himself.
According to American psychologist Carl Rogers, self-concept has three different
components. These are as follows:
1. The view you have of yourself (self-image)
2. How much value you place on yourself (self-esteem or self-worth)
3. What you wish you were really like (ideal self)
Factors affecting self-concept
The following factors affect self-concept:
1. How others react to us: People’s approval, recognition and affection
help to develop a positive self-image.
2. Comparison with others: Comparing oneself with people who seem to be
doing better financially or to be more popular socially can lead to a
negative self-image. However, if the comparison is made with people who
are less successful, it leads to a positive self-image.
3. Social roles: There are some roles that are associated with prestige and
positive self-image, such as that of a doctor.
4. Identification: Self-concept is also influenced by the role we play in the
group that we belong to.
III. Psychomotor Domain
The psychomotor domain is concerned with those objectives which are intended to develop
various skills, for example, typing, painting, drawing, dissecting, preparing food,
beautification, carpentry, book binding, sculpture, photography, operating a computer
or any other machine, or working with any tools to produce something. It includes all
manipulative skills. For any motor activity or skill work, psychological readiness is
an essential condition. If a person is psychologically ready, he or she will also be
mentally prepared to act towards the desired skilled work.
Classification
The psychomotor domain has been classified by many psychologists, among them Ragole
(1950), E. J. Simpson (1966), J. P. Guilford (1958), R. H. Dave (1969), A. J. Harrow
(1972) and Allyn & Bacon (1994). In the following paragraphs, we discuss the
classification given by R. H. Dave (1969) in an adapted form. There are five
categories under this domain, arranged from 1 to 5 in order of increasing complexity,
difficulty and fineness of the skill being developed.
1. Initiation-observation or observation-initiation: For learning any skill
(simple or complex), learners need psychological readiness. Most learners
hesitate over skilled work at the beginning; they generally hesitate to take
the initiative. Contrary to this, there are some learners who are highly
motivated to start because they have observed somebody performing the skill.
If learners are not interested or motivated, they need to be motivated by
means of the promise of some reward, discussion, fulfilment of some of their
desires and aspirations, and so on. In this case, ‘initiation’ is the first
step towards the development of the skill. On the other hand, if a learner has
observed some person performing a skill and is highly motivated and
encouraged to do it, then he takes the initiative automatically. In that case,
observation is followed by initiation.
2. Manipulation: When the learner is ready to take the initiative, he observes
others performing the skill, sees how the performers manipulate the tools it
requires, and starts manipulating those tools himself to reproduce the skill.
Manipulation and observation work together continuously for quite some time.
As a result, performance improves and moves towards perfection. During this
process, learners perform the following three tasks:
(i) Perform selected steps
(ii) Follow directions
(iii) Fix their performance through necessary practice
3. Precision: Repeated observation of expert performers and continuous
practice lead learners to perform the skill with a desired level of
precision, i.e., accuracy and exactness. They reach a higher level of
refinement. They achieve this level through the following:
• Controlling faults
• Eliminating errors
• Reproducing the desired skill with precision
4. Articulation: This is the level at which learners bring some novel attributes
or features to their skill performance in addition to the general attributes.
5. Naturalization: This is the highest level of performance in skill development.
The act of the performer becomes automatic or natural. Those who achieve this
rare level of proficiency perform with the highest degree of refinement and
ease. To the performer as well as the audience or observer, it looks like an
effortless performance.
Assessment of Performance or Psychomotor Learning
Performance assessments are designed to judge a student’s abilities to use specific
knowledge and research skills. Most performance assessments require the student
to manipulate equipment to solve a problem or make an analysis. Rich performance
assessments reveal a variety of problem-solving approaches, thus providing insight
into a student’s level of conceptual knowledge.
Performance assessment is a form of testing that requires students to perform
a task rather than select an answer from a ready-made list. Experienced teachers
or other trained staff then judge the quality of the student’s work based on a pre-
decided set of criteria. This new form of assessment is most widely used to directly
assess writing ability based on the text produced by students under test instructions.
How does it work?
The student’s work is assessed by teachers or other staff on the basis of pre-set
criteria. The teachers assess the student’s learning abilities when given specific
instructions. It is important to understand how the assessment works. The following
methods have been used successfully to do so:
• Open-ended answers: The student has to come up with the answer
independently after some degree of analysis, either orally or in writing. No
options are given, and the students are usually asked to discuss their
observations about some material given to them.
• Extended tasks: In this kind of assessment, the student is required to research,
draft, revise and come up with a well-thought-out analysis of the given
problem. It is like an assignment or a project and may take the student many
hours to complete.
• Portfolios: These are collections of the student’s past work together with the
teacher’s evaluation of it. Usually, a portfolio includes a person’s best work
and some work in progress that showcases the student’s strengths.
What is common among all these methods is that they are all specific tasks
completed by the student according to the instructions from the teachers and the
students are made well-aware of the standards expected from them. They are also
aware that the teachers will be grading them based on their work. This aspect
renders performance assessment distinct from other testing methods.
Purpose of assessment
Performance assessments evaluate a student’s work and assess how well the student
is able to apply the learned concepts. Assessments may be of different types—
some may require the student to replicate a procedure learnt in class, some may
require simple recall or even intricate analysis.
A limitation of performance assessments is that such assessments may not
turn out to be suitable for simply testing the knowledge of facts.
Teaching goals of performance assessment
Performance assessment helps the student to:
• Apply systematic procedures
• Make use of resource texts, laboratory equipment and computers
• Use scientific methodology
• Evaluate and apply several approaches
• Work out difficult problems
The three main purposes of performance assessments are as follows:
1. Diagnostic purposes: The assessment is expected to diagnose the students’
knowledge about solving specific categories of problems.
2. Instructional purposes: A well-crafted assessment can also be used for
instructional purposes, except for the fact that it is standardized and graded.
3. Monitoring purposes: Since assessments are undertaken at regular intervals,
they are very useful for tracking a student’s level of learning, especially in
subjects like science and mathematics, where they signify a student’s
problem-solving skills.
This methodology is revolutionary, in the sense that it is not just about testing,
but also about the learning process. In this methodology, students are encouraged to
think and create the finished product, during the course of which they are automatically
testing their skills and knowledge.
Unlike the old model of teaching, where students were tested after each lesson
or unit was covered in the classroom, here the approach is to use book knowledge
merely as a resource or means to an end, and not the end itself.
Let us study a relevant example to understand performance assessment better.
If a coach assembles 11 players and tells them all the rules of the game of cricket,
the spirit in which the game must be played and the dangers of playing without
protective padding, that alone will not make them world champions. The players need
to go and play an actual game. The performance assessment method involves students
‘playing the game’. This way, what goes on in the class is actual preparation for life
after school.
Typically, a performance assessment activity can be broken down into the
use of the following skills:
1. Obtaining information: Students would need to research, compile and
organize information as per the specified goal.
2. Employing the information: After they have organized the information,
they will need to work with it or infer, sequence, analyse and synthesize it to
create something new.
3. Using the information: When they have the information in the required form,
for example, a flowchart, they can use it to convince a target group, or simply
present ideas and concepts.
4. Communication: Making oral and visual presentations, doing projects and
demonstrating skills are all part of communicating with others. When a student
or a group of students presents their ideas in any form to others, they are
automatically honing their communication skills.
Here are the key benefits of employing the performance assessment
methodology:
1. The variety of tasks calls for the use of various learning styles and skills:
Each student has a core competency, which, simply put, means a skill they
have refined more than others. Someone may be good at writing descriptive
book reports and someone else may be proficient at making PowerPoint
presentations. Some learners are exceptional at organizing information, and
still others may be brilliant speakers, and so on. Not only does performance
assessment allow students to display their particular skills, it also encourages
them to acquire new ones. For example, a student who likes writing detailed,
informational pamphlets should be encouraged to make a persuasive speech
to a group of adults describing the pamphlet he or she has made.
2. Performance assessment tasks encourage teamwork: Working in teams
is a skill that will help a student throughout life, especially if he or she chooses
to work in a corporate set up. Therefore, it is a useful skill to acquire. Making
group presentations, working on projects in a team, and so on, helps students
to learn the value of collaboration at an early age. Division of tasks, cooperation,
empathy are all valuable lessons that each student must learn.
3. There is focus on personal habits and attitudes: In performance
assessment tasks, success depends on the learner’s attitude almost as much
as on book learning. Habits such as flexibility, diligence, cooperation,
tolerance, persistence, problem solving and planning are all inherent
to the performance assessment methodology.
It is important to have predetermined criteria to evaluate the students’
performance. Students should not be scored or graded against their classmates and
should be provided with the criteria before the assessment.
Students’ feedback reflects levels of competency, rather than comparative
scores. It is always useful to look for patterns of appropriate and inappropriate
responses in students’ performance.
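Criterion-referenced scoring of this kind can be sketched as below. The criteria, the 0–4 rating scale and the competency bands are all invented for illustration; an actual rubric would define its own criteria and cut-offs.

```python
# Hypothetical criterion-referenced scoring: each pre-set criterion is
# rated 0-4, and the total is mapped to a competency band rather than
# being compared against classmates' scores.

CRITERIA = [
    "uses systematic procedure",
    "interprets data correctly",
    "communicates findings clearly",
]

def competency_level(ratings):
    """ratings: dict of criterion -> score 0-4. Returns a band label."""
    total = sum(ratings[c] for c in CRITERIA)
    maximum = 4 * len(CRITERIA)
    fraction = total / maximum
    if fraction >= 0.85:
        return "proficient"
    if fraction >= 0.60:
        return "developing"
    return "beginning"

ratings = {
    "uses systematic procedure": 4,
    "interprets data correctly": 3,
    "communicates findings clearly": 4,
}
print(competency_level(ratings))  # 11/12 of the maximum -> "proficient"
```

Because every student is measured against the same fixed criteria, the resulting feedback describes what the student can do, not where the student ranks in the class.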
1.6.1 Specification of Objective Steps in Evaluation Process
Bloom’s taxonomy of educational objectives was revised in 2001 under the title
A Taxonomy for Learning, Teaching and Assessing. The revision was carried out
by Lorin W. Anderson, David R. Krathwohl, Kathleen A. Cruikshank and others.
Because of the changes brought to the cognitive domain, the instructional objectives
for this domain also need to change accordingly.
The following changes were made in the classification of educational objectives:
The nomenclatures of the categories under cognitive domain were changed.
• Knowledge was replaced by remembering.
• Comprehension was replaced by understanding.
• Application was replaced by applying.
• Analysis was replaced by analysing.
• Synthesis was replaced by creating.
• Evaluation was replaced by evaluating.
Rationale behind the changed concepts
The six categories in Bloom’s original classification are in the form of nouns,
which have been changed to verbs. The rationale behind this change is simple and
logical. The categories of the taxonomy reflect different forms of thinking, and
thinking, being an active process, is represented more accurately by verbs than by
nouns. Therefore, some of the subcategories were also reorganized and renamed.
Knowledge, the first category, is a product of thinking and was considered
inappropriate to describe a category of thinking; it was therefore replaced with
‘remembering’, the lowest level of cognitive-domain activity.
Knowledge is a product of thinking (of lower as well as higher order) and did not
clarify the level of its category; hence the change was needed. ‘Remembering’ is
the right word because it indicates the product of lower-order thinking: we can
cram, recall, recognize and remember something without understanding it. Likewise,
comprehension is replaced by understanding, synthesis is replaced by creating, and
so on. All these changes better reflect the nature of the thinking process and the
outcomes described by each category.
40 Self-Instructional Material
Functions of Educational Objectives/Usefulness of Taxonomical Classification
Educational objectives perform the following functions:
• Bloom’s taxonomy motivates educators to focus on all three domains of
educational objectives.
• Creates a more holistic form of education to develop the whole personality of
the learners.
• Whole personality development makes an individual more useful for his ownself
and for the society.
• Allows teachers to select appropriate classroom assessment techniques for
student evaluation and for the performance or skills to be evaluated.
• Understanding Bloom’s taxonomy develops the competencies in the teachers
through which a teacher can design effective learning experiences.
• Helps in devising and organizing fruitful and interesting co-curricular activities.
• Takes care of the all-round development of the personality of the child.
• Helps teachers in framing instructional objectives in behavioural terms.
Guidance Functions of Evaluation
Guidance and counselling of students has become one of the most important areas
of school education. Because of diverse problems such as stress and suicide among
students, this area has become very important for students, parents and educational
administrators. For these reasons, counsellors are being appointed in schools, and
several guidance and counselling clinics and centres are coming up at different
places to support students and parents. Some of the important guidance functions
of evaluation are listed as follows:
• Evaluation results are very useful for the guidance and counselling of
needy students.
• Proper educational and vocational guidance can be given to students on the
basis of the evaluation of their various abilities, interests, skills, attitudes,
aptitudes and values, and after knowing the strengths and weaknesses of the
needy students.
• On the basis of the results of tests of anxiety and stress among students,
proper guidance may be given to students to reduce their level of stress.
• Various types of personality disorders may be diagnosed by using a variety of
tests, and the subject may be given proper guidance to cure or control them
properly.
• Students may be helped in choosing the right kind of vocation or profession or
career in which they may succeed and grow.
• Students may be helped in choosing suitable courses of study which suit them
and in which they may excel.
• Based on the findings of evaluation, students may be guided and counselled
properly to nurture and develop their life skills.
1.7 SUMMARY
• Rating scale: It refers to a scale used to evaluate the personal and social
conduct of a learner.
• Diagnostic evaluation: It aims at identifying or diagnosing the weaknesses
of students in a given course of instruction.
• Anecdotal records: These are records that maintain descriptions of
significant events and the work or performance of students.
• Grading: It is a means for reporting the result of measurement.
• Formative evaluation: It is the type of evaluation which is done during the
teaching-learning process to assess the ongoing construction or formation of
knowledge and understanding of students.
• Summative evaluation: It is a type of evaluation that evaluates the quality
of the final product and finds out the extent to which the instructional objectives
have been achieved.
• Cognitive domain: It includes those objectives of education which attempt
to develop mental faculties or intellectual abilities, that is, the ability of knowing,
understanding, thinking and problem solving.
• Affective domain: It is related to the development of emotions, values,
attitudes and the development of those aspects of personality which are more
influenced by heart than the mind.
• Psychomotor domain: It deals with the development of motor abilities which
in turn are responsible for skill development for performing any task.
• Likert scale: It is a tool used to determine opinions or attitudes; it contains a
list of declarative statements, each followed by a scale on which the subject
is to indicate degrees of intensity of a given feeling.
• Self-concept: It is the composite of ideas, feelings and attitudes that a person
has about his or her own identity, worth, capabilities and limitations.
3. J.M. Bradfield defines evaluation as ‘the assignment of symbols to
phenomenon in order to characterize the worth or value of the phenomenon
usually with reference to some social, cultural and scientific standards’.
4. Scaling in social sciences refers to the process of measuring.
5. A test consists of a set of questions to be answered or tasks to be performed.
6. The goals of placement assessment are to determine for each student the
position in the instructional sequence and the mode of instruction that is most
beneficial.
7. Diagnostic evaluation aims at identifying or diagnosing the weaknesses of
students in a given course of instruction.
8. Assessment is the process of determining the following:
(i) The extent to which an objective is achieved.
(ii) The effectiveness of the learning experiences provided in the classroom.
(iii) How well the goals of teaching have been accomplished.
9. The study of classification is known as taxonomy.
10. Bloom classified educational objectives into three ‘domains’, namely: (i)
cognitive domain, (ii) affective domain, and (iii) psychomotor domain.
11. Problem solving is a mental process that involves discovering, analysing and
solving problems.
12. The affective domain of Bloom’s taxonomy of educational objectives is related
to the development of emotions, values, attitudes, and the development of
those aspects of personality which are more influenced by the heart than the
mind.
13. The psychomotor domain is concerned with those objectives which are
intended to develop various skills.
14. Self-concept defines how we assess ourselves as individuals or what we
think of ourselves.
15. The three main purposes of performance assessment are:
• Diagnostic purposes
• Instructional purposes
• Monitoring purposes
Short-Answer Questions
1. Discuss the two types of measurement.
2. Differentiate between formative and summative evaluation.
3. What are aptitude tests?
4. How is cognitive learning defined?
5. What are the thinking skills required in cognitive learning?
6. What are the steps in the problem solving process?
7. What are the features of external evaluation?
8. What was the rationale behind changing the six categories of Bloom’s
educational objectives?
9. State a few ways which can ensure that one makes good decisions.
10. What are the factors that affect self-concept?
Long-Answer Questions
1. Discuss the characteristics of a good measurement tool.
2. Discuss the importance of measurement scales. You may give your argument
with reference to various measurement scales.
3. Describe qualitative evaluation techniques, as well as the five categories that
come under it.
4. How are the objectives of the cognitive domain classified? Discuss in detail.
5. Describe various types of evaluation with proper examples.
6. Describe the techniques that help to stimulate divergent thinking.
7. Explain the classification of psychomotor domain under educational objectives.
8. What are the characteristics of critical thinking?
UNIT 2 MAJOR TOOLS AND TECHNIQUES IN EDUCATIONAL EVALUATION
Structure
2.0 Introduction
2.1 Unit Objectives
2.2 Different Types of Tests: Teacher-made vs. Standardized
2.2.1 Criterion-referenced vs. Norm-referenced Tests
2.2.2 Essential Qualities of Good Measuring Instruments
2.3 Education Tests
2.4 Measurement of Achievement
2.4.1 Construction of Achievement Tests and Standardization
2.4.2 Relative Merits and Demerits of Different Test Items
2.5 Diagnostic Test Construction and Usefulness
2.5.1 Types of Diagnostic Tests
2.6 Summary
2.7 Key Terms
2.8 Answers to ‘Check Your Progress’
2.9 Questions and Exercises
2.10 Further Reading
2.0 INTRODUCTION
In the previous unit, you learnt about the concept of measurement and evaluation;
different types of measuring scales and the need for measurement and evaluation
were discussed. In this unit, the discussion will turn towards the major tools and
techniques in educational evaluation. Evaluation is an attempt to appraise the quality/
suitability of a resource. There are different types of evaluation techniques. Some
are standardized tests, while others are made by teachers. Moreover, while
some tests are norm-referenced, others are criterion-referenced. This unit will
look at all of these in detail.
Teacher-made tests
Tools or question papers prepared by teachers to evaluate their own students,
whom they have been teaching, are called teacher-made tests. These tests are
not standardized and are prepared for a small number of students, generally for a
section, a class or a school.
Standardized tests or tools
A standardized test is one developed to meet a specified level of quality: it is
standardized with respect to form and construction, administration procedure and
test norms, i.e., in terms of its development, administration, scoring and
interpretation. The aim of standardization is to ensure objectivity, reliability, validity
and the other characteristics of a good test. A standardized test comes with a manual,
which instructs and guides its users regarding administration, scoring and
interpretation. The following are some important definitions that clarify the concept
of a standardized test or tool of measurement and evaluation.
According to C. V. Good, ‘a standardized test is that for which content has
been selected and checked empirically, for which norms have been established, for
which uniform methods of administration and scoring have been developed and
which may be scored with a relatively high degree of objectivity.’
According to L. J. Cronbach, ‘a standardized test is one in which the procedure,
apparatus and scoring have been fixed so that precisely the same test can be given
at different times and places.’
The most important benefit of a standardized test is that it minimizes or
reduces four types of errors: personal error, variable error, constant error
and interpretive error.
Characteristics of standardized tests
Following are the important characteristics of a standardized test:
• It has norms, which contain everything about the test, from its preparation to
its scoring and interpretation. The norms describe every aspect of the test in
detail so that any user can use it properly.
• It has norms developed for converting raw scores into standard scores.
• Instructions for administration of the test are pre-determined and fixed.
• Duration of the test is fixed.
• The test is standardized on a suitable sample size selected from a well-defined
population.
• It has high reliability and validity.
• The test has high objectivity.
48 Self-Instructional Material
• Answer key and scoring procedure of the test are fixed and predetermined.
• Test manual is properly developed.
• The errors in the standardized tests are minimized or reduced.
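The conversion of raw scores into standard scores mentioned above can be sketched briefly. This is only an illustration: z-scores and T-scores are standard conventions, but the mean and standard deviation figures below are made up, not taken from any real norm table.

```python
# Sketch: converting a raw score to standard scores using the mean and
# standard deviation of a (hypothetical) norm group.
def z_score(raw, mean, sd):
    """Z-score: how many standard deviations the raw score lies from the mean."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T-score: a z-score rescaled to mean 50 and standard deviation 10."""
    return 50 + 10 * z_score(raw, mean, sd)

# Suppose the norm group has mean 30 and standard deviation 5.
print(z_score(40, 30, 5))  # 2.0
print(t_score(40, 30, 5))  # 70.0
```

A raw score equal to the group mean always maps to a z-score of 0 and a T-score of 50, which is why norm tables make scores from different tests comparable.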
2.2.1 Criterion-referenced vs. Norm-referenced Tests
Norm-referenced Tests
To understand the norm-referenced type of evaluation, we first have to learn the
term ‘norm’. The term ‘norm’ has two meanings. One is the established or approved
set of behaviour or conduct to be followed or displayed by all members of a family,
society or organization; it is the established custom of society which most people
follow without question. The other meaning of the term, which is the relevant one
here, is the average performance of the group.
Example: A group of students is tested for awareness of environmental pollution
through a written test. The test consists of 50 objective-type questions of one mark
each, with no negative marking; the full mark of the test is obviously 50. After the
test is conducted, it is marked by the examiner. There are 150 students in the group.
The marks of all students are added and the total is divided by 150 to find the
average performance of the group. Suppose this is found to be 30: 30 marks is then
the average obtained by the whole group, in which some achieve 49 out of 50 and
others achieve much less, say 12 out of 50. This 30, the average of the group, is said
to be the norm of this group.
Now, the evaluation of all 150 students is done considering this 30 (the norm)
as a point of reference. All students who have got marks above 30 are considered
above average, all those below 30 are considered below average, and all those who
have got exactly 30 are considered average. There is no pass or fail in this type of
evaluation, as there are no set marks for passing the test. This type of evaluation is
called norm-referenced evaluation and the test is called a norm-referenced test.
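The arithmetic of this example can be sketched as follows. This is only an illustration: the marks below are a small made-up group, not the 150 students of the example.

```python
# Sketch of norm-referenced evaluation with illustrative marks (not real data).
marks = [49, 30, 12, 35, 30, 24]       # scores out of 50 for a small group
norm = sum(marks) / len(marks)         # the group average is the 'norm'

def classify(score, norm):
    """Place a score relative to the group norm; there is no pass or fail."""
    if score > norm:
        return "above average"
    if score < norm:
        return "below average"
    return "average"

for m in marks:
    print(m, classify(m, norm))
```

Note that the reference point (the norm) is a property of the group, so the same score could be ‘above average’ in one group and ‘below average’ in another.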
Criterion-referenced evaluation
The type of evaluation in which the performance of the testees is evaluated with
reference to some predetermined criteria is called criterion-referenced evaluation.
No weightage is given to the norm, or average performance of the group, in this
evaluation. All decisions, such as pass or fail, distinction, excellent, etc., are taken
with reference to criteria set out in advance.
In the preceding example, if criteria are set before the test, with reference to
which the performance of each student will be evaluated, it becomes criterion-
referenced evaluation. Suppose the following criteria are finalized for this test:
Pass mark: 40%
Distinction: 80%
In the test discussed above, all those students who get 20 or more marks
(40%) are declared pass, and all those who score less than 20 are declared fail. All
those who get 40 or more (80%) are awarded a distinction. If a prize is given to
those who score at least 90%, then only the students who get 45 or more will get the
prize. As all decisions are taken on the basis of pre-set criteria, this evaluation is
called criterion-referenced evaluation. Let us now look at how a criterion-referenced
test can be constructed.
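The cut-off arithmetic of this example can be sketched briefly. The helper `evaluate` is purely illustrative; only the 50-mark total and the 40%/80%/90% criteria come from the example above.

```python
# Sketch of criterion-referenced decisions for a 50-mark test,
# using the criteria set out in the example above.
FULL_MARKS = 50
PASS_PCT, DISTINCTION_PCT, PRIZE_PCT = 40, 80, 90

def evaluate(score):
    """Decide pass/fail/distinction/prize purely from fixed criteria."""
    pct = 100 * score / FULL_MARKS
    result = "pass" if pct >= PASS_PCT else "fail"
    if pct >= DISTINCTION_PCT:
        result = "distinction"
    if pct >= PRIZE_PCT:
        result += " (prize)"
    return result

print(evaluate(19))  # fail
print(evaluate(20))  # pass
print(evaluate(40))  # distinction
print(evaluate(45))  # distinction (prize)
```

Unlike the norm-referenced sketch, the group average plays no role here: the same score always yields the same decision.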
Construction of a criterion-referenced test
During the preparation of a criterion-referenced test, the test constructor is required
to take the following steps:
Step 1: Identifying the purpose of the test
First of all, the objectives of the test are finalized. The test developer must know
and be well aware of the purpose for which the test is being prepared. In addition to
the purpose, he should also know the following aspects of the test:
• Content areas of the test from where the items will be developed
• Level of the students or examinees for whom the test is being prepared
• Difficulty level of the test items
• Type of test: objective, subjective or mixed
• Criteria for qualifying the test
After having understood these points, the test developer starts the work of
constructing the criterion-referenced test. He moves on to the second step.
Step 2: Planning the test
This step requires the following work to be done by the test constructor:
(i) Content analysis: The test developer analyses the content of the test. This
involves the selection of content, i.e., the testing areas and their peripherals.
He also decides the key areas of the content from which more questions are
to be developed.
(ii) Types of items: The decisions regarding the type of items are taken at this
stage. In case of subjective type, the items may be essay type, short-answer
type or very short-answer type. In case of objective type, they may be multiple-
choice questions, fill in the blanks, true or false, sentence completion, one-word
answers, etc. If the test is of mixed type, then questions are developed
accordingly. What is planned at this stage is the proportion of objective and
subjective items in terms of marks.
(iii) Number of items: The total number of questions of each type to be included
in the test is decided.
(iv) Weightage: It is very important to decide the weightage of each type of
item and each content area. It depends upon the level of the students being
tested: as we move from lower to higher levels, the percentage of knowledge-
domain items decreases and that of higher-order thinking abilities, such as
understanding, application and skill, increases. The test developer also decides
the weightage of each of the content areas included in the test, considering
its relevance.
(v) Duration of the test: The duration of the test is decided in consideration of
the total number of questions, the level of the examinees and the difficulty
level of the test items.
(vi) Mechanical aspects: These include the quality of paper, ink, diagrams,
typesetting, font size and printing of the test papers.
(vii) Development of a key for objective scoring: To bring objectivity to the
process of evaluation, it is essential to achieve interpersonal agreement
among the examiners with regard to the meaning of the test items and their
scoring. For this purpose, a ‘key’ is prepared for each paper and given to all
examiners while scoring the test. They are supposed to score the test following
the key.
(viii) Instructions for the test: The test developer also prepares instructions for
administration, scoring and the evaluation procedure in a ‘test manual’. It shows
the whole procedure of testing and acts as a guide to the individuals involved in
the testing procedure at all stages. This manual is strictly applied to bring objectivity
to the test.
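The ‘key’ described in item (vii) can be sketched as follows. The item names, answers and one-mark-per-item rule below are hypothetical, used only to show why every examiner who applies the same key arrives at the same score.

```python
# Sketch: scoring objective items against a fixed answer key, so every
# examiner arrives at the same score. Item names and key are made up.
KEY = {"Q1": "b", "Q2": "c", "Q3": "c", "Q4": "a"}

def score(responses, key=KEY):
    """One mark per item whose response matches the key; no negative marking."""
    return sum(1 for item, ans in key.items() if responses.get(item) == ans)

print(score({"Q1": "b", "Q2": "a", "Q3": "c", "Q4": "a"}))  # 3
```

Because the key, not the examiner’s judgment, decides each mark, interpersonal agreement among scorers is guaranteed by construction.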
Step 3: Preparing the blueprint of the test
A blueprint is a specification chart which shows the details of the test items to be
prepared. It shows all the content areas and the number and type of questions from
those areas, and it reflects the objectives to be tested. The blueprint describes the
weightage given to various content areas, objectives, types of items and all other
details of the test. It serves as a guideline or frame of reference for the person
constructing the test.
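A blueprint can be thought of as a two-way table of content areas against objectives, with each cell holding a mark allocation. The sketch below is illustrative only; the subject areas, objectives and figures are made up, not taken from any real blueprint.

```python
# Sketch of a blueprint as a specification chart: content areas vs.
# objectives, each cell holding the marks allotted. Figures are made up.
blueprint = {
    "Algebra":  {"knowledge": 4, "understanding": 6, "application": 10},
    "Geometry": {"knowledge": 2, "understanding": 8, "application": 10},
}

# Weightage per content area, and the total marks of the test.
for area, cells in blueprint.items():
    print(area, sum(cells.values()))
total = sum(sum(cells.values()) for cells in blueprint.values())
print("total marks:", total)  # 40
```

Row totals give the weightage of each content area and column totals the weightage of each objective, so the chart makes any imbalance visible before a single item is written.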
Step 4: Construction of test items
According to the blueprint, all questions are constructed, covering all the content
areas, all the objectives or abilities and all types of items. Questions may be objective
or subjective type, as mentioned in the blueprint.
Examples of objective-based and objective-type questions:
1. Multiple choice questions
(i) Which of the following metals is present in blood?
(a) Cobalt (b) Iron (c) Calcium (d) Sulphur
(ii) Which of the following is not associated with photosynthesis?
(a) Dark reaction (b) Light reaction (c) Calcium (d) Chlorophyll
(iii) Which of the following is a port city?
(a) Patna (b) Ahmadabad (c) Calcutta (d) Durgapur
(iv) The capital of India was transferred from Calcutta to Delhi in the year:
(a) 1911 (b) 1905 (c) 1912 (d) 1906
2. Matching type
Match each item in Column A with the correct item in Column B:
Column A        Column B
Ganga           Bhakra
Gomti           Patna
Yamuna          Lucknow
Sutlej          Delhi
The questions prepared above are objective in nature, as all of them have
only one correct answer. The students will get either full marks (for a correct
response) or no marks (for a wrong response). The experience, feelings and other
academic, personal or social factors of the examiners or scorers will not influence
the marking of these questions. We can say that no subjectivity is possible in scoring
them. Hence, these are called objective-type and objective-based tests.
Step 5: Selecting the items for the test
Items have already been constructed as per the guidelines of the blueprint. Some
items, however, may not be suitable or up to the mark. To allow for this, some extra
questions are generally prepared so that if any question is rejected at any stage, it
can be replaced immediately. The right items for the test are then selected through
a process known as ‘try out’, which involves the following steps:
(i) Sampling of subjects: As per the size of the population for which the test is
being prepared, a workable sample, say around 150 subjects, is selected on a
random basis. This is the sample on which the prepared items are tested for
their functionality, workability and effectiveness.
(ii) Pre-try-out: This is also known as the preliminary try-out. The prepared
items are administered to about ten subjects taken from the sample. The
answer sheets are checked, evaluated and discussed with the candidates to
uncover any problems they may have faced during the test, such as language
difficulty or ambiguous wording. These problems are sorted out: the items
concerned are rewritten or rephrased to remove the difficulties and ambiguities.
At the end of the pre-try-out, the initial draft of the test is prepared.
(iii) Proper try-out: At this stage, around fifty candidates are selected from the
sample and the initial draft of the test is administered to them. Answer sheets
are scored and item analysis is done. The difficulty value and discrimination
power of each item are calculated. The items which fall within the acceptable
range of difficulty value and discrimination power are selected for the test
and the others are rejected.
(iv) Final try-out: The final try-out is done on a comparatively large sample. The
sample size may be 100 or even more, depending upon the size of the
population. After administration and scoring of the test, the reliability and
validity of the test are measured. If it proves to be reliable and valid, the test
gets the green signal.
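The item analysis mentioned in step (iii) can be sketched briefly. A common convention, assumed here rather than stated in the text, is to take the difficulty value as the proportion of candidates answering the item correctly, and the discrimination power as the difference between the proportions correct in the upper- and lower-scoring groups. The 0/1 response data below are made up.

```python
# Sketch of item analysis on try-out data (illustrative 0/1 responses).
def difficulty(item_responses):
    """Difficulty value: proportion of candidates answering correctly."""
    return sum(item_responses) / len(item_responses)

def discrimination(upper, lower):
    """Discrimination power: upper-group minus lower-group proportion correct."""
    return difficulty(upper) - difficulty(lower)

responses = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]  # 1 = correct, 0 = wrong
upper = [1, 1, 1, 1, 1]   # responses of the top scorers overall
lower = [1, 0, 0, 0, 0]   # responses of the bottom scorers overall

print(difficulty(responses))         # 0.6
print(discrimination(upper, lower))
```

An item that nearly everyone or nearly no one answers correctly, or that high and low scorers answer equally often, falls outside the acceptable range and is rejected.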
Step 6: Evaluating the test and preparing the final draft of the paper
For establishing quality, a test manual is prepared which gives the test’s norms,
scoring key, reliability and validity. The final draft of the test paper is prepared, and
instructions for the examinees as well as for the test administration are determined.
Item analysis is performed to find out the workability of the items for the test.
The required changes are made and the final draft of the paper is ready for printing.
2.3 EDUCATION TESTS
An educational test is not just a test that measures achievement in subjects of study,
but it is also a psychological test that leads to an assessment of the overall development
of a student. According to Anastasi, ‘psychological test is essentially an objective
and standardized measure of a sample of behaviours’. For Freeman, it ‘is a
standardized instrument designed to measure objectively one or more aspects of a
total personality by means of samples of verbal or non-verbal responses, or by
means of other behaviours’.
A test is a stimulus selected and organized to elicit responses which can reveal
certain psychological traits in the person who deals with them. The diagnostic or
predictive value of a psychological test depends upon the degree to which it serves
as an indicator of a relatively broad and significant area of response. It is obvious
that a psychological test is the quantitative and qualitative measurement of the various
aspects of behaviour of the individual for making generalized statements about his
total performances.
The aspects which affect the characteristics of a good test are as follows:
• Validity of the test
• Reliability of the test
• Objectivity of the test
• Usability of the test
• Comprehensiveness and preciseness of the test
• Administration of the test
• Test from economic viewpoint
• Availability of the test
• Appearance of the test
• Standardization of the test
• Norms of the test
Some of the important characteristics of a test are analysed below.
Validity: Validity of a test refers to its truthfulness; it refers to the extent to
which a test measures what it intends to measure, and it is an essential requirement
for the standardization of a test. If the objectives of a test are fulfilled, we can say
that the test is a valid one. The validity of a test is determined by measuring the
extent to which it matches a given criterion. Let us take an example: suppose
we want to know whether an ‘achievement test in mathematics’ is valid. If it really
measures the achievement of students in mathematics, the test is said to be valid, or
else not.
else not. So ‘validity’ refers to the very important purpose of a test and hence it is
the most important characteristic of a good test. A test may have other merits, but if
it lacks validity, it is valueless.
Freeman states, ‘an index of validity shows the degree to which a test measures
what it is supposed to measure when compared with the accepted criteria’. Lee J.
Cronbach held the view that validity ‘is the extent to which a test measures what it
purports to measure’.
Reliability: Reliability refers to the consistency of scores obtained by the
same individuals when they are re-examined with the same test on different
occasions, with different sets of equivalent items, or under other variable examining
conditions. Reliability paves the way for the consistency that makes validity possible
and identifies the degree to which various kinds of generalizations are justifiable. In
short, it refers to the consistency of measurement, i.e., how stable test scores or
other assessment results are from one measurement to another. If a measuring
device measures consistently, it is reliable. The reliability of a test refers to the
degree to which the test result obtained
is free from errors of measurement or chance errors. For instance, we administer an
achievement test in mathematics to students of class IX. In this test, Paresh scores
52. After a few days, we administer the same test. If Paresh scores 52 marks again,
we consider the test to be reliable, because we feel that this test accurately measures
Paresh’s ability in mathematics. H. E. Garrett stated, ‘the reliability of a test or any
measuring instrument depends upon the consistency with which it gauges the ability
of those to whom it is applied’. The reliability of a test can also be defined as ‘the
correlation between two or more sets of scores on equivalent tests from the same
group of individuals’.
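The definition of reliability as a correlation between two sets of scores can be sketched numerically. One common way, assumed here for illustration, is to compute the Pearson correlation between a first and a second administration of the same test; the scores below are made up.

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same test. Scores are illustrative.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first  = [52, 40, 35, 60, 48]   # scores on the first administration
second = [51, 42, 33, 61, 47]   # scores of the same students, re-tested

print(round(pearson_r(first, second), 3))
```

A coefficient near 1 means the students keep almost the same relative standing on re-testing, which is exactly the consistency the paragraph above describes.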
Objectivity: Objectivity is an important characteristic of a good test. Without
objectivity, the reliability and validity of a test are questionable; it is a pre-requisite
for both. Objectivity of a test indicates two things—item objectivity and scoring
objectivity.
‘Item objectivity’ means that each item must call for a single definite answer.
In an objective-type question, a definite answer is expected from the test-takers.
While framing questions, some pitfalls must be avoided: ambiguous questions, lack
of proper direction, double-barrelled questions, questions with double negatives,
etc. These flaws affect the objectivity of a test. Let us take an example. Suppose
we ask students to write about Gandhi. This question lacks objectivity, because the
answers, and their evaluation, will differ from individual to individual. If instead we
ask the students ‘What was Gandhi’s father’s name?’, the question obviously has
only one answer, and even the bias of the evaluator will not affect the scoring. So all
the items of a test should be objective.
Objectivity of scoring means that whoever checks the test paper should arrive
at the same score; the subjectivity, personal judgment or bias of the scorer should
not affect the scores. Essay-type questions are subjective, and their scores are
affected by a number of factors such as the mood of the examiner, his language and
his bias. Essay-type questions can attain objectivity if a scoring key and proper
directions for scoring are provided.
Usability: Usability of a test refers to the practicability of a test. It refers to
the degree to which the test can be successfully used by the teachers/evaluators.
Usability of a test depends on certain aspects which are expressed in the following
manner:
(a) Comprehensibility: The test items should be free from ambiguity and
the direction to the test items and other directions to the test must be
clear and understandable. The directions for scoring and the
interpretation of scores must be within the comprehension of the user.
(b) Ease of administration: If the directions for administration are
complicated, or if they demand more time and labour, users may
hesitate to use such tests. The directions for administration must be clear
and concise. The test paper should be constructed according to the
availability of time. Lengthy tests involving more time may not be
preferred for use.
(c) Availability: If a test is not available when needed, it lacks usability.
Most standardized tests have high validity and reliability, but their
availability is very limited. So it is desirable that, in order to be usable,
tests be readily and easily available.
(d) Cost of the test: The test must be inexpensive, so that schools and
teachers can afford to purchase and use it. If it is costly, not every
school can avail of it. So a good test should be reasonably priced.
(e) Ease of interpretation: A test is considered to be good if the test scores
obtained can be easily interpreted. For this, the test manual should provide
age norms, grade norms, percentile norms and standard-score norms
such as T-scores and Z-scores. So ‘interpretability’ of a test refers to
how readily meaning can be derived and understood from its raw
scores.
(f) Ease of scoring: A test in order to be usable must ensure ease of
scoring. The scoring procedure must be a simple one.
All the directions for scoring and the scoring key should be available, to make
the scoring objective. The examiner’s bias and the handwriting of the examinee
should not affect the scoring of a test.
Classification of Tests
Tests are divided into different types taking into consideration their content, objective,
administration system, scoring style etc. According to mode of administration, tests
are of two types:
(i) Individual test: When a psychological test is administered to one individual
at a particular time, it is known as an ‘individual test’.
(ii) Group test: When a test is administered to a group of individuals at the
same time, it is known as a ‘group test’. It is mostly applicable to literate
adults.
According to the ability of the student, tests are of two types:
(i) Speed test: This type of test is administered to individuals to measure mental
speed. Here, the time is limited, the number of questions is large, and all the
questions are of equal difficulty. Railway and banking examinations are
examples of speed tests.
(ii) Power test: This type of test is administered to individuals to measure mental
power or ability. Here, there is no time limit, and individuals may take as
much time as they like to answer the questions. All the questions of this test
are arranged according to difficulty level and discriminating power. Essay
competitions held by the media are good examples of power tests.
According to the type of items involved in the test, it can be of three types:
(i) Essay-type test: Essay-type tests are otherwise known as open-ended tests.
The essay question is especially useful for measuring those aspects of complex
achievement that cannot be measured well by more objective means. These
include: (a) the ability to supply rather than merely identify interpretations and
application of data, and (b) the ability to organize, integrate and express ideas
in a general attack on a problem. Outcomes of the first type are measured by
restricted-response questions and outcomes of the second type by extended-response
questions. For example, ‘discuss the educational philosophy of M.K. Gandhi’.
(ii) Short-answer type test: This type of test requires brief written answers
about a concept. It is suitable for measuring a wide variety of relatively
simple learning outcomes, and it is used almost exclusively to measure the
recall of memorized information. For example: ‘What is measurement?
Write within 50 words.’
(iii) Objective-type test: In objective-type questions, the individual is expected to
answer the question with the help of a word, a phrase, a number or a symbol.
The test with multiple-choice items, true-false items, matching type items,
fill-in-the blanks items, one-word substitution are the examples of objective
type test.
According to the method of scoring, tests are of two types:
(i) Machine-scored test: Tests which are scored or assessed by machines such
as computers are known as ‘machine-scored tests’. The Bank P.O. examination
is an example of a machine-scored test.
(ii) Hand-scored test: Tests which are assessed by human beings are known as
‘hand-scored tests’. Classroom achievement tests are examples of hand-scored
tests.
According to the principle of test construction, tests are of two types:
(i) Teacher-made test: Generally, ‘teacher-made tests’ are prepared by classroom
teachers to assess pupils’ growth. They are related to action research. Teacher-
made tests serve different purposes, viz., to measure pupils’ achievement, to
know how far specific objectives have been fulfilled, to diagnose learning
difficulties, to arrange specific remedial measures and to award grades. This
type of test follows only two steps: planning and preparation.
(ii) Standardized test: Standardized tests measure the common objectives of a
wide variety of schools. They have standard procedures for administration
and scoring, and provide norms for interpreting the scores. A test manual and
other necessary material are typically provided to aid in the administration of
the test and the interpretation and use of the results. The test items are generally
of high quality because they have been prepared by specialists, subject experts,
pre-tested and selected on the basis of their effectiveness and their relevance
to a rigid set of specification. They are specially useful for measuring general
educational development, determining student’s progress from one year to
the next, grouping students, analysing learning difficulties, and comparing
achievement with learning ability.
Standardized test and teacher-made test have been discussed later in this unit
in detail.
According to the nature of the test, they are classified as:
(i) Oral test: This is a kind of verbal test in which the individual is expected to
answer orally. It is mostly applicable to illiterate respondents or small children.
In a public survey, people are asked to speak about an issue; in an interview,
the interviewers ask questions and the interviewee answers orally. Such tests
are known as ‘oral tests’.
(ii) Written test: Here, the individual has to respond to the questions in writing,
so the respondent must be able to write. It is applicable only to literate
respondents. All written examinations are examples of written tests. It is a
kind of verbal test.
(iii) Performance test: This type of test is also known as a ‘nonverbal test’. The
respondent is not expected to respond verbally; he has to perform a task.
Running and jumping competitions held in physical education are examples
of performance tests.
Barr, Davis and Johnson define questionnaire as, ‘a questionnaire is a
systematic compilation of questions that are submitted to a sampling of population
from which information is desired’, and Lundberg says, ‘fundamentally, questionnaire
is a set of stimuli to which literate people are exposed in order to observe their
verbal behaviour under these stimuli’.
Types of Questionnaire
Figure 2.1 depicts the types of questionnaires used by researchers: a questionnaire
may be administered to individuals or in groups, and may be presented in print or
projected.
2.5 DIAGNOSTIC TEST CONSTRUCTION AND USEFULNESS
Diagnostic tests are a kind of educational test. These tests serve two purposes:
(i) a prognostic purpose, and (ii) a diagnostic purpose. Prognosis forecasts the
students’ performance in the specific subjects which have been taught to them.
Diagnosis means identifying the causes of the students’ weaknesses and poor
attainment. The prognostic and diagnostic functions are complementary to each
other, and both are essential in educational measurement and evaluation.
A diagnostic test is designed to reveal specific weaknesses or failures to learn
in some subject of study, such as reading or arithmetic. In a diagnostic test, the main
interest is the performance on individual items or on small groups of highly similar
items. Marks are not simply assigned for correct answers; rather, the wrong answers
provide the basis for identifying the causes of the student’s failure.
Diagnostic tests are those which help us to know the particular strengths and
weaknesses of the student. These tests are also known as ‘analytical tests’. A
correct answer indicates a student’s strength and a wrong answer indicates his
weakness. Achievement tests, in contrast, provide overall scores on the basis of
correct answers to the subject items, with wrong answers assigned zero marks.
Such attainment tests do not provide reasons for the poor scores of an individual.
The term ‘diagnostic’ as applied to tests is fraught with danger and ambiguity.
Some educationists consider certain tests diagnostic which are really achievement
tests with no diagnostic characteristics. A diagnostic test undertakes to provide
a picture of strengths and weaknesses. Hence, any test that yields more than a
single overall score is, in a sense, diagnostic. Even if there are only two part scores,
for example, one for arithmetic computation and one for arithmetic reasoning,
the test makes it possible to say that the student performed better in computation
than in reasoning. This means that diagnostic tests are qualitative, not quantitative.
A diagnostic test does not yield a total score for an individual in the subject in which
he has taken the test.
Analysis as the basis of diagnosis
The successful development of school learning depends upon the care with which
the underlying and basic skills of the subjects themselves are recognized and utilized
in teaching. For instance, teaching a child how to do addition involves not only
developing the habit of responding automatically and correctly to the basic
combinations, but also higher levels of skill, such as control of the attention
span and carrying from one column to the next. The teacher’s task is made obvious
and objective if he understands this. Similarly, it can be shown that silent reading
comprehension is not a single isolated ability. It is a composite of many elements,
such as knowledge of word meanings, ability to get meaning from sentences, ability
to arrange thought units and sentence units into logically organized wholes, and ability
to find desired material quickly. The teacher has a real basis for instructional
procedures with this knowledge.
Language is another basic subject in which many delicately balanced skills
are interwoven in an extremely complex manner. Here again the elements of
achievement in the total process must be identified. Blind trust in general practice on
the total skill must necessarily give way to the exact identification and discovery of
the particular points of pupil weakness as a basis for special emphasis.
Good diagnosis must parallel the process of good teaching. Effective diagnostic
materials in any school subject can be prepared only after the skills contributing to
success in that subject have been isolated and identified. Psychologically, the reason
for this is that on the whole the child learns to do what he practises or does. Remedial
work, accordingly, can function only when the point at which pupil mastery breaks
down has been located. Thus, the analysis must be penetrating and the diagnosis
must be precise.
Specific nature of diagnosis
Diagnosis must be more exact than merely a broad statement of general functions.
It is not enough to discover that a child is unable to read silently. The exact nature of
his handicap must be revealed before it is possible to undertake a remedial
programme. The more specific the diagnostic information revealed, the more exactly
the remedial material can be made to fit the need. To return to a frequently used
illustration, if it is found by diagnosis that a child is unable to add, unless the exact
point at which his mastery of addition breaks down can be determined by the diagnosis,
teaching, or remedial efforts are largely wasted. One of the outstanding reasons
why more effective teaching and remedial work has not been done in certain fields
is that no adequate analysis of basic skills can be made or has been made.
Importance of diagnostic use of test results
Tests as such have no inherent power to improve instruction; they merely reveal
existing conditions. Remedial or corrective teaching is the result of deliberate
constructive effort by the teacher after the particular points of weakness in the
instruction of the pupils have been revealed by the tests. The ease,
the clarity and the directness with which these needs are revealed by the tests are a
measure of their real educational value.
Very few existing tests are so constructed as to permit the interpretation of
their results directly in terms of an effective remedial procedure. However, this
seems to be no good reason for the failure of teachers to apply more directly the
results of this work in testing to the improvement of their teaching practice. Just as
the data revealed by the navigator’s instruments require calculation and interpretation,
so is it necessary to analyse test data carefully in order to make them the basis of
a genuine remedial programme.
Diagnosis as the basis for remedial work
Accurate diagnosis of class and individual pupil difficulties, coupled with application
of remedy is not only important but necessary for teachers. The success of the
remedial or corrective teaching depends upon the accuracy and detail with which
the specific skills involved in successful achievement in the subject are identified
and isolated in the test. Tests of the general survey type, or tests that report single
unanalysed scores, cannot supply this information in sufficient detail.
Diagnosis as the basis for preventive work
Diagnosis as applied in education has taken on a meaning indicative of a breakdown
in method, a failure of instructional techniques. Unquestionably, one of the basic
purposes of diagnosis is the location of weaknesses and the determination of their
causes, but there is nothing in the method that precludes its use in the prevention of
weaknesses through anticipation of their causes. The knowledge gained through
the use of diagnostic procedures should become the basis for preventive work of all
types. The existence of a weakness implies a failure at some point in the programme.
The real importance of its discovery lies in preventing its reappearance elsewhere
under similar conditions.
An illustration from the field of medicine may make this point somewhat
more concrete. In every medical examination for diagnostic purposes, a complete
analysis is made and an exact case record of all observations is kept. Through the
analysis of these records, a better understanding of the causes and characteristics
of certain types of human ailments becomes possible. Out of this same type of analysis has
also come the basis for much of the preventive work that characterizes modern
medical science.
In a similar way, accurate and detailed educational diagnosis may ultimately
offer the basis for the development of a programme of preventive work in education.
For example, if after diagnosing the addition of fractions in the fifth grade, it is found
that the failure of pupils to reduce the fractions to their lowest terms in the answers
is a common weakness, the obvious thing to do is to correct the defects at once, and
then proceed to reconstruct the first instruction so that in the following year the
causes for this particular weakness may not operate so powerfully. Similarly, any
weakness identified now should afford the basis for decisions calculated to reduce
the probability of their recurrence in the future.
2.5.1 Types of Diagnostic Tests
The diagnostic tests are broadly classified into two categories: (i) educational diagnostic
tests, and (ii) physical or clinical diagnostic tests.
(i) Educational diagnostic tests: These tests are related to different subjects
for specific level or class or grade.
(ii) Clinical diagnostic tests: These tests are also of several types relating to
hearing, vision and other aspects.
The regular classroom teacher should take the help of a clinical specialist. Due to their
knowledge about the students, classroom teachers can be more confident in their
orientation towards group approaches. The teacher would expect normal actions
and behaviour on the part of all students but would be prepared to deal with
reasonable departures from normal behaviour; for the unusual cases, the teacher
should feel free to invite a person with special skill in diagnosing and treating such
cases, so as to bring these students on par with the rest of the class.
Functions of diagnosis
Although diagnosis is an individual activity, it has four functions:
(i) Classification
(a) Intellectual level
(b) Vocational level
(c) Aptitude or Musical level
(ii) Assessment of specific ability
(a) Level of adjustment
(b) Level of abnormality
(c) Level of depression and anxiety
(iii) Aetiology: this function refers to the study of the causes underlying a diagnosis.
(iv) Remediation, which may be of the following types:
(a) Clinical treatment for physical ailments
(b) Counselling for mental ailments
(c) Remedial teaching for learning weaknesses
(d) Special education for the handicapped
These functions are highly individualized.
Methods of diagnosis
In view of the above functions, the following methods are used for diagnosis:
(i) Observation method: This is the most popular method, used for both prognosis
and diagnosis, especially with children. It is a subjective method, as well as
the most commonly used one.
(ii) Testing procedure used for diagnosis: This may include: (a) Clinical testing
method, (b) Psychological testing method, and (c) Educational testing method.
As mentioned above, the teacher may have to invite other skilled persons to identify
the deficiencies of the students so that proper remediation can be provided.
Steps for construction of a diagnostic test
The following steps are used for preparing the diagnostic tests:
• Formulation of objectives and outline of the content or topic.
• Content analysis into sub-topics and its elements: (a) Sequence of sub-topics
and elements within the sub-topic; (b) Sequence of learning points.
• Arranging sub-topics in order of difficulty.
• Deciding the types of items.
• Preparing items and try-out.
• Item analysis of test items and modification of items.
• Analysis of logical sequence of content.
• Preparing the final draft of the test.
• Preparing manual of the test.
• Remedial devices or measures.
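The ‘item analysis’ step above is usually quantified with two indices: a difficulty index (the proportion of students answering an item correctly) and a discrimination index (the difference in that proportion between high-scoring and low-scoring groups). A minimal sketch in Python; the function names, the sample responses and the group split are illustrative assumptions, not prescribed by this text:

```python
def difficulty_index(responses):
    """Proportion of students answering the item correctly (0.0 to 1.0)."""
    return sum(responses) / len(responses)

def discrimination_index(upper, lower):
    """Difference between the correct proportions of the upper and
    lower score groups; higher values mean the item separates
    strong and weak students better."""
    return difficulty_index(upper) - difficulty_index(lower)

# Example: 1 = correct, 0 = wrong, for one item across ten students
item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
upper = [1, 1, 1, 1, 0]   # responses of the top-scoring group
lower = [1, 0, 0, 1, 0]   # responses of the bottom-scoring group

print(difficulty_index(item))                      # 0.7
print(round(discrimination_index(upper, lower), 2))  # 0.4
```

An item that nearly everyone answers correctly (difficulty near 1.0) or that both groups answer alike (discrimination near 0) would be modified or dropped at the ‘modification of items’ step.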
Diagnostic test in school subjects
Let us look at the different diagnostic tests in school subjects.
(i) Diagnostic reading test: Reading is an important skill for school subjects.
The Iowa Silent Reading Test includes the following sub-tests:
(a) Rate of reading and comprehension of prose.
(b) Poetry comprehension and appreciation.
(c) Vocabulary in different areas.
(d) Meaning of sentences.
(e) Paragraph comprehension.
With these tests, the following types of errors committed by students can be
identified: wrong pronunciation of words, spelling errors, omission errors,
repetition errors, misplacement of words, combining of words, reverse reading,
etc. The teacher has to locate the real causes of these errors; only then can
remediation be provided.
(ii) Diagnostic test of mathematical skills: The Compass diagnostic test in
arithmetic covers the topics of addition, subtraction, multiplication and division.
The following types of errors are committed by the students:
(a) Carrying over in addition.
(b) Borrowing in subtraction.
(c) Borrowing and reducing one from the next column.
(d) Placement of decimal error.
(e) Tables are not remembered in multiplication.
(f) Placement of decimal in multiplication.
(g) Tables are not remembered in division.
(h) Noting wrong numbers.
Example of carry-over error in addition
Example of borrowing error in subtraction
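Carry-over errors of the kind listed above lend themselves to mechanical checking: a pupil who forgets to carry in addition produces a predictable wrong answer (for example, 47 + 38 answered as 75 instead of 85). A hedged sketch; the digit-wise model of a ‘forgotten carry’ below is an assumption about how such errors commonly look, not a procedure from this text:

```python
def no_carry_sum(a, b):
    """Add two non-negative integers digit by digit, discarding carries --
    the answer a pupil produces when carrying is forgotten entirely."""
    result, place = 0, 1
    while a > 0 or b > 0:
        result += ((a % 10 + b % 10) % 10) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

def diagnose_addition(a, b, pupil_answer):
    """Label the pupil's answer: correct, carry-over error, or other error."""
    if pupil_answer == a + b:
        return "correct"
    if pupil_answer == no_carry_sum(a, b):
        return "carry-over error"
    return "other error"

print(diagnose_addition(47, 38, 85))  # correct
print(diagnose_addition(47, 38, 75))  # carry-over error
```

Similar answer-pattern rules could be written for the borrowing errors in subtraction, giving the teacher a first pointer to the real cause before remediation.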
2.6 SUMMARY
• Tests are divided into different types taking into consideration their content,
objective, administration system, scoring style etc.
• An achievement test is an instrument designed to measure the relative
achievement of students. It is an indispensable instrument in the teaching–
learning process.
• A questionnaire is a tool for research, comprising a list of questions whose
answers provide information about the target group, individual or event.
Although they are often designed for statistical analysis of the responses, this
is not always the case. This method was the invention of Sir Francis Galton.
• Diagnostic tests are a kind of educational tests. These tests have two purposes:
(i) prognostic purpose, and (ii) diagnostic purpose.
• Prognosis predicts the future performance of students in the subjects that
have been taught to them; diagnosis identifies the causes of their weaknesses
and poor attainment.
• The diagnostic tests are broadly classified into two categories: (i) educational
diagnostic tests, and (ii) physical or clinical diagnostic tests.
• Standardized test: A standardized test is any form of test that requires all
test takers to answer the same questions, or a selection of questions from a
common bank of questions, in the same way, and that is scored in a ‘standard’
or consistent manner, which makes it possible to compare the relative
performance of individuals.
• Criterion-referenced test: Tests that are designed to measure student
performance against a fixed set of predetermined criteria or learning standards.
• Objectivity: Objectivity is a noun that means a lack of bias, judgment, or
prejudice.
• Pictorial: Something that is expressed in pictures or is illustrated.
• Remedial: A remedial action is intended to correct something that is wrong
or to improve a bad situation.
Short-Answer Questions
1. What is a standardized test? What are its characteristics?
2. Differentiate between criterion-referenced tests and norm referenced tests.
3. What are the essential characteristics of a good test?
4. Discuss the merits and demerits of different types of tests.
5. What are the different types of diagnostic tests?
Long-Answer Questions
1. Describe how a criterion-referenced test can be constructed.
2. Discuss how tests can be classified.
3. Discuss the different types of questionnaires that are used by researchers.
4. Describe how a diagnostic test can be constructed.
3.0 INTRODUCTION
In the previous unit, you learnt about the major tools and techniques used in evaluation.
In it, standardized tests, criterion-referenced tests and diagnostic tests were discussed.
In this unit, the discussion will turn towards psychological testing. Psychological
testing refers to the administration of psychological tests. A psychological test is an
objective and standardized measure of a sample of behaviour. There are many
types of psychological tests - intelligence tests, personality tests, occupational tests,
and so on. This unit will discuss psychological tests in detail. The unit will also
discuss different types of examination systems such as semester system and the
open book system.
3.2 PSYCHOLOGICAL TESTING IN THE AREA OF INTELLIGENCE
• Verbal intelligence test: A verbal intelligence test requires the following
capabilities in the language in which the test is developed and administered:
o Understanding spoken language
o Understanding written language
o Ability to speak the language

Some important tests of Intelligence
There are many tests for measuring intelligence. As discussed above, they can be of
three types namely verbal, non-verbal and performance types. Some important tests
of intelligence are given as follows:
• Stanford–Binet test of intelligence
• Wechsler–Bellevue intelligence test
• Thurstone’s test of primary mental abilities
• Non-verbal test of intelligence (also known as Raven’s Progressive Matrices)
• Culture-free intelligence test
• Bhatia battery performance test
Note: To study or use any of these intelligence tests, consult a psychological
test library or psychology lab. Read the manual of the test and try to use it to
gain first-hand experience.
• To assess how far the desirable attitudes have been developed in the students
during the course and after the completion of the course.
• To help the students to develop positive attitude towards certain things.
• To help the students in their career plan.
• To help the management to make its administration and supervision a qualitative
one.
• To help teachers overcome their weaknesses in the teaching–learning
situation.
• To help the students to check their undesirable behaviour.
Measurement of Attitude
Attitude is a subjective concept; it is relative rather than absolute. So, when a test
is prepared for testing attitude, certain dimensions are to be kept in mind. The
dimensions are:
• Direction
• Degree
• Intensity
From the direction point of view, there are two kinds of direction: (i) positive and
(ii) negative. When an individual has a positive bent of mind towards something, it
is known as a ‘positive attitude’, and when he has a negative bent of mind towards
something, it is known as a ‘negative attitude’. Every student’s attitude should be
measured in relation to his teaching–learning situation.
Every attitude has its degree. For example, a person who sings occasionally
has less degree of positive attitude towards singing in comparison to the person
whose profession is singing. So at the time of measuring attitude, the degree of
predisposition should be taken into consideration.
Attitudes also have an intensity dimension. At a high degree of intensity, certain
kinds of behaviour are motivated to a greater extent. All these dimensions
should therefore be kept in mind at the time of attitude testing.
The methods to be followed for the measurement of attitude are:
(i) Thurstone Scale: This scale was developed by Thurstone. Thurstone’s attitude
scale is known as an equal-appearing interval scale. In this scale, both favourable
and unfavourable statements regarding a particular topic are reflected on an
eleven-point scale. The respondent is supposed to check the point for each
item according to his attitude. The median of the judged locations for an item
is its scale value. The scale positions are very much a function of the judges
who are chosen.
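The scale value just described, the median of the judges’ placements of a statement on the eleven-point continuum, can be computed directly. A minimal sketch; the judge ratings below are invented for illustration:

```python
from statistics import median

# Hypothetical placements of one statement by nine judges on
# Thurstone's eleven-point favourableness continuum (1 to 11)
judge_placements = [3, 4, 4, 5, 5, 5, 6, 6, 7]

# The median of the judged locations is the item's scale value
scale_value = median(judge_placements)
print(scale_value)  # 5
```

A wide spread of placements around the median would signal an ambiguous statement, reflecting the dependence on the chosen judges noted above.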
(ii) Likert Scale: This scale was developed by Likert. All the items of this scale
are followed by five options, and respondents select the option that best reflects
their view, favourable or unfavourable, of the object or person. Judges are not
appointed for this scale, which is known as the ‘five-point scale’. A Likert-type
scale is less time-consuming and more economical. Its approach is more empirical
because it deals with the respondent’s score directly rather than employing judges.
The sum of the item credits is the total score of the individual, which is interpreted
in terms of empirically established norms.
Example of a Likert-type scale:
Statement: ‘Science is the soul of present day society.’
SA – Strongly Agree (5)
A – Agree (4)
U – Undecided (3)
D – Disagree (2)
SD – Strongly Disagree (1)
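Likert scoring as described, summing the item credits, can be sketched as follows. Reverse-scoring for negatively worded items is standard practice assumed here for completeness, not stated in this text:

```python
def likert_score(responses, negative_items=()):
    """Total attitude score: sum of item credits (1-5), with negatively
    worded items reverse-scored (5 becomes 1, 4 becomes 2, ...)."""
    total = 0
    for i, credit in enumerate(responses):
        total += (6 - credit) if i in negative_items else credit
    return total

# Five items answered SA, A, U, D, SA -> credits 5, 4, 3, 2, 5;
# the fourth item (index 3) is negatively worded, so its 2 counts as 4
print(likert_score([5, 4, 3, 2, 5], negative_items={3}))  # 21
```

The resulting total would then be read against the empirically established norms mentioned above.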
Limitations of attitude testing
Attitude testing has certain limitations which cannot be avoided. The limitations are:
• Attitude is a subjective concept, so it is very difficult to measure attitude
quantitatively.
• Attitude is such a complex affair that it cannot be represented by any single
numerical index.
• Attitude is learned, not inborn. So it varies from situation to situation and
from time to time.
• In most cases, there is a difference between verbally expressed attitudes
and the attitudes reflected in behaviour.
3.3.2 Personality Tests
The concept of personality is subjective in nature. It is very difficult to assess or
measure subjective concepts objectively. Still, psychologists have tried to measure
the personality of human beings through different tools and techniques. In primitive
society, physical strength was the norm of personality measurement; during the
Vedic period, memorization of the Vedas was the norm.
Later, astrology, palmistry, physiognomy and phrenology were considered as the
measures of personality.
The methods used for assessment of personality may be categorized as
subjective, objective and projective techniques. It is difficult to draw watertight
distinctions among these assessment techniques. Some of the important
techniques are discussed here.
Concept of personality
According to G.W. Allport, ‘personality is the dynamic organization within the
individual of those psycho-physical systems that determine his unique adjustment to his
environment’. Eysenck defined personality as ‘the more or less stable and enduring Psychological Testing
organization of a person’s character, temperament, intellect and physique, which
determine his unique adjustment to the environment’. According to Watson,
‘personality is the sum of activities that can be discovered by actual observations
over a long period of time to give reliable information’. For Morton Prince, personality
is ‘the sum total of all the biological innate dispositions, impulses, tendencies, appetites
and instincts of the individual and the dispositions and tendencies acquired by
experience’.
In a nutshell, personality refers to the external appearance and internal qualities
of an individual. It is something unique to everyone, and it is the result of the interaction
of heredity and environment. Personality refers to individual’s unique and relatively
stable patterns of behaviour, thoughts and feelings. We cannot draw a watertight
compartment between personality and all the psychological traits. Personality is a
summative approach which assesses all the integrative qualities of an individual.
Learning and acquisition of experiences in every platform of life contribute towards
growth and development of personality.
Characteristics of personality
The characteristics of personality are as follows:
• Personality is built by heredity and environment.
• There is individual difference in personality.
• Personality determines one’s adjustment to his environment.
• Personality emerges from the interaction of psychobiological organism.
• Personality may be intrinsic or extrinsic.
• Personality is the reflection of all the psychological and physical traits of an
individual.
• Personality can be assessed.
• Personality means man within the man.
• Personality refers to social adaptability.
• Personality is a dynamic organization.
• Behaviour is the reflection of personality.
• Personality permits a prediction about an individual.
• Personality is more or less stable in nature.
• Personality exhibits self-consciousness.
• Personality includes all the behaviour patterns: conative, cognitive and affective.
• Personality includes conscious, semi-conscious and unconscious activities.
• Learning and acquisition of experiences contribute towards growth and
development of personality.
• Personality should not be taken as synonymous with one’s character and
temperament.
• Personality of an individual is directed towards some specific ends.
Determinants of personality
An individual is the by-product of his heredity and environment. Many believe that
heredity plays the major role in personality development, but in reality no single
factor can be credited with shaping personality. Personality is the by-product of
many factors, which are discussed below.
(i) Heredity: In most cases, children are similar to their parents and siblings.
Heredity influences physique, motor-sensory equipment and level of intelligence.
The physical appearance of an individual contributes a great deal to his personality.
So, heredity does play an important role in the development of personality.
(ii) Environment: Here, we will discuss three types of environment:
(a) Physical environment, (b) Social environment, and (c) Cultural environment.
These factors make an individual ‘acquire’ a personality.
(a) Physical environment: Physical environment refers to the physical,
geographical and geological structure of the area where an individual
lives. People in cold countries tend to have fair complexions, while
people in hot countries tend to have dark complexions. The people of
North-East India differ from the people of South India in colour and
physical appearance. This difference is mostly due to the physical
environment.
(b) Social environment: The social environment of an individual includes
all social agents such as parents, siblings, peer groups, the school,
teachers and neighbours. All these factors play a role in the development
of an individual’s personality. Children from homes where morality,
honesty, spiritualism and religiousness are given importance differ from
children from homes marked by poverty, family disorder and merely
formal relationships. A child’s maximum time is spent in school with his
teachers and friends. Teachers are role models for children. The
principles of discipline, cooperative living, respect for teachers, and the
feeling of brotherhood and sisterhood fostered by a uniform dress are
all acquired in school. A teacher’s open-mindedness, democratic outlook,
enthusiasm and industriousness leave a mark upon a child that develops
his personality. Interpersonal relationships among the members of a
society are important means of developing certain social personality
characteristics. All these factors help in the development of personality.
(c) Cultural environment: Cultural values mould the personality of people
belonging to that culture. A child internalizes the values, ideas, beliefs,
norms and customs of a culture through interaction with this culture
and the society. Every society has its own cultural heritage, and the
successful transmission of this heritage from generation to generation
is known as ‘enculturation’. The personality of the people of Eastern
society differs from that of the people of Western society, owing to
cultural differences. Margaret Mead conducted a study on the adolescents
of Samoa, a primitive culture, and concluded that the cultural pattern of
a society influences to a great extent the personalities of its individuals.
84 Self-Instructional Material
Subjective and objective techniques
Several methods of assessment and evaluation of learners are in use these days,
such as unit tests, half-yearly examinations, annual examinations, semester system,
board examinations and entrance tests.
Unit tests
Unit tests are prepared and conducted to check students’ level of understanding
and to identify their problems at an initial stage. In these tests, the subject matter is
selected from specific content taught in a limited time period, and it is directly
pertinent to the objectives to be attained in that period. The unit can be an
entire module. Each unit test is totally independent of the other tests to be conducted
in the session.
Benefits of unit tests
There are many benefits of unit tests, such as:
• Objectives are clearer and well defined.
• Follow-up after evaluation is simpler.
• Students get opportunities for improvement.
9. Since it is impossible to achieve 100 per cent, or even near 100 per cent,
reliability of the various tools used for evaluating pupils, it is desirable that
students be classified broadly into 5 to 7 grades rather than on the 101-point
scale as at present.
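The recommendation in point 9 amounts to banding the 0–100 marks range into a handful of broad grades. A sketch with seven grades; the cut-offs below are illustrative assumptions, not prescribed by the recommendation:

```python
def to_grade(marks):
    """Map a 0-100 mark to one of seven broad grades (illustrative cut-offs)."""
    bands = [(90, 'A'), (75, 'B'), (60, 'C'), (45, 'D'),
             (33, 'E'), (20, 'F'), (0, 'G')]
    for cutoff, grade in bands:
        if marks >= cutoff:
            return grade
    return 'G'  # fallback for any out-of-range input

print(to_grade(82))  # B
print(to_grade(58))  # D
```

Two pupils scoring 57 and 59 thus receive the same grade, which is the point of broad banding: differences smaller than the reliability of the measuring tool are not reported as differences.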
10. As every pupil learns at his own rate, he should be judged in terms of his own
capacities and goals and not in terms of the standards of his class, institution
or the board of secondary education. As such, a student’s passing in all the
subjects at the same time cannot be considered essential.
11. The grades of every pupil indicate only his level of performance, which may
be satisfactory or unsatisfactory in terms of his own standard. Thus, a grade,
howsoever low it may be, cannot be taken as a failure but as an indicator of his
present level of achievement.
12. Given more time and proper remedial teaching, a student can improve his
achievement. Therefore, a student should get the opportunity of improving his
grade in one or more subjects, if he so desires.
13. Evidence about a pupil’s growth in different aspects of his development should
be reported accurately and meaningfully to students, teachers, parents,
employers and institutions. The recording of pupil performance in various areas
of development is therefore a prerequisite to every evaluation programme.
Evaluation of Non-Scholastic Aspects of Pupil’s Growth
A national seminar organised by the NCERT from 8 to 12 October, 1979 made the
following recommendations on making effective evaluation of Non-scholastic aspects
of pupil’s growth.
1. Educationists the world over are expressing grave concern over the bookish
nature of present-day education, dominated by examinations which tend
to evaluate only the cognitive outcomes of education. This is particularly
true of the Indian scene. The seminar strongly recommends that educational
policy-makers, planners, administrators and teachers should take a serious
note of this lacuna and take proper steps to ensure that adequate attention is
paid to the development and evaluation of both the cognitive and non-cognitive
aspects of pupil growth. This is possible only when the schools pay adequate
attention to the development and scientific evaluation of both these aspects in
an integrated and balanced manner.
2. Since the foundations of personality are laid in early childhood, it is very
important that developmental evaluation of non-cognitive aspects of growth
should start as early as possible. The Education Departments and the Boards
of Secondary Education should ensure that proper facilities are provided in
the educational institutions so as to make possible the development and
evaluation of these aspects of pupil growth. For this purpose, it would be
necessary to develop systematic and realistic programmes for various types
of institutions and to ensure that they are sincerely executed. The importance
of cocurricular activities in the fostering of non-scholastic objectives of
education is too well-known to need reiteration by this seminar. It is, therefore,
recommended that at least six periods per week must be provided for
cocurricular activities in all secondary and higher secondary schools. The
proportion could be even larger in primary schools. The planning and execution
of these activities need to be done in a manner that makes the participation of each
and every pupil in some worthwhile activity duly possible.
3. Appreciating the role of education in shaping the personality of the child the
seminar recommends that evaluation in schools should be used more for
developmental purposes than for simply reporting, promoting from class-to-
class and certification. This is particularly true of the non-scholastic aspects
of growth, the scientific assessment of which poses serious theoretical and
practical problems. The seminar, therefore, recommends that the evaluation
done in these areas should be used for formative purposes only and should
not be quantified for use in promotion from class to class.
4. The seminar considered in detail the role of class-room instructional
programmes in the development of non-scholastic aspects of growth. It was
agreed that in each subject area, specific objectives of the psychomotor and
affective domains should also be identified and included in the evaluation
scheme. Similar exercise may be done in the cocurricular area as well. In
addition, programmes conducive to the development of these objectives should
also be identified and executed in both the areas in a systematic manner.
5. Training of teachers in the development and evaluation of non-scholastic
aspects of pupil growth is very crucial, more so because of the elusive and
non-tangible nature of these aspects. The Departments of Education, Boards
of Secondary Education and the Universities should undertake in-service
programmes on an extensive scale and also incorporate relevant topics
and activities in the preservice programmes. Availability of adequate facilities
for training in this area should be made an essential condition for recognition
of training institutions at all levels.
6. The role of the supervisor in the proper execution of educational programmes
in India cannot be ignored. For successful implementation of any innovative
project, involvement of administrators and the headmasters is a must and
therefore orientation of these categories of staff in the supervision of work in
the non-scholastic aspects of growth should receive proper emphasis in their
training and orientation programmes.
7. The present state of affairs in education in India is an inevitable outcome of
quantitative expansion at the cost of quality control. To restore education to
its pristine glory it is necessary that serious notice is taken by the Central and
the State Governments and the local communities of the existing inadequate
facilities in most of the schools and earnest efforts are made to improve them
early so as to make possible a satisfactory implementation of educational
programmes. The wide gap between the stated objectives of education and
their actual implementation in schools, for want of adequate facilities, needs to
be appreciated early enough and remedial measures taken accordingly.
8. Instructional materials and manuals make a significant contribution to the
successful implementation of educational programmes. There is a great need
for the development of such materials in the forms of teacher’s guides,
students’ guides etc. and their circulation to schools. The Department and the
Boards of Secondary Education should get such materials developed with the
help of Teachers colleges and other experts in adequate quantities for the
guidance of teachers and students. The role of mass media in this area also needs
to be recognized.
9. To give due importance to the development and evaluation of non-scholastic
aspects of growth in the school time-table, it is necessary that the periods
devoted by teachers to this work should be counted in their work-load.
10. For an effective implementation of any programme of development and
evaluation of non-scholastic aspects of growth it is essential that good work
done by teachers in this area should receive due recognition for which the
Department and the Boards can undertake a variety of steps, such as holding
seminars and reading programmes, issuing special certificates of recognition to
outstanding teachers and providing for the assessment of the work in this
area in the confidential report forms of teachers.
11. The need for research and development activities in the non-scholastic area
of pupil growth is too obvious to be dilated upon. The National Council of
Educational Research and Training, the State Institutes of Education, Colleges
and the Departments of Education in the Universities should appreciate this
need and undertake systematic and scientific research in this area. A beginning
in this respect has been made by the National Council of Educational Research
and Training which needs further intensification at all levels.
Trends in Examinations and Evaluation
As a result of the reform movement in examinations and evaluation, the following
major trends are discernible:
• Use of more scientific and systematic methods instead of arbitrary methods.
• Stress on performance in both academic and non-academic areas
• Continuous evaluation in place of periodic examinations
• Use of a variety of techniques and tools instead of a few techniques and tools
of evaluation
• Wider uses of test results in place of limited uses
• Emphasis on improvement of achievement rather than measurement of
achievement
• Treatment of testing in relation to other elements of the curriculum rather
than treating testing in isolation
• Preferring grades to marks
• Awarding subjectwise grades in place of overall grades
• Clearing the examination in parts and not at one stroke
• Improving grades through subsequent attempts
• Providing opportunities for re-evaluation
• Spot evaluation at specified centres in place of marking by examiners at their
residences
• Deployment of special squads for checking unfair means
• Declaring the use of unfair means as an offence
• More flexible and purposeful form of question papers than the traditional one
• Apart from testing memorisation, inclusion of questions for evaluating abilities
such as understanding, critical thinking and application of knowledge and skills
• More coverage of the syllabus
• Inclusion of a variety of questions in the question paper
• A large number of questions in place of a few questions
• Replacement of overall options by internal options
• Specifically worded questions in place of large questions
• More objective scoring
• Use of multiple sets of question papers
Use of Question Bank
One of the main shortcomings of the present education system is the lack of
appropriate questions asked in the examination. Such questions often fail to reflect
the teaching-learning process properly. The person who prepares the questions largely
depends on his own choice and questioning skill. These questions test only basic
knowledge, and it has often been observed that most of the questions are repeated,
with only the language changed. Such questions are not able to evaluate the knowledge,
understanding or application domains of the students. As a result, students prepare
only a few questions for the examination, which does not give a complete evaluation
of a student’s knowledge.
Keeping all these points in mind, the idea of preparing a question bank has
been generated. A question bank is a set of questions on a subject used either for
study review or for drawing questions used in an examination. It is a ready-made
collection of questions that helps students and teachers in the teaching-learning
process. It is prepared for critical evaluation and serves as a means of assessing the
presence, quality and depth of a student’s subject knowledge. Thus, it can be said that a
question bank is a database of questions that can be shared among various courses. How
questions are selected against specific criteria to create assessments is important.
A good question bank:
• Tests student understanding of a subject
• Guides the student to learn a lesson deeply
• Develops questioning skills and higher-order thinking among students
In a question bank, a number of questions are prepared from a single unit of a
subject; the maximum number of questions that can be prepared from a unit are prepared.
There may be many types of question banks:
• Bank of objective type questions
• Bank of short answer questions
• Bank of essay type (descriptive) questions
• Bank of miscellaneous questions
These questions are prepared on the basis of difficulty level and discrimination
power to make these question banks more reliable and logical.
To prepare a question bank, you must:
1. Ensure that there are all types of questions: objective type questions,
short-answer questions, long-answer questions, questions requiring illustrations,
diagrams etc. The main thing is that the question bank must be exhaustive as
far as the content coverage is concerned
2. Ensure that there are all levels of questions: There must be all levels of
questions in a question bank. These levels pertain to complexity; from very
simple to very complex questions, in order to prepare the student fully for
examination
3. Ensure questions from previous years’ question papers: Sometimes,
examiners setting the papers use cryptic language to form questions to confuse
students. Students should know how to interpret the cryptic questions and
answer correctly. This can be achieved by offering them such questions from
old exam papers.
4. Provide answers at the end: A good question bank always contains answers
at the end. Otherwise there is no way for the student to confirm if he/she has
answered correctly. For mathematical problems or Physics numerical questions,
there need not be the entire solution, but just the final answer.
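The recommendations above suggest a simple structure: a collection of questions tagged by type and difficulty, with answers kept alongside. The following is a minimal sketch of such a structure; the field names and type labels are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Question:
    text: str
    qtype: str       # e.g. 'objective', 'short', 'essay' (illustrative labels)
    difficulty: int  # e.g. 1 (very simple) to 5 (very complex)
    answer: str      # the answer travels with the question, as recommended above

@dataclass
class QuestionBank:
    questions: List[Question] = field(default_factory=list)

    def add(self, q: Question) -> None:
        self.questions.append(q)

    def by_difficulty(self, level: int) -> List[Question]:
        # Draw all questions at a given difficulty level
        return [q for q in self.questions if q.difficulty == level]

bank = QuestionBank()
bank.add(Question("Define evaluation.", "short", 1, "Evaluation is ..."))
bank.add(Question("Compare formative and summative evaluation.", "essay", 4, "..."))
print(len(bank.by_difficulty(1)))  # → 1
```

A real question bank would also carry discrimination indices and content tags; the same filtering pattern applies.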
Differential Vs Uniform Evaluation
In evaluation situations, we tend to overlook the fact that there are individual
differences among students. We feel satisfied with using a uniform yardstick for all
students at a particular point of time. We do not usually take cognizance of the fact
that the rate of learning of different children may be different, the learning situations
may vary from class to class and student to student and resource inputs may not be
the same. Evaluation has to take care of and cater to the needs of such extensive
and intensive differences and promote self-paced learning by providing for individual
differences instead of prescribing uniform group evaluation.
Full Vs Partial Evaluation
Full evaluation must take into account all the three aspects i.e. cognitive, affective
and psycho-motor. The traditional system of examination evaluates only cognitive
abilities. Even in the cognitive area, external examinations often do not go beyond the
evaluation of memorisation or convergent thinking. These drawbacks can be overcome
through evaluation at the school level.
3.4.2 Open Book Examination
While developing assessment tools for the classroom, certain issues should be addressed
in relation to the following:
• Purpose and impact: How should the assessment be used and how should
it impact the instruction and the selection of curriculum?
• Validity and fairness: Does it measure what it intends to measure?
• Reliability: Is the data that is to be collected reliable across applications
within the classroom, school and district?
• Significance: Does it address content and skills, which are to be valued by
and reflect current thinking in the field?
• Efficiency: Is the method of assessment consistent with the time available in
the classroom setting?
There is a huge range of assessments that are accessible for use in reforming
science assessment in the classroom. These types of assessments comprise strategies
that are traditional as well as alternative. The diverse types of alternative assessments
can be used with a range of science content and process skills, including the following
general targets:
• Declarative knowledge
• Conditional knowledge
• Procedural knowledge
• Application knowledge
• Problem solving
• Critical thinking
• Documentation
• Understanding
Assessment can be divided into three stages:
1. Baseline assessment: It establishes the ‘starting point’ of a student’s
understanding.
2. Formative assessment: It provides information to help guide the
instruction throughout the unit.
3. Summative assessment: It informs both the student and the teacher
about the level of conceptual understanding and performance capabilities
that a student achieves.
The huge range of targets and skills that can be addressed in classroom
assessment needs the use of many assessment formats. Some formats, and the
stages of assessment in which they most likely occur are shown in Table 3.3.
Oral exams
An oral exam is a chance for you to demonstrate your knowledge, your presentation/
speaking skills, as well as your ability to communicate. These exams can be formal
or informal, but all exams should be considered formal exchanges for making a good
impression. For both types, you should listen carefully to the question and then answer
directly.
Formal exams have a list of questions in a ready format. The criteria for
evaluation are usually set in a right/wrong format, and can at times be competitive.
For this kind of exam, if you wish to add ‘related’ or qualified information, first ask
permission as a basic courtesy.
Informal exams are usually more open, your responses can be longer,
and the evaluations can be more subjective. Answers are generally less exact
(right/wrong) and more value is added for problem solving analysis and method, as
well as interpersonal communication and presentation.
Written tests
Written tests are those tests that are given on paper or on a computer. A test taker
who takes a written test might respond to precise items by writing or typing inside a
given space of the test or on a separate form or document.
A test developer’s choice of what style or format to use while developing a
written test is generally arbitrary, given that there is no single invariant standard
for testing. As a result, these tests might comprise only one test item format or might
have a combination of different test item formats.
Some common written test formats are:
• MCQs or Multiple Choice Questions: A statement or question is given
with five or six options and the respondent has to choose the correct answer
or answers, depending on how many correct answers are included. In general,
besides the correct answer(s), at least three incorrect answers are included as
distractors for the respondent.
• Fill-in-the-blanks: A statement is given with some key words missing and
the respondent needs to provide the missing words.
• Short-answer questions: The respondent is asked to write brief answers,
which may include a definition, a short description, an example, a list, or to
draw a figure and so on.
• Long-answer questions: The respondent is asked to write descriptive, essay-type
answers that require him or her to analyse and apply the concepts learned.
Open book examination
In an open book exam you are assessed on understanding instead of on recall and
memorization.
You will be expected to do the following:
• Apply material to new situations
• Analyse elements and relationships
• Synthesize or structure
• Evaluate using your material as evidence
Access to content during the exam varies by instructor.
The exam can be taken home or in the classroom with questions that are
seen or unseen before exam time. You should not underestimate the preparation required.
A weightage table for sub-topics might look as follows:

No.   Sub-topic       Marks   Weightage (%)
1     Sub topic - 1     15        60
2     Sub topic - 2     10        40
      Total             25       100

A blueprint distributing 40 questions across topics and objective levels:

                       Topic I   Topic II   Topic III   Topic IV   Total
Knowledge (12.5%)         1         2          1           1         5
Comprehension (17.5%)     2         1          2           2         7
Application (37.5%)       4         4          3           4        15
Analysis (25%)            3         2          3           2        10
Synthesis (5%)            –         1          1           –         2
Evaluation (2.5%)         –         –          –           1         1
Total                    10        10         10          10        40
                       (25%)     (25%)      (25%)       (25%)
3.5 SUMMARY
o Acquisitive attitude
o Play attitude
o Scientific attitude
o Business attitude
o Artistic attitude
o Religious attitude
• The determinants of attitude are:
o Cultural or social determinant
o Psychological determinants
o Functional determinants
• The methods to be followed for the measurement of attitude are:
(i) Thurstone Scale
(ii) Likert Scale
• According to G.W. Allport, ‘personality is the dynamic organization within
individual of those psycho-physical systems that determine his unique
adjustment to his environment’.
• The term ‘projection’ was used for the first time by Sigmund Freud in the field of
psychology. Projection, according to Freud means externalizing of conflicts
or other internal conditions that give rise to conscious pain and anxiety.
• Several methods of assessment and evaluation of learners are in use these
days, such as unit tests, half-yearly examinations, annual examinations,
semester system, board examinations and entrance tests.
Short-Answer Questions
1. Differentiate between individual intelligence test and group intelligence test.
2. Discuss how attitude can be measured.
3. What are some of the methods that are in use to evaluate learners?
4. What are open-book examinations? How can one prepare for them?
5. What is a semester system? Discuss its characteristics.
Long-Answer Questions

UNIT 4 STATISTICS IN MEASUREMENT AND EVALUATION-I
Structure
4.0 Introduction
4.1 Unit Objectives
4.2 Statistical Treatment of Data
4.2.1 Interpretation of Data
4.3 Frequency Distribution and Graphic Representation of Data
4.3.1 Presentation of Data in Sequence: Grouping, Tabulation and Graphical
Representation
4.4 Measures of Central Tendency and Variability
4.5 Co-efficient of Correlation
4.6 Percentile and Percentile Rank
4.6.1 Skewness and Kurtosis
4.7 Normal Probability Curve
4.8 Derived Scores (Z-score, Standard Score and T-score)
4.9 Summary
4.10 Key Terms
4.11 Answers to ‘Check Your Progress’
4.12 Questions and Exercises
4.13 Further Reading
4.0 INTRODUCTION
In the previous unit, you learnt about psychological testing. In this unit, the discussion
will turn towards statistics in measurement and evaluation. Statistics is the science of
collecting and analysing numerical data in large quantities, particularly for the purpose
of inferring proportions in a whole from those in a representative sample.
One can analyse quantitative data through techniques such as measures of
central tendency. Measures of central tendency are of various types, such as the
arithmetic mean, mode and median. The arithmetic mean is also commonly known
as simply the mean. Even though average, in general, means any measure of central
location, when we use the word average in our daily routine, we usually mean the
arithmetic average.
[Figure: Classification of data — classification according to attributes (descriptive
classification: simple and manifold) and classification according to class intervals
(exclusive and inclusive class-intervals).]
Score   Tallies        Frequency
10      //                 2
11      /////              5
12      ///// ////         9
13      ///// //           7
14      /////              5
15      //                 2
Total                  N = 30
When the amount of data is large, it is useful to group data into classes or
class intervals or categories as shown in Table 4.4.
Table 4.4 Grouped Frequency Distribution
Class Interval   Frequency
10-11                7
12-13               16
14-15                7
Total             N = 30

The actual (true) class limits are:

Class Interval   Actual Class Limits
10-11            9.5-11.5
12-13            11.5-13.5
14-15            13.5-15.5

Adding cumulative frequencies:

Class Interval   Frequency   Cumulative Frequency
10-11                7               7
12-13               16              23
14-15                7              30
Total             N = 30
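The tallying, grouping and cumulation shown in these tables can be reproduced in a few lines. This is a minimal sketch using the same 30 scores and the same class intervals:

```python
from collections import Counter
from itertools import accumulate

# The 30 scores tallied in the frequency table above
scores = [10]*2 + [11]*5 + [12]*9 + [13]*7 + [14]*5 + [15]*2

freq = Counter(scores)                      # simple (ungrouped) frequency distribution
intervals = [(10, 11), (12, 13), (14, 15)]  # class intervals, as in the grouped table

# Grouped frequencies: sum the frequencies of all scores falling in each interval
grouped = [sum(f for s, f in freq.items() if lo <= s <= hi) for lo, hi in intervals]
cumulative = list(accumulate(grouped))      # 'less than' cumulative frequencies

print(grouped)     # → [7, 16, 7]
print(cumulative)  # → [7, 23, 30]
```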
Example 4.3: The marks of 52 students are recorded here. Draw a frequency
polygon for this data.
Example 4.5: Draw a ‘less than’ ogive curve for the following data:
Solution: Plot the points with coordinates having abscissae as actual limits and
ordinates as the cumulative frequencies: (10, 2), (20, 10), (30, 22), (40, 40), (50, 68),
(60, 90), (70, 96) and (80, 100) as the coordinates of the points. Joining the points
plotted by a smooth curve forms the ogive.
4.4 MEASURES OF CENTRAL TENDENCY AND VARIABILITY

There are several commonly used measures of central tendency, such as the arithmetic
mean, mode and median. These values are very useful not only in presenting the
overall picture of the entire data but also for the purpose of making comparisons
among two or more sets of data.
As an example, questions like ‘How hot is the month of June in Delhi?’ can
generally be answered by a single figure: the average for that month. Similarly,
suppose we want to find out if boys and girls at age 10 years differ in height for the
purpose of making comparisons. Then, by taking the average height of boys of that
age and average height of girls of the same age, we can compare and record the
differences.
While the arithmetic mean is the most commonly used measure of central location,
the mode and median are more suitable measures under certain sets of conditions and
for certain types of data. However, each measure of central tendency should meet
the following requisites:
1. It should be easy to calculate and understand.
2. It should be rigidly defined. It should have only one interpretation so
that the personal prejudice or bias of the investigator does not affect its
usefulness.
3. It should be representative of the data. If it is calculated from a sample,
then the sample should be random enough to accurately represent
the population.
4. It should have sampling stability. It should not be affected by sampling
fluctuations. This means that if we pick 10 different groups of college
students at random and compute the average of each group, then we
should expect to get approximately the same value from each of these
groups.
5. It should not be affected much by extreme values. If a few very small or
very large items are present in the data, they will unduly influence the
value of the average by shifting it to one side or other, so that the average
would not be really typical of the entire series. Hence, the average
chosen should be such that it is not unduly affected by such extreme
values.
Meaning of the Measures of Central Tendency
If the progress scores of the students of a class are taken and arranged in
a frequency distribution, we may sometimes find that there are very few students
who either score very high or very low. The marks of most of the students will lie
somewhere between the highest and the lowest scores of the whole class. This
tendency of a distribution is termed central tendency, and the typical score that
lies between the extremes and is shared by most of the students is referred
to as a measure of central tendency. Tate (1955) defined the measures of central
tendency as, ‘A sort of average or typical value of the items in the series and its
function is to summarize the series in terms of this average value.’
The most common measures of central tendency are:
1. Arithmetic Mean or Mean
2. Median
3. Mode
Let us consider the three measures of central tendency.
I. Arithmetic Mean
This is also commonly known as simply the mean. Even though average, in general,
means any measure of central location, when we use the word average in our daily
routine, we always mean the arithmetic average. The term is widely used by almost
everyone in daily communication. We speak of an individual being an average
student or of average intelligence. We always talk about average family size or
average family income or Grade Point Average (GPA) for students, and so on.
Calculating arithmetic mean (M): The simplest but most useful measure of central
tendency is the arithmetic mean. It can be defined as the sum of all the values of the
items in a series divided by the number of items. It is represented by the symbol M.
The advantage of the combined arithmetic mean is that one can determine the overall
mean of the combined data without having to go back to the original data.
An Example:
Find the combined mean for the data given below:
n₁ = 10, x̄₁ = 2, n₂ = 15, x̄₂ = 3
Combined mean = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂) = (10 × 2 + 15 × 3)/(10 + 15) = (20 + 45)/25 = 65/25 = 2.6
For discussion purposes, let us assume a variable X which stands for
some scores, such as the ages of students. Let the ages of 5 students be 19,
20, 22, 22 and 17 years. Then variable X would represent these ages as
follows:

X: 19, 20, 22, 22, 17

Placing the Greek symbol Σ (capital sigma) before X indicates a command that
all values of X are to be added together. Thus:

ΣX = 19 + 20 + 22 + 22 + 17

The mean is computed by adding all the data values and dividing the sum by the
number of such values. The symbol used for the sample mean is X̄, so that:

X̄ = (19 + 20 + 22 + 22 + 17)/5 = 100/5 = 20

In general, if there are n values in the sample, then

X̄ = (X₁ + X₂ + ... + Xₙ)/n

In other words,

X̄ = (Σᵢ₌₁ⁿ Xᵢ)/n,  i = 1, 2, ..., n

The above formula states: add up all the values of Xᵢ, where the value of
i starts at 1 and ends at n with unit increments, so that i = 1, 2, 3, ..., n.

If instead of taking a sample, we take the entire population in our calculation
of the mean, then the symbol for the mean of the population is µ (mu) and the
size of the population is N, so that:

µ = (Σᵢ₌₁ᴺ Xᵢ)/N,  i = 1, 2, ..., N
Item            Value (x)   Weight (w)   wx
Car                 3           50       150
Locomotive          5           25       125
Aeroplane           7           15       105
Double Decker       9           10        90
Example 4.7: The arithmetic mean of daily wages of two manufacturing concerns
A Ltd. and B Ltd. is ` 5 and ` 7, respectively. Determine the average daily wages
of both concerns if the number of workers employed were 2,000 and 4,000,
respectively.
Solution: (a) Multiply each average (viz., 5 and 7) by the number of workers in the
concern it represents.
(b) Add up the two products obtained in (a) above
(c) Divide the total obtained in (b) by the total number of workers.
Weighted Mean of Mean Wages of A Ltd. and B Ltd.
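The three steps of Example 4.7 amount to computing a weighted mean; a small sketch of the computation:

```python
def weighted_mean(means, weights):
    """Weighted (combined) mean: multiply each mean by its weight,
    add the products, and divide by the total weight."""
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)

# Example 4.7: mean daily wages Rs 5 and Rs 7, with 2,000 and 4,000 workers
avg = weighted_mean([5, 7], [2000, 4000])
print(round(avg, 2))  # → 6.33
```

The same function reproduces the earlier combined-mean example: weighted_mean([2, 3], [10, 15]) gives 2.6.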
Some measures other than measures of central tendency are often employed when
summarizing or describing a set of data where it is necessary to divide the data into
equal parts. These are positional measures and are called quantiles and consist of
quartiles, deciles and percentiles. The quartiles divide the data into four equal parts.
The deciles divide the total ordered data into ten equal parts and percentiles divide
it into hundred equal parts. For example, the seventh decile is calculated as:

D₇ = 7(n + 1)/10 th observation in the ordered data.

Percentiles are generally used in the research area of education, where people
are given standard tests and it is desirable to compare the relative position of the
subject’s performance on the test. Percentiles are similarly calculated as:

P₇ = 7(n + 1)/100 th observation in the ordered data.

P₆₉ = 69(n + 1)/100 th observation in the ordered data.
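The quartile, decile and percentile positions above all use the same (n + 1) rule; a small helper makes this explicit. Linear interpolation at fractional positions is one common convention, assumed here:

```python
def quantile_position(n, j, parts):
    """1-based position of the j-th quantile (parts = 4 for quartiles,
    10 for deciles, 100 for percentiles)."""
    return j * (n + 1) / parts

def quantile(data, j, parts):
    """Value at that position in the ordered data, with linear interpolation."""
    pos = quantile_position(len(data), j, parts)
    data = sorted(data)
    lo = int(pos) - 1          # convert the 1-based position to a 0-based index
    frac = pos - int(pos)
    if lo + 1 >= len(data):    # position beyond the last observation
        return data[-1]
    return data[lo] + frac * (data[lo + 1] - data[lo])

data = list(range(1, 100))      # 99 ordered observations: 1, 2, ..., 99
print(quantile(data, 7, 10))    # D7 = 7(99 + 1)/10 = 70th observation → 70.0
print(quantile(data, 69, 100))  # P69 = 69(99 + 1)/100 = 69th observation → 69.0
```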
Quartiles
The formula for calculating the values of quartiles for grouped data is given as
follows.
Q = L + (j/f)C
Where,
Q = The quartile under consideration.
L = Lower limit of the class interval which contains the value of Q.
j = The number of units we lack from the class interval which contains
the value of Q, in reaching the value of Q.
f = Frequency of the class interval containing Q.
C = Size of the class interval.
Let us assume we took the data of the ages of 100 students and a frequency
distribution for this data has been constructed as shown.
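Since the original frequency table has not been reproduced here, the following sketch applies the formula Q = L + (j/f)C to a hypothetical age distribution of 100 students (the intervals and frequencies are illustrative assumptions):

```python
def grouped_quantile(intervals, freqs, k):
    """Q = L + (j/f) * C for grouped data.

    intervals: list of (lower_limit, upper_limit) class boundaries
    freqs:     frequency of each class
    k:         cumulative position to reach (e.g. N/4 for Q1, N/2 for Q2)
    """
    cum = 0
    for (L, U), f in zip(intervals, freqs):
        if cum + f >= k:
            j = k - cum      # units still lacking within this class
            C = U - L        # size of the class interval
            return L + (j / f) * C
        cum += f

# Hypothetical ages of 100 students grouped into five classes
intervals = [(15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
freqs = [10, 25, 35, 20, 10]
N = sum(freqs)
q1 = grouped_quantile(intervals, freqs, N / 4)  # first quartile, k = 25
print(q1)  # → 18.2  (Q1 lies 15/25 of the way through the 17-19 class)
```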
In general, the nth geometric mean inserted between a and b is

Gₙ = arⁿ = a(b/a)^(n/(n+1))
Example 4.10: Find 7 GMs between 1 and 256.
Solution: Let G₁, G₂, ..., G₇ be 7 GMs between 1 and 256.
Then, 256 = 9th term of the GP
= 1 · r⁸, where r is the common ratio of the GP.
This gives r⁸ = 256 ⇒ r = 2.
Thus, G₁ = ar = 1 × 2 = 2
G₂ = ar² = 1 × 2² = 4
G₃ = ar³ = 1 × 2³ = 8
G₄ = ar⁴ = 1 × 2⁴ = 16
G₅ = ar⁵ = 1 × 2⁵ = 32
G₆ = ar⁶ = 1 × 2⁶ = 64
G₇ = ar⁷ = 1 × 2⁷ = 128
Hence, the required GMs are 2, 4, 8, 16, 32, 64, 128.
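Example 4.10 can be verified numerically; this sketch inserts n geometric means between a and b using the common ratio r = (b/a)^(1/(n+1)):

```python
def geometric_means(a, b, n):
    """Insert n geometric means between a and b (both positive)."""
    r = (b / a) ** (1 / (n + 1))  # common ratio of the resulting GP
    return [a * r ** i for i in range(1, n + 1)]

gms = geometric_means(1, 256, 7)
print([round(g) for g in gms])  # → [2, 4, 8, 16, 32, 64, 128]
```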
Example 4.11: Sum the series 1 + 3x + 5x² + 7x³ + ... up to n terms, x ≠ 1.
Solution: Note that the nth term of this series = (2n − 1)x^(n−1).
Let Sₙ = 1 + 3x + 5x² + ... + (2n − 1)x^(n−1)
Then, xSₙ = x + 3x² + ... + (2n − 3)x^(n−1) + (2n − 1)xⁿ
Subtracting, we get
Sₙ(1 − x) = 1 + 2x + 2x² + ... + 2x^(n−1) − (2n − 1)xⁿ
= 1 + 2x(1 − x^(n−1))/(1 − x) − (2n − 1)xⁿ
= [1 − x + 2x − 2xⁿ − (2n − 1)xⁿ(1 − x)]/(1 − x)
= [1 + x − 2xⁿ − (2n − 1)xⁿ + (2n − 1)x^(n+1)]/(1 − x)
= [1 + x − (2n + 1)xⁿ + (2n − 1)x^(n+1)]/(1 − x)
Hence, S = [1 + x − (2n + 1)xⁿ + (2n − 1)x^(n+1)]/(1 − x)²
Example 4.12: If in a GP the (p + q)th term = m and the (p − q)th term = n, then find its
pth and qth terms.
Solution: Suppose that the given GP is a, ar, ar², ar³, ...
By hypothesis, (p + q)th term = m = ar^(p+q−1)
(p − q)th term = n = ar^(p−q−1)
Dividing, m/n = r^(2q) ⇒ r = (m/n)^(1/2q)
Hence, m = a[(m/n)^(1/2q)]^(p+q−1) ⇒ a = m^((q−p+1)/2q) n^((p+q−1)/2q)
Therefore, the pth term = ar^(p−1) = √(mn), and the qth term = ar^(q−1) = m(n/m)^(p/2q).
n
R = (1/a)(1 − 1/rⁿ)/(1 − 1/r) = (rⁿ − 1)/(a(r − 1)r^(n−1))

= (1 − rⁿ)/(a(1 − r)r^(n−1))                                   ...(3)

By Equations (2) and (3),

P²Rⁿ = a²ⁿ r^(n(n−1)) · (1 − rⁿ)ⁿ / (aⁿ(1 − r)ⁿ r^(n(n−1)))

= aⁿ(1 − rⁿ)ⁿ/(1 − r)ⁿ = Sⁿ, by (1)
Example 4.21: The ratio of the 4th to the 12th term of a GP with positive common
ratio is 1/256. If the sum of the two terms is 61.68, find the sum of the series to 8 terms.
Solution: Let the series be a, ar, ar², ...
T₄ = 4th term = ar³
T₁₂ = 12th term = ar¹¹
By hypothesis, T₄/T₁₂ = 1/256
i.e., ar³/ar¹¹ = 1/256
1/r⁸ = 1/256
⇒ r⁸ = 256
⇒ r = ±2
Since r is given to be positive, we reject the negative sign, so r = 2.
Again, it is given that
T₄ + T₁₂ = 61.68
i.e., a(r³ + r¹¹) = 61.68
a(8 + 2048) = 61.68
a = 61.68/2056 = 0.03
Hence, S₈ = sum to eight terms
= a(1 − r⁸)/(1 − r) = a(r⁸ − 1)/(r − 1)
= (0.03)(256 − 1)/(2 − 1) = 0.03 × 255 = 7.65
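The arithmetic of Example 4.21 is easy to check in a few lines:

```python
# Example 4.21: r**8 = 256 with positive r, and a(r**3 + r**11) = 61.68
r = 256 ** (1 / 8)               # common ratio: 2
a = 61.68 / (r ** 3 + r ** 11)   # first term: 61.68 / 2056 = 0.03
s8 = a * (r ** 8 - 1) / (r - 1)  # sum of the first 8 terms of the GP
print(round(r), round(a, 2), round(s8, 2))  # → 2 0.03 7.65
```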
Example 4.22: A manufacturer reckons that the value of a machine which costs
him Rs 18,750 will depreciate each year by 20%. Find the estimated value at the
end of 5 years.
Solution: At the end of the first year the value of the machine is
18750 × 80/100 = (4/5)(18750)
At the end of the 2nd year it is equal to (4/5)²(18750); proceeding in this manner,
the estimated value of the machine at the end of 5 years is
(4/5)⁵(18750) = (1024/3125) × 18750 = 1024 × 6
= 6144 rupees
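The year-by-year depreciation of Example 4.22 is a GP with common ratio 4/5; a quick check:

```python
def depreciated_value(cost, rate, years):
    """Value after repeated percentage depreciation (a geometric progression)."""
    return cost * (1 - rate) ** years

print(round(depreciated_value(18750, 0.20, 5)))  # → 6144
```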
Example 4.23: Show that a given sum of money accumulated at 20% per annum
more than doubles itself in 4 years at compound interest.
Solution: Let the given sum be a rupees. After 1 year it becomes 6a/5 (it is increased
by a/5).
At the end of two years it becomes (6/5)(6a/5) = (6/5)²a.
Proceeding in this manner, we get that at the end of the 4th year the amount will
be (6/5)⁴a = (1296/625)a.
Now, (1296/625)a = 2a + (46/625)a; since a is a positive quantity, the amount after 4
years is more than double the original amount.
Example 4.24: If x = a + a/r + a/r² + ... ∞,
y = b − b/r + b/r² − ... ∞,
and z = c + c/r² + c/r⁴ + ... ∞,
show that xy/z = ab/c.
Solution: Clearly, x = a/(1 − 1/r) = ar/(r − 1),
y = b/(1 + 1/r) = br/(r + 1)
and z = c/(1 − 1/r²) = cr²/(r² − 1)
Now, xy/z = [abr²/((r − 1)(r + 1))] × [(r² − 1)/(cr²)] = ab/c
Harmonic mean
If a, b, c are in HP, then b is called a Harmonic Mean between a and c, written as
HM.
Harmonical progression
Non zero quantities whose reciprocals are in AP, or Arithmetic Progression are
said to be in Harmonical Progression, written as HP.
Consider the following examples:
(a) 1, 1/3, 1/5, 1/7, ...
(b) 1/2, 1/5, 1/8, 1/11, ...
(c) 2, 5/2, 10/3, ...
(d) 1/a, 1/(a + b), 1/(a + 2b), ...  (a, b ≠ 0)
(e) 5, 55/9, 55/7, 11, ...
It can be easily checked that in each case, the series obtained by taking
reciprocal of each of the term is an AP.
To insert n harmonic means between a and b
Let H₁, H₂, H₃, ..., Hₙ be the required harmonic means. Then
a, H₁, H₂, ..., Hₙ, b are in HP,
i.e., 1/a, 1/H₁, 1/H₂, ..., 1/Hₙ, 1/b are in AP.
Then, 1/b = (n + 2)th term of the AP = 1/a + (n + 1)d,
where d is the common difference of the AP.
This gives d = (a − b)/((n + 1)ab).
Thus, 1/H₁ = 1/a + d = (nb + b + a − b)/((n + 1)ab) = (a + nb)/((n + 1)ab)
⇒ H₁ = (n + 1)ab/(a + nb)
1/H₂ = 1/a + 2d = (nb + b + 2a − 2b)/((n + 1)ab) = (2a − b + nb)/((n + 1)ab)
⇒ H₂ = (n + 1)ab/(2a − b + nb)
Similarly, 1/H₃ = 1/a + 3d = (3a − 2b + nb)/((n + 1)ab)
⇒ H₃ = (n + 1)ab/(3a − 2b + nb), and so on,
1/Hₙ = 1/a + nd = (nb + b + na − nb)/((n + 1)ab) = (na + b)/((n + 1)ab)
⇒ Hₙ = (n + 1)ab/(na + b)
Then, 1/x = 1/2 + 4(2/5 − 1/2) = 1/2 − 4(1/10)
⇒ 1/x = 1/2 − 2/5 = 1/10 ⇒ x = 10
Example 4.27: Insert two harmonic means between 1/2 and 4/17.
Solution: Let H₁, H₂ be two harmonic means between 1/2 and 4/17.
Thus, 2, 1/H₁, 1/H₂, 17/4 are in AP. Let d be their common difference.
Then, 17/4 = 2 + 3d ⇒ d = 3/4.
So 1/H₁ = 2 + 3/4 = 11/4 ⇒ H₁ = 4/11,
and 1/H₂ = 11/4 + 3/4 = 7/2 ⇒ H₂ = 2/7.
Coefficient of variation
The square of the standard deviation, namely σ², is termed the variance and is more
often specified than the standard deviation. Clearly, it has the same properties as
the standard deviation.
As is clear, the standard deviation σ or its square, the variance, cannot be
very useful in comparing two series where either the units are different or the mean
values are different. Thus, a σ of 5 on an examination where the mean score is 30
has an altogether different meaning than on an examination where the mean score
is 90. Clearly, the variability in the second examination is much less. To take care of
this problem, we define and use a coefficient of variation, V, where

V = (σ/x̄) × 100

This is expressed as a percentage.
Example 4.28: The following are the scores of two batsmen A and B in a series of
innings:
A 12 115 6 73 7 19 119 36 84 29
B 47 12 76 42 4 51 37 48 13 0
Who is the better run-getter? Who is more consistent?
Solution: In order to decide as to which of the two batsmen, A and B, is the better
run-getter, we should find their batting averages. The one whose average is higher
will be considered as a better batsman.
To determine the consistency in batting we should determine the coefficient
of variation. The less this coefficient the more consistent will be the player.
Batsman A                          Batsman B
Score x   (x − x̄)   (x − x̄)²      Score x   (x − x̄)   (x − x̄)²
12        −38       1,444          47        +14       196
115       +65       4,225          12        −21       441
6         −44       1,936          76        +43       1,849
73        +23       529            42        +9        81
7         −43       1,849          4         −29       841
19        −31       961            51        +18       324
119       +69       4,761          37        +4        16
36        −14       196            48        +15       225
84        +34       1,156          13        −20       400
29        −21       441            0         −33       1,089
Σx = 500            17,498         Σx = 330            5,462
Batsman A:                         Batsman B:
x̄ = 500/10 = 50                   x̄ = 330/10 = 33
σ = √(17,498/10) = 41.83           σ = √(5,462/10) = 23.37
V = (41.83/50) × 100               V = (23.37/33) × 100
  = 83.66 per cent                   = 70.8 per cent
A is a better batsman since his average is 50 as compared to 33 of B, but B is
more consistent since the variation in his case is 70.8 as compared to 83.66 of A.
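The working above can be checked with a short Python sketch (illustrative, using the text's population formula σ = √(Σ(x − x̄)²/n)):

```python
import math

# Scores from Example 4.28.
A = [12, 115, 6, 73, 7, 19, 119, 36, 84, 29]
B = [47, 12, 76, 42, 4, 51, 37, 48, 13, 0]

def coefficient_of_variation(scores):
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation, as in the text.
    sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return mean, sigma, sigma / mean * 100

mean_a, sd_a, v_a = coefficient_of_variation(A)
mean_b, sd_b, v_b = coefficient_of_variation(B)
print(mean_a, round(sd_a, 2), round(v_a, 2))   # 50.0 41.83 83.66
print(mean_b, round(sd_b, 2), round(v_b, 2))   # 33.0 23.37 70.82
```

The slight difference from the text's 70.8 for B is only rounding.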
Example 4.29: The following table gives the age distribution of students admitted
to a college in the years 1914 and 1918. Find which of the two groups is more
variable in age.
Age Number of Students in
1914 1918
15 – 1
16 1 6
17 3 34
18 8 22
19 12 35
20 14 20
21 13 7
22 5 19
23 2 3
24 3 –
25 1 –
26 – –
27 1 –
Solution:
Taking assumed mean 21 for the 1914 group and 19 for the 1918 group, and working out Σfx′ and Σfx′² for each group (x′ = age − assumed mean):
1914 Group:
σ = √[Σfx′²/N − (Σfx′/N)²]
  = √[299/63 − (−51/63)²]
  = √(4.746 − 0.655) = √4.091
  = 2.02
x̄ = 21 + (−51/63) = 21 − 0.8 = 20.2
V = (2.02/20.2) × 100 = 202/20.2 = 10
1918 Group:
σ = √[495/147 − (−9/147)²]
  = √(3.3673 − 0.0037)
  = √3.3636 = 1.834
x̄ = 19 + (−9/147) = 19 − 0.06 = 18.94
V = (1.834/18.94) × 100 = 9.68
The coefficient of variation of the 1914 group is 10 and that of the 1918 group 9.68.
This means that the 1914 group is more variable, but only barely so.
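A sketch in Python, computing each group's coefficient of variation directly from the frequency table (equivalent to the assumed-mean shortcut used above):

```python
import math

# Frequency distributions from Example 4.29 (age -> number of students).
dist_1914 = {16: 1, 17: 3, 18: 8, 19: 12, 20: 14, 21: 13,
             22: 5, 23: 2, 24: 3, 25: 1, 27: 1}
dist_1918 = {15: 1, 16: 6, 17: 34, 18: 22, 19: 35, 20: 20,
             21: 7, 22: 19, 23: 3}

def grouped_cv(dist):
    n = sum(dist.values())
    mean = sum(age * f for age, f in dist.items()) / n
    var = sum(f * (age - mean) ** 2 for age, f in dist.items()) / n
    return math.sqrt(var) / mean * 100

v_1914 = grouped_cv(dist_1914)   # about 10.0
v_1918 = grouped_cv(dist_1918)   # about 9.68
```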
Correlation analysis is the statistical tool generally used to describe the degree to
which one variable is related to another. The relationship, if any, is usually assumed
to be a linear one. This analysis is used quite frequently in conjunction with regression
analysis to measure how well the regression line explains the variations of the
dependent variable. In fact, the word correlation refers to the relationship or
interdependence between two variables. There are various phenomena which are
related to each other. For instance, when demand of a certain commodity increases,
its price goes up and when its demand decreases, its price comes down. Similarly,
the height of children increases with age, their weight with height, and with an increase in the
money supply the general level of prices goes up. Such relationships can as well be
noticed for several other phenomena. The theory by means of which quantitative
connections between two sets of phenomena are determined is called the ‘Theory
of Correlation’.
On the basis of the theory of correlation, one can study the comparative
changes occurring in two related phenomena and their cause–effect relation can
be examined. It should, however, be borne in mind that relationships like ‘black cat
causes bad luck’, ‘filled up pitchers result in good fortune’ and similar other beliefs
of the people cannot be explained by the theory of correlation, since they are all
imaginary and are incapable of being justified mathematically. Thus, correlation is
concerned with relationship between two related and quantifiable variables. If
two quantities vary in sympathy, so that a movement (an increase or decrease) in
one tends to be accompanied by a movement in the same or opposite direction in
the other and the greater the change in one, the greater is the change in the other,
the quantities are said to be correlated. This type of relationship is known as
correlation or what is sometimes called, in statistics, as covariation.
For correlation, it is essential that the two phenomena should have cause–
effect relationship. If such relationship does not exist then one should not talk of
correlation. For example, if the height of the students as well as the height of the
trees increases, then one should not call it a case of correlation because the two
phenomena, viz., the height of students and the height of trees, are not even causally
related. However, the relationship between the price of a commodity and its demand,
the price of a commodity and its supply, the rate of interest and savings, etc. are
examples of correlation, since in all such cases the change in one phenomenon is
explained by a change in another phenomenon.
(a) The variation of the Y values around the fitted regression line, viz., Σ(Y − Ŷ)², technically known as the unexplained variation.
(b) The variation of the Y values around their own mean, viz., Σ(Y − Ȳ)², technically known as the total variation.
(c) The explained variation, i.e., the variation of the estimated Y values around the mean, viz.,
Σ(Ŷ − Ȳ)² = Σ(Y − Ȳ)² − Σ(Y − Ŷ)²
The Total and Explained as well as Unexplained variations at a specific point can be shown as in Figure 4.2, which plots consumption expenditure (’00 Rs) on the Y-axis against income (’00 Rs) on the X-axis, with the regression line of Y on X and the mean lines of X and Y marked.
Fig. 4.2 Diagram Showing Total, Explained and Unexplained Variations
Thus, r² = 1 − [Σ(Y − Ŷ)² / Σ(Y − Ȳ)²]
Interpreting r2
The coefficient of determination can have a value ranging from 0 to 1. The value of
1 can occur only if the unexplained variation is 0, which simply means that all the
data points in the scatter diagram fall exactly on the regression line. For a 0 value to
occur, Σ(Y − Ȳ)² = Σ(Y − Ŷ)², which simply means that X tells us nothing about Y and
hence there is no regression relationship between the X and Y variables. Values between
0 and 1 indicate the ‘goodness of fit’ of the regression line to the sample data. The
higher the value of r², the better the fit. In other words, the value of r² will lie
somewhere between 0 and 1. If r² has a 0 value then it indicates no correlation, but
if it has a value equal to 1 then there is perfect correlation and as
such the regression line is a perfect estimator. However, in most cases, the value of
r² will lie somewhere between these two extremes of 1 and 0. One should remember
that r² close to 1 indicates a strong correlation between X and Y, while an r² near 0
means there is little correlation between these two variables. The r² value can as well be
interpreted by looking at the amount of the variation in Y, the dependent variable, that
is explained by the regression line. Supposing we get a value of r² = 0.925, this
would mean that the variations in the independent variable (say X) explain 92.5
per cent of the variations in the dependent variable (say Y). If r² is close to 1, it
indicates that the regression equation explains most of the variations in the dependent
variable (see Example 4.30).
Example 4.30: Calculate the coefficient of determination (r²) using the data
provided below. Analyse the result.
Observations 1 2 3 4 5 6 7 8 9 10
Income (X) (‘00 `) 41 65 50 57 96 94 110 30 79 65
Consumption
Expenditure (Y) (‘00 `) 44 60 39 51 80 68 84 34 55 48
Solution:
r² can be worked out as follows:
Since, r² = 1 − (Unexplained variation/Total variation) = 1 − [Σ(Y − Ŷ)²/Σ(Y − Ȳ)²]
As Σ(Y − Ȳ)² = ΣY² − nȲ², we can write,
r² = 1 − [Σ(Y − Ŷ)²/(ΣY² − nȲ²)]
Calculating and putting the various values, we have the following equation:
r² = 1 − [260.54/(34,223 − 10(56.3)²)] = 1 − (260.54/2,526.10) = 0.897
Analysis of Result: The regression equation used to calculate the value of the
coefficient of determination (r2) from the sample data shows that, about 90 per cent
of the variations in consumption expenditure can be explained. In other words, it
means that the variations in income explain about 90 per cent of variations in
consumption expenditure.
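The figure of about 0.90 can be reproduced by fitting the least-squares line and applying r² = 1 − unexplained/total; a sketch:

```python
# Income (X) and consumption expenditure (Y) from Example 4.30.
X = [41, 65, 50, 57, 96, 94, 110, 30, 79, 65]
Y = [44, 60, 39, 51, 80, 68, 84, 34, 55, 48]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# Least-squares slope and intercept of Y on X.
b = (sum(x * y for x, y in zip(X, Y)) - n * mx * my) / \
    (sum(x * x for x in X) - n * mx * mx)
a = my - b * mx

unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
total = sum((y - my) ** 2 for y in Y)
r2 = 1 - unexplained / total
print(round(r2, 3))   # -> 0.897
```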
Observation 1 2 3 4 5 6 7 8 9 10
Income (X) (’00 `) 41 65 50 57 96 94 110 30 79 65
Consumption
Expenditure (Y) (’00 `) 44 60 39 51 80 68 84 34 55 48
Then, by applying the following formulae, we can find the value of the coefficient of
correlation as,
r = √(Explained variation/Total variation)
  = √(1 − Unexplained variation/Total variation)
  = √(1 − [Σ(Y − Ŷ)²/Σ(Y − Ȳ)²])
  = √[r(σX/σY) · r(σY/σX)]
  = √r² = r
As stated earlier, the sign of ‘r’ will depend upon the sign of the regression
coefficients. If they have a minus sign, then ‘r’ will take a minus sign, but the sign of
‘r’ will be plus if the regression coefficients have a plus sign.
(b) Coefficient of Alienation: Based on k², we can work out one more measure,
namely the coefficient of alienation, symbolically written as ‘k’. Thus,
coefficient of alienation, i.e., ‘k’ = √k².
Unlike r² + k² = 1, the sum of ‘r’ and ‘k’ will not be equal to 1 unless one of
the two coefficients is 1, and in this case the other coefficient must be zero.
In all other cases, ‘r’ + ‘k’ > 1. The coefficient of alienation is not a popular measure
from a practical point of view and is used very rarely.
= (ΣXY/n) / [√(ΣX²/n) · √(ΣY²/n)]
= ΣXY / √(ΣX² · ΣY²)
(where X and Y here denote deviations from the respective true means)
The above formulae are based on obtaining true means (viz., X and Y ) first and
then doing all other calculations. This happens to be a tedious task, particularly
if the true means are in fractions. To avoid difficult calculations, we make use of
the assumed means in taking out deviations and doing the related calculations. In
such a situation, we can use the following formula for finding the value of ‘r’:
(a) In Case of Ungrouped Data:
r = [ΣdX·dY/n − (ΣdX/n)(ΣdY/n)] / {√[ΣdX²/n − (ΣdX/n)²] · √[ΣdY²/n − (ΣdY/n)²]}
  = [ΣdX·dY − (ΣdX)(ΣdY)/n] / {√[ΣdX² − (ΣdX)²/n] · √[ΣdY² − (ΣdY)²/n]}
Where, ΣdX = Σ(X − XA), XA = Assumed average of X
ΣdY = Σ(Y − YA), YA = Assumed average of Y
ΣdX² = Σ(X − XA)²
ΣdY² = Σ(Y − YA)²
ΣdX·dY = Σ(X − XA)(Y − YA)
n = Number of pairs of observations of X and Y
(b) In Case of Grouped Data:
r = [ΣfdX·dY − (ΣfdX)(ΣfdY)/n] / {√[ΣfdX² − (ΣfdX)²/n] · √[ΣfdY² − (ΣfdY)²/n]}
Example 4.31: Calculate the coefficient of correlation (a) by the method of least squares and (b) by the method based on the regression coefficients, for the following data:
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
Solution:
Let us develop the following table for calculating the value of ‘r’:
X Y X2 Y2 XY
1 9 1 81 9
2 8 4 64 16
3 10 9 100 30
4 12 16 144 48
5 11 25 121 55
6 13 36 169 78
7 14 49 196 98
8 16 64 256 128
9 15 81 225 135
n=9
∑X = 45 ∑Y = 108 ∑X2 = 285 ∑Y2 = 1356 ∑XY = 597
∴ X̄ = 5; Ȳ = 12
(a) Coefficient of correlation by the method of least squares is worked out as
follows:
First find out the estimating equation,
Ŷ = a + bXi
Where, b = (ΣXY − nX̄Ȳ)/(ΣX² − nX̄²) and a = Ȳ − bX̄
Hence, b = (597 − 9 × 5 × 12)/(285 − 9 × 5²) = 57/60 = 0.95
and a = 12 − 0.95 × 5 = 7.25
Then, r² = 1 − [Σ(Y − Ŷ)²/Σ(Y − Ȳ)²] = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)²
As per the short-cut formula,
r² = (aΣY + bΣXY − nȲ²)/(ΣY² − nȲ²)
   = [7.25(108) + 0.95(597) − 9(12)²]/[1356 − 9(12)²]
   = 54.15/60 = 0.9025
∴ r = √0.9025 = 0.95
(b) Coefficient of correlation by the method based on regression coefficients is
worked out as,
Regression coefficient of Y on X,
i.e., bYX = (ΣXY − nX̄Ȳ)/(ΣX² − nX̄²) = (597 − 9 × 5 × 12)/(285 − 9 × 5²) = (597 − 540)/(285 − 225) = 57/60
Regression coefficient of X on Y,
i.e., bXY = (ΣXY − nX̄Ȳ)/(ΣY² − nȲ²) = (597 − 9 × 5 × 12)/(1356 − 9 × 12²) = (597 − 540)/(1356 − 1296) = 57/60
Hence, r = √(bYX · bXY) = √(57/60 × 57/60) = 57/60 = 0.95
For a Sample
Pearson’s correlation coefficient when applied to a sample is commonly represented
by the letter r and may be referred to as the sample correlation coefficient or
the sample Pearson correlation coefficient. We can obtain a formula for r by
substituting estimates of the covariances and variances based on a sample into
the above formula. That formula for r is:
r = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² · Σ(Yi − Ȳ)²]
An equivalent expression gives r in terms of the standard scores:
r = [1/(n − 1)] Σ [(Xi − X̄)/sX][(Yi − Ȳ)/sY]
Where, (Xi − X̄)/sX, X̄ and sX are the standard score, sample mean and sample standard
deviation, respectively.
How to Calculate Product Moment Correlation Coefficient?
The product moment correlation coefficient allows you to work out the linear
dependence of two variables (referred to as X and Y). Let us consider an
example. Suppose you are the owner of a restaurant. You record the time
every 10th customer stays in your restaurant (X, in minutes) and the amount
spent (Y, in rupees). If the longer a customer stays the larger the amount
spent, this would be a positive correlation. It can also work the other way, i.e., the
richer the client the less time he takes for lunch in the restaurant; this would be a
negative correlation. The Pearson Product-Moment Correlation Coefficient (PMCC)
can be calculated to find the correlation in such a situation.
Step 1: Remove Incomplete Pairs: Use only those observations where both X
and Y are known. However, do not exclude observations just because one of the
values equals zero.
Step 2: Summarize the Data into the Values needed for the Calculation:
These are:
• n: The number of data pairs.
• ΣX: The sum of all the X values.
• ΣX2: The sum of the squares of the X values.
• ΣY: The sum of all the Y values.
• ΣY2: The sum of the squares of the Y values.
• ΣXY: The sum of each X value multiplied by its corresponding Y value.
Step 3: Calculate SXY, SXX and SYY using these values:
• SXY = ΣXY − (ΣX ΣY ÷ n)
• SXX = ΣX² − (ΣX ΣX ÷ n)
• SYY = ΣY² − (ΣY ΣY ÷ n)
Step 4: Insert these Values into the Equation below to Calculate the
Product Moment Correlation Coefficient (r): The value should be between
–1 and 1.
r = SXY / √(SXX · SYY)
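The four steps above can be sketched as a function. The restaurant data is not given, so the usage below reuses the small 9-pair data set from an earlier example, plus one deliberately incomplete pair to illustrate Step 1:

```python
import math

def pmcc(pairs):
    # Step 1: drop pairs where X or Y is missing (None); zeros are kept.
    data = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(data)
    # Step 2: the six summary values.
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_x2 = sum(x * x for x, _ in data)
    sum_y2 = sum(y * y for _, y in data)
    sum_xy = sum(x * y for x, y in data)
    # Step 3: corrected sums.
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x * sum_x / n
    s_yy = sum_y2 - sum_y * sum_y / n
    # Step 4: the coefficient.
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc([(1, 9), (2, 8), (3, 10), (None, 4), (4, 12), (5, 11),
          (6, 13), (7, 14), (8, 16), (9, 15)])
print(round(r, 2))   # -> 0.95
```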
ρ = 1 − [6ΣDi² / n(n² − 1)]
Here, n is the number of observations and Di is the difference between the
ranks assigned to individual i.
Like r, the rank correlation lies between –1 and +1. Consider the following
examples for better understanding.
Example 4.32: The ranks given by two judges to 10 individuals are as follows:
Individual   Judge I (x)   Judge II (y)   D = x − y   D²
1 1 7 6 36
2 2 5 3 9
3 7 8 1 1
4 9 10 1 1
5 8 9 1 1
6 6 4 2 4
7 4 1 3 9
8 3 6 3 9
9 10 3 7 49
10 5 2 3 9
ΣD² = 128
∴ ρ = 1 − [6ΣD²/n(n² − 1)] = 1 − [6 × 128/10(100 − 1)] = 1 − (768/990) = 0.224
Example 4.33: For the same two sets of ranks, calculate the Pearson r, treating the ranks as scores.
Solution: Let us develop the following table:
x    y    x²    y²    xy
1    7    1     49    7
2    5    4     25    10
7    8    49    64    56
9    10   81    100   90
8    9    64    81    72
6    4    36    16    24
4    1    16    1     4
3    6    9     36    18
10   3    100   9     30
5    2    25    4     10
Σx = 55   Σy = 55   Σx² = 385   Σy² = 385   Σxy = 321
r = [321 − 10 × (55/10) × (55/10)] / √[(385 − 10 × (55/10)²) × (385 − 10 × (55/10)²)]
  = 18.5/√(82.5 × 82.5) = 18.5/82.5 = 0.224
This shows that the Spearman ρ for any two sets of ranks is the same as
the Pearson r for the set of ranks. However, it is much easier to compute ρ.
Often, the ranks are not given. Instead, the numerical values of observations are
given. In such a case, we must attach the ranks to these values to calculate ρ.
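For the rankings of Example 4.32, ρ can be computed directly from the formula:

```python
# Ranks given by the two judges in Example 4.32.
judge1 = [1, 2, 7, 9, 8, 6, 4, 3, 10, 5]
judge2 = [7, 5, 8, 10, 9, 4, 1, 6, 3, 2]
n = len(judge1)

# Sum of squared rank differences, then Spearman's rho.
d2 = sum((x - y) ** 2 for x, y in zip(judge1, judge2))   # 128
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(round(rho, 3))   # -> 0.224
```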
Example 4.34: Show by means of diagrams various cases of scatter expressing
correlation between x, y.
Solution: The typical cases of scatter are:
(a) Positive slope; direct linear relationship; high scatter, so r is low and positive.
(b) Negative slope; inverse linear relationship; high scatter, so r is low and negative.
(c) Slope = 0; r = 0.
(d) No relationship; r ≈ 0.
(e) Inverse curvilinear relationship.
(f) Direct curvilinear relationship.
(g) Perfect relationship, but r = 0 because of the non-linear relation.
Percentile is the point below which the given per cent of cases lies. We know that
the median is the point in the distribution below which 50% of the cases lie. We also
know how to calculate the median and quartile deviation. It was discussed earlier that
Q1 and Q3 mark points in the distribution below which lie 25% and 75% of the cases
respectively. By using the same method we can compute points below which any
per cent of cases, say 10%, 45%, 70%, 80% or 85% of the scores or
cases lie. These points are called percentiles and are represented by the symbol
Pp, the p referring to the percentage of cases below the given value or score.
Thus P15 refers to the point below which 15% of the scores lie, and P85 refers to the point
or score below which 85% of the scores lie. Suppose a student is declared as
achieving the 84th percentile (P84) in a test; it means that his position in the test is very
good, as 84% of the examinees have achieved marks less than this student.
In other words, 84% of the students of his class are inferior to him, or he is better
than 84% of the students of his class.
The method of finding out the percentile is essentially the same as we have
seen and used in computing median and quartile deviation. Following formula may
be used to compute percentile:
Pp = l + [(pN − F)/fp] × i
Where,
p = percentage of the distribution to be calculated, such as 15%, 30%, 50% (median), 70%, etc.
l = Exact lower limit of the class interval upon which Pp lies.
pN = Part of N to be counted off in order to reach Pp.
F = Sum of all the frequencies upon intervals below l.
fp = Number of scores within the interval upon which Pp falls.
i = Size or length of the class interval.
Percentile is a very useful concept to represent the exact position of an individual
in the class. A mere percentage of marks, such as 80% in mathematics, does not give
much information about the learner's progress or achievement. It simply tells us
that this student has got 80% marks. This 80% may be the highest in the class, the
lowest in the class, or the average marks obtained by students in the class. On the
other hand, if we present the marks of a student as a percentile, it represents the exact
performance of the student with reference to the class. For example, if we say that a
student A is at P80 (the 80th percentile) in the class, it tells us that 80% of his or her
classmates have scores below him or her, i.e., the learner is very good in comparison to
others in the class. He or she is among the top 20% performers of the class.
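The formula Pp = l + [(pN − F)/fp] × i can be sketched in Python; the frequency distribution used below is hypothetical:

```python
def percentile(p, intervals):
    """intervals: list of (exact_lower_limit, frequency), ascending, equal width.
    p is a proportion, e.g. 0.50 for the median."""
    n = sum(f for _, f in intervals)
    width = intervals[1][0] - intervals[0][0]
    target = p * n                 # pN: cases to count off
    cum = 0                        # F: cumulative frequency below l
    for lower, f in intervals:
        if cum + f >= target:
            # l + ((pN - F) / fp) * i
            return lower + (target - cum) / f * width
        cum += f
    return intervals[-1][0] + width

# Hypothetical distribution: class intervals 10-20, 20-30, ...
data = [(10, 2), (20, 5), (30, 8), (40, 3), (50, 2)]
print(percentile(0.50, data))   # median -> 33.75
print(percentile(0.25, data))   # Q1 -> 26.0
```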
(a) Pearson's coefficient of skewness
PSk = (x̄ − MO)/s = 3(x̄ − Md)/s
It may have any value, but usually it lies between –1 and +1.
(b) Bowley's coefficient of skewness
BSk = (Q3 + Q1 − 2Md)/(Q3 − Q1)
Example 4.35: If for a given data Q1 = 2, Q3 = 8 and 2Md = 5, it is found that
BSk = (Q3 + Q1 − 2Md)/(Q3 − Q1) = (8 + 2 − 5)/(8 − 2) = 0.83
(c) Kelley's coefficient of skewness
KSk = P50 − (P10 + P90)/2
where P10, P50 and P90 are the 10th, 50th and 90th percentiles of the data.
(d) Method of moments
If µ2, µ3 are the moments about the mean, we have the coefficient of skewness
β1 = µ3²/µ2³ = µ3²/σ⁶
Sometimes, we define the coefficient of skewness as follows:
γ1 = √β1 = µ3/√µ2³ = µ3/σ³
Kurtosis
Kurtosis is a measure of peakedness of a distribution. It shows the degree of convexity
of a frequency curve.
If the normal curve is taken as the standard, symmetrical and bell-shaped
curve then kurtosis gives a measure of departure from the normal convexity of a
distribution. The normal curve is mesokurtic. It is of intermediate peakedness. The
flat-topped curve, broader than the normal, is termed platykurtic. The slender, highly
peaked curve is termed leptokurtic.
Measures of kurtosis
(a) Moment coefficient of kurtosis: β2 = µ4/µ2²
Instead of β2 we often use γ2 = β2 − 3, which is positive for a leptokurtic
distribution, negative for a platykurtic distribution and zero for the normal
distribution.
(b) Percentile coefficient of kurtosis: k = Q/(P90 − P10), where Q = (Q3 − Q1)/2
is the semi-interquartile range.
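The moment measures γ1 and γ2 can be sketched in Python; the data set below is hypothetical:

```python
def central_moment(xs, k):
    # k-th moment about the mean, mu_k = sum((x - mean)^k) / n
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

xs = [2, 3, 5, 7, 8, 8, 9]          # hypothetical scores
mu2 = central_moment(xs, 2)
mu3 = central_moment(xs, 3)
mu4 = central_moment(xs, 4)

gamma1 = mu3 / mu2 ** 1.5            # negative here: longer tail to the left
gamma2 = mu4 / mu2 ** 2 - 3          # negative here: platykurtic (flatter than normal)
```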
In probability theory, the normal probability curve or normal (or Gaussian) distribution
is considered the most frequently occurring continuous probability distribution.
Normal distributions are exceptionally significant in statistics and are typically used
to represent real-valued random variables whose exact distributions are not known.
(The base line of the curve is marked off at µ ± 1σ, µ ± 2σ and µ ± 3σ.)
Fig. 4.4 Area of the Total Curve between µ ± 1σ
Normal curves may have identical standard deviations but different means, or different standard deviations as well as different means.
Here, with µ = 18 and σ = 6.45, for X = 24 we have Z = (24 − 18)/6.45 = 0.93.
The value from the concerning table, when Z = 0.93, is 0.3238 which refers to the
area of the curve between µ = 18 and X = 24. The area of the entire left hand portion
of the curve is 0.5 as usual.
Hence, the area of the shaded portion is (0.5) + (0.3238) = 0.8238 which is the
required probability that the account will have been closed before two years, i.e.,
before 24 months.
Example 4.38: Regarding a certain normal distribution concerning the income of
the individuals we are given that mean=500 rupees and standard deviation =100
rupees. Find the probability that an individual selected at random will belong to
income group,
(a) ` 550 to ` 650 (b) ` 420 to `570
Solution: (a) To find the required probability we are interested in the area of the
portion of the normal curve between X = 550 and X = 650, with µ = 500 and σ = 100.
To find the area of the curve between X = 550 and X = 650, let us do the following
calculations:
Z = (550 − 500)/100 = 50/100 = 0.50
Corresponding to which the area between µ = 500 and X = 550 in the curve as per
table is equal to 0.1915 and,
Z = (650 − 500)/100 = 1.50
Corresponding to which the area between µ = 500 and X = 650 in the curve as per
table is equal to 0.4332. Hence, the required area between X = 550 and X = 650 is
(0.4332) − (0.1915) = 0.2417, which is the required probability for part (a).
(b) To find the probability for the income group of ` 420 to ` 570, we need the area
of the curve between X = 420 and X = 570.
Z = (570 − 500)/100 = 0.70
Corresponding to which the area between µ = 500 and X = 570 in the curve as per
table is equal to 0.2580.
and Z = (420 − 500)/100 = −0.80
Corresponding to which the area between µ = 500 and X = 420 in the curve as per
table is equal to 0.2881.
Hence, the required area in the curve between X = 420 and X = 570 is,
(0.2580) + (0.2881) = 0.5461
This is the required probability that an individual selected at random will belong to
income group of ` 420 to ` 570.
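The same probability follows from the exact normal CDF, expressible through the error function:

```python
import math

# Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 500, 100
# P(420 < X < 570) for Example 4.38(b)
p = phi((570 - mu) / sigma) - phi((420 - mu) / sigma)
print(round(p, 4))   # -> 0.5462 (the printed table values give 0.5461)
```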
Example 4.39: A certain company manufactures 1½″ all-purpose rope using
imported hemp. The manager of the company knows that the average load-bearing
capacity of the rope is 200 lbs. Assuming that the normal distribution applies, find the
standard deviation of load-bearing capacity for the 1½″ rope if it is given that the
rope has a 0.1210 probability of breaking with 68 lbs. or less pull.
Solution: The given information can be depicted in a normal curve as follows: the
area between X = 68 and µ = 200 carries probability (0.5) − (0.1210) = 0.3790.
If the probability of the area falling within µ = 200 and X = 68 is 0.3790 as stated
above, the corresponding value of Z as per the standard statistical tables showing
the area of the normal curve is – 1.17 (minus sign indicates that we are in the left
portion of the curve).
Now to find σ, we can write,
Z = (X − µ)/σ
or −1.17 = (68 − 200)/σ
or −1.17σ = −132
or σ = 112.8 lbs. approx.
Thus, the required standard deviation is 112.8 lbs. approximately.
Example 4.40: In a normal distribution, 31 per cent of the items are below 45 and 8 per
cent are above 64. Find the X̄ and σ of this distribution.
Solution: Since 31 per cent of the items are below 45, the area between µ and
X = 45 is 0.50 − 0.31 = 0.19; and since 8 per cent are above 64, the area between µ
and X = 64 is 0.50 − 0.08 = 0.42.
If the probability of the area falling within µ and X = 45 is 0.19 as stated above, the
corresponding value of Z from the table showing the area of the normal curve is –
0.50. Since, we are in the left portion of the curve, we can express this as under,
−0.50 = (45 − µ)/σ    (1)
Similarly, if the probability of the area falling within µ and X = 64 is 0.42, as stated
above, the corresponding value of Z from the area table is, +1.41. Since, we are in
the right portion of the curve, we can express this as under,
1.41 = (64 − µ)/σ    (2)
If we solve Equations (1) and (2) above to obtain the value of µ or X , we have,
− 0.5 σ = 45 – µ (3)
1.41 σ = 64 – µ (4)
By subtracting Equation (4) from Equation (3) we have,
− 1.91 σ = –19
∴ σ = 10
Putting σ = 10 in Equation (3) we have,
− 5 = 45 – µ
∴ µ = 50
Hence, X̄ (or µ) = 50 and σ = 10 for the given normal distribution or probability
curve.
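The same µ and σ can be recovered numerically, inverting the normal CDF by bisection instead of reading z-values from a printed table (a sketch):

```python
import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def phi_inv(p, lo=-10.0, hi=10.0):
    # Bisection on the monotone CDF.
    for _ in range(80):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z1 = phi_inv(0.31)                 # 31 per cent of items below 45
z2 = phi_inv(0.92)                 # 8 per cent above 64
sigma = (64 - 45) / (z2 - z1)
mu = 45 - z1 * sigma
print(round(mu), round(sigma))     # -> 50 10
```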
Applications of normal distribution or probability curve
The following are the applications of normal distribution or probability curve:
1. Random Processes: Many naturally occurring random processes tend to
have a distribution that is approximately normal. Examples can be found in
any field, these include: SAT test scores of college bound students and body
temperature of a healthy adult.
2. Approximation of Binomial Distribution: When np > 5 and n(1 − p) > 5, the
normal distribution provides a good approximation of the binomial distribution.
Distributions that are based on multiple observations, for example the binomial
distribution, approach the normal distribution when n gets large. The value n
> 30 is usually considered large.
3. Standardization: It is used where it is hypothesized that the theoretical
distribution of a certain variable is normal, whereas the measurement of such a
variable may not give a normal distribution.
For example, in the introductory classes of Statistics there are 200 students
and it has been assumed that the performance of all the students in the
examination should be normally distributed. In addition, for giving a reasonable
distribution of marks, the mean should be 55 and the standard deviation should
be 10. After the examination was over, the lecturer marked all the papers,
and the mean and standard deviation of the raw scores given by the lecturer
were 50 and 6, respectively. For converting a raw score to a standardized score,
the following steps were taken:
(a) The standard score is obtained by Z = (X − 50)/6.
(b) Then the converted (standardized) score = 10(Z) + 55.
Hence, a raw score of 56 will be converted into 65.
4. Composite Scores: When more than one measure is used to measure a
variable, the distribution of each measure usually differs from the others. In
order to obtain an unbiased measure using several different measurements,
each sub-measure is standardized before being added together.
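The two-step conversion described in point 3 can be sketched as:

```python
# Raw score -> Z score -> scaled score with target mean 55 and SD 10.
def standardize(raw, raw_mean=50, raw_sd=6, new_mean=55, new_sd=10):
    z = (raw - raw_mean) / raw_sd
    return new_mean + new_sd * z

print(standardize(56))   # -> 65.0, matching the worked conversion above
```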
Fig. 4.6 Normal Curve Showing Areas at Different Distances from the Mean
• It has 50 per cent of the frequency above and 50 per cent below the mean. The
mean is zero and it is always the reference point.
• The standard deviation of a normal curve is always 1.
• The points of inflection of the curve occur at points –1 unit above and below the
mean.
• The distribution of frequency per cent has definite limits.
• There is a definite relation between the quartile deviation and the standard deviation
in a normal distribution curve.
• It is a mathematical curve and is an open-ended curve.
Some limits are as follows:
• The middle 68 per cent frequency is between –1 and +1.
• The middle 95 per cent frequency is between –1.96 and + 1.96.
• The middle 99 per cent frequency is between –2.58 and + 2.58.
The total area under the normal curve is arbitrarily taken as 10,000. Every score
should be converted into standard score (Z score) by using the following formula:
Z = (X − M)/σ
The area in proportion should be converted into a percentage at the time of
reading the table. From the table, we can see the areas from mean to σ and also we
can read the value of σ scores from the mean for the corresponding fractional area.
Uses of Normal Probability Curve: Determining Mean and Median
The uses of normal probability curve are discussed in this section.
Example 4.41: Given a distribution with a mean of 15 and a σ of 5, find the
percentage of cases that fall between the scores 18 and 25.
Solution: Both the raw scores (18 and 25) are to be converted into Z scores.
Z score of 18 = (X − M)/σ = (18 − 15)/5 = 3/5 = 0.6σ
Z score of 25 = (X − M)/σ = (25 − 15)/5 = 10/5 = 2σ
According to the table of area of a normal probability curve, the total
percentage of cases lie between the mean and 0.6σ is 22.57. The percentage of
cases lie between the mean and 2σ is 47.72. So, the total percentage of cases that
fall between the scores 18 and 25 is 47.72 – 22.57 = 25.15.
NPC is used to determine the limit which includes a given percentage
of cases
Example 4.42: Given a distribution of scores with a mean of 12 and a σ of 6, what
limits will include the middle 70 per cent of the cases?
Solution: The middle 70 per cent of the cases lie symmetrically about the mean,
35 per cent on either side. From the table of area under the NPC, 35 per cent of the
cases fall between the mean and 1.04σ. The limits are therefore 12 ± 1.04 × 6 =
12 ± 6.24, i.e., from 5.76 to 18.24.
NPC is used to find out the percentile rank of a student
Example 4.43: In a class test, the mean is 50 and the σ is 10. Find the percentile
rank of a student who has scored 70 in the test.
Solution: The Z score for the score 70 is (70 − 50)/10 = 2σ.
As per the table of area under the NPC, the area of the curve that lies
between mean and 2σ is 47.72 per cent. The total percentage of cases below 70 is:
50 + 47.72 = 97.72 per cent or 98 per cent.
Thus, the percentile rank of the student is 98.
NPC is used to Find out the Percentile Value of a Student whose Percentile
Rank is Known
Example 4.44: The percentile rank of a student in a class test is 80. The mean of
the class in the test is 50 and the σ is 20. Calculate the student's score in the class
test.
Solution: The student's score lies above the mean, with 30 per cent of the cases between
the mean and his score. According to the table of area under the NPC, 30 per cent of the
cases from the mean corresponds to 0.84σ.
1σ = 20.
0.84σ = 20 × .84 = 16.8
Thus, the percentile value of the student is 50 + 16.8 = 66.8.
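Examples 4.43 and 4.44 can both be checked with the exact normal CDF and its numerical inverse (a sketch):

```python
import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percentile_rank(score, mean, sd):
    # Per cent of cases below the given score.
    return 100 * phi((score - mean) / sd)

def phi_inv(p, lo=-10.0, hi=10.0):
    # Invert phi by bisection on the monotone CDF.
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

def percentile_value(pr, mean, sd):
    # Score whose percentile rank is pr.
    return mean + sd * phi_inv(pr / 100)

print(round(percentile_rank(70, 50, 10)))       # -> 98 (Example 4.43)
print(round(percentile_value(80, 50, 20), 1))   # -> 66.8 (Example 4.44)
```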
NPC is used to Divide a Group into Sub-Groups According to
their Capacity
Example 4.45: Suppose there is a group of 100 students in a Commerce class. We
want to divide them into five small groups A, B, C, D and E according to their ability,
the range of ability being equal in each sub-group. Find out how many students
should be placed in each category.
Solution: The total range under the NPC is –3σ to +3σ, that is, 6σ. This 6σ is to be
divided into five equal parts, so each sub-group covers 6σ ÷ 5 = 1.2σ.
According to the table of area under the NPC:
• 3.5 per cent of the cases lie between 1.8σ and 3σ (Group A, the high scorers).
• 23.8 per cent of the cases lie between 0.6σ and 1.8σ (Group B), and likewise
23.8 per cent between –1.8σ and –0.6σ (Group D).
• The middle 45 per cent of the cases lie between –0.6σ and +0.6σ (Group C).
• The lowest 3.5 per cent of the cases lie between –3σ and –1.8σ (Group E).
Example 4.46: A student scores 60 in an English test (M = 30, SD = 10) and 80 in
a statistics test (M = 70, SD = 10). In which test has he done better?
Solution: In case of the English test:
Raw score = 60
Mean = 30
SD =10
X − M 60 − 30 30
So Z score for the English test = = = = 3σ
σ 10 10
In case of statistics test raw score = 80
Mean = 70
SD = 10
X − M 80 − 70 10
So Z Score for the statistics test = = = = 1σ
σ 10 10
So, the student has done better in the English test than in the statistics test.
NPC is Used to Determine the Relative Difficulty Level of Test Items
Example 4.47: In a standardized test of psychology, questions A, B, C and D
were solved by 45 per cent, 38 per cent, 30 per cent and 15 per cent of the students,
respectively. Assuming normality, find out the relative difficulty level of the
questions. Also explain the difficulty levels of the questions. Table 4.7 displays the
information in tabular form.
Solution:
Table 4.7 Determining the Difficulty Level of Test Items
Question   Per cent solving   Per cent between mean and σ-point   σ distance from mean
A          45                 5                                   0.13σ
B          38                 12                                  0.31σ
C          30                 20                                  0.52σ
D          15                 35                                  1.04σ
As we know, in an NPC, 50–50 cases lie on both sides of the mean. The mean of
the NPC is the point shown as 0. In an NPC, the explanation of difficulty level is done
on the basis of σ-distance. Therefore, if a question lies on the positive side of the
NPC and its σ-point is at a greater distance from the mean, the question is more
difficult. The relative difficulty values of the test items are shown below:
B is 0.18σ more difficult than A (0.31σ − 0.13σ = 0.18σ)
C is 0.39σ more difficult than A (0.52σ − 0.13σ = 0.39σ)
D is 0.91σ more difficult than A (1.04σ − 0.13σ = 0.91σ)
C is 0.21σ more difficult than B (0.52σ − 0.31σ = 0.21σ)
D is 0.73σ more difficult than B (1.04σ − 0.31σ = 0.73σ)
D is 0.52σ more difficult than C (1.04σ − 0.52σ = 0.52σ)
Statistical Significance
Statistical significance refers to a result that is not likely to occur randomly, but rather is
likely to be attributable to a specific cause. Statistical significance can be strong or
weak and is an important feature of research in many mathematics and science related
fields. Statistical significance does not always indicate practical significance. In
addition, it can be misinterpreted when researchers do not use language carefully in
reporting their results.
The calculation of statistical significance (significance testing) is subject to a
certain degree of error. The researcher must define in advance the probability of a
sampling error, which exists in any test that does not include the entire population.
Sample size is an important component of statistical significance
because larger samples are less prone to accidents. Only random, representative
samples should be used in significance testing.
The level at which one can accept whether an event is statistically significant
is known as the significance level or P value. Hence, statistical significance is
expressed as a number called a P value, which defines the probability of the result being
observed given that the null hypothesis is true. If this P value is sufficiently small, the
experimenter can safely assume that the null hypothesis is false.
In statistical experiments one must define in advance the level of significance at
which a correlation will be regarded as established, though the choice is often actually
made after the event. It is important to understand that however small the value of P,
there is always a finite chance that the result is a pure accident. A typical level at
which the threshold of P is set would be 0.01, which means there is a one per cent
chance that the result was accidental. The significance of such a result would then
be indicated by the statement P < 0.01. A less stringent level frequently referenced
is P < 0.05, which means that there is a one-in-twenty chance that the result was
accidental. It is difficult to generalize, but on the whole P < 0.01 would normally be
considered significant and P < 0.001 highly significant. The origin of the P < 0.05
criterion goes back to the great pioneer of significance testing, R. A. Fisher, although
he never proposed it as a rigid standard. Many leading scientists and mathematicians
today believe that the emphasis on significance testing is grossly overdone: P < 0.05
has become an end in itself and the determinant of a successful outcome to an experiment.
Tests for Statistical Significance
Tests for statistical significance are used to answer questions such as: What is the
probability that what we think is a relationship between two variables is really just a
chance occurrence? If a number of samples were selected from the same population,
would we still find the same relationship between these two variables in every sample?
If we could carry out a census of the population, would we also find that this
relationship exists in the population from which the sample was drawn? Or is our
finding due only to random chance?
Tests for statistical significance tell us the probability that a relationship observed
in the data occurred only by random chance, and hence what the error would be if
we assume that the relationship exists. It is never 100 per cent certain that a
relationship exists between two variables: there are too many sources of error to be
controlled, for example, sampling error, researcher bias, problems with reliability
and validity, and simple mistakes. But using probability theory and the normal
probability curve, the probability of being wrong can be estimated if the finding of a
relationship is assumed to be true. If the probability of being wrong is small, the
observed relationship is regarded as a statistically significant finding.
Statistical significance means that there is a good chance that one may
accurately find that a relationship exists between two variables. But statistical
significance is not the same as practical significance. We can have a statistically
significant finding, but the implications of that finding may have no practical application.
The researcher must always examine both the statistical and the practical significance
of any research finding. For example, consider that there is a statistically significant
relationship between a citizen’s age and the level of satisfaction with city recreation
services. It may be that older citizens are 5 per cent less satisfied than younger
citizens with city recreation services. But is 5 per cent a large enough difference to
be of concern?
At times, differences that are small but statistically significant owe their significance
to a very large sample size; in a sample of smaller size, the same differences would
not be large enough to be statistically significant. The following are the steps and
some common tests for assessing statistical significance.
Steps in testing for statistical significance
1. State the research hypothesis
2. State the null hypothesis
3. Select a probability of error level (α or Alpha Level)
4. Select and compute the test for statistical significance
5. Interpret the results
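As an illustration of these five steps, the sketch below runs a two-tailed z-test in Python; all the numbers (sample mean 52, hypothesized population mean 50, σ = 10, n = 100) are hypothetical:

```python
import math

def normal_cdf(z):
    # standard normal cumulative distribution via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Steps 1-2: research hypothesis (mean != 50) versus null hypothesis (mean = 50)
# Step 3: choose the probability of error level
alpha = 0.05
# Step 4: compute the test statistic and its two-tailed P value
sample_mean, pop_mean, sigma, n = 52, 50, 10, 100
z = (sample_mean - pop_mean) / (sigma / math.sqrt(n))
p_value = 2 * (1 - normal_cdf(abs(z)))
# Step 5: interpret the result
significant = p_value < alpha
print(f"z = {z:.2f}, P = {p_value:.4f}, significant at 0.05: {significant}")
```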
There is always a possibility that the researcher will make a mistake regarding the
relationship between the two variables. There are two possible mistakes or errors.
The first is called a Type I error. This occurs when the researcher concludes that a
relationship exists when in fact the evidence is that it does not. In a Type I error, the
researcher should accept the null hypothesis and reject the research hypothesis, but
the opposite occurs. The probability of committing a Type I error is called alpha (α).
The second is called a Type II error. This occurs when the researcher concludes that
a relationship does not exist when in fact the evidence is that it does. In a Type II
error, the researcher should reject the null hypothesis and accept the research
hypothesis, but the opposite occurs. The probability of committing a Type II error is
called beta (β).
Researchers generally specify the probability of committing a Type I error,
i.e., the value of alpha. Most researchers select an alpha of 0.05. This means that
there is a 5 per cent probability of making a Type I error, that is, of assuming that a
relationship between two variables exists when it really does not. However, an alpha
of 0.01 is also used when researchers do not want a probability of being wrong of
more than 1 per cent of the time, or one time in a hundred.
The level of alpha can vary, but the smaller the value, the more stringent the
requirement for reaching statistical significance becomes. Alpha levels are often
written as the ‘P value’ or ‘P = 0.05’. Usual levels are P = 0.05 or the chance of one
in 20 of making an error; P = 0.01 or the chance of one in 100 of making an error; P
= 0.001 or the chance of one in 1,000 of making an error. When reporting the level
of alpha, it is usually reported as being ‘less than’ some level, using the ‘less than’
sign ‘<’. Thus, it is reported as P < 0.05 or P < 0.01, etc.
For nominal and ordinal data, the Chi-square test is used as a test for statistical
significance. To calculate Chi-square, we compare the original, observed frequencies
with the new, expected frequencies. The t-test is another important test for statistical
significance and is used with interval and ratio level data. t-tests can be applied in
several different types of statistical comparisons.
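A minimal sketch of the Chi-square computation described above, with hypothetical observed frequencies and equal expected frequencies under the null hypothesis:

```python
# hypothetical observed frequencies across four categories
observed = [25, 30, 20, 25]
# expected frequencies if the null hypothesis (no preference) is true
expected = [sum(observed) / len(observed)] * len(observed)

# chi-square = sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 2.0
```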
Tests for statistical significance are used to estimate the probability that a
relationship observed in the data occurred only by chance, i.e., the probability that
the variables are really unrelated in the population.
In the words of American psychologists Thorndike and Hagen, ‘Norms are defined
as the average performance on a particular test made by a standardization sample’.
Frank S. Freeman defines a norm in measurement as ‘the average or standard
score on a particular test made by a specified population’.
Norms are representations of average performance based upon the results
of testing a specified group of students. It is a device of transforming raw scores
into standard scores in a group. Educational measurement is not a case of physical
measurement but of mental measurement; therefore, educational measurement
is relative. Because of this, we have to evaluate an individual in relation
to others in a class or a group. Let us take an example. Suppose Prakash is a student
of Class X. In the classroom examination, he scored 60 in English and 75 in
Mathematics. If a layman analyses the result of Prakash, he will say that Prakash is
better in Mathematics than in English. However, Prakash scored the highest marks
in the class in the English test and the lowest marks in Mathematics. So the
interpretation made by the layman was wrong, because the two scores are raw scores.
In order to get a valid and reliable result, we should go for a norm, through
which we can get the perfect place of an individual in a group. For this purpose of
interpretations, the raw scores are to be converted into derived scores. Norms tell
us where a student or a class stands in relation to a reference group. Norms are
representation of average performance based upon the results of testing a specified
group of students. It is a statistical procedure to minimize the interpretive error of a
test score. The norms of any educational test represent the average test performance
of the standardized group or sample selected from a specified population. In this, an
individual score is compared with the standardized sample as a reference group.
The importance of norms in the field of measurement and evaluation are explained
as follows:
• Norms are the basis of interpreting raw scores into derived scores.
• Norms place an individual in the exact place within a group.
• Norms are helpful for selection and classification of students.
• Norms are helpful in provision of guidance and counselling the students.
• Norms speak about the attainment of instructional objectives by the students.
• Norms help in minimizing the interpretive error of a measuring instrument.
Types of Norms
In the field of educational and mental measurement, we use four types of norms,
which are as follows:
(i) Age norm
The concept of age norm was developed by French psychologist Alfred Binet in
1908. It basically deals with mental age. This age norm is also known as ‘mental age
norms’ or ‘age equivalent norms’. The ‘age norm’ is defined as the average
performance of a representative sample of a certain age group on a measure of
intelligence or ability. Let us consider a suitable example to have clarity about age
norms.
Suppose the average score of students of age 15 years 2 months on an
achievement test is 80. So the age norm for the score of 80 will be 15 years 2
months. Suppose Mohan is 12 years old and he scores 80 in the achievement test.
Here, though his chronological age is 12, Mohan’s mental age is 15 years 2 months.
So, when an age norm is fixed, a standardized test is given to a representative
sample of students of a particular age level and the average score is calculated;
this score is considered the norm for the group. The students who achieve that
score are considered to be within that age norm.
Limitations of age norm
The limitations of age norm are as follows:
• It is very difficult to get a true representative sample of individuals of a selected
age group.
• Very high and very low scores are difficult to interpret with age norms.
• Mental age units are not fixed across different tests; they may vary.
• It has the limited scope to be used in some psychological and educational
tests.
• Age norms lack a standard and uniform unit throughout the period of growth
of physical and psychological traits.
• It is a difficult and time-consuming task to develop age norms and
mental ages.
• The mental age of a particular age group may differ from locality to locality
and test to test.
(ii) Grade norm
Grade norms are similar to age norms. However, here measurement is based upon
class or grade level, not on age level. Grade norms have been widely used with
standardized achievement tests, especially at the elementary school level. The grade
equivalent that corresponds to a particular raw score is identified as the grade level at
which the typical student obtains that raw score. A grade norm corresponding to a raw
score is the grade level of those pupils whose average raw score is the raw score in
question.
Suppose we conducted a test on the VIIIth grade students. After getting the
result, we find 60 to be the average score on that test. Therefore, 60 will be the
grade norm for the students of the VIIIth grade. Grade norms are based on the average performance
of students at various grade levels. Grade norms are most useful for reporting growth
in the basic skills during the elementary school period. They are least useful for
comparing a student’s performances on different tests.
Limitations of grade norms
The limitations of grade norms are as follows:
• The rate of growth from grade to grade is not uniform throughout.
NOTES • Grade norms lack a comparability of scores on different tests.
• When a student of VIIth grade gets a credit of IXth grade, it does not mean
that the student has the ability to be a student of IXth grade.
• Fractional grades do not have any meaning.
• Grade norms are affected by quality of schools, quality of teachers and quality
of students.
• The interpretation of grade norm is very confusing because it provides only
level of performance with respect to a subject rather than the educational
level of the students.
(iii) Percentile norm
Percentile norms are about the position of an individual in relation to the norming
group. ‘Percentile norm’ is a point on the scale of measurement determined by the
percentage of individuals in given populations that lie below this point. It describes a
student’s performance in terms of the percentage of other students in some clearly
defined group that earn a lower score. This might be a grade or age group, or any
other group that provides a meaningful comparison. If the percentile norm of a score
60 is 65, we mean that 65 per cent of the students of the normative group lie below
a score of 60.
Percentiles should not be confused with the common ‘percentage score’.
The percentage scores are raw scores, whereas percentiles are transformed scores.
It provides a basis for interpreting an individual’s score on a test in terms of his own
standing in a particular standardization sample. It should be based upon a sample
which has been made homogeneous with respect to age group, sex, grade
level, socio-economic status, etc. This is applicable to all types of tests: intelligence,
attitude, aptitude and achievement.
Limitations of percentile norm
• The percentile units are not equal on all parts of the scale. The percentile
difference of 5 near the middle of the scale (e.g., 45 to 50) represents a much
smaller difference in test performance than the same percentile difference at
the ends (e.g., 85 to 90), because a large number of students receive scores
near the middle, whereas relatively few students have extremely high or low
scores.
• Percentile norms are generally confused with percentage scores which affects
the interpretation.
• Percentile norm indicates only the relative position of an examinee in the
standardization sample. It conveys nothing regarding the amount of the actual
difference between the scores.
• Percentile rank of one group cannot be compared with the percentile rank of
another group.
• Conversion of raw scores to percentile scores gives more differences in the
middle than at the extremes.
We can use the following formula to compute the percentile point of a raw score:

PP = L + ((Pn – fb) / fa) × i

where PP = Percentile point of a raw score
L = Lower limit of the class interval in which the raw score falls
Pn = Percentage of the frequency
fb = Frequency below the class interval
fa = Actual frequency of the class interval
i = Size of the class interval
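The formula can be sketched as a small Python function; the grouped-data values below (lower limit 59.5, required frequency 25, 20 cases below the interval, 10 cases in it, interval size 5) are hypothetical:

```python
def percentile_point(L, Pn, fb, fa, i):
    # PP = L + ((Pn - fb) / fa) * i
    return L + ((Pn - fb) / fa) * i

# hypothetical values: lower limit 59.5, required frequency 25,
# 20 cases below the interval, 10 cases in it, interval size 5
print(percentile_point(59.5, 25, 20, 10, 5))  # 62.0
```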
(iv) Standard score norms
The most important method of indicating an individual’s relative position in a
group is to show how far his achieved score is above or below the average. This is the
approach of standard scores: standard scores express the performance of
individuals in terms of standard deviation units from the mean. There are numerous
types of standard scores used in testing. They are as follows:
(a) Z-Scores: A Z-score represents test performance directly as the number of
standard deviation units a raw score is above or below the mean. Z-scores
are units of the normal probability curve, ranging from –3σ to +3σ,
with mean value zero and standard deviation one.

The formula for computing the Z-score is: Z = (X – M) / SD

Where,
X = Raw score
M = Arithmetic mean of raw scores
SD = Standard deviation of raw scores
A Z-score is always negative when the raw score is smaller than the mean.
(b) T-Scores: T-scores are also standard scores, but the mean value is 50 and
the standard deviation is 10. T-scores can be obtained by multiplying the Z-score
by 10 and adding the product to 50. Thus,
T-score = 50 + (10 × Z)
One reason that T-scores are preferred to Z-scores for reporting test results
is that only positive integers are produced in T-scores.
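Both conversions can be sketched in Python (the raw score 65, mean 50 and SD 10 are hypothetical):

```python
def z_score(x, mean, sd):
    # Z = (X - M) / SD
    return (x - mean) / sd

def t_score(x, mean, sd):
    # T = 50 + 10Z
    return 50 + 10 * z_score(x, mean, sd)

print(z_score(65, 50, 10))  # 1.5
print(t_score(65, 50, 10))  # 65.0
```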
(c) Stanines: The stanine norm is developed by the technique of normalized
standard scores. It was developed by the US Air Force during World War II.
Stanines are single-digit scores ranging from 1 to 9. The system is so called
(from ‘standard nine’) because the distribution of raw scores is divided into
nine parts. Stanine 5 is in the centre of the distribution and includes all cases within
one-fourth of a standard deviation on either side of the mean. Here, the mean
score is 5 and the standard deviation is 1.96, or approximately 2. When raw scores
are transformed into stanine scores, the distribution of scores takes the shape of the
normal curve.
In the stanine system, a 9-point scale is used, in which 9 is high, 1 is low and
5 is average. Stanines are normalized standard scores that make it possible to compare
a student’s performance on different tests.
The stanines are distributed on the normal curve. The area of the normal probability
curve has been divided into nine bands, each with a fixed percentage of cases. The
first stanine includes 4 per cent, the second 7 per cent, the third 12 per cent, the
fourth 17 per cent, the fifth 20 per cent, the sixth 17 per cent, the seventh 12 per
cent, the eighth 7 per cent and the ninth 4 per cent of the total cases.
Stanine    Description                     Percentage    Stanine's position
1, 9       Bottom and top                  4 each        (1st) (9th)
2, 8       Above bottom and below top      7 each        (2nd) (8th)
3, 7       Next to the second or eighth    12 each       (3rd) (7th)
4, 6       Just below or above the mean    17 each       (4th) (6th)
5          Middle (mean)                   20            (5th)
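The stanine band percentages above can be used to place a percentile rank into its stanine. A sketch (the lookup routine is an illustration, not a standard formula):

```python
# percentage of cases in each stanine band (from the text)
STANINE_PCT = {1: 4, 2: 7, 3: 12, 4: 17, 5: 20, 6: 17, 7: 12, 8: 7, 9: 4}

def stanine(percentile_rank):
    # walk the cumulative percentages until the rank is covered
    cumulative = 0
    for s in range(1, 10):
        cumulative += STANINE_PCT[s]
        if percentile_rank <= cumulative:
            return s
    return 9

print(stanine(50))  # 5
```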
Short-Answer Questions
1. What does the editing of data involve?
2. What is inferential statistics?
3. List the characteristics of mean.
4. How is median for ungrouped data calculated?
5. List the assumptions of Karl Pearson’s method.
MEASUREMENT AND EVALUATION-II
Structure
5.0 Introduction
5.1 Unit Objectives
5.2 Reliability: Concept and Determining Factors
5.2.1 Methods of Determining Different Reliability Coefficient
5.3 Validity: Concept and Uses
5.3.1 Determining Validity Co-efficient
5.3.2 Relation between Validity and Reliability
5.4 Trends in Evaluation: Grading, Credit System, Cumulative Record Card
5.4.1 Issues and Problems
5.5 Computer in Evaluation
5.5.1 Multimedia in Education
5.6 Summary
5.7 Key Terms
5.8 Answers to ‘Check Your Progress’
5.9 Questions and Exercises
5.10 Further Reading
5.0 INTRODUCTION
In the previous unit, you were introduced to some statistical concepts in measurement
and evaluation. In it, the measures of central tendency and variability were discussed.
Measures of central tendency are of various types, such as arithmetic mean, mode
and median. Also discussed was the normal probability curve and the coefficient of
correlation. In this unit, the discussion on statistical concepts in measurement and
evaluation will continue. In it, you will learn about the methods of determining different
reliability coefficients, as well as validity coefficients. The unit will also discuss the
emerging trends in evaluation.
5.2 RELIABILITY: CONCEPT AND DETERMINING FACTORS

Reliability refers to the consistency of measurement, which is how stable test scores
or other assessment results are from one measurement to another. It means the
extent to which a measuring device yields consistent results upon testing and retesting.
If a measuring device measures consistently, it is reliable. The reliability of a test
refers to the degree to which the test results obtained are free from error of
measurement or chance errors.
Characteristics
The characteristics of reliability are as follows:
• It refers to the degree to which a measuring tool yields consistent results
upon testing and retesting.
• It indicates the level to which a test is internally consistent, i.e., how accurately
the test is measuring.
• It refers to the results obtained with measuring instrument and not to the
instrument itself.
• An estimate of reliability refers to a particular type of stability with the test
result.
• Reliability is necessary but not a sufficient condition for validity.
• Reliability is a statistical concept.
• It refers to the preciseness of a measuring instrument.
• It is the coefficient of internal consistency and stability.
• It is the function of the length of a test.
Factors affecting reliability
The reliability of a test is affected by a number of factors, which are explained
below:
(i) Length of the test: There is positive correlation between the number of
items in a test and the reliability of a test. The more the number of items the
test contains, the greater is its reliability. In several tests, the scores of sub-
tests and whole tests are calculated separately and their reliability is also
calculated separately. The reliability of the whole test is always higher than
that of a sub-test, because the whole test contains more items, which provide a
better representation of the content.
(ii) Construction of the test: The nature of items, their difficulty level, objectivity
of scoring, item interdependence and alternative responses are factors which
affect the reliability. More alternative responses will increase the reliability of
the test.
The reliability of the whole test is estimated from the half-test correlation by the
Spearman–Brown formula:

rxx = 2rhh / (1 + rhh)

The Kuder–Richardson formula (KR-20) is:

rtt = [n / (n – 1)] × [(σx² – Σpq) / σx²]

Where,
rtt = Reliability index
n = Number of items in the test
p = Proportion of right responses
q = Proportion of wrong responses
σx = Standard deviation of the test scores
Example 5.1: A test consists of 50 items, the standard deviation of the test scores
is 7.5, and the sum of the products of the proportions of right and wrong responses
on the items is 10.43. Calculate the reliability.
Solution:
rtt = [n / (n – 1)] × [(σx² – Σpq) / σx²]
= (50 / 49) × [(56.25 – 10.43) / 56.25]
= 1.0204 × 0.8146
= 0.83
The reliability coefficient of the test is 0.83.
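Example 5.1 can be checked with a short sketch of the KR-20 computation:

```python
def kr20(n, sd, sum_pq):
    # rtt = (n / (n - 1)) * ((sd^2 - sum_pq) / sd^2)
    return (n / (n - 1)) * ((sd ** 2 - sum_pq) / sd ** 2)

print(round(kr20(50, 7.5, 10.43), 2))
```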
The KR-21 method is also a simple one for calculating the reliability of a test.
The test is administered once to a group in order to determine the reliability quickly.
The mean and variance are calculated, and then the following KR-21 formula is used:

rtt = [n σx² – M (n – M)] / [σx² (n – 1)]

Where,
rtt = Reliability of the whole test
n = Number of items in the test
M = Mean of the test scores
σx = Standard deviation of the test scores
Example 5.2: An objective test of 100 multiple-choice items has been administered
to a small group of students. The mean of the test scores is 50 and the standard
deviation is 10. Calculate the reliability coefficient of the test.
Solution:
rtt = [n σx² – M (n – M)] / [σx² (n – 1)]
= [100 × 10² – 50 × (100 – 50)] / [10² × (100 – 1)]
= (10000 – 2500) / 9900
= 0.76
The reliability coefficient of the test is 0.76.
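Example 5.2 can likewise be checked with a sketch of the KR-21 formula:

```python
def kr21(n, mean, sd):
    # rtt = (n*sd^2 - M(n - M)) / (sd^2 * (n - 1))
    return (n * sd ** 2 - mean * (n - mean)) / (sd ** 2 * (n - 1))

print(round(kr21(100, 50, 10), 2))  # 0.76
```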
Limitations of Kuder–Richardson method
The limitations of Kuder–Richardson method are as follows:
• Kuder–Richardson formulae are not suitable for speed assessments—
assessments with time limits that prevent students from attempting all the
items.
• The formulae do not indicate the consistency of student responses from one
day to another.
• It cannot be used for power test and heterogeneous tests.
• The different Kuder–Richardson formula results differ in reliability coefficient.
• In case all the items of the tests are not highly homogeneous, this method will
produce lower reliability coefficient.
(v) Inter-rater method
This method assesses reliability through scoring/evaluation done by two or more
independent judges for every test. The various scores given by the judges are then
compared to determine the consistency of the estimations. The comparison is
carried out as follows: each rater assigns each test item a score, say on a scale
from 1 to 10. Then the correlations between any two raters’ ratings are calculated. There is
another method for testing inter-rater reliability. In this method, raters identify a
category for each observation and then compute the percentage of agreement among
the raters. For instance, if the raters are in agreement 7 times out of 10, the test will
be said to have a 70 per cent inter-rater reliability rate.
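The percentage-of-agreement calculation can be sketched in Python; the ten category judgments below are hypothetical and arranged so that the raters agree 7 times out of 10:

```python
# hypothetical category judgments by two raters on ten observations
rater_a = ["A", "B", "A", "A", "B", "A", "B", "A", "A", "B"]
rater_b = ["A", "B", "B", "A", "B", "A", "A", "A", "B", "B"]

matches = sum(a == b for a, b in zip(rater_a, rater_b))
agreement = matches / len(rater_a)
print(f"{agreement:.0%} inter-rater agreement")  # 70%
```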
In case the raters seem to be in disagreement, it would imply that either the
raters need to be trained again or the scale is defective. Sometimes, it so happens
that various raters would have different opinions about measurement results emerging
from the same object, such as a scientific experiment or a test, wherein first the test
is carried out, then its results are interpreted, recorded and presented. At any of
these stages, the rater may become affected by rater’s bias (the tendency to rate in
the direction of what the rater expects). There may also be discrepancy during
interpretation and presentation of results, for instance, the round-off may be different
in terms of higher or lower digit next to the decimal.
Limitations of inter-rater method
The limitations of inter-rater method are as follows:
• The method can be tedious because inter-rater reliability statistics need to be
calculated separately for every item and every pair of raters.
• It is a lengthy and difficult task to train the raters so that they are able to
reach an exact agreement.
• Even when they are trained, the forced consensus might render the ratings
inaccurate and this would be a threat to the validity of the student’s scores.
• The resulting estimates might turn out to be too conservative if two raters
show differences in the method used on the rating scale.
5.3 VALIDITY: CONCEPT AND USES

The validity of a test is determined by measuring the extent to which it matches with
a given criterion. It refers to the very important purpose of a test, and it is the most
important characteristic of a good test. A test may have other merits, but if it lacks
validity, it is valueless.
Characteristics of Validity
The characteristics of validity are as follows:
• Validity is a unitary concept.
• It refers to the truthfulness of the test result.
• In the field of education and psychology, no test is perfectly valid because
mental measurement is not absolute but relative.
• If a test is valid, it is reliable; but if a test is reliable, it may or may not be valid.
• It is an evaluative judgment on a test. It measures the degree to which a test
measures what it intends to measure.
r = [N ΣXY – ΣX ΣY] / √{[N ΣX² – (ΣX)²] [N ΣY² – (ΣY)²]}
Where,
r = Validity index
N = Size of sample
X = Raw scores in the test X
Y = Raw scores in the test Y
XY = Sum of the products of each X score multiplied with its
corresponding Y score
In order to make the calculation an easy one, we can use the above formula
in this way:
r = [N Σdxdy – Σdx Σdy] / √{[N Σdx² – (Σdx)²] [N Σdy² – (Σdy)²]}
Where,
r = Validity index
N = Size of the sample
dx = X – Mx
dy = Y – My
X = Raw score of the X group
Y = Raw score of the Y group
Mx, My = Means of the X and Y groups respectively
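The product-moment computation can be sketched directly from the raw-score formula; the two score lists below are hypothetical:

```python
import math

def pearson_r(x, y):
    # r = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# hypothetical scores on test X and criterion Y
print(round(pearson_r([10, 12, 14, 16, 18], [20, 23, 25, 30, 32]), 2))
```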
Sometimes, we need to predict the future result of somebody with reference
to the present result. The following regression equation is used for this purpose:
Y′ = r (σy / σx) (X – Mx) + My

Where,
Y′ = Predicted value
My = Mean of the predicted scores (Y)
Mx = Mean of the test scores (X)
σy = Standard deviation of the predicted scores (Y)
σx = Standard deviation of the test scores (X)
X = Test score (basis of prediction)
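The regression prediction can be sketched as a small function; the values (r = 0.8, test mean 50 and SD 10, criterion mean 55 and SD 5) are hypothetical:

```python
def predict_y(x, r, mean_x, mean_y, sd_x, sd_y):
    # Y' = r * (sd_y / sd_x) * (X - Mx) + My
    return r * (sd_y / sd_x) * (x - mean_x) + mean_y

print(predict_y(60, 0.8, 50, 55, 10, 5))  # 59.0
```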
(ii) Cross validation: Cross validation indicates a process of validating a test by
using a population sample that is different from the sample on which it was
originally standardized. It is necessary because the validity data may be high
or low due to chance factors peculiar to the standardization sample. When
test is administered to various samples in a variety of situations, it is being
cross-validated. The different types of cross validation are: validity extension,
validity generalization and psychometric signs.
5.3.2 Relation between Validity and Reliability
Numerous factors tend to make assessment results invalid for their intended use.
Factors affecting validity
Let us have a discussion regarding the factors which affect the validity of a test.
(i) Lack of clarity in directions: Instructions that do not clearly indicate to the
student how to respond to the tasks and how to record the responses decrease
the validity of a test. If the directions are not clear, the students will
misunderstand the purpose of the test, and this in turn will hamper the validity
of the test.
(ii) Ambiguity: Ambiguous statements lead to confusion and misinterpretation.
Ambiguity sometimes confuses the good students more than it does the poor
students. So no question of the test should be ambiguous.
HOME CIRCUMSTANCES
1. Economic status of the family ...........................................................
................................................................................................................
2. Child’s position in the family (whether only child, eldest or youngest
child) ........................................................................................................
................................................................................................................
3. Type of family (whether joint or unitary) ..................................................
................................................................................................................
4. Education of the Parent/Guardian .............................................................
................................................................................................................
5. Special circumstances ..............................................................................
................................................................................................................
ATTENDANCE
PHYSICAL DATA
Craft/Work
Experience
PERFORMANCE IN CO-CURRICULAR ACTIVITIES/OPEN HOUSE
(Average Grade of the Child)
Activities Grade
1. Sports and Athletics
2. Personal Hygiene
3. Literary (debating etc.)
4. Dramatics
5. Community service (sanitation drive,
literacy campaigns etc.)
6. Any other activity
PERSONALITY CHARACTERISTICS
Nature Remarks/Grade
1. Honesty
2. Punctuality
3. Courtesy
4. Habits of work
5. Co-operativeness
6. Sociability
7. Self-confidence
8. Emotional stability
9. Leadership
10. Initiative
GENERAL REMARKS
5.5 COMPUTER IN EVALUATION

Over the years, computers have entered every sphere of our society. We see
computer-dependence all around: in our homes, private
workplaces, government departments, schools, colleges, hospitals, railway/air ticket
booking outlets, banks, etc. So, it has become essential to equip a child with computer
skills to enable him/her to use the computer with ease in education and numerous
day-to-day tasks. The computer enhances the cognition, alertness, technical skills,
imaginative power and creativity of the student. It enhances the learning environment
and creates positive competition among students. The computer is so easy and
interesting to learn that anyone can learn it at any age.
Advantages of computer in education
Some advantages of the computer are as follows:
• It adds a great deal of ease to the teaching-learning process, thus enhancing
the achievement of both the teacher and the student.
• It enables students to think logically.
• Computer training helps students compete confidently in the high-tech world.
• Since hardly any aspect of life today is untouched by the computer, it has
opened up many job avenues for students.
• The use of the Internet through the computer connects students with peers
in their own country and in other countries. This helps them understand
each other's cultures and discuss their academic problems.
• The computer-enabled Internet also helps students stay in touch with their
teachers even after leaving school, and thus form an alumni community.
5.6 SUMMARY
• The reliability of a test refers to the degree to which the test results obtained
are free from error of measurement or chance errors.
• Reliability is necessary but not a sufficient condition for validity.
• When examining the reliability coefficient of standardized tests, it is important
to consider the methods used to obtain the reliability estimates.
• American Psychological Association (APA) introduced several methods of
estimating reliability. These are:
(i) Test-retest method
(ii) Equivalent forms method
(iii) Split-half method
(iv) Kuder-Richardson method
(v) Inter-rater method
• The validity of a test is determined by measuring the extent to which it matches
with a given criterion. It refers to the very important purpose of a test, and it
is the most important characteristic of a good test.
• A test may have other merits, but if it lacks validity, it is valueless.
• The six types of validity are:
(i) Face validity
(ii) Content validity
(iii) Concurrent validity
(iv) Construct validity
(v) Predictive validity
(vi) Criterion validity
• The methods used for assessing the validity of a test are:
o Correlation method
o Cross validation
• Reliability and validity are closely related, even though they cannot be
interchanged. An assessment that has very low reliability will also have low
validity; quite obviously, a measurement that has low levels of accuracy or
consistency is not likely to suitably fulfil its objective.
• At the same time, the factors necessary for achieving a considerably high
degree of reliability can affect validity negatively.
• Grades in the realm of education are standardized measurements of varying
levels of comprehension within a subject area.
• It has become essential to equip a child with computer skills to enable him/her
to use the computer with ease in education and in numerous day-to-day tasks.
• The computer enhances the cognition, alertness, technical skills, imagination
and creativity of the student. It enriches the learning environment and creates
healthy competition among students.
Short-Answer Questions
1. What are the characteristics of reliability?
2. What are the factors upon which the reliability of a test is dependent?
3. What is the test-retest method? What are its limitations?
4. What is the concept of grading? What are the merits of the grading system?
5. List the criteria for scholastic evaluation.
6. How do states implement the non-detention policy in primary classes in India?
7. What are the two main types of multimedia in use?
Long-Answer Questions
1. Explain the various types of validity.
2. Discuss the different methods of determining reliability coefficient.
3. Describe the different methods of determining validity.
4. Discuss the relation between reliability and validity.