Allama Iqbal Open University, Islamabad: Student Name: Safa Rafique
ANSWER:
Validity is the appropriateness of a particular use of test scores; test validation is then the process of collecting evidence to justify the intended use of those scores. Many types of validity evidence can be collected to establish the usefulness of an assessment tool. Some of them are listed below.
CONTENT VALIDITY
The evidence of content validity comes from a judgmental process that may be formal or informal. The formal process follows a systematic procedure for arriving at a judgment; its important components are the identification of behavioural objectives and the construction of a table of specifications. Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two numbers should include a range of combinations of digits; a test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves Subject Matter Experts (SMEs) evaluating test items against the test specifications. It is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured” (Anastasi & Urbina, 1997). For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature? A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcraft et al. (2004, p. 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts review the items and comment on whether the items cover a representative sample of the behaviour domain. For example, in developing a teaching competency test, experts in the field of teacher training would identify the information and issues required to be an effective teacher and then choose (or rate) items that represent those areas of information and
skills which a teacher is expected to exhibit in the classroom. Lawshe (1975) proposed that each rater should respond to the following question for each item (a sketch of how such ratings are summarised follows at the end of this section): Is the skill or knowledge measured by this item
• Essential,
• Useful, but not essential, or
• Not necessary?
With respect to educational achievement tests, a test is considered content valid when the proportion of the material covered in the test approximates the proportion of material covered in the course. There are different types of content validity; the major types, face validity and curricular validity, are described below.
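Lawshe summarised such panel ratings with the content validity ratio (CVR). As a minimal sketch (the panel figures below are hypothetical), CVR = (n_e − N/2) / (N/2), where n_e is the number of panellists rating the item “essential” and N is the panel size:

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's (1975) CVR: -1 means no panelist rated the item
    essential, 0 means exactly half did, +1 means all of them did."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 raters, 8 of whom marked an item "essential".
print(content_validity_ratio(8, 10))  # 0.6
```

Items with a CVR close to +1 are good candidates to retain; a negative CVR signals that the panel does not consider the item essential.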
1. FACE VALIDITY
Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. Face validity is very closely related to content validity. While content validity depends on a theoretical basis for assuming whether a test assesses all domains of a certain criterion (e.g. does assessing addition skills yield a good measure of mathematical skills? To answer this, you have to know what different kinds of arithmetic skills mathematical skills include), face validity relates to whether a test appears to be a good measure or not. This judgment is made on the “face” of the test, so it can also be made by an amateur. Face validity is a starting point, but a test should NEVER be assumed to be valid for any given purpose on this basis alone, as the “experts” may be wrong. For example, suppose you were given an instrument that reportedly measures your attractiveness, but the questions asked you to identify the correctly spelled word in each list; there would not be much of a link between what it claims to do and what it actually does.
Possible Advantage of Face Validity
• If the respondent knows what information we are looking for, they can use that “context” to help interpret the questions and provide more useful, accurate answers.
Possible Disadvantage of Face Validity
• If the respondent knows what information we are looking for, they might try to “bend & shape” their answers to what they think we want.
Q.2 What are the rules of writing multiple-choice test items?
ANSWER:
There are several rules we can follow to improve the quality of this type of written examination.
1. Make sure that every question examines only important knowledge. Avoid overly detailed questions; each question has to be relevant to the previously set instructional goals of the course.
2. Use simple language, taking care with spelling and grammar. Spelling and grammar mistakes (unless you are testing spelling or grammar) only confuse students. Remember that you are examining knowledge of your subject, not language skills.
3. Clear the text of the body of the question of all superfluous words and irrelevant content. This helps students understand exactly what is expected of them. It is desirable to formulate a question in such a way that the main part of the text is in the body of the question, without being repeated in the answers.
4. Be careful that the formulation of the question does not (indirectly) hide the key to the correct answer. Students adept at solving tests will recognize it easily and find the right answer because of the word combination, grammar, etc., and not because of their real knowledge.
5. Be careful not to repeat content and terms related to the same theme, since the answer to one question can become the key to solving another.
6. All offered answers should be unified, clear and realistic. For example, an implausible answer or uneven text length across answers can point to the right answer; such a question does not test real knowledge. The position of the key should be random (see the sketch after this list). If the answers are numbers, they should be listed in ascending order.
7. If you use negative questions, the negation must be emphasized using CAPITAL letters, e.g. “Which of the following IS NOT correct…” or “All of the following statements are true, EXCEPT”.
8. Avoid Distracters in the Form of “All the Answers Are Correct” or “None of the Answers Is Correct”
Teachers use these statements most frequently when they run out of ideas for distracters. Students, knowing what is behind such questions, are rarely misled by them. Therefore, if you do use such statements, sometimes use them as the key answer. Furthermore, if a student recognizes that there are two correct answers (out of 5 options), they will be able to conclude that the key answer is the statement “all the answers are correct”, without knowing the accuracy of the other distracters.
9. Distracters which only slightly differ from the key answer are bad distracters. Good or strong distracters are statements which themselves seem correct, but are not the correct answer to the particular question.
10. The greater the number of distracters, the smaller the possibility that a student could guess the right answer (key). In higher-education tests, questions with 5 answers are used most often (1 key + 4 distracters), which means a student has a 20% chance of guessing the right answer.
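Two of the mechanical points above, randomising the position of the key (rule 6) and the 1-in-k chance of blind guessing (rule 10), can be illustrated with a short sketch; the item and its options below are hypothetical:

```python
import random

def build_item(stem, key, distracters):
    """Shuffle the options so the correct answer's position
    is random and cannot be predicted from its placement."""
    options = [key] + list(distracters)
    random.shuffle(options)
    return stem, options, options.index(key)

stem, options, key_pos = build_item(
    "Which scale of measurement has an absolute zero point?",
    key="Ratio",
    distracters=["Nominal", "Ordinal", "Interval"],
)
print(options, "-> key at index", key_pos)

# With k options, blind guessing succeeds with probability 1/k:
k = 5
print(f"Guessing chance with {k} options: {1 / k:.0%}")  # 20%
```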
Q.3 Write a detailed note on scales of measurement.
ANSWER:
Levels of Measurement
There are four different scales of measurement, and any set of data can be classified as belonging to one of the four. The four types of scales are:
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
Nominal Scale
A nominal scale is the 1st level of measurement scale, in which numbers serve as “tags” or “labels” to classify or identify objects. A nominal scale usually deals with non-numeric variables or with numbers that carry no quantitative value.
A nominal scale variable is classified into two or more categories, and in this measurement mechanism the answer must fall into one of the classes. It is qualitative: the numbers are used here only to identify the objects, and they do not describe the objects’ characteristics. The only permissible operation on numbers in the nominal scale is “counting.”
Example:
M- Male
F- Female
Here, the variables are used as tags, and the answer to this question should be either M or F.
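Since counting is the only permissible operation on nominal data, a frequency tally is the natural summary. A minimal sketch, using a made-up sample of the M/F tags above:

```python
from collections import Counter

# Hypothetical nominal responses recorded with the M/F tags.
responses = ["M", "F", "F", "M", "F"]

# Counting category frequencies is the only meaningful numeric
# operation here; ordering or averaging the tags would not be valid.
tally = Counter(responses)
print(tally)                       # Counter({'F': 3, 'M': 2})
print(tally.most_common(1)[0][0])  # the mode: 'F'
```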
Ordinal Scale
The ordinal scale is the 2nd level of measurement that reports the ordering and ranking of data without
establishing the degree of variation between them. Ordinal represents the “order.” Ordinal data is
known as qualitative data or categorical data. It can be grouped, named and also ranked.
The ordinal scale shows the relative ranking of the variables:
• It identifies and describes the magnitude of a variable.
• Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables.
• The interval properties between ranks are not known.
• Surveyors can quickly analyse the degree of agreement concerning the identified order of variables.
Example: a frequency scale such as
• Very often
• Often
• Not often
• Not at all
or an agreement scale such as
• Totally agree
• Agree
• Neutral
• Disagree
• Totally disagree
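Because ordinal data carries order but no known spacing between ranks, rank-based summaries such as the median are appropriate while the mean is not. A minimal sketch, assuming the five-point agreement scale above and a made-up set of answers:

```python
import statistics

# Map the ordered categories to ranks; only the order is meaningful,
# the spacing between consecutive ranks is not claimed to be equal.
RANKS = {"Totally disagree": 1, "Disagree": 2, "Neutral": 3,
         "Agree": 4, "Totally agree": 5}

answers = ["Agree", "Neutral", "Totally agree", "Agree", "Disagree"]
ranks = sorted(RANKS[a] for a in answers)  # [2, 3, 4, 4, 5]

# The median respects order alone, so it is a valid ordinal summary.
print(statistics.median(ranks))  # 4, i.e. "Agree"
```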
Interval Scale
The interval scale is the 3rd level of measurement scale. It is defined as a quantitative measurement scale in which the difference between two values is meaningful and measured in equal units, while the zero point is arbitrary rather than absolute.
Example:
Likert Scale
Net Promoter Score (NPS)
Bipolar Matrix Table
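Temperature in degrees Celsius is a standard illustration of interval data (an assumed example, not one of the survey scales listed above); a minimal sketch shows why differences are meaningful but ratios are not:

```python
# Celsius is an interval scale: equal differences represent equal
# amounts of change, but 0 °C is an arbitrary point, not "no heat".
morning_c, noon_c = 10.0, 20.0

print(noon_c - morning_c)  # 10.0 -> a meaningful difference
# noon_c / morning_c equals 2.0, yet "twice as hot" is NOT a valid
# claim, precisely because the zero point is arbitrary.
```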
Ratio Scale
The ratio scale is the 4th level of measurement scale and is quantitative. It allows researchers to compare both differences and intervals, and it has a unique feature: it possesses a true origin, or absolute zero point.
Because of this zero-point feature, the ratio scale has no negative numbers, and it affords unique opportunities for statistical analysis. The variables can be ordered, added, subtracted, multiplied and divided, and the mean, median and mode can all be calculated on a ratio scale.
The ratio scale has other unique and useful properties; for example, it allows unit conversions, such as between kilograms and grams or between calories and kilocalories.
Example:
• Weight in kilograms
• Height in centimetres
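Since every arithmetic operation is permissible on ratio data, the full range of summary statistics applies. A minimal sketch with a made-up sample of weights:

```python
import statistics

weights_kg = [55.0, 62.5, 70.0, 62.5, 80.0]  # hypothetical sample

# On a ratio scale the mean, median and mode are all meaningful...
print(statistics.mean(weights_kg))    # 66.0
print(statistics.median(weights_kg))  # 62.5
print(statistics.mode(weights_kg))    # 62.5

# ...and so are ratios: 80 kg really is twice 40 kg, because
# 0 kg is a true zero, the complete absence of weight.
print(weights_kg[-1] / 40.0)  # 2.0
```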
ANSWER:
Conducting Parent-Teacher Conferences
The first conference is usually arranged at the beginning of the school year to allow parents and teachers to get acquainted and to prepare a plan for the coming months. Teachers usually receive some training to plan and conduct such conferences. The following steps may be observed for holding effective parent-teacher conferences:
1. Announce the final date and time as per the convenience of the parents and children.
2. Develop a conference packet including the student’s goals, samples of work, and reports or notes from other staff.
3. Conduct the conference with the student, parent and advisor, with the advisee taking the lead to the greatest possible extent.
4. When discussing the student’s progress:
• Be friendly
• Be honest
• Be positive in approach
• Keep a written record of the conference, listing problems and suggestions, with a copy for the parents
• Don’t argue
ANSWER:
Teachers use criterion-referenced tests to determine which specific concepts, such as parts of speech or adding fractions, a child has learned in class. Some tests are commercially produced and sold as part of a curriculum; the Brigance system is an example. Other teachers develop specific tests to complement their unique lesson plans.
Because criterion-referenced tests measure specific skills and concepts, they tend to be lengthy.
Typically, they are designed with 100 total points possible. Students earn points for each item
completed correctly. The students’ scores are generally expressed as a percentage.
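As a minimal sketch of that scoring (the 100-point design is from the passage above; the 80% mastery cutoff is an assumed example, since a criterion-referenced score is judged against a fixed criterion rather than against other students):

```python
def criterion_score(points_earned: float, points_possible: float = 100,
                    mastery_cutoff: float = 80.0):
    """Express a raw score as a percentage and check it against a
    fixed cutoff, not against the performance of other students."""
    percent = points_earned / points_possible * 100
    return percent, percent >= mastery_cutoff

print(criterion_score(87))  # (87.0, True)  -> criterion met
print(criterion_score(64))  # (64.0, False) -> not yet mastered
```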
Criterion-referenced tests are the most common type of test teachers use in regular classroom work. So,
while parents and students may not hear the term “criterion-referenced test” often, they’re certainly
familiar with this popular form of assessment.
Other Benefits
In addition to providing scores to measure progress, these test results give specific information on skills
and sub-skills the student understands. They also provide information on the skills the student has not
yet mastered. Both types of information are useful in determining what type of specially designed
instruction the student needs and what the instruction should cover. Educators use these tests to evaluate the effectiveness of teaching programs, to determine students’ mastery of concepts and skills, and to measure progress toward a student’s Individualized Education Program (IEP) goals and objectives.
These tests, whether designed by teachers or commercially produced, may reveal if a student has a
learning disability that school officials haven’t diagnosed. On the other hand, the tests can reveal how
students are managing known learning disabilities.
Do they continue to struggle in specific areas or have they made progress? Perhaps their performance
has remained static. A criterion-referenced test can give teachers an idea of how a student is advancing
in class. Results from a series of such tests can be used to help students with learning disabilities set
goals both on and off their IEP.
While criterion-referenced tests may reveal how well students have mastered certain concepts, they
alone don’t tell the whole picture about what a student has learned in class. Student work, projects,
essays, and even participation in class discussions, can give parents and teachers a comprehensive look
at a student’s performance.
After all, many students, especially those with learning disabilities and special needs, don’t always
perform well on tests. If your child’s performance on criterion-referenced tests is underwhelming, speak
with their teacher about how they are doing in all aspects of the class. Determine your child’s academic
progress using multi-dimensional measures for a more well-rounded assessment.
DISADVANTAGES
Although these assessments are becoming more popular in the special education field, they do have some drawbacks. These include:
It does not allow for comparing the performance of students in a particular location with national norms. For example, a school would be unable to compare 5th-grade achievement levels across a district, and would therefore be unable to measure how one school is performing against other schools.
It is time-consuming and complex to develop. Teachers will be required to find time to write a curriculum and assessments on top of an already full workload, and it might require more staff to come in and help.
It costs a lot of money, time and effort. Creating a specific curriculum takes time and money to hire more staff, and most likely the staff will have to be experienced professionals.
It needs efficient leadership and collaboration, and a lack of leadership can cause problems. For instance, if a school creates assessments for special education students without well-trained professionals, it might not be able to create assessments that are learner-centered.
It may slow the process of curriculum change if tests are constantly changed. It is difficult for curriculum developers to know what is working and what is not, because tests tend to differ from one school to another; it would require years of collecting data to find out.
Despite its flaws, criterion-referenced assessment will remain important in special education, because comparing the scores of students with special needs to those of average students does not achieve much in measuring the student’s current level of performance.