Key contents: Week 1 & 2: Assessment, Concepts and Issues
A. Assessment and testing
1. Assessment and test
Should tests be at the highest level of seriousness?
Should they be degrading or threatening to students?
Should they build students' confidence?
Should they be a learning experience?
Should they be an integral part of students' ongoing classroom development?
Should they bring out the best in students?
● Definitions:
ASSESSMENT:
+ is appraising or estimating the level or magnitude of some attribute of a person.
+ is the systematic process of documenting and using empirical data on knowledge,
skills, attitudes, and beliefs.
Examples: question responses, assignments, projects, homework, diaries/journals,
comments/suggestions, quizzes, reflections, tests, ...
TESTS:
+ are a subset of assessment, a genre of assessment techniques.
+ are prepared administrative procedures occurring at identifiable times in a curriculum.
- A test is used to examine someone’s knowledge of something to determine what that
person knows or has learned. It measures the level of skill or knowledge that has been
reached.
Examples: driving test, grammar test, swimming test, ...
4 BASICS OF A TEST:
A test is a method of measuring a person's ability, knowledge, or performance in a given
domain.
#1: A test is a method.
- An instrument
- A set of techniques
- Procedures
- Performance from test-takers
- Preparation from organizers
To be qualified as a method, a test needs:
- An explicit test structure
- Prescribed answer keys
- Scoring rubrics (Writing)
- Question prompts (Speaking)
Examples:
Norm-Referenced Tests
- measure broad skill areas, then rank students with respect to how others
(the norm group) performed on the same test
- each test-taker's score is interpreted in relation to a mean (average score), median (middle
score), standard deviation (extent of variance in scores), and/or percentile rank
Scores reported:
- Numerical score (90, 780, ...)
- Percentile rank (80%)
Examples: TOEFL iBT, IELTS, FCE, CAT (International University HCM), High School
National Graduation Examination
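The norm-referenced interpretation above (mean, median, standard deviation, percentile rank) can be sketched in a few lines of code. This is only an illustration: the norm-group scores are invented, and `percentile_rank` is a hypothetical helper, not part of any real testing system.

```python
from statistics import mean, median, stdev

def percentile_rank(score, norm_group):
    """Percentage of the norm group scoring at or below this score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

# Hypothetical norm-group scores on the same test
norm_group = [45, 52, 58, 60, 63, 67, 70, 74, 78, 85]

score = 74
# z-score: how far this score sits from the mean, in standard deviations
z = (score - mean(norm_group)) / stdev(norm_group)

print(f"mean={mean(norm_group):.1f}, median={median(norm_group)}")
print(f"z-score={z:.2f}, percentile rank={percentile_rank(score, norm_group):.0f}%")
```

A score of 74 here ranks at the 80th percentile: the interpretation depends entirely on how the norm group performed, which is exactly what distinguishes norm-referenced from criterion-referenced reporting.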
Criterion-Referenced Tests
- determine whether students have achieved certain defined skills. An individual is compared
with a preset standard for expected achievement.
- designed to give test-takers feedback, usually in the form of grades, on specific course or
lesson objectives
- Test scores tell more or less how well someone performs on a task.
Scores reported:
- Numerical score (90)
Examples: quizzes, in-class assessments, midterm/final exams
B. TYPES AND PURPOSES OF ASSESSMENTS
5 TYPES OF TESTS
1. Achievement Tests
- The primary role is to determine whether course objectives have been met
- Measure learners' ability within a classroom lesson, a unit, or even an entire
curriculum
- Often a form of summative assessment, administered at the end of a lesson, unit, or
semester
- Short-term / long-term / cumulative
Specifications for an achievement test are determined by:
● objectives of the lesson, unit, or course being assessed
● relative importance (or weight) assigned to each objective
● tasks used in classroom lessons during the unit of time
● time frame for the test itself and for returning evaluations to students
● potential for formative feedback
2. Diagnostic Tests
- Identify aspects of a language that a student needs to develop or that a course should include
- Help a teacher know what needs to be reviewed or reinforced in class, enable the student to
identify areas of weakness
- offers more detailed, subcategorized information about the learner
3. Placement Tests
- Place a student into a particular level or section of a language curriculum or school
- Include points to be covered in the various courses in a curriculum
- Come in many varieties (formats, question types)
- often use existing standardized proficiency tests for their obvious advantages in practicality,
cost, speed of scoring, and efficient reporting of results
UWE
Listening: 20 MCQs (2 parts)
Grammar: 20 MCQs (2 parts)
Vocabulary: 20 MCQs (2 parts)
Reading: 20 MCQs (2 parts)
Writing: essay (200 words)
IU (Full IELTS – proficiency test)
Writing: visuals + essay (2 parts)
Speaking: various Qs (3 parts)
Listening: 40 Qs (4 sections)
Reading: 40 Qs (3 passages)
4. Proficiency Tests
- Test overall ability, not limited to any one course, curriculum, or single
skill in the language
- Traditionally consisted of standardized MC items on grammar, vocabulary, reading
and aural comprehension
- Almost always summative and norm-referenced
- Play a gatekeeping role in accepting/ denying someone academically
- A key issue is how the constructs of language ability are specified
5. Aptitude Tests
- Measure capacity/ general ability to learn a foreign language (before
taking a course)
- Predict success in academic courses
- Show significant correlations with the ultimate performance of students in language courses
(Carroll, 1981), but measure similar processes of mimicry, memorization, and
puzzle-solving → now less popular
Other Tests:
internal test, external test, objective test, subjective test, combination test, ...
C. ISSUES IN LANGUAGE ASSESSMENT
1. Behavioral Influences on Language Testing
- Strongly influenced by behavioral psychology and structural linguistics
- Assumption that language can be broken down into its component parts and that those
parts can be tested successfully
- → discrete-point tests
Main skills: listening, speaking, reading, and writing
Units of language: phonology, morphology, lexicon, syntax, and discourse
ECPE (Examination for the Certificate of Proficiency in English): Writing, Listening, GCVR,
Speaking
ECCE (Examination for the Certificate of Competency in English): Listening, GVR, Writing,
Speaking
2. Integrative Approaches
- Language pedagogy was rapidly moving in more communicative directions
→ the discrete-point approach came to be seen as inauthentic
- Language competence: a unified set of interacting abilities that could not be tested
separately (John Oller, 1979) → integrative testing
Cloze test:
- a reading passage
- a blank after every seventh word
- integration of vocabulary, structure, grammar, reading comprehension, prediction,
discourse, ...
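The cloze procedure described above (delete every seventh word of a passage) is mechanical enough to sketch in code. A minimal illustration with an invented passage and a hypothetical `make_cloze` helper; real cloze tests usually leave the opening sentence intact and may choose the deletion interval differently.

```python
def make_cloze(text, n=7):
    """Blank out every n-th word of a passage, returning the gapped
    text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):  # 7th, 14th, 21st word, ...
        answers.append(words[i])
        words[i] = "____"
    return " ".join(words), answers

# Invented passage for illustration only
passage = ("Language testing has a long history and many of its "
           "techniques are still debated by teachers and researchers today")
gapped, key = make_cloze(passage)
print(gapped)
print("Answer key:", key)
```

Because the deletions are fixed-interval rather than targeted, filling the gaps forces the test-taker to integrate vocabulary, grammar, and discourse-level prediction at once, which is the point of the integrative approach.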
Dictation:
- Listen and write
- integration of listening, writing, grammar, structure, vocabulary, efficient short-term
memory, discourse, ..
3. Communicative Language Testing
- By the mid-1980s, a switch to work on communicative competence → communicative test
tasks
- Integrative tests such as cloze reveal only a candidate's linguistic competence, NOT directly
a student's performance ability → a quest for authenticity in communicative performance
- Bachman (1990) proposed a model of language competence:
A. Organizational Competence
1. Grammatical (including lexicon, morphology, and phonology)
2. Textual (discourse)
B. Pragmatic Competence
1. Illocutionary (functions of language)
2. Sociolinguistic (including culture, context, pragmatics, and purpose)
4. Traditional and "Alternative" Assessment
5. Performance-Based Assessment
- General educational reform movement: standardized tests DO NOT elicit actual
performance on the part of test-takers. → Performance-Based Assessment
Involves: oral production, written production, open-ended responses, integrated
performance, group performance, and interactive tasks
Drawbacks: time-consuming, expensive
Pay-off: more direct testing, actual or simulated real-world tasks → higher content validity
Appears as: task-based assessment, classroom-based assessment
D. CURRENT HOT TOPICS IN LANGUAGE ASSESSMENT
Assessing for Learning
Dynamic Assessment
Zone of Proximal Development (ZPD)
• Learner's potential abilities > the actual performance in a task
• What learner can do with assistance/ feedback
• Assessment is NOT complete without observation and assistance (Poehner and Lantolf, 2003) →
important to explore/discover the ZPD
What to do:
- provide clear tasks and activities
- pose questions for students to demonstrate understanding and knowledge
- intervene with feedback and student reflections on their learning
Assessing Pragmatics
Phonetics Phonology Morphology Syntax Semantics Pragmatics
● Focuses of pragmatics research: speech acts (e.g., requests, apologies, refusals,
compliments, advice, complaints, agreements, and disagreements)
● Research instruments: discourse completion tasks, role plays, and socio-pragmatic
judgment tasks, referenced against a native-speaker norm
● This underrepresents L2 pragmatic competence (Roever, 2011) → need to include
assessment of learners' participation in extended discourse
● Other aspects to be assessed: recognizing and producing formulaic expressions (e.g.,
Do you have the time? Have a good day.)
Use of Technology in Testing:
- Advantages
- Concerns
2. Validity
- the consideration that it ‘really measures what it purports to measure’ (McCall, 1922,
p.196)
- the consideration that tests really assess what they are intended to assess (Davies,
1990; Brown, 2005)
- 6 attributes of VALIDITY
+ measure what it purports to measure
+ not measure irrelevant or ‘contaminating’ variables
+ rely as much as possible on empirical evidence (performance)
+ involve performance that samples the test’s criterion (objective)
+ offer useful, meaningful information about a test-taker's ability
+ supported by a theoretical rationale or argument
Evidence of VALIDITY: CONTENT / CRITERION / CONSTRUCT / FACE

Content-related evidence:
- Clearly define the achievement you are measuring
Ex: + speaking test (write down the responses)
+ 10 course objectives (test 2 out of 10)
+ 1 objective tested in midterm & final (some deleted)
- Direct testing: test-taker actually performing the target task.
Ex: phonetics & phonology course: students pronounce words / a sound-analysis system to
check the spectrum.
- Indirect testing: test-taker NOT actually performing the target task, but doing related
task(s).
Ex: students write down transcriptions, differentiate minimal pairs.

Criterion-related evidence:
- Based on classroom objectives; minimal passing grade
Example: + research proposal:
- meet requirements of the components of a research paper
- clear descriptions of each part
- use of hedges in discussion and conclusion
- CONCURRENT VALIDITY → results supported by other concurrent performances
(Brown & Abeywickrama, 2018)
Ex: scores validated through comparison with various teachers' grades (Green, 2013).
Ex: + IU CAT (English) – correlation test with high school year-end score / high school
graduation exam
+ re-grading of final exams (asserted by another lecturer)
+ questionnaire: adjusted from FW (then checked by advisor)
- PREDICTIVE VALIDITY → determine students' readiness to 'move on' to another unit
Ex: placement test, Scholastic Aptitude Test (SAT)
- What if the test designer pays attention only to concurrent validity but ignores content
validity?

Construct-related evidence:
- Any theory, hypothesis, or model that attempts to explain observed phenomena in our
universe of perceptions (Brown & Abeywickrama, 2018)
- The theory underpinning the assessment must be relevant and adequate to support the
intended decision (Green, 2013).
Ex: linguistic construct of a Presentation Skills course (persuasion): content, pronunciation,
vocabulary, body language, visual aids, Q&A, organization (logical ideas)

Face validity:
- The degree to which a test looks right, appears to measure the knowledge or abilities it
claims to measure
- The extent to which students view the assessment as fair, relevant, and useful for
improving learning
- A fallacy to some experts; reflects the quality of a test
- A major issue in validating large-scale standardized tests of proficiency → practicality:
omit the oral test; very low correlation: W & S / R & L
+ micro level: test-takers, families (preparation, accuracy,
psychology/motivation/attitude/habits/...)
+ macro level: society, educational systems (conditions for coaching, teaching
methodologies, ...)
3. Reliability
Definitions:
5 principles of reliability
1. Has consistent conditions across two or more administrations
2. Gives clear direction for scoring/evaluation
3. Has uniform rubrics for scoring/evaluation
4. Lends itself to consistent application of rubrics by the scorer
5. Contains items/tasks that are unambiguous to the test-taker
Factors causing unreliability / possible sources of fluctuation:
- The student:
+ Health problems (fatigue, bad stomach, …)
+ Psychological issues (anxiety, break-up, ...)
+ Test wiseness (strategies,..)
What can teachers do to help students reveal their true competence? -> test
bank, mock-test, share tips, awards
- The scoring/ the rater reliability (interobserver reliability)
The degree of agreement between different people observing or assessing the same
thing.
Interrater reliability: when two or more scorers yield consistent scores of the same
test
Why do fluctuations exist?
+ Lack of adherence to scoring criteria
+ Inexperience
+ Inattention
+ Preconceived biases
What can be done to avoid the big gap between two raters?
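Before looking for causes of disagreement, inter-rater reliability can be quantified. A minimal sketch using simple percent agreement and the mean absolute gap between two raters; the band scores and the `rater_agreement` helper are invented for illustration (real studies often use Cohen's kappa or correlation instead).

```python
def rater_agreement(rater_a, rater_b, tolerance=0.5):
    """Share of test-takers whose two ratings differ by at most
    `tolerance`, plus the mean absolute gap between raters."""
    assert len(rater_a) == len(rater_b)
    gaps = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    agreement = sum(g <= tolerance for g in gaps) / len(gaps)
    return agreement, sum(gaps) / len(gaps)

# Hypothetical speaking-test band scores from two raters
rater_a = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0]
rater_b = [6.5, 7.5, 6.0, 8.0, 6.0, 6.0]

agreement, mean_gap = rater_agreement(rater_a, rater_b)
print(f"agreement within 0.5 band: {agreement:.0%}, mean gap: {mean_gap:.2f}")
```

Here two of six candidates differ by a full band, so only 67% of ratings agree within half a band: a concrete signal that rater training or clearer rubrics are needed.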
Intra-rater reliability (internal factor): when a single rater repeats administrations of
a test. (1 rater rates the test twice)
Why do fluctuations exist?
+ unclear scoring criteria
+ fatigue
+ bias towards ‘good’ and ‘bad’
+ carelessness
What can be done to maintain intra-rater reliability? ->
- The test administration
The conditions in which the test is administered; these may cause unfairness.
Why do fluctuations exist?
+ Background noise
+ Photocopying variations
+ Lighting systems
+ Location of air conditioners/fans
+ Arrangement/ condition of chairs/desks
+ Session of exam (morning/afternoon/…)
+ Locations of speakers (listening exam)
Scenario: think about a midterm/final test in IU
- The test itself
The nature of the test itself can cause measurement errors.
Why do fluctuations exist?
+ test format
+ test design items
- Characteristics (to categorize learners)
- Poorly designed items (ambiguous, two similar answers)
- Subjective test (open-ended questions)
- Objective test (MCQs)
4. Authenticity
Definitions:
Signals of authenticity:
An authentic test:
1. Contains language that is as natural as possible
2. Has items that are contextualized rather than isolated
3. Includes meaningful, relevant, interesting topics
4. Provides some thematic organization to items, such as through
a story line or episode.
5. Offers tasks that replicate real-world tasks.
Example of authentic assessment: portfolios, role-play, memos, presentations, case studies,
proposal, reports, projects, …
5. Washback
Definitions: the effect of testing on teaching and learning.
….
CONTRIBUTORS TO POSITIVE WASHBACK:
A test that provides beneficial washback …
1. Positively influences what and how teachers teach
2. Positively influences what and how learners learn
3. Offers learners a chance to adequately prepare
4. Gives learners feedback that enhances their language
development
5. Is more formative in nature than summative
6. Provides conditions for peak performance by the learner
What can be done to maintain positive washback?
Suggested research topics:
Washback from the on-going assessment in the Writing 1 course
Students’ perceptions towards face validity of exams in the presentation skill course
WEEK 6: