This is a near-final draft of the article, which appears as:
Macqueen, S., Pill, J., & Knoch, U. (2016). Language test as boundary object: Perspectives from test users in the healthcare domain. Language Testing, 33(2), 271-288.
Language test as boundary object: Perspectives from test users in the healthcare
domain
Susy Macqueen University of Melbourne
John Pill American University of Beirut
Ute Knoch University of Melbourne
Abstract
Objects that sit between intersecting social worlds, such as Language for Specific Purposes
(LSP) tests, are boundary objects – dynamic, historically derived mechanisms which maintain
coherence between worlds (Star & Griesemer, 1989). They emerge initially from
sociopolitical mandates, such as the need to ensure a safe and efficient workforce or to
control immigration, and they develop into standards (i.e. stabilized classifying mechanisms).
In this article, we explore the concept of LSP test as boundary object through a qualitative
case study of the Occupational English Test (OET), a test which assesses the English
proficiency of healthcare professionals who wish to practise in English‐speaking healthcare
contexts. Stakeholders with different types of vested interest in the test were interviewed
(practising doctors and nurses who have taken the test, management staff, professional
board representatives) to capture multiple perspectives of both the test‐taking experience
and the relevance of the test to the workplace. The themes arising from the accumulated
stakeholder perceptions depict a ‘boundary object’ which encompasses a work‐readiness
level of language proficiency on the one hand and aspects of communication skills for
patient‐centred care on the other. We argue that the boundary object metaphor is useful in
that it represents a negotiation over the adequacy and effects of a test standard for all
vested social worlds. Moreover, the test should benefit the worlds it interconnects, not just in terms of the learning opportunities it offers candidates but also in terms of the impact such learning carries into key social sites, such as healthcare workplaces.
Keywords: Language for Specific Purposes (LSP) tests, boundary object, standards, test
impact, authenticity, validity
High‐stakes language tests are frequently positioned metaphorically as gates into
socioeconomically desirable domains. Powerful stakeholders are construed as gatekeepers
who control the flow by setting a standard for entry in the form of a test score. In this
article, we expand this popular metaphor to a more holistic notion of tests as boundary
objects which are brought about by socioeconomic needs. We first explore the theoretical
insights offered by the concept of boundary objects and we then apply it to a case study of
the relevance and impact of a high‐stakes specific purpose language test, the Occupational
English Test (OET).
I Boundary objects
Boundary objects are dynamic, historically derived mechanisms which develop in order to
maintain coherence between intersecting social worlds (Star & Griesemer, 1989). They are
constructed from a pooling of information at the intersections of the participating social
worlds (Star & Griesemer, 1989). Boundary objects often become standards. Broadly
speaking, standards are stabilised classifying mechanisms which hold aspects of social structure together and are pervasive in modern life: the 'raw materials out of which social order is constructed' (Busch, 2011, p. 42). Standards come to be recognised as the way things are
(done) in societies, such as what to do at a red traffic light, the length of a phone number
and how to gain citizenship of a country. It is a feature of the standard‐saturated modern
world that the genesis and reasoning behind classificatory systems get forgotten as
standards are naturalised in social use (Bowker & Star, 1999; Busch, 2011).
Tests of Language for Specific Purposes (LSP) can be considered boundary objects,
developing as they do from a social mandate, such as a need to ensure a national healthcare
workforce can communicate to a level that is deemed adequate for patient safety and
workplace efficiency. This need brings about a classificatory infrastructure: LSP tests
categorise and differentiate people in terms of how ready they are to participate
linguistically in specific discourse communities. They do this by using standardized language
sampling methods which, ideally, elicit language forms and structures that are relevant to
the target social sphere and are judged using methods that are also congruent with ways in
which communication is evaluated in the target domain (Douglas, 2000; Jacoby &
McNamara, 1999). How meaningful or relevant these samples and the attendant
classificatory infrastructure (e.g. criteria, scoring, cut scores) are to the people within the
boundaries of the target sphere as well as to those hoping to enter it is a key question for
research on the impact of tests.
Test impact and authenticity
Current conceptualisations of validity have led to increased attention to the social context in
which tests are implemented, to the values underpinning test constructs and to how test
scores are interpreted, used and perceived by different stakeholders (McNamara & Roever,
2006). Such thinking paves the way for including test consequences as a necessary part of
validation programs (Cheng, 2008). Fulcher and Davidson (2007) argue for an effect‐driven
approach to test development whereby impact on stakeholders not only informs design
decisions but is the basis for ongoing monitoring of test validity. They define effect‐driven
testing as ‘a blending of procedure with fundamental virtue, and the fundamental virtue is
simple: think about the intended beneficial impact as the test is built, and be willing to knock
it down when things change’ (p. 177). Standards theorist, Lawrence Busch, similarly advises
that the balance of benefits and costs be considered at the design stage of a standard.
Ultimately standards must not affect social conditions in ways which restrict the democratic
rights of those judged by them (Busch, 2011; Shohamy, 2004). In the Standards for
Educational and Psychological Testing (AERA, APA, NCME, 2014), sections are devoted to avoiding negative consequences of testing and to determining the likelihood of intended benefits, both of which can be informed by stakeholder perspectives (including test‐taker experiences) as well as by the test's congruence with the target domain. To this end, impact studies
‘enable us to re‐evaluate and make explicit not just the standards we promote but the very
view of language we take for granted’ (Davies, 2008, p. 440). Thus, the role of test impact
research is to challenge the taken‐for‐grantedness of the standard by interrogating its
rationale and examining its effects.
The meanings of LSP tests, like those of other standards, are found between groups,
and it is by studying the adequacy of their between‐ness that we can evaluate whether an
appropriate informational balance has been struck, albeit one that is always incomplete and inadequate to some degree (Star & Lampland, 2009). Validation, as Moss (1998) puts it,
‘entails ongoing evaluation of the dialectical relationship between the products and
practices of testing, writ large, and the social reality that is recursively represented and
transformed’ (p. 11). Adequate authenticity is at the centre of such ongoing evaluations,
because it is most likely to promote beneficial test preparation practices relevant to the
future domain of practice (Messick, 1996) and, consequently, most likely to be beneficial to
the user worlds. That said, authenticity alone is not sufficient in test design. Needs analysis
processes prioritise domain tasks and skills which should be represented on a test (Douglas,
2010). Thus, effect‐driven LSP test design involves considering the relative importance of
test tasks as well as their degrees of authenticity, with test effects and test content requiring
periodic checks to ensure the test and its social worlds are operating coherently.
Such investigations involve determining the relationship between the test construct
and the criterion domain, both of which are generated in the test development process
(McNamara, 2006). As McNamara has phrased it, ‘the criterion domain is itself an
abstraction, a construct in need of validation itself’ (p. 36). Many performance tests, for
instance, are underpinned by models of communicative competence (see Purpura, 2008).
Target domain communication may be shaped by frameworks for communication developed
from professional philosophies which are upheld through education and professional policy,
such as patient‐centred care, espoused in many English‐speaking healthcare workplaces (e.g.
Makoul, 2001). Messick (1996) articulated the ideal of authentic assessments as those which
‘pose engaging and worthwhile tasks (usually involving multiple processes) in realistic
settings or close simulations so that the tasks and processes, as well as available time and
resources, parallel those in the real world’ (p. 243). Even with careful analysis and sampling
of the practices of the real world, the development process of a standardized test can bleach
tasks and processes of authentic characteristics to the extent that the resulting test tasks may be inadequate simulations (Lewkowicz, 2000). For the purpose of investigating test effects, it is helpful to consider the 'target language domain' as an abstraction that informs test
design in contrast to actual real‐world practices, which are too contextually complex and
locally detailed to be applied in a standardized test. Impact studies can raise awareness of
the degree of commensurability between theoretical models which contribute to the test
instrument and those which underlie domain practices as well as the congruence between
the generalised target language domain and actual communication practices in relevant
local contexts.
Stakeholder voices
Stakeholder perceptions can be insightful in terms of both the degree and the importance of
authenticity (Lewkowicz, 2000). LSP test stakeholders, whether test takers or institutional
score users, are cast as ‘domain experts’ by virtue of their workplace experience and/or
training. Accordingly they are seen as uniquely placed to articulate what matters for
effective communication in the relevant language use situation (Douglas, 2000; Douglas &
Myers, 2000; Jacoby & McNamara, 1999; see also other papers in this issue), to give feedback on the appropriateness of a test's content and format (Brown, 1993; Lumley & Brown, 1996), to decide what level of test performance is sufficient for effective functioning
in this domain (Lumley, Lynch, & McNamara, 1994; O'Neill, Buckendahl, Plake, & Taylor,
2007 and other papers in this issue) and, in general, to determine whether a test is achieving
its intended effects.
Three relevant studies of test‐taker perceptions involve comparisons between the
OET, an LSP test, and the IELTS, a general/academic English proficiency test. These two tests
have been used interchangeably as healthcare registration standards in Australia and New
Zealand for some years. In an exploratory study by Elder (2007), 53 non‐native‐English‐
speaking health professionals took both the OET and IELTS in close succession. A comparison
of band score/grade frequencies on each test revealed that while overall ‘pass’ rates on each
test were comparable (i.e., similar numbers of candidates received an overall Band 7 or
above on IELTS and a B grade or above on the OET sub‐tests), the candidates viewed these
two tests rather differently, as indicated in their post‐test survey responses. The OET
reading, writing and speaking components were perceived to be somewhat easier than their
IELTS counterparts, whereas the reverse was the case for OET listening, which posed
challenges on account of the Australian‐accented input and the need to take notes while
listening (see below for a description of the listening sub‐test tasks). The perceptions of
relative difficulty were not, however, reflected in obvious score differences on the two tests, insofar as these scores were comparable. While many candidates accepted the IELTS
as a valid measure of their general English proficiency, there was almost unanimous
agreement that the OET was a better measure of their professional competence and hence
more relevant to their needs. This suggests that health specificity in the boundary
mechanism allows for greater coherence between worlds. However, a study by Read and
Wette (2009) showed that practical issues are also weighed up in the evaluation of test
options. In their study, which elicited comparative feedback on the relevance and utility of
the IELTS and OET from 23 overseas‐trained health professionals enrolled for a six‐month
English course for health professionals in New Zealand, there was a slight preference for the
OET on the grounds of its health‐related content and the fact that candidates could, at that
time, take single components of the test where they had failed to reach the required
standard, rather than re‐sitting all four sub‐tests as required for the IELTS. The IELTS, on the
other hand, was seen as more affordable and easier to prepare for given greater availability
of practice materials. However, the authors point out that the candidates' perceptions varied
somewhat over time such that ‘by the third interview many health professionals seemed to
have realised that both IELTS and the OET are tests of English language proficiency, not of
clinical knowledge and therefore the medical content of the OET became less important as a
factor determining their choice of a test’ (Read & Wette, 2009, p. 39).
In a study by Oh (2010), the views of 18 international medical graduates (IMGs)
were canvassed after they had taken and passed the OET and had experience in the
workforce, as in the current study. Interestingly, all but the most able
participants (who had passed the OET after only a one‐week period of preparation) held the
view that preparing for the exam helped them develop effective communication skills for
their profession. Opinions of test tasks and their relevance to the medical domain were
generally favourable, although the listening test was criticised for its failure to represent
non‐native accents encountered in the workplace. Some also held the view that the note‐
taking demands on the listener were too heavy and distracted from the processing of new
information. Among the limitations of the OET noted by respondents was its failure to
sample communication with colleagues and its lack of general English content relevant to
daily conversation. Taken together, these studies show that test‐takers are affected by
multiple aspects of a standard (specificity, timing, cost, availability of practice materials) and
that effects may vary depending on the degree to which candidates engage with test preparation. In the present study, we sought to build on these studies of test‐taker
perceptions by eliciting multiple stakeholder perspectives on the test as a boundary
mechanism.
II Occupational English Test (OET)
The Occupational English Test (OET) is ‘an international English language test that assesses
the language and communication skills of healthcare professionals who seek to register and
practise in an English‐speaking environment’.1 In this paper, we draw from a larger study
(Macqueen, Pill, Elder, & Knoch, 2013) to exemplify the effects and meanings of the OET as a
boundary object. We focus on two of the four OET sub‐tests, the Speaking and the Listening
sub‐tests, because these offer complementary perspectives for a productive and a receptive
skill and because these skills are common to many communicative language performance
tests. A brief description of the sub‐tests is below (see Elder, this issue, for further detail).
Speaking sub‑test
The speaking sub‐test is a role‐play format in which the candidate takes the role of clinician
and the interviewer takes the role of patient/carer/client for two different role‐play
scenarios, each lasting five minutes.2 The test is profession‐specific, which means that
nurses, for instance, will be given situations that are considered typical of nursing duties.
Candidate performances are assessed against five criteria: Overall Communicative
Effectiveness, Intelligibility, Fluency, Appropriateness of Language, and Resources of
Grammar and Expression. (Although new criteria are under review for the test, this study is
based on the criteria currently in operation.)
Listening sub‑test
The listening sub‐test is common to all 12 professions and is 50 minutes long. There are two
parts. The first (Part A) is a recording of a consultation between a professional and a
patient/client during which candidates take notes under supplied headings. An example is a
doctor consulting with a patient about his back injury; candidates supply notes under
headings such as ‘history of recent incident’, ‘details of pain’ ‘treatment’, ‘further details of
problem’, ‘doctor’s initial comments’ and ‘results of the examination’. The second (Part B) is
an audio recording of a talk on a general medical topic during which candidates complete
short answer and fixed‐response questions. An example for this part is a nurse talking about
the incidence of congenital heart disease in newborns in Australia.3
III Method
We explore, through the qualitative analysis of interview data, the nature of the test as it is
perceived by participants with diverse relationships to the test: those involved in setting the
standard (professional bodies), those who occupy the target domain (supervisors in the
workplace) and ‘boundary crossers’ as classified by the test (successful test‐takers who have
achieved the standard and since entered the target domain). The largest OET candidate
groups are doctors and nurses, and it is these professions which formed the focus of the
study. Our overall aim was to determine how well the test functions as a standard for work‐
readiness. Specifically, we posed the following questions.
1 http://www.occupationalenglishtest.org
2 A sample role‐play for each profession can be found at: http://www.occupationalenglishtest.org/Display.aspx?tabid=2425
3 A sample listening test can be found at http://www.occupationalenglishtest.org/display.aspx?tabid=2425
Research questions
1. How do stakeholder groups with diverse relationships to the test perceive the test
content in relation to language use in the workplace?
2. Are the language challenges of the workplace adequately represented in the test?
3. What are the effects of the test as boundary object?
The interviews were semi‐structured to enable an in‐depth, contextualised exploration of
relevant issues as they arose. They were approximately 30 to 45 minutes long. Interviewees
were asked to comment on the relevance of the OET content to the workplace and on the
language demands of the workplace for users of English as a second language. Participants
who had not taken the test (i.e. board members and most supervisors) were provided with a
description of each sub‐test and sample test papers so they could comment on specific
tasks. Past candidates were also asked to fill out a background questionnaire, which included
information about their OET scores, test preparation methods, linguistic and cultural
background, professional occupation and experience, and general opinion of their test
experience.
The questionnaire and semi‐structured interview data were subjected to a grounded analytical process to elucidate stakeholder perceptions of test effectiveness,
relevance and impact. The interview data were transcribed and analysed thematically by
two researchers using NVivo, computer software designed to support qualitative coding. The
analysis was an iterative process (King & Horrocks, 2010) in which descriptive themes were
identified and interpreted in relation to the research aims.
Participants
As mentioned above, three stakeholder groups were interviewed for this study. These are
described below.
1) Senior representatives of the two relevant professional bodies (i.e. national medical and
nursing & midwifery boards)
2) Supervisors of EAL (English as an Additional Language) health professionals
Five senior nurses and six senior doctors were interviewed, all of whom had several years’
experience in their profession. These participants were educators and/or held senior
institutional roles. Two nurses were co‐ordinators at aged‐care facilities (one urban and one
rural) and one of these had considerable experience in remote hospital contexts. One doctor
was also a past OET candidate and so was able to offer the perspectives of both past
candidate and supervisor.
3) Successful OET candidates who are currently employed in their health professions
The past candidates interviewed were employed in a variety of contexts and had a broad
range of experience. The sample comprised 10 nurses (7 working in hospitals and 4 in aged care, including one in both) and 8 doctors (2 working in medical centres and 6 in hospitals,
at least 3 of which were regional). The participants all had work experience in Australian
healthcare contexts, enabling them to comment on workplace communication practices; the
mean time the past candidates had worked in Australia was just under 12 months. The mean
time the past candidates had worked in their professions in any context was 4 years 8
months. All past candidates had achieved a grade B or higher on all four sub‐tests of the
OET, with most achieving grade B. Thirteen of the nineteen past candidates had also taken
IELTS and were therefore able to compare their experiences of the two tests. The
participants spoke a variety of first languages, representing the diversity of overseas‐trained
health professionals who practise in Australia: Arabic (3), Bangla (1), Burmese (1), Chinese
(4), Filipino (1), Gujarati (1), Hindi (1), Korean (1), Punjabi (2), Russian (1), Sinhala (1).
IV Findings
The findings are presented as seven themes which emerged in the analytic process. The
themes (listed below) have been selected from the broader study for what they reveal about
the test as a boundary object and a professional registration standard.
General
1. Communication skills vs language proficiency
2. Specificity of test materials
Listening
3. Two types of listening
4. Note‐taking in the listening test
Speaking
5. Role-play in the speaking test
6. Switching register
7. Inter‐/intra‐professional communication
Extracts are attributed using a code indicating whether the interviewee is a supervisor (S) or
past candidate (PC), the interviewee’s profession (Nurse or Doctor) and an identification
number.
1. Communication skills vs language proficiency
A persistent theme in the interviews involved participants distinguishing the healthcare
construct of ‘communication skills’ from the applied linguistics construct of ‘language
proficiency’ (see also Manias & McNamara). The relevance of this distinction was evident in
responses from all groups of participants. The medical board representative, for example,
referred to the limits of the language test's purpose:
we don’t think it’s testing clinical communication skills … there are many native
English speakers who don't have good communication skills in a clinical context … if
the testing is congruent with practice, that’s terrific, but we shouldn’t be relying on
that.
Similarly, a senior doctor who was also a past candidate observed that aspects of spoken
communication such as ‘whether you are caring, whether you are sympathetic, whether you
understand what the patient’s need is’ should not be tested in a language test because they
are ‘well above the level of basic communication skills’ (S‐Doctor 6). Another past candidate
described workplace communication as requiring 'something on top of the language
itself … it is a lot about interpersonal skills in hospital so you shouldn’t just do the task, you
are not just task orientated, you should also relate to, build a rapport with the patient’ (PC‐
Nurse 3). Senior doctors and nurses noted the importance of checking a patient’s
comprehension as a part of the health professional–patient interaction, while past candidates observed that body language was an important component of workplace
communication that appeared to be absent from the OET. Participants’ comments indicate
that a distinction between speaking and listening skills is not particularly relevant to them;
‘communication skills’ appears to combine the ‘macro’ skills that are considered separate for
test purposes.
Despite the fact that the test (in its current form) does not focus explicitly on
assessing clinical communication skills, some past candidates assumed this was part of the
test construct. As one informant observed, the specific‐purpose characteristics of the test
are not easily separated from ‘pure’ language assessment: ‘it is also a test on understanding,
listening skills … whether you are able to provide reassurance, you have a reassuring
technique, whether you have advising skills, it is a little bit more comprehensive
communication skills’ (PC‐Nurse 1). PC‐Nurse 9 explained that the first time she took OET
she focused only on language. When she failed, she reconsidered her stance and decided for
her subsequent attempt that focusing on communication (‘to listen to the patient, what she
really wants’) was what was required. Indeed, several informants indicated how such
attention to appropriate communication in context helped them succeed on the OET.
2. Specificity of test materials
The issue of specificity of test materials was raised by some past candidates. One nurse
commented that some role‐play scenarios in the speaking test would most likely not be
encountered by a junior nurse with little prior experience, and PC‐Nurse 6 stated that the
test content is ‘beyond what I do in terms of my communication [in current employment] …
it is ahead of what is basic’. On the other hand, another nurse pointed out that the scenarios
are limited to ‘medical ward and surgical ward’ and so do not match his particular
experience as a psychiatric nurse (PC‐Nurse 4). There was a general acknowledgement that
test takers will perform better if the topic they are given in the role‐play is something they
are familiar with; therefore, professional knowledge is seen to facilitate performance even if
that knowledge is not being assessed directly in the test.
Issues of specificity of topic also arose regarding both parts of the listening test. PC‐
Doctor 7 pointed out that the test topics are not all directly related to the medical field and
include other areas of health practice; she would prefer it if they were always closely linked
with medical practice. In contrast, a nurse participant (PC‐Nurse 3) noted that the topics are
rarely from a nursing context; she felt the topics are quite similar to some of the topics used
in IELTS (which is not a specific‐purpose test) and did not relate particularly to her
experience of workplace communication. Another doctor felt the Part A consultations were
too often based in general practice.
Despite these qualms, the questionnaire data showed that almost all past candidates had chosen to take the OET because it was related to their profession and, in the
interviews, informants generally characterised the OET as complementary to the
professional exams they must also take, e.g., the Australian Medical Council’s clinical
examination (PC‐Doctor 6). Two participants indicated that they perceived the OET as being
easier because it was more related to their professions. As one wrote in the questionnaire,
‘IELTS can be very tricky esp. when talking about rare kinds of animals, or fish, etc.’
Participants suggested that dealing with an unfamiliar medical topic on the OET was
preferable to dealing with an unfamiliar topic from any field (in IELTS). In the view of PC‐
Doctor 6, IELTS imposes a double burden of content and language knowledge:
For any English exam there are two major informations you need to know: the
English itself, and some general knowledge. If I sit for exam with OET – so I already
[have] the medical knowledge, the general knowledge. So the rest of the exam will
be English so I can handle it later on. But in IELTS I should handle both of them.
Past candidates’ comments also indicated how preparing to take the OET helped set their
expectations of professional life and the workplace: ‘it was a learning experience not just
about language but also a little bit about the culture … of nursing in English speaking
contexts’ (PC‐Nurse 10). PC‐Nurse 1 stated that the OET is ‘more functional … more
practical … we are more familiar’; the test is related to the working environment: ‘I chose
OET because it is more related to what I do and what I want to do here in Australia and that
is the purpose of it’ (PC‐Nurse 6).
3. Two types of listening
For the senior staff in the study, listening can be divided into two overlapping types: 1)
listening as a clinical strategy (communication skills) and 2) listening comprehension as part
of clinical process. This can be seen as a specific instance of the distinction drawn in the first
theme – communication skills vs. language proficiency. Both types of listening might pose
problems for EAL staff; one doctor reported that complaints about non‐English‐speaking staff usually arose because clients felt the staff members were not listening to them or did
not understand them (S‐Doctor 1).
S‐Doctor 4 explained the importance of listening as a clinical strategy: ‘often
listening is the biggest task that you have to do when you are talking to a patient – it is not
all about information transfer from you to them, particularly … if a patient is upset about
something or you have got to tell them something terrible.’ This kind of psychologically
strategic listening, however, may be seen as a professional skill, inappropriately tested as a
language skill; the same doctor concluded that this type of listening was less a language skill
than a clinical one: ‘It is more content in a way because that is the appropriate clinical skill at
this point, just to actively listen.’
All senior staff described aspects of listening as a part of clinical process for
comprehending and recording information, as well as for decision‐making. This second type
of listening is what the OET listening test taps into; senior staff saw clear links between Part A (note‐taking during a consultation) and the workplace, as exemplified by S‐Doctor 6, a
medical educator and past candidate:
So that is what you are doing every day. You need to pick up the information from
patients … whether you are actually capturing the correct answer or the correct
information. … So you need to see whether they actually pick up all the important
essential information from that patient.
One past candidate doctor stated that the consultation (Part A) can also provide a model for
how a health professional should speak and noted that ‘my ears become more sharper’
while preparing for the listening test. The listening test was seen to provide exposure to accents relevant for test takers in Australia. PC‐Nurse 7 noted, however, that the test does not cover the range of accents, slang or abbreviations encountered in the workplace to the extent she would like. Several participants highlighted that the variety of accents present in most workplaces is not represented in the listening test.
4. Note‑taking in the listening test
Writing skills are incorporated in the listening test in the form of note‐taking (Part A) and
short answer questions (Part B). While the note‐taking task was perceived as an authentic
combination of skills, its timing and structure drew some criticism in terms of authenticity.
Several senior professionals reported that notes were generally not taken during the
consultation process, where clinical strategies such as attention to visual cues and rapport‐
building took precedence. For example:
You should maintain rapport with the patient. Look at them when they are talking so
you can pick up on cues including if you are examining them you look at their face if
you are feeling their abdomen to see if they are in pain because they won’t tell you.
You have to take all that in and if you are busy writing notes you can’t do that. (S‐
Doctor 4)
Comparing the workplace with the OET listening test note‐taking task, which uses an audio recording, a past candidate noted that real‐life consultation listening was easier because information can be checked and repeated back (PC‐Doctor 3). Describing work in an
Emergency Department, PC‐Doctor 5 explained: ‘you ask a question and they answer … and
literally when you start writing the patient pauses and waits for you to finish your writing,
then when you raise your head they continue'.
Although note‐taking might not ideally occur while a patient is talking during a
consultation, several senior professionals reported that note‐taking while listening to a
conversation occurred in ward rounds with senior team members. Other challenges of the
real world are indicated by S‐Doctor 4, a medical educator in a major urban hospital, who describes the 'aural nightmare' in which such notes might be taken:
…on that ward round they [international medical graduates] won’t be given enough
time to do the tasks that they are supposed to be doing. So they will be balancing the
folder, writing out the notes about what is happening while simultaneously being
expected to write a radiology request or a pathology request, look at the medication
chart, look at the fluid balance chart, take in everything that is being said, write their
own notes for what jobs they might have to do later. Already you see this is an
impossible task, meanwhile the interaction between the patient and the registrar will
be taking place in a crowded and busy ward room with other patients having loud
conversations with their relatives or just yelling, machines will be beeping and maybe
another ward round is taking place in the same room. It is an aural nightmare really,
if you can’t filter then you are going to have a lot of trouble.
Further, senior doctors observed that real‐world note‐taking is not indiscriminate:
‘you will be writing notes about what was talked about and what was decided’ (S‐Doctor 5).
Thus, there is a degree of selection and transformation of the information from
conversation/consultation to key issues and plan. In the listening test, however, candidates
write notes for headings which are provided on the test paper, and the text (dialogue) is
presented heading by heading. S‐Doctor 6 indicated that the test text should require
candidates to present only the relevant information in a consultation. S‐Nurse 4 observed
that one way to demonstrate that effective listening had taken place would be for
candidates to provide appropriate headings for their notes:
It seems like [test designers] have almost given people the answers by putting the
headings [in the Part A test booklet] (…) whereas I would imagine that if I am a
Registered Nurse with some background in practice that I should be able to come up
with my own headings from the dialogue or from watching for interactions.
Note‐taking to record what has been done appears to be important in much nursing work, and PC‐Nurse 6 recognised the test as 'helping me jot down notes, make abbreviations like really quickly during like telephone call from the doctor'. PC‐Nurse 10 learnt to focus only on important words as a strategy for understanding, while PC‐Nurse 5 felt her ability to write
quickly helped with the listening task (viewing it as a dictation exercise).
On the whole, the Part A task (listening to a consultation and taking notes) was more
readily linked by the participants to typical workplace activities than the Part B task
(listening to a monologue and responding to questions). Nevertheless, participants
considered Part B as linked to listening to lectures or talks and video teaching in the
workplace in which ‘you just pick up the facts and the most important words and figures’
(PC‐Doctor 1).
5. Role‑play in the speaking test
Perhaps due to the strong tradition of simulation assessment in Australia (see Woodward‐
Kron & Elder, this issue), doctors and nurses at supervisory level generally indicated that
patient–clinician role‐plays were highly relevant to workplace communication and viewed them as a familiar assessment method. Both medical and nursing educators reported that
communication skills are inherent in assessments that involve simulated patients. One
medical educator reported that professionals who have trained overseas may only have
been assessed through written exams. Therefore, having this assessment genre represented
in the language test is congruent with the healthcare context and the learning culture that
surrounds it.
Past candidate informants also recognised a general closeness between the OET
role‐play situations they practised when preparing to take the test and their workplace
experience; one commented that ‘sometimes I found that my patients really talked like from
OET preparation’ (PC‐Nurse 7). They noted clinical aspects of the role‐plays and how these
are familiar to them from their everyday practice of healthcare. They also pointed out how
OET role‐play scenarios raised their awareness of cultural and professional expectations,
e.g., about privacy of information (PC‐Nurse 4). Especially among the nurses interviewed, it
appears that preparation for the OET speaking test helped with confidence: ‘I need to greet
the patients and I have to give explanation and I give advice and reassure patient. So that …
preparation of OET helped me everyday conversation with patients and also yeah I could
practise lots of different topics when I prepared OET’ (PC‐Nurse 7).
Despite these positive comments, the authenticity of the roles was a point of discussion. One
medical educator observed that role‐play assessments in general offer ‘a clean example of
an interaction, which is unrealistic’. In real life, he noted, patients are more usually
challenging in some regard, e.g., they are talkative, hostile or confused (see also Woodward‐
Kron & Elder, this issue, about the consensual orientation of the OET role‐plays). However,
another doctor pointed to the complexity of involving more realistic examples: ‘you [test
developer] would have to pick which kind of difficult communicating you were going to test
them on. Are you going to test them on a person who is reticent and quiet and you are
trying to draw them out or the verbose person or the angry one?’ (S‐Doctor 1). Another
doctor felt this kind of interpersonal discourse management was more appropriately
assessed as part of clinical skills than language skills (see theme 1 above).
Past candidates commented on the importance of co‐operative interlocutors in the speaking sub‐test in supporting their demonstration of appropriate communication skills through the role‐plays. PC‐Doctor 1 described how, 'if [the
interlocutors] ask questions and they’re involved in the conversations and they sort of
participate in the test actively, it really helps motivate us to do our role as well,
appropriately as well, and it’s healthy because it means you flow nicely’. Some informants
felt that poor training was apparent for particular interlocutors who lacked these necessary
skills (see also Woodward‐Kron & Elder, this issue).
6. Switching register
Many participants noted that speaking in healthcare settings crucially involves speaking to
other healthcare professionals. One doctor noted this gap in the speaking test tasks,
describing the process of synthesis and translation between registers that occurs in inter‐
professional interactions:
I think it would be interesting to hear them take the same data from the doctor–
patient [interaction] and have a doctor–doctor. Either listening to them make a
referral or go to their senior saying, “I am worrying about this man/lady because of
x.” Because you have got to synthesise very quickly and put it all in a very different
language. It is [using] straight medicalese to your colleagues. (S‐Doctor 1)
PC‐Nurse 7 also noted the need she had for more complex, professional language when
handing over a patient to her nurse‐in‐charge, a task she found challenging.
The converse was also true for some past candidates who observed that preparing
for the speaking test made them aware of the need to learn suitable lay terms in order to
interact successfully with patients: ‘I have to convert the professional terms into the layman
terms and I was surprised at how many words are there … my vocabulary got richer when I
was preparing for speaking [test]’ (PC‐Doctor 7). S‐Doctor 5 observed that EAL‐speaking
professionals may not know non‐medical jargon which is appropriately used in talking to
patients.
7. Inter-/intra-professional communication
Related to their observations on the demands of different workplace registers, participants
also considered patient handover a vulnerable workplace communication activity and noted
that it is not represented in the speaking test. PC‐Nurse 3 pointed out that ‘lots of
information can be missed from one person to the next person to the next’ in real‐life
handover situations. One senior nurse in an aged‐care context gave this example of a
communication failure between EAL‐speaking staff:
I will find that we will hand something over and then the next day we will come back
and it has gone through two nurses and we get something completely different (…)
Just say [nurse name] might handover a certain resident is on antibiotics for a UTI
and we will get back that they are on antibiotics for a chest infection. (S‐Nurse 1)
It was also acknowledged, however, that assessing interprofessional communication in a language test is complicated: it requires determining appropriate professional roles for the candidate and interlocutor to play, providing sufficient relevant content for the role‐play, and enabling assessment by language assessors who have no training in healthcare.
The seven themes above provide a glimpse of the complex relationship between
test construct, domain construct and real world, as perceived by different stakeholders. We
now discuss these themes in relation to our research questions and the boundary object
metaphor.
V Discussion
In this paper, we have operationalized the notion of tests as boundary objects through
investigating stakeholder worlds and inquiring about the degree to which the test is
operating coherently between them. Documenting the perspectives of various groups with
different relationships to the test in this way invites difference and power into the discussion
of test use so that tensions may emerge (Holland & Reeves, 1994). In answer to our first
research question about how stakeholder groups with diverse relationships to the test
perceive the test content in relation to language use in the workplace, one such tension
emerged. This is represented in Figure 1, which shows the test as a boundary object (BO) composed, to some extent, of two language use models: a work‐readiness level of language (macro‐skill) proficiency for healthcare purposes on the one hand (BO1), and
aspects of communication skills for patient‐centred care on the other (BO2). Although for
some stakeholders the two constructs are kept relatively separate (e.g. the professional
board view), for some past candidates the two clearly overlap, with patient‐centred
communication strategies emerging as role‐play test strategies that are accorded some
importance (BO3). This blending may be an effect of a degree of incommensurability
between the applied linguistics and health professional notions of communication. That is,
the separability of speaking and listening skills, quite standard in applied linguistics (e.g. Buck, 2001), is impossible in communication frameworks associated with patient or person‐
centred care (e.g. Levinson, Lesser, & Epstein, 2010; McCormack & McCance, 2006).
The genesis of these two ‘constructs’ have had relatively separate disciplinary
trajectories (Skelton, 2013), with language tests such as the OET traceable to models of
communicative competence (described in McNamara, 1990). The role of standards as
boundary objects is to make things work together (Bowker & Star, 1999) and, by and large, it
would seem that the two OET sub‐tests presented in this paper supply adequate
authenticity in relation to actual workplace practices. To answer our second research
question about the adequate representation of the language challenges of the workplace,
there appears to be sufficient overlap between the worlds the test joins, albeit with some discrepancies. In the general evaluation of the OET given by PC‐Doctor 8, 'It's a test, but it's
relevant’. This echoes the finding by Elder (2007) that health professionals viewed the OET
as far more relevant to their professional needs than IELTS.
Sufficiency of overlap can be exemplified in the listening note‐taking task (Part A).
While note‐taking is a skill that is perceived as relevant to the target domain, there are
inauthentic features such as the practice of writing while listening (also discussed in Elder,
2007), the lack of selection of note content, the absence of visual cues, the unnaturally quiet
test environment and the inability to direct the interaction to check information as
necessary. Despite this test method interference, participants saw the activity as sufficiently
representative of real‐world practices. That said, there are aspects of the test standard
which would appear to be invisible to the participants, such as the use of short answer or
multiple‐choice questions, which likely bear little resemblance to real‐world activities but
which are now so entrenched in worldwide testing practice that their use is naturalized. It is
also arguable that test tasks that bear closer resemblance to the tasks of the domain are
easier for stakeholders to critique in domain terms. Thus, note‐taking, which was readily
linked to daily healthcare practice by participants, elicited extensive opinion, whereas the use of multiple‐choice questions attracted little comment. Indeed, it is unlikely that these domain‐
related criticisms would be made of IELTS listening tasks because their relationship to the
domain is too distant. So, while stakeholder perceptions are worthwhile in that they invite
user participation in the standard‐maintaining process (Busch, 2011), human perceptions are
laden with cultural bias, naturalized standards included.
High‐stakes standardized tests are also standardizing mechanisms. Tests categorize people into levels or standards, but they also standardize language use by sanctioning 'appropriate' varieties, for instance, as judged by raters trained to recognize deviation from accepted norms (Shohamy, 2007; Young, 2012). As noted by past candidates in
this study, the accents represented in the test do not reflect the broad range of accents they would typically encounter from patients and colleagues in Australian healthcare workplaces, as also reported in Oh (2010). This case illustrates the need for a standard to
evolve so that it better reflects, through judicious sampling, the range of accents typically
encountered in the workplace. Doing so would increase the likelihood that candidates are
able to adapt to the different varieties they may later experience in the workplace (Harding,
2014).
Another key feature of real‐world communication that participants noted as absent from the OET was intra‐/inter‐professional communication. However, including this
in the test has implications for test administration since it is unlikely that language specialists
would be equipped to evaluate the associated register without extensive training. Sampling
this aspect of the workplace requires investment on the part of the test provider to adapt
the standard. Further, both board representatives viewed the tests (OET and IELTS) purely in
terms of general English proficiency. The nursing board representative explained that the
test result gives employers assurance that the registrant has adequate non‐technical
language proficiency with the understanding that any new employee would need to learn
context‐specific language.
However, the view of past candidates, who have to invest time and money in a test
experience, is different. For many, preparing for an LSP test is far preferable because of the
beneficial effects on their profession‐related communication skills. Indeed, they were critical
of test topics for not being specific enough. To return to the notion of ‘effect‐driven testing’
(Fulcher & Davidson, 2007) and our final research question about the effects of the OET,
there were several test‐related practices which informants found beneficial once they
entered the workplace, such as the development of note‐taking skills, the perceived
similarity between test role‐plays and real‐world conversations with patients, the modeling of
patient consultations, and other aspects of healthcare practice in Australia (e.g. maintaining
patient privacy). The overall test experience appeared to provide some level of enculturation
into the Australian healthcare context. Thus, LSP test effects might be best evaluated in
terms of post‐test practices, as depicted in Figure 1.
[Figure 1: a diagram in which the test design construct (BO1) and the workplace communication construct (BO2) overlap; their overlap (BO3) is labelled 'Adequate representation?', and the effects of the test extend outwards to both pre‐test practices and post‐test practices]
Figure 1 Boundary object (BO) manifestations and test effects
Figure 1 depicts how test effects have been operationalized in this study. Using the notion of
test as a boundary object that sits between worlds, we have observed two overlapping
constructs: BO1, the test construct (and its underlying theory of language and test method
overlay) and BO2, the workplace communication construct (and its underlying theory of
practice/communication as well as actual practices in real‐world contexts). The abstract
target language domain is represented by BO3: the degree of overlap between the test
construct and the workplace construct, as perceived by the participants. Note here we
distinguish between BO3, the target domain sampled by the test to which test scores apply,
and the ‘real world’ or actual, un‐sample‐able communication, to which test effects apply.
As with all standards, which are always incomplete and inadequate (Star & Lampland, 2009), complete overlap is impossible, and even undesirable, since sampling the local specificity of the real world would be unfair. However, just as a test validation argument strives for
completeness and coherence, the degree of overlap is crucial in the extrapolation inference
which entails an ‘ambitious leap’ from a statistical sampling procedure to the real world
(Kane, 2013, p. 28).
The between‐ness of LSP tests is their key characteristic, well‐signified in the popular
‘gatekeeping’ metaphor. The boundary object concept allows us to approach validation as a
dialectic pursuit (Akkerman & Bakker, 2011; Moss, 1998), a negotiation over the adequacy
and effects of the test for the social worlds it connects. Adopting this concept to
characterize test impact frees us from seeing the test purely as propelling influence
backwards (implicit in the notion of washback) and instead allows us to see it as exerting influence
potentially into all stakeholder worlds, across the whole candidate trajectory, from test‐
preparation into the workplace. In Figure 1, therefore, we show test effects as bi‐directional,
with the quality of pre‐test and post‐test practices both warranting investigation. Test
developers, providers and enforcers should aim for test effects which are constructive for
the worlds the test interconnects, not just in terms of the pre‐test learning opportunities it
offers candidates, but also the potential impact such learning has on their future
workplaces.
Funding acknowledgement
The project drawn on in this article was commissioned by the OET Centre.
VI References
Akkerman, S. F., & Bakker, A. (2011). Boundary crossing and boundary objects. Review of
Educational Research, 81(2), 132‐169.
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education. (2014). Standards for educational and
psychological testing. Washington, DC: American Educational Research Association.
Bowker, G. C., & Star, S. L. (1999). Sorting things out: Classification and its consequences.
Cambridge, Mass.: The MIT Press.
Brown, A. (1993). The role of test‐taker feedback in the test development process: Test‐
takers' reactions to a tape‐mediated test of proficiency in spoken Japanese.
Language Testing, 10(3), 277‐301.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Busch, L. (2011). Standards: Recipes for reality. Cambridge, Mass.: MIT Press.
Cheng, L. (2008). Washback, impact and consequences. In N. H. Hornberger & E. Shohamy
(Eds.), Encyclopedia of Language and Education, Volume 7: Language Testing and
Assessment (pp. 349‐364). Berlin: Springer.
Davies, A. (2008). Ethics, professionalism, rights and codes. In N. H. Hornberger & E.
Shohamy (Eds.), Encyclopedia of Language and Education, Volume 7: Language
Testing and Assessment (pp. 2555‐2569). Berlin: Springer.
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge
University Press.
Douglas, D. (2010). Understanding language testing. Abingdon: Routledge.
Douglas, D., & Myers, R. (2000). Assessing the communication skills of veterinary students:
Whose criteria? In A. J. Kunnan (Ed.), Fairness and validation in language
assessment: Selected papers from the 19th Language Testing Research Colloquium,
Orlando, Florida (pp. 60‐81). Cambridge: Cambridge University Press.
Elder, C. Exploring the limits of authenticity in LSP testing: The case of a specific purpose
language test for health professionals. Language Testing.
Elder, C. (2007). OET–IELTS benchmarking study. Report to the OET Centre. Melbourne: Language Testing Research Centre, University of Melbourne.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Abingdon: Routledge.
Harding, L. (2014). Communicative language testing: Current issues and future research.
Language Assessment Quarterly, 11(2), 186‐197.
Holland, D., & Reeves, J. R. (1994). Activity theory and the view from somewhere: Team
perspectives on the intellectual work of programming. Mind, Culture, and Activity,
1(1‐2), 8‐24.
Jacoby, S., & McNamara, T. (1999). Locating competence. English for Specific Purposes,
18(3), 213‐241. doi: http://dx.doi.org/10.1016/S0889‐4906(97)00053‐7
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of
Educational Measurement, 50(1), 1‐73.
King, N., & Horrocks, C. (2010). Interviews in qualitative research. London: Sage.
Levinson, W., Lesser, C. S., & Epstein, R. M. (2010). Developing physician communication
skills for patient‐centered care. Health Affairs, 29(7), 1310‐1318.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions.
Language Testing, 17(1), 43‐64.
Lumley, T., & Brown, A. (1996). Specific‐purpose language performance tests: Task and interaction. Australian Review of Applied Linguistics, 13, 105‐136.
Lumley, T., Lynch, B., & McNamara, T. (1994). A new approach to standard‐setting in
language assessment. Melbourne Papers in Language Testing, 3(2), 19‐40.
Macqueen, S., Pill, J., Elder, C., & Knoch, U. (2013). Investigating the test impact of the OET: A qualitative study of stakeholder perceptions of test relevance and efficacy. Melbourne: Language Testing Research Centre, University of Melbourne.
Makoul, G. (2001). Essential elements of communication in medical encounters: the
Kalamazoo consensus statement. Academic Medicine, 76(4), 390‐393.
Manias, E., & McNamara, T. Health professionals' views of professional readiness in the
standard setting context: Tensions between language and communication.
Language Testing.
McCormack, B., & McCance, T. V. (2006). Development of a framework for person‐centred
nursing. Journal of Advanced Nursing, 56(5), 472‐479.
McNamara, T. (1990). Assessing the second language proficiency of health professionals.
(PhD thesis). University of Melbourne, Melbourne.
McNamara, T. (2006). Validity in language testing: The challenge of Sam Messick's legacy.
Language Assessment Quarterly, 3(1), 31‐51.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden:
Blackwell Publishing.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241‐
256.
Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement:
Issues and Practice, 17(2), 6‐12.
O'Neill, T. R., Buckendahl, C. W., Plake, B. S., & Taylor, L. (2007). Recommending a nursing‐
specific passing standard for the IELTS examination. Language Assessment
Quarterly, 4(4), 295‐317.
Oh, H. (2010). Overseas health professionals' perspectives on the OET. (Unpublished master's thesis). University of Melbourne.
Purpura, J. (2008). Assessing communicative language ability: Models and their components.
In N. H. Hornberger & E. Shohamy (Eds.), Encyclopedia of Language and Education,
Volume 7: Language Testing and Assessment (pp. 53‐68). Berlin: Springer.
Read, J., & Wette, R. (2009). Achieving English proficiency for professional registration: The
experience of overseas‐qualified health professionals in the New Zealand context. IELTS Research Reports (Vol. 10).
Shohamy, E. (2004). Assessment in multicultural societies: Applying democratic principles
and practices to language testing. In B. Norton & K. Toohey (Eds.), Critical
pedagogies and language learning (pp. 72‐92). Cambridge: Cambridge University
Press.
Shohamy, E. (2007). Language tests as language policy tools. Assessment in Education, 14(1),
117‐130.
Skelton, J. (2013). English for medical purposes. In C. A. Chapelle (Ed.), The Encyclopedia of
Applied Linguistics (pp. 1‐7). Blackwell Publishing Ltd. doi:
10.1002/9781405198431.wbeal0379
Star, S. L., & Griesemer, J. R. (1989). Institutional ecology, translations and boundary objects:
Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907‐39.
Social Studies of Science, 19(3), 387‐420.
Star, S. L., & Lampland, M. (2009). Reckoning with Standards. In M. Lampland & S. L. Star
(Eds.), Standards and their stories: How quantifying, classifying and formalizing
practices shape everyday life (pp. 3‐24). Ithaca: Cornell University Press.
Woodward‐Kron, R., & Elder, C. Authenticity in an English language screening test for
clinicians: A comparative discourse study. Language Testing.
Young, R. F. (2012). Social dimensions of language testing. In G. Fulcher & F. Davidson (Eds.),
The Routledge handbook of language testing (pp. 178‐193). Abingdon: Routledge.