This is a near-final draft of the article, which appears as:
Macqueen, S., Pill, J., & Knoch, U. (2016). Language test as boundary object: Perspectives from test users in the healthcare domain. Language Testing, 33(2), 271-288.
Language test as boundary object: Perspectives from test users in the healthcare
domain
Susy Macqueen University of Melbourne
John Pill American University of Beirut
Ute Knoch University of Melbourne
Abstract
Objects that sit between intersecting social worlds, such as Language for Specific Purposes
(LSP) tests, are boundary objects – dynamic, historically derived mechanisms which maintain
coherence between worlds (Star & Griesemer, 1989). They emerge initially from
sociopolitical mandates, such as the need to ensure a safe and efficient workforce or to
control immigration, and they develop into standards (i.e. stabilized classifying mechanisms).
In this article, we explore the concept of LSP test as boundary object through a qualitative
case study of the Occupational English Test (OET), a test which assesses the English
proficiency of healthcare professionals who wish to practise in English‐speaking healthcare
contexts. Stakeholders with different types of vested interest in the test were interviewed
(practising doctors and nurses who have taken the test, management staff, professional
board representatives) to capture multiple perspectives of both the test‐taking experience
and the relevance of the test to the workplace. The themes arising from the accumulated
stakeholder perceptions depict a ‘boundary object’ which encompasses a work‐readiness
level of language proficiency on the one hand and aspects of communication skills for
patient‐centred care on the other. We argue that the boundary object metaphor is useful in
that it represents a negotiation over the adequacy and effects of a test standard for all
vested social worlds. Moreover, the test should benefit the worlds it interconnects, not just in terms of the learning opportunities it offers candidates but also in terms of the impact such learning carries into key social sites, such as healthcare workplaces.
Keywords: Language for Specific Purposes (LSP) tests, boundary object, standards, test
impact, authenticity, validity
High‐stakes language tests are frequently positioned metaphorically as gates into
socioeconomically desirable domains. Powerful stakeholders are construed as gatekeepers
who control the flow by setting a standard for entry in the form of a test score. In this
article, we expand this popular metaphor to a more holistic notion of tests as boundary
objects which are brought about by socioeconomic needs. We first explore the theoretical
insights offered by the concept of boundary objects and we then apply it to a case study of
the relevance and impact of a high‐stakes specific purpose language test, the Occupational
English Test (OET).
I Boundary objects
Boundary objects are dynamic, historically derived mechanisms which develop in order to
maintain coherence between intersecting social worlds (Star & Griesemer, 1989). They are
constructed from a pooling of information at the intersections of the participating social
worlds (Star & Griesemer, 1989). Boundary objects often become standards. Broadly
speaking, standards are stabilised classifying mechanisms which hold aspects of social structure together and are pervasive in modern life: the 'raw materials out of which social order is constructed' (Busch, 2011, p. 42). Standards come to be recognised as the way things are
(done) in societies, such as what to do at a red traffic light, the length of a phone number
and how to gain citizenship of a country. It is a feature of the standard‐saturated modern
world that the genesis and reasoning behind classificatory systems get forgotten as
standards are naturalised in social use (Bowker & Star, 1999; Busch, 2011).
Tests of Language for Specific Purposes (LSP) can be considered boundary objects,
developing as they do from a social mandate, such as a need to ensure a national healthcare
workforce can communicate to a level that is deemed adequate for patient safety and
workplace efficiency. This need brings about a classificatory infrastructure: LSP tests
categorise and differentiate people in terms of how ready they are to participate
linguistically in specific discourse communities. They do this by using standardized language
sampling methods which, ideally, elicit language forms and structures that are relevant to
the target social sphere and are judged using methods that are also congruent with ways in
which communication is evaluated in the target domain (Douglas, 2000; Jacoby &
McNamara, 1999). How meaningful or relevant these samples and the attendant
classificatory infrastructure (e.g. criteria, scoring, cut scores) are to the people within the
boundaries of the target sphere as well as to those hoping to enter it is a key question for
research on the impact of tests.
Test impact and authenticity
Current conceptualisations of validity have led to increased attention to the social context in
which tests are implemented, to the values underpinning test constructs and to how test
scores are interpreted, used and perceived by different stakeholders (McNamara & Roever,
2006). Such thinking paves the way for including test consequences as a necessary part of
validation programs (Cheng, 2008). Fulcher and Davidson (2007) argue for an effect‐driven
approach to test development whereby impact on stakeholders not only informs design
decisions but is the basis for ongoing monitoring of test validity. They define effect‐driven
testing as ‘a blending of procedure with fundamental virtue, and the fundamental virtue is
simple: think about the intended beneficial impact as the test is built, and be willing to knock
it down when things change’ (p. 177). Standards theorist, Lawrence Busch, similarly advises
that the balance of benefits and costs be considered at the design stage of a standard.
Ultimately standards must not affect social conditions in ways which restrict the democratic
rights of those judged by them (Busch, 2011; Shohamy, 2004). In the Standards for
Educational and Psychological Testing (AERA, APA, NCME, 2014), sections are devoted to avoiding negative consequences of testing and to determining the likelihood of intended benefits, both of which can be informed by stakeholder perspectives (including test‐taker experiences) as well as by the test's congruence with the target domain. To this end, impact studies
‘enable us to re‐evaluate and make explicit not just the standards we promote but the very
view of language we take for granted’ (Davies, 2008, p. 440). Thus, the role of test impact
research is to challenge the taken‐for‐grantedness of the standard by interrogating its
rationale and examining its effects.
The meanings of LSP tests, like those of other standards, are found between groups,
and it is by studying the adequacy of their between‐ness that we can evaluate whether an
appropriate informational balance has been struck, albeit one that is always incomplete and inadequate to some degree (Star & Lampland, 2009). Validation, as Moss (1998) puts it,
‘entails ongoing evaluation of the dialectical relationship between the products and
practices of testing, writ large, and the social reality that is recursively represented and
transformed’ (p. 11). Adequate authenticity is at the centre of such ongoing evaluations,
because it is most likely to promote beneficial test preparation practices relevant to the
future domain of practice (Messick, 1996) and, consequently, most likely to be beneficial to
the user worlds. That said, authenticity alone is not sufficient in test design. Needs analysis
processes prioritise domain tasks and skills which should be represented on a test (Douglas,
2010). Thus, effect‐driven LSP test design involves considering the relative importance of
test tasks as well as their degrees of authenticity, with test effects and test content requiring
periodic checks to ensure the test and its social worlds are operating coherently.
Such investigations involve determining the relationship between the test construct
and the criterion domain, both of which are generated in the test development process
(McNamara, 2006). As McNamara has phrased it, ‘the criterion domain is itself an
abstraction, a construct in need of validation itself’ (p. 36). Many performance tests, for
instance, are underpinned by models of communicative competence (see Purpura, 2008).
Target domain communication may be shaped by frameworks for communication developed
from professional philosophies which are upheld through education and professional policy,
such as patient‐centred care, espoused in many English‐speaking healthcare workplaces (e.g.
Makoul, 2001). Messick (1996) articulated the ideal of authentic assessments as those which
‘pose engaging and worthwhile tasks (usually involving multiple processes) in realistic
settings or close simulations so that the tasks and processes, as well as available time and
resources, parallel those in the real world’ (p. 243). Even with careful analysis and sampling
of the practices of the real world, the development process of a standardized test can bleach
tasks and processes of authentic characteristics to the extent that the resulting test tasks may be inadequate simulations (Lewkowicz, 2000). For the purpose of investigating test effects, it is helpful to consider the 'target language domain' as an abstraction that informs test
design in contrast to actual real‐world practices, which are too contextually complex and
locally detailed to be applied in a standardized test. Impact studies can raise awareness of
the degree of commensurability between theoretical models which contribute to the test
instrument and those which underlie domain practices as well as the congruence between
the generalised target language domain and actual communication practices in relevant
local contexts.
Stakeholder voices
Stakeholder perceptions can be insightful in terms of both the degree and the importance of
authenticity (Lewkowicz, 2000). LSP test stakeholders, whether test takers or institutional
score users, are cast as ‘domain experts’ by virtue of their workplace experience and/or
training. Accordingly they are seen as uniquely placed to articulate what matters for
effective communication in the relevant language use situation (Douglas, 2000; Douglas &
Myers, 2000; Jacoby & McNamara, 1999; see also other papers in this issue), to give feedback on the appropriateness of a test's content and format (Brown, 1993; Lumley & Brown, 1996), to decide what level of test performance is sufficient for effective functioning
in this domain (Lumley, Lynch, & McNamara, 1994; O'Neill, Buckendahl, Plake, & Taylor,
2007 and other papers in this issue) and, in general, to determine whether a test is achieving
its intended effects.
Three relevant studies of test‐taker perceptions involve comparisons between the
OET, an LSP test, and the IELTS, a general/academic English proficiency test. These two tests
have been used interchangeably as healthcare registration standards in Australia and New
Zealand for some years. In an exploratory study by Elder (2007), 53 non‐native‐English‐
speaking health professionals took both the OET and IELTS in close succession. A comparison
of band score/grade frequencies on each test revealed that while overall ‘pass’ rates on each
test were comparable (i.e., similar numbers of candidates received an overall Band 7 or
above on IELTS and a B grade or above on the OET sub‐tests), the candidates viewed these
two tests rather differently, as indicated in their post‐test survey responses. The OET
reading, writing and speaking components were perceived to be somewhat easier than their
IELTS counterparts, whereas the reverse was the case for OET listening, which posed
challenges on account of the Australian‐accented input and the need to take notes while
listening (see below for a description of the listening sub‐test tasks). The perceptions of
relative difficulty were not, however, reflected in obvious score differences on the two tests, insofar as these scores were comparable. While many candidates accepted the IELTS
as a valid measure of their general English proficiency, there was almost unanimous
agreement that the OET was a better measure of their professional competence and hence
more relevant to their needs. This suggests that health specificity in the boundary
mechanism allows for greater coherence between worlds. However, a study by Read and
Wette (2009) showed that practical issues are also weighed up in the evaluation of test
options. In their study, which elicited comparative feedback on the relevance and utility of
the IELTS and OET from 23 overseas‐trained health professionals enrolled for a six‐month
English course for health professionals in New Zealand, there was a slight preference for the
OET on the grounds of its health‐related content and the fact that candidates could, at that
time, take single components of the test where they had failed to reach the required
standard, rather than re‐sitting all four sub‐tests as required for the IELTS. The IELTS, on the
other hand, was seen as more affordable and easier to prepare for given greater availability
of practice materials. However, the authors point out that the candidates' perceptions varied
somewhat over time such that ‘by the third interview many health professionals seemed to
have realised that both IELTS and the OET are tests of English language proficiency, not of
clinical knowledge and therefore the medical content of the OET became less important as a
factor determining their choice of a test’ (Read & Wette, 2009, p. 39).
In a study by Oh (2010), the views of 18 international medical graduates (IMGs)
were canvassed after they had taken and passed the OET and had experience in the
workforce, as in the current study. Interestingly, all but the most able
participants (who had passed the OET after only a one‐week period of preparation) held the
view that preparing for the exam helped them develop effective communication skills for
their profession. Opinions of test tasks and their relevance to the medical domain were
generally favourable, although the listening test was criticised for its failure to represent
non‐native accents encountered in the workplace. Some also held the view that the note‐
taking demands on the listener were too heavy and distracted from the processing of new
information. Among the limitations of the OET noted by respondents was its failure to
sample communication with colleagues and its lack of general English content relevant to
daily conversation. Taken together, these studies show that test‐takers are affected by
multiple aspects of a standard (specificity, timing, cost, availability of practice materials) and
that effects may vary depending on the degree to which candidates engage with test preparation. In the present study, we sought to build on these studies of test‐taker
perceptions by eliciting multiple stakeholder perspectives on the test as a boundary
mechanism.
II Occupational English Test (OET)
The Occupational English Test (OET) is ‘an international English language test that assesses
the language and communication skills of healthcare professionals who seek to register and
practise in an English‐speaking environment’.1 In this paper, we draw from a larger study
(Macqueen, Pill, Elder, & Knoch, 2013) to exemplify the effects and meanings of the OET as a
boundary object. We focus on two of the four OET sub‐tests, the Speaking and the Listening
sub‐tests, because these offer complementary perspectives for a productive and a receptive
skill and because these skills are common to many communicative language performance
tests. A brief description of the sub‐tests is below (see Elder, this issue, for further detail).
Speaking sub‑test
The speaking sub‐test is a role‐play format in which the candidate takes the role of clinician
and the interviewer takes the role of patient/carer/client for two different role‐play
scenarios, each lasting five minutes.2 The test is profession‐specific, which means that
nurses, for instance, will be given situations that are considered typical of nursing duties.
Candidate performances are assessed against five criteria: Overall Communicative
Effectiveness, Intelligibility, Fluency, Appropriateness of Language, and Resources of
Grammar and Expression. (Although new criteria are under review for the test, this study is
based on the criteria currently in operation.)
Listening sub‑test
The listening sub‐test is common to all 12 professions and is 50 minutes long. There are two
parts. The first (Part A) is a recording of a consultation between a professional and a
patient/client during which candidates take notes under supplied headings. An example is a
doctor consulting with a patient about his back injury; candidates supply notes under
headings such as ‘history of recent incident’, ‘details of pain’ ‘treatment’, ‘further details of
problem’, ‘doctor’s initial comments’ and ‘results of the examination’. The second (Part B) is
an audio recording of a talk on a general medical topic during which candidates complete
short answer and fixed‐response questions. An example for this part is a nurse talking about
the incidence of congenital heart disease in newborns in Australia.3
III Method
We explore, through the qualitative analysis of interview data, the nature of the test as it is
perceived by participants with diverse relationships to the test: those involved in setting the
standard (professional bodies), those who occupy the target domain (supervisors in the
workplace) and ‘boundary crossers’ as classified by the test (successful test‐takers who have
achieved the standard and since entered the target domain). The largest OET candidate
groups are doctors and nurses, and it is these professions which formed the focus of the
study. Our overall aim was to determine how well the test functions as a standard for work‐
readiness. Specifically, we posed the following questions.
1 http://www.occupationalenglishtest.org
2 A sample role‐play for each profession can be found at: http://www.occupationalenglishtest.org/Display.aspx?tabid=2425
3 A sample listening test can be found at http://www.occupationalenglishtest.org/display.aspx?tabid=2425
Research questions
1. How do stakeholder groups with diverse relationships to the test perceive the test
content in relation to language use in the workplace?
2. Are the language challenges of the workplace adequately represented in the test?
3. What are the effects of the test as boundary object?
The interviews were semi‐structured to enable an in‐depth, contextualised exploration of
relevant issues as they arose. They were approximately 30 to 45 minutes long. Interviewees
were asked to comment on the relevance of the OET content to the workplace and on the
language demands of the workplace for users of English as a second language. Participants
who had not taken the test (i.e. board members and most supervisors) were provided with a
description of each sub‐test and sample test papers so they could comment on specific
tasks. Past candidates were also asked to fill out a background questionnaire, which included
information about their OET scores, test preparation methods, linguistic and cultural
background, professional occupation and experience, and general opinion of their test
experience.
The questionnaire and semi‐structured interview data were subjected to a grounded analytical process to elucidate stakeholder perceptions of test effectiveness,
relevance and impact. The interview data were transcribed and analysed thematically by
two researchers using NVivo, computer software designed to support qualitative coding. The
analysis was an iterative process (King & Horrocks, 2010) in which descriptive themes were
identified and interpreted in relation to the research aims.
Participants
As mentioned above, three stakeholder groups were interviewed for this study. These are
described below.
1) Senior representatives of the two relevant professional bodies (i.e. national medical and
nursing & midwifery boards)
2) Supervisors of EAL (English as an Additional Language) health professionals
Five senior nurses and six senior doctors were interviewed, all of whom had several years’
experience in their profession. These participants were educators and/or held senior
institutional roles. Two nurses were co‐ordinators at aged‐care facilities (one urban and one
rural) and one of these had considerable experience in remote hospital contexts. One doctor
was also a past OET candidate and so was able to offer the perspectives of both past
candidate and supervisor.
3) Successful OET candidates who are currently employed in their health professions
The past candidates interviewed were employed in a variety of contexts and had a broad
range of experience. The sample comprised 10 nurses (7 working in hospitals and 4 in aged care, including one in both) and 8 doctors (2 working in medical centres and 6 in hospitals,
at least 3 of which were regional). The participants all had work experience in Australian
healthcare contexts, enabling them to comment on workplace communication practices; the
mean time the past candidates had worked in Australia was just under 12 months. The mean
time the past candidates had worked in their professions in any context was 4 years 8
months. All past candidates had achieved a grade B or higher on all four sub‐tests of the
OET, with most achieving grade B. Thirteen of the nineteen past candidates had also taken
IELTS and were therefore able to compare their experiences of the two tests. The
participants spoke a variety of first languages, representing the diversity of overseas‐trained
health professionals who practise in Australia: Arabic (3), Bangla (1), Burmese (1), Chinese
(4), Filipino (1), Gujarati (1), Hindi (1), Korean (1), Punjabi (2), Russian (1), Sinhala (1).
IV Findings
The findings are presented as seven themes which emerged in the analytic process. The
themes (listed below) have been selected from the broader study for what they reveal about
the test as a boundary object and a professional registration standard.
General
1. Communication skills vs language proficiency
2. Specificity of test materials
Listening
3. Two types of listening
4. Note‐taking in the listening test
Speaking
5. Role-play in the speaking test
6. Switching register
7. Inter‐/intra‐professional communication
Extracts are attributed using a code indicating whether the interviewee is a supervisor (S) or
past candidate (PC), the interviewee’s profession (Nurse or Doctor) and an identification
number.
1. Communication skills vs language proficiency
A persistent theme in the interviews involved participants distinguishing the healthcare
construct of ‘communication skills’ from the applied linguistics construct of ‘language
proficiency’ (see also Manias & McNamara). The relevance of this distinction was evident in
responses from all groups of participants. The medical board representative, for example,
referred to the limits of the language test's purpose:
we don’t think it’s testing clinical communication skills … there are many native
English speakers who don't have good communication skills in a clinical context … if
the testing is congruent with practice, that’s terrific, but we shouldn’t be relying on
that.
Similarly, a senior doctor who was also a past candidate observed that aspects of spoken
communication such as ‘whether you are caring, whether you are sympathetic, whether you
understand what the patient’s need is’ should not be tested in a language test because they
are ‘well above the level of basic communication skills’ (S‐Doctor 6). Another past candidate
described workplace communication as requiring 'something on top of the language
itself … it is a lot about interpersonal skills in hospital so you shouldn’t just do the task, you
are not just task orientated, you should also relate to, build a rapport with the patient’ (PC‐
Nurse 3). Senior doctors and nurses noted the importance of checking a patient’s
comprehension as a part of the health professional–patient interaction, while past candidates observed that body language was an important component of workplace
communication that appeared to be absent from the OET. Participants’ comments indicate
that a distinction between speaking and listening skills is not particularly relevant to them;
‘communication skills’ appears to combine the ‘macro’ skills that are considered separate for
test purposes.
Despite the fact that the test (in its current form) does not focus explicitly on
assessing clinical communication skills, some past candidates assumed this was part of the
test construct. As one informant observed, the specific‐purpose characteristics of the test
are not easily separated from ‘pure’ language assessment: ‘it is also a test on understanding,
listening skills … whether you are able to provide reassurance, you have a reassuring
technique, whether you have advising skills, it is a little bit more comprehensive
communication skills’ (PC‐Nurse 1). PC‐Nurse 9 explained that the first time she took OET
she focused only on language. When she failed, she reconsidered her stance and decided for
her subsequent attempt that focusing on communication (‘to listen to the patient, what she
really wants’) was what was required. Indeed, several informants indicated how such
attention to appropriate communication in context helped them succeed on the OET.
2. Specificity of test materials
The issue of specificity of test materials was raised by some past candidates. One nurse
commented that some role‐play scenarios in the speaking test would most likely not be
encountered by a junior nurse with little prior experience, and PC‐Nurse 6 stated that the
test content is ‘beyond what I do in terms of my communication [in current employment] …
it is ahead of what is basic’. On the other hand, another nurse pointed out that the scenarios
are limited to ‘medical ward and surgical ward’ and so do not match his particular
experience as a psychiatric nurse (PC‐Nurse 4). There was a general acknowledgement that
test takers will perform better if the topic they are given in the role‐play is something they
are familiar with; therefore, professional knowledge is seen to facilitate performance even if
that knowledge is not being assessed directly in the test.
Issues of specificity of topic also arose regarding both parts of the listening test. PC‐
Doctor 7 pointed out that the test topics are not all directly related to the medical field and
include other areas of health practice; she would prefer it if they were always closely linked
with medical practice. In contrast, a nurse participant (PC‐Nurse 3) noted that the topics are
rarely from a nursing context; she felt the topics are quite similar to some of the topics used
in IELTS (which is not a specific‐purpose test) and did not relate particularly to her
experience of workplace communication. Another doctor felt the Part A consultations were
too often based in general practice.
Despite these qualms, the questionnaire data showed that almost all past candidates had chosen to take the OET because it was related to their profession and, in the
interviews, informants generally characterised the OET as complementary to the
professional exams they must also take, e.g., the Australian Medical Council’s clinical
examination (PC‐Doctor 6). Two participants indicated that they perceived the OET as being
easier because it was more related to their professions. As one wrote in the questionnaire,
‘IELTS can be very tricky esp. when talking about rare kinds of animals, or fish, etc.’
Participants suggested that dealing with an unfamiliar medical topic on the OET was
preferable to dealing with an unfamiliar topic from any field (in IELTS). In the view of PC‐
Doctor 6, IELTS imposes a double burden of content and language knowledge:
For any English exam there are two major informations you need to know: the
English itself, and some general knowledge. If I sit for exam with OET – so I already
[have] the medical knowledge, the general knowledge. So the rest of the exam will
be English so I can handle it later on. But in IELTS I should handle both of them.
Past candidates’ comments also indicated how preparing to take the OET helped set their
expectations of professional life and the workplace: ‘it was a learning experience not just
about language but also a little bit about the culture … of nursing in English speaking
contexts’ (PC‐Nurse 10). PC‐Nurse 1 stated that the OET is ‘more functional … more
practical … we are more familiar’; the test is related to the working environment: ‘I chose
OET because it is more related to what I do and what I want to do here in Australia and that
is the purpose of it’ (PC‐Nurse 6).
3. Two types of listening
For the senior staff in the study, listening can be divided into two overlapping types: 1)
listening as a clinical strategy (communication skills) and 2) listening comprehension as part
of clinical process. This can be seen as a specific instance of the distinction drawn in the first
theme – communication skills vs. language proficiency. Both types of listening might pose
problems for EAL staff; one doctor reported that complaints about non‐English‐speaking staff usually arose because clients felt the staff members were not listening to them or did
not understand them (S‐Doctor 1).
S‐Doctor 4 explained the importance of listening as a clinical strategy: ‘often
listening is the biggest task that you have to do when you are talking to a patient – it is not
all about information transfer from you to them, particularly … if a patient is upset about
something or you have got to tell them something terrible.’ This kind of psychologically
strategic listening, however, may be seen as a professional skill, inappropriately tested as a
language skill; the same doctor concluded that this type of listening was less a language skill
than a clinical one: ‘It is more content in a way because that is the appropriate clinical skill at
this point, just to actively listen.’
All senior staff described aspects of listening as a part of clinical process for
comprehending and recording information, as well as for decision‐making. This second type
of listening is what the OET listening test taps into; senior staff saw clear links between Part A (note‐taking during a consultation) and the workplace, as exemplified by S‐Doctor 6, a
medical educator and past candidate:
So that is what you are doing every day. You need to pick up the information from
patients … whether you are actually capturing the correct answer or the correct
information. … So you need to see whether they actually pick up all the important
essential information from that patient.
One past candidate doctor stated that the consultation (Part A) can also provide a model for
how a health professional should speak and noted that ‘my ears become more sharper’
while preparing for the listening test. The listening test was seen to provide exposure to accents relevant for test takers in Australia. PC‐Nurse 7 noted, however, that the test does not cover the range of accents, slang or abbreviations encountered in the workplace to the extent she would like. Several participants highlighted that the variety of accents present in most workplaces is not represented in the listening test.
4. Note‑taking in the listening test
Writing skills are incorporated in the listening test in the form of note‐taking (Part A) and
short answer questions (Part B). While the note‐taking task was perceived as an authentic
combination of skills, its timing and structure drew some criticism in terms of authenticity.
Several senior professionals reported that notes were generally not taken during the
consultation process, where clinical strategies such as attention to visual cues and rapport‐
building took precedence. For example:
You should maintain rapport with the patient. Look at them when they are talking so
you can pick up on cues including if you are examining them you look at their face if
you are feeling their abdomen to see if they are in pain because they won’t tell you.
You have to take all that in and if you are busy writing notes you can’t do that. (S‐
Doctor 4)
Comparing the workplace with the OET listening test note‐taking task, which uses an audio recording, a past candidate noted that real‐life consultation listening was easier because information can be checked and repeated back (PC‐Doctor 3). Describing work in an
Emergency Department, PC‐Doctor 5 explained: ‘you ask a question and they answer … and
literally when you start writing the patient pauses and waits for you to finish your writing,
then when you raise your head they continue'.
Although note‐taking might not ideally occur while a patient is talking during a
consultation, several senior professionals reported that note‐taking while listening to a
conversation occurred in ward rounds with senior team members. Other challenges of the
real world are indicated by S‐Doctor 4, a medical educator in a major urban hospital, who describes the 'aural nightmare' in which such notes might be taken:
…on that ward round they [international medical graduates] won’t be given enough
time to do the tasks that they are supposed to be doing. So they will be balancing the
folder, writing out the notes about what is happening while simultaneously being
expected to write a radiology request or a pathology request, look at the medication
chart, look at the fluid balance chart, take in everything that is being said, write their
own notes for what jobs they might have to do later. Already you see this is an
impossible task, meanwhile the interaction between the patient and the registrar will
be taking place in a crowded and busy ward room with other patients having loud
conversations with their relatives or just yelling, machines will be beeping and maybe
another ward round is taking place in the same room. It is an aural nightmare really,
if you can’t filter then you are going to have a lot of trouble.
Further, senior doctors observed that real‐world note‐taking is not indiscriminate:
‘you will be writing notes about what was talked about and what was decided’ (S‐Doctor 5).
Thus, there is a degree of selection and transformation of the information from
conversation/consultation to key issues and plan. In the listening test, however, candidates
write notes for headings which are provided on the test paper, and the text (dialogue) is
presented heading by heading. S‐Doctor 6 indicated that the test text should require
candidates to present only the relevant information in a consultation. S‐Nurse 4 observed
that one way to demonstrate that effective listening had taken place would be for
candidates to provide appropriate headings for their notes:
It seems like [test designers] have almost given people the answers by putting the
headings [in the Part A test booklet] (…) whereas I would imagine that if I am a
Registered Nurse with some background in practice that I should be able to come up
with my own headings from the dialogue or from watching for interactions.
Note‐taking to record what has been done appears to be important in much nursing work, and PC‐Nurse 6 recognised the test as 'helping me jot down notes, make abbreviations like really quickly during like telephone call from the doctor'. PC‐Nurse 10 learnt to focus only on important words as a strategy for understanding, while PC‐Nurse 5 felt her ability to write
quickly helped with the listening task (viewing it as a dictation exercise).
On the whole, the Part A task (listening to a consultation and taking notes) was more
readily linked by the participants to typical workplace activities than the Part B task
(listening to a monologue and responding to questions). Nevertheless, participants
considered Part B as linked to listening to lectures or talks and video teaching in the
workplace in which ‘you just pick up the facts and the most important words and figures’
(PC‐Doctor 1).
5. Role‑play in the speaking test
Perhaps due to the strong tradition of simulation assessment in Australia (see Woodward‐
Kron & Elder, this issue), doctors and nurses at supervisory level generally indicated that
patient–clinician role‐plays were highly relevant to workplace communication and viewed them as a familiar assessment method. Both medical and nursing educators reported that
communication skills are inherent in assessments that involve simulated patients. One
medical educator reported that professionals who have trained overseas may only have
been assessed through written exams. Therefore, having this assessment genre represented
in the language test is congruent with the healthcare context and the learning culture that
surrounds it.
Past candidate informants also recognised a general closeness between the OET
role‐play situations they practised when preparing to take the test and their workplace
experience; one commented that ‘sometimes I found that my patients really talked like from
OET preparation’ (PC‐Nurse 7). They noted clinical aspects of the role‐plays and how these
are familiar to them from their everyday practice of healthcare. They also pointed out how
OET role‐play scenarios raised their awareness of cultural and professional expectations,
e.g., about privacy of information (PC‐Nurse 4). Especially among the nurses interviewed, it
appears that preparation for the OET speaking test helped with confidence: ‘I need to greet
the patients and I have to give explanation and I give advice and reassure patient. So that …
preparation of OET helped me everyday conversation with patients and also yeah I could
practise lots of different topics when I prepared OET’ (PC‐Nurse 7).
Despite these positive comments, the authenticity of the roles was a point of discussion. One
medical educator observed that role‐play assessments in general offer ‘a clean example of
an interaction, which is unrealistic’. In real life, he noted, patients are more usually
challenging in some regard, e.g., they are talkative, hostile or confused (see also Woodward‐
Kron & Elder, this issue, about the consensual orientation of the OET role‐plays). However,
another doctor pointed to the complexity of involving more realistic examples: ‘you [test
developer] would have to pick which kind of difficult communicating you were going to test
them on. Are you going to test them on a person who is reticent and quiet and you are
trying to draw them out or the verbose person or the angry one?’ (S‐Doctor 1). Another
doctor felt this kind of interpersonal discourse management was more appropriately
assessed as part of clinical skills than language skills (see theme 1 above).
Past candidates commented on the importance of co‐operative interlocutors in the speaking sub‐test in supporting their demonstration of appropriate communication skills through the role‐plays. PC‐Doctor 1 described how, 'if [the
interlocutors] ask questions and they’re involved in the conversations and they sort of
participate in the test actively, it really helps motivate us to do our role as well,
appropriately as well, and it’s healthy because it means you flow nicely’. Some informants
felt that poor training was apparent for particular interlocutors who lacked these necessary
skills (see also Woodward‐Kron & Elder, this issue).
6. Switching register
Many participants noted that speaking in healthcare settings crucially involves speaking to
other healthcare professionals. One doctor noted this gap in the speaking test tasks,
describing the process of synthesis and translation between registers that occurs in inter‐
professional interactions:
I think it would be interesting to hear them take the same data from the doctor–
patient [interaction] and have a doctor–doctor. Either listening to them make a
referral or go to their senior saying, “I am worrying about this man/lady because of
x.” Because you have got to synthesise very quickly and put it all in a very different
language. It is [using] straight medicalese to your colleagues. (S‐Doctor 1)
PC‐Nurse 7 also noted the need she had for more complex, professional language when
handing over a patient to her nurse‐in‐charge, a task she found challenging.
The converse was also true for some past candidates who observed that preparing
for the speaking test made them aware of the need to learn suitable lay terms in order to
interact successfully with patients: ‘I have to convert the professional terms into the layman
terms and I was surprised at how many words are there … my vocabulary got richer when I
was preparing for speaking [test]’ (PC‐Doctor 7). S‐Doctor 5 observed that EAL‐speaking
professionals may not know non‐medical jargon which is appropriately used in talking to
patients.
7. Inter-/intra-professional communication
Related to their observations on the demands of different workplace registers, participants
also considered patient handover a vulnerable workplace communication activity and noted
that it is not represented in the speaking test. PC‐Nurse 3 pointed out that ‘lots of
information can be missed from one person to the next person to the next’ in real‐life
handover situations. One senior nurse in an aged‐care context gave this example of a
communication failure between EAL‐speaking staff:
I will find that we will hand something over and then the next day we will come back
and it has gone through two nurses and we get something completely different (…)
Just say [nurse name] might handover a certain resident is on antibiotics for a UTI
and we will get back that they are on antibiotics for a chest infection. (S‐Nurse 1)
It was also acknowledged, however, that assessing interprofessional communication in a language test is complicated: it requires determining appropriate professional roles for the candidate and interlocutor to play, providing sufficient relevant content for the role‐play, and enabling assessment by language assessors who have no training in healthcare.
The seven themes above provide a glimpse of the complex relationship between
test construct, domain construct and real world, as perceived by different stakeholders. We
now discuss these themes in relation to our research questions and the boundary object
metaphor.
V Discussion
In this paper, we have operationalized the notion of tests as boundary objects through
investigating stakeholder worlds and inquiring about the degree to which the test is
operating coherently between them. Documenting the perspectives of various groups with
different relationships to the test in this way invites difference and power into the discussion
of test use so that tensions may emerge (Holland & Reeves, 1994). In answer to our first
research question about how stakeholder groups with diverse relationships to the test
perceive the test content in relation to language use in the workplace, one such tension
emerged. This is represented in Figure 1, which shows the test as a boundary object (BO) composed, to some extent, of two language use models: a work‐readiness level of language (macro‐skill) proficiency for healthcare purposes on the one hand (BO1), and
aspects of communication skills for patient‐centred care on the other (BO2). Although for
some stakeholders the two constructs are kept relatively separate (e.g. the professional
board view), for some past candidates the two clearly overlap, with patient‐centred
communication strategies emerging as role‐play test strategies that are accorded some
importance (BO3). This blending may be an effect of a degree of incommensurability
between the applied linguistics and health professional notions of communication. That is,
the separability of speaking and listening skills, quite standard in applied linguistics (e.g. Buck, 2001), is impossible in communication frameworks associated with patient or person‐
centred care (e.g. Levinson, Lesser, & Epstein, 2010; McCormack & McCance, 2006).
The genesis of these two ‘constructs’ have had relatively separate disciplinary
trajectories (Skelton, 2013), with language tests such as the OET traceable to models of
communicative competence (described in McNamara, 1990). The role of standards as
boundary objects is to make things work together (Bowker & Star, 1999) and, by and large, it
would seem that the two OET sub‐tests presented in this paper supply adequate
authenticity in relation to actual workplace practices. To answer our second research
question about the adequate representation of the language challenges of the workplace,
there appears to be sufficient overlap between the worlds the test joins, albeit with some discrepancies. In the general evaluation of the OET given by PC‐Doctor 8, 'It's a test, but it's
relevant’. This echoes the finding by Elder (2007) that health professionals viewed the OET
as far more relevant to their professional needs than IELTS.
Sufficiency of overlap can be exemplified in the listening note‐taking task (Part A).
While note‐taking is a skill that is perceived as relevant to the target domain, there are
inauthentic features such as the practice of writing while listening (also discussed in Elder,
2007), the lack of selection of note content, the absence of visual cues, the unnaturally quiet
test environment and the inability to direct the interaction to check information as
necessary. Despite this test method interference, participants saw the activity as sufficiently
representative of real‐world practices. That said, there are aspects of the test standard
which would appear to be invisible to the participants, such as the use of short answer or
multiple‐choice questions, which likely bear little resemblance to real‐world activities but
which are now so entrenched in worldwide testing practice that their use is naturalized. It is
also arguable that test tasks that bear closer resemblance to the tasks of the domain are
easier for stakeholders to critique in domain terms. Thus, note‐taking, which was readily
linked to daily healthcare practice by participants, elicited extensive opinion, whereas the use of multiple‐choice questions attracted little comment. Indeed, it is unlikely that these domain‐
related criticisms would be made of IELTS listening tasks because their relationship to the
domain is too distant. So, while stakeholder perceptions are worthwhile in that they invite
user participation in the standard‐maintaining process (Busch, 2011), human perceptions are
laden with cultural bias, naturalized standards included.
High‐stakes standardized tests are also standardizing mechanisms. Tests categorize people into levels or standards, but they also standardize language use by sanctioning 'appropriate' varieties, for instance, as judged by raters trained to recognize deviation from accepted norms (Shohamy, 2007; Young, 2012). As noted by past candidates in
this study, the accents represented in the test do not reflect the broad range of accents they would typically encounter from patients and colleagues in Australian healthcare workplaces, as also reported in Oh (2010). This case illustrates the need for a standard to
evolve so that it better reflects, through judicious sampling, the range of accents typically
encountered in the workplace. Doing so would increase the likelihood that candidates are
able to adapt to the different varieties they may later experience in the workplace (Harding,
2014).
Another key feature of real‐world communication that participants noted as absent from the OET was intra‐/inter‐professional communication. However, including this
in the test has implications for test administration since it is unlikely that language specialists
would be equipped to evaluate the associated register without extensive training. Sampling
this aspect of the workplace requires investment on the part of the test provider to adapt
the standard. Further, both board representatives viewed the tests (OET and IELTS) purely in
terms of general English proficiency. The nursing board representative explained that the
test result gives employers assurance that the registrant has adequate non‐technical
language proficiency with the understanding that any new employee would need to learn
context‐specific language.
However, the view of past candidates, who have to invest time and money in a test
experience, is different. For many, preparing for an LSP test is far preferable because of the
beneficial effects on their profession‐related communication skills. Indeed, they were critical
of test topics for not being specific enough. To return to the notion of ‘effect‐driven testing’
(Fulcher & Davidson, 2007) and our final research question about the effects of the OET,
there were several test‐related practices which informants found beneficial once they
entered the workplace, such as the development of note‐taking skills, the perceived
similarity between test role‐plays and real‐world conversations with patients, the modeling of
patient consultations, and other aspects of healthcare practice in Australia (e.g. maintaining
patient privacy). The overall test experience appeared to provide some level of enculturation
into the Australian healthcare context. Thus, LSP test effects might be best evaluated in
terms of post‐test practices, as depicted in Figure 1.
[Figure 1: a diagram in which the test design construct (BO1) and the workplace communication construct (BO2) overlap; their overlap (BO3) is labelled 'Adequate representation?', and the effects of the test extend outwards to both pre‐test practices and post‐test practices]
Figure 1 Boundary object (BO) manifestations and test effects
Figure 1 depicts how test effects have been operationalized in this study. Using the notion of
test as a boundary object that sits between worlds, we have observed two overlapping
constructs: BO1, the test construct (and its underlying theory of language and test method
overlay) and BO2, the workplace communication construct (and its underlying theory of
practice/communication as well as actual practices in real‐world contexts). The abstract
target language domain is represented by BO3: the degree of overlap between the test
construct and the workplace construct, as perceived by the participants. Note here we
distinguish between BO3, the target domain sampled by the test to which test scores apply,
and the ‘real world’ or actual, un‐sample‐able communication, to which test effects apply.
As with all standards, which are always incomplete and inadequate (Star & Lampland, 2009), complete overlap is impossible, and even undesirable, since sampling the local specificity of the real world would be unfair. However, just as a test validation argument strives for
completeness and coherence, the degree of overlap is crucial in the extrapolation inference
which entails an ‘ambitious leap’ from a statistical sampling procedure to the real world
(Kane, 2013, p. 28).
The between‐ness of LSP tests is their key characteristic, well‐signified in the popular
‘gatekeeping’ metaphor. The boundary object concept allows us to approach validation as a
dialectic pursuit (Akkerman & Bakker, 2011; Moss, 1998), a negotiation over the adequacy
and effects of the test for the social worlds it connects. Adopting this concept to
characterize test impact frees us from seeing the test purely as propelling influence
backwards (implicit in the notion of washback) and instead allows us to see it as exerting influence
potentially into all stakeholder worlds, across the whole candidate trajectory, from test‐
preparation into the workplace. In Figure 1, therefore, we show test effects as bi‐directional,
with the quality of pre‐test and post‐test practices both warranting investigation. Test
developers, providers and enforcers should aim for test effects which are constructive for
the worlds the test interconnects, not just in terms of the pre‐test learning opportunities it
offers candidates, but also the potential impact such learning has on their future
workplaces.
Funding acknowledgement
The project drawn on in this article was commissioned by the OET Centre.
VI References
Akkerman, S. F., & Bakker, A. (2011). Boundary crossing and boundary objects. Review of
Educational Research, 81(2), 132‐169.
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education. (2014). Standards for educational and
psychological testing. Washington, DC: American Educational Research Association.
Bowker, G. C., & Star, S. L. (1999). Sorting things out: Classification and its consequences.
Cambridge, Mass.: The MIT Press.
Brown, A. (1993). The role of test‐taker feedback in the test development process: Test‐
takers' reactions to a tape‐mediated test of proficiency in spoken Japanese.
Language Testing, 10(3), 277‐301.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Busch, L. (2011). Standards: Recipes for reality. Cambridge, Mass.: MIT Press.
Cheng, L. (2008). Washback, impact and consequences. In N. H. Hornberger & E. Shohamy
(Eds.), Encyclopedia of Language and Education, Volume 7: Language Testing and
Assessment (pp. 349‐364). Berlin: Springer.
Davies, A. (2008). Ethics, professionalism, rights and codes. In N. H. Hornberger & E.
Shohamy (Eds.), Encyclopedia of Language and Education, Volume 7: Language
Testing and Assessment (pp. 2555‐2569). Berlin: Springer.
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge
University Press.
Douglas, D. (2010). Understanding language testing. Abingdon: Routledge.
Douglas, D., & Myers, R. (2000). Assessing the communication skills of veterinary students:
Whose criteria? In A. J. Kunnan (Ed.), Fairness and validation in language
assessment: Selected papers from the 19th Language Testing Research Colloquium,
Orlando, Florida (pp. 60‐81). Cambridge: Cambridge University Press.
Elder, C. Exploring the limits of authenticity in LSP testing: The case of a specific purpose
language test for health professionals. Language Testing.
Elder, C. (2007). OET–IELTS benchmarking study. Report to the OET Centre. Melbourne: Language Testing Research Centre, University of Melbourne.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Abingdon: Routledge.
Harding, L. (2014). Communicative language testing: Current issues and future research.
Language Assessment Quarterly, 11(2), 186‐197.
Holland, D., & Reeves, J. R. (1994). Activity theory and the view from somewhere: Team
perspectives on the intellectual work of programming. Mind, Culture, and Activity,
1(1‐2), 8‐24.
Jacoby, S., & McNamara, T. (1999). Locating competence. English for Specific Purposes,
18(3), 213‐241. doi: http://dx.doi.org/10.1016/S0889‐4906(97)00053‐7
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of
Educational Measurement, 50(1), 1‐73.
King, N., & Horrocks, C. (2010). Interviews in qualitative research. London: Sage.
Levinson, W., Lesser, C. S., & Epstein, R. M. (2010). Developing physician communication
skills for patient‐centered care. Health Affairs, 29(7), 1310‐1318.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions.
Language Testing, 17(1), 43‐64.
Lumley, T., & Brown, A. (1996). Specific‐purpose language performance tests: Task and interaction. Australian Review of Applied Linguistics, 13, 105‐136.
Lumley, T., Lynch, B., & McNamara, T. (1994). A new approach to standard‐setting in
language assessment. Melbourne Papers in Language Testing, 3(2), 19‐40.
Macqueen, S., Pill, J., Elder, C., & Knoch, U. (2013). Investigating the test impact of the OET: A qualitative study of stakeholder perceptions of test relevance and efficacy. Melbourne: Language Testing Research Centre, University of Melbourne.
Makoul, G. (2001). Essential elements of communication in medical encounters: the
Kalamazoo consensus statement. Academic Medicine, 76(4), 390‐393.
Manias, E., & McNamara, T. Health professionals' views of professional readiness in the
standard setting context: Tensions between language and communication.
Language Testing.
McCormack, B., & McCance, T. V. (2006). Development of a framework for person‐centred
nursing. Journal of Advanced Nursing, 56(5), 472‐479.
McNamara, T. (1990). Assessing the second language proficiency of health professionals.
(PhD thesis). University of Melbourne, Melbourne.
McNamara, T. (2006). Validity in language testing: The challenge of Sam Messick's legacy.
Language Assessment Quarterly, 3(1), 31‐51.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden:
Blackwell Publishing.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241‐
256.
Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement:
Issues and Practice, 17(2), 6‐12.
O'Neill, T. R., Buckendahl, C. W., Plake, B. S., & Taylor, L. (2007). Recommending a nursing‐
specific passing standard for the IELTS examination. Language Assessment
Quarterly, 4(4), 295‐317.
Oh, H. (2010). Overseas health professionals' perspectives on the OET. (Unpublished master's thesis). University of Melbourne.
Purpura, J. (2008). Assessing communicative language ability: Models and their components.
In N. H. Hornberger & E. Shohamy (Eds.), Encyclopedia of Language and Education,
Volume 7: Language Testing and Assessment (pp. 53‐68). Berlin: Springer.
Read, J., & Wette, R. (2009). Achieving English proficiency for professional registration: The
experience of overseas‐qualified health professionals in the New Zealand context. IELTS Research Reports (Vol. 10).
Shohamy, E. (2004). Assessment in multicultural societies: Applying democratic principles
and practices to language testing. In B. Norton & K. Toohey (Eds.), Critical
pedagogies and language learning (pp. 72‐92). Cambridge: Cambridge University
Press.
Shohamy, E. (2007). Language tests as language policy tools. Assessment in Education, 14(1),
117‐130.
Skelton, J. (2013). English for medical purposes. In C. A. Chapelle (Ed.), The Encyclopedia of
Applied Linguistics (pp. 1‐7). Blackwell Publishing Ltd. doi:
10.1002/9781405198431.wbeal0379
Star, S. L., & Griesemer, J. R. (1989). Institutional ecology, translations and boundary objects:
Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907‐39.
Social Studies of Science, 19(3), 387‐420.
Star, S. L., & Lampland, M. (2009). Reckoning with Standards. In M. Lampland & S. L. Star
(Eds.), Standards and their stories: How quantifying, classifying and formalizing
practices shape everyday life (pp. 3‐24). Ithaca: Cornell University Press.
Woodward‐Kron, R., & Elder, C. Authenticity in an English language screening test for
clinicians: A comparative discourse study. Language Testing.
Young, R. F. (2012). Social dimensions of language testing. In G. Fulcher & F. Davidson (Eds.),
The Routledge handbook of language testing (pp. 178‐193). Abingdon: Routledge.