Exceptional Children Vol. 75, No. 3, pp. 321-340. ©2009 Council for Exceptional Children.

An Examination of the Evidence Base for Function-Based Interventions for Students With Emotional and/or Behavioral Disorders Attending Middle and High Schools

KATHLEEN LYNNE LANE
JEMMA ROBERTSON KALBERG
JENNA COURTNEY SHEPCARO
Peabody College of Vanderbilt University

ABSTRACT: The authors field-tested the core quality indicators and standards for evidence-based practices for single-case design studies developed by Horner and colleagues (2005) by applying them to the literature exploring functional assessment-based interventions conducted with secondary-age students with emotional and/or behavioral disorders (EBD). First, we evaluated this knowledge base by applying the indicators to determine if the studies identified (n = 12) were of acceptable methodological quality. Second, we analyzed studies meeting the recommended quality indicators to determine whether function-based interventions with students with EBD might be considered an evidence-based practice. Results reveal that only 1 study addressed all proposed quality indicators, suggesting that function-based interventions are not yet an evidence-based practice for this population per these indicators and standards. Limitations and recommendations are posed.

Students with emotional and/or behavioral disorders (EBD) represent between 2% and 20% of the school-age population and are among the most challenging students to teach (Walker, Ramsey, & Gresham, 2004). By definition, students with EBD have behavioral, social, and academic deficits that pose challenges within and beyond the school setting (Kauffman, 2005). For example, they have impaired social skills that strain relationships with teachers and peers (Gresham, 2002). In addition, students with EBD have broad academic deficits that, at best, remain stable over time (Nelson, Benner, Lane, & Smith, 2004).
Unfortunately, outcomes do not improve when students with EBD leave the school setting, as evidenced by employment difficulties, contact with the juvenile justice system, limited community involvement, and high rates of access to mental health services (Bullis & Yovanoff, 2006). During the past 30 years, schools have responded with a range of interventions to support these youngsters, including schoolwide primary prevention efforts (e.g., antibullying programs); secondary prevention efforts (e.g., small group instruction in conflict resolution skills); and tertiary prevention efforts (e.g., individualized intervention efforts; Horner & Sugai, 2000). One tertiary intervention effort that has met with demonstrated success, particularly with elementary-age students with EBD, is the use of function-based interventions (Conroy, Dunlap, Clarke, & Alter, 2005; Kern, Hilt, & Gresham, 2004; Lane, Umbreit, & Beebe-Frankenberger, 1999). Function-based interventions are interventions designed around the reasons why problem behaviors occur (Umbreit, Ferro, Liaupsin, & Lane, 2007). The function of a given behavior is identified through a functional behavioral assessment. In brief, descriptive (e.g., interviews, direct observations of behavior, rating scales) and experimental (e.g., functional analysis) procedures are used to identify the antecedent conditions that prompt a target behavior (e.g., disruption) to occur and the consequences that maintain the behavior. These data are used to generate a hypothesis statement regarding the function of the behavior. In general, all behaviors occur either to obtain (positive reinforcement) or to avoid (negative reinforcement) attention; activities or tasks; or tangible or sensory conditions (Umbreit et al., 2007). Often, the hypothesis statement is tested by systematically manipulating environmental conditions to identify or confirm maintaining consequences.
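The classification described above (obtain or avoid; attention, activities or tasks, or tangible or sensory conditions) yields six possible functions, which can be arranged as a 2 × 3 grid. The Python sketch below is purely illustrative: the class names, fields, and sentence template are hypothetical and are not taken from Umbreit et al. (2007) or any published instrument. It shows how a hypothesis statement links an antecedent and a target behavior to one cell of that grid.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Whether the behavior lets the student obtain or avoid something."""
    OBTAIN = "positive reinforcement"
    AVOID = "negative reinforcement"


class Source(Enum):
    """The three broad categories of maintaining conditions named in the text."""
    ATTENTION = "attention"
    ACTIVITY_OR_TASK = "activities or tasks"
    TANGIBLE_OR_SENSORY = "tangible or sensory conditions"


@dataclass(frozen=True)
class HypothesisStatement:
    """Hypothetical record of a functional assessment result."""
    antecedent: str        # condition that prompts the target behavior
    target_behavior: str   # e.g., disruption
    direction: Direction   # obtain vs. avoid
    source: Source         # what is obtained or avoided

    def summary(self) -> str:
        verb = "obtain" if self.direction is Direction.OBTAIN else "avoid"
        return (f"When {self.antecedent}, the student engages in "
                f"{self.target_behavior} to {verb} {self.source.value} "
                f"({self.direction.value}).")


# Every (direction, source) pair is one cell of the 2 x 3 grid: six functions.
FUNCTION_MATRIX = [(d, s) for d in Direction for s in Source]

if __name__ == "__main__":
    example = HypothesisStatement(
        antecedent="a too-difficult task is presented",
        target_behavior="disruption",
        direction=Direction.AVOID,
        source=Source.ACTIVITY_OR_TASK,
    )
    print(len(FUNCTION_MATRIX))
    print(example.summary())
```

In this sketch the hypothesis is only a record; in practice, as the text notes, it is often tested by manipulating environmental conditions before an intervention is built on it.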
Next, an intervention is designed based on the function of the target behavior, with the goal of teaching the student a more reliable, efficient method of meeting his or her objective (e.g., escaping a task that is too difficult or too easy; Umbreit, Lane, & Dejud, 2004). This is done by constructing an intervention that (a) adjusts the antecedent conditions that prompt the problem behavior, (b) increases reinforcement rates for the replacement behavior, and (c) extinguishes reinforcement for the target behavior. Functional assessment procedures were originally developed in clinical settings with individuals with developmental disabilities (Iwata, Dorsey, Slifer, Bauman, & Richman, 1982). Since that time, functional assessment-based interventions have been used to shape a variety of behaviors in a range of educational settings (e.g., general education classes, self-contained classrooms, self-contained schools), with students with a range of conditions, including severe disabilities (Sasso, Reimers, Cooper, & Wacker, 1992); attention deficit disorders and behavioral concerns (Ervin, DuPaul, Kern, & Friman, 1998); and emotional and/or behavioral problems (Kern, Childs, Dunlap, Clarke, & Falk, 1994; Kern, Delaney, Clarke, Dunlap, & Childs, 2001). In fact, functional behavioral assessments have been endorsed by the National Association of School Psychologists, the National Association of State Directors of Education, and the National Institutes of Health, and are mandated in the Individuals With Disabilities Education Act (IDEA; first in 1997 and again in 2004) when certain disciplinary circumstances occur (Kern et al., 2004).
Namely, school personnel must conduct a functional behavioral assessment when (a) a student is placed in an alternative placement for behavior deemed to be dangerous to self or others; (b) a student is placed in an alternative setting for 45 days due to drug or weapons violations; or (c) a student's suspension or alternative setting placement extends beyond 10 days or constitutes a change in placement (Drasgow & Yell, 2001). Given the behaviors typical of students with EBD, many of these students may require function-based interventions. Yet several researchers contend that such a mandate may not be entirely appropriate (e.g., Fox, Conroy, & Heckaman, 1998; Gresham, 2004; Kern et al., 2004; Quinn et al., 2001; Sasso, Conroy, Stichter, & Fox, 2001). Specifically, there are concerns of a generalization error, in the sense that existing functional assessment procedures, which were originally developed for persons with developmental disabilities, have not been validated for use with students with EBD (Fox et al.; Kern et al., 2004; Sasso et al., 2001). At best, there is a modest body of literature exploring the effectiveness of function-based interventions for students with EBD, with most of the studies conducted in elementary grades (Lane et al., 1999; Quinn et al.). In the reviews of function-based interventions conducted with students with and at risk for EBD, the populations have been predominantly male, with limited inquiry with secondary-age students (Conroy et al., 2005; Kern et al., 2004; Lane et al., 1999; Sasso et al., 2001).

Therefore, questions arise as to the efficacy of function-based interventions, particularly for older school-age students. Designing, implementing, and evaluating function-based interventions with this population may prove to be a highly formidable task given the increased importance of the peer group (Morrison, Robertson, Laurie, & Kelly, 2002); topographical changes in discipline problems (e.g., covert acts of aggression, internalizing behaviors; Loeber, Green, Lahey, Frick, & McBurnett, 2000; Morris, Shah, & Morris, 2002); and difficulties in identifying meaningful reinforcers that can compete with the reinforcing value of the undesired target behavior (e.g., truancy). Thus, the question arises: Are functional assessment-based interventions an evidence-based practice for secondary-age students with EBD?

Answers to questions about intervention efficacy have become more complex as researchers have sought to define what constitutes evidence-based practices. Gersten et al. (2005) and Horner et al. (2005) introduced criteria for determining whether a practice is evidence based using group design and single-case experimental investigations, respectively. These research teams developed quality indicators for group design and single-case design inquiry that can be used to determine the extent to which a given study meets requisite criteria, thereby establishing the study as reputable and appropriate. Further, each team offered guidelines for evaluating bodies of reputable studies that meet the quality indicators to determine if the practice is evidence based.

The goal of this review was to field-test these quality indicators by applying them to the body of literature exploring functional assessment-based interventions conducted with secondary-age students with EBD. Specifically, the intent was threefold. First, given that this body of literature focused on single-case methodology, we evaluated this knowledge base by field-testing the quality indicators posed by Horner et al. (2005) to determine if the studies identified in a systematic literature review met the recommended quality indicators. Second, we analyzed studies that met the recommended quality indicators to determine whether function-based interventions with secondary-age students with EBD are an evidence-based practice according to Horner et al.'s proposed standards. Third, we discussed the extent to which the quality indicators represent reasonable standards and offered considerations for future application and evaluation of the quality indicators.

METHOD

ARTICLE SELECTION PROCEDURES

We conducted a systematic search of psychology and educational databases (PsycINFO and Educational Resources Information Center, ERIC) to identify function-based intervention studies conducted with secondary-age students with or at risk for EBD. Search terms included all possible combinations and derivatives of the following sets of terms: (a) functional assessment, functional analysis, assessment based, intervention, and procedures; and (b) seriously emotionally disturbed, emotional and/or behavioral disorders, at risk, and problem behavior (Lane et al., 1999). The title and abstract of each article from the electronic search were evaluated to determine if the article should be read in its entirety to evaluate inclusion eligibility. Next, a master list of journals that published the included studies was created. We conducted hand searches of those journals that published two or more of the articles from 1980 to the present to gather any other articles that met inclusion criteria. Searches were conducted in the following journals: Behavioral Disorders, Education and Treatment of Children, Journal of Applied Behavior Analysis, Journal of Emotional and Behavioral Disorders, Journal of Positive Behavior Interventions, and School Psychology Review.
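The search strategy pairs every term in set (a) with every term in set (b). A minimal Python sketch of that pairing follows, assuming a simple quoted AND query; the actual PsycINFO and ERIC query syntax, and the "derivatives" of each term (truncations, spelling variants), are not specified in the text and are not reproduced here.

```python
from itertools import product

# Term sets (a) and (b) as listed in the Article Selection Procedures.
FUNCTIONAL_TERMS = ["functional assessment", "functional analysis",
                    "assessment based", "intervention", "procedures"]
POPULATION_TERMS = ["seriously emotionally disturbed",
                    "emotional and/or behavioral disorders",
                    "at risk", "problem behavior"]


def build_queries(set_a, set_b):
    """Pair every term in set_a with every term in set_b.

    The quoted-phrase AND format is an assumption for illustration only.
    """
    return [f'"{a}" AND "{b}"' for a, b in product(set_a, set_b)]


queries = build_queries(FUNCTIONAL_TERMS, POPULATION_TERMS)

if __name__ == "__main__":
    print(len(queries))
    print(queries[0])
```

With five terms in set (a) and four in set (b), the pairing yields 20 base queries before any derivatives are added.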
Finally, we compared our search results with other reviews of function-based interventions (e.g., Dunlap & Childs, 1996; Heckaman, Conroy, Fox, & Chait, 2000; Lane et al., 1999). Thirty-three articles, all of which employed single-subject designs, were identified as appropriate for further review using the procedures previously stated. Each article was read in its entirety to determine if it met the following inclusion criteria.

INCLUSION CRITERIA

The intent of this review was to evaluate the extent to which function-based interventions conducted with secondary-age students with or at risk for EBD met the recommended indicators and to determine if function-based interventions are an evidence-based practice for this population. Studies were included in this review only if (a) the participants were diagnosed with or were at risk for EBD, (b) the participants were educated in a secondary school setting, (c) an intervention derived from a functional assessment was implemented and evaluated using single-case methodology, (d) intervention results included a graphic display of student outcomes, and (e) the study was published in a refereed journal.

Participants included in the studies had to be adolescents, defined as students ages 13 to 18, with or at risk for EBD. This group included students with
• EBD, an inclusive term to describe students with behavioral concerns.
• EBD and another disability specified in IDEA (e.g., learning disability, other health impairment, speech and language disorder), except for students with a dual diagnosis of moderate mental retardation or developmental disabilities (e.g., Cole, Davenport, Bambara, & Ager, 1997; O'Reilly et al., 2002), as these students typically participate in a functional skills curriculum rather than the traditional core curricula (Heckaman et al., 2000; Lane et al., 1999).
• A label of emotional disturbance (ED), as specified by IDEA (2004).
• Psychiatric diagnoses specified in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR; American Psychiatric Association, 2001), such as conduct disorder (CD) or oppositional defiant disorder (ODD).
• A general behavioral concern (e.g., noncompliance) together with attention deficit/hyperactivity disorder; students in this group have attention and behavioral concerns that place them at heightened risk for behavior disorders.
• A psychiatric (e.g., ODD, CD) or educational (ED) diagnosis that co-occurred with an attention disorder (e.g., Ervin et al., 1998; Lane et al., 1999).

Second, all function-based interventions needed to take place in a secondary school setting, inclusive of middle, junior high, and high schools. If the study reported function-based interventions implemented at multiple school levels (e.g., elementary and middle), only results of the investigation taking place at the secondary school were included (e.g., DePaepe, Shores, Jack, & Denny, 1996; Gunter, Jack, Shores, Carrell, & Flowers, 1993; Stage et al., 2006). Interventions implemented in clinics, day treatment centers, diagnostic centers, or residential day treatment centers (e.g., Platt, Harris, & Clements, 1980) were excluded, as the purpose of this review was to examine school-based interventions conducted in secondary schools. If the school level was not stated, the article was excluded unless the student was 13 years or older, because there was very limited possibility that a 13-year-old would still be in elementary school.

Third, a functional assessment had to be conducted, yielding a hypothesis regarding the reason why the target behavior occurred. The functional assessment procedures, whether descriptive (e.g., interview, behavior rating scales, direct observation) or experimental (e.g., functional analysis), must have been delineated.
Consistent witb other review articles, at least one of the preceding functional assessment procedures must have been employed In the methodological procedures and a hypothesis statement generated from the functional assessment results (Heckaman et al., 2000; Lane et al., 1999). Further, the article needed to include an intervention based on functional as- Spring 2009 sessment results (Heckaman et al.) and evaluated using single-case mechodology. Articles thac included only functional assessment resulcs, funccional analyses that did not lead eo sustained interventions, or ehose with inecrvcntiotis noe based on functional assessment results were excluded (e.g., DePaepc ee al., 1996; Ervin ec al., 2000). Fourth, chc scudies must have reported a graphic display of student outcomes for individual students. Studies reporting only narrative outcomes (e.g., Sterling-Turner, Robinson, & Wilczynski, 2001) w^ere excluded. We viewed this visual display as essential to evaluate the accuracy of creatment-outcome resulcs and the analytical tools (e.g., stability, level, trend) employed. Further, studies reporting graphic display of group outcomes (e.g.. Center, Deitz, & Kaufman, 1982) were excluded as they did noc allow for inspection of individual outcomes. Finally, only articles published in peer-reviewed journals were included in chis review. Dissercacions, book chapcers, and monographs were excluded because our goal was to draw conclusions based on información thac had wiehseood ehe peer review process. Of the 33 arricies identified in the initial search, 12 areicles mee ehe inclusion criteria as determined by all ehree auehors. These anieles were coded independently by the first and third authors as described in the following section. CODING PROCEDURES FOR QUALITY INDICATORS ticipant selection, and (c) secting description. To meet the first componenc, more ehan a general definition (e.g., ERD) was required. 
Participancs had to be described in sufficient detail chat included (a) the specific disability as well as (b) the method used Co determine the disability. Participant selección criteria needed to be defined precisely enough to allow replication (e.g., quantifiable daca co indicate rcpiicacion selection criceria). Setting description required a description of the physical setcing that also included sufficienc details (e.g., number of adults presenc, room arrangemenc) that allowed others to recruit similar participancs from similar sectings. The coding reliability was as follows: participant description 83.33%; participant selection 100%; and setting description 91.67%. Dependent Variable. Horner et al. (2005) identified five components to determine the quality of the dependent variables. First, the description of each dependenc variable had to be operationally defined. If more than one dependent variable was reported, and both variables were noc defined precisely, chen chis componenc was considered absent. Second, each dependent variable needed to be measured using a procedure chac produced a quantifiable index such as the frequency of a given behavior per minute. Third, che measuremenc of the dependent variable needed co be valid and described with sufficient precision to allow for replication (e.g., appropriate system of measure dependenc on the nature of che cargec behavior ¡whole interval for variables such as engagement, partial interval for variables such as disruption] wich details of the data collection procedures provided). Articles that mec ehe inclusion criceria were read in cheir entirety by all chree auehors and coded by Fourth, the dependent variable needed to be the first and third authors. Each article was coded measured repeatedly over time. We further dealong the 21 components conscicuting the seven fined chis component to require a minimum of 3 qualicy indicators specified by Horner et al. data points per condition (Kennedy, 2005). 
As (2005; see Table 1): (a) describing participancs mentioned by Kennedy, 3 data points per phase is and seccings; (b) dependenc variable; (c) indepen- an acceptable standard among researchers emdenc variable; (d) baseline; (e) experimental con- ploying single-case mechodology. Fifth, data crol/incernal validicy; (0 excernal validity; and (g) needed co be reported regarding the reliability or social validity. Specifically, each component was incerobserver agreement (IOA) of each dependent evaluated as being present or absent according co variable. Further, Horner et al. indicated chat IOA levels had to meet the following minimum che guidelines in che seccions that follow. Describing Participants and Setting. Per Hor-scandards: IOA = 80% and Kappa = 60%. Bener et al. (2005), chis indicacor contained chree cause some articles reported ranges as well as componencs: (a) parcicipant description, (b) par- means, we further defined this component to reExceptional Children 325 oo«~i O O r n o o o o o —sCStNfNrNfN mm"^ --^PHr^rn o o ^-.lAirv o c o O O O q (S 0 do d 2 Z .—.. SSS oSSSSS oSSS oSS ^SSS 8 ^ S ^ Z -m O fi n O (N rñ 0 o o rn o o o 'M n fN co o rs o « n o ri rs o o o o ^ o o o o o o ^0 ifî o o O .—. 0 r, ¿H Z 2 " o ;¿í Z 00) rs o o o 2 Yes Yes Yes rs o o O o o « SI 0 .>- . > • h-' >! 2 2 o o ^^^^ o o o o o o r^ Ip o r-j O \r< o >- Z o o o o o o o o m o o o o O o m o o f S o tN o .o —. o o o o .—. rn o m .—, o o o o o o o 3 o O o ,—. o o o o o 3 o 3 s o ^_ o d o 0 0 o o 3 0 o 3 o ¿ „ 0 S 0 2 2 Z 2 o o z 0 z z o z z z o Z 2 Z 2 z z i^ u^ o o ,—. iN (N O o o Ö Ö Ö d o <Ji 'y¡ Q 0 o ^ ^ Z 2 z a • .2 '^ .y S^ ^ a. ^ 1 >. E IJ .S a. c -îi -- 2 F Abi 3 O •5 111 (N O .—. o o o o o 'n o o r' o o 'f-l" J—, n m „ No(0. 33) 33) Yes Yes Yes Yes o o m ,^ m Xo o o C 3 Yes( o^ .—. • O M Yes Yes Yes Yes(l. Yes Yes Yes Yes Yes SI -*" -*• Z N' 00 (N O o o (0. (0. 
0 o ''- •1) ^ (0 fn o 'm" o o m .—, ( S (N o 1-1 o O o o o o o o {0 o .—- 20) 20) 20) 20) o o o o o Yes o o o o o Z fc--t • o 00) 00) 33) 7* 33) 33) 33) z E S .2 £• ^ at.§ 1 bb 1 .0 i i E .eu L- Q u E .- U a Ee 2 2 Spring 2009 O O O O O .^pfNpfNp ifl wl Q 33) 33) 33) 20) 20) 20) 20) 20) 33) o o o o o o o o o o o o Z 3 0 7 Z o o o c o r-4 p p f^l d d d d d i^ Jj Ji rN fN <77* .,—i ^ O "^ o o o o 2 f^ (N d y o o o 2 /^ ,-l 3 0 ^ 3 0 0 0 Z íN O o 00) 00) 00) O ^ Z 2 o No(0. o Yes 33) 33) 33) o o Yes Yes Yes 20) o o Yes Yes Yes 33) 00} O o o (00 > • ? go o (00 (00 o o (or (or (00 o o O o z £ OT o >- 2 2 >- (00 o o o o (00 o m o ? , 3 P. o o o o o o o o o o o o 2 Z í^ 25) í3 00) 00) "5 o O Z No(0. Yes o Z ¿¿ Z No(0. pfnp 00) O f T i O 3d353d3d3 o p £ £ o o 0 0 0 0 2 2 :3 o-g u _• ÖD n S '^ O •£ 0 2 ,0.—, 0 0 2 ¿ 30 2 ^..^ 0 co 3 0 0 fN 0 0 fN 0 fN Yes (0. Yes Yes Yes (0. ^1 3 d 0 0 Yes 0 ^—. 0 3 d 2 0 d d d .0 ^fdN 0 0 fN 0 d 0 fN fN z .3 ^ .^ jo ^ 1^ í^ ^^ r<^ rCi O 0" 0 í¿ d d 3 d 0 c fN d d 0 d d 0 ?^ ^ * ?^ O > >H Z 0 -^ PO m rri ^—V < ^ - í ^ f ^ 00 -—^ ^ / ^ V ^ ^ - > g o o o g o o g £ ££ fN p rC d d 0 d "o" "o" Q z z Z 0 fN >- >- z fS H (N r-J (N d d d Ö o uh •5 3 •3 a f- & S -a g Exceptional Children 327 quire only the means to be at or above these criteria. Namely, the component was considered present if the mean met criteria, regardless if the range of scores reported included values below the minimum criteria. If the article reported IOA and Kappa values (e.g., March & Horner, 2002), then both minimum criteria had to be met to be considered present. Finally, it was necessary for IOA or Kappa values to be presented for each measure and each phase. The coding reliability was as follows; depetident variable description 100%; quantifiable measurement 100%; valid and welldefined measurement 100%; measured repeatedly 100%; and IOA 91.67%. tic. 
For this component, we established a minimum of 3 data points rathet than 5, reasoning that 5 may be an unnecessarily high number. As Kennedy (2005) stated, "[A] baseline needs to be as long as necessary but no longer. The goal of baseline is to establish patterns of behavior to compare to intervention. Therefore, a baseline needs only be long enough to adequately sample this pattern" (p. 38). The second component necessary to establisb the quality of baseline was that baseline conditions be described with sufFicient detail to allow for replication. We clarified this indicator by establishing that the baseline description needed to include information on "who did Independent Variable. Horner et al. (2005) what to whom, where were those actions taken, delineated three components as being necessary and when did those actions occur" (Lane, Worley, for single-subject studies to meet the independent Reichow, & Rogers, 2006, p. 226). The coding variable quality indicator. First, the independent reliability was as Follows: repeated measurevariable needed to be described precisely to allow ment/established pattern 100% and description for replication. This included documentation of 100%. required materials and explicit reporting of speExperimental Control/Internal Validity. Horcific procedures. General descriptions (e.g., token economy) were considered insufficient to meet ner et al. (2005) established experimental conthe expectation. Second, the independent variable trol/internal validity as being evident when needed to be systematically manipulated by the the design documents three demonstrations intervention agent (e.g., teacher, paraprofesof the experimental effect at three different sional). Third, fidelity of implementation was points in time with 3 single participant considered "highly desirable" (Horner et al., p. (with in-subject replication), or across differ174). Horner et al. defined this as continuous dient participants (inter-subject replication). 
rect measurement of the independent variable or An experimental efFect is demonstrated when a parallel form of assessment. To further define predicted change in the dependent variable this component, we added that fidelity needed to covaries with manipulation of the indepenbe both measured explicitly and the data reported dent variable, (p. 168) (Lane et al., 1999). The coding reliability was as They indicated that three c o m p o n e n t s follows: independent variable description 100%; needed to be addressed to establish experimental systematically manipulated 100%; and fidelity of control. First, the design must include at least implementation 91.67%. three demonstrations of experimental effect at Baseline. Horner et al. (2005) indicated that three different time points. In instances in which baseline conditions needed to include repeated measurement, with an established pattern of re- fewer than three demonstrations were docusponding that could be used to anticipate or pre- mented in a given experiment, this component dict future behavior in the absence of an was considered absent. Second, the design needed intervention. They reported that baseline phases to control for common threats to internal validity. should include multiple data points. Specifically, In addition to requiring established designs that Horner et al. stated the following: "five or more, met this criteria (e.g., ABAB; BABA; changing although fewer data points are acceptable in spe- criterion, multiple baseline with three legs, altercific cases" (Horner et al., p. 168). In addition, nating treatment), we also required that treatment they required that either (a) a trend in the pre- integrity be assessed and reported given that the dicted direction of the intervention efFect not be absence of treatment integrity poses a severe present or (b) that the trend be countertherapeu- threat to internal validity (Gresham, 1989). 328 Spring 2009 Finally, Horner et al. 
(2005) required a pattern of responding that documented experimental control. They indicated that visual analysis techniques, which involve interpretation of level, trend, and variability of performance during each phase as well as other techniques (e.g., immediacy of effects, magnitude of change, percentage of overlapping data points, and consistency of data patterns) shotild be used to determine if this component was met. For our coding procedures, we determined that authors did not need to discuss each element {level, trend, variability) in text. If a graph with individual student-level data was displayed and the reader could examine level, trend, and variability and the graph suggested a functional relation between the introduction oí the independent variable and corresponding changes in the dependent variables, the componenr was coded as present. 1 he coding reliability was as follows: three demonstrations of experimental effect 100%; internal validity 91.67%; and pattern of results 100%. External Validity. Horner et al. (2005) recommended documenting external validity by replicating experimental effects across participants, settings, behaviors, or materials. Consistent with Tankersley, Cook, and Cook (in press), we interpreted this quality indicator to require replication across one of the following: participants, setting, behavior, or materials. To further clarify criteria tor external validity, we required studies to (a) include three replications in one of those categories as recommended by Horner et al., and (b) meet all three previously stated criteria for experimental control/internal validity to be considered as possibly having external validity given that internal validity is essential to establishing external validity (Wolery, 2007 personal communication). The coding reliability was as follows: 100% external validity. vention produced an efFect that met the defmed, clinical need" (Horner et al., p. 172). 
Social Validity. Horner et al. (2005) identified social validity as the final quality indicator. It referred to the social significance of the goals, the social acceptability of the treatment procedures, and the social importance of the effects (Baer, Wolf, & Risley, 1968). They identified four components for this indicator. First, the dependent variables needed to be socially valid. Second, the change in the dependent variable had to be socially important, defined as a "demonstration that the inter[...]" We further defined this component as being present if (a) there was a measure of social validity and the evidence from that measure reported a socially meaningful change in the desired direction or (b) a functional relation was evident between the introduction of the independent variable and change in the target behavior (e.g., reduction in aggression). Third, the independent variable was practical and cost effective. We clarified this component by stating that cost effectiveness must be stated explicitly. Practicality was defined as a study conducted in a typical setting with traditional intervention agents and materials typically found in the identified setting. Finally, use in typical contexts was defined as the following:

demonstration that typical intervention agents (a) report procedures to be acceptable, (b) report the procedures to be feasible within available resources, (c) report the procedure to be effective, and (d) choose to continue use of the intervention procedures after formal support/expectation of use is removed. (Horner et al., 2005, p. 172)

We coded this component as present if any one of these four practices was reported. Coding reliability was as follows:
- social importance of the dependent variable: 100%
- change in dependent variable is socially important: 75%
- independent variable is practical and cost effective: 100%
- used in typical contexts: 100%

An overarching framework when coding the quality indicators was to evaluate the studies based on what the researchers reported, either in text or in visual display, and not on our interpretations. We modified the components constituting the quality indicators for two reasons: (a) to refine definitions and increase consistency across raters and (b) to make the criteria more transparent for the reader.

EVALUATION PROCEDURES FOR DETERMINING EVIDENCE-BASED PRACTICE USING SINGLE-SUBJECT RESEARCH

We then applied the five standards for an evidence-based practice proposed by Horner et al. (2005) to the body of literature examining the effectiveness of function-based interventions conducted with secondary-age students with and at risk for EBD. The goal was to determine whether single-subject research studies document this practice as evidence based. The five standards necessary to document a practice as evidence based included the following:

1. The practice was defined operationally in text.
2. The authors defined the context and outcomes associated with the practice.
3. The practice was implemented with fidelity.
4. Findings document the introduction of the practice as functionally related to change in the dependent variables.
5. Experimental effects are replicated across a sufficient number of peer-reviewed studies (n = 5) published in refereed journals, across three different researchers at three different geographical locales, and include at least 20 participants from five or more studies.

RESULTS

In the results section, we address the first two purposes of this review by answering the following questions: To what extent do the studies identified for inclusion meet the quality indicators posed by Horner et al. (2005)? To what extent do the studies addressing the quality indicators support function-based interventions for secondary-age students with EBD as an evidence-based practice?

STUDIES OF FUNCTION-BASED INTERVENTIONS FOR SECONDARY-AGE STUDENTS WITH EBD: FINDINGS OF A FIELD TEST OF QUALITY INDICATORS

Quality Indicator 1: Describing Participants and Setting. Only 1 of the 12 studies reviewed met all three components constituting this quality indicator: participant description, participant selection criteria, and setting description (Smith & Sugai, 2000; see Table 1). Three studies addressed two of the three components by describing the participants and setting precisely enough to facilitate replication (Ervin et al., 1998; Hoff, Ervin, & Friman, 2005; Liaupsin, Umbreit, Ferro, Urso, & Upreti, 2006). These studies thoroughly described the students' disabilities or conditions and the procedures used to determine those disabilities. However, they did not describe the process used to select participants with replicable precision. Of the 8 remaining studies, all but 1 (Schloss, Kane, & Miller, 1981) met at least one component constituting this quality indicator. Four studies reported the critical features of the setting, but they were not precise enough in describing the participants or the participant selection process to meet those components (Gunter et al., 1993; Knapczyk, 1988, 1992; Penno, Frank, & Wacker, 2000). In contrast, the remaining three studies provided detailed descriptions of the participant selection process, but did not provide enough detail about the participants or setting to allow for replication (Ingram, Lewis-Palmer, & Sugai, 2005; March & Horner, 2002; Stage et al., 2006). In some instances, the description of the participant selection process was particularly detailed. For example, March and Horner stated the following:

Three participants were selected based on (a) no decrease in their rate of discipline contacts following involvement with the BEP program, (b) documentation of at least five office discipline referrals during the first 4 months of the new academic year, (c) nomination by BEP team members, (d) student assent and parent consent. (p. 162)

Setting was the most frequently addressed component, with 8 out of 12 studies meeting the coding criteria. In contrast, only 4 studies described participants in sufficient detail to afford replication (Ervin et al., 1998; Hoff et al., 2005; Liaupsin et al., 2006; Smith & Sugai, 2000). Similarly, 4 studies described participant selection criteria with replicable precision (Ingram et al., 2005; March & Horner, 2002; Smith & Sugai; Stage et al., 2006).

Quality Indicator 2: Dependent Variables. Five studies met the quality indicator for dependent variables. They addressed all five components constituting this indicator: description, quantifiable measurement, valid and well-described measurement, repeated measurement, and IOA (Gunter et al., 1993; Knapczyk, 1992; Liaupsin et al., 2006; Penno et al., 2000; Smith & Sugai, 2000). The remaining studies fell short as follows:
- Four studies met coding criteria for all but one component (Ervin et al., 1998; Hoff et al., 2005; Knapczyk, 1988; March & Horner, 2002).
- One study met criteria for three components (Ingram et al., 2005).
- Two studies met criteria for two components, quantifiable measurement and repeated measurement (Schloss et al., 1981; Stage et al., 2006).

In all studies, each dependent variable was measured in a manner that produced a quantifiable index (e.g., percentage of intervals on-task; Ervin et al., 1998). All but two studies (Schloss et al., 1981; Stage et al., 2006) operationally defined all dependent variables. In the latter study, all behavior codes were stated, but not all terms were operationally defined.
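The IOA component of this indicator can be made concrete with a small sketch. The review's criterion was IOA > 80% and Kappa > 60%, checked for each dependent variable individually rather than as an overall mean.

The Python below is a hypothetical illustration, not the coders' actual procedure. It rests on these assumptions:
- Two observers scored the same intervals as 1 (behavior occurred) or 0 (did not).
- The thresholds are treated as minimums.
- Cohen's kappa is used as the chance-corrected statistic.
- The function names and example records are invented.

```python
from typing import Dict, Sequence, Tuple

# Thresholds from the review's IOA component (IOA > 80%; Kappa > 60%).
# Applying them as minimums (>=) is an assumption made for this sketch.
IOA_MIN = 0.80
KAPPA_MIN = 0.60


def percent_agreement(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Interval-by-interval agreement between two observers' 0/1 records."""
    if len(obs1) != len(obs2) or len(obs1) == 0:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs1, obs2))
    return agreements / len(obs1)


def cohens_kappa(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(obs1, obs2)
    n = len(obs1)
    p1 = sum(obs1) / n  # share of intervals observer 1 scored as 1
    p2 = sum(obs2) / n  # share of intervals observer 2 scored as 1
    p_e = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def ioa_component_met(records: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> bool:
    """Met only if every dependent variable clears both thresholds on its own."""
    return all(
        percent_agreement(a, b) >= IOA_MIN and cohens_kappa(a, b) >= KAPPA_MIN
        for a, b in records.values()
    )


if __name__ == "__main__":
    # Invented records for one dependent variable, 10 intervals each.
    on_task = ([1] * 9 + [0], [1] * 10)
    print(percent_agreement(*on_task))              # 0.9
    print(cohens_kappa(*on_task))                   # 0.0
    print(ioa_component_met({"on-task": on_task}))  # False
```

With these invented records, the observers agree on 9 of 10 intervals. Kappa is nevertheless 0.0, because nearly every interval was scored 1 and agreement is no better than chance. This is the same pattern as a study that meets the IOA criterion but not the kappa criterion.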
Most studies (n = 9) reported a valid and well-described measurement system. For example, Liaupsin et al. (2006) described data collection of on-task behavior as follows: "30-s whole interval recording procedure. Observations were 20 min in length and began 5 to 10 min after the assignment of independent class work or reading" (p. 584). In addition, nine studies measured the dependent variables repeatedly over time according to coding criteria (minimum of 3 data points per phase). When this component was not met, there were typically fewer than 3 data points in a phase. For example, in the Ervin et al. (1998) study, one of the students, Joey, had just 1 datum point in the return to baseline phase. Finally, the criteria for the IOA component (IOA > 80%; Kappa > 60%) were met in eight studies. In some cases, however, IOA was reported as an overall mean but not for each dependent variable individually (e.g., Ingram et al., 2005; Stage et al., 2006). In other cases, the IOA criterion was met, yet the criterion for Kappa was not (e.g., March & Horner, 2002).

Quality Indicator 3: Independent Variable (IV). Six studies met the quality indicator for independent variable. They addressed all three components constituting it: IV description, systematic manipulation, and fidelity of implementation (Ervin et al., 1998; Gunter et al., 1993; Ingram et al., 2005; Liaupsin et al., 2006; Penno et al., 2000; Smith & Sugai, 2000). Three studies met two components, description and systematic manipulation of the independent variable, but did not address implementation fidelity (Hoff et al., 2005; March & Horner, 2002; Stage et al., 2006). The final three studies addressed one of the three components: all three systematically manipulated the independent variable (Knapczyk, 1988, 1992; Schloss et al., 1981).
In all studies (n = 12), the experimenter systematically manipulated the independent variable, and 9 of these studies described the intervention procedures with replicable precision. Six studies measured and reported treatment fidelity. Three of the studies that did not meet expectations for fidelity were published between 1981 and 1992 (Knapczyk, 1988, 1992; Schloss et al., 1981). It should be noted, however, that the literature did not emphasize the importance of treatment integrity until the 1980s, as documented in articles by Yeaton and Sechrest (1981) and Gresham (1989). March and Horner (2002) acknowledged the lack of treatment fidelity data as a limitation, stating "a final limitation lies in the absence of treatment integrity data . . . the only process for documenting fidelity of procedural implementation was the weekly observation and feedback to teachers by the first author" (p. 168). Hoff et al. (2005) stated that "Kevin's teacher implemented all of the intervention strategies" (p. 50), but they did not mention how (or if) they collected fidelity data. Finally, Stage et al. (2006) did monitor treatment fidelity but reported poor fidelity of implementation (e.g., "In Gale's case, there was a complete lack of treatment fidelity within the general education setting," p. 468), so this component was not met.

Quality Indicator 4: Baseline. Seven studies met the quality indicator for baseline by addressing both of its components: repeated measurement with an established pattern, and description of baseline conditions (Gunter et al., 1993; Knapczyk, 1988, 1992; Liaupsin et al., 2006; March & Horner, 2002; Penno et al., 2000; Smith & Sugai, 2000). The remaining five studies met one of the two components. More specifically, three studies met expectations for repeated measurement and an established pattern (Ingram et al., 2005; Schloss et al., 1981; Stage et al., 2006).
The other two studies met expectations for description of baseline conditions (Ervin et al., 1998; Hoff et al., 2005). Ten studies reported a baseline phase with three or more data points, establishing a pattern of responding on the dependent variable that was predictive of future behavior. However, 2 studies included fewer than the requisite number of data points in the return to baseline phase (Ervin et al., 1998; Hoff et al., 2005). Hoff and colleagues acknowledged this as a "brief withdrawal of the intervention and return to baseline" (p. 51). Nine studies met the requisite criteria for describing the baseline condition. The remaining 3 studies did not describe the baseline condition precisely enough for replication (Ingram et al., 2005; Schloss et al., 1981; Stage et al., 2006).

Quality Indicator 5: Experimental Control/Internal Validity. Two studies (Gunter et al., 1993; Smith & Sugai, 2000) met all three components constituting this quality indicator: three demonstrations of experimental effect, internal validity, and pattern of results. Four studies met two components, three demonstrations of experimental effect and pattern of results, but did not establish internal validity (Knapczyk, 1988, 1992; March & Horner, 2002; Schloss et al., 1981). Six studies did not meet any of the components.

In terms of the components, six studies showed experimental effect through at least three demonstrations. These came across participants (e.g., March & Horner, 2002; Schloss et al., 1981), across settings (Knapczyk, 1988, 1992), or via an ABAB design (Gunter et al., 1993; Smith & Sugai, 2000). Under the coding criteria, experimental effect was scored as absent if a phase had too few data points (e.g., Ervin et al., 1998; Hoff et al., 2005; Ingram et al., 2005) or if there were two or fewer demonstrations (e.g., Liaupsin et al., 2006; Penno et al., 2000). Only two studies (Gunter et al.; Smith & Sugai) established internal validity according to the posed criteria. Several studies did not meet this component because treatment integrity data were absent (e.g., Hoff et al.; Knapczyk, 1992; March & Horner; Schloss et al.; Stage et al.). Finally, six studies met the pattern-of-results component, showing a pattern that supported experimental control (Gunter et al.; Knapczyk, 1988, 1992; March & Horner; Schloss et al.; Smith & Sugai). Some studies failed this component because phases had too few data points (e.g., Ervin et al.; Hoff et al.; Ingram et al.), and others because there were too few demonstrations (e.g., Liaupsin et al.; Penno et al.; Stage et al.).

Quality Indicator 6: External Validity. Only one study established external validity according to the coding procedures (Smith & Sugai, 2000). In most studies, external validity was not established because we defined internal validity as a prerequisite for external validity. Namely, a study needed to meet every component of the experimental control/internal validity indicator before external validity could be established. Thus, only two studies (Gunter et al., 1993; Smith & Sugai) had the possibility of meeting this indicator.

Quality Indicator 7: Social Validity. One study met the quality indicator for social validity by addressing all four components (Smith & Sugai, 2000):
- the dependent variable is socially important;
- the change in the dependent variable is socially important;
- the independent variable is practical and cost effective;
- the practice is used in typical contexts.

Seven studies met all components except the third, which required cost-effectiveness to be stated (Ervin et al., 1998; Gunter et al., 1993; Hoff et al., 2005; Ingram et al., 2005; Knapczyk, 1988, 1992; March & Horner, 2002). The remaining four studies met two of the four components. Three of them established the dependent variable as socially important and employed the independent variable in typical contexts (Liaupsin et al., 2006; Penno et al., 2000; Stage et al., 2006). The fourth established the dependent variable as socially important and reported a socially important change (Schloss et al., 1981).

All studies established the dependent variable as socially important, and 11 reported use of the independent variable in typical contexts. Nine established the change in the dependent variable as socially important. Yet only 1 study (Smith & Sugai, 2000) specifically stated that the intervention was both practical and cost effective, reporting that the intervention was "conducted in [an] actual classroom with minimal time or use of additional resources" (p. 215).

FUNCTION-BASED INTERVENTIONS FOR SECONDARY-AGE STUDENTS WITH EBD: DETERMINATION OF AN EVIDENCE-BASED PRACTICE

Only one study (Smith & Sugai, 2000) met all seven quality indicators. Function-based interventions conducted with secondary-age students with and at risk for EBD therefore cannot yet be documented as an evidence-based practice according to Horner et al.'s (2005) standards. As a practice, function-based interventions involve three steps. The first is (a) using descriptive and, in some cases, experimental tools to identify the function of the target behavior. The second is (b) designing an intervention, linked to functional assessment data, that adjusts antecedent conditions and maintaining consequences so the student can acquire a more reliable, more efficient, functionally equivalent behavior. The third is (c) implementing the intervention with fidelity using an experimental design (e.g., multiple baseline, ABAB) that ensures experimental control. However, in the studies reviewed, the number of quality indicators met in their entirety ranged from 0 to 7 (see Table 1).
Moreover, only one study met four indicators (Gunter et al., 1993); two studies met three indicators (Liaupsin et al., 2006; Penno et al., 2000); one study met two indicators (Knapczyk, 1992); and four studies met just one indicator (Ervin et al., 1998; Ingram et al., 2005; Knapczyk, 1988; March & Horner, 2002).

In addition, despite the specified inclusion criteria, the studies still varied in the functional assessment tools employed, student characteristics, and instructional settings. First, all studies met the inclusion criteria of having at least one functional assessment tool, a hypothesis, and an intervention linked to the functional assessment data. Even so, the functional assessment process used to identify the maintaining function of the target behavior varied (see Table 2). Some studies involved both teacher and student interviews (e.g., Ervin et al., 1998; Hoff et al., 2005; Ingram et al., 2005; Liaupsin et al., 2006; March & Horner, 2002; Penno et al., 2000; Smith & Sugai, 2000; Stage et al., 2006), whereas others involved only teacher interviews. Likewise, several studies involved functional analyses of behavior (e.g., Ervin et al.; Hoff et al.; Penno et al.; Stage et al.). Second, the articles reviewed included students with different facets of EBD, as described in the article selection process. Finally, all studies were conducted in school-based settings (e.g., self-contained schools, self-contained classrooms) rather than clinical settings, yet the settings were still heterogeneous. Thus, the studies varied in target population, context, and functional assessment processes. Suppose the results had supported functional assessment-based interventions as an evidence-based practice for adolescents with or at risk for EBD according to the quality indicators posed by Horner et al. (2005). Even then, the practice evaluated may still have varied in its components despite the inclusion criteria specified in this review.

DISCUSSION

Students with EBD pose significant challenges to parents, teachers, and society as a whole (Kauffman, 2005). Function-based interventions are one tertiary-level, idiographic approach employed to meet the multiple needs of this population, particularly for elementary-age students (Lane et al., 1999). However, function-based interventions have not yet been established as an evidence-based practice for secondary-age students with EBD according to the criteria specified by Horner et al. (2005). This is unfortunate given that function-based interventions are mandated per IDEA for students with specific disciplinary circumstances (Kern et al., 2004). In this analysis, a systematic literature review identified 12 studies of function-based interventions conducted with middle and high school students with and at risk for EBD in school settings.

TABLE 2
Functional Assessment Components

Study                                            DO   TI   SI   PI   OI   RS   Rec  FA   HS   Link
Schloss, Kane, & Miller (1981)                   no   yes  yes  yes  no   no   no   no   yes  yes
Knapczyk (1988)                                  yes  yes  no   no   no   no   no   no   yes  yes
Knapczyk (1992)                                  yes  yes  no   yes  no   no   no   no   yes  yes
Gunter, Jack, Shores, Carrell, & Flowers (1993)  yes  no   no   no   no   yes  no   no   yes  yes
Ervin, DuPaul, Kern, & Friman (1998)             yes  yes  yes  no   no   yes  no   yes  yes  yes
Penno, Frank, & Wacker (2000)                    yes  yes  yes  no   no   no   yes  yes  yes  yes
Smith & Sugai (2000)                             yes  yes  yes  no   no   no   yes  no   yes  yes
March & Horner (2002)                            yes  yes  yes  no   no   no   yes  no   yes  yes
Hoff, Ervin, & Friman (2005)                     yes  yes  yes  no   no   yes  no   yes  yes  yes
Ingram, Lewis-Palmer, & Sugai (2005)             yes  yes  yes  no   no   no   no   no   yes  yes
Liaupsin, Umbreit, Ferro, Urso, & Upreti (2006)  yes  yes  yes  no   no   no   yes  no   yes  yes
Stage et al. (2006)                              yes  yes  yes  yes  no   yes  no   yes  yes  yes

Note. DO = direct observations; TI = teacher interview; SI = student interview; PI = parent interview; OI = other interview; RS = rating scales; Rec = record search; FA = functional analysis; HS = hypothesis statement; Link = intervention linked to assessment data.

Applying the core quality indicators for single-subject research identified only one study (Smith & Sugai, 2000) that met all 21 components constituting the seven quality indicators posed by Horner and colleagues (2005). Because only one study met this rigorous set of indicators, too few studies meet the requisite standards for qualifying the practice as "evidence-based" under the criteria set forth by Horner and colleagues. However, we contend that some of these indicators may be too rigorous. In the sections that follow, we (a) illustrate how some of the indicators may exceed reasonable standards and (b) propose a different approach to evaluating a given study against the posed quality indicators.

QUALITY INDICATORS: REASONABLE STANDARDS?

As we coded the articles, we discussed certain components that may be so stringent that they exclude studies that do, in fact, make a meaningful contribution to the knowledge base. Specifically, we believe the following requirements may need to be reconsidered:
- describing participants;
- establishing repeated measurement of the dependent variable;
- repeated measurement and an established pattern for baseline;
- stating cost-effectiveness as a component of the social validity indicator.
Describing Participants. The first component of Quality Indicator 1: Describing Participants and Setting focused on participant description. To meet the criteria for this component, authors needed to report two things. The first was the specific disability or condition. The second was the "specific instrument and process used to determine their disability" (Horner et al., 2005, p. 167). The second requirement may be beyond reasonable at this time. Precision matters for purposes of replication. Even so, it may be more reasonable to require an acceptable determination process (e.g., determination by a multidisciplinary team) rather than specific instruments. This is particularly true given what information is available in cumulative files and the space limitations of publication.

Repeated Measurement. The fourth component of Quality Indicator 2: Dependent Variable required that dependent variables be measured repeatedly over time. Quality Indicator 4: Baseline called for 5 data points in baseline, which we reduced to a minimum of 3 data points. Yet, according to Kennedy (2005), the "goal of baseline is to establish patterns of behavior to compare to intervention. Therefore, a baseline needs only be long enough to adequately sample this pattern" (p. 38). Consider the study by Ervin et al. (1998), which collected fewer than 3 data points during the reversal phases. One could argue that the dramatic change in level made additional data points in the return to baseline phase unnecessary. However, for purposes of this review, all articles with fewer than 3 data points per phase were coded as not meeting the repeated measurement component.
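The minimum-data-points rule, and its knock-on effect on the experimental effect component, can be written as a simple check. The sketch below is a hypothetical simplification, not the authors' coding instrument, and the function names are invented. It models only a single-participant reversal (ABAB-type) design and counts phase changes as potential demonstrations of effect. It does not model multiple-baseline designs. It also omits visual analysis of level and trend, even though that analysis may support experimental control when a phase is short.

```python
from typing import List, Tuple

# The review reduced the 5 baseline points called for in Quality Indicator 4
# to a minimum of 3 data points per phase.
MIN_POINTS_PER_PHASE = 3
MIN_DEMONSTRATIONS = 3  # an ABAB design offers three phase changes


def phases_meet_minimum(phase_lengths: List[int], minimum: int = MIN_POINTS_PER_PHASE) -> bool:
    """Repeated-measurement check: every phase needs at least `minimum` points."""
    return all(n >= minimum for n in phase_lengths)


def count_phase_changes(phase_labels: List[str]) -> int:
    """Potential demonstrations of effect in a single-participant reversal design."""
    return sum(1 for prev, cur in zip(phase_labels, phase_labels[1:]) if prev != cur)


def experimental_effect_met(phases: List[Tuple[str, int]]) -> bool:
    """Absent if any phase is too short or there are fewer than three demonstrations."""
    labels = [label for label, _ in phases]
    lengths = [n for _, n in phases]
    return phases_meet_minimum(lengths) and count_phase_changes(labels) >= MIN_DEMONSTRATIONS


if __name__ == "__main__":
    # Hypothetical ABAB phase lengths. Only the 1-point return to baseline
    # mirrors a case described in the text; the other lengths are invented.
    short_reversal = [("A", 5), ("B", 6), ("A", 1), ("B", 5)]
    print(experimental_effect_met(short_reversal))   # False
    longer_reversal = [("A", 5), ("B", 6), ("A", 3), ("B", 5)]
    print(experimental_effect_met(longer_reversal))  # True
```

The check is deliberately mechanical. A single short phase fails the whole design, however clear the change in level. That rigidity is the concern raised about the 3-point minimum.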
Failing to meet the required number of data points per phase also affected whether the internal validity indicator was met. In the Ervin et al. (1998) study, the return to baseline phase for Joey had only 1 datum point, which did not meet the baseline criterion. Thus, this study did not meet the internal validity criteria. Internal validity is required to establish external validity (Wolery, personal communication, 2007), so the study also could not meet the external validity component. Yet, despite the limited number of data points, one could argue for experimental control given the clear changes in level. The same ramifications arose when coding the study by Hoff et al. (2005). Its brief return to baseline (2 data points) did not meet our minimum criterion of 3 data points per phase. The article was therefore coded as lacking three demonstrations of experimental effect. Its pattern of results was also coded as insufficient, because the return to baseline condition had only 2 data points. Because internal validity was not established, external validity was also coded as absent under our procedures. However, the graph showed a very clear change in level, and possibly in trend, when the intervention was withdrawn. This illustrates again that some components defining each quality indicator (e.g., the minimum of 3 data points) may be too stringent. As a result, studies that do support a given practice may be excluded.

Cost Effectiveness. The third component of Quality Indicator 7: Social Validity required that the intervention be "practical and cost effective" (Horner et al., 2005, p. 174).
Tankersley, Cook, and Cook (in press) evaluated Horner et al.'s quality indicators based on information reported in text. Consistent with their approach, our coding system required the cost-effectiveness of an intervention to be stated explicitly. Yet this requirement may be too rigorous, because only one study (Smith & Sugai, 2000) explicitly mentioned the cost-effectiveness of the intervention. It may be wise to offer clarifying points for evaluating cost-effectiveness. Many studies may be cost-effective in the sense that the benefits outweigh the costs (e.g., time, resources), even though cost-effectiveness is not computed or discussed explicitly. One could argue that a practice's cost-effectiveness could be assessed indirectly through social validity or treatment integrity data. An intervention that was too costly in time or resources would likely receive a negative social validity rating or be implemented with low fidelity (Lane & Beebe-Frankenberger, 2004).

We recognize that it is difficult to develop indicators and coding practices that capture the contributions and qualities of all studies. For example, the Penno et al. (2000) study did not meet our criteria for a socially important change in the dependent variable. However, the authors reported that "of particular importance is the finding that behavior problems were reduced for 2 of three participants even though the instructional modifications were designed to enhance academic performance" (Penno et al., p. 341). The coding system we applied overlooked this finding. Further, some of the studies we evaluated were published more than 2 decades ago, and standards for research shift over time.
For example, Quality Indicator 3: Independent Variable requires fidelity of the independent variable to be reported. The three studies that did not mention or report it were published between 1981 and 1992 (Knapczyk, 1988, 1992; Schloss et al., 1981), before treatment integrity was emphasized. Finally, the study by Stage et al. (2006) reported three cases, but our article selection procedures restricted coding to secondary-age students. Consequently, the two other cases, involving younger students and meeting many of the indicators, were not reported in this review.

APPLICATION OF THE INDICATORS: A MODIFIED APPROACH

Rather than evaluating only those studies that met all indicators in their entirety, another approach might be to impose an 80% minimum criterion. Under this approach, studies would receive credit for the components they addressed within a given quality indicator. For example, the dependent variable quality indicator contains five components. Each component could be weighted to contribute an equal share of the indicator. For the dependent variable indicator, each component would then contribute 20% of the indicator's total score. To illustrate, consider the article by Ervin and colleagues (1998). This study met the requirements for description, quantifiable measurement, valid and well-described measurement, and IOA, but not for repeated measurement. Rather than scoring this indicator as zero for omitting one of the five components, a weighted score could be computed as follows:

$$
\begin{aligned}
\text{DV quality indicator} ={}& (\text{description})(1)(.20) + (\text{quantifiable measurement})(1)(.20) \\
&+ (\text{valid and well-described measurement})(1)(.20) \\
&+ (\text{measured repeatedly})(0)(.20) + (\text{IOA})(1)(.20) = .80
\end{aligned}
$$

Rather than applying an absolute coding system of "met" or "not met," the study would receive "partial credit" for the components it addressed.
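The equal-weight, partial-credit scheme can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not the authors' scoring tool, and the function names are invented.
- The per-indicator component counts are our reading of the coding scheme. They sum to the 21 components mentioned earlier, and the single external-validity component is inferred from that total.
- The Ervin et al. (1998) counts are back-calculated from the per-indicator proportions reported for that study.

```python
from typing import Dict

# Components per quality indicator, as we read the coding scheme. They sum to
# the 21 components cited in the text; one external-validity component is an
# inference from that total, not a count stated by the authors.
COMPONENTS: Dict[str, int] = {
    "describing participants": 3,
    "dependent variable": 5,
    "independent variable": 3,
    "baseline": 2,
    "experimental control": 3,
    "external validity": 1,
    "social validity": 4,
}

# 80% of the seven indicators. Rounding avoids 0.8 * 7 == 5.6000000000000005.
THRESHOLD = round(0.80 * len(COMPONENTS), 2)


def indicator_score(met: int, total: int) -> float:
    """Equal weights: each component contributes 1/total (e.g., .20 for five)."""
    return met / total


def partial_credit_composite(met: Dict[str, int]) -> float:
    """Sum of per-indicator proportions; the maximum is 7.0."""
    return sum(indicator_score(met[name], total) for name, total in COMPONENTS.items())


def indicators_fully_met(met: Dict[str, int]) -> int:
    """All-or-nothing count: indicators whose every component was met."""
    return sum(met[name] == total for name, total in COMPONENTS.items())


if __name__ == "__main__":
    # Components met by Ervin et al. (1998), back-calculated from the
    # proportions reported in the text (e.g., 0.67 -> 2 of 3).
    ervin = {
        "describing participants": 2,
        "dependent variable": 4,
        "independent variable": 3,
        "baseline": 1,
        "experimental control": 0,
        "external validity": 0,
        "social validity": 3,
    }
    score = partial_credit_composite(ervin)
    print(round(score, 2))              # 3.72
    print(indicators_fully_met(ervin))  # 1
    print(score >= THRESHOLD)           # False
```

Under these assumptions, the sketch reproduces the .80 dependent variable score and the 3.72 composite. It also shows why the study still falls below a 5.60 cutoff. The text raises differential weighting of indicators as an open question. That could be explored by multiplying each proportion by an indicator-specific weight and rescaling the threshold to match.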
In the above illustration, the study would receive a dependent variable score of .80 rather than zero. Applying this method to all indicators gives the Ervin et al. (1998) study a partial-credit composite score of 3.72, as opposed to the current score of one out of seven indicators met. The per-indicator scores are:
- describing participants = 0.67
- dependent variable = 0.80
- independent variable = 1.00
- baseline = 0.50
- experimental control = 0.00
- external validity = 0.00
- social validity = 0.75

Such a scoring system would provide a more precise, detailed picture of the critical components addressed in the study.

In Table 2, we present a total score for each article under two schemes: presence or absence of each indicator, and the partial-credit system explained above. Suppose the goal is for studies to achieve 80% of the indicators (80% × 7 indicators). Studies with a total score of at least 5.60 could then be considered rigorous enough to include when deciding whether a practice is evidence based. In this review, no additional studies would have been included in evaluating the evidence base. However, such a coding procedure could change the number of studies included in other literature reviews.

Another consideration is whether some indicators should count more than others. Are certain quality indicators (e.g., internal and external validity) more important than others (e.g., social validity)? Some may argue that violating internal validity is a more serious concern than omitting social validity. If so, should each indicator, or each component within an indicator, carry a different weight? Should the value of an indicator also depend on the type of study being conducted (efficacy or effectiveness)?
As we move toward conducting studies in more applied settings with less university support, should certain indicators be viewed as more or less necessary?

CONCLUSION COMMENTS: CONSIDERATIONS AND FUTURE DIRECTIONS

As we conclude the task of applying the quality indicators and standards posed by Horner et al. (2005) to function-based interventions with secondary-age students with EBD, we offer the following comments. First, we applaud Horner and colleagues for the effort they placed into developing quality indicators for single-case research. This was clearly a formidable, and necessary, task that will continue to influence how research proposals and subsequent investigations are conducted. We value the concept of setting standards. We hope our input on where these indicators may be too stringent and need modification is received in the spirit intended: to establish scientifically valid yet reasonable indicators for evaluating single-subject work.

Finally, in this field testing of the proposed [...] studies included interviews with teachers and students (e.g., Penno et al., 2000); some included functional analyses (e.g., Hoff et al., 2005); and some included record searches (e.g., Liaupsin et al., 2006). Thus, the interventions derived from functional assessment data were evaluated against the quality indicators, but the functional assessment process itself was not standardized (Kern et al., 2004; Sasso et al., 2001). We recommend that future reviews evaluate one particular method of conducting function-based interventions, such as the model posed by Umbreit et al. (2007), to determine whether that specific model is an evidence-based practice.

Despite these considerations, this article offers an initial application of Horner et al.'s (2005) core quality indicators and standards for evidence-based practices in single-case methodology. We applied them to functional assessment-based interventions conducted with secondary-age students with EBD or at risk for developing EBD. Findings suggest that, when assessed using the proposed criteria, this practice cannot be considered evidence based at this time. However, we contend that this practice holds promise. Additional high-quality research may result in the practice being considered evidence based for the target population under these or similar standards. Other possible directions for refining the application of the indicators include weighting the criteria, assigning partial credit, or weighting indicators according to the focus of the study. In this way, researchers can include all meaningful and trustworthy studies of the practice and ensure that important contributions to this body of literature are not eliminated because of unattainable criteria. In the years to come, scholars and stakeholders will need to be thoughtful and careful in how they use the proposed indicators.