EJ1304830
Brief/Psychometric Reports
Assessment for Effective Intervention
DOI: 10.1177/1534508420944231
Bruhn et al.
Abstract
Self-monitoring interventions for students with challenging behavior are often teacher-managed rather than self-managed.
Teachers direct these interventions by completing parallel monitoring procedures, providing feedback, and delivering
contingent reinforcement to students when they monitor accurately. However, within self-monitoring interventions, the
degree to which teachers and students agree in their assessment of students' behavior is unknown. In this study of a self-monitoring intervention in which both teachers and students rated the students' behavior, we analyzed 249 fixed-interval ratings of behavior from 19 student/teacher pairs to determine the relationship between ratings within and across teacher/student pairs. We found a strong correlation overall (r = .91), although variability existed within individual pairs, and student
ratings tended to be higher than teacher ratings. We discuss implications for practice, limitations, and future directions.
Keywords
interventions, technology, progress monitoring
¹The University of Iowa, Iowa City, IA, USA
²Vanderbilt University, Nashville, TN, USA

Corresponding Author:
Allison Bruhn, The University of Iowa, N252 Lindquist Center, Iowa City, IA 52242, USA.
Email: allison-bruhn@uiowa.edu

Self-monitoring (SM) is an antecedent-based strategy in which students are taught to recognize the occurrence of a specific behavior and record the extent to which that behavior occurs at predetermined times. Theoretically, SM is effective because it prompts students to be intentional about exercising control over their behavior (i.e., self-regulation; Bandura, 1991). Research indicates SM has been successful in improving students' academic and behavioral outcomes (Bruhn et al., 2015). One argument for SM interventions is that if they are truly student-managed, students will become more self-reliant and independent while also reducing the cost and burden associated with teacher-managed interventions (Briesch & Chafouleas, 2009). A meta-analysis of SM interventions for students with autism found greater student involvement resulted in stronger effects (Davis et al., 2016). In contrast, reviews of SM studies have revealed SM interventions are often reliant on the teacher to manage external contingencies such as delivering feedback and reinforcement (Briesch & Chafouleas, 2009). On one hand, this may be viewed as a limitation of the extent to which SM is really self-managed by the student. Conversely, teacher involvement has been used as one way to improve the accuracy of students' SM, while also promoting generalization across settings (Peterson et al., 2006).

In a review of 41 peer-reviewed articles on SM for students with problem behavior, Bruhn and colleagues (2015) reported 13 studies including contingent reinforcement for students' SM accuracy (i.e., student ratings [SR] matched teacher ratings [TR] either exactly or within a range during the same time period). SM with reinforcement for matching accuracy has resulted in decreases in off-task and disruptive behaviors (e.g., Freeman & Dexter-Mazza, 2004). Relatedly, researchers have found that once students were deemed accurate with SM and accuracy checking and reinforcement were removed, on-task behavior continued to improve (Peterson et al., 2006). Conversely, Ardoin and Martens (2004) found accuracy matching decreased disruptive behavior, but when matching was removed, behavior worsened.

Regardless of the effects of accuracy matching and its unique contribution to SM interventions, across these studies, the teacher's rating is the presumed standard for accuracy (e.g., Chafouleas et al., 2012). In some cases, these data may be used to make decisions about student responsiveness to SM. In a recent study, 13 elementary teachers and one of their students completed ratings of students' behavior during
one instructional classroom activity (Bruhn et al., 2019). The length of time between ratings (e.g., every 5 min) and the total session length (e.g., 45 min) varied by teacher. Teachers used their ratings of behavior to (a) determine whether students were responding to the SM intervention and (b) make intervention adaptations (e.g., increasing SM interval length). According to multilevel modeling of teachers' rating data, students improved their positive behaviors significantly (p < .001) from baseline to intervention.

As students progress through SM interventions, which include teachers completing parallel procedures to check for accuracy and make data-based decisions, teacher support may be faded to promote maintenance and generalization. To continue tracking student progress without parallel data, teachers may have to rely on students' SM data to evaluate ongoing response to intervention. Teachers and researchers who view teachers' data as the standard for accuracy may be hesitant to rely on students' SM data for fear it may be unreliable. To this end, the purpose of this brief report is to examine the degree to which teacher and student ratings completed as part of an SM intervention are related. Research questions (RQ) include the following:

Research Question 1 (RQ1): Across all sessions and teacher/student pairs, is there a correlation between average teacher and average student ratings?
Research Question 2 (RQ2): Are average teacher and average student ratings significantly different?
Research Question 3 (RQ3): To what extent is there agreement between teacher and student within sessions for individual student–teacher pairs?

Method

Participants and Setting

The Institutional Review Boards at two universities and three school districts approved this study. Participants included teachers and students from two school districts (A and B) in a Midwest state that is noncategorical for special education services (i.e., students are not labeled under the 13 disability categories) and one urban district (C) from a Southern state. One middle school from District A (rural), three elementary schools from District B (small city), and one middle school from District C (urban) participated. Teachers of Grades 3–8 consented to participate and identified students who might benefit from behavioral SM (e.g., frequent off-task behavior, high rates of office discipline referrals, behavior goal on individualized education program). Then, we obtained parental consent and student assent, and teachers completed the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) on the consented student. Students who scored in the borderline or abnormal range for hyperactivity/inattention, conduct problems, or total difficulties screened into the study. In total, 17 teachers and 18 students participated in the study. One student participated with the same teacher in two different settings, and thus, we analyzed each setting separately. One teacher completed procedures with two different students (each at different times and in separate settings). Thus, the analysis indicates 19 student/teacher combinations (see Table 1).

Measures and Procedures

SDQ. The SDQ is a behavioral rating scale consisting of 25 items rated on a 0–2 scale (i.e., never, sometimes, always) that are used to assess student risk across five domains: hyperactivity/inattention, emotional symptoms, conduct problems, peer problems, and prosocial behavior. The first four domains constitute an aggregate score for total difficulties. The SDQ was originally validated for ages 4–17 years. It has demonstrated high correlations with the Rutter Questionnaire (Rutter, 1967) and the Child Behavior Checklist (Achenbach, 1991), while also evidencing adequate internal consistency (α = .64–.89; Hill & Hughes, 2007).

Percentage of positive behavior: Teacher and student ratings. Teachers and students used a noncommercially available, author-developed mobile application (MoBeGo) on an iPad to rate student behavior. We used MoBeGo rather than traditional paper forms because this app was being tested as part of an externally funded research and development project funded by the Institute of Education Sciences (510-14-2540-00000-13607400-6200-000-00000-20-0000). As part of the iterative development process of MoBeGo, we aimed to determine the extent to which teacher and student ratings were similar to each other.

Prior to completing ratings, teachers first met with research assistants (RAs) to complete a 1-hr training sequence during which they determined (a) students' problem behaviors, (b) positive replacement behaviors to monitor, (c) the class or activity for monitoring, and (d) SM interval lengths. Teachers programmed the app to these specifications during the training. Teachers had the option to select positive behaviors from the default settings in the app or input their own behaviors. The behaviors had accompanying operational definitions in the form of a question (e.g., Be Responsible = Did the student work carefully on the assigned task and ask for help if needed?). Teachers could select as few as one or as many as five behaviors, although generally, they selected three. Teachers and RAs discussed various classroom scenarios and how behaviors might look during these scenarios.

After programming behaviors into the app, teachers selected a target class period (e.g., seventh period math) or instructional activity (e.g., reading rotations) for the student to self-monitor. Teachers selected the class or activity during which the student most often displayed the problem
[Table 1. Participants by district: District A (one middle school), District B (three elementary schools), and District C (one middle school).]
behavior. Each behavior was rated on a fixed interval, selected by the teacher, for the duration of the class period or instructional activity. For instance, if math instruction occurred for 45 min and the teacher selected a 5-min interval length, then they had up to nine opportunities for ratings. Teachers customized interval length to suit individual student need (e.g., severity of problem behavior, student age) and instructional context. An audio prompt from the app signaled the interval was over and it was time to rate. Ratings followed a 5-point numerical scale with accompanying anchors (0 = never, 1 = a little, 2 = sometimes, 3 = a lot, 4 = always; see Figure S1 in Supplemental Appendix). The app automatically calculated and graphed an aggregate percentage of positive behavior (PPB) by summing the total number of points earned, dividing by the total points possible, and multiplying by 100. Using the previous example, if the teacher rated two behaviors, there was a possibility of 72 points (two behaviors × four points × nine ratings). Previous research has indicated moderate to high correlations between teachers' ratings of students' positive behavior and systematic direct observation of academic engagement (r = .61–.91; Bruhn et al., 2018), as well as high interrater reliability between teachers and RAs using the same 5-point scale (r = .82–.91; Bruhn et al., 2018).

Once teachers completed the training, they began rating their student's behavior during the same instructional period for 3 consecutive days (i.e., baseline). Following baseline, after class was over, the teacher and RA trained the student in the classroom. This included teaching the student about the programmed behaviors by reviewing operational definitions (e.g., examples and nonexamples), discussing why these behaviors are important to classroom success, and asking the student how they would rate given different classroom scenarios. Students learned how to use the various features and functions of the app (e.g., where to touch the iPad to rate behaviors, how to view behavior definitions). Next, students practiced rating behaviors with the app based on hypothetical scenarios. Scenarios included examples or nonexamples of the programmed behaviors, and then students practiced rating behaviors using the app until they demonstrated 100% accuracy using the app's functions and indicated they were comfortable with procedures and definitions.

The next day, both the teacher and student rated the student's behavior during the same instructional period using the same interval length and procedures the teacher used during baseline. During the intervention condition, students rated first. Immediately after rating, students passed the device to their teacher, and the teacher rated the student's behavior for that same interval. After both students and teachers completed ratings independently, they viewed both ratings before starting the next interval (see Figure S1 in Supplemental Appendix). Although they viewed these ratings together to see how ratings aligned, teachers did not deliver planned reinforcement for accuracy matching. Teachers had the option to provide specific feedback on the ratings (e.g., "You did a great job with . . . " and "I see we
both rated you a 3 . . . "). This continued for each interval until the end of the session. We used the total PPB from TR and SR from each completed session for data analysis. Across the 19 teacher/student pairs, the number of completed sessions for each pair ranged from 6 to 27 (median = 12), resulting in 249 sessions with a PPB from both TR and SR.

Data Analysis

RQ1: Correlation between average teacher and student ratings. To determine whether there was a correlation between the average TR and the corresponding average SR for each teacher/student pair, we first calculated the average rating across sessions. We then plotted the averages and calculated Pearson's correlation. Given research suggesting students can be trained to accuracy (e.g., Ardoin & Martens, 2004), we hypothesized a moderate to high correlation between average ratings.

RQ2: Difference between teacher and student ratings. To examine the overall degree of agreement between TR and SR, we conducted a paired samples t test. We used this to determine whether there was a significant difference between TR and SR. Based on previous research (e.g., Ardoin & Martens, 2004), we hypothesized students would rate themselves higher, but the difference would not be significant.

Results

SRs were, on average, higher than corresponding TRs, which was a statistically significant difference (t[18] = 2.85, p = .01).

RQ3: Within Teacher–Student Pair Agreement

In 14 of 19 cases, TR and SR demonstrated a moderate to high correlation (r = .52–.96). We did not observe this trend in five cases (see Figure S4 in Supplemental Appendix). Ratings for Student 1 showed no association (r = .05) due to consistently high SR. For Student 7 (not shown), we did not calculate a correlation because the student consistently gave perfect ratings. The moderate correlations for Student 16 (r = .39) and Student 19 (r = .45) were also hampered by low variability in the SR. Thus, in these four cases, the lack of a strong correlation appears to be a result of low variability in SR, which were consistently higher than TR. The fifth case, Student 18, may be the most interesting (r = .47). Student 18 showed high agreement when the teacher provided a high rating but showed greater variability in self-ratings when the teacher provided a low rating.

Mixed model analysis confirmed the earlier findings. First, we found a significant positive relationship between teacher and student ratings (p = .0001). Second, we found considerable variability in the strength of the relationship across individual students (p = .0001).
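The PPB aggregation described in the Measures section and the analyses used for RQ1 and RQ2 are straightforward to sketch. The Python below is a minimal illustration under our own assumptions, not the authors' analysis code; the function names and the sample values are ours:

```python
import math
from statistics import mean, stdev

def ppb(ratings, max_point=4):
    # Percentage of positive behavior: points earned divided by points
    # possible, times 100. `ratings` holds every 0-4 rating given in a
    # session (all behaviors x all intervals).
    return 100.0 * sum(ratings) / (max_point * len(ratings))

def pearson_r(x, y):
    # Pearson product-moment correlation between paired lists (RQ1).
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def paired_t(x, y):
    # Paired-samples t statistic for matched observations (RQ2);
    # degrees of freedom = len(x) - 1.
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

Using the worked example from the text, two behaviors rated across nine intervals give 18 ratings and 72 possible points, so a session totaling 54 earned points corresponds to a PPB of 75.0. In the study, the paired t statistic was evaluated with 18 degrees of freedom (19 pairs).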
Hill, C. R., & Hughes, J. N. (2007). An examination of the convergent and discriminant validity of the Strengths and Difficulties Questionnaire. School Psychology Quarterly, 22, 380–406.
Peterson, L. D., Young, K. R., Salzberg, C. L., West, R. P., & Hill, M. (2006). Using self-management procedures to improve classroom social skills in multiple general education settings. Education and Treatment of Children, 29(1), 1–21.
Riley-Tillman, T. C., Chafouleas, S. M., Sassu, K. A., Chanese, J. A., & Glazer, A. D. (2008). Examining the agreement of direct behavior ratings and systematic direct observation data for on-task and disruptive behavior. Journal of Positive Behavior Interventions, 10(2), 136–143.
Rutter, M. (1967). A children's behavior questionnaire for completion by teachers: Preliminary findings. Journal of Child Psychology and Psychiatry, 8, 1–11.