Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Oasis

2010, ACM Transactions on Computer-Human Interaction

Oasis: A Framework for Linking Notification Delivery to the Perceptual Structure of Goal-Directed Tasks SHAMSI T. IQBAL Microsoft Research and BRIAN P. BAILEY Microsoft Research and University of Illinois at Urbana-Champaign A notification represents the proactive delivery of information to a user and reduces the need to visually scan or repeatedly check an external information source. At the same time, notifications often interrupt user tasks at inopportune moments, decreasing productivity and increasing frustration. Controlled studies have shown that linking notification delivery to the perceptual structure of a user’s tasks can reduce these interruption costs. However, in these studies, the scheduling was always performed manually, and it was not clear whether it would be possible for a system to mimic similar techniques. This article contributes the design and implementation of a novel system called Oasis that aligns notification scheduling with the perceptual structure of user tasks. We describe the architecture of the system, how it detects task structure on the fly without explicit knowledge of the task itself, and how it layers flexible notification scheduling policies on top of this detection mechanism. The system also includes an offline tool for creating customized statistical models for detecting task structure. The value of our system is that it intelligently schedules notifications, enabling the reductions in interruption costs shown within prior controlled studies to now be realized by users in everyday desktop computing tasks. It also provides a test bed for experimenting with how notification management policies and other system functionalities can be linked to task structure. Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems— Human Information Processing; H.5.2 [Information Interfaces and Presentation:]: User Interfaces— Evaluation/methodology, user-centered design General Terms: Experimentation, Human Factors, Measurement Additional Key Words and Phrases: Interruption, attention, notification management systems, breakpoints This work was supported in part by the National Science Foundation under award no. IIS 05-34462. Authors’ addresses: S. T. Iqbal, One Microsoft Way, Microsoft Corp., Redmond, WA 98052; email: shamsi@microsoft.com; B. P. Bailey, 201 N. Goodwin Ave., Department of Computer Science, University of Illinois, Urbana, IL 61801; email: bpbailey@illinois.edu. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.  C 2010 ACM 1073-0516/2010/12-ART 15 $10.00 DOI 10.1145/1879831.1879833 http://doi.acm.org/10.1145/1879831.1879833 ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15 15:2 • S. T. Iqbal and B. P. Bailey ACM Reference Format: Iqbal, S. T. and Bailey, B. P. 2010. Oasis: A framework for linking notification delivery to the perceptual structure of goal-directed tasks. ACM Trans. Comput.-Hum. Interact. 17, 4, Article 15 (December 2010), 28 pages. DOI = 10.1145/1879831.1879833 http://doi.acm.org/10.1145/1879831.1879833 1. INTRODUCTION Notifications are being increasingly used as a powerful mechanism for proactively delivering information to users in many multitasking domains [McCrickard et al. 2003]. We use the term notification to refer to a visual cue, auditory signal, or haptic alert generated by an application or service that relays information to a user outside her current focus of attention. Notifications can provide useful benefits such as supporting near instant communication [Czerwinski et al. 2000b; Latorella 1999], enabling awareness of peripheral information [Maglio and Campbell 2000], and relaying reminders of upcoming activities [Dey and Abowd 2000]. At the same time, notifications are almost always delivered as soon as the underlying information becomes available without regard for the state of the user’s ongoing task. An important challenge is, therefore, to understand how to appropriately balance the timeliness of information delivery with the cost of interrupting user tasks. This article presents the architecture and implementation of a novel system, Oasis, that addresses this challenge by leveraging the perceptual structure of user tasks to mediate notification delivery. Improved notification management is important for end users, as controlled studies have consistently shown that this type of immediate interruption incurs costs for users in terms of decreased productivity and increased negative affect [Adamczyk and Bailey 2004; Bailey and Konstan 2006; Czerwinski et al. 2000b; Kreifeldt and McCarthy 1981; Zijlstra et al. 1999]. The accumulation of these costs may also be significant, as field studies have found that users receive many notifications during their workday [Iqbal and Horvitz 2007; Jackson et al. 2001]. Recent work has also shown that users prefer to continue the use of notifications to cue them to new information despite the interruption costs rather than to turn the notifications off and have to repeatedly check for new information manually [Iqbal and Horvitz 2010]. The problem of managing interruptions from external sources, through notifications or other mechanisms, is not constrained to the desktop. Similar problems have been identified in other multitasking environments, such as, aviation cockpits [Dismukes et al. 1998; Latorella 1999], in-vehicle systems [Lee et al. 2004] and control rooms [Stanton 1994]. An active thread of research in the domain of interruption management has been to explore techniques for delivering notifications such that the interruption costs are reduced without significantly affecting the timeliness of the notifications [Adamczyk and Bailey 2004; Czerwinski et al. 2000a; Fogarty et al. 2005; Horvitz et al. 1999; Iqbal and Bailey 2005; Iqbal and Bailey 2006]. Prior research has formulated one particular approach where notification delivery is linked to the occurrence of breakpoints within a user’s tasks. The concept of ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:3 a breakpoint originated from research in psychology showing that the human perceptual system segments observed activities into a hierarchical structure of discrete action units [Newtson and Engquist 1976] and that people typically generate similar structures for the same activity [Zacks et al. 2001]. The boundary between two adjacent action units is called a breakpoint and breakpoints can be categorized based on the granularity of the units segmented. For interactive computing tasks, it has been shown that there are at least three granularities of breakpoints—Fine, Medium, and Coarse— that can be reliably detected by users [Iqbal and Bailey 2007]. Empirical studies have shown that deferral of notifications until breakpoints are reached during task execution reduces interruption costs as measured by resumption lag and subjective assessments of frustration [Adamczyk and Bailey 2004; Iqbal and Bailey 2005; Iqbal and Bailey 2006]. One explanation as to why these moments are less disruptive is because they correspond with transient reductions in mental processing effort [Bailey and Iqbal 2008]. Studies have further shown that coarser breakpoints correspond with successively larger reductions in interruption cost [Iqbal and Bailey 2006]. In all of these studies, however, the execution sequences of the experimental tasks were controlled, the locations of the breakpoints within those sequences were identified a priori, and it was the experimenter who manually triggered delivery of the notifications. It was therefore unclear whether a computational system would be able to mimic similar techniques or whether similar reductions in cost could be achieved for free-form tasks, such as, tasks controlled by the goals of the user. Instead of breakpoints, researchers have investigated the use of other aspects of task structure such as the planning, execution, and evaluation stages of a task for notification delivery [Czerwinski et al. 2000a; Monk et al. 2002]. Though the empirical results are of theoretical interest, implementing these approaches would require explicit knowledge of the tasks themselves. A distinct advantage of using breakpoints is that they can be detected without explicit knowledge of the tasks, thereby allowing the technique to be more broadly applied. Our notification management system, Oasis (Omniscient Automated System for Interruption Scheduling) automates the process of aligning notification scheduling with the occurrence of breakpoints within free-form tasks for the desktop domain. When an application wants to deliver a notification, it sends a request to Oasis specifying the desired deferral policy (e.g., defer to the next Coarse, Medium, or Fine breakpoint). Oasis monitors the user’s interaction stream, detects breakpoints on the fly, and signals the application when it identifies a breakpoint satisfying the specified policy, after which the application can render the notification. For example, imagine that a user is cognitively engaged in editing a document and a notification is generated by her email client. Our system allows the notification to be deferred until the user completes the current sentence or paragraph rather than interrupt in the middle of editing. In general, when a user is engaged in a task and a notification is generated, our system allows the user to complete the execution of her current thought (i.e., the operations associated with the currently active “chunk” in memory) before allowing the notification to be delivered. In relation to other prototype systems for notification management, our system is the first to implement a ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:4 • S. T. Iqbal and B. P. Bailey strategy for scheduling notifications that is grounded in cognitive principles; in this case, principles related to the use of perceptual task structure. We describe the architecture and implementation of the system, how it detects breakpoints (and implicitly, task structure) without requiring explicit knowledge of the task itself, and how it layers a set of flexible notification scheduling policies on top of this detection mechanism. Oasis is designed to identify breakpoints within goal-directed interactive tasks such as document editing, image manipulation, and programming tasks, and provides access to three levels of breakpoints: Coarse, Medium and Fine. By detecting three breakpoint granularities, Oasis offers increased flexibility in balancing interruption cost with the timeliness of notification delivery. Oasis detects Coarse breakpoints independent of any specific application by tapping into the already available system-level event stream. Detection of Medium and Fine breakpoints requires access to an application’s event stream, which can typically be exposed using lightweight plug-ins. For demonstration purposes, the current implementation of Oasis provides plug-ins for two commonly used applications, Microsoft Visio (diagram editing) and Visual Studio (programming). Plug-ins for additional applications can be developed independently and interfaced with Oasis. To support these applications, situations where improved breakpoint detection is needed, or for other research purposes, the system includes a data collection module, data annotation tool, and learning component that allows statistical models for breakpoint detection to be created or refined. The contribution of this article is the description of the architecture, implementation, and usage scenarios of a functional notification management system. The system demonstrates the feasibility of automating the notification scheduling techniques that were manually applied in our prior empirical studies for controlled tasks, thereby enabling similar results to now be realized for desktop computing tasks. The system also provides a test bed for exploring how different notification management policies and other system functionalities (e.g., automated structuring of command histories) can be linked to perceptual task structure. Although we have previously reported empirical results for how the use of our system impacts users and their tasks [Iqbal and Bailey 2008], this article focuses on a discussion of the architecture and implementation details of our system and the rationale for its design. 2. RELATED WORK In this section, we discuss the benefits and costs of notification, the theory of perceptual task structure and its efficacy for notification scheduling, and other systems and strategies for managing notifications. 2.1 Benefits and Costs of Notifications An application uses notifications to proactively deliver information to the user, rendered through one of several sensory modalities—visual, auditory or haptic [Lee et al. 2004; McCrickard et al. 2003]. Perhaps the most salient benefit of using notifications is that they reduce the user’s need for repeatedly checking an external information source, a behavior which is also detrimental to the user’s ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:5 ongoing task [Maglio and Campbell 2000]. The use of notifications can be beneficial in many situations, such as, to support peer communication [Czerwinski and Horvitz 2002], provide awareness of collaborator activities [Dabbish and Kraut 2004], and relay application assistance at appropriate moments [Maes 1994]. At the same time, however, current applications almost always deliver notifications as soon as the underlying information becomes available. This often results in the user being interrupted in the midst of an ongoing action. Researchers have studied this phenomenon extensively and found these types of interruptions result in reduced task productivity [Czerwinski et al. 2000a; Kreifeldt and McCarthy 1981; Latorella 1998; McFarlane 1999; Monk et al. 2002], impaired decision making [Speier et al. 1999], and increased negative affect [Adamczyk and Bailey 2004; Zijlstra et al. 1999]. For example, Bailey and Konstan [2006] have shown that notifications interrupting the current task cause users to take up to 27% more time to complete the task, commit up to twice the errors, and experience up to twice the anxiety. Gillie and Broadbent [1989] showed that these types of disruptive effects can be caused by the complexity of the interrupting task and its similarity to the main task. Recovery and resumption of interrupted activities also take substantial time, between 10–25 minutes, and this has been corroborated by several studies [DeMarco and Lister 1999; Gonzalez and Mark 2004; Iqbal and Horvitz 2007]. Interruption costs are often attributed to limitations in human information processing capabilities, where the additional demands on cognitive resources due to the interrupting task interfere with and reduce performance on the primary task [Navon and Gopher 1979; Wickens 2002]. Our research aims to achieve a more flexible and effective balance between the costs and benefits of notifications for end users. In our approach, notifications are deferred for a short time in exchange for a meaningful reduction in the ensuing interruption cost. This is achieved by linking notification delivery to the perceptual structure of user tasks. 2.2 Perceptual Task Structure and Its Use for Notification Management Perceptual structure refers to how the human perceptual system segments an observed, goal-directed activity into discrete units [Newtson 1973]. The boundary between two adjacent units is called a breakpoint, which can be thought of as a moment of transition in perception or action [Zacks and Tversky 2001]. Controlled experiments have shown that human observers generally agree on the location of the breakpoints within a given goal-directed activity, indicating there is a shared cognitive schema that drives the perceptual system [Zacks et al. 2001]. Observers have reported using cues such as the completion of an action, change in pace of the action, and change in the object of focus as salient indicators of a breakpoint [Zacks and Tversky 2001]. Zacks et al. [2001] offered evidence showing that the perceptual system organizes observed activity into at least a two-level hierarchical structure. This was achieved by asking independent observers to watch video recordings of other people performing physical tasks, such as, folding clothes, washing dishes, making beds etc., and ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:6 • S. T. Iqbal and B. P. Bailey Table I. Definitions and Actions Corresponding to the Different Types of Breakpoints Breakpoint Coarse Fine Medium Definition Example of corresponding action transition from the largest meaningful Switch from an ongoing programming and natural unit of action to the next task to interacting with a media application transition between the smallest mean- Switch from actively editing code to ingful and natural unit of execution to compiling and debugging the next transition between natural and meaning- Switch from one source code file to anful units which are smaller than Coarse other within the same project but larger than Fine mark the locations of breakpoints that separated the largest meaningful units of action (defined as coarse breakpoints) and those that separated the smallest units (defined as fine breakpoints); and then analyzing their temporal alignment. Results showed that the observers generally agreed on the locations of coarse and fine breakpoints (within 1sec windows) and that the coarse unit boundaries aligned with the fine unit boundaries more often than would be predicted by chance. These results provide evidence of the existence of hierarchical structures in physical tasks and that these structures can be reliably detected by observers who did not have the experience of performing the task themselves. Building on this corpus of theoretical work in the domain of interactive computing tasks, Iqbal and Bailey [2007] showed that observers can reliably identify three granularities of breakpoints within task execution sequences. In addition to the coarse and fine granularities identified by Zacks et al. [2001], an additional granularity, medium, was also identified, indicating transitions between units smaller than those surrounding coarse breakpoints but larger than those surrounding fine. For example, for a user engaged in programming, switching from the programming activity to interacting with a media player (or other unrelated task) can indicate a Coarse breakpoint. A Fine breakpoint can occur by switching from editing code to compiling and debugging the code while a medium breakpoint can occur when switching between two source files to edit independent sections of code. Table I defines the three granularities of breakpoints for interactive tasks and summarizes these examples. Examples of breakpoints for other task domains are analogous. As with Zacks et al.’s [2001] work, Iqbal and Bailey [2007] also found that the locations of breakpoints identified by observers were mostly consistent with the locations identified by the users who performed the tasks. This agreement may be explained because individuals are also observers of their own actions. Inspired by Miyata and Norman’s [1986] conjecture that the transitions between different tasks and phases of a task would represent less disruptive moments for interruption, our prior work has tested the efficacy of using breakpoints for notification scheduling. Through several controlled studies, our results have shown that delivering notifications at breakpoints results in lower interruption costs than when delivered immediately [Iqbal and Bailey 2005; Iqbal and Bailey 2006]. Results further showed that coarser granularities of breakpoints correspond with successively lower interruption costs [Iqbal and ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:7 Bailey 2006]. Similar results have been shown to hold for different task domains, including document editing, programming, diagram editing, and image manipulation, which argues that the technique is reasonably general. In these experiments, however, the execution sequences of the experimental tasks were controlled, the locations of the breakpoints were manually defined a priori, and it was the experimenter who triggered the delivery of the notifications using a Wizard of Oz paradigm. Other researchers have also leveraged task structure to investigate more effective strategies for scheduling notifications. For example, Czerwinski et al. [2000a] showed that interruptions during the execution phase of a task compared to the planning or evaluation phase cause users to take more time to switch to the interrupting task. Monk et al. [2002] demonstrated that interrupting before the beginning of a subtask causes less resumption lag than interrupting during a subtask. This corpus of empirical studies has typically leveraged different phases of a task whereas our work emphasizes the use of specific moments within an execution sequence such as breakpoints. The advantage of using breakpoints is that they exist in nearly all goal-directed tasks and can be detected without explicit knowledge of the tasks themselves. This article contributes the design of a system that automates the process of linking notification scheduling to the occurrence of breakpoints within real (uncontrolled) user tasks. Our system leverages the use of statistical models for detecting the three granularities of breakpoints defined in Table I [Iqbal and Bailey 2007], but significantly extends that prior work by layering a set of flexible deferral policies on top of the models and by implementing a complete end-to-end process for scheduling notifications. The system also provides a separate tool set for building customized models for detecting breakpoints within user activities. 2.3 Strategies and Systems for Managing Notifications Notification management generally requires coordination between the interrupter and interruptee to effectively balance the cost and benefit of notifications. Existing systems can therefore be categorized as supporting at least one of three strategies; aiding the interrupter, aiding the interruptee, or supporting the coordination between them. The behavior of Oasis falls primarily into the third category. However, it could benefit from having access to the information that other systems provide to aid the interrupter, and its output could be utilized by other systems or rendering techniques meant to aid the interruptee. For aiding the interrupter, systems such as Lilsys [Begole et al. 2004], MyVine [Fogarty et al. 2004], and ConNexus [Tang et al. 2001] provide visual cues of the interuptee’s availability to be used by the interrupter. Examples of such cues include office presence, social engagement, and desktop activity. The interrupter can assess these cues to determine an appropriate moment for initiating communication. In relation to our system framework, similar cues could also be accessed by applications to determine an appropriate time window (see Sections 3.1 and 4.2) that would accompany a specific notification request. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:8 • S. T. Iqbal and B. P. Bailey To aid the interruptee, one class of solutions aggregates and renders the notification content within a peripheral display. For example, in the Scope system, communication events, calendar appointments, and system alerts are summarized in a persistent radial display [Van Dantzich et al. 2002]. A drawback is that the interruptee must decide when and how often to check the display for updated information. Another class of solutions renders the notification in a manner that reveals important attributes of the underlying notification content. For example, Gluck et al. [2007] showed that linking the content’s utility to the salience of the notification display resulted in better performance and user satisfaction during interactive tasks. In contrast, our system schedules notifications at less disruptive moments during a user’s ongoing task. However, because our system only signals the requesting application as to when such a moment occurs, the application can use any modality (e.g., visual, auditory or haptic), level of saliency, and display system for rendering the notification. Facilitating the coordination between the interrupter and interruptee is the strategy that most closely reflects the one realized within the Oasis system and is representative of what McFarlane calls a mediated approach [McFarlane and Latorella 2002]. Systems in this category have typically employed decision theoretic approaches for deciding when to interrupt [Horvitz et al. 1999]. The systems typically use as input some combination of desktop activity, calendar events, and visual and acoustical information from the surrounding task environment. For example, in the Priorities system [Horvitz et al. 1999], Lookout [Horvitz 1999], and the Notification Platform [Horvitz et al. 2003], Bayesian networks were used for inferring a user’s focus of attention. This information was used to deliver notifications when the computed utility was high, that is, when the benefits of delivering the information outweigh the cost of interruption. A descendant of the Notification Platform, Bestcom, considered the user’s social and task context, current and future availability, and communication preferences to predict the best device and modality for interpersonal communication [Horvitz et al. 2002]. In context of desktop computing tasks, [Fogarty et al. 2005] demonstrated that application events could be leveraged to reasonably predict whether a user is interruptible or not, though they did not build a system to demonstrate the effectiveness of their approach. This class of systems is potentially powerful, but the decisions made by these systems for when to interrupt have not yet been empirically shown to correspond with reduced interruption costs. In addition, the rationales motivating these approaches have not had firm grounding in cognitive principles, which might make it difficult to interpret or explain the empirical outcomes. Relative to these systems, our system ties notification delivery to the perceptual structure of a user’s task and leverages interaction events to automatically detect this structure without specification of the task itself. This technique is grounded in cognitive principles and has been shown to yield favorable results in several empirical studies (see Section 2.2). Finally, at least one prior system, PETDL, has shared the goal of linking notification delivery to the structure of end-user tasks [Bailey et al. 2006]. That system provided a language for specifying interactive tasks and a separate component for monitoring a user’s execution through those tasks. Within a task ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery Application in interaction focus addin • 15:9 Breakpoint Models OASIS notification Apps in background events Breakpoint Detector F, M, C, NAB Request to interrupt at (F, M, C, Any, Best) expired Grant request Requests Update time Scheduler no Match? yes Fig. 1. Schematic of Oasis and its operation (highlighted). This article focuses on describing the details of the design and implementation of the system. description, costs of interruption could be assigned for each step of a specified task, which could be used for reasoning about when to interrupt. However, manually authoring these types of specifications for free-form tasks, such as those commonly performed in the desktop domain, is complex and requires substantial effort. This would be a major barrier for large-scale deployment of such an approach. In contrast, OASIS automatically detects interruptible moments (i.e., breakpoints) without requiring any explicit specification of the tasks, thereby removing this barrier. 3. OASIS Oasis is a system service that schedules notifications to occur at perceptually meaningful breakpoints during user tasks. A schematic of the system is shown in Figure 1. Oasis acts as an intermediary between the user and applications that want to deliver notifications. The following scenario illustrates how Oasis functions to better manage notifications. Imagine a user cognitively engaged in designing a complex diagram using Microsoft Visio or other appropriate tool. As the user is interacting with the diagram, a collaborator sends a mail message detailing several additional features that need to be included in the diagram. Rather than notify the user immediately and risk disrupting the user’s ongoing actions, the mail client sends a notification request to Oasis. The request specifies that the notification should be deferred until the next fine breakpoint, but should not be delayed more than five minutes. Meanwhile, the user continues to add and manipulate elements in the diagram and Oasis continues to analyze this stream of actions. When the user completes a particular chunk of actions (e.g., inserts and then sets the properties on a new shape), Oasis identifies this moment as a fine breakpoint and compares it against the policy specified for the notification. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:10 • S. T. Iqbal and B. P. Bailey Since it matches, Oasis signals the mail client and the client can then render the notification using its preferred technique. In this scenario, because the user is able to complete her immediate thought (actions) before being interrupted, she can switch her attention to the notification and resume the interrupted task with more efficiency and less frustration than if it had been delivered immediately. In this latter situation, the notification would have likely interrupted an ongoing action. Had the user turned off notifications, she may have missed information relevant to the task. Finally, if a matching breakpoint did not occur within the specified time window, Oasis would have granted the request at the specified time of expiration. 3.1 Policies for Scheduling Notifications The scheduling behavior of Oasis reflects a mediated interruption management approach where the system acts as an intermediary between the user and the interrupting application and determines appropriate moments for delivering notifications [McFarlane and Latorella 2002]. This approach is implemented within Oasis through a set of scheduling policies. To send a notification to the user via Oasis, an application prepares a notification request consisting of the desired scheduling policy and maximum time it can wait and sends this request to Oasis. The policy specifies the type of breakpoint at which the notification is to be delivered. As a starting point, Oasis offers five policies giving different balances between notification timeliness and interruption cost that applications can specify in their request: Next Coarse. This policy specifies that the notification is to be delivered when the user reaches the next Coarse breakpoint in their interaction sequence. A Coarse breakpoint is the moment between two units of action perceived to be the largest, meaningful units in the given context [Iqbal and Bailey 2007]. For example, in the desktop domain, a Coarse breakpoint could be the transition from a programming activity to interacting with a digital media player. As Coarse breakpoints typically occur less frequently but correspond with lower interruption costs, this policy can be utilized for notifications of general interest to the user but that are not urgent or relevant to the ongoing task. For example, a reminder about an upcoming talk might be best scheduled using this policy. Next Fine. This policy specifies that the notification is to be scheduled at the next Fine breakpoint. A Fine breakpoint is defined as the moment between two units of action considered to be the smallest meaningful units in the given context. An example of a Fine breakpoint is when the user switches from editing source code to starting a compilation. This policy is useful for notifications relevant to the current task or otherwise cannot be deferred for too long. For example, a notification for an instant message describing an error in the source code the user is working on might choose to use this policy. This ensures that the user receives the information in the context of the ongoing task but with less disruption compared to delivering it immediately. Next Medium. This policy schedules the notification at the next Medium breakpoint. A Medium breakpoint is the moment between two units of action where those units are smaller than Coarse but larger than Fine. An example ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:11 Table II. Summary of the deferral policies along with heuristics for selecting each policy based on the notification content and desired outcome for the user. Policy Next Coarse Next Medium Next Fine Next breakpoint of any type Coarsest breakpoint in a given timeframe Notification content is . . . Relevant Urgent √ √ √ √ Desired outcome Less disrupFaster delivtion ery √ √ √ √ √ is when the user has just completed browsing an online API reference and is resuming the editing of source code during a programming task. This policy is useful for notifications that are relevant to the ongoing task but not so urgent that they need to be delivered at a Fine breakpoint. For example, a mail notification from a collaborator about an issue with a related programming project could be deferred until a Medium breakpoint. Next breakpoint of any type. This policy specifies that the notification is to be scheduled at the next breakpoint of any type: Coarse, Medium or Fine. This policy ensures that a notification is delivered at the next perceptual break, regardless of its granularity. This policy would be useful for notifications that have increased urgency for the user. For example, a notification about a required meeting that is starting in a few minutes might be best scheduled using this policy. Coarsest breakpoint in a given timeframe. This policy specifies the notification is to be scheduled at the breakpoint that has the lowest interruption cost (i.e., coarsest breakpoint) in a given timeframe. On detection of a breakpoint, the system uses the frequency of past breakpoints and their type to predict the likelihood of receiving a coarser breakpoint within the remaining time. If the likelihood is high, then the system waits. Otherwise the request is granted at the current breakpoint. Note that the previous policy emphasizes determining the first interruptible moment, whereas this policy emphasizes identifying the moment with the least disruption. The first three policies were derived directly from prior empirical work investigating effects of delivering notifications at different granularities of breakpoints [Adamczyk and Bailey 2004; Iqbal and Bailey 2005; Iqbal and Bailey 2006]. The latter two policies were added to offer more flexible system behavior and were derived from envisioning various application scenarios as well as our experience gained from prior work. Table II summarizes the deferral policies currently offered in our system along with heuristics for selecting each policy based on the notification content and desired outcome for the user. This set of policies provides only a starting point and should not be considered exhaustive or, in the case of the latter two, mutually exclusive in terms of the breakpoints selected. The policies may be refined and new policies may be added as experience is gained with the system. This experience would also be useful for more carefully recommending when each policy is best applied. We ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:12 • S. T. Iqbal and B. P. Bailey assume that applications will choose appropriate policies guided by the criteria in Table II or other relevant information. In Section 4.3, we discuss how existing systems can be used or Oasis might be extended to provide additional information useful for selecting an appropriate policy. An application can specify a maximum time to wait as part of its notification request in order to prevent overly long deferrals, similar to the concept of bounded deferral discussed in Horvitz et al. [2005]. The timeframe provides an upper bound on the time within which the notification must be delivered so that it does not become irrelevant for the user. If no breakpoint occurs by the end of the given timeframe, the notification request is granted immediately. This situation should be rare but, if it does occur, the resulting system behavior (i.e., interrupt now) should be no worse than today’s interface. We assume that a notification delivered at any moment within the given timeframe has equal utility for the user and applications can determine what an appropriate timeframe should be. Heuristics for selecting a timeframe are discussed in Section 4.2. 3.2 Scheduling Notifications As shown in Figure 1, two runtime components within Oasis, the Scheduler and the Breakpoint Detector, manage notification requests. These components coordinate to (i) identify breakpoints by evaluating user actions against statistical models of breakpoints; (ii) manage notification requests from end-user applications; and (iii) match scheduled requests to the appropriate breakpoints and manage the specified timeframes. 3.2.1 Detecting Breakpoints. Detecting breakpoints is necessary for scheduling notifications within Oasis and this process is handled by the Breakpoint Detector: (see Figure 2). The Breakpoint Detector buckets global system and application events, evaluates each bucket of events using a set of statistical models, and outputs a stream of breakpoint and not-a-breakpoint (NAB) decisions. A default set of statistical models is provided with Oasis, but the system allows new models to be created. Two types of events are received by the Breakpoint Detector: global system events, including mouse, keyboard, and top-level window events, and application events, such as the event generated when text is added to a shape on the canvas in MS Visio. Global events are captured using the Win32 API and the necessary instrumentation is provided by Oasis. However, applications must be instrumented to send their events to our system. For purposes of demonstration, we developed plug-ins for two commonly used applications, Microsoft Visual Studio (source code editing tool) and Microsoft Visio (diagram editing tool) that expose their event stream to our system. Similar plug-ins can be built for other applications. The default models used within OASIS were learned from the data and process reported in our prior work [Iqbal and Bailey 2007; Iqbal and Bailey 2008]. These models were created by collecting authentic task execution data in the domains of programming and diagram editing, having observers annotate the perceived breakpoints, and learning the mappings from the surrounding ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery Visual Studio Plug-in Text added Shape added ... 6643874 6652343 Win32 API Win minimized App closed 697654 699943 15:13 Breakpoint Detector Event Data Structure Mouse moved 6578974 Code compiled 6579673 Visio Plug-in • Events Apply operators Bucket events into examples Evaluate examples using models Breakpoint models Coarse/ Medium/ Fine/ NAB Scheduler Fig. 2. Schematic of the Breakpoint Detector. Visual Studio and Visio are shown as example applications. Events are bucketed into examples and these are then evaluated using the global model for Coarse breakpoints and the relevant application specific models for Medium and Fine. The decision output (Coarse/Medium/Fine/NAB) is passed to the Scheduler. interaction data to the identified breakpoints. Further details of how the models were built and their performance are discussed in Section 4.1. Oasis uses one model for detecting Coarse breakpoints. This model accepts as input the global system events captured by our system and is therefore independent of any specific application. There is an additional model for detecting Medium and Fine breakpoints within each of our demonstration applications. The use of default models eliminates the need for end users to engage in a lengthy training process prior to first use of the system. However, if needed, the models could be trained on a per user basis to improve their accuracy using our offline tool set (see Section 3.3). When an observed event occurs, it is sent to the breakpoint detector. The breakpoint detector maintains a 30-second history of events and evaluates these events every few seconds using the models. These models map the events to each type of breakpoint and NABs. If a breakpoint is detected, the Scheduler is notified of the occurrence and type of the breakpoint; otherwise, the current moment is reported as a NAB. 3.2.2 Managing Notification Requests. When an application wants to render a notification to the user, it first sends a request to the Scheduler via a network connection (see Figure 1). As described in Section 3.1, a request consists of a scheduling policy and a time window. For example, the request (300, next-medium) indicates a defer-to-next-medium policy and a time window of 300s. The Scheduler queues the request, along with the handle to the application, and then waits either for an appropriate breakpoint to occur or for the time window to expire. Once either occurs, the Scheduler signals the application that its request has been granted. The application can then render the notification using the modality and saliency of its choice. Figure 3 gives pseudocode outlining this process. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:14 • S. T. Iqbal and B. P. Bailey Fig. 3. Algorithm (in pseudocode based on C#) for managing notification requests. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:15 When notified of a breakpoint occurrence, the Scheduler checks whether the breakpoint matches any of the policies specified in the requests in its queue. If there is a match, a signal is sent to the corresponding application, which in turn, can render its notification for the user. Matching the policies for defer-untilnext-coarse, -medium, -fine or –any-type is straightforward. The most complex policy is defer-until-the-coarsest breakpoint within a given time frame, which we elaborate here. For this policy, if a Coarse breakpoint is detected, the request is immediately granted since no coarser breakpoint is possible. If the breakpoint is Medium or Fine, the scheduler forecasts the probability of the user reaching a coarser breakpoint within the remaining timeframe. The scheduler calculates this probability as follows: P(coarser breakpoint occurring within X sec | current breakpoint type) = # (current to coarser pairs with distance <= X) / # (all current to coarser pairs). For example, suppose the current breakpoint type is Medium and there are 50 seconds left within the timeframe specified for the corresponding request. The system traverses the distribution history of breakpoints to count the number of Medium-to-next-Coarse pairs with a temporal distance less than 50s. This number is then divided by the total number of Medium-to-next-Coarse pairs in the history. If the resulting probability is less than a specified threshold then the system grants the request. Otherwise, it waits for the next breakpoint and the process repeats, until either a coarser breakpoint is not likely or the timeframe given with the request expires. The probability threshold can be specified in a system configuration file. A default distribution history is included with Oasis. The history contains the temporal distances between each pair of breakpoint types (e.g., Coarse-tonext-Coarse, Coarse-to-next-Medium, etc.). The history is stored as an XML file and is loaded at system startup. The default history was created from the data used to train the default statistical models for detecting breakpoints (see Section 3.2.1). However, if more accurate predictions are needed, our system could be extended to track and use a running history of breakpoints detected for the specific user. 3.3 Tool Set for Developing Breakpoint Models As described in the previous section, the Breakpoint Detector relies on the use of statistical models for detecting breakpoints. Oasis maintains a set of default models and we expect that the use of default models will be able to suffice for most end users and work contexts. However, for situations where improved performance is desired or for other research purposes, customized models can be created using a model development component within Oasis. The model building process involves recording user interaction data, annotating the breakpoints within the recorded data, and learning the mappings from the interaction data to each type of breakpoint. Oasis provides a set of tools supporting this process which we elaborate in this section. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:16 • S. T. Iqbal and B. P. Bailey AVI file of on-screen interactions XML file of timestamped application events XML file of timestamped window events Activity Recorder Data aggregation Video stream of interaction data 5 frames/sec Techsmith Screen Recorder SDK Application events {event, time} {event, time} Visual Studio add-in Window events {event, time} Visio add-in Win32 API Data collection Fig. 4. Schematic of the Activity Recorder service. The data is collected through system APIs and application add-ins, is stored as AVI and XML files, and these files are later accessed by the Breakpoint Annotator. Model development in Oasis is similar to the process described with the Subtle toolkit [Fogarty and Hudson 2007]. However, Subtle relies on the method of experience sampling for labeling the training data. This method prompts the user to enter the labels at random moments or at moments that can be specified a priori. Because learning models of breakpoints requires the labels to be placed at precise moments within the task execution data and these moments cannot be known a priori, it would be difficult to directly apply this toolkit for learning the models. Our system is therefore different in that it provides a set of tools for capturing and retrospective labeling of the interaction data. 3.3.1 Recording Interaction Data. The Activity Recorder is a service used to record interaction data and its architecture is outlined in Figure 4. It collects three categories of data: a video of onscreen interaction, application events, and global system events. The videos are used for retrospective labeling of the breakpoints. The application and global system events are collected and used in raw form as well as to generate higher-level features that may be predictive of the breakpoints. The recorder has a user interface for starting, pausing, and stopping the data collection. 3.3.2 Labeling Breakpoints. Once the activity data has been collected, the next step is to annotate the locations of the breakpoints. The Breakpoint Annotator is an interactive tool enabling this functionality, shown in Figure 5. The tool allows a user to view a video of on-screen interaction and annotate where they feel the breakpoints occur. When the user wants to enter a breakpoint, she opens the breakpoint dialog by selecting the appropriate button from the playback control (bottom left of Figure 5). This opens a nonmodal dialogue (bottom ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:17 Fig. 5. The Breakpoint Annotator is an interactive tool for marking the locations of breakpoints and identifying their type, entering rationale for the breakpoints, and saving the annotations to a structured file. It also provides a visual summary of the ongoing annotations (right side). middle of Figure 5) where the user selects the type of the breakpoint and enters the rationale for why she believes this is a breakpoint, if instructed. From within the dialog, the user can scrub the video in different increments to locate the precise time point of the breakpoint. The breakpoint type, its rationale, and the video time point are saved and appear in the summary panel (right of Figure 5). Once complete, the breakpoint data can be saved as a zip archive containing the interaction video, the breakpoint annotation records, and the distribution history of the breakpoints. The latter is used for the prediction algorithm described in Section 3.2.3. 3.3.3 Learning the Models. To learn the breakpoint models, the system first creates a set of training examples by associating the global and application event data with the labeled breakpoints. Note that this is a fully automated process and does not require any input from the user. To create training ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:18 • S. T. Iqbal and B. P. Bailey examples, the system creates a thirty second history of the interaction data preceding each labeled breakpoint. Features are derived from the interaction data and corresponding training examples are generated. Some example features include switching to a mail client (global), finish building project (programming) and finish adding a shape to the current diagram (diagram editing). To distinguish non-breakpoint moments from breakpoints, additional training examples are generated from a random sample of moments that were not identified as breakpoints. Using the training examples, the learning module filters features that are most predictive and contribute to improved accuracy. A reduced number of features is useful for improving the efficiency of the overall system. The training examples and predictive features are then passed through a set of statistical models based on different learning algorithms including Decision trees, Bayesian nets, Multilayer Perceptrons, and IB1. The models with the highest accuracy are selected and stored in the appropriate location for use by the Oasis system. Further details of this learning process can be found in Iqbal [2008]. 3.4 Implementation and Interfacing with Applications Oasis is fully functional and was developed in Visual C# using the.NET Framework. It consists of about 14,000 lines of code. The runtime component leverages two classes of plug-ins for accessing the user interaction data; a system plugin and application plug-ins. The system plug-in leverages the Win32 API for capturing events related to the mouse, keyboard, and top-level window manipulations. This plug-in is included as part of Oasis. Application plug-ins provide access to application-level events and our system relies on others (e.g., end users, developers, or researchers) to create these plug-ins. However, to provide proof of concept, we developed plug-ins for two commonly used applications, Microsoft Visio and Microsoft Visual Studio. The R R Visual Studio Tools plug-in for Visio was developed using the Microsoft for Microsoft Office. This technology allows “add-ins” to be created for any application in the Microsoft Office suite. The plug-in captures events related to manipulating shapes (creating, sizing, deleting, setting properties, etc.), the canvas view position and scale, and manipulating diagrams (creating, naming, saving, closing, etc.). About 500 events are captured with this plug-in. For Visual Studio, the plug-in was developed by extending the IDE. The plug-in reports events related to the editing of the code (ended line of code, method, or class), selection of menu and toolbar commands, and navigation (opening, switching, and closing source files and scrolling the current source file). About 370 events are captured using this plugin. Because application scripting technologies have become widely available, similar plug-ins can be created for a wide variety of end user applications in the desktop domain. For model development, videos of onscreen interaction are captured using the Techsmith Screen Recorder SDK. The event data is collected from the application and system plug-ins and this data is saved as XML files. The event data and screen interaction data contain time stamps from the system clock allowing for precise synchronization. This is needed when aligning the annotated ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:19 breakpoints to the interaction data during the model building process. The breakpoint models are learned using the Weka toolkit [Witten and Frank 2005]. The current implementation of Oasis provides an interface for applications to connect to the Scheduler through an IP Version 4 stream socket using the TCP/IP protocol. For an application executing on the same machine as Oasis, the connection is established using the loopback address (127.0.0.1), but this can be replaced by a different IP address if the application is executing on a different machine. Applications that want to send notification requests set up connections with the Scheduler a priori. The connection is attempted using a nonblocking system call (send) to a predetermined system port. This prevents the application from blocking indefinitely on the request and allows the application to take appropriate action if the connection fails. If successful, the application uses the connection for sending all notification requests to Oasis. Any application capable of sending and receiving messages through IPV4 sockets can interface with Oasis. 4. DISCUSSION In this section, we review results from a user evaluation of our system and then discuss issues related to its implementation and the generalizability of the underlying technique. 4.1 Evaluation of System Performance and User Impact We conducted a user study to test the effectiveness of using default statistical models for detecting breakpoints for novel interaction data (i.e., data that was not part of the original training set) and to evaluate how scheduling notifications with our system impacts users and their tasks. Because details of the study have been reported elsewhere [Iqbal and Bailey 2008], we only summarize the methodology and our most significant findings. The first phase of the study evaluated how well the system could identify and differentiate breakpoints for novel interaction data using the default statistical models. The study was conducted in the task domains of programming and diagram editing. To build the default models, we recruited six users (three per domain) who performed their own tasks in the respective domain. The users had complete control over the task and how it would be performed. Our Activity Recorder (see Section 3.3) was installed on the users’ machine and, while the desired task was being performed, collected about 1.5 hrs of interaction data. Examples of the tasks performed included developing a Web-based application in ASP.net and developing information architecture diagrams for a Web site design. Twelve observers viewed the interaction videos and identified the location of breakpoints and their type using the Breakpoint Annotator. The breakpoint data was fed into our model building component which generated the default models. A 10-fold cross-validation showed reasonable performance; recall was 71% for Coarse, 84% for Medium, and 96% for Fine for the programming tasks and 56% for Medium and 58% for Fine for the diagram editing tasks. To evaluate the models for novel interaction data we recruited six additional users (again, three per domain) who installed and ran our system while they ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:20 • S. T. Iqbal and B. P. Bailey performed their own tasks in the respective domain. Our system was using the default models and was configured to record the detection of breakpoints. The users later reviewed their interaction videos and retrospectively identified the locations of their breakpoints. The locations of these user-identified breakpoints were compared to the locations of the system-identified breakpoints. Results showed that users and the system agreed on the location of 43% (programming) and 40% (diagram editing) of the breakpoints. However, the default models particularly struggled to differentiate the type of the breakpoints, with recall values ranging between 2 and 42%. These results were lower than expected and highlight the challenge of creating a robust set of default models for breakpoint detection. We see several directions forward for improving model performance. One is to train the default models using a much larger training set, which might be possible by collecting labeled data from many users or observers. Another direction would be to have users train personalized models, which has been shown to increase model performance [Fogarty and Hudson 2007]. A third direction would be to begin with the default models but train them further with additional data collected from the user. Given the active research on modeling user activities [Fogarty and Hudson 2007; Horvitz et al. 2003; Horvitz et al. 2002], we believe performance for default and/or personalized models will reach acceptable thresholds for breakpoint detection. We therefore wanted to test the effects of scheduling notifications at breakpoints for free-form user activities. To compensate for the model accuracies, we combined the breakpoint data and learned new models that would detect the occurrence of a breakpoint without differentiating its type. This yielded recall accuracies of 59% and 52% for programming and diagram editing, which was sufficient for the next phase of our study. To identify the breakpoint types, we had the users retrospectively label the type of the breakpoints that were detected by Oasis. Sixteen users participated in this phase of the evaluation, consisting of graduate and undergraduate students. A diagram editing task (design a leisure space for students) and a programming task (develop a set of digital image filters) were created and performed using MS Visio and Visual Studio. The tasks were designed to be open-ended, meaning that users had complete freedom in how they would achieve the goal; the task steps were not predetermined in any way. The tasks were more challenging and of longer duration than the controlled tasks typically used in prior lab studies. Eight users performed the diagram editing task while the other eight users performed the programming task. The tasks lasted about two hours. During the study, users received two types of notifications: those relevant to the task and those of general interest to the user. For example, relevant notifications included references to useful examples of source code or leisure space diagrams whereas general interest notifications included actual references to new releases of common software or announcements related to our institution. Relevance was included as a factor to understand how it interacts with the system’s scheduling behavior. Sixteen notifications were delivered per user, eight relevant and eight of general interest. An application was created ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:21 that generated notification requests at random moments during the user’s task and rendered the notifications when the requests were granted. Our system was configured to grant requests under two conditions. In one condition, requests were granted immediately and this served as a baseline. For the other condition, the system granted requests using the defer-to-next breakpoint policy. Users identified the types of the breakpoints retrospectively, allowing us to analyze the scheduling behavior for each type of breakpoint. Given the scope and complexity of the study, testing the other policies was not included here but should be a focus of future work. Dependent measures included the user’s reaction time (time to acknowledge a notification), resumption time (the time to resume the main task), and self-reported frustration. We also gained user feedback about the effectiveness of the scheduling behavior through post experiment interviews. A compelling finding from our study was that the concept of scheduling notifications at breakpoints fit well with how the users preferred the notifications to be managed. For example, when asked to characterize their preferred moments for a notification, many users described how they preferred, “finishing the current thing [action] rather than jumping to something else.” More specifically, users pinpointed moments such as after adding specific shapes to a diagram or completing specific lines of source code, and describing how these actions related to the completion of their current goal; the same behavior our system seeks to mimic. Quantitatively, we found that scheduling notifications at breakpoints reduced frustration by 20% and reaction time by 25% relative to immediate delivery. This reduction in cost was balanced against notifications being deferred for only about 90 seconds on average. User feedback also provided insights into when applications should specify each of the tested policies. For example, an application should choose a defer-tomedium or defer-to-fine policy for a notification relevant to the user’s task and choose a defer-to-coarse policy for notifications of general interest. The rationale expressed by users is that they wanted to receive relevant notifications while they were still in the context of the main task, otherwise they felt the need to return to it; whereas they wanted the general interest notifications to be deferred until after they had left the main task and were more amenable to receiving general information. Overall, the results from our study show that scheduling notifications at breakpoints can result in a measurable benefit for users and fits their own expectations for managing notifications. However, the study also clearly indicates that more research is needed to improve breakpoint detection, preferably in a manner that does not impose a burden that may inhibit user adoption of this type of system in practice. 4.2 Constructing a Notification Request When an application wants to render a notification using our system, it must construct and send a notification request. Each notification request consists of two components: the scheduling policy and a time window within which the notification must be delivered (see Section 3.1). The application must select the time window carefully because it will affect the system’s scheduling ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:22 • S. T. Iqbal and B. P. Bailey Table III. Recommended values (in seconds) for the time window for each of the scheduling policies in Oasis Defer-to-coarse 190 Defer-to-med 158 Defer-to-fine 114 Defer-to-any 225 Defer-to-best 659 behavior. For example, a time window that is set too short could override the corresponding policy because the likelihood of detecting a matching breakpoint would be low. On the other hand, setting the time window too long could reduce the utility of the notification’s content, (e.g., if a matching breakpoint does not occur, the notification would be delayed until the time window expires). To assist application developers, we offer initial heuristics and discuss additional parameters that may be considered for specifying an appropriate time window. An initial set of heuristics for appropriate time windows for each policy was derived by analyzing the breakpoint data set reported in Iqbal and Bailey [2007]. The heuristics are summarized in Table III. This dataset included the breakpoint distributions for several programming, document editing, and image manipulation tasks and was the most robust dataset we had available. Our goal in analyzing the dataset was to calculate durations for which there would be a reasonable likelihood of a breakpoint occurring that would satisfy the given policy. For example, for the defer-to-coarse, -medium, and -fine policies, we computed the heuristic based on the median distance between each of those types of breakpoints in the data set. For the defer-to-any policy, we used the median distance between all adjacent pairs of breakpoints. For the deferto-best policy, we computed the heuristic as the median distance to the next breakpoint of any type and then added the maximum distance from any breakpoint to the next Coarse. The heuristics could be later refined by examining a larger data set and leveraging the experience gained from use of the system. In selecting an appropriate time window as well as the scheduling policy, an application may want to consider parameters such as the urgency and relevance of the notification content with respect to the user’s ongoing task. For example, if an application is aware that the notification is more urgent for the user or relevant to her task (e.g., using systems like Giornata [Voida and Mynatt 2009] or Tasktracer [Dragunov et al. 2005]), it may specify a shorter time window in its request (the exact value would require some calibration). This would allow opportunity for a suitable breakpoint to occur but help guard the utility of the notification. Though challenging, applications may leverage existing techniques for computing these values dynamically [Marx and Schmandt 1996; Horvitz et al. 1999] or, in the future, may be able to query system services to retrieve these values automatically [Dragunov et al. 2005]. One may also envision a system like Oasis making a user’s current activities (the structure of which it already detects) available to applications, with some level of abstraction, along with predictions for how long s/he may remain on that activity as well as probabilities of switching to other activities. Applications may be able to leverage this information in determining how relevant or urgent their information is relative to the current task and create notification requests accordingly. Determining relevance or urgency is challenging, and the effect of inaccurate assessment is not negligible. Applications or services may therefore ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:23 consider establishing a user feedback mechanism to tune their algorithmic estimations of these attributes, which would help to improve the efficacy of the overall system behavior. 4.3 Granularities of Breakpoints Our system schedules notifications by detecting interruptible moments during user tasks. As a result, the system is not able to provide measures of a user’s interruptibility between these moments (i.e., it does not provide a continuous cost function). This capability would be useful, for example, for enabling additional scheduling policies in our system. One method for converging toward a more continuous cost function consistent with our current approach is to detect successively finer granularities of breakpoints (e.g., in the extreme case, down to the transitions between keystrokes). However, we have not pursued this level of detail in our system because the empirical evidence has not yet shown that delivering notifications at breakpoints with a granularity finer than those already detectable by our system yields a measurable benefit for the user [Iqbal and Bailey 2006]. That is, delivering a notification at a breakpoint with a very fine granularity has not been shown to be better than delivering it immediately. Conversely, it might also be useful to detect additional granularities of breakpoints coarser than those currently detectable by our system. An example of this type of breakpoint would be a transition between physical activities within the office (e.g., switching between interacting with the desktop and reading physical documents). This type of breakpoint could be detected by deploying sensors throughout the office environment [Fogarty and Hudson 2007] and learning statistical models using a process and tools similar to those described in this article. Because empirical benefits of using this class of breakpoint have been shown in the domain of mobile computing [Ho and Intille 2005], notification management systems similar to Oasis should consider implementing this extension. 4.4 Generalizing to Other Tasks, Domains, and System Functionalities Oasis demonstrates the technique of aligning notification scheduling to the perceptual structure of user tasks and the system was implemented in the desktop domain as a proof of concept. An evaluation offered initial evidence indicating that the system provides an empirical benefit for users during authentic activities and that the system’s behavior is consistent with how users prefer notifications to be managed. The evaluation tested the system in the two task domains that it currently supports: programming and diagram editing. In addition, prior work has tested the technique and shown favorable results for other interactive tasks, including document editing, image manipulation, and communication tasks. We therefore believe the technique generalizes to the class of goal-directed, content generation tasks. Further research is needed to test the efficacy of our system for interactive tasks that are less goal-directed or do not involve content generation. For example, one challenge with these types of tasks is that the perceptual structure may be more difficult to detect, by both systems and human observers. For example, if a user is browsing the Web for ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:24 • S. T. Iqbal and B. P. Bailey leisure (where the goal is less obvious and there is no content generation), it may be more difficult to reliably identify the transitions between meaningful units of action. Extending the capability of a system like Oasis to other desktop task domains would require that the event streams of the desired applications be made available to the system. As application plug-in and scripting architectures have become common, the technical feasibility for exposing the event streams is already available. For example, we were able to create plug-ins for MS Visio and Visual Studio with relatively modest effort. And in the TaskTracer project [Dragunov et al. 2005], it was demonstrated that plug-ins could be created to capture a very large number of interaction events for the entire MS Office suite (Excel, Word, Powerpoint, and Outlook). Generally speaking, developers should report as many events as possible and allow the model building component to learn which events to include in the breakpoint models. In addition, the effect on the user experience caused by a growing number of notifications in the desktop domain provides a strong incentive for developers to create these types of plug-ins. For example, an application that interacts with a notification management system for delivering frequent information updates would have a clear user experience and marketing advantage over competing applications that do not provide similar support. Though our system was implemented in the desktop domain, the basic technique of deferring notifications until breakpoints can be effective in other domains where users perform goal-directed tasks. For example, for mobile computing, researchers have already found that deferring notifications until the user initiates a physical transition (e.g., from sitting to standing) yields positive results relative to immediate delivery [Ho and Intille 2005]. In the domain of ubiquitous computing, researchers have found that transitions between different task and leisure activities within the home would also serve an important role in notification management [Nagel et al. 2004]. In safety critical domains such as driving, notifications from in-vehicle information systems could be deferred until the driver completes a critical action such as turning, stopping at a light, or reaching a steady speed. However, in this environment, systems would likely need to consider other contextual factors such as weather and traffic conditions in addition to the sequence of localized driver actions when deciding when to render notifications. Finally, system functionalities beyond notification management could be enabled with access to the perceptual structure of user tasks. For example, this information could be used by content generation tools (e.g., programming, image manipulation, and document editing tools) to organize a linear sequence of user commands into a hierarchical structure. For example, this structure would enable users to perform undo and redo actions at different levels of granularity or could be used to create a better visual summary of the command histories. 5. CONCLUSIONS AND FUTURE WORK Users rely on notifications to maintain awareness of desired information. Because the applications or services delivering notifications have no capability of ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:25 understanding the state of the user’s task, they often render their notifications at overly disruptive moments. Empirical research has shown favorable results for using perceptual breakpoints during task execution as less disruptive moments for notification, but it has been unclear whether this strategy could be effectively implemented or what the effects on end users would be for authentic activities. This article has contributed the design of a system called Oasis that demonstrates the feasibility of aligning notification delivery with the perceptual structure of user tasks. The key aspects of our system are that it detects task structure without any explicit knowledge of the task itself, layers flexible scheduling policies on top of the detection mechanism, and provides a service that applications can access for scheduling notifications with these policies. An evaluation showed that the system can provide empirical benefits for the user and led to initial recommendations for how and when each of the policies is best utilized. Our system has been fully implemented and other researchers can use it to further explore notification scheduling policies and other system functionalities based on the use of task structure. The source code of Oasis can be obtained by contacting S. T. Iqbal. We see several directions for future work. One immediate direction is to explore how to build more robust default statistical models for breakpoint detection or methods for creating personalized models that limit the training overhead for users. Another direction is to conduct longer-term studies of how a notification scheduling system like Oasis affects users and their tasks and study the use of the system for a more diverse task set. A third direction is to better understand if and how users develop a mental model of the system’s behavior and how they react to this type of automated decision making. A fourth direction is to explore development of additional notification scheduling policies that leverage task structure and other contextual factors. Finally, it may be fruitful to explore how the use of perceptual task structure could enable or improve other types of system functionalities, such as automatically creating structured command histories. ACKNOWLEDGMENTS We thank the users who volunteered to participate in our related studies. REFERENCES ADAMCZYK, P. D. AND BAILEY, B. P. 2004. If not now when? The effects of interruptions at different moments within task execution. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 271–278. BAILEY, B. P., ADAMCZYK, P. D., CHANG, T. Y., AND CHILSON, N. A. 2006. A framework for specifying and monitoring user tasks. J. Comput. Hum. Behav. 22, 4, 685–708. BAILEY, B. P. AND IQBAL, S. T. 2008. Understanding changes in mental workload during execution of goal-directed tasks and its application for interruption management. ACM Trans. Comput. Hum. Interac. 14, 4, 1–28. BAILEY, B. P. AND KONSTAN, J. A. 2006. On the need for attention aware systems: measuring effects of interruption on task performance, error rate, and affective state. J. Comput. Hum. Behav. 22, 4, 709–732. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:26 • S. T. Iqbal and B. P. Bailey BEGOLE, J. B., MATSAKIS, N. E., AND TANG, J. C. 2004. Lilsys: sensing unavailability. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. 511–514. CZERWINSKI, M., CUTRELL, E., AND HORVITZ, E. 2000a. Instant messaging and interruption: influence of task type on performance. In Proceedings of the Annual Conference of the Human Factors and Ergonomics Society of Australia (OZCHI). C. Paris, N. Ozkan, S. Howard, and S. Lu, Eds., 356–361. CZERWINSKI, M., CUTRELL, E., AND HORVITZ, E. 2000b. Instant messaging: effects of relevance and timing. In Proceedings of HCI: People and Computers XIV. S. Turner and P. Turner, Eds., British Computer Society, 71–76. CZERWINSKI, M. AND HORVITZ, E. 2002. Memory for daily computing events. In Proceedings of HCI: People and Computers XVI. F. Culwin, Ed. DABBISH, L. AND KRAUT, R. E. 2004. Controlling interruptions: awareness displays and social motivation for coordination. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. 182–191. DEMARCO, T. AND LISTER, T. 1999. Peopleware: Productive Projects and Teams 2nd Ed. Dorset House Publishing Company, New York. DEY, A. K. AND ABOWD, G. D. 2000. CybreMinder: a context-aware system for supporting reminders. In Proceedings of 2nd International Symposium on Handheld and Ubiquitous Computing. 172–186. DISMUKES, K., YOUNG, G., AND SUMWALT, R. 1998. Cockpit interruptions and distractions. ASRS Directline 10. DRAGUNOV, A. N., DIETTERICH, T. G., JOHNSRUDE, K., MCLAUGHLIN, M., LI, L., AND HERLOCKER, J. L. 2005. TaskTracer: A desktop environment to support multi-tasking knowledge workers. In Proceedings of the International Conference on Intelligent User Interfaces. 75–82. FOGARTY, J. AND HUDSON, S. E. 2007. Toolkit support for developing and deploying sensor-based statistical models of human situations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press, 135–144 FOGARTY, J., KO, A. J., AUNG, H. H., GOLDEN, E., TANG, K. P., AND HUDSON, S. E. 2005. Examining task engagement in sensor-based statistical models of human interruptibility. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 331–340. FOGARTY, J., LAI, J., AND CHRISTENSEN, J. 2004. Presence versus availability: The design and evaluation of a context-aware communication client. Int. J. Hum. Comp. Studies, 61, 3, 299– 317. GILLIE, T. AND BROADBENT, D. 1989. What makes interruptions disruptive? A study of length, similarity, and complexity. Psych. Rese. 50, 243–250. GLUCK, J., BUNT, A., AND MCGRENERE, J. 2007. Matching attentional draw with utility in interruption. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM Press, 41–50 GONZALEZ, V. M. AND MARK, G. 2004. “Constant, constant, multi-tasking craziness”: managing multiple working spheres. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 113–120. HO, J. AND INTILLE, S. S. 2005. Using context-aware computing to reduce the perceived burden of interruptions from mobile devices. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM Press, 909–918 HORVITZ, E. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 159–166. HORVITZ, E., APACIBLE, J., AND SUBRAMANI, M. 2005. Balancing awareness and interruption: investigation of notification deferral policies. In Proceedings of the 10th User Modeling International Conference. L. Ardissono, P. Brna, and A. Mitrovic, Eds., Springer, 433–437. HORVITZ, E., JACOBS, A., AND HOVEL, D. 1999. Attention-sensitive alerting. In Proceedings of the Conference on Uncertainty in Artificial Intelligence. 305–313. HORVITZ, E., KADIE, C. M., PAEK, T., AND HOVEL, D. 2003. Models of attention in computing and communications: from principles to applications. Comm. ACM, 52–59. HORVITZ, E., KOCH, P., KADIE, C. M., AND JACOBS, A. 2002. Coordinate: probabilistic forecasting of presence and availability. In Proceedings of the 18th Conference on Uncertainty and Artificial Intelligence. Morgan-Kaufmann, 224–233. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. Oasis: A Framework for Linking Notification Delivery • 15:27 IQBAL, S. T. 2008. A framework for intelligent notification management in multitasking domains. Ph.D. Thesis. Department of Computer Science, University of Illinois at Urbana-Champaign. IQBAL, S. T. AND BAILEY, B. P. 2005. Investigating the effectiveness of mental workload as a predictor of opportune moments for interruption. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 1489–1492. IQBAL, S. T. AND BAILEY, B. P. 2006. Leveraging characteristics of task structure to predict costs of interruption. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 741–750. IQBAL, S. T. AND BAILEY, B. P. 2007. Understanding and developing models for detecting and differentiating breakpoints during interactive tasks. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 697–706. IQBAL, S. T. AND BAILEY, B. P. 2008. Effects of intelligent notification management on users and their tasks. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 93–102. IQBAL, S. T. AND HORVITZ, E. 2007. Disruption and recovery of computing tasks: field study, analysis and directions. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 677–686. IQBAL, S. T. AND HORVITZ, E. 2010. Notification and awareness: a field study of alert usage and preferences. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW). ACM, 27–30. JACKSON, T. W., DAWSON, R. J., AND WILSON, D. 2001. The cost of email interruption. J. Syst. Inform. Technol. 5, 1, 81–92. KREIFELDT, J. G. AND MCCARTHY, M. E. 1981. Interruption as a test of the user-computer interface. In Proceedings of the 17th Annual Conference on Manual Control. Jet Propulsion Laboratory, California Institute of Technology, JPL Publication 81–95, 655–667. LATORELLA, K. A. 1998. Effects of modality on interrupted flight deck performance: implications for data link. In Proceedings of the 42nd Annual Meeting of the Human Factors and Ergonomics Society. 87–91. LATORELLA, K. A. 1999. Investigating interruptions: Implications for flightdeck performance. National Aviation and Space Administration, Washington, DC. LEE, J. D., HOFFMAN, J. D., AND HAYES, E. 2004. Collision warning design to mitigate driver distraction. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 65–72. MAES, P. 1994. Agents that reduce work and information overload. Comm. ACM 37, 7, 30–40. MAGLIO, P. AND CAMPBELL, C. S. 2000. Tradeoffs in displaying peripheral information. In Proceedings of the ACM Conference on Human Factors in Computing Systems. 241–248. MARX, M. AND SCHMANDT, C. 1996. CLUES: Dynmic prsonalized message filtering. In Proceedings of the ACM Conference on Computer Supported Cooperative work. MCCRICKARD, D. S., CATRAMBONE, R., CHEWAR, C. M., AND STASKO, J. T. 2003. Establishing tradeoffs that leverage attention for utility: Empirically evaluating information display in notification systems. Int. J. Hum.-Comput. Studies 58, 5, 547–582. MCFARLANE, D. C. 1999. Coordinating the interruption of people in human-computer interaction. In Proceedings of the IFIP TC.13 International Conference on Human-Computer Interaction. 295– 303. MCFARLANE, D. C. AND LATORELLA, K. A. 2002. The scope and importance of human interruption in HCI design. Hum.-Comput. Interact. 17, 1, 1–61. MIYATA, Y. AND NORMAN, D. A. 1986. Psychological issues in support of multiple activities. In User Centered System Design: New Perspectives on Human-Computer Interaction, D. A. Norman and S. W. Draper Eds., Lawrence Erlbaum Associates, Hillsdale, NJ, 265–284. MONK, C. A., BOEHM-DAVIS, D. A., AND TRAFTON, J. G. 2002. The attentional costs of interrupting task performance at various stages. In Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting. NAGEL, K. S., HUDSON, J. M., AND ABOWD, G. D. 2004. Predictors of availability in home life context-mediated communication. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. ACM Press, 497–506. ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010. 15:28 • S. T. Iqbal and B. P. Bailey NAVON, D. AND GOPHER, D. 1979. On the economy of the human processing system: a model of multiple capacity. Psych. Rev. 86, 254–255. NEWTSON, D. 1973. Attribution and the unit of perception of ongoing behavior. J. Person. Soc. Psych. 28, 1, 28-38. NEWTSON, D. AND ENGQUIST, G. 1976. The perceptual organization of ongoing behavior. J. Experi. Soc. Psych. 12, 436–450. SPEIER, C., VALACICH, J. S., AND VESSEY, I. 1999. The influence of task interruption on individual decision making: An information overload perspective. Decis. Sci. 30, 2, 337–360. STANTON, N. 1994. Human Factors in Alarm Design. Taylor and Francis, London, UK. TANG, J. C., YANKELOVICH, N., BEGOLE, J., KLEEK, M. V., LI, F., AND BHALODIA, J. 2001. ConNexus to awarenex: Extending awareness to mobile users. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM Press, 221–228. VAN DANTZICH, M., ROBBINS, D., HORVITZ, E., AND CZERWINSKI, M. 2002. Scope: Providing awareness of multiple notifications at a glance. In Proceedings of the Conference on Advanced Visual Interfaces. VOIDA, S. AND MYNATT, E. D. 2009. It feels better than filing: everyday work experiences in an activity-based computing system. In Proceedings of the 27th International Conference on Human factors in Computing Systems. ACM. WICKENS, C. D. 2002. Multiple resources and performance prediction. Theo. Issues Ergon. Sci. 3, 2, 159–177. WITTEN, I. H. AND FRANK, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, CA. ZACKS, J., TVERSKY, B., AND IYER, G. 2001. Perceiving, remembering, and communicating structure in events. J. Exper. Psych. Gen. 130, 1, 29–58. ZACKS, J. M. AND TVERSKY, B. 2001. Event structure in perception and conception. Psych. Bull. 127, 3–21. ZIJLSTRA, F. R. H., ROE, R. A., LEONORA, A. B., AND KREDIET, I. 1999. Temporal factors in mental work: effects of interrupted activities. J. Occup. Organiz. Psych. 72, 163–185. Received June 2009; revised March 2010, July 2010; accepted August 2010 ACM Transactions on Computer-Human Interaction, Vol. 17, No. 4, Article 15, Publication date: December 2010.