

Midas: A Declarative Multi-Touch Interaction Framework

Christophe Scholliers1, Lode Hoste2, Beat Signer2 and Wolfgang De Meuter1
1 Software Languages Lab, 2 Web & Information Systems Engineering Lab
Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
{cfscholl,lhoste,bsigner,wdmeuter}@vub.ac.be

ABSTRACT
Over the past few years, multi-touch user interfaces emerged from research prototypes into mass market products. This evolution has been mainly driven by innovative devices such as Apple's iPhone or Microsoft's Surface tabletop computer. Unfortunately, there seems to be a lack of software engineering abstractions in existing multi-touch development frameworks. Many multi-touch applications are based on hard-coded procedural low-level event processing. This leads to proprietary solutions with a lack of gesture extensibility and cross-application reusability. We present Midas, a declarative model for the definition and detection of multi-touch gestures where gestures are expressed via logical rules over a set of input facts. We highlight how our rule-based language approach leads to improvements in gesture extensibility and reusability. Last but not least, we introduce JMidas, an instantiation of Midas for the Java programming language and describe how JMidas has been applied to implement a number of innovative multi-touch gestures.

Author Keywords
multi-touch interaction, gesture framework, rule language, declarative programming

ACM Classification Keywords
D.2.11 Software Engineering: Software Architectures; H.5.2 Information Interfaces and Presentation: User Interfaces

General Terms
Algorithms, Languages

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. TEI'11, January 22–26, 2011, Funchal, Portugal. Copyright 2011 ACM 978-1-4503-0478-8/11/01...$10.00.

INTRODUCTION
More than 20 years after the original discussion of touchscreen-based interfaces for human-computer interaction [9] and the realisation of the first multi-touch screen at Bell Labs in 1984, multi-touch interfaces have emerged from research prototypes into mass market products. Commercial solutions, including Apple's iPhone or Microsoft's Surface tabletop computer, introduced multi-touch user interfaces to a broader audience. Various manufacturers currently follow these early adopters by offering multi-touch screen-based user interfaces for their latest mobile devices. There is not only an increased use of multi-touch gestures on touch-sensitive screens but also based on other input devices such as laptop touchpads. Some multi-touch input solutions are even offered as separate products like in the case of Apple's Magic Trackpad1. Furthermore, large multi-touch surfaces, as seen in Single Display Groupware (SDG) solutions [14], provide new forms of co-present interactions. While multi-touch interfaces offer significant potential for an enhanced user experience, the application developer has to deal with an increased complexity in realising these new types of user interfaces. A major challenge is the recognition of different multi-touch gestures based on continuous input data streams.
The intrinsic concurrent behaviour of multi-touch gestures and the scattered information from multiple gestures within a single input stream result in a complex detection process. Even the recognition of simple multi-touch gestures demands a significant amount of work when using traditional programming languages. Furthermore, the reasoning over gestures from multiple users significantly increases the complexity. Therefore, we need a clear separation of concerns between the multi-touch application developer and the designer of new multi-touch gestures to be used within these applications. The gesture designer must be supported by a set of software engineering abstractions that go beyond simple low-level input device event handling.

In software engineering, a problem can be divided into its accidental and essential complexity [1]. Accidental complexity relates to the difficulties a programmer faces due to the choice of software engineering tools. It can be reduced by selecting or developing better tools. On the other hand, essential complexity is caused by the characteristics of the problem to be solved and cannot be reduced. While the accidental complexity of today's mainstream applications is addressed by the use of high-level programming languages such as Java or C#, we have not witnessed the same software engineering support for the development of multi-touch applications. In this paper, we present novel declarative programming language constructs in order to tackle the accidental complexity of developing multi-touch gestures and to enable a developer to focus on the essential complexity.

1 http://www.apple.com/magictrackpad/

We start with a discussion of related work and present the required software engineering abstractions for multi-touch frameworks. We then introduce Midas, our three-layered multi-touch architecture. After describing the implementation of JMidas, a Midas instantiation for the Java programming language, we outline a set of multi-touch application prototypes that have been realised based on JMidas. A critical discussion of the presented approach and future work is followed by some general conclusions.

RELATED WORK
Recently, different multi-touch toolkits and frameworks have been developed in order to help programmers with the detection of gestures from a continuous stream of events produced by multi-touch hardware [7]. The frameworks that we are going to discuss in this section provide some basic software abstractions for recognising a fixed set of traditional multi-touch gestures, but most of them do not support the definition of new application-specific multi-touch gestures.

The Sparsh UI framework [11] is an open source multi-touch library that supports a set of built-in gestures including hold, drag, multi-point drag, zoom, rotate, spin (two fingers hold and one drags) as well as double tap. The implementation of gestures based on hard-coded mathematical expressions limits the reusability of existing gestures. The framework only provides limited support to reason about the history of events. Sparsh UI provides historical data for each finger on the touch-sensitive surface by keeping track of events. As soon as a finger is lifted from the surface, the history related to the specific finger is deleted. This makes it difficult to implement multi-stroke gestures but on the other hand avoids any garbage collection issues. Furthermore, Sparsh UI does not deal with the resolution of conflicting gestures.
Overall, Sparsh UI is one of the more complete multi-touch frameworks providing some basic software abstractions, but it offers limited support for multi-stroke and multi-user gestures.

Multi-touch for Java (MT4j)2 is an open source Java framework for the rapid development of visually rich applications currently supporting tap, drag, rotate as well as zoom gestures. The architecture and implementation of MT4j is similar to Sparsh UI with two major differences: MT4j offers the functionality to define priorities among gestures but on the other hand it does not provide historical data. Whenever an event is retrieved via the TUIO protocol [6], multiple subscribed gesture implementations, called processors, try to lock the event for further processing. The idea of the priority mechanism is to assign a numeric value to each gesture. Gesture processors with a lower priority are blocked until processors with a higher priority have tried (and failed) to consume the event. With such an instantaneous priority mechanism, a processor has to decide immediately whether an individual event should be consumed or released. However, many gestures can only be detected after keeping track of multiple events, which limits the usability of the priority mechanism. Finally, the reuse of gesture detection functionality is lacking from the architectural design.

Grafiti [10] is a gesture recognition management framework for interactive tabletop interfaces providing similar abstractions as Sparsh UI. It is written in C# and subscribes to a TUIO input stream for any communication with different hardware devices. An automated mapping of multiple lists to multiple fingers is not available and there are no constructs to deal with multiple users. Therefore, permutations of multi-touch input have to be performed manually, which is computationally intensive, and any reusability for composite gestures is lacking. Furthermore, the use of static time and space values does not fit the dynamic environment of multi-touch devices. The framework allows gestures to be registered and unregistered at runtime. In Grafiti, conflict resolution is based on instantaneous reasoning and there is no notion of uncertainty. The offered priority mechanism is similar to the one in MT4j where events are consumed by gestures and are no longer available for gestures with a lower priority.

The libTISCH [2] multi-touch library currently offers a set of fixed gestures including drag, tap, zoom and rotate. The framework maintains the state and history of events that are performed within a widget. This allows the developer to reason about event lists instead of individual events. However, multi-stroke gestures are not supported and an automatic mapping of lists to fingers is not available. Incoming events are linked to the topmost component based on their x and y coordinates. Local gesture detection is associated with a single widget element for the total duration of a gesture. A system-wide gesture detection is further supported via global gestures. Global gestures are acceptable when working on small screens such as a phone screen, but this approach ceases to work when multiple users are performing collaborative gestures. The combination of local and global gestures is not supported and gestures outside the boundaries of a widget require complex ad-hoc program code.

Commercial user interface frameworks are also introducing multi-touch software abstractions.
The Qt3 cross-platform library provides gestures such as drag, zoom, swipe (in four directions), tap as well as tap-and-hold. To add new gestures, one has to create a new class and inherit from the QGestureRecognizer class. Incoming events are then fed to that class one by one as if it would have been directly attached to the hardware API. Also the Microsoft .NET Framework4 offers multi-touch support since version 4.0. A simple event handler is provided together with traditional gestures such as tap, drag, zoom and rotate. In addition, Microsoft implemented a two-finger tap, a press-and-hold and a two-finger scroll. However, there is no support for implementing more complex customised gestures.

Gestures can also be recognised by comparing specific features of a given input to the features of previously recorded gesture samples. These so-called template-based matching solutions make use of different matching algorithms including Rubine [12], Dynamic Time Warping (DTW), neural networks or hidden Markov models. Most template-based gesture recognition solutions perform an offline gesture detection which means that the effect of the user input will only be visible after the complete gesture has been performed. Therefore, template-based approaches are not suitable for a number of multi-touch gestures (e.g. pinching). Some gesture recognition frameworks, such as iGesture [13], support template-based algorithms as well as algorithms relying on a declarative gesture description. However, these solutions currently offer no or only limited support for the continuous online processing of multi-touch gestures.

2 http://mt4j.org
3 http://qt.nokia.com/
4 http://msdn.microsoft.com/en-us/library/dd940543(VS.85).aspx

While the presented frameworks and toolkits provide specific multi-touch gesture recognition functionality that can be used by an application developer, most of them show a lack of flexibility from a software engineering point of view. In the following, we introduce the necessary software engineering abstractions that are going to be addressed by our Midas multi-touch interaction framework.

Modularisation
Many existing multi-touch approaches do not modularise the implementation of gestures. Therefore, the implementation of an additional gesture requires a deep knowledge about already implemented gestures. This is a clear violation of the separation of concerns principle, one of the main principles in software engineering, which dictates that different modules of code should have as little overlapping functionality as possible.

Composition
It should be possible to easily compose gestures in order to define more complex gestures. For example, a scroll gesture could be implemented by composing two move up gestures.

Event Categorisation
When detecting gestures, one of the problems is to categorise the events (e.g. events from a specific finger within the last 500 milliseconds). This event categorisation is usually a cumbersome and error-prone task, especially when timing is involved. Therefore, event categorisation should be offered to the programmer as a service by the underlying system.

GUI-Event Correlation
While the previous requirement advocates the preprocessing of events, this requirement ensures the correlation between events and GUI elements. In most of today's multi-touch frameworks, all events are transferred to the application from a single entry point. The decision about which events correlate to which GUI elements is left to the application developer or enforced by the framework. However, the reasoning about events correlating to specific graphical components should be straightforward.

Temporal and Spatial Operators
Extracting meaningful information from a stream of events produced by the multi-touch hardware often involves the use of temporal and spatial operators. Therefore, the underlying framework should offer a set of temporal and spatial operators in order to keep programs concise and understandable. In current multi-touch frameworks, there is no or limited support for such operators which often leads to complex program code.

MIDAS ARCHITECTURE
The processing of input event streams in human-computer interaction is a complex task that many frameworks address by using event handlers. However, the use of event handlers has proven to violate a range of software engineering principles including composability, scalability and separation of concerns [8]. We propose a rule-based approach with spatiotemporal operators in order to minimise the accidental complexity in dealing with multi-touch interactions.

The Midas architecture consists of the three layers shown in Figure 1. The infrastructure layer contains the hardware bridge and translator components. Information from an input device is extracted by the hardware bridge and transferred to the translator. In order to support different devices, concrete Midas instances can have multiple hardware bridges. The translator component processes the raw input data and produces logical facts which are propagated to the fact base in the Midas core layer. The inference engine evaluates these facts against a set of rules.

Figure 1. Midas architecture

The rules are defined in the Midas application layer but stored in the rule base. When a rule is triggered, it can invoke some application logic and/or generate new facts. Furthermore, GUI elements are accessible from within the reasoning engine via a special shadowing construct.

Infrastructure Layer
The implementation of the infrastructure layer takes care of all the details of addressing the hardware and transforming the low-level input data into logical facts. A fact has a type and a number of attributes. The core fact that every Midas implementation has to support is shown in Listing 1.

Listing 1. Core fact
1 (Cursor (id ?id) (x ?x) (y ?y) (x-speed ?xs)
2   (y-speed ?ys) (time ?t) (state ?s))

This core fact has the type Cursor and represents a single cursor (e.g. a moving finger) from the input device. The attributes id, x, y, x-speed, y-speed and time represent the id, position, movement and time the cursor moved. The attribute state indicates how the cursor has changed and can be assigned the values APPEAR, MOVE or DISAPPEAR.
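To illustrate the shape of these input facts, the following sketch shows a Cursor fact as it might be asserted by a translator for a moving finger; the concrete attribute values are purely illustrative and not taken from a specific device:

(assert (Cursor (id 3)              ; cursor identifier assigned by the translator
                (x 412) (y 287)     ; current position in transformed screen coordinates
                (x-speed -0.4) (y-speed 0.1)
                (time 1295648123)   ; timestamp added by the infrastructure layer
                (state MOVE)))      ; one of APPEAR, MOVE or DISAPPEAR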
Core Layer
The Midas core layer consists of an inference engine in combination with a fact base and a rule base, which are described in the following.

Rules
We use rules as an expressive and powerful mechanism to implement gesture recognition. Listing 2 outlines the implementation of a simple rule that prints the location of all cursors. The part that a rule should match in order to be triggered is called its prerequisites (before the ⇒), while the actions to be performed if a rule is triggered are called its consequences.

Listing 2. Example rule
1 (defrule PrintCursor
2   (Cursor (x ?x) (y ?y))
3   =>
4   (printout t "A cursor is moving at: " ?x "," ?y))

The first line shows the definition of a rule with the name PrintCursor. This rule specifies the matching of all facts of type Cursor as indicated on the second line. Upon a match of a concrete fact, the values of the x and y attributes will be bound to the variables ?x and ?y. Subsequently, the rule will trigger its actions (after the ⇒) and print the text "A cursor is moving at:" followed by the x and y coordinates.

Temporal Operators
As timing is very important in the context of gesture recognition, all facts are automatically annotated with timing information. This information can be easily accessed by using the dot operator and selecting the time attribute. Midas defines a set of temporal operators to check the relationship between the timing attributes of different facts. The temporal operators and their definitions are shown in Table 1.

Operator    Args        Definition
tEqual      f1, f2      |f1.time - f2.time| < εt
tMeets      f1, f2      f1.time - f2.time = εtmin
tBefore     f1, f2      f1.time < f2.time
tAfter      f1, f2      f1.time > f2.time
tContains   f1, f2, f3  f2.time < f1.time < f3.time
Table 1. Temporal operators

Note that the tEqual operator is not defined as absolute equality but rather as being within a very small time interval εt. This fuzziness has been introduced since input device events seldom occur at exactly the same time. Similarly, εtmin, the smallest possible time interval, is used to express that f1 happened instantaneously after f2.

Spatial Operators
In addition to temporal operators, Midas offers the definition of spatial constraints over matching facts as shown in Table 2. These spatial operators expect that the facts they receive have an x and y attribute.

Operator       Args    Definition
sDistance      f1, f2  euclideanDistance(f1, f2)
sNear          f1, f2  sDistance(f1, f2) < εs
sNearLeftOf    f1, f2  εs > (f2.x - f1.x) > 0
sNearRightOf   f1, f2  εs > (f1.x - f2.x) > 0
sInside        f1, f2  β(f1, f2)
Table 2. Spatial operators

Again, we have a small distance value εs to specify that two facts are very close to each other. This distance is set as a global variable and is adjustable by the developer. Since the input device coordinates are already transformed by the infrastructure layer, the value of εs is independent of a specific input device. For sInside, the fact f2 is expected to have a width and height attribute. The β function checks whether the x and y coordinates of f1 are within the bounding box (f2.x, f2.y)–(f2.x + f2.width, f2.y + f2.height). Note that we also support user-defined operators.
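As an illustration of how these operators can be combined in a rule, the following sketch detects two fingers that touch down close together at approximately the same time; the TwoFingerTouch fact, its events slot and the rule itself are hypothetical examples in the style of Listing 2 rather than gestures shipped with Midas:

(defrule TwoFingerTouch
  ?c1 <- (Cursor (id ?id1) (state APPEAR))
  ?c2 <- (Cursor (id ?id2) (state APPEAR))
  (test (neq ?id1 ?id2))    ; two distinct fingers
  (tEqual ?c1 ?c2)          ; appeared at approximately the same time (Table 1)
  (sNear ?c1 ?c2)           ; and close to each other on the surface (Table 2)
  =>
  (assert (TwoFingerTouch (events ?c1 ?c2))))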
List Operators
The implementation of gestures often requires reasoning over a set of events in combination with temporal and spatial constraints. Therefore, in Midas we have introduced the ListOf operator that enables the reasoning over a set of events within a specific time frame. An example of this construct is shown in Listing 3. The prerequisite will match a set of Cursor events that have the same finger id and occurred within 500 milliseconds. Finally, the matching sets are limited to those sets that contain at least 5 Cursors. Note that due to the declarative approach the developer no longer has to keep track of the state of the cursors in the system and manually group them according to their id.

Listing 3. ListOf construct
1 ?myList <- (ListOf (Cursor (same: id)
2                            (within: 500)
3                            (min: 5)))

A list matched by the ListOf operator is guaranteed to be time ordered. This is required for the multi-touch interaction domain as one wants to reason about a specific motion along the time axis. By default, rule languages do not imply a deterministic ordering but allow arbitrary pattern matching to cover all possible combinations.

The spatial operators from the previous section are also defined over lists. For example, sAllNearBottomOf expects two lists l1 and l2 and the operator will return true only if all the elements of the first list are below all elements of the second list.

Movement Operators
The result of the ListOf construct is a list which can be used in combination with movement operators. A movement operator verifies that a certain property holds for an entire list. For example, the movingUp operation is defined as follows:

movingUp(list) ⇔ ∀i, j : i < j ⇒ list[i].y > list[j].y

In a similar way, we have defined the movingDown, movingLeft and movingRight operators.

Application Layer
The application layer consists of a regular program which is augmented with a set of rules in order to describe the gestures. A large part of this program however will deal with the GUI and some gestures will only make sense if they are performed on specific GUI objects. As argued in the introduction, the reasoning over GUI objects in combination with the gestures should be straightforward. Therefore, in a Midas system the GUI objects are reified as so-called shadow facts in the reasoning engine's working memory. This implies that the mere existence of a GUI object automatically gives rise to the associated fact.

The fields of a GUI object are automatically transformed into attributes of the shadow fact and can be accessed like any other fact fields. However, a shadow fact differs from regular facts in the sense that the values of its attributes are transparently kept synchronised with the values of the object it represents. This allows us to reason about application-level entities (i.e. graphical objects) inside the rule language. Moreover, from within the reasoning engine the methods of the object can be called in the consequence block of a rule. This is done by accessing the predefined Instance field of a shadow fact followed by the name and arguments of the method to be invoked. Listing 4 shows an example of calling the setColor method with the argument "BLUE" on a circle GUI element.

Listing 4. Method call on a shadow fact instance
1 (?circle.Instance setColor "BLUE")

Finally, the attributes of a shadow fact can be changed by using the modify construct. When the modify construct is applied to a shadow fact, the changes are automatically reflected in the shadowed object.
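The following sketch combines a spatial operator, a shadow fact and the modify construct into a minimal drag behaviour; the shadowed Photo class and its fields (x, y, width and height) are hypothetical, and the rule is only meant to illustrate how these constructs might interact:

(defrule DragPhoto
  ?cursor <- (Cursor (x ?x) (y ?y) (state MOVE))
  ?photo  <- (Photo)              ; shadow fact of a hypothetical Photo GUI class
  (sInside ?cursor ?photo)        ; the finger lies within the photo's bounding box (Table 2)
  =>
  (modify ?photo (x ?x) (y ?y)))  ; updating the shadow fact is reflected in the GUI object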
Priorities
When designing gestures, parts of certain gestures might overlap. For example, the gesture for a single click overlaps with the gesture for a double click. If the priority of the single click gesture were higher than that of the double click gesture, a user would never be able to perform a double click since it would always be recognised as two single click gestures. Therefore, it is important that the developer has means to define priorities between different gestures. In Midas, gestures with a higher priority will always be matched before gestures with a lower priority. An example of how to use priorities in rules is given in Listing 5. The use of priorities allows the programmer to tackle problems with overlapping gestures. The priority concept further increases modularisation since normally no intrusive code is needed in order to separate overlapping gestures.

Listing 5. Priorities
1 (defrule PrioritisedRule
2   (declare (salience 100))
3   <prerequisites>
4   =>
5   <consequence>)

'Flick Left' Gesture Example
After introducing the core concepts of the Midas model, we can now explain how these concepts can be combined in order to specify simple gestures. The Flick Left gesture example that we are going to use has been implemented in numerous frameworks and interfaces for photo viewing applications. In those applications, users can move from one photo to the next one by flicking their finger over the photo in a horizontal motion to the left. In the following, we show how the Flick Left gesture can be implemented in a compact way based on Midas.

One of the descriptions of the facts representing a Flick Left gesture is as follows: "an ordered list of cursor events from the same finger within a small time interval where all the events are accelerated to the left". The implementation of new gestures in Midas mainly consists of translating such descriptions into rules as shown in Listing 6.

Listing 6. Single finger 'Flick Left' gesture
1 (defrule FlickLeft
2   ?eventList[] <-
3     (ListOf (Cursor (same: id) (within: 500) (min: 5)))
4   (movingLeft ?eventList)
5   =>
6   (assert (FlickLeft (events ?eventList))))

The prerequisites of the rule specify that there should be a list with events generated by the same finger by making use of the ListOf construct. It further defines that all these events should be generated within a time frame of 500 milliseconds and that the list must contain at least 5 elements.

Gesture Composition
We have shown that set operators enable the declarative categorisation of events. The Midas framework supports temporal, spatial and motion operators to declaratively specify different gestures. Furthermore, priorities increase the modularisation and shadow facts enable the programmer to reason about their graphical objects in a declarative way. In the Midas framework, it is common to develop complex gestures by combining multiple basic gestures as there is no difference in reasoning over simple or derived facts. This reusability and composition of gestures is achieved by asserting gesture-specific facts on gesture detection. The reuse and composition of gestures is illustrated in Listing 7, where a Double Flick Left composite gesture is implemented by specifying that there should be two Flick Left gestures at approximately the same time.

Listing 7. 'Double Flick Left' gesture
1 (defrule DoubleFlickLeft
2   ?upLeftFlick <- (FlickLeft)
3   ?downLeftFlick <- (FlickLeft)
4   (sAllNearBottomOf ?downLeftFlick.events
5                     ?upLeftFlick.events)
6   (tAllEqual ?upLeftFlick ?downLeftFlick)
7   =>
8   (assert (DoubleFlickLeft)))

The implementation of this new composite gesture in traditional programming languages would require the use of additional timers and threads. By adopting a rule-based approach, one clearly benefits from the fact that we only have to provide a description of the gestures and do not have to be concerned with how to derive the gesture.
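In the same spirit, the scroll gesture mentioned in the composition requirement could be sketched as a basic MoveUp gesture that is then composed into a two-finger scroll; the MoveUp and Scroll facts and the exact prerequisites are assumptions made for illustration, analogous to Listings 6 and 7:

(defrule MoveUp
  ?eventList[] <-
    (ListOf (Cursor (same: id) (within: 500) (min: 5)))
  (movingUp ?eventList)               ; movement operator over the time-ordered list
  =>
  (assert (MoveUp (events ?eventList))))

(defrule TwoFingerScroll
  ?first  <- (MoveUp)
  ?second <- (MoveUp)
  (tAllEqual ?first ?second)          ; both upward strokes occur at roughly the same time
  ; a spatial prerequisite in the style of sAllNearBottomOf (Listing 7) could further
  ; ensure that the two strokes stem from two different, neighbouring fingers
  =>
  (assert (Scroll)))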
JMIDAS PROTOTYPE IMPLEMENTATION
We have implemented a concrete JMidas prototype of the Midas architecture and embedded it in the Java programming language. This enables programmers to define their rules in separate files and to load them from within Java. It also implies that developers can make use of Java objects from within their rules in order to adapt the GUI. We first describe which devices are currently supported by JMidas and then outline how to program these devices based on JMidas.

Input Devices
In order to support a large number of different multi-touch devices, we have decided to implement an abstraction layer currently supported by many multi-touch devices. Two important existing abstraction layers providing a common and portable API for input device handling are the TUIO [6] and TISCH [2] protocols. We chose TUIO as one of the input layers in the JMidas prototype implementation because of its features in terms of portability and performance. This implies that any input device supporting the TUIO protocol can automatically be used in combination with JMidas.

A first multi-touch device that has been used in combination with the JMidas framework is the Stantum SMK 15.45 shown in Figure 2, a 15.4 inch multi-touch display that can deal with more than ten simultaneous touch points. Unfortunately, the Stantum SMK 15.4 does not offer any TUIO support and we had to implement our own cross-platform TUIO bridge that runs on Linux, Windows and Mac OS X.

Figure 2. JMidas application on a Stantum SMK 15.4

Note that the Midas framework is not limited to multi-touch devices. A radically different type of input device supported by the JMidas framework is the Sun SPOT6 shown in Figure 3. A Sun SPOT is a small wireless sensor device that embeds an accelerometer to measure its orientation or motion. Previous experiments with the Sun SPOT device have shown that gestural movements can be successfully recognised by using a Dynamic Time Warping (DTW) template algorithm [5]. However, it was rather difficult to express concurrent gestural movements of multiple Sun SPOTs based on traditional programming languages due to similar reasons as described for the multi-touch domain. These difficulties can be overcome by using the temporal operators offered by JMidas and a developer can reason about 3D hand movements tracked by Sun SPOTs. The JMidas Sun SPOT support is still experimental but we managed to feed the reasoning engine with simple facts for individual Sun SPOTs and the recognition results for composite gestures look promising.

Figure 3. JMidas Sun SPOT integration

5 http://www.stantum.com
6 http://sunspotworld.com

Initialisation
The setup of the JMidas engine is a two-step process to be executed in the application layer as shown in Listing 8. First, a new JMidas engine is initialised (line 1) which corresponds to setting up the Midas core layer introduced earlier in Figure 1. The second step consists of connecting one or multiple input sources from the infrastructure layer (lines 2–8). This step also specifies a resolution for the display via the RES_X and RES_Y arguments. After this initialisation phase, the engine can be started and fed with rules. In Listing 8 this is highlighted by calling the start method and loading the rules from a file called rules.md (lines 9–10).
Listing 8. JMidas engine initialisation
1 JMidasEngine engine = new JMidasEngine();
2 TuioListener listener = new TuioToFactListener(engine,
3     RES_X, RES_Y);
4 TuioClient client = new TuioClient();
5 client.addTuioListener(listener);
6 SunSpotListener spListener = new SunSpotListener(engine);
7 SunSpotClient spClient = new SunSpotClient();
8 spClient.addSunSpotListener(spListener);
9 engine.start();
10 engine.loadFile("rules.md");

In JMidas, we use Java annotations to reify objects as logical facts. When a class is annotated as Shadowed, the JMidas framework automatically reifies its instances as logical facts with the class name as type and with attributes corresponding to the fields of the object. The field annotation Ignore is used to exclude specific fields from being automatically reified as attributes of the logical fact. An example of how to use the Java annotation mechanism is given in Listing 9.

Listing 9. Shadowing GUI elements
1 public @Shadowed class Circle {
2   int x;
3   int y;
4   @Ignore
5   int z;
6   Circle(int id, int x, int y) { ... }
7 }

Since the Circle object is annotated as Shadowed, its instances will be reified by JMidas as logical facts. The Circle class has the three fields x, y and z, whereas the last field has been annotated as Ignore. This implies that the z field will not be reified as an attribute of the logical fact. From within the logical rules, the programmer can match objects of type Circle by using the following expression: (Circle (x ?x) (y ?y)).
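To connect the shadowed Circle of Listing 9 back to the rule language, a rule along the following lines could react to a finger touching down near a circle and invoke its setColor method through the Instance field (cf. Listing 4); this combination is an illustrative sketch rather than one of the gestures shipped with JMidas:

(defrule HighlightTouchedCircle
  ?cursor <- (Cursor (state APPEAR))
  ?circle <- (Circle)                  ; shadow fact reified from a @Shadowed Circle instance
  (sNear ?cursor ?circle)              ; the finger appeared close to the circle (Table 2)
  =>
  (?circle.Instance setColor "BLUE"))  ; call the Java method via the Instance field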
Reasoning Engine
JMidas incorporates an event-driven reasoning engine based on the Rete [3] algorithm. The core reasoning engine used in JMidas is the Jess [4] rule engine for Java, which had to be slightly adapted for our purpose. The Jess reasoning engine has no built-in support for temporal, spatial or motion operations. Moreover, we have extended the engine to deal with ordered list operations. Other reasoning engines could have led to the same result but we opted for the Jess reasoning engine due to its performance and the fact that it already offers basic support for shadow facts. However, we felt that in the context of our work this integration was not convenient enough and added several annotations to ensure that the synchronisation between facts and classes is handled by the framework.

Implemented Gestures
JMidas implements all of the traditional gestures such as drag, tap, double tap, stretch and rotate. In addition to these standard gestures, we have validated our approach by implementing a set of complex gestures shown in Figure 4. According to the Natural User Interface Group (NUI Group)7, these complex gestures possibly offer a more ergonomic and natural way to interact with multi-touch input devices.

Figure 4. Complex gestures implemented in JMidas

Note that we have not empirically evaluated the claims of the NUI Group since the focus of our work is to prevent the programmer from dealing with the accidental complexity while implementing multi-touch applications. To the best of our knowledge, our complex gestures shown in Figure 4 are currently not implemented by any other existing multi-touch framework.

7 http://nuigroup.com

DISCUSSION AND FUTURE WORK
The implementation of gesture-based interaction based on low-level programming constructs complicates the introduction of new gestures and limits the reuse of common interaction patterns across different applications. Current multi-touch interfaces adopt an event-driven programming model resulting in complex detection code that is hard to maintain and extend. In an event-driven application, the control flow is driven by events and not by the lexical scope of the program code, making it difficult to understand these programs. The developer has to manually store and combine events generated by several event handlers. In Midas, a programmer only has to provide a declarative description of a gesture and the underlying framework takes care of how to match gestures against events.

Current state-of-the-art multi-touch toolkits provide limited software abstractions to categorise events according to specific criteria. Often they only offer the information that there is an event with an identifier and a position or provide a list of events forming part of a single stroke. However, for the implementation of many multi-stroke gestures, it is crucial to categorise these events. The manual event classification complicates the event handling code significantly. With the presented Midas ListOf operator, we can easily extract lists of events with certain properties (e.g. events with the same id or events within a given time frame).

With existing approaches, it is difficult to maintain temporal invariants. For example, when implementing the click and double click gestures, the gesture developer has to use a timer which takes care of triggering the click gesture after a period of time if no successive events triggered the double click gesture. This results in code that is distributed over multiple event handlers and, again, significantly complicates any event handling. We have introduced declarative temporal and spatial operators to deal with this complexity. In contrast to most existing solutions, Midas further supports the flexible composition of new complex gestures by simply using other gestures as rule prerequisites.

Our Midas framework addresses the issues of modularisation (M), composition (C), event categorisation (EC), GUI-event correlation (G) as well as the temporal and spatial operators (TS) mentioned in the related work section. Table 3 summarises these software engineering abstractions and compares Midas with the support of these requirements in current state-of-the-art multi-touch toolkits. This comparison with related work shows that, in contrast to Midas, none of the existing multi-touch solutions addresses all the required software engineering abstractions.

Table 3. Comparison with existing multi-touch toolkits (Sparsh-UI, MT4j, Grafiti, libTISCH, Qt4, .NET, template-based approaches and Midas) with respect to M, C, EC, G and TS

As we have explained earlier, there might be conflicts between different gestures, in which case priorities have to be defined to resolve these conflicts. In current approaches, this conflict resolution code is scattered over the code of multiple gestures. This implies that the addition of a new gesture requires a deep knowledge of existing gestures and results in a limited extensibility of existing multi-touch frameworks [7]. In contrast, our Midas framework offers a built-in priority mechanism and frees the developer from having to manually implement this functionality. In the future, we plan to develop an advanced priority mechanism that can be adapted dynamically. We are currently also investigating the JMidas integration of several new input devices, including Siftables8 and a voice recognition system. Even though these devices significantly differ from multi-touch surfaces, we believe that the core idea of applying a reasoning engine and tightly embedding it in a host language can simplify any gesture recognition development.

8 http://sifteo.com
CONCLUSION
We have presented Midas, a multi-touch gesture interaction framework that introduces a software engineering methodology for the implementation of gestures. While existing multi-touch frameworks support application developers by providing a set of predefined gestures (e.g. pinch or rotate), our Midas framework also enables the implementation of new and composed multi-touch gestures. The JMidas prototype highlights how a declarative rule-based language description of gestures in combination with a host language increases the extensibility and reusability of multi-touch gestures. The presented approach has been validated by implementing a number of standard multi-touch gestures as well as a set of more complex gestures that are currently not supported by any other toolkit. While the essential complexity in dealing with advanced multi-touch interfaces cannot be reduced, we offer the appropriate concepts to limit the accidental complexity. We feel confident that our approach will support the HCI community in implementing and investigating innovative multi-touch gestures that go beyond the current state-of-the-art.

REFERENCES
1. F. P. Brooks, Jr. No Silver Bullet: Essence and Accidents of Software Engineering. IEEE Computer, 20(4):10–19, April 1987.
2. F. Echtler and G. Klinker. A Multitouch Software Architecture. In Proc. of NordiCHI 2008, 5th Nordic Conference on Human-Computer Interaction, pages 463–466, Lund, Sweden, 2008.
3. C. L. Forgy. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence, 19(1):17–37, 1982.
4. E. Friedman-Hill. Jess in Action: Java Rule-Based Systems. Manning Publications, July 2003.
5. L. Hoste. Experiments with the SunSPOT Accelerometer. Project report, Vrije Universiteit Brussel, 2009.
6. M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza. TUIO: A Protocol for Table-Top Tangible User Interfaces. In Proc. of GW 2005, 6th Intl. Workshop on Gesture in Human-Computer Interaction and Simulation, Ile de Berder, France, May 2005.
7. D. Kammer, M. Keck, G. Freitag, and M. Wacker. Taxonomy and Overview of Multi-touch Frameworks: Architecture, Scope and Features. In Proc. of Workshop on Engineering Patterns for Multi-Touch Interfaces, Berlin, Germany, June 2010.
8. I. Maier, T. Rompf, and M. Odersky. Deprecating the Observer Pattern. Technical Report EPFL-REPORT-148043, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2010.
9. L. H. Nakatani and J. A. Rohrlich. Soft Machines: A Philosophy of User-Computer Interface Design. In Proc. of CHI '83, ACM Conference on Human Factors in Computing Systems, pages 19–23, Boston, USA, December 1983.
10. A. D. Nardi. Grafiti: Gesture Recognition mAnagement Framework for Interactive Tabletop Interfaces. Master's thesis, University of Pisa, 2008.
11. P. Ramanahally, S. Gilbert, T. Niedzielski, D. Velázquez, and C. Anagnost. Sparsh UI: A Multi-Touch Framework for Collaboration and Modular Gesture Recognition. In Proc. of WINVR 2009, Conference on Innovative Virtual Reality, pages 1–6, Chalon-sur-Saône, France, February 2009.
12. D. Rubine. Specifying Gestures by Example. In Proc. of ACM SIGGRAPH '91, 18th Intl. Conference on Computer Graphics and Interactive Techniques, pages 329–337, Las Vegas, USA, August 1991.
13. B. Signer, U. Kurmann, and M. C. Norrie. iGesture: A General Gesture Recognition Framework. In Proc. of ICDAR 2007, 9th Intl. Conference on Document Analysis and Recognition, pages 954–958, Curitiba, Brazil, September 2007.
14. J. Stewart, B. B. Bederson, and A. Druin. Single Display Groupware: A Model for Co-present Collaboration. In Proc. of CHI '99, ACM Conference on Human Factors in Computing Systems, pages 286–293, Pittsburgh, USA, May 1999.