Architecture optimisation with currawong

Ihor Kuz


2011, ACM SIGCOMM Computer Communication Review

Nicholas FitzRoy-Dale, NICTA∗ and University of New South Wales, Sydney, Australia, nfd@cse.unsw.edu.au
Ihor Kuz, NICTA and University of New South Wales, Sydney, Australia, ihor.kuz@nicta.com.au
Gernot Heiser, NICTA, University of New South Wales, and Open Kernel Labs, Sydney, Australia, gernot@nicta.com.au

∗ NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. APSys 2010 August 30, 2010, New Delhi, India. Copyright 2010 ACM 978-1-4503-0195-4/10/08 ...$10.00.

ABSTRACT

We describe Currawong, a tool to perform system software architecture optimisation. Currawong is an extensible tool which applies optimisations at the point where an application invokes framework or library code. Currawong does not require source code to perform optimisations, effectively decoupling the relationship between compilation and optimisation. We show, through examples written for the popular Android smartphone platform, that Currawong is capable of significant performance improvement to existing applications.

1. INTRODUCTION

Modern operating systems have large and complex APIs, and writing software that uses these APIs efficiently can be challenging. There may be multiple ways to accomplish the same task, in which the only differences between two or more alternatives are performance characteristics. Sometimes the right choice for one device is the wrong choice for another device. Even correct and efficient API usage can become inefficient as the API evolves, making the right choice a moving target.

API-related inefficiencies are particularly important for mobile devices, such as smartphones. Their relatively low-powered processors, slow buses, constrained graphics hardware, and limited RAM make inefficiencies more obvious than they would be on more powerful hardware. Unnecessary use of the CPU or of devices also impacts power consumption, an important factor for hand-held devices.

We propose a novel optimisation technique that addresses API inefficiencies: system software architecture optimisation, or architecture optimisation for short. Architecture optimisation performs work at API boundaries: that is, at the point where application code interacts with the rest of the system. This interaction usually involves system frameworks or libraries, so Currawong improves performance by modifying the way in which framework invocations or library calls take place.

Architecture optimisation does not require application source code. This is important, because it means that a system designer or end user could perform API-level optimisation on an application, instead of waiting for the developer to produce a faster version for their hardware. This gives users more control over their applications, particularly if the developer is unable or unwilling to support them.

In this paper we describe the principles behind architecture optimisation through reference to our experimental optimisation tool, Currawong (named after the distinctive Australasian bird). Currawong performs architecture-level optimisation of Java programs on the Android platform.

Section 2 gives a brief overview of Android and introduces two motivating examples which are used throughout the paper. Section 3 discusses the design of Currawong with reference to the running examples.
Section 4 shows the results of running Currawong on the running examples. Section 5 discusses related work, and Section 6 concludes.

2. ANDROID

Android is a mobile operating system developed by Google which runs on smartphones, netbooks, and similar devices [3]. In Android, applications are reliant on a system framework for services, some of which run in a separate process. Applications communicate across processes using a custom IPC mechanism. For example, applications communicate with an external component to draw graphics to the display.

[Figure 1: Android application communication. The Application and the System Server are joined by three connectors: Bitmaps, Drawing ctl., and Input events.]

The framework services that cannot be implemented as application-local libraries are implemented in the System Server, a privileged application which runs in a separate process. An example interaction with the System Server is shown in Figure 1. The application transmits bitmaps to the System Server for display using shared memory (represented by small squares in the diagram). It communicates via function-call-based IPC to instruct the drawing server to update the display (the “Drawing ctl.” connector). The System Server communicates via the same IPC mechanism to notify the application of input events, such as a finger moving on the touch screen. Android applications are written in Java and execute on Dalvik, a custom virtual machine.

2.1 Case studies

We illustrate the design of Currawong with two Android-based examples of architecture optimisation.

The touch events optimisation modifies the way applications are notified of touch events (finger-presses on a touch-sensitive screen). In the standard model, shown in Figure 2 (A), touch events are handled by the System Server component. Data representing the event are marshalled and sent to the foreground application using IPC. In the optimised version, shown in Figure 2 (B), the application reads directly from the relevant device node representing the touch screen hardware. This eliminates the cost of the IPC. IPC in Android is surprisingly slow, so this optimisation can have a noticeable effect.

[Figure 2: Touch events optimisation. Panel A: before optimisation. Panel B: after optimisation.]

The redraw optimisation modifies the way applications re-draw their displays. Android offers multiple ways to draw 2D graphics. The recommended method for relatively-static graphics is to use an API called onDraw(). This method is called whenever the display should be updated. This arrangement is shown in Figure 3 (A). Despite it being recommended only for relatively-static displays, this approach is sometimes also used for high-frame-rate games. Unfortunately, the onDraw() method is very CPU-intensive: for every frame, a Surface object, which contains bitmap data, is re-initialised and cleared.

When the optimisation is applied, as shown in Figure 3 (B), several changes are made to the application. The application is modified to start a new thread, labelled “Currawong thread” in the diagram. This thread then calls the appropriate application code to re-draw the surface. Importantly, the Currawong thread maintains a persistent Surface object, rather than re-creating one each time.

[Figure 3: Redraw optimisation. Panel A: before optimisation. Panel B: after optimisation.]

3. DESIGN

Figure 4 shows a high-level overview of Currawong. Currawong requires as input a specification and an application. The specification is parsed and tokenised, and the application is disassembled. The specification is then evaluated. The specification is written in a logic programming language, so evaluation of the specification either fails (in which case the application cannot be optimised by this specification), or it results in a new, rewritten application. In the latter case, this new application is reassembled and written to disk. Each of these parts is discussed in more detail below.

[Figure 4: Currawong data flow. Node labels: Parse spec, Parse application, Evaluate spec, Finished (No/Yes), Rewrite application.]

3.1 Specification

Currawong decides how and what to optimise based on an optimisation specification. An optimisation specification consists of two parts: verification that an optimisation can be applied, and then implementation of that optimisation. Figure 5 shows the complete specification of the redraw optimisation. The source code for the touch events optimisation is omitted for space reasons, but is conceptually similar.

Optimisation specifications are written in a custom language named Currawong Specification Language (CSL). CSL is a templated, declarative, extensible logic language. Each of these features plays an important role in making CSL maximally expressive with minimal overhead.

Its logic-language roots mean that CSL is declarative: optimisations are specified in terms of what they should do, rather than how they should do it. In addition to hiding implementation details, declarative specifications are comparatively concise. The features of a logic language are rather well-suited to the type of optimisation that Currawong aims to perform. For example, CSL employs unification as its execution mechanism, as it is based on Prolog [5]. Unification and backtracking along an optimisation specification provide an approximation to the F (eventually) operator of linear temporal logic [10], and other operators can also be represented.
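As a rough illustration of the unification-and-backtracking mechanism mentioned above, the following Python sketch unifies two terms and tries a query against a list of facts in order. It is not Currawong's implementation and does not use CSL syntax: the tuple encoding of terms, the `extends` facts, and the class names are assumptions made for this example. Variables follow the Prolog convention of starting with an upper-case letter.

```python
def is_var(term):
    """Prolog convention (assumed here): a variable starts with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, bindings):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(left, right, bindings=None):
    """Return bindings that make both terms equal, or None if none exist.

    Like most Prolog systems by default, this sketch omits the occurs check.
    """
    bindings = dict(bindings or {})
    left, right = walk(left, bindings), walk(right, bindings)
    if left == right:
        return bindings
    if is_var(left):
        bindings[left] = right
        return bindings
    if is_var(right):
        bindings[right] = left
        return bindings
    if isinstance(left, tuple) and isinstance(right, tuple) and len(left) == len(right):
        for left_item, right_item in zip(left, right):
            bindings = unify(left_item, right_item, bindings)
            if bindings is None:
                return None
        return bindings
    return None


def solve(query, facts):
    """Yield one set of bindings per fact that unifies with the query.

    Moving on to the next fact, after a failure or after a solution has been
    consumed, plays the role of backtracking.
    """
    for fact in facts:
        result = unify(query, fact)
        if result is not None:
            yield result


# Hypothetical facts describing which class extends which.
FACTS = [
    ("extends", "com.example.Main", "android.app.Activity"),
    ("extends", "com.example.GameView", "android.view.View"),
    ("extends", "com.example.MapView", "android.view.View"),
]

# "Which classes extend android.view.View?"
VIEW_SUBCLASSES = [b["Class"] for b in solve(("extends", "Class", "android.view.View"), FACTS)]
```

Because `solve` is a generator, a caller that rejects the first answer simply asks for the next one, which is the behaviour that lets a specification "eventually" find a suitable match.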
It is common to use temporal logics for optimisation specification (see Section 5 for details) because of their expressive power. Currawong provides the same level of expressivity, but does so in a relatively simple way.

CSL is a complete programming language. This means that CSL is extensible: optimisation support libraries written in CSL could be used to make specific optimisations simpler to write. The language example shown in Figure 5 includes a clause named optimise. This is the optimisation's entry point.

     1  MatchOnDraw is Java {
     2    class $C extends android.view.View {
     3      protected void onDraw(Canvas _) {
     4      }
     5    }
     6  }
     7
     8  MergeOnDraw is Java {
     9    class $ClassName {
    10      private int _cw_tok;
    11      public void surfaceChanged(SurfaceHolder _,
    12          int _, int _, int _) {
    13        _cw_tok = au.com.nicta.cw.Draw2D.init(this);
    14      }
    15    }
    16  }
    17
    18  optimise(ondraw, App) is
    19    Match = App.match(MatchOnDraw),
    20    App.add_module('au.com.nicta.cw'),
    21    App.rename_method(Match.C, onDraw, _onDraw),
    22    App.rename_call(Match.C, invalidate, _invalidate),
    23    App.merge(Match, MergeOnDraw).

Figure 5: Specification for the redraw optimisation

CSL's templating support lets optimisation authors specify structural matches in a convenient way. The MatchOnDraw (lines 1 to 6) and MergeOnDraw (lines 8 to 16) clauses in Figure 5, for example, are written in a language which closely resembles Java, but allows pattern matching. This means that a portion of application code can be specified and, when it is found, parts of that code (such as the class name in this example) can be used directly within the optimise clause.

The optimise clause itself first directs Currawong to locate a portion of the application which matches the MatchOnDraw template (line 19). It then proceeds to modify the application: first, it adds a Java module (line 20). It then renames two methods within the matched portion of the application (lines 21 and 22).
Finally, it adds additional code to the application by adding the code supplied within the MergeOnDraw clause to the class matched by the MatchOnDraw clause (line 23).

3.2 Application input

Applications are supplied to Currawong in Android Package (APK) format, which is the standard format for applications on Android systems. APK files contain, among other things, the compiled form of the Java code. Currawong creates an internal representation of Java binaries by disassembling them and creating in-memory representations of each class. The disassembly requires special tools, because Android's Java implementation does not use standard Java bytecode. We use the Android-specific Baksmali disassembler [6]. Baksmali's output is parsed and an in-memory representation is built. This representation includes descriptions of each class, the methods in each class, and all outgoing calls made by the methods. More detail could be added if necessary, but this level of representation has so far proved sufficient. Importantly, use of a disassembler means that Currawong does not require any application-specific knowledge: the optimisations it applies can be written by a framework expert without any knowledge of the particular applications to which they will be applied.

3.3 Finding optimisation candidates

Optimisation candidates are found through matching code templates against application code. This process is driven by match commands within the specification. When matching, Currawong steps through each class in the application, attempting to apply the code template to the class. A class matches a template if the inheritance chain in the class is the same as the inheritance chain in the template, and all methods in the template match those in the class. A method matches another method if its signature is exactly the same.
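The matching rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Currawong's code: the `ClassModel` and `Method` types are hypothetical stand-ins for the in-memory representation, the inheritance chain is simplified to the direct superclass, and parameters are compared by type only. The `_` and `$Name` placeholders are treated as they appear in Figure 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    return_type: str
    param_types: tuple


@dataclass(frozen=True)
class ClassModel:
    name: str
    superclass: str  # simplification: direct superclass instead of the full chain
    methods: tuple


def match_name(pattern, actual, captures):
    """Match one name: "_" accepts anything, "$X" accepts anything and records it."""
    if pattern == "_":
        return True
    if pattern.startswith("$"):
        key = pattern[1:]
        if key in captures and captures[key] != actual:
            return False
        captures[key] = actual
        return True
    return pattern == actual


def match_method(template, method, captures):
    """A method matches when name, return type and parameter types all match."""
    if len(template.param_types) != len(method.param_types):
        return False
    trial = dict(captures)  # do not keep captures from a failed attempt
    pairs = [(template.name, method.name), (template.return_type, method.return_type)]
    pairs += zip(template.param_types, method.param_types)
    if all(match_name(pattern, actual, trial) for pattern, actual in pairs):
        captures.update(trial)
        return True
    return False


def match_class(template, cls):
    """Return the captured names if cls matches the template, else None."""
    captures = {}
    if not match_name(template.name, cls.name, captures):
        return None
    if not match_name(template.superclass, cls.superclass, captures):
        return None
    for wanted in template.methods:
        if not any(match_method(wanted, method, captures) for method in cls.methods):
            return None
    return captures


def find_candidates(template, classes):
    """Step through every class and collect the captures of each match."""
    found = []
    for cls in classes:
        captures = match_class(template, cls)
        if captures is not None:
            found.append(captures)
    return found


# Template modelled on MatchOnDraw in Figure 5; the application classes are invented.
ON_DRAW = Method("onDraw", "void", ("Canvas",))
TEMPLATE = ClassModel("$C", "android.view.View", (ON_DRAW,))
APP = [
    ClassModel("com.example.Main", "android.app.Activity", (Method("onCreate", "void", ("Bundle",)),)),
    ClassModel("com.example.GameView", "android.view.View", (ON_DRAW, Method("tick", "void", ()))),
]
CANDIDATES = find_candidates(TEMPLATE, APP)
```

Here `CANDIDATES` holds one entry binding `C` to `com.example.GameView`, which corresponds to the `Match.C` value used by the optimise clause in Figure 5.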
Additionally, match templates may include the special name “_”, which matches any name; or a name starting with a dollar sign (such as “$ClassName”) which both matches any name and makes that name available to the optimiser for further reference.

3.4 Output

To apply an optimisation, changes must be made to the application's binary code. In the case of Java code, those changes are made to the assembly language representation of the code. The assembly-language files are then re-assembled using the Android-specific Smali assembler [6]. After re-assembly of source code, Currawong re-builds the APK. Because the code has changed, the file must be re-signed using a key created for Currawong. The resulting APK may be installed on the system via any standard method.

Re-signing the application does not introduce any security issues by itself, because the unoptimised application must be downloaded and verified using its original signature. A minor annoyance is that updates to optimised applications must be downloaded manually and re-signed. This is not a fundamental issue and could be solved through updates to Android's application installation mechanism.

3.5 Security properties

Applications' security properties may change after optimisation. For example, code added by the touch events optimisation reads data from the normally-inaccessible device node event0. We propose that Currawong re-use the existing Android security framework to manage this change. Android supports fine-grained security control mediated by a set of permission strings in the application's manifest. For example, an application may require direct access to the graphics frame buffer. The user is presented with a list of the application's security requirements prior to installation; thus, the user is informed of, and must explicitly allow, non-standard resource access.
Additional fine-grained security requirements (such as access to event0) could thus be added to Android's security mechanism to accommodate Currawong. This proposal has not yet been implemented.

4. RESULTS

We applied Currawong to several applications resembling existing applications available publicly for Android. The test applications were based on the display and touch event loops of publicly-available applications, with surrounding logic simplified to provide consistent test results.

For the touch events optimisation, we replayed a simulated input sequence multiple times (by writing data to the Linux input device /dev/input/event0). For the redraw tests, each application was written to redraw the display as fast as possible. The test hardware is limited to a maximum of 60 frames per second. All applications reached, and stayed at, the expected 60 frames per second maximum. In both cases, the test hardware was a Google Android Developer Phone 1 running the stock version of Android version 1.6 as supplied by the phone's manufacturer.

Results for the touch events optimisation are shown in Table 1. Benchmark data are reported in millions of cycles executed for the duration of the test, which took several seconds. The Cycles column shows an average of multiple runs; the Stdev column reports the standard deviation, also in millions of cycles; and the Normalised Time column shows execution time relative to the unoptimised version, which was defined to execute in 100 units of time. Moving the touch-processing code to the application itself results in a modest but not insignificant performance improvement.

    Name        Cycles (millions)   Stdev (millions)   Normalised Time
    Standard    49.2                10                 100
    Optimised   43.2                5                  87

Table 1: Touch events optimisation results

Results for the redraw optimisation are shown in Table 2.

    Name         Cycles (millions)   Stdev (millions)   Normalised Time
    onDraw       78.1                55                 100
    lockCanvas   42.6                3                  54
    onDraw-opt   37.7                5                  48
    Optimal      15.4                3                  20

Table 2: Redraw optimisation results

Benchmark data is reported as total cycles executed over a period of one second. The “Cycles”, “Stdev”, and “Normalised Time” columns are as per Table 1. Four separate applications are shown, ordered from slowest to fastest: onDraw shows the standard Android method; lockCanvas shows an alternative Android method; and onDraw-opt shows the Currawong-optimised implementation of the standard method. Finally, Optimal does not perform screen updates, but merely writes to an area of non-screen memory 60 times a second. It shows the CPU time taken by the application itself, without any framework-imposed drawing overhead.

The lockCanvas method and the Currawong method are similar in terms of performance, because they do roughly the same thing. Currawong is slightly faster because it uses native code whenever possible, whereas lockCanvas uses Java whenever possible. Importantly, the two-fold performance improvement due to Currawong was achieved without requiring any involvement from the application author. It is also interesting to note the high variance in the onDraw sample, which we suspect is due to memory allocations occasionally triggering garbage collection.

These optimisations are orthogonal. It is quite plausible that a touch-centric, high-frame-rate Android application would benefit from both optimisations at once.

5. RELATED WORK

Currawong borrows ideas from two main areas of related work: Active Libraries and refactoring.

Veldhuizen and Gannon describe an active library as any library that attempts to guide its compiler to produce domain-specific optimisations [12]. In the simple case, this covers any library that makes use of the C preprocessor, or C++ templates, to generate domain-specific code. However, the definition also applies to those libraries which make use of a custom compiler. For example, the author of a matrix manipulation library may wish to include special-case code for the case when an identity matrix is involved in a multiplication.
She could do this by writing a special check in the matrix-multiplication routine, but this slows down the routine in the general case. Instead, she may opt to use a precompilation tool which performs partial evaluation. In the cases where it can be discovered at compile time that the identity matrix is passed as a parameter, the active library may instruct the preprocessing tool to remove the call entirely.

The Broadway domain-specific compiler is an implementation of many concepts behind active libraries [4]. Broadway's compilation process is directed by an annotation file describing additional data-flow properties of each function in an active library. Broadway constructs a data-flow lattice for the entire system, which it then uses to make optimisation decisions. For example, Broadway can perform the identity-matrix optimisation described above. Broadway's focus on a data-flow matrix for static analysis makes it a very powerful choice for certain classes of libraries, particularly those related to scientific computing. However, the most powerful of Broadway's optimisations as described in the literature are all for scientific libraries, and it is less obvious that this choice applies to libraries more generally. The data-flow model used by Broadway also means that Broadway requires source code both for the active library and its client application.

By contrast, Currawong's other influence, refactoring, is a simple source-to-source transformation technique. Refactorings modify program structure, but do not modify program behaviour [2]. A typical refactoring changes a method's name, and updates all references to that method to make use of the new name. Thus refactoring is less focused on optimisation than it is on API evolution, that is, adapting old code to new APIs [1]. Refactoring implementations typically require source code, but Keller and Hölzle's binary component adaptation scheme can work with Java class files [8].
Currawong extends this concept by allowing code modification as well as structural changes.

Currawong is not the only optimiser to work directly with Java bytecode. The Soot optimiser, for example, can perform constant propagation, branch elimination, and copy propagation [11]. Rosser [13] builds on Soot with the inclusion of a specification language based on a temporal logic. Rosser's specification language, however, focuses on Soot-like optimisations, such as constant propagation and strength reduction; with its focus on data flow, it is unsuited to the specification of large-scale code transformations. However, Currawong's and Rosser's optimisations are orthogonal: a single application could benefit from both approaches.

6. DISCUSSION

We have demonstrated a novel optimisation tool, Currawong, which is driven by a complete general-purpose logic programming language. Currawong distinguishes itself with its specification language, which provides programmers with a concise way to specify complex high-level optimisations. Combined with the unusual traits of not requiring source code and of working on existing systems, this characteristic makes Currawong a manifestly practical system: we have shown that Currawong can produce significant performance improvements, from small specifications, on production code.

In this paper two APIs have been optimised: touch input, and graphical output. We suspect that other APIs may also be optimisable, particularly those related to other forms of input and output.

We would like to extend Currawong in two directions. The first direction is to improve Currawong's ability to find optimisation candidates. The examples presented here make use of pattern matching, a relatively simple technique. Pattern matching is ideal for optimisations focused on individual API functions, or a set of calls to API functions, because the specification closely resembles the affected code syntactically.
However, pattern matching by itself is unsuited to finding data-sensitive optimisation candidates. A data-sensitive task might involve detecting that the same object is supplied as a parameter to two consecutive API calls, forming the basis of an optimisation which coalesces multiple API calls into one. Other forms of data-sensitive optimisation involve specialisation based on data values. For example, an optimisation for a matrix multiplication library might attempt to detect whether one of the matrices passed to a particular multiplication operation is the identity matrix.

We have added preliminary support for the detection of data-sensitive optimisation candidates to Currawong. Currently Currawong can detect if the same object is passed to two consecutive API functions. This data-sensitive requirement is specified as an additional predicate within the optimise clause, via reference to variables within the template. Internally, Currawong uses symbolic execution [9] to determine whether two parameters refer to the same object. We plan to add incremental support for additional data-sensitive analyses.

The second direction in which we would like to extend Currawong is towards supporting languages other than Java. We are currently working on adding support for C. Extracting information from compiled C is significantly harder than extracting the equivalent information from bytecode. Currawong builds a model of compiled code in a manner similar to that used by the Cake binary adaptation language [7]. An internal representation is built consisting of each function in the library, as well as all calls it makes to external functions. Android uses industry-standard shared object (.so) files for JNI. These are supplied in the ELF format. The information required by Currawong is easy to retrieve, because ELF requires that libraries include information about external functions, as well as their locations, in the ELF header.
This produces a list of functions and their outgoing calls. To encapsulate these functions into classes, Currawong makes use of a feature of the JNI specification that specifies a special naming system for JNI-accessible functions.

This support is enough to support API modification by replacing API calls. We are currently adding support for static analysis of the kind discussed above, as well as support for adding code to existing binaries, so that a wider range of optimisations becomes available.

References

[1] Danny Dig and Ralph Johnson. The role of refactorings in API evolution. In ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 389–398, Washington, DC, USA, 2005. IEEE Computer Society.
[2] M. Fowler and K. Beck. Refactoring: improving the design of existing code. Addison-Wesley Professional, 1999.
[3] Google Inc. Google Projects for Android. http://code.google.com/android/.
[4] Samuel Z. Guyer and Calvin Lin. Broadway: A compiler for exploiting the domain-specific semantics of software libraries. In Proceedings of the IEEE, volume 93, pages 342–357. IEEE, February 2005.
[5] International Standards Organisation. Information technology – Programming languages – Prolog – Part 1: General core. Technical Report 13211-1, ISO, 1995.
[6] JesusFreke. Smali and Baksmali. http://code.google.com/p/smali/.
[7] S. Kell. Configuration and adaptation of binary software components. In Proceedings of the 31st International Conference in Software Engineering, 2009.
[8] Ralph Keller and Urs Hölzle. Binary component adaptation. In Proc. ECOOP '98, volume 1445, page 307, 1998.
[9] James C. King. Symbolic execution and program testing. Communications of the ACM, 19(7), 1976.
[10] André Thayse, editor. From modal logic to deductive databases: introducing a logic based approach to artificial intelligence. John Wiley & Sons, Inc., New York, NY, USA, 1989.
[11] Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, page 13. IBM Press, 1999.
[12] Todd L. Veldhuizen and Dennis Gannon. Active libraries: Rethinking the roles of compilers and libraries. In Proceedings of the SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing. SIAM Press, 1998.
[13] Richard Warburton and Sara Kalvala. From specification to optimisation: An architecture for optimisation of Java bytecode. In Proceedings of the 18th International Conference on Compiler Construction, Berlin, Heidelberg, 2009. Springer-Verlag.
