WebAssembly is a binary-format compilation target for languages such as C/C++, Rust and Go. It enables execution within Web browsers and as standalone programs. Compiled modules may interoperate with other languages such as JavaScript, and use external calls (imports) to interact with a host environment. Such interoperability dependencies influence overall WebAssembly module performance and can limit Web/standalone execution capabilities. The implementation of a WebAssembly runtime called TruffleWasm is described; it provides a single environment for both the execution of standalone modules and interoperation with multiple GraalVM-hosted languages such as JavaScript (GraalJS) via Truffle's interoperability framework. The Graal compiler is used to speculatively and aggressively apply profiling-driven optimisations when performing Just-in-Time (JIT) code generation.
ACM Transactions on Architecture and Code Optimization
The increase in computational capability of low-power Arm architectures has seen them diversify from their more traditional domain of portable battery-powered devices into data center servers, personal computers, and even supercomputers. Thus, managed languages (Java, JavaScript, etc.) that require a managed runtime environment (MRE) need to be ported to the Arm architecture, requiring an understanding of different design tradeoffs. This article studies how the lack of strong hardware support for Self-Modifying Code (SMC) in low-power architectures (e.g., absence of cache coherence between instruction and data caches) affects Just-In-Time (JIT) compilation and runtime behavior in MREs. Specifically, we focus on the implementation and treatment of call-sites, which must maintain code consistency in the face of concurrent execution and modification to redirect control (patching) by the MRE. The lack of coherence is compounded with the maximum distance (reach of) a call-site can...
Real-time 3D space understanding is becoming prevalent across a wide range of applications and hardware platforms. To meet the desired Quality of Service (QoS), computer vision applications tend to be heavily parallelized and exploit any available hardware accelerators. Current approaches to achieving real-time computer vision revolve around programming languages typically associated with High Performance Computing, along with binding extensions for OpenCL or CUDA execution. Such implementations, although high-performing, lack portability across the wide range of diverse hardware resources and accelerators. In this paper, we showcase how a complex computer vision application can be implemented within a managed runtime system. We discuss the complexities of achieving high-performing and portable execution across embedded and desktop configurations. Furthermore, we demonstrate that it is possible to achieve the QoS target of over 30 frames per second (FPS) by exploiting FPGA and GPGPU ...
This paper describes the provision of networking facilities in a customisable operating system called Arena. Customisable operating systems allow applications to determine their own resource management, so that an application can execute optimally on a particular architecture. On distributed architectures there are opportunities for tailoring message passing, virtual shared memory and distributed persistent store abstractions for particular applications. However, in all these cases, the efficiency of the network management is a critical determinant of performance. Arena seeks to balance flexible distributed resource management with efficient network handling. Arena provides an optimised form of network message transfer which avoids unnecessary copying, and allows the application control over the processing of the message. Performance figures are given for Arena network transfer on a distributed store multicomputer, and are compared with results from a microkernel-based operating system on the...
Papers by Andy Nisbet