research-article

Efficient memory management for hardware accelerated Java Virtual Machines

Authors:

Erik D'Hollander,

Dirk Stroobandt

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 14, Issue 4

Article No.: 48, Pages 1 - 18

https://doi.org/10.1145/1562514.1562516

Published: 28 August 2009

Abstract

Application-specific hardware accelerators can significantly improve a system's performance. In a Java-based system, we then have to consider a hybrid architecture that consists of a Java Virtual Machine running on a general-purpose processor connected to the hardware accelerator. In such a hybrid architecture, data communication between the accelerator and the general-purpose processor can incur a significant cost, which may even annihilate the original performance improvement of adding the accelerator. A careful layout of the data in the memory structure is therefore of major importance to maintain the acceleration performance benefits.

This article addresses the reduction of the communication cost in a distributed shared memory consisting of the main memory of the processor and the accelerator's local memory, which are unified in the Java heap. Since memory access times are highly nonuniform, a suitable allocation of objects in either main memory or the accelerator's local memory can significantly reduce the communication cost. We propose several techniques for finding the optimal location for each Java object's data, either statically through profiling or dynamically at runtime. We show that these techniques reduce communication cost by up to 86% for the SPECjvm and DaCapo benchmarks. We also show that the best strategy depends on both the application and the relative cost of remote versus local accesses. For a relative cost higher than 10, a self-learning dynamic approach often results in the best performance.
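
The static, profile-based idea described above can be illustrated with a small Java sketch. It is not the authors' implementation: the class and method names, the access counts, and the cost model (only remote accesses incur communication, each costing a fixed number of units) are assumptions made for illustration. Each object is placed in the memory of whichever side accesses it more often, and the resulting cost is compared with a baseline that allocates everything in main memory.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: every name and number here is illustrative, not taken from the article.
final class PlacementSketch {

    // The two memories that together form the unified Java heap.
    enum Memory { MAIN, ACCELERATOR_LOCAL }

    // Profiled access counts for one object.
    static final class ObjectProfile {
        final String name;
        final long cpuAccesses;
        final long acceleratorAccesses;

        ObjectProfile(String name, long cpuAccesses, long acceleratorAccesses) {
            this.name = name;
            this.cpuAccesses = cpuAccesses;
            this.acceleratorAccesses = acceleratorAccesses;
        }
    }

    // Assumed cost model: only remote accesses communicate, each costing remoteCost units.
    static long communicationCost(ObjectProfile p, Memory location, long remoteCost) {
        long remoteAccesses = (location == Memory.MAIN) ? p.acceleratorAccesses : p.cpuAccesses;
        return remoteAccesses * remoteCost;
    }

    // Static placement: pick the memory of whichever side accesses the object more often.
    // Ties stay in main memory, which this sketch treats as the default allocation site.
    static Memory bestPlacement(ObjectProfile p) {
        return p.acceleratorAccesses > p.cpuAccesses ? Memory.ACCELERATOR_LOCAL : Memory.MAIN;
    }

    // Baseline: every object is allocated in main memory.
    static long naiveCost(List<ObjectProfile> profiles, long remoteCost) {
        long total = 0;
        for (ObjectProfile p : profiles) {
            total += communicationCost(p, Memory.MAIN, remoteCost);
        }
        return total;
    }

    // Profile-guided: every object is allocated at its best location.
    static long profiledCost(List<ObjectProfile> profiles, long remoteCost) {
        long total = 0;
        for (ObjectProfile p : profiles) {
            total += communicationCost(p, bestPlacement(p), remoteCost);
        }
        return total;
    }

    // Made-up profile data for three objects.
    static List<ObjectProfile> exampleProfiles() {
        return Arrays.asList(
                new ObjectProfile("inputBuffer", 100, 900),
                new ObjectProfile("lookupTable", 10, 5000),
                new ObjectProfile("resultList", 800, 200));
    }

    public static void main(String[] args) {
        List<ObjectProfile> profiles = exampleProfiles();
        long remoteCost = 10; // assumed relative cost of a remote access
        for (ObjectProfile p : profiles) {
            System.out.println(p.name + " -> " + bestPlacement(p));
        }
        System.out.println("naive cost:    " + naiveCost(profiles, remoteCost));
        System.out.println("profiled cost: " + profiledCost(profiles, remoteCost));
    }
}
```

With the made-up profile above and a relative remote cost of 10, the baseline costs 61000 units and the profile-guided placement costs 3100. These figures only illustrate the mechanism and say nothing about the benchmark results reported in the article. A dynamic variant would have to make the same decision at runtime from access counts observed so far, which this sketch does not attempt.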

Cited By

Fumero, J., Blanaru, F., Stratikopoulos, A., Dohrmann, S., Viswanathan, S., Kotselidis, C., Bruno, R., and Moss, E. 2023. Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps. In Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 143--157. Online publication date: 19-Oct-2023.
https://dl.acm.org/doi/10.1145/3617651.3622984

Faes, P., Bertels, P., Campenhout, J., and Stroobandt, D. 2018. Using method interception for hardware/software co-development. Design Automation for Embedded Systems 13, 4, 223--243. Online publication date: 19-Dec-2018.
https://dl.acm.org/doi/10.1007/s10617-009-9040-8

Index Terms

Efficient memory management for hardware accelerated Java Virtual Machines
1. Hardware
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory
        Garbage collection

Reviews

Reviewer: David E. Goldfarb

Bertels et al. study the problem of memory allocation in an environment that includes both a general-purpose processor and an attached hardware accelerator. In the model studied, memory is shared between the two processors, with all memory accessible to either processor. Still, local memory can be accessed much more cheaply than memory attached to the other processor, so it is desirable to allocate each object to the processor that will access it more frequently. The problem is studied through several models, including naive, ideal (Delphic), and heuristically chosen allocations. The authors show that a heuristic allocator performs very well in many important cases, and they do a fine job of discussing common cases and showing why their solution is effective and efficient. In their future work, the authors should include studies of more complex usage cases, multiple attached processors, and other parallel models. (Online Computing Reviews Service)

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 14, Issue 4

August 2009

226 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/1562514

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 28 August 2009

Accepted: 01 April 2009

Revised: 01 March 2009

Received: 01 November 2008

Published in TODAES Volume 14, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

OptiMMA
FlexWare

Bibliometrics & Citations

Article Metrics

Total Citations: 2
Total Downloads: 396
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 1

Reflects downloads up to 16 Jan 2025
