The document describes a proposed new architecture for a Java processor for embedded applications. It discusses how traditional Java Virtual Machines (JVMs) use a software stack that requires significant memory resources not suitable for embedded systems. The proposed processor replaces the stack with a Way Predictive Java Look Aside Buffer (WAY JLAB) to directly execute Java bytecodes in hardware. It also includes components like a core, variable method cache, and WAY JLAB to improve performance while meeting embedded memory constraints.
Department of Computer Science and Engineering Southern Methodist University Dallas, Texas, USA
ABSTRACT
Java is one of the most widely used programming languages for developing applications for devices ranging from computers to set-top boxes. The Java Virtual Machine (JVM) is at the heart of executing Java applications. A JVM combined with a Just-In-Time (JIT) compiler works well for PC-based Java applications, but the memory requirement of the JIT compiler is so high that it is prohibitively expensive for embedded systems such as Internet televisions and digital set-top boxes (STBs). This paper presents a new stackless Java processor architecture for embedded applications that executes Java bytecodes directly in hardware. The processor replaces the stack with a Way Predictive Java Look Aside Buffer (WAY JLAB). The way predictive JLAB introduced here provides faster access to the constant pool references of JVM bytecode than a direct-mapped or set-associative buffer.

1. INTRODUCTION
Java owes much of its success to its support for security and portability. Although there are many other reasons for its success [1], security and portability were the key factors. An applet is a kind of Java program that can be transmitted over the Internet and executed automatically by a Java-compatible web browser. A downloadable application is more exposed to viruses, which may gather private information such as credit card details by gaining unauthorized access to system resources. Java provides security by confining the applet to the Java execution environment and not allowing it to access other parts of the computer. Bytecode, "Java's magic", is what allows Java to address both security and portability.
Unlike the executable code produced by compilers for other languages, bytecode, the output of the Java compiler, is a set of highly optimized instructions executed by the Java run-time system, the Java Virtual Machine (JVM). Once the JVM has been implemented for a platform, any Java program can run on it. This is how portability is obtained. A JIT (Just-In-Time) compiler for the bytecode is used to boost performance. Properties such as robustness, simplicity, dynamism, multithreading and high performance [1] also make Java a distinctive language. The JVM is responsible for features like portability and security. The key internal components of the JVM are the stack, the non-heap memory and the heap memory. A thread is a thread of execution in a program. Each thread has its own stack, and each stack consists of one frame for each method executing on that thread. Each frame contains several fields, including the return value, the operand stack and a reference to the run-time constant pool of the current class. The heap is used to allocate arrays and class instances at run time. Because the size of a frame is fixed once it is created, objects and arrays cannot be stored in it; frames store only references to arrays and objects on the heap. Arrays and objects are never de-allocated explicitly; instead, the garbage collector reclaims them automatically. Non-heap memory consists of the code cache and the method area. The method area contains the run-time constant pool, field data, method data, etc. Bytecode requires data that is too big to be stored directly in the bytecodes, so the data is stored in the run-time constant pool and the bytecodes contain references into that pool. The constant pool stores different types of data, including class references, method references, field references, and numeric and string literals.
The code cache stores the methods that the Just-In-Time compiler has compiled to native code. The JIT compiles frequently executed regions of bytecode to native code, which is then stored in the code cache. JIT compilation requires more memory than embedded implementations can provide. One solution [2] for improving execution performance is therefore to use a Java processor, in which the JVM is implemented in hardware. In recent years, many researchers have focused on developing an efficient Java processor for embedded applications. Section 2 of this paper reviews related work on Java processors. Section 3 introduces the proposed Java processor. Section 4 concludes with the advantages and disadvantages of the proposed processor and the scope for future work.
2. RELATED WORK
Many approaches have been proposed for implementing a Java processor. Sun Microsystems developed PicoJava I [3] and PicoJava II [4], and aJile Systems Inc. developed the aJ-100 [5]. All three processors share the same basic design principle: since the JVM is a software-controlled stack-based machine, they replace it with a hardware stack-based machine. According to [3-5], most of the simple Java bytecodes are implemented directly in hardware, and the remaining instructions are implemented by microcode or software traps. Aurora VLSI Inc. developed a processor [6] in which a Java processor is attached as a coprocessor inside the host processor core, so that the system can execute programs written in other languages on the host processor and Java programs on the attached Java coprocessor. Another notable contribution to the field is M. Schoeberl's JOP (Java Optimized Processor) [9], in which the JVM is implemented in hardware for time-predictable execution of real-time tasks. Stack-dependent JVMs cannot exploit instruction-level parallelism because they impose data dependencies among consecutive instructions [2]. M. Watheq El-Kharashi et al. [7] proposed eliminating the stack dependency with a hardware bytecode folding algorithm combined with Tomasulo's scheduling algorithm, and on that basis designed the JAFARDD processor [8], which dynamically translates stack-based bytecodes into stack-independent RISC-style instructions. Although coprocessors support Java applications without affecting the host processor, they consume more chip area and more power, both of which are important factors for embedded devices [2]. To overcome these stack-related issues, the proposed design replaces the stack with a WAY JLAB (Way Predictive Java Look Aside Buffer); the basic idea appears in [10,11].
The design of the WAY JLAB is largely similar to the design proposed in [11]; the main difference is the use of a way predictive buffer instead of the direct-mapped buffer used in [11]. The way predictive buffer offers several advantages over the direct-mapped one.
3. THE PROPOSED JAVA PROCESSOR
The basic structure of the microarchitecture is similar to the one proposed in [12], but the method cache and the stack work differently. Fig. 1 shows the hardware architecture of the whole JVM design, referred to as the microarchitecture. Instead of a conventional method cache, the two-block variable method cache proposed in [13] is used, which makes the instruction cache time predictable. A WAY JLAB is used for faster access to the constant pool and to reduce the effective object access time.
Fig. 1 Micro architecture for proposed hardware based JVM

3.1 Core
The core executes Java bytecode directly and also handles exceptions, whether raised by the user or by the system. One of the main tasks of the core is to ensure that every Java bytecode either executes in constant time or has an execution time that can be determined from the available information. In some cases, however, such as the INVOKEVIRTUAL bytecode, the called method may be unknown and hence the execution time is not known. The core also provides an interface to the garbage collector.

3.2 WAY JLAB
The constant pool is the part of the .class file (which contains the Java bytecode) that holds the constants needed to run the code of that class. These constants include symbolic references generated by the compiler and literals specified by the programmer; the constant pool can be viewed as a table of variable-length structures. Symbolic references are essentially the names of classes, methods and fields referenced from the code. The Java Virtual Machine uses these references to link the code to the other classes it depends on. To speed up constant pool references, the resolved information is stored in the WAY JLAB. At run time, the symbolic representation of a reference in the constant pool is used to calculate the location of the actual referenced entity; this process is called constant pool resolution [11]. The Sun JVM [14] and the Kaffe JVM store the resolved information in different ways. The former rewrites the bytecode into _quick instructions, in which the offset part gives the field offset within a specific object. The latter changes the tag of the constant pool entry to indicate whether it has been resolved and updates the entry with the resolved reference. Neither strategy is efficient for small embedded systems, where memory is very precious.
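As a rough illustration of the in-place strategy described above, in which the tag of a constant pool entry is changed and the entry is overwritten with the resolved reference, the following Java sketch models lazy resolution of a field reference. The names ConstantPoolSketch, Tag, Entry and layout are hypothetical and are not taken from Kaffe or any other JVM source; the map of field offsets merely stands in for the class layout that a real resolver would consult.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of in-place constant pool resolution; all names are illustrative.
final class ConstantPoolSketch {
    enum Tag { FIELD_REF_SYMBOLIC, FIELD_REF_RESOLVED }

    static final class Entry {
        Tag tag = Tag.FIELD_REF_SYMBOLIC;
        final String symbolicName; // e.g. "Point.y", as written by the compiler
        int fieldOffset = -1;      // meaningful only once the tag says RESOLVED

        Entry(String symbolicName) { this.symbolicName = symbolicName; }
    }

    private final Entry[] pool;
    private final Map<String, Integer> layout; // assumed stand-in for the loaded class layout
    int resolutions = 0;                       // how many times the slow path ran

    ConstantPoolSketch(Entry[] pool, Map<String, Integer> layout) {
        this.pool = pool;
        this.layout = layout;
    }

    // Returns the field offset for the entry at 'index', resolving it on first use.
    int fieldOffset(int index) {
        Entry e = pool[index];
        if (e.tag == Tag.FIELD_REF_SYMBOLIC) {
            Integer offset = layout.get(e.symbolicName);
            if (offset == null) {
                throw new IllegalStateException("cannot resolve " + e.symbolicName);
            }
            e.fieldOffset = offset;         // overwrite the entry with the resolved value
            e.tag = Tag.FIELD_REF_RESOLVED; // flip the tag so later accesses skip resolution
            resolutions++;
        }
        return e.fieldOffset;
    }

    public static void main(String[] args) {
        Map<String, Integer> layout = new HashMap<>();
        layout.put("Point.x", 0);
        layout.put("Point.y", 4);
        Entry[] pool = { new Entry("Point.x"), new Entry("Point.y") };
        ConstantPoolSketch cp = new ConstantPoolSketch(pool, layout);
        System.out.println(cp.fieldOffset(1)); // first access resolves the entry
        System.out.println(cp.fieldOffset(1)); // second access reuses the stored offset
        System.out.println("resolutions = " + cp.resolutions);
    }
}
```

In this sketch only the first access pays the resolution cost. The resolved form lives in the constant pool itself, which is presumably the memory overhead that the text considers too high for small embedded systems.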
An associative buffer is therefore a better way to hold resolved references. A way predictive buffer is better still, because it has a lower hit time than a conventional associative buffer. Hence the buffer is named WAY JLAB (Way Predictive Java Look Aside Buffer). Fig. 2 shows the design of the associative Java Look Aside Buffer.
Fig. 2 Design of Associative Java Look Aside Buffer
In Fig. 2, the tag field of the class instance is compared with the tags present in the buffer. The data at the indexed location is read from every way, and each data word is the input to a three-state buffer. The tag comparison lines drive the control inputs of the three-state buffers, so the only buffer that produces valid data output is the one whose control input is asserted, that is, the one whose tag comparison hit. The access time is high because the buffer has to compare all the tags, read the data from all ways and then use the tag comparison lines to select the correct output. The WAY JLAB is used to reduce this access time. Fig. 3 shows the design of the WAY JLAB.
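The difference between the associative lookup described above and a way-predicted lookup can be sketched in software. The Java model below probes a single predicted way first and falls back to the remaining ways only when the predicted tag does not match. Several details are assumptions made for illustration and are not specified in this paper: the class and field names, the use of the most recently used way as the prediction, round-robin victim selection on a fill, and a single data word per entry (so the offset-driven selection is omitted).

```java
import java.util.OptionalInt;

// Hypothetical software model of a way-predicted set-associative lookup buffer.
final class WayPredictedBufferSketch {
    private final int sets;
    private final int ways;
    private final int[][] tags;
    private final int[][] data;
    private final boolean[][] valid;
    private final int[] predictedWay; // assumed predictor: most recently used way per set
    private final int[] nextVictim;   // assumed fill policy: round-robin per set
    int fastHits = 0;                 // hits in the predicted way (one tag comparison)
    int slowHits = 0;                 // hits found only after checking the other ways
    int misses = 0;

    WayPredictedBufferSketch(int sets, int ways) {
        this.sets = sets;
        this.ways = ways;
        tags = new int[sets][ways];
        data = new int[sets][ways];
        valid = new boolean[sets][ways];
        predictedWay = new int[sets];
        nextVictim = new int[sets];
    }

    // Looks up 'key'; returns the stored value, or empty on a miss.
    OptionalInt lookup(int key) {
        int set = Math.floorMod(key, sets);
        int tag = Math.floorDiv(key, sets);
        int p = predictedWay[set];
        if (valid[set][p] && tags[set][p] == tag) {
            fastHits++;                 // prediction was right: only one way was examined
            return OptionalInt.of(data[set][p]);
        }
        for (int w = 0; w < ways; w++) { // misprediction: go back and check the other ways
            if (w != p && valid[set][w] && tags[set][w] == tag) {
                predictedWay[set] = w;  // remember the correct way for next time
                slowHits++;
                return OptionalInt.of(data[set][w]);
            }
        }
        misses++;
        return OptionalInt.empty();
    }

    // Stores a resolved value after a miss (the caller is assumed not to fill a present key).
    void fill(int key, int value) {
        int set = Math.floorMod(key, sets);
        int w = nextVictim[set];
        nextVictim[set] = (w + 1) % ways;
        tags[set][w] = Math.floorDiv(key, sets);
        data[set][w] = value;
        valid[set][w] = true;
        predictedWay[set] = w;
    }

    public static void main(String[] args) {
        WayPredictedBufferSketch jlab = new WayPredictedBufferSketch(4, 2);
        jlab.fill(5, 100);
        jlab.fill(9, 200);              // maps to the same set as key 5
        System.out.println(jlab.lookup(9));
        System.out.println(jlab.lookup(5));
        System.out.println(jlab.lookup(5));
        System.out.println("fast=" + jlab.fastHits + " slow=" + jlab.slowHits
                + " miss=" + jlab.misses);
    }
}
```

The counters separate hits that needed a single tag comparison from hits that required a second pass, which is the software analogue of the hit-time advantage claimed for the way predictive buffer.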
Fig. 3 Design of WAY JLAB

In the WAY JLAB the access time is lower because a prediction is used to select the output. The predicted entry is then compared with the tag: on a HIT, the data is put onto the data bus; on a MISS, the only additional step is to go back, select the correct way and read the data from it. The prediction from the lookup table drives the first multiplexer (MUX1) to select the predicted way; the data is then passed to the second multiplexer, which is driven by the offset to select a particular data word. The way predicted JLAB therefore reduces the access time considerably.

3.3 Variable Two Block Method Cache
The variable two-block method cache was introduced by M. Schoeberl in [13] to make the instruction cache time predictable. According to [13], Java programs typically consist of short methods, and there are no branches out of a method. The cache is filled only on calls and returns. If a cache holding a single method is used, consider the following example:
    xyz( ) {
        a( )
        b( )
    }

As the cache is accessed on calls and returns, method xyz( ) may be loaded multiple times, whenever methods a( ) and b( ) return. An LRU replacement policy is used. First xyz( ) is invoked, so it is cached; then a( ) is invoked and replaces xyz( ) in the cache. After a( ) returns, xyz( ) is cached again, and the same process repeats when b( ) is invoked. This issue can be solved by caching two or more methods, so we consider a cache capable of storing two methods. Two replacement policies can be used: next block and stack oriented. Consider the following example:

    x( ) {
        for ( ; ; ) {
            y( )
            z( )
        }
    }

Let the sizes of x( ), y( ) and z( ) be 2, 1 and 2 blocks, and let the cache consist of four blocks. The next block replacement policy uses a pointer, next, which initially points to the first block. When a method of length l is loaded into block n, next is updated to (n + l) % block count. The stack oriented replacement policy updates the next pointer in the same way on a method load. It also updates the pointer on a method return, so that it points to the first block of the leaving method.

Instruction  X()   Y()   Ret   Z()   Ret   Y()   Ret   Z()   Ret   Y()   Ret   Z()   Ret
Block 1      X     x     x     Z     z     -->-  -->-  Z     -->z  Y     Y     y     X
Block 2      X     x     x     -->-  X     x     x     Z     z     -->-  -->-  Z     -->z
Block 3      -->-  Y     y     y     X     x     x     -->-  X     x     x     Z     z
Block 4      -     -->-  -->-  Z     -->z  Y     y     y     X     x     x     -->-  X

Table. 1 Next Block replacement policy
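The next block rule stated above, next = (n + l) % block count, can be tried out with a small simulation. The Java sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, a method counts as cached only while all of its blocks are intact, and overwriting any block of a cached method evicts that method entirely. It replays the x( ), y( ), z( ) loop for a fixed number of iterations and counts how often a method has to be loaded.

```java
import java.util.Arrays;

// Hypothetical simulation of a method cache with next block replacement.
final class NextBlockCacheSketch {
    private final String[] blocks; // name of the method held in each block, or null if empty
    private int next = 0;          // first block that the next load will overwrite
    int loads = 0;                 // number of cache fills (misses)

    NextBlockCacheSketch(int blockCount) {
        blocks = new String[blockCount];
    }

    // Called on every invoke and every return with the method that receives control.
    // Returns true on a hit; on a miss the method is loaded starting at 'next'.
    boolean access(String method, int length) {
        if (length > blocks.length) {
            throw new IllegalArgumentException("method larger than cache");
        }
        if (Arrays.asList(blocks).contains(method)) {
            return true;
        }
        for (int i = 0; i < length; i++) {
            int b = (next + i) % blocks.length;
            evict(blocks[b]);      // assumption: a partly overwritten method is dropped whole
            blocks[b] = method;
        }
        next = (next + length) % blocks.length; // next = (n + l) % block count
        loads++;
        return false;
    }

    private void evict(String victim) {
        if (victim == null) {
            return;
        }
        for (int i = 0; i < blocks.length; i++) {
            if (victim.equals(blocks[i])) {
                blocks[i] = null;
            }
        }
    }

    // Replays x() { loop { y(); z(); } } for 'iterations' rounds and returns the load count.
    static int run(int blockCount, int xLen, int yLen, int zLen, int iterations) {
        NextBlockCacheSketch cache = new NextBlockCacheSketch(blockCount);
        cache.access("x", xLen);     // invoke x
        for (int i = 0; i < iterations; i++) {
            cache.access("y", yLen); // invoke y
            cache.access("x", xLen); // return to x
            cache.access("z", zLen); // invoke z
            cache.access("x", xLen); // return to x
        }
        return cache.loads;
    }

    public static void main(String[] args) {
        System.out.println("loads with z = 2 blocks: " + run(4, 2, 1, 2, 3));
        System.out.println("loads with z = 1 block:  " + run(4, 2, 1, 1, 3));
    }
}
```

Under these assumptions, shrinking z( ) to one block lets all three methods fit in the four-block cache, so each is loaded exactly once no matter how many iterations run.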
Instruction  X()   Y()   Ret   Z()   Ret   Y()   Ret   Z()   Ret   Y()   Ret   Z()   Ret
Block 1      X     x     x     Z     X     x     -->x  Z     -->z  Y     Y     y     y
Block 2      X     x     -->x  Z     -->-  -     -     Z     z     -->-  -     -->-  X
Block 3      -->-  Y     y     y     y     -->y  y     -->y  X     x     x     Z     X
Block 4      -     -->-  -     -->-  X     x     x     -     X     x     -->x  Z     -->-

Table. 2 Stack oriented block replacement policy

The two tables above show the cache content during program execution for both replacement policies. X, Y and Z indicate that the method is loaded into the cache at that step; x, y and z indicate that the method is already present in the cache. The pointer --> marks the block that will be replaced on the next method call or return, depending on the replacement strategy used. The cache design uses next block replacement because, if the size of method z( ) is reduced to one block, all the methods fit in the cache, yet with the stack oriented policy the methods would still be exchanged. If all three methods fit into the cache, there are no placement conflicts. The next block replacement policy is therefore the better option.

3.4 Memory Manager with Garbage Collector
The operations of the memory manager execute in parallel with the CPU in constant time. The memory manager is responsible for managing the Java heap: allocating objects, performing read and write operations on them and freeing the memory used by unreferenced objects. Memory is divided into equal-sized segments to achieve constant allocation time. One segment is selected as the current allocation segment and is used for allocating all new objects; when the space left in the current allocation segment is smaller than the new object, a new allocation segment is selected. This memory management with garbage collector is similar to the one presented in [12].

4. CONCLUSION
A Java processor with a hardware-based Java Virtual Machine is presented in this paper.
The model takes advantage of a stackless Java processor and reduces the access time to constant pool entries by using a way predictive JLAB; its simple design and low memory usage make it well suited for embedded applications. Besides these advantages, there are areas where more work can be done. The method cache could be configured so that very few misses occur, and for the way prediction buffer, the prediction could be computed more effectively than with present methods such as using the PC to index the prediction table or calculating the address by XOR of a register and an offset.
REFERENCES
[1] Herbert Schildt, Java: The Complete Reference, 8th Edition. New York, USA: McGraw-Hill, 7th ed., 2006.
[2] Yi-Yu Tan; Yau, C. H.; Lo, K. M.; Yu, W. S.; Pak-Lun Mok; Shi-Sheung Fong, A., "Design and implementation of a Java processor," Computers and Digital Techniques, IEE Proceedings, vol. 153, no. 1, pp. 20-30, 10 Jan. 2006.
[3] O'Connor, J.M., and Tremblay, M.: PicoJava I: The Java virtual machine in hardware, IEEE Micro, March 1997, 17, (2), pp. 45-53.
[4] Sun Microsystems: PicoJava-II: Java processor core. Sun Microsystems data sheet, April 1998.
[5] aJile Systems, Inc.: aJ-100 real-time low power Java(TM) processor, aJ-100(TM) reference manual. Version 2.1, December 2001.
[6] Aurora VLSI Inc.: AU-J2000: super high performance Java processor core (data sheet). Aurora VLSI Inc., 2000.
[7] M. Watheq El-Kharashi, Fayez Elguibaly, and Kin F. Li. 2001. Adapting Tomasulo's algorithm for bytecode folding based Java processors. SIGARCH Comput. Archit. News 29, 5 (December 2001), 1-8.
[8] El-Kharashi, M.W.; Gebali, F.; Li, K.F.; Fang Zhang, "The JAFARDD processor: a Java architecture based on a Folding Algorithm, with Reservation stations, Dynamic translation, and Dual processing," Consumer Electronics, IEEE Transactions on, vol. 48, no. 4, pp. 1004-1015, Nov 2002.
[9] Martin Schoeberl, JOP: A Java Optimized Processor for Embedded Real-Time Systems, Ph.D. Thesis, Tech. Universitaet Wien, Jan 2005.
[10] N. Shimizu, M. Naito, Dual Issue Queued Pipelined Java Processor TRAJA, Toward an Open Source Processor Project, Proceedings of the first IEEE Asia Pacific Conference on ASICs, pp. 213-216, 1999.
[11] Naohiko Shimizu and Chiaki Kon. 2003. "Java object look aside buffer for embedded applications". SIGARCH Comput. Archit. News 32, 3 (September 2003), 43-49.
[12] Zabel, M.; Preusser, T.B.; Reichel, P.; Spallek, R.G., "Secure, Real-Time and Multi-Threaded General-Purpose Embedded Java Microarchitecture," Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on, pp. 59-62, 29-31 Aug. 2007.
[13] Martin Schoeberl, "Time-predictable cache organization," in Proceedings of the First International Workshop on Software Technologies for Future Dependable Distributed Systems, STFSSD 2009, Tokyo, Japan.
[14] T. Lindholm, F. Yellin, The Java(TM) Virtual Machine Specification, Addison-Wesley, 1997.