InfoQ EMag Java Agents and Bytecode
Java Agents and Bytecode
eMag Issue 42 - May 2016
ARTICLE
Java Bytecode: Bending the Rules
Secrets of the Bytecode Ninjas
Few developers ever work with Java bytecode directly, but the bytecode format is not difficult to understand. In this article, Rafael Winterhalter takes us on a tour of Java bytecode and some of its capabilities.
Five Advanced Debugging Techniques Every Java Developer Should Know
With architectures becoming more distributed and code more asynchronous, pinpointing
and resolving errors in production is harder
than ever. In this article we investigate five
advanced techniques that can help you get
to the root cause of painful bugs in production more quickly, without adding material
overhead.
FOLLOW US
facebook.com/InfoQ
@InfoQ
google.com/+InfoQ
linkedin.com/company/infoq

CONTACT US
GENERAL FEEDBACK feedback@infoq.com
ADVERTISING sales@infoq.com
EDITORIAL editors@infoq.com
A LETTER FROM THE EDITOR
Java bytecode programming is not for the faint of heart, but in a world where new JVM languages, fancy profilers, and proxying frameworks are prevalent, it can be a powerful tool, not just for re-engineering existing code but for creating clean, reusable, loosely coupled architectures.
In this eMag we have curated articles on bytecode manipulation, including how to manipulate bytecode using three important frameworks (Javassist, ASM, and Byte Buddy), as well as several higher-level use cases where developers will benefit from understanding bytecode.
We start with Living in the Matrix with Bytecode Manipulation, where we capture in detail two popular bytecode manipulation frameworks, Javassist and ASM, following a presentation by New Relic's Ashley Puls.
Next, Byte Buddy creator Rafael Winterhalter
gives us a detailed recipe for easily creating Java
agents using Byte Buddy in Easily Create Java Agents
with Byte Buddy.
Takipi's Tal Weiss is next up, with an accessible tutorial on valuable production debugging techniques in 5 Advanced Java Debugging Techniques Every Java Developer Should Know.
You are probably all too familiar with the following sequence: you feed a .java file into a Java compiler (likely using javac or a build tool like Ant, Maven, or Gradle), the compiler grinds away, and it finally emits one or more .class files.
What is Java Bytecode?
Many common Java libraries, such as Spring and Hibernate, as well as most JVM languages and even your IDEs, use bytecode-manipulation frameworks. For that reason, and because it's really quite fun, you might find bytecode manipulation a valuable skill to have. You can use bytecode manipulation to perform many tasks that would be difficult or impossible to do without it, and once you learn it, the sky's the limit.
One important use case is program analysis.
For example, the popular FindBugs bug-locator tool
uses ASM under the hood to analyze your bytecode
and locate bug patterns. Some software shops have
code-complexity rules such as a maximum number
of if/else statements in a method or a maximum
method size. Static analysis tools analyze your bytecode to determine the code complexity.
Another common use is class generation. For
example, ORM frameworks typically use proxies
based on your class definitions. Or consider security
Our example
/**
 * A method annotation which should be used to indicate
 * important methods whose invocations should be logged.
 */
public @interface ImportantLog {
    /**
     * The method parameter indexes whose values should be logged.
     * For example, if we have the method
     * hello(int paramA, int paramB, int paramC), and we
     * wanted to log the values of paramA and paramC, then fields
     * would be {"0", "2"}. If we only want to log the value of
     * paramB, then fields would be {"1"}.
     */
    String[] fields();
}
For login, Sue wants to record the account ID and the username, so her fields will be set to 1 and 2 (she doesn't want to display the password!). For the withdraw method, her fields are 0 and 1 because she wants to output the first two fields: the account ID and the amount of money to remove. Her audit log ideally will contain something like this:
To hook this up, Sue is going to use a Java agent. Introduced in JDK 1.5,
Java agents allow you to modify the bytes that comprise the classes in a
running JVM, without requiring any source code.
Without an agent, the normal execution flow of Sue's program is:
1. Run Java on a main class, which is then loaded by a class loader.
2. Call the class's main method, which executes the defined process.
3. Print "transactions completed."
When you introduce a Java agent, a few more things happen, but let's first see what's required to create an agent. An agent must contain a class with a premain method. It must be packaged as a JAR file with a properly constructed manifest that contains a Premain-Class entry. Finally, the -javaagent switch must be set on launch to point to the JAR path, which makes the JVM aware of the agent.
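The manifest is usually produced by the build tool, but the required entry can also be written with the JDK's own java.util.jar API. The sketch below assumes the agent class is com.example.spring2gx.agent.Agent (the package used in the agent listing that follows); the helper class name AgentManifest is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifest {

    /** Builds a manifest whose Premain-Class entry names the class declaring premain. */
    public static Manifest forAgent(String premainClass) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present; without it the JDK writes no main attributes
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(new Attributes.Name("Premain-Class"), premainClass);
        return manifest;
    }

    /** Renders the manifest as it would be stored in META-INF/MANIFEST.MF. */
    public static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(forAgent("com.example.spring2gx.agent.Agent")));
    }
}
```

Passing the same Manifest to the java.util.jar.JarOutputStream(OutputStream, Manifest) constructor would write it into an agent JAR directly.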
java -javaagent:/to/agent.jar com.example.spring2gx.BankTransactions
Javassist
A subproject of JBoss, Javassist (short for Java Programming Assistant) consists of a high-level object-based API and a lower-level one that
is closer to the bytecode. The more object-based one enjoys more community activity and is the focus of this article. For a complete tutorial, refer
to the Javassist website.
In Javassist, the basic unit of class representation is the CtClass
(compile time class). The classes that comprise your program are stored
in a ClassPool, essentially a container for CtClass instances.
The ClassPool implementation uses a HashMap, in which the key is
the name of the class and the value is the corresponding CtClass object.
A normal Java class contains fields, constructors, and methods. The
corresponding CtClass represents those as CtField, CtConstructor,
and CtMethod. To locate a CtClass, you can grab it by name from the
ClassPool, then grab any method from the CtClass and apply your
modifications.
Figure 3
CtMethod contains lines of code for the associated method. We can insert code at the beginning of the method using the insertBefore method. The great thing about Javassist is that you write pure Java, albeit with one caveat: the Java must be supplied as quoted strings. But most people would agree that's much better than having to deal with bytecode! (Although, if you happen to like coding directly in bytecode, stay tuned for the ASM section.) Javassist compiles these strings itself and throws a CannotCompileException if they are not valid Java; in addition, the JVM's bytecode verifier guards against any invalid bytecode at run time.
Similar to insertBefore, there's an insertAfter to insert code at the end of a method. You can also insert code in the middle of a method by using insertAt, or add a catch clause with addCatch.
Let's fire up your IDE and code the logging feature. We start with an Agent (containing premain) and our ClassTransformer.
package com.example.spring2gx.agent;

import java.lang.instrument.Instrumentation;

public class Agent {
    public static void premain(String args, Instrumentation inst) {
        System.out.println("Starting the agent");
        inst.addTransformer(new ImportantLogClassTransformer());
    }
}

package com.example.spring2gx.agent;

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class ImportantLogClassTransformer implements ClassFileTransformer {

    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) throws IllegalClassFormatException {
        // manipulate the bytes here and return the modified class file;
        // returning null tells the JVM to keep the original bytes
        return null;
    }
}
To add audit logging, first implement transform to convert the bytes of the class to a CtClass object. Then you can iterate over its methods, capture the ones annotated with @ImportantLog, grab the input parameter indexes to log, and insert that code at the beginning of the method.
        ...
                // get important method parameter indexes
                List<String> parameterIndexes = getParamIndexes(annotation);
                // add logging statement to beginning of the method
                currentMethod.insertBefore(
                        createJavaString(currentMethod, className, parameterIndexes));
            }
        }
        return cclass.toBytecode();
    }
    return null;
}
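The listing calls a createJavaString helper whose body is not shown. The following is a hypothetical, JDK-only sketch of what such a helper could return, assuming it takes the method and class names as plain strings rather than a CtMethod. It relies on Javassist's source-text convention that $1, $2, ... refer to the method's parameters, so a zero-based annotation index i becomes $(i + 1).

```java
import java.util.Arrays;
import java.util.List;

public class LogSnippetBuilder {

    /**
     * Hypothetical stand-in for createJavaString: returns Javassist source
     * text suitable for CtMethod.insertBefore.
     */
    public static String createJavaString(String methodName, String className,
                                          List<String> parameterIndexes) {
        StringBuilder src = new StringBuilder();
        // Open a block that builds the log message inside the instrumented method
        src.append("{ StringBuilder sb = new StringBuilder(\"A call was made to method '")
           .append(methodName).append("' on class '").append(className).append("'.\");");
        for (String index : parameterIndexes) {
            String trimmed = index.trim();
            // Annotation indexes are zero-based; Javassist's $1 is the first parameter
            int javassistIndex = Integer.parseInt(trimmed) + 1;
            src.append(" sb.append(\" Index: ").append(trimmed)
               .append(" value: \").append($").append(javassistIndex).append(");");
        }
        src.append(" System.out.println(sb.toString()); }");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.println(createJavaString("withdraw",
                "com/example/spring2gx/BankTransactions", Arrays.asList("0", "1")));
    }
}
```

Inside transform, the call would then look something like currentMethod.insertBefore(createJavaString(currentMethod.getName(), className, parameterIndexes)). Note that className arrives in the JVM's internal, slash-separated form.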
With the annotation in hand, you can retrieve the parameter indexes. Using Javassist's ArrayMemberValue, the fields member value is returned as an array of string values, which you can iterate over to obtain the field indexes you embedded in the annotation.
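// Javassist types used: javassist.CtMethod, javassist.bytecode.MethodInfo,
// javassist.bytecode.AnnotationsAttribute, javassist.bytecode.annotation.Annotation
private Annotation getAnnotation(CtMethod method) {
    MethodInfo methodInfo = method.getMethodInfo();
    // ImportantLog uses the default CLASS retention, so it is stored
    // in the class file as an "invisible" annotation attribute
    AnnotationsAttribute attInfo = (AnnotationsAttribute) methodInfo
            .getAttribute(AnnotationsAttribute.invisibleTag);
    if (attInfo != null) {
        return attInfo.getAnnotation("com.example.spring.mains.ImportantLog");
    }
    return null;
}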
// Javassist types used: javassist.bytecode.annotation.ArrayMemberValue,
// MemberValue, StringMemberValue; plus java.util.ArrayList, Collections, List
private List<String> getParamIndexes(Annotation annotation) {
    ArrayMemberValue fields = (ArrayMemberValue) annotation.getMemberValue("fields");
    if (fields != null) {
        MemberValue[] values = fields.getValue();
        List<String> parameterIndexes = new ArrayList<String>();
        for (MemberValue val : values) {
            parameterIndexes.add(((StringMemberValue) val).getValue());
        }
        return parameterIndexes;
    }
    return Collections.emptyList();
}
ASM
ASM began life as a Ph.D. project and was open-sourced in 2002. It is actively updated, and
supports Java 8 since the 5.x version. ASM consists of an event-based library and an object-based one, similar in behavior respectively to SAX and DOM XML parsers. This article will
focus on the event-based library. Complete documentation can be found at http://download.forge.objectweb.org/asm/asm4-guide.pdf.
A Java class contains many components, including a superclass, interfaces, attributes,
fields, and methods. With ASM, you can think of each of these as events; you parse the class
by providing a ClassVisitor implementation, and as the parser encounters each of those
components, a corresponding visitor event-handler method is called on the ClassVisitor
(always in this sequence).
// jdk.internal.org.objectweb.asm is the JDK 8 internal copy of ASM; it is not
// exported on Java 9+, where the standalone org.objectweb.asm library is used instead
import jdk.internal.org.objectweb.asm.ClassReader;
import jdk.internal.org.objectweb.asm.ClassVisitor;
import jdk.internal.org.objectweb.asm.ClassWriter;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class ImportantLogClassTransformer implements ClassFileTransformer {
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) throws IllegalClassFormatException {
        ClassReader cr = new ClassReader(classfileBuffer);
        // COMPUTE_FRAMES lets ASM recompute stack map frames for modified methods
        ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
        // no visitor in between yet, so the class is copied unchanged
        cr.accept(cw, 0);
        return cw.toByteArray();
    }
}
Now, modify your ClassWriter to do something a little more useful by
adding a ClassVisitor (named LogMethodClassVisitor) to call your
event-handler methods, such as visitField or visitMethod, as the corresponding components are encountered during parsing.
around, you can actually get the existing bytecode pretty easily with javap:
javap -c com/example/spring2gx/mains/PrintMessage
I recommend writing the code you need in a Java test class, compiling that, and running it through javap -c to see the exact bytecode.
In the code sample above, everything in blue is actually the bytecode.
On each line, you get a one-byte opcode followed by zero or more arguments. You will need to determine those arguments for the target code, and they can usually be extracted by running javap -c -v on the original class (-v for verbose, which displays the constant pool).
I encourage you to look at the JVM spec, which defines every opcode. There are load and store operations (which move data between your operand stack and your local variables), with a variant for each value type. For example, ILOAD pushes an int value from a local variable onto the operand stack, whereas LLOAD does the same for a long value; ISTORE and LSTORE move values in the opposite direction.
There are also the invocation instructions invokevirtual, invokespecial, invokestatic, and the recently added invokedynamic, which invoke standard instance methods, constructors (as well as private and superclass methods), static methods, and dynamically linked methods such as those of dynamically typed JVM languages, respectively. Finally, there are operations for creating new objects (new) and for duplicating the top operand on the stack (dup).
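To see the load/store and invocation families side by side, compile a small class and run javap -c on it. The comments below show the instructions javac typically emits for these two methods (offsets and constant-pool indexes omitted); the class name BytecodeDemo is only illustrative.

```java
public class BytecodeDemo {

    // javap -c lists roughly these instructions for add:
    //   iload_1   // push local variable 1 (parameter a) onto the operand stack
    //   iload_2   // push local variable 2 (parameter b)
    //   iadd      // pop both ints and push their sum
    //   ireturn   // return the int on top of the stack
    // Local variable 0 holds 'this' because add is an instance method.
    public int add(int a, int b) {
        return a + b;
    }

    // The first statement of greet compiles to:
    //   new           // class java/lang/StringBuilder: allocate, push the reference
    //   dup           // duplicate it so one copy survives the constructor call
    //   invokespecial // Method java/lang/StringBuilder."<init>":()V
    //   astore_0      // pop the reference into local 0 (static method, so no 'this')
    // The calls to append and toString then use invokevirtual.
    public static String greet() {
        StringBuilder sb = new StringBuilder();
        sb.append("Hello");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new BytecodeDemo().add(2, 3));
        System.out.println(greet());
    }
}
```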
In sum, the positives of ASM are:
It has a small memory footprint.
It's typically pretty quick.
It's well documented on the web.
All of the opcodes are available, so you can really do a lot with it.
There's lots of community support.
There is really only one negative, but it's a big one: you're writing bytecode, so you have to understand what's going on under the hood, and as a result developers tend to take some time to ramp up.
Lessons learned
A Java agent is a Java program that executes just prior to the start of
another Java application (the target application), affording that agent
the opportunity to modify the target application or the environment in
which it runs. In the most basic use case, a Java agent sets application
properties or configures a certain environment state, enabling the
agent to serve as a reusable and pluggable component. The following
example describes such an agent, which sets a system property that
becomes available to the actual program.
public class Agent {
    public static void premain(String arg) {
        System.setProperty("myproperty", "foo");
    }
}
JAR manifest to the name of your agent class containing the premain method. (An agent must always be bundled as a JAR file; it cannot be specified in an exploded format.) Next, you must launch the application by referencing the JAR file's location via the -javaagent parameter on the command line.
java -javaagent:myAgent.jar -jar myProgram.jar
You can also append optional agent arguments to this location path. The following command starts a Java program and attaches the given agent, which receives the value myOptions as the argument to the premain method:
java -javaagent:myAgent.jar=myOptions -jar myProgram.jar
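The JVM hands everything after the = to premain as a single String, and how to interpret it is up to the agent. A common convention, assumed here rather than prescribed by the JVM, is a comma-separated list of key=value pairs. The sketch below (AgentOptions is a hypothetical name) parses such a string and, like the first agent above, publishes the values as system properties.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentOptions {

    /** Parses "a=1,b=2" into an ordered map; a bare key such as "verbose" maps to "true". */
    public static Map<String, String> parse(String agentArgs) {
        Map<String, String> options = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return options; // without "=..." on the command line the argument may be null
        }
        for (String pair : agentArgs.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                options.put(trimmed, "true");
            } else {
                options.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
            }
        }
        return options;
    }

    /** Agent entry point: copies every parsed option into a system property. */
    public static void premain(String agentArgs, Instrumentation inst) {
        for (Map.Entry<String, String> e : parse(agentArgs).entrySet()) {
            System.setProperty(e.getKey(), e.getValue());
        }
    }
}
```

With this convention, -javaagent:myAgent.jar=mode=debug,verbose would pass "mode=debug,verbose" to premain, since only the first = separates the JAR path from the options.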
It is possible to attach multiple agents by repeating the -javaagent option.
A Java agent is capable of much more than only altering the state of an application's environment; you can grant a Java agent access to the Java instrumentation API, allowing it to modify the code of the target application. This little-known feature of the Java virtual machine offers a powerful tool that facilitates the implementation of aspect-oriented programming.
You apply such modifications to a Java program by adding a second parameter of type Instrumentation to the agent's premain method. You can use the Instrumentation parameter to perform a range of tasks, from determining an object's size in bytes to modifying class implementations by registering ClassFileTransformers. After it is registered, a ClassFileTransformer is invoked whenever any class loader loads a class. When invoked, a class-file transformer has the opportunity to transform or even fully replace any class file before the represented class is loaded. In this way, it is possible to enhance or modify a class's behavior before it is put to use, as the following example shows.
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class Agent {
    public static void premain(String argument, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(
                    ClassLoader loader,
                    String className,
                    Class<?> classBeingRedefined, // null if class was not previously loaded
                    ProtectionDomain protectionDomain,
                    byte[] classFileBuffer) {
                // transform classFileBuffer and return the transformed class file;
                // returning null leaves the class unchanged
                return null;
            }
        });
    }
}
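Because a registered transformer sees every class that any class loader defines, a practical transformer usually filters first and returns null for everything it does not care about, which tells the JVM to keep the original bytes. The sketch below assumes a single target class and leaves the actual rewriting as a placeholder; the names TargetedTransformer and rewrite are hypothetical. Note that className arrives in the JVM's internal form, with slashes instead of dots.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class TargetedTransformer implements ClassFileTransformer {

    private final String internalName;

    /** @param binaryName a class name such as "com.example.Service" */
    public TargetedTransformer(String binaryName) {
        // transform() receives names like "com/example/Service"
        this.internalName = binaryName.replace('.', '/');
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className can be null for some JVM-generated classes
        if (className == null || !className.equals(internalName)) {
            return null; // null means "no change" for every other class
        }
        return rewrite(classfileBuffer);
    }

    /** Hypothetical hook where Javassist or ASM would do the real work; returns a copy here. */
    protected byte[] rewrite(byte[] original) {
        return original.clone();
    }
}
```

Registering it would look like inst.addTransformer(new TargetedTransformer("com.example.Service")) inside premain.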
Figure 1
often interested in a class in the context of its type hierarchy. For example, a Java agent might be required to modify any class that implements a given interface. To determine information about a class's supertypes, it no longer suffices to parse the class file that is provided to a ClassFileTransformer, which only contains the names of the direct superclass and interfaces. A programmer would still have to locate the class files for these types in order to resolve a potential supertype relationship.
Another difficulty is that making direct use of ASM in a project requires every developer on a team to learn the fundamentals of Java bytecode. In practice, this often excludes many developers from changing any code that is concerned with bytecode manipulation. In such a case, implementing a Java agent poses a threat to a project's long-term maintainability.
To overcome these problems, it is desirable to implement a Java agent using a higher-level abstraction than direct manipulation of Java bytecode. Byte Buddy is an open-source, Apache 2.0-licensed library that addresses the complexity of bytecode manipulation and the instrumentation API. Byte Buddy's declared goal is to hide explicit bytecode generation behind a type-safe domain-specific language. Using Byte Buddy, bytecode manipulation hopefully becomes intuitive to anybody who is familiar with the Java programming language.
Byte Buddy is not exclusively dedicated to the generation of Java agents. It offers an API for the generation of arbitrary Java classes, and on top of this class
generation API, Byte Buddy offers an additional API
for generating Java agents.
a ClassLoadingStrategy. Using the default WRAPPER strategy, above, a class is loaded by a new class loader that has the environment's class loader as its parent.
After a class is loaded, it is accessible using the Java reflection API. Unless specified otherwise, Byte Buddy generates constructors similar to those of the superclass, so that a default constructor is available for the generated class. Consequently, it is possible to validate that the generated class has overridden the toString method, as demonstrated by the following code.
assertThat(dynamicType.newInstance().toString(),
        is("Hello World!"));
Of course, this generated class is of little practical
use. For a real-world application, the return value of
most methods is computed at run time and depends
on method arguments and object state.
Instrumentation by delegation
by this type. In this case, only static methods are considered because a class was specified as the target of the delegation. In contrast, it is possible to delegate to an instance of a class, in which case Byte Buddy considers all virtual methods. If several such methods are available on a class or instance, Byte Buddy first eliminates all methods that are not compatible with a specific instrumentation. Among the remaining methods, the library then chooses a best match, typically the method with the most parameters. It is also possible to choose a target method explicitly by narrowing down the eligible methods, handing an ElementMatcher to the MethodDelegation through its filter method. For example, by adding the following filter, Byte Buddy only considers methods named intercept as a delegation target.
MethodDelegation.to(ToStringInterceptor.class)
        .filter(ElementMatchers.named("intercept"))
After intercepting, the intercepted method still prints Hello World!, but this time the result is computed dynamically so that, for example, it is possible to set a breakpoint in the interceptor method that is triggered every time toString is called from the generated class.
The full power of the MethodDelegation is unleashed when specifying parameters for the interceptor method. A parameter is typically annotated to instruct Byte Buddy to inject a value when calling the interceptor. For example, using the @Origin annotation, Byte Buddy provides the instrumented method as an instance of the Method class from the Java reflection API.
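import java.lang.reflect.Method;
import net.bytebuddy.implementation.bind.annotation.Origin;

class ContextualToStringInterceptor {
    // Byte Buddy injects the intercepted method because of @Origin
    static String intercept(@Origin Method m) {
        return "Hello World from " + m.getName() + "!";
    }
}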
When intercepting the toString method, the invocation is now instrumented to return Hello World from toString! In addition to the @Origin annotation, Byte Buddy offers a rich set of annotations. For example, using the @SuperCall annotation on a parameter of type Callable, Byte Buddy creates and injects a proxy instance that allows invocation of the instrumented method's original code. If the provided annotations are insufficient or impractical for a specific use case, it is even possible to register custom annotations that inject a user-specified value.
As we've seen, it is possible to use a MethodDelegation to dynamically override a method at run time using plain Java. That was a simple example, but the technique also applies to more practical applications. Consider, for example, the use of code generation to implement an annotation-driven library for enforcing method-level security. In our first iteration, the library will generate subclasses to enforce this security. Then we will use the same approach to implement a Java agent that does the same.
The library uses the following annotation to allow a user to specify that a method be considered
secured:
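import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// RUNTIME retention lets an interceptor read the annotation via reflection
@Retention(RetentionPolicy.RUNTIME)
@interface Secured {
    String user();
}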
For example, consider an application that uses
the Service class to perform a sensitive action that
should only be performed if the user is authenticated as an administrator. This is specified by declaring
the Secured annotation on the method for executing this action.
class Service {
    @Secured(user = "ADMIN")
    void doSensitiveAction() {
        // run sensitive code...
    }
}
It is, of course, possible to write the security check directly into the method. In practice, hard-coding crosscutting concerns frequently results in copy-pasted logic that is hard to maintain. Furthermore, directly adding such code does not scale well once an application reveals additional requirements, such as logging, collecting invocation metrics, or result caching. By extracting such functionality into an agent, a method purely represents its business logic, making the codebase easier to read, test, and maintain.
In order to keep the proposed library simple, the contract of the annotation declares that an IllegalStateException should be thrown if the current user is not the one specified by the annotation's user property. Using Byte Buddy, you can implement this behavior with a simple interceptor, such as the SecurityInterceptor in the following example, which also keeps track of the user who is currently logged in through its static user field.
new ByteBuddy()
        .subclass(Service.class)
        .method(ElementMatchers.isAnnotatedBy(Secured.class))
        .intercept(MethodDelegation.to(SecurityInterceptor.class)
                .andThen(SuperMethodCall.INSTANCE))
        .make()
        .load(getClass().getClassLoader(),
                ClassLoadingStrategy.Default.WRAPPER)
        .getLoaded()
        .newInstance()
        .doSensitiveAction();
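The body of SecurityInterceptor is not shown above. Its core check can be sketched in plain Java under two stated assumptions: Secured is retained at run time (otherwise Method.getAnnotation cannot see it), and the interceptor receives the intercepted java.lang.reflect.Method. In the Byte Buddy setup that parameter would carry the @Origin annotation; it is omitted here so the sketch compiles with the JDK alone. The default user "ANONYMOUS", the executed flag, and callSecured (which merely stands in for the generated subclass) are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class SecuritySketch {

    @Retention(RetentionPolicy.RUNTIME) // required for Method.getAnnotation to find it
    @Target(ElementType.METHOD)
    public @interface Secured {
        String user();
    }

    public static class Service {
        public boolean executed;

        @Secured(user = "ADMIN")
        public void doSensitiveAction() {
            executed = true; // stands in for the sensitive code
        }
    }

    public static class SecurityInterceptor {
        /** The currently logged-in user, tracked in a static field. */
        public static String user = "ANONYMOUS";

        /** Throws IllegalStateException if the current user is not the one the annotation requires. */
        public static void intercept(Method method) {
            Secured secured = method.getAnnotation(Secured.class);
            if (secured != null && !secured.user().equals(user)) {
                throw new IllegalStateException(
                        "Wrong user: " + user + " may not call " + method.getName());
            }
        }
    }

    /** Mimics intercept-then-SuperMethodCall: run the check, then the original method. */
    public static void callSecured(Service service) {
        Method m;
        try {
            m = Service.class.getMethod("doSensitiveAction");
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Service has no doSensitiveAction()", e);
        }
        SecurityInterceptor.intercept(m);
        service.doSensitiveAction();
    }
}
```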
An alternative implementation of the above security framework would be to use a Java agent to modify the original bytecode of a class such as the above Service rather than overriding it. By doing so, it would no longer be necessary to create managed instances; simply calling the following line of code would apply the security check:
new Service().doSensitiveAction();
To support this approach to modifying a class, Byte Buddy offers a concept called rebasing a class. Rebasing a class does not create a subclass but instead merges the instrumented code into the instrumented class to change its behavior. With this approach, you can still access the original code of any method of the instrumented class after instrumenting it, so that instrumentations like SuperMethodCall work exactly the same way as when creating a subclass.
Thanks to the similar behavior when either subclassing or rebasing, the APIs for both operations are used in the same way: by describing a type using the same DynamicType.Builder interface. You can access both forms of instrumentation via the ByteBuddy class. To make the definition of a Java agent more convenient, Byte Buddy also offers the AgentBuilder class, which is dedicated to solving common use cases in a concise manner. In order to define a Java agent for method-level security, defining the following class as the agent's entry point suffices.
Tal Weiss is the CEO of Takipi. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug, though, and instrumenting Java code. In his free time, Tal plays jazz drums.
ting it correctly, you can move away from uninformative jstack thread printouts that look like:
pool-1-thread-1 #17 prio=5 os_prio=31 tid=0x00007f9d620c9800 nid=0x6d03 in Object.wait() [0x000000013ebcc000]
Compare that with the following thread printout that
contains a description of the actual work being done
by the thread, the input parameters passed to it, and
the time in which it started processing the request:
pool-1-thread- #17: Queue: ACTIVE_PROD, MessageID: AB5CAD, type: Analyze, TransactionID: 56578956, Start Time: 10/8/2014 18:34
Here's an example of how we set a stateful thread name.
private void processMessage(Message message) { // an entry point into your code
    String name = Thread.currentThread().getName();
    try {
        Thread.currentThread().setName(prettyFormat(name, getCurrTransactionID(),
                message.getMsgType(), message.getMsgID(), getCurrentTime()));
        doProcessMessage(message);
    }
    finally {
        Thread.currentThread().setName(name); // return to original name
    }
}
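A self-contained version of the pattern above might look like the following. Message, prettyFormat, and the transaction ID argument are hypothetical stand-ins for the application types the listing assumes, and nameDuringProcessing exists only to make the effect observable. The essential part is saving the original name and restoring it in finally, so pooled threads do not carry stale context into their next task.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class StatefulThreadNames {

    private static final DateTimeFormatter START_FORMAT =
            DateTimeFormatter.ofPattern("M/d/yyyy HH:mm");

    /** Hypothetical message type standing in for the application's own class. */
    public static final class Message {
        public final String queue;
        public final String type;
        public final String id;

        public Message(String queue, String type, String id) {
            this.queue = queue;
            this.type = type;
            this.id = id;
        }
    }

    /** Captures the thread name seen during processing, for demonstration only. */
    public static volatile String nameDuringProcessing;

    static String prettyFormat(String originalName, String transactionId,
                               Message m, LocalDateTime start) {
        return originalName + ": Queue: " + m.queue + ", MessageID: " + m.id
                + ", type: " + m.type + ", TransactionID: " + transactionId
                + ", Start Time: " + start.format(START_FORMAT);
    }

    public static void processMessage(Message message, String transactionId) {
        Thread current = Thread.currentThread();
        String original = current.getName();
        try {
            current.setName(prettyFormat(original, transactionId, message, LocalDateTime.now()));
            doProcessMessage(message);
        } finally {
            current.setName(original); // restore even if processing throws
        }
    }

    static void doProcessMessage(Message message) {
        // Real work would go here; a jstack taken now shows the descriptive name.
        nameDuringProcessing = Thread.currentThread().getName();
    }

    public static void main(String[] args) {
        processMessage(new Message("ACTIVE_PROD", "Analyze", "AB5CAD"), "56578956");
        System.out.println(nameDuringProcessing);
    }
}
```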
In this example, the thread processes messages out
of a queue, and we see the target queue from which
the thread is de-queuing messages as well as the ID
of the message being processed, the transaction to
which it is related (which is critical for reproducing
locally), and when the processing of this message
began. This last bit of information lets you look at
a server jstack with upwards of a hundred worker
threads to see which ones started first and are most
likely causing an application server to hang. (Figure
1)
The capability works just as well when you're using a profiler, a commercial monitoring tool, a JMX console, or even Java Mission Control. In all these cases, having stateful thread contexts enhances your ability to look at the live thread state or a his-
Capturing state from the JVM through thread contexts, however effective, is restricted to variables that
you had to format into the thread name in advance.
Figure 1. An example of how an enhanced jstack shows dynamic variable state for each thread in the dump. Thread start time is marked as TS.
Figure 2. This thread variable state will also be shown by any JDK or commercial debugger or profiler.
Ideally, we want to be able to get the value of any variable from any point in the code from a live JVM without attaching a debugger or redeploying code. BTrace, a great tool that hasn't got the recognition it deserves, lets you do just that.
BTrace lets you run Java-like scripts on top of
a live JVM to capture or aggregate any form of variable state without restarting the JVM or deploying
new code. This lets you do pretty powerful things
like printing the stack traces of threads, writing to a
specific file, or printing the number of items of any
queue or connection pool.
You do this with BTrace scripting, a Java-like syntax for functions that you inject into the code through bytecode transformation (a process we'll
FileTracker.java prints
whenever the application
writes to a specific file location. Its great for pinpointing the cause of excessive
I/O operations.
Classload.java
reacts
whenever a target class is
loaded into the JVM. Its
useful for debugging JAR
Hell situations.
BTrace was designed as a
non-intrusive tool, which means
it cannot alter application state
or flow control. Thats a good
thing, as it reduces the chance
of negatively interfering with the
execution of live code and makes
Figure 3. Dynamically generating bytecode generation scripts
its use in production much more
from your IDE using the ASM Bytecode Outlineplugin.
acceptable. But this capability
comes with some heavy restricClick herefor a real-world example of a sample
tions: you cant create objects (not even Strings!),
agent we used to detect and fix sporadic memory
call into your own or third-party code (to perform acleaks coming from third-party code on our server, cortions such as logging), or even do simple things such
relating it to application state. (Figure 3)
as looping for fear of creating an infinite loop. To be
The last technique Id like to touch on is buildable to do those, youll have to write your own Java
ing native JVM agents. This approach uses the JVM
agent.
TI C++ API layer, which gives you unprecedented
A Java agent is a JAR file that providesaccess by
control of and access to the internals of the JVM. This includes things like getting callbacks whenever GC starts and stops, new threads spawn, monitors are acquired, and many more low-level events. This is by far the most powerful approach to acquire state from running code, as you are essentially running at the JVM level.
But with great power comes great responsibility, and some pretty complex challenges make this approach harder to implement. Since you're running at the JVM level, you're no longer writing in cross-platform Java but in low-level, platform-dependent C++. A second disadvantage is that the APIs themselves, while extremely powerful, are hard to use and can significantly impact performance, depending on the specific set of capabilities you're consuming.
On the plus side, this layer provides terrific access to parts of the JVM that would otherwise be closed to you in your search for the root cause of production bugs. When we began writing Takipi for production debugging, I'm not sure we knew the extent to which TI would play a crucial role in our ability to build the tool. Through the use of this layer, you're able to detect exceptions, detect calls into the OS, and map application code without manual user instrumentation. If you have the time to take a look at this API layer, I highly recommend it, as it opens a window into the JVM not many of us use.
…the JVM to an Instrumentation object that lets you modify bytecode that has already been loaded into the JVM to alter its behaviour. This essentially lets you rewrite code that has already been loaded and compiled by the JVM without restarting the application or changing the .class file on disk. Think about it like BTrace on steroids: you can essentially inject new code anywhere in your app, into both your own and third-party code, to capture any information you want.
The biggest downside to writing your own agent is that unlike BTrace, which lets you write Java-like scripts to capture state, Java agents operate at the bytecode level. This means that if you want to inject code into an application, you'll have to create the right bytecode. This can be tricky; bytecode can be hard to produce and read as it follows an operator-stack-like syntax that is similar in many ways to Assembly language. And to make things harder, since bytecode is already compiled, any inconsistency between the injected bytecode and the location into which it is injected will be rejected, without much fanfare, by the JVM's verifier.
Bytecode generation libraries such as Javassist and ASM (which is my personal favorite) can assist with this. I often use the Bytecode Outline plugin, which automatically generates the right ASM code for any Java code you type in, then generates its equivalent bytecode, which you can copy and paste into your agent.
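As a rough illustration of the Instrumentation entry point described above, here is a minimal agent skeleton. It assumes the code is packaged in a JAR whose manifest names the agent class in a `Premain-Class` entry and that the JVM is started with `-javaagent:<jar>`; the class names and the `com/example/` package filter are hypothetical. The transformer only logs matching class names and returns `null`, which tells the JVM to keep the original bytes; a real agent would rewrite `classfileBuffer`, for example with ASM.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent class. For a real agent JAR, declare it public in its
// own TracingAgent.java and reference it from the Premain-Class manifest entry.
class TracingAgent {
    // Called by the JVM before main(); the Instrumentation instance is the entry point
    public static void premain(String agentArgs, Instrumentation inst) {
        // true = this transformer may also be applied by retransformClasses()
        inst.addTransformer(new TracingTransformer("com/example/"), true);
    }
}

class TracingTransformer implements ClassFileTransformer {
    private final String packagePrefix; // internal-name prefix, e.g. "com/example/"

    TracingTransformer(String packagePrefix) {
        this.packagePrefix = packagePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !className.startsWith(packagePrefix)) {
            return null; // null means "leave this class unchanged"
        }
        System.out.println("[agent] loading " + className);
        // A real agent would build modified bytecode from classfileBuffer here
        // and return it; returning null keeps the original definition.
        return null;
    }
}
```

Keeping the transformer free of side effects other than logging makes it safe to register with `canRetransform` set to `true`.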
Figure 1
Java bytecode fundamentals
equally bound by this static consistency check and must not generate bytecode that would fail the verifier's consistency check.
However, the verifier does not audit all rules that are imposed by the Java programming language. A famous example of such a rule is type safety for generic types, which are erased before run time. When a generic type is translated during compilation, it is reduced to its most general bound. Of course, this narrows the capabilities of the verifier when checking bytecode that interacts with generic types. As these generic types were erased during compilation, the verifier can only ensure assignability to the erasure of a generic type. Because this compromises the capability of the JVM to verify loaded code, the Java compiler explicitly warns about potentially unsafe usage of generic types.
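A small sketch of what erasure means for the verifier: through a raw-typed reference, an `Integer` can be stored in a `List<String>`, because at the bytecode level `add` accepts any `Object`. The failure only surfaces later, at the read site, where javac inserted a cast. The class name `ErasureDemo` is illustrative; compiling it would trigger exactly the kind of unchecked warning mentioned above, which the annotation suppresses.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // The verifier only sees List.add(Object), so nothing stops the Integer here
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> pollute() {
        List<String> strings = new ArrayList<>();
        List raw = strings;            // raw view: the element type is erased
        raw.add(Integer.valueOf(42));  // accepted at compile time (with a warning) and at run time
        return strings;
    }

    // The error appears where javac inserted a cast for the generic type
    static boolean readFailsWithClassCastException() {
        List<String> strings = pollute();
        try {
            String first = strings.get(0); // javac emits CHECKCAST java/lang/String here
            System.out.println("unexpectedly read " + first);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```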
Note that generic types are not fully erased but are embedded as meta information in a class file. Several frameworks extract this meta information via the reflection API and change their behavior ac-
Unchecking checked exceptions
A lesser-known difference between the JVM and the Java programming language is the treatment of checked exceptions. For a checked exception, the Java compiler normally ensures that it is either caught within a method or explicitly declared to be thrown. However, this is only a convention of the Java compiler and not a feature of the JVM. At run time, a checked exception can be thrown independently of any declaration.
By abusing the erasure of generic types mentioned above, it is even possible to trick the Java compiler into accepting code that throws an undeclared checked exception. This can be accomplished by casting a checked exception to a run-time exception. To keep this cast from producing a type error, it is performed through a generic type parameter, which is removed when the method is translated to bytecode. The following example demonstrates how to implement such a generic cast. (Code 1)
To throw such an exception in your code, you would just call the static doThrow method, supplying the root Throwable, without having to declare an explicit throws clause. The uncheck method is defined to throw a generic exception T, which the compiler must allow since the generic parameter T, being bounded by Throwable, might be a RuntimeException.
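A sketch of the generic-cast trick, using the doThrow and uncheck names from the text. The class names `UncheckedThrower` and `SneakyDemo` are hypothetical, and this is not necessarily identical to Code 1.

```java
import java.io.IOException;

final class UncheckedThrower {
    private UncheckedThrower() {}

    // No throws clause, yet any Throwable (checked or not) escapes from here
    static void doThrow(Throwable t) {
        // T is fixed to RuntimeException, so javac demands no declaration
        UncheckedThrower.<RuntimeException>uncheck(t);
    }

    @SuppressWarnings("unchecked")
    private static <T extends Throwable> void uncheck(Throwable t) throws T {
        // T erases to Throwable, so this cast produces no CHECKCAST instruction
        throw (T) t;
    }
}

class SneakyDemo {
    // Declares no checked exceptions but throws an IOException at run time
    static void readConfig() {
        UncheckedThrower.doThrow(new IOException("config not found"));
    }
}
```

Callers cannot write `catch (IOException e)` around `readConfig()`, because javac rejects a catch clause for a checked exception that the try block cannot throw; they have to catch `Exception` or `Throwable` instead, which is one reason to use this trick sparingly.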
Since the generic information is erased during compilation, the cast to T does not translate into a bytecode instruction. The Java compiler warns about this unsafe use of generics, but here this warning is intentionally ignored. This unsafe op-
Code 1
Return-type overloading
Breaking the constructor chain
Final-ish field
Delaying compilation decisions until run time
Ben Evans is the CEO of jClarity, a Java/JVM performance-analysis startup. In his spare time he is one of the leaders of the London Java Community and holds a seat on the Java Community Process Executive Committee. His previous projects include performance testing the Google IPO, financial trading systems, writing award-winning websites for some of the biggest films of the '90s, and others.
Introduction to ASM
Figure 1
formulation, ASM starts from a blank slate, with the ClassWriter. (When getting used to working with ASM and direct bytecode manipulation, many developers find the CheckClassAdapter a useful starting point: this is a Class-
Examples
```java
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.util.CheckClassAdapter;

import static org.objectweb.asm.Opcodes.*;

public class Simple implements ClassGenerator {
    // Helpful constants
    private static final String GEN_CLASS_NAME = "GetterSetter";
    private static final String GEN_CLASS_STR = PKG_STR + GEN_CLASS_NAME;

    @Override
    public byte[] generateClass() {
        ClassWriter cw = new ClassWriter(0);
        CheckClassAdapter cv = new CheckClassAdapter(cw);
        // Visit the class header
        cv.visit(V1_7, ACC_PUBLIC, GEN_CLASS_STR, null, J_L_O, new String[0]);
        generateGetterSetter(cv);
        generateCtor(cv);
        cv.visitEnd();
        return cw.toByteArray();
    }

    private void generateGetterSetter(ClassVisitor cv) {
        // Create the private field myInt of type int. Effectively:
        // private int myInt;
        // (the JVM ignores the ConstantValue 1 on a non-static field)
        cv.visitField(ACC_PRIVATE, "myInt", "I", null, 1).visitEnd();

        // Create a public getter method
        // public int getMyInt();
        MethodVisitor getterVisitor =
                cv.visitMethod(ACC_PUBLIC, "getMyInt", "()I", null, null);
        // Get ready to start writing out the bytecode for the method
        getterVisitor.visitCode();
        // Write ALOAD_0 bytecode (push the this reference onto stack)
        getterVisitor.visitVarInsn(ALOAD, 0);
        // Write the GETFIELD instruction, which uses the instance on
        // the stack (& consumes it) and puts the current value of the
        // field onto the top of the stack
        getterVisitor.visitFieldInsn(GETFIELD, GEN_CLASS_STR, "myInt", "I");
        // Write IRETURN instruction - this returns an int to caller.
        // To be valid bytecode, the value on top of the stack must be
        // an int when the method returns
        getterVisitor.visitInsn(IRETURN);
        // Indicate the maximum stack depth and local variables this
        // method requires
        getterVisitor.visitMaxs(1, 1);
        // Mark that we've reached the end of writing out the method
        getterVisitor.visitEnd();

        // Create a setter
        // public void setMyInt(int i);
        MethodVisitor setterVisitor =
                cv.visitMethod(ACC_PUBLIC, "setMyInt", "(I)V", null, null);
        setterVisitor.visitCode();
        // Load this onto the stack
        setterVisitor.visitVarInsn(ALOAD, 0);
        // Load the method parameter (which is an int) onto the stack
        setterVisitor.visitVarInsn(ILOAD, 1);
        // Write the PUTFIELD instruction, which takes the top two
        // entries on the execution stack (the object instance and
        // the int that was passed as a parameter) and sets the field
        // myInt to be the value of the int on top of the stack.
        // Consumes the top two entries from the stack
        setterVisitor.visitFieldInsn(PUTFIELD, GEN_CLASS_STR, "myInt", "I");
        setterVisitor.visitInsn(RETURN);
        setterVisitor.visitMaxs(2, 2);
        setterVisitor.visitEnd();
    }

    private void generateCtor(ClassVisitor cv) {
        // Constructor bodies are methods with special name <init>
        MethodVisitor mv =
                cv.visitMethod(ACC_PUBLIC, INST_CTOR, VOID_SIG, null, null);
        mv.visitCode();
        mv.visitVarInsn(ALOAD, 0);
        // Invoke the superclass constructor (we are basically
        // mimicking the behaviour of the default constructor
        // inserted by javac).
        // Invoking the superclass constructor consumes the entry on the top
        // of the stack. The final false says java/lang/Object is not an interface.
        mv.visitMethodInsn(INVOKESPECIAL, J_L_O, INST_CTOR, VOID_SIG, false);
        // The void return instruction
        mv.visitInsn(RETURN);
        mv.visitMaxs(2, 2);
        mv.visitEnd();
    }

    @Override
    public String getGenClassName() {
        return GEN_CLASS_NAME;
    }
}
```
This uses a simple interface with a single method to generate the bytes of the class, a helper method to return the name of the generated class, and some useful constants.
```java
interface ClassGenerator {
    public byte[] generateClass();

    public String getGenClassName();

    // Helpful constants
    public static final String PKG_STR = "kathik/java/bytecode_examples/";
    public static final String INST_CTOR = "<init>";
    public static final String CL_INST_CTOR = "<clinit>";
    public static final String J_L_O = "java/lang/Object";
    public static final String VOID_SIG = "()V";
}
```
To drive the classes we'll generate, we use a harness, called Main. This provides a simple class loader and a reflective way to call back onto the methods of the generated class. For simplicity, we also write our generated classes into the Maven target directory, in the right place to be picked up on the IDE's classpath.
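```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main {
    public static void main(String[] args) {
        Main m = new Main();
        ClassGenerator cg = new Simple();
        byte[] b = cg.generateClass();
        try {
            // TRUNCATE_EXISTING stops an older, longer .class file from leaving stale bytes behind
            Files.write(Paths.get("target/classes/" + ClassGenerator.PKG_STR
                    + cg.getGenClassName() + ".class"), b,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException ex) {
            Logger.getLogger(Simple.class.getName()).log(Level.SEVERE, null, ex);
        }
        m.callReflexive(cg.getGenClassName(), "getMyInt");
    }

    // ... (callReflexive() and the rest of Main are not shown)
}
```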
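A minimal reflective helper in the spirit of `callReflexive()` might look like the following sketch. `ReflexiveCaller` is a hypothetical name, and the sketch assumes the target class has a public no-argument constructor and that the method takes no arguments. Note that `Main` passes only the simple class name, whereas this sketch expects a fully qualified binary name such as `kathik.java.bytecode_examples.GetterSetter`.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for Main.callReflexive(); not the article's implementation
class ReflexiveCaller {
    private final ClassLoader loader;

    ReflexiveCaller(ClassLoader loader) {
        this.loader = loader;
    }

    // Loads the class, creates an instance and invokes a public no-arg method on it
    Object callReflexive(String binaryClassName, String methodName)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(binaryClassName, true, loader);
        Object instance = type.getDeclaredConstructor().newInstance();
        Method method = type.getMethod(methodName);
        return method.invoke(instance);
    }
}
```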
The following class just provides a way to get access to the protected defineClass() method so we can convert
a byte[] into a class object for reflective use.
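A minimal loader along these lines simply subclasses `ClassLoader` and forwards to `defineClass()`. The name `ByteArrayClassLoader` is hypothetical; this is a sketch, not the article's class.

```java
// Exposes the protected ClassLoader.defineClass() so raw bytes become a Class
class ByteArrayClassLoader extends ClassLoader {
    ByteArrayClassLoader(ClassLoader parent) {
        super(parent);
    }

    // name is the binary name with dots, e.g. "kathik.java.bytecode_examples.GetterSetter";
    // malformed bytes are rejected with a ClassFormatError
    Class<?> define(String name, byte[] classBytes) {
        return defineClass(name, classBytes, 0, classBytes.length);
    }
}
```

A given loader instance can define a class name only once (a second attempt fails with a `LinkageError`), so regenerating a class means creating a fresh loader each time.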
This setup makes it easy, with minor modifications, to test different class generators to explore different
aspects of bytecode generation.
The non-constructor class is similar. For example, here is how to generate a class that has a single static field with getters and setters for it (this generator has no call to generateCtor()):
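One way to produce a class matching the javap output shown for StaticOnly is sketched below, using ASM's `ClassWriter` directly, so the ASM library must be on the classpath. The class name `StaticOnlySketch`, the field being private, and the internal name `kathik/java/bytecode_examples/StaticOnly` are assumptions.

```java
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;

import static org.objectweb.asm.Opcodes.*;

class StaticOnlySketch {
    static final String CLASS_NAME = "kathik/java/bytecode_examples/StaticOnly";

    static byte[] generateClass() {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V1_7, ACC_PUBLIC, CLASS_NAME, null, "java/lang/Object", null);

        // private static int myStaticInt;
        cw.visitField(ACC_PRIVATE | ACC_STATIC, "myStaticInt", "I", null, null).visitEnd();

        // public static int getMyInt() { return myStaticInt; }
        MethodVisitor getter = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "getMyInt", "()I", null, null);
        getter.visitCode();
        getter.visitFieldInsn(GETSTATIC, CLASS_NAME, "myStaticInt", "I");
        getter.visitInsn(IRETURN);
        getter.visitMaxs(1, 0); // one stack slot, no locals (there is no "this")
        getter.visitEnd();

        // public static void setMyInt(int i) { myStaticInt = i; }
        MethodVisitor setter = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "setMyInt", "(I)V", null, null);
        setter.visitCode();
        setter.visitVarInsn(ILOAD, 0); // in a static method the first argument is local 0
        setter.visitFieldInsn(PUTSTATIC, CLASS_NAME, "myStaticInt", "I");
        setter.visitInsn(RETURN);
        setter.visitMaxs(1, 1);
        setter.visitEnd();

        // No <init> method is generated, so the class has no constructor at all
        cw.visitEnd();
        return cw.toByteArray();
    }
}
```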
Note how the methods are generated with the ACC_STATIC flag set and how the method arguments are first in the local variable list, as implied by the ILOAD 0 pattern. In an instance method, this would be ILOAD 1, as the this reference would be stored at offset 0 in the local variable table.
Using javap, we can confirm that this class genuinely has no constructor.
```
$ javap -c kathik/java/bytecode_examples/StaticOnly.class
public class kathik.StaticOnly {
  public static int getMyInt();
    Code:
       0: getstatic     #11                 // Field myStaticInt:I
       3: ireturn

  public static void setMyInt(int);
    Code:
       0: iload_0
       1: putstatic     #11                 // Field myStaticInt:I
       4: return
}
```
Until now, we have worked reflectively with the classes we've generated via ASM. This helps to keep the examples self-contained, but in many cases we want to use the generated code with regular Java files. This is easy enough to do. The examples helpfully place the generated classes into the Maven target directory.
```
$ cd target/classes
$ jar cvf gen-asm.jar kathik/java/bytecode_examples/GetterSetter.class kathik/java/bytecode_examples/StaticOnly.class
$ mv gen-asm.jar ../../lib/gen-asm.jar
```
Now we have a JAR file that can be used as a dependency in some other code. For example, we can use our GetterSetter class:
This won't compile in the IDE, as the GetterSetter class is not on the classpath. However, if we drop down to the command line and supply the appropriate dependency on the classpath, everything works fine.
```
$ cd ../../src/main/java/
$ javac -cp ../../../lib/gen-asm.jar kathik/java/bytecode_examples/withgen/UseGenCodeExamples.java
$ java -cp .:../../../lib/gen-asm.jar kathik.java.bytecode_examples.withgen.UseGenCodeExamples
42
```
Conclusion
In looking at the basics of generating class files from scratch, using the simple API from the ASM library, we've seen some of the differences between the requirements of the Java language and those of bytecode, and seen that some of the rules of Java are actually only conventions from the language that the runtime does not enforce. We've also seen that you can use a correctly written class file directly from the language, just as though it had been produced by javac. This is the basis of Java's interoperability with non-Java languages, such as Groovy or Scala.
There are a number of much more advanced techniques available, but this should get you started with deeper investigations of the JVM runtime and how it operates.
by Ben Evans
opcodes, raising the questions of what invokedynamic is for and why it is useful to Java developers.
One way to think of method handles is as core reflection done in a safe, modern way with maximum possible type safety. They are needed for invokedynamic but can also be used alone.
Method types
```java
// Signature of toString()
MethodType mtToString = MethodType.methodType(String.class);

// Signature of a setter method
MethodType mtSetter = MethodType.methodType(void.class, Object.class);

// Signature of compare() from Comparator<String>
MethodType mtStringComparator = MethodType.methodType(int.class, String.class, String.class);
```
You can now use the MethodType, along with the name and the class that defines the method, to look up the method handle. To do this, we need to call the static MethodHandles.lookup() method. This gives us a lookup context that is based on the access rights of the currently executing method (i.e. the method that called lookup()).
The lookup-context object has a number of methods that have names that start with "find", e.g. findVirtual(), findConstructor(), and findStatic(). These methods will return the actual method handle, but only if the lookup context was created in a method that could access (call) the requested method. Unlike reflection, there is no way to subvert this access control. In other words, method handles have no equivalent of the setAccessible() method. For example:
```java
public MethodHandle getToStringMH() {
    MethodHandle mh = null;
    MethodType mt = MethodType.methodType(String.class);
    MethodHandles.Lookup lk = MethodHandles.lookup();

    try {
        mh = lk.findVirtual(getClass(), "toString", mt);
    } catch (NoSuchMethodException | IllegalAccessException mhx) {
        throw (AssertionError) new AssertionError().initCause(mhx);
    }

    return mh;
}
```
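To see the access check in action, the sketch below looks up a private method of another top-level class and gets an `IllegalAccessException`, while a lookup of the public `Object.toString()` succeeds. The class names `Vault` and `LookupDemo` are made up for illustration. `Vault` is deliberately a separate top-level class rather than a nested one, because since Java 11 nested classes share private access as nestmates.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Vault {
    private String secret() {
        return "s3cr3t";
    }
}

class LookupDemo {
    // The lookup carries LookupDemo's access rights, which exclude Vault's private members
    static boolean canFindVaultSecret() throws NoSuchMethodException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            lookup.findVirtual(Vault.class, "secret", MethodType.methodType(String.class));
            return true;
        } catch (IllegalAccessException e) {
            return false; // there is no setAccessible() equivalent to get around this
        }
    }

    // A public method is found and invoked without any special privileges
    static String describe(Object receiver) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findVirtual(
                Object.class, "toString", MethodType.methodType(String.class));
        return (String) mh.invoke(receiver);
    }
}
```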
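There are two methods on MethodHandle that can be used to invoke a method handle: invoke() and invokeExact(). Both methods take the receiver and call arguments as parameters, so the signatures are:
```java
public final Object invoke(Object... args) throws Throwable;
public final Object invokeExact(Object... args) throws Throwable;
```
These declarations are signature-polymorphic, however: javac does not pack the arguments into an Object[] but compiles each call using the static argument and return types found at the call site.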
The difference between the two is that invokeExact() requires the call site's types to match the method handle's type exactly, including the type the result is cast to, and throws a WrongMethodTypeException if they do not. On the other hand, invoke() has the ability to slightly alter the method arguments if needed. invoke() performs an asType() conversion, which can convert arguments according to this set of rules:
Primitives will be boxed if required.
Boxed primitives will be unboxed if required.
Primitives will be widened if necessary.
A void return type will be converted to 0 (for primitive return types) or null for return types that expect a
reference type.
Null values are assumed to be correct and passed through regardless of static type.
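The sketch below applies these rules to a handle for `Math.max(long, long)`: `invoke()` accepts `int` arguments and an `Object` result by widening and boxing, whereas `invokeExact()` only succeeds when the call site's types are exactly `(long, long)long`. The class name `InvokeVsInvokeExact` is illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

class InvokeVsInvokeExact {
    // Handle for the static method Math.max(long, long), of type (long,long)long
    static MethodHandle maxLong() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(Math.class, "max",
                MethodType.methodType(long.class, long.class, long.class));
    }

    static Object[] demo() throws Throwable {
        MethodHandle mh = maxLong();
        // invoke(): the int arguments are widened to long, and the long
        // result is boxed because the call site's return type is Object
        Object boxed = mh.invoke(3, 7);
        // invokeExact(): call-site types match (long,long)long exactly
        long exact = (long) mh.invokeExact(3L, 7L);
        boolean rejected;
        try {
            // Call-site type (int,int)Object does not match, so this is rejected
            Object wrong = mh.invokeExact(3, 7);
            rejected = false;
        } catch (WrongMethodTypeException e) {
            rejected = true;
        }
        return new Object[] {boxed, exact, rejected};
    }
}
```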
Let's look at a simple invocation example that takes these rules into account.
```java
Object rcvr = "a";
try {
    MethodType mt = MethodType.methodType(int.class);
    MethodHandles.Lookup l = MethodHandles.lookup();
    MethodHandle mh = l.findVirtual(rcvr.getClass(), "hashCode", mt);
    int ret;
    try {
        // rcvr's static type is Object; invoke() casts it to String for the handle
        ret = (int) mh.invoke(rcvr);
        System.out.println(ret);
    } catch (Throwable t) {
        t.printStackTrace();
    }
} catch (IllegalArgumentException | NoSuchMethodException | SecurityException e) {
    e.printStackTrace();
} catch (IllegalAccessException x) {
    x.printStackTrace();
}
```
From the point of view of Java applications, these appear as regular class files (although they, of course, have no
possible Java source code representation). Java code treats them as black boxes that we can nonetheless call
methods on and make use of invokedynamic and related functionality.
Here's an ASM-based class for creating a Hello World using invokedynamic. (See next page)
The code is divided into two sections, the first of which uses the ASM Visitor API to create a class file called
kathik.Dynamic. Note the key call to visitInvokeDynamicInsn(). The second section contains the target
method that will be laced into the call site and the BSM that the invokedynamic instruction needs.
Note that these methods are within the InvokeDynamicCreator class and not part of our generated
kathik.Dynamic class. This means that at run time, InvokeDynamicCreator must also be on the classpath
with kathik.Dynamic for the method to be found.
When InvokeDynamicCreator runs, it creates a new class file, Dynamic.class, which contains an invokedynamic instruction, as we can see by using javap on the class.
```
public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=0, locals=1, args_size=1
       0: invokedynamic #20,  0             // InvokeDynamic #0:runDynamic:()V
       5: return
```
This example shows the simplest case of invokedynamic, which uses the special case of a constant CallSite object. This means that the BSM (and lookup) is done only once and so subsequent calls are fast.
More sophisticated usages of invokedynamic can quickly get complex, especially when the call site and
target method can change during the lifetime of the program.
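Because javac offers no source syntax for an arbitrary invokedynamic call, the sketch below plays the JVM's part by hand: it calls a bootstrap method once to obtain a `ConstantCallSite` and then invokes the linked target through `dynamicInvoker()`. The class `BootstrapSketch`, the target name `hello`, and the `link()` helper are hypothetical; the BSM's leading parameters (`Lookup`, method name, `MethodType`) are the ones the JVM passes at link time.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapSketch {
    // Counts bootstrap invocations, to show that linking happens only once
    static int bootstrapCalls = 0;

    // Target method to be laced into the call site
    public static String hello() {
        return "Hello World!";
    }

    // Bootstrap method: receives the caller's lookup, the call-site name and its type
    public static CallSite bootstrap(MethodHandles.Lookup caller, String name, MethodType type)
            throws ReflectiveOperationException {
        bootstrapCalls++;
        MethodHandle target = caller.findStatic(BootstrapSketch.class, name, type);
        // A ConstantCallSite can never be relinked to a different target
        return new ConstantCallSite(target);
    }

    // Roughly what the JVM does the first time it executes the instruction
    static MethodHandle link() throws ReflectiveOperationException {
        CallSite site = bootstrap(MethodHandles.lookup(), "hello", MethodType.methodType(String.class));
        return site.dynamicInvoker();
    }
}
```

Every later call goes straight through the invoker to `hello()` without running the bootstrap method again, which is why the constant call site case is so cheap after linkage.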