Our detailed examples of how three top decompilers handle an extensive test suite will help you determine which, if any, meet your needs.
The object of a Java decompiler is to convert Java class files into Java source code. In the chaotic world of software development there are many reasons, legitimate and otherwise, to wish for such a tool. Decompilers can save the day when you have the binary for your own code, but have misplaced or otherwise lost the corresponding source code. On the other hand, decompilers are the prized components of any good software piracy kit. Most often, however, decompilers help programmers clarify poor documentation (one decompiled function is worth a thousand words) or provide a means for creating not-yet-written documentation. When was the last time you thought the documentation for any software was complete and correct?
In any case, the transparent and information-rich structure of Java class files (a feature that makes Java's dynamic linking much better than previous models) also makes such tools particularly easy to build. In fact, there is an arms race brewing between decompilers and so-called obfuscators, which profess to provide Java code some measure of protection from decompilers. In essence, obfuscators remove all non-essential symbolic information from your class files and, optionally, replace it with fake symbolic information designed to confuse the decompiler. Crema, the companion obfuscator to the Mocha decompiler, was examined in detail in the December issue of JavaWorld. (See the Resources section at the end of this column for a link to this article and to several obfuscator products.)
Product overview
I'll be reviewing three Java decompilers in this article: DejaVu, Mocha, and WingDis. These products are the only commercial decompilers I'm aware of, but surely there are more to come.
- DejaVu, distributed as part of Innovative Software's OEW for Java development environment, appears to be completely independent of it. DejaVu is available on a trial basis for free.
- Mocha, the first and most widely known decompiler, is free. Although Mocha's creator, Hanpeter van Vliet, met with an untimely demise, you can still obtain a copy of the program free of charge on the Web. An official descendant of Mocha will probably be commercially available before long.
- WingDis version 2.06, a product from WingSoft, is available free as a crippled demo version and as a time-limited fully capable trial version. The full version costs 9.95.
See the Resources section at the end of this article for more information on where to find each of these products.
Each of these tools is 100% Pure Java, so the essential distribution consists of a Java class library and instructions to invoke it. They're all a little quirky to set up and use, a characteristic shared by many standalone Java applications.
These are all command-line-oriented tools, so the most practical way to invoke them is to embed the detailed class path and other invocation instructions in a command file. Unfortunately, there is no standardized way to do this; the details vary depending on your choice of operating system. However, once you've conquered the setup, the decompilers easily produce output that is virtually compiler-ready.
Testing method
I chose a small utility library, consisting of about 15 classes, as my standard test set. I compiled the library using JDK 1.02, with optimization (with the -O switch) and without debugger information (without the -g switch), settings that correspond to how most Java code would actually be delivered. I decompiled the class files with each of the three decompilers, then manually edited the decompiled sources until they could be successfully recompiled. I then decompiled these three sets of "second-generation" binaries with each of the three decompilers, yielding nine sets of "third-generation" sources. Once I had my data, I manually compared various pairs of sources, looking for inconsistencies that might indicate incorrectly decompiled code.
Keep in mind that in performing this set of tests I had the luxury of referring to the original sources at any time, and the double luxury of having written these sources myself, two advantages not generally available to anyone using a decompiler in earnest.
I organized decompilation errors into the categories described below. I've ordered the error classes 1 through 6 (class 1 being the least offensive) on my assumption that easy-to-spot and easy-to-fix errors are less significant than hidden or hard-to-fix errors. In the last portion of this article I'll examine detailed code examples of these error types.
Class 1 errors
Description: Errors flagged by the compiler that are easily fixed
Examples: Boolean variable incorrectly identified as an int; missing, but trivial, type cast
Class 2 errors
Description: Errors flagged by the compiler that are not easily fixed
Example: Generating code containing goto
Class 3 errors
Description: Errors that create ugly and incomprehensible, but correct, code
Examples: Unreconstructed flow control; unreconstructed use of + for string appends
Class 4 errors
Description: Errors that cause subtle misprints and create subtly incorrect code
Examples: Failing to use \ to escape characters in string constants; misprinting character constants
Class 5 errors
Description: Errors that cause total failure
Example: Crashing without producing output
Class 6 errors
Description: Errors not flagged by the compiler that result in severely damaged semantics
Example: Misuse or non-use of this, and other patently incorrect code
The following table shows you which decompiler is guilty of which type of error.
Decompiler errors by type

| Decompiler | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 |
|---|---|---|---|---|---|---|
| DejaVu version 1.0 | Several | No | Major problem with flow analysis | Yes | No | No |
| Mocha version beta 1 | Several | No | No | No | Crashes on some class files | No |
| WingDis version 2.06 | One | No | Overuse of if(x!=false) and similar constructions | No | No | Misuse or non-use of super; mistranslation of x=a++ to a++; x=a; |
Caveat emptor: The test set was not specifically designed to validate or torture the decompilers, and it is impossible to know if the results here are representative of all classes, or if the list of problems encountered is complete.
Let's get to the heart of the matter and see some of my testing in action. The remainder of this article provides the actual code examples of the tests, which will allow you to see how the individual decompilers fared on each class of error.
Class 1 errors: Errors flagged by the compiler that are easily fixed
All three decompilers sometimes failed to infer a Boolean type for integer operations, although it is interesting to note that they failed in different places.
Example 1: Missed inference to Boolean
PrintStream PrintStream()
{
return new PrintStream(outputstream, 1); // 1 should be true
}
At the level of bytecodes, Boolean does not exist as a distinct type; Boolean values are handled as a special case of integer, and the Boolean nature of variables has to be deduced. In the case shown above, 1 should have been true, which could have been deduced by examining the definition of PrintStream.
Example 2: Beautiful, but it's not Java
Mocha transformed a static initializer into an elegant, but illegal, construction:
public ConsoleWindow(String string, int i1)
{
dead = false;
styles = { "Plain", "Bold", "Italic" };
sizes = { "8", "9", "10", "12", "14", "16", "18", "24" };
...
Bracketed initializer lists for arrays are valid only as initializers for variable declarations (either class or local), not for other assignments. The reason for this differentiation is obscure to me, but I'm sure Sun must have had a reason. In any case, it's apparent that these initializers are actually implemented by inline code inside constructors, generated by the compiler.
When decompiling this same static initializer, WingDis produced equally beautiful and syntactically correct code. Unfortunately, the code was not semantically correct, which results in a class 6 error type.
Using this same static initializer, DejaVu emitted perfectly legal (but ugly) code, as shown in this snippet:
public ConsoleWindow(String arg1, int arg2) {
...
String[] Har1;
Har1 = new String[3];
Har1[0] = "Plain";
Har1[1] = "Bold";
Har1[2] = "Italic";
this.styles = Har1;
...
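The declaration-versus-assignment rule that trips up Mocha's output can be sketched as follows. This is a minimal illustration, not code from the test suite: the class name StyleLists is hypothetical, the field names are borrowed from the listing above, and the explicit `new String[] { ... }` form assumes a compiler newer than the JDK 1.02 used for this review.

```java
class StyleLists {
    // Legal: a brace list used as the initializer of a declaration.
    String[] styles = { "Plain", "Bold", "Italic" };
    String[] sizes;

    StyleLists() {
        // sizes = { "8", "9", "10" };            // illegal: brace list in a plain assignment
        sizes = new String[] { "8", "9", "10" };  // legal: explicit array creation expression
    }

    public static void main(String[] args) {
        StyleLists lists = new StyleLists();
        System.out.println(lists.styles.length + " styles, " + lists.sizes.length + " sizes");
    }
}
```

Uncommenting the illegal line makes the compiler reject the class, which is why this kind of mistake counts as easy to spot.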
Class 2 errors: Errors flagged by the compiler that are not easily fixed
The ability to reverse-engineer code and reproduce the same for, while, or if statements as the original code is the most surprising (approaching magical) capability of Java decompilers. Java is "decompiler friendly" in several ways:
- At the level of bytecodes, the much-maligned goto statement is the workhorse within any function, so the task of inferring the original structure from raw gotos is daunting indeed. In Java, however, there are no explicit goto statements added by the programmer. If any gotos do exist in the code to be decompiled, they must be part of some higher-level construction.
- The set of control structures in Java is small, and compilers compile them in fairly stylized ways.
- The Java compiler technology is immature. Highly optimizing compilers (which will eventually appear) will be able to transform code much more significantly than do current compilers.
- There is a close semantic match between Java source code and Java bytecode.
Earlier versions of WingDis sometimes produced code containing incorrect goto statements. These erroneous statements were nearly impossible to understand and a royal pain to recode correctly. I'm pleased to report that this class of error seems to be extinct. The reviewed version of WingDis seems to handle flow analysis flawlessly, as does Mocha. However, despite their success on my test cases, I'm sure that "sufficiently complex" cases would present a challenge that neither Mocha nor WingDis could overcome.
DejaVu sometimes emits correct code in this situation, but it is often incomprehensible. This type of error falls into the next error class.
Class 3 errors: Errors that create ugly and incomprehensible, but correct, code
This type of error involves reconstructions that are correct Java-wise, but perhaps not as easy to read or understand as the original. The quality of reconstructions I encountered varied widely, from elegant to abysmal.
All three programs reconstructed simple loops quite well, but Mocha and WingDis were also able to handle complex reconstructions with equal grace. DejaVu frequently resorts to emitting legal, but nearly incomprehensible, code dominated by switch statements. Let's look at some simple cases of loop reconstruction first.
Note: I've edited the following source and reconstruction examples slightly to optimize indentation and whitespace for this presentation, but the non-whitespace code is unchanged. Also, note that these functions are uncommented, as was the case in the original. This is a realistic level of commentary in small functions found in the wild.
Original source:
public String toString ()
{
String myname = this.getName();
return("#<"
+ super.toString()
+ (myname!=null ? (" " + myname) : "" )
+ ">");
}
public static int LList_Length(LList l)
{ int len=0;
while(l!=null) { len++; l=l.next; }
return(len);
}
Mocha's reconstruction:
public String toString()
{
String string;
string = getName();
return "#<" + super.toString()
+ ((string != null) ? (" " + string) : "")
+ ">";
}
public static int LList_Length(LList lList)
{
int i = 0;
for (; lList != null; lList = lList.next)
i++;
return i;
}
Note that in the toString method, string's initialization was not combined with its declaration; a minor stylistic point. In LList_Length, my while became an equivalent for.
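The while-to-for rewrite is harmless because the two loop forms perform the same steps in the same order. A small sketch makes the equivalence concrete; the Node and LoopForms classes are hypothetical stand-ins for the LList class in the test library, which is not reproduced here.

```java
class LoopForms {
    // The shape of the original source: an explicit while loop.
    static int lengthWhile(Node l) {
        int len = 0;
        while (l != null) { len++; l = l.next; }
        return len;
    }

    // The shape Mocha emitted: the list advance moves into the for header.
    static int lengthFor(Node l) {
        int len = 0;
        for (; l != null; l = l.next)
            len++;
        return len;
    }

    public static void main(String[] args) {
        Node list = new Node(new Node(new Node(null)));
        System.out.println(lengthWhile(list) + " " + lengthFor(list));
    }
}

// Minimal singly linked list cell, standing in for LList.
class Node {
    Node next;
    Node(Node next) { this.next = next; }
}
```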
WingDis's reconstruction:
public String toString() {
String Stri1= getName();
return "#<" + super.toString()
+ ( (Stri1 == null)
? ""
: new StringBuffer(" "
+ Stri1).toString() )
+ ">";
}
public static int LList_Length(
LList LLis0) {
int int1;
for (int1= 0; (LLis0!= null) ; LLis0= LLis0.next) {
int1++;
}
return int1;
}
Score: 99 percent. Other than the choice of for instead of while, these first two decompilations are just like the originals. Closer than this you can't reasonably ask.
DejaVu's reconstruction:
public String toString() {
String obj;
StringBuffer Hobj1;
String Hobj;
obj = this.getName();
Hobj1 = new StringBuffer().append("#<").append(super.toString());
if (!(obj == null)) {
Hobj = new StringBuffer().append(" ").append(obj).toString();
}
else {
Hobj = "";
}
return Hobj1.append(Hobj).append(">").toString();
}
public static int LList_Length(LList arg0) {
int i;
i = 0;
while (arg0 != null) {
i++;
arg0 = arg0.next;
} /* end while loop */
return i;
}
The reconstruction of the toString method allows lots of ugly detail from Java's low-level implementation of + to show through. However, the reconstruction of LList_Length using while rather than for is just like the original, and it reads much better.
Now let's look at a more troublesome case of complex loop reconstruction.
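To see why the StringBuffer chain is ugly but still correct, compare the two forms side by side. The sketch below assumes, as DejaVu's output suggests, that compilers of this era translate + on strings into StringBuffer.append calls (later compilers may use other strategies); the class and method names are hypothetical.

```java
class ConcatForms {
    // Source-level form, as in the original toString method.
    static String withPlus(String base, String name) {
        return "#<" + base + (name != null ? (" " + name) : "") + ">";
    }

    // Expanded form, in the style of DejaVu's reconstruction.
    static String withBuffer(String base, String name) {
        StringBuffer head = new StringBuffer().append("#<").append(base);
        String tail;
        if (name != null) {
            tail = new StringBuffer().append(" ").append(name).toString();
        } else {
            tail = "";
        }
        return head.append(tail).append(">").toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Thread", "worker"));
        System.out.println(withBuffer("Thread", "worker"));
    }
}
```

Both methods build the same string for any inputs; the second simply spells out by hand what the + operator hides.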
Original source:
public LList Sort_Short_LList (CompareFunction fn) {
LList out_list = this;
LList l=this;
LList in_list = l.next;
l.next = null;
while(in_list!=null)
{ /* scan through the in list, performing an insertion
sort into the out list */
LList current_list = in_list;
Object current_item = current_list.contents;
LList scan_list = out_list;
LList prev_scan_list = null;
in_list = in_list.next;
while(scan_list!=null
&& !fn.InOrder(current_item,scan_list.contents))
{
prev_scan_list = scan_list;
scan_list = scan_list.next;
}
current_list.next = scan_list;
if(prev_scan_list!=null)
{ prev_scan_list.next = current_list;
}
else
{ out_list = current_list;
}
}
return(out_list);
}
The most important thing that all three translations lost as a result of the decompilation is the comment /* scan through the in list, performing an insertion sort into the out list */. The next most significant omission is the variable names. Unfortunately, you can't do a darn thing about it!
Mocha's reconstruction:
public LList Sort_Short_LList(CompareFunction compareFunction)
{
LList lList1 = this;
LList lList2 = next;
next = null;
while (lList2 != null)
{
LList lList3 = lList2;
Object object = lList3.contents;
LList lList4 = lList1;
LList lList5 = null;
lList2 = lList2.next;
for (; lList4 != null && !compareFunction.InOrder(object, lList4.contents); lList4 = lList4.next)
lList5 = lList4;
lList3.next = lList4;
if (lList5 != null)
lList5.next = lList3;
else
lList1 = lList3;
}
return lList1;
}
This reconstruction is about as good and clean as you could hope for, but the absence of the comment and the meaningless variable names really hurt. Just what does that while loop do anyway?
WingDis's reconstruction:
public LList Sort_Short_LList(
CompareFunction Comp1) {
dlib.LList LLis2= this;
LList LLis3= next;
next= null;
while (LLis3!= null) {
LList LLis4= LLis3;
Object Obje5= LLis4.contents;
LList LLis6= LLis2;
LList LLis7= null;
for (LLis3= LLis3.next; (LLis6!= null) ; LLis6= LLis6.next) {
if (Comp1.InOrder(Obje5, LLis6.contents)!= false) {
break;
}
LLis7= LLis6;
}
LLis4.next= LLis6;
if (LLis7== null) {
LLis2= LLis4;
continue;
}
LLis7.next= LLis4;
}
return LLis2;
}
WingDis's reconstruction produced a few extra variables. Why? Got me. In cases like this (trying to figure out just why the decompiler did what it did), I think it would probably be easier to write a new program from scratch.
DejaVu's reconstruction:
public LList Sort_Short_LList(CompareFunction arg1) {
LList obj5 = null;
LList obj4 = null;
LList obj3 = null;
Object obj2 = null;
LList obj1 = null;
LList obj = null;
int CTL_PC = 1;
while (true) {
switch (CTL_PC) {
case 1: {
obj5 = this;
obj4 = this.next;
this.next = null;
CTL_PC = 2;
break;
}
case 2: {
if (obj4 != null) { CTL_PC = 3; break; }
CTL_PC = 10;
break;
}
case 10: {
return obj5;
}
case 3: {
obj3 = obj4;
obj2 = obj3.contents;
obj1 = obj5;
obj = null;
obj4 = obj4.next;
CTL_PC = 4;
break;
}
case 4: {
if (obj1 == null) { CTL_PC = 7; break; }
CTL_PC = 5;
break;
}
case 5: {
if (!(arg1.InOrder(obj2, obj1.contents))) { CTL_PC = 6; break; }
CTL_PC = 7;
break;
}
case 7: {
obj3.next = obj1;
if (obj == null) { CTL_PC = 8; break; }
CTL_PC = 9;
break;
}
case 9: {
obj.next = obj3;
CTL_PC = 2;
break;
}
case 8: {
obj5 = obj3;
CTL_PC = 2;
break;
}
case 6: {
obj = obj1;
obj1 = obj1.next;
CTL_PC = 4;
break;
}
}
}
}
Now this reconstruction is so bizarre, only a compiler could have produced it. My best guess is that Mocha and WingDis, starting with a graph very similar to DejaVu's final output, deduced the clear and concise output you have seen. In defense of DejaVu's output, it is, as far as I can tell, completely correct. Producing only this very rudimentary reconstruction, DejaVu is unlikely to have made any subtle semantic errors.
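The switch-driven style is easier to follow on a tiny case. The sketch below applies the same technique by hand to the list-length loop seen earlier: every basic block becomes a case, and every jump becomes an assignment to a program-counter variable. It is an illustration of the pattern only, not DejaVu output; the Cell class and method names are hypothetical, and CTL_PC is borrowed from the listing above.

```java
class DispatchLoop {
    // Structured version: what a flow-analyzing decompiler recovers.
    static int lengthStructured(Cell l) {
        int len = 0;
        while (l != null) { len++; l = l.next; }
        return len;
    }

    // Unstructured version: one case per basic block, jumps via CTL_PC.
    static int lengthDispatch(Cell l) {
        int len = 0;
        int CTL_PC = 1;
        while (true) {
            switch (CTL_PC) {
            case 1: // loop test
                if (l != null) { CTL_PC = 2; break; }
                CTL_PC = 3;
                break;
            case 2: // loop body, then jump back to the test
                len++;
                l = l.next;
                CTL_PC = 1;
                break;
            case 3: // loop exit
                return len;
            }
        }
    }

    public static void main(String[] args) {
        Cell list = new Cell(new Cell(null));
        System.out.println(lengthStructured(list) + " " + lengthDispatch(list));
    }
}

// Minimal linked list cell used only for this illustration.
class Cell {
    Cell next;
    Cell(Cell next) { this.next = next; }
}
```

Because the dispatch form is a direct transcription of the control-flow graph, there is little room for a subtle mistake, which fits the observation that DejaVu's output is unreadable but reliable.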
Class 4 errors: Errors that cause subtle misprints and create subtly incorrect code
Mocha and WingDis crept by this error type leaving not a single casualty. DejaVu's handling was a little less than perfect. DejaVu made an error printing a character constant. Here's the original:
//the original
if((ch == '\r') || (ch =='\n'))
{ charisready = true; break;
}
What DejaVu produced was this:
//Oops! character constants in this format are base 8,
//so should be '\15'
if (c == '\13' || !(c != '\n')) {
this.charisready = true;
}
No small potatoes here. Such an error would cause the recompiled program to look for the wrong end-of-line character.
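The arithmetic behind this misprint is easy to check. In Java, a backslash followed by digits in a character constant is read as an octal number, so the carriage return (decimal 13) must be written with the octal digits 15; the digits 13 name decimal 11 instead. A minimal sketch, with a hypothetical class and helper name:

```java
class OctalEscapes {
    // Returns the numeric code of a character.
    static int code(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(code('\r'));    // carriage return, decimal 13
        System.out.println(code('\15'));   // octal 15 is decimal 13, the same character
        System.out.println(code('\13'));   // octal 13 is decimal 11, a vertical tab
        System.out.println('\15' == '\r'); // true
        System.out.println('\13' == '\r'); // false
    }
}
```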
Class 5 errors: Errors that cause total failure
Mocha crashed (or caused the Java runtime to crash) when decompiling the following method, and consequently no output was generated for the class that contained the method.
public void run()
{
int target;
while((target = pickclass()) < classlist.length)
{
try
{Class.forName(classlist[target]);
classes_to_go--;
}
catch (ClassNotFoundException err)
{classes_to_go--;
System.out.println("Class " + classlist[target]
+ " not found " + err.toString());
}
changeLabel();
}
}
In my opinion, this type of bug is one of the most serious a decompiler can have. I have no idea why this innocuous-looking function caused Mocha to crash, but it's interesting to note that Mocha did not crash when decompiling the reconstruction (by DejaVu) of the same function.
In any case, all decompilers are likely to encounter situations where they do not produce meaningful output; for example, all three decompilers I tested have much more difficulty when decompiling "obfuscated" code, and WingDis refuses to decompile anything it thinks is part of a WingSoft product.
Class 6 errors: Errors not flagged by the compiler that result in severely damaged semantics
The very worst thing a decompiler can do is to produce incorrect code, without any warning that there is a problem. WingDis has a serious, systematic problem using super: it often adds super where it shouldn't be, and omits super where it must be.
For example, this
public void Dispose()
{
super.Dispose();
}
became this:
public void Dispose()
{ Dispose(); //missing "super"
return;
}
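The consequence of the dropped super is more than cosmetic: the method now calls itself instead of its parent, and the compiler accepts it without complaint. A minimal sketch of the failure, using hypothetical class names rather than the classes from the test library:

```java
class SuperCallDemo {
    static class Base {
        boolean disposed = false;
        void dispose() { disposed = true; }
    }

    // As originally written: delegates to the parent class.
    static class Correct extends Base {
        void dispose() { super.dispose(); }
    }

    // As reconstructed without "super": unbounded self-recursion.
    static class Broken extends Base {
        void dispose() { dispose(); }
    }

    // Reports whether calling dispose() exhausts the stack.
    static boolean overflows(Base b) {
        try {
            b.dispose();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("correct overflows: " + overflows(new Correct()));
        System.out.println("broken overflows: " + overflows(new Broken()));
    }
}
```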
WingDis produced this beautiful, legal Java code for a static initializer:
public ConsoleWindow(String string, int i1)
{
dead = false;
String [] styles = { "Plain", "Bold", "Italic" };
String [] sizes = { "8", "9", "10", "12", "14", "16", "18", "24" };
...
Unfortunately, the desired effect of initializing the instance variables styles and sizes is not preserved. To remedy this error, you would also have to add a line such as this.styles=styles;.
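The problem is ordinary variable shadowing: a local declared inside the constructor hides the field of the same name, so the field is never assigned. The sketch below shows the effect and the one-line remedy; the Window class and its boolean parameter are hypothetical devices for comparing the two cases.

```java
class ShadowDemo {
    static class Window {
        String[] styles; // instance field, null until assigned

        Window(boolean assignField) {
            // This declares a new local variable that hides the field.
            String[] styles = { "Plain", "Bold", "Italic" };
            if (assignField) {
                this.styles = styles; // the extra line that repairs the reconstruction
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new Window(false).styles == null); // true: field left unset
        System.out.println(new Window(true).styles.length);   // 3
    }
}
```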
WingDis also has a problem reconstructing expressions that contain ++ and --. For example, this:
a[b++]=c;
results in this:
b++;
a[b]=c;
No, you're not seeing things; WingDis produced the wrong, already incremented, value of b. The correct reconstruction would have been this:
a[b]=c;
b++;
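Running the three orderings on a small array shows the difference directly. This is a minimal sketch with hypothetical method names; each method starts from a fresh three-element array and b equal to 0.

```java
import java.util.Arrays;

class PostIncrementDemo {
    // The original expression: store at the old index, then increment.
    static int[] original(int c) {
        int[] a = new int[3];
        int b = 0;
        a[b++] = c;
        return a;
    }

    // The ordering WingDis emitted: increment first, then store.
    static int[] incrementFirst(int c) {
        int[] a = new int[3];
        int b = 0;
        b++;
        a[b] = c;
        return a;
    }

    // The correct expansion: store first, then increment.
    static int[] storeFirst(int c) {
        int[] a = new int[3];
        int b = 0;
        a[b] = c;
        b++;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(original(7)));       // [7, 0, 0]
        System.out.println(Arrays.toString(incrementFirst(7))); // [0, 7, 0]
        System.out.println(Arrays.toString(storeFirst(7)));     // [7, 0, 0]
    }
}
```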
Analyzing the results
You've seen what these tools can do... and at what cost. That leaves us with one question: Are they worthwhile? Well, that depends. (You should know by now that nothing is ever clear cut in this industry!) As a basis for a new software product, based on reverse-engineered code? Definitely not. The complete absence of comments, and the intrinsic uncertainty that the decompiled code accurately reflects the original, make wholesale decompilation an unattractive basis for a product. As a way to figure out what a particular method of a class library is actually doing? Definitely! Of course, "trade secret" algorithms are vulnerable, but in cases of "This function call doesn't work, and I haven't a clue why not!" decompiling can provide some excellent clues. As a means to make an emergency repair in someone else's classes? Maybe, but with great reluctance. The effort required to understand, repair, and certify as correct even one class would be large; but if you are dealing with code of unknown origin, code from a defunct source, or code from an unresponsive source, it might be the only alternative.
Executive summary: Mocha does the best job overall, but unless someone offers to support this fatherless product, it will be overtaken by WingDis, which has some serious, but fixable, problems. DejaVu brings up the rear (the far rear) in producing readable code, but is more reliable at producing correct code than either Mocha (which crashes) or WingDis (which has semantic bugs).
What does this mean for commercial software producers? At this point you can rest easy; decompiler-assisted piracy is not your main concern, unless, of course, you rely on absurdly simple "secrets" to protect your code. One area that is especially vulnerable to decompilation is simple product-enabling key schemes. (Hint: Crackers won't break your encryption, they'll replace your key verification function with true.) You're on your own from here.


