Not using garbage collection

Minimize heap thrashing in your Java programs

No doubt about it, garbage collectors are weird. No, I don’t mean the helpful men and women who go from door to door collecting refuse once a week. I mean programming systems where the programmer only allocates memory, and the system decides when to free it.

Last month, Bill Venners wrote an excellent article on how garbage collectors works and a bit about how garbage collection is implemented in Sun’s Java Developer’s Kit (JDK). The important point is that garbage collection isn’t new; it simply hasn’t been so obviously present in a language targeted at such a wide programmer group before.

Also, the mythology surrounding the slowness of garbage-collected systems is just that, myth. I can show that the number of instructions executed is the same whether I call malloc() and free() or I only call malloc() and some other code calls free(). Further, I can show that a conservative garbage collector will never free something I’m currently using. The only non-zero difference in this equation is the “detection” stage in which the garbage collector finds out what is and what isn’t an unreferenced piece of memory. Here I have the advantage since I know immediately that I no longer need a piece of allocated memory and thus can free it right away.

“Aha,” you may say, “Chuck is against garbage collectors!” That is not correct because I have also tracked down too many bugs where I knew I wasn’t going to use another piece of memory again so I freed the allocated memory, only to modify the code at some later date without that knowledge and in the process introduce a bug by using memory I had already freed. Further, the “cost” or time penalty of running a garbage collection thread (which needs only access memory) is greatly reduced on a fast processor that is spending most of its time waiting for activity on the keyboard or disk drive. In the final analysis, I prefer garbage-collected systems over non-collected systems like C much the same that I prefer C over assembly language. I can spend less time thinking about the mechanics of programming and more time concentrating on the program itself.

All that being said, why write a column on not using garbage collectors?

Like most programmers who originally used C, I am very sensitive to memory issues. Thus, I find it difficult to trust a garbage collector to manage memory better than I can. (This is different than saying a garbage collector can manage memory more accurately than I can, which is a given.) Better in this context means perhaps more efficiently. As a karate student in Los Angeles I was struck by how universal one of the fundamental principles of the martial arts was: “You don’t need to block a blow that can’t hit you.” Applying this principle in this context becomes “You don’t need to collect garbage that you don’t generate.”

The compelling issue

I needed a terminal — not some surplus VT100 to hook up on my desk but a virtual terminal I could hook up to Java applications on a Web page. You see, it is one thing to talk about writing HelloWorld in Java and try to describe it on a Web page; I wanted to be able to run it on a Web page. But it isn’t an applet, it’s an application, the difference being that it communicates with the real world via a console interface like many computer programs do, not via a GUI/Web interface like applets do.

The solution seemed pretty obvious. If you’ll recall, in my July column I showed how two applets could communicate using data channels. What I needed was a way for HelloWorld to communicate to a virtual terminal. The answer was to make it out of two applets, then apply the previously workable solution: data channels.

A data channel is an interthread communication object that allows one thread to write a series of objects to another object. I was playing around with some virtual terminal applets on the Web (see the Resources section below) and decided what I really needed was an I/O stream based on the DataChannel class. Note that I could have used PipedInputStream and PipedOutputStream, but these are only one-to-one connections. With a data channel I could connect several applications to the same terminal.

First efforts

The neat thing about object-oriented languages is that you can define an interface, implement it quickly (but stupidly) to check your ideas, and then optimize the implementation for the desired performance goals. Java I/O streams are conceptually very simple things, and to implement one you need only subclass InputStream or OutputStream and then implement a couple of very simple methods. The first implementation of DataChannelOutputStream was as follows:

package util.comm;
import java.io.OutputStream;
import java.io.IOException;
public class DataChannelOutputStream extends OutputStream {
    private DataChannel dc;
    public DataChannelOutputStream(String chanID) {
        dc = DataChannel.getChannel(chanID);
    }
    public void write(int c) throws IOException {
        dc.putValue(new Integer(c));
    }
}

As you can see, 12 lines of code is pretty simple. The constructor takes the name of the underlying DataChannel, and the write() method simply spews integers (actually, they are bytes) into the channel. The other side, DataChannelInputStream, is a bit more complicated and shows why things are not as simple as one might hope. I’ll dissect it in sections.

package util.comm;
import java.io.InputStream;
import java.io.IOException;
public class DataChannelInputStream extends InputStream 
                    implements Runnable {
    private Thread monitor;
    private DataChannel dc;
    private byte buffer[] = new byte[1024];
    int getIndex = 0;
    int putIndex = 0;
    public DataChannelInputStream(String chanID) {
        monitor = new Thread(this);
        dc = DataChannel.getChannel(chanID, monitor, 1024);
        monitor.start();
    }

In the first part of the class, one thing that immediately stands out is that the input stream has to start a separate thread to monitor the data channel. The basic idea is that this monitor thread will watch the channel, pull off data as it comes across, and store it in the array named buffer. The other interesting bit is that in the previous uses of DataChannels the queue was quite small, but in this class we make it much larger. This was an early attempt at overflow management.

    private boolean overflow = false;
    private synchronized void add(int b) {
        buffer[putIndex] = (byte) b;
        putIndex = (putIndex + 1) & 1023;
        if (putIndex == getIndex)
            overflow = true;
        notify();
    }
    public synchronized int read() throws IOException {
        int r;
        while (getIndex == putIndex)
            try { wait(); } catch (InterruptedException e) { }
        r = buffer[getIndex];
        getIndex = (getIndex + 1) & 1023;
        if (overflow)
            throw new IOException("Buffer overflowed - Data Lost!");
        return r;
    }

In the above section we provide the required read method of all extenders of class OutputStream that are not abstract, and we provide a method add that the monitor thread will use to store data in our holding buffer. Both are synchronized so that the read method can block (using the wait() method) when there is no data to read. The buffer is simply a circular ring of 1024 bytes that gets filled as data comes in, and the add method checks to see if data is overwriting data that hasn’t been read. If it is, the flag overflow is set and an IOException is thrown when the buffer overflows so that the client can know that data was lost. More on this a bit later.

    public void close() throws IOException {
    Thread x = monitor;
        monitor = null;
        dc.releaseChannel(x);
    }
    public void run() {
        while (monitor == Thread.currentThread()) {
            Object q = null;
            try {
                q = dc.getValue();
            } catch (DataChannelException e) {
                System.out.println("Got exception.");
                e.printStackTrace();
                continue;
            }
            if (q instanceof Integer) {
                add(((Integer) q).intValue());
            }
        }
    }
}

The thread management methods are the final section of DataChannelInputStream. The close method simply sets monitor to null so that the thread will exit when the channel is released. We could also catch DataChannelShutdown in the run method and exit but this works fine.

The run method is a loop that sucks data out of the DataChannel and stuffs it into the objects queuing buffer. Imagine that add is the interface for the DataChannel thread and read is the interface for the user of this object.

But weren’t we talking about garbage collectors?

Purity meets practicality

As I mentioned earlier, these two classes do the job. Using them I can create an applet, open up a DataChannelOutputStream, create another applet, open up a DataChannelInputStream, and spew data all day long. However, in doing so I uncovered two issues with this implementation.

The first issue has to do with how important it is for data to get from here to there. As a communication channel, data channels have some compliance, or flexibility, to the load placed on them. However, when the data producer turns out more data than the consumer on the other end of a channel is able to process, the consumer gets DataChannelOverrun exceptions that it can elect to ignore. In a terminal’s data stream; however, it is vital that all data get from the input to the output; thus you must have some way of causing the producer to wait when the pipeline is full.

The second issue is an inordinate amount of garbage production. Consider that this simple design allocates an Integer object for every byte sent through the stream, and printing out this column through a DataChannelOutputStream could generate and discard more than 30,000 objects. Those 30,000 objects can represent more than 5 megabytes of heap space. And for what? Simple coding.

Searching for a middle ground

Addressing the first issue is straightforward. The DataItem class’ implementation needs to be updated to allow a DataItem object to block the writer. However, to prevent this from becoming an incompatible change to an existing interface, I added a new method, setBlocking(), to turn this behavior “on.” The actual implementation required changing the insert() method in DataItem to call wait() if the queue was full, and to have the fetch() method call notify() after reading to wake up any waiting writer threads and recycling.

To facilitate reusing an object, the producer client of that object has to know when the object is free to be reused. Further, the consumer client of that object has to know who to notify when the object is no longer needed. These two properties are instantiated in the interfaces Referenced and Recyclable.

The Referenced interface is a simple reference-counting interface to do bookkeeping on the number of times the object reference has been “handed out” versus the number of times it has been “turned in.” It defines three methods: addRefCount(), which increases the number of references; decRefCount, which reduces the number of references; and refCount(), which simply returns the number of references.

The Recyclable interface serves a dual purpose. It is implemented both by objects that can be recycled and by objects that can recycle recyclable objects. The two methods it defines are recycle() and recycle(Object x). For any recyclable object, one simply calls the recycle() method. When the object determines that there are no longer any references to it, it calls the recycle() method on the recyclable object that produced it and passes a pointer to itself.

If you will recall, data channels work by passing around references to objects. This makes them very efficient and general-purpose. By using interfaces for these new capabilities, the ability to reuse objects can be integrated into the DataChannel class with no impact on existing clients.

Before object reuse, the flow for the putValue() method of DataChannel was:

    for every reader of this channel.
        call <b>DataItem</b>.<i>insert(passed object)</i>;

and the flow for getValue() was simply:

    
    find my queue.
    read the next object (block if queue is empty)
    return the object.

We now modify the flow of putValue() to be:

    for every reader of this channel.
        if (object instanceof Referenced)
            object.addRefCount();
        call DataItem.insert(object);

and on getValue() we can use:

    find my queue.
    read the next object.
    <use it>
    if (object instanceof Recyclable)
         object.recycle();
    return;

For clients that don’t pass objects that implement the new recycling paradigm, the channels work the same way they always did.

To limit the actual production of objects I needed some way to buffer writes to the data channel in an intelligent way. I/O streams have no specific boundaries for when a complete “record” of bytes are sent, thus the DataChannelOutputStream doesn’t really know when to ship the bytes to the other end of the channel. However, output streams do have a flush() method, which means “ship data now!” Therefore my solution to the production problem was to create a recyclable buffer object that could be partially filled and shipped on demand to the other side of the channel. This became embodied in the StreamBuffer class.

The StreamBuffer interface is pretty simple; the constructor takes two parameters, a size and a reference to an object that implements the Recyclable interface. The size determines how big the internal buffer is, and should be large enough to be efficient but not so large as to be often wasted. The recycler is used to actually recycle the object when it is no longer needed.

This class then has a couple of housekeeping methods — space(), and avail(). Space returns the amount of space left in the buffer area, and avail() returns how much data is in the buffer. By summing the two values, the total size of the buffer can be computed.

These methods are followed by read() and write. As you would expect, read returns bytes from the buffer whereas write stores bytes into the buffer. There is one difference, however: write takes no parameters, as there is only one data buffer to write into; internally it maintains the amount of data stored into the buffer and returns a boolean true when the buffer is full. The read method, on the other hand, is used by all of the consumers of this data, and while one reader may have read ten bytes, another may have read one hundred, so it takes a parameter from the client hasBeenRead to indicate which bytes the client has already seen. In this way the client can increment hasBeenRead after each call and get bytes sequentially out of the buffer. This method is defined to return an integer but only returns bytes from the buffer. This is to allow the return value to be overloaded such that a return of -1 indicates no more bytes are available whereas a byte with the value 0xff would be returned as 255.

Finally, the StreamBuffer implements the reference counting and recycling interfaces. These are pretty straightforward but you can check out the code if you are curious.

Now the stage is nearly set for the completion of my efforts to eliminate garbage through recycling and limited production.

Hooking them together

With the StreamBuffer object we can now call its write method when the client of our input stream writes bytes. This will cue up bytes in this object until one of two events occur: Either the client calls the flush() method, or the buffer fills up and write() returns true. In either case the buffer is immediately sent out on the data channel.

In the new implementation of insert() in the DataItem class, the reference is dropped into the queue of every listener on the channel. Each time the reference is put into a queue, insert() calls addRefCount() on it. Thus, once “delivered,” the object’s reference count is now the same number as the number of clients who want to read it.

Each client is then notified of the arrival of this new object, and the object is removed from the queue. In the case of DataChannelInputStream I chose to actually requeue the object in the object. The downside was that I needed another class for maintaining this queue, but the upside is that the channel buffer, which was set to 1024 originally, could be set to a much lower level (I chose eight). This queue was built into a class named StreamBufferList.

The DataChannelOutputStream class did not change too much from my first attempt; in fact, the only two significant additions were a recycle method that would receive the StreamBuffers after they had been read by all of the readers, and a StreamBufferList object to keep free buffers around for reuse.

The DataChannelInputStream class changed somewhat more radically.

First the run() method was rewritten to deal with StreamBuffer objects instead of Integer objects. Note that non-stream buffer objects are simply ignored, which allows for bogus producers sending data to this channel. Further, since the StreamBuffers replace the buffer array, they are put into a queue using a StreamBufferList object.

Next the read() method was rewritten to read data out of the current StreamBuffer rather than the circular array. Further, when the read exhausts the StreamBuffer, the recycle() method is called on the object. If this was the last client to read the data, the object sends itself back to the DataChannelOutputStream that produced it to be recycled.

Finally, the avail() method in the StreamBufferList class was used in conjunction with the avail() method in the StreamBuffer class to implement the avail() function in InputStream. By doing this, our virtual terminal class can ask the stream how many bytes are available and read them all in one operation.

Wrapping up

So was I successful? In minimizing the impact on the heap by my virtual terminal I was quite successful. Whereas the simple version would generate tens of thousands of objects, the modified version generates only tens of objects. You can always check your impact on the system using the -verbosegc flag to the Java program. For example, to run the appletviewer using this flag, type the following command:

    java -verbosegc sun.applet.AppletViewer URL

The moral of this story is that you can improve performance of your Java applications by not creating excess objects to be garbage collected. This is true for two reasons. First, the smaller heap requirement will keep the amount of virtual memory that the run time uses smaller, which results in less swapping. Second, while it is true that the same number of cycles are used whether or not it is the garbage collector freeing memory, it is less expensive in terms of CPU cycles to reuse an existing object than it is to allocate a new copy of that object. A valuable tool in the Java programmer’s toolbox is the knowledge of when not to call new.

Chuck McManis is currently the director of system software at FreeGate Corp. FreeGate is a venture-funded start-up that is exploring opportunities in the Internet marketplace. Before joining FreeGate, McManis was a member of the Java group. He joined the Java group just after the formation of FirstPerson Inc. and was a member of the portable OS group (the group responsible for the OS portion of Java). Later, when FirstPerson was dissolved, he stayed with the group through the development of the alpha and beta versions of the Java platform. He created the first “all Java” home page on the Internet when he did the programming for the Java version of the Sun home page in May 1995. He also developed a cryptographic library for Java and versions of the Java class loader that could screen classes based on digital signatures. Before joining FirstPerson, Chuck worked in the operating systems area of SunSoft developing networking applications, where he did the initial design of NIS+.

Topics

About

Policies

Our Network

More

Not using garbage collection

Minimize heap thrashing in your Java programs

The compelling issue

First efforts

Purity meets practicality

Searching for a middle ground

Hooking them together

Wrapping up

Show me more

What makes JavaScript great

Swiss launch open source AI model as “ethical” alternative to big US LLMs

Gemini 2.5 Flash Image model advances AI image generation

Getting encryption wrong (and getting it right, too)

How to build a native desktop app vs. a web UI app

PyApp: Build click-to-run Python apps with Rust