Thursday 5 July 2012

Native C/C++ Like Performance For Java Object Serialisation

Do you ever wish you could turn a Java object into a stream of bytes as fast as it can be done in a native language like C++?  If you use standard Java Serialization you could be disappointed with the performance.  Java Serialization was designed for a very different purpose than serialising objects as quickly and compactly as possible.

Why do we need fast and compact serialisation?  Many of our systems are distributed and we need to communicate by passing state between processes efficiently.  This state lives inside our objects.  I've profiled many systems and often a large part of the cost is the serialisation of this state to-and-from byte buffers.  I've seen a significant range of protocols and mechanisms used to achieve this.  At one end of the spectrum are the easy-to-use but inefficient protocols like Java Serialisation, XML and JSON.  At the other end of this spectrum are the binary protocols that can be very fast and efficient but require a deeper understanding and greater skill.

In this article I will illustrate the performance gains that are possible when using simple binary protocols and introduce a little known technique available in Java to achieve similar performance to what is possible with native languages like C or C++.

The three approaches to be compared are:
  1. Java Serialization: The standard method in Java of having an object implement Serializable.
  2. Binary via ByteBuffer: A simple protocol using the ByteBuffer API to write the fields of an object in binary format.  This is our baseline for what is considered a good binary encoding approach.
  3. Binary via Unsafe: Introduction to Unsafe and its collection of methods that allow direct memory manipulation.  Here I will show how to get similar performance to C/C++.
The Code
import sun.misc.Unsafe;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.Arrays;

public final class TestSerialisationPerf
{
    public static final int REPETITIONS = 1 * 1000 * 1000;

    private static final ObjectToBeSerialised ITEM =
        new ObjectToBeSerialised(
            1010L, true, 777, 99,
            new double[]{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0},
            new long[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10});


    public static void main(final String[] arg) throws Exception
    {
        for (final PerformanceTestCase testCase : testCases)
        {
            for (int i = 0; i < 5; i++)
            {
                testCase.performTest();

                System.out.format("%d %s\twrite=%,dns read=%,dns total=%,dns\n",
                                  i,
                                  testCase.getName(),
                                  testCase.getWriteTimeNanos(),
                                  testCase.getReadTimeNanos(),
                                  testCase.getWriteTimeNanos() + 
                                  testCase.getReadTimeNanos());

                if (!ITEM.equals(testCase.getTestOutput()))
                {
                    throw new IllegalStateException("Objects do not match");
                }

                System.gc();
                Thread.sleep(3000);
            }
        }
    }

    private static final PerformanceTestCase[] testCases =
    {
        new PerformanceTestCase("Serialisation", REPETITIONS, ITEM)
        {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            public void testWrite(ObjectToBeSerialised item) throws Exception
            {
                for (int i = 0; i < REPETITIONS; i++)
                {
                    baos.reset();

                    ObjectOutputStream oos = new ObjectOutputStream(baos);
                    oos.writeObject(item);
                    oos.close();
                }
            }

            public ObjectToBeSerialised testRead() throws Exception
            {
                ObjectToBeSerialised object = null;
                for (int i = 0; i < REPETITIONS; i++)
                {
                    ByteArrayInputStream bais = 
                        new ByteArrayInputStream(baos.toByteArray());
                    ObjectInputStream ois = new ObjectInputStream(bais);
                    object = (ObjectToBeSerialised)ois.readObject();
                }

                return object;
            }
        },

        new PerformanceTestCase("ByteBuffer", REPETITIONS, ITEM)
        {
            ByteBuffer byteBuffer = ByteBuffer.allocate(1024);

            public void testWrite(ObjectToBeSerialised item) throws Exception
            {
                for (int i = 0; i < REPETITIONS; i++)
                {
                    byteBuffer.clear();
                    item.write(byteBuffer);
                }
            }

            public ObjectToBeSerialised testRead() throws Exception
            {
                ObjectToBeSerialised object = null;
                for (int i = 0; i < REPETITIONS; i++)
                {
                    byteBuffer.flip();
                    object = ObjectToBeSerialised.read(byteBuffer);
                }

                return object;
            }
        },

        new PerformanceTestCase("UnsafeMemory", REPETITIONS, ITEM)
        {
            UnsafeMemory buffer = new UnsafeMemory(new byte[1024]);

            public void testWrite(ObjectToBeSerialised item) throws Exception
            {
                for (int i = 0; i < REPETITIONS; i++)
                {
                    buffer.reset();
                    item.write(buffer);
                }
            }

            public ObjectToBeSerialised testRead() throws Exception
            {
                ObjectToBeSerialised object = null;
                for (int i = 0; i < REPETITIONS; i++)
                {
                    buffer.reset();
                    object = ObjectToBeSerialised.read(buffer);
                }

                return object;
            }
        },
    };
}

abstract class PerformanceTestCase
{
    private final String name;
    private final int repetitions;
    private final ObjectToBeSerialised testInput;
    private ObjectToBeSerialised testOutput;
    private long writeTimeNanos;
    private long readTimeNanos;

    public PerformanceTestCase(final String name, final int repetitions,
                               final ObjectToBeSerialised testInput)
    {
        this.name = name;
        this.repetitions = repetitions;
        this.testInput = testInput;
    }

    public String getName()
    {
        return name;
    }

    public ObjectToBeSerialised getTestOutput()
    {
        return testOutput;
    }

    public long getWriteTimeNanos()
    {
        return writeTimeNanos;
    }

    public long getReadTimeNanos()
    {
        return readTimeNanos;
    }

    public void performTest() throws Exception
    {
        final long startWriteNanos = System.nanoTime();
        testWrite(testInput);
        writeTimeNanos = (System.nanoTime() - startWriteNanos) / repetitions;

        final long startReadNanos = System.nanoTime();
        testOutput = testRead();
        readTimeNanos = (System.nanoTime() - startReadNanos) / repetitions;
    }

    public abstract void testWrite(ObjectToBeSerialised item) throws Exception;
    public abstract ObjectToBeSerialised testRead() throws Exception;
}

class ObjectToBeSerialised implements Serializable
{
    private static final long serialVersionUID = 10275539472837495L;

    private final long sourceId;
    private final boolean special;
    private final int orderCode;
    private final int priority;
    private final double[] prices;
    private final long[] quantities;

    public ObjectToBeSerialised(final long sourceId, final boolean special,
                                final int orderCode, final int priority,
                                final double[] prices, final long[] quantities)
    {
        this.sourceId = sourceId;
        this.special = special;
        this.orderCode = orderCode;
        this.priority = priority;
        this.prices = prices;
        this.quantities = quantities;
    }

    public void write(final ByteBuffer byteBuffer)
    {
        byteBuffer.putLong(sourceId);
        byteBuffer.put((byte)(special ? 1 : 0));
        byteBuffer.putInt(orderCode);
        byteBuffer.putInt(priority);

        byteBuffer.putInt(prices.length);
        for (final double price : prices)
        {
            byteBuffer.putDouble(price);
        }

        byteBuffer.putInt(quantities.length);
        for (final long quantity : quantities)
        {
            byteBuffer.putLong(quantity);
        }
    }

    public static ObjectToBeSerialised read(final ByteBuffer byteBuffer)
    {
        final long sourceId = byteBuffer.getLong();
        final boolean special = 0 != byteBuffer.get();
        final int orderCode = byteBuffer.getInt();
        final int priority = byteBuffer.getInt();

        final int pricesSize = byteBuffer.getInt();
        final double[] prices = new double[pricesSize];
        for (int i = 0; i < pricesSize; i++)
        {
            prices[i] = byteBuffer.getDouble();
        }

        final int quantitiesSize = byteBuffer.getInt();
        final long[] quantities = new long[quantitiesSize];
        for (int i = 0; i < quantitiesSize; i++)
        {
            quantities[i] = byteBuffer.getLong();
        }

        return new ObjectToBeSerialised(sourceId, special, orderCode, 
                                        priority, prices, quantities);
    }

    public void write(final UnsafeMemory buffer)
    {
        buffer.putLong(sourceId);
        buffer.putBoolean(special);
        buffer.putInt(orderCode);
        buffer.putInt(priority);
        buffer.putDoubleArray(prices);
        buffer.putLongArray(quantities);
    }

    public static ObjectToBeSerialised read(final UnsafeMemory buffer)
    {
        final long sourceId = buffer.getLong();
        final boolean special = buffer.getBoolean();
        final int orderCode = buffer.getInt();
        final int priority = buffer.getInt();
        final double[] prices = buffer.getDoubleArray();
        final long[] quantities = buffer.getLongArray();

        return new ObjectToBeSerialised(sourceId, special, orderCode, 
                                        priority, prices, quantities);
    }

    public boolean equals(final Object o)
    {
        if (this == o)
        {
            return true;
        }
        if (o == null || getClass() != o.getClass())
        {
            return false;
        }

        final ObjectToBeSerialised that = (ObjectToBeSerialised)o;

        if (orderCode != that.orderCode)
        {
            return false;
        }
        if (priority != that.priority)
        {
            return false;
        }
        if (sourceId != that.sourceId)
        {
            return false;
        }
        if (special != that.special)
        {
            return false;
        }
        if (!Arrays.equals(prices, that.prices))
        {
            return false;
        }
        if (!Arrays.equals(quantities, that.quantities))
        {
            return false;
        }

        return true;
    }
}

class UnsafeMemory
{
    private static final Unsafe unsafe;
    static
    {
        try
        {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            unsafe = (Unsafe)field.get(null);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    private static final long byteArrayOffset = unsafe.arrayBaseOffset(byte[].class);
    private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
    private static final long doubleArrayOffset = unsafe.arrayBaseOffset(double[].class);

    private static final int SIZE_OF_BOOLEAN = 1;
    private static final int SIZE_OF_INT = 4;
    private static final int SIZE_OF_LONG = 8;

    private int pos = 0;
    private final byte[] buffer;

    public UnsafeMemory(final byte[] buffer)
    {
        if (null == buffer)
        {
            throw new NullPointerException("buffer cannot be null");
        }

        this.buffer = buffer;
    }

    public void reset()
    {
        this.pos = 0;
    }

    public void putBoolean(final boolean value)
    {
        unsafe.putBoolean(buffer, byteArrayOffset + pos, value);
        pos += SIZE_OF_BOOLEAN;
    }

    public boolean getBoolean()
    {
        boolean value = unsafe.getBoolean(buffer, byteArrayOffset + pos);
        pos += SIZE_OF_BOOLEAN;

        return value;
    }

    public void putInt(final int value)
    {
        unsafe.putInt(buffer, byteArrayOffset + pos, value);
        pos += SIZE_OF_INT;
    }

    public int getInt()
    {
        int value = unsafe.getInt(buffer, byteArrayOffset + pos);
        pos += SIZE_OF_INT;

        return value;
    }

    public void putLong(final long value)
    {
        unsafe.putLong(buffer, byteArrayOffset + pos, value);
        pos += SIZE_OF_LONG;
    }

    public long getLong()
    {
        long value = unsafe.getLong(buffer, byteArrayOffset + pos);
        pos += SIZE_OF_LONG;

        return value;
    }

    public void putLongArray(final long[] values)
    {
        putInt(values.length);

        long bytesToCopy = values.length << 3;
        unsafe.copyMemory(values, longArrayOffset,
                          buffer, byteArrayOffset + pos,
                          bytesToCopy);
        pos += bytesToCopy;
    }

    public long[] getLongArray()
    {
        int arraySize = getInt();
        long[] values = new long[arraySize];

        long bytesToCopy = values.length << 3;
        unsafe.copyMemory(buffer, byteArrayOffset + pos,
                          values, longArrayOffset,
                          bytesToCopy);
        pos += bytesToCopy;

        return values;
    }

    public void putDoubleArray(final double[] values)
    {
        putInt(values.length);

        long bytesToCopy = values.length << 3;
        unsafe.copyMemory(values, doubleArrayOffset,
                          buffer, byteArrayOffset + pos,
                          bytesToCopy);
        pos += bytesToCopy;
    }

    public double[] getDoubleArray()
    {
        int arraySize = getInt();
        double[] values = new double[arraySize];

        long bytesToCopy = values.length << 3;
        unsafe.copyMemory(buffer, byteArrayOffset + pos,
                          values, doubleArrayOffset,
                          bytesToCopy);
        pos += bytesToCopy;

        return values;
    }
}

Results
2.8GHz Nehalem - Java 1.7.0_04
==============================
0 Serialisation  write=2,517ns read=11,570ns total=14,087ns
1 Serialisation  write=2,198ns read=11,122ns total=13,320ns
2 Serialisation  write=2,190ns read=11,011ns total=13,201ns
3 Serialisation  write=2,221ns read=10,972ns total=13,193ns
4 Serialisation  write=2,187ns read=10,817ns total=13,004ns
0 ByteBuffer     write=264ns   read=273ns    total=537ns
1 ByteBuffer     write=248ns   read=243ns    total=491ns
2 ByteBuffer     write=262ns   read=243ns    total=505ns
3 ByteBuffer     write=300ns   read=240ns    total=540ns
4 ByteBuffer     write=247ns   read=243ns    total=490ns
0 UnsafeMemory   write=99ns    read=84ns     total=183ns
1 UnsafeMemory   write=53ns    read=82ns     total=135ns
2 UnsafeMemory   write=63ns    read=66ns     total=129ns
3 UnsafeMemory   write=46ns    read=63ns     total=109ns
4 UnsafeMemory   write=48ns    read=58ns     total=106ns

2.4GHz Sandy Bridge - Java 1.7.0_04
===================================
0 Serialisation  write=1,940ns read=9,006ns total=10,946ns
1 Serialisation  write=1,674ns read=8,567ns total=10,241ns
2 Serialisation  write=1,666ns read=8,680ns total=10,346ns
3 Serialisation  write=1,666ns read=8,623ns total=10,289ns
4 Serialisation  write=1,715ns read=8,586ns total=10,301ns
0 ByteBuffer     write=199ns   read=198ns   total=397ns
1 ByteBuffer     write=176ns   read=178ns   total=354ns
2 ByteBuffer     write=174ns   read=174ns   total=348ns
3 ByteBuffer     write=172ns   read=183ns   total=355ns
4 ByteBuffer     write=174ns   read=180ns   total=354ns
0 UnsafeMemory   write=38ns    read=75ns    total=113ns
1 UnsafeMemory   write=26ns    read=52ns    total=78ns
2 UnsafeMemory   write=26ns    read=51ns    total=77ns
3 UnsafeMemory   write=25ns    read=51ns    total=76ns
4 UnsafeMemory   write=27ns    read=50ns    total=77ns

Analysis

Writing and reading back a single relatively small object on my fast 2.4 GHz Sandy Bridge laptop takes ~10,000ns using Java Serialization, whereas with Unsafe this comes down to well under 100ns, even accounting for the test code itself.  To put this in context, with Java Serialization the cost is on a par with a network hop!  That would be very costly if your transport is a fast IPC mechanism on the same system.

There are numerous reasons why Java Serialisation is so costly.  For example, it writes out the fully qualified class and field names for each object plus version information.  ObjectOutputStream also keeps a collection of all written objects so that repeated references can be written as back-references rather than full copies.  Java Serialisation requires 340 bytes for this example object, whereas the binary versions require only 185 bytes.  Details of the format can be found in the Java Object Serialization Specification.  If I had not used arrays for the majority of the data, then the serialised object would have been significantly larger with Java Serialization because of the field names.  In my experience text-based protocols like XML and JSON can be even less efficient than Java Serialization.  Also be aware that Java Serialization is the standard mechanism employed for RMI.
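
The size difference is easy to check.  Below is a minimal sketch that reuses the ObjectToBeSerialised class from the example above and prints the encoded size for both approaches (SizeComparison is just an illustrative name):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

public final class SizeComparison
{
    public static void main(final String[] args) throws Exception
    {
        final ObjectToBeSerialised item = new ObjectToBeSerialised(
            1010L, true, 777, 99,
            new double[]{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0},
            new long[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10});

        // Java Serialization: class metadata, field names, and field values
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(item);
        oos.close();
        System.out.println("Java Serialization: " + baos.size() + " bytes");

        // Simple binary protocol: just the field values and array lengths
        final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        item.write(byteBuffer);
        System.out.println("Binary protocol:    " + byteBuffer.position() + " bytes");
    }
}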

The real issue is the number of instructions to be executed.  The Unsafe method wins by a significant margin because in Hotspot, and many other JVMs, the optimiser treats these operations as intrinsics and replaces the calls with assembly instructions that perform the memory manipulation directly.  For primitive types this results in a single x86 MOV instruction, which can often complete in a single cycle.  The details can be seen by having Hotspot output the optimised code, as I described in a previous article.
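
For reference, on a JVM with the hsdis disassembler plugin installed, the optimised code for the hot methods can be dumped with the diagnostic flags below (a sketch of the invocation only; your class path and main class may differ):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly TestSerialisationPerf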

Now it has to be said that "with great power comes great responsibility": using Unsafe is effectively the same as programming in C, and with that come memory access violations when you get the offsets wrong.
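
One way to take some of the edge off is to guard each access explicitly.  Below is a minimal sketch of a bounds-checked variant of putLong that could be added to the UnsafeMemory class above (the method name is hypothetical):

public void putLongChecked(final long value)
{
    // Fail fast with an exception rather than risk corrupting memory
    if (pos + SIZE_OF_LONG > buffer.length)
    {
        throw new IndexOutOfBoundsException("pos=" + pos + " capacity=" + buffer.length);
    }

    unsafe.putLong(buffer, byteArrayOffset + pos, value);
    pos += SIZE_OF_LONG;
}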

Adding Some Context

"What about the likes of Google Protocol Buffers?", I hear you cry out.  These are very useful libraries and can often offer better performance and more flexibility than Java Serialisation.  However they are not remotely close to the performance of using Unsafe like I have shown here.  Protocol Buffers solve a different problem and provide nice self-describing messages which work well across languages.  Please test with different protocols and serialisation techniques to compare results.

The astute among you will also be asking, "What about the endianness (byte ordering) of the integers written?"  With Unsafe the bytes are written in native order.  This is great for IPC and between systems of the same type.  When systems use differing byte orders then a conversion will be necessary.
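
If a fixed wire format is required, a minimal sketch of one approach (class and method names are hypothetical) is to decide once whether the native order matches the chosen wire order and only swap when it does not:

import java.nio.ByteOrder;

public final class WireOrder
{
    // Decided once at class load time; true when the platform is little-endian
    private static final boolean REVERSE =
        ByteOrder.nativeOrder() != ByteOrder.BIG_ENDIAN;

    public static long toWire(final long value)
    {
        return REVERSE ? Long.reverseBytes(value) : value;
    }

    public static long fromWire(final long value)
    {
        return REVERSE ? Long.reverseBytes(value) : value;
    }
}

Long.reverseBytes() and Integer.reverseBytes() are themselves intrinsics (BSWAP on x86), so the cost of the swap is small.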

How do we deal with multiple versions of a class, or determine what class an object belongs to?  I want to keep this article focused, but let's say a simple integer to indicate the implementation class is all that is required for a header.  This integer can be used to look up the appropriate implementation for the de-serialisation operation.
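
Below is a minimal sketch of such a header using the classes from the example above (the type id value and the codec class are hypothetical):

public final class MessageCodec
{
    private static final int OBJECT_TO_BE_SERIALISED_TYPE_ID = 1;

    public static void write(final UnsafeMemory buffer, final ObjectToBeSerialised item)
    {
        // Prefix the message with its type id so the reader can dispatch on it
        buffer.putInt(OBJECT_TO_BE_SERIALISED_TYPE_ID);
        item.write(buffer);
    }

    public static Object read(final UnsafeMemory buffer)
    {
        final int typeId = buffer.getInt();
        switch (typeId)
        {
            case OBJECT_TO_BE_SERIALISED_TYPE_ID:
                return ObjectToBeSerialised.read(buffer);

            default:
                throw new IllegalStateException("Unknown type id: " + typeId);
        }
    }
}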

An argument I often hear against binary protocols, and in favour of text protocols, is: what about human readability and debugging?  There is an easy solution to this.  Develop a tool for reading the binary format!
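
As a minimal sketch, such a tool only has to walk the same layout the writer used.  The class below (the name is hypothetical) decodes the ByteBuffer format defined above and prints the fields:

import java.nio.ByteBuffer;
import java.util.Arrays;

public final class BinaryDumper
{
    public static void dump(final ByteBuffer byteBuffer)
    {
        // Follows the field order used by ObjectToBeSerialised.write(ByteBuffer)
        System.out.println("sourceId   = " + byteBuffer.getLong());
        System.out.println("special    = " + (0 != byteBuffer.get()));
        System.out.println("orderCode  = " + byteBuffer.getInt());
        System.out.println("priority   = " + byteBuffer.getInt());

        final double[] prices = new double[byteBuffer.getInt()];
        for (int i = 0; i < prices.length; i++)
        {
            prices[i] = byteBuffer.getDouble();
        }
        System.out.println("prices     = " + Arrays.toString(prices));

        final long[] quantities = new long[byteBuffer.getInt()];
        for (int i = 0; i < quantities.length; i++)
        {
            quantities[i] = byteBuffer.getLong();
        }
        System.out.println("quantities = " + Arrays.toString(quantities));
    }
}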

Conclusion

In conclusion, it is possible to achieve native C/C++-like levels of performance in Java for serialising an object to-and-from a byte stream by effectively using the same techniques.  The UnsafeMemory class, for which I've provided a skeleton implementation, could easily be expanded to encapsulate this behaviour and thus protect oneself from many of the potential issues when dealing with such a sharp tool.

Now for the burning question.  Would it not be so much better if Java offered an alternative Marshallable interface to Serializable, providing natively what I've effectively done here with Unsafe?

42 comments:

  1. Does AKKA use the unsafe technique?

    Replies
    1. No. Akka by default uses Google Protocol Buffers for its own internal messages, and standard Java serialization for Serializable user objects.

    2. Hi Anthony!

      No, Akka doesn't use this technique, and I'm not sure where that'd be beneficial right now as in-memory message passing is done via references.

      Akka Serialization is completely pluggable & configurable so you can write your own serializers, use other people's serializers and mix and match at will. The Akka Remote Protocol uses Protocol Buffers to allow for platform independence but the payloads use Akka Serialization.

      Hope that answers the question!

      Cheers,

  2. You can get much better performance (~3x) out of the ByteBuffer implementation with a couple of simple changes:

    - Use native ordered direct ByteBuffers. (use ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder()))

    - Change the array loops to the bulk-copy equivalent of Unsafe's copyMemory, that is:

    replace this:

    for (final double price : prices) {
        byteBuffer.putDouble(price);
    }

    with this:

    byteBuffer.asDoubleBuffer().put(prices);
    byteBuffer.position(byteBuffer.position() + prices.length * 8);

    Tested on 1.7.0_06, Sandy Bridge 3.1GHz.
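
    A sketch of what these changes might look like against the write(ByteBuffer) method of ObjectToBeSerialised above (assuming the buffer is allocated with ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder()) and that the read side is adjusted in the same way with asDoubleBuffer().get(prices) / asLongBuffer().get(quantities)):

    public void write(final ByteBuffer byteBuffer)
    {
        byteBuffer.putLong(sourceId);
        byteBuffer.put((byte)(special ? 1 : 0));
        byteBuffer.putInt(orderCode);
        byteBuffer.putInt(priority);

        // Bulk copy the arrays via typed views instead of per-element loops.
        // The views do not advance the parent buffer, so bump the position manually.
        byteBuffer.putInt(prices.length);
        byteBuffer.asDoubleBuffer().put(prices);
        byteBuffer.position(byteBuffer.position() + prices.length * 8);

        byteBuffer.putInt(quantities.length);
        byteBuffer.asLongBuffer().put(quantities);
        byteBuffer.position(byteBuffer.position() + quantities.length * 8);
    }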

    Replies
    1. Thanks for the follow up. I get a ~2.5X improvement with DirectByteBuffer and native byte ordering as you suggest.

      However it is still 2-2.5X slower than using Unsafe.

    2. I tried a few modified tests on an i7-6820HQ CPU @ 2.70GHz / Java 1.8.0_121; below are the results:

      0 Serialisation write=2,281ns read=7,995ns total=10,276ns
      1 Serialisation write=1,423ns read=7,626ns total=9,049ns
      2 Serialisation write=1,403ns read=7,651ns total=9,054ns
      3 Serialisation write=1,417ns read=7,595ns total=9,012ns
      4 Serialisation write=1,405ns read=7,642ns total=9,047ns
      0 DirectByteBuffer NativeOrder write=63ns read=80ns total=143ns
      1 DirectByteBuffer NativeOrder write=37ns read=66ns total=103ns
      2 DirectByteBuffer NativeOrder write=36ns read=65ns total=101ns
      3 DirectByteBuffer NativeOrder write=37ns read=69ns total=106ns
      4 DirectByteBuffer NativeOrder write=36ns read=73ns total=109ns
      0 DirectByteBuffer write=63ns read=102ns total=165ns
      1 DirectByteBuffer write=48ns read=79ns total=127ns
      2 DirectByteBuffer write=47ns read=75ns total=122ns
      3 DirectByteBuffer write=48ns read=84ns total=132ns
      4 DirectByteBuffer write=48ns read=78ns total=126ns
      0 ByteBuffer write=193ns read=219ns total=412ns
      1 ByteBuffer write=168ns read=191ns total=359ns
      2 ByteBuffer write=182ns read=213ns total=395ns
      3 ByteBuffer write=177ns read=202ns total=379ns
      4 ByteBuffer write=167ns read=190ns total=357ns
      0 UnsafeMemory write=24ns read=55ns total=79ns
      1 UnsafeMemory write=26ns read=50ns total=76ns
      2 UnsafeMemory write=18ns read=56ns total=74ns
      3 UnsafeMemory write=19ns read=49ns total=68ns
      4 UnsafeMemory write=19ns read=52ns total=71ns


      Not sure though why setting native order improves performance.

  3. It would be interesting for completeness' sake to time the use of standard Java serialization but with an overridden readObject and writeObject implementation that just wrote the field values.

    Replies
    1. Please go for it and post the results.

    2. Out of curiosity, I added an Externalizable test case. Extending ObjectToBeSerialised (as ObjectToBeExternalized) required some minor surgery here and there. I am running Win7/64 and java 1.7.0_02. The original (unmodified code) results were:

      0 Serialisation write=2,203ns read=11,066ns total=13,269ns
      1 Serialisation write=1,992ns read=10,924ns total=12,916ns
      2 Serialisation write=2,050ns read=10,892ns total=12,942ns
      3 Serialisation write=2,054ns read=11,092ns total=13,146ns
      4 Serialisation write=2,075ns read=10,958ns total=13,033ns
      0 ByteBuffer write=200ns read=229ns total=429ns
      1 ByteBuffer write=172ns read=215ns total=387ns
      2 ByteBuffer write=160ns read=206ns total=366ns
      3 ByteBuffer write=174ns read=205ns total=379ns
      4 ByteBuffer write=163ns read=213ns total=376ns
      0 UnsafeMemory write=35ns read=67ns total=102ns
      1 UnsafeMemory write=20ns read=51ns total=71ns
      2 UnsafeMemory write=20ns read=50ns total=70ns
      3 UnsafeMemory write=20ns read=52ns total=72ns
      4 UnsafeMemory write=21ns read=51ns total=72ns

      The modified code was:

      0 Serialisation write=2,316ns read=10,930ns total=13,246ns
      1 Serialisation write=2,124ns read=10,653ns total=12,777ns
      2 Serialisation write=2,047ns read=10,549ns total=12,596ns
      3 Serialisation write=2,044ns read=10,534ns total=12,578ns
      4 Serialisation write=2,022ns read=10,359ns total=12,381ns
      0 Externalisation write=1,693ns read=8,523ns total=10,216ns
      1 Externalisation write=1,650ns read=8,413ns total=10,063ns
      2 Externalisation write=1,705ns read=8,526ns total=10,231ns
      3 Externalisation write=1,648ns read=8,416ns total=10,064ns
      4 Externalisation write=1,658ns read=8,528ns total=10,186ns
      0 ByteBuffer write=216ns read=244ns total=460ns
      1 ByteBuffer write=182ns read=216ns total=398ns
      2 ByteBuffer write=182ns read=211ns total=393ns
      3 ByteBuffer write=192ns read=210ns total=402ns
      4 ByteBuffer write=190ns read=209ns total=399ns
      0 UnsafeMemory write=37ns read=70ns total=107ns
      1 UnsafeMemory write=28ns read=56ns total=84ns
      2 UnsafeMemory write=23ns read=53ns total=76ns
      3 UnsafeMemory write=23ns read=51ns total=74ns
      4 UnsafeMemory write=23ns read=52ns total=75ns

      So it's a bit better than standard serialization, and extern can be improved upon by overriding the class descriptor writer, but there's not much point. It's not going to beat ByteBuffer, let alone UnsafeMemory.

      Great project. Thanks for that.

    3. Did you write the arrays manually, i.e. write a length field and the elements, as opposed to calling writeObject on the array?

    4. Yes. Exactly the same.

      // Assumes the "minor surgery" mentioned above: the fields of
      // ObjectToBeSerialised made protected and non-final, and a no-arg
      // constructor added to the parent class.
      class ObjectToBeExternalized extends ObjectToBeSerialised implements Externalizable {
          public ObjectToBeExternalized() {
              super();
          }

          public ObjectToBeExternalized(long sourceId, boolean special,
              int orderCode, int priority, double[] prices, long[] quantities) {
              super(sourceId, special, orderCode, priority, prices, quantities);
          }

          public void writeExternal(ObjectOutput out) throws IOException {
              out.writeLong(sourceId);
              out.writeByte((special ? 1 : 0));
              out.writeInt(orderCode);
              out.writeInt(priority);

              out.writeInt(prices.length);
              for (final double price : prices) {
                  out.writeDouble(price);
              }

              out.writeInt(quantities.length);
              for (final long quantity : quantities) {
                  out.writeLong(quantity);
              }
          }

          public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
              sourceId = in.readLong();
              special = 0 != in.readByte();
              orderCode = in.readInt();
              priority = in.readInt();

              int pricesSize = in.readInt();
              prices = new double[pricesSize];
              for (int i = 0; i < pricesSize; i++) {
                  prices[i] = in.readDouble();
              }

              int quantitiesSize = in.readInt();
              quantities = new long[quantitiesSize];
              for (int i = 0; i < quantitiesSize; i++) {
                  quantities[i] = in.readLong();
              }
          }
      }

    5. nickman: I am surprised that you found such poor improvement from Externalizable.

      Check out the Thrift benchmarks webpage:
      http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

      They found that Externalizable gives the best results of all, beating even kryo!

      I wonder what caused this huge discrepancy.

      Possibly it was something with the environment (hardware, OS, JVM, etc).

      Another possibility is the seemingly minor choice of what streams to use. The source code that you posted uses ObjectInput/ObjectOutput interfaces. What concrete classes did you use? The only ones built into the JDK are ObjectInputStream/ObjectOutputStream.

      In contrast, download the Thrift benchmarks as described here
      http://code.google.com/p/thrift-protobuf-compare/source/checkout

      Then open up the class
      ...\thrift-protobuf-compare-read-only\tpc\src\serializers\JavaExtSerializer.java

      There you will find that it defines custom ObjectInput/ObjectOutput implementations that essentially are just DataInputStream/DataOutputStream.

      Could it be that DataXxxStream has vastly lower overhead than ObjectXxxStream, even when all you do is write/read primitives like you do? Skimming the ObjectOutputStream source code, I note that primitives are written using an underlying BlockDataOutputStream. (This is also mentioned near the end of that class's javadocs.) In contrast, if you look at the source code of DataOutputStream, you will see that methods like writeInt write directly to the underlying OutputStream.
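
      A minimal sketch of that adapter idea (class names are hypothetical): DataOutputStream/DataInputStream already provide everything ObjectOutput/ObjectInput require except writeObject/readObject, so for Externalizable objects that only write primitives the adapters can simply refuse object writes.

      import java.io.DataInputStream;
      import java.io.DataOutputStream;
      import java.io.InputStream;
      import java.io.ObjectInput;
      import java.io.ObjectOutput;
      import java.io.OutputStream;

      class DataObjectOutput extends DataOutputStream implements ObjectOutput
      {
          DataObjectOutput(final OutputStream out)
          {
              super(out);
          }

          // Primitives only; the Externalizable test object never calls this
          public void writeObject(final Object obj)
          {
              throw new UnsupportedOperationException();
          }
      }

      class DataObjectInput extends DataInputStream implements ObjectInput
      {
          DataObjectInput(final InputStream in)
          {
              super(in);
          }

          public Object readObject()
          {
              throw new UnsupportedOperationException();
          }
      }

      The test could then call objectToBeExternalized.writeExternal(new DataObjectOutput(baos)) and readExternal(new DataObjectInput(bais)) directly, bypassing ObjectOutputStream/ObjectInputStream entirely.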

      So, I suspect that your results could be improved a lot with a tiny code adjustment. Would you mind re-running?

      On another note, I have not played with it yet, but here is a project that claims to be a drop-in replacement for JDK serialization, but as fast or faster than Kryo:
      http://code.google.com/p/fast-serialization/

    6. Followup to my own reply: looking more closely at the thrift benchmark JavaExtSerializer code, I think that they somewhat cheated.

      If you look closer, you will see that they only write then read an object of a single type (MediaContent). Consequently, their deserialize method does not determine what type of object is next on the input stream--it assumes that a single MediaContent instance is there.

      So, the solution that they give is NOT a general solution for writing a series of objects of potentially different types to a stream. A general solution needs to be more complicated (e.g. when writing, first write what type of object is coming).

      That said, their approach of only effectively using a DataXxxStream is intriguing. Nickman, I would still like to see what you get if you try that.

    7. Please feel free to take the above code and add the test case for Externalizable and using DataXxxStream as you suggest. If you post the code via a link then we can all try it for comparison.

  4. Thanks for posting these measurements. Very helpful to know what the tradeoff is. I will probably continue to use ByteBuffers for the bounds checking. So much parallelism is available that I don't see it as the right tradeoff for serialization to disk or network.

    I think the right way to view these results is in terms of the number of megabytes of serialization you can do per core per second. My napkin math says 974 megabytes per core. By way of comparison, when I tested Snappy I got the advertised 250 megabytes/second and LZ4 compression claims to do 300 megabytes/sec.

    As you point out these numbers matter if you are doing IPC on a local system, and if you have 10-gig E there is some fat to trim, but you can scale your way out of this particular bottleneck. I'll bet there is more fat to trim in the cache misses incurred by local IPC and the associated coordination than in the serialization scheme itself. That too is on the order of 100s of nanoseconds.

  5. It would be nice if you could collaborate with the team behind [https://github.com/eishay/jvm-serializers] (there is a mailing list). It is much easier to get a decent perspective; especially since while (de)serialization is sometimes a significant cost, quite often it really is not (which was implied in the article too).

    Replies
    1. I added a quick and dirty Unsafe serializer to the jvm-serializer project locally and the results on my rather slow server are:

                          create     ser     deser     total   size   +dfl
      java-built-in          193   24805    110891    135696    889    514
      java-manual            193    3034      3046      6080    255    147
      unsafe-manual          193    1334      1192      2526   1024    188
      kryo                   193    1920      2784      4705    214    134
      kryo-opt               193    2107      2884      4991    211    131
      kryo-manual            193    1570      2369      3939    211    131

      Given that kryo is typically among the fastest serializers listed (https://github.com/eishay/jvm-serializers/wiki/Staging-Results), the results are definitely interesting.

    2. How do we read these results? Did you test with sufficient warm up so the optimiser can apply intrinsics?

    3. Apologies for the chart formatting. My understanding is that the serializer runner in the jvm-serializer test suite does indeed do warmup runs and runs each test 500 times against each serializer. I had the test run against the serializers I'm most familiar with - Java and Kryo.

      The two numbers I was interested in were the serialization and deserialization costs, which are the second and third numbers in the chart above (in nanoseconds, I believe). We currently use an older version of Kryo than the one in the test, so I was curious just how much faster, if at all, using Unsafe would be. From the test results above it would seem that the serialization times are slightly faster and the deserialization times are about half of what Kryo achieves.

    4. For any performance test it is important to do multiple runs of over 10 thousand iterations to allow the optimiser to kick in, otherwise you are just testing the interpreter. An option is to vary the CompileThreshold.

      From the documentation for -server JVM:
      -XX:CompileThreshold=10000 Number of method invocations/branches before compiling [-client: 1,500]

    5. I assume Bruce used the default of 500 trials in the jvm-serializers project. Each trial is 2000 iterations.

  6. If you want a specific byte ordering with Unsafe, you can use Long/Integer.reverseBytes(). On Intel these are both implemented as intrinsics (BSWAP) and are quite efficient. You can statically evaluate whether you need to reverse the bytes for your platform by writing out a long in the desired byte order and reading it in again. Store the result in a static final field and the value can be read via Unsafe and reversed if required.

    long longFromMemory = unsafe.getLong(buffer, offset);
    return SHOULD_REVERSE ? Long.reverseBytes(longFromMemory) : longFromMemory;

    Because SHOULD_REVERSE is a constant, Hotspot will remove the unused branch; if not, the CPU will probably branch predict it correctly.

    We're currently using this technique and it adds about 20% overhead and is still faster than ByteBuffer.

  7. Hi Martin,
    I've recently written a similar article on my blog and updated it after reading this article. I've made a full set of ByteBuffer performance tests: heap/direct buffers; little/big endian; working on separate elements of an array or using bulk methods; Java 6 b30/Java 7 b2. I've tested byte[] and long[] processing times separately, so I ended up with 64 time measurements for ByteBuffers and 16 for Unsafe.
    As it turned out, direct byte buffers with native byte order used to serialize arrays of any primitive type are nearly indistinguishable in performance from Unsafe.
    You can read my article here: http://vorontsovm.blogspot.com.au/2012/07/javaniobytebuffer-javaiodatainputstream.html

    Replies
    1. Thanks Mike, this is interesting. I've found direct ByteBuffers can give sufficient performance for large byte arrays when intrinsics get applied. By the way, you should test across all JVMs; I was very surprised at the differences! Hotspot and Azul rock but the others (JRockit, IBM, et al) are somewhat lacking.

      Most JDK libraries now support the use of ByteBuffer and Channel, unfortunately many 3rd party libraries only take a byte[] in their API. This is why Unsafe is more interesting to me. I can use it to manipulate *any* type of buffer I get a reference to.

  8. Hi Folks,

    Very nice article on a topic that is frequently overlooked, I think. On second thought, however, I wonder whether comparing JSE serialization to a custom protocol is of any value. Are we comparing apples to apples here?
    JSE serialisation only requires you to implement the correct interface. Your object can have custom IO code but does not need it at all. This is a pretty big contrast to the ByteBuffer and UnsafeMemory implementations.
    Secondly, JSE serialization has additional logic to serialize object references to (re)construct object graphs etc. I see no equivalent of that feature in the other 2 test cases.
    I have added a test case for the Hessian library which gives results in line with those for JSE serialization.

    Curious to hear your thoughts on this...

    Regards,
    Benoit

    Replies
    1. Java Serialization is not a fair comparison at all. I added that test case for context to what people would likely do if they were not aware of custom binary protocols. I could have shown how much slower XML is even than Java Serialization to be really silly.

      Yes, serialising a graph of objects does require additional code and this can be implemented with a simple key protocol. However in a high-performance environment, if you are passing complex graphs for basic communication then I'd have to counsel a step back to question the design.

      The point of the article is to show how it is possible to get native like performance in a managed runtime environment like Java. If you are working in a high-performance domain then you need to be thinking like this. If performance is not an issue then ease of coding is probably your best investment.

  9. Someone let me know when they've modified Kryo to use Unsafe in the com.esotericsoftware.kryo.io Output and Input classes. I'd love to see the results then. I'd do it myself, but I'm sure someone else has already thought of this minor mod.

  10. Very good article. I'm a fan of Kryo's.
    But I've also ported "Python Construct", an excellent library for parsing and building custom binary protocols, to Java.
    Speed hasn't been a requirement so far, but if someone wants to develop the low level further and use "unsafe" rather than ByteBuffer, that'd be a useful contribution:

    https://github.com/ZiglioNZ/construct

    http://techno.emanueleziglioli.it/2012/07/java-construct-112-release.html

    https://github.com/construct/construct

    http://construct.readthedocs.org/en/latest/index.html

  11. It's interesting to see that I'm not the only person featuring the sun.misc.Unsafe idea. I started a Github project (Lightning) a few months ago to implement a serialization framework that uses as many Unsafe constructs as possible but can fall back to reflection.
    Another thing it does is generate marshaller / unmarshaller bytecode at runtime to run at native speed the moment HotSpot jumps in.
    The last performance feature is the missing constructor invocation. This means ONLY value objects can be transferred with this serializer, but for most cases this is enough.

    Another reason for the framework was a fail-fast principle: it is used for clusters and all cluster nodes need to have the same codebase. So the master node builds a class definition container holding all information about registered / configured classes and attributes.

    The current implementation only features JGroups integration, so when a new node connects the cluster master transfers its own class definition container to the node and the new cluster member tests its own classes against the definitions. If one class fails, the new node is automatically disconnected from the cluster.

    The whole project started as a prototype to find out if my expectations about the speed improvement would be correct, but the implementation was good enough to make it a real project.

    Here are some results (for interested people, the source is linked below):
    Lightning Serializer build time: 2795 ms
    Lightning Serialization Avg: 899,13 ns, runs: 800000, size: 42 bytes
    Lightning Deserialization Avg: 918,84 ns, runs: 800000, size: 40 bytes
    Java Serialization Avg: 3939,51 ns, runs: 800000, size: 375 bytes
    Java Deserialization Avg: 19581,37 ns, runs: 800000, size: 375 bytes

    The build time is a one-time operation when generating the class definition container and the bytecode marshallers. This is normally done at startup time of the cluster node.
    The possible differences in byte size depend on whether a field was randomized to null or filled with a value.

    For interested people I would like to see others joining the project or offering ideas on what to add or how to improve the framework. I'm open to questions and opinions as well. You can contact me on m e [at] n o c t a r i u s [DOT] c o m or Google+. I hope to see some reaction from the blog's owner.

    Cheers,
    Noctarius

    https://github.com/noctarius/Lightning
    https://github.com/noctarius/Lightning/raw/master/lightning-core/src/test/java/com/github/lightning/Benchmark.java

    Replies
    1. Interesting - I was thinking about something like this for Apache DirectMemory - but you're already on the way ahead ;) We now use protostuff as the default serializer - do you think you could get better figures?

    2. I already thought about adding a serializer for directmemory (https://github.com/noctarius/Lightning/issues/10). it would be nice to have a chat about how to integrate both systems in a nice way :) Feel free to contact me with the above mail address.

    3. This thread is going too far off topic and I feel it is more about promoting your projects. Unless adding to the topic directly I will remove comments.

      Please restrict the blatant promotion.

  12. Very timely article for software I am developing right now. However, we are trying to implement memory-mapped buffers to allow paging in order to make our application work on devices with limited physical memory. Is there (a reasonable) way I could leverage the use of Unsafe operations as an alternative to using a MappedByteBuffer?

    Replies
    1. You can use reflection to get the address of the memory inside the MappedByteBuffer then use Unsafe to manipulate it. However on some non-mainstream JVMs Unsafe may not have intrinsics applied, therefore you are better off using the ByteBuffer interface. Best to test for your own requirements and platform.
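
      A minimal sketch of that reflection step (the helper class is hypothetical; the base address of a direct or mapped buffer lives in the private java.nio.Buffer.address field):

      import java.lang.reflect.Field;
      import java.nio.Buffer;
      import java.nio.MappedByteBuffer;

      class MappedBufferAddress
      {
          public static long addressOf(final MappedByteBuffer buffer) throws Exception
          {
              // Read the package-private address field via reflection
              final Field addressField = Buffer.class.getDeclaredField("address");
              addressField.setAccessible(true);

              return addressField.getLong(buffer);
          }
      }

      The returned address can then be used with the absolute-addressing forms such as unsafe.putLong(address + offset, value), subject to the intrinsics caveat above.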

  13. Guys, I just came across this article which follows up on this one with some brief benchmarks:
    http://java.dzone.com/articles/fast-java-file-serialization

    One surprising claim made in that article is that standard Java serialization to a File can be sped up by ~4X if you use a FileOutputStream(RandomAccessFile.getFD()) instead of the usual BufferedOutputStream(FileOutputStream).

    What? That claim was news to me! Anyone else ever seen a performance claim like that before?

    It cannot be because RandomAccessFile natively implements DataOutput, since a FileOutputStream wraps and hides that API...

    I have found that benchmarking file I/O is really nasty due to disk caching. You gotta be super careful between benchmarks to do stuff to ruin the cache before doing a new benchmark. Perhaps these results are just not trustworthy?

  14. I've implemented a serialization that optionally uses Unsafe to speed things up. Additionally it is possible to inject hints into the serialization algorithm using annotations.
    In a real project, doing handcrafted serialization of complex structures is costly in terms of man-days and resulting bugs.
    Theoretically it should be possible to achieve similar (or in most cases better) performance by using a mix of an efficient generic serialization implementation and annotation hints.

    see http://code.google.com/p/fast-serialization/

  15. Not really an apples to apples comparison. For example, the byte buffer approach does not serialize the class type itself, IOW it is not polymorphic. It also uses a fixed-size buffer, which often is not possible, especially if the object is bigger than the allocated size.

    For an apples to apples comparison you should use Externalizable but only invoke the writeExternal and readExternal methods, never calling #writeObject on the stream with the object to be externalized. Make sense? And of course if you really want to be fair you will either reuse the byte array for the ObjectOutputStream/InputStream or allocate a new ByteBuffer in the loop, as this is how most code would work.

    Replies
    1. It is best to handle the "class" of a serialised message by a type id which is an integer. In the example I have used a long (sourceId).

      The example shows how to do simple fixed-size messages, as you point out. This week I'm releasing a new open-source framework that supports much more complex variable-sized structures.

  16. Hello Martin,

    Thank you for the excellent article. Extending your idea, I have implemented a general purpose serialization using the Unsafe class, and extended it to implement an off-heap cache.

    I have tested my solution using your performance test framework and found it to be as fast as (and even slightly faster than) Kryo.

    However, before talking real big, I would really like to have it tested through some other robust test suite as well.

    Can you please provide any pointers as to how I should go about testing it?

    Thanks,
    Sutanu

    Replies
    1. We could do with some good test suites in this area. The few I've seen are very POJO-based with significant allocation, thus skewing the results.

      I've been working on a new codec for C++, C#, and Java that uses these techniques.

      https://github.com/real-logic/simple-binary-encoding

  17. Hi Martin,

    I've followed @iotsakp's direct buffer suggestion and decided to post the results for reference.
    Platform used: Windows 10 64bit, Haswell i5-4460 3.2 GHz, 16GB RAM, JDK 1.8.0_91 x64, default JVM settings.

    Direct ByteBuffer actually comes quite close to Unsafe.

    0 Serialisation write=1,663ns read=6,784ns total=8,447ns
    1 Serialisation write=1,412ns read=6,425ns total=7,837ns
    2 Serialisation write=1,404ns read=6,388ns total=7,792ns
    3 Serialisation write=1,391ns read=6,427ns total=7,818ns
    4 Serialisation write=1,395ns read=6,394ns total=7,789ns

    0 ByteBuffer write=118ns read=165ns total=283ns
    1 ByteBuffer write=97ns read=141ns total=238ns
    2 ByteBuffer write=98ns read=143ns total=241ns
    3 ByteBuffer write=96ns read=142ns total=238ns
    4 ByteBuffer write=96ns read=142ns total=238ns

    0 DirectByteBuffer write=49ns read=78ns total=127ns
    1 DirectByteBuffer write=33ns read=62ns total=95ns
    2 DirectByteBuffer write=29ns read=62ns total=91ns
    3 DirectByteBuffer write=28ns read=62ns total=90ns
    4 DirectByteBuffer write=28ns read=62ns total=90ns

    0 UnsafeMemory write=27ns read=52ns total=79ns
    1 UnsafeMemory write=22ns read=51ns total=73ns
    2 UnsafeMemory write=24ns read=55ns total=79ns
    3 UnsafeMemory write=21ns read=49ns total=70ns
    4 UnsafeMemory write=20ns read=48ns total=68ns
