Showing posts with label data structures. Show all posts

Sunday, January 10, 2010

.NET Serialization Performance Comparison

After reading James Newton-King's blog post on the serialization speed of the new Json.NET release, I decided to benchmark the different serializers I have in my Disk Based Data Structures project. Serialization is done to a byte array. (The project contains a factory class which benchmarks your data type and returns the fastest serializer.)

AltSerialize can be found at CodeProject, and protobuf-net, the .NET implementation of Google Protocol Buffers, at Google Code.

For the first test I used the same class hierarchy as Json.NET.

[Image: benchmark results for the first test]

The serialization sizes were as follows:

BinaryFormatter 2937 bytes
AltSerialize 610 bytes
DataContractSerializer 1237 bytes
protobuf-net 245 bytes

The second test was done on a well-defined struct, listed at the bottom of this post.

[Image: benchmark results for the second test]

The serialization sizes were as follows:

BinaryFormatter 303 bytes
DataContractSerializer 272 bytes
AltSerialize 150 bytes
Marshal.Copy 144 bytes
Unsafe pointers 144 bytes

As you can see, the memory-copying variants are a lot faster than the other serializers when it comes to structs laid out sequentially in memory. AltSerialize is also fairly quick, as it uses Marshal.Copy as well. The big winner is the version using pointers to copy the data. It’s 10x faster than Marshal.Copy on serialization and 17x faster on deserialization. Compared to the DataContractSerializer, we’re talking almost 100x faster on serializing and over 250x faster on deserializing.

But remember that these tests ran 100,000 iterations. For all normal purposes, any of the serializers would work just fine.

If speed matters to you and you do a lot of serializing, you can gain performance by choosing the right serializer.

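using System;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;

// Test struct for the second benchmark. The sequential layout is what
// lets the memory-copying serializers treat it as a plain block of bytes.
[DataContract]
[Serializable]
[StructLayout(LayoutKind.Sequential)]
public struct Coordinate
{
    [DataMember(Order = 1)]
    public float X;
    [DataMember(Order = 2)]
    public float Y;
    [DataMember(Order = 3)]
    public float Z;
    [DataMember(Order = 4)]
    [MarshalAs(UnmanagedType.Currency)] // marshal the decimal as a COM currency value
    public decimal Focus;
    [DataMember(Order = 5)]
    [MarshalAs(UnmanagedType.Struct)]
    public Payload Payload;
}

// Size = 113 sets the struct's total size explicitly, although only two byte fields are declared.
[DataContract]
[Serializable]
[StructLayout(LayoutKind.Sequential, Size = 113)]
public struct Payload
{
    [DataMember(Order = 1)]
    public byte Version;
    [DataMember(Order = 2)]
    public byte Data;
}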
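
As a rough illustration of the memory-copying approach discussed above, here is a minimal sketch that round-trips a sequentially laid out struct through Marshal.Copy. Point3 and RawStructSerializer are hypothetical names made up for this example and are not the benchmark code. The pointer-based variant, which needs the unsafe compiler switch, is not shown.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example struct: three floats laid out one after another.
[StructLayout(LayoutKind.Sequential)]
public struct Point3
{
    public float X;
    public float Y;
    public float Z;
}

// Hypothetical helper, not part of the project described in this post.
public static class RawStructSerializer
{
    // Copies the marshaled bytes of a struct into a new byte array.
    public static byte[] Serialize<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] buffer = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            // Write the struct to unmanaged memory, then copy it out.
            Marshal.StructureToPtr(value, ptr, false);
            Marshal.Copy(ptr, buffer, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
        return buffer;
    }

    // Rebuilds a struct from bytes produced by Serialize.
    public static T Deserialize<T>(byte[] buffer) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        if (buffer == null || buffer.Length < size)
            throw new ArgumentException("Buffer is too small for the target type.", "buffer");

        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.Copy(buffer, 0, ptr, size);
            return (T)Marshal.PtrToStructure(ptr, typeof(T));
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

With this sketch, a Point3 serializes to 12 bytes (three 4-byte floats), and RawStructSerializer.Deserialize<Point3>(bytes) returns a copy with the same field values. The output carries no type or version information, so both sides must agree on the struct layout.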

Saturday, January 9, 2010

Disk based data structures – Release 2

The second release is now available for download at CodePlex.

Changelog

  • The Dictionary<TKey,TValue> class now persists all data to disk, so you should not run out of memory on a 64-bit system. Only available disk space matters.
  • Strings can now be used as keys and values. String doesn’t have a parameterless constructor, so I’ve added code to make strings work.
  • I’ve included protobuf-net (Google Protocol Buffers) as a serializer. It’s very fast and efficient on size, but requires decorating your classes either with DataContract/DataMember attributes or ProtoContract/ProtoMember attributes. Check out the Getting Started section on protobuf-net.
  • Improved locking throughout the code.
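
The string issue from the changelog is easy to reproduce: Activator.CreateInstance<string>() throws because string has no parameterless constructor. One possible workaround is sketched below. DefaultInstance.Create is a hypothetical helper written for this illustration and is not necessarily how the library handles it.

```csharp
using System;

// Hypothetical helper, not taken from the library.
public static class DefaultInstance
{
    // Returns an "empty" instance of T without requiring a new() constraint.
    public static T Create<T>()
    {
        // string has no parameterless constructor, so Activator.CreateInstance<string>()
        // would throw MissingMethodException. Hand back the empty string instead.
        if (typeof(T) == typeof(string))
            return (T)(object)string.Empty;

        // Value types and classes with a public parameterless constructor work as usual.
        return Activator.CreateInstance<T>();
    }
}
```

Under this assumption, DefaultInstance.Create<string>() yields an empty string, while DefaultInstance.Create<int>() yields 0.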

Thursday, November 12, 2009

Disk based data structures

Last year I created a project where I used memory-mapped files as storage for a large array. I’ve now polished the project a bit and included generic List and Dictionary implementations as well. The project can be found at Disk Based Data Structures - CodePlex.

I’ve also created a serializer project which benchmarks the available serializers and picks the fastest one for your type. This serializer is used to persist the data to disk. The classes are also thread-safe.
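
The idea of benchmarking serializers and keeping the fastest can be sketched roughly as follows. The IByteSerializer<T> interface, the DelegateSerializer<T> adapter and the FastestSerializerPicker class are all hypothetical names invented for this example; the project's actual factory may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical abstraction over "something that serializes T to a byte array".
public interface IByteSerializer<T>
{
    byte[] Serialize(T value);
    T Deserialize(byte[] data);
}

// Hypothetical adapter so any pair of functions can act as a serializer.
public sealed class DelegateSerializer<T> : IByteSerializer<T>
{
    private readonly Func<T, byte[]> _serialize;
    private readonly Func<byte[], T> _deserialize;

    public DelegateSerializer(Func<T, byte[]> serialize, Func<byte[], T> deserialize)
    {
        _serialize = serialize;
        _deserialize = deserialize;
    }

    public byte[] Serialize(T value) { return _serialize(value); }
    public T Deserialize(byte[] data) { return _deserialize(data); }
}

public static class FastestSerializerPicker
{
    // Times a serialize/deserialize round trip for each candidate and
    // returns the one with the lowest total elapsed time.
    public static IByteSerializer<T> Pick<T>(
        IEnumerable<IByteSerializer<T>> candidates, T sample, int iterations)
    {
        IByteSerializer<T> best = null;
        long bestTicks = long.MaxValue;

        foreach (IByteSerializer<T> candidate in candidates)
        {
            long ticks;
            try
            {
                // One untimed round trip to warm up (JIT, caches).
                candidate.Deserialize(candidate.Serialize(sample));

                Stopwatch watch = Stopwatch.StartNew();
                for (int i = 0; i < iterations; i++)
                    candidate.Deserialize(candidate.Serialize(sample));
                watch.Stop();
                ticks = watch.ElapsedTicks;
            }
            catch (Exception)
            {
                // This candidate cannot handle the type; skip it.
                continue;
            }

            if (ticks < bestTicks)
            {
                bestTicks = ticks;
                best = candidate;
            }
        }

        if (best == null)
            throw new InvalidOperationException("No candidate could serialize the sample.");
        return best;
    }
}
```

Candidates that throw on the sample value are skipped, which mirrors the fact that not every serializer supports every type. Timing results vary between runs and machines, so a real implementation would probably want to cache the choice per type.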

Background for the project

A disk-based version of an array would require a lot of caching logic to perform fast enough compared to a pure in-memory implementation. A couple of years ago I stumbled across memory-mapped files, which have long existed in operating systems and are typically used by the OS for swap space.

The first time I worked with memory-mapped files I used a library from MetalWrench, but this time around I got hold of Winterdom's much nicer implementation of the Win32 API. I've included the patch from Steve Simpson, but removed the dynamic paging since it slows things down and it's not necessary on 64-bit systems. (If you want to use arrays which hold over 2 GB of data on 32-bit systems, I recommend reverting to Steve's original version and setting a view size of 200-500 MB.) Future releases will use .NET 4.0’s System.IO.MemoryMappedFiles namespace.

The beauty of 64-bit is that you have virtually unlimited address space, so each thread can get its own view of the mapped file without running out of address space. 32-bit Windows can only address 4 GB.
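
For reference, a disk-backed array on top of .NET 4.0's System.IO.MemoryMappedFiles could look roughly like this. MappedInt32Array is a hypothetical class written for illustration and is not part of the project, which wraps the Win32 API instead. The sketch maps the whole file in a single view and does no locking.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical fixed-size int array whose storage is a memory-mapped file.
public sealed class MappedInt32Array : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private readonly long _length;

    public MappedInt32Array(string path, long length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException("length");
        _length = length;

        // The file is created if needed and grown to hold all elements.
        long capacity = length * sizeof(int);
        _file = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, null, capacity);

        // One view over the whole file; this assumes enough address space,
        // which is the 64-bit case described above.
        _view = _file.CreateViewAccessor(0, capacity);
    }

    public long Length { get { return _length; } }

    public int this[long index]
    {
        get
        {
            CheckIndex(index);
            return _view.ReadInt32(index * sizeof(int));
        }
        set
        {
            CheckIndex(index);
            _view.Write(index * sizeof(int), value);
        }
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= _length)
            throw new IndexOutOfRangeException();
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Values written through the indexer end up in the backing file, so reopening the same path with the same length after disposing should give the data back. A version for multiple threads would need its own synchronization or one view per thread.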

As for performance, my theory is that Microsoft has implemented a fairly good caching algorithm for its swap file, so it should prove good enough for me. A few tests show much better disk I/O with the memory-mapped API than with .NET's file I/O library. I haven't tested the performance with the SEC_LARGE_PAGES flag added, but it might help some.

Hope this library is useful for someone out there :)