Sunday, January 24, 2010

Code improvement tools - NDepend

After I released the first version of Disk Based Data Structures, a library for persisting collections (Dictionary<>, List<>) on disk, I had the opportunity to try NDepend. As I worked towards a bugfixing/refactoring release, I used NDepend on the code to clue me in on where I should focus my efforts.

If you haven’t used NDepend, it’s basically a tool which gives you a lot of metrics on your code base along with visual representations.

Code metrics can be scary at first, but once you understand what they tell you, they really help. For instance, in the “Abstractness vs. Instability” chart my assemblies are plotted in the green area in the lower right. That’s because I hardly expose any interfaces at all; the library is a concrete implementation of already established interfaces, so it’s fine to sit in that corner.

I’ve also changed the visibility from public to internal/private for many classes that were only used internally, in order to provide a cleaner public interface. All guided by a single metric:

WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE CouldBePrivate

My Serializer assembly has a very low relational cohesion, which actually went down from 1.28 to 1.25. That’s expected, since I added one more serializer to the project and none of the serializers have anything to do with each other. But it still makes more sense to bundle them together than to split them into multiple assemblies.

The most useful metrics for my refactoring release were the ones identifying long methods and methods with high cyclomatic complexity. They let me break those methods up into more understandable pieces. I try to write short code, but sometimes you forget. A metrics tool makes it easy to find those pain points and bring them out in the open, especially in old code. We all know our old code is worse than what we write today ;)



Another interesting fact is that while my code grew 40% in lines of code, my comment coverage increased by 2%. So during the refactoring process I wrote more comments: not only did I break up the code, I documented it as well.

NDepend has its own query language, so you can easily create your own code insights or modify the existing ones.

All in all, I’m glad I stumbled upon NDepend. It has become as natural an addition to my toolbox as R#, and I will most likely include it in the automated build process in future projects.

Sunday, January 10, 2010

.NET Serialization Performance Comparison

After reading the blog post from James Newton-King on the serialization speed of the new release of Json.NET, I decided to benchmark the different serializers I have in my Disk Based Data Structures project. The serialization is done to a byte array. (The project contains a factory class which benchmarks your data type and returns the fastest serializer for it.)

AltSerialize can be found at CodeProject, and the .NET implementation of Google Protocol Buffers (protobuf-net) at Google Code.
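Each serializer was measured on the same round trip: serialize the object graph to a byte[] and deserialize it back. As a minimal sketch of what that looks like for two of the contenders (this is not the actual benchmark code from the project, just the general shape of it):

// Sketch only: serializing an object graph to a byte[] with BinaryFormatter
// and protobuf-net. The real benchmark wraps calls like these in a timed loop.
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using ProtoBuf;

public static class SerializeToBytes
{
    public static byte[] WithBinaryFormatter(object graph)
    {
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.ToArray();
        }
    }

    public static byte[] WithProtobufNet<T>(T graph)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, graph);
            return stream.ToArray();
        }
    }
}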

For the first test I used the same class hierarchy as in the Json.NET benchmark.

[Chart: serialization/deserialization benchmark times, test 1]

The serialization sizes were as follows:

BinaryFormatter: 2937 bytes
AltSerialize: 610 bytes
DataContractSerializer: 1237 bytes
protobuf-net: 245 bytes

The second test was done on a well-defined struct, shown at the bottom of this post.

[Chart: serialization/deserialization benchmark times, test 2]

The serialization sizes were as follows:

BinaryFormatter: 303 bytes
DataContractSerializer: 272 bytes
AltSerialize: 150 bytes
Marshal.Copy: 144 bytes
Unsafe pointers: 144 bytes

As you can see, the memory-copying variants are a lot faster than the other serializers when it comes to structs laid out sequentially in memory. AltSerialize is also fairly quick, as it uses Marshal.Copy as well. The big winner is the version using unsafe pointers to copy the data: it’s 10x faster than Marshal.Copy on serialization and 17x on deserialization. Compared to the DataContractSerializer we’re talking almost 100x on serializing and over 250x on deserializing.
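To make the memory-copy part concrete, here is a minimal sketch of the two techniques, assuming a sequentially laid out struct such as the Coordinate type defined at the bottom of this post (again, not the exact code from the project):

// Sketch only: copying a struct to a byte[] via Marshal and via unsafe pointers.
// Assumes the struct is laid out sequentially and contains no reference types.
using System;
using System.Runtime.InteropServices;

public static class StructBytes
{
    // Marshal-based copy: struct -> unmanaged buffer -> byte[]
    public static byte[] ToBytesViaMarshal<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        var bytes = new byte[size];
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, buffer, false);
            Marshal.Copy(buffer, bytes, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
        return bytes;
    }

    // Unsafe pointer copy: pin the destination array and write the struct straight into it.
    public static unsafe byte[] ToBytesViaPointer(Coordinate value)
    {
        var bytes = new byte[sizeof(Coordinate)];
        fixed (byte* destination = bytes)
        {
            *(Coordinate*)destination = value;
        }
        return bytes;
    }
}

Deserialization is simply the same copy in the other direction, which is why both variants are so fast: there is no reflection and no intermediate representation involved.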

But remember that these tests were done on 100,000 iterations. For all normal purposes they would all work just fine.

But if speed matters to you and you do a lot of serialization, you can gain a great deal by choosing the right serializer.

using System;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;

[DataContract]
[Serializable]
[StructLayout(LayoutKind.Sequential)]
public struct Coordinate
{
    [DataMember(Order = 1)]
    public float X;

    [DataMember(Order = 2)]
    public float Y;

    [DataMember(Order = 3)]
    public float Z;

    [DataMember(Order = 4)]
    [MarshalAs(UnmanagedType.Currency)]
    public decimal Focus;

    [DataMember(Order = 5)]
    [MarshalAs(UnmanagedType.Struct)]
    public Payload Payload;
}

[DataContract]
[Serializable]
[StructLayout(LayoutKind.Sequential, Size = 113)]
public struct Payload
{
    [DataMember(Order = 1)]
    public byte Version;

    [DataMember(Order = 2)]
    public byte Data;
}

Saturday, January 9, 2010

Disk based data structures – Release 2

The second release is now available for download at CodePlex.

Changelog

  • Dictionary<TKey,TValue> class now persists all data to disk, so you should not run out of memory on a 64bit system. Only available disk space matters.
  • Strings can now be used for key/values. Strings don’t have a default empty constructor so I’ve added code to make them work.
  • I’ve included protobuf-net (Google Protocol Buffers) as a serializer. It’s very fast and efficient on size, but requires decorating your classes with either DataContract/DataMember attributes or ProtoContract/ProtoMember attributes (see the small example after this list). Check out the Getting Started section on protobuf-net.
  • Improved locking throughout the code.
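
To give an idea of what that decoration looks like, here is a small example (the Person type is just an illustration, not something from the library):

// Example only: a class decorated with protobuf-net attributes.
using ProtoBuf;

[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string Name { get; set; }

    [ProtoMember(2)]
    public int Age { get; set; }
}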