Wednesday, February 17, 2010

Blazing fast IPC in .Net 4: WCF vs. Signaling and Shared Memory

[Update 2011-02-02: Did a test against NamedPipeServerStream and NamedPipeClientStream, which I mention in a comment at the end]

An MSDN article from 2007 compares the speed of WCF vs. .Net Remoting, and shows the speed increase WCF gives over Remoting in an IPC scenario using named pipes as transport. With the introduction of the System.IO.MemoryMappedFiles namespace in .Net 4, and a blog post by Salva Patuel which outlines how almost all communication inside Windows uses memory-mapped files at its core, I had to try this myself with the new capabilities of the .Net 4 framework.

My approach implements messaging between two processes by writing the data to a memory mapped file (which essentially is shared memory), and signaling between the processes using an EventWaitHandle object. One process signals when the data is written, and the other when it has been processed.

While WCF converges towards the speed of .Net Remoting as the payload increases, the memory-mapped approach with signaling is consistently about 10x as fast as WCF for all three payload sizes. With a payload of 256 KB I get a throughput of 4000 ops/sec on my 2.2 GHz laptop with an 800 MHz FSB.
The process handling the reading can be implemented like this:

var message = new byte[256 * 1024]; // must match the size the writer uses
var messageWait = new EventWaitHandle(false, EventResetMode.AutoReset, "wait");
var messageHandled = new EventWaitHandle(false, EventResetMode.AutoReset, "handled");
var mmf = MemoryMappedFile.CreateOrOpen("mmf", message.Length);
var viewStream = mmf.CreateViewStream();

while (!_quit)
{
    messageWait.WaitOne();       // block until the writer signals that data is ready
    if (_quit) break;
    viewStream.Position = 0;
    viewStream.Read(message, 0, message.Length);
    // handle the message
    messageHandled.Set();        // signal the writer that the buffer can be reused
}
// dispose of objects

You would write similar code for the other process, changing the order of the wait handles.
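For completeness, here is a minimal sketch of what the writing side could look like. It mirrors the reader above and reuses the same handle and map names ("wait", "handled", "mmf"); the payload size and loop body are illustrative, not taken from the original test code.

```csharp
using System.IO.MemoryMappedFiles;
using System.Threading;

// Writer side: a sketch mirroring the reader snippet above.
var message = new byte[256 * 1024]; // must match the reader's buffer size
var messageWait = new EventWaitHandle(false, EventResetMode.AutoReset, "wait");
var messageHandled = new EventWaitHandle(false, EventResetMode.AutoReset, "handled");
var mmf = MemoryMappedFile.CreateOrOpen("mmf", message.Length);
var viewStream = mmf.CreateViewStream();

while (!_quit)
{
    // fill 'message' with the payload to send
    viewStream.Position = 0;
    viewStream.Write(message, 0, message.Length);
    messageWait.Set();        // signal the reader that data is ready
    messageHandled.WaitOne(); // wait until the reader has consumed it
}
// dispose of objects
```

Note that the writer signals "wait" and blocks on "handled", the exact opposite of the reader.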

So by using a fairly simple pattern and little code you can achieve blazing fast communication between two or more processes on the same machine.

Another obvious use for memory-mapped files is random access into large files, since the file contents are paged in intelligently by the OS. In particular, check out MemoryMappedFile.CreateViewAccessor() for an easy way to access value types individually or as arrays.
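As a small illustrative sketch (the file name, map name, and struct are made up for the example), a view accessor lets you read and write value types directly at a given offset:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

struct Point
{
    public int X;
    public int Y;
}

class AccessorDemo
{
    static void Main()
    {
        // Back the map with a file so the data can outlive the process.
        using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.OpenOrCreate, "demo", 1024))
        using (var accessor = mmf.CreateViewAccessor())
        {
            var p = new Point { X = 1, Y = 2 };
            accessor.Write(0, ref p);   // write a value type at offset 0

            Point read;
            accessor.Read(0, out read); // read it back from the same offset
        }
    }
}
```

The WriteArray/ReadArray methods on the accessor work the same way for arrays of value types.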


  1. Having done some extensive testing with WCF myself, I would be curious to know how these tests were compiled.

    In my past testing, when built as DEBUG, WCF's general throughput was terrible, capped at around 1000 ops/sec, and degrading to only a few hundred ops/sec after 15-20 minutes.

    When built as RELEASE, WCF's general throughput for small payload sizes (say 128 to 256 bytes) over any protocol (http, tcp, pipe) was a sustained 60k ops/sec when hosted on a Core i7 920 with several clients sending sustained requests (for pipe, there was obviously only one client, and throughput was approximately halved as both client and service were utilizing the same resources.)

  2. I redid my test, and for a 128-byte payload in Release mode I get 11500/s. I've included my test code in the comment. Feel free to point out any errors in it.

    Server code

    ServiceHost host = new ServiceHost(typeof(RemoteObject), new Uri("http://localhost:8001/Demo/"));
    ServiceMetadataBehavior smb = new ServiceMetadataBehavior();
    smb.HttpGetEnabled = true;
    smb.HttpGetUrl = new Uri("http://localhost:8001/Demo/Meta");
    WSHttpBinding ws = new WSHttpBinding(SecurityMode.None);
    Binding mex = MetadataExchangeBindings.CreateMexHttpBinding();
    host.AddServiceEndpoint("WCFLib.IRemoteObject", ws, "http://localhost:8001/Demo/DoStuff"); //one to actually do things
    host.AddServiceEndpoint(typeof(IMetadataExchange), mex, "http://localhost:8001/Demo/Meta"); //and one to provide metadata

    NetNamedPipeBinding pipe = new NetNamedPipeBinding(NetNamedPipeSecurityMode.None);
    pipe.MaxReceivedMessageSize = 2147483647;
    pipe.ReaderQuotas.MaxArrayLength = 2147483647;
    pipe.ReaderQuotas.MaxBytesPerRead = 2147483647;
    host.AddServiceEndpoint("WCFLib.IRemoteObject", pipe, "net.pipe://localhost/Demo/DoStuff");

    host.Open();
    Console.WriteLine("Server Started");

    Client code
    WCFDemo.RemoteObjectClient client = new RemoteObjectClient("NetNamedPipeBinding_IRemoteObject");
    byte[] buffer = new byte[bufferLength];

    var stopWatch = Stopwatch.StartNew();
    for (int i = 0; i < 50000; i++)
    {
        client.ReceiveRBytes(buffer); // send the payload to the service
    }
    stopWatch.Stop();
    Console.WriteLine("Time used: {0}\tPer sec: {1} - {2}", stopWatch.Elapsed, (50000 * 1000) / stopWatch.ElapsedMilliseconds, bufferLength);


    [ServiceContract]
    public interface IRemoteObject
    {
        [OperationContract(IsOneWay = false)]
        byte[] GetRBytes(int numBytes);

        [OperationContract(IsOneWay = false)]
        bool ReceiveRBytes(byte[] bytes);
    }

    Client Binding app.config

  3. Hi
    Is there anything new on the speed of raw named pipes versus WCF for inter-process communication?
    I need inter-process communication between a 32-bit legacy process and a new 64-bit process, and I need very fast operation.

  4. Jens, thanks for pointing out the use of raw Named Pipes directly.

    I hadn't really looked into the NamedPipeServerStream and NamedPipeClientStream objects before. I did a test right now, and if I create a client which sends 256 KB arrays sequentially, which are read on the server side, I only get half the speed compared to the MMF and signal approach. Named pipes give ~6000/s while MMF gives ~12000/s. For small arrays the speed is about the same.

    Named pipes also uses shared memory behind the scenes, but there might be more logic which accounts for the time difference.

  5. Hi,
    How do I implement this in a way where one process pushes messages to a queue in the shared memory and the second process reads from that queue?
    All of this should be executed asynchronously.

  6. Well, the EventWaitHandle could work. You could have a pool of events which you signal and which the other proc waits for.

    When you say async, do you mean the first proc could arbitrarily add items which the second should pick up?

    Another possibility is to have a set number of slots in your queue (memory-mapped file), where you set a bit when there is data in a slot and clear it once the data has been read. Then you poll each slot from both processes to see if you can read or write data to a slot.
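    A very rough sketch of that slot idea, using a view accessor over the shared memory. The slot count, slot size, and one-status-byte-per-slot layout are assumptions for illustration; real code would also need interlocked operations or other synchronization to be safe across processes.

```csharp
using System.IO.MemoryMappedFiles;

class SlotQueue
{
    const int SlotCount = 8;
    const int SlotSize = 1 + 256; // 1 status byte followed by a 256-byte payload

    readonly MemoryMappedViewAccessor _acc;

    public SlotQueue()
    {
        var mmf = MemoryMappedFile.CreateOrOpen("slotqueue", SlotCount * SlotSize);
        _acc = mmf.CreateViewAccessor();
    }

    // Writer: find a free slot, write the payload, set the flag last.
    public bool TryWrite(byte[] payload)
    {
        for (int i = 0; i < SlotCount; i++)
        {
            long offset = (long)i * SlotSize;
            if (_acc.ReadByte(offset) == 0) // slot is free
            {
                _acc.WriteArray(offset + 1, payload, 0, payload.Length);
                _acc.Write(offset, (byte)1); // mark the slot as full
                return true;
            }
        }
        return false; // all slots full
    }

    // Reader: poll for a full slot, read the payload, clear the flag.
    public bool TryRead(byte[] buffer)
    {
        for (int i = 0; i < SlotCount; i++)
        {
            long offset = (long)i * SlotSize;
            if (_acc.ReadByte(offset) == 1) // slot has data
            {
                _acc.ReadArray(offset + 1, buffer, 0, buffer.Length);
                _acc.Write(offset, (byte)0); // mark the slot as free
                return true;
            }
        }
        return false; // nothing to read
    }
}
```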

    And is there any reason why you can't use a regular message queue in your scenario? Does it depend on huge numbers of operations in a short time?

  7. Thanks. Answers below:

    "When you say async..."
    - Yes. I was thinking of implementing a circular buffer in the shared memory. What do you think?

    When you say "regular message queue", do you mean MSMQ?
    When I pass data between my threads I use a ConcurrentQueue.
    The application needs to handle a massive amount of data in a very short time.

  8. Mikael,

    Great blog post, thanks! It would be nice, though, if you included a .zip file with the full sources of your test in it. It would save some time to readers who are interested in trying it out on their own.

  9. NoMoDo, I added the code with a link at the bottom of the post. The code is presented as-is, and you need to change parameters in order to test different payload sizes etc.

  10. Hi Mikael,

    I am not able to download the zip. Could you please send it to me on ?

    Thanks in advance.

  11. Your scenario is more like data streaming. Did you try just streaming your messages as chunks of binary-encoded data? Each chunk can start with the size of the following chunk: read the size from the stream, read the chunk, then start on the next one. The end is marked by a zero-size chunk.
    You can also search for information about WCF chunking.

    1. Hi,
      Interesting.. what you propose is to open up a stream and then do custom checking to find the start/end of each message, but keep the stream open as long as there are messages? Is that correct?

      Any experience with how long a stream can be kept open, or whether you have to handle re-opening of the stream, etc.?

      My scenario was client/server with very many small messages, as well as an academic exercise comparing the different methods available when extreme speed is important (which it is when working with search engines, for example).


    2. Hi,
      WCF is a service platform where chatty communication is bad design (it will be slow). If you need to send many small requests, you have two options. The first is to create a batch on the client side and send it. The second is to use streaming with the following layout: size of message 1, message 1, size of message 2, message 2, 0. Instead of a 0 at the end of the stream, a WCF callback can be used to notify about its end. Streaming should work for sizes up to 4 GB (mentioned in the MSDN docs). Some error handling should be done in case the connection is lost.
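      The size-prefixed layout described above can be sketched with a plain stream and BinaryReader/BinaryWriter. The helper names here are made up for illustration; in WCF the stream itself would come from a contract using streamed transfer mode.

```csharp
using System.Collections.Generic;
using System.IO;

static class Framing
{
    // Write messages as: size 1, message 1, size 2, message 2, ..., 0
    public static void WriteFrames(Stream stream, IEnumerable<byte[]> messages)
    {
        var writer = new BinaryWriter(stream);
        foreach (var msg in messages)
        {
            writer.Write(msg.Length); // 4-byte size prefix
            writer.Write(msg);        // the message itself
        }
        writer.Write(0);              // zero size marks the end of the stream
        writer.Flush();
    }

    // Read messages until the zero-size terminator is seen.
    public static IEnumerable<byte[]> ReadFrames(Stream stream)
    {
        var reader = new BinaryReader(stream);
        int size;
        while ((size = reader.ReadInt32()) > 0)
            yield return reader.ReadBytes(size);
    }
}
```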

  12. I am not sure, but did you consider how your byte arrays are serialized by WCF? They are possibly serialized into XML by the DataContractSerializer, which can be replaced with another, simpler serializer, or with protobuf. For local IPC, efficient serialization is essential for good performance.

  13. I am just concerned that people state that WCF is slow without a clear understanding of what WCF is and what can be done with it, going only with the default out-of-the-box setup for the average scenario, which definitely does not match yours.
    I can add to what I mentioned above the possibility of replacing the MessageEncodingBindingElement with a simpler one when there are no interoperability requirements.

    1. Hi,
      I really appreciate all your good comments and doing WCF streaming is absolutely a possibility. I will see if I can find the time to redo the benchmarks with WCF streaming as well.

      I don't remember the serialization I used with WCF, but using protobuf is certainly an option. My codeplex project at has a pretty good serialization library picking the most optimal one for your structure, where protobuf is one of the serializers.

  14. This comment has been removed by the author.

  15. I do not see sample code attached. Can you send to "no dot decaf dot cappuccino at Gmail dot com"

  16. I am curious what your limiting factors are. In privateeye we have C -> C# code doing about 1M small messages/second over TCP. Memory-mapped files should theoretically be able to beat this. It could perhaps be serialization overhead or the synchronous signalling?

    1. I don't think serialization is an issue, as I'm sending byte arrays in this code. I haven't looked at this code in years, but it would be interesting to test against your TCP scenario. And TCP should indeed be slower for IPC, as it adds an unneeded layer. Do you have details on how you transfer and process messages between the apps? Multi-threaded or single-threaded, message size, etc.?