
Thursday, February 15, 2018

Programmatic nuance in .Net and PowerShell which can wreak havoc



It took me an hour or so to figure out why something I had ported from C# to PowerShell was not working, and the culprit is how casting of decimals to bytes works.

In .Net: (byte)4.6 = 4, meaning the cast truncates toward zero.

In PowerShell: [byte]4.6 = 5, meaning the cast rounds to the nearest integer.

The solution is to use [Math]::Floor in PowerShell instead, which of course works fine in .Net as well.
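To make the difference concrete, here's a minimal C# sketch of the three behaviors (Convert.ToByte rounds the same way PowerShell's [byte] cast does in this case):

byte truncated = (byte)4.6;            // 4 - a C# cast truncates toward zero
byte rounded = Convert.ToByte(4.6);    // 5 - rounds to nearest, like PowerShell's [byte]
byte floored = (byte)Math.Floor(4.6);  // 4 - an explicit floor behaves the same in both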

Tuesday, August 18, 2015

Changing JSON properties in an XML file–or edit DataProviderJSON in a search web part using C#

I have previously posted how to use PowerShell to change properties for a search result web part (Make sure your people search is fuzzified)

My colleague Tarjei (@tarjeieo) is working on some templating for a provisioning engine using CSOM and he needed to change some values in a web part file before loading it into a page. This involves both XML parsing with namespaces and making sure you don’t cripple the JSON object you are changing.

Here’s the code I ended up with to modify the SourceName property for a search result web part file, ignoring the namespaces in the xml file.

// Requires System.Xml.Linq, System.Xml.XPath and Newtonsoft.Json.Linq (Json.NET)
XElement doc = XElement.Load(@"D:\Temp\test.webpart");
// local-name() matches the property element no matter which namespace it lives in
var element = doc.XPathSelectElement(".//*[local-name() = 'property' and @name='DataProviderJSON']");
dynamic dp = JObject.Parse(element.Value);
dp.SourceName = "lala";
// serialize the modified JSON back into the XML element
element.Value = JObject.FromObject(dp).ToString();
doc.Save(@"D:\Temp\test.webpart2");

Monday, February 20, 2012

Finding the week number from a date–ISO 8601

I previously wrote about how to find the first day or date given a week number and how to get that working correctly.

Going the other way is, in theory, easier, as you can use functions from .Net itself.

public static int GetWeekNumber(this DateTime date)
{
    var currentCulture = CultureInfo.CurrentCulture;
    return currentCulture.Calendar.GetWeekOfYear(date, CalendarWeekRule.FirstFourDayWeek, DayOfWeek.Monday);
}

Except there is a bug in .Net which will return the wrong week for some boundary dates around the end of December and the beginning of January.

The fix is the same as in my previous post: use the Thursday of the week your date falls in, and it will work.

public static int GetWeekNumber(this DateTime date)
{
    // Move to the Thursday of the date's ISO week (Sunday counts as the last day of the week)
    int daysToAdd = date.DayOfWeek != DayOfWeek.Sunday ? DayOfWeek.Thursday - date.DayOfWeek : (int)DayOfWeek.Thursday - 7;
    date = date.AddDays(daysToAdd);
    var currentCulture = CultureInfo.CurrentCulture;
    return currentCulture.Calendar.GetWeekOfYear(date, CalendarWeekRule.FirstFourDayWeek, DayOfWeek.Monday);
}

Monday, January 30, 2012

Finding the first day of a week–ISO 8601

[See http://techmikael.blogspot.com/2012/02/finding-week-number-from-dateiso-8601.html on how to get the correct week number from a date]

I’m working on an application where we work with week numbers, and of course there was an error in getting everything to work correctly. For a specific week I want to get the Monday of that week. With the ISO 8601 standard a week starts on a Monday and ends on a Sunday, and I’m dealing with a calendar which states that week 1 is the first week with 4 full days.

Our current code was not working, and checking on StackOverflow I found a couple of samples meant to fix this:
http://stackoverflow.com/questions/3854429/in-net-knowing-the-week-number-how-can-i-get-the-weekdays-date
http://stackoverflow.com/questions/5377851/get-date-range-by-week-number-c-sharp
http://stackoverflow.com/questions/662379/calculate-date-from-week-number

The flaw in these solutions is that they all base the week calculation on the first Monday of the year. Reading up on ISO 8601 on Wikipedia, the solution is simple:
The week number can be described by counting the Thursdays: week 12 contains the 12th Thursday of the year.
Instead of basing the code on a Monday, use Thursday, and all the boundary conditions around weeks 1, 52 and 53 are resolved. Then subtract 3 days at the end to get the Monday.

The following code works for all boundary conditions which the previous code samples fail at:

public static DateTime FirstDateOfWeek(int year, int weekOfYear)
{
    DateTime jan1 = new DateTime(year, 1, 1);
    int daysOffset = DayOfWeek.Thursday - jan1.DayOfWeek;

    // Anchor the calculation on a Thursday, since the week-1 rule is defined by Thursdays
    DateTime firstThursday = jan1.AddDays(daysOffset);
    var cal = CultureInfo.CurrentCulture.Calendar;
    int firstWeek = cal.GetWeekOfYear(firstThursday, CalendarWeekRule.FirstFourDayWeek, DayOfWeek.Monday);

    var weekNum = weekOfYear;
    if (firstWeek <= 1)
    {
        weekNum -= 1;
    }
    var result = firstThursday.AddDays(weekNum * 7);
    // Back up from the Thursday to the Monday of the same week
    return result.AddDays(-3);
}

Tuesday, September 6, 2011

How would you optimize this piece of C# code?

I’ve been digging into some code lately when getting into an existing project, and came across these two lines:
if (Math.Max(thisString.Length, otherString.Length) > Math.Pow(2, 31))
    throw new ArgumentException("String too long");

Looking at the code I see right away that this is ported from some other language over to C#. It’s quite easy to optimize… how would you do it?

Friday, November 12, 2010

Creating Zip files with the System.IO.Packaging namespace

Originally created to support working with Office Open XML documents, this namespace can be used to create zip files as well.
The only drawbacks I have found are that you end up with an additional xml file at the root of your zip file called [Content_Types].xml, which lists the mapping of file extensions to mime types, and that you cannot have spaces or non-ascii characters in your filenames.
If you can live with this, there is no need to rely on an external library.
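
Here's a rough sketch of how creating a zip with the Packaging API can look. The paths and file names are just placeholders, and you need a reference to WindowsBase.dll:

using System;
using System.IO;
using System.IO.Packaging;

class ZipSketch
{
    static void Main()
    {
        using (Package zip = Package.Open(@"C:\Temp\archive.zip", FileMode.Create))
        {
            // part names must be valid part URIs - no spaces or non-ascii characters
            Uri partUri = PackUriHelper.CreatePartUri(new Uri("readme.txt", UriKind.Relative));
            PackagePart part = zip.CreatePart(partUri, "text/plain", CompressionOption.Normal);
            using (Stream stream = part.GetStream())
            {
                byte[] data = File.ReadAllBytes(@"C:\Temp\readme.txt");
                stream.Write(data, 0, data.Length);
            }
        }
    }
}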

Tuesday, July 13, 2010

To Tuple or Not To Tuple

Yesterday I answered a question on StackOverflow about “What is the most intuitive way to ask for objects in pairs?” The question was really about using KeyValuePair<,>, but a lot of the answers suggested using Tuple<> instead, which is a new construct in .Net 4.

From the MSDN documentation we read:

“A tuple is a data structure that has a specific number and sequence of values. The Tuple<T1, T2> class represents a 2-tuple, or pair, which is a tuple that has two components. A 2-tuple is similar to a KeyValuePair<TKey, TValue> structure.”

As long as you only have two elements, it doesn’t really matter if you use Tuple<,> or KeyValuePair<,>. But keep in mind that a Tuple is a class while a KeyValuePair is a struct, so for certain scenarios one would be preferable over the other. (A Tuple can have 8 direct elements, or it can nest itself to create an n-tuple.)

So over to the real question: should you use it, and when should you use it? In my opinion it boils down to expressing the intent of your code.

Consider the following two lines of code:

List<Tuple<int, int>> list = new List<Tuple<int, int>>();
List<Point> list = new List<Point>();

They can both be used for holding an x/y coordinate, but the Point struct is clearly more expressive, letting the reader know we’re talking about coordinates.

This does not mean we should not use the Tuple class. The following is also expressive and shows intent:

Tuple<Man, Woman> couple = new Tuple<Man, Woman>(m, w);

This leads me to the conclusion that it’s ok to use Tuple<> as long as the types in the Tuple are expressive, meaning they are not base types. A base type says nothing about what it holds. An int can hold a coordinate, an age or any other number of things, but if you wrap your coupled data in a pairing class you can express what you are working with.

Would you use

var bmiList = new List<Tuple<double, double>>();
var bmi = new Tuple<double, double>(180,75);
bmiList.Add(bmi);

or

class BMI
{
    public double Height;
    public double Weight;
}

var bmiList = new List<BMI>();
BMI bmi = new BMI {Height = 180, Weight = 75};
bmiList.Add(bmi);

A good use for Tuples is methods that need to return multiple values, where the values are only used locally/once in the return and not passed around.

Here’s an example where a Tuple could be preferable to multiple out parameters.

public Tuple<bool, Stream, long> GetStreamAndSpaceAvail(string path)
{
    if (File.Exists(path))
        return new Tuple<bool, Stream, long>(true, File.OpenRead(path), new DriveInfo("c:").AvailableFreeSpace);
    return new Tuple<bool, Stream, long>(false, null, 0);
}

public void usage()
{
    Tuple<bool, Stream, long> result = GetStreamAndSpaceAvail("somepath");
    if (result.Item1 && result.Item3 > 1000)
    {
        result.Item2.Write(...);
    }
}

compared to

public bool GetStreamAndSpaceAvail(string path, out Stream stream, out long freeSpace)
{
    freeSpace = new DriveInfo("c:").AvailableFreeSpace;
    if (File.Exists(path))
    {
        stream = File.OpenRead(path);
        return true;
    }
    stream = null;
    return false;
}

public void usage()
{
    Stream s;
    long freeSpace;
    if (GetStreamAndSpaceAvail("somepath", out s, out freeSpace) && freeSpace > 1000)
    {
        s.Write(...);
    }
}

I’d love to hear others’ opinions on this as well.

Monday, June 28, 2010

Get along with WCF 4 and jQuery Ajax

Initially I thought this was going to be a breeze, but as I experienced, it was closer to rough seas. But as any experienced sea creature knows, rough seas are just like a breeze.

web.config

<system.serviceModel>
  <services>
    <service name="StbSetupGUI.HtmlParser">
      <endpoint address="" behaviorConfiguration="StbSetupGUI.AjaxAspNetAjaxBehavior" binding="webHttpBinding" contract="StbSetupGUI.HtmlParser" />
    </service>
  </services>
  <behaviors>
    <endpointBehaviors>
      <behavior name="StbSetupGUI.AjaxAspNetAjaxBehavior">
        <enableWebScript />
      </behavior>
    </endpointBehaviors>
    <serviceBehaviors>
      <behavior name="">
        <serviceMetadata httpGetEnabled="true" />
        <serviceDebug includeExceptionDetailInFaults="true" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
  <serviceHostingEnvironment aspNetCompatibilityEnabled="true"
                             multipleSiteBindingsEnabled="true" />
</system.serviceModel>

The most important part is the <enableWebScript /> element. (This config also has exception details turned on.) This is added for you when you add an AJAX-enabled WCF Service to your project, so no snag there.

Service class

The service class will be automatically decorated with the AspNetCompatibilityRequirements attribute if you chose to add an AJAX-enabled WCF Service. Still cruising ahead.

[ServiceContract]
[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class HtmlParser{...}

WCF Methods

I wanted to use POST for my calls and decorated the method with the WebInvoke attribute. I initially used WebGet, which worked fine, but WebInvoke got the waves crashing over my head.

[OperationContract]
[WebInvoke(RequestFormat = WebMessageFormat.Json)]
public string GetText(string cssPath, string url){...}

jQuery call

This is where I had the most trouble getting it right, due to my stubbornness in using WebInvoke.

First I had issues getting the json format correct, so I decided to use the json2 library to encode my parameters correctly with double and single quotes.

Next, the content type made me slightly seasick. Most examples specify this as “application/json; charset=utf-8”. This just gave me error upon error. In the end I removed the charset part and it all played ball. Who cares about utf-8 anyway, right?

var path = "H1";
var url = "http://something";
$.ajax({
    type: "POST",
    url: "HtmlParser.svc/GetText",
    contentType: "application/json",
    data: JSON.stringify({ cssPath: path, url: url }),
    dataType: "json",
    success: AjaxSuccess,
    error: AjaxFailed
});

And if you want readable exceptions, parse result.responseText to a json object in order to get at the details of the error message being returned from WCF. The WCF details reside in a property called ExceptionDetail. So the key properties to remember are responseText and ExceptionDetail.

function AjaxFailed(result) {
    var res = JSON.parse(result.responseText);
    if (res.ExceptionDetail) {
        alert(res.Message);
        return;
    }
}

Fairly easy, but a lot of small things can go wrong. It took me a couple of days of trial and error (and a lot of Fiddling) to get it all 100% working. It was a breeze :D

Friday, February 26, 2010

Directory Search with multiple filters in .Net

Neither Directory.GetFiles in .Net 3.5 nor Directory.EnumerateFiles in .Net 4.0 supports multiple patterns when searching for files. The reason is that they use FindFirstFile / FindNextFile from kernel32.dll, which lacks the support.

The initial thought would be to create an extension method on the Directory class, but since it’s a static class that’s not possible. The second best choice is to create a short helper class instead. What we do is a wildcard search with “*” and then filter the results with a regular expression.

If you return a large result set, the new Enumerable version in .Net 4.0 is preferable, as it returns values to act on as you go along.

public static class MyDirectory
{
    // Works in .Net 3.5 - you might want to create several overloads.
    // Note: this has the same parameter list as the .Net 4.0 version below,
    // so keep only the one matching your framework version.
    public static string[] GetFiles(string path, string searchPatternExpression, SearchOption searchOption)
    {
        if (searchPatternExpression == null) searchPatternExpression = string.Empty;
        Regex reSearchPattern = new Regex(searchPatternExpression);
        return Directory.GetFiles(path, "*", searchOption).Where(file => reSearchPattern.IsMatch(Path.GetFileName(file))).ToArray();
    }

    // Works in .Net 4.0 - inferred overloads with default values
    public static IEnumerable<string> GetFiles(string path, string searchPatternExpression = "", SearchOption searchOption = SearchOption.TopDirectoryOnly)
    {
        Regex reSearchPattern = new Regex(searchPatternExpression);
        return Directory.EnumerateFiles(path, "*", searchOption).Where(file => reSearchPattern.IsMatch(Path.GetFileName(file)));
    }

    // Works in .Net 4.0 - takes the same patterns as the old method, and executes in parallel
    public static IEnumerable<string> GetFiles(string path, string[] searchPatterns, SearchOption searchOption = SearchOption.TopDirectoryOnly)
    {
        return searchPatterns.AsParallel().SelectMany(searchPattern => Directory.EnumerateFiles(path, searchPattern, searchOption));
    }
}



Wednesday, February 17, 2010

Blazing fast IPC in .Net 4: WCF vs. Signaling and Shared Memory

[Update 2011-02-02: Did a test against NamedPipeServerStream and NamedPipeClientStream which I mention in a comment at the end]

An MSDN article from 2007 compares the speed of WCF vs. .Net Remoting, and shows the speed increase WCF gives over remoting in an IPC scenario using named pipes as transport. With the introduction of the System.IO.MemoryMappedFiles namespace in .Net 4, and a blog post by Salva Patuel which outlines that almost all communication inside Windows uses memory mapped files at its core, I had to try this myself with the new capabilities in the .Net 4 framework.
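
To give a taste of the new namespace, here is a minimal shared memory sketch. The names are placeholders, and the writer and reader would normally live in separate processes:

using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

class SharedMemoryWriter
{
    static void Main()
    {
        // one process creates (or opens) the named map; the peer opens the same name
        using (var mmf = MemoryMappedFile.CreateOrOpen("Demo_SharedMem", 1024))
        using (var dataReady = new EventWaitHandle(false, EventResetMode.AutoReset, "Demo_DataReady"))
        using (var view = mmf.CreateViewAccessor())
        {
            view.Write(0, 42L); // write a payload into the shared memory
            dataReady.Set();    // signal the peer that data is available
            // the reading side would do: dataReady.WaitOne(); long v = view.ReadInt64(0);
        }
    }
}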

Sunday, January 24, 2010

Code improvement tools - NDepend

After I released the first version of Disk Based Data Structures, a library for persisting collections on disk (Dictionary<>, List<>), I had the opportunity to try NDepend. As I worked towards a bugfixing/refactoring release, I used NDepend on the code to clue me in to where I should focus my efforts.

If you haven’t used NDepend, it’s basically a tool which gives you a lot of metrics on your code base along with visual representations.

Code metrics can be scary at first, but once you understand what they tell you, they really help. For instance, in the “Abstractness vs. Instability” chart my assemblies are plotted in the green area at the bottom right. This is due to the fact that I don’t expose many interfaces at all. The library is a concrete implementation of already established interfaces. So it’s ok to be down in the right corner.

I’ve also changed the visibility from public to internal/private for many classes which were only used internally, in order to provide a cleaner public interface. All prompted by a metric:

WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE CouldBePrivate

My Serializer assembly has a very low relational cohesion, which actually went down from 1.28 to 1.25. And it’s expected, since I added one more serializer to the project, and none of the serializers have anything to do with each other. But it makes more sense to bundle them up together than to have multiple assemblies.

The most useful metrics for my refactor release were identifying long methods and the ones with high cyclomatic complexity. This allowed me to break them up into more understandable code pieces. I try to write short code, but sometimes you forget. By using a metric tool it’s easy to find those pains and bring them out into the open, especially in old code. We all know our old code is worse than what we write today ;)



Another interesting fact is that while my code grew 40% in line count, my comment coverage increased by 2%. So during the refactoring process I wrote more comments. Not only did I break up the code, I documented it as well.

NDepend has its own query language so you can easily create your own code insights, or you can modify the existing ones.

All in all, I’m glad I stumbled over NDepend, and it’s become as natural an addition as R#, and I will most likely include it in the automatic build process in future projects.

Saturday, January 9, 2010

Disk based data structures – Release 2

The second release is now available for download at Codeplex.

Changelog

  • Dictionary<TKey,TValue> class now persists all data to disk, so you should not run out of memory on a 64bit system. Only available disk space matters.
  • Strings can now be used for key/values. Strings don’t have a default empty constructor so I’ve added code to make them work.
  • I’ve included protobuf-net (Google Protocol Buffers) as a serializer. It’s very fast and efficient on size, but requires decorating your classes either with DataContract/DataMember attributes or ProtoContract/ProtoMember attributes (see the sketch after this list). Check out the Getting Started section on protobuf-net.
  • Improved locking throughout the code.
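
To illustrate the decoration protobuf-net expects, here is a small sketch (the Person type is a made-up example):

using ProtoBuf;

[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string Name { get; set; }

    [ProtoMember(2)]
    public int Age { get; set; }
}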

Wednesday, December 23, 2009

Customizing the disabled look of a RichTextBox in winforms

When you set the .Enabled property of a control to false, it renders with the disabled color of the operating system. Sometimes you might just want a different look.

In order to accomplish this in a winform you need to perform a small workaround. If not, you will end up with a blinking cursor in the control. Here’s an example in the form of an extension method.

public static class MyExtensions
{
    public static void Disable(this Control control, Control focusTarget)
    {
        control.TabStop = false;
        control.BackColor = Color.DimGray;
        control.Cursor = Cursors.Arrow;
        control.Enter += delegate { focusTarget.Focus(); };
    }
}

In order for this to work you need to pass in another control which can receive focus. This could be a label or some other control on the form.

You would also need to create an Enable method and add some logic to remember the previous background color, as well as set the correct Cursor state.
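
An Enable counterpart, sitting next to Disable in MyExtensions, could look something like the sketch below. Passing in the remembered color is just one possible approach:

public static void Enable(this Control control, Color originalColor)
{
    control.TabStop = true;
    control.BackColor = originalColor; // the color remembered before Disable was called
    control.Cursor = Cursors.IBeam;
}

Note that you would also have to detach the Enter handler added by Disable, or the control will keep handing focus away.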

Thursday, November 12, 2009

Disk based data structures

Last year I created a project where I used memory mapped files as storage for a large Array. I’ve now polished the project a bit and included generic List and Dictionary implementations as well. The project can be found at Disk Based Data Structures - CodePlex.

I’ve also created a serializer project which benchmarks and picks the fastest serialization method for your type. This serializer is used to persist the data to disk. The classes are also implemented to be thread safe.

Background for the project

A disk based version of an array would require a lot of caching logic to make it perform fast enough compared to a pure memory implementation. A couple of years ago I stumbled across memory mapped files, which have long existed in operating systems and are typically used for the swap space.

The first time I worked with memory mapped files I used a library from MetalWrench, but this time around I got hold of Winterdom's much nicer implementation of the Win32 API. I've included the patch from Steve Simpson, but removed the dynamic paging since it slows things down and it's not necessary on 64bit systems. (If you want to use arrays which hold over 2gb of data on 32bit systems, I recommend reverting to Steve's original version and setting a view size of 200-500mb.) Future releases will use .Net 4.0’s System.IO.MemoryMappedFiles namespace.

The beauty of 64bit is that you have virtually unlimited address space, so each thread can get its own view of the mapped file without running out of address space. 32bit Windows can only address 4gb.

As for performance, my theory is that Microsoft has implemented a fairly good caching algorithm for its swap file, so it should prove good enough for me. A few tests show much better disk IO with the Memory Mapped API than with .Net's file IO library. I haven't tested the performance with the SEC_LARGE_PAGES flag, but it might help some.

Hope this library is useful for someone out there :)

Monday, September 14, 2009

Early creation of public properties

I’m battling with myself a bit over a public API I’m writing. Should I initialize all my public properties in the default constructor, or should I initialize them when accessed?

Having a public property returning null is not an option in my opinion, as it leads to extra checking by the consumer of the API. Grunt work which the API should do for you.

Consider the following two classes:

public class MyClass
{
    public List<string> List { get; set; }

    public MyClass()
    {
        List = new List<string>();
    }
}

and

public class MyClass
{
    private List<string> _list;
    public List<string> List
    {
        get
        {
            if (_list == null)
            {
                _list = new List<string>();
            }
            return _list;
        }
        set { _list = value; }
    }

    public MyClass()
    {
    }
}


The pros of the first class are that it’s short and very easy to read. The pros of the second class are that it’s more optimized in terms of memory usage when the property is not always used.

My API has an object structure of about 20 classes, which may or may not be set. Some might be used more frequently, favoring the first class, while others are infrequent and would favor the second.

Having both implementations seems a bit inconsistent, so the big question is: should I favor the easy read, or the optimized? If the object structure is being created often, will creating all these extra objects be bad for the CLR, or doesn’t it matter?

It might be that benchmarking is the way to go to give the final answer, but any comments on the matter are appreciated.

Thursday, July 23, 2009

Removing Exif data – continued

Seems like the underlying jpeg library is a bit broken on some OSes when using the JpegEncoder/JpegDecoder.

Since I only wanted to remove the Exif data, and not modify it, I ended up with a byte patcher instead. A matter of taking control :)

using System.IO;

namespace ExifRemover
{
    public class JpegPatcher
    {
        public Stream PatchAwayExif(Stream inStream, Stream outStream)
        {
            byte[] jpegHeader = new byte[2];
            jpegHeader[0] = (byte)inStream.ReadByte();
            jpegHeader[1] = (byte)inStream.ReadByte();
            if (jpegHeader[0] == 0xff && jpegHeader[1] == 0xd8) //check if it's a jpeg file
            {
                SkipAppHeaderSection(inStream);
            }
            // write back the two bytes we read, so non-jpeg files pass through unchanged
            outStream.WriteByte(jpegHeader[0]);
            outStream.WriteByte(jpegHeader[1]);

            int readCount;
            byte[] readBuffer = new byte[4096];
            while ((readCount = inStream.Read(readBuffer, 0, readBuffer.Length)) > 0)
                outStream.Write(readBuffer, 0, readCount);

            return outStream;
        }

        private void SkipAppHeaderSection(Stream inStream)
        {
            byte[] header = new byte[2];
            header[0] = (byte)inStream.ReadByte();
            header[1] = (byte)inStream.ReadByte();

            // APP segment markers (0xffe0-0xffef) hold Exif and other metadata
            while (header[0] == 0xff && (header[1] >= 0xe0 && header[1] <= 0xef))
            {
                int exifLength = inStream.ReadByte();
                exifLength = exifLength << 8;
                exifLength |= inStream.ReadByte();

                // the segment length includes the two length bytes themselves
                for (int i = 0; i < exifLength - 2; i++)
                {
                    inStream.ReadByte();
                }
                header[0] = (byte)inStream.ReadByte();
                header[1] = (byte)inStream.ReadByte();
            }
            inStream.Position -= 2; // rewind the two bytes we read past the APP sections
        }
    }
}



Tuesday, July 21, 2009

Remove Exif data from image files with C# and WPF libraries


(For my final solution check out Exif continued..)


A colleague of mine e-mailed me with a problem he had. He was developing a solution where the customer wanted all exif data to be removed from the images they provide on the web. He had tried a bit with no luck.

Since the image libraries in WPF are far superior to the ones in winforms, I gave it a shot. I googled around, read the exif spec and came up with the code below. The image is read, then I loop over all exif properties and blank them out. It might work just as well to remove them, but by blanking them the file doesn’t change header-wise. Properties pertaining to the image characteristics, such as width and height, are skipped. You can check them against the exif spec. I have only tried the code on jpeg images, and I didn’t have one with GPS coordinates in it, but in theory it should remove GPS coordinates as well.

I skipped the metadata.TrySave() altogether since it didn’t work when I used the SetQuery method. If I just changed the metadata properties it worked. It’s easy to put this back in, and you find a discussion about it in one of the links at the bottom.

using System;
using System.IO;
using System.Windows.Media.Imaging;

namespace ExifRemover
{
    public class ExifReader
    {
        public void SetUpMetadataOnImage(string filename)
        {
            string tempName = Path.Combine(Path.GetDirectoryName(filename), Guid.NewGuid().ToString());
            // open image file to read
            using (Stream file = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                // create the decoder for the original file. The BitmapCreateOptions and BitmapCacheOption denote
                // a lossless transcode. We want to preserve the pixels and cache it on load. Otherwise, we will lose
                // quality or even not have the file ready when we save, resulting in 0b of data written
                BitmapDecoder original = BitmapDecoder.Create(file, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.None);
                // create an encoder for the output file
                BitmapEncoder output = null;
                string ext = Path.GetExtension(filename);
                switch (ext)
                {
                    case ".png":
                        output = new PngBitmapEncoder();
                        break;
                    case ".jpg":
                        output = new JpegBitmapEncoder();
                        break;
                    case ".tif":
                        output = new TiffBitmapEncoder();
                        break;
                }

                if (original.Frames[0] != null && original.Frames[0].Metadata != null)
                {
                    // So, we clone the object since it's frozen.
                    BitmapFrame frameCopy = (BitmapFrame)original.Frames[0].Clone();
                    BitmapMetadata metadata = original.Frames[0].Metadata.Clone() as BitmapMetadata;

                    StripMeta(metadata);

                    // finally, we create a new frame that has all of this new metadata, along with the data that was in the original message
                    output.Frames.Add(BitmapFrame.Create(frameCopy, frameCopy.Thumbnail, metadata, frameCopy.ColorContexts));
                }
                // finally, save the new file over the old file
                using (Stream outputFile = File.Open(tempName, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
                {
                    output.Save(outputFile);
                }
            }
            File.Delete(filename);
            File.Move(tempName, filename);
        }

        public void StripMeta(BitmapMetadata metaData)
        {
            for (int i = 270; i < 42016; i++)
            {
                if (i == 274 || i == 277 || i == 284 || i == 530 || i == 531 || i == 282 || i == 283 || i == 296) continue;

                string query = "/app1/ifd/exif:{uint=" + i + "}";
                BlankMetaInfo(query, metaData);

                query = "/app1/ifd/exif/subifd:{uint=" + i + "}";
                BlankMetaInfo(query, metaData);

                query = "/ifd/exif:{uint=" + i + "}";
                BlankMetaInfo(query, metaData);

                query = "/ifd/exif/subifd:{uint=" + i + "}";
                BlankMetaInfo(query, metaData);
            }

            for (int i = 0; i < 4; i++)
            {
                string query = "/app1/ifd/gps/{ulong=" + i + "}";
                BlankMetaInfo(query, metaData);
                query = "/ifd/gps/{ulong=" + i + "}";
                BlankMetaInfo(query, metaData);
            }
        }

        private void BlankMetaInfo(string query, BitmapMetadata metaData)
        {
            object obj = metaData.GetQuery(query);
            if (obj != null)
            {
                if (obj is string)
                    metaData.SetQuery(query, string.Empty);
                else
                {
                    ulong dummy;
                    if (ulong.TryParse(obj.ToString(), out dummy))
                    {
                        metaData.SetQuery(query, 0);
                    }
                }
            }
        }
    }
}



Sunday, February 1, 2009

Going unsafe in managed code – give me speed!

After doing the array comparison article, my mind has been working subconsciously on another matter I’ve thought about for several years: What is the fastest possible way to serialize/deserialize an object in .Net?

One way is using the built-in serialization in .Net with a BinaryFormatter or a SoapFormatter. This is the most general way and works for “all” cases. If you know a bit more about the data you want to serialize you can improve speed quite a lot.

In my article Using memory mapped files to conserve physical memory for large arrays I solve the serialization of structs or value types and use Marshal.StructureToPtr and Marshal.Copy in order to get a byte array I can write to disk afterwards (because I didn’t know better at the time). This will work for any struct with only value types in it. My weekend testing showed that if I use explicit layout on a struct or class, we can omit the Marshal.StructureToPtr step and use Marshal.Copy.
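
For reference, the StructureToPtr/Copy approach looks roughly like this (the Sample struct is just an illustration):

using System;
using System.Runtime.InteropServices;

struct Sample // any struct containing only value types
{
    public int Id;
    public double Value;
}

static class StructBytes
{
    // copy a struct into a byte array via unmanaged memory
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] buffer = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);
            Marshal.Copy(ptr, buffer, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
        return buffer;
    }
}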

Now over to the unsafe bit. By using pointers directly and skipping the marshalling altogether we improve speed even more. This fueled me to continue on my Disk Based Dictionary project, which will benefit both from memory mapped files and fast serializing. My approach will be to analyze the type of object being used. If it’s an object with explicit layout or a base value type, I will use fast pointer to pointer copying. If it’s an object with only value types but implicit layout, I’ll go with StructureToPtr. For an object with reference types I will use normal serialization, or check if it implements a BinaryWriter/BinaryReader interface for writing out the values manually.

The library will then work for the lazy coder who doesn’t need killer performance, but also for the conscientious ones who care about speed.

If I’m lucky with inspiration I’ll have it done this week before I go to Vegas.

If you’re wondering why I bother with these things, it’s because I used to work with search engines, where speed vs. memory is a big issue. In my current job doing SharePoint consulting it’s all a waste of time, since the SQL server will always be the bottleneck :)

Monday, January 26, 2009

C# Javascript library

Came across this piece in the January edition of MSDN Magazine. It talks about creating javascript for AJAX apps using C# and Visual Studio. Sounds too good to be true :) The library can be fetched at projects.nikhilk.net/ScriptSharp

I know I will take a look at it the next time a project requires some nifty scripting.

Wednesday, January 14, 2009

Fast byte array comparison in C#

I got into a discussion with a colleague the other day about string comparison in .Net and whether to use

variable.Equals("mystring")

or

"string" == "string"

both in terms of speed (though it wouldn’t matter in most cases) and in terms of readability. As for speed, .Equals is faster, as you save one method call. == is implemented as an operator which in turn calls Equals. Our good friend Reflector is always there when you need him.


The interesting part came when reflecting over this: I stumbled upon EqualsHelper and CompareOrdinalHelper. Here .Net casts the strings to pointer arrays and compares an int at a time. This led me to create a byte[] comparison function modeled on the same code .Net uses internally, and to benchmark it.


For an equal array with 11 elements the unsafe version is 3 times as fast. For unequal arrays the managed implementation is quicker if the first or second byte differs. From the third byte onwards the unsafe version gains speed. Below is some sample code you can experiment with yourself. The longer the array, the more you gain on the unsafe version. Be sure to compile the code in release mode when testing.


Microsoft doesn’t recommend using unsafe unless it’s performance critical, but since they use it internally we can as well ;) (But you should have a good reason, due to the complexity, imo.) Why they compare 10 bytes at a time is beyond me, and I haven’t tested if this is some magic number which yields good results for general cases.


// Requires using System, System.Diagnostics and System.Runtime.ConstrainedExecution
class Program
{
    static void Main(string[] args)
    {
        byte[] a = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        byte[] b = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };

        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 30000000; i++)
        {
            SafeEquals(a, b);
        }
        sw.Stop();
        Console.WriteLine(sw.Elapsed);

        sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 30000000; i++)
        {
            UnSafeEquals(a, b);
        }
        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }

    private static bool SafeEquals(byte[] strA, byte[] strB)
    {
        int length = strA.Length;
        if (length != strB.Length)
        {
            return false;
        }
        for (int i = 0; i < length; i++)
        {
            if (strA[i] != strB[i]) return false;
        }
        return true;
    }

    // Direct port of .Net's internal string comparison. Note that the tail loop
    // reads 4 bytes at a time while only advancing 2, so it can read a couple of
    // bytes past the end of the array for lengths that are not a multiple of 4.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
    private static unsafe bool UnSafeEquals(byte[] strA, byte[] strB)
    {
        int length = strA.Length;
        if (length != strB.Length)
        {
            return false;
        }
        fixed (byte* str = strA)
        {
            byte* chPtr = str;
            fixed (byte* str2 = strB)
            {
                byte* chPtr2 = str2;
                byte* chPtr3 = chPtr;
                byte* chPtr4 = chPtr2;
                // compare 10 bytes per round trip, one int (4 bytes) at a time
                while (length >= 10)
                {
                    if ((((*(((int*)chPtr3)) != *(((int*)chPtr4))) || (*(((int*)(chPtr3 + 2))) != *(((int*)(chPtr4 + 2))))) || ((*(((int*)(chPtr3 + 4))) != *(((int*)(chPtr4 + 4)))) || (*(((int*)(chPtr3 + 6))) != *(((int*)(chPtr4 + 6)))))) || (*(((int*)(chPtr3 + 8))) != *(((int*)(chPtr4 + 8)))))
                    {
                        break;
                    }
                    chPtr3 += 10;
                    chPtr4 += 10;
                    length -= 10;
                }
                while (length > 0)
                {
                    if (*(((int*)chPtr3)) != *(((int*)chPtr4)))
                    {
                        break;
                    }
                    chPtr3 += 2;
                    chPtr4 += 2;
                    length -= 2;
                }
                return (length <= 0);
            }
        }
    }
}