Thursday, July 23, 2009

Removing Exif data – continued

It seems the underlying JPEG library is a bit broken on some operating systems when using the JpegEncoder/JpegDecoder.

Since I only wanted to remove the Exif data, and not modify it, I ended up with a byte patcher instead. A matter of taking control :)

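using System.IO;

namespace ExifRemover
{
    public class JpegPatcher
    {
        // Copies inStream to outStream, leaving out the APPn segments
        // (markers 0xFFE0-0xFFEF, which include the Exif data) that
        // directly follow the JPEG start-of-image marker.
        // inStream must be seekable, since SkipAppHeaderSection rewinds it.
        public Stream PatchAwayExif(Stream inStream, Stream outStream)
        {
            byte[] jpegHeader = new byte[2];
            jpegHeader[0] = (byte)inStream.ReadByte();
            jpegHeader[1] = (byte)inStream.ReadByte();
            if (jpegHeader[0] == 0xff && jpegHeader[1] == 0xd8) // check if it's a JPEG file (SOI marker)
            {
                SkipAppHeaderSection(inStream);
            }
            // Write back the two bytes that were actually read, so that
            // input that is not a JPEG is copied through unchanged
            outStream.WriteByte(jpegHeader[0]);
            outStream.WriteByte(jpegHeader[1]);

            // Copy the rest of the stream as-is
            int readCount;
            byte[] readBuffer = new byte[4096];
            while ((readCount = inStream.Read(readBuffer, 0, readBuffer.Length)) > 0)
                outStream.Write(readBuffer, 0, readCount);

            return outStream;
        }

        private void SkipAppHeaderSection(Stream inStream)
        {
            byte[] header = new byte[2];
            header[0] = (byte)inStream.ReadByte();
            header[1] = (byte)inStream.ReadByte();

            // Skip every consecutive APP0-APP15 segment
            while (header[0] == 0xff && (header[1] >= 0xe0 && header[1] <= 0xef))
            {
                // The segment length is a big-endian 16-bit value that counts its own two bytes
                int exifLength = inStream.ReadByte();
                exifLength = exifLength << 8;
                exifLength |= inStream.ReadByte();

                for (int i = 0; i < exifLength - 2; i++)
                {
                    inStream.ReadByte();
                }
                header[0] = (byte)inStream.ReadByte();
                header[1] = (byte)inStream.ReadByte();
            }
            inStream.Position -= 2; // skip back two bytes so the first non-APPn marker is copied to the output
        }
    }
}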
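
To make the segment layout the patcher relies on a bit more concrete, here is a minimal sketch that walks the APPn segments of a hand-built byte array and prints each marker with its length. The class name JpegSegmentSketch, the method ListAppSegments and the sample bytes are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class JpegSegmentSketch
{
    // Returns "APPn:length" for each APPn segment that directly follows the SOI marker.
    public static List<string> ListAppSegments(Stream stream)
    {
        var result = new List<string>();
        if (stream.ReadByte() != 0xff || stream.ReadByte() != 0xd8)
            return result; // no SOI marker, so not a JPEG

        while (true)
        {
            int prefix = stream.ReadByte();
            int marker = stream.ReadByte();
            if (prefix != 0xff || marker < 0xe0 || marker > 0xef)
                break; // end of stream, or the first marker that is not APP0-APP15

            int hi = stream.ReadByte();
            int lo = stream.ReadByte();
            if (hi < 0 || lo < 0)
                break; // truncated input

            int length = (hi << 8) | lo; // big-endian, includes the two length bytes
            result.Add(string.Format("APP{0}:{1}", marker - 0xe0, length));

            // Skip the payload; the two length bytes have already been consumed
            for (int i = 0; i < length - 2; i++)
            {
                if (stream.ReadByte() < 0)
                    return result; // truncated input
            }
        }
        return result;
    }

    public static void Main()
    {
        byte[] sample =
        {
            0xff, 0xd8,                         // SOI
            0xff, 0xe1, 0x00, 0x06, 1, 2, 3, 4, // APP1, length 6 = 2 length bytes + 4 payload bytes
            0xff, 0xdb                          // first non-APPn marker, the walk stops here
        };
        foreach (string segment in ListAppSegments(new MemoryStream(sample)))
            Console.WriteLine(segment); // prints "APP1:6"
    }
}
```

The length field counting its own two bytes is the reason the skip loop in the patcher runs to exifLength - 2.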



17 comments:

  1. Hi Mikael, I'm using your remove EXIF data in an application of mine. It has processed 1000s of images flawlessly. But today I've hit a case where it corrupts the image. If you are interested I can email thru the 'problem' files. Shoot me an email on russsell "dot" sayers -at- gmail 'dot' com

  2. I forgot to take into account if the file is missing an exif header. The code is now corrected. If you send in a file without an exif header it will write out a file equal to the input file.

  3. Thanks for your help Mikael! I found this reference very useful when debugging the JPEG header code: http://www.media.mit.edu/pia/Research/deepview/exif.html

  4. Appreciate the link. I used http://gvsoft.homedns.org/exif/exif-explanation.html#JpegMarker. They seem to explain about the same :)

  5. Great. Came in really handy. Thanks for the post

  6. I adapted this a bit and was able to "skip the exif data" in the stream, allowing me to better find duplicate jpeg images. Thanks for the leg up!

  7. Hi Mikael, I have used http://techmikael.blogspot.in/2009/07/remove-exif-data-from-image-files-with.html and it works fine for JPEG files,
    but for PNG and TIF files the final output file cannot be viewed; the error is "the image cannot be displayed because it contains errors."
    Please let us know how to use JpegPatcher to remove Exif data from those image files.
    Please help me.

    thanks
    Ravi Kiran

    Replies
    1. Hi,
      You would need to read the specs of the tiff and png file formats in order to parse them correctly, and then modify the code accordingly.

    2. must be

      outStream.WriteByte(jpegHeader[0])
      outStream.WriteByte(jpegHeader[1])

      instead of writing 0xff, 0xd8

    3. So you are saying the header skip code would work for other file formats as well, if you basically remove the byte check and write the IDs back?

  8. Hi Mikael!
    Is there any way to use this code to delete specific EXIF fields?

    Many thanks!
    Josh

    Replies
    1. You would need to extend the code to parse EXIF headers to do this, which is not part of the code.

  9. Hi there Mikael, thank you very much for your code.
    I made a small modification: since the output stream is not required as a parameter, it's better to create it inside the body of the function, like this:

    public Stream PatchAwayExif(Stream inStream)
    {
        Stream outStream = new MemoryStream();
        byte[] jpegHeader = new byte[2];
        jpegHeader[0] = (byte)inStream.ReadByte();
        jpegHeader[1] = (byte)inStream.ReadByte();
        if (jpegHeader[0] == 0xff && jpegHeader[1] == 0xd8) //check if it's a jpeg file
        {
            SkipAppHeaderSection(inStream);
        }
        outStream.WriteByte(0xff);
        outStream.WriteByte(0xd8);

        int readCount;
        byte[] readBuffer = new byte[4096];
        while ((readCount = inStream.Read(readBuffer, 0, readBuffer.Length)) > 0)
            outStream.Write(readBuffer, 0, readCount);

        return outStream;
    }

    Replies
    1. Hi,
      Sure, that works, but if you already know the output stream, it's unnecessary to assume a MemoryStream is the best intermediate, allocating more memory than needed. If you pass in a FileStream as input and output, then you just read and write directly from those streams, reducing the memory footprint.

    2. Now I have a question: how do we know the output stream? The function needs it as a parameter, but I'm not clear on where to get it from.
      So I assumed a MemoryStream to store the "tmp" value and assign it to a stream variable; either way, that memory stream lives for as long as the function is alive.

    3. Hi,
      That's up to the programmer working on the solution to decide. If you want a memory stream, pass that in. If you are writing to an HTTP request, pass that. It's up to each specific stream to implement its read/write methods; the code should not expect a MemoryStream, which is often redundant and leaves you with an extra memory copy.

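
The stream discussion in the comments can be sketched roughly as follows: one method taking both streams, called once with a MemoryStream and once with a file as the target. This is only an illustration under a few assumptions. PatcherUsageSketch and StripAppSegments are hypothetical names, the method is a compact stand-in for PatchAwayExif rather than the post's code, it writes back the bytes it actually read (as suggested in the replies to comment 7) instead of rewinding the input, and it uses Stream.CopyTo, which needs .NET 4.0 or later.

```csharp
using System;
using System.IO;

public static class PatcherUsageSketch
{
    // Compact stand-in for PatchAwayExif: drops the APPn segments that follow
    // the SOI marker and copies everything else to outStream.
    public static void StripAppSegments(Stream inStream, Stream outStream)
    {
        int b0 = inStream.ReadByte();
        int b1 = inStream.ReadByte();
        // Write back whatever was read, so non-JPEG input passes through unchanged
        if (b0 >= 0) outStream.WriteByte((byte)b0);
        if (b1 >= 0) outStream.WriteByte((byte)b1);

        if (b0 == 0xff && b1 == 0xd8)
        {
            while (true)
            {
                int prefix = inStream.ReadByte();
                int marker = inStream.ReadByte();
                if (prefix == 0xff && marker >= 0xe0 && marker <= 0xef)
                {
                    // Big-endian length that includes its own two bytes
                    int length = (inStream.ReadByte() << 8) | inStream.ReadByte();
                    for (int i = 0; i < length - 2; i++)
                        inStream.ReadByte();
                    continue;
                }
                // Not an APPn marker: emit the bytes instead of seeking back
                if (prefix >= 0) outStream.WriteByte((byte)prefix);
                if (marker >= 0) outStream.WriteByte((byte)marker);
                break;
            }
        }
        inStream.CopyTo(outStream);
    }

    public static void Main()
    {
        // SOI, one APP1 segment with a 2-byte payload, then a non-APPn marker and one data byte
        byte[] jpeg = { 0xff, 0xd8, 0xff, 0xe1, 0x00, 0x04, 0xaa, 0xbb, 0xff, 0xdb, 0x01 };

        // The caller wants the result in memory, so it passes a MemoryStream
        using (var input = new MemoryStream(jpeg))
        using (var memory = new MemoryStream())
        {
            StripAppSegments(input, memory);
            Console.WriteLine(BitConverter.ToString(memory.ToArray())); // prints "FF-D8-FF-DB-01"
        }

        // The caller wants a file, so it passes a FileStream; the method is unchanged
        string path = Path.GetTempFileName();
        using (var input = new MemoryStream(jpeg))
        using (var file = File.Create(path))
        {
            StripAppSegments(input, file);
        }
        Console.WriteLine(new FileInfo(path).Length); // prints "5"
        File.Delete(path);
    }
}
```

Because the method never touches Position, this variant should also work with input streams that cannot seek, at the cost of a slightly longer loop.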