ExifLibrary for .NET

19 Nov 2009
An Exif Metadata library for the .NET Framework

Introduction

During my last summer vacation, I used a GPS navigation device while traveling along the western coast of Turkey. I took many pictures along the way, and when I got back home, I wanted to write a small utility to tag the pictures with geolocation information from my GPS. Although the .NET Framework has some support for Exif metadata, it is not very user-friendly: the framework reads the standard Exif tags, but it returns unprocessed raw bytes. Hence, I decided to write my own Exif metadata utility, and this library was born.

Disclaimer

This library currently does not understand the vendor-specific MakerNote tag. If you are looking for a complete Exif metadata library, you may want to take a look at the excellent ExifTool Perl library by Phil Harvey.

Using the Library

To extract Exif metadata from a JPEG/Exif image, call the static ExifFile.Read method with the path to the image file. ExifFile reads the APP1 section and extracts all Exif tags, along with the embedded thumbnail (if any). For ease of use, the library converts Exif tags to either native .NET types or custom classes. For example, date fields are returned as DateTime structures, and GPS coordinates are wrapped in the custom GPSLatitudeLongitude class.

You can save the Exif metadata with the image using the Save method of the class. The writer will replace the APP1 section of the original image with the modified metadata.

// Extract exif metadata
ExifFile file = ExifFile.Read("path_to_my_image");

// Read metadata
foreach (ExifProperty item in file.Properties.Values)
{
    // Do something with meta data
}

// Get the thumbnail image
Image thumb = file.ThumbnailImage;

// Set the date time to now
file.Properties[ExifTag.DateTime].Value = DateTime.Now;
// Modify GPS location
GPSLatitudeLongitude location =
    file.Properties[ExifTag.GPSLatitude]
    as GPSLatitudeLongitude;
location.Degrees.Set(22, 0);

// Save exif data with the image
file.Save("path_to_my_image");

You can also remove all or some of the Exif metadata from the image before saving.

// Extract exif metadata
ExifFile file = ExifFile.Read("path_to_my_image");
// Clear metadata
file.Properties.Clear();
// Save exif data with the image
file.Save("path_to_my_image");

JPEG, JFIF, Exif: What do They Mean?

JPEG ("Joint Photographic Experts Group") is the committee that created the JPEG standard. It is also the name of the compression method (the codec) defined by the JPEG committee. JPEG is not a file format. (Actually, there is a "pure" file format – JPEG Interchange Format, JIF – described in the original JPEG specification. But it is rarely used.) The most widely used file formats containing JPEG compression are JFIF (JPEG File Interchange Format) and Exif (Exchangeable Image File Format). In everyday use, JPEG usually means a JFIF or an Exif image file.

The difference between JFIF and Exif file formats is that JFIF files use Application Marker 0 (APP0) sections to store metadata, whereas Exif files use Application Marker 1 (APP1) sections. The two file formats are incompatible because they both specify that their sections (APP0 and APP1) must be the first in the image file. In practice, however, Exif files usually include an APP0 section at the start of the image file. This does not comply with the Exif standard, but allows old JFIF readers to read the image file.

A modern JFIF or Exif reader must not assume a particular order for APPn sections. It should read the entire file and process APPn sections as it encounters them. Additionally, APPn sections might not be unique. For example, there might be more than one APP1 section in a JPEG/Exif file.

Reading Metadata from a JPEG/Exif File

Here is a graphical view of a JPEG/Exif file. Since Exif metadata is contained in the APP1 section, I have detailed APP1 only.

The Structure of a JPEG/Exif File

A JPEG/Exif file starts with the Start of Image (SOI) marker. The SOI consists of two magic bytes, (0xFF, 0xD8), which identify the file as a JPEG file. Following the SOI, there are a number of Application Marker (APPn) sections and sections for compressed image data.

Application Marker Sections (APPn)

In order to identify APPn sections, we start from the SOI and read the next few bytes. Although contents of the APPn sections vary, the first two bytes are always the APPn marker. For the APP0 section, the marker is (0xFF, 0xE0), for the APP1 section (0xFF, 0xE1), and so on. Marker bytes are followed by two bytes for the size of the section (excluding the APPn marker, including the size bytes). The length field is followed by variable size application data.

The APP1 Section

We are interested in the APP1 section, since this is where Exif metadata is stored. In the APP1 section of a JPEG/Exif file, following the marker and size information, there is a 6-byte Exif marker (0x45, 0x78, 0x69, 0x66, 0x00, 0x00) ('Exif\0\0') identifying the file as a JPEG/Exif image. After that, there is the TIFF header, which contains information about the byte-order (see below) and a pointer to the 0th Image File Directory (IFD). Following the TIFF header, there are the IFD sections. Here is the pseudo-code to read the APPn sections:

// Read SOI (0xFF, 0xD8)
marker = readBytes(2);
if(marker != [0xFF, 0xD8])
    exit("Not a JPEG image!");

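// Read sections until EOI (0xFF, 0xD9)
// Note: the compressed data after the SOS marker (0xFF, 0xDA)
// is not length-prefixed; a real reader must handle it separately.
while(!EOF && (marker = readBytes(2)) != [0xFF, 0xD9])
{
    // The stored size is big-endian and includes the two
    // size bytes themselves; convert as required.
    // Subtract 2 to get the size of the data that follows.
    size = readBytes(2) - 2;

    // Absolute location of the next section
    nextapp = getStreamPosition() + size;

    // Is this the APP1? (0xFF, 0xE1)
    if(marker == [0xFF, 0xE1])
    {
        // Do something with APP1 data
        readAPP1();
    }
    // else if (marker == ...
        // Read other sections as required
        // ...

    // Always continue from the start of the next section,
    // whether or not this one was processed
    seekAbsolute(nextapp);
}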
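The loop above could be written in C# roughly as follows. This is only a minimal sketch, not the code used by the library: the JpegSectionScanner class and its ReadApp1Sections method are hypothetical names, the stream is assumed to be seekable, and the scan simply stops at the SOS marker instead of walking through the compressed image data. It collects every APP1 section it meets, since there may be more than one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of ExifLibrary.
public static class JpegSectionScanner
{
    // Returns the payload (the bytes after the marker and the size field)
    // of every APP1 section found before the compressed image data.
    public static List<byte[]> ReadApp1Sections(Stream stream)
    {
        var sections = new List<byte[]>();
        var reader = new BinaryReader(stream);

        // SOI must be the first two bytes: 0xFF, 0xD8
        if (stream.Length < 2 || reader.ReadByte() != 0xFF || reader.ReadByte() != 0xD8)
            throw new InvalidDataException("Not a JPEG image.");

        // Each section needs at least a 2-byte marker and a 2-byte size
        while (stream.Position + 4 <= stream.Length)
        {
            byte prefix = reader.ReadByte();
            byte marker = reader.ReadByte();
            if (prefix != 0xFF)
                throw new InvalidDataException("Expected a section marker.");

            // EOI (0xD9) ends the file. SOS (0xDA) is followed by compressed
            // data that is not length-prefixed, so this sketch stops there.
            if (marker == 0xD9 || marker == 0xDA)
                break;

            // Big-endian size that counts its own two bytes
            int size = (reader.ReadByte() << 8) | reader.ReadByte();
            if (size < 2)
                throw new InvalidDataException("Invalid section size.");

            byte[] payload = reader.ReadBytes(size - 2);
            if (payload.Length != size - 2)
                throw new InvalidDataException("Truncated section.");

            // APP1 is (0xFF, 0xE1); there may be more than one
            if (marker == 0xE1)
                sections.Add(payload);
        }
        return sections;
    }
}
```

Reading each payload in full before moving on keeps the stream positioned at the next marker, so no explicit seek is needed in this version.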

TIFF Header

The TIFF header holds two important values. The first two bytes of the TIFF header tell us whether the following IFD sections are in little-endian or big-endian byte-order. Since image files are typically transferred between devices, and those devices may have different byte-orders, it is crucial to correctly interpret the byte-order given in the TIFF header. The second important value is the location of the 0th IFD. This location is given as an offset from the start of the TIFF header.

We can now add the pseudo-code to read the APP1 section and the TIFF header:

function readAPP1()
{
    // Do we have the Exif marker?
    if(readBytes(6) == [0x45, 0x78, 0x69, 0x66, 0x00, 0x00])
    {
        // We are now at the TIFF header.
        // Save the offset to the start
        // of the TIFF header.
        // We will need this later on.
        baseoffset = getStreamPosition();

        // Read the IFD byte order
        islittleendian = (readBytes(2) == [0x49, 0x49]);

        // TIFF marker
        readBytes(2); // Should always be 0x002A (42) in the byte order read above

        // Offset to the 0th IFD relative to TIFF header
        // (stored in the byte order read above)
        nextifd = readBytes(4);

        if(nextifd != 0)
        {
            // Read the IFD
            seekAbsolute(baseoffset + nextifd);
            readIFD();
        }
    }
}
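For a concrete version of the header check, here is a small C# sketch that works on the APP1 payload as a byte array. TiffHeaderReader, TiffHeader, and Parse are hypothetical names chosen for this example and are not the library's API; the sketch assumes the array starts at the 'Exif\0\0' marker, so the TIFF header begins at index 6.

```csharp
using System;

// Hypothetical helper for illustration; not part of ExifLibrary.
public static class TiffHeaderReader
{
    public sealed class TiffHeader
    {
        public bool IsLittleEndian;
        public uint Ifd0Offset; // relative to the start of the TIFF header
    }

    // 'app1' is assumed to start at the 6-byte "Exif\0\0" marker.
    public static TiffHeader Parse(byte[] app1)
    {
        byte[] exifMarker = { 0x45, 0x78, 0x69, 0x66, 0x00, 0x00 };
        if (app1.Length < 14)
            throw new ArgumentException("APP1 data is too short.");
        for (int i = 0; i < exifMarker.Length; i++)
            if (app1[i] != exifMarker[i])
                throw new ArgumentException("Missing Exif marker.");

        const int tiffStart = 6;

        // "II" (0x49, 0x49) is little-endian, "MM" (0x4D, 0x4D) is big-endian
        bool little;
        if (app1[tiffStart] == 0x49 && app1[tiffStart + 1] == 0x49)
            little = true;
        else if (app1[tiffStart] == 0x4D && app1[tiffStart + 1] == 0x4D)
            little = false;
        else
            throw new ArgumentException("Unknown byte order.");

        // The TIFF marker is 42, stored in the byte order just read
        if (ReadUInt16(app1, tiffStart + 2, little) != 42)
            throw new ArgumentException("Missing TIFF marker.");

        return new TiffHeader
        {
            IsLittleEndian = little,
            Ifd0Offset = ReadUInt32(app1, tiffStart + 4, little)
        };
    }

    static ushort ReadUInt16(byte[] d, int o, bool little)
    {
        return little
            ? (ushort)(d[o] | (d[o + 1] << 8))
            : (ushort)((d[o] << 8) | d[o + 1]);
    }

    static uint ReadUInt32(byte[] d, int o, bool little)
    {
        return little
            ? (uint)d[o] | ((uint)d[o + 1] << 8) | ((uint)d[o + 2] << 16) | ((uint)d[o + 3] << 24)
            : ((uint)d[o] << 24) | ((uint)d[o + 1] << 16) | ((uint)d[o + 2] << 8) | (uint)d[o + 3];
    }
}
```

An offset of 8 is common in practice, because the 0th IFD often follows the 8-byte TIFF header directly, but a reader should not rely on that.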

Image File Directories (IFD)

The APP1 section consists of a number of Image File Directories. The offset to the 0th IFD is given in the TIFF header as an offset from the start of the header. The remaining IFDs are referenced in different places. The offsets to the Exif IFD and the GPS IFD are given in the 0th IFD fields. The offset to the first IFD is given after the 0th IFD fields. The offset to the Interoperability IFD is given in the Exif IFD.

Each IFD contains a number of fields. The field count is given in the first two bytes of the IFD, followed by the 12-byte fields themselves. After the fields, there is a 4-byte offset from the start of the TIFF header to the start of the first IFD. This value is meaningful only for the 0th IFD. Following this, there is the IFD data section. IFD fields and data sections are described in the following section.

The pseudo-code to read an IFD section:

function readIFD()
{
    // From now on all byte conversions must convert
    // between byte-orders as needed.

    // Get the field count
    fieldcount = readBytes(2);
    for(i = 0; i < fieldcount; i++)
    {
        readField();
    }

    // Offset to 1st IFD
    offset = readBytes(4);
    if(offset != 0)
        nextifd = offset;
}
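As an illustration of the IFD layout, the following C# sketch reads the raw 12-byte fields of one IFD from a byte array. The IfdReader class, the IfdEntry struct, and the ReadIfd method are hypothetical names for this example, not the library's API. The sketch assumes the array starts at the TIFF header, so that offsets can be used as indexes directly, and it leaves bounds checking to the runtime.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of ExifLibrary.
public static class IfdReader
{
    public struct IfdEntry
    {
        public ushort Tag;
        public ushort Type;
        public uint Count;
        public int ValueFieldIndex; // index of the 4-byte value/offset field
    }

    // 'tiff' is assumed to start at the TIFF header.
    public static List<IfdEntry> ReadIfd(byte[] tiff, uint ifdOffset, bool little, out uint nextIfdOffset)
    {
        int pos = checked((int)ifdOffset);

        // The first two bytes give the field count
        int fieldCount = ReadUInt16(tiff, pos, little);
        pos += 2;

        var entries = new List<IfdEntry>(fieldCount);
        for (int i = 0; i < fieldCount; i++)
        {
            entries.Add(new IfdEntry
            {
                Tag = ReadUInt16(tiff, pos, little),
                Type = ReadUInt16(tiff, pos + 2, little),
                Count = ReadUInt32(tiff, pos + 4, little),
                ValueFieldIndex = pos + 8
            });
            pos += 12; // every field is exactly 12 bytes
        }

        // The 4 bytes after the last field hold the offset to the next IFD (0 if none)
        nextIfdOffset = ReadUInt32(tiff, pos, little);
        return entries;
    }

    static ushort ReadUInt16(byte[] d, int o, bool little)
    {
        return little
            ? (ushort)(d[o] | (d[o + 1] << 8))
            : (ushort)((d[o] << 8) | d[o + 1]);
    }

    static uint ReadUInt32(byte[] d, int o, bool little)
    {
        return little
            ? (uint)d[o] | ((uint)d[o + 1] << 8) | ((uint)d[o + 2] << 16) | ((uint)d[o + 3] << 24)
            : ((uint)d[o] << 24) | ((uint)d[o + 1] << 16) | ((uint)d[o + 2] << 8) | (uint)d[o + 3];
    }
}
```

The value field is returned as an index rather than decoded, because whether it holds a value or an offset depends on the field type and count.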

IFD Fields

Fields are 12-byte subsections of the IFD sections. The first two bytes of each field give the tag ID as defined in the Exif standard. There is one caveat here: tag IDs are not unique across IFDs. For example, both GPSLatitudeRef and InteroperabilityIndex have a tag ID of 1. To prevent collisions, you should always consider tag IDs within IFD boundaries.
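One way to keep tag IDs within IFD boundaries is to key every lookup by the IFD as well as the tag ID. The C# sketch below shows the idea with a tiny table; the IfdKind enum and the TagNames class are hypothetical and only list the tags mentioned in this article, so this is not how the library necessarily organizes its tags.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not part of ExifLibrary.
public enum IfdKind { Zeroth, Exif, Gps, Interop, First }

public static class TagNames
{
    // Tag names are stored per IFD, so the same ID can appear more than once
    static readonly Dictionary<IfdKind, Dictionary<ushort, string>> Names =
        new Dictionary<IfdKind, Dictionary<ushort, string>>
        {
            { IfdKind.Gps, new Dictionary<ushort, string>
                {
                    { 0x0001, "GPSLatitudeRef" },
                    { 0x0002, "GPSLatitude" }
                }
            },
            { IfdKind.Interop, new Dictionary<ushort, string>
                {
                    { 0x0001, "InteroperabilityIndex" }
                }
            }
        };

    // Returns the tag name, or null if the tag is not in the table
    public static string Lookup(IfdKind ifd, ushort tagId)
    {
        Dictionary<ushort, string> tags;
        string name;
        if (Names.TryGetValue(ifd, out tags) && tags.TryGetValue(tagId, out name))
            return name;
        return null;
    }
}
```

With this layout, TagNames.Lookup(IfdKind.Gps, 1) and TagNames.Lookup(IfdKind.Interop, 1) resolve to different tags even though the ID is the same.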

The next two bytes give the type of the field data. Most Exif types map readily onto .NET types: Byte (byte), Short (uint16), Long (uint32), Signed Long (int32), Ascii (byte array), and Undefined (byte array). The two remaining types, Rational and Signed Rational, each consist of a numerator and a denominator, and can be represented as floating-point numbers with some work.

The following four bytes may be a little confusing. For byte arrays (the Exif Ascii and Undefined types), the byte length of the array is given. For example, for the Ascii string: "Exif", the count will be 5 including the null terminator. This is true for the Undefined data type too (although Undefined fields do not have a null terminator, so the count would be 4). For other types, this is the number of field components. For GPS location fields, for example, three Rational values are given, one for degrees, one for minutes, and one for seconds. In this case, the count would be 3, although the actual byte length would be 24 (3x8).
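To make the GPS example concrete, here is a C# sketch that decodes three consecutive Rational values (24 bytes) into decimal degrees. The GpsRationals class and its methods are hypothetical names for this example, not the library's API. The sketch treats a zero denominator as "not a number", and it does not apply the hemisphere given by a tag such as GPSLatitudeRef. A Signed Rational would be read the same way with signed integers.

```csharp
using System;

// Hypothetical helper for illustration; not part of ExifLibrary.
public static class GpsRationals
{
    // A Rational is two uint32 values: numerator, then denominator (8 bytes)
    public static double ReadRational(byte[] data, int offset, bool little)
    {
        uint numerator = ReadUInt32(data, offset, little);
        uint denominator = ReadUInt32(data, offset + 4, little);
        return denominator == 0 ? double.NaN : (double)numerator / denominator;
    }

    // Three Rationals in a row: degrees, minutes, seconds (3 x 8 = 24 bytes)
    public static double ToDecimalDegrees(byte[] data, int offset, bool little)
    {
        double degrees = ReadRational(data, offset, little);
        double minutes = ReadRational(data, offset + 8, little);
        double seconds = ReadRational(data, offset + 16, little);
        return degrees + minutes / 60.0 + seconds / 3600.0;
    }

    static uint ReadUInt32(byte[] d, int o, bool little)
    {
        return little
            ? (uint)d[o] | ((uint)d[o + 1] << 8) | ((uint)d[o + 2] << 16) | ((uint)d[o + 3] << 24)
            : ((uint)d[o] << 24) | ((uint)d[o + 1] << 16) | ((uint)d[o + 2] << 8) | (uint)d[o + 3];
    }
}
```

For example, the big-endian values 22/1, 30/1, and 0/1 decode to 22.5 degrees.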

Following the count, we have the 4-byte field value. However, if the length of the field data exceeds 4 bytes, it will be stored in the IFD Data section instead. In this case, the value will be the offset from the start of the TIFF header to the start of the field data. For example, for a Long (uint32, 4 bytes), this will be the field value. For a Rational (2 x uint32, 8 bytes), this will be an offset to the 8-byte field data.

Here is the pseudo-code to read the fields:

function readField()
{
    // From now on all byte conversions must convert
    // between byte-orders as needed

    // Tag ID
    tagid = readBytes(2);

    // Field type
    type = readBytes(2);

    // Count or components
    count = readBytes(4);

    // Byte length of field data
    if(type == 1)
        n = count; // 1-byte x count
    else if(type == 2 || type == 7)
        n = count; // 1-byte x count
    else if(type == 3)
        n = 2 * count; // 2-bytes x count
    else if(type == 4 || type == 9)
        n = 4 * count; // 4-bytes x count
    else if(type == 5 || type == 10)
        n = 8 * count; // 2 x 4-bytes x count

    // Value or offset
    value = readBytes(4);

    // Treat value as offset if
    // byte count is more than 4.
    if(n > 4)
    {
        // Store our current position
        currentoffset = getStreamPosition();

        // Seek to data section and read actual field data
        seekAbsolute(baseoffset + value);
        value = readBytes(n);

        // Go back to our last position
        seekAbsolute(currentoffset);
    }
}
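The size rule in the pseudo-code can be isolated into a small C# helper, sketched below. The FieldLayout class and its method names are hypothetical, not the library's API; the type codes are the ones used in the pseudo-code above, and unknown codes are rejected rather than guessed.

```csharp
using System;

// Hypothetical helper for illustration; not part of ExifLibrary.
public static class FieldLayout
{
    // Size in bytes of one component of the given Exif type
    public static int ComponentSize(ushort type)
    {
        switch (type)
        {
            case 1:  // Byte
            case 2:  // Ascii
            case 7:  // Undefined
                return 1;
            case 3:  // Short
                return 2;
            case 4:  // Long
            case 9:  // Signed Long
                return 4;
            case 5:  // Rational
            case 10: // Signed Rational
                return 8;
            default:
                throw new ArgumentException("Unknown Exif field type: " + type);
        }
    }

    // Total byte length of the field data
    public static long ByteLength(ushort type, uint count)
    {
        return (long)ComponentSize(type) * count;
    }

    // Data of 4 bytes or less is stored in the field itself;
    // anything longer is stored in the IFD data section.
    public static bool IsStoredInline(ushort type, uint count)
    {
        return ByteLength(type, count) <= 4;
    }
}
```

For instance, a single Long fits in the field, while a single Rational (8 bytes) or the Ascii string "Exif" with its terminator (5 bytes) does not.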

Points of Interest

To Lilliput and Back

One thing to note while reading the Exif tags is the byte-order. The JPEG file itself will always be in big-endian format. However, the byte order of IFD subsections may be little-endian or big-endian. Luckily, the byte order of IFD subsections is given in the first two bytes of the TIFF header as either (0x49, 0x49 - little-endian) or (0x4D, 0x4D - big-endian). The library converts between the byte orders as needed. When writing the data back, all fields are written in the original byte order.

The .NET Framework contains the static BitConverter class, which converts between byte arrays and integral types. However, BitConverter is not endian-aware: it always uses the byte order of the host system. I wrote a simple endian-aware class, BitConverterEx, for this. It is used as follows:

uint value = BitConverterEx.ToUInt32(
    bytes,
    0,
    ByteOrder.BigEndian,
    ByteOrder.System);

It may be tedious to list byte-orders at each conversion. In that case, you may create an instance of the BitConverterEx class, passing byte-orders to the constructor.

BitConverterEx conv = new BitConverterEx(
    ByteOrder.BigEndian,
    ByteOrder.System);
uint value = conv.ToUInt32(bytes, 0);
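To show what such a conversion involves, here is a minimal C# sketch built on the standard BitConverter class and its IsLittleEndian field. It is not the BitConverterEx implementation; EndianDemo and its ToUInt32 method are hypothetical names, and the source byte order is passed as a simple flag.

```csharp
using System;

// Hypothetical helper for illustration; not the library's BitConverterEx.
public static class EndianDemo
{
    public static uint ToUInt32(byte[] data, int offset, bool sourceIsLittleEndian)
    {
        // Work on a copy so the caller's array is left untouched
        byte[] buffer = new byte[4];
        Array.Copy(data, offset, buffer, 0, 4);

        // BitConverter always uses the host byte order, so reverse the
        // bytes when the source order differs from it
        if (sourceIsLittleEndian != BitConverter.IsLittleEndian)
            Array.Reverse(buffer);

        return BitConverter.ToUInt32(buffer, 0);
    }
}
```

Both the big-endian bytes (0x00, 0x00, 0x00, 0x2A) and the little-endian bytes (0x2A, 0x00, 0x00, 0x00) convert to 42 this way, regardless of the host system.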

One Tag to Ruin it All

The APP1 section contains a tag called the MakerNote which is used by camera vendors to record custom information. For example, lens type is typically written in the MakerNote since there is no public Exif tag defined for that purpose. This innocent looking tag is described in the Exif specification as follows:

"A tag for manufacturers of Exif writers to record any desired information. The contents are up to the manufacturer, but this tag should not be used for any other than its intended purpose."

Here are some observations about the MakerNote:

  • MakerNotes may contain very interesting data about the camera, lens, and picture taking conditions.
  • Camera vendors have their own proprietary MakerNote formats, and they do not make their formats public. Writing a MakerNote-aware metadata tool typically requires reverse-engineering the different MakerNote formats.
  • MakerNotes may contain absolute addresses. Moving the MakerNote field around will likely corrupt the data.
  • Some vendors write data in the MakerNote only, ignoring public Exif tags.
  • Some vendors write the MakerNote with an arbitrary byte-order, ignoring what the TIFF header says.
  • Some vendors may even deliberately write false values in the public Exif tags to hide the fact that their cameras do not meet the marketed specifications. In such cases, the correct data will be found in the MakerNote.

Reverse-engineering proprietary MakerNotes and dealing with all the inconsistencies requires an enormous amount of work. For this reason, the library does not currently attempt to interpret the MakerNote, although I plan to add this functionality through a plug-in mechanism in the future.

Finally, if you are looking for a MakerNote-aware Exif library, I once again recommend Phil Harvey's ExifTool.

Trivia

The TIFF header contains the magic byte 0x2A which is 42 in decimal. Revision 5.0 of the TIFF standard says that:

"The number 42 was chosen for its deep philosophical significance."

The number 42 is probably a reference to the "Answer to the Ultimate Question of Life, the Universe, and Everything" from Douglas Adams's The Hitchhiker's Guide to the Galaxy.

History

  • 10th November, 2009: Initial post
  • 14th November, 2009: Updated article

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
