Reading IPTC APP13 Segment Header Information from JPEG Images

27 Apr 2011, CPOL
A simple class illustrating how to read metadata info tags from JPEG files using C# .NET

Introduction

Recently, my boss asked me to come up with an app that can read the info tags buried inside JPEG files. Knowing nothing at the time about metadata standards, I embarked on a bumpy search for information on the subject. Unfortunately, because I did not yet know the acronym IPTC (International Press Telecommunications Council), I could not locate a beautiful article about it on CodeProject by Christian Tratz, which I only found when I went to post my own work on the subject.

To cut a long story short, it took me quite some time analyzing and reverse engineering cryptic bits and pieces of PHP samples to come up with this simple C# class, which parses a JPEG file and extracts tags from its Photoshop 3.0 section. That section lives in the APP13 segment (marker 0xFF 0xED); the Adobe APP14 segment is a different one, with marker 0xFF 0xEE. I strongly recommend reading the theory behind metadata in JPEG files located here.

The JPEGMetaData class has a constructor that takes the path of the JPEG file on disk. For clarity, it keeps the byte sequences it searches for in a separate Hashtable. The APP13 section opens with the marker bytes 0xFF 0xED and should contain the zero-terminated string “Photoshop 3.0”. Within the section, various tags may exist, depending on whether the author of the image, or whoever last edited it in an app like Photoshop or Photo Mechanic, populated any of the available metadata fields. If a sought field is not found in the metadata, an appropriate message is returned to the caller.

C#
public JPEGMetaData(string FileName)
{
	// Byte sequences to search for, stored as strings (one char per byte).
	PS3Tags.Add("PS3SectionHeader", "\u00FF\u00ED");           // APP13 marker
	PS3Tags.Add("PS3SectionIDTag", "Photoshop 3.0\u0000");     // section identifier

	// IPTC tag headers: 0x1C, record number 2, then the tag ID.
	PS3Tags.Add("PS3SectionObjNameTag", "\u001C\u0002\u0005");  // 5   = ObjectName
	PS3Tags.Add("PS3SectionHeadlineTag", "\u001C\u0002\u0069"); // 105 = Headline
	PS3Tags.Add("PS3SectionCaptionTag", "\u001C\u0002\u0078");  // 120 = Caption-Abstract

	// Load the file, then cut out the Photoshop 3.0 section.
	JPEGContentBuffer = LoadJPEG(FileName);
	PS3SectionContentBuffer =
		ExtractPS3ContentSection(PS3Tags["PS3SectionHeader"],
		                         PS3Tags["PS3SectionIDTag"]);

	// Extract each tag's content, or a "not available" message.
	PS3TagContents.Add("PS3SectionObjNameTag",
		ExtractTag(PS3Tags["PS3SectionObjNameTag"].ToString()));
	PS3TagContents.Add("PS3SectionHeadlineTag",
		ExtractTag(PS3Tags["PS3SectionHeadlineTag"].ToString()));
	PS3TagContents.Add("PS3SectionCaptionTag",
		ExtractTag(PS3Tags["PS3SectionCaptionTag"].ToString()));
}

The raw JPEG file is loaded internally and converted to a string, held in the private string field JPEGContentBuffer, for further slicing.

C#
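private string LoadJPEG(string FileName)
{
        // File.ReadAllBytes (System.IO) opens the file, reads all of it and closes it.
        byte[] RAWdata = File.ReadAllBytes(FileName);

        // ISO-8859-1 maps every byte 0x00-0xFF to the char with the same value,
        // so positions in the string match byte offsets in the file.
        // Encoding.Default depends on the system code page (and is UTF-8 on
        // .NET Core), which can corrupt binary data.
        return Encoding.GetEncoding("ISO-8859-1").GetString(RAWdata);
}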
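
The ExtractPS3ContentSection method is not shown above, so purely as an illustration, here is a minimal sketch of how the Photoshop 3.0 section could be located by walking the JPEG segment headers on the raw bytes instead of searching a string. The names App13Locator and FindPhotoshopSegment are made up for this example and are not part of the JPEGMetaData class, and the sketch assumes a well-formed file with no fill bytes between segments.

```csharp
using System;
using System.Text;

public static class App13Locator
{
    // Hypothetical helper, not part of the article's JPEGMetaData class.
    // Returns the payload of the first APP13 segment that starts with
    // "Photoshop 3.0\0", or null when no such segment is found.
    public static byte[] FindPhotoshopSegment(byte[] jpeg)
    {
        // Every JPEG file starts with the SOI marker 0xFF 0xD8.
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            return null;

        byte[] id = Encoding.ASCII.GetBytes("Photoshop 3.0\0");
        int pos = 2; // first segment after SOI

        while (pos + 4 <= jpeg.Length && jpeg[pos] == 0xFF)
        {
            byte marker = jpeg[pos + 1];
            if (marker == 0xDA) break; // start of scan: image data follows

            // Segment length is big-endian and includes its own two bytes.
            int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
            if (length < 2 || pos + 2 + length > jpeg.Length) break;

            if (marker == 0xED && StartsWith(jpeg, pos + 4, length - 2, id))
            {
                byte[] payload = new byte[length - 2];
                Buffer.BlockCopy(jpeg, pos + 4, payload, 0, payload.Length);
                return payload;
            }
            pos += 2 + length; // jump to the next segment header
        }
        return null;
    }

    // True when data[offset..offset+count) begins with the given prefix.
    private static bool StartsWith(byte[] data, int offset, int count, byte[] prefix)
    {
        if (count < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[offset + i] != prefix[i]) return false;
        return true;
    }
}
```

Working on bytes avoids any dependence on a text encoding; the returned payload still begins with the “Photoshop 3.0” identifier, so the IPTC tags sit further inside it.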

The class exposes a single Hashtable, PS3TagContents, which holds the contents of the following three major IPTC tags, identified by Adobe as:

IPTC ApplicationRecord Tags

Tag ID   Tag Name           Format
5        ObjectName         string[0,64]
105      Headline           string[0,256]
120      Caption-Abstract   string[0,2000]
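
The decimal tag IDs in the table correspond to the last byte of each search string in the constructor (5 = 0x05, 105 = 0x69, 120 = 0x78). As a small sketch of that relationship, the helper below builds the three-character search string from a record number and a tag ID; IptcMarker and For are hypothetical names used only for this illustration.

```csharp
using System;

public static class IptcMarker
{
    // Hypothetical helper: builds the three-character search string for an
    // IPTC tag: the 0x1C tag marker, the record number, then the tag ID.
    public static string For(byte record, byte tagId)
    {
        return new string(new[] { (char)0x1C, (char)record, (char)tagId });
    }
}
```

With this, IptcMarker.For(2, 105) yields the same string as the literal "\u001C\u0002\u0069" used for the Headline tag.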

The actual data extraction is performed in the ExtractTag method of the class. It searches for the three-byte tag header, reads the two-byte big-endian block length that follows it, and then extracts that many characters of content from that location.

C#
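private string ExtractTag(string currTagSought)
{
        // Ordinal search compares raw char values, not culture-aware text.
        // A match at position 0 is valid, so test for "not found" with < 0.
        int pos = PS3SectionContentBuffer.IndexOf(currTagSought,
                                                  StringComparison.Ordinal);
        if (pos < 0 || pos + 5 > PS3SectionContentBuffer.Length)
                return currTagSought + " is not available!";

        pos += 3; // skip the 3-char tag header

        // Two-byte big-endian length of the tag's content.
        int BlockSize = (PS3SectionContentBuffer[pos] * 256) +
                        PS3SectionContentBuffer[pos + 1];
        pos += 2;

        // Clamp so a truncated section cannot cause an out-of-range read.
        BlockSize = Math.Min(BlockSize, PS3SectionContentBuffer.Length - pos);

        // Assuming LoadJPEG used a single-byte encoding, each char stands for
        // one byte, so the content is simply the next BlockSize chars.
        return PS3SectionContentBuffer.Substring(pos, BlockSize);
}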
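
ExtractTag looks up one tag at a time. As a hedged sketch of the same header-length-content layout applied to every tag, the helper below walks a block of consecutive IPTC datasets and collects all record 2 values, including tags that repeat. IptcDataSets and ReadRecord2 are hypothetical names, the input is assumed to start exactly at the first 0x1C marker, text is assumed to be ISO-8859-1, and datasets with an extended length are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class IptcDataSets
{
    // Hypothetical helper, not part of the article's class.
    // Returns the values of all record 2 tags, keyed by tag ID.
    public static Dictionary<int, List<string>> ReadRecord2(byte[] iptc)
    {
        var result = new Dictionary<int, List<string>>();
        // Assumption: text is Latin-1; real files may declare another charset.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        int pos = 0;

        // Each dataset: 0x1C, record number, tag ID, 2-byte big-endian length, data.
        while (pos + 5 <= iptc.Length && iptc[pos] == 0x1C)
        {
            int record = iptc[pos + 1];
            int tagId = iptc[pos + 2];
            int size = (iptc[pos + 3] << 8) | iptc[pos + 4];
            if ((size & 0x8000) != 0) break;     // extended length: not handled here
            pos += 5;
            if (pos + size > iptc.Length) break; // truncated data

            if (record == 2)
            {
                List<string> values;
                if (!result.TryGetValue(tagId, out values))
                {
                    values = new List<string>();
                    result[tagId] = values;
                }
                values.Add(latin1.GetString(iptc, pos, size));
            }
            pos += size;
        }
        return result;
    }
}
```

Because values are stored in a list per tag ID, a repeatable tag such as Keywords (tag ID 25) keeps every occurrence rather than only the first one found.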

Finally, the harvested metadata can be written to the console by invoking the DisplayAllTags() method of the class.

I hope this helps someone in their quest to process JPEG metadata tags, and spares them the hustle I went through to get this functionality together. I am attaching the full source code with an accompanying sample harness for the class.

History

  • 27th April, 2011: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States
An ex-Sinclair ZX Spectrum developer turned IT Professional...

Worked for Northrop Grumman Space Technology, Unmanned Systems and Corporate Legal in Redondo Beach, El Segundo and Century City.

The peak of my career would be my FOX Channels Project Management position back in 2000.

Comments and Discussions

 
Question: retrieve with the same mode the keywords in my images (Donato Fiorentino, 5-Nov-15 5:40)
Question: how to get all the iptc information (vamshi p, 10-Aug-15 21:33)
General: Helpful Links on the Subject (ThomasBrownII, 4-Jun-11 10:13)
General: APPD is actually APP13, not APP14 (Drew Noakes, 27-May-11 6:38)
General: Re: APPD is actually APP13, not APP14 (ThomasBrownII, 4-Jun-11 9:22)
General: Again. This is APP13, not APP14 (boardhead62, 20-Feb-12 0:34)
