
TidyXML

6 Nov 2014, CPOL
Designed to take XML and make it easy to read, by adding appropriate line breaks and tab indentation.

Introduction 

TidyXML takes XML and makes it easy to read by adding appropriate line breaks and tab indentation.

I wrote this a few nights ago because I got sick of the XML files my work transmits (about 40 MB), all on one line to save bandwidth. Notepad++ has a lovely plugin that does this tidy-up, but it was taking about 45 minutes per file. This does it in about 5 seconds.

Since I only had the XML files I work with to test on, there are probably some things I've overlooked. If you find XML that does not get line-broken or indented properly, please send me a sample.

Using the code 

This little function takes two arguments, an input stream and an output stream. It reads through the input stream, formats the XML, and writes the result to the output stream.

I've written a very crude Win32 API input and output selection area, for simple usage. The original version I wrote just involved chucking the EXE in a directory with some XML files and double-clicking it, and it would process every XML file in that directory.

While the interface is obviously Windows-only, the function itself should be platform independent.

To call this function, use:

C++
tidyXML(inputFile, outputFile);

Here inputFile is an ifstream and outputFile is an ofstream, both passed by reference.
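Because the function expects streams that are already open, the caller has to open and check the files first. A minimal sketch of that calling side follows; `formatFile` is a hypothetical helper that is not part of the download, and it takes the formatter as a parameter so the sketch stands on its own.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (an assumption, not the article's code): opens both
// files, checks that they opened, then hands the streams to `formatter`.
// Returns false if either file cannot be opened or if writing failed.
template <typename Formatter>
bool formatFile(const std::string &inPath, const std::string &outPath,
                Formatter formatter) {
    std::ifstream inputFile(inPath);
    if (!inputFile.is_open())
        return false;

    std::ofstream outputFile(outPath);
    if (!outputFile.is_open())
        return false;

    formatter(inputFile, outputFile);

    // Flush so that a failed write shows up in the stream state.
    outputFile.flush();
    return outputFile.good();
}
```

With the function from this article in scope, the call would be something like `formatFile("in.xml", "out.xml", tidyXML);`, where the file names are placeholders.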

Here is the function itself. Okay, I'm a bit lazy with comments. If you don't understand why a certain bit is done a certain way, or just want me to comment specific places, let me know in the comments below, and I'll try adding some more. But it should be pretty self-explanatory.

C++
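// headers needed by tidyXML; the unqualified names below assume namespace std is in scope
#include <fstream>
#include <string>
using namespace std;

// reads XML from input and writes it to output with line breaks and tab indentation
void tidyXML(ifstream &input, ofstream &output) {
    char currChar;
    char nextChar;
    int indent = -1; // raised to 0 by the first opening tag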

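    string currKeyStore = "";     // tag currently being read
    string lastKeyStore = "";     // previous tag, written once the current tag is complete
    string valueOrJunkStore = ""; // text between tags: a value, or whitespace to discard

    bool inKey = false;           // true while between '<' and '>'
    bool skipNextIndent = false;  // set after a value so the closing tag stays on the same line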

    enum keyType {
        unset,
        infoLine,
        entryKey,
        exitKey,
        emptyValue,
    };
    keyType lastKeyType = unset, currKeyType = unset;

    while(true) {
        currChar = input.get();
        if (!input.good()) {
            // end of input: the last tag read is still held back, so write it out
            output << currKeyStore;
            break;
        }

        // have a gander what's next
        nextChar = input.peek();
        if (!input.good())
            nextChar = '\0';

        if (currChar == '<') {
            inKey = true;

            lastKeyType = currKeyType;
            lastKeyStore = currKeyStore;

            currKeyType = unset;
            currKeyStore = "";

            // if cannot work out here, need to wait until nextChar is '>' to decide
            if (nextChar == '/')
                currKeyType = exitKey;
            else if (nextChar == '?' || nextChar == '!')
                currKeyType = infoLine;
        }

        if (currKeyType == unset && nextChar == '>') {
            // can actually work out what this is now :)
            if (currChar == '/')
                currKeyType = emptyValue;
            else
                currKeyType = entryKey;
        }

        if (inKey)
            currKeyStore += currChar;
        else
            valueOrJunkStore += currChar;

        if (currChar == '>') {
            inKey = false;

            if (!skipNextIndent)
                for (int i = 0; i < indent; ++i)
                    output << '\t';
            output << lastKeyStore;
            skipNextIndent = false;

            if (lastKeyType == entryKey && currKeyType == exitKey) { // value line
                skipNextIndent = true;
                output << valueOrJunkStore;
            } else if (lastKeyType != unset) { // so don't add line at start of the file
                output << endl;
            }

            valueOrJunkStore = "";

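            // the tag just written closed an element (or was self-closing), so step back out
            if (lastKeyType == exitKey || lastKeyType == emptyValue)
                --indent;

            // the tag just read gets one more level; for emptyValue the branch above undoes it on the next tag
            if (currKeyType == entryKey || currKeyType == emptyValue)
                ++indent;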
        }
    }
}

I compiled the sample version (with a crude Win32 interface) using MinGW, with the following command:

mingw32-g++ --std=c++0x -Wall -fno-builtin -O3 -static *.cpp -lcomdlg32 -o tidyxml.exe

I've statically linked it so that it should run as-is, without the dependency hell you can otherwise get.

Also provided is the double-click build.bat file I use. It also strips out the debugging symbols.

Points of Interest

The main pain in writing this is that you always have to read the next tag before you know what to do with the current one. Or, better put: save the current tag, then read the next tag, so you can decide what to do with the saved one.

The original version was just full of bools to plan it all out and track everything. These were replaced by the enum, which makes the code a lot cleaner and easier to follow. Well, I think so anyway.
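To make the lookahead idea concrete, here is a small standalone sketch of the decision it boils down to. The enum values mirror the ones in tidyXML, but `TagType`, `classifyTag` and `newlineAfter` are names made up for this illustration. Unlike the real function, which works character by character, this assumes each tag is already available as a complete string.

```cpp
#include <string>

// Mirrors the article's enum; this sketch is an illustration, not the code
// used inside tidyXML.
enum class TagType { InfoLine, EntryKey, ExitKey, EmptyValue };

// Classifies a complete tag such as "<a>", "</a>", "<a/>" or "<?xml ...?>".
// Assumes `tag` starts with '<' and ends with '>'.
TagType classifyTag(const std::string &tag) {
    if (tag.size() >= 2 && tag[1] == '/')
        return TagType::ExitKey;
    if (tag.size() >= 2 && (tag[1] == '?' || tag[1] == '!'))
        return TagType::InfoLine;
    if (tag.size() >= 3 && tag[tag.size() - 2] == '/')
        return TagType::EmptyValue;
    return TagType::EntryKey;
}

// The lookahead decision: the saved tag is followed by a line break unless it
// is an opening tag immediately followed by a closing tag, in which case the
// pair wraps a value (for example <name>value</name>) and stays on one line.
bool newlineAfter(TagType saved, TagType next) {
    return !(saved == TagType::EntryKey && next == TagType::ExitKey);
}
```

Having a single enum value per tag means the decision is one comparison of two values, which is roughly what replacing the pile of bools bought.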

History  

  • v1 - Original release.
  • Renamed the main function from formatXML to tidyXML. Split the demo project out of the source file that contains this function.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United Kingdom
