Leveraging LINQ to XML: Querying an obfuscation map

5 Oct 2009, CPOL, 5 min read
Practical use of LINQ to XML technology.

Introduction

In the course of my work, I had to process user error reports concerning one of our company's products. These reports included call stack information intended to help us track down the causes of errors.

We obfuscate our production code, so resolving the call stack information in an error report requires some “hopping around” in an obfuscation map with manual text search. This “hopping” is not always easy: the obfuscation map is a huge XML file of more than 25 MB, and most text editors do not cope well with a file of that size. That is understandable, given that a typical hand-written file rarely exceeds 1 MB.

Things get worse when you need syntax highlighting or, beyond that, XML tree parsing and navigation. Another big problem is that a huge number of XML elements share the same obfuscated name, so to identify what kind of item an element describes, you have to inspect its parent XML elements manually.

Task definition

Facing these problems, I decided to help our support team by automating the name resolution process. The requirements for the automation task were the following:

  • It should be a tool that allows finding original Class or Class member names based on an obfuscated name.
  • The UI should be as simple as possible – I think that simple tasks should not require complex user manipulations.
  • Use LINQ to XML, which is a convenient and easy way to handle XML data. Perhaps the main factor, though, was that I finally had a chance to use this technology in practice.

Let’s move to the concrete steps. First, we define the input data.

Here is the call stack content example:

Type: System.ArgumentException
Stack:
   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary.Insert(TKey key, TValue value, Boolean add)
   at ne.c(IError A_0)
   at ne.c(ErrorListEventArgs A_0)
   at ne.c.a()
   at ne.c(Object A_0, EventArgs A_1)
   at System.Windows.Forms.Timer.OnTick(EventArgs e)
   at System.Windows.Forms.Timer.TimerNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, 
                                                 IntPtr wparam, IntPtr lparam)

Our company uses the Dotfuscator tool, whose Community Edition ships with Visual Studio. Obfuscation maps have the same format in all editions of the obfuscator, so anyone can test this name resolution tool on their own code. The obfuscation map is an XML file whose structure looks like this:

XML
<dotfuscatorMap version="1.1">
  <header />    <!--Provides timestamp and version information.-->
  <mapping>    <!--Mapping information.-->
    <module>    <!--Module mapping; includes the module’s types.-->
      <name>ModuleName.dll</name>    <!--Original module name.-->
      <type />    <!--Type name and members mapping.-->
      ...
      <type/>    <!--Other type name and members mapping.-->
    </module>
  </mapping>
  <statistics />    <!--Some obfuscation statistics.-->
</dotfuscatorMap>

The <type> element structure is the following:

XML
<type>
  <name>type_name</name>            <!--Original type name.-->
  <newname>obfuscated_name</newname>        <!--Obfuscated type name (Optional).-->
  <methodlist>                    <!--Type’s method list.-->
    <method>
      <signature>void(object, System.EventArgs)</signature>    <!--Method signature.-->
      <name>method_name</name>         <!--Original method name.-->
      <newname>obfuscated_name</newname>    <!--Obfuscated method name (Optional).-->
    </method>
    ...
  </methodlist>
  <fieldlist>                     <!--Type’s field list.-->
    <field>
      <signature>System.Windows.Forms.Button</signature>    <!--Field signature.-->
      <name>field_name</name>            <!--Original field name.-->
      <newname>obfuscated_name</newname>    <!--Obfuscated field name (Optional).-->
    </field>
    ...
  </fieldlist>
</type>

You will notice that the obfuscated name is always placed in an optional <newname> element. If this element is omitted, then the object uses its original name.

Next, we should define the user input. For example, suppose we need to find a type with the obfuscated name “a”. Usually, we search for the “<newname>a</newname>” string, which finds all the types, methods, and fields that have the obfuscated name “a”. In a complex project, that is several thousand results. To narrow the search down to types, we have to look at the parent of each match and check whether it is a <type> element.

Thus, the user normally supplies two parameters: the first is the path to the obfuscation map file, and the second is an obfuscated name. There is also one more (implicit) parameter, the kind of result to search for (type/method/field), but we will try to infer it from the second parameter. Given the requirement of UI simplicity, I think this is enough.

Type name search

The first task is to find an original type name from an obfuscated name. Let’s do it. First, we need to list all the types in the map file. This is an easy one:

C#
// Construct the XElement to access map file.
XElement map = XElement.Load("sample.xml");
var types = map.Descendants("type");

The main operation here is the map.Descendants("type") call, which returns all the <type> elements from the XElement content.

The Descendants() method returns a flat collection of the descendant XML elements. This collection includes child elements, grandchild elements, and so on. So, if we write map.Descendants(), we get an enumeration of all the XML elements below the root of the map document. The method has an overload that filters the output collection by element name. I used this overload to filter out all elements except <type>.

Note: The filter name should be a fully qualified name; it means that if the filtered elements have a namespace, the filter name must have it too.
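To illustrate the note above, here is a minimal sketch. It assumes a hypothetical document whose elements live in the made-up namespace urn:example:map; the Dotfuscator map described in this article uses no namespace, so this is only relevant to other XML sources. The class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceFilterDemo
{
    // Hypothetical document: every element is in the made-up namespace urn:example:map.
    public const string Xml =
        "<map xmlns='urn:example:map'><type><name>Foo</name></type></map>";

    // Filters by local name only, i.e. by an XName in no namespace.
    public static int CountUnqualified()
    {
        return XElement.Parse(Xml).Descendants("type").Count();
    }

    // Filters by the namespace-qualified name.
    public static int CountQualified()
    {
        XNamespace ns = "urn:example:map";
        return XElement.Parse(Xml).Descendants(ns + "type").Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountUnqualified()); // 0: the local name alone does not match
        Console.WriteLine(CountQualified());   // 1: the qualified name matches
    }
}
```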

Note: Keep in mind that Descendants uses deferred execution, meaning that the underlying XML tree is actually traversed when you enumerate the Descendants result rather than when you call the method.
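A small sketch of what deferred execution means in practice, using a made-up two-level document (the names here are illustrative only): an element added after the query was created still shows up when the query is enumerated, while a ToList() snapshot taken earlier does not change.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Returns { size of the early snapshot, size of the query enumerated later }.
    public static int[] SnapshotVersusQuery()
    {
        XElement map = XElement.Parse("<mapping><type/></mapping>");

        var types = map.Descendants("type"); // query defined; the tree is not walked yet
        var snapshot = types.ToList();       // forces enumeration now: one element

        map.Add(new XElement("type"));       // modify the tree afterwards

        // Enumerating the query again walks the modified tree.
        return new[] { snapshot.Count, types.Count() };
    }

    public static void Main()
    {
        int[] counts = SnapshotVersusQuery();
        Console.WriteLine("{0} {1}", counts[0], counts[1]); // 1 2
    }
}
```

If the map file does not change while the tool runs, calling ToList() once and reusing the list avoids walking the tree on every enumeration.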

map.Descendants("type") scans the whole XML tree for the specified element name; it is not the most efficient solution, but it is the simplest. Navigating directly to the elements, which avoids scanning the whole tree, is more efficient. For example, we can use an expression like this:

C#
var types = map.Elements("mapping").Elements("module").Elements("type");

Depending on the XML content, this expression can be up to ten times faster than the Descendants call. For this application, however, I prefer the simplicity of Descendants.
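As a quick check that the two approaches select the same elements, the sketch below runs both against a made-up map fragment that follows the structure shown earlier. It only compares results and does not measure speed; the class name and sample data are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NavigationDemo
{
    // Made-up fragment with the <mapping>/<module>/<type> nesting described above.
    public const string Xml =
        "<dotfuscatorMap><mapping><module><name>Sample.dll</name>" +
        "<type><name>Foo</name></type><type><name>Bar</name></type>" +
        "</module></mapping></dotfuscatorMap>";

    public static bool SameResult()
    {
        XElement map = XElement.Parse(Xml);
        var viaDescendants = map.Descendants("type");
        var viaElements = map.Elements("mapping").Elements("module").Elements("type");
        // Both sequences yield the same XElement instances in document order.
        return viaDescendants.SequenceEqual(viaElements);
    }

    public static void Main()
    {
        Console.WriteLine(SameResult()); // True
    }
}
```

The two would differ only if <type> elements appeared somewhere other than directly under <module>, in which case Descendants would find them and the Elements chain would not.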

Now we have all the <type> elements, and need to find matches with the obfuscated name. I implement this using a LINQ query:

C#
string obfuscatedName = "a"; // Define the search criteria.
var found = from type in types
          // Filter type elements by obfuscatedName matching.
          where type.Element("newname").Value == obfuscatedName
          select type;

The types collection is filtered by matching the type’s child element <newname> content with the passed obfuscated name. This can also be done using the Where extension method with the lambda expression:

C#
var found = types.Where(t => t.Element("newname").Value == obfuscatedName);

As stated before, the <newname> element is optional, so Element("newname") returns null when the type is not obfuscated. To avoid a possible NullReferenceException, I changed the LINQ query to the following:

C#
var found = from type in map.Descendants("type")
// Declare name element variable.
let name = type.Element("newname") ?? type.Element("name")
// Filter type elements by obfuscatedName matching.
where name.Value == obfuscatedName
select type;

This code matches a type by its obfuscated name or, when the type has no <newname> element, by its original name.
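The fallback behavior can be tried on a tiny, made-up map in which one type is obfuscated and another is not. This is only a sketch: the sample data and the NullSafeQueryDemo class are hypothetical, and the query projects the original type name directly to keep the output short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NullSafeQueryDemo
{
    // Made-up sample: "Foo" is obfuscated to "a"; "Bar" keeps its original name.
    public const string Xml =
        "<dotfuscatorMap><mapping><module><name>Sample.dll</name>" +
        "<type><name>Foo</name><newname>a</newname></type>" +
        "<type><name>Bar</name></type>" +
        "</module></mapping></dotfuscatorMap>";

    // Returns the original names of all types matching the given name.
    public static List<string> Find(string obfuscatedName)
    {
        XElement map = XElement.Parse(Xml);
        var found = from type in map.Descendants("type")
                    // Fall back to <name> when <newname> is absent.
                    let name = type.Element("newname") ?? type.Element("name")
                    where name.Value == obfuscatedName
                    select type.Element("name").Value;
        return found.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Find("a")));   // Foo
        Console.WriteLine(string.Join(",", Find("Bar"))); // Bar (matched by its original name)
        Console.WriteLine(Find("Foo").Count);             // 0: Foo is only reachable as "a"
    }
}
```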

The let keyword introduces a new range variable, name, that holds the <newname> element, or the <name> element if no <newname> element is present. Behind the scenes, the compiler carries this variable through the query in an anonymous type that pairs the current <type> element with its <name>/<newname> element, something like this:

C#
new { Type = type, Name = type.Element("newname") ?? type.Element("name") };

The whole query can be represented in C# as:

C#
IEnumerable<XElement> found = map.Descendants("type").
      Select(type => new { Type = type, Name = type.Element("newname") ?? type.Element("name") }).
      Where(tn => tn.Name.Value == obfuscatedName).
      Select(tn => tn.Type);

As we can see, the translated query contains a second Select call, which (together with the anonymous type projection) costs some performance, so I rewrote the query as follows:

C#
var found = from type in map.Descendants("type")
// Filter type elements by obfuscatedName matching.
where (type.Element("newname") ?? type.Element("name")).Value == 
    obfuscatedName
select type;

The next thing to do is to handle nested type names. In the XML, the parts of such a name are separated by ‘/’ instead of ‘.’; e.g., the “MyClass.MyInternalClass” name is represented by the string “MyClass/MyInternalClass”. We just need to replace “.” with “/” in the obfuscatedName variable to allow a match:

C#
obfuscatedName = obfuscatedName.Replace('.', '/');

Finally, we add an anonymous type projection that makes the search results easier to process in C#:

C#
var types = from type in found
          select new {
            ModuleName = type.Parent.Element("name").Value,
            TypeName = type.Element("name").Value
          };

After that, you can process the search result as you wish; for example, output it to the console:

C#
foreach (var type in types) {
    Console.WriteLine("Module:      {0}", type.ModuleName);
    Console.WriteLine("Type:        {0}", type.TypeName);
    Console.WriteLine();
}
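The steps above can be put together into one small program. The sketch below runs against a made-up map fragment instead of a real file; the TypeResolver class, its Resolve method, and the sample names are hypothetical, and the “b/c” form of the nested type’s obfuscated name is an assumption based on the separator rule described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TypeResolver
{
    // Made-up map fragment following the structure described in this article.
    public const string SampleMap =
        "<dotfuscatorMap version='1.1'><mapping><module><name>Sample.dll</name>" +
        "<type><name>MyClass</name><newname>b</newname></type>" +
        "<type><name>MyClass/MyInternalClass</name><newname>b/c</newname></type>" +
        "</module></mapping></dotfuscatorMap>";

    // Returns "module: original type name" for each matching type.
    public static List<string> Resolve(XElement map, string obfuscatedName)
    {
        // Nested type names use '/' in the map instead of '.'.
        obfuscatedName = obfuscatedName.Replace('.', '/');

        var types = from type in map.Descendants("type")
                    where (type.Element("newname") ?? type.Element("name")).Value
                          == obfuscatedName
                    select type.Parent.Element("name").Value + ": "
                           + type.Element("name").Value;
        return types.ToList();
    }

    public static void Main()
    {
        XElement map = XElement.Parse(SampleMap);
        foreach (string line in Resolve(map, "b.c"))
            Console.WriteLine(line); // Sample.dll: MyClass/MyInternalClass
    }
}
```

With a real map, XElement.Parse(SampleMap) would be replaced by XElement.Load with the path to the map file, as in the first snippet of this section.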

Summary

That’s it. We have found types by their obfuscated name. In the next part, I will go deeper into LINQ queries and resolve field and method names.

Thanks for your time, and you are welcome to post any questions or suggestions.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Lead Devart (www.devart.com)
Ukraine