.NET File Format - Signatures under the Hood, Part 2 of 2

Przemyslaw Celej

28 Sep 2009, CPOL, 34 min read

Full description of signatures that are part of the .NET file format

In this article, you will see a full description of the signatures that are part of the .NET file format.

1. Signatures (continuation)
2. Elements
2.1 CustomMod
2.2 TypeDefOrRefEncoded
2.3 Param
2.4 RetType
2.5 Type
2.6 ArrayShape
Conclusion
References
Revision history

1. Signatures (continuation)

Continuation of the first part.

1.1 LocalVarSig

The LocalVarSig signature is also indexed by the StandAloneSig.Signature column; it stores the types of all the local variables allocated while a method runs. The LOCAL_SIG element is the signature's prolog and has the constant value 0x07. The Count element is an unsigned integer (compressed, of course!) that stores the number of local variables the associated method has. The BYREF element is an abbreviation of the ELEMENT_TYPE_BYREF constant (see the constants in the first part) and indicates that the Type element points to the actual variable. One more element is worth mentioning: the Constraint element (its only defined value is PINNED). It indicates that the target will not be moved by the garbage collector when it reclaims memory. Because local variables are located on the stack (where the GC does not move anything), the Type of a pinned variable shall be either a reference type (such as System.Object, allocated on the heap) or a value type (such as System.Decimal, allocated on the stack); when the pinned target is a value type, its definition should include the BYREF element, so that the reference to the variable is held on the stack while the variable itself is allocated on the heap. You can read more about pinning here. Picture 1 below shows the full syntax diagram for this signature.

I would like to draw your special attention to the TYPEDBYREF element in the diagram below. This is the typed reference: it contains not only a managed pointer to a location (like a normal reference) but also a runtime representation of the type that can be stored at that location. Here is its description, quoted from the specification:

"The typed reference local variable signature states that the local will contain both a managed pointer to a location and a runtime representation of the type that can be stored at that location. A typed reference signature is similar to a byref constraint, but while the byref specifies the type as part of the byref constraint (and hence statically as part of the type description), a typed reference provides the type information dynamically. A typed reference is a full signature in itself and cannot be combined with other constraints. In particular, it is not possible to specify a byref whose type is typed reference."

The typed reference is also very helpful for passing unboxed data (that is, data stored on the stack, which is always a value type) by reference to methods that are not statically restricted to the type they accept, and that therefore need the static type of the location in addition to a managed pointer to it; the typed reference meets both needs. Notice also that a typed reference parameter can refer to a location on the stack, and that location's lifetime is limited to the execution of the method in which it is allocated. For this reason, the same lifetime checks are applied to typed reference parameters as to byref parameters; see §12.4.1.5.2 in Partition I of the ECMA-335 specification for more. The typed reference is represented in the .NET BCL (Base Class Library) by the TypedReference structure.

Picture 1: The LocalVarSig signature syntax diagram
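The Count element in the diagram, like most counts and sizes in signatures, is a compressed unsigned integer. As a minimal sketch (in Python, chosen only for brevity; the function name is our own), it can be decoded following the rules in Partition II §23.2 of ECMA-335: the top bits of the first byte select a 1-, 2- or 4-byte big-endian encoding.

```python
from typing import Tuple


def read_compressed_uint(blob: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a compressed unsigned integer starting at blob[pos].

    Returns (value, position of the first byte after the integer).
    """
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:  # 0xxxxxxx: one byte, 7 value bits
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:  # 10xxxxxx: two bytes, 14 value bits
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    if (b0 & 0xE0) == 0xC0:  # 110xxxxx: four bytes, 29 value bits
        value = int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF
        return value, pos + 4
    raise ValueError(f"invalid compressed integer lead byte 0x{b0:02X}")


# Example 1 below: LOCAL_SIG (0x07), then Count (0x01) at index 1.
count, next_pos = read_compressed_uint(bytes([0x07, 0x01, 0x10, 0x08]), 1)
print(count, next_pos)  # 1 2
```

In the examples of this article every count fits in a single byte, so only the first branch is exercised, but the two longer forms appear as soon as a value exceeds 0x7F.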

Example 1

This example declares a byref to a value type as a local variable (the variable itself lives only on the stack). The sample code is written in CIL and looks like this:

MSIL

// Full source: LocalVarSig\1.il
// Binary: LocalVarSig\1.dll
// (...)

.method public static void TestMethod()
{ 
    .locals init (int32& IntVarByRef)
    ret
}

The LocalVarSig signature for this sample code is broken down in the table below:

Offset	Value	Meaning
`0x05`	`0x04`	Signature size
`0x06`	`0x07`	Signature's prolog (`LOCAL_SIG` constant)
`0x07`	`0x01`	The total number of variables declared in this method is one
`0x08`	`0x10`	Because the local variable holds a reference to the actual `int32` variable rather than the value itself, the `BYREF` element of value `0x10` is present
`0x09`	`0x08`	The variable's type (`int32`), see constants in the first part

Example 2

The sample below illustrates what happens to the signature when we use a typed reference. First, we declare the IntVar variable; in the next line, we obtain a typed reference to it using the __makeref keyword (which is undocumented and not CLS-compliant) and store it in the TypedByRefVar variable.

// Full source: LocalVarSig\2.cs
// Binary: LocalVarSig\2.dll
// (...)

[CLSCompliant(false)]
public void TestMethod()
{
    int IntVar = 0;
    TypedReference TypedByRefVar = __makeref(IntVar);
}

The LocalVarSig for this sample looks as follows:

Offset	Value	Meaning
`0x1E`	`0x04`	Signature size
`0x1F`	`0x07`	Signature's prolog (`LOCAL_SIG` constant)
`0x20`	`0x02`	The total number of variables declared in this method is two
`0x21`	`0x08`	The first variable's type (`int32`), see constants in the first part
`0x22`	`0x16`	The second variable's type (`TYPEDBYREF`), see constants in the first part

Example 3

Now let us move on to a slightly more difficult example. In this sample code, we create the TestDataClass class, which has only one member, StringVarToBePinned, of type string. In the TestMethod method (marked as unsafe), we instantiate the TestDataClass class and, in the next line, "pin" the StringVarToBePinned member and assign a pointer to its characters to the FixedVar variable using the fixed keyword. This guarantees that between the { and } braces, dataClass.StringVarToBePinned will not be moved by the garbage collector, so FixedVar always stays valid inside the braces of the fixed statement. The string is wrapped in the TestDataClass class (which is placed on the heap) so that the pinned object is clearly a heap object; note that a local value-type variable cannot be pinned this way, because it already lives on the stack (it is effectively already pinned).

// Full source: LocalVarSig\3.cs
// Binary: LocalVarSig\3.dll
// compile with "/unsafe" switch
// (...)

public class TestDataClass
{
    public string StringVarToBePinned;
}

public class TestClass
{
    public unsafe void TestMethod()
    {
        TestDataClass dataClass = new TestDataClass();
        fixed (char* FixedVar = dataClass.StringVarToBePinned) { }
    }
}

This sample is difficult for one more reason: it uses an element that has not been described yet, namely TypeDefOrRefEncoded. This element specifies in which metadata table (TypeDef, TypeRef or TypeSpec) and at which row the given type is described. We will not go into further detail about this element here; if you want, you can jump directly to its description in subsection 2.2 TypeDefOrRefEncoded in the next chapter. The LocalVarSig for the above code is broken down in the table below:

Offset	Value	Meaning
`0x20`	`0x08`	Signature size
`0x21`	`0x07`	Signature's prolog (`LOCAL_SIG` constant)
`0x22`	`0x03`	The total number of variables declared in this method is three
`0x23`	`0x12`	The first variable's type (`CLASS` - followed by the `TypeDefOrRefEncoded` element), see constants in the first part
`0x24`	`0x08`	The first variable's type is described in the `TypeDef` metadata table at row `2`, which is `TestDataClass` class. This is the `TypeDefOrRefEncoded` element not explained in the current chapter.
`0x25`	`0x0F`	The second variable's type (`PTR` - followed by `Type` element), see constants in the first part
`0x26`	`0x03`	The pointer's type from the previous byte (`char` - finally this is `char*`), see constants in the first part
`0x27`	`0x45`	The third variable is pinned, see constants
`0x28`	`0x0E`	The third, pinned variable's type (`string`), see constants
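To tie the three tables together, here is a minimal Python sketch that decodes a LocalVarSig blob (without the leading size byte). It handles only the forms used in Examples 1-3 (a few primitive types, BYREF, PTR, CLASS, TYPEDBYREF and the PINNED constraint), and it assumes that a TypeDefOrRefEncoded value carries the table tag in its low two bits and the row number in the remaining bits, which matches the "TypeDef, row 2" reading of `0x08` above. All names are hypothetical.

```python
from typing import List, Tuple

LOCAL_SIG = 0x07
BYREF, PTR, CLASS, TYPEDBYREF, PINNED = 0x10, 0x0F, 0x12, 0x16, 0x45
PRIMITIVES = {0x03: "char", 0x08: "int32", 0x0E: "string"}  # subset only
TABLES = ("TypeDef", "TypeRef", "TypeSpec")  # TypeDefOrRefEncoded tags 0, 1, 2


def read_compressed_uint(blob: bytes, pos: int) -> Tuple[int, int]:
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    return int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def read_type(blob: bytes, pos: int) -> Tuple[str, int]:
    """Decode one Type element; only the forms from Examples 1-3."""
    tag = blob[pos]
    pos += 1
    if tag in PRIMITIVES:
        return PRIMITIVES[tag], pos
    if tag == PTR:  # PTR is followed by the pointed-to Type
        inner, pos = read_type(blob, pos)
        return inner + "*", pos
    if tag == CLASS:  # CLASS is followed by TypeDefOrRefEncoded
        coded, pos = read_compressed_uint(blob, pos)
        return f"class {TABLES[coded & 0x03]}[{coded >> 2}]", pos
    raise NotImplementedError(f"element type 0x{tag:02X} not handled here")


def parse_local_var_sig(blob: bytes) -> List[str]:
    """Decode a LocalVarSig blob that excludes the signature size byte."""
    if blob[0] != LOCAL_SIG:
        raise ValueError("not a LocalVarSig (missing LOCAL_SIG prolog)")
    count, pos = read_compressed_uint(blob, 1)
    result = []
    for _ in range(count):
        if blob[pos] == TYPEDBYREF:  # a full signature on its own
            result.append("typedref")
            pos += 1
            continue
        prefix = ""
        if blob[pos] == PINNED:  # Constraint element
            prefix += "pinned "
            pos += 1
        if blob[pos] == BYREF:
            prefix += "byref "
            pos += 1
        type_name, pos = read_type(blob, pos)
        result.append(prefix + type_name)
    return result


print(parse_local_var_sig(bytes([0x07, 0x03, 0x12, 0x08, 0x0F, 0x03, 0x45, 0x0E])))
# ['class TypeDef[2]', 'char*', 'pinned string']
```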

1.2 CustomAttrib

As you can guess, this signature stores instances of custom attributes, but it is a little different from the signatures discussed earlier. The key difference is that CustomAttrib, in contrast to, for example, the MethodRefSig signature, stores the values of the arguments supplied to a custom attribute rather than the types of its parameters. In other words, the CustomAttrib signature stores only the values of the arguments (fixed and named) supplied when a custom attribute is instantiated; information about their types and number is not repeated in the signature. The signature is indexed by the CustomAttribute.Value column, and the Parent column indicates in which table (TypeDef for a type, MethodDef for a method, and so on) and at which row the attributed element (method, type, and so on) is described. There is also a second significant difference compared to other signatures: in the CustomAttrib signature, all binary values are stored uncompressed in little-endian byte order, except for the PackedLen item (discussed below) and the signature size. And I repeat once again: do not confuse custom attributes with custom modifiers! The full syntax diagram consists of four parts; let us look at the first.

Picture 2a: The CustomAttrib signature syntax diagram

So far it is pretty simple. The signature starts with the Prolog, which has the constant value 0x0001 and occupies two bytes (unsigned int16, uncompressed and little-endian). Next come the fixed arguments (FixedArg is illustrated in Picture 2b); their number and types can be obtained by examining the associated constructor's row in the MethodDef or MemberRef (when the attribute's class resides in another assembly) metadata table. Note that a vararg method cannot be used as an attribute's constructor. Next comes the number of named arguments (NumNamed, a two-byte unsigned int16, also uncompressed and little-endian), and finally the named arguments themselves, repeated NumNamed times.

Picture 2b: The CustomAttrib signature syntax diagram

This part is a little harder than the previous one, but still quite simple. The upper path in the diagram denotes a parameter that is not a single-dimensional, zero-based array (SZARRAY, see constants in the first part); the bottom path represents an SZARRAY parameter, i.e., the parameter is an array. The number of elements in the SZARRAY array is stored in the NumElem element, an unsigned int32 (uncompressed and little-endian) that occupies four bytes; if the SZARRAY argument is null, NumElem is set to 0xFFFFFFFF. The CLI does not allow custom attribute arguments to use any arrays other than single-dimensional arrays with a lower bound of zero (SZARRAY): for an int32 array, this means int32[], but not int32[,,] and not int32[3...8]. If you want to know more about arrays in .NET, read the Array Types in .NET article from MSDN Magazine.
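As a minimal sketch of the bottom path, assuming the constructor parameter is declared as int32[] (so each element is a little-endian int32), an SZARRAY FixedArg could be read in Python like this. The function name is hypothetical, and the bytes in the usage line are those of the second argument in Example 2 below.

```python
import struct
from typing import List, Optional, Tuple


def read_int32_szarray(blob: bytes, pos: int) -> Tuple[Optional[List[int]], int]:
    """Read an SZARRAY FixedArg of int32 elements starting at blob[pos].

    Returns (list of values, or None for a null array; next position).
    """
    (num_elem,) = struct.unpack_from("<I", blob, pos)  # unsigned int32, little-endian
    pos += 4
    if num_elem == 0xFFFFFFFF:  # NumElem of 0xFFFFFFFF marks a null array
        return None, pos
    values = list(struct.unpack_from(f"<{num_elem}i", blob, pos))
    return values, pos + 4 * num_elem


arg = bytes([0x03, 0, 0, 0, 0x01, 0, 0, 0, 0x02, 0, 0, 0, 0x03, 0, 0, 0])
print(read_int32_szarray(arg, 0))  # ([1, 2, 3], 16)
```

For other element types, only the `"i"` format code and the element size would change.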

Picture 2c: The CustomAttrib signature syntax diagram

This part is probably the strangest of all four: the format that Elem takes varies depending on the following conditions (quoted from the specification).

If the parameter kind is simple (first line in the above diagram) (bool, char, float32, float64, int8, int16, int32, int64, unsigned int8, unsigned int16, unsigned int32 or unsigned int64) then the 'blob' contains its binary value (Val). (A bool is a single byte with value 0 (false) or 1 (true); char is a two-byte Unicode character; and the others have their obvious meaning.) This pattern is also used if the parameter kind is an enum -- simply store the value of the enum's underlying integer type.

If the parameter kind is string, (middle line in above diagram) then the blob contains a SerString - a PackedLen count of bytes (compressed and big-endian - added by the author), followed by the UTF8 characters. If the string is null, its PackedLen has the value 0xFF (with no following characters). If the string is empty (""), then PackedLen has the value 0x00 (with no following characters).

If the parameter kind is System.Type (see typeof keyword - added by the author of the article), (also, the middle line in above diagram), its value is stored as a SerString (as defined in the previous paragraph), representing its canonical name. The canonical name is its full type name, followed optionally by the assembly where it is defined, its version, culture and public-key-token. If the assembly name is omitted, the CLI looks first in the current assembly, and then in the system library (mscorlib); in these two special cases, it is permitted to omit the assembly-name, version, culture and public-key-token.

If the parameter kind is System.Object, (third line in the above diagram) the value stored represents the "boxed" instance of that value-type. In this case, the blob contains the actual type's FieldOrPropType (see below), followed by the argument's unboxed value. [Note: It is not possible to pass a value of null in this case. end note]
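Following the string rules quoted above, a SerString reader might look like the Python sketch below (function names are hypothetical). Note that 0xFF has to be checked before the compressed length is decoded, because a byte starting with the bits 111 is not a valid lead byte of a compressed integer.

```python
from typing import Optional, Tuple


def read_packed_len(blob: bytes, pos: int) -> Tuple[int, int]:
    """PackedLen is a compressed unsigned integer (1, 2 or 4 bytes, big-endian)."""
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    return int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def read_ser_string(blob: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Return (string or None, position after the SerString)."""
    if blob[pos] == 0xFF:  # null string: no characters follow
        return None, pos + 1
    length, pos = read_packed_len(blob, pos)  # length in bytes, not characters
    return blob[pos:pos + length].decode("utf-8"), pos + length


print(read_ser_string(bytes([0x04]) + b"Abcd", 0))  # ('Abcd', 5)
```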

Picture 2d: The CustomAttrib signature syntax diagram

The last part illustrates the format of the NamedArg element, which represents a named argument (either a field or a property). Because a field and a property can have the same name, the first element is either FIELD, with the constant one-byte value 0x53, when the named argument refers to a field, or PROPERTY, with the constant one-byte value 0x54, when it refers to a property. Next comes the FieldOrPropType element, which describes the type of the named property or field in one or two bytes. If the type of the named argument is an unboxed simple value type (defined above), FieldOrPropType contains exactly one byte, the associated type's constant (BOOLEAN, CHAR, I1, U1, I2, U2, I4, U4, I8, U8, R4, R8, STRING; see the constants table in the first part); if it is a boxed simple value type, FieldOrPropType is preceded by a byte with the value 0x51, so in this case it is two bytes long. The FieldOrPropName element is a SerString (explained above) containing the name of the property or field. Finally comes a single FixedArg element, shown earlier. So, as you can see, the NamedArg element is a normal FixedArg preceded by some additional information that identifies which field or property it represents. I hope I did not scare you; as you will soon see, the signature is not as complicated as it looks.

Example 1

This example mainly shows the format of the SerString element and how CustomAttrib distinguishes between fields and properties that act as named arguments. In the example below, the TestAttribute attribute requires one fixed argument, Fixed1, of type int32; additionally, we may (and do) supply two named arguments, of types int16 and string, as shown in the code listing below:

// Full source: CustomAttrib\1.cs
// Binary: CustomAttrib\1.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(int Fixed1) { }

    public short Named1 { get; set; }

    public string Named2;
}

[Test(1, Named1 = 1, Named2 = "Abcd")]
public class TestClass { }

The full CustomAttrib signature for this case is 33 bytes long, so in some places we have merged several bytes into one row with a single description.

Offset	Value	Meaning
`0x3E`	`0x21`	Signature size, stored as a compressed integer, in big-endian byte order
`0x3F` `0x40`	`0x01` `0x00`	Prolog stored as an uncompressed and little-endian `unsigned int16` of value `0x0001`
`0x41` `0x42` `0x43` `0x44`	`0x01` `0x00` `0x00` `0x00`	The value of the first fixed argument of the attribute (`Fixed1`), the value is `0x00000001` and is stored as an uncompressed, little-endian `int32`. This is represented by the upper line in the Picture 2b and the first path in the Picture 2c.
`0x45` `0x46`	`0x02` `0x00`	The number of the named parameters supplied to the attribute, represented by the `NumNamed` element on the Picture 2a and stored as an `unsigned int16`, little-endian. We supplied exactly two optional parameters, and of course value of this two-byte element is `0x0002`.
`0x47`	`0x54`	The value of this byte indicates that the target named argument is a property (see constants in the first part); this is the `PROPERTY` element in Picture 2d.
`0x48`	`0x06`	The type of the target property (`int16`, see constants in the first part). This byte is represented by the `FieldOrPropType` element in Picture 2d.
`0x49` `0x4A` `0x4B` `0x4C` `0x4D` `0x4E` `0x4F`	`0x06` `0x4E` `0x61` `0x6D` `0x65` `0x64` `0x31`	This is the `SerString` that specifies the name of the target property (represented by the `FieldOrPropName` element in Picture 2d). A `SerString` is a UTF-8 string preceded by its size in bytes, stored as a compressed integer in big-endian byte order. So we have a 6-byte string (offset `0x49`); because the name does not contain any characters outside the ASCII range, each character occupies exactly one byte, and we can easily read the text: `Named1`.
`0x50` `0x51`	`0x01` `0x00`	The value of the first named argument of the attribute (`Named1`); the value is `0x0001` and is stored as an uncompressed, little-endian `int16`. This is represented by the upper line in Picture 2b and the first path in Picture 2c.
`0x52`	`0x53`	The value of this byte indicates that the target named argument is a field (see constants in the first part); this is the `FIELD` element in Picture 2d.
`0x53`	`0x0E`	The type of the target field (`string`, see constants in the first part). This byte is represented by the `FieldOrPropType` element in Picture 2d.
`0x54` `0x55` `0x56` `0x57` `0x58` `0x59` `0x5A`	`0x06` `0x4E` `0x61` `0x6D` `0x65` `0x64` `0x32`	This is again a `SerString`, which specifies the name of the target field (represented by the `FieldOrPropName` element in Picture 2d). The length of this string is 6 bytes (look at offset `0x54`); the rest of the bytes are very similar to the previous string, and only the last byte differs. The string text is `Named2` (see the ASCII table).
`0x5B` `0x5C` `0x5D` `0x5E` `0x5F`	`0x04` `0x41` `0x62` `0x63` `0x64`	The value of the second named argument of the attribute (`Named2`), the value is `Abcd` (see ASCII table) and is stored as a `SerString`. This is represented by the upper line in the Picture 2b and the middle path in the Picture 2c. Because `0x5F - 0x3E = 0x21`, i.e. last offset - first offset = signature size, the signature ends here.
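The walk-through above can be condensed into a minimal Python sketch that decodes this particular blob. Two assumptions keep it short: the fixed-argument types must be supplied by the caller (in a real reader they come from the constructor's signature, as explained for Picture 2a), and every SerString is assumed to be shorter than 128 bytes, so its PackedLen is a single byte. Only the int16, int32 and string codes from this example are handled; all names are hypothetical.

```python
import struct
from typing import Any, List, Optional, Tuple

FIELD, PROPERTY = 0x53, 0x54
STRING = 0x0E
SCALAR_FORMATS = {0x06: "<h", 0x08: "<i"}  # int16, int32 (little-endian)


def read_ser_string(blob: bytes, pos: int) -> Tuple[Optional[str], int]:
    if blob[pos] == 0xFF:  # null string
        return None, pos + 1
    n = blob[pos]  # assumption: one-byte PackedLen (length < 128)
    return blob[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n


def read_elem(blob: bytes, pos: int, type_code: int) -> Tuple[Any, int]:
    if type_code == STRING:
        return read_ser_string(blob, pos)
    fmt = SCALAR_FORMATS[type_code]
    return struct.unpack_from(fmt, blob, pos)[0], pos + struct.calcsize(fmt)


def parse_custom_attrib(blob: bytes, fixed_types: List[int]):
    """blob excludes the size byte; fixed_types lists the constructor's parameter types."""
    (prolog,) = struct.unpack_from("<H", blob, 0)
    if prolog != 0x0001:
        raise ValueError("missing 0x0001 prolog")
    pos, fixed = 2, []
    for type_code in fixed_types:  # FixedArg values carry no type information
        value, pos = read_elem(blob, pos, type_code)
        fixed.append(value)
    (num_named,) = struct.unpack_from("<H", blob, pos)
    pos += 2
    named = []
    for _ in range(num_named):  # NamedArg: FIELD/PROPERTY, type, name, value
        kind = {FIELD: "field", PROPERTY: "property"}[blob[pos]]
        type_code = blob[pos + 1]
        name, pos = read_ser_string(blob, pos + 2)
        value, pos = read_elem(blob, pos, type_code)
        named.append((kind, name, value))
    return fixed, named


EXAMPLE_1 = bytes.fromhex(
    "0100" "01000000" "0200"              # Prolog, Fixed1, NumNamed
    "5406" "064e616d656431" "0100"        # PROPERTY int16 "Named1" = 1
    "530e" "064e616d656432" "0441626364"  # FIELD string "Named2" = "Abcd"
)
print(parse_custom_attrib(EXAMPLE_1, [0x08]))
# ([1], [('property', 'Named1', 1), ('field', 'Named2', 'Abcd')])
```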

Example 2

In this example, we will demonstrate the signature format when System.Type, SZARRAY and boxed value types are used as arguments of the TestAttribute attribute defined below:

// Full source: CustomAttrib\2.cs
// Binary: CustomAttrib\2.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(object Param1, int[] Param2, Type Param3) { }
}

[Test(1, new int[] {1, 2, 3}, typeof(string))]
public class TestClass { }

As in the previous sample, the signature is very long (116 bytes!), so I have split it into smaller parts.

Offset	Value	Meaning
`0x2B`	`0x74`	Signature size, stored as a compressed integer, in big-endian byte order
`0x2C` `0x2D`	`0x01` `0x00`	Prolog stored as an uncompressed and little-endian `unsigned int16` of value `0x0001`
`0x2E`	`0x08`	The type of the first fixed argument (`int32`, boxed inside `System.Object`); this case is represented by the third path in Picture 2c, where the value is immediately preceded by its type
`0x2F` `0x30` `0x31` `0x32`	`0x01` `0x00` `0x00` `0x00`	The value whose type was specified in the previous byte; because the type is `int32`, it occupies exactly 4 bytes. It is stored in little-endian byte order, so the value is `0x00000001`.
`0x33` `0x34` `0x35` `0x36`	`0x03` `0x00` `0x00` `0x00`	Next comes the second argument. Because the second parameter is a single-dimensional, zero-based array (`SZARRAY`), these four bytes specify the number of elements supplied in the array, stored as an `unsigned int32` in little-endian byte order.
`0x37` `0x38` `0x39` `0x3A`	`0x01` `0x00` `0x00` `0x00`	The value of the first element of the array in the second parameter, it is four-byte long because the type of array is `int32`, the value is `0x00000001`.
`0x3B` `0x3C` `0x3D` `0x3E`	`0x02` `0x00` `0x00` `0x00`	The value of the second element of the array in the second parameter, it is four-byte long because the type of array is `int32`, the value is `0x00000002`.
`0x3F` `0x40` `0x41` `0x42`	`0x03` `0x00` `0x00` `0x00`	The value of the third element of the array in the second parameter, it is four-byte long because the type of array is `int32`, the value is `0x00000003`.
`0x43` `0x44` `0x45` `0x46` `0x47` `0x48` `0x49` `0x4A` `0x4B` `0x4C` `0x4D` `0x4E` `0x4F` `0x50` `0x51` `0x52` `0x53` `0x54` `0x55` `0x56` `0x57` `0x58` `0x59` `0x5A` `0x5B` `0x5C` `0x5D` `0x5E` `0x5F` `0x60` `0x61` `0x62` `0x63` `0x64` `0x65` `0x66` `0x67` `0x68` `0x69` `0x6A` `0x6B` `0x6C` `0x6D` `0x6E` `0x6F` `0x70` `0x71` `0x72` `0x73` `0x74` `0x75` `0x76` `0x77` `0x78` `0x79` `0x7A` `0x7B` `0x7C` `0x7D` `0x7E` `0x7F` `0x80` `0x81` `0x82` `0x83` `0x84` `0x85` `0x86` `0x87` `0x88` `0x89` `0x8A` `0x8B` `0x8C` `0x8D` `0x8E` `0x8F` `0x90` `0x91` `0x92` `0x93` `0x94` `0x95` `0x96` `0x97` `0x98` `0x99` `0x9A` `0x9B` `0x9C` `0x9D`	`0x5A` `0x53` `0x79` `0x73` `0x74` `0x65` `0x6D` `0x2E` `0x53` `0x74` `0x72` `0x69` `0x6E` `0x67` `0x2C` `0x20` `0x6D` `0x73` `0x63` `0x6F` `0x72` `0x6C` `0x69` `0x62` `0x2C` `0x20` `0x56` `0x65` `0x72` `0x73` `0x69` `0x6F` `0x6E` `0x3D` `0x32` `0x2E` `0x30` `0x2E` `0x30` `0x2E` `0x30` `0x2C` `0x20` `0x43` `0x75` `0x6C` `0x74` `0x75` `0x72` `0x65` `0x3D` `0x6E` `0x65` `0x75` `0x74` `0x72` `0x61` `0x6C` `0x2C` `0x20` `0x50` `0x75` `0x62` `0x6C` `0x69` `0x63` `0x4B` `0x65` `0x79` `0x54` `0x6F` `0x6B` `0x65` `0x6E` `0x3D` `0x62` `0x37` `0x37` `0x61` `0x35` `0x63` `0x35` `0x36` `0x31` `0x39` `0x33` `0x34` `0x65` `0x30` `0x38` `0x39`	This 90-bytes long `SerString` describes the canonical name of the type that is supplied to the third parameter, it has the following value `System.String, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089`. This is represented by the middle path on the Picture 2c.
`0x9E` `0x9F`	`0x00` `0x00`	The `NumNamed` element (Picture 2a), an uncompressed, little-endian `unsigned int16` with the value `0x0000`: no named arguments were supplied to the attribute. These two bytes are not part of the previous `SerString` (which ends at `0x9D`, since `0x9D - 0x43 = 0x5A`), and they close the entire `CustomAttrib` (`0x9F - 0x2B = 0x74`, i.e. last offset - offset of the size byte = signature size).
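The same approach, sketched in Python for this blob only, shows how the three fixed arguments are laid out and reads the last two bytes according to Picture 2a, that is, as the NumNamed element. It assumes the constructor signature (object, int32[], System.Type) is already known, handles only a boxed int32 for the System.Object argument, ignores the null-array case, and assumes a one-byte PackedLen; all names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple


def read_ser_string(blob: bytes, pos: int) -> Tuple[Optional[str], int]:
    if blob[pos] == 0xFF:  # null string
        return None, pos + 1
    n = blob[pos]  # assumption: one-byte PackedLen (length < 128)
    return blob[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n


def parse_example_2(blob: bytes) -> Tuple[int, List[int], Optional[str], int, int]:
    """Decode (object, int32[], Type) fixed args; blob excludes the size byte."""
    if struct.unpack_from("<H", blob, 0)[0] != 0x0001:
        raise ValueError("missing 0x0001 prolog")
    pos = 2
    # Param1 (System.Object): FieldOrPropType of the boxed value, then the value.
    if blob[pos] != 0x08:
        raise NotImplementedError("only a boxed int32 is handled here")
    (param1,) = struct.unpack_from("<i", blob, pos + 1)
    pos += 1 + 4
    # Param2 (int32[]): NumElem as unsigned int32, then the elements.
    (num_elem,) = struct.unpack_from("<I", blob, pos)
    param2 = list(struct.unpack_from(f"<{num_elem}i", blob, pos + 4))
    pos += 4 + 4 * num_elem
    # Param3 (System.Type): SerString holding the canonical type name.
    param3, pos = read_ser_string(blob, pos)
    # NumNamed (unsigned int16) follows the fixed arguments, as in Picture 2a.
    (num_named,) = struct.unpack_from("<H", blob, pos)
    return param1, param2, param3, num_named, pos + 2


NAME = b"System.String, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
EXAMPLE_2 = (bytes.fromhex("0100" "08" "01000000" "03000000" "01000000" "02000000" "03000000")
             + bytes([len(NAME)]) + NAME + bytes.fromhex("0000"))
p1, p2, p3, num_named, end = parse_example_2(EXAMPLE_2)
print(p1, p2, num_named, end)  # 1 [1, 2, 3] 0 116
```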

1.3 MethodSpec

The MethodSpec signature is straightforward: it describes each instantiation of a generic method and is indexed by the MethodSpec.Signature column. It begins with the GENRICINST prolog (do you see the missing "E"?), which has the one-byte value 0x0A (this constant has a different value than ELEMENT_TYPE_GENERICINST, defined in the constants table in the first part), followed by GenArgCount and then the Type element repeated GenArgCount times. Its syntax is as follows:

MSIL

MethodSpecBlob ::=
   GENRICINST GenArgCount Type Type*

Example 1

In the sample below, we instantiate the TestMethod generic method, supplying three generic arguments.

// Full source: MethodSpec\1.cs
// Binary: MethodSpec\1.dll
// (...)

public class TestClass
{
    public void TestMethod<GenArg1, GenArg2, GenArg3>() { }
}

public class TestRunClass
{
    public void TestRunMethod()
    {
        new TestClass().TestMethod<short, int, string>();
    }
}

The MethodSpec for this case looks as follows:

Offset	Value	Meaning
`0x18`	`0x05`	Signature size
`0x19`	`0x0A`	Prolog
`0x1A`	`0x03`	The number of generic arguments supplied to the generic method
`0x1B`	`0x06`	The first parameter's type (`int16`), see constants in the first part
`0x1C`	`0x08`	The second parameter's type (`int32`), see constants in the first part
`0x1D`	`0x0E`	The third parameter's type (`string`), see constants in the first part
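Decoding this blob takes only a few lines. The Python sketch below (names are hypothetical) handles only generic arguments that are single-byte primitive types, such as the three in this example; a full reader would decode each argument with a general Type decoder.

```python
from typing import List, Tuple

GENRICINST = 0x0A  # MethodSpec prolog (note the missing "E")
PRIMITIVES = {0x06: "int16", 0x08: "int32", 0x0E: "string"}  # subset only


def read_compressed_uint(blob: bytes, pos: int) -> Tuple[int, int]:
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    return int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def parse_method_spec(blob: bytes) -> List[str]:
    """Decode a MethodSpec blob (size byte excluded) whose Types are primitives."""
    if blob[0] != GENRICINST:
        raise ValueError("MethodSpec blob must start with GENRICINST (0x0A)")
    count, pos = read_compressed_uint(blob, 1)
    args = []
    for _ in range(count):  # Type is repeated GenArgCount times
        args.append(PRIMITIVES[blob[pos]])  # one byte per primitive Type
        pos += 1
    return args


print(parse_method_spec(bytes([0x0A, 0x03, 0x06, 0x08, 0x0E])))  # ['int16', 'int32', 'string']
```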

1.4 TypeSpec

The TypeSpec signature is indexed by the TypeSpec.Signature column and is used, among other things, when instantiating a type as a multi-dimensional array, as a single-dimensional array preceded by custom modifier(s), or as a generic type, as shown in the diagram below. Because some elements (such as custom modifiers and array shapes) have not been explained yet, we use only a limited part of the TypeSpec signature's functionality here; in the next chapter, we will focus on the CustomMod, ArrayShape and TypeDefOrRefEncoded elements, and then we will come back to the TypeSpec signature and use the rest of its capabilities. Also notice that, in contrast to the previous subsection, where the GENRICINST (missing "E") constant is used as a prolog, TypeSpec uses the ELEMENT_TYPE_GENERICINST constant, which is defined in the general constants table (in the first part of the article).

TypeSpecBlob ::=
  PTR      CustomMod*  VOID
| PTR      CustomMod*  Type
| FNPTR    MethodDefSig
| FNPTR    MethodRefSig
| ARRAY    Type  ArrayShape
| SZARRAY  CustomMod*  Type
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type Type*

Example 1

In this example, we instantiate the TestClass generic type, as shown in the code listing below:

// Full source: TypeSpec\1.cs
// Binary: TypeSpec\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestClass<int, string> TestVar = new TestClass<int, string>();
    }
}

The TypeSpec for this case looks as follows:

Offset	Value	Meaning
`0x13`	`0x06`	Signature size
`0x14`	`0x15`	The `ELEMENT_TYPE_GENERICINST` constant, see constants table in the first part
`0x15`	`0x12`	The kind of the generic type (`CLASS`, i.e., a reference type), see constants table in the first part
`0x16`	`0x08`	The instantiated generic type is described in the `TypeDef` metadata table at row `2`. This is the `TypeDefOrRefEncoded` element not explained in the current chapter.
`0x17`	`0x02`	The number of generic arguments supplied to the type is two.
`0x18`	`0x08`	The first generic parameter's type (`int32`), see constants in the first part
`0x19`	`0x0E`	The second generic parameter's type (`string`), see constants in the first part
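The GENERICINST form of this signature can be sketched in Python as follows. The sketch assumes that a TypeDefOrRefEncoded value stores the table tag (0 TypeDef, 1 TypeRef, 2 TypeSpec) in its low two bits and the row in the remaining bits, which is what makes `0x08` mean "TypeDef, row 2" above; ELEMENT_TYPE_VALUETYPE (`0x11`) comes from the constants table in the first part. Only primitive generic arguments are handled, and all names are hypothetical.

```python
from typing import List, Tuple

GENERICINST, CLASS, VALUETYPE = 0x15, 0x12, 0x11
PRIMITIVES = {0x08: "int32", 0x0E: "string"}  # subset only
TABLES = ("TypeDef", "TypeRef", "TypeSpec")


def read_compressed_uint(blob: bytes, pos: int) -> Tuple[int, int]:
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    return int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def decode_typedef_or_ref(coded: int) -> Tuple[str, int]:
    """Low two bits: table tag; remaining bits: row number."""
    return TABLES[coded & 0x03], coded >> 2


def parse_generic_inst(blob: bytes) -> Tuple[str, Tuple[str, int], List[str]]:
    """Decode a GENERICINST TypeSpec blob (size byte excluded) with primitive arguments."""
    if blob[0] != GENERICINST:
        raise ValueError("expected ELEMENT_TYPE_GENERICINST (0x15)")
    kind = {CLASS: "class", VALUETYPE: "valuetype"}[blob[1]]
    coded, pos = read_compressed_uint(blob, 2)
    count, pos = read_compressed_uint(blob, pos)
    args = [PRIMITIVES[b] for b in blob[pos:pos + count]]
    return kind, decode_typedef_or_ref(coded), args


print(parse_generic_inst(bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E])))
# ('class', ('TypeDef', 2), ['int32', 'string'])
```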

1.5 MarshalSpec

The MarshalSpec signature is generated when the MarshalAs attribute is applied to fields, parameters and return parameters. It specifies how data should be marshalled when calling from or to unmanaged code via Platform Invoke. The signature is indexed by the FieldMarshal.NativeType column. The name of the metadata table is slightly misleading: whether the MarshalSpec describes a field, a parameter or a return parameter, it is always indexed by this column. The ParamNum and NumElem elements in the syntax listing below describe, respectively, the parameter in the method call that provides the number of elements in the array, and the number of elements (or additional elements). Both are stored in the signature as compressed integers, and their purpose is to help compute the total size in bytes that an array occupies in memory. Microsoft's implementation of the marshalling descriptor is richer than the one described here and makes use of additional constants and extended syntax; if you want to know more about Microsoft's implementation of MarshalSpec, see section §23.4 of the Partition II metadata specification.

MarshalSpec ::=
  NativeIntrinsic
| ARRAY ArrayElemType
| ARRAY ArrayElemType ParamNum
| ARRAY ArrayElemType ParamNum NumElem

ArrayElemType ::=
   NativeIntrinsic 

NativeIntrinsic ::=
  BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8
| LPSTR | LPWSTR | INT | UINT | FUNC

To compute the size in bytes of an array, the following pseudo-code is used, where the @ParamNum stands for the value passed in for parameter number ParamNum.

if ParamNum = 0
   SizeInBytes = NumElem * sizeof (elem)
else
   SizeInBytes = ( @ParamNum +  NumElem ) * sizeof (elem)
endif
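The pseudo-code translates directly into a small Python function. In this sketch (the function and parameter names are our own), `param_value` plays the role of @ParamNum, the value actually passed at call time, which cannot be read from the signature itself; the usage line borrows ParamNum and NumElem from Example 2 below together with a hypothetical argument value of 5.

```python
def marshal_array_size_in_bytes(param_num: int, num_elem: int,
                                param_value: int, elem_size: int) -> int:
    """SizeInBytes from the MarshalSpec pseudo-code.

    param_value plays the role of @ParamNum (the value passed at call time for
    parameter number param_num); it is ignored when param_num is 0.
    """
    if param_num == 0:
        return num_elem * elem_size
    return (param_value + num_elem) * elem_size


# ParamNum = 2, NumElem = 10, int32 elements (4 bytes), hypothetical argument 5:
# (5 + 10) * 4
print(marshal_array_size_in_bytes(2, 10, 5, 4))  # 60
```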

The constants for this signature are listed in the table below; in the syntax descriptions above and in the examples in this subsection, abbreviations are used instead of the full constant names.

Name	Value
`NATIVE_TYPE_BOOLEAN`	`0x02`
`NATIVE_TYPE_I1`	`0x03`
`NATIVE_TYPE_U1`	`0x04`
`NATIVE_TYPE_I2`	`0x05`
`NATIVE_TYPE_U2`	`0x06`
`NATIVE_TYPE_I4`	`0x07`
`NATIVE_TYPE_U4`	`0x08`
`NATIVE_TYPE_I8`	`0x09`
`NATIVE_TYPE_U8`	`0x0A`
`NATIVE_TYPE_R4`	`0x0B`
`NATIVE_TYPE_R8`	`0x0C`
`NATIVE_TYPE_LPSTR`	`0x14`
`NATIVE_TYPE_LPWSTR`	`0x15`
`NATIVE_TYPE_INT`	`0x1F`
`NATIVE_TYPE_UINT`	`0x20`
`NATIVE_TYPE_FUNC`	`0x26`
`NATIVE_TYPE_ARRAY`	`0x2A`
`NATIVE_TYPE_MAX`	`0x50`

Example 1

Let us start with the simplest possible example shown in the below code listing:

// Full source: MarshalSpec\1.cs
// Binary: MarshalSpec\1.dll
// (...)

[MarshalAs(UnmanagedType.LPWStr)]
public string TestField;

This code has generated the following MarshalSpec signature:

Offset	Value	Meaning
`0x1C`	`0x01`	Signature size
`0x1D`	`0x15`	The `TestField` field is marshalled to the `LPWSTR` in the unmanaged code.

Example 2

Now it is time for a more sophisticated example: we will marshal an array of int32 as LPArray (a pointer to the first element of a C-style array). Because this array type does not carry information about the rank and bounds of the associated data, we have to specify which parameter of the method provides the number of elements in the array; this is done with the SizeParamIndex optional parameter. In addition, we also set the SizeConst optional parameter, which specifies that the Param1 array contains 10 more elements in addition to the number specified by the ArraySize argument. Please notice that there is also the SafeArray array type, which is a self-describing array that carries the type, rank and bounds of the associated data and does not require setting any optional parameters in MarshalAsAttribute; however, it is Microsoft-specific and thus is not described here.

// Full source: MarshalSpec\2.cs
// Binary: MarshalSpec\2.dll
// (...)

public void TestMethod(
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2, SizeConst = 10)] int[] Param1,
    int ArraySize)
{
    // nop
}

The following MarshalSpec signature should be generated by the above code.

Offset	Value	Meaning
`0x1B`	`0x05`	Signature size
`0x1C`	`0x2A`	Type of marshalling parameter (`ARRAY`), see constants table for marshalling descriptor
`0x1D`	`0x50`	The `MAX` constant (see constants table for marshalling descriptor) indicates that this array does not provide information about element's type of the array.
`0x1E`	`0x02`	The `ParamNum` parameter stored as compressed integer
`0x1F`	`0x0A`	The `NumElem` parameter stored as compressed integer
`0x20`	`0x01`	The `ElemMult` parameter, stored as a compressed integer. This is a strange parameter: the whole specification mentions it only twice, saying that if the marshalled type is `ARRAY`, `ElemMult` must be set to `0x01`, but it specifies neither its meaning nor its location in the `MarshalSpec` signature (see section §22.17 in the Partition II metadata specification).
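Both MarshalSpec examples can be decoded with the Python sketch below, built from the constants table above (names are hypothetical). The trailing compressed integers of an ARRAY descriptor are read in the order observed in Example 2 (ParamNum, NumElem, then the byte labelled ElemMult), and each one is optional; since the meaning of the last byte is unclear, the sketch only reports it.

```python
from typing import Dict, Tuple, Union

NATIVE_TYPES = {
    0x02: "BOOLEAN", 0x03: "I1", 0x04: "U1", 0x05: "I2", 0x06: "U2",
    0x07: "I4", 0x08: "U4", 0x09: "I8", 0x0A: "U8", 0x0B: "R4", 0x0C: "R8",
    0x14: "LPSTR", 0x15: "LPWSTR", 0x1F: "INT", 0x20: "UINT", 0x26: "FUNC",
    0x2A: "ARRAY", 0x50: "MAX",
}
NATIVE_TYPE_ARRAY = 0x2A


def read_compressed_uint(blob: bytes, pos: int) -> Tuple[int, int]:
    b0 = blob[pos]
    if (b0 & 0x80) == 0x00:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | blob[pos + 1], pos + 2
    return int.from_bytes(blob[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def parse_marshal_spec(blob: bytes) -> Dict[str, Union[str, int]]:
    """Decode a MarshalSpec blob (size byte excluded)."""
    native = NATIVE_TYPES[blob[0]]
    if blob[0] != NATIVE_TYPE_ARRAY:
        return {"native_type": native}
    result: Dict[str, Union[str, int]] = {"native_type": native,
                                          "elem_type": NATIVE_TYPES[blob[1]]}
    pos = 2
    # Optional trailing compressed integers, in the order seen in Example 2.
    for name in ("param_num", "num_elem", "elem_mult"):
        if pos >= len(blob):
            break
        result[name], pos = read_compressed_uint(blob, pos)
    return result


print(parse_marshal_spec(bytes([0x15])))  # {'native_type': 'LPWSTR'}
print(parse_marshal_spec(bytes([0x2A, 0x50, 0x02, 0x0A, 0x01])))
```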

2. Elements

We have now discussed all the signatures, but this is not the end. Signatures consist of smaller parts that I call "elements". They were separated out because they form part of more than one signature, so there is no need to repeat the explanation of a particular element in each signature. In this chapter, we will take a closer look at them.

2.1 CustomMod

This element has appeared frequently in the signatures discussed so far, which is why we start with it. Custom modifiers are similar to custom attributes, but, in contrast to them, custom modifiers are part of a signature. In CIL, custom modifiers are defined using the modreq (required modifier) and modopt (optional modifier) keywords in a declaration; both take a type (class or structure) as their "argument". Two signatures that differ only by the addition of a custom modifier (required or optional) shall not be considered to match, and, as the specification says:

The distinction between required and optional modifiers is important to tools other than the CLI that deal with the metadata, typically compilers and program analysers. A required modifier indicates that there is a special semantics to the modified item that should not be ignored, while an optional modifier can simply be ignored. For example, the const qualifier in the C programming language can be modelled with an optional modifier since the caller of a method that has a const-qualified parameter need not treat it in any special way. On the other hand, a parameter that shall be copy-constructed in C++ shall be marked with a required custom attribute since it is the caller who makes the copy.

Unfortunately, C# has some problems handling parameters that have custom modifiers attached; you can read about them in the "Modopt, method signatures, and incomplete specs oh my!" and "More on modopt" articles on CodeBetter.com.

CMOD_OPT and CMOD_REQD are just constants defined in the constants table in the first part, and the TypeDefEncoded and TypeRefEncoded elements are in fact a single TypeDefOrRefEncoded element, thoroughly discussed in the next subsection. Note that zero, one or more CustomMods can be attached to a field, property, parameter or return parameter. As far as I know, C# gives you no way to attach an arbitrary custom modifier (excluding, of course, System.Reflection.Emit), although the compiler emits some modifiers itself, for instance modreq(IsVolatile) on volatile fields. In the System.Runtime.CompilerServices namespace you can find several indicators (as I call them) that can be used as custom modifiers, for instance CallConvCdecl, IsConst and IsLong.

Picture 3: The CustomMod element syntax diagram

Example 1

In the example below, the TestField field is annotated with the modreq modifier, hence the CustomMod lies within the FieldSig signature, depicted in Picture 2 at the very beginning of the article. The IsLong indicator distinguishes a long from an int in C++, but in our case there is no special semantics behind this custom modifier; we just want to demonstrate the format of the CustomMod element in the signature. The value of the TypeDefOrRefEncoded element is shown twice, in two numeral systems: hexadecimal (₁₆ subscript) and binary (₂ subscript). In the next subsection, you will see why.

MSIL

// Full source: CustomMod\1.il
// Binary: CustomMod\1.dll
// (...)

.field public int64 modreq([mscorlib]System.Runtime.CompilerServices.IsLong) TestField

The table below presents the whole FieldSig signature, indexed by the Field.Signature column, along with the custom modifier embedded by the modreq keyword.

Offset	Value	Meaning
`0x01`	`0x04`	Signature size
`0x02`	`0x06`	`FieldSig`'s prolog
`0x03`	`0x1F`	Encountered custom, required modifier (`modreq`), see constants in the first part
`0x04`	`0x05`₁₆ `00000101`₂	The `TypeDefOrRefEncoded` element; in this case it points to the first row of the `TypeRef` table, that is the `IsLong` class. This element is described in the next subsection.
`0x05`	`0x0A`	The type of the field (`int64`), see constants in the first part
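A FieldSig like this one can be split with a short loop: while the next byte is CMOD_REQD (`0x1F`) or CMOD_OPT (`0x20`), read one modifier followed by its TypeDefOrRefEncoded value. The Python sketch below assumes a primitive field type (a single byte), which is enough for this example; the names are my own, and it is an illustration rather than a complete signature parser.

```python
FIELD_PROLOG = 0x06
CMOD_REQD, CMOD_OPT = 0x1F, 0x20
TABLES = {0: "TypeDef", 1: "TypeRef", 2: "TypeSpec"}


def read_compressed_uint(data: bytes, pos: int) -> tuple[int, int]:
    """Compressed unsigned integer (1, 2 or 4 bytes); returns (value, next pos)."""
    b0 = data[pos]
    if b0 & 0x80 == 0:
        return b0, pos + 1
    if b0 & 0xC0 == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def decode_field_sig(blob: bytes) -> tuple[list, int]:
    """Return ([(modifier, table, row), ...], primitive type byte)."""
    if blob[0] != FIELD_PROLOG:
        raise ValueError("not a FieldSig")
    pos, mods = 1, []
    while blob[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if blob[pos] == CMOD_REQD else "modopt"
        coded, pos = read_compressed_uint(blob, pos + 1)
        mods.append((kind, TABLES[coded & 0x03], coded >> 2))
    return mods, blob[pos]


# Bytes at offsets 0x02..0x05 of the table above.
mods, field_type = decode_field_sig(bytes([0x06, 0x1F, 0x05, 0x0A]))
print(mods, hex(field_type))  # [('modreq', 'TypeRef', 1)] 0xa
```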

2.2 TypeDefOrRefEncoded

Now we will try to demystify what is, at this point, the most mysterious element, namely TypeDefOrRefEncoded; fortunately, it is not as complicated as it may seem. This element determines in which metadata table, and in which row of that table, the referenced type's information resides. The two least significant bits encode the metadata table: 0 for TypeDef (the referenced type resides in the current assembly), 1 for TypeRef (the referenced type resides in a separate assembly) and 2 for TypeSpec (the referenced type is a generic type instance, an array, etc., see chapter 4.9 TypeSpec). The remaining bits encode the row's index. Note that indexes are one-based; in other words, the first row in every metadata table is always 1, not 0.
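The packing rule can be written down in a few lines. In this Python sketch (the function names are mine), the table tag occupies the two low bits and the one-based row index the rest. The packed value is then stored as a compressed integer, so it fits in a single byte only for rows up to 31.

```python
TAG_TO_TABLE = {0: "TypeDef", 1: "TypeRef", 2: "TypeSpec"}
TABLE_TO_TAG = {name: tag for tag, name in TAG_TO_TABLE.items()}


def encode_typedef_or_ref(table: str, row: int) -> int:
    """Pack a (table, one-based row) pair; the result is then compressed."""
    if row < 1:
        raise ValueError("metadata rows are one-based")
    return (row << 2) | TABLE_TO_TAG[table]


def decode_typedef_or_ref(value: int) -> tuple[str, int]:
    """Split a decompressed TypeDefOrRefEncoded value into (table, row)."""
    tag = value & 0x03
    if tag not in TAG_TO_TABLE:
        raise ValueError("tag 3 is not used")
    return TAG_TO_TABLE[tag], value >> 2


# Values that appear in the examples of this chapter.
for value in (0x05, 0x08, 0x09):
    table, row = decode_typedef_or_ref(value)
    print(f"0x{value:02X} -> {table} row {row}")
# 0x05 -> TypeRef row 1
# 0x08 -> TypeDef row 2
# 0x09 -> TypeRef row 2
```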

Example 1

In this example, we declare a single field with a required custom modifier attached to it; the modreq takes as its argument the TestClass type, declared in the same assembly, as shown below:

MSIL

// Full source: TypeDefOrRefEncoded\1.il
// Binary: TypeDefOrRefEncoded\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.field public int64 modreq(TestClass) TestField

The FieldSig for the above sample code is as follows:

Offset	Value	Meaning
`0x01`	`0x04`	Signature size
`0x02`	`0x06`	`FieldSig`'s prolog
`0x03`	`0x1F`	Encountered custom, required modifier (`modreq`), see constants in the first part
`0x04`	`0x08`₁₆ `00001000`₂	The `TypeDefOrRefEncoded` element; this time it points to the second row of the `TypeDef` table (the first row of `TypeDef` is always the `<Module>` pseudo-class). The two least significant bits stand for the type of table (`00`₂ - `TypeDef`), and bits 3 to 8 denote the row number in the table (`000010`₂ - `2`), that is `TestClass`. Compare this with the `TypeDefOrRefEncoded` element from the previous subsection.
`0x05`	`0x0A`	The type of the field (`int64`), see constants in the first part

2.3 Param

This element describes a single parameter supplied to a method or a property, and therefore is part of PropertySig, MethodDefSig, MethodRefSig, etc. This is the syntax diagram for the Param element:

Picture 4: The Param element syntax diagram

Example 1

In the TestMethod method illustrated below, two custom modifiers are attached to a single parameter. The aim of this example is to demonstrate the format of the Param element and to show once again how the TypeDefOrRefEncoded element works.

MSIL

// Full source: Param\1.il
// Binary: Param\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.method public static void TestMethod(int32 modopt(TestClass) 
        modreq([mscorlib]System.Runtime.CompilerServices.IsLong) Param1) 
{
    ret
}

The associated MethodDefSig signature for this method is:

Offset	Value	Meaning
`0x01`	`0x08`	Signature size
`0x02`	`0x00`	Method is `static`
`0x03`	`0x01`	The number of parameters
`0x04`	`0x01`	The type of the returned value (`void`), see constants in the first part
`0x05`	`0x1F`	Encountered custom, required modifier (`modreq`), see constants in the first part
`0x06`	`0x09`₁₆ `00001001`₂	Referenced row is `2` in `TypeRef` metadata table, that is `IsLong` type
`0x07`	`0x20`	Encountered custom, optional modifier (`modopt`), see constants in the first part
`0x08`	`0x08`₁₆ `00001000`₂	Referenced row is `2` in `TypeDef` metadata table, that is `TestClass` type
`0x09`	`0x08`	First parameter's type (`int32`), see constants in the first part
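The same loop extends naturally to MethodDefSig. The Python sketch below assumes a non-generic method (no GENERIC flag, hence no GenParamCount) and only primitive, one-byte types without BYREF or TYPEDBYREF; the return value is read the same way as a parameter, which is what the RetType element amounts to in this simple case. The names are mine. Note also what the table shows: the modreq written second in the source comes first in the blob.

```python
HASTHIS = 0x20
CMOD_REQD, CMOD_OPT = 0x1F, 0x20
TABLES = {0: "TypeDef", 1: "TypeRef", 2: "TypeSpec"}


def read_compressed_uint(data: bytes, pos: int) -> tuple[int, int]:
    """Compressed unsigned integer (1, 2 or 4 bytes); returns (value, next pos)."""
    b0 = data[pos]
    if b0 & 0x80 == 0:
        return b0, pos + 1
    if b0 & 0xC0 == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def read_custom_mods(blob: bytes, pos: int) -> tuple[list, int]:
    """Read CustomMod* starting at pos; return (modifiers, next pos)."""
    mods = []
    while blob[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if blob[pos] == CMOD_REQD else "modopt"
        coded, pos = read_compressed_uint(blob, pos + 1)
        mods.append((kind, TABLES[coded & 0x03], coded >> 2))
    return mods, pos


def decode_method_def_sig(blob: bytes) -> dict:
    """Non-generic MethodDefSig with primitive (one-byte) types only."""
    flags = blob[0]
    param_count, pos = read_compressed_uint(blob, 1)
    ret_mods, pos = read_custom_mods(blob, pos)
    ret_type, pos = blob[pos], pos + 1          # 0x01 = VOID
    params = []
    for _ in range(param_count):
        mods, pos = read_custom_mods(blob, pos)
        params.append((mods, blob[pos]))
        pos += 1
    return {"has_this": bool(flags & HASTHIS),
            "ret": (ret_mods, ret_type), "params": params}


# Bytes at offsets 0x02..0x09 of the table above.
sig = decode_method_def_sig(bytes([0x00, 0x01, 0x01, 0x1F, 0x09, 0x20, 0x08, 0x08]))
print(sig["params"])
# [([('modreq', 'TypeRef', 2), ('modopt', 'TypeDef', 2)], 8)]
```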

2.4 RetType

This element is almost identical to the Param element; it has one extra path that allows the VOID type. Because the syntax diagram below is self-explanatory, no examples are provided for this subsection.

Picture 5: The RetType element syntax diagram

2.5 Type

It is not surprising that the Type element describes... a type, and not only a primitive type (such as int32, bool, string, etc.) but also arrays, generic instance types and complex types (classes and structures). The listing below presents the syntax for this element; words written in upper case are constants whose values can be found in the constants table in the first part. You may be surprised that the GENERICINST constant is part of this element, but remember that the TypeSpec, MethodSpec and MethodDefSig signatures have different aims!

Type ::=
BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | I | U
| ARRAY Type ArrayShape
| CLASS TypeDefOrRefEncoded
| FNPTR MethodDefSig
| FNPTR MethodRefSig
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type *
| MVAR number
| OBJECT
| PTR CustomMod* Type
| PTR CustomMod* VOID
| STRING
| SZARRAY CustomMod* Type
| VALUETYPE TypeDefOrRefEncoded
| VAR number
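Because Type refers to itself (GENERICINST and SZARRAY, for instance, are followed by further Types), it maps naturally onto a recursive descent parser. The Python sketch below covers only a subset of the grammar (primitives, OBJECT, STRING, CLASS, VALUETYPE, GENERICINST, SZARRAY, VAR and MVAR) and renders a rough, ILAsm-like string; the output format and the names are my own. The sample bytes are hypothetical: they encode TestClass<int32, string> assuming TestClass sits in row 2 of the TypeDef table.

```python
ELEMENT_NAMES = {
    0x02: "bool", 0x03: "char", 0x04: "int8", 0x05: "uint8",
    0x06: "int16", 0x07: "uint16", 0x08: "int32", 0x09: "uint32",
    0x0A: "int64", 0x0B: "uint64", 0x0C: "float32", 0x0D: "float64",
    0x0E: "string", 0x18: "native int", 0x19: "native uint", 0x1C: "object",
}
VALUETYPE, CLASS, VAR, GENERICINST, SZARRAY, MVAR = 0x11, 0x12, 0x13, 0x15, 0x1D, 0x1E
TABLES = {0: "TypeDef", 1: "TypeRef", 2: "TypeSpec"}


def read_compressed_uint(data: bytes, pos: int) -> tuple[int, int]:
    """Compressed unsigned integer (1, 2 or 4 bytes); returns (value, next pos)."""
    b0 = data[pos]
    if b0 & 0x80 == 0:
        return b0, pos + 1
    if b0 & 0xC0 == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def parse_type(blob: bytes, pos: int = 0) -> tuple[str, int]:
    """Parse one Type starting at pos; return (description, next pos)."""
    tag, pos = blob[pos], pos + 1
    if tag in ELEMENT_NAMES:
        return ELEMENT_NAMES[tag], pos
    if tag in (CLASS, VALUETYPE):
        coded, pos = read_compressed_uint(blob, pos)   # TypeDefOrRefEncoded
        kind = "class" if tag == CLASS else "valuetype"
        return f"{kind} {TABLES[coded & 0x03]}#{coded >> 2}", pos
    if tag == GENERICINST:
        if blob[pos] not in (CLASS, VALUETYPE):
            raise ValueError("GENERICINST must be followed by CLASS or VALUETYPE")
        base, pos = parse_type(blob, pos)
        count, pos = read_compressed_uint(blob, pos)   # GenArgCount
        args = []
        for _ in range(count):
            arg, pos = parse_type(blob, pos)
            args.append(arg)
        return f"{base}<{', '.join(args)}>", pos
    if tag == SZARRAY:                                 # CustomMod* not handled here
        elem, pos = parse_type(blob, pos)
        return f"{elem}[]", pos
    if tag in (VAR, MVAR):
        number, pos = read_compressed_uint(blob, pos)
        return ("!" if tag == VAR else "!!") + str(number), pos
    raise NotImplementedError(f"element type 0x{tag:02X} not handled")


# Hypothetical bytes for TestClass<int32, string>, TestClass assumed at TypeDef row 2.
print(parse_type(bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E])))
# ('class TypeDef#2<int32, string>', 6)
```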

Example 1

Let us see what happens to the MethodDefSig signature when a method accepts a generic instance type as a normal parameter.

// Full source: Type\1.cs
// Binary: Type\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestMethod(new TestClass<int, string>());
    }

    public void TestMethod(TestClass<int, string> Param1) { }
}

Dissecting the MethodDefSig signature for the TestMethod method.

Offset	Value	Meaning
`0x0E`	`0x09`	Signature size
`0x0F`	`0x20`	The method is an instance method (`HASTHIS` flag)
`0x10`	`0x01`	The number of normal parameters
`0x11`	`0x01`	The type of the returned value (`void`), see constants in the first part
`0x12`	`0x15`	The first parameter's type is generic type (`GENERICINST`), see constants in the first part
`0x13`	`0x12`	The first parameter's type is generic class (`CLASS`), see constants in the first part

2.6 ArrayShape

I think that many people who use the .NET platform know that an array can have more than one dimension, but do not know that each dimension of an array can have a lower bound. That is probably because most developers use C#, which has no syntax for lower bounds; such an array can only be created with the Array.CreateInstance method. The ArrayShape element holds the full definition of a multi-dimensional array: it stores the number of dimensions as well as the size and lower bound of each dimension. The syntax diagram for this element, along with a brief description copied from the specification, is depicted below:

Picture 6: The ArrayShape element syntax diagram

Rank is an integer (stored in compressed form, see §23.2) that specifies the number of dimensions in the array (shall be 1 or more). NumSizes is a compressed integer that says how many dimensions have specified sizes (it shall be 0 or more). Size is a compressed integer specifying the size of that dimension - the sequence starts at the first dimension, and goes on for a total of NumSizes items. Similarly, NumLoBounds is a compressed integer that says how many dimensions have specified lower bounds (it shall be 0 or more). And LoBound is a compressed integer specifying the lower bound of that dimension - the sequence starts at the first dimension, and goes on for a total of NumLoBounds items. None of the dimensions in these two sequences can be skipped, but the number of specified dimensions can be less than Rank.

NOTE: Please do not confuse multi-dimensional arrays with jagged arrays: a multi-dimensional array in CIL is, for example, int32[,], whereas a jagged array is int32[][]. Also note that ArrayShape is used only with general arrays (the ARRAY constant), that is, multi-dimensional arrays or arrays with explicit bounds such as int32[0...2]. A single-dimensional, zero-based array (a vector) is denoted by the SZARRAY constant and nothing more (see the Type element). To learn more about arrays in .NET, see the Array Types in .NET article in MSDN Magazine.

IMPORTANT: As we will see in the second example, ILASM does not store the lower bounds of an array (the LoBound field in Picture 6) verbatim: a lower bound of 4 is stored as 0x08, so it looks as if the value were multiplied by two. The version of the specification quoted above calls LoBound simply a compressed integer, which makes this look like an ILASM bug, but the stored value is exactly the compressed signed integer encoding (the value is rotated left by one bit, so the sign ends up in the least significant bit), and later editions of ECMA-335 do define LoBound as a compressed signed integer, since lower bounds may be negative. Below you can see a table copied from the specification that shows sample array declarations and the corresponding fields of the ArrayShape element. Moreover, the specification does not say in which case(s) the NumSizes and NumLoBounds fields may be less than the Rank field. From my observation, they are less than Rank in only one case, when no lower bound is specified for any dimension (as in the second row of the table below); otherwise NumSizes and NumLoBounds are always equal to Rank, which contradicts the first, third and fifth rows of the table below:

Declaration	Type	Rank	NumSizes	Size	NumLoBounds	LoBound
`[0...2]`	`I4`	`1`	`1`	`3`	`0`	`-`
`[,,,,,,]`	`I4`	`7`	`0`	`-`	`0`	`-`
`[0...3, 0...2,,,,]`	`I4`	`6`	`2`	`4 3`	`2`	`0 0`
`[1...2, 6...8]`	`I4`	`2`	`2`	`2 3`	`2`	`1 6`
`[5, 3...5, , ]`	`I4`	`4`	`2`	`5 3`	`2`	`0 3`
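The rules quoted above translate into a small encoder. This Python sketch (my own names) writes Rank, NumSizes, the sizes, NumLoBounds and the lower bounds. It assumes LoBound is a compressed signed integer, as later editions of ECMA-335 define it, while all other fields are compressed unsigned integers. Under that assumption, the rows of the table above produce the bytes below, and the shape ILASM emits in Example 2 (Rank 3, sizes 6, 0, 3, lower bounds 0, 0, 4) comes out as the bytes shown in that example's table.

```python
def _pack(value: int, size: int) -> bytes:
    """Lay out a prepared value in the 1-, 2- or 4-byte compressed form."""
    if size == 1:
        return bytes([value])
    if size == 2:
        return bytes([0x80 | (value >> 8), value & 0xFF])
    return bytes([0xC0 | (value >> 24), (value >> 16) & 0xFF,
                  (value >> 8) & 0xFF, value & 0xFF])


def compress_uint(value: int) -> bytes:
    """Compressed unsigned integer."""
    if value < 0 or value > 0x1FFFFFFF:
        raise ValueError("out of range")
    if value <= 0x7F:
        return _pack(value, 1)
    return _pack(value, 2) if value <= 0x3FFF else _pack(value, 4)


def compress_int(value: int) -> bytes:
    """Compressed signed integer: two's complement rotated left by one bit."""
    for size, bits in ((1, 7), (2, 14), (4, 29)):
        half = 1 << (bits - 1)
        if -half <= value < half:
            mask = (1 << bits) - 1
            twos = value & mask
            rotated = ((twos << 1) & mask) | (twos >> (bits - 1))
            return _pack(rotated, size)
    raise ValueError("out of range")


def encode_array_shape(rank: int, sizes: list[int], lo_bounds: list[int]) -> bytes:
    """Rank, NumSizes, Size*, NumLoBounds, LoBound* in that order."""
    if rank < 1 or len(sizes) > rank or len(lo_bounds) > rank:
        raise ValueError("invalid array shape")
    out = bytearray(compress_uint(rank))
    out += compress_uint(len(sizes))
    for size in sizes:
        out += compress_uint(size)
    out += compress_uint(len(lo_bounds))
    for bound in lo_bounds:
        out += compress_int(bound)
    return bytes(out)


# [1...2, 6...8]: Rank 2, sizes 2 and 3, lower bounds 1 and 6.
print(encode_array_shape(2, [2, 3], [1, 6]).hex(" "))  # 02 02 02 03 02 02 0c
```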

Example 1

Let us see how the ArrayShape works in action.

MSIL

// Full source: ArrayShape\1.il
// Binary: ArrayShape\1.dll
// (...)

.field public int32[,,] TestField

The following FieldSig signature should be generated by the above multi-dimensional array.

Offset	Value	Meaning
`0x01`	`0x06`	Signature size
`0x02`	`0x06`	`FieldSig`'s prolog
`0x03`	`0x14`	Field's type value is `ARRAY`, see constants in the first part
`0x04`	`0x08`	Array's type is `int32`, see constants in the first part
`0x05`	`0x03`	The number of the array's dimensions (`Rank` field in Picture 6)
`0x06`	`0x00`	Sizes of the array's dimensions not specified (`NumSizes` field in Picture 6)
`0x07`	`0x00`	Lower bounds of the array's dimensions not specified (`NumLoBounds` field in Picture 6)

Example 2

This example shows how the ArrayShape element behaves when a multi-dimensional array is declared with lower bounds specified.

MSIL

// Full source: ArrayShape\2.il
// Binary: ArrayShape\2.dll
// (...)

.field public int32[0...5,,4...6] TestField

The whole FieldSig signature looks like:

Offset	Value	Meaning
`0x01`	`0x0C`	Signature size
`0x02`	`0x06`	`FieldSig`'s prolog
`0x03`	`0x14`	Field's type value is `ARRAY`, see constants in the first part
`0x04`	`0x08`	Array's type is `int32`, see constants in the first part
`0x05`	`0x03`	The number of the array's dimensions (`Rank` field in Picture 6)
`0x06`	`0x03`	The number of sizes for this array (`NumSizes` field in Picture 6)
`0x07`	`0x06`	The size of the first dimension of the array (`Size` field in Picture 6)
`0x08`	`0x00`	The size of the second dimension of the array; zero means "not specified" (`Size` field in Picture 6)
`0x09`	`0x03`	The size of the third dimension of the array (`Size` field in Picture 6)
`0x0A`	`0x03`	The number of lower bounds for this array (`NumLoBounds` field in Picture 6)
`0x0B`	`0x00`	The lower bound of the first dimension of the array (`LoBound` field in Picture 6)
`0x0C`	`0x00`	The lower bound of the second dimension of the array (`LoBound` field in Picture 6)
`0x0D`	`0x08`	The lower bound of the third dimension of the array (`LoBound` field in Picture 6). The value 4 is stored as `0x08` because it is encoded as a compressed signed integer (rotated left by one bit, with the sign in the least significant bit); see the important note at the beginning of the current subsection
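Reading such a signature back is the mirror image. The Python sketch below decodes a FieldSig whose type is ARRAY with a primitive element type; it assumes lower bounds are compressed signed integers, so the stored `0x08` comes back as 4. The names are mine, and only the cases used in this subsection are handled.

```python
FIELD_PROLOG, ARRAY = 0x06, 0x14


def read_compressed_uint(data: bytes, pos: int) -> tuple[int, int]:
    """Compressed unsigned integer (1, 2 or 4 bytes); returns (value, next pos)."""
    b0 = data[pos]
    if b0 & 0x80 == 0:
        return b0, pos + 1
    if b0 & 0xC0 == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4


def read_compressed_int(data: bytes, pos: int) -> tuple[int, int]:
    """Compressed signed integer: undo the one-bit left rotation, then sign-extend."""
    b0 = data[pos]
    bits = 7 if b0 & 0x80 == 0 else 14 if b0 & 0xC0 == 0x80 else 29
    raw, pos = read_compressed_uint(data, pos)
    value = (raw >> 1) | ((raw & 1) << (bits - 1))
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos


def decode_array_field_sig(blob: bytes) -> dict:
    """FieldSig prolog, ARRAY, primitive element type, then the ArrayShape."""
    if blob[0] != FIELD_PROLOG or blob[1] != ARRAY:
        raise ValueError("expected a FieldSig of ARRAY type")
    elem_type = blob[2]                      # primitive element type only
    rank, pos = read_compressed_uint(blob, 3)
    num_sizes, pos = read_compressed_uint(blob, pos)
    sizes = []
    for _ in range(num_sizes):
        size, pos = read_compressed_uint(blob, pos)
        sizes.append(size)
    num_lo_bounds, pos = read_compressed_uint(blob, pos)
    lo_bounds = []
    for _ in range(num_lo_bounds):
        bound, pos = read_compressed_int(blob, pos)
        lo_bounds.append(bound)
    return {"elem_type": elem_type, "rank": rank,
            "sizes": sizes, "lo_bounds": lo_bounds}


# Bytes at offsets 0x02..0x0D of the table above.
shape = decode_array_field_sig(bytes([0x06, 0x14, 0x08, 0x03, 0x03, 0x06,
                                      0x00, 0x03, 0x03, 0x00, 0x00, 0x08]))
print(shape["sizes"], shape["lo_bounds"])  # [6, 0, 3] [0, 0, 4]
```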

Example 3

Now let us see what the ArrayShape element looks like in practice and compare the result with the specification.

MSIL

// Full source: ArrayShape\3.il
// Binary: ArrayShape\3.dll
// (...)

.field public int32[0...2] TestField

As the table below shows, NumLoBounds is equal to Rank, even though the specification's table (its first row) says that NumLoBounds shall be zero for this declaration.

Offset	Value	Meaning
`0x01`	`0x08`	Signature size
`0x02`	`0x06`	`FieldSig`'s prolog
`0x03`	`0x14`	Field's type value is `ARRAY`, see constants in the first part
`0x04`	`0x08`	Array's type is `int32`, see constants in the first part
`0x05`	`0x01`	The number of the array's dimensions (`Rank` field in Picture 6)
`0x06`	`0x01`	The number of sizes for this array (`NumSizes` field in Picture 6)
`0x07`	`0x03`	The size of the first dimension of the array (`Size` field in Picture 6)
`0x08`	`0x01`	The number of lower bounds for this array (`NumLoBounds` field in Picture 6)
`0x09`	`0x00`	The lower bound of the first dimension of the array (`LoBound` field in Picture 6)

3. Conclusion

As you can see, signatures are a complicated monstrosity, but they make .NET executables small, compact and consistent. If you have any questions, hints or requests, do not hesitate to add a comment below; constructive comments are always welcome.

4. References

5. Revision History

1.0: 26th September 2009: Initial release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Przemyslaw Celej

Software Developer

Poland

Przemek was born in 1988 and lives in a small town near Warsaw in Poland, Europe. Currently he codes some C# stuff and J2EE as well, and occasionally he uses C++ for fun. Przemek is a cycling fan; if the weather permits, he rides a bike.
