
.NET File Format - Signatures under the Hood, Part 2 of 2

28 Sep 2009, CPOL, 34 min read
A full description of the signatures that are part of the .NET file format.

Contents

  1. Signatures (continuation)
    1. LocalVarSig
    2. CustomAttrib
    3. MethodSpec
    4. TypeSpec
    5. MarshalSpec
  2. Elements
    1. CustomMod
    2. TypeDefOrRefEncoded
    3. Param
    4. RetType
    5. Type
    6. ArrayShape
  3. Conclusion
  4. References
  5. Revision history

1. Signatures (continuation)

This chapter continues the description of signatures begun in the first part of the article.

1.1 LocalVarSig

The LocalVarSig signature is also indexed by the StandAloneSig.Signature column. It stores the types of all local variables allocated while a method runs. The LOCAL_SIG element is the signature's prolog and has the constant value 0x07. The Count element is an unsigned integer (compressed, of course!) that stores the number of local variables the associated method has. The BYREF element stands for the ELEMENT_TYPE_BYREF constant (see the constants in the first part) and indicates that the Type element points to the actual variable, i.e. the local holds a managed pointer rather than the value itself.

One more element is worth mentioning: the Constraint element (currently the only constraint is ELEMENT_TYPE_PINNED). It indicates that the target will not be moved by the garbage collector when it reclaims memory. Because local variables themselves are located on the stack (where the GC does not move anything), the Type of a pinned variable shall be either a reference type (like System.Object, whose instances are allocated on the heap) or a value type (like System.Decimal, which as a local is allocated on the stack). When the pinned type is a value type, its definition should include the BYREF element; in this case the reference to the variable is held on the stack, while the variable itself is allocated on the heap. You can see more on pinning here. In Picture 1 below, you can see the full syntax diagram for this signature.

I would like to draw your special attention to the TYPEDBYREF element in the diagram below. This is the typed reference: it contains not only a managed pointer to a location (like a normal byref) but also a runtime representation of the type that can be stored at that location. Here is its description, quoted from the specification:

"The typed reference local variable signature states that the local will contain both a managed pointer to a location and a runtime representation of the type that can be stored at that location. A typed reference signature is similar to a byref constraint, but while the byref specifies the type as part of the byref constraint (and hence statically as part of the type description), a typed reference provides the type information dynamically. A typed reference is a full signature in itself and cannot be combined with other constraints. In particular, it is not possible to specify a byref whose type is typed reference."

The typed reference is also very helpful for passing unboxed data (i.e. data stored on the stack, which is always of a value type) by reference to methods that are not statically restricted in the type they accept and that need, in addition to a managed pointer to a location, the static type of that location; the typed reference meets both needs. Notice also that a typed reference parameter can refer to a location on the stack, and that location's lifetime is limited to the execution of the method in which it was allocated, so appropriate checks are applied to the lifetime of byref and typed reference parameters; see §12.4.1.5.2 in Partition I of the ECMA-335 specification. In the .NET BCL (Base Class Library), the typed reference is represented by the TypedReference structure.
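To see both halves of a typed reference (the pointer and the type) in action, here is a small C# sketch built on the undocumented __makeref, __refvalue and __reftype keywords (the first of them is also used in Example 2 below). The class and method names are hypothetical. The sketch assumes a compiler that still accepts these keywords, as Roslyn-based C# compilers do, and it is meant only as an illustration, not as production code.

```csharp
using System;

public static class TypedReferenceDemo
{
    // Hypothetical helper: writes through a typed reference and reports the type it carries.
    public static (int Value, Type TargetType) Run()
    {
        int intVar = 0;

        // __makeref captures a managed pointer to intVar together with its static type (int).
        TypedReference typedByRef = __makeref(intVar);

        // __refvalue dereferences the pointer; this assignment changes intVar itself.
        __refvalue(typedByRef, int) = 42;

        // __reftype reads the runtime type information stored next to the pointer.
        Type targetType = __reftype(typedByRef);

        return (intVar, targetType);
    }
}
```

Calling TypedReferenceDemo.Run() returns 42 together with typeof(int): the write went through the pointer half, and the type half survived alongside it.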


Picture 1: The LocalVarSig signature syntax diagram

Example 1

This example declares a byref local of a value type (a managed pointer to an int32). The sample code is written in CIL and looks like this:

MSIL
// Full source: LocalVarSig\1.il
// Binary: LocalVarSig\1.dll
// (...)

.method public static void TestMethod()
{ 
    .locals init (int32& IntVarByRef)
    ret
}

The LocalVarSig signature for this sample code is broken down in the table below:

Offset Value Meaning
0x05 0x04 Signature size
0x06 0x07 Signature's prolog (LOCAL_SIG constant)
0x07 0x01 The total number of variables declared in this method is one
0x08 0x10 The local holds only a managed pointer to the actual variable, so the BYREF element (value 0x10) is present
0x09 0x08 The variable's type (int32), see constants in the first part

Example 2

The sample below illustrates how the signature changes when a typed reference is used. First we declare the IntVar variable; in the next line we obtain a typed reference to it using the __makeref keyword (which is undocumented and not CLS-compliant) and store it in the TypedByRefVar variable.

C#
// Full source: LocalVarSig\2.cs
// Binary: LocalVarSig\2.dll
// (...)

[CLSCompliant(false)]
public void TestMethod()
{
    int IntVar = 0;
    TypedReference TypedByRefVar = __makeref(IntVar);
}

The LocalVarSig for this sample looks as below:

Offset Value Meaning
0x1E 0x04 Signature size
0x1F 0x07 Signature's prolog (LOCAL_SIG constant)
0x20 0x02 The total number of variables declared in this method is two
0x21 0x08 The first variable's type (int32), see constants in the first part
0x22 0x16 The second variable's type (TYPEDBYREF), see constants in the first part

Example 3

Now let us move on to a slightly more difficult example. In this sample code, we create the TestDataClass class, which has only one member, StringVarToBePinned, of type string. In the TestMethod method (marked as unsafe), we instantiate the TestDataClass class; in the next line, we "pin" the StringVarToBePinned member using the fixed keyword and assign the address of its characters to the FixedVar pointer. This guarantees that between the { and } braces the string referenced by dataClass.StringVarToBePinned will not be moved by the garbage collector, so FixedVar is always valid inside the braces of the fixed statement. Please notice that we cannot pin a value-type variable declared directly in the method, because such a variable lives on the stack and is therefore already fixed; that is why the member to be pinned is wrapped in the TestDataClass class, whose instances are placed on the heap.

C#
// Full source: LocalVarSig\3.cs
// Binary: LocalVarSig\3.dll
// compile with "/unsafe" switch
// (...)

public class TestDataClass
{
    public string StringVarToBePinned;
}

public class TestClass
{
    public unsafe void TestMethod()
    {
        TestDataClass dataClass = new TestDataClass();
        fixed (char* FixedVar = dataClass.StringVarToBePinned) { }
    }
}

This sample is difficult for one more reason: it uses an element that has not been described yet, namely TypeDefOrRefEncoded. This element specifies in which metadata table (TypeDef, TypeRef or TypeSpec) and in which row the given type is described. We will not go into further details of this element here; if you want, you can jump directly to its description in subsection 2.2 TypeDefOrRefEncoded in the next chapter. The LocalVarSig for the above code is broken down in the table below:

Offset Value Meaning
0x20 0x08 Signature size
0x21 0x07 Signature's prolog (LOCAL_SIG constant)
0x22 0x03 The total number of variables declared in this method is three
0x23 0x12 The first variable's type (CLASS, followed by a TypeDefOrRefEncoded element), see constants in the first part
0x24 0x08 The first variable's type is described in the TypeDef metadata table at row 2, which is the TestDataClass class. This is the TypeDefOrRefEncoded element, explained in subsection 2.2.
0x25 0x0F The second variable's type (PTR, followed by a Type element), see constants in the first part
0x26 0x03 The type the pointer from the previous byte points to (char, so the second variable is a char*), see constants in the first part
0x27 0x45 The third variable is pinned (PINNED), see constants in the first part
0x28 0x0E The type of the third, pinned variable (string), see constants in the first part
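The three tables above can be cross-checked with a small decoder. The following C# sketch is hypothetical (LocalVarSigDecoder is not part of any library) and handles only the subset of the grammar used in these examples: TYPEDBYREF, the PINNED constraint, BYREF, PTR, CLASS/VALUETYPE followed by TypeDefOrRefEncoded, and a few primitive types. It expects the blob without the leading size byte shown in the tables, and it ignores custom modifiers.

```csharp
using System;
using System.Collections.Generic;

public static class LocalVarSigDecoder
{
    // Compressed unsigned integer: 1, 2 or 4 bytes, big-endian (see the first part).
    static uint ReadCompressedUInt(byte[] blob, ref int pos)
    {
        byte b0 = blob[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | blob[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (blob[pos + 1] << 16) | (blob[pos + 2] << 8) | blob[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    // Only the element types needed by the examples in this subsection.
    static string PrimitiveName(byte et) => et switch
    {
        0x02 => "bool",
        0x03 => "char",
        0x06 => "int16",
        0x08 => "int32",
        0x0A => "int64",
        0x0E => "string",
        0x1C => "object",
        _ => throw new NotSupportedException($"element type 0x{et:X2} is not handled by this sketch")
    };

    static string ReadType(byte[] blob, ref int pos)
    {
        byte et = blob[pos++];
        if (et == 0x0F)                                  // PTR, followed by the pointee Type
            return ReadType(blob, ref pos) + "*";
        if (et == 0x12 || et == 0x11)                    // CLASS or VALUETYPE + TypeDefOrRefEncoded
        {
            uint coded = ReadCompressedUInt(blob, ref pos);
            string[] tables = { "TypeDef", "TypeRef", "TypeSpec" };
            uint tag = coded & 0x3;                      // low two bits select the table
            if (tag > 2) throw new FormatException("invalid TypeDefOrRefEncoded tag");
            string kind = et == 0x12 ? "class" : "valuetype";
            return $"{kind} {tables[tag]}[{coded >> 2}]"; // remaining bits hold the row
        }
        return PrimitiveName(et);
    }

    // One local: TYPEDBYREF, or [PINNED] [BYREF] Type.
    static string ReadLocal(byte[] blob, ref int pos)
    {
        if (blob[pos] == 0x16) { pos++; return "typedref"; }   // TYPEDBYREF
        string prefix = "";
        if (blob[pos] == 0x45) { pos++; prefix = "pinned "; }  // PINNED constraint
        bool byRef = false;
        if (blob[pos] == 0x10) { pos++; byRef = true; }        // BYREF
        return prefix + ReadType(blob, ref pos) + (byRef ? "&" : "");
    }

    // blob: the signature bytes after the size byte.
    public static List<string> Decode(byte[] blob)
    {
        int pos = 0;
        if (blob[pos++] != 0x07) throw new FormatException("missing LOCAL_SIG prolog (0x07)");
        uint count = ReadCompressedUInt(blob, ref pos);
        var locals = new List<string>();
        for (uint i = 0; i < count; i++)
            locals.Add(ReadLocal(blob, ref pos));
        return locals;
    }
}
```

For the Example 3 bytes 0x07 0x03 0x12 0x08 0x0F 0x03 0x45 0x0E, Decode returns class TypeDef[2], char* and pinned string, matching the rows of the table.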

1.2 CustomAttrib

As you can guess, this signature stores instances of custom attributes, but it is a little different from the signatures discussed earlier. The key difference is that CustomAttrib, in contrast to, for example, the MethodRefSig signature, stores the values of the arguments supplied to a custom attribute rather than their types. In other words, the CustomAttrib signature stores only the values of the arguments (fixed and named) supplied when the custom attribute is instantiated; information about their types and number is not repeated in the signature. The signature is indexed by the CustomAttribute.Value column, while the CustomAttribute.Parent column indicates in which table (TypeDef for a type, MethodDef for a method, and so on) and at which row the attributed element (method, type, and so on) is described. There is also a second significant difference compared to other signatures: in the CustomAttrib signature, all binary values are stored uncompressed in little-endian byte order, except the PackedLen item (discussed below) and the signature size. And I repeat once again: do not confuse custom attributes with custom modifiers! The full syntax diagram consists of four parts; let us look at the first.


Picture 2a: The CustomAttrib signature syntax diagram

So far it is pretty simple. The signature starts with the Prolog, which has the constant value 0x0001 and occupies two bytes (an unsigned int16, uncompressed and little-endian). Next come the fixed arguments (FixedArg is illustrated in Picture 2b); their number and types can be obtained by examining the associated constructor's row in the MethodDef or MemberRef (when the attribute's class resides in another assembly) metadata table. Note that a vararg method cannot be used as an attribute's constructor. Next comes the number of named arguments (NumNamed, a two-byte unsigned int16, also uncompressed and little-endian), and finally the named arguments themselves, repeated NumNamed times.


Picture 2b: The CustomAttrib signature syntax diagram

This part is a little harder than the previous one, but still quite simple. The upper path in the diagram denotes a parameter that is not a single-dimensional, zero-based array (SZARRAY, see constants in the first part); the bottom path represents an SZARRAY parameter, i.e. a parameter that is an array. The number of elements in the SZARRAY array is stored in the NumElem element, an unsigned int32 (uncompressed and little-endian) that occupies four bytes; if the SZARRAY argument is null, NumElem is set to 0xFFFFFFFF. The CLI completely disallows arrays other than single-dimensional arrays with a lower bound of zero (SZARRAY) here: a single-dimensional, zero-based array of int32 is int32[], but not int32[,,] and not int32[3...8]. If you want to know more about arrays in .NET, read the Array Types in .NET article from MSDN Magazine.


Picture 2c: The CustomAttrib signature syntax diagram

This part is probably the strangest of all four: the format that Elem takes varies depending on the following conditions (quoted from the specification).

If the parameter kind is simple (first line in the above diagram) (bool, char, float32, float64, int8, int16, int32, int64, unsigned int8, unsigned int16, unsigned int32 or unsigned int64) then the 'blob' contains its binary value (Val). (A bool is a single byte with value 0 (false) or 1 (true); char is a two-byte Unicode character; and the others have their obvious meaning.) This pattern is also used if the parameter kind is an enum -- simply store the value of the enum's underlying integer type.

If the parameter kind is string, (middle line in above diagram) then the blob contains a SerString - a PackedLen count of bytes (compressed and big-endian - added by the author), followed by the UTF8 characters. If the string is null, its PackedLen has the value 0xFF (with no following characters). If the string is empty (""), then PackedLen has the value 0x00 (with no following characters).

If the parameter kind is System.Type (see typeof keyword - added by the author of the article), (also, the middle line in above diagram), its value is stored as a SerString (as defined in the previous paragraph), representing its canonical name. The canonical name is its full type name, followed optionally by the assembly where it is defined, its version, culture and public-key-token. If the assembly name is omitted, the CLI looks first in the current assembly, and then in the system library (mscorlib); in these two special cases, it is permitted to omit the assembly-name, version, culture and public-key-token.

If the parameter kind is System.Object, (third line in the above diagram) the value stored represents the "boxed" instance of that value-type. In this case, the blob contains the actual type's FieldOrPropType (see below), followed by the argument's unboxed value. [Note: It is not possible to pass a value of null in this case. end note]


Picture 2d: The CustomAttrib signature syntax diagram

The last part illustrates the format of the NamedArg element, which represents a named argument (either a field or a property). Because fields and properties can have the same name, the first element is either FIELD, a one-byte constant 0x53, when the named argument refers to a field, or PROPERTY, a one-byte constant 0x54, when it refers to a property. Next comes the FieldOrPropType element, which describes the type of the named field or property in one or two bytes. If the type of the named argument is an unboxed simple value type (defined above), FieldOrPropType contains exactly one byte holding the associated type's constant (BOOLEAN, CHAR, I1, U1, I2, U2, I4, U4, I8, U8, R4, R8, STRING; see the constants table in the first part); but if the type of the named argument is a boxed simple value type, FieldOrPropType is preceded by a byte containing the value 0x51, so in this case it is two bytes long. The FieldOrPropName element is a SerString (explained above) containing the name of the property or field. Finally comes a single FixedArg element, shown earlier. So, as you can see, the NamedArg element is a normal FixedArg preceded by some additional information that identifies which field or property it represents. I hope I did not scare you; as you will soon see, the signature is not as complicated as it looks.

Example 1

This example mainly shows the format of the SerString element and how CustomAttrib distinguishes between fields and properties that act as named arguments. In the example below, the TestAttribute attribute requires one fixed argument, Fixed1, of type int32; additionally, we may (and do) supply two named arguments, of types int16 and string, as shown in the code listing below:

C#
// Full source: CustomAttrib\1.cs
// Binary: CustomAttrib\1.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(int Fixed1) { }

    public short Named1 { get; set; }

    public string Named2;
}

[Test(1, Named1 = 1, Named2 = "Abcd")]
public class TestClass { }

The full CustomAttrib signature for this case is 33 bytes long, so in some places several bytes are merged into one row with a single description.

Offset Value Meaning
0x3E 0x21 Signature size, stored as a compressed integer in big-endian byte order
0x3F-0x40 0x01 0x00 Prolog, stored as an uncompressed, little-endian unsigned int16 of value 0x0001
0x41-0x44 0x01 0x00 0x00 0x00 The value of the first fixed argument of the attribute (Fixed1). The value is 0x00000001, stored as an uncompressed, little-endian int32. This is represented by the upper line in Picture 2b and the first path in Picture 2c.
0x45-0x46 0x02 0x00 The number of named arguments supplied to the attribute, represented by the NumNamed element in Picture 2a and stored as a little-endian unsigned int16. We supplied exactly two named arguments, so the value of this two-byte element is 0x0002.
0x47 0x54 This byte indicates that the target named argument is a property (see constants in the first part); this is the PROPERTY element in Picture 2d.
0x48 0x06 The type of the target property (int16, see constants in the first part). This byte is represented by the FieldOrPropType element in Picture 2d.
0x49-0x4F 0x06 0x4E 0x61 0x6D 0x65 0x64 0x31 The SerString that specifies the name of the target property (the FieldOrPropName element in Picture 2d). A SerString is a UTF-8 string preceded by its size in bytes, stored as a compressed integer in big-endian byte order. So we have a 6-byte string (offset 0x49); because the name contains no characters beyond the ASCII table, each character occupies exactly one byte, and we can easily read the text: Named1.
0x50-0x51 0x01 0x00 The value of the first named argument of the attribute (Named1). The value is 0x0001, stored as an uncompressed, little-endian int16. This is represented by the upper line in Picture 2b and the first path in Picture 2c.
0x52 0x53 This byte indicates that the target named argument is a field (see constants in the first part); this is the FIELD element in Picture 2d.
0x53 0x0E The type of the target field (string, see constants in the first part). This byte is represented by the FieldOrPropType element in Picture 2d.
0x54-0x5A 0x06 0x4E 0x61 0x6D 0x65 0x64 0x32 Again a SerString, this time specifying the name of the target field (the FieldOrPropName element in Picture 2d). The string is 6 bytes long (offset 0x54); the remaining bytes match the previous string except the last one, so the text is Named2 (see the ASCII table).
0x5B-0x5F 0x04 0x41 0x62 0x63 0x64 The value of the second named argument of the attribute (Named2): the text Abcd (see the ASCII table), stored as a SerString. This is represented by the upper line in Picture 2b and the middle path in Picture 2c. Because 0x5F - 0x3E = 0x21, i.e. last offset - first offset = signature size, the signature ends here.
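The same walk through the bytes can be written down as code. This hypothetical C# sketch is hard-wired to the TestAttribute of this example: it assumes the constructor takes a single int32 (a real reader would learn this from the MethodDef or MemberRef row) and that named arguments are only of type int16, int32 or string. The input is the blob without the size byte at offset 0x3E.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CustomAttribExample1
{
    // Compressed unsigned integer (used here for PackedLen): 1, 2 or 4 bytes, big-endian.
    static uint ReadCompressedUInt(byte[] b, ref int pos)
    {
        byte b0 = b[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | b[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (b[pos + 1] << 16) | (b[pos + 2] << 8) | b[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    static int ReadInt32(byte[] b, ref int pos)
    {
        int v = b[pos] | (b[pos + 1] << 8) | (b[pos + 2] << 16) | (b[pos + 3] << 24); // little-endian
        pos += 4;
        return v;
    }

    static int ReadUInt16(byte[] b, ref int pos)
    {
        int v = b[pos] | (b[pos + 1] << 8); // little-endian
        pos += 2;
        return v;
    }

    // SerString: 0xFF means null, otherwise PackedLen followed by that many UTF-8 bytes.
    static string ReadSerString(byte[] b, ref int pos)
    {
        if (b[pos] == 0xFF) { pos++; return null; }
        int len = (int)ReadCompressedUInt(b, ref pos);
        string s = Encoding.UTF8.GetString(b, pos, len);
        pos += len;
        return s;
    }

    // blob: the bytes after the size byte (offsets 0x3F..0x5F in the table above).
    public static (int Fixed1, List<string> Named) Parse(byte[] blob)
    {
        int pos = 0;
        if (ReadUInt16(blob, ref pos) != 0x0001) throw new FormatException("bad prolog");
        int fixed1 = ReadInt32(blob, ref pos);           // assumed constructor signature: (int32)
        int numNamed = ReadUInt16(blob, ref pos);
        var named = new List<string>();
        for (int i = 0; i < numNamed; i++)
        {
            byte kind = blob[pos++];                     // 0x53 FIELD, 0x54 PROPERTY
            byte type = blob[pos++];                     // FieldOrPropType
            string name = ReadSerString(blob, ref pos);  // FieldOrPropName
            string value;
            switch (type)
            {
                case 0x06: value = ((short)ReadUInt16(blob, ref pos)).ToString(); break; // int16
                case 0x08: value = ReadInt32(blob, ref pos).ToString(); break;           // int32
                case 0x0E: value = ReadSerString(blob, ref pos) ?? "null"; break;        // string
                default: throw new NotSupportedException($"FieldOrPropType 0x{type:X2}");
            }
            string kindName = kind == 0x53 ? "FIELD" : kind == 0x54 ? "PROPERTY" : throw new FormatException("expected FIELD or PROPERTY");
            named.Add($"{kindName} {name} = {value}");
        }
        return (fixed1, named);
    }
}
```

Fed the 33 bytes from offsets 0x3F to 0x5F, Parse reports Fixed1 = 1 and the named arguments PROPERTY Named1 = 1 and FIELD Named2 = Abcd.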

Example 2

In this example, we demonstrate the signature format when System.Type, SZARRAY and boxed value types are used as arguments of the TestAttribute attribute defined below:

C#
// Full source: CustomAttrib\2.cs
// Binary: CustomAttrib\2.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(object Param1, int[] Param2, Type Param3) { }
}

[Test(1, new int[] {1, 2, 3}, typeof(string))]
public class TestClass { }

As in the previous sample, the signature is long (116 bytes!), so I split it into smaller parts.

Offset Value Meaning
0x2B 0x74 Signature size, stored as a compressed integer in big-endian byte order
0x2C-0x2D 0x01 0x00 Prolog, stored as an uncompressed, little-endian unsigned int16 of value 0x0001
0x2E 0x08 The type of the first fixed argument (int32, boxed inside System.Object). This case is represented by the third path in Picture 2c, where a value is immediately preceded by its type.
0x2F-0x32 0x01 0x00 0x00 0x00 The value whose type was specified in the previous byte; because it is an int32, it occupies exactly 4 bytes. It is stored in little-endian byte order, so the value is 0x00000001.
0x33-0x36 0x03 0x00 0x00 0x00 The second argument begins here. Because it is a single-dimensional, zero-based array (SZARRAY), these four bytes hold NumElem, the number of elements supplied in the array, stored as a little-endian unsigned int32: three elements.
0x37-0x3A 0x01 0x00 0x00 0x00 The first element of the array passed as the second argument. It is four bytes long because the array's element type is int32; the value is 0x00000001.
0x3B-0x3E 0x02 0x00 0x00 0x00 The second element of the array (int32), value 0x00000002.
0x3F-0x42 0x03 0x00 0x00 0x00 The third element of the array (int32), value 0x00000003.
0x43 0x5A The PackedLen of the SerString that follows: 0x5A, i.e. 90 bytes.
0x44-0x9D 0x53 0x79 0x73 0x74 0x65 0x6D 0x2E 0x53 0x74 0x72 0x69 0x6E 0x67 0x2C 0x20 0x6D 0x73 0x63 0x6F 0x72 0x6C 0x69 0x62 0x2C 0x20 0x56 0x65 0x72 0x73 0x69 0x6F 0x6E 0x3D 0x32 0x2E 0x30 0x2E 0x30 0x2E 0x30 0x2C 0x20 0x43 0x75 0x6C 0x74 0x75 0x72 0x65 0x3D 0x6E 0x65 0x75 0x74 0x72 0x61 0x6C 0x2C 0x20 0x50 0x75 0x62 0x6C 0x69 0x63 0x4B 0x65 0x79 0x54 0x6F 0x6B 0x65 0x6E 0x3D 0x62 0x37 0x37 0x61 0x35 0x63 0x35 0x36 0x31 0x39 0x33 0x34 0x65 0x30 0x38 0x39 The 90 UTF-8 bytes of the SerString, describing the canonical name of the type supplied as the third argument: System.String, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089. This is represented by the middle path in Picture 2c.
0x9E-0x9F 0x00 0x00 These two bytes are not part of the previous SerString (it ends at 0x9D, since 0x9D - 0x43 = 0x5A), but they are part of the CustomAttrib signature (0x9F - 0x2B = 0x74). They are the NumNamed element from Picture 2a, a little-endian unsigned int16 that always follows the fixed arguments; its value 0x0000 means that no named arguments were supplied to the attribute.
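A reader for this blob differs from the previous one mainly in the boxed value, the SZARRAY count and the trailing NumNamed. The sketch below is again hypothetical and tied to this example: it assumes the constructor signature (object, int32[], System.Type) is already known, and it only handles a boxed int32 in the object argument.

```csharp
using System;
using System.Text;

public static class CustomAttribExample2
{
    // Compressed unsigned integer (used here for PackedLen): 1, 2 or 4 bytes, big-endian.
    static uint ReadCompressedUInt(byte[] b, ref int pos)
    {
        byte b0 = b[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | b[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (b[pos + 1] << 16) | (b[pos + 2] << 8) | b[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    static int ReadInt32(byte[] b, ref int pos)
    {
        int v = b[pos] | (b[pos + 1] << 8) | (b[pos + 2] << 16) | (b[pos + 3] << 24); // little-endian
        pos += 4;
        return v;
    }

    // SerString: 0xFF means null, otherwise PackedLen followed by that many UTF-8 bytes.
    static string ReadSerString(byte[] b, ref int pos)
    {
        if (b[pos] == 0xFF) { pos++; return null; }
        int len = (int)ReadCompressedUInt(b, ref pos);
        string s = Encoding.UTF8.GetString(b, pos, len);
        pos += len;
        return s;
    }

    // blob: the bytes after the size byte (offsets 0x2C..0x9F in the table above).
    public static (object Arg1, int[] Arg2, string Arg3, int NumNamed) Parse(byte[] blob)
    {
        if ((blob[0] | (blob[1] << 8)) != 0x0001) throw new FormatException("bad prolog");
        int pos = 2;

        // Param1 (System.Object): FieldOrPropType of the boxed value, then the value itself.
        byte boxedType = blob[pos++];
        if (boxedType != 0x08) throw new NotSupportedException("this sketch only handles a boxed int32");
        object arg1 = ReadInt32(blob, ref pos);

        // Param2 (int32[], SZARRAY): NumElem as an unsigned int32; 0xFFFFFFFF means null.
        uint numElem = (uint)ReadInt32(blob, ref pos);
        int[] arg2 = null;
        if (numElem != 0xFFFFFFFF)
        {
            arg2 = new int[numElem];
            for (int i = 0; i < arg2.Length; i++)
                arg2[i] = ReadInt32(blob, ref pos);
        }

        // Param3 (System.Type): a SerString holding the canonical type name.
        string arg3 = ReadSerString(blob, ref pos);

        // NumNamed always follows the fixed arguments.
        int numNamed = blob[pos] | (blob[pos + 1] << 8);
        pos += 2;
        if (pos != blob.Length) throw new FormatException("unexpected trailing bytes");
        return (arg1, arg2, arg3, numNamed);
    }
}
```

Run over the 116 bytes of the table, Parse yields the boxed value 1, the array {1, 2, 3}, the canonical name of System.String and NumNamed = 0, which accounts for every byte of the signature.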

1.3 MethodSpec

The MethodSpec signature is straightforward. It describes each instantiation of a generic method and is indexed by the MethodSpec.Instantiation column. Its syntax is as follows: it begins with the GENRICINST prolog (do you see the missing "E"?) of one-byte value 0x0A (this constant has a different value than ELEMENT_TYPE_GENERICINST, defined in the constants table in the first part), followed by GenArgCount, the number of generic arguments, and then a Type element repeated GenArgCount times.

MSIL
MethodSpecBlob ::=
   GENRICINST GenArgCount Type Type*

Example 1

In the sample below, we instantiate the TestMethod generic method, supplying three generic arguments.

C#
// Full source: MethodSpec\1.cs
// Binary: MethodSpec\1.dll
// (...)

public class TestClass
{
    public void TestMethod<GenArg1, GenArg2, GenArg3>() { }
}

public class TestRunClass
{
    public void TestRunMethod()
    {
        new TestClass().TestMethod<short, int, string>();
    }
}

The MethodSpec for this case looks as follows:

Offset Value Meaning
0x18 0x05 Signature size
0x19 0x0A Prolog
0x1A 0x03 The number of generic arguments supplied to the generic method
0x1B 0x06 The first generic argument's type (int16), see constants in the first part
0x1C 0x08 The second generic argument's type (int32), see constants in the first part
0x1D 0x0E The third generic argument's type (string), see constants in the first part
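Decoding this blob takes only a few lines. The C# sketch below is hypothetical (MethodSpecDecoder is an illustrative name) and assumes every generic argument is one of the primitive element types from the constants table in the first part; classes, arrays or nested generic instantiations would need the full Type grammar.

```csharp
using System;
using System.Collections.Generic;

public static class MethodSpecDecoder
{
    // Compressed unsigned integer: 1, 2 or 4 bytes, big-endian (see the first part).
    static uint ReadCompressedUInt(byte[] blob, ref int pos)
    {
        byte b0 = blob[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | blob[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (blob[pos + 1] << 16) | (blob[pos + 2] << 8) | blob[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    // Primitive ELEMENT_TYPE_* constants from the first part.
    static string ElementName(byte et) => et switch
    {
        0x02 => "bool", 0x03 => "char", 0x04 => "int8", 0x05 => "uint8",
        0x06 => "int16", 0x07 => "uint16", 0x08 => "int32", 0x09 => "uint32",
        0x0A => "int64", 0x0B => "uint64", 0x0C => "float32", 0x0D => "float64",
        0x0E => "string", 0x1C => "object",
        _ => throw new NotSupportedException($"element type 0x{et:X2} needs the full Type grammar")
    };

    // blob: the bytes after the size byte, e.g. 0x0A 0x03 0x06 0x08 0x0E.
    public static List<string> Decode(byte[] blob)
    {
        int pos = 0;
        if (blob[pos++] != 0x0A) throw new FormatException("missing GENRICINST prolog (0x0A)");
        uint genArgCount = ReadCompressedUInt(blob, ref pos);
        var args = new List<string>();
        for (uint i = 0; i < genArgCount; i++)
            args.Add(ElementName(blob[pos++]));
        return args;
    }
}
```

For the bytes in the table, Decode returns int16, int32 and string, i.e. TestMethod<short, int, string>.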

1.4 TypeSpec

The TypeSpec signature is indexed by the TypeSpec.Signature column. It is used when instantiating a type as a multi-dimensional array, as a single-dimensional array preceded by custom modifier(s), when instantiating a generic type, and in the other cases shown in the syntax below. Because some elements (such as custom modifiers and array shapes) are not explained yet, we use only a limited part of the TypeSpec signature here; in the next chapter we will focus on the CustomMod, ArrayShape and TypeDefOrRefEncoded elements, and then come back to the TypeSpec signature to use the rest of its capabilities. Also notice that, in contrast to the MethodSpec signature, which uses the GENRICINST (missing "E") constant as its prolog, TypeSpec uses the ELEMENT_TYPE_GENERICINST constant, defined in the general constants table (in the first part of the article).

TypeSpecBlob ::=
  PTR      CustomMod*  VOID
| PTR      CustomMod*  Type
| FNPTR    MethodDefSig
| FNPTR    MethodRefSig
| ARRAY    Type  ArrayShape
| SZARRAY  CustomMod*  Type
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type Type*

Example 1

In this example, we instantiate the TestClass generic type, as shown in the code listing below:

C#
// Full source: TypeSpec\1.cs
// Binary: TypeSpec\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestClass<int, string> TestVar = new TestClass<int, string>();
    }
}

The TypeSpec for this case looks as follows:

Offset Value Meaning
0x13 0x06 Signature size
0x14 0x15 The ELEMENT_TYPE_GENERICINST constant, see constants table in the first part
0x15 0x12 The generic type is a class (CLASS), see constants table in the first part
0x16 0x08 The instantiated generic type is described in the TypeDef metadata table at row 2. This is the TypeDefOrRefEncoded element, explained in subsection 2.2.
0x17 0x02 The number of generic arguments supplied to the type is two.
0x18 0x08 The first generic argument's type (int32), see constants in the first part
0x19 0x0E The second generic argument's type (string), see constants in the first part
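Here is a similar hypothetical sketch for the GENERICINST form of TypeSpecBlob. It splits the TypeDefOrRefEncoded value into its two-bit table tag and row number (the encoding covered in subsection 2.2, consistent with reading 0x08 as TypeDef row 2 above) and, like the MethodSpec sketch, understands only primitive generic arguments.

```csharp
using System;
using System.Collections.Generic;

public static class TypeSpecDecoder
{
    // Compressed unsigned integer: 1, 2 or 4 bytes, big-endian (see the first part).
    static uint ReadCompressedUInt(byte[] blob, ref int pos)
    {
        byte b0 = blob[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | blob[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (blob[pos + 1] << 16) | (blob[pos + 2] << 8) | blob[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    // A few primitive ELEMENT_TYPE_* constants from the first part.
    static string ElementName(byte et) => et switch
    {
        0x02 => "bool", 0x03 => "char", 0x06 => "int16", 0x08 => "int32",
        0x0A => "int64", 0x0E => "string", 0x1C => "object",
        _ => throw new NotSupportedException($"element type 0x{et:X2} is not handled by this sketch")
    };

    // Only: GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type Type*
    public static string DecodeGenericInst(byte[] blob)
    {
        int pos = 0;
        if (blob[pos++] != 0x15) throw new FormatException("expected ELEMENT_TYPE_GENERICINST (0x15)");
        byte kind = blob[pos++];
        string kindName = kind == 0x12 ? "class" : kind == 0x11 ? "valuetype" : throw new FormatException("expected CLASS or VALUETYPE");

        uint coded = ReadCompressedUInt(blob, ref pos);   // TypeDefOrRefEncoded
        string[] tables = { "TypeDef", "TypeRef", "TypeSpec" };
        uint tag = coded & 0x3;                            // low two bits select the table
        if (tag > 2) throw new FormatException("invalid TypeDefOrRefEncoded tag");

        uint genArgCount = ReadCompressedUInt(blob, ref pos);
        var args = new List<string>();
        for (uint i = 0; i < genArgCount; i++)
            args.Add(ElementName(blob[pos++]));
        return $"{kindName} {tables[tag]}[{coded >> 2}]<{string.Join(", ", args)}>";
    }
}
```

Applied to the bytes 0x15 0x12 0x08 0x02 0x08 0x0E, it produces class TypeDef[2]<int32, string>, which corresponds to TestClass<int, string>.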

1.5 MarshalSpec

The MarshalSpec signature is generated when the MarshalAs attribute is used on fields, parameters and return values. It specifies how data should be marshalled when calling from or to unmanaged code via Platform Invoke. The signature is indexed by the FieldMarshal.NativeType column. The name of the metadata table is slightly misleading: it does not matter whether the MarshalSpec describes a field, a parameter or a return value, it is always indexed by that column. The ParamNum and NumElem elements in the syntax listing below describe, respectively, the parameter in the method call that provides the number of elements in the array, and the number of elements (or of additional elements). Both are stored in the signature as compressed integers, and their purpose is to help compute the total size in bytes that the array occupies in memory. The Microsoft implementation of the marshalling descriptor is richer than the one described here and makes use of additional constants and extended syntax; if you want to know more about the MarshalSpec, go to section §23.4 of the Partition II metadata specification.

MarshalSpec ::=
  NativeIntrinsic
| ARRAY ArrayElemType
| ARRAY ArrayElemType ParamNum
| ARRAY ArrayElemType ParamNum NumElem

ArrayElemType ::=
   NativeIntrinsic 

NativeIntrinsic ::=
  BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8
| LPSTR | LPWSTR | INT | UINT | FUNC

To compute the size in bytes of an array, the following pseudo-code is used, where the @ParamNum stands for the value passed in for parameter number ParamNum.

if ParamNum = 0
   SizeInBytes = NumElem * sizeof (elem)
else
   SizeInBytes = ( @ParamNum +  NumElem ) * sizeof (elem)
endif
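The pseudo-code translates directly into C#. In this sketch (the names are hypothetical), paramValue stands for @ParamNum, the value passed for that parameter at call time, and elemSize for sizeof(elem); ParamNum = 0 is taken to mean "no size parameter", exactly as in the pseudo-code above.

```csharp
public static class MarshalSize
{
    // Literal translation of the pseudo-code above.
    // paramNum:   the ParamNum item from the signature (0 = no size parameter)
    // paramValue: @ParamNum, the value passed in for that parameter
    // numElem:    the NumElem item from the signature
    // elemSize:   sizeof(elem)
    public static long SizeInBytes(uint paramNum, long paramValue, uint numElem, int elemSize)
    {
        if (paramNum == 0)
            return (long)numElem * elemSize;
        return (paramValue + numElem) * elemSize;
    }
}
```

For example, with four-byte int32 elements, NumElem = 10 and a value of 5 passed in the size parameter, SizeInBytes(2, 5, 10, 4) gives (5 + 10) * 4 = 60 bytes, while SizeInBytes(0, 0, 10, 4) gives 40.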

The constants for this signature are listed in the table below; the syntax descriptions and examples in this subsection use abbreviations instead of the full constant names.

Name Value
NATIVE_TYPE_BOOLEAN 0x02
NATIVE_TYPE_I1 0x03
NATIVE_TYPE_U1 0x04
NATIVE_TYPE_I2 0x05
NATIVE_TYPE_U2 0x06
NATIVE_TYPE_I4 0x07
NATIVE_TYPE_U4 0x08
NATIVE_TYPE_I8 0x09
NATIVE_TYPE_U8 0x0A
NATIVE_TYPE_R4 0x0B
NATIVE_TYPE_R8 0x0C
NATIVE_TYPE_LPSTR 0x14
NATIVE_TYPE_LPWSTR 0x15
NATIVE_TYPE_INT 0x1F
NATIVE_TYPE_UINT 0x20
NATIVE_TYPE_FUNC 0x26
NATIVE_TYPE_ARRAY 0x2A
NATIVE_TYPE_MAX 0x50

Example 1

Let us start with the simplest possible example shown in the below code listing:

C#
// Full source: MarshalSpec\1.cs
// Binary: MarshalSpec\1.dll
// (...)

[MarshalAs(UnmanagedType.LPWStr)]
public string TestField;

This code has generated the following MarshalSpec signature:

Offset Value Meaning
0x1C 0x01 Signature size
0x1D 0x15 The TestField field is marshalled as LPWSTR (NATIVE_TYPE_LPWSTR) in unmanaged code.

Example 2

Now it is time for a more sophisticated example: we will marshal an int32 array as LPArray (a pointer to the first element of a C-style array). Because such an array does not carry information about the rank and bounds of its data, we have to specify which parameter of the method provides the number of elements in the array; this is done by setting the optional SizeParamIndex parameter. In addition, we set the optional SizeConst parameter, which specifies that the Param1 array contains 10 more elements in addition to the number given by the ArraySize argument. Note that SizeParamIndex is zero-based, so ArraySize is actually parameter 1; the value 2 used below therefore does not point at ArraySize, but the signature in the table is the one produced by the code as written. Please also notice that there is the SafeArray array type, a self-describing array that carries the type, rank and bounds of the associated data and does not require setting any optional parameters in MarshalAsAttribute, but it is Microsoft-specific and thus is not described here.

C#
// Full source: MarshalSpec\2.cs
// Binary: MarshalSpec\2.dll
// (...)

public void TestMethod(
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2, SizeConst = 10)] int[] Param1,
    int ArraySize)
{
    // nop
}

The above code generates the following MarshalSpec signature:

Offset Value Meaning
0x1B 0x05 Signature size
0x1C 0x2A Type of marshalling parameter (ARRAY), see constants table for marshalling descriptor
0x1D 0x50 The MAX constant (see the constants table for the marshalling descriptor) indicates that the element type of the array is not specified.
0x1E 0x02 The ParamNum parameter stored as compressed integer
0x1F 0x0A The NumElem parameter stored as compressed integer
0x20 0x01 The ElemMult parameter, stored as a compressed integer. This is a strange parameter: the whole specification mentions it only twice, saying that if the marshalled type is ARRAY, ElemMult must be set to 0x01, but it specifies neither its meaning nor its location in the MarshalSpec signature (see section §22.17 in the Partition II metadata specification).
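Both MarshalSpec blobs from this subsection can be decoded with the following hypothetical sketch. It covers only the grammar shown above (a NativeIntrinsic, or ARRAY followed by the element type and the optional ParamNum and NumElem items), plus the trailing ElemMult byte observed in Example 2; it maps byte values to names using the constants table and does not handle the Microsoft-specific extensions.

```csharp
using System;
using System.Collections.Generic;

public static class MarshalSpecDecoder
{
    // Compressed unsigned integer: 1, 2 or 4 bytes, big-endian (see the first part).
    static uint ReadCompressedUInt(byte[] blob, ref int pos)
    {
        byte b0 = blob[pos];
        if ((b0 & 0x80) == 0) { pos += 1; return b0; }
        if ((b0 & 0xC0) == 0x80)
        {
            uint v2 = (uint)(((b0 & 0x3F) << 8) | blob[pos + 1]);
            pos += 2;
            return v2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            uint v4 = (uint)(((b0 & 0x1F) << 24) | (blob[pos + 1] << 16) | (blob[pos + 2] << 8) | blob[pos + 3]);
            pos += 4;
            return v4;
        }
        throw new FormatException("invalid compressed integer");
    }

    // NATIVE_TYPE_* constants from the table above, without the prefix.
    static readonly Dictionary<byte, string> Names = new Dictionary<byte, string>
    {
        { 0x02, "BOOLEAN" }, { 0x03, "I1" }, { 0x04, "U1" }, { 0x05, "I2" }, { 0x06, "U2" },
        { 0x07, "I4" }, { 0x08, "U4" }, { 0x09, "I8" }, { 0x0A, "U8" }, { 0x0B, "R4" },
        { 0x0C, "R8" }, { 0x14, "LPSTR" }, { 0x15, "LPWSTR" }, { 0x1F, "INT" }, { 0x20, "UINT" },
        { 0x26, "FUNC" }, { 0x2A, "ARRAY" }, { 0x50, "MAX" }
    };

    static string Name(byte b) => Names.TryGetValue(b, out var n) ? n : $"0x{b:X2}";

    // blob: the bytes after the size byte.
    public static string Decode(byte[] blob)
    {
        int pos = 0;
        byte nativeType = blob[pos++];
        if (nativeType != 0x2A)                            // NativeIntrinsic
            return Name(nativeType);

        var parts = new List<string> { "ARRAY", Name(blob[pos++]) }; // ARRAY ArrayElemType
        string[] optional = { "ParamNum", "NumElem", "ElemMult" };
        foreach (string label in optional)
        {
            if (pos >= blob.Length) break;                 // the trailing items are optional
            uint value = ReadCompressedUInt(blob, ref pos);
            parts.Add($"{label}={value}");
        }
        return string.Join(" ", parts);
    }
}
```

With these assumptions, Example 1 (0x15) decodes to LPWSTR and Example 2 (0x2A 0x50 0x02 0x0A 0x01) to ARRAY MAX ParamNum=2 NumElem=10 ElemMult=1.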

2. Elements

We have now discussed all the signatures, but that is not the end: signatures consist of smaller parts, which I call "elements". They are described separately because each of them forms part of more than one signature, so there is no need to repeat their explanation for every signature. In this chapter, we will take a closer look at them.

2.1 CustomMod

This element has appeared frequently in the signatures discussed so far, which is why we start with it. Custom modifiers are similar to custom attributes, but, in contrast to them, custom modifiers are part of a signature. Custom modifiers are defined in CIL using the modreq (required modifier) and modopt (optional modifier) keywords in a declaration; both take a type (class or structure) as their "argument". Two signatures that differ only by the addition of a custom modifier (required or optional) shall not be considered to match, and, as the specification says:

The distinction between required and optional modifiers is important to tools other than the CLI that deal with the metadata, typically compilers and program analysers. A required modifier indicates that there is a special semantics to the modified item that should not be ignored, while an optional modifier can simply be ignored. For example, the const qualifier in the C programming language can be modelled with an optional modifier since the caller of a method that has a const-qualified parameter need not treat it in any special way. On the other hand, a parameter that shall be copy-constructed in C++ shall be marked with a required custom attribute since it is the caller who makes the copy.

Unfortunately, C# has some problems handling parameters that have custom modifiers attached; you can read about them in the articles "Modopt, method signatures, and incomplete specs oh my!" and "More on modopt" on CodeBetter.com.

CMOD_OPT and CMOD_REQD are just constants defined in the constants table in the first part. The TypeDefEncoded and TypeRefEncoded elements are in fact a single TypeDefOrRefEncoded element, discussed thoroughly in the next subsection. Note that zero or more CustomMods can be attached to a field, property, parameter or return parameter. As far as I know, there is no way to attach an arbitrary custom modifier using C#, apart from System.Reflection.Emit. In the System.Runtime.CompilerServices namespace you can find several marker types (I call them "indicators") that can be used as custom modifiers, for instance CallConvCdecl, IsConst and IsLong.


Picture 3: The CustomMod element syntax diagram
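To make the CustomMod layout concrete, here is a minimal C# sketch that reads the CustomMod* prefix of a signature item. It is not taken from the specification or from any library: the CustomModReader name is hypothetical, and it assumes that every TypeDefOrRefEncoded value fits in a single byte, which holds only for table rows below 32.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of any .NET API: reads the CustomMod* prefix
// that may precede a type inside a signature blob.
public static class CustomModReader
{
    public const byte CMOD_REQD = 0x1F; // modreq
    public const byte CMOD_OPT = 0x20;  // modopt

    // Reads zero or more CustomMod elements starting at 'pos' and advances 'pos'
    // past them. Assumption: each TypeDefOrRefEncoded value is a single byte.
    public static List<(bool Required, byte Token)> Read(byte[] blob, ref int pos)
    {
        var mods = new List<(bool Required, byte Token)>();
        while (pos + 1 < blob.Length &&
               (blob[pos] == CMOD_REQD || blob[pos] == CMOD_OPT))
        {
            bool required = blob[pos] == CMOD_REQD;
            byte token = blob[pos + 1]; // TypeDefOrRefEncoded, single-byte form
            mods.Add((required, token));
            pos += 2;
        }
        return mods;
    }
}
```

For the FieldSig bytes 0x06 0x1F 0x05 0x0A from Example 1, calling Read with pos set to 1 returns one required modifier with token 0x05 and leaves pos on the int64 type byte.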

Example 1

In the example below, we have annotated the TestField field with the modreq modifier, so the CustomMod lies within the FieldSig signature, whose syntax diagram is shown in Picture 2 at the very beginning of the article. The IsLong indicator distinguishes a long from an int in C++, but in our case there is no special semantics behind this custom modifier; we just want to demonstrate the format of the CustomMod element in the signature. The value of the TypeDefOrRefEncoded element is shown twice, in two numeral systems: hexadecimal (subscript 16) and binary (subscript 2). In the next subsection, you will see why.

MSIL
// Full source: CustomMod\1.il
// Binary: CustomMod\1.dll
// (...)

.field public int64 modreq([mscorlib]System.Runtime.CompilerServices.IsLong) TestField

The table below presents the whole FieldSig signature indexed by the Field.Signature column, along with the embedded custom modifier generated by the modreq keyword.

Offset Value Meaning
0x01 0x04 Signature size
0x02 0x06 FieldSig's prolog
0x03 0x1F Encountered custom, required modifier (modreq), see constants in the first part
0x04 0x05₁₆ = 00000101₂ The TypeDefOrRefEncoded element; in this case it points to the first row of the TypeRef table, that is the IsLong class. This element is described in the next subsection.
0x05 0x0A The type of the field (int64), see constants in the first part

2.2 TypeDefOrRefEncoded

Now we will try to demystify what is, at this moment, the most mysterious element, namely TypeDefOrRefEncoded. Fortunately, it is not as complicated as it may seem. This element determines in which metadata table, and at which row of that table, the referenced type's information resides. The two least significant bits encode the metadata table: 0 for TypeDef (the referenced type resides in the current assembly), 1 for TypeRef (the referenced type resides in a separate assembly) and 2 for TypeSpec (the referenced type is a generic type instance, an array, etc.; see section 1.4, TypeSpec). The remaining bits encode the row index. Note that indexes are one-based; in other words, the first row in every metadata table is always 1, not 0. The resulting value, (row << 2) | table, is then stored as a compressed unsigned integer, so it occupies a single byte only for rows below 32.
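The rule above can be sketched in C#. This is an illustration under my own naming: TypeTable, TypeDefOrRefEncoded and the compression helpers are hypothetical, not a framework API. Encode combines the row index and the table tag as (row << 2) | tag and then applies the compressed unsigned integer encoding; Decode reverses both steps.

```csharp
using System;

// Hypothetical names; the tag values follow the description above.
public enum TypeTable { TypeDef = 0, TypeRef = 1, TypeSpec = 2 }

public static class TypeDefOrRefEncoded
{
    // (row << 2) | tag, then compressed as an unsigned integer.
    public static byte[] Encode(TypeTable table, int row)
    {
        uint value = ((uint)row << 2) | (uint)table;
        return CompressUnsigned(value);
    }

    // Decompresses the value at 'pos', advances 'pos', and splits tag and row.
    public static (TypeTable Table, int Row) Decode(byte[] blob, ref int pos)
    {
        uint value = DecompressUnsigned(blob, ref pos);
        return ((TypeTable)(value & 0x3), (int)(value >> 2));
    }

    // Compressed unsigned integer: 1, 2 or 4 bytes, big-endian.
    public static byte[] CompressUnsigned(uint v)
    {
        if (v <= 0x7F) return new[] { (byte)v };
        if (v <= 0x3FFF) return new[] { (byte)(0x80 | (v >> 8)), (byte)(v & 0xFF) };
        if (v <= 0x1FFFFFFF)
            return new[] { (byte)(0xC0 | (v >> 24)), (byte)((v >> 16) & 0xFF),
                           (byte)((v >> 8) & 0xFF), (byte)(v & 0xFF) };
        throw new ArgumentOutOfRangeException(nameof(v));
    }

    public static uint DecompressUnsigned(byte[] blob, ref int pos)
    {
        byte b = blob[pos];
        if ((b & 0x80) == 0) { pos += 1; return b; }
        if ((b & 0xC0) == 0x80)
        {
            uint two = ((uint)(b & 0x3F) << 8) | blob[pos + 1];
            pos += 2;
            return two;
        }
        uint four = ((uint)(b & 0x1F) << 24) | ((uint)blob[pos + 1] << 16)
                  | ((uint)blob[pos + 2] << 8) | blob[pos + 3];
        pos += 4;
        return four;
    }
}
```

With this sketch, the single byte 0x05 decodes to TypeRef row 1 and 0x08 to TypeDef row 2, while row 100 of the TypeRef table needs two bytes, 0x81 0x91.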

Example 1

In this example, we have declared a single field with a custom, required modifier attached to it; modreq takes as its argument the TestClass type, declared in the same assembly, as shown below:

MSIL
// Full source: TypeDefOrRefEncoded\1.il
// Binary: TypeDefOrRefEncoded\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.field public int64 modreq(TestClass) TestField

The FieldSig for the above sample code is as follows:

Offset Value Meaning
0x01 0x04 Signature size
0x02 0x06 FieldSig's prolog
0x03 0x1F Encountered custom, required modifier (modreq), see constants in the first part
0x04 0x08₁₆ = 00001000₂ The TypeDefOrRefEncoded element; this time it points to the second row of the TypeDef table. The two least significant bits stand for the table (00₂, TypeDef), and the remaining bits, 3 to 8, denote the row number in the table (000010₂ = 2), that is TestClass; row 1 of the TypeDef table is always the <Module> pseudo-class. Now compare this with the TypeDefOrRefEncoded element from the previous subsection.
0x05 0x0A The type of the field (int64), see constants in the first part

2.3 Param

This element describes a single parameter supplied to a method or a property, and therefore is part of PropertySig, MethodDefSig, MethodRefSig, etc. This is the syntax diagram for the Param element:


Picture 4: The Param element syntax diagram

Example 1

In the TestMethod method shown below, two custom modifiers are attached to a single parameter. The aim of this example is to demonstrate the format of the Param element and to show once again how the TypeDefOrRefEncoded element works.

MSIL
// Full source: Param\1.il
// Binary: Param\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.method public static void TestMethod(int32 modopt(TestClass) 
        modreq([mscorlib]System.Runtime.CompilerServices.IsLong) Param1) 
{
    ret
}

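The associated MethodDefSig signature for this method is shown below. Note that the custom modifiers are stored in the reverse of the order in which they are written in the IL source: modreq(IsLong) comes first in the blob.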

Offset Value Meaning
0x01 0x08 Signature size
0x02 0x00 Method is static
0x03 0x01 The number of parameters
0x04 0x01 The type of the returned value (void), see constants in the first part
0x05 0x1F Encountered custom, required modifier (modreq), see constants in the first part
0x06 0x09₁₆ = 00001001₂ Referenced row is 2 in the TypeRef metadata table (tag 01₂), that is the IsLong type
0x07 0x20 Encountered custom, optional modifier (modopt), see constants in the first part
0x08 0x08₁₆ = 00001000₂ Referenced row is 2 in the TypeDef metadata table (tag 00₂), that is the TestClass type
0x09 0x08 First parameter's type (int32), see constants in the first part
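The following C# sketch walks the MethodDefSig bytes above: the calling convention byte, the parameter count, the RetType and then each Param with its CustomMod prefix. MethodDefSigReader is a hypothetical name, and the sketch deliberately handles only single-byte compressed values and a handful of primitive element types (values from the constants table in the first part).

```csharp
using System.Collections.Generic;

// Hypothetical, deliberately limited MethodDefSig reader for illustration only.
// Assumptions: all compressed integers and TypeDefOrRefEncoded tokens are single
// bytes, and only a few primitive element types are given names.
public static class MethodDefSigReader
{
    const byte HASTHIS = 0x20, GENERIC = 0x10;                  // calling convention flags
    const byte CMOD_REQD = 0x1F, CMOD_OPT = 0x20, BYREF = 0x10; // element prefixes

    static readonly Dictionary<byte, string> Names = new Dictionary<byte, string>
    {
        [0x01] = "void", [0x02] = "bool", [0x08] = "int32",
        [0x0A] = "int64", [0x0E] = "string", [0x1C] = "object",
    };

    // Returns the calling convention, the RetType and each Param as text.
    public static List<string> Read(byte[] sig)
    {
        int pos = 0;
        var parts = new List<string>();
        byte callConv = sig[pos++];
        parts.Add((callConv & HASTHIS) != 0 ? "instance" : "static");
        if ((callConv & GENERIC) != 0) pos++;  // skip GenParamCount
        int paramCount = sig[pos++];            // ParamCount
        for (int i = 0; i <= paramCount; i++)   // i == 0 is RetType, then each Param
            parts.Add(ReadItem(sig, ref pos));
        return parts;
    }

    // CustomMod* [BYREF] Type, where Type is a single-byte element type.
    static string ReadItem(byte[] sig, ref int pos)
    {
        string mods = "";
        while (sig[pos] == CMOD_REQD || sig[pos] == CMOD_OPT)
        {
            mods += (sig[pos] == CMOD_REQD ? " modreq" : " modopt") + $"(0x{sig[pos + 1]:X2})";
            pos += 2;
        }
        bool byRef = sig[pos] == BYREF;
        if (byRef) pos++;
        byte et = sig[pos++];
        string type = Names.TryGetValue(et, out var name) ? name : $"0x{et:X2}";
        return type + (byRef ? "&" : "") + mods;
    }
}
```

For the blob 0x00 0x01 0x01 0x1F 0x09 0x20 0x08 0x08, Read returns "static", "void" and "int32 modreq(0x09) modopt(0x08)", listing the modifiers in blob order.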

2.4 RetType

This element is almost identical to the Param element; the only difference is an extra path that allows the VOID type. Because the syntax diagram below is self-explanatory, no examples are provided for this subsection.


Picture 5: The RetType element syntax diagram

2.5 Type

It is not surprising that the Type element describes... a type, and not only primitive types (such as int32, bool, string, etc.) but also arrays, generic type instances and complex types (classes and structures). The listing below presents the syntax of this element; words written in upper case are constants whose values can be found in the constants table in the first part. You may be surprised that the GENERICINST constant is part of this element, but remember that the TypeSpec, MethodSpec and MethodDefSig signatures have different aims!

Type ::=
BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | I | U
| ARRAY Type ArrayShape
| CLASS TypeDefOrRefEncoded
| FNPTR MethodDefSig
| FNPTR MethodRefSig
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type *
| MVAR number
| OBJECT
| PTR CustomMod* Type
| PTR CustomMod* VOID
| STRING
| SZARRAY CustomMod* Type
| VALUETYPE TypeDefOrRefEncoded
| VAR number
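To see how the grammar nests, here is a C# sketch of a recursive decoder for a subset of the Type element. TypeSigDecoder is a hypothetical name; element type values come from the constants table in the first part. The sketch assumes single-byte compressed values, ignores CustomMods after SZARRAY and PTR, and does not handle ARRAY or FNPTR. Tokens are printed as a table name and row, for example TypeDef[2].

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: decodes a subset of the Type grammar into ILAsm-like text.
public static class TypeSigDecoder
{
    static readonly Dictionary<byte, string> Primitives = new Dictionary<byte, string>
    {
        [0x01] = "void", [0x02] = "bool", [0x03] = "char", [0x04] = "int8",
        [0x05] = "uint8", [0x06] = "int16", [0x07] = "uint16", [0x08] = "int32",
        [0x09] = "uint32", [0x0A] = "int64", [0x0B] = "uint64", [0x0C] = "float32",
        [0x0D] = "float64", [0x0E] = "string", [0x18] = "native int",
        [0x19] = "native uint", [0x1C] = "object",
    };

    public static string Decode(byte[] sig, ref int pos)
    {
        byte et = sig[pos++];
        if (Primitives.TryGetValue(et, out var name)) return name;
        switch (et)
        {
            case 0x11: return "valuetype " + Token(sig, ref pos); // VALUETYPE
            case 0x12: return "class " + Token(sig, ref pos);     // CLASS
            case 0x13: return "!" + sig[pos++];                    // VAR number
            case 0x1E: return "!!" + sig[pos++];                   // MVAR number
            case 0x1D: return Decode(sig, ref pos) + "[]";         // SZARRAY Type
            case 0x0F: return Decode(sig, ref pos) + "*";          // PTR Type / PTR VOID
            case 0x15:                                             // GENERICINST
            {
                string kind = sig[pos++] == 0x11 ? "valuetype " : "class ";
                string generic = Token(sig, ref pos);
                int count = sig[pos++];                            // GenArgCount
                var args = new List<string>();
                for (int i = 0; i < count; i++) args.Add(Decode(sig, ref pos));
                return kind + generic + "<" + string.Join(", ", args) + ">";
            }
            default:
                throw new NotSupportedException($"Element type 0x{et:X2} is not handled in this sketch");
        }
    }

    // TypeDefOrRefEncoded, single-byte form only: low 2 bits = table, rest = row.
    static string Token(byte[] sig, ref int pos)
    {
        byte b = sig[pos++];
        string[] tables = { "TypeDef", "TypeRef", "TypeSpec" };
        return $"{tables[b & 0x3]}[{b >> 2}]";
    }
}
```

For instance, the bytes 0x15 0x12 0x08 0x02 0x08 0x0E decode to class TypeDef[2]<int32, string>, assuming the generic class is stored in row 2 of the TypeDef table.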

Example 1

Let us see what happens to the MethodDefSig signature when method accepts generic types as normal parameters.

C#
// Full source: Type\1.cs
// Binary: Type\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestMethod(new TestClass<int, string>());
    }

    public void TestMethod(TestClass<int, string> Param1) { }
}

Dissecting the MethodDefSig signature for the TestMethod method.

Offset Value Meaning
0x0E 0x09 Signature size
0x0F 0x20 The method is instance method
0x10 0x01 The number of normal parameters
0x11 0x01 The type of the returned value (void), see constants in the first part
0x12 0x15 The first parameter's type is generic type (GENERICINST), see constants in the first part
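0x13 0x12 The first parameter's type is generic class (CLASS), see constants in the first part

The remaining four bytes (offsets 0x14 to 0x17, as implied by the signature size of 0x09) follow the GENERICINST path of the Type element: a TypeDefOrRefEncoded value pointing to the TestClass<GenArg1, GenArg2> type definition, the number of generic arguments (GenArgCount, 0x02) and the two argument types, int32 (0x08) and string (0x0E).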

2.6 ArrayShape

I think that a lot of people who use the .NET platform know that an array can have more than one dimension, but do not know that each dimension of an array can have a lower bound. That is probably because most developers use C#, which does not allow lower bounds except through the Array.CreateInstance method. The ArrayShape element holds the full definition of a multi-dimensional array: it stores the number of dimensions and the size and lower bound of each dimension. The syntax diagram for this element, along with a brief description copied from the specification, is shown below:


Picture 6: The ArrayShape element syntax diagram

Rank is an integer (stored in compressed form, see §23.2) that specifies the number of dimensions in the array (shall be 1 or more). NumSizes is a compressed integer that says how many dimensions have specified sizes (it shall be 0 or more). Size is a compressed integer specifying the size of that dimension - the sequence starts at the first dimension, and goes on for a total of NumSizes items. Similarly, NumLoBounds is a compressed integer that says how many dimensions have specified lower bounds (it shall be 0 or more). And LoBound is a compressed integer specifying the lower bound of that dimension - the sequence starts at the first dimension, and goes on for a total of NumLoBounds items. None of the dimensions in these two sequences can be skipped, but the number of specified dimensions can be less than Rank.
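The ArrayShape layout can be sketched as a small C# encoder. It is an illustration under assumptions: the ArrayShapeEncoder name is hypothetical, sizes are written as compressed unsigned integers, and lower bounds are written as compressed signed integers, which is how later editions of ECMA-335 (§II.23.2.13) define LoBound.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical encoder: Rank NumSizes Size* NumLoBounds LoBound*
public static class ArrayShapeEncoder
{
    public static byte[] Encode(int rank, int[] sizes, int[] loBounds)
    {
        var bytes = new List<byte>();
        bytes.AddRange(CompressUnsigned((uint)rank));
        bytes.AddRange(CompressUnsigned((uint)sizes.Length));    // NumSizes
        foreach (int s in sizes) bytes.AddRange(CompressUnsigned((uint)s));
        bytes.AddRange(CompressUnsigned((uint)loBounds.Length)); // NumLoBounds
        foreach (int lb in loBounds) bytes.AddRange(CompressSigned(lb));
        return bytes.ToArray();
    }

    static byte[] CompressUnsigned(uint v)
    {
        if (v <= 0x7F) return new[] { (byte)v };
        if (v <= 0x3FFF) return new[] { (byte)(0x80 | (v >> 8)), (byte)(v & 0xFF) };
        if (v <= 0x1FFFFFFF)
            return new[] { (byte)(0xC0 | (v >> 24)), (byte)((v >> 16) & 0xFF),
                           (byte)((v >> 8) & 0xFF), (byte)(v & 0xFF) };
        throw new ArgumentOutOfRangeException(nameof(v));
    }

    // Signed form: take the 7/14/29-bit two's complement value and rotate it
    // left by one bit, so the sign bit ends up in bit 0.
    static byte[] CompressSigned(int v)
    {
        if (v >= -(1 << 6) && v < (1 << 6))
        {
            int x = v & 0x7F;
            return new[] { (byte)(((x << 1) & 0x7F) | (x >> 6)) };
        }
        if (v >= -(1 << 13) && v < (1 << 13))
        {
            int x = v & 0x3FFF;
            int r = ((x << 1) & 0x3FFF) | (x >> 13);
            return new[] { (byte)(0x80 | (r >> 8)), (byte)(r & 0xFF) };
        }
        if (v >= -(1 << 28) && v < (1 << 28))
        {
            int x = v & 0x1FFFFFFF;
            int r = ((x << 1) & 0x1FFFFFFF) | (x >> 28);
            return new[] { (byte)(0xC0 | (r >> 24)), (byte)((r >> 16) & 0xFF),
                           (byte)((r >> 8) & 0xFF), (byte)(r & 0xFF) };
        }
        throw new ArgumentOutOfRangeException(nameof(v));
    }
}
```

Encode(3, new int[0], new int[0]) yields 03 00 00 and Encode(3, new[] { 6, 0, 3 }, new[] { 0, 0, 4 }) yields 03 03 06 00 03 03 00 00 08, the same bytes that follow the ARRAY and int32 bytes in Examples 1 and 2.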

NOTE: Please do not confuse multi-dimensional arrays with jagged arrays: a multi-dimensional array in CIL is, for example, int32[,], while a jagged array is int32[][]. Also note that ArrayShape is used only together with the ARRAY constant, i.e. for general arrays. A plain single-dimensional array such as int32[] is denoted by the SZARRAY constant alone, with no ArrayShape (see the Type element). To learn more about arrays in .NET, see the Array Types in .NET article in MSDN Magazine.
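Since C# has no syntax for lower bounds, the overload Array.CreateInstance(Type, int[] lengths, int[] lowerBounds) is the way to obtain such an array. The sketch below (the LowerBoundDemo class name is hypothetical) creates a two-dimensional int array whose dimensions are indexed 1..2 and 6..8.

```csharp
using System;

public static class LowerBoundDemo
{
    // Two dimensions: lengths 2 and 3, lower bounds 1 and 6,
    // so valid indices are [1..2, 6..8].
    public static Array Create()
    {
        int[] lengths = { 2, 3 };
        int[] lowerBounds = { 1, 6 };
        Array arr = Array.CreateInstance(typeof(int), lengths, lowerBounds);
        arr.SetValue(42, 1, 6); // the first element is at (1, 6), not (0, 0)
        return arr;
    }
}
```

Indexing such an array with (0, 0) throws IndexOutOfRangeException, because 0 lies below both lower bounds.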

IMPORTANT: As we will see in the second example, ILASM stores the lower boundaries of arrays (the LoBound field in Picture 6) with their value apparently multiplied by two. At first sight this looks like an ILASM bug, because the edition of the specification used for this article describes LoBound simply as a compressed integer. However, later editions of ECMA-335 (§II.23.2 and §II.23.2.13) define LoBound as a compressed signed integer: the value is rotated one bit to the left so that the sign bit lands in the least significant bit, which doubles any non-negative bound. Below you can see a table copied from the specification that shows sample array declarations and their parameters in the ArrayShape element. Moreover, the specification does not say in which case(s) the NumSizes and NumLoBounds fields may be less than the Rank field. From my observation, NumSizes and NumLoBounds are less than Rank in only one case: when no size or lower boundary is specified for any dimension (the second row of the table below); otherwise NumSizes and NumLoBounds are always equal to Rank. This contradicts the first, third and fifth rows of the table below:

Declaration Type Rank NumSizes Size NumLoBounds LoBound
[0...2] I4 1 1 3 0 -
[,,,,,,] I4 7 0 - 0 -
[0...3, 0...2,,,,] I4 6 2 4,3 2 0,0
[1...2, 6...8] I4 2 2 2,3 2 1,6
[5, 3...5, , ] I4 4 2 5,3 2 0,3
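To see why LoBound values appear doubled, here is a C# sketch of the compressed signed integer decoding described in later editions of ECMA-335 (§II.23.2). The CompressedSigned class name is hypothetical; the algorithm undoes the one-bit left rotation and then sign-extends the result to 32 bits.

```csharp
// Hypothetical helper: decodes a compressed signed integer at 'pos'.
public static class CompressedSigned
{
    public static int Decode(byte[] blob, ref int pos)
    {
        byte b = blob[pos];
        int raw, bits;
        if ((b & 0x80) == 0)                 // 1 byte, 7 value bits
        {
            raw = b & 0x7F; bits = 7; pos += 1;
        }
        else if ((b & 0xC0) == 0x80)         // 2 bytes, 14 value bits
        {
            raw = ((b & 0x3F) << 8) | blob[pos + 1]; bits = 14; pos += 2;
        }
        else                                 // 4 bytes, 29 value bits
        {
            raw = ((b & 0x1F) << 24) | (blob[pos + 1] << 16)
                | (blob[pos + 2] << 8) | blob[pos + 3];
            bits = 29; pos += 4;
        }
        // Undo the rotation: bit 0 holds the sign bit, the rest is the value shifted left.
        int value = (raw >> 1) | ((raw & 1) << (bits - 1));
        // Sign-extend from 'bits' wide to 32 bits.
        int shift = 32 - bits;
        return (value << shift) >> shift;
    }
}
```

Decoding the byte 0x08 from Example 2 gives 4, the lower bound written in the IL source.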

Example 1

Let us see how the ArrayShape works in action.

MSIL
// Full source: ArrayShape\1.il
// Binary: ArrayShape\1.dll
// (...)

.field public int32[,,] TestField

The above multi-dimensional array declaration produces the following FieldSig signature.

Offset Value Meaning
0x01 0x06 Signature size
0x02 0x06 FieldSig's prolog
0x03 0x14 Field's type value is ARRAY, see constants in the first part
0x04 0x08 Array's type is int32, see constants in the first part
0x05 0x03 The number of the array's dimensions (Rank field in Picture 6)
0x06 0x00 Sizes of the array's dimensions not specified (NumSizes field in Picture 6)
0x07 0x00 Lower bounds of the array's dimensions not specified (NumLoBounds field in Picture 6)

Example 2

This example shows how the ArrayShape element behaves when a multi-dimensional array is declared with lower boundaries specified.

MSIL
// Full source: ArrayShape\2.il
// Binary: ArrayShape\2.dll
// (...)

.field public int32[0...5,,4...6] TestField

The whole FieldSig signature looks like this:

Offset Value Meaning
0x01 0x0C Signature size
0x02 0x06 FieldSig's prolog
0x03 0x14 Field's type value is ARRAY, see constants in the first part
0x04 0x08 Array's type is int32, see constants in the first part
0x05 0x03 The number of the array's dimensions (Rank field in Picture 6)
0x06 0x03 The number of sizes for this array (NumSizes field in Picture 6)
0x07 0x06 The size of the first dimension of the array (Size field in Picture 6)
0x08 0x00 The size of the second dimension of the array; zero means not specified (Size field in Picture 6)
0x09 0x03 The size of the third dimension of the array (Size field in Picture 6)
0x0A 0x03 The number of lower bounds for this array (NumLoBounds field in Picture 6)
0x0B 0x00 The lower boundary of the first dimension of the array (LoBound field in Picture 6)
0x0C 0x00 The lower boundary of the second dimension of the array (LoBound field in Picture 6)
0x0D 0x08 The lower boundary of the third dimension of the array (LoBound field in Picture 6). The stored value 0x08 is twice the declared bound of 4; see the important note at the beginning of this subsection

Example 3

Now let us look at what the ArrayShape element looks like in practice for a single-dimensional array with explicit bounds, and compare the result with the first row of the specification's table.

MSIL
// Full source: ArrayShape\3.il
// Binary: ArrayShape\3.dll
// (...)

.field public int32[0...2] TestField

As the table below shows, NumLoBounds is equal to Rank, although the specification's table says that NumLoBounds should be zero for this declaration.

Offset Value Meaning
0x01 0x08 Signature size
0x02 0x06 FieldSig's prolog
0x03 0x14 Field's type value is ARRAY, see constants in the first part
0x04 0x08 Array's type is int32, see constants in the first part
0x05 0x01 The number of the array's dimensions (Rank field in Picture 6)
0x06 0x01 The number of sizes for this array (NumSizes field in Picture 6)
0x07 0x03 The size of the first dimension of the array (Size field in Picture 6)
0x08 0x01 The number of lower bounds for this array (NumLoBounds field in Picture 6)
0x09 0x00 The lower boundary of the first dimension of the array (LoBound field in Picture 6)

3. Conclusion

As you can see, signatures are a complicated monstrosity, but they make .NET executables small, compact and consistent. If you have any questions, hints or requests, do not hesitate to add a comment below; constructive comments are always welcome.

4. References

5. Revision History

  • 1.0: 26th September 2009: Initial release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Poland
Przemek was born in 1988 and lives in a small town near Warsaw, Poland. Currently he codes some C# stuff and J2EE as well, and occasionally he uses C++ for fun. Przemek is a cycling fan: if the weather permits, he rides a bike.
