Creating a Custom De/Serializer

Alaa Ben Fatma

Apr 13, 2018

MIT

In this article, we will walk through the basic concepts of de/serialization and how to create a very basic de/serializer.

Introduction

Serialization, in short, is a way to translate objects or data structures into a format that can be easily used later on to retrieve some data (in another computer environment, in most cases).

Contents
Introduction
Understanding Serialization and Deserialization
- Serialization
- Deserialization
Making a Custom De/Serializer
- Making a Serializer
- Making a Deserializer
Implementing the Custom Serializer
- Serialize an Object
- Deserialize an Object
Brief Summary

Important

This article is not intended to compete with other de/serializers or re-invent the wheel. Its goal is purely educational. Nevertheless, you still can use this library in your application freely if it suits your needs.

This article assumes that the reader has little prior knowledge of serialization.

Understanding Serialization and Deserialization

A while back, someone asked me whether I could use a real-life situation to explain what serialization really is. The challenge was intriguing, and I had to come up with a scenario that conveys the general idea behind de/serialization.

*Note: Each statement in parentheses names the computer-science equivalent of that step; in our case, the context is data storage.

Now for the story. Imagine you pass by a computer that is for sale and it catches your attention, but you want to ask a friend who is fond of computers whether buying it is a good idea. Naturally, you cannot take the computer home, so you note down its specs to show them to your friend later (this is serialization). When you show the notes to your friend, he reads the specs (this is deserialization). Finally, he decides whether buying that computer is a good idea or not (this is an operation performed on the deserialized data).

Now compare the diagram below with the one above: the abstract idea of serialization is the same in both.

*Note: If your friend only knows one language and you write in a different one, he will not be able to comprehend the information you noted down.

Example:

He only knows English, and you write the description of the computer in French.

Summary

Serialization is a way to describe objects in a way that other parties can understand.
Deserialization is a way to read and comprehend the description of a specific object.
You can perform operations on the given description of the object if you manage to comprehend it.

Now that you have got the general idea behind serialization, we can swiftly move on and discuss serialization, deserialization and how to create a serializer.

Serialization

Based on our previous understanding of how serialization works, we can easily conclude that serialization is a way of describing objects by the values of their properties. That's the simplest way to imagine serialization. Dig deeper if you are a beginner and you want to make life harder for yourself.

First things first, in order to understand serialization correctly, we will try to deal with a real-life example.

Say we have a structure/class that describes a CodeProject user by his name, points, number of articles and rank.

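class Member
{
    // The properties must be public so that they can be set from outside the class
    // and found by reflection later on.
    public string Name { get; set; }
    public double Points { get; set; }
    public int Articles { get; set; }
    public string Rank { get; set; }
}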

And a member:

Member member = new Member()
{
    Name = "Alaa Ben Fatma",
    Points = 16.200,
    Articles = 8,
    Rank = "Legend"
};

That said and done, imagine if we want to use the information of this member again in one of these different scenarios:

When you restart the application
When you are running two instances of the application and you want that custom set of information to be passed and used by both of the instances.
When you are running the application on another machine

Sounds fancy, doesn't it?

Sadly, we can't handle any of these scenarios without a set of information to work with. We cannot pass the object (member) as it is to another application; instead, we should pass the crucial details in a form that the other instances of the application can eventually read (reading them is deserialization, which we will get into later).

With all of that being said, the idea is to save the values of the object's properties in a file or in a memory buffer. To do that, we need to retrieve all the properties of the object and extract their values. As a result, we end up with a set of information that reflects the object itself.

Here is an example of what the extracted data might look like if you serialize the object (member) we declared above:

<Member>
    <Name>Alaa Ben Fatma</Name>
    <Points>16.200</Points>
    <Articles>8</Articles>
    <Rank>Legend</Rank>
</Member>

This snippet is written in XML (Extensible Markup Language).
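To make the idea of "retrieve all properties and extract their values" concrete, here is a minimal sketch that builds such an XML element with reflection and LINQ to XML. It is only an illustration, not the serializer built later in this article; the names XmlSketch and ToXml are made up for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public class Member
{
    public string Name { get; set; }
    public double Points { get; set; }
    public int Articles { get; set; }
    public string Rank { get; set; }
}

public static class XmlSketch
{
    // Hypothetical helper: one child element per public property,
    // wrapped in an element named after the object's type.
    public static XElement ToXml(object obj)
    {
        var type = obj.GetType();
        return new XElement(type.Name,
            type.GetProperties().Select(p => new XElement(p.Name, p.GetValue(obj))));
    }

    public static void Main()
    {
        var member = new Member
        {
            Name = "Alaa Ben Fatma",
            Points = 16.200,
            Articles = 8,
            Rank = "Legend"
        };

        // Prints a <Member> element with one child per property.
        Console.WriteLine(ToXml(member));
    }
}
```

Note that a double does not remember trailing zeros, so Points is written as 16.2 rather than 16.200.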

Deserialization

In the previous section, we discussed how serialization works in the abstract. As a result, we now have our data conserved and ready to be used.

As I mentioned in the real-life example, if the second party does not understand a specific language, it has no chance of comprehending anything written in it. In the previous example, we ended up with a sample written in XML; if the application cannot deserialize information written in XML, the conserved data is of no use to us. I won't dig into the technical aspects for the time being.

In order to make good use of the conserved data, our application must be able to comprehend the information that is passed to it.

Examples:

In the first case, the names of the object's properties match the names of the elements in the serialized data, which puts us in a good position to deserialize it. In some cases, however, matching names are not enough to guarantee a successful deserialization. For example, a numeric value may be presented as 999.36.257.3; the host property cannot accept such a format, so the operation fails.

In the second case, neither the names of the elements and the properties nor the name of the class and the name of the serialized object match. We are left with only one answer: deserialization cannot be done.
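The failing case with the malformed number can be reproduced in a few lines. The sketch below uses Convert.ChangeType, the same conversion method the deserializer relies on later; the TryConvert helper is a hypothetical wrapper written for this example, and passing the invariant culture is my own choice so that the decimal point is read the same way on every machine.

```csharp
using System;
using System.Globalization;

public static class ConversionSketch
{
    // Hypothetical helper: returns true and the converted value when 'text'
    // can be converted to 'targetType', and false otherwise.
    public static bool TryConvert(string text, Type targetType, out object result)
    {
        try
        {
            result = Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
            return true;
        }
        catch (Exception ex) when (ex is FormatException
                                   || ex is InvalidCastException
                                   || ex is OverflowException)
        {
            result = null;
            return false;
        }
    }

    public static void Main()
    {
        // A well-formed number converts fine.
        Console.WriteLine(TryConvert("3.50", typeof(double), out var ok));          // True

        // Several decimal points cannot be read as a double.
        Console.WriteLine(TryConvert("999.36.257.3", typeof(double), out var bad)); // False
    }
}
```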

Deserializing a dataset means extracting the value contained in each element of that set and assigning it to the property that has the same name as the element. We must also respect the type of the property, which means converting the value whenever a conversion is needed. Too long to read? Here is a graph that may help you see the situation more clearly:

Now, let us look back at the real-life scenario I mentioned above. At this level (the level of deserialization), your friend has already received the written notes and tried to comprehend them. The same goes here: there will always be a deserializer that reads the conserved information, tries to comprehend it, and sends its values to their fitting positions.

Making a Custom De/Serializer

The class that will be used for all the parts of this section:

private class Pet
{
    public string Type { get; set; }
    public string Name { get; set; }
    public string Age { get; set; }
    public double Weight { get; set; }
}

Making an instance of it:

var pet = new Pet();

Before jumping directly into the hard work, there are some points to consider:

The de/serializer does not know the name of the object.
- Consequence: the serializer cannot choose a proper name for the serialized set of information.
The de/serializer does not know the types of the properties that the object has.
- The de/serializer will then face problems when it has to convert values while deserializing the dataset.
The de/serializer does not know the names of the properties that the object has.
- As a result, the de/serializer won't be able to access the value of an unknown property.

In order to overcome these difficulties, we need to make use of Reflection, which enables us to retrieve metadata on types at runtime.

Making a Serializer

To make a serializer, we need to make sure that we have retrieved the name of the object and all the properties of an object along with their values.

The first thing to do is to get the type of the object we are planning to serialize:

var myType = obj.GetType();

After this line runs, the variable myType holds the type of the object we are planning to serialize (pet in our case).

After finding the name and the type of our object (the name can be extracted out from the type itself), we shall then iterate through all its properties and add them to a list.

IList<PropertyInfo> props = new List<PropertyInfo>(myType.GetProperties());
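As a quick check that reflection really answers the three open questions (the name of the object's type, the names of its properties and their types), here is a small sketch. The ReflectionSketch class and its Describe method are made up for this example, and Pet is declared public here so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Pet
{
    public string Type { get; set; }
    public string Name { get; set; }
    public string Age { get; set; }
    public double Weight { get; set; }
}

public static class ReflectionSketch
{
    // Hypothetical helper: returns the type name followed by one
    // "PropertyName : PropertyType" line per public property.
    public static List<string> Describe(object obj)
    {
        Type myType = obj.GetType();
        var lines = new List<string> { myType.Name };
        foreach (PropertyInfo prop in myType.GetProperties())
        {
            lines.Add(prop.Name + " : " + prop.PropertyType.Name);
        }
        return lines;
    }

    public static void Main()
    {
        // Prints "Pet" first, then one line per property, e.g. "Weight : Double".
        foreach (var line in Describe(new Pet()))
        {
            Console.WriteLine(line);
        }
    }
}
```

The order in which GetProperties returns the properties is not guaranteed, so code that depends on a specific order should sort them explicitly.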

Now that we have managed to extract every crucial piece of information about the object and its properties, it is time to write our basic serializer. In the end, it generates a piece of code that can be shared anywhere and used by any application that is able to deserialize it.

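// Requires: using System.Collections.Generic; using System.Reflection; using System.Text;
public static string Serialize(object obj)
{
    var sb = new StringBuilder();
    sb.AppendLine();
    sb.Append("<?");   // opening marker of the serialized block

    // Reflection: read the public properties of the object's runtime type.
    var myType = obj.GetType();
    IList<PropertyInfo> props = new List<PropertyInfo>(myType.GetProperties());
    foreach (var prop in props)
    {
        // One [Name=Value] record per property. The value is written through its
        // ToString() method, which uses the current culture for numbers.
        var propValue = prop.GetValue(obj, null);
        sb.AppendLine();
        sb.Append("    [" + prop.Name + "=" + propValue + "]");
    }

    sb.AppendLine();
    sb.Append("?>");   // closing marker
    return sb.ToString();
}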
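One caveat worth knowing about: because the serializer turns values into text through string concatenation, numbers are formatted with the machine's current culture, and a culture that uses a decimal comma would write a different text for the same double. The sketch below shows the difference and one possible way around it. FormatValue is a hypothetical helper and is not part of the serializer in this article.

```csharp
using System;
using System.Globalization;

public static class FormattingSketch
{
    // Hypothetical helper: formats a value the same way on every machine.
    public static string FormatValue(object value)
    {
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        double weight = 9562.65;

        // A format provider with a decimal comma, standing in for a culture that uses one.
        var commaFormat = new NumberFormatInfo { NumberDecimalSeparator = "," };

        Console.WriteLine(weight.ToString(commaFormat)); // 9562,65
        Console.WriteLine(FormatValue(weight));          // 9562.65
    }
}
```

If the writing side and the reading side agree on the invariant culture, the serialized text can be read back on any machine.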

Serialization Test

Let's assign some information to our pet:

pet.Type = "Dragon";
pet.Name = "SkyCloud";
pet.Age = "9200 years";
pet.Weight = 9562.6500;

Running the Serialize method shown above on this pet produces the following result (preceded by an empty line, and with the trailing zeros of Weight dropped, since a double does not keep them):

<?
    [Type=Dragon]
    [Name=SkyCloud]
    [Age=9200 years]
    [Weight=9562.65]
?>

Making a Deserializer

As we have seen before, deserialization is all about reading a dataset, comprehending it, extracting the values contained within it and then assigning them to their matching properties.

Any deserializer can deserialize a set of conserved information about an object if and only if it understands the serialized format. Our deserializer therefore won't be able to deserialize an object that has been serialized into XML; it only works when it knows how to handle the syntax. Remember the example above, where I mentioned that if you write the computer's specs in French, your friend will not be able to comprehend them? This is what I meant.

With that in mind, we will take the serialized code we obtained when we serialized our pet object. To extract the data it contains and assign each value to its fitting property, we need to break the code into parts, in the following order:

Extract the block of text that is between the two symbols <? and ?>
Extract every block of text that is between two brackets [ and ]
Split each record into two parts- the name of the property, and its value

For better insight, here are these steps represented visually as an animation:

I am sorry for the typo: it is "years", not "year". I thought I should mention it here instead of remaking the animation from scratch.

Implementing the behaviour shown in that gif may prove a little tedious. To keep the article clean and avoid filling it with blocks of code, here is the GitHub repository of the project, in case you want to know how I implemented it.
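For readers who want to see the three extraction steps without leaving the page, here is one possible sketch using regular expressions. It is not the implementation from the repository; the names ParsingSketch and ExtractRecords are made up for this example, and the sketch assumes that values never contain a closing bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParsingSketch
{
    // Hypothetical helper: returns the (name, value) pairs found in a
    // "<? [Name=Value] ... ?>" block.
    public static List<KeyValuePair<string, string>> ExtractRecords(string code)
    {
        var records = new List<KeyValuePair<string, string>>();

        // Step 1: take the text between "<?" and "?>".
        var block = Regex.Match(code, @"<\?(.*?)\?>", RegexOptions.Singleline);
        if (!block.Success) return records;

        // Step 2: take every run of text between "[" and "]".
        foreach (Match record in Regex.Matches(block.Groups[1].Value, @"\[([^\]]*)\]"))
        {
            // Step 3: split on the first '=' only, so a value may itself contain '='.
            var parts = record.Groups[1].Value.Split(new[] { '=' }, 2);
            if (parts.Length == 2)
            {
                records.Add(new KeyValuePair<string, string>(parts[0].Trim(), parts[1]));
            }
        }
        return records;
    }

    public static void Main()
    {
        var code = "<?\n    [Type=Cat]\n    [Name=Abby]\n    [Age=2 Months]\n    [Weight=3.50]\n?>";
        foreach (var record in ExtractRecords(code))
        {
            Console.WriteLine(record.Key + " -> " + record.Value);
        }
        // Prints:
        // Type -> Cat
        // Name -> Abby
        // Age -> 2 Months
        // Weight -> 3.50
    }
}
```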

Now that our data is extracted and in a friendly format (all the records are contained within a list), we can manipulate it easily. To do that properly, we will make a struct that contains the name of the property and its value (based on the given serialized dataset, of course).
The name of the property is, by nature, a string. However, not every value in our dataset is a string in its primitive form.

Even so, every object in .NET has its own textual representation, which you can generate by calling <object>.ToString();

With that being said, our struct shall look like this:

public struct Data
{
    public string PropertyName;
    public string Value;
}

For every record within our list of serialized properties that we managed to extract previously, we will create an instance of the Data struct to conserve the information related to that record.

Now that we have our list of structures (of the Data type we declared previously), we will iterate through its elements one by one and check whether the object that we are planning to use as a host for our deserialized dataset has a spot where we can put our data.

(I am sorry for the misbehaviour of this gif, things didn't go really well when I tried to record it :) )

Each time we find a matching property, we need to pass the deserialized value to that property. There are two obstacles to overcome in order to do so:

Set the value of a property by using its name at runtime
Convert the deserialized value to the primitive type of the property

This is where reflection comes in handy. We iterate through the records we extracted from the serialized data and, for each one, look up the property with the same name on the target object. The lookup gives us a PropertyInfo instance, which tells us the type of the property; with that type, we can perform a valid conversion using the Convert.ChangeType method.

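foreach (var property in properties)
{
    // Look up the property on the target by the name read from the serialized data.
    var propInfo = target.GetType().GetProperty(property.PropertyName);

    // Records with no matching property are skipped; otherwise the text is
    // converted to the property's type and assigned.
    propInfo?.SetValue(target,
        Convert.ChangeType(property.Value, propInfo.PropertyType), null);
}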
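To see the loop work end to end, here is a self-contained sketch that applies a list of Data records to a Pet. It is an illustration, not the code from the repository: the DeserializeSketch class and its Apply method are hypothetical names, Pet is declared public so the snippet compiles on its own, and passing the invariant culture to Convert.ChangeType is my own assumption so that "3.50" is read the same way on every machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public struct Data
{
    public string PropertyName;
    public string Value;
}

public class Pet
{
    public string Type { get; set; }
    public string Name { get; set; }
    public string Age { get; set; }
    public double Weight { get; set; }
}

public static class DeserializeSketch
{
    // Hypothetical helper: copies every record onto the property of the same name.
    // Restricted to classes, because a boxed struct would only receive the values on a copy.
    public static T Apply<T>(IEnumerable<Data> properties, T target) where T : class
    {
        foreach (var property in properties)
        {
            var propInfo = target.GetType().GetProperty(property.PropertyName);

            // The host object has no writable spot for this record: skip it.
            if (propInfo == null || !propInfo.CanWrite) continue;

            var value = Convert.ChangeType(
                property.Value, propInfo.PropertyType, CultureInfo.InvariantCulture);
            propInfo.SetValue(target, value, null);
        }
        return target;
    }

    public static void Main()
    {
        var records = new List<Data>
        {
            new Data { PropertyName = "Type", Value = "Cat" },
            new Data { PropertyName = "Name", Value = "Abby" },
            new Data { PropertyName = "Age", Value = "2 Months" },
            new Data { PropertyName = "Weight", Value = "3.50" },
            new Data { PropertyName = "Colour", Value = "Grey" } // no such property: ignored
        };

        var pet = Apply(records, new Pet());
        Console.WriteLine(pet.Name + " weighs "
            + pet.Weight.ToString(CultureInfo.InvariantCulture)); // Abby weighs 3.5
    }
}
```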

With all of that said and done, we now have the complete architecture of the deserializer ready to be implemented. The full code of both the serializer and the deserializer can be found on GitHub.

Implementing the Custom Serializer

To make good use of the custom de/serializer that we have built together, we will perform two operations. The first is serializing a basic class; the second is deserializing a serialized class.

Important: Add the de/serializer to your references first.

Serialize an Object

In this example, we will work with the Pet class that we talked about previously. We will make an instance of it and assign some values to it - and serialize it afterwards.

/*
    Input:
    Type = Dragon
    Name = SkyCloud
    Age = 9200 years
    Weight = 9562.6500
*/
string code = TinySerializer.Serialize(pet);

If we display the code, this will be the expected result (the trailing zeros of Weight are dropped, since a double does not keep them):

<?
    [Type=Dragon]
    [Name=SkyCloud]
    [Age=9200 years]
    [Weight=9562.65]
?>

Deserialize an Object

Let's say that we have a text file named pet.txt that contains serialized data about a pet, and its contents are:

<?
[Type=Cat]
[Name=Abby]
[Age=2 Months]
[Weight=3.50]
?>

To deserialize the code contained in the file, we read all of its text and then pass it to the deserializer.

var petData = File.ReadAllText("pet.txt");
var pet = TinySerializer.TinySerializer.DeSerialize(petData, new Pet());
// A regular interpolated string cannot span several lines, so it is split in two.
Console.WriteLine(
    $"This pet is a {pet.Type} and it is called {pet.Name}.\n" +
    $"Age:{pet.Age}\nWeight:{pet.Weight}KG");

Expected output:

This pet is a Cat and it is called Abby.
Age:2 Months
Weight:3.5KG

Brief Summary

Serialization is a way to translate an object into another format that can be used later on to retrieve some aspects that are related to that object.
Deserialization is the act of reading and comprehending that serialized data.
The custom de/serializer here generates a unique syntax that no other deserializer supports.
The custom de/serializer can only deserialize the format that it supports, which can only be generated using this custom de/serializer. In other words, it cannot deserialize data written in XML or JSON, for example.

Important

As I have mentioned above, this de/serializer was not shared with the intention of competing with other existing products; it is only for educational purposes. However, you can still use it if it fits some of your needs.

Notes

This serializer may possibly be the tiniest serializer around.
It is basic.
It does not support arrays or collections of objects. However, workarounds can be implemented to perform such a feat manually.

History

13th April, 2018: Initial version