Reduce the Size of MongoDB Documents Generated from .NET/C#

18 Mar 2014
How to reduce the size of MongoDB documents generated from .NET/C#

Introduction

This is a short article about an issue I recently had while trying to save some big documents, represented as .NET objects, to MongoDB using the MongoDB .NET driver.

While saving a “relatively” big document, I received the following exception:

System.IO.FileFormatException: Size 32325140 is larger than MaxDocumentSize 16777216.
   at MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize() in 
   c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 697
   at MongoDB.Bson.IO.BsonBinaryWriter.WriteEndArray() in 
   c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 294
   at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`1.Serialize
   (BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options) 
   in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\EnumerableSerializerBase.cs:line 408
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember
   (BsonWriter bsonWriter, Object obj, BsonMemberMap memberMap) in 
   c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 684
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(BsonWriter bsonWriter, 
   Type nominalType, Object value, IBsonSerializationOptions options) in 
   c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 432
   at MongoDB.Driver.Internal.MongoInsertMessage.AddDocument(BsonBuffer buffer, 
   Type nominalType, Object document) in 
   c:\projects\mongo-csharp-driver\MongoDB.Driver\Communication\Messages\MongoInsertMessage.cs:line 53
   at MongoDB.Driver.Operations.InsertOperation.Execute(MongoConnection connection) 
   in c:\projects\mongo-csharp-driver\MongoDB.Driver\Operations\InsertOperation.cs:line 97
   at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents, 
   MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1149
   at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, 
   MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1004
   at MongoDB.Driver.MongoCollection.Save(Type nominalType, Object document, 
   MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1426

The message is clear: I had exceeded MongoDB's maximum document size, which is 16MB (16,777,216 bytes). Fair enough, this is quite a sane design decision.

First, I’ll explain why I had this issue, then how I solved it.

Causes and Consequences

At first, I was quite surprised, because the same set of objects represented as a CSV document was only a 6MB file.
But thinking about the data again, I remembered that this data set is mostly a sparse matrix: a lot of properties are null.

With the CSV format, each null property costs only a semicolon, which is quite cheap even if you have hundreds of thousands of them.

But with an object-oriented representation like .NET objects or BSON documents, it is another story: for each null property, the cost is far higher because you still store the name of the property and the “null” marker!
And when you have dozens of properties (and yes, I have good reasons to have that many properties in a single object :) ), the overhead can be huge and account for most of the total size.

So you end up with documents that look something like:

{
    a: "Some data",
    b: null,
    c: null,
    d: "Some other data",
    e: null,
    f: null,
    g: null,
    ...
    z: "Last data"
}

Much of the document is filled with useless markers that increase its size without adding any information.
And this is not really flattering for BSON: my BSON document (32MB) was more than 5 times bigger than the CSV document (6MB)!
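To put rough numbers on that overhead, here is a minimal sketch based on the BSON specification: a null element is stored as one type byte followed by the field name as a NUL-terminated UTF-8 string, with no value payload. The class name and the property names below are hypothetical, chosen only for illustration; the code does not use the MongoDB driver.

```csharp
using System;
using System.Text;

static class NullFieldCost
{
    // BSON null element: 1 type byte + field name bytes + 1 NUL terminator.
    public static int BsonBytes(string fieldName)
    {
        return 1 + Encoding.UTF8.GetByteCount(fieldName) + 1;
    }

    // In a delimited text file, an empty cell costs only its separator.
    public const int CsvBytes = 1;

    static void Main()
    {
        // Hypothetical property names, for illustration only.
        string[] nullFields = { "b", "CustomerMiddleName", "SecondaryPhoneNumber" };
        foreach (string name in nullFields)
        {
            Console.WriteLine("{0}: {1} bytes in BSON, {2} in CSV",
                name, BsonBytes(name), CsvBytes);
        }
    }
}
```

Even a one-letter field name costs 3 bytes per null in BSON, and realistic property names cost far more, which is how a sparse data set can grow to several times its CSV size.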

The Solution

Fortunately, the guys behind the MongoDB .NET driver are aware of this kind of issue, and they have taken it into account when designing the driver, allowing you to customize the way the BSON documents are generated.

You have at least 2 solutions:

  • mark properties that should be ignored if null
  • register a global policy for the whole app-domain

If you want to mark properties individually, you can use the BsonIgnoreIfNull attribute:

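using MongoDB.Bson.Serialization.Attributes;

class Data
{
    // Each marked property is omitted from the document when its value is null.
    [BsonIgnoreIfNull]
    public string A { get; set; }
    [BsonIgnoreIfNull]
    public string B { get; set; }
    [BsonIgnoreIfNull]
    public string C { get; set; }
}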

The good thing is that this is quite explicit.
But it can add a lot of code if, like me, you have dozens of properties to mark.
Moreover, it is quite obtrusive and I don’t like to pollute my business entities with technical attributes, though I do it if there is no simpler solution. Again, pragmatism should always prevail over dogmatism, though some dogmatic geeks prefer duplicating code and adding mappings to clearly isolate business entities. (I’m a recovering dogmatic ;).)

For my current issue, I’ve chosen the other way by registering a global policy:

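using MongoDB.Bson.Serialization.Conventions;

// Omit null-valued members from the serialized BSON.
ConventionPack pack = new ConventionPack();
pack.Add(new IgnoreIfNullConvention(true));

// The filter limits the pack to the Data type.
ConventionRegistry.Register("Ignore null properties of data", pack, type => type == typeof(Data));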

The last predicate ensures the policy only applies to my “Data” class.

I’ve put this code in the static constructor of the type that is the entry point to the MongoDB database.
So if I have no need for MongoDB, the type won’t be loaded by the CLR and this code won’t be executed.
You could also put this code in the Main of your application, but if you have more than one application that uses your MongoDB layer, you might need to duplicate code, so prefer a static constructor or any other “Init” method.
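As a minimal sketch of the static-constructor placement described above: DataRepository is a hypothetical name for the entry-point type, and the Data class here is a trimmed stand-in. The code assumes the MongoDB .NET driver package is referenced. One point to keep in mind is that conventions are applied when the driver builds the class map for a type, so the registration has to run before the first Data object is serialized.

```csharp
using MongoDB.Bson.Serialization.Conventions;

// Trimmed stand-in for the article's Data type.
public class Data
{
    public string A { get; set; }
    public string B { get; set; }
}

// Hypothetical entry point to the MongoDB layer.
public class DataRepository
{
    // Runs once, before the first use of DataRepository,
    // and never runs if the type is not used at all.
    static DataRepository()
    {
        var pack = new ConventionPack();
        pack.Add(new IgnoreIfNullConvention(true));

        ConventionRegistry.Register(
            "Ignore null properties of data",
            pack,
            type => type == typeof(Data));
    }

    // Connection setup and save/load methods would go here.
}
```

Every application that goes through this type gets the policy without repeating the registration code.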

Conclusion

After applying this change, I was able to save my documents. To get an idea of how much space was saved, I checked the size of the newly saved document in the Mongo shell using the Object.bsonsize() method:

> Object.bsonsize(db.data.find()[0])
7161729

Compared to the original BSON document that included all the properties, this is far better: 7MB instead of 32MB, more than 4 times smaller.

Of course, there is still an overhead compared to CSV because you need to store the field names when the values are not null, but it’s limited to “only” 15%.
It’s still a big document, but one that fits into the MongoDB database, and this is all that matters.
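The size can also be checked from C# before saving, which may help catch oversized documents early. The following is only a sketch: BsonSizeCheck and its members are hypothetical names, and it relies on the driver's ToBson() extension method, which serializes an object using whatever class maps and conventions are registered. The result can differ slightly from what Object.bsonsize() reports for the stored document, for example if an _id is added on insert.

```csharp
using System;
using MongoDB.Bson;

public static class BsonSizeCheck
{
    // The 16MB limit reported in the exception message.
    public const int MaxDocumentSize = 16777216;

    // Serializes the object in memory and returns the BSON size in bytes.
    public static int SizeOf<T>(T document)
    {
        return document.ToBson().Length;
    }

    // True if the serialized document is within the limit.
    public static bool FitsInOneDocument<T>(T document)
    {
        return SizeOf(document) <= MaxDocumentSize;
    }
}
```

Note that this serializes the whole object a second time, so for very large documents it is better suited to diagnostics than to a hot path.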

Hopefully, this article will help somebody with the same issue.
If you catch any typo or mistake or have additional questions, feel free to leave a comment.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Instructor / Trainer Pragmateek
France (Metropolitan)
To make it short, I'm an IT trainer specialized in the .NET ecosystem (framework, C#, WPF, Excel add-ins...).
(I'm available in France and bordering countries, and I only teach in French.)

I like to learn new things, particularly to understand what happens under the hood, and I do my best to share my humble knowledge with others by direct teaching, by posting articles on my blog (pragmateek.com), or by answering questions on forums.
