Paging in MongoDB – How to Actually Avoid Poor Performance?

2 Jan 2020 · CPOL · 5 min read
Best way, performance wise, to paginate results in MongoDB

What is the best way, performance-wise, to paginate results in MongoDB, especially when you also want to get the total number of results?
The project runs on .NET Core 2.0.

Where to Start?

To answer these questions, let’s start from the datasets defined in my earlier article, Part 1: How to search good places to travel (MongoDb LINQ & .NET Core). That article was a quick introduction on how to load big chunks of data and then retrieve values using WebApi and LINQ. Here, I will start from that project and extend it with more details on paging the query results. You can also check Part 3 – MongoDb and LINQ: How to aggregate and join collections.

You can find the full solution, together with the data at https://github.com/fpetru/WebApiQueryMongoDb.

Topics Covered

  • Paging query results with skip and limit
  • Paging query results using last position
  • MongoDb BSonId
  • Paging using MongoDb .NET Driver

To Install

Here are all the things needed to be installed:

See the Results

Here are a few steps to get the solution ready and see the results immediately:

  1. Clone or download the project.
  2. Run the import.bat file from the Data folder – this will create the database (TravelDb) and fill in two datasets.
  3. Open the solution with Visual Studio 2017 and check the connection settings in appsettings.json.
  4. Run the solution.

If you have any issues installing MongoDB, setting up the databases, or understanding the project structure, please review my earlier article.

Paging Results using cursor.skip() and cursor.limit()

If you do a Google search, this is usually the first method presented for paginating query results in MongoDB. It is straightforward, but also expensive in terms of performance: the server has to walk from the beginning of the collection or index on every request to reach the offset (skip) position before it can start returning the results you need.

For example:

JavaScript
db.Cities.find().skip(5200).limit(10);

The server will need to walk through the first 5200 items in the Cities collection, and then return the next 10. This doesn’t scale well, because the cost of skip() grows with the offset.
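
A rough way to picture that cost, as a plain JavaScript model rather than MongoDB itself: the hypothetical pageWithSkip helper below walks an in-memory array the way skip() and limit() walk a collection or index, and counts how many entries it has to touch. The helper name and the sample data are assumptions made for illustration only.

```javascript
// Toy model of skip()/limit(): not MongoDB code, just an array walk.
// pageWithSkip and the sample data are hypothetical, for illustration.
function pageWithSkip(docs, skip, limit) {
  const items = [];
  let examined = 0;
  for (const doc of docs) {
    if (items.length === limit) break; // page is full, stop walking
    examined++;
    // Entries before the offset are walked and then thrown away.
    if (examined > skip) items.push(doc);
  }
  return { items, examined };
}

const docs = Array.from({ length: 10000 }, (_, i) => ({ n: i }));
const page = pageWithSkip(docs, 5200, 10);
console.log(page.items.length, page.examined);
```

In this model, the number of entries examined is always the offset plus the page size, so every later page costs more than the one before it.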

Paging Using the Last Position

To be faster, we should search and retrieve the details starting from the last retrieved item. As an example, let’s assume we need to find all the cities in France with a population of at least 15,000 inhabitants.

Following this method, the initial request to retrieve the first 200 records would be:

LINQ Format

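We first retrieve the AsQueryable interface. The queries below reach it as _context.CitiesLinq:

C#
var _client = new MongoClient(settings.Value.ConnectionString);
var _database = _client.GetDatabase(settings.Value.Database);
// Queryable view of the collection, used as _context.CitiesLinq in the queries that follow
var citiesLinq = _database.GetCollection<City>("Cities").AsQueryable();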

and then we run the actual query:

C#
var query = _context.CitiesLinq
                .Where(x => x.CountryCode == "FR"
                            && x.Population >= 15000)
                .OrderByDescending(x => x.Id)
                .Take(200);

List<City> cityList = await query.ToListAsync();

Subsequent queries start from the last retrieved Id. Because we order by the BsonId in descending order, each request returns the next most recent records, that is, those created on the server before the last Id.

C#
var query = _context.CitiesLinq
                .Where(x => x.CountryCode == "FR"
                         && x.Population >= 15000
                         // strictly older than the last item of the previous page
                         && x.Id < ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"))
                .OrderByDescending(x => x.Id)
                .Take(200);

List<City> cityList = await query.ToListAsync();
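
The paging logic itself can be sketched independently of the driver. The snippet below is a minimal in-memory model in plain JavaScript, not MongoDB code: pageAfter and the generated ids are hypothetical, and ids are kept as 24-character hex strings, which sort in the same order as the ObjectIds they represent. It shows only the logic (strictly-less-than filter, descending order, fixed page size); the performance benefit in MongoDB comes from the server seeking in an index instead of scanning.

```javascript
// Toy model of last-position (keyset) paging over an in-memory array.
// pageAfter and the sample ids are hypothetical, for illustration only.
function pageAfter(docs, lastId, limit) {
  return docs
    .filter((d) => lastId === null || d.id < lastId) // strictly older than the last one seen
    .sort((a, b) => (a.id < b.id ? 1 : a.id > b.id ? -1 : 0)) // newest first
    .slice(0, limit);
}

// Seven fake documents whose ids differ only in the leading timestamp part.
const docs = Array.from({ length: 7 }, (_, i) => ({
  id: (0x58fc8ae6 + i).toString(16) + "31a8a6f8d000f9c3",
}));

let lastId = null;
const seen = [];
for (;;) {
  const page = pageAfter(docs, lastId, 3);
  if (page.length === 0) break; // no more results
  seen.push(...page.map((d) => d.id));
  lastId = page[page.length - 1].id; // cursor for the next request
}
console.log(seen.length, new Set(seen).size);
```

Because the filter is strict, the last item of one page is never repeated on the next, and every document is visited exactly once.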

Mongo’s ID

In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. It is immutable and may be of any type other than an array. By default it is a MongoDB ObjectId, but you can also use a natural unique identifier, if one is available, or just an auto-incrementing number.

Using the default ObjectId type:

C#
[BsonId]
public ObjectId Id { get; set; }

brings further advantages, such as access to the date and time when the record was added to the database. Furthermore, sorting by ObjectId in descending order returns the entities most recently added to the MongoDB collection first.

C#
cityList.Select(x => new
    {
        BSonId = x.Id.ToString(),            // unique hexadecimal number
        Timestamp = x.Id.Timestamp,          // creation time as a Unix timestamp
        ServerUpdatedOn = x.Id.CreationTime  // creation time as a DateTime
        /* include other members */
    });
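
To make the timestamp part concrete, here is a small plain JavaScript sketch that decodes it by hand. It relies on the documented ObjectId layout, in which the first 4 bytes (8 hex characters) hold the creation time in seconds since the Unix epoch; objectIdToDate is a hypothetical helper name, not part of any driver.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId hold the creation
// time as seconds since the Unix epoch.
// objectIdToDate is a hypothetical helper, for illustration only.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

const created = objectIdToDate("58fc8ae631a8a6f8d000f9c3");
console.log(created.toISOString());
```

This is also why ordering by Id approximates ordering by creation time: the most significant bytes of the Id are the timestamp.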

Returning Fewer Elements

The City class has 20 members, but we should return just the properties we actually need. This reduces the amount of data sent back to the client.

C#
cityList.Select(x => new
    {
        BSonId = x.Id.ToString(), // unique hexadecimal number
        x.Name,
        x.AlternateNames,
        x.Latitude,
        x.Longitude,
        x.Timezone,
        ServerUpdatedOn = x.Id.CreationTime
    });

Indexes in MongoDB – A Few Details

We rarely need to get data in the exact order of the MongoDB internal ids (_id) without any filters (just using find()). In most cases, we retrieve data using filters and then sort the results. For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

How Do We Add an Index?

Using RoboMongo, we create the index directly on the server:

JavaScript
db.Cities.createIndex( { CountryCode: 1, Population: 1 } );

How Do We Check Our Query Is Actually Using the Index?

Running a query with the explain command returns details on index usage:

JavaScript
db.Cities.find({ CountryCode: "FR", Population : { $gt: 15000 }}).explain();

Image 1
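
When reading the explain() output, the part to look for is the winning plan: an IXSCAN stage indicates an index scan, while COLLSCAN indicates a full collection scan. The sketch below is a hypothetical plain JavaScript helper that walks a plan and lists its stages. The sample object is a trimmed, illustrative stand-in for real output, whose exact shape varies across server versions.

```javascript
// Hypothetical helper: collect every stage name in a winning plan.
// Stages nest through inputStage (one child) or inputStages (several).
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

// Trimmed, illustrative explain() output; real output has many more fields.
const explainOutput = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "CountryCode_1_Population_1" },
    },
  },
};

const stages = collectStages(explainOutput.queryPlanner.winningPlan);
console.log(stages.includes("IXSCAN") ? "index scan" : "no index scan", stages);
```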

Is There a Way to See the Actual Query Behind the MongoDB LINQ Statement?

The only way I could find was the GetExecutionModel() method. It provides detailed information, but its inner elements are not easily accessible.

C#
query.GetExecutionModel();

Using the debugger, we can see the elements as well as the full query actually sent to MongoDB.

Image 2

We can then take that query, execute it against MongoDB using the RoboMongo tool, and see the details of the execution plan.

Non LINQ Way – Using MongoDb .NET Driver

LINQ is slightly slower than using the direct API because it adds a layer of abstraction to the query. That abstraction lets you swap MongoDB for another data source (MS SQL Server, Oracle, MySQL, etc.) without many code changes, at the cost of a slight performance hit.

Even so, newer versions of the MongoDB .NET Driver have greatly simplified the way we filter and run queries. The fluent interface (IFindFluent) reads very much like the LINQ way of writing code.

C#
var filterBuilder = Builders<City>.Filter;
var filter = filterBuilder.Eq(x => x.CountryCode, "FR")
                & filterBuilder.Gte(x => x.Population, 10000)
                // Lt rather than Lte, so the last item of the previous page is not returned again
                & filterBuilder.Lt(x => x.Id, ObjectId.Parse("58fc8ae631a8a6f8d000f9c3"));

return await _context.Cities.Find(filter)
                            .SortByDescending(p => p.Id)
                            .Limit(200)
                            .ToListAsync();

where _context.Cities exposes the collection itself, obtained as:

C#
var cities = _database.GetCollection<City>("Cities");

Implementation

Wrapping up, here is my proposal for the paginate function. MongoDB supports OR predicates, but it is usually hard for the query optimizer to predict the disjoint sets from the two sides of the OR. Avoiding them whenever possible is a known query optimization trick, so the conditions below are combined with AND only.

C#
// building where clause
//
private Expression<Func<City, bool>> GetConditions(string countryCode, 
												   string lastBsonId, 
												   int minPopulation = 0)
{
    Expression<Func<City, bool>> conditions 
						= (x => x.CountryCode == countryCode
                               && x.Population >= minPopulation);

    ObjectId id;
    // Add the Id condition only when a valid last Id was supplied
    if (!string.IsNullOrEmpty(lastBsonId) && ObjectId.TryParse(lastBsonId, out id))
    {
        conditions = (x => x.CountryCode == countryCode
                        && x.Population >= minPopulation
                        && x.Id < id);
    }

    return conditions;

}

public async Task<object> GetCitiesLinq(string countryCode, 
										string lastBsonId, 
										int minPopulation = 0)
{
    try
    {
        var items = await _context.CitiesLinq
                            .Where(GetConditions(countryCode, lastBsonId, minPopulation))
                            .OrderByDescending(x => x.Id)
                            .Take(200)
                            .ToListAsync();

        // select just few elements
        var returnItems = items.Select(x => new
                            {
                                BsonId = x.Id.ToString(),
                                Timestamp = x.Id.Timestamp,
                                ServerUpdatedOn = x.Id.CreationTime,
                                x.Name,
                                x.CountryCode,
                                x.Population
                            });

        int countItems = await _context.CitiesLinq
                            .Where(GetConditions(countryCode, "", minPopulation))
                            .CountAsync();

        return new
            {
                count = countItems,
                items = returnItems
            };
    }
    catch (Exception ex)
    {
        // log or manage the exception
        throw; // rethrow without resetting the stack trace
    }
}

and in the controller:

C#
[NoCache]
[HttpGet]
public async Task<object> Get(string countryCode, int? population, string lastId)
{
    return await _travelItemRepository
                    .GetCitiesLinq(countryCode, lastId, population ?? 0);
}

The initial request (sample):

http://localhost:61612/api/city?countryCode=FR&population=10000

followed by other requests where we specify the last retrieved Id:

http://localhost:61612/api/city?countryCode=FR&population=10000&lastId=58fc8ae631a8a6f8d00101f9

Here is just a sample:
Image 3
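
From the caller’s side, walking through all the pages means feeding the last Id of each response into the next request. The sketch below shows that loop in plain JavaScript under a few assumptions: fetchPage is a hypothetical function standing in for the HTTP call, the response has the { count, items } shape returned above, and each item exposes its Id as a bsonId property (the camel-cased name is an assumption about the JSON serializer settings).

```javascript
// Hypothetical client-side loop. fetchPage(lastId) is assumed to return a
// promise of { count, items }, where each item carries a bsonId string.
async function fetchAllCities(fetchPage) {
  const all = [];
  let lastId = null;
  for (;;) {
    const { items } = await fetchPage(lastId);
    if (items.length === 0) break; // no more pages
    all.push(...items);
    lastId = items[items.length - 1].bsonId; // last retrieved Id
  }
  return all;
}

// Stand-in for the HTTP call, serving three fake cities two at a time.
const fake = ["c", "b", "a"].map((id) => ({ bsonId: id }));
const stubFetch = async (lastId) => {
  const rest = fake.filter((c) => lastId === null || c.bsonId < lastId);
  return { count: fake.length, items: rest.slice(0, 2) };
};

fetchAllCities(stubFetch).then((cities) => console.log(cities.length));
```

A real client would replace stubFetch with a request to the api/city endpoint, passing lastId as a query string parameter once it is known.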

At the End

I hope this helps, and please let me know if you need this to be extended or have questions.

History

  • 2nd January, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect
Denmark
My name is Petru Faurescu and I am a solution architect and technical leader. Technical blog: QAppDesign.com
