Paging of Large Resultsets in ASP.NET

An article about optimization and performance testing of MS SQL Server 2000 stored procedures used for paging of large resultsets in ASP.NET

Introduction

Paging a large database resultset in Web applications is a well-known problem. In short, you don't want all the results from your query displayed on a single Web page, so some sort of paged display is more appropriate. While this was not an easy task in classic ASP, the DataGrid control in ASP.NET reduces it to a few lines of code. So, paging is easy in ASP.NET, but the default behavior of the DataGrid is that all the records resulting from your query are fetched from SQL Server to the ASP.NET application. If your query returns a million records, this causes serious performance issues (if you need convincing, try executing such a query in your web application and watch the memory consumption of aspnet_wp.exe in Task Manager). That's why a custom paging solution is required, where the desired behavior is to fetch only the rows of the current page.

There are numerous articles and posts concerning this problem and several proposed solutions. My goal here is not to present you with an amazing solves-it-all procedure, but to optimize all the existing methods and provide you with a testing application so you can do the evaluation on your own. A good starting point is the following article, which describes many different approaches and provides some performance test results:

How do I page through a recordset?

I was not satisfied with most of them. First, half of the methods use old ADO and are clearly written for classic ASP. The rest are SQL Server stored procedures. Some of them yield poor response times, as you can see from the author's performance results at the bottom of that page, but several caught my attention.

Generalization

The three methods I decided to look into closely are the ones the author calls TempTable, DynamicSQL and Rowcount. I'll refer to the second method as the Asc-Desc method in the rest of this text. I don't think DynamicSQL was a good name, because you can apply dynamic SQL logic to the other methods too. The general problem with all these stored procedures is that you have to assess which columns you'll allow sorting on, and those probably won't be just the PK column(s). This leads to a new set of problems: for each query you want to display via paging, you need as many different paging queries as there are sorting columns. This means you will either write a different stored procedure (regardless of the paging method applied) for each sorting column, or you generalize everything into a single stored procedure with the help of dynamic SQL. The latter has a slight performance impact, but it increases maintainability if you need to display many different queries this way. Thus, I'll try to generalize all of the stored procedures in this text with dynamic SQL, but in some cases only a certain level of generalization is possible, so you'll still have to write separate stored procedures for some complex queries.
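
To illustrate the idea, here is a minimal sketch of such a generalization, reduced to the string-assembly part that all the generalized procedures share. The parameter names are only illustrative, the row-skipping logic of whichever paging method you choose still has to be built into the statement, and the full procedures linked throughout the article differ in the details.

SQL
-- Minimal sketch of the dynamic SQL generalization (illustrative names only).
-- The row-skipping logic of the chosen paging method would be concatenated
-- into @SQL as well; it is omitted here to keep the sketch short.
CREATE PROCEDURE Paging_Generalization_Sketch
    @Tables varchar(1000),   -- e.g. 'Customers INNER JOIN Orders ON ...'
    @Fields varchar(1000),   -- e.g. 'Customers.CustomerID, Customers.ContactName'
    @Filter varchar(1000),   -- e.g. 'Customers.Country <> ''USA''', or '' for none
    @Sort   varchar(200)     -- e.g. 'Customers.ContactName DESC'
AS
BEGIN
    DECLARE @SQL nvarchar(4000)

    -- Assemble the statement from its parts, so a single procedure can serve
    -- any combination of query and sorting column.
    SET @SQL = 'SELECT ' + @Fields + ' FROM ' + @Tables
    IF LEN(@Filter) > 0
        SET @SQL = @SQL + ' WHERE ' + @Filter
    SET @SQL = @SQL + ' ORDER BY ' + @Sort

    EXEC sp_executesql @SQL
END
GO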

The second problem with allowing sorting columns other than the PK column(s) is that if those columns are not indexed in some way, none of these methods will help. In all of them the paged source must be sorted first, and the cost of ordering a large table by a non-indexed column is immense. The response times are so high that all the procedures are practically unusable in this case (the response varies from a couple of seconds to a couple of minutes, depending on the size of the table and on how far into the resultset the starting record lies). Indexing the other columns brings its own performance issues and may be undesirable, for example it might significantly slow you down in a situation where you have a lot of daily imports.
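
For example, if you plan to let users sort a large orders-like table by its date column, an index along these lines (the names are hypothetical) is what makes any of the methods below viable:

SQL
-- Hypothetical example: without an index on the sorting column, every paging
-- method still has to sort the whole table before it can skip to the page.
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate)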

TempTable

The first one I'll comment on is the TempTable method. It is a widely proposed solution, and I have encountered it several times. Here is another article that describes it, along with an explanation and a sample of how to use custom paging with the DataGrid:

ASP.NET DataGrid Paging Part 2 - Custom Paging

The methods in both articles could be optimized by copying only the primary key data to the temp table and then joining back to the main query for the remaining fields. The essence of this method would therefore be the following:

SQL
CREATE TABLE #Temp (
    ID int IDENTITY PRIMARY KEY,
    PK /* here goes the PK type */
)

INSERT INTO #Temp (PK) SELECT PK FROM Table ORDER BY SortColumn

SELECT ... FROM Table JOIN #Temp temp ON Table.PK = temp.PK
WHERE temp.ID > @StartRow AND temp.ID < @EndRow
ORDER BY temp.ID
The method can be optimized further by copying rows to the temp table only until the ending row of the requested page is reached (SELECT TOP EndRow ...), but the point remains that in the worst case, a table with 1 million records leaves you with 1 million records in the temp table as well. Considering all this, and having looked at the results in the article above, I decided to discard this method from my tests.
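
For completeness, here is a sketch of that optimization. SET ROWCOUNT is used because SQL Server 2000 does not accept a variable with TOP outside of dynamic SQL.

SQL
-- Sketch of the "copy only up to the ending row" optimization of the TempTable
-- method. SET ROWCOUNT stands in for TOP, which SQL Server 2000 cannot
-- parameterize directly.
SET ROWCOUNT @EndRow                      -- @EndRow = @StartRow + @PageSize
INSERT INTO #Temp (PK)
SELECT PK FROM Table ORDER BY SortColumn
SET ROWCOUNT 0

SELECT ... FROM Table JOIN #Temp temp ON Table.PK = temp.PK
WHERE temp.ID > @StartRow
ORDER BY temp.ID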

Asc-Desc

This method uses the default ordering in a subquery and then applies the reverse ordering on top of it. The principle goes like this:

SQL
DECLARE @temp TABLE (
    PK /* PK type */ NOT NULL PRIMARY KEY
)

INSERT INTO @temp
SELECT TOP @PageSize PK FROM (
    SELECT TOP (@StartRow + @PageSize)
        PK,
        SortColumn /* if the sorting column is different from the PK, SortColumn
                      must be fetched as well; otherwise just the PK is necessary */
    FROM Table
    ORDER BY SortColumn /* default order – typically ASC */
) AS t
ORDER BY SortColumn /* reversed default order – typically DESC */

SELECT ... FROM Table JOIN @temp temp ON Table.PK = temp.PK
ORDER BY SortColumn /* default order */
Full Code – Paging_Asc_Desc
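
To make the reversed-ordering trick more concrete, here is a worked instance against Northwind's Products table, fetching page 3 with a page size of 10 (rows 21–30 by ProductName). It assumes ProductName gives a unique ordering; otherwise the page boundaries can shift between calls.

SQL
-- Worked instance of the Asc-Desc principle (page 3, page size 10,
-- sorted by ProductName ascending). Assumes ProductName is unique.
DECLARE @temp TABLE (ProductID int NOT NULL PRIMARY KEY)

INSERT INTO @temp (ProductID)
SELECT TOP 10 ProductID FROM (
    SELECT TOP 30 ProductID, ProductName   -- rows 1-30 in the requested order
    FROM Products
    ORDER BY ProductName ASC
) AS t
ORDER BY ProductName DESC                  -- keep the last 10 of those: rows 21-30

SELECT Products.ProductID, Products.ProductName, Products.UnitPrice
FROM Products JOIN @temp temp ON Products.ProductID = temp.ProductID
ORDER BY Products.ProductName ASC          -- present the page in the original order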

RowCount

The base logic of this method relies on the SET ROWCOUNT statement, which is used both to skip the unwanted rows and to fetch the desired ones:

SQL
DECLARE @Sort /* the type of the sorting column */
SET ROWCOUNT @StartRow
SELECT @Sort = SortColumn FROM Table ORDER BY SortColumn
SET ROWCOUNT @PageSize
SELECT ... FROM Table WHERE SortColumn >= @Sort ORDER BY SortColumn
Full Code – Paging_RowCount
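
Here is a worked instance of the same principle against Northwind's Products table, fetching rows 21–30 sorted by ProductName. It assumes ProductName values are unique (more on that assumption in the update at the end of the article).

SQL
-- Worked instance of the RowCount principle (rows 21-30 by ProductName).
-- Assumes ProductName is unique.
DECLARE @Sort nvarchar(40)

SET ROWCOUNT 21                           -- stop the scan at the 21st row ...
SELECT @Sort = ProductName FROM Products ORDER BY ProductName
-- ... so @Sort now holds the 21st ProductName, the first value of the page

SET ROWCOUNT 10                           -- fetch the page itself
SELECT ProductID, ProductName, UnitPrice
FROM Products
WHERE ProductName >= @Sort
ORDER BY ProductName

SET ROWCOUNT 0                            -- restore normal behavior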

SubQuery

There are two more methods I've taken into consideration, and they come from different sources. The first one is the well-known triple query, or SubQuery, method. The most thorough treatment of it I've found is in the following article:

Server-Side Paging with SQL Server

Although you'll need to be subscribed, a .zip file with the SubQuery stored procedure variations is available there. The Listing_04.SELECT_WITH_PAGINGStoredProcedure.txt file contains the completely generalized dynamic SQL. I used a similar generalization logic in all the other stored procedures in this text. Here is the principle, followed by the link to the whole procedure (I shortened the original code a bit, because the record-count portion was unnecessary for my testing purposes).

SQL
SELECT ... FROM Table WHERE PK IN 
    (SELECT TOP @PageSize PK FROM Table WHERE PK NOT IN
        (SELECT TOP @StartRow PK FROM Table ORDER BY SortColumn)
    ORDER BY SortColumn)
ORDER BY SortColumn
Full Code – Paging_SubQuery
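
A worked instance of the triple query against Northwind's Customers table, skipping the first 20 rows and fetching the next 10 sorted by CompanyName, looks like this (the TOP values are literals because SQL Server 2000 only accepts variables with TOP inside dynamic SQL):

SQL
-- Worked instance of the SubQuery (triple query) principle:
-- skip 20 rows, return the next 10, sorted by CompanyName.
SELECT CustomerID, CompanyName, Country
FROM Customers
WHERE CustomerID IN
    (SELECT TOP 10 CustomerID FROM Customers
     WHERE CustomerID NOT IN
         (SELECT TOP 20 CustomerID FROM Customers ORDER BY CompanyName)
     ORDER BY CompanyName)
ORDER BY CompanyName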

Cursor

I found the last method while browsing through Google Groups; you can find the original thread here. This method uses a server-side dynamic cursor. A lot of people tend to avoid cursors because of their usually poor performance, which stems from their non-relational, sequential nature. The thing is that paging IS a sequential task, and whatever method you use, you have to reach the starting row somehow. In all the previous methods this is done by selecting all the rows preceding the starting row plus the desired rows, and then discarding the preceding ones. A dynamic cursor has the FETCH RELATIVE option, which does the "magic" jump. The base logic goes like this:

SQL
DECLARE @PK /* PK Type */
DECLARE @tblPK TABLE (
    PK /* PK Type */ NOT NULL PRIMARY KEY
)

DECLARE PagingCursor CURSOR DYNAMIC READ_ONLY FOR
SELECT PK FROM Table ORDER BY SortColumn

OPEN PagingCursor
FETCH RELATIVE @StartRow FROM PagingCursor INTO @PK

WHILE @PageSize > 0 AND @@FETCH_STATUS = 0
BEGIN
    INSERT @tblPK(PK) VALUES(@PK)
    FETCH NEXT FROM PagingCursor INTO @PK
    SET @PageSize = @PageSize - 1
END

CLOSE PagingCursor
DEALLOCATE PagingCursor

SELECT ... FROM Table JOIN @tblPK temp ON Table.PK = temp.PK
ORDER BY SortColumn
Full Code – Paging_Cursor

Generalization of Complex Queries

As pointed out before, all the procedures are generalized with dynamic SQL, so, in theory, they can work with any kind of complex query. Here is a sample complex query that works against the Northwind database:

SQL
SELECT Customers.ContactName AS Customer,
       Customers.Address + ', ' + Customers.City + ', ' +
           Customers.Country AS Address,
       SUM([Order Details].UnitPrice*[Order Details].Quantity) AS
           [Total money spent]
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
WHERE Customers.Country <> 'USA' AND Customers.Country <> 'Mexico'
GROUP BY Customers.ContactName, Customers.Address, Customers.City,
         Customers.Country
HAVING (SUM([Order Details].UnitPrice*[Order Details].Quantity))>1000
ORDER BY Customer DESC, Address DESC
The paging stored procedure call that returns the second page looks like this:
SQL
EXEC ProcedureName
/* Tables */
'Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID',
/* PK */
'Customers.CustomerID',
/* ORDER BY */
'Customers.ContactName DESC, Customers.Address DESC',
/* PageNumber */
2,
/* Page Size */
10,
/* Fields */
'Customers.ContactName AS Customer,
Customers.Address + '', '' + Customers.City + '', '' + Customers.Country AS Address,
SUM([Order Details].UnitPrice*[Order Details].Quantity) AS [Total money spent]',
/* Filter */
'Customers.Country <> ''USA'' AND Customers.Country <> ''Mexico''',
/*Group By*/
'Customers.CustomerID, Customers.ContactName, Customers.Address,
 Customers.City, Customers.Country
HAVING (SUM([Order Details].UnitPrice*[Order Details].Quantity))>1000'

Note that the original query uses aliases in the ORDER BY clause. You can't do that in the paging procedures, because the most time-consuming task in all of them is skipping the rows that precede the starting row. This is done in various ways, but the principle is to not fetch all the required fields at first, only the PK column(s) (or, in the case of the RowCount method, the sorting column), which speeds this task up. The required fields are fetched only for the rows that belong to the requested page. Therefore, the field aliases don't exist until the final query, while the sorting columns have to be used earlier, in the row-skipping queries.

The RowCount procedure has another limitation: it is generalized to work with only one column in the ORDER BY clause. The Asc-Desc and Cursor methods can work with several ordering columns, but they require that the PK consists of a single column. I guess this could be solved with more dynamic SQL, but in my opinion it is not worth the fuss. Although such situations are quite possible, they are not that frequent, and even when they occur you can always write a separate paging procedure following the principles above.

Performance Testing

I used these four methods in my tests; if you have a better one, I'd be glad to know about it. Nevertheless, I wanted to compare these methods and measure their performance. My first thought was to write an ASP.NET test application with a paged DataGrid and then measure the page response. Still, this wouldn't reflect the true response time of the stored procedures, so a console application seemed more appropriate. I also included a web application, not for performance testing, but rather as an example of how DataGrid custom paging works with these stored procedures. Both are included in the PagingTest solution.

I used an auto-generated large table for my tests and inserted around 500,000 records into it. If you don't have a large table to experiment on, you can download the script for the table design and a stored procedure for data generation here. I didn't want an identity column for my PK, so I used a uniqueidentifier instead. If you use this script, you may consider adding an identity column after you generate the table. It will add numbers sorted by the PK, so you have an indication that the correct page is fetched when you call a paging procedure with PK sorting.
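
If you prefer to roll your own test data instead of downloading the script, a sketch along these lines produces the same kind of table (the real script differs in its details; the table and column names here are made up):

SQL
-- Hypothetical test table with a uniqueidentifier PK and a simple row generator.
-- The downloadable script is different; this only illustrates the idea.
CREATE TABLE LargeTable (
    PK       uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    SomeText varchar(50)      NOT NULL,
    SomeDate datetime         NOT NULL
)
GO

DECLARE @i int
SET @i = 0
WHILE @i < 500000
BEGIN
    INSERT INTO LargeTable (SomeText, SomeDate)
    VALUES ('row ' + CAST(@i AS varchar(10)),
            DATEADD(minute, @i, '2004-01-01'))
    SET @i = @i + 1
END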

The idea behind the performance testing was to call a specific stored procedure many times in a loop and then measure the average response time. In order to remove caching effects and to model the real situation more accurately, calling a stored procedure repeatedly with the same page fetched each time seemed inappropriate. Thus, the same stored procedure had to be called in a random sequence with a set of different page numbers. Of course, a set of different page numbers still assumes a fixed number of pages (10 – 20), where each page is fetched many times, but in random order.

It's not hard to notice that response times depend on how far the fetched page is from the beginning of the resultset: the further away the starting record is, the more records need to be skipped. This is why I didn't include the first 20 pages in my random sequence; instead I used the set of 2N pages. The loop count was set to (number of different pages) * 1000, so every page was fetched around 1000 times (more or less, because of the random distribution).
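
The actual measurements were done from the console application, but the same idea can be sketched in plain T-SQL. The procedure call below is a hypothetical stand-in for the real signatures in the download, and the set of page numbers is only an example.

SQL
-- Rough T-SQL-only sketch of the test loop (the real test is a C# console app).
-- Paging_RowCount's parameter list here is a hypothetical stand-in.
DECLARE @i int, @page int, @start datetime
SET @i = 0
SET @start = GETDATE()

WHILE @i < 20000                                      -- 20 distinct pages * 1000 calls
BEGIN
    SET @page = 20 + CAST(RAND() * 20 AS int) * 10    -- a random distant page number
    EXEC Paging_RowCount @PageNumber = @page, @PageSize = 50
    SET @i = @i + 1
END

SELECT DATEDIFF(ms, @start, GETDATE()) / 20000.0 AS AvgResponseMs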

Results

Here are the results I've got - Paging_Results (MS Excel file)

Conclusion

The methods performed in the following order, starting from the best one: RowCount, Cursor, Asc-Desc and SubQuery. The behavior in the lower range of page numbers is especially interesting, because in many real situations users will rarely browse beyond the first five pages, so the SubQuery method might satisfy your needs in those cases. It all depends on the size of your resultset and on how frequently you expect the distant pages to be fetched. You can also combine the methods. As for myself, I decided to use the RowCount method wherever possible; it behaves quite nicely, even for the first page. The "wherever possible" part covers the cases where this method is hard to generalize; there I would use the Cursor method (possibly combined with the SubQuery method for the first couple of pages).

Update 2004-05-05

The main reason I wrote this article was to get feedback from the vast programming community. In a couple of weeks I'll be starting work on a new project. The preliminary analysis showed that a couple of very large tables will be involved. These tables will be used in many complex joined queries, and their results will be displayed in an ASP.NET application (with sorting and paging enabled). That's why I invested some time in researching and pursuing the best paging method. It wasn't just the performance that interested me, but also the usability and maintainability.

The invested time has already started to pay off. You can find a post by C. v. Berkel below (many thanks) in which he points out a flaw in the RowCount method: it won't work correctly if the sorting column is not unique. The RowCount method performed best in my tests, but now I am seriously considering not using it at all, since in most cases sorting columns (besides the PK) won't be unique. This leaves me with the Cursor method as the fastest one applicable to most situations. It can be combined with the SubQuery method for the first couple of pages, and possibly with the RowCount method for unique sorting columns.
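
To see the flaw, consider this tiny made-up example, where the sorting column contains duplicates:

SQL
-- Tiny made-up illustration of the RowCount flaw with a non-unique sort column.
DECLARE @t TABLE (PK int PRIMARY KEY, City varchar(20))
INSERT INTO @t VALUES (1, 'Berlin')
INSERT INTO @t VALUES (2, 'Berlin')
INSERT INTO @t VALUES (3, 'Berlin')
INSERT INTO @t VALUES (4, 'London')

DECLARE @Sort varchar(20)
SET ROWCOUNT 3                                 -- position on the first row of "page 2"
SELECT @Sort = City FROM @t ORDER BY City      -- @Sort = 'Berlin' (the 3rd value)
SET ROWCOUNT 2                                 -- fetch the two rows of "page 2"
SELECT PK, City FROM @t WHERE City >= @Sort ORDER BY City
SET ROWCOUNT 0
-- Expected page 2: the third 'Berlin' row and 'London'. Actually returned:
-- two 'Berlin' rows that overlap page 1, because City >= 'Berlin' starts
-- over at the first 'Berlin' row.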

Another thing worth mentioning is that there is a tiny flaw in the Asc-Desc method as well: it always returns PageSize records for the last page, rather than the actual number of remaining records (which may be lower than PageSize). The correct number can be calculated, but since I don't intend to use this procedure (because of how it performed), I didn't want to improve it any further.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Software Developer, Mono Ltd
Croatia

Comments and Discussions

 
Re: False assertions (the author's reply to pmoldovan, 18-May-04)
Sorry for not responding earlier; this is exactly the kind of feedback I was hoping for.

1. It seems that you are correct on this one; I was misguided, but I do find it strange that a lot of people claim this to be true. I've found this info in many places, both in Google Groups threads and on various websites. Here's a really good article on this issue:

Should I use a #temp table or a @table variable?

A quote from there:
"A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this."

I missed the bolded part when I first read this, but after a bit more effort I managed to find the Microsoft official response at

MS KB 305977

Quote:
"Q4: Are table variables memory-only structures that are assured better performance as compared to temporary or permanent tables, because they are maintained in a database that resides on the physical disk?

A4: A table variable is not a memory-only structure. Because a table variable might hold more data than can fit in memory, it has to have a place on disk to store data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are created and processed while in memory (data cache)."

So, it's clear now and I thank you for this correction. I'll update the article as soon as possible and remove this part as I don't want to misguide more people myself.

2. I know that dynamic SQL isn't optimal at all, and I do tend to avoid it whenever I can. The problem is that in my project I'm faced with dozens of queries which return large resultsets and need to be displayed in an ASP.NET application. If I don't use dynamic SQL, I'll have to write (number of queries) * (number of sorting columns in each query) = number of different paging stored procedures. This could easily mean up to a hundred of them. I don't wish to write all of them manually, even if it is mostly copy/paste work and changing a few parameters, eventually making typos, etc. Performance in this case decreases maintainability and vice versa, though I know that dissecting queries into pieces with dynamic SQL isn't that maintainable either and is rather annoying. I did consider writing a template in the CodeSmith code generation tool. This way I would have a hundred paging procedures in the database, but at least I wouldn't be writing them myself.

Still, I wouldn't discard dynamic SQL that easily. The part where you said that compiling and building the execution plan takes 3-10 times longer than executing a well-designed stored procedure may be true, but SQL Server caches execution plans, and repeated calls to a dynamic SQL proc with the same parameters will be much faster. OK, a cached plan won't last forever because of aging (Books Online: Execution Plan Caching and Reuse), but it won't be compiled every time either. I guess I should have tried to use sp_executesql instead of EXECUTE, which would significantly decrease the number of cached paging plans and the probability of their being removed from the cache because of aging.

3. You have an interesting point here. I know about the VirtualItemCount property, but I thought that calling SELECT COUNT ... (inside or outside the paging proc) was inevitable in this case. Another thing: when I executed SELECT COUNT(ID) on my LargeTable (500,000 rows), it took 9 seconds the first time and a fraction of a second on each following repetition. That's because SQL Server caches data as well as execution plans; the sqlservr.exe process in Task Manager showed a memory usage increase of around 90,000 K. I hoped I could rely on SQL caching, but on second thought I will have to deal with a couple of large tables, and the cache size is limited. Maybe I discarded the temp table method too easily, but I don't think the insertion of 500,000 IDs into a temp table would be that fast, especially because this operation needs to be repeated, since the temp table (i.e., table variable) is created and destroyed on each procedure call. It may seem fast if caching is involved again, but then I can use the other methods as well. The use of a global temp table is out of the question, because it wouldn't reflect the true situation if data changes often (and in my case it will).

This search for the perfect paging solution seems more and more like a quest for the Holy Grail to me. I am very interested in your comments about all this.

Kind Regards,

.....................................................
Jasmin Muharemovic
Mono Ltd
http://www.mono-software.com
.....................................................