Click here to Skip to main content
15,899,124 members
Home / Discussions / Algorithms
   

Algorithms

 
GeneralRe: Threading problem Pin
Anthony Mushrow15-Feb-11 7:49
professionalAnthony Mushrow15-Feb-11 7:49 
GeneralRe: Threading problem Pin
Alan Balkany15-Feb-11 8:07
Alan Balkany15-Feb-11 8:07 
GeneralRe: Threading problem Pin
Anthony Mushrow15-Feb-11 8:21
professionalAnthony Mushrow15-Feb-11 8:21 
GeneralRe: Threading problem Pin
Luc Pattyn15-Feb-11 8:57
sitebuilderLuc Pattyn15-Feb-11 8:57 
GeneralRe: Threading problem Pin
JesperMadsen12321-Feb-11 10:09
JesperMadsen12321-Feb-11 10:09 
AnswerRe: Threading problem Pin
Luc Pattyn15-Feb-11 7:57
sitebuilderLuc Pattyn15-Feb-11 7:57 
GeneralRe: Threading problem Pin
Anthony Mushrow15-Feb-11 8:10
professionalAnthony Mushrow15-Feb-11 8:10 
QuestionMerging sorted data blocks. Pin
Member 419459313-Feb-11 16:59
Member 419459313-Feb-11 16:59 
What is the best (by that I mean the fastest) way to merge some smaller files of sorted data into a larger file, and then later merge the larger files into a single sorted file.

Some particulars:

Input: text files each 6 GB, short records of 500,000,000 10 digit numbers in three formats, sequential ascending, sequential descending, and random selection order, long records with null strings to 65533 byte strings in random order, with many duplicate strings, and then as unique strings where the last 10 digits of each line are replaced with one of the random short strings. A total of 5 different files for 5 different tests.

Memory is allocated with VirtualAlloc (3 GB enabled in boot.ini and in the Link line and 4 GB if memory in the system), memory blocks of 2 GB, 1 GB, and 47 MB, plus 2 more small blocks are allocated.

The first step does the initial sort. The 2GB buffer is used as an input buffer, the 1 GB buffer is used for building mod 16 aligned and sized strings and for the nodes for a RedBlackTree, the 47 MB buffer is used to hold pointers to the strings for the sort tree and for the records moved after the sort. For the short records, this results in about 211 sorted blocks written to a file. This step is complete for now.

The next step combines the 47 MB sorted blocks into a big sort block (2 GB) using the 1 GB buffer for tree nodes and 28 sort blocks are combined into each big block (limited by the node space available - 16 BYTES for each record in the tree), and there are 8 big sort blocks written to a file. This step is complete for now, but could be changed if there is a better merge solution.

The final step needs to merge these big blocks into a single output file, but has to be done in a fragmented mode because there is not enough memory to hold the files. The 2 GB buffer is split into pieces (8 in this case for the 8 big sort blocks). The header from each sort block and the initial pointers from each sort block and the initial associated records from each sort block are read into the 8 pieces of the 2 GB buffer (with an initial header pointing to each big sort block).

Essentially you need to sort the big blocks in ascending sequence by the first record in each big block (how to do this efficiently?), then select and output all the records in the first big block that are lower than the first record in the second big block (how to do this efficiently?), then unlink the first buffer and re-link it with the other set of buffers in ascending sequence (how to do this efficiently?), refilling each block buffer as it exhausts, deleting the block as all of its records are selected and written.

Unfortunately, 6 GB is not necessarily the maximum size file so the 8 big blocks could become thousands so linked lists are not efficient. Arrays of thousands of pointers have insertion timing property problems. Trees are better at insertions than arrays, but have timing problems as well (RedBlackTree recursive rotations, especially with deletions).

Which way is best (overall) for merging, and why?

Dave.
AnswerRe: Merging sorted data blocks. Pin
Luc Pattyn14-Feb-11 1:15
sitebuilderLuc Pattyn14-Feb-11 1:15 
AnswerRe: Merging sorted data blocks. Pin
Member 419459314-Feb-11 4:00
Member 419459314-Feb-11 4:00 
GeneralRe: Merging sorted data blocks. Pin
Stefan_Lang14-Feb-11 23:25
Stefan_Lang14-Feb-11 23:25 
GeneralRe: Merging sorted data blocks. Pin
Member 419459315-Feb-11 4:41
Member 419459315-Feb-11 4:41 
GeneralRe: Merging sorted data blocks. Pin
Stefan_Lang15-Feb-11 5:31
Stefan_Lang15-Feb-11 5:31 
GeneralRe: Merging sorted data blocks. Pin
Member 419459315-Feb-11 7:30
Member 419459315-Feb-11 7:30 
AnswerRe: Merging sorted data blocks. Pin
dasblinkenlight14-Feb-11 22:29
dasblinkenlight14-Feb-11 22:29 
GeneralRe: Merging sorted data blocks. Pin
Member 419459315-Feb-11 4:47
Member 419459315-Feb-11 4:47 
QuestionRe: Merging sorted data blocks. Pin
musefan15-Feb-11 6:22
musefan15-Feb-11 6:22 
AnswerRe: Merging sorted data blocks. Pin
Member 419459315-Feb-11 7:02
Member 419459315-Feb-11 7:02 
AnswerRe: Merging sorted data blocks. Pin
Eddy Vluggen17-Feb-11 7:47
professionalEddy Vluggen17-Feb-11 7:47 
GeneralRe: Merging sorted data blocks. Pin
Member 419459317-Feb-11 15:16
Member 419459317-Feb-11 15:16 
GeneralRe: Merging sorted data blocks. Pin
Eddy Vluggen18-Feb-11 11:29
professionalEddy Vluggen18-Feb-11 11:29 
GeneralRe: Merging sorted data blocks. Pin
Member 419459319-Feb-11 16:15
Member 419459319-Feb-11 16:15 
GeneralRe: Merging sorted data blocks. Pin
Eddy Vluggen20-Feb-11 1:29
professionalEddy Vluggen20-Feb-11 1:29 
GeneralRe: Merging sorted data blocks. Pin
Member 419459322-Feb-11 12:11
Member 419459322-Feb-11 12:11 
QuestionIn what mathematical/programming tasks are used sparse graphs? Pin
Marsellos5-Feb-11 3:06
Marsellos5-Feb-11 3:06 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.