Click here to Skip to main content
15,895,011 members
Home / Discussions / Algorithms
   

Algorithms

 
GeneralRe: External sorting: Which algorithm to select Pin
lizardking3d5-Oct-08 20:48
lizardking3d5-Oct-08 20:48 
GeneralRe: External sorting: Which algorithm to select Pin
Alan Balkany6-Oct-08 3:29
Alan Balkany6-Oct-08 3:29 
GeneralRe: External sorting: Which algorithm to select Pin
lizardking3d6-Oct-08 4:08
lizardking3d6-Oct-08 4:08 
GeneralRe: External sorting: Which algorithm to select Pin
Alan Balkany6-Oct-08 4:22
Alan Balkany6-Oct-08 4:22 
GeneralRe: External sorting: Which algorithm to select Pin
Mark Churchill6-Oct-08 5:23
Mark Churchill6-Oct-08 5:23 
GeneralRe: External sorting: Which algorithm to select Pin
supercat921-Oct-08 12:49
supercat921-Oct-08 12:49 
GeneralRe: External sorting: Which algorithm to select Pin
lizardking3d22-Oct-08 1:57
lizardking3d22-Oct-08 1:57 
GeneralRe: External sorting: Which algorithm to select Pin
supercat922-Oct-08 7:38
supercat922-Oct-08 7:38 
How many records are there, how big are the keys, and how big are the records? Do you have one, two, or more disk drives available for processing?

If the keys are small enough (and there are few enough of them) that they can all fit into memory, you should start by sorting the keys (each one accompanied by an integer giving the location in the original file). Then proceed as I described.

If the number of records so large that e.g. only 10% of the keys will fit into memory, then I would suggest that you come up with some means of partitioning the keys into, say, 65,536 buckets. It doesn't matter whether the distribution is particularly even, provided that no single bucket holds more than 10%, and preferably no more than 2% or so. Make a pass through the file and count how many keys fit into each bucket.

Once that is done, count how many buckets one could add, starting at the bottom, before they totaled 10% of the records. Make a pass through the original file reading into RAM all the records that fit into those buckets. Then sort them in RAM and write them out. Then repeat the procedure, starting the the bucket after the last one that was used in the first pass. Then sort those and write those out.

The exact procedure you should use will vary depending upon what your data looks like and the number of separate disks available. Nonetheless, the key observations are (1) it's often good to sort with records containing just key and a reference to the original record, since the number of such records that can fit in RAM is larger than the number of full records that would fit; (2) it's better to think in terms of reading through a whole file, fetching some data into RAM and ignoring other data, than to think in terms of grabbing lots of little pieces of data scattered through a file; (3) though I haven't touched on this much, for really big jobs, having two or three hard drives will help things a lot.
NewsHuge new prime number discovered Pin
73Zeppelin27-Sep-08 22:41
73Zeppelin27-Sep-08 22:41 
GeneralRe: Huge new prime number discovered Pin
Bassam Abdul-Baki28-Sep-08 15:16
professionalBassam Abdul-Baki28-Sep-08 15:16 
JokeRe: Huge new prime number discovered Pin
Nelek14-Oct-08 0:52
protectorNelek14-Oct-08 0:52 
QuestionScapegoat Tree Issue Pin
loctarar25-Sep-08 2:47
loctarar25-Sep-08 2:47 
Generalsuggest me an algorithm Pin
niconicx22-Sep-08 23:08
niconicx22-Sep-08 23:08 
GeneralRe: suggest me an algorithm Pin
73Zeppelin22-Sep-08 23:14
73Zeppelin22-Sep-08 23:14 
GeneralRe: suggest me an algorithm Pin
Paul Conrad25-Sep-08 7:13
professionalPaul Conrad25-Sep-08 7:13 
GeneralRe: suggest me an algorithm Pin
CPallini27-Sep-08 22:07
mveCPallini27-Sep-08 22:07 
GeneralRe: suggest me an algorithm Pin
PIEBALDconsult21-Nov-08 5:29
mvePIEBALDconsult21-Nov-08 5:29 
GeneralRe: suggest me an algorithm Pin
CPallini22-Nov-08 21:00
mveCPallini22-Nov-08 21:00 
QuestionI'm sorry if this sounds like a weird question... Pin
mhtirogla18-Sep-08 20:53
mhtirogla18-Sep-08 20:53 
AnswerRe: I'm sorry if this sounds like a weird question... Pin
73Zeppelin18-Sep-08 21:02
73Zeppelin18-Sep-08 21:02 
QuestionInterpX function limitation (within Interp32.xll Excel addin) [modified] Pin
M.Slipper18-Sep-08 7:04
M.Slipper18-Sep-08 7:04 
AnswerRe: InterpX function limitation (within Interp32.xll Excel addin) Pin
73Zeppelin18-Sep-08 12:10
73Zeppelin18-Sep-08 12:10 
GeneralRe: InterpX function limitation (within Interp32.xll Excel addin) Pin
M.Slipper18-Sep-08 14:49
M.Slipper18-Sep-08 14:49 
GeneralRe: InterpX function limitation (within Interp32.xll Excel addin) Pin
73Zeppelin18-Sep-08 20:51
73Zeppelin18-Sep-08 20:51 
GeneralRe: InterpX function limitation (within Interp32.xll Excel addin) Pin
M.Slipper19-Sep-08 7:58
M.Slipper19-Sep-08 7:58 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.