Algorithms

Monte carlo Rabin_Karp Search

17-Oct-08 3:54

The problem is starting to get interesting. For some participant sets there may be no perfect solution.

A characteristic of this problem is that good solutions to the whole problem will tend to be composed of good solutions to subproblems (e.g. with same-color participants matched during certain rounds). This characteristic suggests two promising approaches: 1. Dynamic Programming and 2. Genetic Algorithms.

Dynamic Programming builds up optimal solutions for small numbers of participants, combining them to construct optimal solutions for greater numbers of participants. Genetic Algorithms take a set of complete solutions, rank them, and combine the best ones to (hopefully) make better ones.

A third approach (which may be best if you can figure out how to implement it) is to take a decent solution, then transform it one step at a time into progressively better solutions. For example, order the participants so that matching colors mostly meet during the appropriate rounds. Then, for the participants that DON'T match during these rounds, swap partners so that they DO match. The challenge here is to make other corrections to compensate for this disruption to the pairing system.

Angelinna 2-Oct-08 14:17

Where can I find sample code showing an implementation of the Monte Carlo Rabin-Karp search?
Thanks

Robert.C.Cartaino 2-Oct-08 16:23

From this website.

Algorithm 9.2.8 Monte Carlo Rabin-Karp Search
This algorithm searches for occurrences of a pattern p in a text t. It prints out a list of indexes such that, with high probability, t[i..i+m−1] = p for every index i on the list.

Input Parameters: p, t
Output Parameters: None

mc_rabin_karp_search(p, t) 
{
      m = p.length
      n = t.length
      q = randomly chosen prime number less than m * n^2
      r = 2^(m−1) mod q

      // computation of initial remainders
      f[0] = 0
      pfinger = 0
      for j = 0 to m-1 
      {
            f[0] = (2 * f[0] + t[j]) mod q
            pfinger = (2 * pfinger + p[j]) mod q
      }

      i = 0
      while (i + m ≤ n) 
      {
            if (f[i] == pfinger)
                  println("Match at position " + i)

            // t[i + m] exists only while i + m < n
            if (i + m < n)
                  f[i + 1] = (2 * (f[i] - r * t[i]) + t[i + m]) mod q
            i = i + 1
      }
}
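For anyone who wants something runnable, here is a minimal Python sketch of the pseudocode above. It rests on a few assumptions that are not in the quoted algorithm: characters are turned into integers with `ord()`, the random prime is found by simple trial division (fine for short inputs only), a single rolling fingerprint `f` replaces the array `f[...]`, and the function returns the indexes instead of printing them. The helper names `is_prime` and `random_prime_below` are made up for this illustration.

```python
import random


def is_prime(k):
    """Trial division; adequate for the small bounds used in this sketch."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def random_prime_below(bound):
    """Pick a random prime q with 2 <= q < bound (hypothetical helper)."""
    while True:
        q = random.randrange(2, max(bound, 3))
        if is_prime(q):
            return q


def mc_rabin_karp_search(p, t):
    """Return indexes i where t[i:i+m] == p with high probability."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    q = random_prime_below(m * n * n)
    r = pow(2, m - 1, q)  # 2^(m-1) mod q

    # Fingerprints of the pattern and of the first window of the text.
    f = 0
    pfinger = 0
    for j in range(m):
        f = (2 * f + ord(t[j])) % q
        pfinger = (2 * pfinger + ord(p[j])) % q

    matches = []
    for i in range(n - m + 1):
        if f == pfinger:
            matches.append(i)  # Monte Carlo: not checked character by character
        if i + m < n:
            # Slide the window: drop t[i], append t[i + m].
            f = (2 * (f - r * ord(t[i])) + ord(t[i + m])) % q
    return matches


print(mc_rabin_karp_search("aba", "abababa"))
```

Every real occurrence is always reported, but an unlucky choice of q can add false positives. Comparing `t[i:i+m]` with `p` for each reported index removes them at the cost of extra work.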

Angelinna 2-Oct-08 17:07

Thanks, but am after a working example not just a pseudocode.

Sign of the times...

CPallini 2-Oct-08 21:52

Angelinna wrote:
but am after a working example not just a pseudocode

plz gimme codez (urgent?) :sigh:

If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.

This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
-- Iain Clarke


Paul Conrad 3-Oct-08 6:28

Angelinna wrote:
am after a working example not just a pseudocode

Why can't you take the pseudocode and implement it in whatever language you are programming in?

"The clue train passed his station without stopping." - John Simmons / outlaw programmer

"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago." - Rob Graham

Array Rearrangement trick [modified]

Tim Craig 3-Oct-08 18:34

Because "she" comes in here and expects guys to fall all over her doing her homework for her. :doh:

If you don't have the data, you're just another a**hole with an opinion.

abhigad 1-Oct-08 9:53

Let’s say we have an array of integers

int[] myArray = new int[] {1,2,3,4,5};

~~So the length of this array is 4 [i.e. n=4] since C# array index starts at 0~~

Yes, the length is 5 and not 4, as pointed out in the next post. My bad, sorry!

Define an integer k such that 0 <= k < n [n = length of the array].

For example, if k = 2, the output should be
{3,4,5,1,2}, i.e. the elements from the kth position onward move to the front of the array.

If k = 3, output would be
{4,5,1,2,3}

Here is the challenge.
This is trivial if we write a loop that starts at 0 and goes up to n, like

for (int i = 0; i < n; i++) {}

We want to optimize this loop so that it does not run all the way to n-1. Anything fewer than n-1 iterations is a good solution.

[Tip: if you want to reverse this array to get 5,4,3,2,1, you can use a loop like
for (int i = 0; i < n/2; i++)]
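The reversal tip suggests one well-known way to do the rearrangement in place: reverse the first k elements, reverse the remaining n - k elements, then reverse the whole array. Below is a minimal Python sketch of that idea. The function names are made up for illustration, and it is only a guess that this is the trick the poster has in mind; the three loops together perform at most n swaps, so whether it meets the stated iteration budget depends on how iterations are counted.

```python
def reverse_range(a, lo, hi):
    """Reverse a[lo..hi] in place using (hi - lo + 1) // 2 swaps."""
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def rotate_left(a, k):
    """Move the elements from position k onward to the front, in place."""
    n = len(a)
    if n == 0:
        return a
    k %= n
    reverse_range(a, 0, k - 1)  # reverse the first k elements
    reverse_range(a, k, n - 1)  # reverse the remaining n - k elements
    reverse_range(a, 0, n - 1)  # reverse the whole list
    return a


print(rotate_left([1, 2, 3, 4, 5], 2))  # [3, 4, 5, 1, 2]
print(rotate_left([1, 2, 3, 4, 5], 3))  # [4, 5, 1, 2, 3]
```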

modified on Wednesday, October 1, 2008 5:23 PM

Re: Array Rearrangement trick

CPallini 1-Oct-08 10:16

abhigad wrote:
So the length of this array is 4 [i.e. n=4] since C# array index starts at 0

The length of the array is 5, regardless of whether it is 0-based or 1-based.
:)

Re: Array Rearrangement trick

Alan Balkany 2-Oct-08 3:42

Your question is unclear.

Mark Churchill 2-Oct-08 5:16

So you mean:

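int[] input = new int[] {...};
int[] output = new int[input.Length];
// elements k..n-1 go to the front of the output
Array.Copy(input, k, output, 0, input.Length - k);
// elements 0..k-1 go to the end of the output
Array.Copy(input, 0, output, input.Length - k, k);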

Or is this some school assignment where you have to shuffle in-place?

Mark Churchill
Director, Dunn & Churchill Pty Ltd
Free Download: Diamond Binding: The simple, powerful, reliable, and effective data layer toolkit for Visual Studio.

Alpha release: Entanglar: Transparant multiplayer framework for .Net games.

Digit combination string [modified]

z33z 30-Sep-08 21:25

Hi!
I want to construct a list of numbers which covers all combinations of those numbers.

Say, for example, that I want all combinations of the numbers 1 2 3 4 with a length of four (or, say, the numbers 1 to 9, still with a length of four, or cases where I know that '2' must be in the combination, i.e. 2xxx, x2xx, xx2x or xxx2), like 1234, 1324 etc., but as one sequential string, e.g. 1234232 etc., where every new digit produces a new combination (in this case the string tests 1234, 2342, 3423, 4232).

How can I construct an algorithm to find the shortest possible string covering all combinations? I think it's called an Euler path, but I'm not sure. I did some googling.

Can anyone push me in the right direction? Maybe with an implementation as well?
Thanks in advance!
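If repeated digits are allowed (the sample windows 2342 and 4232 suggest they are), the shortest such string is a de Bruijn sequence, which corresponds to an Eulerian cycle in a graph whose edges are the length-n combinations. The Python sketch below uses the standard Lyndon-word construction; the function names are made up, and the restricted case where a particular digit must appear is not covered.

```python
def de_bruijn(alphabet, n):
    """Cyclic sequence containing every length-n string over alphabet once."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def covering_string(alphabet, n):
    """Unroll the cycle so that every window also appears in a plain string."""
    cycle = de_bruijn(alphabet, n)
    return cycle + cycle[:n - 1]


s = covering_string("1234", 4)
windows = {s[i:i + 4] for i in range(len(s) - 3)}
print(len(s), len(windows))
```

For an alphabet of k symbols and length n the string has k^n + n - 1 characters, which is the minimum, since each new character can complete at most one new combination.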

modified on Wednesday, October 1, 2008 4:41 AM

External sorting: Which algorithm to select

lizardking3d 29-Sep-08 1:45

I have a simple question: what is the best algorithm for external sorting?
The external merge sort, the external distribution sort, or would you recommend a completely different sorting algorithm?

Background:
I have to write an external sort program in C#, but first I have to decide which approach to choose.
Files larger than 2 GB have to be sorted as fast as possible, and no internal algorithm is capable of handling such large files.

Alan Balkany 1-Oct-08 3:36

Merge sort is the classic way of sorting data that's too large to fit into memory, but shortcuts are possible.

One suggestion: Just read the keys of your records into memory, each paired with the record number it occurs in, e.g.:

(1, key1), (2, key2), ..., (n, keyn).

Then sort these pairs on the keys. When these pairs are sorted, the record numbers will have the sorted order, e.g. the first record number in the sorted pairs will be the record with the lowest key, etc...

Then assuming you can fit k records into memory, read in the k lowest records, write them to your output file, read the next k lowest records, append them to your output file and so on.

When done, your output file will have the sorted data.
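As a rough illustration of the suggestion above, here is a minimal Python sketch. It assumes newline-terminated text records, stores a byte offset instead of a record number (so variable-length lines can be located), and takes a caller-supplied `key_of` function; all of these names and choices are assumptions for the example, not part of the original suggestion.

```python
def sort_file_by_key_index(src_path, dst_path, key_of, k):
    """Sort src_path into dst_path while holding only keys and offsets in memory."""
    # Pass 1: keep only (key, byte offset) per record in memory.
    index = []
    with open(src_path, "rb") as src:
        while True:
            offset = src.tell()
            line = src.readline()
            if not line:
                break
            index.append((key_of(line), offset))
    index.sort()

    # Pass 2: fetch records in sorted order, k at a time, and append them.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for start in range(0, len(index), k):
            batch = []
            for _, offset in index[start:start + k]:
                src.seek(offset)
                batch.append(src.readline())
            dst.writelines(batch)
```

Note that the second pass issues one seek per record, which can dominate the running time on a spinning disk.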

lizardking3d 5-Oct-08 20:48

I tried this approach, but it seems to be very slow because of the many hard disk seeks.

Sorting the (key, line number) pairs is rather fast, but the subsequent building of the output file takes quite a while. To get one specific line, I compute the absolute address of the line and issue a seek command.

I also tried the direct approach: holding (key, complete line) pairs is about two times faster. Fewer pairs fit into main memory, but no repeated seeking is necessary.

The results also surprised me. Further ideas are welcome :)

Alan Balkany 6-Oct-08 3:29

A further idea: Get a 64-bit machine, which will allow more than 4 gigs of memory, and do a normal sort.

If this must run on a 32-bit machine, I could provide a program that could read the whole file into memory compressed, sort it in its compressed form, and write out the sorted file, uncompressed. It would take some work so I'd have to charge for it however. Also the amount of compression would depend on characteristics of your data; random numbers wouldn't compress, but real-world data would probably compress to about 10% of the original size.

lizardking3d 6-Oct-08 4:08

64-bit: Would be nice, but my clients won't change their complete IT infrastructure.

What type of compression do you recommend?
I tried zip and gzip at the lowest compression level, but they slowed down the overall process too much.
Would simple run-length compression be sufficient?

Alan Balkany 6-Oct-08 4:22

Run-length compression would help if you have long sequences of the same symbol in your data.

I have a compression technique that gives more compression than currently available commercial products. If you give me an email address I'll send you a link that describes it.

Mark Churchill 6-Oct-08 5:23

If your keys fit in memory fine, and sort fine, then that is as fast as it's going to get. If the seek to pull data off the hard disk by key is an issue, then you need more RAM.

Think of it this way. You have the sorted keys, so your only challenge is to pull records off disk. That's going to be slow. I'd start to look at why you have a 2 GB flat file in the first place, and why it needs to be sorted so quickly.

supercat9 21-Oct-08 12:49

Did you understand Mr. Balkany's suggestion? If you can fit 1/16 of the data in RAM along with all the indices, then you should be able to process everything (after the sort) in 16 passes through the data file, with no random seeks.

Suppose, for example, that there are 16,000,000 records and you can hold an array recBuff of 1,000,000 records in RAM along with an array finalPos of 16,000,000 integers. First, fill in finalPos such that finalPos(0) says where record #0 in the original file should go; finalPos(1) says where record #1 should go, etc. This can be done in linear time.

Next, read through the entire source file; after reading record #n from the file, look at finalPos(n). If it's less than 1,000,000 then store the record in recBuff(finalPos(n)). Otherwise discard it. Once this is done, recBuff(0..999999) will hold the first million records. Write them to disk.

Now read through the source file again. This time, look for records where finalPos(n) is in the range 1,000,000 to 1,999,999 and store those records in recBuff(finalPos(n)-1000000). Once all records have been read, recBuff will hold the next million records. Write those to disk.

If recBuff and finalPos fit in RAM without swapping, the program should run very fast. Doubling the number of items in recBuff will double the speed, if it does not cause swapping. If it does cause swapping, it will dog the performance.

If there are so many records that the finalPos array itself takes an excessive amount of space, a temp file could be created which interleaves the source data with the finalPos items (since finalPos is always read in order). That would free up more space for recBuff.
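The multi-pass scheme described above could look roughly like the Python sketch below. It assumes newline-terminated text records and a caller-supplied `key_of` function, obtains `final_pos` by ranking the keys with a stable sort, and uses snake_case versions of the names `recBuff` and `finalPos`; these are illustrative choices rather than the poster's actual code.

```python
def multipass_sort(src_path, dst_path, key_of, buf_size):
    """Sort a file in ceil(n / buf_size) sequential passes, with no seeks."""
    # Read the keys once and rank them; final_pos[i] is where record i goes.
    with open(src_path, "rb") as src:
        keys = [key_of(line) for line in src]
    order = sorted(range(len(keys)), key=keys.__getitem__)
    final_pos = [0] * len(keys)
    for rank, rec in enumerate(order):
        final_pos[rec] = rank

    with open(dst_path, "wb") as dst:
        for base in range(0, len(keys), buf_size):
            rec_buff = [None] * min(buf_size, len(keys) - base)
            # One sequential pass over the source per buffer-full of output.
            with open(src_path, "rb") as src:
                for rec, line in enumerate(src):
                    slot = final_pos[rec] - base
                    if 0 <= slot < len(rec_buff):
                        rec_buff[slot] = line
            dst.writelines(rec_buff)
```

The comparisons happen only while ranking the keys; the passes over the data file just place each record at a position that is already known.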

lizardking3d 22-Oct-08 1:57

First of all, I have to ask whether this is the algorithm called "bucket sort" / "radix sort", because you didn't mention any comparison operations.

The point I don't get is how a record is assigned to the current recBuff range.

I will demonstrate my lack of understanding with an example.

Given an unsorted array: 5,3,2,9,1,8,2,4,7,2,6. Let's assume that my internal memory can only hold 3 values ;)

First, I read the entire array, and my goal is to place 1,2,2 into the first recBuff(0..2). This is what I don't understand: how can I know that "3" belongs to the second recBuff(3..5)?
There are three instances of "2" and so the array is not uniformly distributed.
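One way to read the scheme against this example is that a record's buffer is decided by its rank among the sorted keys, not by its value, so duplicates and uneven distributions do not matter. The Python sketch below works that out for the array above; `plan_passes` is a made-up helper, and breaking ties by original order (a stable sort) is an assumption.

```python
def plan_passes(values, buf_size):
    """Return (final_pos, buffers) for the multi-pass placement scheme."""
    # Rank the records by key; Python's sort is stable, so equal keys
    # keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    final_pos = [0] * len(values)
    for rank, rec in enumerate(order):
        final_pos[rec] = rank

    n = len(values)
    buffers = [[None] * min(buf_size, n - base) for base in range(0, n, buf_size)]
    for rec, pos in enumerate(final_pos):
        buffers[pos // buf_size][pos % buf_size] = values[rec]
    return final_pos, buffers


final_pos, buffers = plan_passes([5, 3, 2, 9, 1, 8, 2, 4, 7, 2, 6], 3)
print(final_pos)  # [6, 4, 1, 10, 0, 9, 2, 5, 8, 3, 7]
print(buffers)    # [[1, 2, 2], [2, 3, 4], [5, 6, 7], [8, 9]]
```

Here the third "2" has rank 3, so it lands in the second buffer together with 3 and 4. The comparisons all happen while ranking the keys, which is an ordinary comparison sort rather than a bucket or radix sort.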