Algorithms

Re: Algorithm to find a given number is prime or not.

6-Sep-14 1:21

Hi friends, I need help! I am an absolute beginner and need advice. The algorithm below is an extract from a textbook, but when I try to apply it and solve the problem on paper, I see that it fails: I get a remainder of 0 for every number I key in up to 8 (I took the number 8 as an example and applied the steps below). Please advise. I also found that applying this algorithm to the number 2 results in 0 as well, and if the result is 0 the number is reported as not prime. So how can this algorithm be correct?

1 start
2 read the number num
3 [initialize]
i <--2, flag <--1
4 repeat steps 4 through 6 until i

modified 6-Sep-14 8:03am.

harold aptroot 6-Sep-14 2:32

Re: Algorithm to find a given number is prime or not.

Did you type it over correctly? Steps 5 and 6 are obviously missing, and "until i" is not a condition.

Member 11063123 6-Sep-14 4:01

Re: Algorithm to find a given number is prime or not.

4th step: repeat steps 4 through 6 until i < num or flag = 0
5) rem <-- num mod i
6) if rem = 0 then flag <-- 0 else i <-- i + 1
7) if flag = 0 then print number is not prime else print number is prime
8) stop
In this step, if I use the number 2 as an example, the remainder is 0, which results in the number being reported as non-prime.

-- modified 6-Sep-14 10:12am.

harold aptroot 6-Sep-14 4:10

Re: Algorithm to find a given number is prime or not.

Ok, now it makes more sense. The problem here is that 2 is a special case, and they did not handle it. Simply add a step: if the number is two, it is prime.
So in short, yes, the book is wrong and you are right.

edit: ok now it makes less sense, why are most of the steps gone again? What I wrote above should still apply though.

Member 11063123 6-Sep-14 4:14

Re: Algorithm to find a given number is prime or not.

Thank you! I was editing so the steps could be read properly. At least I know that I was not wrong. I appreciate your help, and it gives me confidence that people online can help me with my problems while I am learning.

Member 11063123 6-Sep-14 4:18

Re: Algorithm to find a given number is prime or not.

Also, I found that with this approach, if I use the number 99, the remainder shows as 1, which would say it is a prime number, but again, 99 is not a prime number.

harold aptroot 6-Sep-14 4:49

Re: Algorithm to find a given number is prime or not.

What step is that at? Obviously you'd get a remainder of 1 when dividing by 2, but then in the next step you'd find that the remainder when dividing by 3 is 0, and therefore it's not a prime.

Member 11063123 6-Sep-14 6:33

Re: Algorithm to find a given number is prime or not.

Won't the remainder be 0 for 2, when you divide 2 by 2?

harold aptroot 6-Sep-14 7:22

Re: Algorithm to find a given number is prime or not.

Sure, but you're not testing 2, you're testing 99, right? The remainder of 99 / 2 is 1, but the remainder of 99 / 3 is 0.
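A small trace makes this concrete. The helper below (the name `first_divisor` is hypothetical, not from the book) performs the same num mod i divisions as steps 5 and 6, prints each remainder, and stops at the first remainder of 0. It assumes num >= 2.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: divides num by i = 2, 3, 4, ... up to num - 1,
// printing each remainder, and returns the first i that leaves remainder 0.
// Returns 0 when no such i exists, i.e. num is prime (for num >= 2).
int first_divisor(int num)
{
    assert(num >= 2);
    for (int i = 2; i < num; ++i)
    {
        int rem = num % i;
        std::printf("%d mod %d = %d\n", num, i, rem);
        if (rem == 0)
            return i;   // this is where the book's flag would become 0
    }
    return 0;
}
```

Calling `first_divisor(99)` prints `99 mod 2 = 1` and `99 mod 3 = 0`, then returns 3: the remainder of 1 at i = 2 only means "keep going", and the 0 at i = 3 is what marks 99 as not prime.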

Member 11063123 6-Sep-14 7:53

Re: Algorithm to find a given number is prime or not.

So, assuming the number 2 is not properly handled in these steps: if we continue through the sequence and find another number that divides it with a remainder of 0 (which we should then consider correct), then 99 would not show up as prime when the algorithm runs. Is that correct?

Member 11063123 7-Sep-14 6:23

Re: Algorithm to find a given number is prime or not.

I found the solution: the flag value can be either 1 or 0 in the course of this program. Flag is just a variable, and it is tested in step 7 to determine whether the number is prime. The flag is initially set to 1 (during initialization). It may or may not change at step 6, depending on whether rem = 0. The value of rem is 0 when the calculation at step 5 produces no remainder. As you know, by computing num mod i we divide num by the current value of i and check the remainder. If the remainder is 0, num is divisible by i, so num cannot be prime. Whenever we find that the remainder is 0, we set flag = 0, which means the number is not prime.

Member 11063123 7-Sep-14 6:23

Re: Algorithm to find a given number is prime or not.

The algorithm in plain English is:
(0) Initially set i = 2 and flag = 1.
(1) Take a number (num) which we want to test for being prime.
(2) Then we start dividing num successively by i = 2, 3, 4, ... up to (num - 1) and check the remainder. At each iteration we increase i by 1.
(3) At any stage, if we find that num is divisible by i, then num is not prime: set flag = 0.
(4) If we reach i = (num - 1) and the remainder has never been 0 at any stage, then the flag value remains unchanged at 1 and the number is prime.

We test the flag value at the end of the program to decide whether num is prime or not.
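The plain-English steps map onto a short loop. This is a sketch, assuming num >= 2 (the steps do not cover 0 and 1), and the function name `is_prime_plain` is illustrative. Unlike the book's "repeat ... until", it checks the loop condition before the first division.

```cpp
#include <cassert>

// Steps (0)-(4) with a pre-test loop: i runs from 2 to num - 1.
// Assumes num >= 2; 0 and 1 are outside what the steps describe.
bool is_prime_plain(int num)
{
    assert(num >= 2);
    int flag = 1;                          // (0) assume prime until a divisor is found
    for (int i = 2; i <= num - 1; ++i)     // (2) divide by 2, 3, 4, ... num - 1
    {
        if (num % i == 0)                  // (3) remainder 0: num is divisible by i
        {
            flag = 0;
            break;                         // one divisor is enough
        }
    }
    return flag == 1;                      // (4) flag still 1: num is prime
}
```

Because the condition is tested first, `is_prime_plain(2)` runs zero iterations and returns true, so this form needs no special case for 2. A common refinement, not part of the steps above, is to stop once `i * i > num`, since any divisor larger than the square root pairs with one smaller than it.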

Richard MacCutchan 6-Sep-14 5:06

See http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
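When many numbers need testing, the sieve crosses out multiples instead of dividing each candidate. A minimal C++ sketch (the function name `sieve` is illustrative) that marks every prime up to `limit`:

```cpp
#include <cassert>
#include <vector>

// Sieve of Eratosthenes: is_prime[k] is true exactly when k is prime, 0 <= k <= limit.
std::vector<bool> sieve(int limit)
{
    assert(limit >= 0);
    std::vector<bool> is_prime(limit + 1, true);
    is_prime[0] = false;
    if (limit >= 1) is_prime[1] = false;
    for (int p = 2; p * p <= limit; ++p)
        if (is_prime[p])
            for (int m = p * p; m <= limit; m += p)
                is_prime[m] = false;   // cross out multiples of p
    return is_prime;
}
```

Crossing out starts at p * p because smaller multiples of p were already removed by smaller primes.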

Re: Algorithm to find a given number is prime or not.

PIEBALDconsult 14-Nov-14 3:37

As to the missing code, you need to use the Encode button so your angle brackets don't get interpreted as XML/HTML.

1 start
2 read the number num
3 [initialize]
    i <-- 2, flag <-- 1
4 repeat steps 4 through 6 until i < num or flag = 0
5     rem <-- num mod i
6     if rem = 0 then
          flag <-- 0
      else
          i <-- i + 1
7 if flag = 0 then
      print number is not prime
  else
      print number is prime
8 stop
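A sketch of these eight steps in C++, under two stated assumptions: the loop is meant to continue while i < num and flag = 1 (the literal "until i < num" would stop after the first pass), and "repeat ... until" is a post-test loop, so it is written as a `do`/`while`. With that reading the body always runs at least once, which is exactly why 2 comes out as "not prime"; the `num == 2` check is the extra step harold suggested. The `num < 2` guard and the name `book_is_prime` are additions, not from the book.

```cpp
#include <cassert>

// Steps 1-8 of the textbook algorithm (hypothetical function name).
// Assumed reading of step 4: keep repeating steps 5-6 until i >= num or
// flag == 0. "repeat ... until" tests at the end, hence do/while.
bool book_is_prime(int num)
{
    if (num < 2) return false;          // extra guard, not in the book: 0 and 1 are not prime
    if (num == 2) return true;          // the added step: 2 is a special case

    int i = 2, flag = 1;                // step 3: initialize
    do                                  // step 4: body always runs at least once
    {
        int rem = num % i;              // step 5: remainder of num / i
        if (rem == 0)                   // step 6: divisor found
            flag = 0;
        else
            i = i + 1;
    } while (i < num && flag == 1);

    return flag == 1;                   // step 7: flag 0 means "not prime"
}
```

Without the `num == 2` line, `book_is_prime(2)` would divide 2 by 2, get remainder 0 and return false, reproducing the problem described at the start of the thread. Steps 2 and 7 (reading num and printing the result) would live in a `main` that calls this function.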

How to describe method: SQL for frequent pattern discovery

pseudogrammaton 4-Sep-14 18:01

I've prototyped a way to do pattern discovery using SQL, but I still have a poor understanding of where this method fits in the data mining vernacular.

Being set-based, I'm not building a tree, although a functional tree does arise in the result set.

The steps:

1) Build a "lookup" table by doing a cross join, yielding a combinatorial "dictionary" (a rainbow table) of n-gram "words."

2) Get COUNT(*) > n using SQL GROUP BY, matched against a large table of items.

3) Further look for equivalent longer self-matches within the result set.
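For comparison, the counting step can be sketched outside the database. The C++ below is not the author's SQL: it scans the series once, counts every contiguous 3-gram in a `std::map` (playing the role of GROUP BY), and keeps those seen more than `min_count` times (the COUNT(*) > n filter). It assumes every observed trigram would also appear in the cross-joined dictionary, so it skips building the dictionary; `frequent_trigrams`, `Trigram` and `min_count` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Trigram = std::array<int, 3>;

// In-memory analogue (not the author's SQL): count every contiguous 3-gram
// in `series` and keep those seen more than `min_count` times.
std::map<Trigram, int> frequent_trigrams(const std::vector<int>& series, int min_count)
{
    assert(min_count >= 0);
    std::map<Trigram, int> counts;                       // plays the role of GROUP BY
    for (std::size_t k = 0; k + 2 < series.size(); ++k)
        ++counts[Trigram{series[k], series[k + 1], series[k + 2]}];

    for (auto it = counts.begin(); it != counts.end();)  // HAVING COUNT(*) > min_count
    {
        if (it->second > min_count)
            ++it;
        else
            it = counts.erase(it);
    }
    return counts;
}
```

For the series 1, 2, 3, 1, 2, 3, 1, 2, 3 the trigram (1, 2, 3) occurs three times and (2, 3, 1) and (3, 1, 2) twice each, so a threshold of 2 keeps only (1, 2, 3).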

My seed table is 177 items, allowed to cross-join itself 3x, for a final table of 2.8 million 3-gram words (it takes about 35 seconds to build this table in Postgres).

The actual itemset table is 10 million rows in series (serially numbered), although the actual number of itemsets might be considered smaller.

I've recorded 35** seconds on the join between the two original tables (that's the dictionary joined to the itemsets), yielding all the simple repeating 3-grams meeting GROUP BY's COUNT(*) > x.

That's the Q&D discovery step, and then subsequent steps simply apply a self-join for longer-chained repeating series. These have been pretty quick, in the 50 millisecond range.

My questions are:

1) What's the best way to describe this algorithm? Frequent pattern? Motif?

2) It's a simple enough method, but is it fast enough for general use? I.e. other data mining apps where performance requirements are different from my own?

3) I've wondered if SQL could be convinced to pattern-match like an LCS dynamic programming algorithm, by matching across gaps in the sequence, with maybe a lookup table of allowable variances & distance between matching values?
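For reference on question 3, here is the standard dynamic-programming LCS length in C++. The `tolerance` parameter is a hypothetical stand-in for the "allowable variances" idea: two values count as a match when they differ by at most `tolerance` (0 gives plain LCS). Gaps are handled naturally, because LCS matches a subsequence rather than a contiguous run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Length of the longest common subsequence of a and b, computed with the
// classic O(|a| * |b|) dynamic-programming table. Two values "match" when they
// differ by at most `tolerance` (a hypothetical knob; 0 gives plain LCS).
std::size_t lcs_length(const std::vector<int>& a, const std::vector<int>& b, int tolerance = 0)
{
    assert(tolerance >= 0);
    // table[r][c] = LCS length of the first r items of a and the first c items of b.
    std::vector<std::vector<std::size_t>> table(a.size() + 1, std::vector<std::size_t>(b.size() + 1, 0));
    for (std::size_t r = 1; r <= a.size(); ++r)
        for (std::size_t c = 1; c <= b.size(); ++c)
            if (std::abs(a[r - 1] - b[c - 1]) <= tolerance)
                table[r][c] = table[r - 1][c - 1] + 1;                     // extend a match
            else
                table[r][c] = std::max(table[r - 1][c], table[r][c - 1]);  // skip an item (a gap)
    return table[a.size()][b.size()];
}
```

The table costs |a| * |b| cells, so on long series this is usually applied to candidate windows (for example, those found by the 3-gram step) rather than to the whole 10 million rows.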

**Right now I'm seeing 50 seconds after the buffers load, but I reinstalled Postgres and my postgresql.conf file apparently is all defaults now (the Postgres process is back to only 16 MB of buffers, so it's suffering more I/O to the SSD drive).

Thanks in advance,

-- Lee

-- modified 5-Sep-14 0:17am.

Re: How to describe method: SQL for frequent pattern discovery

Bernhard Hiller 4-Sep-14 20:52

Member 11060173 wrote:
seed table is 177 items, allowed to cross-join itself 3x, for a final table 2.8 million 3-gram words
...
actual itemset table is 10 million rows

Ehm, 2.8 million is about half of 177*177*177; that's quite a lot, but it still fits into memory (RAM). With 10 million rows, your trigram table will have some 10^21 rows, and that's beyond the memory of any machine nowadays, even beyond the capacity of any hard disk.
It won't work (already that factor of 10^14 applied to the present 35 seconds should tell you that).
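The first figure is easy to check: a 3-way cross join of n seed values yields every ordered triple, n^3 rows. A compile-time sketch (the function name `cross_join_size` is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Rows produced by cross-joining a table of n seed values with itself three
// times: every ordered (a, b, c) triple, i.e. n^3.
constexpr std::uint64_t cross_join_size(std::uint64_t n)
{
    return n * n * n;   // wraps silently once n^3 exceeds UINT64_MAX (about 1.8e19)
}

static_assert(cross_join_size(177) == 5545233, "177^3 = 5,545,233");
```

The reported 2.8 million trigrams are close to 5,545,233 / 2 = 2,772,616. Plugging n = 10,000,000 into the same formula overflows a 64-bit unsigned integer, which gives a sense of the scale involved.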

Re: How to describe method: SQL for frequent pattern discovery

pseudogrammaton 5-Sep-14 2:55

Oh, I forgot to mention that the trigram dictionary is trimmed by abs(val1+val2+val3) <= 88 (it's a vector dataset of small int). But the 5.5m trigram dictionary might not slow things much given the use of covering indices (the access is all via b-tree indices, obviating the need for as much memory).

I looked into using a 4-gram dictionary but it presented a very large table, much larger than the 3-gram dictionary, and worse it made for more overlapping duplicates in building the equivalent of an FP-tree (at least 1 extra overlap, whereas w/ the 3-gram matches I'm always overlapping by n+2). Also I sense a trigram-based tree innately reflects the smallest useful vector of from-and-to applicable.

One problem might be that in high-frequency datasets I could see an explosion of 3-gram noise that doesn't always support better (longer) matches, bloating the output. I understand that FP-tree algorithms use a minimum support criterion that perhaps works around this. There may be a way to ameliorate this in SQL, such as checking for matching adjacencies in a manner that'll optimize via a correlated subquery, using ANY or [NOT] IN.

I haven't had enough time to fully experiment with various datasets. I've been going through an application language selection process** and am contemplating looking into PostGIS' geometric data and index features (R-tree indices) as a way to get better, longer string matches, perhaps even supporting approximate or intermittent matches akin to LCS/cosine match algorithms (but on larger sets with more expressive syntax).

I'm prepared to start coding in C or Julia, but I'll avoid it if Postgres proves "fast enough." That's b/c as new data are imported to the DBMS I'll want to rerun pattern discoveries in the background against the main data store. My current 10 million rows are an exorbitant sample, out-scaling anything I expect to encounter in the actual data (MIDI note vectors).

[Edit:]
Just found this:
https://www.academia.edu/5184801/SQL_Based_Frequent_Pattern_Mining_with_FP-growth

It's a very old paper (circa 2001?), but I'll probably follow their methodology. Maybe they gave up because DB2 was too slow compared with algorithmic FP-Growth in C++.

Also, from a 2006 paper: http://webdocs.cs.ualberta.ca/~zaiane/postscript/adma05.pdf

"...In this work we presented COFI-Closed, an algorithm for mining frequent
closed patterns. This novel algorithm is based on existing data structures FP-tree
and COFI-tree. Our contribution is a new way to mine those existing structures
using a novel traversal approach. Using this algorithm, we mine extremely large
datasets, our performance studies showed that the COFI-Closed was able to
mine efficiently 100 million transactions in less than 3500 seconds on a small
desktop while other known approaches failed..."

I know that's old hat by now (my 2008-era Thinkpad T400 Celeron laptop w/ its 4GB RAM & SSD drive vs. his "small desktop"), but I'm in the ballpark.

**(As for fast application languages, especially for other algorithmic jobs not best modeled in a SQL DBMS, I think I just found a winner: http://www.julialang.org & http://juliawebstack.org/)

modified 5-Sep-14 10:57am.

Boundless Binary Search

Igor van den Hoven 27-Aug-14 10:24

I've been playing around with binary search algorithms and have created a new variant that appears to be significantly faster than any traditional implementation I've seen.

C++

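int boundless_binary_search(const int *array, int array_size, int key)
{
        // `register` dropped: deprecated in C++11 and invalid since C++17.
        int mid, i;

        if (array_size <= 0)
        {
                return -1;      // empty array: avoids reading array[-1]
        }
        mid = i = array_size - 1;

        // Binary phase: halve the step until it drops to 7 or below.
        while (mid > 7)
        {
                mid = mid / 2;

                if (key < array[i - mid]) i -= mid;
        }
        // Linear phase: step down to the last element <= key, stopping at index 0.
        while (i && key < array[i]) --i;

        if (key == array[i])
        {
                return i;
        }
        return -1;
}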

I'm wondering if this is a novel binary search implementation, or has this approach been used by someone before?

Some variants and benchmark graphs:

https://sites.google.com/site/binarysearchcube/binary-search

harold aptroot 4-Sep-14 21:28

Perhaps I'm not thinking about it right (I have a cold so I'm not particularly sharp right now), but it seems to me that if the item is in the second half of the array, it will essentially devolve into a linear search.

Igor van den Hoven 6-Sep-14 10:04

It switches to a linear search when roughly 8 elements are left. The i != 0 check is there in case the key is smaller than the value at index 0.
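The "devolves into linear search" concern can also be checked by counting. The sketch below is an instrumented copy of the routine (the `_counted` suffix, the counter and `max_compares` are additions for illustration): it searches every present and absent key in a sorted array and reports the largest number of key comparisons any single search needed.

```cpp
#include <cassert>
#include <vector>

// Instrumented copy of boundless_binary_search (hypothetical name): same logic,
// but every key/element comparison increments *compares.
int boundless_binary_search_counted(const int *array, int array_size, int key, long *compares)
{
    if (array_size <= 0) return -1;
    int mid = array_size - 1, i = array_size - 1;

    while (mid > 7)                       // binary phase
    {
        mid = mid / 2;
        ++*compares;
        if (key < array[i - mid]) i -= mid;
    }
    while (i)                             // linear phase, stops at index 0
    {
        ++*compares;
        if (!(key < array[i])) break;
        --i;
    }
    ++*compares;                          // final equality test
    return key == array[i] ? i : -1;
}

// Searches every present (even) and absent (odd or out of range) key in the
// sorted array 0, 2, 4, ..., 2(n-1) and returns the worst comparison count.
long max_compares(int n)
{
    std::vector<int> data(n);
    for (int k = 0; k < n; ++k) data[k] = 2 * k;

    long worst = 0;
    for (int key = -1; key <= 2 * n; ++key)
    {
        long c = 0;
        int idx = boundless_binary_search_counted(data.data(), n, key, &c);
        int expected = (key >= 0 && key % 2 == 0 && key / 2 < n) ? key / 2 : -1;
        assert(idx == expected);          // results must match the obvious answer
        if (c > worst) worst = c;
    }
    return worst;
}
```

With one million elements, a key smaller than every element costs 34 comparisons: 17 halvings, then a linear walk over 16 elements (rounding `mid` down lets more than 8 candidates remain in this case), then the equality test. That is still a few dozen comparisons, far from a linear scan of the array.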

harold aptroot 6-Sep-14 11:10

Yes ok, I guess the cold got to me. I was thinking something weird based around the assumption that mid started in the middle, which it obviously doesn't.
So it works, that's good. It seems closely related to the variant which keeps a "midpoint" and a "span" (here it's the next span (mid), and the midpoint plus that next span (i)). Same pros and cons too (needs fixup at the end, but inner loop is simple), the "midpoint/span" variant is usually seen (when seen at all) in its "more complicated math in the inner loop"-form which doesn't need fixup, but then what's the point.
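The "midpoint and span" description can be read several ways. One common form, sketched below under that assumption, keeps a position and a power-of-two span that halves each step; it needs no linear fixup at the end, at the cost of a bounds check inside the loop. `span_binary_search` is an illustrative name, not code from this thread.

```cpp
#include <cassert>

// Position-plus-halving-span search (hypothetical name). Finds the last index
// whose value is <= key in a sorted array, then checks it for equality.
int span_binary_search(const int *array, int array_size, int key)
{
    assert(array != nullptr || array_size <= 0);
    if (array_size <= 0) return -1;

    int step = 1;                          // largest power of two <= array_size
    while (step <= array_size / 2) step *= 2;

    int pos = -1;                          // never passes the last index holding a value <= key
    for (; step > 0; step /= 2)
        if (pos + step < array_size && array[pos + step] <= key)
            pos += step;

    return (pos >= 0 && array[pos] == key) ? pos : -1;
}
```

Because the first span is more than half the array, the spans can add up to reach any index, so every element is reachable without a fixup pass.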

Igor van den Hoven 6-Sep-14 12:32

Using a midpoint and a span is slower because it requires 2 assignments per loop iteration, as opposed to 1.5 (on average) in my implementation.

I assume it's the fixup and assignment issue that left academics stumped for the past 60 years. There are also caching issues for larger arrays, for which I've created a variant that is mindful in that regard.

harold aptroot 6-Sep-14 12:52