Posted 5 Jun 2017
Sort by a 2D curve

22 Jul 2017 · CPOL · 15 min read
An algorithm for sorting integers with a complexity lower than O(n log n)

The state of the art

If you search Wikipedia, you will find these related topics and algorithms:

• Computational geometry
• Online algorithm
• Graham scan
• Bucket sort
• Odd-even sort

Although the complexity of these algorithms is generally O(n²) or O(n log n), they usually use no extra memory, while my own sorting algorithm does.

Introduction

In a previous article, I wrote that a sorting algorithm needs n steps to merge two sorted sublists of size n / 2.

A set of n numbers in a list is the ideal case for a complexity of p * n when n is exactly equal to 2^p.

This second article is a study of a method for sorting numbers that takes at most 3 * n steps, where n is the size of the unordered list.

On the whole, this study is informal rather than strictly mathematical, and it does not rest exclusively on the fundamental theorem of arithmetic[3].

Background

There are several algorithms for sorting. Generally, the items are numbers; nevertheless, they may have other formats: strings, floating-point values, objects, or any other data that can be compared.

There are two possible requirements, depending on the approach of the sorting algorithm.
First, a comparator may be required to compare two elements; integers satisfy this condition.

Second, the algorithm may instead require a function computing a distance between items of the list taken pairwise.

Comparing two elements is sometimes more difficult than computing a distance function between them[1].

Graphical interpretation

A sorted list is a succession of numbers such that, for every k > n, the n-th element is less than or equal to the k-th.

When the elements of a list are not sorted, the sorting algorithm transforms the list into a succession of numbers running from the smallest to the largest, without any human intervention.

Computers were very slow two decades ago. That's why it was important to have the fastest sorting algorithm possible.

If n = 1000, the simplest sorting algorithm will handle the job, at worst, after 999 * 1000 = 999,000 steps.

This simple algorithm is intuitive; other algorithms are not. And if I assume that a faster sorting algorithm exists, such an algorithm risks producing an incompletely ordered list.

A graphical function with a two-dimensional array

Indeed, I can interpret a list of numbers exactly as the graph of a function over a two-dimensional array.

For example, take the list of numbers below, in this order:

45    33   5    21   98    7    4    5    77    34

It shows this graphical representation:

An ordered list of numbers has a graphical representation, and that representation is a monotonically increasing curve.

For example, the list of successive even numbers corresponds exactly to the graphical representation below. It shows that a variation of 1 on the x-axis gives a variation of 2 on the y-axis.

Moreover, the algebraic equation of this line is a mapping that takes each value x and returns the value y:

y = 2 * x

Now, in an ideal series of numbers, the line is linear in the sense of linear algebra; I mean that it respects the two principal operations: addition (+) and scalar multiplication (*).

There is a direct relationship between a line and a series of numbers: when the intervals between consecutive numbers are all equal, the graphical representation is exactly one line; when the intervals are not equal, the curve contains breaks in slope (which I call discontinuities).

For example, the list above has several discontinuities on its curve.
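This relationship can be made concrete with a small sketch (the function names here are mine, not from the article's source code): treat the list as points (index, value) and test whether the successive differences are all equal, i.e. whether the curve is a single line.

```python
def successive_differences(values):
    """Differences between consecutive elements: the 'slope' at each step."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]

def is_single_line(values):
    """True when the points (i, values[i]) lie on one straight line,
    i.e. every successive difference is the same."""
    return len(set(successive_differences(values))) <= 1

evens = [0, 2, 4, 6, 8]                                  # the y = 2 * x example
sample = sorted([45, 33, 5, 21, 98, 7, 4, 5, 77, 34])    # the list above, sorted
print(is_single_line(evens))     # one slope: a single line
print(is_single_line(sample))    # several breaks in slope
```

Even once sorted, the example list does not form a single line, which is exactly the "discontinuities" observation above.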

Development of an algorithm

This mathematical theory shows a development path for an algorithm.

The first pass of this algorithm reads all the numbers of the input list (or a subset of it). It is relatively simple to process each group in a separate thread.

An arithmetic progression is a series of numbers with a common difference, and a linear equation encapsulates each progression.

The final pass consists of reading all the groups of discontinuities; the result is then the sorted list.

Determine the common difference of an arithmetic progression

The common difference d is obtained by dividing the difference between the two ends by the number of intervals: d = (xmax - xmin) / (n - 1).

Therefore, the calculation of d is possible during the iteration of the list. For each item, I update the number of items read, the current minimum number and the maximum number.

The data retained are as follows:

- an arithmetic progression: the numbers xmin, xmax and n; the term d is computed during execution,
- for each interval, another arithmetic progression.

Thus, after reading some elements, the data form a tree of several arithmetic progressions. When the memory grows, it grows in multiples of 3 (one triple xmin, xmax, n per progression).
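A minimal sketch of this one-pass bookkeeping (the function name is mine): one iteration tracks the count, the minimum and the maximum, and d is derived at the end.

```python
def progression_stats(values):
    """One pass over the list: track the count n, the minimum and the maximum.
    The common difference d is derived as (xmax - xmin) / (n - 1)."""
    n = 0
    xmin = xmax = None
    for v in values:
        n += 1
        xmin = v if xmin is None else min(xmin, v)
        xmax = v if xmax is None else max(xmax, v)
    d = (xmax - xmin) / (n - 1) if n > 1 else 0
    return xmin, xmax, n, d

print(progression_stats([2, 5, 8, 11, 14]))   # (2, 14, 5, 3.0)
```

Note that d is exact only when the sorted list really is a single arithmetic progression; otherwise it is the average slope.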

Calculate the x-th index on the curve

Suppose that after some elements, I obtained an arithmetic progression with n = 5 numbers and d = 3.
Suppose that xmin is equal to 2; then the list of numbers is:

2    5    8    11   14

The value of the following element can be:

• before the first element
• after the last element
• equal to an item already in the list
• in an interval

For each given value, one arithmetic progression is modified, and the change has no impact on the rest of the data tree; that is what makes this structure attractive!

Euclidean division

The x-th index is given by this formula: x = (value - xmin) / d.

When the value x is an integer, I add 1 to the value of n.

When the value x is a rational number and an arithmetic progression already exists in the hashtable at the integer part of x, I pass the calculation on to that progression.

When the value x is a rational number and no such progression exists, I create one and add it to the hashtable under the integer part of x.

To decide whether x is an integer or a rational number, I just use the div() function, which returns the quotient and the remainder. This is exactly Euclidean division.
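In Python, the analogue of div() is the built-in divmod(). A minimal sketch of this classification step (the function name is mine): a zero remainder means an exact index; a non-zero remainder means the value falls inside an interval.

```python
def locate(value, xmin, d):
    """Classify a value against the progression xmin, xmin + d, xmin + 2d, ...
    using Euclidean division: remainder 0 means an exact hit at index q,
    otherwise the value falls in the interval after index q."""
    q, r = divmod(value - xmin, d)
    if r == 0:
        return ("exact", q)       # value already sits at index q
    return ("interval", q)        # value lies between indexes q and q + 1

print(locate(11, 2, 3))   # ('exact', 3): 11 = 2 + 3 * 3
print(locate(6, 2, 3))    # ('interval', 1): 6 lies between 5 and 8
```

A value below xmin yields a negative quotient, which covers the "before the first element" case of the list above.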

Using an affine equation

The main task consists in calculating the coefficients a and b of an affine equation such that:

y = a * x + b

With respect to this function, two coordinate pairs are required to identify the coefficients exactly.
For a more complicated equation, for example a polynomial, a larger number of coordinate pairs is needed.

This is an online algorithm, which means several things:

1. You can start the sorting calculation at any point in the list,
2. At each step, all the elements seen so far are sorted,
3. At each step, you can ask the algorithm to process a new number,
4. If you swap items in the input list, the order of the work is swapped accordingly,
5. The algorithm keeps a memory over time; if you clear that memory, it behaves as if it were starting from scratch.

First number

I define the first element with the coordinates (k1, p1).

Second number

I define the second element with the coordinates (k2, p2). Make sure p2 is greater than p1. If not, exchange them.

Since I have two coordinate pairs, I can calculate the coefficients a and b. They are given by two formulas: the solution of a system of two equations with two unknowns, obtainable by substitution or by a more general method (Gaussian elimination).
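The substitution mentioned above gives a = (p2 - p1) / (k2 - k1) and b = p1 - a * k1. A minimal sketch (the function name is mine):

```python
def affine_coefficients(k1, p1, k2, p2):
    """Solve the 2x2 system p1 = a*k1 + b and p2 = a*k2 + b by substitution:
    subtracting the equations isolates a, then b follows from the first one."""
    a = (p2 - p1) / (k2 - k1)
    b = p1 - a * k1
    return a, b

a, b = affine_coefficients(1, 2, 3, 8)
print(a, b)    # 3.0 -1.0, i.e. y = 3 * x - 1
```

Here a is the slope of the line through the two points and b its intercept; the division requires k1 != k2.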

Third number

From this chronology of the algorithm, I define each element with the coordinates (ki, pi).
I must determine the value of ki by isolating this term in the affine equation.

There are two options left.

Option 1

When ki gives an integer z, it is the index at which pi will certainly be placed, provided a total of z - i elements are already present.

This result can be expressed with the sum of each element, the sum of all the i-th indexes and the common difference.

Option 2

When ki gives a rational number, I have to cut the line into two parts.
These two parts are obtained by bracketing ki from below and from above.

The first line starts at k1 and ends at E[ki] (E is the integer-part function), and the second line starts at E[ki] + 1 and ends at k2. The first line is the one kept at the top level; the second line is placed inside a hash table.
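A minimal sketch of this cut (the function name is mine; E[ki] is math.floor in Python):

```python
import math

def split_line(k1, k2, ki):
    """Cut the segment [k1, k2] at a rational index ki: the first part keeps
    [k1, floor(ki)]; the second part [floor(ki) + 1, k2] goes into the
    hash table."""
    cut = math.floor(ki)
    return (k1, cut), (cut + 1, k2)

print(split_line(0, 9, 4.5))    # ((0, 4), (5, 9))
```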

Improvement of the algorithm

What is the real problem for a computer?
In other words, what is the difference between a human and a computer when sorting integers?

I answer: a human has eyes. So, to let a computer "see", what could be better than a graphical representation?

The algorithm aligns all points along the same vertical line.

To sort the items, it then selects each point in the row from bottom to top.

Take the minimum amount of time and minimum memory size

Since the time is a constant multiple of n, the memory size must also be as small as possible. To reach this minimum, I have to understand the actual difficulties in sorting the list.

A sorted list can take three generating forms:

• a continuous line, which means a single common difference from the lowest number to the highest number seen in the list,
• several discontinuous lines, which means several slopes forming a sequence of smaller sorted lines that may themselves contain discontinuities,
• a list containing duplicates.

So the important thing is to count discontinuities and duplicates first. Once that is done, an iterative counter automatically organizes all the discontinuities as multiple sequences without internal repetition, adding each count to the counter and indicating the exact location of each number.

When this second step is completed, I only need permutations to sort the entire list.

Thus, the maximum time is always 3 * n, where the value of n is the number of elements in the input list.
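As a rough illustration of the counting idea (this is a counting-sort-style sketch under my own assumptions, not the article's exact implementation), the passes can look like this:

```python
def sort_by_counting(values):
    """One pass records how many times each number occurs (the duplicates);
    a second pass walks the counts from the minimum to the maximum and
    writes out the sorted list.  Time is proportional to n plus the value
    range, so it is linear only when the range stays small."""
    counts = {}
    for v in values:                                  # pass 1: count duplicates
        counts[v] = counts.get(v, 0) + 1
    result = []
    for v in range(min(values), max(values) + 1):     # pass 2: replay the counts
        result.extend([v] * counts.get(v, 0))
    return result

print(sort_by_counting([45, 33, 5, 21, 98, 7, 4, 5, 77, 34]))
```

The dependence on the value range, not only on n, is precisely why the evolution of the line remains a problem, as noted next.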

Except that there is still a problem with the evolution of the line.

Calculations during the progressive iteration of the list

For any relatively large list size, a base of ten digits helps to quickly estimate the nearest size.
With a size less than or equal to ten, the rank is ten, and sorting takes 2 passes over the list. This is already less than 10 * log2(10) ≈ 3 * 10, and it is equivalent to 10 * log10(10).

But we must draw attention to the fact that the logarithm of 1 is equal to 0. This means that the complexity of the algorithm is not an exact value when n is a small number, because of the degenerate complexity of zero.

With a size less than or equal to one hundred, the rank could be 10 or 100. This could reach 100 if the numbers in the list are high. But, if the numbers in the list are less than 100, the ranking will not progress.

With a size greater than or equal to a thousand, the ranking could be from 10 to 100 because we will always divide each big growth by 10.

The method works well, as described by these small formulas:

0 < i < 10, m = 0:               yi = ki * m + ri

y > 10, 0 < i < 100, m = 10:     yi = ki * m + ri

y > 100, i >= 100, m' = 10 * m:  yi / 10 = ki * (m / 10) + ri / 10

yi' = ki' * m' + ri',  with  yi' = yi / 10,  ki' = ki / 10,  ri' = ri / 10,  m' = 10 * m

Every ten elements of ki are added together to form the element count at level 0. Remember that the table storing the counters is doubled: one entry for when the remainder of the Euclidean division is 0, and another for when it is not.

You will find the source code associated with this sort method in this article.

POC (Proof Of Concept) : grid 2D

In FIG. 1, you can see successive points, each increased by 1 per cell, starting from the left corner.

FIG. 1

This point configuration means that the successive elements are sorted when each one increases by 1. If this is not the configuration observed, a value may be less than or greater than i.

These two changes are presented below in Fig. 2 and FIG. 3.

FIG. 2                                                                           FIG. 3

We can conclude that, after a first pass recording the maximum and the minimum, a list of the values ki annotated with their counts will reasonably give a sequence of several arithmetic progressions, each with its minimum, its element count and its common difference.

Therefore, the complexity of this algorithm is 2 * n for any shuffled list of numbers, which is written O(n).

But...

A grid can thus show that an ordered list is just a line with an increasing slope, or a set of successive lines, each with a specific slope.
Does this not bring the Bresenham algorithm[2] to mind?

Suppose I can model, in terms of probabilities, either a move of 1 across, or a move of 1 across and up.
I name the first move A and the second B.

For example, this sequence makes it possible to move along a line:

BABBAAABABBAAAA

I interpret this sequence as a sorted list. Is it possible to find a different sequence where the list is not sorted and to transform this sequence into its corresponding sequence when this list is sorted?

An example

Suppose I have this list to sort:

3  9  4  7  8

For each sequential difference: if a number is less than its successor, I write A+; if a number is greater than its successor, I write A-; and if they are equal, I write B.

A+    A-    A+    A+

With this example, I obtain

3         9          4         7          8

     A+        A-        A+        A+
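This encoding is easy to sketch (the function name is mine): one symbol per consecutive pair, so a list of n numbers yields n - 1 symbols.

```python
def difference_sequence(values):
    """Encode each consecutive pair: 'A+' when the next number is larger,
    'A-' when it is smaller, 'B' when the two are equal.  A fully sorted
    list without duplicates encodes to 'A+' everywhere."""
    symbols = []
    for a, b in zip(values, values[1:]):
        symbols.append("A+" if b > a else "A-" if b < a else "B")
    return symbols

print(difference_sequence([3, 9, 4, 7, 8]))    # ['A+', 'A-', 'A+', 'A+']
```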

Another example

As I said above, the risks in unsorted lists are insertions and out-of-order values.
So I can introduce such a risk with this list:

4        42        48        55       18       37        27       71        80

A+       A+       A+        A-        A+       A-       A+        A+

Conclusion

The Bresenham algorithm traces a line in a grid. Since the sorted numbers have different slopes, these slopes correspond to Euclidean divisions. In such a drawn line, the size and the value of the kernel indicate that I can define a vector space, so that the number of elements in the list must equal the size of an equivalent grid.

Thus, after this paragraph, the letters give way to the slopes.

If I find a size large enough to receive the items, is it necessary to make insertions or resize?
And if the size is too large, can I just calculate the positions?
And if I can just calculate the positions, can I simply iterate the slopes?

I answer yes; but the problem is duplicates. So I do not classify any value or any slope, and I do not create a grid: I only count the number of duplicates.

Using the code

The IntegerList class handles the list of numbers.
The numbers in the list can be read from a file, generated randomly, or iterated with a slope.

Then, the Indexer class helps iterate the list. An additional class must be created to scale each number by the maximum value of the type (if the data type of each item in the list is an integer).

Then, the algorithm requires a class to compute the pair (row, column): this is the main calculation.

The end of the algorithm consists of iterating row by row and column by column over a dictionary of duplicate numbers. To avoid iterating the whole grid, translate the rows and columns into a grid with slope (1, s) and a size equal to the number of elements in the list.
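The classes themselves are not listed in this section, so the following is a hypothetical Python sketch of the (row, column) idea; the function name, the slope parameter, and the divmod mapping are my assumptions rather than the article's actual code, and it assumes non-negative integers.

```python
def grid_sort(values, slope=10):
    """Map each value to a cell (row, col) = divmod(value, slope); a
    dictionary counts duplicates per cell; then the grid is read row by
    row and column by column to emit the sorted list."""
    grid = {}                                   # (row, col) -> duplicate count
    for v in values:
        cell = divmod(v, slope)
        grid[cell] = grid.get(cell, 0) + 1
    max_row = max(v // slope for v in values)
    result = []
    for row in range(max_row + 1):              # iterate row by row...
        for col in range(slope):                # ...and column by column
            result.extend([row * slope + col] * grid.get((row, col), 0))
    return result

print(grid_sort([4, 42, 48, 55, 18, 37, 27, 71, 80]))
```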

What remains is a dictionary class for the duplicates.

Examples

Example of an algorithm that mimics the sorting of integers

Here is an example that uses slopes to sort the given input list.

Why do I think of a slope to sort whole numbers?

I think of a slope for two reasons:

1) the sum of the integers may overflow when the integers are large.
In fact, if a subset of an ordered list has a single slope, the sum of its elements is proportional to the sum of an arithmetic progression with common difference 1.

2) a single slope rarely fits the whole list of integers; different slopes mean that the list contains different progressions.

Notes

[1] https://en.wikipedia.org/wiki/Distance

[2] https://en.wikipedia.org/wiki/Bresenham

[3] https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

 Software Developer (Senior) NoComment France

Comments and Discussions

Re: My vote of 2 — Stefan_Lang, 18-Jul-17:
Thank you for your reply. I will try to answer your questions.

InvisibleMedia wrote: "What did I not understand?"

In the introduction, you talked about sorting algorithms. Then you offered a list that, presumably, should specify various kinds of sorting algorithms. Here you could have listed things like bubble sort, heap sort, quicksort, insertion sort, and the like. Instead you start your list with "Computational geometry", which is not even related to sorting, and with "online algorithm", which, at best, may include algorithms for sorting, but is about as good as listing "food" as an example for different kinds of apples. I'm not at all sure what you were thinking when including these terms. Also, "Graham scan" is an algorithm used to find the convex hull of a set of points, not a sorting algorithm. I can see why you added it to your list, given the nature of the algorithm you presented, but, still, this is not a sorting algorithm at all!

Ok, you did not say these items are sorting algorithms, but the little you said before presenting the list really implies you meant exactly that. Thinking about it, I believe what you meant to present is various topics that are related to the algorithm that you were about to present later in the paper. If that was your intent, please clearly say so. It is rather confusing the way it is presented now.

Later, you're repeatedly using the term continuity, when you were actually talking about the slope. That's a minor issue, really, as your meaning was clear from the context. As a mathematician though, I found it irritating.

Then you're talking about complexity, and, rather than simply use the big-O notation, you claim your algorithm just needs 3*n 'steps'. In complexity theory, when you need to perform 3*n steps, you just list it as O(n). If you want to express something more detailed, you should clarify what you consider a 'step'. Personally, when I look at the trivial problem of just finding the min and max of an unordered set, that requires 3*n operations (actually I'd expect 4*n, with 3*n being the best case, depending on the set) - so I'm having trouble believing your claim that your entire algorithm only needs 3*n 'steps'. Of course, that depends on what you consider a 'step' - therefore you should really be more clear about that.

InvisibleMedia wrote: "What does not make sense?"

It starts with interpreting an unordered set of numbers as a function. To say that this is an unusual approach is an understatement. Mathematically, it's fine - a function is simply a mapping that projects input values to output values. The only condition you have, mathematically, is that for each input value, the output must be definite. You have that. To be clear, however, the input values are simply the positions of each number in the unordered set.

Then you start looking at the graph of the function, and looking at the points as a sequence. This implies that your set of numbers is a sequence. You're free to define a sequence that way, but I disagree that you can derive any meaningful information from the resulting analysis. The only statement you can (and did) make is that, once the set is ordered, the graph of the resulting function must be monotonous. Consequently, the only meaningful statement you can make about the unordered set is that if it isn't monotonous, then it isn't ordered. Furthermore, what you pictured is not the graph of such a function! The real graph only consists of distinct points. In your images however, you connected the points with a polyline that has no mathematical meaning, as the input set is not continuous (and I do mean continuous (that is, for every two values in the input set, the intermediate value halfway between those points is also included), not in the way you used it).

Then you make the rather huge leap (albeit foreseeable, after the previous step) towards continuous functions, and claim that in a "series of ideal numbers", you'd get a linear function. This statement is problematic for several reasons. First, you cannot make any assumption about the unordered set of numbers. You have to assume that they're anything but 'ideal'! Second, you interpret these numbers as a series, implying that there are relations between consecutive numbers in the (ordered? unordered?) set, or an underlying relation defining a series. No, while you can certainly interpret the set any way you want, you cannot derive any properties or relations from that. Third, you can still only define a discrete function on your set, not a continuous function. But I admit that is just a technicality.

Next, you introduce arithmetic progression. You can certainly use this as a way to describe a series of numbers that is defined as a linear function. Or you can even use repeated arithmetic progression for a series that is defined as a polynomial function. (When you introduced arithmetic progression, I originally assumed that you would use this, but apparently I was wrong.) However, you cannot assume that the unordered set of numbers is (after sorting) even close to a linear function! I did give an example in my first comment above. The difference and slope calculations you did are therefore entirely meaningless with respect to finding the final sorting order. While in general, you can assume that for many elements of the set, the formulas can be used to approximate the correct position within the sorted list, you have to take into consideration that it's only an approximation, not exact, and therefore may be incorrect for some, or many, of the numbers. I suppose you could work with that in a multi-step algorithm to get a good average sorting time - but the worst case would still be O(n log(n)) or even worse.

Then you use the approximate calculations to put your numbers into a hash table. As others pointed out, creating and maintaining a hashtable may be constant time per item on average, but it's O(n) worst case, per item, making your algorithm O(n^2). You've made an effort to point at the actual, average number of steps, but these calculations are strewn over the article, and very hard to follow. Given that some of the calculations (e.g. for the assumed position based on arithmetic progression) already take up to 9 operations per item, I don't understand what your claim of 3*n 'steps' is supposed to mean.

Section 'Improvement of the algorithm': I was rather confused by this part of your article. It didn't seem to contribute to it in any way at all. Except that you did include some performance calculations. Unfortunately I did not understand what these calculations actually refer to. In part this may be due to an imperfect language. E.g. take this sentence: "For any list size, but relatively large, a ten-digit database helps to quickly preview the nearest size." What database are you referring to? What do you even mean by ten-digit DB? What do you mean by 'preview'? What do you mean by 'nearest size'? And the rest of the text, specifically in this section of the article, is similarly hard to follow. As for the formulas, for the most part they're ok, but I still had some trouble regarding the terms you start with or offer as an example, like 'This is already less than 10 * log2 (10) ≈ 3 * 10. But this is equivalent to 10 * log10 (10)' - what are you comparing here? What do you mean by 'equivalent' (that second term is exactly 10, which is not the same as 30)? Also, you seem content that, at least according to your calculations, your algorithm performs well for 100 items. But that is true for every sorting algorithm in existence. If you want to make a point, examine large lists of, say, one million items, not 100. That's what complexity calculations are for.

Summary (yes, I haven't reached the end of the article or the end of issues, but it's already too long): My understanding of your algorithm is that 1. you determine the average slope by finding the min and max; 2. you shove all items into a hash table by estimating their position under the assumption that the numbers are evenly spread over the entire value range, and thus the sorted list is near-linear in appearance; 3. ? (not sure what you are doing next, I lost track there). My claim is that step 2 makes an inappropriate assumption, and that it will not help you get O(n) complexity for the worst cases. I offered a counter example in my first comment. Now, for average time complexity, the question is how reasonable the assumption is that the numbers are evenly spread across the range. Personally I don't think this is often the case. Depending on the source of the set, you will often have a significant amount of the set clustered around some value, or several values. And that value may not be in the center. As for step 3, I've tried finding information on how you proceed once you have the hash table, but didn't find any.

GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)