15,509,322 members
See more:
We have a problem, we have a series of data, please check the graph below. Basically it's a 24 hours data, we want to automatically split the 24 hours to a few sections according to the similar values.

for example, we can see that from 0am to 7am the values are more or less the same, we mark that period as group1, and between 7am and 17 we have similar (roughly) then we make it group 2, and 17 to 24 as group 3.

if we input a series of data (x,y), and define there should be a numer of partitions(for example, 3), then the result should come like this:

group1: 0am - 7am
group2: 7am - 17
group3: 16 - 24

graph is here:

http://pot9tw.sn2.livefilestore.com/y1pvO0HSvBYGu22JKakg4XJSys_zM1H9qohg7XqDwxxSOgz1orlRNjJhLKCOYyvZ9RZqHyyUHErZFQbgaNvUHuZ4FRIIOTlFuHV/graph.PNG?psid=1[^]
Posted
Updated 26-Jan-11 17:21pm
v3

## Solution 1

This is called "Cluster analysis", your problem is probably the most elementary one.

http://en.wikipedia.org/wiki/Cluster_analysis[^].

The article is very clear and provides very good references you may want to follow. It least you'll learn proper terminology which will help you in your search of relevant approaches and algorithms. I actually just took a close look myself: the algorithms are easy enough to code them from scratch; what you need is to compare them and decide which one is the best for your purpose; I think this is more difficult, will require some research.

Good luck,

—SA

v3
Sandeep Mewara 27-Jan-11 0:16am
Deserve a 5!
JF2015 27-Jan-11 0:46am
Sergey Alexandrovich Kryukov 27-Jan-11 0:50am
Thanks a lot, Sandeep, JF2015.
--SA
MCY 27-Jan-11 4:01am
good one
Huisheng Chen 27-Jan-11 4:58am
let me simplify my question, imagine a series of values:

1,2,1,3,2,4,5, 11,10,12,10,12,13, 25,21,24,20, 4,5,2,3

we can see clearly see that according to the difference, we could split them into 4 groups bassed on my requirement above.

I searched a lot, finally came to k-means algo, but it seems that it only deals with 2-d data, not for 1-d values.

## Solution 2

Hi, I implemented the cluster by my own method. the logic is that it will first try to find out top n delta (differences), then split the data according to the delta.

C#
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication612
{
class Program
{
private class ValueSort : IComparer<int[]>
{
public int Compare(int[] x, int[] y)
{
if (x[1] > y[1])
return -1;
else if (x[1] < y[1])
return 1;
else
return 0;
}
}
private class IndexSort : IComparer<int[]>
{
public int Compare(int[] x, int[] y)
{
if (x[0] > y[0])
return 1;
else if (x[0] < y[0])
return -1;
else
return 0;
}
}
static void Main2(string[] args)
{
int[] data = new int[] { 1, 2, 1, 3, 2, 4, 5, 11, 10, 12, 10, 12, 13, 25, 21, 24, 20, 4, 5, 2, 3 };
int[][] delta = new int[data.Length][];
delta[0] = new int[2];
delta[0][0] = 0;
delta[0][1] = 0;
for (int i = 0; i < data.Length - 1; i++)
{
delta[i + 1] = new int[2];
delta[i + 1][0] = i + 1;
delta[i + 1][1] = Math.Abs(data[i + 1] - data[i]);
}
Array.Sort(delta, new ValueSort());
int groups = 4;
int[][] topDelta = new int[groups - 1][];
for (int i = 0; i < topDelta.Length; i++)
{
topDelta[i] = delta[i];
}
Array.Sort(topDelta, new IndexSort());
List<List<int>> output = new List<List<int>>();
int index = 0;
for (int i = 0; i < topDelta.Length; i++)
{
List<int> group = new List<int>();
for (int j = index; j < topDelta[i][0]; j++)
{
}
index = topDelta[i][0] + 1;
}
List<int> group2 = new List<int>();
for (int j = topDelta[topDelta.Length - 1][0]; j < data.Length; j++)
{
}
for (int i = 0; i < output.Count; i++)
{
Console.Write("Group" + i + ":");
foreach (int value in output[i])
{
Console.Write(value + ",");
}
Console.WriteLine();
}
}
}
}

v6
Sergey Alexandrovich Kryukov 27-Jan-11 10:15am
You need to reformat this.
You can try to paste from Visual Studio.

To do this, take your original C#, in some text editor (not HTML page) replace all angular brackets with HTML entities < and > in "Improve Answer" check "Allow HTML" radio button and paste the code, manually add <pre lang="cs">...</pre>.

Check up the result on the page after you post.

--SA
Huisheng Chen 27-Jan-11 16:39pm
thank you very much, finally I made the correct format

## Solution 3

Here is a thought:
Use Bollinger Bands[^]

Calibrate to fit your needs and monitor the rate of change for the moving average. This should enable you to detect some of the interesting changes to your data. Comparing the input values against +/- the standard deviation multiplied by a calibrated value, indicates other points of interest. While bollinger bands primarily are associated with visual interpretation of financial data - there is no reason not to take advantage of the method computationally :)

Regards
Espen Harlinn

Sergey Alexandrovich Kryukov 27-Jan-11 16:59pm
Interesting method, I voted 5. The problem of those cluster analysis methods I tried to overview very quickly is that they are oriented to unordered data while OP has fully ordered data -- as a time sequence, and the clustered data should preserve order. Bands should be closed to the purpose.
--SA
Espen Harlinn 27-Jan-11 17:21pm
Thanks SAKryukov, thought you might like it - Fire up Delphi and play a bit with TeeChart, it has this functionality as a "function" :) - at least both the Pro and .Net version has it