|
I knew it. I did not, but I knew someone would !
|
|
|
|
|
You forgot to chill it using Liquid Nitrogen.
Panic, Chaos, Destruction. My work here is done.
Drink. Get drunk. Fall over - P O'H
OK, I will win to day or my name isn't Ethel Crudacre! - DD Ethel Crudacre
I cannot live by bread alone. Bacon and ketchup are needed as well. - Trollslayer
Have a bit more patience with newbies. Of course some of them act dumb - they're often *students*, for heaven's sake - Terry Pratchett
|
|
|
|
|
Seconded. Motion is passed.
|
|
|
|
|
What an interesting question, well I think that a Self Organizing Map should ............ oh shiny! or in my case .... hmm chocolate!
Ali
|
|
|
|
|
Welcome to the algorithms forum.
I'm not sure what triggered such negative responses to your question (originally posted to another forum which it wasn't appropriate for ? then moved here with all its baggage of snipes ?), but it seems to me your question is related to algorithms.
I'm not familiar with SOM, or Kohonen networks, but I come here, to this forum, to share my own algorithmic interests, and to learn, and be stimulated by, others interests, so I'm happy when someone introduces me to a concept I haven't encountered before.
One way I think you can make your question here more enjoyable for other participants is to share the context of your question: why you are interested in this type of algorithm, what this problem or question is in the context of your work, or your study.
If you share what you have done so far, in code, or in theory, or where you find your progress in implementing this type of algorithm/solution is blocked, as specifically as possible, then, I think you'll get responses that will be most helpful.
Your description of Kohonen networks, and the example given of organizing a color-space, reminds me of the wonderful program 'Visual Thesaurus'[^] which, when it came out several years ago, was a real 'revelation' ... to me.
VT 'maps' semantic affinities between words in a thesaurus using font-size and font-attributes to represent a virtual third dimension, and its 'dynamic:' the 'cloud' of associated words is being continually updated.
And, yes, the 'word-tag-clouds' seen on many website these days are a form of 'static' VT, imho.
I wonder if VT type algorithms might be considered an algorithmic problem of the type you are describing ?
Where I get interested in the issue I think is raised by your question is: the how of ... once you establish the evaluative criteria by which a multi-dimensional data set is 'distilled' into some useful summary representation ... you then express that representation in a form which you can use a compiler (or lexer-parser) to create : as visual interface, as meta-language (DSL), etc.
While some data-sets, I think, have an 'easier,' perhaps statistics-based, answer to how you represent summary information in a way that facilitates communicating essential 'meaning,' and enables sophisticated exploration through 'drill-down' in hierarchical, or 'flat,' visual interfaces ...
Other 'problem spaces,' ones I think are much more interesting, are going to have internal 'semantic' networks based on ? Bayesian probabilities ?
An example: a personal interest of mine is the evolution of Buddhist sacred iconography in S.E. Asia, where I live. There is a kind of an algorithmic component in the sense of a 'guiding meme' of the 32 lakshanas originating in Sri Lanka which express a sense of the normative visual attributes of representation of the teacher now known as 'the Buddha.'
And then, there's the whole complex chaotic history of how styles of representation of 'the Buddha' evolved through conquest and assimilation (capture and relocation of artisans, and or images, and supplies of raw materials, being a key goal in S.E. Asian geo-political warfare).
And then, there unique events, as when King Tilokkaraja of Chiang Mai, in the 15th. century, as part of establishing more centralized control of the nascent proto-polity of a more cohesive northern Thailand meta-state (Lanna), promoted the image of 'the Buddha' in 'royal raiment' as part of a larger campaign to have himself recognized as a divinely sanctioned ruler (dhammaraja, chakravartin).
So you have a leit-motiv (the lakshanas), you have historical mutation and change, you have top-down visual innovation and 'orthodoxy' imposed at times, and you have serendipitous changes, sometimes enabled by a changed supply of raw materials, or contact with other cultures, or innovation by particular craftspeople, or sudden availability of new technology
And then, it gets even more complex: at times 'retrograde' change occurs where the visual style of an older, or imagined archaic, style becomes fashionable, or required, or is re-vivified by some individual innovator.
And at the maximum 'edge' of change ... fractal ? ... you have meta-cultural differences which can be quite profound. To the 'western mind,' shaped so profoundly by ideas of absolute historical origins, a non-recurrent sequence of time, cumulative progress, Aristotelian syllogism, the division of art and science (trivium, quadrivium), mind and body, Psyche and Techne: a culture where time is experienced as an 'eternal return' of a boundless cycle of dynamic manifestations of opposites (what Jung would have called 'enantidromia') is 'alien' ground.
That's the kind of 'data-space' that I am interested in. How to make an 'interface' to that kind of data ... mmm ... there's the rub.
best, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
modified on Thursday, August 18, 2011 9:04 PM
|
|
|
|
|
BillWoodruff wrote: originally posted to another forum which it wasn't appropriate for ? then moved here with all its baggage of snipes
Yup. Started life in the lounge. Didn't actually get stomped as hard as I expected.
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994.
|
|
|
|
|
It's rated low because it was posted in the Lounge and then moved here with the low votes intact. It seems like an interesting enough algorithmic post.
|
|
|
|
|
Thank you Bill
BillWoodruff wrote: If you share what you have done so far, in code, or in theory, or where you find your progress in implementing this type of algorithm/solution is blocked, as specifically as possible, then, I think you'll get responses that will be most helpful.
Well, the color sorting problem is one of the examples often used to describe the Self Organizing Map (SOM) algorithm.
There is several implementations of the SOM algorithm. There exist task specific SOM algorithms and general purpose SOM algorithms. Here is three general purpose implementations:
SOM Toolbox 2.0 (Matlab 5) By Tevo Kohonen and his research team , sort of open source
http://www.cis.hut.fi/somtoolbox/[^]
SOM_PAK (ansi c) also By Tevo Kohonen and his research team, sort of open source
http://www.cis.hut.fi/research/som_lvq_pak.shtml[^]
There is also an Octave/Matlab compatibel developed based on the above SOM Toolbox 2.0 called
Melikerion , open sourcehttp://www.finndiane.fi/software/melikerion/[^]
The Melikerion research is done by for the Finnish diabetic nephropathy study. But the software is general use.
Another good SOM/Kohonen software package is Ron Wehrens "kohonen: Supervised and unsupervised self-organising maps". It is written in R http://cran.r-project.org/[^]
Kohonen package for R
http://cran.r-project.org/web/packages/kohonen/index.html[^]
There exist also several SOM/Kohonen projects here on codeproject.com
I am studying individual persons career paths wich include many person level variables, such as, age, sex, education, graduation, family related variables, income, neighborhood, house, etc. I know how to classify my data on to a 2-d map with SOM_PAK and Melikerion. As the first step the data is scaled and the nodes of the SOM is set to initial values usually based on a random sample of the input data. Then one member of the input dataset is compared to all of the nodes by calculating the multidimensional euclidian distance. The node wich is closest to the input data "winns" and are updated with the values from the one input data member. Also the surrounding nodes are adjusted as a gradient where far nodes are adjusted less than the closest. This is repeated for the whole dataset several times until the map converges. The convergens is accomplished by decreasing the adjustment of the nodes. When the input data is clustered the SOM algorithm can reveal the clustering and that can be observed by visual inspection of the map. My basic problem is how two find the clusters made of several of the map nodes. The SOM creates a map where the nodes that are close to each other is more alike than nodes far apart.
I would like to find clusters of similar nodes and use those as the classification of my data. That informatin could be used in the analysis of the differences and similarities between different individuals careers and income levels related to their educational history, work career and other family characteristics.
|
|
|
|
|
It seems to me that this problem boils down to selecting the best projection of your problem space onto two axes. Do you already know how to create such a map? Are you just looking for cluster analysis algorithms? A search should turn up some 2D cluster analysis algorithms, it must be a solved problem.
|
|
|
|
|
You sent a shiver down my spine when I read your post. I've just being doing this as part of my Uni AI course, although we were doing it with analysis of birds contaminated by an oil spill, and had 17 inputs going to a 8x8 SOM grid.
We have been using JavaNNS for all the course work.
What you basically need to do is set up your network with each of the 3 input nodes (RGB) connected to every single node in your SOM.
You haven't said what language you are trying to create this with, but searching google there are heaps of results with libraries to use.
Here is one example article;
Implementing the Kohonen Neural Network[^]
|
|
|
|
|
I will try to formulate my question about Self Organizing Map (SOM) that I have. My research problem is that I want to classify the entries of a large dataset so that similar entries end up in clusters where the the variation within clusters are less than between the clusters. I know this can be accomplished with the SOM algorithm. Now to the point:
If the SOM is started with a large map lets say 10 by 10 nodes (100 nodes in total). So I map my data onto the 100 node map. Then I visually inspect the result and find that there is lets say 7 distinct clusters. Is it then approperiate to reduce the number of nodes to a map 3 by 3 nodes (9 nodes in total) and again run the SOM onto this much smaller map. And then use the nodes of this smaller map as my clusters?
|
|
|
|
|
Hi,
Interesting the way that one intellectual quest can lead to another: my connecting-nodes-in-hierachies quest led to thinking about property change notifications in WinForms, which took a detour into WPF where so much work has been done with INotifyPropertyChanged automation, which led to experimenting with auto-code generation with T4 and PostSharp.
Which led to: I wrote a (non-recursive) code-generator ... which I will write-up for CP as a Tip or something if there's any interest ... that will generate a complete C# class for evaluating all possible logical comparisons of #n boolean variables. Here's the 'guts' of what it produces if I call it with an integer argument of #3:
if (Is_A && Is_B && Is_C)
{
Console.WriteLine("Test0 passed");
return;
}
else if (!Is_A && Is_B && Is_C)
{
Console.WriteLine("Test1 passed");
return;
}
else if (Is_A && !Is_B && Is_C)
{
Console.WriteLine("Test2 passed");
return;
}
else if (!Is_A && !Is_B && Is_C)
{
Console.WriteLine("Test3 passed");
return;
}
else if (Is_A && Is_B && !Is_C)
{
Console.WriteLine("Test4 passed");
return;
}
else if (!Is_A && Is_B && !Is_C)
{
Console.WriteLine("Test5 passed");
return;
}
else if (Is_A && !Is_B && !Is_C)
{
Console.WriteLine("Test6 passed");
return;
}
else if (!Is_A && !Is_B && !Is_C)
{
Console.WriteLine("Test7 passed");
return;
} It has been many, many years since I once studied 'truth-tables' and 'decision-tables, etc. And, given a language ... C# ... that optimizes boolean expressions so that in "A && B && C" if 'A is not true, there is no evaluation of 'B and 'C ...
Assuming that any variable in the set of variables is as likely to be true or false in any given use case, it, of course, would not make any difference in what order you wrote out the tests. But, what interests me is how, when you have a sense of the probability distribution of evaluations, you might optimize.
So if you think that a test of 'A' is modal in your use-case, you move the evaluation of 'A to the front of the test; if you think evaluation of 'A is an edge-case infrequently evaluated, you write it last.
Where I am 'blocked' is in coming up with an algorithm to make a pass over the complete matrix of possibilities and optimize the tests based on some user supplied 'weighting.'
Perhaps the user of the code-generator should supply a string like "C,B,A" as its input parameter, and then interpret that as the order to write out the terms ?
Appreciate your opinions, and advice.
best, Bill
p.s. While the current code-generator is not Linquified, I'll probably end up going there. And I am also considering it might be better to generate a switch statement of the form:
switch (theSelector)
{
case "TTT:
break;
case "TFT:
break;
... more cases ...
case default;
break;
}
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
modified on Wednesday, August 17, 2011 1:49 AM
|
|
|
|
|
Hey Bill, why not generate a pattern like:
if (Is_A)
{
if (Is_B)
{
if (Is_C)
{
Console.WriteLine("Test 0 passed");
}
else
{
Console.WriteLine("Test 1 passed");
}
}
else
{
if (Is_C)
{
Console.WriteLine("Test 2 passed");
}
else
{
Console.WriteLine("Test 3 passed");
}
}
}
else
{
if (Is_B)
{
if (Is_C)
{
Console.WriteLine("Test 4 passed");
}
else
{
Console.WriteLine("Test 5 passed");
}
}
else
{
if (Is_C)
{
Console.WriteLine("Test 6 passed");
}
else
{
Console.WriteLine("Test 7 passed");
}
}
}
I didn't check whether they have the same test numbers, but the idea should be clear - instead of evaluating O(2^k) things, evaluate only O(k)
The switch should be even faster, provided you use integers.
|
|
|
|
|
Hi David, excellent feedback, thanks !
I did consider writing out the nested form as you present here, but, given I was just getting into the arcane world of *.tt file editing in VS Studio, and T4 syntax (coming to it without a background in ASP.NET content/code escape syntax conventions), decided it was easier to write the more 'linear' output first. Nested 'style' is coming.
Yes, I like your idea of using integers for the conditions in a switch statement: I'm already using those values in the bit-shifting I'm doing in the code-generator, anyway. See any downside in nested switch statements ?
If the code-generator writes out a comment in each clause documenting implicit meaning of the integer in terms of the boolean variables, that's enough for code maintenance ?
I kind of like the idea of dual approach: perhaps during development and testing using the 'linear form' expression might be productive, while using the 'compiled-to-switch denser' form for possible greater execution speed ?
best, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
|
|
|
|
|
 Hi David,
Nothing like writing out by hand an example of what you want, and testing it against all possible inputs, to make you aware of issues you'll face in generating it auto-magically ...
Wish I had the IL skills to look at how the compiler renders this and evaluate for efficiency. Is falling through to the 'default case' of a switch statement exactly the same as skipping over an 'if clause that has a 'return inside it ? I'd guess so.
Some obvious refactoring 'cries out' to be done in the 'brute force' example included here: mmm ... dare I utter the forbidden name: 'G*T*' ?
Would you really want to have beasts like this at large in your applications:
public static class NestedSwitch
{
public static string Test4(bool A, bool B, bool C, bool D)
{
int intSelectorA = A ? 8 : 0;
int intSelectorB = B ? 4 : 0;
int intSelectorC = C ? 2 : 0;
int intSelectorD = D ? 1 : 0;
switch (intSelectorA)
{
case 8:
switch (intSelectorB)
{
case 4:
switch (intSelectorC)
{
case 2:
switch (intSelectorD)
{
case 1:
return "ABCD";
default:
return "ABC!D";
}
default:
switch (intSelectorD)
{
case 1:
return "AB!CD";
default:
return "AB!C!D";
}
}
default:
switch (intSelectorC)
{
case 2:
switch (intSelectorD)
{
case 1:
return "A!BCD";
default:
return "A!BC!D";
}
default:
switch (intSelectorD)
{
case 1:
return "A!B!CD";
default:
return "A!B!C!D";
}
}
}
default:
switch (intSelectorB)
{
case 4:
switch (intSelectorC)
{
case 2:
switch (intSelectorD)
{
case 1:
return "!ABCD";
default:
return "!ABC!D";
}
default:
switch (intSelectorD)
{
case 1:
return "!AB!CD";
default:
return "!AB!C!D";
}
}
default:
switch (intSelectorC)
{
case 2:
switch (intSelectorD)
{
case 1:
return "!A!BCD";
default:
return "!A!BC!D";
}
default:
switch (intSelectorD)
{
case 1:
return "!A!B!CD";
default:
return "!A!B!C!D";
}
}
}
}
}
}
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
|
|
|
|
|
I'm pretty sure the compiler is going to be silly about it and not optimize it (edit: that is to say, the nested switches) at all, but I would suggest this:
int all = intSelectorA | intSelectorB | intSelectorC | intSelectorD;
switch (all)
{
case 0:
return "!A!B!C!D";
....
case 15:
return "ABCD";
}
|
|
|
|
|
Thanks, David, I'm going to take a break, and enjoy what's left of the day here at GMT + 7, and then think about your comments, and Luc's comments, tonight, or early am tomorrow.
I feel pretty clear that I want to implement something with both a 'verbose' form for development and testing, and a 'mean-lean' form for final use. So any further thoughts about what would be the form(s) you'd like to see would be welcome.
I am a devotee of the cult of making generic dictionaries with keyvalue pairs whose keys are whatever and whose values are executable code: I'll have to see if I can bootleg that in here
best, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
|
|
|
|
|
For mean and lean I'd definitely go for the Single Big Switch, it will be compiled to a jump table so it's constant time and without the often-mispredicted branches that the Tree Of If's would have.
As for a dictionary with executable values, yes that'll work.. you could just make it an array though, because the key would just be an integer in the range 0 - 2^k without holes.
I don't expect that to be any faster than the Single Big Switch though because it's essentially doing the same thing, only the switch has less overhead because the jump table would contain the address directly rather than references to objects that contain the address (as in an array of delegates)
|
|
|
|
|
Hi David,
... edit #1 ...
An hour of reading on the web about what happens when switch statements get turned into IL turned out to be fascinating: in some cases if-then-else statements are generated ... in other cases jump-tables of the form you might expect when a continuous series of values are used as selector ... in other cases generic dictionaries may be generated and used as a look-up based solution, and even hashtables !
... end edit #1 ...
I took a look at the IL code generated by a switch statement with an integer selector, and I see code like this:
IL_003e: switch (IL_00c4, IL_00d8, IL_00ec, IL_0100, IL_0114, IL_0128, IL_0309, IL_013c, IL_0150, IL_0165, IL_017a, IL_018f, IL_01a4, IL_01b9, IL_01ce, IL_01e3, IL_01f8, IL_020d, IL_0222, IL_0237, IL_024c, IL_0261, IL_0276, IL_0309, IL_028b, IL_029d, IL_02af, IL_02c1, IL_02d3, IL_02e5, IL_02f7)
IL_00bf: br IL_0309
IL_00c4: ldloc.1
IL_00c5: ldloc.1
IL_00c6: ldc.i4.1
IL_00c7: ldc.i4.1
IL_00c8: ldstr "!DCBAE"
IL_00cd: callvirt instance class Test5Parameters/BigSwitchBoolIntString Test5Parameters/BigSwitchBoolIntString::SetValue(class Test5Parameters/BigSwitchBoolIntString, bool, int32, string)
IL_00d2: stloc.2
IL_00d3: br IL_0314
...
all the other cases
...
Seems reasonable to assume the IL instruction 'switch' is turned into a jump-table by the JIT-compiler, but are you certain of this ? It's beyond my abilities to view/grok what's produced by JIT.
thanks, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
modified on Thursday, August 25, 2011 9:27 AM
|
|
|
|
|
assuming your number of bools is limited to 32 (or 64) I would pack them in an uint (or ulong), then work with that. So it could look like:
uint v=0;
if (Is_A) v|=0x00000001;
if (Is_B) v|=0x00000002;
...
followed by your decision tree, possibly a sequence of ifs:
if (v==someValue1) {
} else if (v==someValue2) {
}...
BTW1: the first code block could possibly be avoided altogether, either by using v everywhere, or by constructing an union/overlap structure (which is slightly harder in C# than it is in C/C++).
BTW2: the second code block could possibly be avoided also: when all your cases are so similar, you could simply write something like Console.WriteLine("Test"+v+" passed");
|
|
|
|
|
Thanks, Luc, good to see your 'light' shining in/on CP again ! Will look forward to thinking about your ideas.
I like the idea of a dual 'logic compiler' which can create either verbose or succinct output as desired.
best, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
|
|
|
|
|
Turns out creating an auto-generator for the 'big-switch' form of the logic-test class, as suggested by David1987, was really pretty easy.
Here's a link to a usable sample of its output[^], which I'd appreciate suggestions on in terms of code, format, style, etc.
In comments in the file is code which will test the complete matrix of logical possibilities for a three condition case.
thanks, Bill
"In the River of Delights, Panic has not failed me." Jorge Luis Borges
|
|
|
|
|
Hi Bill,
I had a play with this tonight and thought I'd post it for what it's worth. If the code generator needs to support a various number of boolean values, it may be of some use.
void Start()
{
bool[] bools = new bool[] { true, false, true};
string testCase = "Test case " + FindTestCase(bools);
}
int FindTestCase(bool[] bools)
{
string arrayCode = BoolArrayToBinaryString(bools);
foreach (string code in BoolArrayToBinaryStringPermuations(bools.Length))
if (code == arrayCode)
return Convert.ToInt32(code, 2);
return -1;
}
string BoolArrayToBinaryString(bool[] bools)
{
StringBuilder sb = new StringBuilder();
foreach (bool b in bools)
sb.Append(Convert.ToInt32(b));
return sb.ToString();
}
IEnumerable BoolArrayToBinaryStringPermuations(int count)
{
double max = Math.Pow(2, count);
for (int i = 0; i < max; i++)
yield return Convert.ToString(i, 2).PadLeft(count, '0');
}
[modified] took into account David's comments but decided against << due to readablity. Changed code and assigned a max variable so I was only killing the kitten once instead of at every iteration
"You get that on the big jobs."
modified on Friday, August 19, 2011 1:37 PM
|
|
|
|
|
RobCroll wrote: Math.Pow(2, count)
Please no. Every time you do that, a kitten dies.
Use this instead: 1 << count
|
|
|
|
|
I've never liked cat's anyway but wow I never knew about <<, I thought it was c syntax. Can you provide a link, google isn't play ball with "<<".
"You get that on the big jobs."
|
|
|
|
|