Hey guys!
Recently I posted the question "Picking entities from a group[^]" on CodeProject. I was advised to move it to this forum because some of you clever folks may have an approach.
My problem is that I have a group of n people (in my example I said 8, but let's say a minimum of four). I want each person in the group to point to a different person in the group. This means no person can point to himself, and if a person points to someone in the group, no other person can point to that same person. So everyone in the group points to exactly one person, and everyone in the group is pointed at exactly once. I also want the process to be random.
For now I've done this by creating a generic list of 'allowed' persons, picking a random person from the list, removing that person from the list, and so on.
Now here comes the difficult part...
I also want to be able to configure impossible combinations, like: Person A cannot point to Person B, and Person B cannot point to Person A.
This makes the process much more difficult. I'd like the ability to test whether a certain configuration is possible, and to pick only among the possible 'combination sets'.
My solution for now is a trial-and-error method. I start a large loop and do the random picking just like above: I generate a list of possibilities and pick one of them. If the list of possibilities happens to be empty, that means I ran into an impossible 'combination set', so I restart the entire process. If the picking fails a certain number of times, I assume the configuration is impossible. Roughly, it looks like the sketch below.
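A simplified sketch of the idea (not my exact code; the names and the retry limit are made up):

using System;
using System.Collections.Generic;
using System.Linq;

static class Pointing
{
    private static readonly Random Rng = new Random();

    // Returns target[i] = the person that person i points to, or null when
    // maxAttempts successive tries all ran into a dead end.
    public static int[] TryPick(int n, Dictionary<int, HashSet<int>> forbidden, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            var remaining = new List<int>(Enumerable.Range(0, n)); // people not yet pointed at
            var result = new int[n];
            bool deadEnd = false;

            for (int person = 0; person < n && !deadEnd; person++)
            {
                // Allowed targets: still available, not yourself, not forbidden.
                var candidates = remaining
                    .Where(t => t != person)
                    .Where(t => !forbidden.ContainsKey(person) || !forbidden[person].Contains(t))
                    .ToList();

                if (candidates.Count == 0)
                {
                    deadEnd = true; // impossible 'combination set': retry from scratch
                }
                else
                {
                    int target = candidates[Rng.Next(candidates.Count)];
                    result[person] = target;
                    remaining.Remove(target);
                }
            }

            if (!deadEnd) return result;
        }
        return null; // gave up - but this doesn't PROVE the configuration is impossible
    }
}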
As you'll understand, my solution isn't ideal, because there's no way to check up front whether a configuration is possible, and when my picking routine tells me a configuration is impossible, I can't actually be sure that it is.
I think I've explained my problem well; my English is fairly good but far from perfect. I hope you guys can point me in a good direction.
Thanks! Eduard
.: I love it when a plan comes together :.
|
This sounds like a type of directed[^] graph[^] in which every node has indegree and outdegree 1. It has similarities to the Hamiltonian path problem[^], except that such a graph can decompose into multiple disjoint cycles rather than a single path.
Not sure if that helps at all, or just confuses the problem! Hopefully, it might give you somewhere to start looking.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Here's another way of thinking about this:
You have an array of values. The index into the array is the zero-based number of the person doing the pointing, and the value is the zero-based index of the person being pointed at. This automatically satisfies the rule that each person can only point to one other person. You then only need to validate that nobody is pointing to themselves, nobody is pointing to a forbidden person, and there are no duplicates.
// Requires: using System; using System.Collections.Generic; using System.Linq;
public struct ForbiddenConnection
{
    private readonly int _source;
    private readonly int[] _destination; // sorted, de-duplicated targets that _source may not point at

    private ForbiddenConnection(int source, int[] destination)
    {
        _source = source;
        _destination = destination;
    }

    public static ForbiddenConnection Create(int source, IEnumerable<int> destination)
    {
        return new ForbiddenConnection(source, SortedDistinct(destination).ToArray());
    }

    public static ForbiddenConnection Create(int source, params int[] destination)
    {
        return Create(source, destination.AsEnumerable());
    }

    private static IEnumerable<int> SortedDistinct(IEnumerable<int> source)
    {
        if (source == null) yield break;
        bool started = false;
        int lastItem = 0;
        foreach (int item in source.OrderBy(x => x))
        {
            if (started && item == lastItem) continue;
            yield return item;
            lastItem = item;
            started = true;
        }
    }

    public int Source
    {
        get { return _source; }
    }

    public bool Contains(int destination)
    {
        // The array is sorted and distinct, so a binary search suffices.
        return _destination != null && Array.BinarySearch(_destination, destination) >= 0;
    }
}
// Also requires: using System.Diagnostics;
public sealed class GraphGenerator : IEnumerable<IReadOnlyList<int>>
{
    private readonly int _length;
    private readonly IDictionary<int, ForbiddenConnection> _forbidden;

    private GraphGenerator(int length, IEnumerable<ForbiddenConnection> forbidden)
    {
        _length = length;
        if (forbidden != null)
        {
            _forbidden = forbidden.ToDictionary(fc => fc.Source);
        }
        else
        {
            _forbidden = new Dictionary<int, ForbiddenConnection>(0);
        }
    }

    public static GraphGenerator Create(int length, IEnumerable<ForbiddenConnection> forbidden)
    {
        if (length < 2) throw new ArgumentOutOfRangeException("length");
        return new GraphGenerator(length, forbidden);
    }

    public static GraphGenerator Create(int length, params ForbiddenConnection[] forbidden)
    {
        return Create(length, forbidden.AsEnumerable());
    }

    private bool Validate(IReadOnlyList<int> graph)
    {
        Debug.Assert(graph != null);
        Debug.Assert(graph.Count == _length);
        for (int i = 0; i < graph.Count; i++)
        {
            int index = graph[i];
            Debug.Assert(0 <= index && index < _length);
            if (index == i)
            {
                Trace.TraceInformation("Node {0} points to itself.", i);
                return false;
            }
            ForbiddenConnection forbidden;
            if (_forbidden.TryGetValue(i, out forbidden) && forbidden.Contains(index))
            {
                Trace.TraceInformation("Node {0} cannot point to node {1}.", i, index);
                return false;
            }
            for (int j = i + 1; j < graph.Count; j++)
            {
                if (graph[j] == index)
                {
                    Trace.TraceInformation("Nodes {0} and {1} both point to node {2}.", i, j, index);
                    return false;
                }
            }
        }
        return true;
    }

    public IEnumerator<IReadOnlyList<int>> GetEnumerator()
    {
        // Placeholder so the snippet compiles on its own;
        // the real body is shown below, after the Permutation type is defined.
        throw new NotImplementedException();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Now you just need to generate every permutation of indices for your graph, and return those which are valid. Eric Lippert has an excellent seven-part series on generating permutations[^], so we can borrow some of his code:
public struct Permutation : IReadOnlyList<int>
{
    public static readonly Permutation Empty = default(Permutation);
    private readonly int[] _value;

    private Permutation(Permutation source, int index, int value)
    {
        if (source.Count == 0)
        {
            _value = new[] { value };
        }
        else
        {
            _value = new int[source.Count + 1];
            _value[index] = value;
            if (index != 0)
            {
                Array.Copy(source._value, 0, _value, 0, index);
            }
            if (index != source.Count)
            {
                Array.Copy(source._value, index, _value, index + 1, source.Count - index);
            }
        }
    }

    public int Count
    {
        get { return (_value == null) ? 0 : _value.Length; }
    }

    public int this[int index]
    {
        get
        {
            if (0 > index || index >= Count) throw new ArgumentOutOfRangeException("index");
            return _value[index];
        }
    }

    public IEnumerator<int> GetEnumerator()
    {
        if (_value == null) return Enumerable.Empty<int>().GetEnumerator();
        return _value.AsEnumerable().GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public static IEnumerable<Permutation> HamiltonianPermutations(int ofSize)
    {
        if (ofSize < 0) throw new ArgumentOutOfRangeException("ofSize");
        return HamiltonianPermutationsIterator(ofSize);
    }

    private static IEnumerable<Permutation> HamiltonianPermutationsIterator(int ofSize)
    {
        if (ofSize == 0)
        {
            yield return Empty;
            yield break;
        }
        bool forwards = false;
        foreach (var permutation in HamiltonianPermutationsIterator(ofSize - 1))
        {
            for (int index = 0; index < ofSize; index++)
            {
                int insertAt = forwards ? index : ofSize - index - 1;
                yield return new Permutation(permutation, insertAt, ofSize - 1);
            }
            forwards = !forwards;
        }
    }
}
This leads to a very simple implementation of the GraphGenerator.GetEnumerator method:
public IEnumerator<IReadOnlyList<int>> GetEnumerator()
{
    return Permutation.HamiltonianPermutations(_length)
        .Cast<IReadOnlyList<int>>()
        .Where(Validate)
        .GetEnumerator();
}
Using it would look something like this:
var generator = GraphGenerator.Create(8,
    ForbiddenConnection.Create(0, 1),
    ForbiddenConnection.Create(1, 0)
);

if (!generator.Any())
{
    // The configuration is impossible.
}
For eight people with no forbidden connections, there are 14833 possible graphs. On my computer, enumerating every possible graph takes approximately 0.11 seconds, which should be fast enough for most purposes.
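As a cross-check, 14833 is exactly the number of derangements (permutations with no fixed points) of eight elements:

!8 = 8! x (1 - 1/1! + 1/2! - 1/3! + 1/4! - 1/5! + 1/6! - 1/7! + 1/8!) = 14833

so the brute-force enumeration agrees with the closed-form count.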
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Upvoted.
What a fascinating answer! I look forward to studying this code.
Thanks, Bill
Google CEO Eric Schmidt: "I keep asking for a product called Serendipity. This product would have access to everything ever written or recorded, know everything the user ever worked on and saved to his or her personal hard drive, and know a whole lot about the user's tastes, friends and predilections." 2004, USA Today interview
|
I'm using the "System.Speech" library to recognize voice commands, but I want to use genetic algorithms to improve recognition. I'm new to GAs and have some questions:
How do I implement this (what is the population in this situation? the genomes?)
Can this be done using the C# speech recognition library, or should I use an FFT?
Any other advice will be appreciated. Thank you
|
The current thinking is to calculate a hash for every file and then delete the duplicates.
We can use MD5 or SHA1 to calculate the hash. From a speed point of view, MD4 is a better choice, since MD4 is faster but less secure. But after experimenting, I found that even MD4 is so slow that it costs a lot of time when dealing with large files.
So, is there any algorithm that works faster?
|
If the name and size are the same, then delete them.
|
Yeah, it is a good idea.
Another situation is that the contents of the files are the same but the names are different.
I'd also like to know which kind of hash algorithm has the best speed, or any other way to speed things up.
|
You can considerably speed up the process: if you have mostly big files, you can use a two-step hash comparison. In the first step you take only, say, 1-5 KB from the beginning of the file to build your hash (I'd recommend SHA2[^]). You only perform a hash calculation over the complete file if the prefix hashes of two files match. A sketch follows below.
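A minimal sketch of the two-step comparison (SHA-256 standing in for SHA2; the 4 KB prefix size is just an example):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class TwoStepHash
{
    // Cheap first pass: hash only the first few KB of the file.
    public static string PrefixHash(string path, int prefixBytes = 4096)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[prefixBytes];
            int read = stream.Read(buffer, 0, buffer.Length);
            return Convert.ToBase64String(sha.ComputeHash(buffer, 0, read));
        }
    }

    // Expensive second pass: hash the whole file.
    public static string FullHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }

    // Group by prefix hash first; only fully hash groups with more than one file.
    public static IEnumerable<IGrouping<string, string>> FindDuplicates(string[] paths)
    {
        return paths
            .GroupBy(PrefixHash)
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .GroupBy(FullHash)
            .Where(g => g.Count() > 1);
    }
}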
Best,
Manfred
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
|
Thanks for your advice.
However, I wonder why SHA2 should be a better choice when it is slower than MD5 or MD4?
|
It was developed to be more robust, i.e. the probability of collisions is reduced.
Just look it up; there are plenty of good explanations on the internet of the actual improvements over its predecessors.
Cheers!
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
|
I found a similar solution when I had to recover 4.5 TB of files, mostly between 0.5 and 15 GB, where around 1-2% of them had been terminated early during a copy process but were left at the correct file size, padded with zeroes.
It adds to your suggestion: scan all big files with big fixed strides plus some random jumps, all the way up to their ends.
|
It is almost certain that the problem is not your hash algorithm but I/O. All (or most) hash algorithms are short and fast enough not to be the bottleneck...
If you have a large number of files, you have to rethink your approach:
1. Use the file system's FileInfo - name, size, creation time, last-modified time, and so on.
2. If that isn't enough, you might consider hashing only the first block (4 KB) of every file and going further only for those found to be the same.
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is (V).
|
Thank you, V. ^_^
Your advice is good. In the beginning I did not have a clear understanding of this question.
Now I agree I/O should be the key point, and I hope I can find a way to resolve my problem.
Thanks!
|
Yes, I/O is a key issue...
If it is in your power, you might move the storage to something more capable, like SCSI or flash; that should speed up I/O...
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is (V).
|
I wrote a little app to do this for my own use, and I used multiple tests.
For every file to compare, find:
1. The size - if they aren't the same size, they aren't the same file.
2. A copy of the first 100 bytes - if the first 100 bytes don't match, they aren't the same file. This is easy and fast to compare, requires no computation, and doesn't take much to store.
That stage is very fast. It generally takes far longer just to get the list of files than it does to run all of those tests.
After that, calculate the hash (I used SHA1) of the first few KB of each of the remaining files, and compare the hashes.
After those two stages, you will have eliminated most of the non-duplicates. Then you can do a full hash of the remaining files to find the definite duplicates. A sketch of the first two stages follows.
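A rough sketch of those first two stages as a LINQ pipeline (illustrative only; the partial and full hashing from the earlier replies would run on whatever survives):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class StagedDuplicateFilter
{
    private static byte[] First100Bytes(string path)
    {
        var buffer = new byte[100];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read); // shorter files keep only what was read
        }
        return buffer;
    }

    // Stage 1: group by size. Stage 2: group by the first 100 bytes.
    // Only groups that still contain more than one file are worth hashing.
    public static IEnumerable<string[]> HashCandidates(string[] paths)
    {
        return paths
            .GroupBy(p => new FileInfo(p).Length)
            .Where(g => g.Count() > 1)
            .SelectMany(g => g
                .GroupBy(p => Convert.ToBase64String(First100Bytes(p)))
                .Where(inner => inner.Count() > 1))
            .Select(g => g.ToArray());
    }
}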
|
You might give that code a polish and publish it here as a tip or article...
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is (V).
|
I agree, one can have multiple types of algorithms based on suitability:
1. Check the sizes; if they do not match, then the files are not the same.
2. Compare 2% of the characters from the beginning and then from the end; if those do not match, then the files are not the same.
3. Finally, one option is to simply compare the remaining parts of both files, or to use a CRC, checksum, or MD/SHA hash. If the MD/SHA hashes are not already calculated and stored, then those will be a bit costly.
Manoj
Never Gives up
|
Store each file's byte size and a simple CRC code.
Then loop: first check the byte sizes, and only if they are equal perform the CRC or hash check.
Or do the hash check only when the byte sizes are equal, storing the hash for future use.
|
Hi,
I have a 2D color map (without a color bar depth key) that depicts depth in various colors. Ideally I would like to get an estimate of all relative depths from a function of the RGB values of a color.
What I do know is the colors of the highest, middle and lowest points. So I have something like:
Fn(R,G,B) = Depth
Fn(255,255,0) = 1000
Fn(125,125,0) = 500
Fn(22, 125, 20) = 0
I realise that there won't be an exact solution and some sort of linear assumption would have to be made, but can anyone suggest a good way to approach this? (assuming it is do-able!)
I guess the images I'm talking about would look similar to this one but have various non-rainbow color changes and also have no depth key:
Similar map link
Many thanks for any assistance.
|
A simple, naive approach is to collect as many (R, G, B, Depth) tuples as you can get, then do multiple linear regression with Depth as the dependent variable. A sketch of the fit follows.
If the linear assumption doesn't give satisfactory results, you can try multiple polynomial regression, or split the data set into two partitions and run the regression separately on each.
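A minimal sketch of the linear fit, Depth = b0 + b1*R + b2*G + b3*B, solved via the normal equations (everything here is illustrative; with four coefficients you need at least four independent samples for a unique fit):

using System;

static class DepthRegression
{
    // samples: rows of { R, G, B }; depths: the matching depth values.
    // Returns { b0, b1, b2, b3 }.
    public static double[] Fit(double[][] samples, double[] depths)
    {
        const int k = 4;                    // intercept + R + G + B
        var ata = new double[k, k];         // X^T X
        var aty = new double[k];            // X^T y
        for (int s = 0; s < samples.Length; s++)
        {
            var row = new[] { 1.0, samples[s][0], samples[s][1], samples[s][2] };
            for (int i = 0; i < k; i++)
            {
                aty[i] += row[i] * depths[s];
                for (int j = 0; j < k; j++) ata[i, j] += row[i] * row[j];
            }
        }
        return Solve(ata, aty);
    }

    // Gaussian elimination with partial pivoting on a small square system.
    private static double[] Solve(double[,] a, double[] b)
    {
        int n = b.Length;
        for (int col = 0; col < n; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            for (int c = 0; c < n; c++)
            {
                double t = a[col, c]; a[col, c] = a[pivot, c]; a[pivot, c] = t;
            }
            double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;

            for (int r = col + 1; r < n; r++)
            {
                double factor = a[r, col] / a[col, col];
                for (int c = col; c < n; c++) a[r, c] -= factor * a[col, c];
                b[r] -= factor * b[col];
            }
        }
        var x = new double[n];
        for (int r = n - 1; r >= 0; r--)
        {
            double sum = b[r];
            for (int c = r + 1; c < n; c++) sum -= a[r, c] * x[c];
            x[r] = sum / a[r, r];
        }
        return x;
    }
}

Fit the three known anchor colors plus any other (color, depth) pairs you can sample from the image, then evaluate b0 + b1*R + b2*G + b3*B for each pixel to estimate its depth.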
|
Many thanks for the help.
I'm pretty sure that linear regression won't work, because I have an image that has large Green values at both high and low depth.
If I understand it right, polynomial regression on R, G, or B may sometimes work, but I think the depth can often be an unknown function of all three colours.
I wonder if it makes sense to combine the best polynomial regression fit for each one somehow?
|
"...I have an image that has large Green values at both high and low depth."
I suspected this, which is why I suggested dividing the space into separate partitions and solving them independently. Statisticians call this "elaboration".
|