|
Apparently this is a popular shuffling algorithm. Unfortunately it has a couple of problems with it.
First, OrderBy is not a linear time operation, while shuffling should be. That is already sufficient reason to stop using it to shuffle, especially on big lists, but there is more.
Less obviously, it does not choose a permutation with uniform probability across all possible permutations. This will be clear if we intentionally make it worse first and work from there. Consider
list.OrderBy(x => random.Next() & 1)
Now every element only gets a random bit. Assume we shuffle a list with 2 elements A and B, there are only 2 outcomes, AB and BA. BA happens when the first element gets a 1 and the second element gets a 0. AB happens in all other cases, because in the case of a draw they are left in their original order.
This is obviously bad. One outcome is chosen with probability 0.25, and the other with probability 0.75. With more random bits both probabilities get closer and closer to 0.5, but they never quite make it all the way there.
Larger lists are worse. The probability of two elements being assigned the same random number gets large enough to worry about really quickly, which you have heard about as the birthday paradox. Any time a group of elements is assigned the same random number, those elements will not be reordered. Of course it is OK to not reorder elements, but the probability of that happening is larger than it should be. This is especially noticeable for lists larger than the space from which random numbers are drawn.
Consider what would have happened if the snippet above was used to sort a list of 3 elements, A B and C. At least 2 of them must get the same random value, because there are only two different values to randomly draw. That means that at least 2 elements remain in their original order, so it is impossible to generate CBA. For the full number of random bits this problem only happens for lists with over 2 billion elements, but for shorter lists it is still the case that permutations with many inversions are generated with a probability that is too low.
Here's an other way to look at it. Clearly it would work if all elements were assigned a unique random number. That is equivalent to applying a permutation with a key-value sort, which is a common (though inefficient) technique. It doesn't even need to be a permutation on the numbers 0 to N-1, it's OK if there are "gaps", that's just a relabeling. So as long as only unique random numbers are generated, it's fine. What is the probability that that does not happen? If I got the formula right then that means that for a list of length 10000 there is already a probability of 2% that you're in a "bad case" (with some duplicate), quickly growing to 90% for size 100000. In practice that would still be really hard to notice since those "bad cases" don't create an immediate problem, they're just subtly skewing the probability distribution, but it is still wrong.
|
|
|
|
|
Oh! A lecture!
You forgot only two very important elements of a lecture:
1) Somewhere in the middle you must say this, best after you have confused most of the audience:
"Obviously (something obscure) is (some even more obscure property).
2) At the end you must mention that these only were the essentials and say something like this:
"I leave (some terribly complicated case) as an exercise for you."
I have lived with several Zen masters - all of them were cats.
|
|
|
|
|
Well then how about this: I'll leave the calculation of the actual probability distribution as an exercise.
|
|
|
|
|
I have not done that in a long time.
I have lived with several Zen masters - all of them were cats.
|
|
|
|
|
I was waiting for the explaination how I can use this code I can cut and paste to break Las Vegas.
Installing Signature...
Do not switch off your computer.
|
|
|
|
|
Here you go:
public void BreakLasVegas(MissileLauncher Launcher)
{
if(MissileAcquireAndCommandCheck == Enums.MissileStatus.Go)
{
if(SetTarget("LasVegas") == Enums.MissileStatus.Go)
{
if(CheckSecurityLocksAndAuthorization() == Enums.MissileStatus.Go)
{
Launcher.Launch();
}
}
}
}
I have lived with several Zen masters - all of them were cats.
|
|
|
|
|
CodeWraith wrote: I have lived with several Zen masters - all of them were cats.
The difficult we do right away...
...the impossible takes slightly longer.
|
|
|
|
|
harold aptroot wrote: it does not choose a permutation with uniform probability across all possible permutations
You're assuming that you actually want a uniform probability. If you're displaying the results to a human, that's going to end up "feeling" less random:
Why 'random' shuffle feels far from random | The Independent[^]
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
It fails that even worse, since it's biased towards keeping elements in order, while humans are biased towards expecting "mixing".
|
|
|
|
|
Nice! I was quite young when given this problem. (It is true: I was quite young in 1980). The language was Fortran and the container was an array of integers. So I solved by looping over all indexes and unconditionally swapping current index with random index. As far as I can tell this (accidentally) does not have the same flaw as your example. (AB happens after zero and two swaps, BA happens after either one of the possible swaps).
But what I am really after is: for this operation would I be wrong if I said that an indexable array, with my teenage algo, would give better performance than a linked list with the above code?
... such stuff as dreams are made on
|
|
|
|
|
"Lecture," or not, I really appreciate being able to read this ! In your view is Fisher-Yates enough for us mere mortals ?
cheers, Bill
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
|
|
|
|
|
Fisher-Yates is fine, you can even write a lazy version that yield return s the elements as soon as they have been chosen (it's really a sequence of random choices out of "the set of item you haven't had yet", every choice is final). Of course one should then be careful about their source of random numbers, for example a freshly-seeded instance of System.Random only has at most 31 bits of entropy (fewer if seeded with the current time) which on large shuffles limits the set of possible outcomes.
|
|
|
|
|
Hi, Harold,
I am curious if you consider this an implementation of what you describe:
public static class EnumerableExtensions
{
private static Random rnd;
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
{
rnd = new Random(DateTime.Now.Millisecond);
List<T> slist = source.ToList();
int scount = slist.Count();
int ndx;
while (--scount >= 0)
{
ndx = rnd.Next(0, scount);
yield return slist[ndx];
slist.RemoveAt(ndx);
}
}
} Other than using a different random generator, do you see any way to improve the "randomness" of this ?
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
|
|
|
|
|
It's probably fine WRT randomness, but it's accidentally a quadratic time algorithm now thanks to the RemoveAt in the middle. Instead of doing that, we could move slist[scount - 1] into slist[ndx] (optionally set slist[scount - 1] to default(T) ).
|
|
|
|
|
I think you mean:
while (--scount >= 0)
{
ndx = rnd.Next(0, scount);
yield return slist[ndx];
slist[ndx] = slist[scount];
} Using 'scount - 1 would throw an error at the last iteration. I can't make sense of the suggestion re using 'default T. I don't go deep enough down the stack to evaluate the expense of a call to'RemoveAt vs. a shift of a value to another slot.
By the way, this discussion is a perfect example of content on the Lounge that, imho, should be happening on (or preserved on) another Forum. But, I have beaten the drum over this issue so often that I'm not going there
thanks. Bill
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
|
|
|
|
|
I don't think it would be a problem because scount should never be 0 anyway (so that condition is slightly off). If scount is 0 that means there are 0 things to choose from.. Anyway here's an other possible implementation, from Jon Skeet:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
for (int i = elements.Length - 1; i >= 0; i--)
{
int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
}
}
Essentially he's working with an i that is always 1 lower than in your example, so there is no -1 but there is a +1.
|
|
|
|
|
Well, anytime I see code I've written without googling even slightly resembles Jon Skeet's code ... I feel a transient sense of not being totally stupid :
This will definitely throw an index error:
private static Random rnd;
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
{
rnd = new Random((Int32)DateTime.Now.Ticks);
List<T> slist = source.ToList();
int scount = slist.Count();
int ndx;
while (--scount >= 0)
{
ndx = rnd.Next(0, scount);
yield return slist[ndx];
slist[ndx] = slist[scount - 1];
}
}
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
|
|
|
|
|
Ok I get it, the non-standard loop threw me off. Your index is also "one lower" so it's also slist[scount] and rnd.Next(0, scount + 1) . It just depends on whether scount starts at the length of the list or at the highest index. The correct thing to combine with `rnd.Next(0, scount) is slist[scount - 1] but then of course the index needs to be one higher.
|
|
|
|
|
If you're seeding off of a time, you've got a much more serious problem than just a low number of bits of entropy. You're using a predictable seed which means an attacker can guess it relatively easily and pwn you much faster than brute force.
At an absolute minimum, use a Cryptographic RNG to seed Random. You could use RandomNumberGenerator directly everywhere; and for crypto need to if you want to be secure instead of a subject of lulz on hackernews a few years from now when you're epicly pwnd. But since it only outputs byte streams using it to create arbitrary integer ranges or floats is a PITA.
MSDN is lacking in details (shocking), but presumably to be cryptographically secure it seeds itself using the hardware RNG build into CPUs for the last 10 or 15 years (driven by thermal noise but too slow for use in anything that needs a lot of values).
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
Dan Neely wrote: If you're seeding off of a time, you've got a much more serious problem than just a low number of bits of entropy. You're using a predictable seed which means an attacker can guess it relatively easily and pwn you much faster than brute force. I'd be very interested to hear more on this.
thanks, Bill
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
|
|
|
|
|
If your seed is time based, an attacker only has to test the seeds from potential periods when it was restarted instead of all possible values. If you use a single Random app wise and initialized on startup or first use then by watching for an indication of a restart they can often narrow it down to a few minutes or an hour that was used for the seed. And even if they can'd do that if it's a Windows server it's probably been rebooted in the last month for patch Tuesday. Even if you're not rebooting the OS version can tell a lot. eg Windows Server 2016 only came out in September, so you've got a year of timestamps to work with. Finding a bug that lets you crash the server lets you narrow the start time, and thus the seeds you need to test to a few minutes window.
For the specific example in the thread new Random(DateTime.Now.Millisecond); there are only a 1000 possible seeds 0-999; testing all of them until you find the one that generates a random sequence which matches your output is trivial since you've only got 10 bits of entropy.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
Thanks, Dan, for the fascinating reply. Usually I use 'Ticks:
Random rnd = new Random((Int32)DateTime.Now.Ticks);
not 'Milliseconds, for a seed ... an old habit: I can't even remember why I started doing ... I believe that's probably a waste of time since, I read, the .NET implementation (now ?) uses 'Ticks by default.
cheers, Bill
«While I complain of being able to see only a shadow of the past, I may be insensitive to reality as it is now, since I'm not at a stage of development where I'm capable of seeing it. A few hundred years later another traveler despairing as myself, may mourn the disappearance of what I may have seen, but failed to see.» Claude Levi-Strauss (Tristes Tropiques, 1955)
modified 25-Oct-17 12:49pm.
|
|
|
|
|
That'd be marginally less bad in that Ticks is higher in resolution, but is still approximately predictable based on knowing when the app started.
Ultimately it's MS's fault though for:
0) Not having a crypto secure source for a seed in .net 1.0
1) Not having the courage to break the legacy default behavior in 2.0 by having their shiny new crypto RNG at least seed Random()
2) Not making the Crypto RNG classes feature rich enough to be dropin replacements for the legacy class.
There are cases were all you need is good enough randomness and an RNG that runs an order of magnitude faster than a cryptographically secure one is preferable; but outside of a hotloop that should never be the case. Because developers are lazy and stupid and lazy and copypasta addicts and lazy the default Random() should be a cryptographically secure RNG seeded with a hardware RNG. For cases where the exception is needed both Random and FasterInsecureRandom can both implement IRandom for ease of swapping the latter in.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
If the shuffle has to be cryptographically strong then System.Random is not an option at all anyway..
|
|
|
|
|
I have to admit that I'm guilty of this... I need to solve a problem, so I Google, find a working solution and paste it into my code. It works, but you don't necessarily understand the solution. On the one hand having answers so easily available speeds up development, but on the other it makes for developers that don't really understand what they are doing?
|
|
|
|
|