That sounds horribly familiar.
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
Alan Kay.
[sarcasm on]
Everyone knows foreign keys were imposed by the storage-device manufacturers' lobby.
[sarcasm off]
"I'm neither for nor against, on the contrary." John Middle
Haha.. this is funny - not! I've come across this all the time. The problem "back then" was application designers who had no idea, zilch, nada, about database structures. Yet here we are in 2018 and I still see the same mistakes. Primary key every table with an auto-fill ID column. Yes, you have a primary key, congratulations! But what's the point if you don't have a reference to the ID from another table? Did someone forget that the ID Primary Key column is a machine necessity, if at all? Like who cares if my next record starts with ID 2001!? A true Primary Key is one created based on data "snippets" off other columns in the table. Yeah baby... bring back Excel databases!
Just do what I do when databases have no relations... DROP DATABASE.. oh but don't forget to save your data first.
Fandango90 wrote: A true Primary Key is one created based on data "snippets" off other columns in the table.
The "natural vs surrogate primary key" debate is like tabs vs spaces - some people insist that there's only one "correct" way to do it, whilst others make the decision on a case-by-case basis.
SQL Server: Natural Key Verses Surrogate Key — DatabaseJournal.com[^]
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
Fandango90 wrote: the same mistakes. Primary key every table with an auto-fill ID column
Sorry, I don't see how this is a mistake.
Fandango90 wrote: Yes, you have a primary key, congratulations! But what's the point if you don't have a reference to the ID from another table?
I think you may have misunderstood the OP's issue. They didn't de-normalize, only removed the FK constraints.
Fandango90 wrote: Did someone forget that the ID Primary Key column is a machine necessity, if at all? Like who care my next record starts with ID 2001!?
I'm not even sure what you mean by 'machine necessity'. Perhaps it's sarcasm and I'm not getting it.
Fandango90 wrote: A true Primary Key is one created based on data "snippets" off other columns in the table
Again, I can't tell if this is sarcasm or a real opinion. So an identity or guid is not a true PK, but a couple of columns where one or more values can change is??? (or, just throw in a timestamp to be sure!) It sure makes future record maintenance a lot easier when I can tag a record with a single condition. Anyway, it would probably make an interesting poll.
"Go forth into the source" - Neal Morse
A relation may have multiple candidate keys, any one of which may be selected as primary key.
Any attribute that is subject to mutation is not suitable, because a value that can change cannot reliably identify a row.
The claim that "a true PK is one created on data 'snippets' off other columns in the table" is exactly the wrong way round in terms of normalisation. Every other column should be dependent on the whole of every candidate key, and have no dependencies on anything else. Here, Fandango seems to be proposing exactly the opposite - making the PK dependent on every other column - which is pure insanity, and worthy of an entry in this forum all by itself.
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
Alan Kay.
Maybe you did it one night after taking some Adderall...
#SupportHeForShe
Government can give you nothing but what it takes from somebody else. A government big enough to give you everything you want is big enough to take everything you've got, including your freedom.-Ezra Taft Benson
You must accept 1 of 2 basic premises: Either we are alone in the universe or we are not alone. Either way, the implications are staggering!-Wernher von Braun
Was probably having issues removing a row! HAAAAAA!
You probably all knew this, but I needed a definitive answer.
I'm using a ConcurrentQueue which is a FIFO collection.
I needed to know whether, when I went to determine if the queue already had the item, it would return faster when the item was found at the top (next item out) versus found at the bottom (last-out position).
I assumed, but wasn't sure.
This code gives a good definitive answer.
ConcurrentQueue<int> mainList = new ConcurrentQueue<int>();
Console.WriteLine("Adding items...");
for (int x = 1; x < 10000000; x++) {
    mainList.Enqueue(x);
}
Console.WriteLine("DONE adding items...");
Stopwatch sw = new Stopwatch();
Console.WriteLine("Searching for FIRST item...");
sw.Start();
mainList.FirstOrDefault(item => item == 1);
sw.Stop();
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
sw = new Stopwatch();
Console.WriteLine("Searching for LAST item...");
sw.Start();
mainList.FirstOrDefault(item => item == 9999999);
sw.Stop();
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
Output
Adding items...
DONE adding items...
Searching for FIRST item...
0ms
Searching for LAST item...
99ms
The consistent result is around 100ms on my machine.
EDIT
This addition seems to bear out the expected results too:
for (int z = 9999999; z >= 1; z--) {
    if (z % 1000000 == 0) {
        sw = new Stopwatch();
        sw.Start();
        mainList.FirstOrDefault(item => item == z);
        sw.Stop();
        Console.WriteLine("{0} : {1}ms", z, sw.ElapsedMilliseconds);
    }
}
Output for this section is:
9000000 : 99ms
8000000 : 89ms
7000000 : 78ms
6000000 : 71ms
5000000 : 55ms
4000000 : 44ms
3000000 : 33ms
2000000 : 21ms
1000000 : 10ms
Interesting that on my machine it is about 10ms per 1,000,000 items.
EDIT 2
But, wait...there's more!
What about for objects?
I'm creating a queue of anonymous objects via dynamic -- basically cuz I'm lazy.
Then I test to see how long it takes to find the matching value. Answer: Longer
Right, we all assumed longer. The amount of time could just be unboxing of object to get value or whatever.
This answers the question for me though, because I'm going to be putting a bunch of items in a ConcurrentQueue and I want to add more items but don't want to add an item if the collection already contains the item.
void Main()
{
    ConcurrentQueue<dynamic> mainList = new ConcurrentQueue<dynamic>();
    Console.WriteLine("Adding items...");
    for (int x = 1; x < 10000000; x++) {
        mainList.Enqueue(new { x = x, name = "x_" + x });
    }
    Console.WriteLine("DONE adding items...");
    Stopwatch sw = new Stopwatch();
    Console.WriteLine("Searching for FIRST item...");
    sw.Start();
    mainList.FirstOrDefault(item => item.x == 1);
    sw.Stop();
    Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
    sw = new Stopwatch();
    Console.WriteLine("Searching for LAST item...");
    sw.Start();
    mainList.FirstOrDefault(item => item.x == 9999999);
    sw.Stop();
    Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
    for (int z = 9999999; z >= 1; z--) {
        if (z % 1000000 == 0) {
            sw = new Stopwatch();
            sw.Start();
            mainList.FirstOrDefault(item => item.x == z);
            sw.Stop();
            Console.WriteLine("{0} : {1}ms", z, sw.ElapsedMilliseconds);
        }
    }
}
Output Looks Like
Adding items...
DONE adding items...
Searching for FIRST item...
4ms
Searching for LAST item...
753ms
9000000 : 643ms
8000000 : 575ms
7000000 : 496ms
6000000 : 430ms
5000000 : 362ms
4000000 : 285ms
3000000 : 217ms
2000000 : 141ms
1000000 : 72ms
modified 5-Feb-18 17:25pm.
raddevus wrote: I assumed, but wasn't sure.
This code gives a good definitive answer. Hence, the upvote.
Proof is always better than an assumption
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
Not particularly surprising, given the amount of work it's doing.
Reference Source: ConcurrentQueue.cs[^]
It looks like it's implemented as a singly linked list of small arrays (32 items each). Searching a linked list for the last item, or for an item that doesn't exist, is always going to take longer than finding an item close to the start.
It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1st.
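That suggested experiment could be sketched roughly like this (my illustration, not code from the thread; the actual timings will vary by machine, and ticks rather than milliseconds are needed at this scale):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

class SearchOrderDemo
{
    // Time a linear FirstOrDefault scan for a given value, in Stopwatch ticks.
    static long Time(ConcurrentQueue<int> queue, int target)
    {
        var sw = Stopwatch.StartNew();
        queue.FirstOrDefault(i => i == target);
        sw.Stop();
        return sw.ElapsedTicks;
    }

    static void Main()
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(1, 10000000));

        // 1st then 31st - both values live in the queue's first internal segment...
        Console.WriteLine("1st={0} ticks, 31st={1} ticks", Time(queue, 1), Time(queue, 31));

        // ...then the opposite order, to see whether search order has an effect.
        Console.WriteLine("31st={0} ticks, 1st={1} ticks", Time(queue, 31), Time(queue, 1));
    }
}
```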
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
Very good confirming information on that. Thanks a lot.
Richard Deeming wrote: It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1st
I actually noticed that too as I tested, but my original post became so long I decided not to mention more.
It seems to me a very bad practice to use a FIFO/LIFO-oriented collection for anything other than push/pop...
Skipper: We'll fix it.
Alex: Fix it? How you gonna fix this?
Skipper: Grit, spit and a whole lotta duct tape.
Kornfeld Eliyahu Peter wrote: seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop
Yes, a good observation.
I had initially solved this with a ConcurrentDictionary.
Someone else suggested the Queue to me.
1. I have a memory collection which continues to get re-loaded from the database on an interval.
2. I only want to load new items into the collection
3. However, when I get an item, I just want to pop off the oldest item from the collection.
This led me to two different ways to do this.
1. Queue - load up the queue on an interval; ensure an item is never added more than once -- later, simply pop off the oldest item (first in). The getter code is a bit simpler (obvious / self-documenting) with the Queue.
2. Dictionary - load up Dictionary on interval - TryAdd() handles this for me. As long as I load the dictionary from the database in reverse order (Stored Proc returns oldest first) all I have to do is call .First() on Dictionary when I want the oldest item. The Queue obviously gives you the First added item automatically.
I've now done detailed testing and using the ConcurrentDictionary is much less CPU intensive, but might be slightly more RAM hungry.
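The two approaches described above might be sketched like this (hypothetical types and names for illustration, not the actual code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class QueueVsDictionarySketch
{
    // Hypothetical item type; the real code presumably loads rows from the database.
    class Item { public int Id; public DateTime Created; }

    static void Main()
    {
        // A batch containing a duplicate id, as might come back from a periodic reload.
        var batch = new[]
        {
            new Item { Id = 1, Created = DateTime.Now },
            new Item { Id = 1, Created = DateTime.Now },
            new Item { Id = 2, Created = DateTime.Now }
        };

        // Option 2: ConcurrentDictionary - TryAdd handles de-duplication for you.
        var dict = new ConcurrentDictionary<int, Item>();
        foreach (var item in batch)
        {
            dict.TryAdd(item.Id, item);   // fails fast (and silently) for ids already present
        }

        // Option 1: ConcurrentQueue - de-dup needs a linear scan before each Enqueue,
        // and the check-then-enqueue pair is not atomic without extra locking.
        var queue = new ConcurrentQueue<Item>();
        foreach (var item in batch)
        {
            if (!queue.Any(i => i.Id == item.Id))   // O(n) membership check
            {
                queue.Enqueue(item);
            }
        }

        Console.WriteLine("dict: {0}, queue: {1}", dict.Count, queue.Count);
    }
}
```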
raddevus wrote: all I have to do is call .First() on Dictionary when I want the oldest item.
That's an assumption that's going to bite you in the future. The dictionary classes don't guarantee that the enumerator will return items in any specific order. The order could change with a framework update; or it could change when your dictionary exceeds a certain size.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
Ah, interesting. Very good info. So back to the Queue, it is.
There are always trade-offs.
I was sorting the Dictionary first via
waitingFiles.OrderByDescending(of => of.Value.FileCreated).First();
But that is really expensive because, from what I can tell, LINQ has to enumerate and sort the entire collection just to find the one item that matches.
So, I was hoping (and it is really fast and nice) that since I load the data into the Dictionary in ascending order (oldest files first) that I could count on just calling First().
I mean it does work now. I was actually testing the order that the files are then processed.
However, I understand that it is not guaranteed.
Interesting and why I'm trying not to make assumptions about which is better ( Dictionary or Queue ) in this instance.
Question, do you need to check all or any of the elements in the list, or is it enough to check the last one?
If so, you would be better off with a linked list.
There are no concurrent linked lists, though. Apparently one was around in the beta versions of .NET 4.0, but it was left out of the release version as there were no advantages over using a normal linked list with locking.
But a simple lock would still be faster, since you don't need to enumerate the collection.
Thanks for chiming in on the conversation. The challenge I have here is that I need to ensure the item is not currently anywhere in the collection -- an item may have been loaded at any location.
There are probably other ways around this, but I'm just attempting to do it in the easiest, most correct way that uses the least resources.
The collection is used in two ways :
1. caller will request the oldest unique item
2. collection will periodically load newer items into itself
The interesting thing about this is that the Queue makes one of those requirements easy and the Dictionary makes the other requirement easy.
The first one is easy with the Queue since you just get the first in item.
The second one is difficult with a Queue (there is no key), so you have to determine that the object is not already contained in the Queue for every item, each time items are retrieved from the db.
The second one is easy with a Dictionary since you can do TryAdd() with your unique key, and it fails fast if the item has already been added.
However, the first one is more difficult with a Dictionary because the items are not (necessarily) ordered in a significant way and we need to get the oldest item.
It's an interesting problem.
If you can afford the extra memory usage, the simplest option would probably be a custom class to store the data twice - once in a HashSet<T> , and once in a Queue<T> .
You'd obviously need to add your own locking code to ensure it was thread-safe.
Off the top of my head, something like this might work:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

public sealed class UniqueQueue<T> : IEnumerable<T>, IDisposable
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly HashSet<T> _set = new HashSet<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public int Count
    {
        get
        {
            bool lockTaken = false;
            try
            {
                lockTaken = _lock.TryEnterReadLock(Timeout.Infinite);
                if (!lockTaken) throw new InvalidOperationException();
                return _queue.Count;
            }
            finally
            {
                if (lockTaken)
                {
                    _lock.ExitReadLock();
                }
            }
        }
    }

    public void Dispose()
    {
        _lock.Dispose();
    }

    public T Dequeue()
    {
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterWriteLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            T oldestItem = _queue.Dequeue();
            _set.Remove(oldestItem);
            return oldestItem;
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitWriteLock();
            }
        }
    }

    public void AddRange(IEnumerable<T> itemsToAdd)
    {
        if (itemsToAdd == null) throw new ArgumentNullException(nameof(itemsToAdd));
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterWriteLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            foreach (T item in itemsToAdd)
            {
                if (_set.Add(item))
                {
                    _queue.Enqueue(item);
                }
            }
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitWriteLock();
            }
        }
    }

    public UniqueQueueEnumerator GetEnumerator()
    {
        T[] copyOfItems = Array.Empty<T>();
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterReadLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            copyOfItems = new T[_queue.Count];
            _queue.CopyTo(copyOfItems, 0);
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitReadLock();
            }
        }
        return new UniqueQueueEnumerator(copyOfItems);
    }

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct UniqueQueueEnumerator : IEnumerator<T>
    {
        private readonly T[] _items;
        private int _position;

        public UniqueQueueEnumerator(T[] items)
        {
            _items = items;
            _position = -1;
        }

        void IDisposable.Dispose()
        {
        }

        public void Reset()
        {
            if (_items == null) throw new InvalidOperationException();
            _position = -1;
        }

        public bool MoveNext()
        {
            if (_items == null) return false;
            return ++_position < _items.Length;
        }

        public T Current
        {
            get
            {
                if (_items == null) throw new InvalidOperationException();
                if (0 > _position || _position >= _items.Length) throw new InvalidOperationException();
                return _items[_position];
            }
        }

        object IEnumerator.Current => Current;
    }
}
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
This is a very good suggestion and I saw something very similar at StackOverflow.
But, as you mentioned, I was attempting to keep the memory size low too, and it is interesting that this stores the objects in two collections.
Also, I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically so that would solve that part of the problem.
Thanks for the input and discussion.
raddevus wrote: I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically
Ish.
Modifications to one of those collections will automatically be thread-safe. But if you're trying to read or update the two in a single atomic operation, you'll still need your own locking mechanism.
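A minimal sketch of that point (my illustration, not code from the thread): each collection is individually thread-safe, but the compound "check one, then update the other" step still needs one external lock around it.

```csharp
using System.Collections.Concurrent;

// Sketch: ConcurrentDictionary and ConcurrentQueue are each thread-safe on
// their own, but keeping the two consistent with each other requires a lock.
public class DedupingQueue
{
    private readonly object _sync = new object();
    private readonly ConcurrentDictionary<int, bool> _seen = new ConcurrentDictionary<int, bool>();
    private readonly ConcurrentQueue<int> _queue = new ConcurrentQueue<int>();

    public void AddIfNew(int id)
    {
        lock (_sync)   // makes "check dictionary, then enqueue" one atomic unit
        {
            if (_seen.TryAdd(id, true))
            {
                _queue.Enqueue(id);
            }
        }
    }

    public bool TryTakeOldest(out int id)
    {
        lock (_sync)   // dequeue and dictionary removal must happen together
        {
            if (_queue.TryDequeue(out id))
            {
                bool ignored;
                _seen.TryRemove(id, out ignored);
                return true;
            }
            return false;
        }
    }
}
```

(Once an external lock is in place, plain Queue/HashSet would serve just as well - which is why the UniqueQueue sample above uses them.)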
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
I had a feeling it wouldn't be that simple.
Richard's suggestion would've been my second one, minus the code sample though. (Impressive, that!)
Second question though, you wrote in another post that you load the objects from a database.
Are the objects in the database not unique? How do you ensure uniqueness once you have popped an object from the list?
Or does the uniqueness come from the identity column in the database? If so, you could still use the linked list, using a paged and sorted query.
modified 9-Feb-18 13:14pm.
raddevus wrote: I needed to know whether, when I went to determine if the queue already had the item, it would return faster when the item was found at the top (next item out) versus found at the bottom (last-out position).
I assumed, but wasn't sure.
This code gives a good definitive answer
You forgot the important qualifier, though -- this code gives a good definitive answer for build XXX of the library. In the next build, the implementer may find a bug with the way they're doing things and have to change it, altering the performance. If you're writing code that depends that much on the performance of some class, you need to write the class yourself.
Without showing code there's little interesting here (especially since "nothing" was fixed)...
There's 'yer chuckle though.
kmoorevs wrote: fixed That's like some sort of quantum foam, right?
if i'm not mistaken