|
Thanks for the hint, but I solved this already.
The model stores both actor IDs separately; I just have to check whether valueA or valueB contains the ID of the current actor. A specific ID for the relation entry is not required.
|
|
|
|
|
Just an update for those who are reading along:
I've ditched the database approach in favour of user maintainability. Everything will be stored within a zip file that can easily be manipulated by the user, and if they screw things up, they can fall back on a backup that is generated alongside the main file.
Performance could be a problem, but I'll probably use a one-time load anyway, so the only concern would be memory usage, and that's a problem that can be fixed if it occurs.
Greetings Daniel
|
|
|
|
|
For the generation of primary keys in n-tier applications I only ever see two strategies: either client-side generation of a GUID, or a temporary integer that gets replaced by the DAL, which then reports the actually assigned value back to the client.
My alternative idea is to let the client request 1..n new primary key value(s) from the DAL whenever it needs one or more. This way I wouldn't have to cope with ugly GUIDs (I don't plan for DB mergeability) and I avoid awkward client logic for replacing temporary primary keys.
Have I simply not yet found some projects that do it this way or is there some flaw in that strategy that I'm not aware of?
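Roughly what I have in mind, as an illustrative sketch (not a finished design - KeyService stands in for the DAL, and the database sequence round-trip is simulated with an in-memory counter):

```csharp
using System;
using System.Threading;

// Stand-in for the DAL: reserves contiguous blocks of key values.
// In a real system NextBlock would be one remote call that atomically
// advances a database sequence; here it is an in-memory counter.
class KeyService
{
    private long _next = 1;

    public (long First, long Last) NextBlock(int size)
    {
        long last = Interlocked.Add(ref _next, size) - 1;
        return (last - size + 1, last);
    }
}

// Client-side allocator: hands out keys locally and only goes back
// to the DAL when the current block is exhausted.
class KeyAllocator
{
    private readonly KeyService _service;
    private readonly int _blockSize;
    private long _current = 1, _last = 0; // _current > _last forces a fetch

    public KeyAllocator(KeyService service, int blockSize)
    {
        _service = service;
        _blockSize = blockSize;
    }

    public long NextKey()
    {
        if (_current > _last)
            (_current, _last) = _service.NextBlock(_blockSize);
        return _current++;
    }
}
```

With a block size of, say, 100, creating 1000 entities costs ten remote calls instead of a thousand, and the keys are final - nothing needs to be replaced later.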
- Sebastian
|
|
|
|
|
I'd say this highly depends on the philosophy of the DAL. For example, why would you need to generate temporary primary keys at all? Let the key be empty until a value is returned from the database.
If the referential information isn't visible from the object hierarchy and you need a value for foreign keys before they are actually created, why not use the hash code in such situations? Even better, the object could return its hash code as a primary key until the actual primary key is generated in the database.
In all cases I'd always let the database generate the surrogate key value, never the DAL layer.
|
|
|
|
|
Thank you for your response Mika!
Mika Wendelius wrote: In all cases I'd always let the database generate the surrogate key value, never the DAL layer.
What I meant is that the client sends a request for key provision to the DAL, which then lets the database generate the keys through sequences.
Mycroft Holmes wrote: Listen to Mika - let the database generate the primary keys. You may need to submit them in sequence so you get the PK to use as a foreign key on the children.
The application in which I will first use the framework I'm designing has a use case where usually about 1000 entities plus sub-entities are created at once. I don't want to make that many remote calls, for performance reasons.
Mika Wendelius wrote: If the referential information isn't visible from the object hierarchy and you need a value for foreign keys before they are actually created
Yes, I need some value for foreign keys - object references are "virtual" through keys, to facilitate lazy instantiation and cache expiration/GC. Not sure if I'm missing something, but I see some problems with the hash-code approach: in contrast to GUIDs, hash-code collisions are much more likely, and on top of that they could collide with pre-existing keys. A deterministic generation of temporary keys would appear safer to me - but I would like to avoid that altogether.
- Sebastian
edit: typo
modified 26-Feb-15 4:03am.
|
|
|
|
|
manchanx wrote: The application in which I will first use the framework I'm designing has a use case where usually about 1000 entities plus sub-entities are created.
You appear to be suggesting that you are going to construct several thousand valid entities in a client first before sending to the back end.
And the question would be...why?
I certainly wouldn't want to create a DAL and just assume that the client is going to send thousands of valid entries to me. That violates the primary purpose of constraints on a database, which is to protect against programmer errors, not user errors.
So you don't save anything in terms of validity.
You will still need to transport those thousands of entities to the back end. If you do it one at a time AND that is a problem, then your solution doesn't address that at all. If you are going to send them as a block, then there would in fact be FEWER calls if you let the DAL handle the IDs, since your DAL should be capable of recognizing dependencies (if nothing else, pseudo-IDs in the block accomplish that).
However, thousands of calls, unless you intend to do that every second, aren't a problem on any effective modern server as long as they are infrequent (of course modern servers can handle that many calls per second, but doing that just to avoid batch handling would be silly).
|
|
|
|
|
Hi jschell, thank you for your response!
jschell wrote: I certainly wouldn't want to create a DAL and just assume that the client is going to send thousands of valid entries to me. That violates the primary purpose of constraints on a database, which is to protect against programmer errors, not user errors.
The entities are fully validated before they're sent to the DAL.
jschell wrote: You will still need to transport those thousands of entities to the back end. If you do it one at a time AND that is a problem, then your solution doesn't address that at all.
No, they're sent in a batch/block.
jschell wrote: If you are going to send them as a block, then there would in fact be FEWER calls if you let the DAL handle the IDs, since your DAL should be capable of recognizing dependencies (if nothing else, pseudo-IDs in the block accomplish that).
You mean there would be fewer calls because the client wouldn't have to request new keys from the DAL/DB before creating new entities? But the client can request more than one new key at once: if it's clear beforehand how many new entities are to be created, it's just one request; if it's not clear beforehand, it would still be considerably fewer than one request per key, because the client can simply request more keys whenever it runs out.
jschell wrote: However, thousands of calls, unless you intend to do that every second, aren't a problem on any effective modern server as long as they are infrequent (of course modern servers can handle that many calls per second, but doing that just to avoid batch handling would be silly).
The application will run in a variety of environments, many of which won't have a very performant server or network, so I want to design it to put the least stress on either.
|
|
|
|
|
manchanx wrote: The entities are fully validated before they're sent to the DAL.
Again, I would not write a DAL or a (relational) database that relied solely on a client for validity.
manchanx wrote: But the client can request more than one new key at once
One is more than none.
manchanx wrote: many of which won't have a very performant server or network.
Batching it solves that problem, but it doesn't explain why the client needs to provide the IDs.
|
|
|
|
|
jschell wrote: One is more than none.
Batching it solves that problem, but it doesn't explain why the client needs to provide the IDs.
I don't think I deserve your impatience here - you could have found the explanation in another post of mine in this thread:
[..] the main point why I need at least some kind of key is that I implement object-references "virtually" (don't know if there's a better term for it): Entities don't hold a direct reference to other entities but a key and on property access the key gets resolved into an object reference - that way I can easily implement lazy/implicit loading and cache expiration.
To paint the full picture: I'm developing a custom ORM, mainly because of one requirement that disqualifies existing ORMs: my users need to be able to extend the model with custom tables and fields (which of course need to be "non-intrusive" on the business logic by being nullable/optional). The first version of the application will only include desktop clients, and those will be rich clients where the part of the ORM that does the record-entity mapping resides in the client. So you could say that I split the DAL into tiers. This will probably clear up the following:
jschell wrote: Again, I would not write a DAL or a (relational) database that relied solely on a client for validity.
The DAL I've been talking about is essentially the part of the DAL you're thinking of that does the final step of saving the raw records. Any validation you would do in a DAL happens here, in the first layer of the client.
So I need IDs/keys in the client because they're required to resolve references.
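Roughly, the virtual-reference mechanism looks like this (a simplified sketch, not my actual implementation - the DAL fetch is just a delegate, and the Customer/Loan names are only for illustration):

```csharp
using System;
using System.Collections.Generic;

// Key-based reference resolution: entities hold keys, not object
// references; targets are looked up on access, which makes lazy
// loading and cache eviction straightforward.
class EntityCache<T> where T : class
{
    private readonly Dictionary<long, T> _cache = new Dictionary<long, T>();
    private readonly Func<long, T> _load; // e.g. a fetch through the DAL

    public EntityCache(Func<long, T> load) { _load = load; }

    public T Resolve(long key)
    {
        if (!_cache.TryGetValue(key, out var entity))
            _cache[key] = entity = _load(key); // lazy load on first access
        return entity;
    }

    public void Evict(long key) => _cache.Remove(key); // cache expiration
}

class Customer
{
    public long Id;
    public string Name;
}

class Loan
{
    public long CustomerId; // "virtual" reference: a key, not an object
    private readonly EntityCache<Customer> _customers;
    public Loan(EntityCache<Customer> customers) { _customers = customers; }
    public Customer Customer => _customers.Resolve(CustomerId);
}
```

Evicting an entry is safe because the key stays in the referencing entity; the next property access simply reloads the target.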
|
|
|
|
|
manchanx wrote: I don't think I deserve your impatience here - you could have found the explanation in another post of mine in this thread: [...] the main point why I need at least some kind of key is that I implement object-references
You do not need IDs from the DAL to solve that, however. All you need is an implementation that is consistent within each block, and you can get that with nothing more than an incrementing integer.
Once the DAL receives the block, it replaces those references with consistent database ones.
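A rough sketch of what I mean (illustrative only - the record shape is made up, placeholder IDs are negative to keep them apart from real keys, and the database-generated keys are simulated with a counter):

```csharp
using System;
using System.Collections.Generic;

// A record as it travels in the batch: its ID and any references to
// other records in the batch use client-assigned placeholder values.
class Record
{
    public long Id;
    public long? ParentId; // reference to another record in the batch
}

// DAL-side remapping: assign real keys and rewrite every in-batch
// reference consistently before the records hit the database.
static class Dal
{
    private static long _nextKey = 100; // stand-in for DB-generated keys

    public static void SaveBatch(List<Record> batch)
    {
        var map = new Dictionary<long, long>();
        foreach (var r in batch)
        {
            map[r.Id] = _nextKey++;
            r.Id = map[r.Id];
        }
        foreach (var r in batch)
            if (r.ParentId is long p && map.TryGetValue(p, out var real))
                r.ParentId = real;
    }
}
```

The client never talks to the key source at all; consistency only has to hold within one block.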
manchanx wrote: Any validation you would do in a DAL happens here in the first layer of the client.
Sounds like you are pushing work that should be in the DAL to the client.
|
|
|
|
|
jschell wrote: You do not need IDs from the DAL to solve that, however. All you need is an implementation that is consistent within each block, and you can get that with nothing more than an incrementing integer. Once the DAL receives the block, it replaces those references with consistent database ones.
I can't see the net benefit of doing that:
1) You said in a previous reply that sending ~1000 requests to the DAL isn't an issue as long as it happens rarely (which I agree with) - but why would you then worry about that single request for new IDs?
2) The coding effort I spend on ID provision for the client, I save in the DAL by not having to replace temporary IDs.
3) Let's take a concrete example: the application will be for library management. On the form intended to loan media to a customer there might be (for convenience) a button to invoke the use case for adding a new customer. After adding the new customer, he should automatically be selected for loaning media in the first-mentioned form. When working with temporary IDs, the client would have to requery the customer entity by his external customer number (or whatever). When using DAL-provided new IDs, that's not necessary.
I'm not saying my planned solution is way better than the more conventional solutions, but you haven't yet shown why it would be worse.
|
|
|
|
|
manchanx wrote: I can't see the net benefit of doing that:
As I said, when I create DALs I expect the DAL, not the client, to enforce restrictions. Otherwise there is little point in having an actual DAL.
manchanx wrote: but why would you then worry about that single request for new IDs?
Limiting transactions was your requirement not mine.
manchanx wrote: 2) The coding effort I spend on ID provision for the client, I save in the DAL by not having to replace temporary IDs.
However, that solution requires that you now solve a different problem - how to get the IDs from the database.
manchanx wrote: 3) Let's take a concrete example:
I have created many DALs in my lifetime. I was creating them before there was a term for them.
I have created several serialization protocols that required resolving references.
I have used several frameworks that used protocols that required resolving references.
So I believe I understand the problem and its ramifications.
manchanx wrote: but you haven't yet shown why it would be worse.
As I pointed out, the DAL is then relying on the client for valid data. As I said before, consistency verification belongs in the DAL and/or the database. Moving it out of there increases the risk that the verification will, especially over time, be wrong.
Fixing stored data that is inconsistent can be difficult, requiring complex programming solutions; at times it is programmatically impossible and requires manual intervention. (I have had to do all of that at one time or another.)
|
|
|
|
|
Listen to Mika - let the database generate the primary keys. You may need to submit them in sequence so you get the PK to use as a foreign key on the children.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Thank you for your reply Mycroft! I "embedded" my answer to you in the response to Mika.
- Sebastian
|
|
|
|
|
You still need to write each record to the database individually. I would construct my object model as a parent object containing a List<> of children; at this point there is no requirement for an FK value, as the parent has the children in its List<>.
When you decide to write the data into the database, you have a nested loop where the outer loop inserts the parent record and gets the PK value back from the database, and the inner loop uses that value when inserting the children.
Or am I missing something?
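Something along these lines (a sketch only - the database and its identity column are simulated in memory here, where a real implementation would read the key back from SQL Server, e.g. via SCOPE_IDENTITY()):

```csharp
using System;
using System.Collections.Generic;

class Parent { public string Name; public List<Child> Children = new List<Child>(); }
class Child  { public string Name; }

// In-memory stand-in for the database: each insert returns the
// generated identity value, as SCOPE_IDENTITY() would in SQL Server.
class FakeDb
{
    private long _identity;
    public List<(long Id, long? ParentId, string Name)> Rows =
        new List<(long, long?, string)>();

    public long Insert(string name, long? parentId = null)
    {
        Rows.Add((++_identity, parentId, name));
        return _identity;
    }
}

static class Writer
{
    // Nested loop: insert the parent, capture its PK, then insert
    // each child with that PK as its foreign key.
    public static void Save(FakeDb db, IEnumerable<Parent> parents)
    {
        foreach (var p in parents)
        {
            long pk = db.Insert(p.Name);
            foreach (var c in p.Children)
                db.Insert(c.Name, pk);
        }
    }
}
```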
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
It's not one simple parent-child relationship - there can be several inter-relationships. But the main point why I need at least some kind of key is that I implement object references "virtually" (don't know if there's a better term for it): entities don't hold a direct reference to other entities but a key, and on property access the key gets resolved into an object reference - that way I can easily implement lazy/implicit loading and cache expiration.
|
|
|
|
|
So why are you against using GUIDs? You can allocate them up front and save yourself all sorts of grief afterwards.
|
|
|
|
|
Mostly because of join performance. Querying is the main job of the application, and there are a lot of related tables which, even if they're not part of the query predicate, have to appear in the result set.
|
|
|
|
|
Have you actually done any profiling to see if there is a hit that will be, in any way, significant using GUIDs? As these are presumably going to be indexed fields, you shouldn't be seeing much impact.
|
|
|
|
|
Yes, I did - GUIDs (sequential) were considerably slower in many of the required queries. Though, to be frank, I should repeat the profiling because the model has changed quite a bit since then.
|
|
|
|
|
I'm confused. What's a sequential GUID?
What database are you using and what datatype were you storing this value in?
|
|
|
|
|
I'm using SQL Server 2008. Sequential GUIDs in general are GUIDs that are not completely random but created in a way that their values are ascending (thus remedying many of the drawbacks of conventional fully random GUIDs when used as a key in a DB) while still keeping collisions virtually non-existent. SQL Server provides a function to generate these since version 2005: NEWSEQUENTIALID
modified 26-Feb-15 7:32am.
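For illustration, a client-side sequential GUID can be built along the lines of the well-known "comb GUID" technique (this is a sketch, not my actual algorithm - the key point is that SQL Server compares uniqueidentifier values starting with the last six bytes, which is why the timestamp goes there):

```csharp
using System;

static class SequentialGuid
{
    // Comb-style GUID: random bytes for uniqueness, with the high
    // 48 bits of the current timestamp packed into the last six
    // bytes. SQL Server orders uniqueidentifier values by those
    // bytes first, so values generated over time sort roughly
    // ascending in the clustered index.
    public static Guid New(DateTime utcNow)
    {
        byte[] bytes = Guid.NewGuid().ToByteArray();
        byte[] t = BitConverter.GetBytes(utcNow.Ticks);
        if (BitConverter.IsLittleEndian) Array.Reverse(t); // big-endian
        // overwrite bytes 10..15 with the high 48 bits of the ticks
        Array.Copy(t, 0, bytes, 10, 6);
        return new Guid(bytes);
    }
}
```

Dropping the low 16 bits of the tick count still leaves a resolution of a few milliseconds, which is plenty for index locality; uniqueness comes from the remaining random bytes.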
|
|
|
|
|
I'm pretty sure that's been available since 2005. But you're using the database to generate the GUID rather than doing it in code (which is why I was querying what a sequential GUID was, as this isn't native behaviour). If you don't want to preallocate the key then you have no choice but to have your code react and post-allocate the IDs.
|
|
|
|
|
Pete O'Hanlon wrote: I'm pretty sure that's been available since 2005
Correct, that was a typo.
Pete O'Hanlon wrote: but you're using the database to generate the GUID rather than doing it in code (which is why I was querying what a sequential GUID was, as this isn't native behaviour)
I actually created the sequential GUIDs with a custom "algorithm" in C#. I only linked to the T-SQL doc because I understood your comment as if you hadn't heard of sequential GUIDs at all yet.
Pete O'Hanlon wrote: If you don't want to preallocate the key then you have no choice but to have your code react and post-allocate the IDs.
Well, that's the original idea of this thread: why not provide sequence values (ints) from the DB/DAL to the client? To me this seems an easy way to avoid GUIDs while still not having to deal with temporary keys in the client code. But I wasn't able to find any evidence that somebody has already done this, so I wanted to ask here whether you can spot a flaw in the idea.
|
|
|
|
|
manchanx wrote: I actually created the sequential GUIDs with a custom "algorithm" in C#. I only linked to the T-SQL doc because I understood your comment as if you hadn't heard of sequential GUIDs at all yet.
Gotcha. My confusion was that we were talking client-side.
In order to preallocate values, you would need to be able to guarantee that the number you got from the DB/DAL is unique. The question is: how would you do that? You can't use a count of the underlying records in a table, because a single delete will break this. You can't use the current tick, because it isn't fine-grained enough. You will also have to guarantee that you have sufficient locking in place to ensure that you only ever hand out the same value for one and only one record.
|
|
|
|