|
Here is interesting question:
I have ~1000 objects each of which is described by ~60mln records.
I need the best read performance in the following condition:
- select ~5000 records (by the key) for each object.
Which will perform faster: if I have all of these objects inside one table (~60bln records)
and run one SELECT from that table, or if I have each object inside it's own table and run ~1000 selects (for each table).
Database is either MS SQL, PostgreSQL or MySQL.
I don't care about write performance.
Database choice as well as layout will depend on the answer to this question. It'll take quite some time to model the situation, so I wanted to see if anyone has reasonable input.
Thank you!
|
|
|
|
|
There is insufficient information to give any reasonable answer. Anything would be pure guess work. You are best prototyping your situation and seeing which comes out top.
Kosta Cherry wrote: select ~5000 records (by the key) for each object.
What do you mean by that? Do you mean that you want to pass the individual key for 5000 rows to the database, or you have a range (min/max) of key values that results in 5000 rows being returned?
Kosta Cherry wrote: I have each object inside it's own table
What do you mean by "object"?
Kosta Cherry wrote: I don't care about write performance.
Have you considered using indexes rather than splitting up tables? (indexes slow down writes but can improve read performance if used correctly)
Kosta Cherry wrote: It'll take quite some time to model the situation
From what you've indicated so far there is only really one logical table, but you perhaps want to split it up into 1000 physical tables. If the situation can be modeled by only one logical table, what takes up the time in modelling?
|
|
|
|
|
Colin Angus Mackay wrote: here is insufficient information to give any reasonable answer. Anything would be pure guess work. You are best prototyping your situation and seeing which comes out top.
Kosta Cherry wrote:
select ~5000 records (by the key) for each object.
I mean SQL like this:
select * from mytable where ((timekey > X and timekey < Y) AND/OR some other conditions, but they all involve indexed columns only)
The result brings up ~5000 (or so) rows.
If all records are within several tables, I will run that query ~1000 times.
If all records are within one table, I will add into WHERE additional condition like "objectID in (select objectID from other table where ....)", or may be just a join, or whatever other condition.
The problem with modeling will be that selection from many tables will be done from multithreaded application, and I'm not sure how servers will react compared with single-threaded approach when all sits in one table and retrieved with one query; plus, those DB servers should be properly tuned for one approach or another. All of this involves a lot of modeling.
I'm kind of leaning towards the "many tables" approach from the point of maintainability (where I can have each table on different table space or even different server).
Colin Angus Mackay wrote: Kosta Cherry wrote:
I have each object inside it's own table
What do you mean by "object"?
"Object" here is just named collection of millions of similar records - like, for example, file is collection of bytes, or picture is collection of pixels.
Colin Angus Mackay wrote: Kosta Cherry wrote:
I don't care about write performance.
Have you considered using indexes rather than splitting up tables? (indexes slow down writes but can improve read performance if used correctly)
Of course I did. I just don't know which way it'll work faster. On one hand, having all records in one table increases Btree+ index depth by 1-2 levels (depending on number of keys in the leaf), which increases seek time; on another hand, having records in different tables requires additional time for parsing (1000 queries instead of 1); so it's a hard to say what will be faster in the end.
|
|
|
|
|
Kosta Cherry wrote: The problem with modeling will be that selection from many tables will be done from multithreaded application, and I'm not sure how servers will react compared with single-threaded approach when all sits in one table and retrieved with one query
A good database server will be expecting to get many queries at the same time. The only restriction is, as I recall, that you can't put down more than one query simultaneously on a single connection. So you have to open up a new connection on each thread. When you complete a query you will obviously close the connection which returns it to the pool and another thread can potentially use it.
Kosta Cherry wrote: those DB servers should be properly tuned for one approach or another.
Well, to an extent. I would guess that database servers are already tuned to having multiple queries run against it. It is the tuning of the queries to avoid things like deadlocks that you need to be concerned about. If all the queries are effectively read only then I don't see any problems in that area. The problems come when you have queries writing to the database. Since you already said write performance isn't an issue, I'm guessing that insertions and updates don't happen frequently, or happen in batches outside of normal hours.
Kosta Cherry wrote: I'm kind of leaning towards the "many tables" approach from the point of maintainability (where I can have each table on different table space or even different server).
Personally, I would have thought the many-tables (especially if they are all the same structure) approach would be a maintenance nightmare.
Also, I think you are possibly over-optimising at this stage.
Kosta Cherry wrote: "Object" here is just named collection of millions of similar records - like, for example, file is collection of bytes, or picture is collection of pixels.
Well, if all the rows are using the same structure, are they not ALL similar? Differences by a column value (or small number of column values) isn't introducing dissimilartity as far as I can see.
Kosta Cherry wrote: Of course I did. I just don't know which way it'll work faster.
In my experience you never fully know until it is running. I've implemented a system one way and optimised it. I thought I'd learned the lessons of that optimisation and attemted to apply it somewhere else, but it didn't work and some other optimisation worked better.
If you have enough data to build a prototype I would recommend doing that and experimenting. For reference, the largest database I worked on was growing at 1Gb per week. I don't recall how many rows that was, but the largest table was itself growing in the region of a million rows per day.
From that experience I found that the main bottleneck was pulling all those rows off the disk. Queries would frequently be pulling 10s, if not 100s, of millions of rows at once off the disk. In which case what you should be looking at is ensuring the disks are fast enough. A RAID set or SAN would improve performance in that area more than anything you can do.
I know this doesn't really answer your questions. It really just throws up more things to think about. But like I said earlier any answer you are going to get here will be pure guess work.
One additional thing to think about. Are any of these rows inactive? By that I mean do you have an active set of data that is used daily and another set that is only used occasionally (e.g. by reporting or archive systems)? If you do, you could optimise it along those lines and reduce the number of rows that you have to deal with on a daily basis.
|
|
|
|
|
What is the OP even asking? I got lost part way through it
"I guess it's what separates the professionals from the drag and drop, girly wirly, namby pamby, wishy washy, can't code for crap types." - Pete O'Hanlon
|
|
|
|
|
I have searched hi and low for my answer with no success. The dataadapter design gives you a way to create multiple queries. (Fill, Fillby, Update, etc.) But, I don't wan to use the designer. I want to code it all. I have the fill. That was easy. How do I create a query method using code.
Reason:
In one part on my program I am getting data from a database and binding it to a datagrid. At one point I want only certain records to show. At other times I want all records to show.
I want to be able to code like this:
if certainFlag = "true" then<br />
adapter.fill(datatable)<br />
else<br />
adapter.allfill(datatable)<br />
endif
How would I code the allfill query?
|
|
|
|
|
there is no such option in "adapter.allfill", check once.
|
|
|
|
|
Are you saying that I can't create a query using code? I can do it using the designer. I created one called insertRecord. I want to create one NOT using the designer but using code.
|
|
|
|
|
Is there anyway to Pad Left a string in SQL for MS Access? Or to retrieve a numeric field with zeros in any empty place markers? For example, I have a numeric field of size (3, 0) and if it contains the number 1 I want it to return 001. I've searched but not found anything for MS Access that seems to work.
Any help/advice would be appreciated.
|
|
|
|
|
Try:
Right('000' & MyNumberField, 3)
Regards
Andy
|
|
|
|
|
Thank you thank you thank you!
I've been circling around and around searching for the past few hours.
|
|
|
|
|
That's a very neat way of doing it.
You always pass failure on the way to success.
|
|
|
|
|
Hi
Within an update trigger I am trying to get the value of the primary key of the row that fired the trigger. I can't select the primary key using any of the other columns either since the other columns may be duplicated in rows other than the one that fired the trigger.
Thanks
There are 10 types of people in the world, those who understand binary and those who dont.
|
|
|
|
|
If you are using Sql Server, you can select the value from the deleted table in your trigger. Trust me - it sounds strange, but it does work.
|
|
|
|
|
Thanks,
That worked
Cheers
There are 10 types of people in the world, those who understand binary and those who dont.
|
|
|
|
|
Does anybody know if it is possible to change the name of the column in the subscribers table when using Snapshot Replication?
|
|
|
|
|
Don't know. Have you tried it?
"I guess it's what separates the professionals from the drag and drop, girly wirly, namby pamby, wishy washy, can't code for crap types." - Pete O'Hanlon
|
|
|
|
|
I'm trying to replicate a database from SQL Server 2005 to SQL Server 2000, while creating a new Subscription when I try to add a subscriber server the following error message is displayed.
TITLE: Connect to Server
ADDITIONAL INFORMATION:
Failed to connect to server BERN. (Microsoft.SqlServer.ConnectionInfo)
An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 53)
I did Googleing but I could not find a suitable solution, please help me out. I will be grateful to you.
Sami.
|
|
|
|
|
Are you able to connect to the remote server using management studio? (by openning a new database query window)
|
|
|
|
|
Hi,
My backend is MSAccess. I have a table which contains a field 'return date' which is DateTime. I want to subtract the return date from the current date in the query.This should be done only between dates and not time.
|
|
|
|
|
Hi,
You can do this in the following manner.
---------------------------------------------------------------------------
BEGIN CODE
Dim d1, d2 As Date
Dim diff As Long
d1 = Date.Parse(CurrentDate)
d2 = Date.Parse(ReturnDate.Text)
diff = DateDiff(DateInterval.Day, d1, d2)
END CODE
---------------------------------------------------------------------------
This code will return the days only.
I hope this helps .
Regards,
John Adams
ComponentOne LLC
|
|
|
|
|
i google the net but i did't fined any simple project about how to save an image in sql 2000 using C# language.
Plz just how can i comver the image file to byte string.
Help me out just put a link or small chunk of code.
|
|
|
|
|
Naveed khan nido wrote: i google the net but i did't fined any simple project about how to save an image in sql 2000 using C# language.
I also did a search and found the following results[^] in about 5 seconds.
What were you searching for?
Paul Marfleet
"No, his mind is not for rent
To any God or government"
Tom Sawyer - Rush
|
|
|
|
|
Actually i got the results but that are difficult in sql, i just want to save only pic not much information about it.
I just need simple code.
|
|
|
|
|
Naveed khan nido wrote: I just need simple code.
One of the skills a developer needs to possess is the ability to undertake research to find the solution to a problem. It is unlikely that you will succeed in your chosen career if you ask people to do your work for you, despite having been given advice and pointed towards sample code.
Read the articles I suggested again, attempt to solve the problem yourself using the samples provided and post again if you have any further questions. If you want someone to do your work for you, I'm afraid you've come to the wrong place.
Paul Marfleet
"No, his mind is not for rent
To any God or government"
Tom Sawyer - Rush
|
|
|
|