|
I've got the union and the sum working; I just have trouble with the date range.
pValue = _
((From oh In context.Order_History
Where oh.OrderStatus = "COMPLETED" _
And oh.OrderDate = DbFunctions.AddDays(DateTime.Now(), 0)
Select oh.GrandTotal).Union _
(From oc In context.Order_Completed
Where oc.OrderStatus = "QUEUED_FOR_FULFILLMENT" _
Or oc.OrderStatus = "OUT_FOR_FULFILLMENT" _
Or oc.OrderStatus = "QUEUED_FOR_SHIPPING" _
Or oc.OrderStatus = "OUT_FOR_PACKAGING" _
Or oc.OrderStatus = "READY_TO_SHIP" _
And oc.OrderDate = DbFunctions.AddDays(DateTime.Now(), 0)
Select oc.GrandTotal)).Sum()
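Since the sticking point is the date range, one common LINQ to Entities way to match "today" is DbFunctions.TruncateTime (EF6, which the DbFunctions.AddDays call above suggests is already in use) rather than AddDays(DateTime.Now(), 0). A minimal sketch against the first query only, purely as an illustration:
' Hedged sketch, assuming EF6 and the same Order_History entity as above.
' TruncateTime strips the time portion in the generated SQL, so this matches
' any order placed at any time today; .Value avoids a nullable comparison.
Dim today As DateTime = DateTime.Today
Dim historyTotals = _
    From oh In context.Order_History
    Where oh.OrderStatus = "COMPLETED" _
    And DbFunctions.TruncateTime(oh.OrderDate).Value = today
    Select oh.GrandTotal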
|
Well, I'm closer now. I suspect I'll have to redesign this LINQ to get it to work.
More of a From, Sum, Select, then Where.
The cast to value type 'System.Decimal' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type.
Dim pValue As Nullable(Of Decimal) = 0
Using context As New OrdersContext
Dim DateStart As DateTime = DateTime.Today.AddDays(-1)
Dim DateStop As DateTime = DateTime.Today.AddDays(+1)
pValue = _
((From oh In context.Order_History
Where oh.OrderDate >= DateStart _
And oh.OrderDate <= DateStop _
And oh.OrderStatus = "COMPLETED" _
Select oh.GrandTotal).Union _
(From oc In context.Order_Completed
Where oc.OrderDate >= DateStart _
And oc.OrderDate <= DateStop _
And oc.OrderStatus = "QUEUED_FOR_FULFILLMENT" _
Or oc.OrderStatus = "OUT_FOR_FULFILLMENT" _
Or oc.OrderStatus = "QUEUED_FOR_SHIPPING" _
Or oc.OrderStatus = "OUT_FOR_PACKAGING" _
Or oc.OrderStatus = "READY_TO_SHIP"
Select oc.GrandTotal)).Sum()
End Using
Return pValue
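One way around that "materialized value is null" error, as an alternative to the DefaultIfEmpty approach in the next version, is to project the totals as Nullable(Of Decimal) so an empty result sums to Nothing instead of failing the cast, then coalesce to zero. A minimal sketch, assuming the same context and DateStart/DateStop as above (only one status is shown for the second query, and Concat is used instead of Union so that duplicate totals are not collapsed by a SQL UNION):
' Hedged sketch: project to Decimal? so Sum() returns Nothing on an empty set
' instead of trying to cast NULL to Decimal, then fall back to 0.
' The remaining statuses would be Or'd together inside one set of parentheses.
Dim total As Decimal = _
    If(((From oh In context.Order_History
         Where oh.OrderDate >= DateStart _
         And oh.OrderDate <= DateStop _
         And oh.OrderStatus = "COMPLETED"
         Select CType(oh.GrandTotal, Decimal?)).Concat _
        (From oc In context.Order_Completed
         Where oc.OrderDate >= DateStart _
         And oc.OrderDate <= DateStop _
         And oc.OrderStatus = "QUEUED_FOR_FULFILLMENT"
         Select CType(oc.GrandTotal, Decimal?))).Sum(), 0D)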
|
Finally
Using context As New OrdersContext
pValue = _
(
From oh In context.Order_History
Where oh.OrderDate >= DateStart _
And oh.OrderDate <= DateStop _
And oh.OrderStatus = "COMPLETED" _
Select oh.GrandTotal
).Union _
(
From oc In context.Order_Completed
Where oc.OrderDate >= DateStart _
And oc.OrderDate <= DateStop _
And oc.OrderStatus = "QUEUED_FOR_FULFILLMENT" _
Or oc.OrderStatus = "OUT_FOR_FULFILLMENT" _
Or oc.OrderStatus = "QUEUED_FOR_SHIPPING" _
Or oc.OrderStatus = "OUT_FOR_PACKAGING" _
Or oc.OrderStatus = "READY_TO_SHIP"
Select oc.GrandTotal
).DefaultIfEmpty().Sum()
End Using
|
After testing with a real dataset, I was getting the wrong values.
When doing this in VB, you have to wrap the date range (and the Or'd status checks) in parentheses for it to evaluate properly: And binds more tightly than Or, so without the parentheses the date range was only being applied to the first status check.
So this produces the correct result, even if the value is 0.
I was banging my head against the wall trying to figure this one out.
Thanks to Richard Deeming for setting me straight on the DateTime.
Dim pValue As Decimal = 0
Dim baseDate As DateTime = DateTime.Today()
Dim dateStart As New DateTime(baseDate.Year, baseDate.Month, baseDate.Day, 0, 0, 0, 0)
Dim dateStop As New DateTime(baseDate.Year, baseDate.Month, baseDate.Day, 23, 59, 59, 999)
Using context As New hx5Context
pValue = _
(
From oh In context.Order_History
Where (oh.OrderDate >= dateStart And oh.OrderDate <= dateStop) _
And Not (oh.OrderStatus = "ORDER_CANCELED")
Select oh.GrandTotal
).Union _
(
From oc In context.Order_Completed
Where (oc.OrderDate >= dateStart And oc.OrderDate <= dateStop) _
And
(
oc.OrderStatus = "QUEUED_FOR_FULFILLMENT" _
Or oc.OrderStatus = "OUT_FOR_FULFILLMENT" _
Or oc.OrderStatus = "QUEUED_FOR_SHIPPING" _
Or oc.OrderStatus = "OUT_FOR_PACKAGING" _
Or oc.OrderStatus = "READY_TO_SHIP" _
)
Select oc.GrandTotal
).DefaultIfEmpty().Sum()
End Using
Return pValue
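Two follow-ups may be worth considering, sketched below with the same pValue, context and entity declarations assumed. First, a half-open date window (greater than or equal to today's midnight, strictly less than tomorrow's midnight) avoids having to pick a "last instant of the day" at all, which sidesteps precision concerns with SQL Server's datetime type (it only stores times to roughly 3 ms). Second, Union translates to a SQL UNION and removes duplicate rows, so two orders with the same GrandTotal in the same window would be counted once; Concat (UNION ALL) keeps them. This is only a sketch under those assumptions:
' Hedged sketch: half-open [dateStart, dateStop) window plus Concat instead of
' Union, so no millisecond boundary is needed and duplicate totals are kept.
Dim dateStart As DateTime = DateTime.Today
Dim dateStop As DateTime = DateTime.Today.AddDays(1)
pValue = _
(
From oh In context.Order_History
Where (oh.OrderDate >= dateStart And oh.OrderDate < dateStop) _
And Not (oh.OrderStatus = "ORDER_CANCELED")
Select oh.GrandTotal
).Concat _
(
From oc In context.Order_Completed
Where (oc.OrderDate >= dateStart And oc.OrderDate < dateStop) _
And
(
oc.OrderStatus = "QUEUED_FOR_FULFILLMENT" _
Or oc.OrderStatus = "OUT_FOR_FULFILLMENT" _
Or oc.OrderStatus = "QUEUED_FOR_SHIPPING" _
Or oc.OrderStatus = "OUT_FOR_PACKAGING" _
Or oc.OrderStatus = "READY_TO_SHIP" _
)
Select oc.GrandTotal
).DefaultIfEmpty().Sum()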
|
jkirkerx wrote: Thanks to Richard Deeming for setting me straight on the DateTime.
You can up vote his responses to show appreciation (up/down arrows appear on the left of the response).
Never underestimate the power of human stupidity
RAH
|
I haven't been able to do that in years; it's like they took that away from me. Seriously, I don't get the vote interface when I log in.
|
Hover your mouse pointer over a message and two arrows should appear on the left side. The green upward-pointing one is an upvote. The old voting system was removed due to abuse, but now the spam/abuse vote gets used as a one-vote by wuck fits and trolls. No real change there.
What do you get when you cross a joke with a rhetorical question?
The metaphorical solid rear-end expulsions have impacted the metaphorical motorized bladed rotating air movement mechanism.
Do questions with multiple question marks annoy you???
|
I've seen those, never bothered to click on them.
Thanks
Guess you need to delete the message to keep it secret secret!
|
Hello friends! How is the day going? I need opinions on this one. I'm working on a social network site, and as you all know there are many modules involved, e.g. friends, chat, messages, etc. My question is: should I create a database for each (e.g. a chat DB, a messages DB, a friends DB), or should I create one database and then create a table for each of them (e.g. tblchat, tblmessages, etc.)? Which of the options above is better for managing a social network?
|
While possible, I expect that any of those options will prove equally challenging to you.
|
You need to decide whether one set of records needs a connection or relationship to another. If they do, then it makes sense for them to be in separate tables of a single database.
|
If you are expecting extreme volumes you should investigate storage methods designed specifically for that environment. Otherwise have them all reside in the same database.
If you need separation then use schemas; this will make it easier to split the database in the future if your client base grows significantly.
Never underestimate the power of human stupidity
RAH
|
As already answered, the main criterion would be: is there a relation between the data? If there is, a single database is the most logical choice.
However, there are also other things to consider, such as maintenance, backups and so on. So even if the data isn't related, from a maintenance point of view it could simplify the situation if all the data is in one place. Of course, if the amount of data grows huge over time you may have to separate it, but in the beginning it would make sense to start small.
|
Hello friends! How is the day going? I need opinions in this case. I'm working on a social network site and as you all know there are many modules involved e.g friends, chat, messages table etc. My question is, shoulf
|
Where did your question go?
There are only 10 types of people in the world, those who understand binary and those who don't.
|
I'm using a PostgreSQL database, and managing it becomes harder and harder due to its sheer size.
The largest table contains about 2.5-3 billion records, and there are some other tables reaching about 10 million records. The database is about 1.2 TB on disk. I believe the large table (and the tables around it) is the most problematic, since there are (very roughly) 10 million write statements a day.
I have had issues with transaction IDs wrapping around. Luckily, by now I know the cure, but I would like to avoid this issue (and any other issue!) if possible, especially since the database is expected to grow even further. VACUUM takes forever on that large table.
I did a search on managing large databases, but most results are for other database products; I only found a not-so-helpful presentation on SlideShare.
If anyone has good (practical) tips about keeping such a large database into good shape, please let me know.
(I hope I explained things well enough)
|
With that amount of data, I take it you already know enough about indexing etc., so let's not go into those.
Depending on what problem you're solving, the technique might be different. Based on your description I'd concentrate on partitioning the large table (or tables) first. Partitioning would help you isolate the hot spot of the table to a significantly smaller set of rows, so that changes in the indexes and on disk would affect a considerably smaller amount of data.
For more information, have a look at http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html[^]
As a side note, I hate to say this, but I would also consider changing the database product.
|
Oracle? I know there is a free Oracle version (and a free SQL Server version), but I'm not sure they'll do the job. Paying for it is a no-go here (it has to be as cheap as possible), so I'm down to either MySQL or PostgreSQL, and I'll take the latter any day.
As for indexing, I'm not really an expert, but the indexes are limited to 2 (excluding the implicit index on the primary key). So if you have tips there, please shoot.
For this application I'm the designer, architect, tester, developer and, you guessed it, the DBA, despite my limited knowledge compared to a 'real' DBA. (There are few or no people at our company who would do better, so no choice there.)
I'll have a look at partitioning.
[EDIT] Ooh, and thanks! [/EDIT]
|
V. wrote: I know there is a free Oracle version (and a free SQL-Server version), but I'm not sure they'll do the job
The free one, no. It wouldn't be sufficient because of the limitations on database size etc., so you'd have to go for the paid one...
One interesting option could be https://mariadb.org/[^]. If you have time at some point, have a look.
|
I think your strategy on indexing is probably a good one; you can kill a database with too many indexes...
As Mika suggested, partitioning is going to be your best bet, but you should also look at archiving (a more drastic form of partitioning).
You should look at just what you are getting out of all those records and what use they are being put to. Is it possible to create a highly summarised reporting database?
Never underestimate the power of human stupidity
RAH
|
Unfortunately, all 3.5 billion records should be readily available. The reading part is not really the issue, since I limit the recordset to a maximum of 30,000 records through the client applications and there is a downsampling algorithm in place as well (it's timeline data: (timestamp, value) pairs). It's the writing part (transaction IDs, VACUUM, ...) that's problematic.
Yes, my boss had some highly demanding requirements.
|
V. wrote: all 3.5 billion records should be readily available
I think your boss has not thought through the requirements. "Readily" may not mean "instantly".
Never underestimate the power of human stupidity
RAH
|
Instantly, immediately, blazing fast... whatever.
But reading is pretty OK. In fact, the product works like a charm for the users. I'm just having trouble importing data every day for this specific record set (because it is so huge).
Thanks for the advice; still simmering on it a little.
|
3.5 billion records aren't anything strange. It's the amount that changes that's the problem.
And since you also don't have any problems selecting the data, we have to look at the other side of the problem.
Firstly, we have to recognise that insert time is close to linear[^] in the number of indexes, so keeping that number low is important.
But it also depends very non-linearly on some other factors.
So I have some questions:
Do you have any unused surrogate keys?
Are you inserting data in parallel, or from more than one process at a time?
Are you inserting data at one end of the index only? (Using ID, sequence or date for example)
Are you just inserting, or are you updating and deleting at the same time as well?
|
I have worked with 300 million row medical datasets where analysis jobs took two days to run.
A lot can depend on how the data is modelled. Sometimes a very flat table structure can help, where in effect you denormalise the data; this is a very useful way of modelling the data if you are just trawling through one table. At other times, more EAV-style tables can speed things up.
A lot of it depends on what you are doing with the data; there is no simple answer as to how to structure or restructure it, as much of it comes down to experience and trial and error.
One thing you may want to consider when running queries on the data: have an initial set of queries that extract the general population of data you are working on, then build tables with indexes on this 'candidate' data and work from there. This way you may slim your data down from 3 billion rows to a few million.
You probably do not want to be running queries on a 3-billion-row recordset other than to extract your initial data, which you then work on in separate tables.
This is how I did things with 300 million rows, although I realise that this is smaller by a factor of 10 compared to your dataset.
[edit] I noticed you also mention in a later post that writing is an issue.
Your limitation is going to come down to hardware. Depending on your budget, splashing out on a decent SAN solution may help, although SAN technology is beyond my current level of experience; I am just someone who uses it while other people configure it.
Be sure that the disks where the database lives are not shared with the OS or anything else that may cause contention when reading/writing data.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|