|
V. wrote: all 3.5 billion records should be readily available I think your boss has not thought through the requirements. Readily may not mean instantly.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
instantly, immediately, blazing fast, ... whatever .
But reading is pretty OK. In fact, the product for the users works like a charm. I´m just having trouble with importing data every day for this specific records set. (because it is so huge).
thanks for the advice, still simmering on it a little.
|
|
|
|
|
3.5 billion records aren't anything strange. It's the amount that changes that's the problem.
And since you also don't have any problems selecting the data, we have to look at the other side of the problem.
Firstly we have to see the fact that Insert time is close to linear[^] with the number of indexes. So keeping that number low is important.
But it's also very unlinearly depending on some other factors.
So I have some questions:
Do you have any unused surrogate keys?
Are you inserting data in parallell or from more than one process at a time?
Are you inserting data at one end of the index only? (Using ID, sequence or date for example)
Are you just inserting, or are you updating and deleting at the same time aswell?
|
|
|
|
|
I have worked with 300 million row medical datasets where analysis jobs took two days to run.
A lot can depend on how the data is modelled - sometimes a very flat table structure can help where in effect you denormalise the data - this is a very useful way of modelling the data if you are just trawling through one table. At other times more EAV style tables can speed things up.
A lot of it depends on what you are doing with the data - there is no simple answer as to how to structure/re-structure data as much of it comes down to some experience and trial and error.
One thing you may want to consider is when running queries on the data - have an initial set of queries that will extract the general population of your data that you are working on. Then build tables with indexes on this 'candidate' data and work from there. This way you may be slimming down your data from 3 billion rows to a few million rows.
You probably do not want to be running queries on a 3billion row recordset other than for the purposes of extracting your initial data which you then work on in separate tables.
This is how I did things with 300 million rows, although I realise that this is smaller by a factor of 10 compared to your dataset.
[edit] I noticed you also mention in a later post that writing is an issue.
Your limitation is going to be down to hardware. Depending on your budget splashing out an a decent SAN solution may help - although SAN technology is beyond my current level of experience, I am just someone who uses it while other people configure it.
Be sure that the disks where the database lives are not shared with the OS or anything else that may cause contention when reading/writing data.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
modified 18-Jul-15 7:31am.
|
|
|
|
|
GuyThiebaut wrote: Be sure that the disks where the database lives are not shared with the OS or anything else that may cause contention when reading/writing data.
Cool, currently still shared, but I´m getting bigger disks this summer on another machine, so that should fix that.
I'm actually happy that it looks 'normal' to suffer with this, we have some developers here (well, doctors in mathematics and such) who react like "how could that be a problem?", which annoys the hell out of me .
Still taking in advice and letting it simmer a bit, to see if I can get a solution out of it .
thx for the advice.
|
|
|
|
|
V. wrote: how could that be a problem?", which annoys the hell out of me . Unfortunately, as you are discovering, when it comes to these sorts of issues experience wins over technical proficiency.
V. wrote: (well, doctors in mathematics and such) sometimes one way to handle people's expectations when they have a science background is to explain the science of disk access and data storage, the chances are that being scientists they will understand and value a scientific explanation, that way you can win them over to your side. I say this as someone who works in a research company where many of those in the company have PhDs, when the 'database is slow' I always hear from them.
I think many people still forget that computers have moving parts, they are machines - they are not some sort of quantum flux probability engine that returns results at the speed of light.
3 billion records is a huge dataset - even Cern has intermediary machines that filter out data on the go as experiments are run(worth checking this out on youtube), as so much data is generated that they could not possible store or analyse it all.
Good luck and be sure to run tests before and after so that you can prove that there has been a speed up.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|
|
|
|
|
I know how to update a single record, but updating multiple records is different.
Single
Dim uTemplate As CRM_JOBS = context.crm_Jobs.Single(Function(m) m.templateName = templateName)
uTemplate.templateXHTML = sEMT.templateXHTML
uTemplate.templateID = pValue
context.SaveChanges()
I thought of this for multiple records, but items doesn't carry over to the loop.
So would I have to project it into something and then save?
Dim jobs =
From items In context.crm_Jobs
Where items.templateName = templateName
For Each items In jobs
jobs.templateXHTML = sEMT.templateXHTML
jobs.templateID = pValue
Next
context.SaveChanges()
[edit]
I have this now
Dim jobs =
From j In context.crm_Jobs
Where j.templateName = templateName
Select j
For Each j As CRM_JOBS In jobs
j.templateXHTML = sEMT.templateXHTML
j.templateID = sEMT.templateID
Next
context.SaveChanges()
Oh that works!
modified 15-Jul-15 17:32pm.
|
|
|
|
|
Hi,
I have below code:
WHILE i < total_rows DO
SET @param_employee_number = (SELECT employee_number FROM earned_leaves LIMIT i,1);
PREPARE query_statement FROM 'CALL sp_populate_leave_summary(?)';
EXECUTE query_statement USING @param_employee_number;
SET i = i + 1;
END WHILE;
I want to save the value returned by the EXECUTE into a variable in order to use it in the next UPDATE statement.
How can I do this please?
Thanks,
Jassim[^]
Technology News @ www.JassimRahma.com
|
|
|
|
|
Taken this is MySql, you could modify the stored procedure to have an OUT parameter. In such case you could then use two variables in your execute. So something like:
PREPARE query_statement FROM 'CALL sp_populate_leave_summary(?,?)';
EXECUTE query_statement USING @param_employee_number, @another_parameter;
|
|
|
|
|
I am getting:
Procedure execution failed
1048 - Column 'earned_leave' cannot be null
although the sp_populate_leave_summary returns the correct value as I can see in the result window.
this is the code after adding an OUT:
WHILE i < total_rows DO
SET @param_employee_number = (SELECT employee_number FROM earned_leaves LIMIT i,1);
PREPARE query_statement FROM 'CALL sp_populate_leave_summary(?,?)';
EXECUTE query_statement USING @param_employee_number, @param_eligible_days;
UPDATE earned_leaves SET earned_leave = @param_eligible_days WHERE employee_number = @param_employee_number;
SET i = i + 1;
END WHILE;
my OUT is:
OUT param_eligible_days int
Technology News @ www.JassimRahma.com
|
|
|
|
|
As the error says, the return value from your procedure to variable @param_eligible_days is null. Because of this null is being updated to the table and this is prohibited because the column is not null.
So check why sp_populate_leave_summary returns NULL for the second parameter.
|
|
|
|
|
Just noting that the code posted suggests that a loop should not be used, thus the question that derives from it is moot.
SQL (as most dbs implement it) will do queries and updates on sets. The loop is implicit in it. It is just a matter of structuring it.
|
|
|
|
|
I'm trying to convert this to Linq. I didn't see a OR in Linq, so I'm figuring you can do multiple Where stateements. And on the date, I'm not sure if I should calculate it in code and jsut submit the date, or if there is a way to use Linq to do Date Calcs.
" SELECT COUNT(*) " & _
" FROM CRM_MESSAGES " & _
" WHERE " & _
" MessageStatus='MESSAGE_COMPLETE' " & _
" OR MessageStatus='MESSAGE_PENDING' " & _
" OR MessageStatus='MESSAGE_FORWARD' " & _
" AND ContactDate >=DATEADD(day, -30, getdate()) "
So Far I've got this
Dim messages As IQueryable(Of CRM_MESSAGES)
messages = messages.Where(Function(m) m.MessageStatus = "MESSAGE_COMPLETE")
messages = messages.Where(Function(m) m.MessageStatus = "MESSAGE_PENDING")
messages = messages.Where(Function(m) m.MessageStatus = "MESSAGE_FORWARD")
messages = messages.Where(Function(m) m.ContactDate >= m.ContactDate.Date.AddDays(-30))
|
|
|
|
|
Multiple Where statements is equivalent to an AND operator. For an Or operator, use VB's OrElse operator:
messages = messages.Where(Function(m) m.MessageStatus = "MESSAGE_COMPLETE" OrElse m.MessageStatus = "MESSAGE_PENDING" OrElse m.MessageStatus = "MESSAGE_FORWARD")
Your date comparison won't do anything - you're asking for all messages where the contact date is later than a date 30 days before the contact date, which is all of them. Use Date.Today.AddDays(-30) instead:
messages = messages.Where(Function(m) m.ContactDate >= Date.Today.AddDays(-30))
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
I would of never figured that out on the multiple where statements.
And the Date bombed, so I tried DateTime.Now.AddDays(-30), but changed to your example before testing it.
I wonder if this will now work without using IQueryable? I'll give it a spin. It didn't work
LINQ to Entities does not recognize the method 'System.DateTime AddDays(Double)' method, and this method cannot be translated into a store expression.
I'll go back to IQueryable and give it a try.
Dim pValue As Integer = 0
Using context As New CRMContext()
pValue = _
(From item In context.CRM_MESSAGES
Where item.MessageStatus = "MESSAGE_COMPLETE" _
OrElse item.MessageStatus = "MESSAGE_PENDING" _
OrElse item.MessageStatus = "MESSAGE_FORWARD" _
And item.ContactDate >= Date.Today.AddDays(-30)).Count()
End Using
Return pValue
|
|
|
|
|
DBFunctions for EF 6.0, Thanks!
Imports System.Data.Entity
Dim pValue As Integer = 0
Using context As New CRMContext()
pValue = _
(From item In context.CRM_MESSAGES
Where item.MessageStatus = "MESSAGE_COMPLETE" _
OrElse item.MessageStatus = "MESSAGE_PENDING" _
OrElse item.MessageStatus = "MESSAGE_FORWARD" _
And item.ContactDate >= DbFunctions.AddDays(DateTime.Now(), -30)).Count()
End Using
Return pValue
|
|
|
|
|
jkirkerx wrote: LINQ to Entities does not recognize the method 'System.DateTime AddDays(Double)' method, and this method cannot be translated into a store expression.
You might need to move the date outside of the filter:
Dim minDate As Date = Date.Today.AddDays(-30)
messages = messages.Where(Function(m) m.ContactDate >= minDate)
Alternatively, use the EntityFramework.SqlServer assembly, and the System.Data.Entity.SqlServer.SqlFunctions class[^]:
messages = messages.Where(Function(m) m.ContactDate >= SqlFunctions.DateAdd("day", -30, SqlFunctions.GetDate()))
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
So I thought I was getting this down now.
In my original code, I used 1 function to get the name, and another function to get the record results.
So I rewrote it Linq, But I'm not quite getting how it works.
So the first query called contributor returns a result that looks like SQL sort of, and says to expand to see the results.
The 2nd query does the same thing, and the pValue for the record count is non zero, and triggers an exception of non zero.
So I'm missing something on the concept here, and can't figure it out.
I shortened the sample for easier reading.
Public Shared Function load_movieContributor_array( _
ByVal pMovieID As Integer,
ByVal pPageIndex As Integer,
ByVal pPageSize As Integer,
ByRef pResults As IEnumerable(Of MovieIndex)) As Integer
Dim pValue As Integer = 0
Dim context As New MoviesContext()
Dim contributor = _
(From m In context.Movies
Where m.MovieID = pMovieID
Select m.MovieContributor)
Dim movies As IQueryable(Of Movies) = context.Movies
movies = movies.Where(Function(m) m.MovieContributor = contributor)
movies = movies.OrderBy(Function(m) m.MovieName)
'Get a Single Page of Data
If (pPageIndex = 1) Then
movies = movies.Take(pPageSize)
Else
movies = movies.Skip((pPageIndex - 1) * pPageSize).Take(pPageSize)
End If
pResults = _
movies.AsEnumerable().Select(Function(item) New MovieIndex With
{
.MovieID = item.MovieID,
'More columns .....
})
pValue = pResults.Count()
Return pValue
End Function
|
|
|
|
|
Public Shared Function load_movieContributor_array( _
ByVal pMovieID As Integer,
ByVal pPageIndex As Integer,
ByVal pPageSize As Integer,
ByRef pResults As IEnumerable(Of MovieIndex)) As Integer
Dim pValue As Integer = 0
Dim context As New MoviesContext()
Dim cResults = _
(From m In context.Movies
Where m.MovieID = pMovieID
Select m.MovieContributor).SingleOrDefault()
Dim m_contributor As String = cResults
Dim movies As IQueryable(Of Movies) = context.Movies
movies = movies.Where(Function(m) m.MovieContributor = m_contributor)
movies = movies.OrderBy(Function(m) m.MovieName)
'Get a Single Page of Data
If (pPageIndex = 0) Then
movies = movies.Take(pPageSize + 1)
Else
movies = movies.Skip((pPageIndex + 1) * pPageSize).Take(pPageSize)
End If
|
|
|
|
|
Hi,
I have a question regarding SQL queries.
In my select statement in my stored procedure I will receive a table of information like this:
UserId
------
2
4
5
After this select statement I'd like to add a insert statement to add records For each of these UserIds (as UserIdFrom) with the other ones (as UserIdTo)
I want to Insert into a table information like following:
UserIdFrom UserIdTo
---------- --------
2 4
2 5
4 2
4 5
5 2
5 4
I would be grateful if someone help me Smile |
SELECT UserId FROM Users
INSERT INTO myTable (UserIdFrom, UserIdTo) VALUES (@UserIdFrom, @UserIdTo)
|
|
|
|
|
Try something like this:
INSERT INTO myTable
(
UserIdFrom,
UserIdTo
)
SELECT
F.UserId,
T.UserId
FROM
Users As F
CROSS JOIN Users As T
WHERE
T.UserId != F.UserId
;
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
thank u it helped me a lot!
|
|
|
|
|
So I got my experimental code in VB working now, and a basic query.
So I'm looking at my original code, in which I sort of pieced together a query.
e.g.
Dim query As String = _
"SELECT * FROM MOVIEINFO "
Select Case p_Type
Case "FLV"
query += " WHERE flv = 1 "
Case "H264"
query += " WHERE h264 = 1 "
End Select
Select Case p_Sort
Case "NAME"
query += " ORDER BY MovieName "
Case "TYPE"
query += " ORDER BY MovieType "
End Select
Now I'm seeing if I can do the same thing in Linq
So this is my simple query. I'm not quite sure how to sort of do the same thing in the sample below.
I'm aware that I can load all the records into pResults, and then do another query to further isolate the results, but that doesn't seem efficient. I really just want to get the records of type, and a start and stop range. In other words, this is a indexed page of movies, and you click the page number to show those results. I already have the start and stop values.
Any input that is sound would be appreciated, Thanks
pResults =
(From item In context.Movies
Where item.flv = True
Order By item.MovieName
Select New MovieItem With
{
.MovieID = item.MovieID,
.MovieName = item.MovieName,
.MovieType = item.MovieType,
.MovieDeliveryType = item.MovieDeliveryType,
.MoviePath = item.MoviePath,
.MovieUrl = item.MovieURL,
.MoviePostage = item.MoviePostage,
.MovieThumbnail = item.MovieThumbnail,
.MovieLarge = item.MovieLarge,
.MoviePlaceHolder = item.MoviePlaceHolder,
.MovieBitRate = item.MovieBitRate,
.MovieTime = item.MovieTime,
.MovieSize = item.MovieSize,
.MovieDescription = item.MovieDescription,
.MovieCount = item.MovieCount,
.MovieContributor = item.MovieContributor,
.MoviePostDate = item.MoviePostDate,
.MovieIcon = item.MovieIcon,
.MovieParameters = item.movie_Parameters
}).AsEnumerable
|
|
|
|
|
Something like this should work:
Dim movies As IQueryable(Of Movie) = context.Movies
Select Case p_Type
Case "FLV"
movies = movies.Where(Function(m) m.flv)
Case "H264"
movies = movies.Where(Function(m) m.h264)
End Select
Select Case p_Sort
Case "NAME"
movies = movies.OrderBy(Function(m) m.MovieName)
Case "TYPE"
movies = movies.OrderBy(Function(m) m.MovieType)
End Select
movies = movies.Skip(pageIndex * pageSize).Take(pageSize)
pResults = movies.AsEnumerable().Select(Function(item) New MovieItem With
{
.MovieID = item.MovieID,
.MovieName = item.MovieName,
.MovieType = item.MovieType,
.MovieDeliveryType = item.MovieDeliveryType,
.MoviePath = item.MoviePath,
.MovieUrl = item.MovieURL,
.MoviePostage = item.MoviePostage,
.MovieThumbnail = item.MovieThumbnail,
.MovieLarge = item.MovieLarge,
.MoviePlaceHolder = item.MoviePlaceHolder,
.MovieBitRate = item.MovieBitRate,
.MovieTime = item.MovieTime,
.MovieSize = item.MovieSize,
.MovieDescription = item.MovieDescription,
.MovieCount = item.MovieCount,
.MovieContributor = item.MovieContributor,
.MoviePostDate = item.MoviePostDate,
.MovieIcon = item.MovieIcon,
.MovieParameters = item.movie_Parameters
})
Assuming MovieItem isn't an entity type, you'll need the AsEnumerable call before the Select call.
Everything prior to the AsEnumerable call will be translated into a SQL query.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
That's Radical Man!
It works like a charm. I figured out that Movies was my Model, and imported the Namespace for it.
So I did some tweaks to get the record selection right, records displayed.
I'm still stuck on using the context, instead of Dimensioning it, but the base of what I project into gets disposed before I can generate the HTML. So that's another subject.
This SQL Linq is quite interesting, and seems to offer everything hand typed queries offer. It's fast.
I'm stoked on my Data Access Layer DLL, Seems to be working. I know the model works now.
If I can convert my asp.net webforms app program over to this, and be able to automatically create a database, and seed it with default values, and then if I change the database in one spot, and I can propagate the changes throughout the program, then I'm done.
All I have to do is adopt the BootStrap CSS Concept used in MVC, and I can sell this thing finally.
Thanks for taking the time to school me on this. I owe you the price of tuition for it. - Street Outlaws
And Here's what I got.
Public Shared Function load_index_array( _
ByVal pType As MovieType,
ByVal pSortOrder As SortOrder,
ByVal pPageIndex As Integer,
ByVal pPageSize As Integer,
ByRef pResults As IEnumerable(Of MovieItem)) As Integer
Dim pValue As Integer = 0
Dim context As New MoviesContext()
Dim movies As IQueryable(Of Movies) = context.Movies
'Sort Order
Select Case pSortOrder
Case SortOrder.Name
movies = movies.OrderBy(Function(m) m.MovieName)
Case SortOrder.Type
movies = movies.OrderBy(Function(m) m.MovieType)
Case SortOrder.Time
movies = movies.OrderBy(Function(m) m.MovieTime)
Case SortOrder.Lastest
movies = movies.OrderByDescending(Function(m) m.MoviePostDate)
Case SortOrder.Early
movies = movies.OrderBy(Function(m) m.MoviePostDate)
End Select
'Filter By Type
Select Case pType
Case MovieType.FLV
movies = movies.Where(Function(m) m.flv)
Case MovieType.H264
movies = movies.Where(Function(m) m.h264)
End Select
'Get a Single Page of Data
If (pPageIndex = 1) Then
movies = movies.Take(pPageSize)
Else
movies = movies.Skip((pPageIndex - 1) * pPageSize).Take(pPageSize)
End If
pResults =
movies.AsEnumerable().Select(Function(item) New MovieItem With
{
.MovieID = item.MovieID,
.MovieName = item.MovieName,
.MovieType = item.MovieType,
.MovieDeliveryType = item.MovieDeliveryType,
.MoviePath = item.MoviePath,
.MovieUrl = item.MovieURL,
.MoviePostage = item.MoviePostage,
.MovieThumbnail = item.MovieThumbnail,
.MovieLarge = item.MovieLarge,
.MoviePlaceHolder = item.MoviePlaceHolder,
.MovieBitRate = item.MovieBitRate,
.MovieTime = item.MovieTime,
.MovieSize = item.MovieSize,
.MovieDescription = item.MovieDescription,
.MovieCount = item.MovieCount,
.MovieContributor = item.MovieContributor,
.MoviePostDate = item.MoviePostDate,
.MovieIcon = item.MovieIcon,
.MovieParameters = item.movie_Parameters
}).AsEnumerable
pValue = pResults.Count()
Return pValue
End Function
|
|
|
|
|