|
Apologies for the shouting but this is important.
When answering a question please:
- Read the question carefully
- Understand that English isn't everyone's first language so be lenient of bad spelling and grammar
- If a question is poorly phrased then either ask for clarification, ignore it, or mark it down. Insults are not welcome
- If the question is inappropriate then click the 'vote to remove message' button
Insults, slap-downs and sarcasm aren't welcome. Let's work to help developers, not make them feel stupid.
cheers,
Chris Maunder
The Code Project Co-founder
Microsoft C++ MVP
|
For those new to message boards please try to follow a few simple rules when posting your question.
- Choose the correct forum for your message. Posting a VB.NET question in the C++ forum will end in tears.
- Be specific! Don't ask "can someone send me the code to create an application that does 'X'. Pinpoint exactly what it is you need help with.
- Keep the subject line brief, but descriptive. eg "File Serialization problem"
- Keep the question as brief as possible. If you have to include code, include the smallest snippet of code you can.
- Be careful when including code that you haven't made a typo. Typing mistakes can become the focal point instead of the actual question you asked.
- Do not remove or empty a message if others have replied. Keep the thread intact and available for others to search and read. If your problem was answered, edit your message, add "[Solved]" to the subject line of the original post, and cast an approval vote for the answer or answers that really helped you.
- If you are posting source code with your question, place it inside <pre></pre> tags. We advise you also check the "Encode "<" (and other HTML) characters when pasting" checkbox before pasting anything inside the PRE block, and make sure "Use HTML in this post" check box is checked.
- Be courteous and DON'T SHOUT. Everyone here helps because they enjoy helping others, not because it's their job.
- Please do not post links to your question into an unrelated forum such as the lounge. It will be deleted. Likewise, do not post the same question in more than one forum.
- Do not be abusive, offensive, inappropriate or harass anyone on the boards. Doing so will get you kicked off and banned. Play nice.
- If you have a school or university assignment, assume that your teacher or lecturer is also reading these forums.
- No advertising or soliciting.
- We reserve the right to move your posts to a more appropriate forum or to delete anything deemed inappropriate or illegal.
cheers,
Chris Maunder
The Code Project Co-founder
Microsoft C++ MVP
|
I develop trading systems for stocks and other securities. To manage price data from various sources, I use a custom program written in VB.Net. This program maintains a LiteDB database for each underlying, resulting in several dozen database files.
I've found that LiteDB is very slow when inserting large amounts of data, and since I import a significant amount of data, I'm looking for a more efficient solution. Currently, I'm considering switching to a relational DB system like SQLite or MS SQL Server.
My data model includes a class for price data from CSV files, which contains the following fields:
"Identifier" (text)
"TimeStamp" (date/time)
"Open", "High", "Low", "Close" (single precision floating point numbers)
"Size" (integer)
The issue is that the identifier is identical for several hundred thousand data points, and the timestamps and price values often repeat. Since this data is distributed across various instruments, duplicate entries occur.
One possible solution would be to distribute this data class across multiple tables to avoid duplicate entries. However, I'm concerned that this may cause performance issues when inserting several hundred thousand data points.
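To make that concrete, here is roughly the split I have in mind, sketched with Python's built-in sqlite3 rather than VB.Net (the table and column names are just placeholders, not from my actual program):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway database, just for the sketch
cur = con.cursor()

# One row per instrument; the long identifier string is stored only once.
cur.execute("""
    CREATE TABLE instruments (
        id         INTEGER PRIMARY KEY,
        identifier TEXT UNIQUE NOT NULL
    )""")

# Price rows reference the instrument by its small integer key instead of
# repeating the identifier text hundreds of thousands of times.
cur.execute("""
    CREATE TABLE prices (
        instrument_id INTEGER NOT NULL REFERENCES instruments(id),
        ts            TEXT NOT NULL,
        open REAL, high REAL, low REAL, close REAL,
        size INTEGER,
        PRIMARY KEY (instrument_id, ts)
    )""")

cur.execute("INSERT INTO instruments (identifier) VALUES (?)", ("ACME.XETRA",))
instrument_id = cur.lastrowid

rows = [
    (instrument_id, "2024-03-01T09:00:00", 10.0, 10.5, 9.9, 10.2, 1500),
    (instrument_id, "2024-03-01T09:01:00", 10.2, 10.4, 10.1, 10.3, 900),
]
cur.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()
```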
Has anyone had similar experiences or advice to share?
Thanks in advance!
|
Are there no other NoSQL databases you can try besides LiteDB? If you go the relational route and attempt to normalize your flattened data as it's inserted, it's not going to get faster. I'm not sure whether your import is meant to be realtime or not, so these are just general-purpose ideas...
These would be your options:
1) Are you using connection pooling, or opening a new connection with every insert? Perhaps that's the bottleneck. LiteDB may not have the concept of connections at all, but if it does, that's the first place to check.
2) Can you toss in more threads to this import process? Will LiteDB even handle concurrency or will it choke?
3) Determine why your current LiteDB is choking. Is the bottleneck in your code or the DB? Is there a locked transaction not working? Is it thread safe? Are you using more than one thread getting locked? etc.
4) If the above doesn't work, find a different NoSQL DB that doesn't choke. Get a real one, not one shipped as a DLL; one built for concurrency will have the best throughput even for a single user. I've used MongoDB; I'm sure there are others.
5) If none of that works, then go download MariaDB (a MySQL fork), but make sure this import table uses the MyISAM storage engine. SQL Server will not be anywhere near as fast, since it doesn't let you choose storage engines. This is for unnormalized data only and supports nothing fancy like triggers, constraints, and foreign keys, but MyISAM is fast precisely for that reason. And you can always have another import process/ETL transform the data if needed, in non-realtime.
6) Since it's just an import, the fastest (but last resort) solution would be to just write out the data into an appended binary file. You'd still need a process to import it into an actual DB, but that can be offloaded so the original process isn't bottlenecked.
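To illustrate point 1): regardless of the store you pick, committing per row is the usual killer. A minimal sketch with Python's stdlib sqlite3 (a stand-in only; LiteDB and your VB.Net code have their own APIs):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (identifier TEXT, ts TEXT, close REAL)")

rows = [("ACME", f"2024-03-01T09:{i:02d}:00", 10.0 + i) for i in range(60)]

# One transaction for the whole batch: executemany plus a single commit.
# Committing after every row forces a disk sync per row, which is usually
# the dominant cost behind "inserts are slow" complaints.
with con:  # opens a transaction, commits on success
    con.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)

print(con.execute("SELECT COUNT(*) FROM prices").fetchone()[0])  # → 60
```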
Rene Rose 2021 wrote: One possible solution would be to distribute this data class across multiple tables to avoid duplicate entries. However, I'm concerned that this may cause performance issues when inserting several hundred thousand data points. Duplication isn't an issue as long as your tables aren't indexed... for writing. Most NoSQL databases don't index, so chances are you're good. Reading is a different story however.
Jeremy Falcon
|
Cutting to the chase: both SQLite (always) and MS SQL Server (for development, certainly) are free. Assuming you've already got experience writing queries against LiteDB and all your code in place, it should be fairly easy to convert to either one and, without much ado, find out for yourself whether they solve the problems you have.
|
Run an external C# program from within Oracle Developer and receive the return values from the external program
|
Greetings,
How do I ensure that the following code only queries records that are submitted between March and May 2024?
If a record has been submitted between the above date range, display 1. Otherwise, display 0.
Currently, we have code that does this, but for the entire year.
In the code below, the check in question has the alias thisYear.
I left the original line in, commented out, to show what we have that works; this time, we just want the check limited to March through May 2024.
The code:
CASE WHEN d.dateCreated BETWEEN DATEFROMPARTS(2024, 3, 1) AND DATEFROMPARTS(2024, 5 + 1, 1) THEN 1 ELSE 0 END as thisYear
appears to work sometimes but does not work other times. No errors but wrong results.
That code above, temporarily replaces this line of code below as described above.
--CASE WHEN YEAR(d.dateCreated) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
Here is the entire code:
SELECT e.Name, e.email, e.emptitle, d.dateCreated,
CASE WHEN YEAR(d.dateSubmitted) < YEAR(getdate()) THEN 1 ELSE 0 END as previousYear,
--CASE WHEN YEAR(d.dateSubmitted) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
CASE WHEN d.dateSubmitted BETWEEN DATEFROMPARTS(2024, 3, 1) AND DATEFROMPARTS(2024, 5 + 1, 1) THEN 1 ELSE 0 END as thisYear
FROM Employees e
INNER JOIN dateDetails d on e.employeeID = d.employeeID
WHERE e.employeeID = someID
I have also tried this:
CASE WHEN d.dateSubmitted >= DATEFROMPARTS(2024, 3, 1) AND d.dateSubmitted < DATEFROMPARTS(2024,5 + 1, 1)
Same inconsistent result.
I guess my question is why does this work perfectly:
CASE WHEN YEAR(d.dateSubmitted) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
but the date range does not work well?
Thanks in advance for your help.
|
You didn't say what "inconsistent" means.
Additionally what is the data type of 'dateSubmitted'?
|
You could have easily told me if my code was right or wrong.
Anyway, I have resolved it.
I was actually coming here to delete the thread.
|
You'd do better to share your solution.
|
samflex wrote: I was actually coming here to delete the thread. Please don't. Removing some part and leaving an orphaned thread makes it difficult to know what the original question was. And many of the replies do not make sense. Just update your question with the solution you found and add the word "Solved" to the title.
|
I wasn't going to delete the thread once someone had responded to it.
Here is what ultimately worked for me.
SELECT e.Name, e.email, e.emptitle, d.dateSubmitted,
CASE WHEN YEAR(d.dateSubmitted) < YEAR(getdate()) THEN 1 ELSE 0 END as previousYear,
CASE WHEN d.dateSubmitted >= '20240301' AND d.dateSubmitted < '20240501' THEN 1 ELSE 0 END as thisYear
FROM Employees e
INNER JOIN dateDetails d on e.employeeID = d.employeeID
WHERE e.employeeID = someID
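For what it's worth, the reason the >= / < pair is the safer shape: BETWEEN is inclusive at both ends, so an end bound built as the first day of the following month also matches records stamped exactly at midnight of that day. A Python stand-in for the comparison (illustrative only, not T-SQL):

```python
from datetime import datetime

start = datetime(2024, 3, 1)
end   = datetime(2024, 6, 1)  # first instant AFTER the range (March-May)

def in_range(ts):
    # Half-open interval: start <= ts < end.
    # BETWEEN start AND end would also match ts == end exactly,
    # so a record stamped 2024-06-01 00:00:00 would slip in.
    return start <= ts < end

print(in_range(datetime(2024, 5, 31, 23, 59, 59)))  # → True
print(in_range(datetime(2024, 6, 1)))               # → False (BETWEEN would say True)
```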
|
samflex wrote: me if my code was right or wrong.
I wasn't sure without the information I asked for.
However, I can recognize that mixing date, time, and timestamp types can be problematic, as can failing to account for what the time zone means, both for programming and for display.
|
Let's say you have a table with an index on the field "username".
Is it true that eventually you'll have some really time-consuming inserts when the hash table has to resize itself?
Hash table resize is O(n), I think.
Thanks.
|
INSERT is costly if there are a lot of indexes. It is not so much about resizing; there's usually plenty of spare space in a DB and hardly anything needs resizing.
If you can come up with a better strategy than indexes, I'd love to hear it.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
I was looking at the documentation for CREATE INDEX in SQL:
https://www.w3schools.com/sql/sql_create_index.asp
CREATE INDEX index_name
ON table_name (column1, column2, ...);
I don't understand when you would create it for multiple columns. Let's say you did this:
CREATE INDEX idx_pname ON Persons (LastName, FirstName);
Does that create one index on two columns? Or, is that the same as creating two indexes?
For instance, does that statement create two hashtables? Or one hashtable?
Thank you.
|
One index. First column is most significant, second is then used to reduce it further when doing searches.
I don't think you should consider it a 'hash table'. At a minimum, I doubt any relational database uses a hash, in general, to fully implement an index; I suspect hash tables are not an optimal solution for file-based storage. A quick Google suggests SQL Server specifically supports that type, but only for a memory-optimized table (so not a normal table).
I don't want to do the deep dive research to actually figure out what they might use for db indexes. But you can read up on what Clustered Indexes are.
Although not directly related, the one example I know without research is the comparison of sorting approaches: Binary Sort is for memory; Heap Sort is for file-based processing.
|
Copy/paste from the answer to the question that was posted before yours;
Quote: That means the index is sorted on col1, then col2. Meaning, col1 will be located first, then most rows following with the same col1 will be sorted on col2. Meaning, in a list of users, while random stored in the table, would have an index that orders it as such;
Lastname, Firstname
Doe, Jane
Doe, John
Vluggen, Albert
Vluggen, Eddy
Vluggen, Zack
If you search it, you go "where lastname = bla (col1 from index) AND firstname = bla2 (col2 from index)".
This way, it will first locate the segment for "Vluggen", and only has to scan until it finds "Eddy" within that segment. That's what the DB is optimized to do.
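One way to convince yourself it's a single index is to ask the engine for its plan. A quick sketch with Python's stdlib sqlite3 (SQLite chosen only because it's handy; the idea carries over to other engines):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Age INTEGER)")
con.execute("CREATE INDEX idx_pname ON Persons (LastName, FirstName)")

# Ask SQLite how it would run the two-column lookup.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Persons WHERE LastName = ? AND FirstName = ?",
    ("Vluggen", "Eddy"),
).fetchall()

for row in plan:
    # One SEARCH step using the single composite index idx_pname,
    # matching on both columns at once.
    print(row[-1])
```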
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
Let's say you have a SQL database with a table called USERS.
Let's say it has 500,000 users in it.
Let's say you do this:
SELECT PASSWORD FROM USERS WHERE USER='bill.gates';
Let's say there's no indices on the table.
Does the database just do a linear search thru each user?
Thank you.
|
Let's say there are books available on SQL, and let's say there are search engines on the internet.
|
Yes. It will scan segments and it breaks once all predicates are met. That's why having a PK with a clustered index is so very nice to have. Who comes up with the idea of a non-indexed table?
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
In database talk it is known as a 'table scan'.
Most databases have a way which allows you to see what it is doing. Oracle, SQL Server and MySQL IDEs all have a way to see this clearly. The nomenclature used in the output is hard to read but practice makes that easier.
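For example, SQLite will show you the table scan, and what an index changes, via EXPLAIN QUERY PLAN. A quick Python sketch (table and index names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Users (User TEXT, Password TEXT)")

def plan(sql):
    """Return SQLite's query plan as one string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(r[-1] for r in rows)

query = "SELECT Password FROM Users WHERE User = 'bill.gates'"

before = plan(query)
con.execute("CREATE INDEX idx_user ON Users (User)")
after = plan(query)

print(before)  # no index: a full table scan, e.g. "SCAN Users"
print(after)   # e.g. "SEARCH Users USING INDEX idx_user (User=?)"
```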
|