|
I concur. I developed a system for a hospital that recorded all activity from around 70 cameras 24/7 and it never went above needing about 800GB. Certainly less than a terabyte, never mind petabytes!
- I would love to change the world, but they won’t give me the source code.
|
|
|
|
|
Is the application 3rd party? If so, ask the vendor for recommendations on how to balance the workload across machines. Most applications with high processing requirements support this kind of shared-workload scenario.
If it's in-house, then the developers should have a pretty big say in the best way to maximise performance.
Any serious amount of image processing is best done across multiple machines, but that requires the application to support it.
Storage is a separate issue, so design it as such. You only really need to make sure the connection between the processing machine(s) and the storage machine(s) is fast enough to keep up. Other than that, they're two separate requirements.
|
|
|
|
|
musefan wrote: Is the application 3rd party?
The client is a subsidiary of one of the top Oil & Gas companies. They want to work with us on building the application. They've hired people from AMD on their side; I guess that's just for the hardware department. They also own the AI/ML teams. We are just focusing on the application that collects the data.
As I've updated in my OP, the data volume does seem to be fairly huge. But the intent of the contact person from this company looks to be to test our capacity: he's watching to see whether we'd run away at the scale of the application. We did not run, because we don't know what it means to handle petabytes of data.
|
|
|
|
|
Considering your data requirements, I suggest the following storage system:
1 - transport layer[^]
2 - storage[^] (note hack-proof encryption in progress)
Ravings en masse^
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
|
|
|
|
|
I have the same system in place for work emails, and can confirm it is very effective
|
|
|
|
|
Load test it in the cloud, then buy a server with 2x the capacity of the cloud one to cover additional workload growth.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
The CERN experiments produce 1PB/second of data, which is reduced to 1PB/day for storage (CERN Data Centre passes the 200-petabyte milestone | CERN). This allows them to store the "interesting" results out of 1 billion collision events/second. Are you telling us that your DP and image processing needs are 10% of CERN's?
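For scale, the reduction factor implied by those CERN numbers (1 PB/s of raw data kept as 1 PB/day of storage) works out as follows; this is just arithmetic on the figures quoted above, not anything from CERN's own documentation:

```python
# Reduction factor implied by 1 PB/s raw -> 1 PB/day stored
# (both figures taken from the post above).
seconds_per_day = 86_400
retained_fraction = 1 / seconds_per_day           # 1 PB kept out of 86,400 PB produced

print(f"retained:  {retained_fraction:.6%}")      # ~0.001157%
print(f"discarded: {1 - retained_fraction:.4%}")  # ~99.9988%
```

In other words, they throw away roughly 99.999% of what the detectors produce before it ever reaches long-term storage.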
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
They are not really comparable though, are they? CERN decided to discard roughly 99.999% of their data; the OP may not choose to discard any of theirs.
Plus, we cannot assume both sets of software are as efficient as they could be. The OP's software could be using really bad compression (or maybe none at all).
I just don't really understand why everyone is trying to argue about the quantity of data. It's not even close to being an impossible amount given current technology. Also, maybe the numbers are estimates for 5 years from now. You wouldn't want a system that only works for a week, would you...
|
|
|
|
|
OK, here's my first attempt at analysis:
- It is just possible to handle this amount of data with a dedicated 10 Gbps connection (the actual data rate is 6.6 Gbps), but once you take into account framing, collisions, etc., it looks very iffy.
[Probably have multiple systems receiving the data]
- The interfaces (NVMe, etc.) can handle this data rate, but building a storage system that can handle this sort of sustained write rate is non-trivial.
[Probably use multiple disks running in parallel]
- Once you have the data stored locally, you must read it off the storage at the same rate (otherwise you will eventually run out of space), process it, and store it somewhere else.
[The initial processing of this much data would presumably require a massively parallel system, with all the communication and synchronization issues that this entails. Have at least one primary processing node for each receiving system]
- How will secondary, tertiary, etc. processing be done?
[Whether you have one secondary processor for one or more primary processors, or vice versa, depends on the amount of data and the processing required. Again, we have synchronization and communication issues]
- Presentation of the results?
[Presumably requires that the results of the processing be sent to a single node. Synchronization, communication issues...]
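A quick back-of-the-envelope check of those rates. The 6.6 Gbps figure comes from the analysis above; the per-day and per-week totals are just derived arithmetic from it:

```python
# Back-of-the-envelope throughput check for a sustained 6.6 Gbps ingest stream
# (rate taken from the analysis above).
GBPS = 6.6
bytes_per_sec = GBPS * 1e9 / 8                 # bits -> bytes: 0.825 GB/s

per_day_tb = bytes_per_sec * 86_400 / 1e12     # seconds in a day
per_week_pb = bytes_per_sec * 604_800 / 1e15   # seconds in a week

print(f"{bytes_per_sec / 1e9:.3f} GB/s sustained")  # 0.825 GB/s
print(f"{per_day_tb:.1f} TB/day")                   # ~71.3 TB/day
print(f"{per_week_pb:.2f} PB/week")                 # ~0.50 PB/week
```

So a single 10 Gbps pipe is indeed borderline once protocol overhead eats into the headroom, and the storage has to absorb on the order of 70 TB every day without ever falling behind.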
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
There are so many questions yet to be answered before jumping to what computer to buy. Questions like:
- Does all the data from around the world end up in a single data center?
- Does this data center do all the processing?
- Is that really needed, or can processing be distributed around the world?
- Sure, at some point you may need all your data in one location for some kind of analysis. But does this have to be real time? Do you need "raw" data, or can processed data from remote servers work fine?
I can think of more if I spend some more time on it.
"It is easy to decipher extraterrestrial signals after deciphering Javascript and VB6 themselves.", ISanti[ ^]
|
|
|
|
|
Well, you obviously need to go distributed and pre-crunch the data locally, before sending it to regional servers for analysis.
|
|
|
|
|
Jörgen Andersson wrote: Well, you obviously need to go distributed and pre-crunch the data locally, before sending it to regional servers for analysis.
I don't know why, but it almost sounds like you are a robot trying to describe the process of eating
|
|
|
|
|
AMD Threadripper 3990x's and 6GB/s tape drives.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
|
|
|
|
You could look at Dell EMC Isilon for the storage. I worked on a system for an automotive company a couple of years ago where they were collecting and analysing 2PB per week of video and telemetry for self-driving car development.
The Isilon storage is NAS and modular, so you can add to clusters as the requirements grow. It is quite an interesting challenge: at 2PB per week you have a constant data input stream of, on average, 3.6 GB/s that has to be stored; on top of that, backups have to be made; and of course users must be able to access the system for data analysis runs. That's a lot of parallel data movement.
Networking is also a challenge: the initial 13PB system had over one hundred storage nodes, each with 40 Gb/s front-end networking ports to connect to the server farm. The system also has its own private network that supports striping data across nodes for availability and protection from failures.
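The average ingest rate can be sanity-checked from the 2PB/week figure quoted above; the answer lands between 3.3 and 3.7 GB/s depending on whether you read "PB" as decimal (10^15 bytes) or binary (2^50 bytes), which is consistent with the 3.6 GB/s average:

```python
# Average ingest rate for 2 PB of new data per week
# (weekly volume taken from the post above).
SECONDS_PER_WEEK = 7 * 24 * 3600              # 604,800 s

decimal_rate = 2e15 / SECONDS_PER_WEEK        # 2 PB  (10^15 bytes) -> ~3.31 GB/s
binary_rate = 2 * 2**50 / SECONDS_PER_WEEK    # 2 PiB (2^50 bytes)  -> ~3.72 GB/s

print(f"decimal: {decimal_rate / 1e9:.2f} GB/s")
print(f"binary:  {binary_rate / 1e9:.2f} GB/s")
```

Either way, that rate has to be sustained around the clock, with backup traffic and analysis reads running over the same storage at the same time.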
I was the solution architect for the system. It was one of my last projects before I retired from EMC in 2018.
|
|
|
|
|
Valuable inputs. Thanks a lot, AndyChisholm.
|
|
|
|
|
As it's Cheltenham week
Include me ? with Horsy racy maybe a rich way of running ? (11)
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
I have no idea where to even start...
|
|
|
|
|
|
I figured that was the method, but I didn't get "CC" and it doesn't help that I am not familiar with that word.
Can you dumb it down a bit tomorrow please?
|
|
|
|
|
Ok 👍
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
Not too far, please: "First letter of the alphabet (1)" might be a smidge too easy?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Which alphabet though ?
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
Doesn't matter - the answer is clearly "T" ...
And are you sure today's CCC is 13 letters?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Yep
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
Who cannot understand why the Old Dog blindly rejects the wonderful React and keeps insisting on horrible ancient name conventions.
|
|
|
|