A thought flashed through my mind about a month ago, and I am still thinking about it. On one hand it may seem like a joke; on the other hand it may have a very serious background. I decided to share it with the community and ask for opinions.

Some time ago I faced a memory fault in the platform that affected the application's durability, and I thought:
- RAID is a well-known technology, designed to increase the durability of data storage by using arrays of redundant independent (or inexpensive) disks;
- CPUs and RAM are now cheaper and more parallel (more independent);
- Could we introduce a technique, call it RPTC (Redundant Parallel Thread Calculations), in which the same calculation is performed independently in parallel, the way RAID stores data redundantly?

So, if the independent calculations produce the same result, the process is assumed to have completed without error; if the results differ, an error has occurred. The advantage would be more durable (fault-tolerant) calculations. A minimal sketch of the idea is shown below.
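To make the idea concrete, here is a minimal sketch of RPTC-style redundancy, assuming a hypothetical deterministic function `compute` and an illustrative replica count of three. It is not a reference implementation, just the comparison scheme described above:

```cpp
// Sketch: run the same calculation independently in several threads,
// then compare the results to detect a possible error.
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical deterministic calculation whose result we want to protect.
uint64_t compute(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 1; i <= n; ++i)
        sum += i * i;
    return sum;
}

int main()
{
    const int replicas = 3;                  // illustrative number of independent copies
    std::vector<uint64_t> results(replicas);
    std::vector<std::thread> workers;

    // Run the same calculation independently in each thread.
    for (int r = 0; r < replicas; ++r)
        workers.emplace_back([&results, r] { results[r] = compute(1'000'000); });
    for (auto& w : workers)
        w.join();

    // If all replicas agree, accept the result; otherwise report an error.
    bool agree = true;
    for (int r = 1; r < replicas; ++r)
        if (results[r] != results[0])
            agree = false;

    if (agree)
        std::cout << "Result accepted: " << results[0] << '\n';
    else
        std::cout << "Replica mismatch - possible soft error\n";
    return 0;
}
```

Note that in this sketch all replicas still run on the same processor and read the same physical memory, so a single hardware fault could affect all of them equally; the comparison only detects disagreement, it does not identify which replica is correct.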

The serious reason for considering this technique is the likelihood of hardware RAM errors combined with the ever-growing volume of calculations. It is well known that RAM chips can produce failures (plenty of information can be found on the Internet).

Areas of application: nuclear plants, medicine, space, or any other safety-relevant area.

Does this make sense?

What I have tried:

At one point the application started to report memory fragmentation errors (at the OS or hardware level). The application ran in a virtual machine, and the problem disappeared only after rebooting both the guest and the host. I am a programmer, not a system engineer, but it seems that virtualization increases the likelihood of platform errors.

1 solution

A better method for safety-critical systems is true redundancy: Redundancy (engineering) - Wikipedia, the free encyclopedia[^]
The hardware is duplicated, the software is written by different teams, and two out of three channels can "out-vote" the third (a minimal voting sketch follows below). Purely duplicating the same calculation in different threads on the same processor doesn't give you much protection - if you think about how many "soft errors" you get in an average year and compare that to the number of bugs you encounter in the same time... :laugh:
It's an idea, but I'm not convinced that the performance hit you would take would in any way balance the small advantage you would get.
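For illustration, here is a minimal sketch of the two-out-of-three majority voting used in triple modular redundancy. The three input values are purely illustrative; in a real system they would come from independent hardware channels:

```cpp
// Sketch: two-out-of-three majority voter, as used in triple modular redundancy.
#include <cstdint>
#include <iostream>
#include <optional>

// Return the value agreed on by at least two of the three channels,
// or no value if all three disagree.
std::optional<uint64_t> vote(uint64_t a, uint64_t b, uint64_t c)
{
    if (a == b || a == c) return a;
    if (b == c)           return b;
    return std::nullopt;  // total disagreement: the fault cannot be masked
}

int main()
{
    // Channel b is corrupted; the voter still yields the majority value.
    auto result = vote(42, 999, 42);
    if (result)
        std::cout << "Voted result: " << *result << '\n';
    else
        std::cout << "No majority - unrecoverable fault\n";
    return 0;
}
```

Unlike plain duplication, a voter like this can mask a single fault rather than merely detect it, which is why safety-critical designs usually use at least three independent channels.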