Posted 1 Nov 2017


# The Application of CAP Principle and Distributed Matrix


The CAP principle describes the trade-off between consistency, availability, and partition tolerance (called partition compatibility in this article), but it is often confusing to apply in practice.

• What kinds of situations does this principle apply to?
• What are its prerequisites?
• Does the CAP principle really prevent a distributed system from being both consistent and available?

I will try to lay out the distributed-systems problem in a simple way.

First, we define the red ball as the data and the blue box as a container. The distributed problem then reduces to how the data is put into containers and how it is accessed. A container is the partition described by the CAP principle. When you want to put one piece of data into n partitions, there are three ways to do it.

The first way is CP. Say we have two containers: we divide the data into two parts and put one part in each container. This model keeps the data consistent: because the data is split into two parts with no duplicate copies, no ambiguity can arise. It is also partition compatible, because the data is spread across two partitions.

"Consistency" here means that when part A of the data is modified in its partition, each part still exists in exactly one place, so the two containers never hold conflicting versions and there is no data ambiguity. But when either partition cannot be used, you cannot get all of the data, so the data as a whole becomes unusable.
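The CP placement described above can be sketched in a few lines of Python. This is only an illustration: the `Container` class and the `put_split` and `get_all` helpers are hypothetical names invented for this sketch, and a string stands in for the red ball.

```python
class Container:
    """A hypothetical partition holding one part of the data."""

    def __init__(self, name):
        self.name = name
        self.part = None
        self.up = True

    def read(self):
        if not self.up:
            raise ConnectionError(f"container {self.name} is unavailable")
        return self.part


def put_split(data, a, b):
    """Split the data in half: one part per container, no duplicate copies."""
    mid = len(data) // 2
    a.part, b.part = data[:mid], data[mid:]


def get_all(a, b):
    """Reading the whole data requires every container to be up."""
    return a.read() + b.read()


a, b = Container("A"), Container("B")
put_split("redball", a, b)
whole = get_all(a, b)

# Modify part A: there is still only one version of it, so no ambiguity.
a.part = "grn"
updated = get_all(a, b)

# If either container fails, the whole data can no longer be read.
b.up = False
try:
    get_all(a, b)
    readable = True
except ConnectionError:
    readable = False
```

Because each part lives in exactly one container, a write can never produce two conflicting versions, but a single failure is enough to make the full data unreadable.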

The second way is AP: we put a full copy of the data into each of two partitions, A and B. The data remains available even when one of the two partitions cannot be used.

However, when the two containers modify their copies separately, consistent data can no longer be obtained. This is how AP maintains availability and partition compatibility at the expense of data consistency.
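A minimal sketch of the AP placement, under the same simplifications as before, might look like this. The `Replica` class and the `put_copies` and `read_any` helpers are hypothetical names used only for illustration.

```python
class Replica:
    """A hypothetical partition holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.up = True


def put_copies(data, replicas):
    """Store a full copy of the data in every replica."""
    for replica in replicas:
        replica.value = data


def read_any(replicas):
    """Return the value from the first replica that is still up."""
    for replica in replicas:
        if replica.up:
            return replica.value
    raise ConnectionError("no replica available")


a, b = Replica("A"), Replica("B")
put_copies("red", [a, b])

# Availability: losing one partition does not stop reads.
a.up = False
still_readable = read_any([a, b])
a.up = True

# Inconsistency: each replica accepts its own write, so the copies diverge.
a.value = "green"
b.value = "blue"
diverged = a.value != b.value
```

Reads survive the loss of one partition, but nothing stops the two copies from drifting apart once both accept writes independently.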

The third way is CA: keep data consistency and availability at the expense of partition compatibility. Put bluntly, you put the data in a single partition, and then it is no longer a distributed system.

So that is the basic principle of CAP. The most basic consideration when designing a distributed system is how to put data into individual partitions. Because CA is not a distributed system, it can be eliminated first. Between the remaining CP and AP, consistency is the unavoidable choice, because conflicting data is clearly not acceptable. So the only option left is a CP distributed system. The problem with CP, however, is that the data becomes incomplete after any partition or node crashes, resulting in data errors and unusable data. As an Internet engineer, the first thing that comes to mind is: should we add a backup server to every partition? The CAP principle does not address backup servers directly, but in CAP terms, adding a backup is just a way of converting CP into AP.

Backup servers A1 and B1 combine with the original servers into a strange hybrid of CP and AP. Availability increases, but if anything unexpected makes a backup server inconsistent with its original server, the whole system loses its consistency and the combination turns into an AP system. So we place backup server A1 behind A, so that users cannot modify A1 directly. A1 can only be modified by server A.
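The rule that A1 may only be modified by A can be sketched as follows. The `Primary` and `Backup` classes are hypothetical, and the identity check is just one simple way to model "placed behind A"; a real deployment would more likely enforce this at the network or access-control level.

```python
class Backup:
    """Hypothetical backup server (A1): accepts writes only from its primary."""

    def __init__(self):
        self._value = None
        self._owner = None

    def attach(self, primary):
        self._owner = primary

    def replicate(self, sender, value):
        if sender is not self._owner:
            raise PermissionError("only the primary may modify its backup")
        self._value = value

    def read(self):
        return self._value


class Primary:
    """Hypothetical original server (A): the only writer to its backup."""

    def __init__(self, backup):
        self.value = None
        self.backup = backup
        backup.attach(self)

    def write(self, value):
        self.value = value
        self.backup.replicate(self, value)  # the backup follows the primary


a1 = Backup()
a = Primary(a1)
a.write("red")
in_sync = a1.read() == a.value

# A user trying to modify A1 directly is rejected.
try:
    a1.replicate(object(), "green")
    direct_write_allowed = True
except PermissionError:
    direct_write_allowed = False
```

With a single writer per backup, the copy can lag but cannot be changed independently, which is what keeps the pair from degrading into a plain AP system.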

The distributed system then becomes a matrix composed of a CP face and an AP face.

In the general case, the CP face handles all external communication and provides unified, partitioned data services. When a server on the CP face breaks or becomes unavailable, a usable server in the AP direction is selected to continue providing the service, and the switch is invisible to the user. As long as the system has enough AP servers that can be promoted to CP servers to offset the probability of server failure, both the consistency and the availability of the whole CP&AP distributed matrix are maintained.
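The failover described above can be sketched like this. The `Node` and `Matrix` classes are hypothetical, and the `availability` function additionally assumes that every server fails independently with the same probability, which the article does not state.

```python
class Node:
    def __init__(self, name, part):
        self.name, self.part, self.up = name, part, True


class Matrix:
    """Each column is one partition: index 0 is the CP-face server,
    the remaining entries are its backups in the AP direction."""

    def __init__(self, columns):
        self.columns = columns

    def _serving_node(self, column):
        # Promote the first usable node in the column to the CP face.
        for i, node in enumerate(column):
            if node.up:
                column.insert(0, column.pop(i))
                return node
        raise ConnectionError("every server in this partition is down")

    def read_all(self):
        return "".join(self._serving_node(c).part for c in self.columns)


def availability(p_fail, servers_per_partition, partitions):
    """Chance that every partition has at least one live server,
    assuming independent failures with probability p_fail (an assumption)."""
    return (1 - p_fail ** servers_per_partition) ** partitions


matrix = Matrix([
    [Node("A", "red"), Node("A1", "red")],
    [Node("B", "ball"), Node("B1", "ball")],
])
before = matrix.read_all()
matrix.columns[0][0].up = False        # CP-face server A fails
after = matrix.read_all()              # A1 takes over; the caller sees no change
serving = matrix.columns[0][0].name

without_backups = availability(0.1, 1, 2)
with_backups = availability(0.1, 2, 2)
```

Under the independence assumption, adding one backup per partition raises the chance that the full data is readable, which is the sense in which enough AP servers offset the failure probability.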

In this distributed matrix, data services need not be provided to the outside world through only one face. AP nodes can be designed to provide a read-only service to users. This is not limited to CP nodes or to any one face: any node can provide a read-only service. So for a read-only service with lower reliability requirements, all nodes are equivalent and the system can be treated as two-dimensional. The distributed matrix is fairly flexible and can be cut differently depending on the user's view.
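As a rough illustration of serving read-only traffic from every node, the sketch below rotates reads across all nodes of a partition. The `columns` layout, the `readers` table, and the `read_only` function are hypothetical, and round-robin is just one possible way to spread the reads.

```python
from itertools import cycle

# Hypothetical 2x2 matrix: each key is one partition, and each entry is a
# (node name, stored part) pair holding the same part of the data.
columns = {
    "part_a": [("A", "red"), ("A1", "red")],
    "part_b": [("B", "ball"), ("B1", "ball")],
}

# One round-robin iterator per partition, so every node takes read traffic.
readers = {key: cycle(nodes) for key, nodes in columns.items()}


def read_only(key):
    """Serve a read from the next node of the partition, CP or AP alike."""
    name, part = next(readers[key])
    return name, part


served_by = [read_only("part_a")[0] for _ in range(4)]
part_b = read_only("part_b")[1]
```

Writes would still go through the CP face only; this sketch covers the read path alone.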

So is there a CP&AP system in the real world? The answer is yes. The traditional master-slave server setup is a C&AP system. Note that C&AP is not CP&AP, because a single-master system is not distributed. But the main systems that provide distributed services, such as large websites and Amazon cloud databases, can be seen as complete CA&AP systems. Because Amazon's cloud database implicitly provides master-slave backup and disk backup, it is equivalent to an AP system that is not visible to users and developers.

Similarly, current blockchain technology is a C&AP system: a distributed matrix consisting of a single C, selected by PoW or DPoS, with the AP service provided by the entire network or by a number of servers.

The next step is to discuss how consistency is defined, because this determines whether a C container can be divided into CP containers. What kind of data needs consistency?

In the figure above, a red ball is divided into two halves placed in two containers. When the half in container A turns green, if the result is still a valid ball, we say that the ball can be partitioned; that is, it is partition-compatible data. If the ball has to be entirely green to be valid, then the data itself is partition-incompatible.
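One way to make this test concrete is to express "valid" as a predicate over the parts and check whether changing a single part keeps the data valid. The function and predicate names below are hypothetical and serve only to illustrate the idea.

```python
def partition_compatible(is_valid, parts, new_part, index):
    """Return True if changing only one part leaves the whole data valid."""
    changed = list(parts)
    changed[index] = new_part
    return is_valid(changed)


def any_known_colour(parts):
    """Each half is valid on its own: halves may differ in colour."""
    return all(p in {"red", "green"} for p in parts)


def must_be_uniform(parts):
    """The ball is valid only if every half has the same colour."""
    return len(set(parts)) == 1


ball = ["red", "red"]

# The half in container A turns green while the half in B stays red.
independent = partition_compatible(any_known_colour, ball, "green", 0)
coupled = partition_compatible(must_be_uniform, ball, "green", 0)
```

Here `independent` is `True` and `coupled` is `False`: under the first rule the ball can be split across containers, while under the second rule the halves have to change together, so the data has to stay in a single C container.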

Converting C&AP into CP&AP is therefore a matter of finding the partition-compatible data in C.

 China
Distributed Technology Professional