Evolution of database architecture
In the era of less business data, we can use a single-machine database to meet the business use. With more and more business requests, the amount of data in the database increases rapidly. At this time, the single-machine database can no longer meet the performance requirements of the business, and the database master-slave replication architecture comes into being.
Master-slave replication separates the write operation from the read operation of the database, uses multiple slaver replication to process the read request, Master to process the write request, and the read-only instance keeps the data with the master instance by replicating the data of the master instance.Uniformity. Because read-only instances can be scaled horizontally, more read requests are not a problem. With the advent of cloud computing and big data era, things have not been solved perfectly. As more and more write requests are made, the main instance write requests become the main performance bottleneck.
How to solve these problems? If only by adding a master instance to share the write request, how to synchronize the write operation between the two master instances to ensure data consistency, how to avoid double write, the problem will become more complex. At this time, we need to use the sub library table (sharding) to segment the writing operations.To solve it, as shown in Figure 1:
Figure 1: typical read-write separation and sub library tables
HUAWEI cloud products DDM (Distributed Database Middleware)As a pre-distributed database access service of RDS, it completely solves the problem of database scalability, transparently realizes high concurrent access to massive data, and realizes the separation of read-write and database tabulation.
Implementation of data fragmentation
The implementation scheme of data fragmentation can be divided into application layer fragmentation and middleware fragmentation. The characteristics of these two implementation schemes are shown in Figure 2.
Figure 2: application layer partitioning and middleware partitioning
DDMAs an excellent distributed database middleware product, it realizes the function of read-write separation and data fragmentation. It uses DDM to divide the database into tables and uses 0 change to make the application transparent.
Segmentation method of sub library table
Data segmentation (Sharding) can be divided into two types according to the type of segmentation rules. One is to slice different tables (or Schemas) onto different databases (hosts), which can be called vertical (vertical) slicing of data; the other isAccording to the logical relationship of the data in the table, the data in the same table is divided into several databases (hosts) according to some conditions. This segmentation is called horizontal (horizontal) segmentation of data.
Vertical segmentation is characterized by simple rules, implementation is more convenient, especially suitable for the coupling between the business is very low, the interaction is very small, business logic is very clear system. In this system, it is easy to split the tables used by different business modules into different databases. According to the differenceThe split of the table has less impact on the application, and the resolution rules are simpler and clearer.
Horizontal segmentation is relatively more complicated than vertical segmentation. Because splitting different data from the same table into different databases, splitting rules themselves are more complex than splitting by table names, and later data maintenance is more complex.
Specifically, if a single library is too large, then we have to see whether it is because of more tables that lead to more data, or because of more data in a single table. If there are more tables and more data, vertical segmentation is used to cut different libraries according to their business. If the data volume of the sheet is too large, the horizontal segmentation is used.That is to say, table data is cut into several tables or even multiple tables on several libraries according to certain rules. The order of the sub library should be divided vertically first and after the water. Because vertical division is more simple and more consistent with our way of dealing with real world problems.
Horizontal splitting does not categorize tables as opposed to vertical splitting, but scatters them into multiple libraries according to certain rules for a field, each table containing some data. Simply put, we can think of horizontal slicing of data as slicing rows into a number of rows in a tableSome other rows are also divided into other databases, as shown in Figure 3:
Fig. 3: horizontal segmentation
Advantages of horizontal segmentation
1、Splitting rules are abstracted, and join operations can basically be done by databases.
2、There is no single library big data and high concurrent performance bottleneck.
3、There are few applications.
4、 The stability and load capacity of the system are improved.
Disadvantages of horizontal segmentation
1、The rule of splitting is difficult to abstract.
2、Piecewise transaction consistency is hard to solve.
3、The difficulty of data expansion and maintenance is enormous.
4、The performance of cross library join is poor.
Typical segmentation rules for horizontal segmentation
For example, take the user ID, then hash take the module and assign it to different databases.
For example, from 0 to 10000, a table, 10001 to 20000, a table.
Time slicing, for example, cuts data from six months ago, or even a year ago, into another table, because the probability that the data will be queried decreases over time, there is no need to put it together with hot data, which is also “hot and cold data separation”.
The principle of segmentation is to find suitable segmentation rules according to the business and disperse them into different libraries, as shown in Figure 4, taking the die as segmentation rules according to the user ID.
Fig. 4: segmentation based on userid sampling.
A database consists of many tables. Each table corresponds to a different business. Vertical slicing refers to sorting the tables according to the business and distributing them over different databases, thus sharing the data or pressure on different libraries, as shown in Figure 5:
Fig. 5: vertical segmentation
Advantages of vertical segmentation
1、Data maintenance is simple.
2、After splitting, the business is clear and the rules are clear.
3、Systems are easy to integrate or expand.
Disadvantages of vertical segmentation
1、Transaction processing is complex.
2、Some business tables can not be join and can only be solved by interface, which improves the complexity of the system.
3、Due to the different constraints of each business, there is a single library performance bottleneck, which is not easy to expand data and improve performance.
Because vertical splitting is to divide the tables into different libraries according to the classification of business, some business tables will be too large, there is a single library read-write and storage bottleneck, so we need to split horizontally to solve.
The principle of segmentation
Because of the difficulty of data Join after data segmentation, we also share the experience of data segmentation.
The first principle: try not to divide, try not to divide.
The second principle: if we want to cut the score, we must choose the right segmentation rules and plan ahead.
Third principle: Data segmentation tries to reduce the possibility of cross-library Joins by data redundancy or table grouping.
Fourth principle: Because the database middleware is difficult to grasp the advantages and disadvantages of data Join implementation, and the implementation of high performance is extremely difficult, business read as little as possible using multi-table Join.
Problems and Countermeasures after dividing the library into tables
The sub library table is mainly used to deal with two common Internet scenarios: massive data and high concurrency. However, the sub-database table is a double-edged sword. Although it is very good to deal with the impact and pressure of massive data and high concurrency on the database, it also increases the complexity of the system and maintenance costs, and brings some problems.
After partitioning the database into tables, it becomes a distributed transaction. How to ensure the consistency of data has become a problem that must be faced. In general, the storage data as far as possible to achieve user consistency, to ensure that the system after a short period of time to self-recovery and correction, data eventually reached the same.
2、Paging and sorting problems
In general, the paging needs to be sorted according to the specified field. Paging and sorting are also very easy in the case of single library and single table. However, with the evolution of sub library and partition table, cross library sorting and cross table scheduling problems will also be encountered. For the accuracy of the final result, data in different sub tables is required.Sort and return, summarize and reorder the result sets returned by different tabs, and then return to the user.
3、Table correlation problem
In case of single library and single table, joint query is very easy. However, with the evolution of sub library and partition table, joint queries will encounter cross library issues. A rough solution: ER fragmentation: Records of a child table are stored on the same data fragment as the associated parent table record. Global table: basic data, allA copy of the library. Field redundancy: in this way, some fields do not need to be queried by join. ShareJoin: It’s a simple cross-slice join that currently supports two tables by parsing SQL statements, splitting them into single-table SQL statements, and executing them on each nodeData collection.
4、Distributed global unique ID
In the case of a single library and single table, it is really simple to generate the primary key ID directly using the database self-increment feature. In the environment of database tabulation, the data is distributed on different tables, so we can no longer rely on the self-growth characteristics of the database, so we need to use the global unique ID.
Sub library sub table case
A tax core collection and management system, the country’s 34 provinces / land tax, electronic tax bureau 15 provinces pattern.
Technology path: Core Collection + tax service business application of distributed cloud transformation.
1、Data query time 3-5 seconds, slow response speed seriously affect the experience.
At present, the business logic is largely placed in the database layer, and the transaction boundary of a tax business is too large (40 SQL statements), which involves many table-related transaction operations mainly with “declaration” and “invoice” large tables, resulting in slow response to business queries.
2、Rapid growth of billion level data, challenging business performance bottlenecks
At the provincial tax bureau, the maximum amount of users carrying million tax is 3000-5000TPS. Current network analysis data: the core collection and management database nearly 1000 tables, of which “declaration” and “invoice” business table data volume, rapid growth, is the main bottleneck table; invoice comprehensive information: 1 billion per province levelThe number of records is increased to tens of millions of records per year.
1、Vertical database and micro service decompose database pressure, reduce single business SQL number
Based on microservice, large transactions are disassembled into asynchronous small transactions, and business logic is stripped from the database level. Separate the main database data, split large tables vertically into multiple databases, a business 40 SQL reduced to 20 SQL, to achieve the purpose of decomposing the database pressure.
2、Data fragmentation supports massive data growth and linearly improves business processing speed.
The single table billion level record is divided into 128 pieces of RDS-MySQL by taxpayers as a splitting key. Support the storage of massive data. After splitting, the database design is concise and simple. There is no foreign key, no trigger, no stored procedure between the tables of the database, and the horizontal expansion of the database records is realized.Exhibition.
3、Read / write separation improves query performance
DDMAutomatic separation of read and write, transparently complete the distribution of write and read operations, the application does not need to make special changes and processing logic. Write operations are distributed to the main instance of RDS, and read operations are automatically distributed to multiple read instances of RDS, so that write operations do not affect the concurrency of read operations, and read concurrent business growth onlyYou need to add read-only instances on demand.
1、With DDM, it is easy to break through the original performance bottleneck, a business operation, the original need for 3 to 5 seconds, now only need 1 second.
2、Read-write operation is separated by DDM automatic read-write operation, which can easily improve the overall read-write concurrency ability without changing the business.