
Learning big data technology has gradually become a must for many programmers, driven both by industry trends and by career needs. Sharing and exchanging ideas in technical communities has become a way for many people to learn. Today I would like to share some basic big data knowledge so that we can learn together.


  1. Cluster machine monitoring

  This is typically used where the cluster has strict requirements on machine state and online rate, and must respond quickly to machine changes. Such scenarios usually include a monitoring system that checks in real time whether each machine in the cluster is alive. Traditionally, either the monitoring system probes each machine periodically by some means (for example, ping), or each machine periodically reports “I’m still alive” to the monitoring system. This approach works, but it has two obvious problems:

  When the set of machines in the cluster changes, many things have to be modified.

  There is a certain delay.

  Two ZooKeeper features make it possible to build a different cluster liveness monitoring system that works in real time:

  The client registers a Watcher on node x, and the client is notified whenever the child nodes of x change.

  A node created with the EPHEMERAL type disappears as soon as the session between the client and the server ends or expires.

  For example, the monitoring system registers a Watcher on the /clusterServers node. Whenever a machine is added later, it creates an EPHEMERAL-type node under /clusterServers: /clusterServers/{hostname}. In this way, the monitoring system learns in real time when machines are added or removed. How to handle such changes afterwards is up to the monitoring system.
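
  The interplay of the two features can be illustrated without a running ZooKeeper ensemble. The following is a minimal in-memory sketch in Python. ToyRegistry and its methods are hypothetical names invented for this illustration and are not the ZooKeeper client API. It also simplifies one thing: classic ZooKeeper watches fire once and must be re-registered, whereas the callback here stays registered.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory stand-in for the two ZooKeeper features described above.

    Hypothetical class for illustration only; not the ZooKeeper client API.
    """

    def __init__(self):
        self._children = defaultdict(dict)  # parent path -> {child name: owning session}
        self._watchers = defaultdict(list)  # parent path -> callbacks

    def watch_children(self, parent, callback):
        # Feature 1: be notified whenever the children of `parent` change.
        self._watchers[parent].append(callback)

    def create_ephemeral(self, parent, name, session):
        # Feature 2: the node lives only as long as the creating session.
        self._children[parent][name] = session
        self._notify(parent)

    def close_session(self, session):
        # Ending (or expiring) a session removes every node it created.
        for parent, nodes in self._children.items():
            owned = [name for name, owner in nodes.items() if owner == session]
            for name in owned:
                del nodes[name]
            if owned:
                self._notify(parent)

    def _notify(self, parent):
        snapshot = sorted(self._children[parent])
        for callback in self._watchers[parent]:
            callback(snapshot)


# The monitoring system records every membership snapshot it is told about.
seen = []
registry = ToyRegistry()
registry.watch_children("/clusterServers", seen.append)

registry.create_ephemeral("/clusterServers", "host-a", session="s1")
registry.create_ephemeral("/clusterServers", "host-b", session="s2")
registry.close_session("s1")  # host-a crashes or disconnects

print(seen)
```

  In this run the monitor first sees host-a, then host-a and host-b, and finally only host-b, without ever polling the machines itself.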

  2. Master election

  In a distributed environment, the same business application is deployed on different machines. Some business logic (such as time-consuming computation or network I/O processing) often only needs to be executed by one machine in the whole cluster. The rest of the machines can share the result, which greatly reduces duplicated work and improves performance. Master election is therefore the main problem in this scenario.

  Thanks to ZooKeeper’s strong consistency, node creation is guaranteed to be globally unique even under highly concurrent distributed access. That is, when multiple clients request to create the /currentMaster node at the same time, only one of those requests ultimately succeeds. This property makes it easy to elect a master for the cluster in a distributed environment.
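
  The "only one create succeeds" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ZooKeeper itself: ToyNamespace and NodeExistsError are hypothetical names, and a local lock stands in for the consistency guarantee that ZooKeeper provides across machines.

```python
import threading


class NodeExistsError(Exception):
    """Raised when the node was already created by another client."""


class ToyNamespace:
    """Hypothetical in-memory namespace; the lock stands in for the
    consistency guarantee the text attributes to ZooKeeper."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path, data):
        with self._lock:
            if path in self._nodes:
                raise NodeExistsError(path)
            self._nodes[path] = data

    def get(self, path):
        with self._lock:
            return self._nodes[path]


def run_for_master(namespace, client_id, winners):
    # Every client attempts the same create; only one attempt can succeed.
    try:
        namespace.create("/currentMaster", client_id)
        winners.append(client_id)
    except NodeExistsError:
        pass  # Someone else is master; this client becomes a follower.


namespace = ToyNamespace()
winners = []
threads = [
    threading.Thread(target=run_for_master, args=(namespace, f"client-{i}", winners))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("master:", namespace.get("/currentMaster"))
```

  Which client wins depends on thread scheduling, but exactly one of the eight ends up in winners, and it is the one recorded under /currentMaster.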

  This scenario can also evolve into dynamic Master election, which relies on the characteristics of EPHEMERAL_SEQUENTIAL nodes.

  As mentioned above, only one of the clients' create requests succeeds in the end. With a slight change, all requests can succeed, but with a defined creation order, so one possible result on ZK is: /currentMaster/{sessionId}-1, /currentMaster/{sessionId}-2, /currentMaster/{sessionId}-3, … Each time, the machine with the smallest sequence number is chosen as Master. If that machine goes down, the node it created disappears immediately, and the machine with the next smallest number becomes Master.
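
  The selection rule itself is small enough to write down. The sketch below assumes child names of the form {sessionId}-{n} as in the text; the function names and the session ids are made up for illustration. Note that the sequence numbers are compared as integers, since a plain string comparison would rank "10" before "2".

```python
def sequence_number(node_name):
    # Node names follow the "{sessionId}-{n}" pattern used in the text;
    # the number after the last "-" is the creation order.
    return int(node_name.rsplit("-", 1)[1])


def current_master(children):
    """Return the child with the smallest sequence number, or None if empty."""
    if not children:
        return None
    return min(children, key=sequence_number)


# Hypothetical session ids, listed in arbitrary order.
children = ["sessC-3", "sessA-1", "sessB-2"]
print(current_master(children))  # sessA-1

# The master's session ends, so its ephemeral node disappears.
children.remove("sessA-1")
print(current_master(children))  # sessB-2
```

  In a real deployment the children list would come from ZooKeeper, and each machine would re-run the selection whenever it is notified that the children have changed.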

  In a search system, if every machine in the cluster generated the full index, it would not only be time-consuming but also could not guarantee that the index data is consistent across machines. So the Master in the cluster generates the full index and then synchronizes it to the other machines. In addition, as a disaster-recovery measure for Master election, the master can be specified manually at any time: when the application cannot get master information from ZK, it can obtain the master from another place, for example over HTTP.
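
  One way to structure that fallback is an ordered list of lookups, with ZK first and the manual source second. The sketch below is only an illustration of the idea: resolve_master, both lookup functions, and the address search-01:9000 are hypothetical, and the two lookups are placeholders rather than real ZooKeeper or HTTP calls.

```python
def resolve_master(lookups):
    """Try each lookup in order and return the first master address found.

    Each lookup is a zero-argument callable that returns an address,
    returns None, or raises if its source is unavailable.
    """
    for lookup in lookups:
        try:
            address = lookup()
        except Exception:
            continue  # Source unavailable; fall through to the next one.
        if address:
            return address
    return None


def from_zookeeper():
    # Placeholder: simulates ZK being unreachable.
    raise ConnectionError("ZooKeeper unavailable")


def from_manual_override():
    # Placeholder for the manually specified master, e.g. fetched over HTTP.
    return "search-01:9000"  # hypothetical address


print(resolve_master([from_zookeeper, from_manual_override]))
```

  Keeping the sources in an explicit order makes it clear that the manual setting is consulted only when ZK yields nothing.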

  In HBase, ZooKeeper is also used for dynamic HMaster election. In the HBase implementation, the address of the ROOT table and the address of the HMaster are stored on ZK, and each HRegionServer registers itself in ZooKeeper as an ephemeral node, so that the HMaster can sense the liveness of every HRegionServer at any time. If the HMaster fails, a new HMaster is elected to take over, which avoids the HMaster being a single point of failure.
