Article from: https://www.cnblogs.com/braska/p/9969139.html

1. References

Reference: https://www.cnblogs.com/flying607/p/8330551.html

Source code analysis of the ribbon + spring retry strategy: https://blog.csdn.net/xiao_jun_0820/article/details/79320352

 

2. Background

Services these days are expected to be highly available.

To keep client requests from failing when one of the machines providing a service goes down, we need to retry on that server or route intelligently to the next available one.

With that in mind, we searched for material online and finally settled on the ribbon + spring retry strategy.

 

As the referenced articles show, the core of failure retry comes down to two things.

1. Introduce the spring-retry dependency:

        <dependency>
            <groupId>org.springframework.retry</groupId>
            <artifactId>spring-retry</artifactId>
            <version>1.2.2.RELEASE</version>
        </dependency>

2. Enable the Zuul and Ribbon retry configuration:

zuul:
  retryable: true                  # enable retry through Zuul
ribbon:
  MaxAutoRetriesNextServer: 2      # number of other service instances to try
  MaxAutoRetries: 0                # number of retries on the current server
  OkToRetryOnAllOperations: true   # set to false to retry failed GET requests only
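
For reference, the Ribbon keys above feed the same DefaultLoadBalancerRetryHandler that this article patches later. Here is a minimal sketch of the programmatic equivalent; the values simply mirror the YAML, and the class name RetryConfigSketch is made up for illustration:

import com.netflix.client.DefaultLoadBalancerRetryHandler;
import com.netflix.client.RetryHandler;
import com.netflix.client.config.CommonClientConfigKey;
import com.netflix.client.config.DefaultClientConfigImpl;
import com.netflix.client.config.IClientConfig;

class RetryConfigSketch {
    static RetryHandler buildRetryHandler() {
        IClientConfig config = new DefaultClientConfigImpl();
        config.set(CommonClientConfigKey.MaxAutoRetries, 0);              // retries on the current server
        config.set(CommonClientConfigKey.MaxAutoRetriesNextServer, 2);    // how many other servers to try
        config.set(CommonClientConfigKey.OkToRetryOnAllOperations, true); // also retry non-GET operations
        // the handler reads the three keys above from the client config
        return new DefaultLoadBalancerRetryHandler(config);
    }
}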

 

Of course, the point of this article is not limited to that.

After adding these configurations, we found that some limitations remain.

1. When the number of cluster instances providing the service is smaller than MaxAutoRetriesNextServer, only a round-robin load-balancing rule works reliably.

2. When the number of instances is larger than MaxAutoRetriesNextServer, round-robin or random rules work only some of the time.

A minimum-concurrency rule, or a sticky rule (usually used to solve session-loss problems, i.e. requests from the same client always land on the same server), cannot work properly at all.

Why do we say that? Suppose five machines provide the service. The first machine works normally, and the second has the fewest concurrent connections.

Now suppose machines two through five are down, the rule is round-robin, and MaxAutoRetriesNextServer = 2. Ribbon will then try only the third and fourth servers.

The result is self-evident. Of course, with some luck the third or fourth server happens to be available, and the service keeps working.

A random rule likewise depends on luck.

With the minimum-concurrency or sticky rule, no matter how many retries are made, the second (down) node is always chosen, and the request fails completely.
So, what’s the solution?

 

3. Dynamic setup of MaxAutoRetriesNextServer

A key cause of these problems is that MaxAutoRetriesNextServer is hard-coded, while the number of servers may grow with cluster load (shrinking does no harm).

You can't change the MaxAutoRetriesNextServer configuration every time a server is added, can you? And since we don't want to touch the configuration, we should set the value of MaxAutoRetriesNextServer dynamically.

Look at the retry source in RibbonLoadBalancedRetryPolicy.java:

    @Override
    public boolean canRetryNextServer(LoadBalancedRetryContext context) {
        //this will be called after a failure occurs and we increment the counter
        //so we check that the count is less than or equal to, to make sure
        //we try the next server the right number of times
        return nextServerCount <= lbContext.getRetryHandler().getMaxRetriesOnNextServer() && canRetry(context);
    }

You can see that the value of MaxAutoRetriesNextServer comes from the retry handler, DefaultLoadBalancerRetryHandler. But DefaultLoadBalancerRetryHandler provides no interface for setting MaxAutoRetriesNextServer.

Tracing the source code to where DefaultLoadBalancerRetryHandler is instantiated:

    @Bean
    @ConditionalOnMissingBean
    public RibbonLoadBalancerContext ribbonLoadBalancerContext(ILoadBalancer loadBalancer,
            IClientConfig config, RetryHandler retryHandler) {
        return new RibbonLoadBalancerContext(loadBalancer, config, retryHandler);
    }

    @Bean
    @ConditionalOnMissingBean
    public RetryHandler retryHandler(IClientConfig config) {
        return new DefaultLoadBalancerRetryHandler(config);
    }

We found that the DefaultLoadBalancerRetryHandler object can be obtained from the RibbonLoadBalancerContext instance, and the RibbonLoadBalancerContext can in turn be obtained from the SpringClientFactory. So we only need to build a new retryHandler and set it back on the RibbonLoadBalancerContext.

Code:

1. Register IClientConfig with Spring:

    @Bean
    public IClientConfig ribbonClientConfig() {
        DefaultClientConfigImpl config = new DefaultClientConfigImpl();
        config.loadProperties(this.name); // this.name: the Ribbon client (service) id
        config.set(CommonClientConfigKey.ConnectTimeout, DEFAULT_CONNECT_TIMEOUT);
        config.set(CommonClientConfigKey.ReadTimeout, DEFAULT_READ_TIMEOUT);
        return config;
    }

2. Build a new retryHandler and set it on the RibbonLoadBalancerContext:

    private void setMaxAutoRetiresNextServer(int size) { // size: number of cluster instances providing the service
        SpringClientFactory factory = SpringContext.getBean(SpringClientFactory.class); // Spring-managed singleton
        IClientConfig clientConfig = SpringContext.getBean(IClientConfig.class);
        int retrySameServer = clientConfig.get(CommonClientConfigKey.MaxAutoRetries, 0); // value from the config file, default 0
        boolean retryEnable = clientConfig.get(CommonClientConfigKey.OkToRetryOnAllOperations, false); // default false
        RetryHandler retryHandler = new DefaultLoadBalancerRetryHandler(retrySameServer, size, retryEnable); // new retryHandler
        factory.getLoadBalancerContext(name).setRetryHandler(retryHandler); // name: the Ribbon client (service) id
    }
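
The SpringContext used above is not a Spring class; it is a small static bean locator. A minimal sketch, assuming it is picked up by component scanning:

import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

@Component
public class SpringContext implements ApplicationContextAware {

    private static ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        // Spring calls this once on startup, giving us static access to the container
        context = applicationContext;
    }

    public static <T> T getBean(Class<T> clazz) {
        return context.getBean(clazz);
    }
}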

That solves the problem of setting MaxAutoRetriesNextServer dynamically.

 

4. Eliminating unavailable services

Eureka seems to be able to remove and restore service instances, so if you use the Eureka registry you may not need to read further. I am not sure about its exact configuration.

Because we don't use Eureka, the server list used during failure retries still contains the servers that are down.

That is what breaks the minimum-concurrency and sticky rules.

Tracing the source code, we found that a server failure leads to a call to the canRetryNextServer method, so that method is a good place to hook in.

 

Create a custom RetryPolicy that extends RibbonLoadBalancedRetryPolicy and overrides canRetryNextServer:

public class ServerRibbonLoadBalancedRetryPolicy extends RibbonLoadBalancedRetryPolicy {

    private RetryTrigger trigger;
    public ServerRibbonLoadBalancedRetryPolicy(String serviceId, RibbonLoadBalancerContext context, ServiceInstanceChooser loadBalanceChooser, IClientConfig clientConfig) {
        super(serviceId, context, loadBalanceChooser, clientConfig);
    }

    public void setTrigger(RetryTrigger trigger) {
        this.trigger = trigger;
    }

    @Override
    public boolean canRetryNextServer(LoadBalancedRetryContext context) {
        boolean retryEnable = super.canRetryNextServer(context);
        if (retryEnable && trigger != null) {
            // notify the trigger so the failed server can be recorded
            trigger.exec(context);
        }
        return retryEnable;
    }

    @FunctionalInterface
    public interface RetryTrigger {
        void exec(LoadBalancedRetryContext context);
    }
}    

 

Create a custom RetryPolicyFactory that extends RibbonLoadBalancedRetryPolicyFactory and overrides the create method:

public class ServerRibbonLoadBalancedRetryPolicyFactory extends RibbonLoadBalancedRetryPolicyFactory {
    private SpringClientFactory clientFactory;
    private ServerRibbonLoadBalancedRetryPolicy policy;
    private ServerRibbonLoadBalancedRetryPolicy.RetryTrigger trigger;

    public ServerRibbonLoadBalancedRetryPolicyFactory(SpringClientFactory clientFactory) {
        super(clientFactory);
        this.clientFactory = clientFactory;
    }

    @Override
    public LoadBalancedRetryPolicy create(String serviceId, ServiceInstanceChooser loadBalanceChooser) {
        RibbonLoadBalancerContext lbContext = this.clientFactory
                .getLoadBalancerContext(serviceId);
        policy = new ServerRibbonLoadBalancedRetryPolicy(serviceId, lbContext, loadBalanceChooser, clientFactory.getClientConfig(serviceId));
        policy.setTrigger(trigger);
        return policy;
    }

    public void setTrigger(ServerRibbonLoadBalancedRetryPolicy.RetryTrigger trigger) {
        // We don't know whether setTrigger or create runs first, so set the trigger in both places.
        if (policy != null) {
            policy.setTrigger(trigger);
        }
        this.trigger = trigger;
    }
}

 

Register the LoadBalancedRetryPolicyFactory with Spring:

    @Bean
    @ConditionalOnClass(name = "org.springframework.retry.support.RetryTemplate")
    public LoadBalancedRetryPolicyFactory loadBalancedRetryPolicyFactory(SpringClientFactory clientFactory) {
        return new ServerRibbonLoadBalancedRetryPolicyFactory(clientFactory);
    }

Then we implement the RetryTrigger interface on our load-balancing rule class:

public class ServerLoadBalancerRule extends AbstractLoadBalancerRule implements ServerRibbonLoadBalancedRetryPolicy.RetryTrigger {

    private static final Logger LOGGER = LoggerFactory.getLogger(ServerLoadBalancerRule.class);
    /**
     * Unavailable servers, keyed by request batch number.
     */
    private Map<String, List<String>> unreachableServer = new HashMap<>(256);
    /**
     * Batch number of the last request.
     */
    private String lastRequest;

    @Autowired
    LoadBalancedRetryPolicyFactory policyFactory;

    @Override
    public Server choose(Object key) {
        // register this rule as the retry trigger and clear stale cache entries
        retryTrigger();
        return getServer(getLoadBalancer(), key);
    }

    private Server getServer(ILoadBalancer loadBalancer, Object key) {
        // filter out the servers recorded in unreachableServer (a sketch follows this class)
    }

    private void retryTrigger() {
        RequestContext ctx = RequestContext.getCurrentContext();
        String batchNo = (String) ctx.get(Constant.REQUEST_BATCH_NO);
        if (!isLastRequest(batchNo)) {
            // a new request (not a retry of the previous one): clear all cached unreachable servers
            unreachableServer.clear();
        }

        if (policyFactory instanceof ServerRibbonLoadBalancedRetryPolicyFactory) {
            ((ServerRibbonLoadBalancedRetryPolicyFactory) policyFactory).setTrigger(this);
        }
    }

    private boolean isLastRequest(String batchNo) {
        return batchNo != null && batchNo.equals(lastRequest);
    }

    @Override
    public void exec(LoadBalancedRetryContext context) {
        RequestContext ctx = RequestContext.getCurrentContext();
        // batchNo is a UUID that stays the same across failure retries; each new client request gets a fresh batchNo, which can be generated in a pre-filter
        String batchNo = (String) ctx.get(Constant.REQUEST_BATCH_NO);
        lastRequest = batchNo;

        List<String> hostAndPorts = unreachableServer.get((String) ctx.get(Constant.REQUEST_BATCH_NO));
        if (hostAndPorts == null) {
            hostAndPorts = new ArrayList<>();
        }
        if (context != null && context.getServiceInstance() != null) {
            String host = context.getServiceInstance().getHost();
            int port = context.getServiceInstance().getPort();
            if (!hostAndPorts.contains(host + Constant.COLON + port))
                hostAndPorts.add(host + Constant.COLON + port);
            unreachableServer.put((String) ctx.get(Constant.REQUEST_BATCH_NO), hostAndPorts);
        }
    }
}
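
The body of getServer is elided above. Below is a minimal sketch of the filtering it describes, plugged into the same class: it drops every server already recorded in unreachableServer for the current batchNo, then picks from what is left. The final selection is a placeholder; the real rule (minimum concurrency, sticky, ...) would go there.

    private Server getServer(ILoadBalancer loadBalancer, Object key) {
        RequestContext ctx = RequestContext.getCurrentContext();
        String batchNo = (String) ctx.get(Constant.REQUEST_BATCH_NO);
        List<String> down = unreachableServer.getOrDefault(batchNo, Collections.emptyList());

        // keep only the servers not yet recorded as unreachable for this request
        List<Server> candidates = new ArrayList<>();
        for (Server server : loadBalancer.getAllServers()) {
            if (!down.contains(server.getHost() + Constant.COLON + server.getPort())) {
                candidates.add(server);
            }
        }
        if (candidates.isEmpty()) {
            return null; // nothing left to try; the retry policy will give up
        }
        // placeholder selection: substitute the real load-balancing rule here
        return candidates.get(0);
    }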

 

In this way we collect the unavailable servers, and on retry we filter out the servers recorded in unreachableServer.

One thing to note here: the value of MaxAutoRetriesNextServer must be based on the size of the unfiltered server list.
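
To make that concrete, a one-line sketch, assuming the rule can reach the setMaxAutoRetiresNextServer method from section 3:

        // size the retry handler from the FULL server list, not the filtered candidates;
        // otherwise retries could stop before reaching the last healthy server
        setMaxAutoRetiresNextServer(getLoadBalancer().getAllServers().size());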

 

Of course, some will wonder what happens if there are many servers and the retries run past ReadTimeout. I don't raise the timeout here, because keeping clients waiting a long time is not a reasonable requirement in the first place.

So just set a reasonable ReadTimeout in the configuration file. If the retries have not reached an available server within that window, the timeout is thrown straight back to the client.

Source address: https://gitee.com/syher/spring-boot-project/tree/master/spring-boot-zuul
