One, a brief introduction to AdaBoost

Boosting, Also known as the enhanced learning or lifting method, it is an important integrated learning technology, which can strengthen the weak learner with high prediction accuracy only slightly higher than random guess. It provides a new idea and new idea for the design of learning algorithms under the difficult situation of constructing a strong learner directly.Method。 One of the most successful applications is the AdaBoost algorithm proposed by Yoav Freund and Robert Schapire in 1995.

AdaBoostIt is the abbreviation of English “Adaptive Boosting” (adaptive enhancement). Its self-adaptive is that the weight value of the sample which is classified by the previous basic classifier will increase, and the right value of the correct classification will be reduced, and it is used again to train the next basic classifier. At the same time, at eachIn a round of iteration, a new weak classifier is added to determine the final strong classifier until a certain small error rate is reached or the maximum number of pre specified iterations is reached.

AdaboostThe algorithm can be briefly described as three steps:

（1）First, it initializes the weight distribution D1 of training data. Assuming that there are N training samples, each training sample is assigned the same weight at the beginning: w1=1/N.

（2）Then, the weak classifier hi is trained. In the course of specific training, if a training sample point is classified by the weak classifier Hi, then a training set is constructed and its corresponding weight should be reduced; on the contrary, if a training sample point is misclassified, its weight should be increased. More weightThe new sample set is used to train the next classifier, and the whole training process is carried out iteratively.

（3）Finally, the weak classifier trained by each algorithm is combined into a strong classifier. After the training process of each weak classifier, the weight of the weak classifier with small classification error rate is increased, which makes it play a greater role in the final classification function, and reduces the weight of the weak classifier with large classification error rate, so that it is the most important.The final classification function plays a smaller decisive role.

In other words, the weak classifier with lower error rate will be more weighted in the final classifier, otherwise it will be smaller.

Two, AdaBoost algorithm process

Given training data sets:，amongA class label used to represent a training sample,*i=1,…,N*。AdaboostThe aim is to learn a series of weak classifiers or basic classifiers from training data, and then combine these weak classifiers into a strong classifier.

The definition of related symbols:

AdaboostThe algorithm flow is as follows:

Relevant instructions:

Based on the above deduction, we can get the formula of updating weights when the sample is divided into two parts.

Three, AdaBoost example explanation

Example: given the training samples shown in the diagram, the weak classifier uses a straight line parallel to the coordinate axis and realizes the strong classification process by using the Adaboost algorithm.

Data analysis:

The 10 samples are used as training data.*X* and*Y* The corresponding relation can divide these 10 data into two kinds, the graph uses “+” to indicate class 1, and “O” to denote class -1. In this case, a horizontal or vertical line is used as classifier. Three weak classifiers have been given.

Initialization：

First, we need to initialize the weight distribution of training samples, and each training sample is assigned the same weight at the very beginning.*w _{i}*=1/

*N*，The initial weight distribution of the training sample set

*D*

_{1}(

*i*)：

Order each weight*w*_{1}_{i}_{ }= 1/*N* = 0.1，Among them,*N* = 10，*i* = 1,2, …, 10，And then,*t*= 1,2,3, …The equivalence is iterated.*t*Representing the number of iterations, representing the number of*t*The distribution of weights of training samples has been given in the following table:

First iteration*t*=1：

The weight distribution of the first test*D _{1}*For 1/N (10 data, the weight of each data is initialized to 0.1).

D_{1}=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1,0.1, 0.1, 0.1, 0.1]

Weight distribution*D _{1}*In the case of the three known weak classifiers

*h*

_{1}、

*h*

_{2}and

*h*

_{3}The least error rate classifier is used as the first basic classifier.

*H*

_{1}(

*x*)（The error rate of the three weak classifiers is 0.3, so take first.

In the classifier*H*_{1}(*x*)=*h*_{1}Under the circumstances, the sample point “578” is mistaken, so the basic classifier.*H*_{1}(*x*)The error rate is as follows:

It can be seen that the sum influence error rate of the wrong classification sample weight.*e*，Error rate*e*Affects the weight of the basic classifier in the final classifier.*α*。

Then, the weight distribution of the training sample data is updated to be used for the next iteration, and the weights of the correctly classified training samples “12346910” (a total of 7) are updated to:

Thus, after first rounds of iteration, the new weight distribution of each sample data is obtained.

D_{2}=[1/14,1/14,1/14,1/14,1/6,1/14,1/6,1/6,1/14,1/14]

Due to the sample data “578”*H*_{1}(x)They are misdivided, so their weights are increased from 0.1 to 1/6, and conversely, the other data are correctly divided, so their weights are reduced from 0.1 to the previous one to 1/14, and the lower table gives the change of the weight distribution.

Available classification function:*f*_{1}(*x*)= *α*_{1}*H*_{1}(*x*) = 0.4236*H*_{1}(*x*)。At this point, a basic classifier is combined*sign*(*f _{1}*(

*x*))As a strong classifier, there are 3 erroneous classification points (i.e. 578) on the training dataset, and the training error of the strong classifier is: 0.3

Obviously，*H*_{2}(*x*)Divide the sample “346” to the wrong basis.*D*_{2}It is known that their weights are*D*_{2}(3)=1/14，*D*_{2}(4)=1/14， *D*_{2}(6)=1/14，therefore*H*_{2}(*x*)The error rate on the training data set:

Thus, after second rounds of iteration, the new weight distribution of each sample data is obtained.

D_{3}=[1/22,1/22,1/6,1/6,7/66,1/6,7/66,7/66,1/22,1/22]

The following table gives the transformation of the weight distribution:

Available classification function:*f*_{2}(*x*)=0.4236*H*_{1}(x) + 0.6496*H*_{2}(x)。At this point, two basic classifiers are combined*sign*(*f*_{2}(x))As a strong classifier, there are 3 erroneous classification points (34 * 6) on the training dataset. The training error of the strong classifier is: 0.3

So, take the current minimum classifier*h _{3}*As third basic classifiers

*H*

_{3}(

*x*)：

In this way, after third rounds of iteration, the new weight distribution of each sample data is:

D_{4}=[1/6,1/6,11/114,11/114,7/114,11/114,7/114,7/114,1/6,1/38]

The following table gives the transformation of the weight distribution:

Available classification function:*f*_{3}(*x*)=0.4236*H*_{1}(*x*) + 0.6496*H*_{2}(*x*)+0.9229*H*_{3}(*x*)。At this point, three basic classifiers are combined*sign*(*f*_{3}(x))As a strong classifier, there are 0 misclassification points on the training dataset. At this point, the whole training process is over.

By integrating all classifiers, the final strong classifier can be obtained as follows:

This strong classifier*H _{final}*The error rate for training samples is 0!

The Matlab code in this example is as follows:

First, the Matlab function file is set up, and three weak classifiers, H1, H2 and H3, are defined.

function kind = wcH1( X,TH ) %h1Weak classifierX1=X(1); X2=X(2); if X1<TH kind=1; else kind=-1; end end

function kind = wcH2( X,TH ) %h2Weak classifierX1=X(1); X2=X(2); if X1<TH kind=1; else kind=-1; end end

function kind = wcH3( X,TH ) %h3Weak classifierX1=X(1); X2=X(2); if X2<TH kind=-1; else kind=1; end end

Main program Matlab Code:

clc,clear all; %% Training sample dataXData=[1 5;2 2;3 1;4 6;6 8;6 5;7 9;8 7;9 8;10 2] %Sample data points, corresponding numbered 1,2,... 10 Y=[1 1 -1 -1 1 -1 1 1 -1 -1]'; %The corresponding sample classes are expressed in 1 and -1XNum=1:10;% No.Format rat%% plotting of sample distributionL1=find (Y==1);X=xData (L1,1); y=xData (L1,2);Plot (x, y,'b+','LineWidth',3,'MarkerSize',12); hold on; L2=find(Y==-1); x=xData(L2,1);y=xData(L2,2); plot(x,y,'ro','LineWidth',3,'MarkerSize',12); xlabel('X1');ylabel('X2');axis([0 10 0 10]) %% ***********************************The first trial process * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *H1=zeros (10,1); H2=H1; H3=H1For i=1:10X=xData (I,:);H1 (I) = wcH1 (X, 2.5);% weak classifier H1H2 (I) = wcH2 (X, 8.5);% weak classifier H2H3 (I) = wcH3 (X, 6.5));% weak classifier H3EndErrDataH1=find (H1~=Y);%; find the ordinal number of the sample points that are misclassified by H1.ErrDataH2=find (H2~=Y);%; find the ordinal number of the sample points that are misclassified by H2.ERrDataH3=find (H3~=Y);%; find the ordinal number of the sample points that are misclassified by H3.AccDataH1=find (H1==Y);%; find the ordinal number of the sample points correctly assigned by H1.AccDataH2=find (H2==Y);% find the sequence number of the sample point that is correctly divided by the H2AccDataH3=find (H3==Y);%; find the ordinal number of the sample points correctly assigned by H3.ErrDataAll=[errDataH1, errDataH2, errDatAH3];AccDataAll=[accDataH1, accDataH2, accDataH3];N=10;D1=zeros (10,1) +1/N% initialization weight distributionThe first iteration of *% * * * * * * *Err1=sum (D1 (errDa)TaH1):%; the sum of the weights of all the wrong samples is the error rate.Err2=sum (D1 (errDataH2:))%; the sum of the weights of all the wrong samples is the error rate.Err3=sum (D1 (errDa)TaH3):%; the sum of the weights of all the wrong samples is the error rate.ErrAll=[err1, err2, err3];[minErr, minIndex]=min (errAll);% according to the error rate e1 calculate the coefficient of H1:A1=0.5*log ((1-minErr) /minErr)MinErrData=errDataAll (:: minIndex);MinAccData=accDataALl (:: minIndex);D2=D1;For i=minAccData' D2(i)=D2(i)/(2*(1-minErr)); end for i=minErrData' D2(i)=D2(i)/(2*minErr); end D2 %Classification functionF1=a1.*H1;KindFinal=sign (F1)% classification results of strong classifier at this timeThe second iteration of%% * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ** * * * * * * * * * * * * * * * * * * * * * * *Err1=sum (D2 (errDataH1:))%; the sum of the weights of all the wrong samples is the error rate.Err2=sum (D2 (E)RrDataH2):%; the sum of the weights of all the wrong samples is the error rate.Err3=sum (D2 (errDataH3:))%; the sum of the weights of all the wrong samples is the error rate.ErrAll=[err1,Err2, err3];[minErr, minIndex]=min (errAll);% according to the error rate E2, the coefficient of H2 is calculated.A2=0.5*log ((1-minErr) /minErr)MInErrData=errDataAll (:: minIndex);MinAccData=accDataAll (:: minIndex);D3=D2;For i=minAccData' D3(i)=D3(i)/(2*(1-minErr)); end for i=minErrData' D3(i)=D3(i)/(2*minErr); end D3 % Classification functionF2=a1.*H1+a2*H2;KindFinal=sign (F2)% classification results of strong classifier at this time% * * * * * * * * * * * * * * * * * * * * * * * * * *The third iteration * * * * * * * * * * * * * * * * * * * * * * * * * *Err1=sum (D3 (errDataH1:))%; the sum of the weights of all the wrong samples is the error rate.Err2=suM (D3 (errDataH2:))%; the sum of the weights of all the wrong samples is the error rate.Err3=sum (D3 (errDataH3:))%; the sum of the weights of all the wrong samples is the error rate.ErrAll=[err1, err2, err3];[minErr, minIndex]=min (errAll);% according to the error rate E3, the coefficient of G3 is calculated.A3=0.5*log ((1-minErr) /minErR)MinErrData=errDataAll (:: minIndex);MinAccData=accDataAll (:: minIndex);D4=D3;For i=minAccData' D4(i)=D4(i)/(2*(1-minErr)); end for i=minErrData' D4(i)=D4(i)/(2*minErr); end D4 % Classification functionF3=a1.*H1+a2*H2+a3*H3;KindFinal=sign (F3)% classification results of strong classifier at this time%%

Four, the advantages and disadvantages of AdaBoost

Advantage

（1）AdaboostA framework is provided to construct sub classifiers in various ways within the framework. Simple weak classifiers can be used without filtering the features and no over fitting phenomenon.

（2）AdaboostThe algorithm does not require a priori knowledge of weak classifiers, and the final classification accuracy of strong classifiers depends on all weak classifiers. Whether applied to artificial data or real data, Adaboost can significantly improve learning accuracy.

（3）AdaboostThe algorithm does not need to know the upper limit of the error rate of the weak classifier in advance, and the classification precision of the final strong classifier depends on the classification accuracy of all the weak classifiers, and it can dig the ability of the classifier. Adaboost can adjust the assumed error rate adaptively according to the feedback of weak classifiers, and the efficiency of execution is high..

（4）AdaboostTo train different weak classifiers for the same training sample set, the weak classifier is set up according to a certain method, and a strong classifier with strong classification ability is constructed, that is, “three stinky cobbler surpasses one Zhu Geliang”.

Shortcomings:

In the Adaboost training process, Adaboost makes the weights of the difficult classification samples increase exponentially, and the training will be too biased to this kind of difficult sample, causing the Adaboost algorithm to be easily disturbed by noise. In addition, Adaboost relies on weak classifier while weak classifier.The training time is often very long.

Reprinted from https://blog.csdn.net/guyuealian/article/details/70995333