Article From:https://www.cnblogs.com/zhhy236400/p/9969568.html

Association analysis tries to find relationships in large-scale data sets. These relationships have two forms: frequent itemsets and association rules.

Frequent Itemsets: A collection of items that often appear in one piece

Association Rules: Possible Relationships between Two Items

Support: Proportion of records on a data set that contain that set

Reliability/Confidence: The association rules for item A – & gt; item B are defined.Item A – & gt; Item BConfidence = Support (A | B) / Support (A)

AprioriAlgorithmic Principle: If an item set is frequent, then all its subsets are frequent. == If an item set is infrequent, all its supersets are infrequent.

Find frequent itemsets:

Auxiliary function:

def loadDataSet():
    return [[1,3,4],[2,3,5],[1,2,3,5],[2,5]]
def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if not [item] in C1:    #Instead of simply adding each item, add a list containing only that item.C1. append ([item])C1.sort ()Return list (ma)P (frozenset, C1)# frozenset creates immutable setsDef scanD (D, Ck, minSupport): # Data Set, List of Candidate Item Sets, Minimum Support, Filtering Items that do not conform to Minimum SupportcollectionSsCnt = {}For TID in D:For can in Ck:If can. issubset (tid):If can not in ssCnt: ssCnt [can]= 1Other: ssCnt [can]+= 1NumItems = float(len (D))RetList = []Support Data = {}For key in ssCnt:Support = ssCnt [key]/nuMItemsIf support & gt;= minSupport:RetList. insert (0, key)Support Data [ke]Y] = supportReretList, support data

 CompleteApriorialgorithm

def aprioriGen(Lk, k):   #The parameter is the current list of frequent itemsets, and the number of elements needed to generate new frequent itemsets. Because the number of new item elements generated at each time is 1 more than the current one, so [: k-2] means that the current item, except for the last element, is [:, -1]
    retList = []
    print(Lk)
    print('k:',k)
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1, lenLk):
            print('i:',Lk[i])
            print('j:',Lk[j])
            L1 = list(Lk[i])[:k-2]
            L2 = list(Lk[j])[:k-2]
            L1.sort()
            L2.sort()
            if L1==L2:
                retList.append(Lk[i] | Lk[j])  #Set operator, union setPrint (retList)ReretListDef Apriori (dataSet, minSupport = 0.5):C1 = createC1 (dataSet)D = list (map (set, dataSet)L1, support data = scanD (D, C1, minSup)Port)L = [L1]K = 2While (len (L [k-2]) & gt; 0):Ck = aprioriGen (L [k-2], k)Lk, supK = scanD (D, Ck, minSupport)Support Data. update (supK)L.append (Lk)K + = 1Return L, support Data

 Test:

if __name__=='__main__':
    dataSet=loadDataSet()
    L,suppData=apriori(dataSet,0.5)
    print(L)
    print(suppData)

 Output:

[[frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})], [frozenset({3, 5}), frozenset({1, 3}), frozenset({2, 5}), frozenset({2, 3})], [frozenset({2, 3, 5})], []]
{frozenset({5}): 0.75, frozenset({3}): 0.75, frozenset({2, 3, 5}): 0.5, frozenset({1, 2}): 0.25, frozenset({1, 5}): 0.25, frozenset({3, 5}): 0.5, frozenset({4}): 0.25, frozenset({2, 3}): 0.5, frozenset({2, 5}): 0.75, frozenset({1}): 0.5, frozenset({1, 3}): 0.5, frozenset({2}): 0.75}

 

Mining from Frequent ItemsAssociation rules

If a rule does not satisfy the minimum reliability requirement, all subsets of the rule do not satisfy the minimum reliability requirement.

Hierarchy: Start with a frequent itemset, and then create a list of rules with only one element at the right of the rule. Then test these elements. Next, merge all the remaining rules to create a new rule list with two elements on the right of the rule.

Association rule generation function:

def generateRules(L, supportData, minConf=0.7):  #Frequent Itemset List, Support List, Minimum ReliabilityBigRuleList = []# Rule List Containing TrustworthinessFor I in range (1, len (L): # because it is not possible for a single element onlyConstructing rules of frequent itemsets. Intelligence begins with L [1]. L [1] is a list of frequent itemsets with two elements.For freqSet in L [i]: # traverse all frequent itemsets of the current L [i]H1 = [frozenset ([item]) for item in freqSet] # Gets a list of elements from frequent item setsIf (i & gt; 1): # I & gt; 1:00Frequent itemsets have at least three elementsRules FromConseq (freqSet, H1, support data, bigRuleList, minConf)  For H1 Further merges into a list of frequent itemsets with two elementsOther: # Frequent Item Set has two elements, then calculates credibilityCalcConf (freqSet, H1, sup)PortData, bigRule List, minConf)Return bigRule ListDef calcConf (freqSet, H, supportData, brl, mi)NConf = 0.7): \ Calculate the reliability of the rule and find the rule that meets the minimum reliability requirement; there is no reset of BRL inside the function, so BRL can save the rule that meets the condition.PrunedH = []# Save to Minimum TrustworthinessRule List of DegreesFor conseq in H: #Conf = supportData [freqSet]/supportData [freqSet-conseq]# Computational Credibility uuuuuuuuuuuPrint (freqSet - conseq,'==== gt;', conseq,'credibility:', conf)If conf & gt;= minCOnf:# print (freqSet-conseq,'=====>', conseq,'credibility:', conf)Brl. append ((freqS)Et-conseq, conseq, conf)65507PrunedH. append (conseq)Print ('prunedH:', prunedH)Return prunedHDef rules FromConseq (freqSet, H, support data, brl, minCo)NF = 0.7): # Frequent Item Set, which can appear in the element list H on the right side of the ruleM = len (H [0]) # M = 1 on the first callIf len (freqSet) & gt; m + 1:  such asIf the number of elements in a frequent itemset is greater than H and the length of the elements is increased by 1, for example, when freqSet=[2,3,5], H=[2,3,5], 3> 1+1, the elements of H are fused into [[2,3], [2,5], [3,5]]Hmp1 = aprioriGen (H, m + 1) #Hmpl=[[2,3], [2,5], [3,5]]Print ('freqSet:', freqSet)Print ()Hmp1:', Hmp1)Hmp1 = calcConf (freqSet, Hmp1, support data, brl, minConf)If len (Hmp1)> 1:  If more than one rule satisfies the requirement, use Hmp1 iteration to call rulesFromConseq to determine whether these rules can be further combinedRules FromConseq (freqS)Et, Hmp1, support data, brl, minConf)

 Test:

if __name__=='__main__':
    dataSet=loadDataSet()
    L,suppData=apriori(dataSet,0.5)
    rules=generateRules(L,suppData,0.7)
    print(rules)

 Output:

frozenset({5}) ====> frozenset({3}) Reliability: 0.666666666666666666
frozenset({3}) ====> frozenset({5}) Reliability: 0.666666666666666666
prunedH: []
frozenset({3}) ====> frozenset({1}) Reliability: 0.666666666666666666
frozenset({1}) ====> frozenset({3}) Reliability: 1.0
prunedH: [frozenset({3})]
frozenset({5}) ====> frozenset({2}) Reliability: 1.0
frozenset({2}) ====> frozenset({5}) Reliability: 1.0
prunedH: [frozenset({2}), frozenset({5})]
frozenset({3}) ====> frozenset({2}) Reliability: 0.666666666666666666
frozenset({2}) ====> frozenset({3}) Reliability: 0.666666666666666666
prunedH: []
freqSet: frozenset({2, 3, 5})
Hmp1: [frozenset({2, 3}), frozenset({2, 5}), frozenset({3, 5})]
frozenset({5}) ====> frozenset({2, 3}) Reliability: 0.666666666666666666
frozenset({3}) ====> frozenset({2, 5}) Reliability: 0.666666666666666666
frozenset({2}) ====> frozenset({3, 5}) Reliability: 0.666666666666666666
prunedH: []
[(frozenset({1}), frozenset({3}), 1.0), (frozenset({5}), frozenset({2}), 1.0), (frozenset({2}), frozenset({5}), 1.0)]

View Code

 


 Reference resources

The above code has a logical error:

In the function rulesFromConseq, if H has three elements, such as H=[2,3,5], freqSet=[2,3,5], the program will fuse H into [[2,3], [2,5], [3,5], and will not check [2,3]==[5], [2,5]==& gt, [3], [3,5]=== gt, [2] has less credibility, so the result is less of these rules.

Amendment:

def generateRules(L, supportData, minConf=0.7):  #Frequent Itemset List, Support List, Minimum ReliabilityBigRuleList = []# Rule List Containing TrustworthinessFor I in range (1, len (L): # because it is not possible for a single element onlyConstructing rules of frequent itemsets. Intelligence begins with L [1]. L [1] is a list of frequent itemsets with two elements.For freqSet in L [i]: # traverse all frequent itemsets of the current L [i]H1 = [frozenset ([item]) for item in freqSet] # Gets a list of elements from frequent item setsIf (i & gt; 1): # I & gt; 1:00Frequent itemsets have at least three elementsH1 = calcConf (freqSet, H1, support data, bigRuleList, minConf) # calculates credibility,Candidate rules with low filtering credibilityRulesFromConseq (freqSet, H1, support data, bigRuleList, minConf) # For H1Further merge into a list of frequent itemsets with two elementsOther: # Frequent Item Set has two elements, then calculates credibilityCalcConf (freqSet, H1, supp)OrtData, bigRule List, minConf)Return bigRule List

Test output:

[(frozenset({1}), frozenset({3}), 1.0), (frozenset({5}), frozenset({2}), 1.0), (frozenset({2}), frozenset({5}), 1.0), (frozenset({3, 5}), frozenset({2}), 1.0), (frozenset({2, 3}), frozenset({5}), 1.0)]

 if…else…For redundancy, amend it as follows:

def generateRules(L, supportData, minConf=0.7):  #Frequent Itemset List, Support List, Minimum ReliabilityBigRuleList = []# Rule List Containing TrustworthinessFor I in range (1, len (L): # because it is not possible for a single element onlyConstructing rules for frequent itemsets can only start with L [1], which is a list of frequent itemsets containing two elements.For freqSet in L [i]: # traverse all frequent itemsets of the current L [i]H1 = [frozenset ([item]) for item in freqSet] # Gets a list of elements from frequent item setsRules FromConseq (freqSet, H1)SupportData, bigRuleList, minConf)  Further merge H1 into a list of frequent itemsets with two elementsReturn bigRule ListDef rulesFroMConseq (freqSet, H, supportData, brl, minConf = 0.7): # Frequent itemsets, which can appear in the list of elements H on the right side of the ruleM = len (H [0]) #M = 1 on a callIf len (freqSet) & gt; m: \ If the number of elements in a frequent itemset is greater than the length of H, such as freqSet= [2,3,5], H= [2,3,5], 3> 1, then the element of H is put.Prime fusion into [[2,3], [2,5], [3,5]]# print ('freqSet:', freqSet)# print ('Hmp1:', Hmp1)HmP1 = calcConf (freqSet, H, supportData, brl, minConf)If len (Hmp1) & gt; 1: \ If more than one rule meets the requirement, makeUse Hmp1 iteration to call rulesFromConseq to determine whether these rules can be further combinedHmp1 = aprioriGen (Hmp1, m + 1)RulesFromConseq (freqSet, Hmp1, supportData, brl, minConf) # Recursion

 Further modifying the recursive function in rulesFromConseq is as follows:

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    while (len(freqSet) > m): # Judging length & gt; m, then the reliability of H can be obtained.H = calcConf (freqSet, H, supportData, brl, minConf)If (len (H)(gt; 1): \ Determine whether there are items whose reliability is greater than the threshold after the credibility has been calculated to generate the next level of H uuuuuuuuuuuH = aprioriGen (H, m + 1)M + = 1Other: # Candidate association rules cannot continue to be generated at the next level and exit the loop ahead of timeBreak

 

Leave a Reply

Your email address will not be published. Required fields are marked *