Article from: https://www.cnblogs.com/traces2018/p/9968389.html

One.

This section briefly describes the relationship and the differences between classification and clustering.

It also briefly describes what supervised learning and unsupervised learning are.

Classification and clustering: Classification is a supervised method. The target classes are known in advance, and the data is assigned to them (for example, the naive Bayes algorithm). Clustering is an unsupervised method. No target classes exist before the model is built, and data with similar features is automatically grouped into the same class (for example, the K-Means clustering algorithm).
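To make the clustering side concrete, here is a minimal sketch of a one-dimensional K-Means loop in plain Python. The function name `kmeans_1d`, the starting centers, and the age values are all made up for illustration and are not taken from the clinical data set used later in this article.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D K-Means: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center when a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical ages; no labels are supplied anywhere
ages = [52, 55, 58, 61, 79, 82, 85, 88]
centers, clusters = kmeans_1d(ages, centers=[50.0, 90.0])
print(centers)   # [56.5, 83.5]
print(clusters)  # [[52, 55, 58, 61], [79, 82, 85, 88]]
```

The two groups emerge only from the distances between the values, which is what "without target classification" means in practice.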

Supervised and unsupervised learning: In supervised learning, a labeled training data set is given before the model is built. The machine trains the model on that data set and then predicts new data. In unsupervised learning, the data is analyzed without manual labels. The machine groups the data itself according to the similarity between samples, so highly similar data ends up in the same group.
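The supervised case can be sketched in a few lines: the labels are part of the training data, and the trained model is then asked about a value it has not seen. This is a nearest-centroid classifier chosen only because it is short; the helper names `train_centroids` and `predict` and the sample values are hypothetical.

```python
def train_centroids(samples, labels):
    """Supervised step: use the given labels to compute one mean per class."""
    sums, counts = {}, {}
    for value, label in zip(samples, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose class mean is closest to the new value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Hypothetical labeled training data: length of stay in days
stay_days = [3, 4, 5, 15, 16, 18]
stay_labels = ['short', 'short', 'short', 'long', 'long', 'long']

model = train_centroids(stay_days, stay_labels)
print(predict(model, 6))   # short
print(predict(model, 14))  # long
```

Without `stay_labels` this training step could not run at all, which is the practical difference from an unsupervised method.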

 

Two.

Calculation process
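As a rough guide to what the code in this section computes, the calculation can be written out as follows. The symbols mirror the variable names in the code: y_1 is myocardial infarction, y_2 is unstable angina pectoris, x_1 to x_6 are the six feature values of the patient being judged, and N is the number of rows. The notation count(...) is introduced here for illustration only.

```latex
% Bayes' rule for each disease y_k, k = 1, 2
P(y_k \mid x) = \frac{P(x \mid y_k)\, P(y_k)}{P(x)}

% Naive assumption: the six features are conditionally independent given y_k
P(x \mid y_k) = \prod_{i=1}^{6} P(x_i \mid y_k)

% Estimates taken from row counts
P(x_i \mid y_k) \approx \frac{\mathrm{count}(x_i, y_k)}{\mathrm{count}(y_k)},
\qquad
P(y_k) \approx \frac{\mathrm{count}(y_k)}{N}

% The code approximates the evidence by a product of marginals
P(x) \approx \prod_{i=1}^{6} \frac{\mathrm{count}(x_i)}{N}
```

Because P(x) is the same for both diseases, the comparison between y_1 and y_2 depends only on the numerators.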

 

Code implementing the naive Bayes algorithm:

import pandas as pd
import numpy as np

dataDF = pd.read_excel(r'data/Clinical data of patients with heart disease.xlsx')

# Data processing: encode gender (male: 1, female: 0), age (<70: -1, 70-80: 0, >80: 1),
# and length of stay in days (<7: -1, 7-14: 0, >14: 1), one list per column.
sex = []
for s in dataDF['Gender']:
    if s == 'male':
        sex.append(1)
    else:
        sex.append(0)

age = []
for a in dataDF['Age']:
    if a == '<70':
        age.append(-1)
    elif a == '70-80':
        age.append(0)
    else:
        age.append(1)

days = []
for d in dataDF['Length of stay']:
    if d == '<7':
        days.append(-1)
    elif d == '7-14':
        days.append(0)
    else:
        days.append(1)

# Build a separate, encoded DataFrame; .copy() leaves the original dataDF untouched.
dataDF2 = dataDF.copy()
dataDF2['Gender'] = sex
dataDF2['Age'] = age
dataDF2['Length of stay'] = days

# Convert the encoded DataFrame to an array for the calculations
dataarr = np.array(dataDF2)
dataarr

# Use the naive Bayes model to decide which disease a patient most likely has,
# e.g. gender = 'male', age < 70, KILLP = 1, drinking = 'yes', smoking = 'yes', length of stay < 7
def beiyesi(sex, age, KILLP, drink, smoke, days):
    # Initialize the counters: xi_yk counts rows of class k whose feature i matches the query
    x1_y1,x2_y1,x3_y1,x4_y1,x5_y1,x6_y1 = 0,0,0,0,0,0
    x1_y2,x2_y2,x3_y2,x4_y2,x5_y2,x6_y2 = 0,0,0,0,0,0
    y1 = 0
    y2 = 0
    
    for line in dataarr:
        if line[6] == 'Myocardial infarction':  # Count feature matches among myocardial infarction rows
            y1 += 1
            if line[0] == sex:
                x1_y1 += 1
            if line[1] == age:
                x2_y1 += 1
            if line[2] == KILLP:
                x3_y1 += 1
            if line[3] == drink:
                x4_y1 += 1
            if line[4] == smoke:
                x5_y1 += 1
            if line[5] == days:
                x6_y1 += 1
        else:  # Count feature matches among the remaining rows (unstable angina pectoris)
            y2 += 1
            if line[0] == sex:
                x1_y2 += 1
            if line[1] == age:
                x2_y2 += 1
            if line[2] == KILLP:
                x3_y2 += 1
            if line[3] == drink:
                x4_y2 += 1
            if line[4] == smoke:
                x5_y2 += 1
            if line[5] == days:
                x6_y2 += 1
    # print('y1:',y1,' y2:',y2)

    # Convert the counts to the conditional probabilities P(xi | y1) and P(xi | y2)
    # print('x1_y1:',x1_y1, ' x2_y1:',x2_y1, ' x3_y1:',x3_y1, ' x4_y1:',x4_y1, ' x5_y1:',x5_y1, ' x6_y1:',x6_y1)
    # print('x1_y2:',x1_y2, ' x2_y2:',x2_y2, ' x3_y2:',x3_y2, ' x4_y2:',x4_y2, ' x5_y2:',x5_y2, ' x6_y2:',x6_y2)
    x1_y1, x2_y1, x3_y1, x4_y1, x5_y1, x6_y1 = x1_y1/y1, x2_y1/y1, x3_y1/y1, x4_y1/y1, x5_y1/y1, x6_y1/y1
    x1_y2, x2_y2, x3_y2, x4_y2, x5_y2, x6_y2 = x1_y2/y2, x2_y2/y2, x3_y2/y2, x4_y2/y2, x5_y2/y2, x6_y2/y2
    x_y1 = x1_y1 * x2_y1 * x3_y1 * x4_y1 * x5_y1 * x6_y1
    x_y2 = x1_y2 *  x2_y2 * x3_y2 * x4_y2 * x5_y2 * x6_y2

        
    # Count how often each queried feature value occurs in the whole data set
    x1,x2,x3,x4,x5,x6 = 0,0,0,0,0,0
    for line in dataarr:
        if line[0] == sex:
            x1 += 1
        if line[1] == age:
            x2 += 1
        if line[2] == KILLP:
            x3 += 1
        if line[3] == drink:
            x4 += 1
        if line[4] == smoke:
            x5 += 1
        if line[5] == days:
            x6 += 1
    # print('x1:',x1, ' x2:',x2, ' x3:',x3, ' x4:',x4, ' x5:',x5, ' x6:',x6)
    # Approximate P(x) as the product of the marginals P(x1) * ... * P(x6)
    length = len(dataarr)
    x = x1/length * x2/length * x3/length * x4/length * x5/length * x6/length
    # print('x:',x)
    
    # Score each disease given the features: P(yk | x) = P(x | yk) * P(yk) / P(x)
    y1_x = (x_y1)*(y1/length)/x
    # print(y1_x)
    y2_x = (x_y2)*(y2/length)/x
    
    # Report whichever disease has the higher score
    if y1_x > y2_x:
        print('The patient is more likely to suffer from myocardial infarction.',y1_x)
    else:
        print('The patient is more likely to suffer from unstable angina pectoris.',y2_x)

# Query: gender = male, age < 70, KILLP = 1, drinking = yes, smoking = yes, length of stay < 7
beiyesi(1,-1,1,'yes','yes',-1)
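Two properties of the function above are worth knowing. A feature value that never occurs in one class makes that class's whole product zero, and because P(x) is approximated by a product of marginals, the two printed scores need not sum to 1. One common variant, sketched below under stated assumptions, adds Laplace smoothing and normalizes the two class scores instead of dividing by P(x). The function name `naive_bayes_posteriors`, the `alpha` parameter, and the toy rows are hypothetical and are not taken from the clinical data set.

```python
from collections import Counter, defaultdict


def naive_bayes_posteriors(rows, query, alpha=1.0):
    """Return normalized P(label | query) for categorical features.

    rows  : list of tuples whose last element is the class label
    query : tuple of feature values in the same column order, label omitted
    alpha : Laplace smoothing constant (an assumption; the code above uses raw counts)
    """
    label_counts = Counter(row[-1] for row in rows)
    # feature_counts[label][i][value] = number of rows of that label with value in column i
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    distinct = defaultdict(set)
    for row in rows:
        *features, label = row
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            distinct[i].add(value)

    scores = {}
    for label, n_label in label_counts.items():
        score = n_label / len(rows)  # prior P(label)
        for i, value in enumerate(query):
            # Number of categories in column i, counting the query value even if unseen
            n_values = len(distinct[i] | {value})
            score *= (feature_counts[label][i][value] + alpha) / (n_label + alpha * n_values)
        scores[label] = score

    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Made-up rows for illustration: (gender, smoking, diagnosis)
toy_rows = [
    (1, 'yes', 'Myocardial infarction'),
    (1, 'yes', 'Myocardial infarction'),
    (0, 'no', 'Myocardial infarction'),
    (0, 'no', 'Unstable angina pectoris'),
    (1, 'no', 'Unstable angina pectoris'),
]
post = naive_bayes_posteriors(toy_rows, query=(1, 'yes'))
print(max(post, key=post.get), post)
```

Normalizing over the classes gives values that sum to 1, and smoothing keeps an unseen feature value from ruling a class out entirely.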

 


Link of this Article: The tenth assignment
