One.

This part briefly describes the relationship and difference between classification and clustering.

It also briefly describes what supervised learning and unsupervised learning are.

Classification and clustering: Classification is a supervised technique that assigns data to target classes known in advance (for example, the naive Bayes algorithm). Clustering is an unsupervised technique: no target classes exist before the model is built, and data with similar features are automatically grouped into one class (for example, the KMeans clustering algorithm).
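To make the clustering half of this contrast concrete, here is a minimal sketch of one-dimensional k-means in plain Python. The `ages` values, the starting centers, and the function name `kmeans_1d` are invented for illustration and do not come from the heart disease data set. Note that no labels are passed in anywhere; the groups emerge only from how close the values are to each other.

```python
# Minimal 1-D k-means sketch (illustrative values, not the article's data).

def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center; no labels are used."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


ages = [41, 45, 47, 78, 82, 85]  # hypothetical, unlabeled values
centers, clusters = kmeans_1d(ages, centers=[40, 90])
print(clusters)
```

The result depends on the starting centers, which is one reason practical KMeans implementations usually try several initialisations.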

Supervised learning and unsupervised learning: In supervised learning, a labeled training data set is given before the model is built; the machine trains the model on that data set and then predicts on new data. Unsupervised learning analyzes data that has not been manually labeled; the machine groups the data itself according to the similarity between data points, so highly similar data end up grouped together.
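The "train on labeled data, then predict new data" workflow can be sketched with a very small nearest-mean classifier. This is only an illustration of the supervised pattern, not the method used later in this post; the `training_set` values, the label strings, and the function names `train` and `predict` are all made up for the example.

```python
# Supervised learning sketch: labels are known at training time.

def train(samples):
    """samples: list of (feature_value, label). Returns the mean value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Assign the label whose training mean is closest to the new value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical labeled training data: (days in hospital, label).
training_set = [(2, "short stay"), (4, "short stay"),
                (15, "long stay"), (21, "long stay")]
model = train(training_set)
print(predict(model, 5))
print(predict(model, 18))
```

The key difference from the unsupervised case is that `train` receives the labels, so the classes are fixed before any new data is seen.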

Two.

Calculation process

The code implements the naive Bayes algorithm:

```python
import pandas as pd
import numpy as np

dataDF = pd.read_excel(r'data/Clinical data of patients with heart disease.xlsx')

# Data processing: encode three columns as numbers.
#   Gender: male -> 1, female -> 0
#   Age: <70 -> -1, 70-80 -> 0, >80 -> 1
#   Length of stay (days): <7 -> -1, 7-14 -> 0, >14 -> 1
sex = []
for s in dataDF['Gender']:
    if s == 'male':
        sex.append(1)
    else:
        sex.append(0)

age = []
for a in dataDF['Age']:
    if a == '<70':
        age.append(-1)
    elif a == '70-80':
        age.append(0)
    else:
        age.append(1)

days = []
for d in dataDF['Length of stay']:
    if d == '<7':
        days.append(-1)
    elif d == '7-14':
        days.append(0)
    else:
        days.append(1)

# Build a separate, processed DataFrame (copy, so the raw data is untouched).
dataDF2 = dataDF.copy()
dataDF2['Gender'] = sex
dataDF2['Age'] = age
dataDF2['Length of stay'] = days

# Convert the processed data to an array for the calculation.
# Columns 0-5 are the features; column 6 is the diagnosis.
dataarr = np.array(dataDF2)


# Use the Bayesian model to decide which disease a patient most likely has,
# given gender, age, KILLP, drinking, smoking and length of stay.
def beiyesi(sex, age, KILLP, drink, smoke, days):
    # Initialize the counters.
    x1_y1, x2_y1, x3_y1, x4_y1, x5_y1, x6_y1 = 0, 0, 0, 0, 0, 0
    x1_y2, x2_y2, x3_y2, x4_y2, x5_y2, x6_y2 = 0, 0, 0, 0, 0, 0
    y1 = 0
    y2 = 0

    for line in dataarr:
        if line[6] == 'Myocardial infarction':
            # Count each symptom among myocardial infarction cases.
            y1 += 1
            if line[0] == sex:
                x1_y1 += 1
            if line[1] == age:
                x2_y1 += 1
            if line[2] == KILLP:
                x3_y1 += 1
            if line[3] == drink:
                x4_y1 += 1
            if line[4] == smoke:
                x5_y1 += 1
            if line[5] == days:
                x6_y1 += 1
        else:
            # Count each symptom among unstable angina pectoris cases.
            y2 += 1
            if line[0] == sex:
                x1_y2 += 1
            if line[1] == age:
                x2_y2 += 1
            if line[2] == KILLP:
                x3_y2 += 1
            if line[3] == drink:
                x4_y2 += 1
            if line[4] == smoke:
                x5_y2 += 1
            if line[5] == days:
                x6_y2 += 1
    # print('y1:', y1, ' y2:', y2)

    # Conditional probabilities P(xi | y1) and P(xi | y2).
    x1_y1, x2_y1, x3_y1, x4_y1, x5_y1, x6_y1 = \
        x1_y1/y1, x2_y1/y1, x3_y1/y1, x4_y1/y1, x5_y1/y1, x6_y1/y1
    x1_y2, x2_y2, x3_y2, x4_y2, x5_y2, x6_y2 = \
        x1_y2/y2, x2_y2/y2, x3_y2/y2, x4_y2/y2, x5_y2/y2, x6_y2/y2
    x_y1 = x1_y1 * x2_y1 * x3_y1 * x4_y1 * x5_y1 * x6_y1
    x_y2 = x1_y2 * x2_y2 * x3_y2 * x4_y2 * x5_y2 * x6_y2

    # Probability of the symptoms, P(x), as a product of per-feature frequencies.
    x1, x2, x3, x4, x5, x6 = 0, 0, 0, 0, 0, 0
    for line in dataarr:
        if line[0] == sex:
            x1 += 1
        if line[1] == age:
            x2 += 1
        if line[2] == KILLP:
            x3 += 1
        if line[3] == drink:
            x4 += 1
        if line[4] == smoke:
            x5 += 1
        if line[5] == days:
            x6 += 1
    # print('x1:', x1, ' x2:', x2, ' x3:', x3, ' x4:', x4, ' x5:', x5, ' x6:', x6)

    length = len(dataarr)
    x = x1/length * x2/length * x3/length * x4/length * x5/length * x6/length
    # print('x:', x)

    # Probability of myocardial infarction and of unstable angina pectoris
    # under the given symptoms: P(y | x) = P(x | y) * P(y) / P(x).
    y1_x = x_y1 * (y1/length) / x
    y2_x = x_y2 * (y2/length) / x

    # Judge which disease is more likely.
    if y1_x > y2_x:
        print('The patient is more likely to suffer from myocardial infarction.', y1_x)
    else:
        print('The patient is more likely to suffer from unstable angina pectoris.', y2_x)


# Judgment: gender = male, age < 70, KILLP = 1, drinking = yes, smoking = yes,
# length of stay < 7 (encoded as -1 above).
beiyesi(1, -1, 1, 'yes', 'yes', -1)
```
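For readers who want to try the same calculation without the Excel file, the sketch below repeats the counting logic of `beiyesi` on a tiny in-memory table. The rows, the two-feature layout (gender, smoking), and the function name `naive_bayes_posteriors` are invented for illustration. It uses the same formula, score(y) = P(x1|y) · ... · P(xn|y) · P(y) / (P(x1) · ... · P(xn)), with no smoothing. Because P(x) is approximated by a product of per-feature frequencies, the scores are not guaranteed to sum to 1 and can even exceed 1; only their comparison is meaningful.

```python
from collections import Counter


def naive_bayes_posteriors(rows, query):
    """rows: list of (features_tuple, label); query: features_tuple.

    Returns {label: P(x|y) * P(y) / P(x)} using a naive product of
    per-feature frequencies (no smoothing).
    Raises ZeroDivisionError if a queried value never occurs in rows.
    """
    total = len(rows)
    label_counts = Counter(label for _, label in rows)

    # P(x): product of the overall frequency of each queried feature value.
    evidence = 1.0
    for i, value in enumerate(query):
        evidence *= sum(1 for feats, _ in rows if feats[i] == value) / total

    posteriors = {}
    for label, n_label in label_counts.items():
        # P(x | y): product of per-feature frequencies within this label.
        likelihood = 1.0
        for i, value in enumerate(query):
            matches = sum(1 for feats, lab in rows
                          if lab == label and feats[i] == value)
            likelihood *= matches / n_label
        posteriors[label] = likelihood * (n_label / total) / evidence
    return posteriors


# Hypothetical toy data: (gender, smoking) -> diagnosis.
rows = [
    ((1, 'yes'), 'Myocardial infarction'),
    ((1, 'yes'), 'Myocardial infarction'),
    ((0, 'yes'), 'Myocardial infarction'),
    ((0, 'no'), 'Unstable angina pectoris'),
    ((1, 'no'), 'Unstable angina pectoris'),
]
scores = naive_bayes_posteriors(rows, (1, 'yes'))
best = max(scores, key=scores.get)
print(best, scores)
```

A common refinement, not used in the original code, is to add Laplace smoothing so that a feature value unseen in one class does not force that class's score to zero.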

Screenshots: