Why data mining is needed:

  Data rich, information poor: a lot of data is collected, but little useful information is extracted from it.

Some data mining resources:

  The best way to search for articles is to use Google.

  WEKA: open-source, free data mining software developed in New Zealand; user-friendly and visual.

  UCI: data sets for data mining.

  MATLAB: a variety of software packages.

  KDnuggets: a large data mining site with assorted information.


Several definitions:


Big data: more data, produced faster, in more and more types; so large that traditional methods cannot store it.

Applications of big data, data analysis, and data mining:

Public Security: mining and visualizing patterns, for example predicting where robberies will take place so that robbers can be stopped beforehand, reducing the crime rate.

Health Care: personalized medicine. Through DNA analysis, people with the same disease can be treated with different drugs.

Location Data: urban planning, mobile users (parents know where their child is), shoppers (RFID tags on shopping carts record shoppers' trajectories and dwell times).

Retail Data: targeted marketing (analyzing customer preferences), sentiment analysis (how customers feel after buying, e.g. identifying whether a passage of a review is happy or unhappy).

Social Network

Sports (Moneyball: the Oakland Athletics used data analysis to build a competitive team)

Attractiveness Mining: what makes a person attractive? Collect all the available information and mine it for the most attractive traits.


The classification problem (labeling): a model is first trained on labeled examples, such as pictures tagged as cat or dog; when it is then fed a new picture, it can tell whether it shows a cat or a dog. Common classifiers:


Decision Tree

K-Nearest Neighbours (KNN)

Neural Networks

Support Vector Machines (SVM)
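
As an illustration of the simplest classifier in this list, here is a minimal K-Nearest Neighbours sketch. The toy features (weight, ear length), the data values, and the function name `knn_predict` are assumptions made up for this example; they are not from the notes.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (weight_kg, ear_length_cm) -> label
train = [
    ((4.0, 6.5), "cat"), ((3.5, 7.0), "cat"), ((5.0, 6.0), "cat"),
    ((20.0, 10.0), "dog"), ((25.0, 12.0), "dog"), ((18.0, 11.0), "dog"),
]

print(knn_predict(train, (4.2, 6.8)))
print(knn_predict(train, (22.0, 11.0)))
```

In practice the features would be scaled first, since KNN is sensitive to the units of each feature.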


Preventing overfitting

Cross Validation: the data is divided into a training part and a testing part, and the split is repeated so that each part is used for testing in turn.
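
The splitting step can be sketched as k-fold index generation. This is a minimal sketch that assumes the samples have already been shuffled; the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so every sample is used for testing once and for training k - 1 times.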

Confusion Matrix

                      Actual positive    Actual negative

Predicted positive          TP                 FP

Predicted negative          FN                 TN
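
To make the four cells concrete, the sketch below counts them from a list of actual and predicted labels and derives precision, recall, and accuracy. The label lists are invented for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical labels: 1 = positive class, 0 = negative class
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)         # of predicted positives, how many are right
recall = tp / (tp + fn)            # of actual positives, how many were found
accuracy = (tp + tn) / len(actual)
print(tp, fp, fn, tn, precision, recall, accuracy)
```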

Receiver Operating Characteristic(ROC)

AUC (area under the ROC curve): the closer to 1, the better.
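
One way to compute AUC without drawing the curve uses its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The following is a minimal sketch on made-up scores, not an efficient implementation.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for six examples
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(labels, scores))
```

A classifier that ranks every positive above every negative scores 1.0, and random scoring gives about 0.5.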

Cost Sensitive Learning: different kinds of errors carry different costs, so they are weighted differently.
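
A small sketch of one common way to act on unequal costs, assuming the classifier outputs a probability for the positive class: predict positive whenever the expected cost of a miss exceeds the expected cost of a false alarm. The cost values and function names are assumptions for illustration.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting positive has lower expected cost.

    Predicting negative costs p * cost_fn on average; predicting positive
    costs (1 - p) * cost_fp. They are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(prob_positive, cost_fp, cost_fn):
    """Return True if the lower-expected-cost decision is 'positive'."""
    return prob_positive >= cost_sensitive_threshold(cost_fp, cost_fn)

# Hypothetical costs: missing a positive is 9 times worse than a false alarm.
print(cost_sensitive_threshold(1, 9))
print(decide(0.2, cost_fp=1, cost_fn=9))
print(decide(0.2, cost_fp=1, cost_fn=1))
```

With equal costs the threshold is the familiar 0.5; the more expensive a missed positive is, the lower the threshold drops.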

Lift Analysis: the degree of improvement over random selection. For example, phoning the customers that a model ranks as most likely to purchase gives much better results than phoning customers at random.
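
Lift can be sketched as the purchase rate among the top-ranked customers divided by the overall purchase rate. The customer data below is invented, and `lift_at` is a hypothetical helper name.

```python
def lift_at(labels, scores, n_top):
    """Response rate among the n_top highest-scored customers over the base rate."""
    # Rank customers by model score, highest first.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    top_rate = sum(ranked[:n_top]) / n_top
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate

# Hypothetical data: 1 = customer purchased, scores from some model
bought = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

print(lift_at(bought, scores, n_top=2))
```

A lift above 1 means calling the top-ranked customers beats calling the same number of customers at random.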

Clustering and other data mining problems

Clustering: there are no labels, and no tags are assigned by hand beforehand.

The groups are not specified in advance. Points that are close to each other end up in the same group, while the differences between groups are large.

Distance Metrics:

Euclidean Distance

Manhattan Distance

Mahalanobis Distance
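
The three metrics can be sketched in a few lines. For the Mahalanobis distance this sketch assumes the inverse covariance matrix is already known and passed in as nested lists; estimating it from data is left out.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mahalanobis(a, b, inv_cov):
    """sqrt(d^T * inv_cov * d), where d = a - b and inv_cov is given."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    total = sum(d[i] * inv_cov[i][j] * d[j]
                for i in range(n) for j in range(n))
    return math.sqrt(total)

a, b = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(a, b), manhattan(a, b), mahalanobis(a, b, identity))
```

With the identity matrix as inverse covariance, the Mahalanobis distance reduces to the Euclidean distance; a non-identity matrix rescales each direction by how much the data varies along it.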

Clustering algorithms:


Sequential Leader

Affinity Propagation
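
As a rough illustration of the Sequential Leader idea, the sketch below follows one common formulation: points are processed in order, each point joins its nearest existing leader if that leader is within a distance threshold, and otherwise becomes a new leader. The threshold and the points are made-up values, and the notes do not specify which variant was meant.

```python
import math

def sequential_leader(points, threshold):
    """Single-pass clustering; returns (leaders, cluster index per point)."""
    leaders = []
    assignments = []
    for p in points:
        # Find the nearest existing leader, if there is one.
        best, best_dist = None, float("inf")
        for idx, leader in enumerate(leaders):
            d = math.dist(p, leader)
            if d < best_dist:
                best, best_dist = idx, d
        if best is not None and best_dist <= threshold:
            assignments.append(best)
        else:
            # Too far from every leader: this point starts a new cluster.
            leaders.append(p)
            assignments.append(len(leaders) - 1)
    return leaders, assignments

points = [(0, 0), (0.5, 0.2), (5, 5), (5.2, 4.9), (0.1, 0.4), (10, 0)]
leaders, assignments = sequential_leader(points, threshold=1.0)
print(leaders)
print(assignments)
```

The result depends on the order of the points and on the threshold, which is the price paid for needing only one pass over the data.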


Market Research

Image Segmentation

Social Network Analysis


Hierarchical clustering

Association Rule: customers who buy one item tend to buy another item as well.
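
Association rules are usually judged by support (how often the items appear together) and confidence (how often the second item appears given the first). A minimal sketch on an invented set of shopping baskets:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(support(transactions, {"diapers", "beer"}))
print(confidence(transactions, {"diapers"}, {"beer"}))
```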

Regression: linear regression can also fit a curve (for example, with polynomial terms); overfitting must be prevented here too.
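
The simplest case, fitting a straight line by least squares, can be sketched with the closed-form formulas for slope and intercept. The data points are invented and lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Adding more terms (x squared, x cubed, and so on) lets the same least-squares idea fit a curve, but each extra term also makes it easier to overfit a small data set.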

Seeing is Knowing (visualization)

Performance Dashboard: data can be displayed clearly with charts and histograms.

Some visualization software is very valuable: it makes the results look more polished, and there is no need to write the software yourself.

Data preprocessing (real data is often dirty)
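
Two typical kinds of dirt are duplicate records and missing values. The sketch below shows one simple way to handle both, assuming records with hypothetical `name` and `age` fields; filling with the mean is only one of several possible choices.

```python
def clean_records(records):
    """Drop duplicates (after normalizing names) and fill missing ages with the mean."""
    seen, unique = set(), []
    for r in records:
        # Normalize the name so that ' Alice ' and 'alice' count as the same.
        key = (r["name"].strip().lower(), r["age"])
        if key not in seen:
            seen.add(key)
            unique.append({"name": r["name"].strip(), "age": r["age"]})
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    for r in unique:
        if r["age"] is None:
            r["age"] = mean_age
    return unique

# Hypothetical dirty input
raw = [
    {"name": " Alice ", "age": 30},
    {"name": "alice", "age": 30},    # duplicate after normalization
    {"name": "Bob", "age": None},    # missing value
    {"name": "Carol", "age": 40},
]

cleaned = clean_records(raw)
print(cleaned)
```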


Link of this Article: Data mining essays 1
