Why data mining is needed:
Data rich, information poor
Some data mining resources:
The best way to search for articles is to use Google.
WEKA: free, open-source data mining software developed in New Zealand; user-friendly, with a graphical interface.
UCI: data sets for data mining (the UCI Machine Learning Repository)
MATLAB: a variety of software packages
KDnuggets: a large data mining site with assorted resources
Big data: more data, produced faster, with more and more data types in every domain (so large that traditional methods cannot store it).
Applications of big data, data analysis, and data mining:
Public Security: by visualizing the data and discovering patterns in it, for example predicting where robberies will occur, police can stop robbers before they act and reduce the crime rate.
Health Care Application (Personalized Medicine): through DNA analysis, people with the same disease can be treated with different drugs.
Location Data: Urban Planning, Mobile User (parents know where their child is), Shopper (RFID tags on shopping carts record shoppers' trajectories and dwell time)
Retail Data: Targeted Marketing (analyze customer preferences), Sentiment Analysis (how customers feel after buying; identifying whether a passage of a review is happy or unhappy).
Sports: Moneyball (Oakland Athletics)
Attractiveness Mining: what makes someone a "goddess"? Collect all the available information and mine it for what is most attractive.
Classification (labeling): first train a model on labeled examples, for instance pictures marked as cat or dog; then feed it a new picture and it tells you whether it is a cat or a dog:
Decision Tree
K-Nearest Neighbours (KNN)
Neural Networks
Support Vector Machines
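As an illustration of the train-then-predict idea, here is a minimal sketch of K-Nearest Neighbours, the simplest of the classifiers listed above. The function name `knn_predict`, the two features (weight and ear length), and the toy values are all assumptions made up for this example; real image classification would use far richer features.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the labeled examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (weight in kg, ear length in cm) -> label.
train = [
    ((4.0, 6.0), "cat"),
    ((3.5, 5.5), "cat"),
    ((5.0, 6.5), "cat"),
    ((20.0, 10.0), "dog"),
    ((25.0, 12.0), "dog"),
    ((18.0, 9.0), "dog"),
]

print(knn_predict(train, (4.5, 6.0)))    # the three nearest examples are all cats
print(knn_predict(train, (22.0, 11.0)))  # the three nearest examples are all dogs
```

"Training" here is just storing the labeled examples; the other classifiers in the list build an explicit model instead.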
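Preventing overfitting
Cross Validation: the data is divided into a training part and a testing part, so the model is evaluated on data it was not trained on; in k-fold cross validation the split is repeated so that each part is used for testing once.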
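The splitting step can be sketched as follows. This is a minimal k-fold index splitter written for illustration (the name `k_fold_splits` is an assumption, not a library function); in practice the samples are usually shuffled before splitting.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every sample lands in the test part exactly once, and the model's score is typically averaged over the k rounds.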
Confusion Matrix
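For a two-class problem the confusion matrix has four cells: true positives, false positives, false negatives, and true negatives. A minimal sketch of counting them, with a hypothetical helper `confusion_counts` and made-up labels:

```python
def confusion_counts(actual, predicted, positive):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, really positive
        elif p == positive:
            fp += 1   # predicted positive, really negative
        elif a == positive:
            fn += 1   # predicted negative, really positive
        else:
            tn += 1   # predicted negative, really negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Hypothetical labels, treating "cat" as the positive class.
actual    = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]

counts = confusion_counts(actual, predicted, positive="cat")
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts)    # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
print(accuracy)
```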
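Receiver Operating Characteristic (ROC): plots the true positive rate against the false positive rate as the decision threshold varies
AUC (area under the ROC curve): the closer to 1, the better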
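One way to see why an AUC near 1 is good: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name `auc` and the scores are assumptions for illustration, and the pairwise loop is only practical for small data sets.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(auc(labels, scores))                          # 8 of the 9 pairs are ordered correctly
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))      # a perfect ranking gives 1.0
```

A model that scores at random would get about 0.5 by this measure.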
Cost Sensitive Learning: errors are weighted, because different kinds of mistakes have different costs.
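A small sketch of the idea, assuming a hypothetical screening task where missing a sick patient (false negative) is ten times as costly as a false alarm (false positive). The function `total_cost`, the cost values, and the two sets of predictions are all made up for illustration.

```python
def total_cost(actual, predicted, positive, cost_fp, cost_fn):
    """Sum the cost of all errors, charging false positives and false negatives differently."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if p == positive and a != positive:
            cost += cost_fp   # false alarm
        elif p != positive and a == positive:
            cost += cost_fn   # missed case
    return cost

actual  = ["sick", "sick", "healthy", "healthy", "healthy"]
model_a = ["sick", "healthy", "healthy", "healthy", "healthy"]  # one missed case
model_b = ["sick", "sick", "sick", "sick", "healthy"]           # two false alarms

cost_a = total_cost(actual, model_a, "sick", cost_fp=1.0, cost_fn=10.0)
cost_b = total_cost(actual, model_b, "sick", cost_fp=1.0, cost_fn=10.0)
print(cost_a, cost_b)
```

Model A makes fewer mistakes, yet under these assumed costs model B is the cheaper choice, which is why accuracy alone can mislead.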
Lift Analysis: the degree of improvement over random selection (phoning the customers the model ranks as most likely to purchase works much better than phoning customers at random).
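Lift can be computed as the purchase rate among the top-scored customers divided by the overall purchase rate. A minimal sketch, with a hypothetical function `lift_at` and invented customer data:

```python
def lift_at(labels, scores, fraction):
    """Response rate in the top `fraction` of customers by score, relative to the overall rate."""
    # Order the outcomes from highest model score to lowest.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate

# Hypothetical data: 1 = customer bought, 0 = did not; scores come from some model.
bought = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

print(lift_at(bought, scores, 0.2))  # top 20% of customers versus calling at random
```

A lift of 1 means the model is no better than random calling; here the top 20% buy at more than three times the overall rate.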
Clustering and other data mining problems
Clustering (there are no labels! No manual tags assigned in advance)
We do not tell the algorithm how to group the data; points that are close to one another end up in the same group, while points in different groups are farther apart.
Euclidean Distance
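To make this concrete, the sketch below uses k-means, one common clustering algorithm (the notes do not name a specific one, so this choice is an assumption), with Euclidean distance as the measure of closeness. The points and the starting centroids are invented.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to the centroid with the smallest Euclidean distance.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it in place if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge purely from the distances between points.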
Social Network Analysis
Association Rule: customers who buy one item also tend to buy another.
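Association rules are usually scored by support (how often the items appear together) and confidence (how often the second item appears given the first). A minimal sketch with hypothetical shopping baskets; the function names are assumptions for this example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

print(support(transactions, {"bread", "milk"}))       # 3 of 5 baskets
print(confidence(transactions, {"bread"}, {"milk"}))  # 3 of the 4 baskets with bread
```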
Regression: linear regression can also be extended to fit a curve; take care to prevent overfitting.
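A minimal sketch of simple linear regression using the closed-form least-squares solution for one input variable. The function name `fit_line` and the data are made up; the points lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]   # generated from y = 2x + 1

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

One way to get a curve from the same machinery is to add transformed inputs such as x squared as extra variables; the more such terms are added, the greater the risk of overfitting.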
Seeing is Knowing (visualization)
Performance Dashboard: the data can be displayed clearly with charts and histograms.
Some visualization software is very valuable: it makes results look more polished and professional, and there is no need to write the software yourself.
Data preprocessing (real data are often dirty)
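As a small illustration of what "dirty" can mean, the sketch below cleans a hypothetical list of (name, age) records: it trims whitespace, normalizes capitalization, drops rows with a missing or non-numeric age, and removes duplicates. The function `clean_records` and the sample rows are assumptions; real preprocessing depends heavily on the data set.

```python
def clean_records(rows):
    """Return cleaned (name, age) pairs, skipping unusable and duplicate rows."""
    cleaned = []
    seen = set()
    for name, age in rows:
        name = name.strip().title()   # " alice " -> "Alice"
        try:
            age = int(age)            # accepts 34, "34", " 41 "
        except (TypeError, ValueError):
            continue                  # drop rows whose age is missing or not a number
        if not name or (name, age) in seen:
            continue                  # drop empty names and exact duplicates
        seen.add((name, age))
        cleaned.append((name, age))
    return cleaned

# Hypothetical raw input with typical problems.
raw = [(" alice ", "34"), ("BOB", "n/a"), ("Alice", 34), ("carol", None), ("", "20"), ("dave", " 41 ")]

print(clean_records(raw))  # [('Alice', 34), ('Dave', 41)]
```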