Data mining, a relatively young and interdisciplinary field of computer science, is the process of discovering new patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract knowledge from a data set in a human-understandable structure. Besides the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of found structures, visualization and online updating.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining).
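As a small illustration of one of these pattern types, the sketch below flags unusual records with a z-score rule. The function name `find_anomalies`, the threshold of 2.0 and the sample readings are all assumptions made for the example; practical anomaly detection often relies on more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obviously unusual record.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
anomalies = find_anomalies(readings)
print(anomalies)
```

A flagged record is only a candidate: whether it is an interesting observation or a data error still has to be judged by someone who knows the data.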
Computer science conferences on data mining include:
- CIKM – ACM Conference on Information and Knowledge Management
- DMIN – International Conference on Data Mining
- DMKD – Research Issues on Data Mining and Knowledge Discovery
- ECDM – European Conference on Data Mining
- ECML-PKDD – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
- EDM – International Conference on Educational Data Mining
- ICDM – IEEE International Conference on Data Mining
- KDD – ACM SIGKDD Conference on Knowledge Discovery and Data Mining
- MLDM – Machine Learning and Data Mining in Pattern Recognition
- PAKDD – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
- PAW – Predictive Analytics World
- SDM – SIAM International Conference on Data Mining
- SSTD – Symposium on Spatial and Temporal Databases
The knowledge discovery in databases (KDD) process is commonly defined with the stages: (1) Selection, (2) Preprocessing, (3) Transformation, (4) Data Mining, and (5) Interpretation/Evaluation.
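The five stages can be read as a chain of functions, each consuming the output of the previous one. The following is a minimal sketch under stated assumptions: the records, the field names and the trivial "pattern" (records above the mean) are hypothetical and stand in for whatever a real project would use at each stage.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"customer": "a", "age": 34, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 51, "spend": 300.0},
    {"customer": "d", "age": 29, "spend": 95.0},
]

def select(records, fields):
    """(1) Selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """(2) Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """(3) Transformation: rescale one field to the range [0, 1]."""
    lo = min(r[field] for r in records)
    hi = max(r[field] for r in records)
    span = (hi - lo) or 1.0  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field):
    """(4) Data mining: a placeholder pattern, records above the mean."""
    avg = mean(r[field] for r in records)
    return [r for r in records if r[field] > avg]

def evaluate(patterns, records):
    """(5) Interpretation/evaluation: share of records the pattern covers."""
    return len(patterns) / len(records)

selected = select(raw_records, ["age", "spend"])
cleaned = preprocess(selected)
scaled = transform(cleaned, "spend")
patterns = mine(scaled, "spend")
coverage = evaluate(patterns, scaled)
print(patterns, coverage)
```

In practice the process is iterative: an unconvincing evaluation usually sends the analyst back to an earlier stage rather than ending the pipeline.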
Data mining involves six common classes of tasks:
- Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records, which might be interesting or might be data errors that require further investigation.
- Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
- Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
- Classification – The task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam.
- Regression – Attempts to find a function that models the data with the least error.
- Summarization – Providing a more compact representation of the data set, including visualization and report generation.
Notable application areas include:
- Music data mining
- Science and engineering
- Spatial data mining
- Visual data mining
- Human rights
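Returning to the task classes, the supermarket example for association rule learning can be made concrete with a small sketch. It counts item pairs and reports rules by support (the share of baskets containing both items) and confidence (the share of baskets with the first item that also contain the second). The baskets, the function name `pair_rules` and both thresholds are assumptions chosen for illustration; algorithms such as Apriori handle larger itemsets far more efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be worth reporting
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that a rule and its reverse can differ in confidence: here every basket with jam also contains bread, but only half of the baskets with bread contain jam.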
Data mining can be misused, and can unintentionally produce results which appear significant but do not actually predict future behavior and cannot be reproduced on a new sample of data.
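The risk of spurious results can be demonstrated with pure noise. In the sketch below, labels and 200 candidate features are independent coin flips, so there is nothing to discover; yet searching a small sample for the best-matching feature will typically turn up one that looks predictive. The sample sizes, the seed and the helper names are assumptions made for the example.

```python
import random

def accuracy(feature, labels):
    """Fraction of positions where a binary feature equals the label."""
    return sum(f == y for f, y in zip(feature, labels)) / len(labels)

def random_bits(rng, n):
    """Return n independent fair coin flips as 0/1 values."""
    return [rng.randint(0, 1) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_features, n_train, n_new = 200, 20, 1000

# Labels and features are unrelated noise: no real pattern exists.
train_labels = random_bits(rng, n_train)
train_features = [random_bits(rng, n_train) for _ in range(n_features)]

# "Mine" the small sample: keep the feature that best matches the labels.
best = max(
    range(n_features),
    key=lambda j: accuracy(train_features[j], train_labels),
)
train_acc = accuracy(train_features[best], train_labels)

# Re-check the apparent discovery on a new sample drawn the same way.
new_labels = random_bits(rng, n_new)
new_feature = random_bits(rng, n_new)  # the chosen feature is still just noise
new_acc = accuracy(new_feature, new_labels)

print(f"accuracy on mined sample: {train_acc:.2f}")
print(f"accuracy on new sample:   {new_acc:.2f}")
```

The gap between the two figures is the reason patterns are usually validated on data that played no part in finding them.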