Data mining emerged in order to cope with the
challenges that traditional data analysis techniques
were facing when dealing with large amounts
of data. Moreover, these data often have a lot of
peculiarities (e.g. missing values and noise).
More specifically, data mining is the main step in
the process of Knowledge Discovery in Databases
or KDD Process. Knowledge Discovery
in Databases is defined as the non-trivial process
of identifying valid, novel, potentially useful, and
ultimately understandable patterns in data (Fayyad
et al., 1996). However, the term “data mining”
is very often used to describe the whole KDD
Process. Although the core of the process is the
data mining step, where a data mining algorithm is
applied in order to extract the patterns from data,
the pre-processing and post-processing phases
are very important too and contribute significantly to
the quality of the extracted knowledge. The pre-processing
phase usually includes the selection
of an appropriate portion of data, the cleansing
of the selected data, as well as the transformation
of data into more appropriate representations. The
post-processing phase deals with the management of
the produced patterns and models and
focuses on the evaluation and interpretation of
data mining results.
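The three pre-processing steps named above (selection, cleansing, transformation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the field names, the choice of mean imputation for missing values, and min-max scaling as the transformation. Real KDD pipelines may use quite different techniques at each step.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 25, "income": 30000.0, "region": "north"},
    {"age": None, "income": 42000.0, "region": "north"},
    {"age": 47, "income": 81000.0, "region": "south"},
    {"age": 35, "income": None, "region": "north"},
    {"age": 52, "income": 64000.0, "region": "north"},
]


def select(rows, region):
    """Selection: keep only the portion of the data relevant to the analysis."""
    return [r for r in rows if r["region"] == region]


def cleanse(rows, fields):
    """Cleansing: replace missing values with the mean of the observed ones."""
    cleaned = [dict(r) for r in rows]
    for f in fields:
        observed = [r[f] for r in rows if r[f] is not None]
        fill = mean(observed)
        for r in cleaned:
            if r[f] is None:
                r[f] = fill
    return cleaned


def transform(rows, fields):
    """Transformation: rescale each numeric field to the range [0, 1]."""
    scaled = [dict(r) for r in rows]
    for f in fields:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant fields
        for r in scaled:
            r[f] = (r[f] - lo) / span
    return scaled


numeric = ["age", "income"]
prepared = transform(cleanse(select(records, "north"), numeric), numeric)
```

The resulting `prepared` list is what a data mining algorithm would then consume; the post-processing phase would evaluate and interpret whatever patterns that algorithm returns.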
Data mining, in practice, has the following two
“high-level” primary goals:
• Prediction: Involves the use of some fields (variables) in a database to predict unknown or future values of other variables of interest.
• Description: Focuses on finding human-interpretable patterns describing the data.
Prediction and description are not equally important for every data mining application. In the context of Knowledge Discovery in Databases, description tends to be more important than prediction. In contrast, machine learning and pattern recognition applications usually favor prediction as the primary goal. Prediction and description are achieved by using various data mining tasks. Depending on the nature of the data and the desired knowledge, there is a large variety of algorithms for each task.
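The contrast between the two goals can be made concrete with a small sketch. The data, the function names, and the choice of techniques are all assumptions made for illustration: a 1-nearest-neighbour rule stands in for a predictive task, and counting frequently co-occurring item pairs stands in for a descriptive one. Neither is presented here as the canonical algorithm for its goal.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labelled rows: (hours_studied, hours_slept, outcome).
rows = [
    (1.0, 4.0, "fail"),
    (2.0, 5.0, "fail"),
    (6.0, 7.0, "pass"),
    (7.0, 8.0, "pass"),
]


def predict(hours_studied, hours_slept):
    """Prediction: estimate an unknown field from known ones (1-nearest neighbour)."""
    def sq_dist(row):
        return (row[0] - hours_studied) ** 2 + (row[1] - hours_slept) ** 2
    return min(rows, key=sq_dist)[2]


# Hypothetical transactions, each a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(transactions, min_count):
    """Description: report item pairs that co-occur in at least min_count transactions."""
    counts = Counter()
    for t in transactions:
        # Sort so that each pair has one canonical ordering.
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


label = predict(6.5, 7.0)
patterns = frequent_pairs(baskets, 3)
```

Here `predict` returns a value for a field that was not observed, whereas `frequent_pairs` returns a pattern a person can read directly ("bread and milk are often bought together") without predicting anything.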