Métodos Para Mejorar La Calidad De Un Conjunto De Datos Para Descubrir Conocimiento (Methods for Improving the Quality of a Dataset for Knowledge Discovery)

Book authors:


Book summary:



Today, data generation is growing exponentially in both directions: instances (rows) and features (columns). As a result, many datasets cannot be analyzed without preprocessing. The sheer size of a dataset can cause serious scalability and performance problems for some data mining algorithms. Moreover, the quality of the data may be inadequate for the knowledge discovery process. For these reasons, the dataset must be preprocessed so that the data mining algorithm runs efficiently and produces accurate results. In this thesis, we introduce new measures to evaluate the quality of a dataset in the context of supervised classification. From these quality measures we derive two ways of quantifying the data complexity of a classification problem; specifically, we try to anticipate how a classification algorithm will behave on a given dataset. Compared with data complexity measures already available in the literature, ours give similar performance at a lower computational cost. For data cleaning, we propose a new method that is independent of the classification algorithm and that detects and eliminates the noise in each class. Our method is more efficient and more accurate than other methods already available in the literature. In the context of dimensionality reduction, we propose two new feature selection methods. Compared with two well-known feature selection methods, RELIEF and Sequential Forward Selection (SFS), they obtain similar results at a much lower computational cost. Furthermore, we propose a new algorithm that improves the scalability of current instance selection algorithms. Finally, we integrate the three processes (data cleaning, dimensionality reduction, and instance selection) in order to generate a training set that will permit an efficient…
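The summary uses RELIEF as a baseline for the proposed feature selection methods. As background only, here is a minimal Python sketch of the classic two-class Relief weighting idea; it is not the thesis's own method. It assumes numeric features, a Manhattan distance over range-normalised features, and instances sampled with replacement. The function name `relief`, its parameters `m` and `seed`, and the toy dataset are hypothetical, chosen for illustration.

```python
import random


def relief(X, y, m=None, seed=0):
    """Sketch of two-class Relief: one weight per feature, higher = more relevant.

    Hypothetical helper for illustration; assumes numeric features.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to normalise differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        # Normalised difference between two instances on feature j.
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        # Manhattan distance over normalised features (an assumption).
        return sum(diff(j, a, b) for j in range(d))

    rng = random.Random(seed)
    m = m or n  # number of sampled instances
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        r = X[i]
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        near_hit = X[min(hits, key=lambda k: dist(r, X[k]))]
        near_miss = X[min(misses, key=lambda k: dist(r, X[k]))]
        # Reward features that separate classes, penalise those that vary within a class.
        for j in range(d):
            w[j] += (diff(j, r, near_miss) - diff(j, r, near_hit)) / m
    return w


# Toy data (hypothetical): feature 0 determines the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
w = relief(X, y)
```

On this toy data the informative feature receives a positive weight and the noisy one a negative weight. The original Relief is defined for two-class problems; multiclass variants such as ReliefF instead average over the nearest misses from each other class.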

