Müller J.-A., Lemke F. Self-organising Data Mining. An Intelligent Approach To Extract Knowledge From Data

Books on Demand GmbH, 2000. 252 p. ISBN 9783898118613, 3898118614

Today, there is an increased need to extract information for decision making from a large collection of data. This transformation of data into knowledge is an interactive and iterative process of various subtasks and decisions, and is called Knowledge Discovery from Data. The central part of Knowledge Discovery is Data Mining.
For more sophisticated data mining, the most important goal is to limit user involvement in the entire data mining process to the inclusion of well-known a priori knowledge, while making the process more automated and more objective. Soft computing, i.e., Fuzzy Modelling, Neural Networks, Genetic Algorithms and other methods of automatic model generation, is a way to mine data by generating mathematical models from empirical data more or less automatically. In recent years there has been much publicity about the ability of Artificial Neural Networks to learn and to generalize, despite important problems with their design, development and application:
By default, Neural Networks have no explanatory power to describe why results are as they are. This means that the knowledge (models) extracted by Neural Networks remains hidden and distributed over the network.
There is no systematic approach for designing and developing Neural Networks. It is a trial-and-error process.
Training a Neural Network is a kind of statistical estimation, often using algorithms that are slower and less effective than those used in statistical software.
If noise is considerable in a data sample, the generated models systematically tend to be overfitted.
In contrast to Neural Networks, which use Genetic Algorithms as an external procedure to optimize the network architecture and several pruning techniques to counteract overtraining, the new approach described in this book introduces the principles of evolution (inheritance, mutation and selection) to generate a network structure systematically, enabling automatic model structure synthesis and model validation. Models are generated from the data as networks of active neurons, evolving by repeatedly generating populations of competing models of growing complexity, validating them and selecting the best, until a model of optimal complexity (not too simple and not too complex) has been created. That is, a treelike network is grown out of seed information (input and output variable data) by pairwise combination and survival-of-the-fittest selection, from a simple single individual (neuron) to a desired final, not overspecialized behavior (model). Neither the number of neurons and layers in the network nor the actual behavior of each created neuron is predefined. All this is adjusted during the process of self-organization, which is why the approach is called self-organizing data mining.
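The evolutionary scheme described above can be illustrated with a minimal Python sketch of a multilayered GMDH-style network. It is a simplification under stated assumptions, not the book's or KnowledgeMiner's implementation: each neuron is assumed to be a quadratic polynomial of two inputs (the classic Ivakhnenko form) fitted by least squares on a training part of the data; every pair of inputs forms a candidate neuron; candidates are ranked by their error on a separate validation part (an external criterion); the best `keep` neurons survive as inputs to the next layer; and growth stops when the best validation error no longer improves. The names `quad_terms`, `least_squares`, `gmdh`, `keep` and `max_layers`, as well as the synthetic data, are hypothetical.

```python
import itertools
import random


def quad_terms(a, b):
    """Assumed neuron form: terms 1, a, b, ab, a^2, b^2 (Ivakhnenko-style quadratic)."""
    return [1.0, a, b, a * b, a * a, b * b]


def least_squares(rows, y, ridge=1e-6):
    """Solve (A^T A + ridge*I) w = A^T y by Gaussian elimination with partial pivoting."""
    n = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    v = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        v[c], v[p] = v[p], v[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
            v[r] -= f * v[c]
    w = [0.0] * n
    for c in range(n - 1, -1, -1):  # back substitution
        w[c] = (v[c] - sum(m[c][k] * w[k] for k in range(c + 1, n))) / m[c][c]
    return w


def predict(w, a_col, b_col):
    return [sum(wi * ti for wi, ti in zip(w, quad_terms(a, b))) for a, b in zip(a_col, b_col)]


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def gmdh(train_cols, y_train, val_cols, y_val, keep=4, max_layers=5):
    """Grow layers of pairwise neurons; stop when validation error stops improving.

    train_cols / val_cols hold one list of values per input variable.
    Returns (best validation MSE, number of layers kept).
    """
    best_err, layers = float("inf"), 0
    for _ in range(max_layers):
        if len(train_cols) < 2:  # no pairs left to combine
            break
        candidates = []
        for i, j in itertools.combinations(range(len(train_cols)), 2):
            rows = [quad_terms(a, b) for a, b in zip(train_cols[i], train_cols[j])]
            w = least_squares(rows, y_train)  # fitted on the training part only
            val_pred = predict(w, val_cols[i], val_cols[j])
            # External criterion: error on the validation part, unseen during fitting
            candidates.append((mse(val_pred, y_val),
                               predict(w, train_cols[i], train_cols[j]), val_pred))
        candidates.sort(key=lambda c: c[0])
        if candidates[0][0] >= best_err:  # more complexity no longer helps
            break
        best_err, layers = candidates[0][0], layers + 1
        survivors = candidates[:keep]  # survival of the fittest
        train_cols = [c[1] for c in survivors]
        val_cols = [c[2] for c in survivors]
    return best_err, layers


random.seed(0)


def sample(n):
    # Hypothetical data: x3 is irrelevant to the output
    xs = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    ys = [1 + 2 * x[0] * x[1] - x[2] ** 2 + random.gauss(0, 0.05) for x in xs]
    return [list(c) for c in zip(*xs)], ys


train_cols, y_train = sample(100)
val_cols, y_val = sample(100)
err, layers = gmdh(train_cols, y_train, val_cols, y_val)
print(f"validation MSE {err:.4f} after {layers} layer(s)")
```

Because selection uses the validation part rather than the training fit, a new layer is kept only while it reduces error on data the neurons were not fitted to; this is the sketch's stand-in for stopping at a model that is neither too simple nor too complex.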

Knowledge Discovery from Data
Models and their application in decision making
Relevance and value of forecasts
Theory driven approach
Data driven approach
Data mining

Self-organizing Data Mining
Involvement of users in the data mining process
Automatic model generation
Regression based models
Rule based modelling
Symbolic modelling
Nonparametric models
Self-organizing data mining

Self-organizing Modelling Technologies
Statistical Learning Networks
Inductive approach - The GMDH algorithm
Induction
Principles
Model of optimal complexity

Parametric GMDH Algorithms
Elementary models (neurons)
Generation of alternate model variants
Nets of active neurons
Criteria of model selection
Validation

Nonparametric Algorithms
Objective Cluster Analysis
Analog Complexing
Self-organizing Fuzzy Rule Induction
Logic based rules

Application of Self-organizing Data Mining
Spectrum of self-organizing data mining methods
Choice of appropriate modelling methods
Application fields
Synthesis
Software tools

KnowledgeMiner
General features
GMDH implementation
Elementary models and active neurons
Generation of alternate model variants
Criteria of model selection
Systems of equations
Analog Complexing implementation
Features
Example
Fuzzy Rule Induction implementation
Fuzzification
Rule induction
Defuzzification
Example
Using models
The model base
Finance module

Sample Applications
From Economics
National economy
Stock prediction
Balance sheet
Sales prediction
Solvency checking
Energy consumption
From Ecology
Water pollution
Water quality
From other Fields
Heart disease
U.S. congressional voting behavior