Chichester: John Wiley & Sons Ltd., 2007. — 325 p. — ISBN: 0470090162, 9780470090169
With the advent of computers, very large datasets have become routine. Standard statistical methods lack the power and flexibility to analyse such datasets efficiently and extract the required knowledge. An alternative approach is to summarize a large dataset so that the resulting summary is of a manageable size yet retains as much of the knowledge in the original dataset as possible. One consequence is that the data may no longer be single values but may instead be represented by lists, intervals, distributions, and so on. The summarized data have their own internal structure, which must be taken into account in any analysis. This text presents a unified account of symbolic data, how they arise, and how they are structured. The reader is introduced to symbolic analytic methods, described within the consistent statistical framework required to carry out such a summary and subsequent analysis.
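As a rough illustration of the summarization idea described above, the following Python sketch aggregates single-valued records into one description per group, where each description holds an interval, a list of categories, and a relative-frequency distribution. The records, the field names, and the `summarize` helper are all hypothetical and are not taken from the book; the three summaries only loosely correspond to the interval-valued, multi-valued, and modal multi-valued variables named in the contents.

```python
from collections import Counter, defaultdict

# Hypothetical classical records: (group, age, colour). Values are made up.
records = [
    ("A", 23, "red"),
    ("A", 31, "blue"),
    ("A", 27, "red"),
    ("B", 45, "green"),
    ("B", 52, "green"),
]


def summarize(rows):
    """Aggregate single-valued rows into one symbolic description per group."""
    grouped = defaultdict(list)
    for group, age, colour in rows:
        grouped[group].append((age, colour))

    summary = {}
    for group, values in grouped.items():
        ages = [age for age, _ in values]
        counts = Counter(colour for _, colour in values)
        total = sum(counts.values())
        summary[group] = {
            # Interval: the range of ages observed in the group.
            "age_interval": (min(ages), max(ages)),
            # List: the distinct colours observed in the group.
            "colour_set": sorted(counts),
            # Distribution: the relative frequency of each colour.
            "colour_distribution": {c: n / total for c, n in counts.items()},
        }
    return summary


symbolic = summarize(records)
```

Five records collapse into two descriptions here, and each description has internal structure (an interval's endpoints, a distribution's weights) that a subsequent analysis would need to respect.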
Introduction
Symbolic Data
    Symbolic and Classical Data
    Categories, Concepts, and Symbolic Objects
    Comparison of Symbolic and Classical Analyses
Basic Descriptive Statistics: One Variate
    Some Preliminaries
    Multi-Valued Variables
    Interval-Valued Variables
    Modal Multi-Valued Variables
    Modal Interval-Valued Variables
Descriptive Statistics: Two or More Variates
    Multi-Valued Variables
    Interval-Valued Variables
    Modal Multi-Valued Variables
    Modal Interval-Valued Variables
    Baseball Interval-Valued Dataset
    Measures of Dependence
Principal Component Analysis
    Vertices Method
    Centers Method
    Comparison of the Methods
Regression Analysis
    Classical Multiple Regression Model
    Multi-Valued Variables
    Interval-Valued Variables
    Histogram-Valued Variables
    Taxonomy Variables
    Hierarchical Variables
Cluster Analysis
    Dissimilarity and Distance Measures
    Clustering Structures
    Partitions
    Hierarchy–Divisive Clustering
    Hierarchy–Pyramid Clusters