As the amount of information recorded and stored electronically grows ever larger, it becomes increasingly useful, if not essential, to develop better and more efficient ways to summarize and extract information from these large, multivariate data sets. The field of classification does just that: it investigates sets of "objects" to see whether they can be summarized into a small number of classes comprising similar objects.
Researchers have made great strides in the field over the last twenty years, and classification is no longer perceived as being concerned solely with exploratory analyses. The second edition of Classification incorporates many of the new and powerful methodologies developed since its first edition. Like its predecessor, this edition describes both clustering and graphical methods of representing data, and offers advice on how to decide which methods of analysis best apply to a particular data set. It goes even further, however, by providing critical overviews of recent developments not widely known, including efficient clustering algorithms, cluster validation, consensus classifications, and the classification of symbolic data.
The author has taken an approach accessible to researchers in the wide variety of disciplines that can benefit from classification analysis and methods. He illustrates the methodologies by applying them to data sets: smaller sets are given in the text, and larger ones are available through a Web site.
Large multivariate data sets can be difficult to comprehend; their sheer volume and complexity can prove overwhelming. Classification methods provide efficient, accurate ways to make them less unwieldy and to extract more information from them. Classification, Second Edition offers the ideal vehicle for gaining the background, learning the methodologies, and beginning to put these techniques to use.
Introduction
    Classification, Assignment, and Dissection
    Aims of Classification
    Stages in a Numerical Classification
    Data Sets
Measures of Similarity and Dissimilarity
    Introduction
    Selected Measures of Similarity and Dissimilarity
    Some Difficulties
    Construction of Relevant Measures
Partitions
    Partitioning Criteria
    Iterative Relocation Algorithms
    Mathematical Programming
    Other Partitioning Algorithms
    How Many Clusters?
    Links with Statistical Models
Hierarchical Classifications
    Definitions and Representations
    Algorithms
    Choice of Clustering Strategy
    Consensus Trees
    More General Tree Models
Other Clustering Procedures
    Fuzzy Clustering
    Constrained Classification
    Overlapping Classification
    Conceptual Clustering
    Classification of Symbolic Data
    Partitions of Partitions
Graphical Representations
    Introduction
    Principal Coordinates Analysis
    Non-Metric Multidimensional Scaling
    Interactive Graphics and Self-Organizing Maps
    Biplots
Cluster Validation and Description
    Introduction
    Cluster Validation
    Cluster Description
References
Author Index
Subject Index
"This book provides an excellent and comprehensive overview of the classification literature [...] contains a considerable amount of new material [...] all chapters of this book are a pleasure to read [...] While the Preface states that the book's material have been used by honours students, in reality it can be used by statisticians and nonstatisticians alike. Indeed, anyone who has to deal with large multivariate sets of data will benefit [...] In all, this volume is a valuable and welcome addition to the literature."
- Short Book Reviews of the ISI, April 2000
"A statistician embarking on a classification analysis of a set of data is presented with a bewildering array of choices [...] Having read this book, he or she will have an even wider methodological palette from which to choose. However, they should be better able to make informed choices."
- The Statistician, Volume 49, Part 3 (2000)
"One of the attractive features of the book is that it illustrates the ideas by application to a variety of real data sets."
- Biometrics, Vol. 56, No. 3, September 2000
" [...] this book presents both a solid basis for understanding its content and an adequate synthesis of the current state of the discipline."
- IEEE Engineering in Medicine and Biology, July/August 2002