Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data presents an applied treatment of modern methods for the analysis of categorical data, both discrete response data and frequency data. It explains how to use graphical methods for exploring data, spotting unusual features, visualizing fitted models, and presenting results.
Discrete Data Analysis with R is designed for advanced undergraduate and graduate students in the social and health sciences, epidemiology, economics, business, statistics, and biostatistics, as well as for researchers, methodologists, and consultants who can apply the methods to their own data and analyses. Along with describing the necessary statistical theory, the authors illustrate the practical application of the techniques to a large number of substantive problems, including how to organize data, conduct an analysis, produce informative graphs, and evaluate what the graphs reveal about the data.
The first part of Discrete Data Analysis with R contains introductory material on graphical methods for discrete data, basic R skills, and methods for fitting and visualizing one-way discrete distributions. The second part focuses on simple, traditional nonparametric tests and exploratory methods for visualizing patterns of association in two-way and larger frequency tables. The final part of the text discusses model-based methods for the analysis of discrete data.
Getting Started
Introduction
- Data visualization and categorical data: Overview
- What is categorical data?
- Strategies for categorical data analysis
- Graphical methods for categorical data
Working with Categorical Data
- Working with R data: vectors, matrices, arrays, and data frames
- Forms of categorical data: case form, frequency form, and table form
- Ordered factors and reordered tables
- Generating tables: table and xtabs
- Printing tables: structable and ftable
- Subsetting data
- Collapsing tables
- Converting among frequency tables and data frames
- A complex example: TV viewing data
Fitting and Graphing Discrete Distributions
- Introduction to discrete distributions
- Characteristics of discrete distributions
- Fitting discrete distributions
- Diagnosing discrete distributions: Ord plots
- Poissonness plots and generalized distribution plots
- Fitting discrete distributions as generalized linear models
Exploratory and Hypothesis-Testing Methods
Two-Way Contingency Tables
- Introduction
- Tests of association for two-way tables
- Stratified analysis
- Fourfold display for 2 × 2 tables
- Sieve diagrams
- Association plots
- Observer agreement
- Trilinear plots
Mosaic Displays for n-Way Tables
- Introduction
- Two-way tables
- The strucplot framework
- Three-way and larger tables
- Model and plot collections
- Mosaic matrices for categorical data
- 3D mosaics
- Visualizing the structure of loglinear models
- Related visualization methods
Correspondence Analysis
- Introduction
- Simple correspondence analysis
- Multi-way tables: Stacking and other tricks
- Multiple correspondence analysis
- Biplots for contingency tables
Model-Building Methods
Logistic Regression Models
- Introduction
- The logistic regression model
- Multiple logistic regression models
- Case studies
- Influence and diagnostic plots
Models for Polytomous Responses
- Ordinal response
- Nested dichotomies
- Generalized logit model
Loglinear and Logit Models for Contingency Tables
- Introduction
- Loglinear models for frequencies
- Fitting and testing loglinear models
- Equivalent logit models
- Zero frequencies
Extending Loglinear Models
- Models for ordinal variables
- Square tables
- Three-way and higher-dimensional tables
- Multivariate responses
Generalized Linear Models for Count Data
- Components of generalized linear models
- GLMs for count data
- Models for overdispersed count data
- Models for excess zero counts
- Case studies
- Diagnostic plots for model checking
- Multivariate response GLM models
A summary and lab exercises appear at the end of each chapter.
Michael Friendly is a professor of psychology, founding chair of the Graduate Program in Quantitative Methods, and an associate coordinator with the Statistical Consulting Service at York University. He earned a PhD in psychology from Princeton University, specializing in psychometrics and cognitive psychology. In addition to his research interests in psychology, Professor Friendly has broad experience in data analysis, statistics, and computer applications. His main research areas are the development of graphical methods for categorical and multivariate data and the history of data visualization. He is an associate editor of the Journal of Computational and Graphical Statistics and Statistical Science.
David Meyer is a professor of business informatics at the University of Applied Sciences Technikum Wien. He earned a PhD in business administration from the Vienna University of Economics and Business, with an emphasis on computational economics. Dr. Meyer has published numerous papers in various computer science and statistical journals. His research interests include R, business intelligence, data mining, and operations research.
"This is an excellent book, nearly encyclopedic in its coverage. I personally find it very useful and expect that many other readers will as well. The book can certainly serve as a reference. It could also serve as a supplementary text in a course on categorical data analysis that uses R for computation or – because so much statistical detail is provided – even as the main text for a course on the topic that emphasizes graphical methods."
– John Fox, McMaster University