Table of Contents
Chapter 1 Measurement and Scaling 1
1.1 Introduction 1
1.2 Definition of a test 1
1.3 Measurement and scaling 2
Exercises 7
Chapter 2 Classical Test Theory 9
2.1 Introduction 9
2.2 True score and measurement error 9
2.3 The population of persons 12
Exercises 14
Chapter 3 Classical Test Theory and Reliability 15
3.1 Introduction 15
3.2 The definition of reliability and the standard error of measurement 15
3.3 The definition of parallel tests 17
3.4 Reliability and test length 19
3.5 Reliability and group homogeneity 20
3.6 Estimating the true score 21
3.7 Correction for attenuation 23
Exercises 23
Chapter 4 Estimating Reliability 25
4.1 Introduction 25
4.2 Reliability estimation from a single administration of a test 26
4.3 Reliability estimation with parallel tests 36
4.4 Reliability estimation with the test-retest method 36
4.5 Reliability and factor analysis 37
4.6 Score profiles and estimation of true scores 37
4.7 Reliability and conditional errors of measurement 42
Exercises 44
Chapter 5 Generalizability Theory 47
5.1 Introduction 47
5.2 Basic concepts of G theory 48
5.3 One-facet designs, the p × i design and the i : p design 50
5.3.1 The crossed design 50
5.3.2 The nested i : p design 54
5.4 The two-facet crossed p × i × j design 55
5.5 An example of a two-facet crossed p × i × j design: The generalizability of job performance measurements 59
5.6 The two-facet nested p × (i : j) design 60
5.7 Other two-facet designs 62
5.8 Fixed facets 64
5.9 Kinds of measurement errors 67
5.10 Conditional error variance 73
5.11 Concluding remarks 74
Exercises 75
Chapter 6 Models for Dichotomous Items 79
6.1 Introduction 79
6.2 The binomial model 80
6.2.1 The binomial model in a homogeneous item domain 82
6.2.2 The binomial model in a heterogeneous item domain 87
6.3 The generalized binomial model 88
6.4 The generalized binomial model and item response models 91
6.5 Item analysis and item selection 92
Exercises 98
Chapter 7 Validity and Validation of Tests 101
7.1 Introduction 101
7.2 Validity and its sources of evidence 103
7.3 Selection effects in validation studies 106
7.4 Validity and classification 108
7.5 Selection and classification with more than one predictor 115
7.6 Convergent and discriminant validation: A strategy for evidence-based validity 118
7.6.1 The multitrait-multimethod approach 119
7.7 Validation and IRT 121
7.8 Research validity: Validity in empirical behavioral research 122
Exercises 123
Chapter 8 Principal Component Analysis, Factor Analysis, and Structural Equation Modeling: A Very Brief Introduction 125
8.1 Introduction 125
8.2 Principal component analysis (PCA) 125
8.3 Exploratory factor analysis 127
8.4 Confirmatory factor analysis and structural equation modeling 130
Exercises 132
Chapter 9 Item Response Models 133
9.1 Introduction 133
9.2 Basic concepts 134
9.2.1 The Rasch model 135
9.2.2 Two- and three-parameter logistic models 136
9.2.3 Other IRT models 139
9.3 The multivariate normal distribution and polytomous items 143
9.4 Item-test regression and item response models 146
9.5 Estimation of item parameters 148
9.6 Joint maximum likelihood estimation for item and person parameters 150
9.7 Joint maximum likelihood estimation and the Rasch model 151
9.8 Marginal maximum likelihood estimation 153
9.9 Markov chain Monte Carlo 154
9.10 Conditional maximum likelihood estimation in the Rasch model 156
9.11 More on the estimation of item parameters 157
9.12 Maximum likelihood estimation of person parameters 160
9.13 Bayesian estimation of person parameters 162
9.14 Test and item information 162
9.15 Model-data fit 167
9.16 Appendix: Maximum likelihood estimation of θ in the Rasch model 170
Exercises 174
Chapter 10 Applications of Item Response Theory 177
10.1 Introduction 177
10.2 Item analysis and test construction 179
10.3 Test construction and test development 180
10.4 Item bias or DIF 182
10.5 Deviant answer patterns 189
10.6 Computerized adaptive testing (CAT) 191
10.7 IRT and the measurement of change 194
10.8 Concluding remarks 195
Exercises 197
Chapter 11 Test Equating 199
11.1 Introduction 199
11.2 Some basic data collection designs for equating studies 202
11.2.1 Design 1: Single-group design 202
11.2.2 Design 2: Random-groups design 203
11.2.3 Design 3: Anchor-test design 203
11.3 The equipercentile method 204
11.4 Linear equating 207
11.5 Linear equating with an anchor test 208
11.6 A synthesis of observed score equating approaches: The kernel method 212
11.7 IRT models for equating 212
11.7.1 The Rasch model 213
11.7.2 The 2PL model 214
11.7.3 The 3PL model 215
11.7.4 Other models 216
11.8 Concluding remarks 216
Exercises 219
Answers 221
References 235
Author Index 255
Subject Index 261