Regression Analysis and Linear Models: Concepts, Applications, and Implementation

by Richard B. Darlington PhD, Andrew F. Hayes PhD

Emphasizing conceptual understanding over mathematics, this user-friendly text introduces linear regression analysis to students and researchers across the social, behavioral, consumer, and health sciences. Coverage includes model construction and estimation, quantification and measurement of multivariate and partial associations, statistical control, group comparisons, moderation analysis, mediation and path analysis, and regression diagnostics, among other important topics. Engaging worked-through examples demonstrate each technique, accompanied by helpful advice and cautions. The use of SPSS, SAS, and STATA is emphasized, with an appendix on regression analysis using R. The companion website provides datasets for the book's examples as well as the RLM macro for SPSS and SAS.

Pedagogical Features:
*Chapters include SPSS, SAS, or STATA code pertinent to the analyses described, with each distinctively formatted for easy identification.
*An appendix documents the RLM macro, which facilitates computations for estimating and probing interactions, dominance analysis, heteroscedasticity-consistent standard errors, and linear spline regression, among other analyses.
*Students are guided to practice what they learn in each chapter using datasets provided online.
*Addresses topics not usually covered, such as ways to measure a variable’s importance, coding systems for representing categorical variables, causation, and myths about testing interaction.

Product Details

ISBN-13: 9781462527984
Publisher: Guilford Publications, Inc.
Publication date: 08/22/2016
Series: Methodology in the Social Sciences
Sold by: Barnes & Noble
Format: NOOK Book
Pages: 661
File size: 3 MB

About the Author

Richard B. Darlington, PhD, is Emeritus Professor of Psychology at Cornell University. He is a Fellow of the American Association for the Advancement of Science and has published extensively on regression and related methods, the cultural bias of mental tests, the long-term effects of preschool programs, and, most recently, the neuroscience of brain development and evolution.

Andrew F. Hayes, PhD, is Distinguished Research Professor in the Haskayne School of Business at the University of Calgary, Alberta, Canada. His research and writing on data analysis has been published widely. Dr. Hayes is the author of Introduction to Mediation, Moderation, and Conditional Process Analysis and Statistical Methods for Communication Science, as well as coauthor, with Richard B. Darlington, of Regression Analysis and Linear Models. He teaches data analysis, primarily at the graduate level, and frequently conducts workshops on statistical analysis throughout the world. His website is

Read an Excerpt


Statistical Control and Linear Models

Researchers routinely ask questions about the relationship between an independent variable and a dependent variable in a research study. In experimental studies, relationships observed between a manipulated independent variable and a measured dependent variable are fairly easy to interpret. But in many studies, experimental control in the form of random assignment is not possible. Absent experimental or some form of procedural control, relationships between variables can be difficult to interpret but can be made more interpretable through statistical control. After discussing the need for statistical control, this chapter overviews the linear model — widely used throughout the social sciences, health and medical fields, business and marketing, and countless other disciplines. Linear modeling has many uses, among them being a means of implementing statistical control.

1.1 Statistical Control

1.1.1 The Need for Control

If you have ever described a piece of research to a friend, it was probably not very long before you were asked a question like "But did the researchers account for this?" If the research found a difference between the average salaries of men and women in a particular industry, did it account for differences in years of employment? If the research found differences among several ethnic groups in attitudes toward social welfare spending, did it account for income differences among the groups? If the research found that males who hold relatively higher-status jobs are seen as less physically attractive by females than are males in lower-status jobs, did it account for age differences among men who differ in status?

All these studies concern the relationship between an independent variable and a dependent variable. The study on salary differences concerns the relationship between the independent variable of sex and the dependent variable of salary. The study on welfare spending concerns the relationship between the independent variable of ethnicity and the dependent variable of attitude. The study on perceived male attractiveness concerns the relationship between the independent variable of status and the dependent variable of perceived attractiveness. In each case, there is a need to account for, in some way, a third variable; this third variable is called a covariate. The covariates for the three studies are, respectively, years of employment, income, and age.

Suppose you wanted to study these three relationships without worrying about covariates. You may be familiar with three very different statistical methods for analyzing these three problems. You may have studied the t-test for testing questions like the sex difference in salaries, analysis of variance (also known as "ANOVA") for questions like the difference in average attitude among several ethnic groups, and the Pearson or rank-order correlation for questions like the relationship between status and perceived attractiveness. These three methods are all similar in that they can all be used to test the relationship between an independent variable and a dependent variable; they differ primarily in the type of independent variable used. For sex differences in salary you could use the t-test because the independent variable, sex, is dichotomous; there are two categories, male and female. In the example on welfare spending, you could use analysis of variance because the independent variable of ethnicity is multicategorical, since there are several categories rather than just two (the various ethnic groups in the study). You could use a correlation coefficient for the example about perceived attractiveness because status is numerical: a more or less continuous dimension from high status to low status. But for our purposes, the differences among these three variable types are relatively minor. You should begin thinking of problems like these as basically similar, as this book presents the linear model as a single method that can be applied to all of these problems and many others with fairly minor variations in the method.

1.1.2 Five Methods of Control

The layperson's notion of "accounting for" something in a study is a colloquial expression for what scientists refer to as controlling for that something. Suppose you want to know whether driver training courses help students pass driving tests. One problem is that the students who take a driver training course may differ in some way before taking the course from those who do not take the course. If that thing they differ on is related to test performance, then any differences in test performance may be due to that thing rather than the training course itself. This needs to be accounted for or "controlled" in some fashion in order to determine whether the course helps students pass the test. Or perhaps in a particular town, some testers may be easier than others. The driving schools may know which testers are easiest and encourage their students to take their tests when they know those testers are on duty. So the standards being used to evaluate a student driver during the test may be systematically different for students who take the driver training course relative to those who do not. This also needs to be controlled in some fashion.

You might control the problem caused by preexisting difference between those who do and do not take the course by using a list of applicants for driving courses, randomly choosing which of the applicants is allowed to take the course, and using the rejected applicants as the control group. That way you know that students are likely to be equal on all things that might be related to performance on the test before the course begins. This is random assignment on the independent variable. Or, if you find that more women take the course than men, you might construct a sample that is half female and half male for both the trained and untrained groups by discarding some of the women in the available data. This is control by exclusion of cases.
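As a rough illustration of random assignment on the independent variable, the sketch below shuffles a list of applicants and lets chance alone decide who takes the course. The applicant IDs, the group size, and the function name randomly_assign are hypothetical and are not taken from the book.

```python
import random

def randomly_assign(applicants, n_treated, seed=None):
    """Split applicants into a course group and a control group by chance alone."""
    rng = random.Random(seed)      # a seed makes the illustration repeatable
    shuffled = list(applicants)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_treated], shuffled[n_treated:]

# Hypothetical applicant list; real IDs would come from the driving school.
applicants = [f"applicant_{i:02d}" for i in range(1, 21)]
course_group, control_group = randomly_assign(applicants, n_treated=10, seed=1)

print("course: ", sorted(course_group))
print("control:", sorted(control_group))
```

Because no characteristic of an applicant influences the shuffle, preexisting differences between the two groups can arise only by chance.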

You might control the problem of differential testing standards by training testers to make them apply uniform evaluation standards; that would be manipulation of covariates. Or you might control that problem by randomly altering the schedule different testers work, so that nobody would know which testers are on duty at a particular moment. That would not be random assignment on the independent variable, since you have not determined which applicants take the course; rather, it would be other types of randomization. This includes randomly assigning which of two or more forms of the dependent variable you use, choosing stimuli from a population of stimuli (e.g., in a psycholinguistics study, all common English adjectives), and manipulating the order of presentation of stimuli.

All these methods except exclusion of cases are types of experimental control since they all require you to manipulate the situation in some way rather than merely observe it. But these methods are often impractical or impossible. For instance, you might not be allowed to decide which students take the driving course or to train testers or alter their schedules. Or, if a covariate is worker seniority, as in one of our earlier examples, you cannot manipulate the covariate by telling workers how long to keep their jobs. In the same example, the independent variable is sex, and you cannot randomly decide that a particular worker will be male or female the way you can decide whether the worker will be in the experimental or control condition of an experiment. Even when experimental control is possible, the very exertion of control often intrudes the investigator into the situation in a way that disturbs participants or alters results; ethologists and anthropologists are especially sensitive to such issues. Experimental control may be difficult even in laboratory studies on animals. Researchers may not be able to control how long a rat looks at a stimulus, but they are able to measure looking time.

Control by exclusion of cases avoids these difficulties, because you are manipulating data rather than participants. But this method lowers sample size, and thus lowers the precision of estimates and the power of hypothesis tests.

A fifth method of controlling covariates — statistical control — is one of the main topics of this book. It avoids the disadvantages of the previous four methods. No manipulation of participants or conditions is required, and no data are excluded. Several terms mean the same thing: to control a covariate statistically means the same as to adjust for it or to correct for it, or to hold constant or to partial out the covariate.

Statistical control has limitations. Scientists may disagree on what variables need to be controlled — an investigator who has controlled age, income, and ethnicity may be criticized for failing to control education and family size. And because covariates must be measured to be controlled, they will be controlled inaccurately if they are measured inaccurately. We return to these and other problems in Chapters 6 and 17. But because control of some covariates is almost always needed, and because the other four methods of control are so limited, statistical control is widely recognized as one of the most important statistical tools in the empiricist's toolbox.

1.1.3 Examples of Statistical Control

The nature of statistical control can be illustrated by a simple fictitious example, though the precise methods used in this example are not those we emphasize later. In Holly City, 130 children attended a city-subsidized preschool program and 130 others did not. Later, all 260 children took a "school readiness test" on entering first grade. Of the 130 preschool children, only 60 scored above the median on the test; of the other 130 children, 70 scored above the median. In other words, the preschool children scored worse on the test than the others. These results are shown in the "Total" section of Table 1.1; A and B refer to scoring above and below the test median, respectively.

But when the children are divided into "middle-class" and "working-class," the results are as shown on the left and center of Table 1.1. We see that of the 40 middle-class children attending preschool, 30, or 75%, scored above the median. There were 90 middle-class children not attending preschool, and 60, or 67%, of them scored above the median. These values of 75% and 67% are shown on the left in Table 1.2. Similar calculations based on the working-class and total tables yield the other figures in Table 1.2. This table shows clearly that within each level of socioeconomic status (SES), the preschool children outperform the other children, even though they appear to do worse when you ignore SES. We have held constant or controlled or partialed out the covariate of SES.
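The percentages for Holly City can be recomputed from the counts quoted in the text. In the sketch below, the total and middle-class counts are those stated above; the working-class counts are not quoted directly and are obtained by subtraction, which is an inference from the text rather than a copy of Table 1.1.

```python
# (above_median, group_size) counts for Holly City.
# Total and middle-class counts are quoted in the text; working-class
# counts are derived as total minus middle-class.
total = {"preschool": (60, 130), "other": (70, 130)}
middle = {"preschool": (30, 40), "other": (60, 90)}
working = {
    group: (total[group][0] - middle[group][0],
            total[group][1] - middle[group][1])
    for group in total
}

def pct_above(cell):
    """Percentage of a group scoring above the test median."""
    above, n = cell
    return 100 * above / n

for label, table in [("middle-class", middle),
                     ("working-class", working),
                     ("total", total)]:
    pre = pct_above(table["preschool"])
    oth = pct_above(table["other"])
    print(f"{label:>13}: preschool {pre:.0f}%  other {oth:.0f}%")
```

Within each SES level the preschool percentage is the higher one (75% versus 67%, and 33% versus 25%), yet in the combined table it is the lower one (46% versus 54%).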

When we perform a similar analysis for nearby Ivy City, we find the results in Table 1.3. When we inspect the total percentages, preschool appears to have a positive effect. But when we look within each SES group, no effect is found. Thus, the "total" tables overstate the effect of preschool in Ivy City and understate it in Holly City. In these examples the independent variable is preschool attendance and the dependent variable is test score. In Holly City, we found a negative simple relationship between these two variables (those attending preschool scored lower on the test) but a positive partial relationship (a term more formally defined later) when SES was controlled. In Ivy City, we found a positive simple relationship but no partial relationship.

By examining the data more carefully, we can see what caused these paradoxical results, known as Simpson's paradox (for a discussion of this and related phenomena, see Tu, Gunnell, & Gilthorpe, 2008). In Holly City, the 130 children attending preschool included 90 working-class children and 40 middle-class children, so 69% of the preschool attenders were working-class. But the 130 nonpreschool children included 90 middle-class children and 40 working-class children, so this group was only 31% working-class. Thus, the test scores of the preschool group were lowered by the disproportionate number of working-class children in that group. This might have occurred if city-subsidized preschool programs had been established primarily in poorer neighborhoods. But in Ivy City this difference was in the opposite direction: The preschool group was 75% middle-class, while the nonpreschool group was only 25% middle-class; thus, the test scores of the preschool group were raised by the disproportionate number of middle-class children. This might have occurred if parents had to pay for their children to attend preschool. In both cities the effects of preschool were seen more clearly by controlling for or holding constant SES.

All three variables in this example were dichotomous: they had just two levels each. The independent variable of preschool attendance had two levels we called "preschool" and "other." The dependent variable of test score was dichotomized into those above and below the median. The covariate of SES was also dichotomized. Such dichotomization is rarely if ever something you would want to do in practice (as discussed later in section 5.1.6). Fortunately, with the methods described in this book, such categorization is not necessary. Any or all of the variables in this problem could have been numerically scaled. Test scores might have ranged from 0 to 100, and SES might have been measured on a scale with very many points on a continuum. Even preschool attendance might have been numerical, such as if we measured the exact number of days each child had attended preschool. Changing some or all variables from dichotomous to numerical would change the details of the analysis, but in its underlying logic the problem would remain the same.

Consider now a problem in which the dependent variable is numerical. At Swamp College, the dean calculated that among professors and other instructional staff under 30 years of age, the average salary among males was $81,000 and the average salary among females was only $69,000. To see whether this difference might be attributed to different proportions of men and women who have completed the Ph.D., the dean made up the table given here as Table 1.4.

If the dean had hoped that different rates of completion of the Ph.D. would explain the $12,000 difference between men and women in average salary, that hope was frustrated. We see that men had completed the Ph.D. less often than women: 10 of 40 men, versus 15 of 30 women. The first column of the table shows that among instructors with a Ph.D., the mean difference in salaries between men and women is $15,000. The second column shows the same difference of $15,000 among instructors with no Ph.D. Therefore, in this artificial example, controlling for completion of the Ph.D. does not lower the difference between the mean salaries of men and women, but rather raises it from $12,000 to $15,000.
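Table 1.4 is not reproduced in this excerpt, but the figures quoted in the text are enough to pin down its four cell means. The sketch below solves for them; it assumes the quoted means and the $15,000 within-group difference are exact, and the resulting values are a derivation from the text, not a copy of the original table.

```python
from fractions import Fraction

# Figures quoted in the text (salaries in thousands of dollars).
men_n = {"phd": 10, "no_phd": 30}      # 10 of 40 men hold a Ph.D.
women_n = {"phd": 15, "no_phd": 15}    # 15 of 30 women hold a Ph.D.
men_mean, women_mean = 81, 69          # overall mean salary by sex
gap = 15                               # men minus women, within each Ph.D. group

# Unknowns: a = mean for men with a Ph.D., b = mean for men without one.
# The women's cell means are then a - gap and b - gap, so:
#   men:    10*a + 30*b = 81 * 40
#   women:  15*a + 15*b = 69 * 30 + 15 * 30
a11, a12 = men_n["phd"], men_n["no_phd"]
a21, a22 = women_n["phd"], women_n["no_phd"]
r1 = men_mean * sum(men_n.values())
r2 = (women_mean + gap) * sum(women_n.values())

det = Fraction(a11 * a22 - a12 * a21)
a = (r1 * a22 - a12 * r2) / det        # Cramer's rule for a 2x2 system
b = (a11 * r2 - r1 * a21) / det

cell_means = {
    ("male", "phd"): a, ("male", "no_phd"): b,
    ("female", "phd"): a - gap, ("female", "no_phd"): b - gap,
}
for (sex, degree), mean in cell_means.items():
    print(f"{sex:>6} {degree:>6}: ${int(mean) * 1000:,}")
```

Under these assumptions the implied cell means are $90,000 and $78,000 for men with and without the Ph.D., and $75,000 and $63,000 for women. The overall gap of $12,000 is smaller than the within-group gap of $15,000 because a larger share of the women hold the better-paid Ph.D.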

This example differs from the preschool example in its mechanical details; we are dealing with means rather than frequencies and proportions. But the underlying logic is the same. In the present case, the independent variable is sex, the dependent variable is salary, and the covariate is educational level. Again, the partial relationship differs from the simple relationship, though this time both the simple and partial relationships have the same sign, meaning that men make more than women, with or without controlling for education.

1.2 An Overview of Linear Models

The examples presented in section 1.1.3 are so simple that you may be wondering why a whole book is needed to discuss statistical control. But when the covariate is numerical, it may be that no two participants in a study have the same measurement on the covariate and so we cannot construct tables like those in the two earlier examples. And we may want to control many covariates at once; the dean might want to simultaneously control teaching ratings and other covariates as well as completion of the Ph.D. Also, we need methods for inference about partial relationships such as hypothesis testing procedures and confidence intervals. Linear modeling, the topic of this book, offers a means of accomplishing all of these things and many others.
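To give a first taste of how a linear model implements statistical control, the sketch below fits two least-squares regressions to data modeled on the Swamp College example. The cell means used (90, 78, 75, and 63 thousand dollars) are the values implied by the figures quoted in section 1.1.3, and giving every instructor in a cell exactly that cell's mean is an illustrative simplification. The helper functions solve and ols are hypothetical names written for this sketch; they are not the book's software or its RLM macro.

```python
from fractions import Fraction

def solve(A, y):
    """Solve the square system A x = y by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# (male, phd, salary in $1000s, number of instructors) for each cell.
cells = [(1, 1, 90, 10), (1, 0, 78, 30), (0, 1, 75, 15), (0, 0, 63, 15)]
rows = [(male, phd, salary) for male, phd, salary, n in cells for _ in range(n)]
y = [salary for _, _, salary in rows]

# Simple relationship: salary on sex alone (constant, male).
simple = ols([[1, male] for male, _, _ in rows], y)
# Partial relationship: salary on sex, controlling for the Ph.D. (constant, male, phd).
partial = ols([[1, male, phd] for male, phd, _ in rows], y)

print("simple: ", [float(c) for c in simple])
print("partial:", [float(c) for c in partial])
```

The coefficient for sex is 12 in the first model and 15 in the second, matching the simple and partial differences found from the table. The same approach extends to numerical covariates and to several covariates at once, where tables of this kind can no longer be built.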


Excerpted from "Regression Analysis and Linear Models"
by Richard B. Darlington and Andrew F. Hayes.
Copyright © 2017 The Guilford Press.
Excerpted by permission of The Guilford Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

List of Symbols and Abbreviations 
1. Statistical Control and Linear Models
1.1 Statistical Control
1.1.1 The Need for Control
1.1.2 Five Methods of Control
1.1.3 Examples of Statistical Control
1.2 An Overview of Linear Models
1.2.1 What You Should Know Already
1.2.2 Statistical Software for Linear Modeling and Statistical Control
1.2.3 About Formulas
1.2.4 On Symbolic Representations
1.3 Chapter Summary
2. The Simple Regression Model
2.1 Scatterplots and Conditional Distributions
2.1.1 Scatterplots
2.1.2 A Line through Conditional Means
2.1.3 Errors of Estimate
2.2 The Simple Regression Model
2.2.1 The Regression Line
2.2.2 Variance, Covariance, and Correlation
2.2.3 Finding the Regression Line
2.2.4 Example Computations
2.2.5 Linear Regression Analysis by Computer
2.3 The Regression Coefficient versus the Correlation Coefficient
2.3.1 Properties of the Regression and Correlation Coefficients
2.3.2 Uses of the Regression and Correlation Coefficients
2.4 Residuals
2.4.1 The Three Components of Y
2.4.2 Algebraic Properties of Residuals
2.4.3 Residuals as Y Adjusted for Differences in X
2.4.4 Residual Analysis
2.5 Chapter Summary
3. Partial Relationship and the Multiple Regression Model
3.1 Regression Analysis with More Than One Predictor Variable
3.1.1 An Example
3.1.2 Regressors
3.1.3 Models
3.1.4 Representing a Model Geometrically
3.1.5 Model Errors
3.1.6 An Alternative View of the Model
3.2 The Best-Fitting Model
3.2.1 Model Estimation with Computer Software
3.2.2 Partial Regression Coefficients
3.2.3 The Regression Constant
3.2.4 Problems with Three or More Regressors
3.2.5 The Multiple Correlation R
3.3 Scale-Free Measures of Partial Association
3.3.1 Semipartial Correlation
3.3.2 Partial Correlation
3.3.3 The Standardized Regression Coefficient
3.4 Some Relations among Statistics
3.4.1 Relations among Simple, Multiple, Partial, and Semipartial Correlations
3.4.2 Venn Diagrams
3.4.3 Partial Relationships and Simple Relationships May Have Different Signs
3.4.4 How Covariates Affect Regression Coefficients
3.4.5 Formulas for bj, prj, srj, and R
3.5 Chapter Summary
4. Statistical Inference in Regression
4.1 Concepts in Statistical Inference
4.1.1 Statistics and Parameters
4.1.2 Assumptions for Proper Inference
4.1.3 Expected Values and Unbiased Estimation
4.2 The ANOVA Summary Table
4.2.1 Data = Model + Error
4.2.2 Total and Regression Sums of Squares
4.2.3 Degrees of Freedom
4.2.4 Mean Squares
4.3 Inference about the Multiple Correlation
4.3.1 Biased and Less Biased Estimation of TR2
4.3.2 Testing a Hypothesis about TR
4.4 The Distribution of and Inference about a Partial Regression Coefficient
4.4.1 Testing a Null Hypothesis about Tbj
4.4.2 Interval Estimates for Tbj
4.4.3 Factors Affecting the Standard Error of bj
4.4.4 Tolerance
4.5 Inferences about Partial Correlations
4.5.1 Testing a Null Hypothesis about Tprj and Tsrj
4.5.2 Other Inferences about Partial Correlations
4.6 Inferences about Conditional Means
4.7 Miscellaneous Issues in Inference
4.7.1 How Great a Drawback Is Collinearity?
4.7.2 Contradicting Inferences
4.7.3 Sample Size and Nonsignificant Covariates
4.7.4 Inference in Simple Regression (When k = 1)
4.8 Chapter Summary
5. Extending Regression Analysis Principles
5.1 Dichotomous Regressors
5.1.1 Indicator or Dummy Variables
5.1.2 Y Is a Group Mean
5.1.3 The Regression Coefficient for an Indicator Is a Difference
5.1.4 A Graphic Representation
5.1.5 A Caution about Standardized Regression Coefficients for Dichotomous Regressors
5.1.6 Artificial Categorization of Numerical Variables
5.2 Regression to the Mean
5.2.1 How Regression Got Its Name
5.2.2 The Phenomenon
5.2.3 Versions of the Phenomenon
5.2.4 Misconceptions and Mistakes Fostered by Regression to the Mean
5.2.5 Accounting for Regression to the Mean Using Linear Models
5.3 Multidimensional Sets
5.3.1 The Partial and Semipartial Multiple Correlation
5.3.2 What It Means If PR = 0 or SR = 0
5.3.3 Inference Concerning Sets of Variables
5.4 A Glance at the Big Picture
5.4.1 Further Extensions of Regression
5.4.2 Some Difficulties and Limitations
5.5 Chapter Summary
6. Statistical versus Experimental Control
6.1 Why Random Assignment?
6.1.1 Limitations of Statistical Control
6.1.2 The Advantage of Random Assignment
6.1.3 The Meaning of Random Assignment
6.2 Limitations of Random Assignment
6.2.1 Limitations Common to Statistical Control and Random Assignment
6.2.2 Limitations Specific to Random Assignment
6.2.3 Correlation and Causation
6.3 Supplementing Random Assignment with Statistical Control
6.3.1 Increased Precision and Power
6.3.2 Invulnerability to Chance Differences between Groups
6.3.3 Quantifying and Assessing Indirect Effects
6.4 Chapter Summary
7. Regression for Prediction
7.1 Mechanical Prediction and Regression
7.1.1 The Advantages of Mechanical Prediction
7.1.2 Regression as a Mechanical Prediction Method
7.1.3 A Focus on R Rather Than the Regression Weights
7.2 Estimating True Validity
7.2.1 Shrunken versus Adjusted R
7.2.2 Estimating TRS
7.2.3 Shrunken R Using Statistical Software
7.3 Selecting Predictor Variables
7.3.1 Stepwise Regression
7.3.2 All Subsets Regression
7.3.3 How Do Variable Selection Methods Perform?
7.4 Predictor Variable Configurations
7.4.1 Partial Redundancy (the Standard Configuration)
7.4.2 Complete Redundancy
7.4.3 Independence
7.4.4 Complementarity
7.4.5 Suppression
7.4.6 How These Configurations Relate to the Correlation between Predictors
7.4.7 Configurations of Three or More Predictors
7.5 Revisiting the Value of Human Judgment
7.6 Chapter Summary
8. Assessing the Importance of Regressors
8.1 What Does It Mean for a Variable to Be Important?
8.1.1 Variable Importance in Substantive or Applied Terms
8.1.2 Variable Importance in Statistical Terms
8.2 Should Correlations Be Squared?
8.2.1 Decision Theory
8.2.2 Small Squared Correlations Can Reflect Noteworthy Effects
8.2.3 Pearson’s r as the Ratio of a Regression Coefficient to Its Maximum Possible Value
8.2.4 Proportional Reduction in Estimation Error
8.2.5 When the Standard Is Perfection
8.2.6 Summary
8.3 Determining the Relative Importance of Regressors in a Single Regression Model
8.3.1 The Limitations of the Standardized Regression Coefficient
8.3.2 The Advantage of the Semipartial Correlation
8.3.3 Some Equivalences among Measures
8.3.4 Cohen’s f²
8.3.5 Comparing Two Regression Coefficients in the Same Model
8.4 Dominance Analysis
8.4.1 Complete and Partial Dominance
8.4.2 Example Computations
8.4.3 Dominance Analysis Using a Regression Program
8.5 Chapter Summary
9. Multicategorical Regressors
9.1 Multicategorical Variables as Sets
9.1.1 Indicator (Dummy) Coding
9.1.2 Constructing Indicator Variables
9.1.3 The Reference Category
9.1.4 Testing the Equality of Several Means
9.1.5 Parallels with Analysis of Variance
9.1.6 Interpreting Estimated Y and the Regression Coefficients
9.2 Multicategorical Regressors as or with Covariates
9.2.1 Multicategorical Variables as Covariates
9.2.2 Comparing Groups and Statistical Control
9.2.3 Interpretation of Regression Coefficients
9.2.4 Adjusted Means
9.2.5 Parallels with ANCOVA
9.2.6 More Than One Covariate
9.3 Chapter Summary
10. More on Multicategorical Regressors
10.1 Alternative Coding Systems
10.1.1 Sequential (Adjacent or Repeated Categories) Coding
10.1.2 Helmert Coding
10.1.3 Effect Coding
10.2 Comparisons and Contrasts
10.2.1 Contrasts
10.2.2 Computing the Standard Error of a Contrast
10.2.3 Contrasts Using Statistical Software
10.2.4 Covariates and the Comparison of Adjusted Means
10.3 Weighted Group Coding and Contrasts
10.3.1 Weighted Effect Coding
10.3.2 Weighted Helmert Coding
10.3.3 Weighted Contrasts
10.3.4 Application to Adjusted Means
10.4 Chapter Summary
11. Multiple Tests
11.1 The Multiple-Test Problem
11.1.1 An Illustration through Simulation
11.1.2 The Problem Defined
11.1.3 The Role of Sample Size
11.1.4 The Generality of the Problem
11.1.5 Do Omnibus Tests Offer “Protection”?
11.1.6 Should You Be Concerned about the Multiple-Test Problem?
11.2 The Bonferroni Method
11.2.1 Independent Tests
11.2.2 The Bonferroni Method for Nonindependent Tests
11.2.3 Revisiting the Illustration
11.2.4 Bonferroni Layering
11.2.5 Finding an “Exact” p-Value
11.2.6 Nonsense Values
11.2.7 Flexibility of the Bonferroni Method
11.2.8 Power of the Bonferroni Method
11.3 Some Basic Issues Surrounding Multiple Tests
11.3.1 Why Correct for Multiple Tests at All?
11.3.2 Why Not Correct for the Whole History of Science?
11.3.3 Plausibility and Logical Independence of Hypotheses
11.3.4 Planned versus Unplanned Tests
11.4 Summary
11.5 Chapter Summary
12. Nonlinear Relationships
12.1 Linear Regression Can Model Nonlinear Relationships
12.1.1 When Must Curves Be Fitted?
12.1.2 The Graphical Display of Curvilinearity
12.2 Polynomial Regression
12.2.1 Basic Principles
12.2.2 An Example
12.2.3 The Meaning of the Regression Coefficients for Lower-Order Regressors
12.2.4 Centering Variables in Polynomial Regression
12.2.5 Finding a Parabola’s Maximum or Minimum
12.3 Spline Regression
12.3.1 Linear Spline Regression
12.3.2 Implementation in Statistical Software
12.3.3 Polynomial Spline Regression
12.3.4 Covariates, Weak Curvilinearity, and Choosing Joints
12.4 Transformations of Dependent Variables or Regressors
12.4.1 Logarithmic Transformation
12.4.2 The Box–Cox Transformation
12.5 Chapter Summary
13. Linear Interaction
13.1 Interaction Fundamentals
13.1.1 Interaction as a Difference in Slope
13.1.2 Interaction between Two Numerical Regressors
13.1.3 Interaction versus Intercorrelation
13.1.4 Simple Linear Interaction
13.1.5 Representing Simple Linear Interaction with a Cross-product
13.1.6 The Symmetry of Interaction
13.1.7 Interaction as a Warped Surface
13.1.8 Covariates in a Regression Model with an Interaction
13.1.9 The Meaning of the Regression Coefficients
13.1.10 An Example with Estimation Using Statistical Software
13.2 Interaction Involving a Categorical Regressor
13.2.1 Interaction between a Dichotomous and a Numerical Regressor
13.2.2 The Meaning of the Regression Coefficients
13.2.3 Interaction Involving a Multicategorical and a Numerical Regressor
13.2.4 Inference When Interaction Requires More Than One Regression Coefficient
13.2.5 A Substantive Example
13.2.6 Interpretation of the Regression Coefficients
13.3 Interaction between Two Categorical Regressors
13.3.1 The 2 × 2 Design
13.3.2 Interaction between a Dichotomous and a Multicategorical Regressor
13.3.3 Interaction between Two Multicategorical Regressors
13.4 Chapter Summary
14. Probing Interactions and Various Complexities
14.1 Conditional Effects as Functions
14.1.1 When the Interaction Involves Dichotomous or Numerical Variables
14.1.2 When the Interaction Involves a Multicategorical Variable
14.2 Inference about a Conditional Effect
14.2.1 When the Focal Predictor and Moderator Are Numerical or Dichotomous
14.2.2 When the Focal Predictor or Moderator Is Multicategorical
14.3 Probing an Interaction
14.3.1 Examining Conditional Effects at Various Values of the Moderator
14.3.2 The Johnson–Neyman Technique
14.3.3 Testing versus Probing an Interaction
14.3.4 Comparing Conditional Effects
14.4 Complications and Confusions in the Study of Interactions
14.4.1 The Difficulty of Detecting Interactions
14.4.2 Confusing Interaction with Curvilinearity
14.4.3 How the Scaling of Y Affects Interaction
14.4.4 The Interpretation of Lower-Order Regression Coefficients When a Cross-Product Is Present
14.4.5 Some Myths about Testing Interaction
14.4.6 Interaction and Nonsignificant Linear Terms
14.4.7 Homogeneity of Regression in ANCOVA
14.4.8 Multiple, Higher-Order, and Curvilinear Interactions
14.4.9 Artificial Categorization of Continua
14.5 Organizing Tests on Interaction
14.5.1 Three Approaches to Managing Complications
14.5.2 Broad versus Narrow Tests
14.6 Chapter Summary
15. Mediation and Path Analysis
15.1 Path Analysis and Linear Regression
15.1.1 Direct, Indirect, and Total Effects
15.1.2 The Regression Algebra of Path Analysis
15.1.3 Covariates
15.1.4 Inference about the Total and Direct Effects
15.1.5 Inference about the Indirect Effect
15.1.6 Implementation in Statistical Software
15.2 Multiple Mediator Models
15.2.1 Path Analysis for a Parallel Mediation Model
15.2.2 Path Analysis for a Serial Mediation Model
15.3 Extensions, Complications, and Miscellaneous Issues
15.3.1 Causality and Causal Order
15.3.2 The Causal Steps Approach
15.3.3 Mediation of a Nonsignificant Total Effect
15.3.4 Multicategorical Independent Variables
15.3.5 Fixing Direct Effects to Zero
15.3.6 Nonlinear Effects
15.3.7 Moderated Mediation
15.4 Chapter Summary
16. Detecting and Managing Irregularities
16.1 Regression Diagnostics
16.1.1 Shortcomings of Eyeballing the Data
16.1.2 Types of Extreme Cases
16.1.3 Quantifying Leverage, Distance, and Influence
16.1.4 Using Diagnostic Statistics
16.1.5 Generating Regression Diagnostics with Computer Software
16.2 Detecting Assumption Violations
16.2.1 Detecting Nonlinearity
16.2.2 Detecting Non-Normality
16.2.3 Detecting Heteroscedasticity
16.2.4 Testing Assumptions as a Set
16.2.5 What about Nonindependence?
16.3 Dealing with Irregularities
16.3.1 Heteroscedasticity-Consistent Standard Errors
16.3.2 The Jackknife
16.3.3 Bootstrapping
16.3.4 Permutation Tests
16.4 Inference without Random Sampling
16.5 Keeping the Diagnostic Analysis Manageable
16.6 Chapter Summary
17. Power, Measurement Error, and Various Miscellaneous Topics
17.1 Power and Precision of Estimation
17.1.1 Factors Determining Desirable Sample Size
17.1.2 Revisiting the Standard Error of a Regression Coefficient
17.1.3 On the Effect of Unnecessary Covariates
17.2 Measurement Error
17.2.1 What Is Measurement Error?
17.2.2 Measurement Error in Y
17.2.3 Measurement Error in Independent Variables
17.2.4 The Biggest Weakness of Regression: Measurement Error in Covariates
17.2.5 Summary: The Effects of Measurement Error
17.2.6 Managing Measurement Error
17.3 An Assortment of Problems
17.3.1 Violations of the Basic Assumptions
17.3.2 Collinearity
17.3.3 Singularity
17.3.4 Specification Error and Overcontrol
17.3.5 Noninterval Scaling
17.3.6 Missing Data
17.3.7 Rounding Error
17.4 Chapter Summary
18. Logistic Regression and Other Linear Models
18.1 Logistic Regression
18.1.1 Measuring a Model’s Fit to Data
18.1.2 Odds and Logits
18.1.3 The Logistic Regression Equation
18.1.4 An Example with a Single Regressor
18.1.5 Interpretation of and Inference about the Regression Coefficients
18.1.6 Multiple Logistic Regression and Implementation in Computing Software
18.1.7 Measuring and Testing the Fit of the Model
18.1.8 Further Extensions
18.1.9 Discriminant Function Analysis
18.1.10 Using OLS Regression with a Dichotomous Y
18.2 Other Linear Modeling Methods
18.2.1 Ordered Logistic and Probit Regression
18.2.2 Poisson Regression and Related Models of Count Outcomes
18.2.3 Time Series Analysis
18.2.4 Survival Analysis
18.2.5 Structural Equation Modeling
18.2.6 Multilevel Modeling
18.2.7 Other Resources
18.3 Chapter Summary
A. The RLM Macro for SPSS and SAS
B. Linear Regression Analysis Using R
C. Statistical Tables
D. The Matrix Algebra of Linear Regression Analysis
Author Index
Subject Index
About the Authors


Graduate students and instructors in psychology, human development, education, sociology, public health, communication, and management; applied researchers who want to refresh their skills. Will serve as the main text for graduate-level courses such as Regression Analysis, Quantitative Methods and Analysis, Multivariate Statistics, Linear Modeling, and Multiple Regression, or as a supplementary text for courses such as Research Design and Experimental Design.
