articles in research journals where the authors describe the results of SEM analyses. It
is also difficult to look through an issue of a research journal in psychology, education,
or other areas and not find at least one article that concerns SEM. Interest in SEM has
also expanded to other disciplines, including wildlife management (Grace, 2006, 2008),
communication sciences (Holbert & Stephenson, 2002), medical research (DiLalla,
2008), administrative pharmacy (Schreiber, 2008), and pediatric psychology (Nelson,
Aylward, & Steele, 2008), to name a few.
It is not hard to understand this growing interest in SEM. As described by David
Kenny in the Series Editor’s Note in the previous edition of this book, researchers love
SEM because it addresses the questions they want answered and it “thinks” about
research the way researchers do. The brief description given earlier of the kinds of
hypotheses that can be tested in SEM only hints at its flexibility. However, there is
evidence that many—if not most—published reports of the application of SEM have at
least one flaw so serious that it compromises the scientific value of the article. MacCallum and Austin (2000) reviewed about 500 applications of SEM in 16 different psychology research journals and found reporting problems in many of them. For example, in about 50% of the articles, the reporting of parameter estimates was incomplete (e.g., unstandardized estimates were omitted); in about 25%, the type of data matrix analyzed (e.g., a correlation vs. a covariance matrix) was not described; and in about 10%, the model tested or the indicators of its factors were not clearly specified.
Shah and Goldstein (2006) reviewed 93 articles published in four management science journals. In a majority of these articles, it was difficult to determine the model actually tested or the complete set of observed variables. Along the same lines, in 31 of 143 analyses the model described in the text did not match the statistical results reported in the text or tables, and the method of estimation was not mentioned in about half of the articles.
The authors of both reviews found similar kinds of problems in their respective sets of articles. For example, MacCallum and Austin (2000) found that about 20% of studies used samples of fewer than 100 cases. Shah and Goldstein (2006) found that the N:q ratio (the ratio of cases to freely estimated model parameters) was less than 10:1 in about 70% of studies and less than 5:1 in about 30%. In both reviews, the typical author did not consider alternative models that might account for the same pattern of observed covariances just as well as the preferred model. Such alternative models are known as equivalent models (a simple example appears after this paragraph). Ignoring equivalent models is a form of confirmation bias whereby
researchers test a single model, give an overly positive evaluation of that model, and fail
to consider other explanations of the data (Shah & Goldstein, 2006). The potential for
confirmation bias is further strengthened by the relative lack of replication. Specifically,
most SEM studies are “one-shot” studies that do not involve cross-validation or a split-
sample approach. The need for large samples in SEM undoubtedly hinders the ability of
researchers to replicate their analyses. Whether the results reported in most SEM studies would replicate in independent samples is thus typically unknown.
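To make the idea of equivalent models concrete, here is a minimal illustration of our own (it is not taken from either review, and it assumes standardized variables). For three variables X, M, and Y, the causal chain X → M → Y, the reversed chain X ← M ← Y, and the common-cause model X ← M → Y all imply the same single constraint on the correlations:

\[
\rho_{XY} = \rho_{XM}\,\rho_{MY}
\]

Because all three models impose exactly this constraint and nothing else, they fit any observed correlation matrix equally well; fit statistics alone cannot tell them apart, and choosing among them requires theory or design.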
The problems just described—and others covered later—are serious, and they indicate that our collective enthusiasm about SEM has outstripped our good judgment about