# Factor Analysis: A Guide to Exploratory Factor Analysis (EFA)
You have a questionnaire with 30 items and you are wondering: do these 30 items actually measure three or four different things? Do some items cluster together? Can you reduce 30 variables to a smaller number of meaningful dimensions? That is what factor analysis is for.
Factor analysis is one of the most powerful psychometric techniques, but also one of the most frequently misapplied. Understanding its principles and prerequisites is the difference between a quality paper and statistical chaos.
## What Is Factor Analysis and Why Use It?
Factor analysis is a statistical method that identifies latent (hidden) constructs based on correlations among observed variables. In other words, it looks for structure in your data.
Imagine you have a questionnaire about attitudes toward technology in education with 24 items. Respondents rate each item on a scale from 1 to 5. When you look at the correlation matrix, you notice that some items cluster: items about the usefulness of technology correlate with each other, items about fear of technology correlate with each other, and items about ease of use correlate with each other.
Factor analysis formally identifies these groups and tells you: "Your 24 items reduce to 3 factors: perceived usefulness, technology anxiety, and perceived ease of use."
Two fundamental goals:
- Variable reduction (reduce 24 items to 3 factor scores)
- Identification of latent constructs (discover what your questionnaire actually measures)
## EFA vs CFA: When to Use Which?
These are two fundamentally different approaches to factor analysis, and it is important to know the distinction.
Exploratory factor analysis (EFA) is used when you do not know in advance how many factors exist or which items belong to which factor. You let the data "speak" and discover the structure based on statistical criteria.
Use EFA when:
- You are developing a new questionnaire and testing its structure
- You are translating a questionnaire into another language and want to check whether the structure replicates
- You do not have a clear theoretical basis for the expected structure
Confirmatory factor analysis (CFA) is used when you have a precise hypothesis about how many factors exist and which items belong to which factor. You test whether the data fit your model.
Use CFA when:
- You are validating an existing questionnaire on a new sample
- You are testing a theoretical model
- You are comparing two or more alternative structural models
In practice, a good strategy is: EFA on one sample, then CFA on another (or split your sample in half).
## Prerequisites for Factor Analysis
Before running EFA, you need to check three things.
### 1. Sample Size
There are multiple rules of thumb for minimum sample size:
- The 5:1 rule means you need at least 5 participants per item. For a 24-item questionnaire, that is a minimum of 120 participants.
- The 10:1 rule is more conservative and is recommended when communalities are low.
- An absolute minimum of 100 participants applies regardless of the number of items.
For more detailed guidance on planning your sample size, see the article on determining how many participants you need.
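As a quick sanity check, these rules of thumb are easy to encode. A minimal sketch (the function name `min_sample` is mine, not a standard API):

```python
def min_sample(n_items: int, ratio: int = 5, floor: int = 100) -> int:
    """Larger of the participants-per-item rule and the absolute floor of 100."""
    return max(ratio * n_items, floor)

print(min_sample(24))            # 5:1 rule for 24 items -> 120
print(min_sample(24, ratio=10))  # stricter 10:1 rule -> 240
print(min_sample(12))            # the floor of 100 kicks in -> 100
```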
### 2. The KMO Test (Kaiser-Meyer-Olkin)
The KMO measure of sampling adequacy indicates how suitable your variables are for factor analysis. KMO ranges from 0 to 1.
| KMO Value | Interpretation |
|---|---|
| < 0.50 | Unacceptable |
| 0.50 - 0.59 | Poor |
| 0.60 - 0.69 | Mediocre |
| 0.70 - 0.79 | Good |
| 0.80 - 0.89 | Very good |
| >= 0.90 | Excellent |
Minimum for factor analysis: KMO > 0.60. If KMO falls below this threshold, your variables do not share enough common variance and factor analysis is not justified.
### 3. Bartlett's Test of Sphericity
This test checks whether the correlation matrix is sufficiently different from an identity matrix (a matrix in which all correlations are zero). If Bartlett's test is not significant (p > .05), it means your variables are not sufficiently correlated for factor analysis.
Requirement: p < .05. In practice, this test is almost always significant with a reasonable sample size, so KMO is generally considered the more informative indicator.
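Both prerequisites can be computed directly from the correlation matrix. The sketch below is an illustrative numpy implementation of the standard formulas (the toy correlation matrix and function names are mine); statistical packages report the same quantities:

```python
import numpy as np

def kmo(R: np.ndarray) -> float:
    """Overall KMO measure of sampling adequacy from a correlation matrix R."""
    Rinv = np.linalg.inv(R)
    # Partial correlations: -Rinv[i, j] / sqrt(Rinv[i, i] * Rinv[j, j])
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    P = -Rinv / d
    off = ~np.eye(len(R), dtype=bool)            # off-diagonal mask
    r2, p2 = np.sum(R[off] ** 2), np.sum(P[off] ** 2)
    return r2 / (r2 + p2)

def bartlett(R: np.ndarray, n: int):
    """Bartlett's sphericity statistic and degrees of freedom for sample size n."""
    p = len(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return chi2, p * (p - 1) // 2  # refer chi2 to the chi-square distribution

# Toy correlation matrix: two blocks of 3 items (r = .5 within, .2 between)
R = np.full((6, 6), 0.2)
R[:3, :3] = R[3:, 3:] = 0.5
np.fill_diagonal(R, 1.0)
print(round(kmo(R), 2), bartlett(R, n=200))
```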
## Extraction Method: PCA vs PAF
This is where the confusion that has puzzled students and researchers for years begins.
### Principal Components Analysis (PCA)
PCA is a data reduction technique. It transforms variables into linear combinations (components) that explain maximum total variance. PCA does not assume latent constructs and does not differentiate between shared and unique variance.
### Principal Axis Factoring (PAF)
PAF is a true factor analysis method. It assumes that observed variables are indicators of latent constructs and attempts to identify the shared variance among variables.
The key difference: PCA analyzes total variance (including measurement error), while PAF analyzes only shared variance.
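The difference is easy to see in code: PCA eigendecomposes the correlation matrix as-is (diagonal of 1s, i.e. total variance), while PAF first replaces the diagonal with communality estimates and iterates. A minimal numpy sketch, with a toy matrix and helper names of my own choosing:

```python
import numpy as np

def pca_loadings(R, k):
    """PCA: eigendecompose R as-is; the 1s on the diagonal mean total variance."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(vals[order])

def paf_loadings(R, k, iters=50):
    """PAF: put communality estimates on the diagonal, so only shared variance is analyzed."""
    h = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial estimates: squared multiple correlations
    Rh = R.copy()
    for _ in range(iters):
        np.fill_diagonal(Rh, h)
        vals, vecs = np.linalg.eigh(Rh)
        order = np.argsort(vals)[::-1][:k]
        L = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        h = np.sum(L ** 2, axis=1)             # update communalities from the solution
    return L

# Toy correlation matrix: two blocks of 3 items (r = .6 within, .1 between)
R = np.full((6, 6), 0.1)
R[:3, :3] = R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)
print(np.round(np.sum(pca_loadings(R, 2) ** 2, axis=1), 2))  # PCA communalities
print(np.round(np.sum(paf_loadings(R, 2) ** 2, axis=1), 2))  # PAF communalities (smaller)
```

On the same data, the PCA communalities come out larger because the unique variance is folded in, which is exactly why PCA loadings tend to look more flattering.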
### Which Method to Choose?
- If you want data reduction without theoretical assumptions: PCA
- If you are looking for latent constructs and developing theory: PAF
- For psychometric purposes (questionnaire development): PAF is the better choice
## Rotation: Varimax vs Oblimin
After extraction, factors are rotated to make them more interpretable. There are two categories of rotation.
### Orthogonal Rotation (Varimax)
Varimax assumes that factors are independent (uncorrelated). It rotates toward "simple structure": ideally, each variable ends up with a high loading on one factor and near-zero loadings on the others.
Use Varimax when:
- You theoretically expect the constructs to be independent
- You want a simpler structure for interpretation
- You are running an exploratory analysis without clear hypotheses
### Oblique Rotation (Oblimin)
Oblimin allows factors to be correlated. This is a more realistic approach because in psychology and the social sciences, constructs are rarely completely independent (e.g., anxiety and depression are correlated).
Use Oblimin when:
- You expect factors to be correlated
- You are working with psychological constructs that naturally overlap
- You want a more realistic picture of the data structure
Practical tip: Run both rotations. If Oblimin shows inter-factor correlations below .32, the results will be nearly identical to Varimax, and you can use the simpler Varimax solution.
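For the curious, the varimax criterion itself fits in a few lines of numpy. This is an illustrative sketch of the classic SVD-based algorithm, not the exact routine any particular package uses:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix L toward simple structure (varimax)."""
    p, k = L.shape
    R = np.eye(k)        # accumulated rotation matrix (stays orthogonal)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag(np.sum(LR ** 2, axis=0)))
        )
        R = u @ vt
        d_new = np.sum(s)
        if d != 0.0 and d_new / d < 1 + tol:  # criterion stopped improving
            break
        d = d_new
    return L @ R

# Unrotated two-factor loadings for four hypothetical items
L = np.array([[0.7, 0.5], [0.6, 0.5], [0.6, -0.5], [0.7, -0.6]])
L_rot = varimax(L)
# An orthogonal rotation leaves each item's communality (row sum of squares) unchanged
print(np.allclose(np.sum(L ** 2, axis=1), np.sum(L_rot ** 2, axis=1)))
```

The invariance of the communalities is a handy check: rotation only redistributes loadings across factors, it never changes how much of an item's variance is explained.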
## How to Determine the Number of Factors
This is one of the most important decisions in factor analysis, and unfortunately, there is no single correct answer. Three criteria are commonly used.
### 1. Kaiser's Criterion (Eigenvalue > 1)
Retain only factors with an eigenvalue greater than 1. This is the most commonly used criterion, but also the most criticized because it tends to overestimate the number of factors.
### 2. Scree Plot (Cattell's Scree Test)
Eigenvalues are plotted on a graph, and you look for the "elbow" (the point where the curve sharply changes slope). Factors above the elbow are retained. The problem is that identifying the elbow is subjective.
### 3. Parallel Analysis (Horn)
This is the most objective criterion. A large number of random matrices of the same dimensions as your data are generated, eigenvalues are computed for each, and only factors whose eigenvalues exceed the average eigenvalues from the random matrices are retained.
Recommendation: Use all three criteria and look for convergence. If Kaiser says 4, the scree plot says 3 or 4, and parallel analysis says 3, you probably have 3 factors.
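Parallel analysis is straightforward to sketch in numpy; the simulated two-factor dataset and the function name below are mine, for illustration only:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Count factors whose observed eigenvalues exceed the random-data average."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_mean = np.zeros(p)
    for _ in range(n_sims):
        noise = rng.standard_normal((n, p))  # random data of the same dimensions
        random_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    random_mean /= n_sims
    return int(np.sum(observed > random_mean))

# Simulated responses with a clear two-factor structure (300 'participants', 8 items)
rng = np.random.default_rng(42)
latent = rng.standard_normal((300, 2))
loadings = np.zeros((8, 2))
loadings[:4, 0] = loadings[4:, 1] = 0.8
data = latent @ loadings.T + 0.4 * rng.standard_normal((300, 8))
print(parallel_analysis(data))
```

With this strong simulated structure the procedure retains two factors, matching the way the data were generated.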
## How to Interpret Factor Loadings
A factor loading is the correlation between an item and a factor (strictly speaking, this holds for orthogonal solutions; with oblique rotation, the pattern matrix contains regression-like weights instead). The higher the loading, the better the item serves as an indicator of that factor.
Guidelines:
- \> 0.70 = excellent loading
- 0.55 - 0.70 = good loading
- 0.40 - 0.55 = acceptable loading
- < 0.40 = the item does not belong to the factor (consider removing it)
Cross-loadings: If an item has a loading above 0.40 on two or more factors, that is problematic. Such an item is "ambiguous" and is typically removed from the questionnaire.
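A simple loop over the pattern matrix can flag both problem types: items with no salient loading and items with cross-loadings. A hypothetical example, using the .40 cutoff from the guidelines above:

```python
import numpy as np

def flag_items(loadings, item_names, cutoff=0.40):
    """Flag items with no salient loading or with loadings >= cutoff on 2+ factors."""
    flags = {}
    for name, row in zip(item_names, np.abs(loadings)):
        salient = np.sum(row >= cutoff)
        if salient == 0:
            flags[name] = "no salient loading"
        elif salient > 1:
            flags[name] = "cross-loading"
    return flags

# Hypothetical pattern matrix for four items on two factors
L = np.array([[0.78, 0.05],
              [0.45, 0.48],   # loads on both factors
              [0.10, 0.72],
              [0.25, 0.31]])  # loads on neither
print(flag_items(L, ["item1", "item2", "item3", "item4"]))
# -> {'item2': 'cross-loading', 'item4': 'no salient loading'}
```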
## Practical Example: Attitudes Toward Technology in Education
Suppose you are developing a questionnaire about attitudes toward technology use in education. You started with 24 items and collected data from 250 students.
### Step 1: Prerequisites
- KMO = 0.87 (very good)
- Bartlett's test: chi-square(276) = 2341.5, p < .001 (significant)
- Sample of 250 for 24 items = 10.4:1 ratio (excellent)
### Step 2: Extraction (PAF) and Determining the Number of Factors
- Kaiser: 4 factors with eigenvalue > 1
- Scree plot: elbow at 3 or 4 factors
- Parallel analysis: 3 factors
- Decision: 3 factors
### Step 3: Rotation (Oblimin)
| Item | F1: Usefulness | F2: Anxiety | F3: Ease |
|---|---|---|---|
| Technology improves the quality of teaching | .78 | .05 | .12 |
| Students learn better with digital tools | .73 | -.08 | .15 |
| I feel nervous when using new applications | .02 | .81 | -.10 |
| I am afraid of making mistakes on a computer | -.05 | .76 | -.14 |
| I easily master new technologies | .11 | -.12 | .72 |
| I intuitively understand how software works | .08 | -.06 | .69 |
### Step 4: Interpretation
- Factor 1 (Perceived Usefulness): items about how useful technology is for learning
- Factor 2 (Technology Anxiety): items about fear and discomfort when using technology
- Factor 3 (Perceived Ease of Use): items about how easy it is to use technology
The three factors together explain 58.3% of total variance, which is acceptable for the social sciences (50-60% is considered good).
Once you establish the factor structure, the next step is checking the reliability of each subscale. For that, see the guide on Cronbach's alpha coefficient, which is the standard measure of internal consistency.
## Common Mistake
Using PCA and calling it "factor analysis."
This is so widespread that many researchers do not realize they are making an error. PCA (Principal Components Analysis) and FA (Factor Analysis) are mathematically different procedures with different assumptions.
PCA looks for linear combinations of variables that explain maximum total variance. FA looks for latent constructs that explain shared variance.
Why does this matter? If you write "factor analysis was conducted" in your paper but actually used PCA, a knowledgeable reviewer will flag it. More importantly, PCA typically produces higher factor loadings and can create a false impression of item quality.
How to avoid this mistake:
- In SPSS: when choosing extraction, explicitly select "Principal Axis Factoring" instead of "Principal Components"
- In your paper: clearly state the extraction method, rotation type, and criterion for the number of factors
- If you use PCA, be honest and write "principal components analysis," not "factor analysis"
## Reporting in APA Format
When reporting factor analysis, be sure to include:
- Extraction method (PAF, PCA, ML...)
- Type of rotation (Varimax, Oblimin...)
- Criterion for number of factors
- KMO and Bartlett's test
- Percentage of variance explained
- Table of factor loadings (with cross-loadings)
- Correlations among factors (if using oblique rotation)
Example: "Exploratory factor analysis was conducted using principal axis factoring (PAF) with Oblimin rotation. The KMO measure of sampling adequacy was .87, and Bartlett's test of sphericity was statistically significant (chi-square(276) = 2341.5, p < .001). Based on parallel analysis and the scree plot, three factors were extracted, accounting for 58.3% of total variance."
## Try the Istrazimo Platform
Istrazimo includes exploratory factor analysis with the KMO test, scree plot, and automatic factor determination. Instead of configuring everything manually in SPSS or R, you can run a complete EFA in just a few clicks, with a clear loadings table and a scree plot visualization that you can export directly for your paper. Get started.