# How to Create a Likert Scale: A Complete Guide for Researchers
Imagine asking a student: "How much do you agree that online education is high quality?" The student says: "Well... I somewhat agree." You write that down as a 4 out of 5. But what does that four actually mean? How did you arrive at that number? And is your scale even well constructed?
The Likert scale is probably the most widely used instrument in the social sciences. And precisely because it is so common, many researchers use it on autopilot, without thinking about whether they have built it correctly. Let us change that.
## Who Was Likert, Anyway?
Rensis Likert (pronounced "LIK-ert," not "LYE-kurt") was an American psychologist who, in his 1932 doctoral dissertation at Columbia University, introduced a new technique for measuring attitudes. Instead of the Thurstone scales that had been in use, which required a complex calibration process involving dozens of judges, Likert proposed a simpler idea: present the respondent with a statement and ask them to indicate their degree of agreement on a numerical scale.
That simplicity is the reason the Likert scale has been in use for nearly a century. But that same simplicity sometimes leads researchers to skip important steps during construction.
## Likert Item vs. Likert Scale: A Distinction That Is Often Overlooked
This is a key distinction that many methodology students struggle to articulate clearly.
A Likert item is a single statement with an associated response scale. For example:
"Online education is just as effective as in-person instruction."
1 - Strongly disagree | 2 - Disagree | 3 - Neither agree nor disagree | 4 - Agree | 5 - Strongly agree
A Likert scale is a set of multiple items that together measure a single construct. Only when you combine answers to several items (typically 4 to 10) do you get a reliable measure of some attitude or trait.
Why does this matter? A single item has low reliability. If you ask only one question about attitudes toward online education, you cannot know whether you have captured the true attitude or just a momentary mood. Multiple items yield a more stable score, and you can check the reliability of that combined score using Cronbach's alpha.
## How Many Points on the Scale: 5, 7, or Something Else?
This question generates endless debate at methodology conferences. Here is what the data say.
### The 5-Point Scale
This is the most popular option. It is differentiated enough to capture variability in responses, yet simple enough that respondents do not waste time agonizing over the difference between point 6 and point 7.
Advantages: quick to complete, intuitive, widely accepted in the literature.
Disadvantages: may have reduced variability, especially if respondents avoid the extreme points (a phenomenon known as "central tendency bias").
### The 7-Point Scale
This gives greater variability and somewhat better sensitivity. Research suggests that 7-point scales can have better psychometric properties than 5-point scales, particularly when measuring nuanced constructs.
Advantages: greater precision, better variability, often recommended for research purposes.
Disadvantages: verbally labeling all 7 points can be challenging, and some populations (younger children, fatigued respondents) may find it harder to differentiate.
### Even or Odd Number of Points?
An odd number (5, 7) includes a midpoint ("Neither agree nor disagree"). An even number (4, 6) forces respondents to "take a side" because there is no neutral option.
Arguments for a midpoint:
- The respondent may genuinely be neutral, and you should allow them to express that.
- Without a midpoint, forcing a choice can lead to random answers that add noise.
Arguments against a midpoint:
- Some respondents use the midpoint as a "lazy" default, choosing it instead of thinking carefully.
- In cultures where acquiescence or conflict avoidance is common, the midpoint can attract a disproportionate number of responses.
Recommendation: For most academic research, a 5-point or 7-point scale (odd, with a midpoint) works well. Use an even-point scale only if you have a theoretical reason for forcing a choice, and acknowledge this in your methodology section.
## Formulating Good Items
Writing Likert items is deceptively hard. Here are rules that will save you from the most common pitfalls.
### Rule 1: One Idea Per Item
Bad: "Online education is convenient and high quality."
Good: "Online education is convenient." (separate item) + "Online education is high quality." (separate item)
The first version is a double-barreled item. A respondent might agree that it is convenient but disagree that it is high quality. What do they mark? You have no way to interpret their answer.
### Rule 2: Mix Positively and Negatively Worded Items
Including some reverse-worded items helps detect acquiescence bias (the tendency to agree with everything). For example:
- Positive: "I find online lectures engaging."
- Negative (reversed): "I have difficulty concentrating during online lectures."
When scoring, you reverse the values on negatively worded items so that a higher total always means more of the construct.
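Concretely, reversing comes down to a one-line formula: on a k-point scale, the reversed value is k + 1 minus the original response. The sketch below is illustrative (the `reverse_score` helper is hypothetical, not from any survey library):

```python
def reverse_score(value: int, points: int = 5) -> int:
    """Reverse-score one Likert response on a `points`-point scale.

    On a 5-point scale: 1 <-> 5, 2 <-> 4, and the midpoint 3 is unchanged.
    """
    if not 1 <= value <= points:
        raise ValueError(f"response must be between 1 and {points}")
    return points + 1 - value

# Strong agreement (5) with a negatively worded item becomes 1
print(reverse_score(5))  # prints 1
print(reverse_score(3))  # midpoint stays 3
```

The same formula works for any scale length; for a 7-point scale, pass `points=7`.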
### Rule 3: Avoid Double Negations
Bad: "I do not disagree that online education is not ineffective."
Good: "I believe online education is effective."
Double negations confuse respondents and produce unreliable data. When you formulate a reversed item, keep it simple and direct.
### Rule 4: Label Every Point
Research by Krosnick (1999) shows that scales with verbal labels on every point outperform those with labels only at the extremes. Instead of "1 ... 5," use "1 - Strongly disagree, 2 - Disagree, 3 - Neither agree nor disagree, 4 - Agree, 5 - Strongly agree."
## Complete Example: Attitudes Toward AI in Education
Let us build a small Likert scale step by step. Suppose you are investigating university students' attitudes toward the use of artificial intelligence in education.
Construct: Attitudes toward AI in higher education
Scale: 5-point (1 = Strongly disagree, 5 = Strongly agree)
| # | Item | Direction |
|---|---|---|
| 1 | AI tools can help me understand course material better. | Positive |
| 2 | I feel comfortable using AI-assisted learning platforms. | Positive |
| 3 | I worry that AI will reduce critical thinking skills. | Negative |
| 4 | Instructors should integrate AI tools into their teaching. | Positive |
| 5 | AI-generated feedback is less trustworthy than human feedback. | Negative |
| 6 | Using AI in education prepares students for the modern workplace. | Positive |
| 7 | I find it difficult to learn effectively with AI tools. | Negative |
| 8 | AI can make personalized learning more accessible. | Positive |
After data collection, you would reverse-score items 3, 5, and 7 (so that 1 becomes 5, 2 becomes 4, etc.) and then compute the total or mean score. A higher score indicates a more positive attitude toward AI in education.
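That scoring step can be sketched in a few lines of Python. The `score_respondent` helper and the respondent data below are illustrative, not from a real dataset; item numbering follows the table above:

```python
NEGATIVE_ITEMS = {3, 5, 7}  # 1-based numbers of the reverse-scored items
POINTS = 5                  # 5-point response scale

def score_respondent(responses):
    """Reverse-score the negative items, then return the mean scale score."""
    adjusted = [
        POINTS + 1 - r if i in NEGATIVE_ITEMS else r
        for i, r in enumerate(responses, start=1)
    ]
    return sum(adjusted) / len(adjusted)

# Illustrative respondent with a fairly positive attitude toward AI
responses = [5, 4, 2, 4, 2, 5, 1, 4]
print(score_respondent(responses))  # 35 / 8 = 4.375
```

Note that a respondent who answers 3 on every item scores exactly 3.0 either way, which is one reason the midpoint interpretation matters.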
Before using this scale in your main study, run a pilot with 30 to 50 respondents. Check reliability using Cronbach's alpha, look at item-total correlations, and consider whether any item drags down the overall reliability. If an item has an item-total correlation below 0.30, it likely does not belong in the scale.
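Both pilot checks can be computed directly from the response matrix. The sketch below implements Cronbach's alpha and the corrected item-total correlation from their textbook formulas; the function names and pilot data are illustrative:

```python
from statistics import mean, variance

def cronbach_alpha(data):
    """Cronbach's alpha for `data`: one row per respondent, one column per item.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(data[0])
    item_vars = [variance(col) for col in zip(*data)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def corrected_item_total(data, item):
    """Pearson correlation between one item (0-based) and the sum of the rest."""
    xs = [row[item] for row in data]
    ys = [sum(row) - row[item] for row in data]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative 6-respondent, 4-item pilot (already reverse-scored)
pilot = [[4, 5, 3, 4], [3, 3, 2, 3], [5, 5, 4, 5],
         [2, 2, 1, 2], [4, 4, 4, 3], [3, 2, 3, 3]]
print(f"alpha = {cronbach_alpha(pilot):.2f}")
print(f"item 1 corrected r = {corrected_item_total(pilot, 0):.2f}")
```

In a real pilot you would run `corrected_item_total` for every item and flag any value below 0.30.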
## The Midpoint Debate, Revisited
Some researchers worry that the midpoint ("Neither agree nor disagree") functions as "I don't know" rather than "I'm genuinely neutral." There is evidence on both sides.
A practical solution: include a separate "I don't know / Not applicable" option that is visually distinct from the rating scale. That way, respondents who are genuinely uncertain have a place to go, while the midpoint retains its meaning as true neutrality.
## Common Mistake: Treating Ordinal Data as Interval
This is the single most contentious methodological issue involving Likert scales.
The problem: A Likert item produces ordinal data. The psychological distance between "Strongly disagree" (1) and "Disagree" (2) is not necessarily the same as the distance between "Agree" (4) and "Strongly agree" (5). Strictly speaking, you cannot compute a mean on ordinal data.
The practice: Virtually every researcher in the social sciences computes means on Likert scales. And there is empirical justification for this: multiple simulation studies (Norman, 2010; Carifio & Perla, 2008) have shown that parametric tests are robust to this violation when you work with summed or averaged scores from multi-item Likert scales (not single items).
The recommendation:
- For individual Likert items, use non-parametric tests (Mann-Whitney, Kruskal-Wallis) or report medians and frequencies.
- For Likert scales (summed or averaged scores across multiple items), parametric tests are generally acceptable.
- Always report which approach you used and why. If a reviewer challenges you, cite Norman (2010) and show that your scale has adequate reliability.
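To make the distinction concrete, here is a minimal sketch of the two reporting styles: median and frequencies for a single ordinal item, and a mean for multi-item scale scores. All response data below are illustrative:

```python
from collections import Counter
from statistics import mean, median

# Responses to ONE Likert item (ordinal): report median and frequencies
item_responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 4]
print("median:", median(item_responses))
print("frequencies:", dict(sorted(Counter(item_responses).items())))

# Scale scores (mean of 8 items per respondent): a mean is defensible here
scale_scores = [4.12, 3.50, 2.88, 4.25, 3.62]
print("mean scale score:", round(mean(scale_scores), 2))
```

The frequency table is often the most informative summary for a single item, since it shows exactly how respondents used each point.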
When planning your survey, think carefully about sample size as well. Determining how many participants you need before data collection prevents underpowered results and wasted effort.
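As a rough planning aid, the standard normal-approximation formula gives a per-group sample size for comparing two group means. The `n_per_group` helper below is a sketch of that textbook formula, not a substitute for a full power analysis, and the effect size you plug in is an assumption you must justify:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is Cohen's d (difference in means / pooled SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(0.5))  # medium effect: prints 63
```

Small effects demand far larger samples: halving the effect size roughly quadruples the required n.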
## Quick Checklist Before You Launch
Before you send your Likert scale out into the world, run through this list:
- Does each item contain only one idea?
- Have you mixed positively and negatively worded items?
- Are there verbal labels on every scale point?
- Have you piloted the scale with at least 30 respondents?
- Did you check Cronbach's alpha (aiming for 0.70 or above)?
- Have you checked item-total correlations (removing items below 0.30)?
- Is the reading level appropriate for your target population?
- Have you specified how you will handle the midpoint and "I don't know" responses?
## Wrapping Up
The Likert scale may seem simple, but constructing one that produces reliable, interpretable data requires thoughtful decisions at every stage: from the number of points and the wording of items, to reverse scoring and reliability analysis. By following the principles in this guide, you will build scales that stand up to methodological scrutiny.
If you are ready to put this into practice, the Istrazimo platform offers predefined templates for 5-point and 7-point Likert scales with automatic item randomization, built-in reverse scoring, and instant reliability calculation. It is a straightforward way to go from theory to data collection without wrestling with formatting or manual computation.