Using MS Word for Readability Analysis
What I found while using Microsoft Word to analyze readability and calculate metrics used to improve writing. Supplement to "Write on! 7 free tools for more readable writing on a shoestring".
I’ve used Word’s size and readability statistics on 3 months of Substack and LinkedIn posts from this year. My conclusion is that the Readability Statistics in Word are too flaky to be useful, and I need to find a better tool. Here’s why.
Details on Microsoft Word’s Readability Features
Microsoft Word offers 2 UI features with relevant metrics: Word Count and Readability Statistics. I own a retail license for MS Office (2019), so Word costs me nothing extra to use.
The Readability Statistics feature provides some data on counts and gives the Flesch Reading Ease score and Flesch-Kincaid Grade Level. Disappointingly, Word does not share the syllable count used in the formulas for these metrics.
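For context, both scores are computed with published formulas from word, sentence, and syllable counts. Here is a minimal sketch in Python; the syllable estimator is a crude vowel-group heuristic of my own (Word's actual syllable counting is undocumented), so treat the outputs as approximations:

```python
import re

def estimate_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, drop a common silent final "e".
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(estimate_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease
    grade_level = 0.39 * wps + 11.8 * spw - 15.59       # Flesch-Kincaid Grade Level
    return reading_ease, grade_level

ease, grade = flesch_scores("The cat sat on the mat.")
```

Because both formulas depend on the syllable count, two tools with different syllable estimators will report different scores for the same text, which is one reason Word's hidden count matters.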
In using the tool on 3 months of posts, I’ve found it flaky and awkward to use.
“Word Count” can process a selected subset of text; “Readability Statistics” cannot. Its readability calculations include everything I typically keep in my draft file: title, subtitle, comments, etc.
Most of that extra text isn’t made of sentences, so it skews the metrics.
The only workaround is to create a second Word file that has only the body of the post.
“Word Count” provides two character counts: one including spaces, one without. “Readability Statistics” provides just one character count and doesn’t say whether spaces are included.
For my evaluations, I used both tools on multiple Word files. Neither character count from the Word Count tool matches the character count in the Readability Statistics tool!
Sometimes the word counts of the two tools don’t quite match, either.
I learned the hard way that, ironically, Word is inaccurate on rich text content in a .DOCX file. With hyperlinked text, Word appears to count the characters inside the linked URL. Long, non-visible URLs inflate the character count and worsen the readability scores, which is illogical.
The only known workaround is to create a plain-text Word copy of the file. (A TXT file can’t be scored in Word without importing.) That’s extra work for a Word user. The Word feature ought to handle this automatically.
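To see why counting the hidden URL matters, here is a toy illustration (hypothetical strings, not Word's internals): the visible anchor text is short, but the hyperlink target stored in the file can be far longer, and a counter that includes it inflates the totals.

```python
visible_text = "my latest article"          # what the reader sees
hidden_url = "https://lnkd.in/eFTpaMfM"     # what the .docx stores behind it

# A correct counter should use only the visible text; a buggy one that
# also counts the stored URL more than doubles the characters here.
correct_count = len(visible_text)                    # 17
inflated_count = len(visible_text) + len(hidden_url) # 41
```

Since readability formulas use characters (or syllables) per word, that inflation pushes scores toward "harder to read" even though the reader never sees the URL.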
The sentence and paragraph counts are off on short posts. (The counts might be wrong on longer posts too. I haven’t manually counted them, because they’re long and counting would be tedious.)
Example: I manually counted one short post as 3 sentences and 2 paragraphs. The count could be 4 sentences and 2 paragraphs if one compound sentence (joined by a colon) counts as 2 sentences. However, Word’s Readability Statistics reported 3 sentences and 4 paragraphs; I can’t see where 4 paragraphs could come from.
Switching to plain text DOCX input did not fix this.
The math doesn’t math. The ratios Word shows are inconsistent with the counts it shows.
Example: For the short post mentioned above, Word showed 3 sentences and 4 paragraphs. That works out to 0.75 sentences per paragraph, yet Word displayed an average of 1.5 sentences per paragraph!
Switching to plain text DOCX input did not fix this, either.
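The inconsistency takes one division to verify. Using the counts Word displayed for that post:

```python
sentences = 3           # count shown by Word
paragraphs = 4          # count shown by Word
reported_average = 1.5  # "Sentences per Paragraph" shown by Word

derived_average = sentences / paragraphs  # 0.75
# Word's own displayed counts and displayed average disagree.
assert derived_average != reported_average
```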
The points above cite just one example. I used Word’s Readability Statistics on 3 months of my posts (66 in all, from very short to long), and these were not isolated problems. To illustrate, here are specific results from a second short post containing hyperlinked text.
Example: Errors in Word’s Readability Statistics
📊📈Data #visualization folks: I've just published my latest article on "vizzes" and ways to embed charts and tables in Substack posts with Datawrapper. Please check it out and let me know which options YOU think are best for each of the experiments! https://lnkd.in/eFTpaMfM
Thanks to Kate Strachnyi and Heiner Romero Leiva (He/Him) for their kind feedback on these tools and on improving this article 🙌
Manual counting says this is 2 paragraphs and 3 sentences of text. If the lnkd.in hyperlink is treated like a sentence, the total would be 4. The maximum could be 5 sentences if the compound sentence (colon separator “… folks: I’ve …”) is counted as two. (Handling of compound sentences is configurable in some readability analysis tools.) I call it 4, though.
First try: I used the rich text above with the hyperlinks. The character count in Readability Statistics (770) was ~2x the value from Word Count (338 or 403). The word counts were close, but didn’t match (65 and 67). Readability Statistics called it 3 sentences and 1 paragraph.
Second try: I repeated the measurements with the plain-text equivalent in Word (all hyperlinks and formatting removed). In Readability Statistics, the character count dropped from 770 to 341, which still didn’t match the character count from Word Count (338 or 401). 341 implies that Readability Statistics may be excluding spaces, though. The paragraph count went up from 1 to 2. Oddly, the sentence count went down from 3 to 2. The word counts are unchanged and still don’t match.
NOTE: Even if the counts in both sets of Readability Statistics measurements were correct, the three displayed averages (“Sentences per Paragraph”, “Words per Sentence”, and “Characters per Word”) were inconsistent with the displayed counts.
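A transparent counter makes such tallies reproducible. Here is a rough Python sketch applied to the post above; the sentence rule (terminal `.`, `!`, or `?` followed by whitespace or end of text) is just one of several defensible choices, and under it the dots inside the bare URL and the unpunctuated final paragraph add no sentences:

```python
import re

POST = """📊📈Data #visualization folks: I've just published my latest article on "vizzes" and ways to embed charts and tables in Substack posts with Datawrapper. Please check it out and let me know which options YOU think are best for each of the experiments! https://lnkd.in/eFTpaMfM

Thanks to Kate Strachnyi and Heiner Romero Leiva (He/Him) for their kind feedback on these tools and on improving this article 🙌"""

# Paragraphs: blocks of text separated by a blank line.
paragraphs = [p for p in POST.split("\n\n") if p.strip()]

# Sentences: terminal punctuation followed by whitespace or end of text.
# The dots in "lnkd.in" don't match, and the final paragraph has no
# terminal punctuation, so neither contributes a sentence.
sentences = re.findall(r"[.!?](?=\s|$)", POST)

print(len(paragraphs), len(sentences))  # → 2 2
```

The point is not that this rule is right; it is that when the rule is visible, you can explain every tally, which is exactly what Word's opaque counts don't let you do.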
What’s Next?
After applying Word’s Readability Statistics to my 66 posts and seeing how it worked (or didn’t), I’m now on a quest to find a better tool.
My main post on readability metrics, tools, and evaluations is here (coming soon!). Four bonus pages provide details:
Readability features of Microsoft Word and why I’m looking for a new tool. [YOU ARE HERE]
What’s “readable”, 16 readability metrics, the 6 I chose to evaluate, and why.
14 readability analysis tools besides Word, the 7 I chose to evaluate, and why.
Scoring for Word and the 7 selected tools on the 6 selected metrics. (coming soon!)