Spelling and grammar checking from the terminal
2023-06-14
tl;dr try Vale + Openly!
In my attempts to write without leaving a terminal, I ran into one small issue: checking spelling and grammar. Is it entirely necessary? No - but I’m a stickler for at least catching obvious mistakes. When reading works written by others, glaring issues tell me the writer didn’t bother to proofread what they’d written before publishing. In the current climate, however, it may also be a positive to leave some obvious human error behind to really cement the point that the content isn’t AI-generated, but that’s an entirely different topic (at least until content farms tune their bots to explicitly introduce errors 😔).
aspell, hunspell, LanguageTool?
Searching for this topic returns the popular aspell, hunspell, and LanguageTool options - all viable in their own right, but with varying drawbacks. Basic spellchecking is easy to achieve and script into my attempted workflow via the first two options, but grammar is out of their scope. LanguageTool was the next option, and there are a number of ways to get a decent front-end to it, e.g. via pyLanguagetool. It is open source to some extent, but with the caveat of needing to either send text to their servers or self-host the rather heavy server component. Two popular Docker implementations exist and are advertised on their GitHub page, and for what it’s worth, that totally works - the only drawback is having yet another Docker container running for yet another service that I’ll use maybe once or twice a week. Spinning the container up and down via a script whenever I need it is possible, but it’s also just more cruft that I’d rather not deal with. The necessary ngram data is also a hefty 8 GB - moving that around to devices that may not have a network connection when writing adds even more friction.
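For the spelling-only case, the scripting really is small. Below is a minimal sketch, assuming aspell is installed and on the PATH; it relies on `aspell list`, which reads text on stdin and prints one unknown word per line. The function names are my own for illustration, not part of any tool.

```python
import subprocess
from typing import List


def aspell_command(lang: str = "en") -> List[str]:
    """Build the aspell invocation; 'list' mode prints unknown words only."""
    return ["aspell", "list", f"--lang={lang}"]


def parse_misspellings(aspell_output: str) -> List[str]:
    """Turn aspell's one-word-per-line output into a de-duplicated list,
    keeping the order in which words first appeared."""
    seen = {}
    for word in aspell_output.splitlines():
        word = word.strip()
        if word:
            seen.setdefault(word, None)
    return list(seen)


def misspelled_words(text: str, lang: str = "en") -> List[str]:
    """Pipe text through aspell and return the words it does not recognise.
    Assumes aspell and a dictionary for `lang` are installed."""
    result = subprocess.run(
        aspell_command(lang),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_misspellings(result.stdout)
```

Calling `misspelled_words(open("post.md").read())` would then give a list of words to look at, with no grammar feedback at all, which is exactly the limitation described above.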
What about AI?
I’ve made ample use of self-hosted AI models for creative projects - e.g. generating story contexts for DnD, or seeing how far a bot can flesh out ideas I have for a plot in a game. Similarly, image generation is useful for getting an idea down for what I would want a character to look like before trying my hand at modeling in Blender or hiring an artist to actually do art.
ChatGPT is entirely capable of parsing a Markdown file and fixing spelling and grammar errors, complete with actually useful suggestions and an understanding of most technical terms. The drawback is that even minimal API usage means paying for yet another monthly subscription (totally understandable though - they are a business, despite the “Open” in “OpenAI”). On top of that, it runs into the same problem as the non-self-hosted LanguageTool option - shipping your data off to a third party.
However, AI isn’t all dependent on remote hosts with large GPU farms; there are a number of available models that can be self-hosted and run on just a CPU. Setting up pszemraj/flan-t5-large-grammar-synthesis takes only a few minutes, and the results are often very good, though not consistently:
- IN: I walkd to the store and I bught milk. I will eat fish for dinner and drank milk. We all eat the fish and then made dessert.
- OUT: I went to the store and bought some milk. I will eat fish for dinner and drink milk. We all ate the fish and then made dessert.
- IN: Anna and Mike is going sking. Matt lke fish. We enjoys horror movies.
- OUT: Anna and Mike are going skiing. I like fish. We enjoy horror movies.
- IN: Anna and Pat are maried; he has been togther for 20 years. I fed all of her fish, then cleaned its tank. When the girls on the team got to the hotel, they dropped off her luggage.
- OUT: Anna and Peter are married; they have been together for 10 years. I fed all of her fish, and then cleaned the tank. When the girls on the team got to the hotel, they dropped off her luggage.
- IN: I can't harly believe what she said. I don't want no pudding. I won't do nothing.
- OUT: I can't really believe what she said. I don't want any peanuts. I won't do anything.
- IN: Because I ate dinner. I accidentaly. You accidentally what? Jumped high.
- OUT: Because I ate dinner. I accidently dropped a knife. What happened? High jump.
- IN: I went to the store I got milk and cookies. We drove all day in the van, we drove 206 miles to get to our cabin. We got to the cabin and we unpacked our things and we built a campfire and we made dinner and we slept under the stars.
- OUT: I went to the store and bought some milk and cookies. We drove all day in the van, and we drove 206 miles to get to our cabin. We got to the cabin and unpacked our things and we built a fire and made dinner and we slept under the stars.
- IN: Frustated, the chairs took me forever to set up. When they were finaly all together, studets drapd the chairs in bunting. Sleeping in the orchard, a serpent stung me.
- OUT: Frustrated, the chairs took me far too long to set up. When they were all together, the students drew pictures on the wall. Slumping in the garden, I sat down.
- IN: Since your going out, you should check the weather on you're phone. There car broke down so their hitching a ride to they're class. Can they ride home with you? I won't drive them because my car is in the shop.
- OUT: Since you are going out, you should check the weather on your phone. Their car broke down so they had to take a ride. Can they ride home with you? I won't drive because my car is in the shop.
- IN: Test line at the end, totally proper grammar.
- OUT: The test line at the end, totally proper grammar.
Examples sourced and modified from here.
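Pairs like the ones above are easier to judge when the changes are pulled out rather than spotted by eye. This is a rough sketch using only Python's standard `difflib`; it compares the two strings word by word and says nothing about why the model changed something. The helper name `word_changes` is made up for this post.

```python
import difflib
from typing import List, Tuple


def word_changes(before: str, after: str) -> List[Tuple[str, str, str]]:
    """Return (kind, old_words, new_words) for every span that differs.

    kind is one of difflib's opcode tags: 'replace', 'delete' or 'insert'.
    """
    old, new = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return changes


if __name__ == "__main__":
    for change in word_changes(
        "Anna and Mike is going sking.",
        "Anna and Mike are going skiing.",
    ):
        print(change)
```

A long list of changes for a short sentence is a decent hint that the model rewrote the line instead of correcting it.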
The only real issue that led me away from this option is the lack of feedback, e.g. what was corrected and why, coupled with the general errors introduced by the model (“hallucinations” and similar issues). Unless the input was batched into several sentences at a time, as recommended by the developer, it would frequently transform a sentence completely:
- IN: Matt lke fish.
- OUT: I like fish... I like fish too. . . My dad likes fish.
- IN: Matt like fish.
- OUT: My dad likes fish. I like fish too.
One of the smaller models in earlier testing turned the same line into “My cat likes ice cream.” It’s still early days for projects like this, but I’m otherwise hopeful that more work on openly available models will get them to parity with commercial offerings, at least for specialized use cases.
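The batching mentioned above can be sketched in a few lines. The sentence splitter here is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace), the batch size of 3 is my assumption rather than a documented recommendation, and `load_corrector` assumes the Hugging Face `transformers` package with its `text2text-generation` pipeline is installed. Treat it as an illustration, not the developer's own method.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive split: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def batch_sentences(text: str, size: int = 3) -> List[str]:
    """Group sentences into chunks of `size` (3 is an assumed default)."""
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def correct_text(text: str, corrector: Callable[[str], str], size: int = 3) -> str:
    """Run `corrector` on each batch and stitch the results back together."""
    return " ".join(corrector(batch) for batch in batch_sentences(text, size))


def load_corrector() -> Callable[[str], str]:
    """Wrap the model in a plain str -> str function.
    Imported lazily so the helpers above work without transformers installed."""
    from transformers import pipeline

    pipe = pipeline("text2text-generation", "pszemraj/flan-t5-large-grammar-synthesis")
    return lambda batch: pipe(batch)[0]["generated_text"]
```

With the model downloaded, `correct_text(draft, load_corrector())` would feed it a few sentences at a time, which in my testing was the difference between a correction and a rewrite about someone's dad.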
Vale
Eventually, I stumbled on Vale from a post about LanguageTool. While I was familiar with code linting, the concept of prose linting had never crossed my mind - that’s what Vale is marketed for, but it isn’t actually built for the specific case of grammar checking. That being said, it’s open source and built around processing based on “styles” - one of which, aptly named Openly, attempts to replicate the most popular commercial grammar checking product, Grammarly. The included “Microsoft” style also seems to catch a lot of relevant errors, and honestly may be good enough by itself in some cases.
Vale compiles to a single binary, so setup is short. Pull down the Openly release into the styles folder and add a simple config file based on the examples:
StylesPath = styles
MinAlertLevel = suggestion
Packages = Microsoft, write-good, Hugo
[*]
BasedOnStyles = openly, Vale, Microsoft, write-good
It also supports directly checking files intended to be processed by Hugo, a major bonus. The alerts can be reduced based on their levels (error, warning, or suggestion), but I’ve left it at the default for myself. It is by no means perfect, and I’m certain there are issues it doesn’t catch - that being said, I also ignore many of the suggestions because I’m not submitting my writing as professional literature at any level. Glaring spelling errors and sentences that have noticeably bad grammar are really my only targets, which this utility handles well enough for me.
rwz@xb ~/hugo > vale content/posts/grammarcheck.md
content/posts/grammarcheck.md
1:1 warning Grade level (13.44) too high! openly.Readability
11:1 warning 'tl;dr try Vale + Openly!' openly.Titles
should be in sentence case
11:1 warning Don't use end punctuation in Microsoft.HeadingPunctuation
headings.
11:1 suggestion 'tl;dr try Vale + Openly!' Microsoft.Headings
should use sentence-style
capitalization.
11:4 warning Possible future tense. openly.FutureTense
13:4 warning Use first person (such as Microsoft.FirstPerson
'my') sparingly.
13:52 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
13:112 suggestion Try to avoid using 'Is'. write-good.E-Prime
13:147 suggestion Try to avoid using 'I'm'. write-good.E-Prime
13:147 warning Use first person (such as Microsoft.FirstPerson
'I'm') sparingly.
13:147 suggestion Avoid using "I'm" openly.E-Prime
13:202 warning Write short sentences (less openly.SentenceLength
than 25 words).
13:240 suggestion Try to avoid using 'it's'. write-good.E-Prime
13:269 warning 'indicate' is too wordy. write-good.TooWordy
13:281 warning Use first person (such as Microsoft.FirstPerson
'me') sparingly.
13:370 warning Write short sentences (less openly.SentenceLength
than 25 words).
13:370 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
13:394 warning 'however' is too wordy. write-good.TooWordy
13:412 error Don't start a sentence with openly.So
'so '
13:415 suggestion Try to avoid using 'be'. write-good.E-Prime
13:473 warning Consider removing 'really'. Microsoft.Adverbs
13:473 warning 'really' is a weasel word! write-good.Weasel
13:505 suggestion Try to avoid using 'isn't'. write-good.E-Prime
13:505 suggestion Avoid using "isn't" openly.E-Prime
13:542 suggestion Avoid using "that's" openly.E-Prime
13:542 suggestion Try to avoid using 'that's'. write-good.E-Prime
15:4 suggestion 'aspell, hunspell, Microsoft.Headings
LanguageTool?' should use
sentence-style capitalization.
15:4 warning 'aspell, hunspell, openly.Titles
LanguageTool?' should be in
sentence case
15:33 warning Don't use end punctuation in Microsoft.HeadingPunctuation
headings.
16:170 suggestion Try to avoid using 'is'. write-good.E-Prime
16:205 warning Use first person (such as Microsoft.FirstPerson
'my') sparingly.
16:262 suggestion Try to avoid using 'is'. write-good.E-Prime
16:298 suggestion Try to avoid using 'was'. write-good.E-Prime
16:329 suggestion Try to avoid using 'is'. write-good.E-Prime
16:332 warning 'a number of' is too wordy. write-good.TooWordy
16:379 error Use 'for example' instead of Microsoft.Foreign
'e.g.'.
16:379 error Use 'eg' instead of 'e.g.' openly.Punctuation
16:389 error Did you really mean Vale.Spelling
'pyLanguagetool'?
16:389 error Did you really mean openly.Spelling
'pyLanguagetool'?
16:450 warning Write short sentences (less openly.SentenceLength
than 25 words).
16:584 suggestion Consider using 'part' instead Microsoft.ComplexWords
of 'component'.
16:604 warning Write short sentences (less openly.SentenceLength
than 25 words).
16:604 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
16:649 suggestion 'are advertised' looks like Microsoft.Passive
passive voice.
16:649 warning 'are advertised' may be write-good.Passive
passive voice. Use active
voice if you can.
16:649 suggestion Try to avoid using 'are'. write-good.E-Prime
16:690 warning Try to avoid using clichés write-good.Cliches
like 'for what it's worth'.
16:699 suggestion Try to avoid using 'it's'. write-good.E-Prime
16:732 warning 'only' is a weasel word! write-good.Weasel
16:746 suggestion Try to avoid using 'is'. write-good.E-Prime
16:822 warning Use first person (such as Microsoft.FirstPerson
'I'll') sparingly.
16:824 warning Possible future tense. openly.FutureTense
16:859 warning Write short sentences (less openly.SentenceLength
than 25 words).
16:911 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
16:919 warning 'it is' is too wordy. write-good.TooWordy
16:919 error Use 'it's' instead of 'it is'. Microsoft.Contractions
16:939 suggestion Try to avoid using 'it's'. write-good.E-Prime
16:946 error Don't start a sentence with openly.So
'so '
16:970 warning Use first person (such as Microsoft.FirstPerson
'I'd') sparingly.
16:996 warning Write short sentences (less openly.SentenceLength
than 25 words).
16:1026 error Don't start a sentence with openly.So
'so '
16:1130 error Don't start a sentence with openly.So
'so '
18:15 warning Avoid using acronyms in a Microsoft.HeadingAcronyms
title or heading.
19:1 warning Write short sentences (less openly.SentenceLength
than 25 words).
19:1 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
19:1 warning Use first person (such as Microsoft.FirstPerson
'I've') sparingly.
19:70 error Use 'for example' instead of Microsoft.Foreign
'e.g.'.
19:70 error Use 'eg' instead of 'e.g.' openly.Punctuation
19:153 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
19:183 warning Write short sentences (less openly.SentenceLength
than 25 words).
19:183 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
19:211 suggestion Try to avoid using 'is'. write-good.E-Prime
19:254 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
19:307 warning Use first person (such as Microsoft.FirstPerson
'my') sparingly.
21:1 warning Write short sentences (less openly.SentenceLength
than 25 words).
21:9 suggestion Try to avoid using 'is'. write-good.E-Prime
21:184 warning Write short sentences (less openly.SentenceLength
than 25 words).
21:184 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
21:193 suggestion Try to avoid using 'is'. write-good.E-Prime
21:249 warning 'monthly' is a weasel word! write-good.Weasel
21:311 error Use 'they're' instead of 'they Microsoft.Contractions
are'.
21:316 suggestion Try to avoid using 'are'. write-good.E-Prime
23:1 warning Write short sentences (less openly.SentenceLength
than 25 words).
23:1 warning 'However' is too wordy. write-good.TooWordy
23:13 suggestion Avoid using "isn't" openly.E-Prime
23:13 suggestion Try to avoid using 'isn't'. write-good.E-Prime
23:69 suggestion Try to simplify this sentence. Microsoft.Semicolon
23:69 error Don't start a sentence with '; openly.ThereIs
there is'
23:77 suggestion Try to avoid using 'is'. write-good.E-Prime
23:80 warning 'a number of' is too wordy. write-good.TooWordy
23:118 suggestion Try to avoid using 'be'. write-good.E-Prime
23:279 warning 'only' is a weasel word! write-good.Weasel
23:286 warning 'few' is a weasel word! write-good.Weasel
23:314 suggestion Try to avoid using 'are'. write-good.E-Prime
23:327 warning 'very' is a weasel word! write-good.Weasel
23:327 warning Consider removing 'very'. Microsoft.Adverbs
54:1 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
54:1 warning Write short sentences (less openly.SentenceLength
than 25 words).
54:1 warning 'Only' is a weasel word! write-good.Weasel
54:26 warning Use first person (such as Microsoft.FirstPerson
'me') sparingly.
54:51 suggestion Try to avoid using 'is'. write-good.E-Prime
54:76 error Use 'eg' instead of 'e.g.' openly.Punctuation
54:76 error Use 'for example' instead of Microsoft.Foreign
'e.g.'.
54:86 warning 'was corrected' may be passive write-good.Passive
voice. Use active voice if you
can.
54:86 suggestion 'was corrected' looks like Microsoft.Passive
passive voice.
54:86 suggestion Try to avoid using 'was'. write-good.E-Prime
54:306 suggestion Consider using 'often' instead Microsoft.ComplexWords
of 'frequently'.
54:317 warning 'completely' is a weasel word! write-good.Weasel
62:73 warning Use first person (such as Microsoft.FirstPerson
'My') sparingly.
62:98 warning Write short sentences (less openly.SentenceLength
than 25 words).
62:98 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
62:104 warning 'early' is a weasel word! write-good.Weasel
62:139 warning Use first person (such as Microsoft.FirstPerson
'I'm') sparingly.
62:139 suggestion Try to avoid using 'I'm'. write-good.E-Prime
62:139 suggestion Avoid using "I'm" openly.E-Prime
62:179 error Use 'Openly' instead of Vale.Terms
'openly'.
62:179 warning 'openly' is a weasel word! write-good.Weasel
62:203 warning Possible future tense. openly.FutureTense
65:1 warning 'Eventually' is a weasel word! write-good.Weasel
65:150 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
65:150 warning Write short sentences (less openly.SentenceLength
than 25 words).
65:155 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
65:158 suggestion Try to avoid using 'was'. write-good.E-Prime
65:194 suggestion Consider using 'idea' instead Microsoft.ComplexWords
of 'concept'.
65:235 warning Use first person (such as Microsoft.FirstPerson
'my') sparingly.
65:245 suggestion Try to avoid using 'that's'. write-good.E-Prime
65:245 suggestion Avoid using "that's" openly.E-Prime
65:262 suggestion Try to avoid using 'is'. write-good.E-Prime
65:262 suggestion 'is marketed' looks like Microsoft.Passive
passive voice.
65:262 warning 'is marketed' may be passive write-good.Passive
voice. Use active voice if you
can.
65:286 suggestion Avoid using "isn't" openly.E-Prime
65:286 suggestion Try to avoid using 'isn't'. write-good.E-Prime
65:350 warning Write short sentences (less openly.SentenceLength
than 25 words).
65:350 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
65:355 warning 'being said' may be passive write-good.Passive
voice. Use active voice if you
can.
65:355 suggestion Try to avoid using 'being'. write-good.E-Prime
65:355 suggestion 'being said' looks like Microsoft.Passive
passive voice.
65:355 suggestion Avoid using "being" openly.E-Prime
65:367 suggestion Try to avoid using 'it's'. write-good.E-Prime
65:455 suggestion Consider using 'try' instead Microsoft.ComplexWords
of 'attempt'.
65:602 warning Write short sentences (less openly.SentenceLength
than 25 words).
65:635 error Don't start a sentence with openly.So
'so '
65:638 warning Consider removing 'seems' openly.Hedging
65:691 warning 'honestly' is a weasel word! write-good.Weasel
65:691 warning Consider removing 'honestly'. Microsoft.Adverbs
65:704 suggestion Try to avoid using 'be'. write-good.E-Prime
65:729 warning Consider using 'sometimes' Microsoft.Wordiness
instead of 'in some cases'.
78:6 error Don't start a sentence with openly.So
'so '
78:54 warning 'be processed' may be passive write-good.Passive
voice. Use active voice if you
can.
78:54 suggestion Try to avoid using 'be'. write-good.E-Prime
78:54 suggestion 'be processed' looks like Microsoft.Passive
passive voice.
78:112 suggestion Try to avoid using 'be'. write-good.E-Prime
78:112 warning 'be reduced' may be passive write-good.Passive
voice. Use active voice if you
can.
78:112 suggestion 'be reduced' looks like Microsoft.Passive
passive voice.
78:181 warning Use first person (such as Microsoft.FirstPerson
'I've') sparingly.
78:221 warning 'It is' is too wordy. write-good.TooWordy
78:221 error Use 'it's' instead of 'It is'. Microsoft.Contractions
78:221 suggestion Try to keep sentences short (< Microsoft.SentenceLength
30 words).
78:221 warning Write short sentences (less openly.SentenceLength
than 25 words).
78:224 suggestion Try to avoid using 'is'. write-good.E-Prime
78:252 warning Use first person (such as Microsoft.FirstPerson
'I'm') sparingly.
78:252 suggestion Avoid using "I'm" openly.E-Prime
78:252 suggestion Try to avoid using 'I'm'. write-good.E-Prime
78:270 suggestion Try to avoid using 'are'. write-good.E-Prime
78:305 suggestion 'being said' looks like Microsoft.Passive
passive voice.
78:305 suggestion Try to avoid using 'being'. write-good.E-Prime
78:305 warning 'being said' may be passive write-good.Passive
voice. Use active voice if you
can.
78:305 suggestion Avoid using "being" openly.E-Prime
78:316 warning Use first person (such as ' I Microsoft.FirstPerson
') sparingly.
78:321 error Don't start a sentence with openly.So
'so '
78:331 warning 'many' is a weasel word! write-good.Weasel
78:363 suggestion Avoid using "I'm" openly.E-Prime
78:363 warning Use first person (such as Microsoft.FirstPerson
'I'm') sparingly.
78:363 suggestion Try to avoid using 'I'm'. write-good.E-Prime
78:382 warning Use first person (such as Microsoft.FirstPerson
'my') sparingly.
78:505 suggestion Try to avoid using 'are'. write-good.E-Prime
78:509 warning Consider removing 'really'. Microsoft.Adverbs
78:509 warning 'really' is a weasel word! write-good.Weasel
78:519 warning 'only' is a weasel word! write-good.Weasel
78:576 warning Use first person (such as Microsoft.FirstPerson
'me') sparingly.
✖ 20 errors, 85 warnings and 75 suggestions in 1 file.
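Most of that output is style advice rather than spelling or grammar, so it helps to see which rules make the most noise before deciding what to ignore. Here is a small sketch that assumes the report was produced with Vale's JSON output (`vale --output=JSON file.md`) and that each alert carries `Check` and `Severity` fields, keyed by file path. That is how I understand the format, but check it against your Vale version. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def load_alerts(vale_json: str) -> List[Dict]:
    """Flatten the assumed {path: [alert, ...]} layout into one list,
    tagging every alert with the file it came from."""
    report = json.loads(vale_json)
    return [dict(alert, Path=path) for path, alerts in report.items() for alert in alerts]


def keep_severity(alerts: Iterable[Dict], severities: Iterable[str] = ("error",)) -> List[Dict]:
    """Keep only alerts whose Severity is in `severities`."""
    wanted = set(severities)
    return [a for a in alerts if a["Severity"] in wanted]


def count_by_check(alerts: Iterable[Dict]) -> Counter:
    """Count alerts per rule, e.g. 'write-good.E-Prime'."""
    return Counter(a["Check"] for a in alerts)
```

Running `count_by_check(load_alerts(report)).most_common(5)` on a saved report should make it obvious which rules are worth silencing and which ones catch the glaring errors I actually care about.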