Spelling and grammar checking from the terminal

2023-06-14

tl;dr try Vale + Openly!

In my attempts to write without leaving a terminal, I ran into one small issue: checking spelling and grammar. Is it entirely necessary? No - but I’m a stickler for at least catching obvious mistakes. When reading works written by others, it’s the glaring issues that indicate to me the writer didn’t even bother to proofread what they’d written before publishing. In the current climate, however, it may also be a positive to leave some obvious human error behind to really cement the point that it isn’t just AI-generated content, but that’s an entirely different topic (at least until content farms tune their bots to explicitly introduce errors 😔).

aspell, hunspell, LanguageTool?

Searching for this topic returns the popular aspell, hunspell, and LanguageTool options - all viable in their own right, but with varying drawbacks. Basic spellchecking is easy to achieve and script into my attempted workflow via the first two options, but grammar is out of their scope. LanguageTool was the next option, and there are a number of ways to get a decent front-end to it, e.g. via pyLanguagetool. It’s open source to some extent, but with the caveat of needing to either send text to their servers or self-host the rather heavy server component. Two popular Docker implementations exist and are advertised on their GitHub page, and for what it’s worth, that totally works - the only drawback is having yet another Docker container running for yet another service that I’ll use maybe once or twice a week. Spinning the container up and down via a script whenever I need it is possible, but it’s also just more cruft that I’d rather not deal with. The necessary ngram data is also a hefty 8GB - moving that around to devices that may not have a network connection when writing is yet more friction.
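For a sense of what talking to a self-hosted LanguageTool server involves, here is a minimal sketch. It assumes a container is already listening at http://localhost:8010 (the URL and port are assumptions and depend on the image and how it was started), and the helper names check_text and format_matches are made up for illustration. The /v2/check endpoint and the fields read from each match follow LanguageTool's public HTTP API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a self-hosted LanguageTool server is reachable at this address.
LT_URL = "http://localhost:8010/v2/check"


def check_text(text, language="en-US", url=LT_URL):
    """POST text to the LanguageTool HTTP API and return the parsed JSON reply."""
    payload = urllib.parse.urlencode({"text": text, "language": language}).encode()
    with urllib.request.urlopen(url, data=payload, timeout=30) as resp:
        return json.load(resp)


def format_matches(text, reply):
    """Turn the 'matches' list of a reply into one readable line per issue."""
    lines = []
    for m in reply.get("matches", []):
        start = m["offset"]
        flagged = text[start:start + m["length"]]
        # Keep only the first few suggestions to stay terminal-friendly.
        fixes = ", ".join(r["value"] for r in m.get("replacements", [])[:3])
        lines.append(f"{start}: '{flagged}' - {m['message']} [{fixes}]")
    return lines
```

Reading a post into a string and printing format_matches(body, check_text(body)) would give a rough terminal report, though it still needs the container running, which is the friction described above.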

What about AI?

I’ve made ample use of self-hosted AI models for creative projects - e.g. generating story contexts for DnD, or seeing how far a bot can flesh out ideas I have for a plot in a game. Similarly, image generation is useful for getting an idea down for what I would want a character to look like before trying my hand at modeling in Blender or hiring an artist to actually do art.

ChatGPT is entirely capable of parsing a markdown file and fixing spelling and grammar errors, complete with actually useful suggestions and an understanding of most technical terms. The drawback is that even minimal API usage means paying for yet another monthly subscription service (totally understandable though - they are a business, despite the “Open” in “OpenAI”). On top of that, it runs into the same problem as the non-self-hosted LanguageTool option - shipping your data off to a third party.

However, AI isn’t all dependent on remote hosts with large GPU farms; there are a number of available models that can be self-hosted and run on just a CPU. Setting up pszemraj/flan-t5-large-grammar-synthesis takes only a few minutes and the results are often very good, though far from flawless:

- IN: I walkd to the store and I bught milk. I will eat fish for dinner and drank milk. We all eat the fish and then made dessert.
- OUT: I went to the store and bought some milk. I will eat fish for dinner and drink milk. We all ate the fish and then made dessert.

- IN: Anna and Mike is going sking. Matt lke fish. We enjoys horror movies.
- OUT: Anna and Mike are going skiing. I like fish. We enjoy horror movies.

- IN: Anna and Pat are maried; he has been togther for 20 years. I fed all of her fish, then cleaned its tank. When the girls on the team got to the hotel, they dropped off her luggage.
- OUT: Anna and Peter are married; they have been together for 10 years. I fed all of her fish, and then cleaned the tank. When the girls on the team got to the hotel, they dropped off her luggage.

- IN: I can't harly believe what she said. I don't want no pudding. I won't do nothing.
- OUT: I can't really believe what she said. I don't want any peanuts. I won't do anything.

- IN: Because I ate dinner. I accidentaly. You accidentally what? Jumped high.
- OUT: Because I ate dinner. I accidently dropped a knife. What happened? High jump.

- IN: I went to the store I got milk and cookies. We drove all day in the van, we drove 206 miles to get to our cabin. We got to the cabin and we unpacked our things and we built a campfire and we made dinner and we slept under the stars.
- OUT: I went to the store and bought some milk and cookies. We drove all day in the van, and we drove 206 miles to get to our cabin. We got to the cabin and unpacked our things and we built a fire and made dinner and we slept under the stars.

- IN: Frustated, the chairs took me forever to set up. When they were finaly all together, studets drapd the chairs in bunting. Sleeping in the orchard, a serpent stung me.
- OUT: Frustrated, the chairs took me far too long to set up. When they were all together, the students drew pictures on the wall. Slumping in the garden, I sat down.

- IN: Since your going out, you should check the weather on you're phone. There car broke down so their hitching a ride to they're class. Can they ride home with you? I won't drive them because my car is in the shop.
- OUT: Since you are going out, you should check the weather on your phone. Their car broke down so they had to take a ride. Can they ride home with you? I won't drive because my car is in the shop.

- IN: Test line at the end, totally proper grammar.
- OUT: The test line at the end, totally proper grammar.

Examples sourced and modified from here.
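Pairs like these are easier to judge with the changes pulled out. As a rough sketch (this is not something the model itself provides), Python's standard difflib can list which runs of words were rewritten; word_changes is a name invented here for illustration.

```python
import difflib


def word_changes(before, after):
    """Return (old, new) pairs for every run of words that differs."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Deletions give an empty 'new', insertions an empty 'old'.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


for old, new in word_changes("Matt lke fish.", "I like fish."):
    print(f"{old!r} -> {new!r}")  # prints 'Matt lke' -> 'I like'
```

This only shows what changed, not why, but it does make substitutions such as Matt turning into I easy to spot.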

Only real issue that led me away from this option is the lack of feedback, e.g. what was corrected and why, coupled with the general errors introduced by the model (“hallucinations” and similar issues). Without batching the input into several sentences at a time as recommended by the developer, it would frequently completely transform a sentence:

- IN: Matt lke fish.
- OUT: I like fish... I like fish too. . . My dad likes fish.

- IN: Matt like fish.
- OUT: My dad likes fish. I like fish too.

One of the smaller models in earlier testing turned the same line into “My cat likes ice cream.” It’s still early days for projects like this, but I’m otherwise hopeful that more work on openly available models will get them to parity with commercial offerings, at least for specialized use cases.
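The batching the developer recommends is simple enough to script. A minimal sketch, assuming the Hugging Face transformers package and a backend such as PyTorch are installed: the sentence splitter is a naive regex, the batch size of 3 is an arbitrary pick, and batch_sentences and correct_text are illustrative names rather than part of the model or the library. The pipeline call mirrors the usage shown on the model card, so newer transformers releases may need adjusting.

```python
import re


def batch_sentences(text, size=3):
    """Split text into sentences (naively) and regroup them `size` at a time."""
    # Split after ., ! or ? followed by whitespace; abbreviations will confuse this.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def correct_text(text, size=3):
    """Run each batch through the grammar model and join the results."""
    # Imported lazily: loading transformers and the model is slow and heavy.
    from transformers import pipeline

    corrector = pipeline(
        "text2text-generation",
        model="pszemraj/flan-t5-large-grammar-synthesis",
    )
    fixed = [corrector(batch)[0]["generated_text"] for batch in batch_sentences(text, size)]
    return " ".join(fixed)
```

Batching narrows the room for runaway rewrites, but as the examples show it does not remove the need to read the output against the input.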

Vale

Eventually, I stumbled on Vale from a post about LanguageTool. While I was familiar with code linting, the concept of prose linting had never crossed my mind - that’s what Vale is marketed for, but it isn’t actually built for the specific case of grammar checking. That being said, it’s open source and built around processing based on “styles” - one of which, aptly named Openly, attempts to replicate the most popular commercial grammar checking product, Grammarly. The included “Microsoft” style also seems to catch a lot of relevant errors, and honestly may be good enough by itself in some cases.

Vale compiles to a single binary; all that’s left is to pull down the Openly release into the styles folder and add a simple config file from the examples:

StylesPath = styles

MinAlertLevel = suggestion

Packages = Microsoft, write-good, Hugo

[*]
BasedOnStyles = openly, Vale, Microsoft, write-good

It also supports directly checking files intended to be processed by Hugo, a major bonus. The alerts can be reduced based on their levels (error, warning, or suggestion), but I’ve left it at the default for myself. It is by no means perfect, and I’m certain there are issues it doesn’t catch - that being said, I also ignore many of the suggestions because I’m not submitting my writing as professional literature at any level. Glaring spelling errors and sentences that have noticeably bad grammar are really my only targets, which this utility handles well enough for me.

rwz@xb ~/hugo > vale content/posts/grammarcheck.md 
 content/posts/grammarcheck.md
 1:1      warning     Grade level (13.44) too high!   openly.Readability           
 11:1     warning     'tl;dr try Vale + Openly!'      openly.Titles                
                      should be in sentence case                                   
 11:1     warning     Don't use end punctuation in    Microsoft.HeadingPunctuation 
                      headings.                                                    
 11:1     suggestion  'tl;dr try Vale + Openly!'      Microsoft.Headings           
                      should use sentence-style                                    
                      capitalization.                                              
 11:4     warning     Possible future tense.          openly.FutureTense           
 13:4     warning     Use first person (such as       Microsoft.FirstPerson        
                      'my') sparingly.                                             
 13:52    warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 13:112   suggestion  Try to avoid using 'Is'.        write-good.E-Prime           
 13:147   suggestion  Try to avoid using 'I'm'.       write-good.E-Prime           
 13:147   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'm') sparingly.                                            
 13:147   suggestion  Avoid using "I'm"               openly.E-Prime               
 13:202   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 13:240   suggestion  Try to avoid using 'it's'.      write-good.E-Prime           
 13:269   warning     'indicate' is too wordy.        write-good.TooWordy          
 13:281   warning     Use first person (such as       Microsoft.FirstPerson        
                      'me') sparingly.                                             
 13:370   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 13:370   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 13:394   warning     'however' is too wordy.         write-good.TooWordy          
 13:412   error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 13:415   suggestion  Try to avoid using 'be'.        write-good.E-Prime           
 13:473   warning     Consider removing 'really'.     Microsoft.Adverbs            
 13:473   warning     'really' is a weasel word!      write-good.Weasel            
 13:505   suggestion  Try to avoid using 'isn't'.     write-good.E-Prime           
 13:505   suggestion  Avoid using "isn't"             openly.E-Prime               
 13:542   suggestion  Avoid using "that's"            openly.E-Prime               
 13:542   suggestion  Try to avoid using 'that's'.    write-good.E-Prime           
 15:4     suggestion  'aspell, hunspell,              Microsoft.Headings           
                      LanguageTool?' should use                                    
                      sentence-style capitalization.                               
 15:4     warning     'aspell, hunspell,              openly.Titles                
                      LanguageTool?' should be in                                  
                      sentence case                                                
 15:33    warning     Don't use end punctuation in    Microsoft.HeadingPunctuation 
                      headings.                                                    
 16:170   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 16:205   warning     Use first person (such as       Microsoft.FirstPerson        
                      'my') sparingly.                                             
 16:262   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 16:298   suggestion  Try to avoid using 'was'.       write-good.E-Prime           
 16:329   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 16:332   warning     'a number of' is too wordy.     write-good.TooWordy          
 16:379   error       Use 'for example' instead of    Microsoft.Foreign            
                      'e.g.'.                                                      
 16:379   error       Use 'eg' instead of 'e.g.'      openly.Punctuation           
 16:389   error       Did you really mean             Vale.Spelling                
                      'pyLanguagetool'?                                            
 16:389   error       Did you really mean             openly.Spelling              
                      'pyLanguagetool'?                                            
 16:450   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 16:584   suggestion  Consider using 'part' instead   Microsoft.ComplexWords       
                      of 'component'.                                              
 16:604   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 16:604   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 16:649   suggestion  'are advertised' looks like     Microsoft.Passive            
                      passive voice.                                               
 16:649   warning     'are advertised' may be         write-good.Passive           
                      passive voice. Use active                                    
                      voice if you can.                                            
 16:649   suggestion  Try to avoid using 'are'.       write-good.E-Prime           
 16:690   warning     Try to avoid using clichés      write-good.Cliches           
                      like 'for what it's worth'.                                  
 16:699   suggestion  Try to avoid using 'it's'.      write-good.E-Prime           
 16:732   warning     'only' is a weasel word!        write-good.Weasel            
 16:746   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 16:822   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'll') sparingly.                                           
 16:824   warning     Possible future tense.          openly.FutureTense           
 16:859   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 16:911   warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 16:919   warning     'it is' is too wordy.           write-good.TooWordy          
 16:919   error       Use 'it's' instead of 'it is'.  Microsoft.Contractions       
 16:939   suggestion  Try to avoid using 'it's'.      write-good.E-Prime           
 16:946   error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 16:970   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'd') sparingly.                                            
 16:996   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 16:1026  error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 16:1130  error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 18:15    warning     Avoid using acronyms in a       Microsoft.HeadingAcronyms    
                      title or heading.                                            
 19:1     warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 19:1     suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 19:1     warning     Use first person (such as       Microsoft.FirstPerson        
                      'I've') sparingly.                                           
 19:70    error       Use 'for example' instead of    Microsoft.Foreign            
                      'e.g.'.                                                      
 19:70    error       Use 'eg' instead of 'e.g.'      openly.Punctuation           
 19:153   warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 19:183   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 19:183   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 19:211   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 19:254   warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 19:307   warning     Use first person (such as       Microsoft.FirstPerson        
                      'my') sparingly.                                             
 21:1     warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 21:9     suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 21:184   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 21:184   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 21:193   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 21:249   warning     'monthly' is a weasel word!     write-good.Weasel            
 21:311   error       Use 'they're' instead of 'they  Microsoft.Contractions       
                      are'.                                                        
 21:316   suggestion  Try to avoid using 'are'.       write-good.E-Prime           
 23:1     warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 23:1     warning     'However' is too wordy.         write-good.TooWordy          
 23:13    suggestion  Avoid using "isn't"             openly.E-Prime               
 23:13    suggestion  Try to avoid using 'isn't'.     write-good.E-Prime           
 23:69    suggestion  Try to simplify this sentence.  Microsoft.Semicolon          
 23:69    error       Don't start a sentence with ';  openly.ThereIs               
                      there is'                                                    
 23:77    suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 23:80    warning     'a number of' is too wordy.     write-good.TooWordy          
 23:118   suggestion  Try to avoid using 'be'.        write-good.E-Prime           
 23:279   warning     'only' is a weasel word!        write-good.Weasel            
 23:286   warning     'few' is a weasel word!         write-good.Weasel            
 23:314   suggestion  Try to avoid using 'are'.       write-good.E-Prime           
 23:327   warning     'very' is a weasel word!        write-good.Weasel            
 23:327   warning     Consider removing 'very'.       Microsoft.Adverbs            
 54:1     suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 54:1     warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 54:1     warning     'Only' is a weasel word!        write-good.Weasel            
 54:26    warning     Use first person (such as       Microsoft.FirstPerson        
                      'me') sparingly.                                             
 54:51    suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 54:76    error       Use 'eg' instead of 'e.g.'      openly.Punctuation           
 54:76    error       Use 'for example' instead of    Microsoft.Foreign            
                      'e.g.'.                                                      
 54:86    warning     'was corrected' may be passive  write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 54:86    suggestion  'was corrected' looks like      Microsoft.Passive            
                      passive voice.                                               
 54:86    suggestion  Try to avoid using 'was'.       write-good.E-Prime           
 54:306   suggestion  Consider using 'often' instead  Microsoft.ComplexWords       
                      of 'frequently'.                                             
 54:317   warning     'completely' is a weasel word!  write-good.Weasel            
 62:73    warning     Use first person (such as       Microsoft.FirstPerson        
                      'My') sparingly.                                             
 62:98    warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 62:98    suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 62:104   warning     'early' is a weasel word!       write-good.Weasel            
 62:139   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'm') sparingly.                                            
 62:139   suggestion  Try to avoid using 'I'm'.       write-good.E-Prime           
 62:139   suggestion  Avoid using "I'm"               openly.E-Prime               
 62:179   error       Use 'Openly' instead of         Vale.Terms                   
                      'openly'.                                                    
 62:179   warning     'openly' is a weasel word!      write-good.Weasel            
 62:203   warning     Possible future tense.          openly.FutureTense           
 65:1     warning     'Eventually' is a weasel word!  write-good.Weasel            
 65:150   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 65:150   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 65:155   warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 65:158   suggestion  Try to avoid using 'was'.       write-good.E-Prime           
 65:194   suggestion  Consider using 'idea' instead   Microsoft.ComplexWords       
                      of 'concept'.                                                
 65:235   warning     Use first person (such as       Microsoft.FirstPerson        
                      'my') sparingly.                                             
 65:245   suggestion  Try to avoid using 'that's'.    write-good.E-Prime           
 65:245   suggestion  Avoid using "that's"            openly.E-Prime               
 65:262   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 65:262   suggestion  'is marketed' looks like        Microsoft.Passive            
                      passive voice.                                               
 65:262   warning     'is marketed' may be passive    write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 65:286   suggestion  Avoid using "isn't"             openly.E-Prime               
 65:286   suggestion  Try to avoid using 'isn't'.     write-good.E-Prime           
 65:350   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 65:350   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 65:355   warning     'being said' may be passive     write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 65:355   suggestion  Try to avoid using 'being'.     write-good.E-Prime           
 65:355   suggestion  'being said' looks like         Microsoft.Passive            
                      passive voice.                                               
 65:355   suggestion  Avoid using "being"             openly.E-Prime               
 65:367   suggestion  Try to avoid using 'it's'.      write-good.E-Prime           
 65:455   suggestion  Consider using 'try' instead    Microsoft.ComplexWords       
                      of 'attempt'.                                                
 65:602   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 65:635   error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 65:638   warning     Consider removing 'seems'       openly.Hedging               
 65:691   warning     'honestly' is a weasel word!    write-good.Weasel            
 65:691   warning     Consider removing 'honestly'.   Microsoft.Adverbs            
 65:704   suggestion  Try to avoid using 'be'.        write-good.E-Prime           
 65:729   warning     Consider using 'sometimes'      Microsoft.Wordiness          
                      instead of 'in some cases'.                                  
 78:6     error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 78:54    warning     'be processed' may be passive   write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 78:54    suggestion  Try to avoid using 'be'.        write-good.E-Prime           
 78:54    suggestion  'be processed' looks like       Microsoft.Passive            
                      passive voice.                                               
 78:112   suggestion  Try to avoid using 'be'.        write-good.E-Prime           
 78:112   warning     'be reduced' may be passive     write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 78:112   suggestion  'be reduced' looks like         Microsoft.Passive            
                      passive voice.                                               
 78:181   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I've') sparingly.                                           
 78:221   warning     'It is' is too wordy.           write-good.TooWordy          
 78:221   error       Use 'it's' instead of 'It is'.  Microsoft.Contractions       
 78:221   suggestion  Try to keep sentences short (<  Microsoft.SentenceLength     
                      30 words).                                                   
 78:221   warning     Write short sentences (less     openly.SentenceLength        
                      than 25 words).                                              
 78:224   suggestion  Try to avoid using 'is'.        write-good.E-Prime           
 78:252   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'm') sparingly.                                            
 78:252   suggestion  Avoid using "I'm"               openly.E-Prime               
 78:252   suggestion  Try to avoid using 'I'm'.       write-good.E-Prime           
 78:270   suggestion  Try to avoid using 'are'.       write-good.E-Prime           
 78:305   suggestion  'being said' looks like         Microsoft.Passive            
                      passive voice.                                               
 78:305   suggestion  Try to avoid using 'being'.     write-good.E-Prime           
 78:305   warning     'being said' may be passive     write-good.Passive           
                      voice. Use active voice if you                               
                      can.                                                         
 78:305   suggestion  Avoid using "being"             openly.E-Prime               
 78:316   warning     Use first person (such as ' I   Microsoft.FirstPerson        
                      ') sparingly.                                                
 78:321   error       Don't start a sentence with     openly.So                    
                      'so '                                                        
 78:331   warning     'many' is a weasel word!        write-good.Weasel            
 78:363   suggestion  Avoid using "I'm"               openly.E-Prime               
 78:363   warning     Use first person (such as       Microsoft.FirstPerson        
                      'I'm') sparingly.                                            
 78:363   suggestion  Try to avoid using 'I'm'.       write-good.E-Prime           
 78:382   warning     Use first person (such as       Microsoft.FirstPerson        
                      'my') sparingly.                                             
 78:505   suggestion  Try to avoid using 'are'.       write-good.E-Prime           
 78:509   warning     Consider removing 'really'.     Microsoft.Adverbs            
 78:509   warning     'really' is a weasel word!      write-good.Weasel            
 78:519   warning     'only' is a weasel word!        write-good.Weasel            
 78:576   warning     Use first person (such as       Microsoft.FirstPerson        
                      'me') sparingly.                                             

✖ 20 errors, 85 warnings and 75 suggestions in 1 file.
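
Since only a handful of those 180 alerts concern spelling, the report can be trimmed by script. A sketch, assuming vale is on the PATH and that its --output=JSON format maps each file path to a list of alerts carrying Check, Line, Span and Message keys; keep_alerts, run_vale and the choice of rules to keep are illustrative only.

```python
import json
import subprocess


def keep_alerts(report, rules=("Spelling",)):
    """Keep only alerts whose rule name (the part after the style) is listed."""
    kept = []
    for path, alerts in report.items():
        for a in alerts:
            rule = a["Check"].split(".")[-1]
            if rule in rules:
                kept.append(f"{path}:{a['Line']}:{a['Span'][0]} {a['Check']} {a['Message']}")
    return kept


def run_vale(path):
    """Run vale on one file and return its JSON report as a dict."""
    # vale exits non-zero when it reports errors, so check=True is not used.
    proc = subprocess.run(
        ["vale", "--output=JSON", path], capture_output=True, text=True
    )
    return json.loads(proc.stdout)
```

Printing keep_alerts(run_vale("content/posts/grammarcheck.md")) would leave just the spelling hits from both Vale and openly. For a blunter cut, vale also accepts --minAlertLevel=error on the command line, though that keeps every rule a style marks as an error.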