Spell and grammar check

WProofreader suggestions fall into three types:

  • spelling (red underlines)

  • grammar, punctuation (blue underlines)

  • style (yellow underlines)

Algorithmic suggestions

Spelling suggestions are generated by Hunspell, a third-party open-source algorithmic spell checker and morphological analyzer that serves as the spelling engine in our product. In check kits it appears as 'hs'.

Hunspell checks the text word by word to see whether each word is in the dictionary. If a word is not found, Hunspell generates the closest suggestions. In general, suggestion generation takes 13–14 steps, 11 of which can be repeated for languages with many compounds written as a single word, such as German or Dutch.
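
The lookup-then-suggest flow can be illustrated with a minimal Python sketch. It is not Hunspell's algorithm: the toy DICTIONARY, the check_text helper, and the use of difflib as a stand-in for the multi-step suggestion generation are all assumptions made for illustration.

```python
import difflib

# Toy word list (assumption); a real spelling engine loads full dictionary files.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def check_text(text, dictionary=DICTIONARY, max_suggestions=3):
    """Return {unknown_word: [closest dictionary words]} for a text.

    Words are split on whitespace only, which is a simplification.
    """
    results = {}
    for word in text.lower().split():
        if word in dictionary:
            continue  # known word: nothing to underline
        # difflib stands in for the engine's own suggestion steps.
        results[word] = difflib.get_close_matches(
            word, sorted(dictionary), n=max_suggestions
        )
    return results

print(check_text("the quikc brown fox"))
```

Only "quikc" is reported here, because every other word is found in the toy dictionary.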

For better suggestions, we rely on N-gram prioritization, a feature that reorders the suggestions based on context. The context consists of the two words to the left and the two words to the right of the misspelled word.
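
As a rough sketch of context-based reordering, the Python example below re-ranks candidate suggestions using bigram counts over a window of two words on each side. The BIGRAM_COUNTS table, the scoring formula, and the function names are hypothetical; the product's actual N-gram model and scoring are not described here.

```python
from collections import Counter

# Hypothetical bigram counts; a real model would be built from a large corpus.
BIGRAM_COUNTS = Counter({
    ("a", "piece"): 50,
    ("piece", "of"): 80,
    ("a", "peace"): 2,
    ("peace", "of"): 5,
})

def context_score(candidate, left, right, counts=BIGRAM_COUNTS):
    """Sum bigram counts over the window: two left words, candidate, two right words."""
    window = left[-2:] + [candidate] + right[:2]
    # Bigrams that do not involve the candidate are identical for every
    # candidate, so they do not affect the final ordering.
    return sum(counts[pair] for pair in zip(window, window[1:]))

def prioritize(suggestions, left, right):
    """Return suggestions sorted from best to worst fit for the context."""
    return sorted(
        suggestions,
        key=lambda s: context_score(s, left, right),
        reverse=True,
    )

# "I ate a <pice> of cake": "piece" fits the context better than "peace".
print(prioritize(["peace", "piece"], ["ate", "a"], ["of", "cake"]))
```

The candidate list itself does not change; only its order does, which matches the idea of prioritization rather than generation.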

We also manually create spelling prioritization rules for English, German, Spanish, Italian, French, Portuguese, and other languages. A rule can apply to a whole word or to part of it in order to prioritize valid suggestions, and it can also provide suggestions beyond those that Hunspell has generated.

Named Entity Recognition (NER) is another feature that helps avoid false positives by skipping the correction of unknown proper names. Its results are sometimes inconsistent: the same proper name may be underlined as a spelling error in one sentence and skipped in another.

For English and German, we provide dialect support in the form of files with words written with different spellings in different dialects of the same language.

Grammar suggestions are provided by LanguageTool, a third-party grammar engine. In check kits it appears as 'lt', and its suggestions have blue underlines. Like Hunspell, this engine is algorithmic: each rule explicitly defines where it should and should not apply. LanguageTool supports 36 languages (excluding dialects), may give more than one suggestion, and may include descriptions for its grammar suggestions.

Starting from v6.6.0, our linguistic team modifies inconsistent and irrelevant grammar rules to improve the performance of the grammar engine.

AI-driven suggestions

The WProofreader AI-driven engine provides both spelling and grammar suggestions, marked by red and blue underlines respectively. In check kits it appears as 'ai'.

The WProofreader AI-driven engine uses our in-house RedPenNet (RPN) architecture, which is based on the RoBERTa base model.

99% of the time, the AI engine offers only one suggestion. Suggestions are always context-dependent, so removing or adding a single word may change all suggestions in the sentence. Currently, suggestions come without explanations of why something should be corrected.

Spelling suggestions are those with a ratio of less than 0.5, meaning that the match and the suggestion are fairly similar. In addition, a spelling suggestion is always word-to-word.

Grammar suggestions are those with a ratio of more than 0.5, meaning that the match and the suggestion are not very similar. They include all corrections that were not classified as spelling.

NB! Some spelling suggestions may be underlined as grammar because the model itself does not know the suggestion types.
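
The threshold rule can be sketched in Python as follows. The exact ratio formula used by the AI-driven engine is not given in this article, so difference_ratio below is an assumption: it uses difflib's similarity and inverts it so that 0.0 means identical and 1.0 means nothing in common. The function names are also hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.5  # the boundary between spelling and grammar described above

def difference_ratio(match, suggestion):
    """Assumed metric: 0.0 for identical strings, 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, match, suggestion).ratio()

def classify(match, suggestion):
    """Label a correction 'spelling' or 'grammar' using the threshold rule."""
    word_to_word = len(match.split()) == 1 and len(suggestion.split()) == 1
    if word_to_word and difference_ratio(match, suggestion) < THRESHOLD:
        return "spelling"
    # Everything not classified as spelling is treated as grammar.
    return "grammar"

print(classify("recieve", "receive"))      # similar single words
print(classify("go", "went"))              # single words with no overlap
print(classify("have went", "have gone"))  # not word-to-word
```

Under this assumed metric, a misspelling whose correct form looks very different from the original would land on the grammar side, which is one way the mislabeling mentioned in the note can arise.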

Check kits

Users and admins can change the default proofreading configuration via check kits. Currently, the following configurations are available:

  • lt_hs – Provides standard spelling and grammar suggestions through algorithmic engines only; no AI involvement.

  • ai_lt_hs0 – The AI engine provides grammar and spelling suggestions. The algorithmic spelling engine highlights spelling errors without showing suggestions; algorithmic grammar suggestions are also enabled.

  • ai_lt_hs1 – The AI engine provides grammar and spelling suggestions, with algorithmic grammar and spelling support where the AI does not offer coverage.

  • ai – Only the AI engine provides grammar and spelling suggestions; no additional support from algorithmic engines is applied.

  • ai_lt – The AI engine provides grammar and spelling suggestions, with algorithmic grammar support where the AI does not offer coverage.
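
For quick comparison, the list above can be summarized as a small Python lookup table. This is only an illustrative summary, not the product's configuration format: the CHECK_KITS structure, its field values ("suggestions", "highlight_only", "off"), and the engines helper are hypothetical.

```python
# Hypothetical summary of the check kits listed above.
# "ai" and "lt" are on/off; "hs" describes how the spelling engine behaves.
CHECK_KITS = {
    "lt_hs":     {"ai": False, "lt": True,  "hs": "suggestions"},
    "ai_lt_hs0": {"ai": True,  "lt": True,  "hs": "highlight_only"},
    "ai_lt_hs1": {"ai": True,  "lt": True,  "hs": "suggestions"},
    "ai":        {"ai": True,  "lt": False, "hs": "off"},
    "ai_lt":     {"ai": True,  "lt": True,  "hs": "off"},
}

def engines(kit_name):
    """Return the engine codes that are active in a given check kit."""
    kit = CHECK_KITS[kit_name]
    active = []
    if kit["ai"]:
        active.append("ai")
    if kit["lt"]:
        active.append("lt")
    if kit["hs"] != "off":
        active.append("hs")
    return active

for name in CHECK_KITS:
    print(name, engines(name))
```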

To change the server default check kit per language (self-hosted), see Check kits.
