Customization of the rules
All your technical terms (but not some proper nouns) are unknown to the term checker until you add them to the disambiguation files. If you do not add your organization's technical names and technical verbs to the term checker, the term checker cannot fully analyse your text.
To customize the term checker, you must know these:
- Basic English grammar: nouns, verbs, adjectives, countability of nouns, inflections of nouns and verbs.
- ASD-STE100. If you do not know ASD-STE100, buy training:
To learn how to write rules, refer to https://dev.languagetool.org/#rule-development. Use an XML editor or a text editor that has syntax highlighting (https://en.wikipedia.org/wiki/Syntax_highlighting).
You can write rules that use regular expressions. To learn about regular expressions, refer to www.regular-expressions.info.
For information about terminology management, refer to Case study: text simplification for shipping procedures (www.techscribe.co.uk/techw/text-simplification-for-shipping-procedures.htm).
To buy customization services, contact TechScribe. TechScribe can do these tasks for you:
- Use your documents to create a dictionary of technical names and technical verbs. Your technical people must specify which of the terms are approved, and which are not-approved alternatives.
- Convert your dictionary terms to the XML format that the term checker uses.
- Write grammar rules to give guidance to your technical writers. For example, possibly some terms are approved only in some documents, not all documents.
- Write grammar rules that find not-approved synonyms of approved terms.
To customize the term checker
- Make sure that you did the installation procedure 'Download the templates for your project terms'.
- Add technical names and technical verbs to disambiguation-projectterms.xml.
- Add not-approved terms and misused terms to grammar-projectterms.xml.
- To make sure that complex rules are correct, use testrules.
Add terms to disambiguation-projectterms.xml
The term checker contains technical names from these sources:
- All the technical names that are in the dictionary. The dictionary shows technical names with (TN). Example: BANK (TN).
- Most technical names that are in rule 1.5. Refer to Numbers, units of measurement and time (and their symbols) (rule 1.5.9).
- A small number of technical names that are applicable to most projects, but which are not in ASD-STE100.
The term checker contains most technical verbs that are in rule 1.12.
For each approved term that is not in the term checker, add each inflection of the term. Use the rules that are in disambiguation-projectterms.xml as examples. For British English, the term checker uses the Oxford spelling. Thus, if applicable, use the Oxford spelling for the terms that you add.
Multi-word terms are more difficult to add than 1-word terms because a separate rule for each inflection is necessary. If you keep the technical terms in a spreadsheet, you can use a script to convert the data to XML. TechScribe uses a series of regular expressions in PowerGREP (www.powergrep.com) to convert the terms to XML.
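The conversion from a spreadsheet to XML can be sketched in Python (3.9 or later). This sketch is not the PowerGREP procedure that TechScribe uses. It assumes that the spreadsheet is exported as CSV with 2 hypothetical columns, `inflections` (separated by semicolons) and `pos_tag`. The rule ID prefix `PROJECT_TERM_` is also hypothetical. The element layout copies the rule PROJECT_SENTENCE_START in this document, and the tag IS_NOUN is only a placeholder. The rules in disambiguation-projectterms.xml are the authority for the tags and the structure that a technical name must have.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def rule_id(term: str) -> str:
    """Make a rule ID from a term. The prefix is a hypothetical naming scheme."""
    return "PROJECT_TERM_" + re.sub(r"[^A-Za-z0-9]+", "_", term).strip("_").upper()


def term_to_rule(term: str, pos_tag: str) -> ET.Element:
    """Make one disambiguation rule for one inflection of a term."""
    rule = ET.Element("rule", id=rule_id(term), name=f"Project term: {term}")
    pattern = ET.SubElement(rule, "pattern")
    disambig = ET.SubElement(rule, "disambig", action="add")
    # Assumption: text is split at spaces, and a hyphenated word is 1 token.
    for word in term.split():
        ET.SubElement(pattern, "token").text = word
        ET.SubElement(disambig, "wd", pos=pos_tag)  # one <wd> for each token
    return rule


def csv_to_rules(csv_text: str) -> list[ET.Element]:
    """Make a separate rule for each inflection in each row of the CSV data."""
    rules = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for inflection in row["inflections"].split(";"):
            inflection = inflection.strip()
            if inflection:
                rules.append(term_to_rule(inflection, row["pos_tag"].strip()))
    return rules


# Sample data for the illustration only.
SAMPLE = """inflections,pos_tag
microchip;microchips,IS_NOUN
filter-unit top-cover;filter-unit top-covers,IS_NOUN
"""

if __name__ == "__main__":
    for rule in csv_to_rules(SAMPLE):
        ET.indent(rule)
        print(ET.tostring(rule, encoding="unicode"))
```

Before you copy the output into disambiguation-projectterms.xml, compare one generated rule with a rule from the template, and use testrules.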
If a term is approved for only 1 meaning, and if you want to give guidelines to technical writers, add a grammar rule for that term.
ASD-STE100 rule 1.5 does not tell you the part of speech that a technical name has. The rule only tells you the categories of terms that you can add. All the examples are nouns or noun clusters. Rule 1.5.17 tells you that colours are adjectives, but in STE they are technical names. Thus, to comply with the STE specification, you can only add nouns or noun clusters as technical names.
When you write a disambiguation rule, do not use the immunize attribute. Immunization can cause a grammar rule not to find text (https://dev.languagetool.org/developing-a-disambiguator#immunizing-words-from-matching).
Numbers, units of measurement and time (and their symbols) (rule 1.5.9)
These terms are approved in the term checker:
- Cardinal numbers: one, two, three, four,…
- Ordinal numbers: first, second, third, fourth,…. (Ordinal abbreviations (1st, 2nd, 3rd,…) are not in the term checker.)
- Hyphenated ordinal numbers: twenty-first, twenty-second, twenty-third, twenty-fourth,…
- Basic plural fractions: thirds, fourths,… twentieths, hundredths, thousandths, millionths
- Singular hyphenated fractions: Examples: one-half, one-quarter, one-third, one-fourth, one-hundredth, one-millionth
- Plural hyphenated fractions: Examples: two-thirds, three-quarters, seven-elevenths
- SI units of measurement and their abbreviations: Examples: millimeter, millimetre, mm, lux, lx, gigahertz, GHz
- Non-SI units of measurement (bel, decibel, electronvolt, litre) that are accepted for use with the SI and their abbreviations: Examples: dB, GeV, yoctolitre. For information about SI units of measurement, refer to SI-Brochure-9-EN.pdf, available from www.bipm.org/en/publications/si-brochure/.
- Seasons of the year: spring, summer, autumn, winter, but not American English fall
- Plural numbers (hundreds and larger). Examples: hundreds, millions, trillions. The words are undefined in ASD-STE100.
- percent, rpm, pi
The term checker uses SI units of measurement. The terms knot, mile, and inch are examples in rule 1.5.9, but they are not in the term checker because they are imperial units of measurement.
To simplify noun clusters (part of rule 2.2)
To simplify a noun cluster, you can "use hyphens (-) between words that are used as a single unit." Sometimes, hyphens in different locations are possible. For example, for the noun cluster filter unit top cover, hyphens in these locations are possible:
- Filter-unit top-cover. Add a technical name that has 2 tokens: filter-unit, top-cover.
- Filter unit-top cover. Add a technical name that has 3 tokens: filter, unit-top, cover.
- Filter unit top-cover. Add a technical name that has 3 tokens: filter, unit, top-cover.
You must make a decision about where to put the hyphens.
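To compare the token counts for the different hyphen locations, you can use a short Python sketch. The sketch assumes, as the examples above show, that text is split at spaces and that a hyphenated word is 1 token. The function name `cluster_tokens` is hypothetical. It is not part of the term checker or of LanguageTool.

```python
def cluster_tokens(noun_cluster: str) -> list[str]:
    """Split a noun cluster at white space. A hyphenated word stays as 1 token (assumption)."""
    return noun_cluster.lower().split()


for variant in ("Filter-unit top-cover", "Filter unit-top cover", "Filter unit top-cover"):
    tokens = cluster_tokens(variant)
    print(len(tokens), tokens)
```

Each token that the sketch shows becomes one token in the pattern of the disambiguation rule.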
To add project terms that are not-approved in STE
A 1-word not-approved term in the STE dictionary can be a technical term, if the meaning of the technical term is different from the meaning of the not-approved term. Examples are in the table:
Term | Not-approved meaning | Project approved meaning |
---|---|---|
case (n) | condition | a type of bag: briefcase, suitcase |
chip (n) | particle | semiconductors: integrated circuit, microchip |
collapse (v) | close, fall | astronomy: for a star to fall in on itself |
compile (v) | make a list, record, collect | software development: to change a high-level programming language to binary code to make an executable program |
deposit (n) | particle, contamination | geophysics: a natural underground layer of rock or other material |
route (n, v) | noun: routing [direction of cables and pipes]; verb: put | logistics: noun: the course to go from a start location to an end location; verb: to calculate or to specify the course of a transport vehicle |
To ignore the default rule for the not-approved term, do one of these tasks:
- Deactivate the rule.
- Add the approved inflections to disambiguation-projectterms.xml.
For example, the noun case is not approved. You will see different messages for the noun case and the verb case with the two sentences that follow:
Each passenger is permitted to put 1 case in the overhead locker.
To prevent an accident, case the gun after you use it.
If you add the singular noun case to disambiguation-projectterms.xml, the term checker will ignore the rule for the not-approved noun case and will use the rules for technical names and technical verbs.
To tell writers to use the term with the approved meaning, add a grammar rule for the term.
Most rules for not-approved terms have an exception for words that have a different part of speech. Thus, the rule STE_NOT_APPROVED_case_CASE does not give a warning for the verb case. But most rules for terms that are not approved as more than one part of speech do not have exceptions for other parts of speech. For example, the word route is not approved as a noun and is not approved as a verb. The rule has an exception only for multi-word project terms and for proper nouns.
If you add the noun route to disambiguation-projectterms.xml, the term checker will give a correct analysis. But, because rule STE_NOT_APPROVED_route_ROUTE contains examples, if you use testrules, you will see an error message that contains this text:
Errors expected: 1
Errors found : 0
To add adjectives
Usually, an adjective is not a technical name or a technical verb. (Rule 1.5.17 tells you that colours are technical names). Thus, you must not add an adjective to the disambiguation rules, unless it is a colour. As an alternative, add the full technical term.
Examples of technical names that contain adjectives are as follows: achromatic material, allergic reaction, biodegradable container, sterile surgical equipment.
If you add only the adjective part of a technical name, you will not get an error message for incorrect STE. Examples:
The material is achromatic.
Are you allergic to penicillin?
This container is biodegradable.
The needle is not sterile.
To add adverbs
An adverb is not a technical name or a technical verb. Thus, you must not add an adverb to the disambiguation rules. As an alternative, add the full technical term.
For example, the term intrinsically safe is a technical term (www.osha.gov/laws-regs/regulations/standardnumber/1910/1910.307/).
Examples of technical names that contain intrinsically are as follows: intrinsically safe apparatus, intrinsically safe equipment, intrinsically safe tool.
If you add intrinsically as an adverb, you will not get an error message for this incorrect STE:
The test is intrinsically easy to do.
The term intrinsically safe is an adjective, not a technical name. Thus, if you add intrinsically safe as an adjective, you will not get an error message for this incorrect STE:
The equipment is intrinsically safe.
In ASD-STE100, for an example of a technical name that contains an adverb, refer to the approved example for view (v). The term n o'clock position, where n is an integer between 1 and 12, is a technical name:
The bolt will be at the 2 o'clock position when you look at the pump from the rear.
To specify that text is part of a list item that starts a sentence
In a regular expression, the ^ character (caret) matches the start of a string. In LanguageTool, the largest possible string is a sentence. LanguageTool has a POS tag SENT_START, which is equivalent to a caret.
Technical documentation frequently contains numbered instructions. The term checker gives the POS tag NLI_SENT_START to the last token in some basic number patterns. Examples: 2), 3.c). The term checker cannot identify all possible number sequences. For example, Step n) is an unknown number sequence. The term checker gives warnings for the sentence that follows:
Step 3) Open the window.
If Step n) is an approved number sequence, to make the term checker ignore the number sequence and give a correct analysis for the words at the start of a sentence after the number sequence, add this rule to disambiguation-projectterms.xml:
<rule id="PROJECT_SENTENCE_START" name="Project sentence start: Step n)">
<pattern>
<token postag="SENT_START"/>
<marker>
<token case_sensitive="yes">Step</token>
<token regexp="yes">[1-9]|1[0-9]</token><!-- The approved number range is 1 to 19 -->
<token>)</token>
</marker>
</pattern>
<disambig action="add">
<wd pos="IS_NOUN"/><!-- In the context of the pattern, disambiguate the word to assert that it is a noun -->
<wd pos="IS_NOUN"/><!-- In STE, a number is a noun -->
<wd pos="NLI_SENT_START"/>
</disambig>
<example type="untouched">Step 99) If necessary, change the number range.</example>
<example type="untouched">STEP 5) Use initial capitals only.</example>
<example type="untouched">Before you do Step 5) Open the window.</example>
<example type="ambiguous" inputform=")[)]" outputform=")[)/NLI_SENT_START]">Step 3<marker>)</marker> Open the window.</example>
</rule>
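Before you change the number range in the rule, you can test the regular expression outside LanguageTool. The Python sketch that follows assumes that a token regular expression must match the full token. The example Step 99) in the rule, which stays untouched, agrees with that assumption. The function name `is_approved_step_number` is hypothetical.

```python
import re

# The same expression as the second token in the rule PROJECT_SENTENCE_START.
STEP_NUMBER = re.compile(r"[1-9]|1[0-9]")


def is_approved_step_number(token: str) -> bool:
    """Return True if the full token is in the approved number range (1 to 19)."""
    return STEP_NUMBER.fullmatch(token) is not None


for token in ("1", "9", "10", "19", "0", "20", "99"):
    print(token, is_approved_step_number(token))
```

If you change the expression, also change the examples in the rule, because testrules uses the examples.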
If the sentences do not end with a full stop (period) and are not separated by an empty line, LanguageTool does not find the end of a sentence, and you will see incorrect warnings.
Add terms to grammar-projectterms.xml
To give guidelines to technical writers, add terms to grammar-projectterms.xml. Typically, add rules for these:
- A not-approved term that has an approved alternative. For example, refer to rule PROJECT_NOT_APPROVED_TECHNICAL_AUTHOR.
- An approved term that is frequently misused.
For examples of the types of rules that you can write, refer to grammar-projectterms.xml.
In English, many words have more than one part of speech. To prevent unwanted warnings, you can make a rule that shows a message only if a term has (or does not have) a specified part of speech. This example is from Managing terminology with term checker, Jake Cahill, 2018:
<rule id="PROJECT_NOT_APPROVED_screen" name="Project Not Approved noun: screen">
<pattern>
<token regexp="yes">screens?<exception postag="IS_VERB"/></token>
</pattern>
<message>The noun '\1' is not approved. Possible replacements: <suggestion><match no="1" postag_regexp="yes" postag="(NNS?)" postag_replace="$1">page</match></suggestion></message>
<short>Project Dictionary. Not approved noun: screen</short>
<example correction="page" type="incorrect">This <marker>screen</marker> displays the results.</example>
<example correction="pages" type="incorrect">If the <marker>screens</marker> do not show these messages, stop the test.</example>
<example type="correct">On this <marker>page</marker> you can enter a new name.</example>
<example type="correct">When you <marker>screen</marker> the drugs for side-effects...</example>
<example type="correct">Who <marker>screens</marker> the drugs for side-effects?</example>
<example type="triggers_error">When the medical technicians <marker>screen</marker> the drugs for side-effects...</example><!-- False positive -->
</rule>
This line in the rule tells the term checker to find the words screen and screens, except if they are verbs:
<token regexp="yes">screens?<exception postag="IS_VERB"/></token>
In the term checker, the noun screen is approved as a technical name. The word is unknown as a verb. Thus, until you add the verb screen and its approved inflections in disambiguation-projectterms.xml, you will see a message that tells you not to use a technical name as a verb. (You can deactivate the rule.)
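The logic of the token and its exception can be shown with a small Python model. The model is not how LanguageTool does the analysis. It assumes that a token matches if the full word matches the regular expression and the word does not have the tag IS_VERB. The function name `flagged_words` is hypothetical, and the tags in the sample sentences are assigned by hand.

```python
import re

# LanguageTool token matching is not case-sensitive by default,
# thus the model ignores case.
SCREEN = re.compile(r"screens?", re.IGNORECASE)


def flagged_words(tagged_tokens):
    """Return the words that match 'screens?' and that do not have the tag IS_VERB."""
    return [
        word
        for word, tags in tagged_tokens
        if SCREEN.fullmatch(word) and "IS_VERB" not in tags
    ]


# Hand-made tags for the illustration. An empty set is a word that is not disambiguated.
noun_use = [("This", set()), ("screen", {"IS_NOUN"}), ("displays", {"IS_VERB"})]
verb_use = [("Who", set()), ("screens", {"IS_VERB"}), ("the", set()), ("drugs", {"IS_NOUN"})]
untagged = [("technicians", {"IS_NOUN"}), ("screen", set()), ("the", set()), ("drugs", {"IS_NOUN"})]

print(flagged_words(noun_use))
print(flagged_words(verb_use))
print(flagged_words(untagged))
```

The third sentence shows how a false positive can occur: if the disambiguation does not give the tag IS_VERB to a verb, the exception does not apply and the rule gives a message.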
In grammar-projectterms.xml, you can use these values with the postag attribute:
- The LanguageTool POS tags, as shown in tagset.txt (https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt).
- The term checker POS tags that identify the part of speech that a word has:
  - IS_ADJECTIVE
  - IS_ADVERB
  - IS_CONJUNCTION
  - IS_NNP (=proper noun)
  - IS_NOUN
  - IS_PREPOSITION
  - IS_PRONOUN
  - IS_VERB
  TechScribe makes continual improvements to the disambiguation, but not all words are disambiguated. Thus, not all words have a POS tag IS_x.
- Other term checker POS tags:
  - NLI_SENT_START. Refer to To specify that text is part of a list item that starts a sentence.
  - PUNCTUATION_QUOTES
  - PUNCTUATION_SEPARATORS
Notes:
- A marker is not necessary in a correct example, but a marker helps to make the example clear.
- Use a minimum of one correct example and one incorrect example. The user sees the first example of each type.
- If you use testrules, the more examples there are with different grammatical structures, the more sure you can be that the rule is correct.
To find the part of speech that a word has
- In LanguageTool, select Text Checking>Tag Text.
- The Tagger Result screen shows the parts of speech that a word has.
To make sure that complex rules are correct, use testrules
If you write complex rules, use testrules to make sure that the rules are correct. Refer to https://dev.languagetool.org/development-overview#testing-rules.
STE rule 1.6 shows that a not-approved STE term can be an approved project term. For example, the word regulation is not approved as a noun, but rule 1.5.15 and the example in people (n) show that it can be a technical name. The word is in the term checker and a rule tells you to make sure that it has the correct meaning.
Not all the not-approved STE terms that can be technical names or technical verbs are in the term checker. For example, the word route as a noun and as a verb is not approved in ASD-STE100 and it is not in the term checker as a technical name or a technical verb. If route is an approved term in your organization and you add the approved inflections of route to disambiguation-projectterms.xml, testrules will give an error message because the term checker has a dictionary rule for the word. The dictionary rule contains an example of incorrect text and it also has an exception for project terms. Thus, there is a conflict, which testrules finds.
Local files version only. To prevent the testrules error message, put the STE rule into comments or delete the rule from grammar-ste8.xml, and change the POS tags in the applicable rules in disambiguation-ste8.xml.
Other customization
You can customize the rules to make other types of language quality-assurance software, for example:
- A term checker for other controlled languages
- A style checker for a style guide, for example, the Microsoft Manual of Style
- A style checker for plain English
- A pre-editor for machine translation.