Package Insert 40
Interesting facts and figures for customers and partners of mt-g.
February 2017
In the spotlight
Terminology Extraction
In the 38th Package Insert, we gave you an overview of terminology management. We would like to build on this with two more articles. In the first, we show you what is meant by terminology extraction.

Where are we heading?

The goal of terminology extraction is to collect and evaluate relevant terms – the term candidates – and to include them in a database. Term extraction naturally takes place in the source language. Once the source-language term candidates have been approved, the target-language equivalents are defined and likewise approved.
In this way, consistency can be ensured as early as the text-production stage. This limits the number of competing source-language variants and reduces the need for extensive corrections later on, during translation, checking and validation.

Sometimes, less is more

One of the first steps is the selection of the appropriate material. Two aspects are important here. First, the material needs to have been thoroughly reviewed and approved; this provides a sound basis and avoids unnecessary variation. Second, the documents should have a sufficient term density, i.e. they should be rich in specialist terms. A shorter document can therefore be the more useful choice if it is particularly rich in such terms.
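To make the idea of term density more tangible, here is a minimal Python sketch. It assumes that density means the share of word tokens covered by terms from an already known list; the article does not define the measure precisely, and the function name `term_density` and the sample terms are purely illustrative.

```python
import re


def term_density(text, known_terms):
    """Return the share of word tokens covered by a known specialist term.

    Assumption: "term density" = covered tokens / all tokens.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0

    # Tokenise the terms the same way; try longer terms first so that
    # "spreading aid" is matched before a shorter overlapping term.
    term_tokens = [re.findall(r"\w+", term.lower()) for term in known_terms]
    term_tokens.sort(key=len, reverse=True)

    covered = 0
    i = 0
    while i < len(tokens):
        for term in term_tokens:
            if term and tokens[i:i + len(term)] == term:
                covered += len(term)
                i += len(term)
                break
        else:
            # No known term starts at this position.
            i += 1
    return covered / len(tokens)


# Three of the six tokens belong to a known term.
print(term_density("The spreading aid and the fork", ["spreading aid", "fork"]))  # 0.5
```

With a measure of this kind, a short but term-rich document can score higher than a long one, which is the reason for preferring it as extraction material.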
When selecting the term candidates themselves, it is a good idea to be economical: not every word should be accepted; only specialist terms are sought. The most common words often have a purely grammatical function and do not designate objects or processes. These stop words are filtered out. The terminologist is mainly interested in designations, usually expressed as nouns:

  • Nominal phrases (automatic bread buttering knife)
  • Compounds (spreading aid)
  • Common and uncommon nouns (fork/pitchfork)
  • Product names (SeBuBroM)

The terminologist decides which terms are retained in the term database. To this end, the term's relationship to the specialist field and to the product is considered as an indicator, and a decision is made on whether to accept the term. The result can then ideally be used as the core vocabulary of a company or of a particular product line. Being economical takes some courage, but it is the right approach: here, quality takes precedence over quantity.
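The selection steps described above (filtering out stop words, then accepting or rejecting each candidate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop list is deliberately tiny, and the names `TermCandidate`, `collect_candidates` and `review` are hypothetical rather than part of any terminology tool.

```python
import re
from dataclasses import dataclass

# Tiny illustrative stop list (assumption); a real project would use a
# fuller, language-specific list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "for", "with", "on"}


@dataclass
class TermCandidate:
    term: str
    status: str = "pending"  # "pending", "accepted" or "rejected"


def collect_candidates(text, stop_words=STOP_WORDS):
    """Return unique non-stop-word tokens as pending candidates,
    in order of first appearance."""
    seen = {}
    for token in re.findall(r"\w+", text.lower()):
        if token in stop_words or token.isdigit():
            continue
        seen.setdefault(token, TermCandidate(token))
    return list(seen.values())


def review(candidates, accepted_terms):
    """Record the terminologist's decision and return the accepted terms."""
    for candidate in candidates:
        candidate.status = "accepted" if candidate.term in accepted_terms else "rejected"
    return [c for c in candidates if c.status == "accepted"]


candidates = collect_candidates("The fork is on the table with a knife")
core_vocabulary = review(candidates, accepted_terms={"fork", "knife"})
print([c.term for c in core_vocabulary])  # ['fork', 'knife']
```

The decision itself stays with a person: the set passed as `accepted_terms` stands in for the terminologist's judgement about the specialist field and the product.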
Which extraction method (manual or automated) is used depends mainly on the department's budget and the available time. Recording the most important terms from central documents by hand in a simple Excel list can already yield a good terminology base. Software-supported terminology extraction is possible with most CAT systems, for example with crossTerm from Across or MultiTerm Extract 2015 from SDL. Some of these programmes also support multilingual term extraction.

Automation and investment in the future

Term-extraction software scans your documents according to certain rules (statistical and linguistic) and creates a list of so-called term candidates, together with their frequency of occurrence. This pre-selection is helpful, but not perfect: the list contains a large number of term candidates that have to be rejected, as well as every possible base form of a word alongside its inflected forms.
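How such a candidate list with frequencies might come about can be illustrated with a rough Python sketch. It does not reproduce how crossTerm or MultiTerm Extract work internally; the stop list, the plural-stripping helper `base_form` and the thresholds `max_len` and `min_freq` are all simplifying assumptions chosen for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "are", "in", "for", "with", "on"}


def base_form(word):
    """Crude stand-in for real lemmatisation: strip a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def extract_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs for word sequences of up to
    max_len words that occur at least min_freq times."""
    counts = Counter()
    # Work sentence by sentence so candidates never span a sentence boundary.
    for sentence in re.split(r"[.!?;:]+", text):
        tokens = [base_form(t) for t in re.findall(r"\w+", sentence.lower())]
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Simplified linguistic rule: a candidate may not start
                # or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Simplified statistical rule: keep only sufficiently frequent candidates.
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


sample = ("The spreading aid is new. Spreading aids are sold with the fork. "
          "The fork is old.")
for term, freq in extract_candidates(sample):
    print(freq, term)
```

On the sample text, "spreading aids" is counted together with "spreading aid", but the list also contains the fragments "spreading" and "aid" on their own. This is the kind of noise that still has to be rejected by hand.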
Finally, it should be mentioned that, depending on the scenario, multilingual term extraction can also be offered, for example when approved content is available in several languages.