Glossary file format

Details about the glossary file structure

You can upload a glossary in XLSX, XLS, or ODS format.

A Matecat glossary can include as many languages as you need, and it supports columns for context information at both the concept level and the term level. All columns other than the locale columns are optional; the minimum accepted format is a file containing at least two locale columns.

template

Click on the image above to download a ready-to-use template with the concept-level columns and the term-level columns for American English and European Spanish.

Permutations

If your glossary has between two and ten locales in it, you will be able to use it for jobs in any of the possible combinations of the locales (e.g. if you have en-US, es-ES and it-IT in a glossary you will be able to use it for en-US <> es-ES, en-US <> it-IT and es-ES <> it-IT jobs).

If your glossary has more than ten locales, Matecat will only create combinations between the first locale column from the left and each of the other locales (e.g. if a glossary has en-US as the first locale from the left, followed by es-ES, it-IT and ten more locales, you will be able to use it for en-US <> es-ES and en-US <> it-IT, but not for es-ES <> it-IT).
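
The two rules above can be sketched in a few lines of Python. This is only an illustration of the behavior described in this section, not Matecat's actual code; the function name `usable_pairs` and the constant `MAX_LOCALES_FOR_ALL_PAIRS` are made up for the example.

```python
from itertools import combinations

# Threshold taken from the rules above; the name is hypothetical.
MAX_LOCALES_FOR_ALL_PAIRS = 10


def usable_pairs(locales):
    """Return the locale pairs a glossary can be used for.

    `locales` lists the locale column headings from left to right.
    Each pair is unordered: ("en-US", "es-ES") covers both directions.
    """
    if len(locales) < 2:
        return []  # a glossary needs at least two locale columns
    if len(locales) <= MAX_LOCALES_FOR_ALL_PAIRS:
        # Two to ten locales: every combination is available.
        return list(combinations(locales, 2))
    # More than ten locales: only pairs that include the first column.
    first = locales[0]
    return [(first, other) for other in locales[1:]]
```

With `["en-US", "es-ES", "it-IT"]` the function returns all three pairs. With en-US followed by twelve other locales it returns twelve pairs, each of which includes en-US.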

Concept-level columns

The first four columns are concept-level columns and apply to all the terms in the corresponding row. The content of these columns is taken into account for all jobs where the glossary is active, regardless of the language combination.

Column A, "Forbidden": use this column to mark terms that should not be used in the target language. To mark the terms in a row as forbidden, enter "TRUE" in column A.

When a row has "TRUE" in this column, the terms in that row should not be used in the translation. How they are flagged depends on how many languages the row contains:

  • If the row only contains a term in one language, the term will be flagged as a forbidden word any time it appears in the target language, regardless of the content of the source. Looking at the example above, the term "Felino" will be flagged as forbidden every time it appears in any job into it-IT.
  • If the row contains terms in multiple languages, a term will only be flagged as forbidden when the source text contains a term from the same row. Looking at the example above, the term "Fish" will be flagged as forbidden only when "Pez" appears in the source of an es-ES > en-US job, and vice versa.
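
As a rough illustration of the two cases above, the sketch below models each glossary row as a Python dictionary and checks a translated segment against it. The data model, the function names and the whole-word, case-insensitive search are all assumptions made for the example; Matecat's real matching is more sophisticated.

```python
import re

# Hypothetical in-memory model of two forbidden rows from the example.
EXAMPLE_ROWS = [
    {"forbidden": True, "terms": {"it-IT": "Felino"}},
    {"forbidden": True, "terms": {"en-US": "Fish", "es-ES": "Pez"}},
]


def contains(text, term):
    """Case-insensitive whole-word search (a simplification)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def forbidden_hits(rows, source_locale, target_locale, source_text, target_text):
    """Return the forbidden terms found in `target_text`."""
    hits = []
    for row in rows:
        terms = row["terms"]
        target_term = terms.get(target_locale)
        if not row["forbidden"] or not target_term:
            continue
        if not contains(target_text, target_term):
            continue
        if len(terms) == 1:
            # Single-language row: flagged whatever the source says.
            hits.append(target_term)
        elif source_locale in terms and contains(source_text, terms[source_locale]):
            # Multi-language row: flagged only if the source has its term.
            hits.append(target_term)
    return hits
```

For an es-ES > en-US job, `forbidden_hits(EXAMPLE_ROWS, "es-ES", "en-US", "El pez nada.", "The fish swims.")` flags "Fish", while the same target with the source "El animal nada." flags nothing.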

Column B, "Domain": in this column you can indicate what domain the terms in the row belong to. It can be used to disambiguate between different domain-dependent translations of the same concept or just to specify the domain where a certain translation belongs.

Column C, "Subdomain": you can use this column to further improve the accuracy of the translation when "Domain" alone is not sufficient.

Column D, "Definition": use this column to provide a definition for the concept.

 

Please note that all the concept-level columns are optional, but if more than one is included in your file, they must follow the Forbidden-Domain-Subdomain-Definition order for the file to be accepted by Matecat.
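
If you generate glossary files with a script, a quick pre-upload check of the column order can be sketched as follows. The function name is hypothetical, and the check assumes the headings are spelled exactly as in this article; it is not the validation Matecat itself runs.

```python
# Required relative order of the concept-level columns.
CONCEPT_ORDER = ["Forbidden", "Domain", "Subdomain", "Definition"]


def concept_columns_in_order(headers):
    """Return True if the concept-level headings present follow the required order.

    `headers` is the first row of the sheet, read from left to right.
    Missing concept columns are fine, since all of them are optional.
    """
    present = [h for h in headers if h in CONCEPT_ORDER]
    expected = [c for c in CONCEPT_ORDER if c in present]
    # A repeated or out-of-order heading makes the two lists differ.
    return present == expected
```

For example, a header row of Domain, Definition, en-US, es-ES passes the check, while Definition, Domain, en-US, es-ES does not.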

Term-level columns

The term-level columns only apply to the terms for a specific locale, so their content will only be taken into account for jobs that have that locale as the source or target.

The term-level columns should be placed to the right of the concept-level columns (if present).

  • Locale column: the heading varies based on the locale the column refers to. Use this column to enter the term for the concept in the locale named in the heading. You can find a list of the available locales here. The glossary is not case sensitive, so if the glossary has "House", Matecat will also match "house" and "HOUSE". For several locales (list available here), Matecat uses advanced matching algorithms that also match inflected forms of words, such as plurals and different verb forms.
  • "Notes" column: use it to add further information about the term in that locale.
  • "Example of use" column: use it to provide a concrete example of use for the translator.

For each locale, the term-level columns should follow the Locale-Notes-Example of use order for the file to be accepted by Matecat.
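
The per-locale order can be checked in the same spirit. The sketch below takes the term-level headings (everything to the right of the concept-level columns) and groups them by locale. It assumes that the optional columns are headed exactly "Notes" and "Example of use" and that any other heading is a locale; both are assumptions for the example, and the function name is hypothetical.

```python
# Optional term-level columns, in the order they must follow a locale column.
TERM_LABELS = ["Notes", "Example of use"]


def split_locale_groups(term_headers):
    """Group term-level headings by locale, or raise ValueError on a bad order."""
    groups = []
    for heading in term_headers:
        if heading not in TERM_LABELS:
            # Any other heading is treated as the start of a new locale group.
            groups.append([heading])
            continue
        if not groups:
            raise ValueError(f"{heading!r} appears before any locale column")
        seen = groups[-1][1:]
        # An optional column may not repeat or follow a later one.
        if any(TERM_LABELS.index(s) >= TERM_LABELS.index(heading) for s in seen):
            raise ValueError(f"{heading!r} is out of order for {groups[-1][0]!r}")
        groups[-1].append(heading)
    if len(groups) < 2:
        raise ValueError("at least two locale columns are required")
    return groups
```

A header such as en-US, Notes, Example of use, es-ES, Notes, Example of use yields two groups of three columns, whereas en-US, Example of use, Notes, es-ES raises an error because "Notes" comes after "Example of use".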

If a term is saved in the glossary with a leading or trailing space by mistake, Matecat automatically trims it to avoid issues on the editor page. This means that whichever form of the term is saved in the glossary file ("House", " House" or "House "), Matecat will underline the term on the editor page.
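
The combined effect of trimming and case-insensitive matching can be pictured with a small normalization helper. This is a minimal sketch with hypothetical function names; it does not cover the matching of inflected forms that Matecat applies for some locales.

```python
def normalize_term(term):
    """Trim surrounding whitespace and ignore case, as described above."""
    return term.strip().casefold()


def same_term(a, b):
    """Return True if two spellings refer to the same glossary term."""
    return normalize_term(a) == normalize_term(b)
```

Under this sketch, "House", " House", "House " and "HOUSE" all normalize to the same value, so they are treated as one term.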