1. Analyze

How Does Matecat Calculate Payable Words?

Matecat relies on a combination of advanced TM technology (MyMemory) and a reduction in word weighting applied for machine translation suggestions. This allows Matecat to reveal more matches than any other translation tool.

According to industry standards, words or phrases with a 100% Translation Memory match are given a weighting of 30% and words or phrases with a partial TM match are given a weighting of 60%.

For Machine Translation, Matecat decides which reduction in weighting to apply for each language pair, based on how useful the MT has been over the past 1 million words.

Matecat assumes that the less the machine translation suggestion is edited by translators, the more useful it is.

We decided to split the benefits of this technology between the language service provider and the translator. For example, if a translator saves 20% of their time, the word count is reduced by only 10%.
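The split described above can be sketched in Python. This is only an illustration: the function name `mt_payable_rate` and the `translator_share` parameter are hypothetical, and the even 50/50 split is an assumption inferred from the 20% to 10% example, not a documented Matecat setting.

```python
def mt_payable_rate(time_saved: float, translator_share: float = 0.5) -> float:
    """Return the payable rate for MT words, given the fraction of time saved.

    translator_share is the part of the saving kept by the translator.
    The default of 0.5 is an assumption based on the 20% -> 10% example.
    """
    if not 0.0 <= time_saved <= 1.0:
        raise ValueError("time_saved must be between 0 and 1")
    # Only the part of the saving not kept by the translator reduces the count.
    discount = time_saved * (1.0 - translator_share)
    return 1.0 - discount


rate = mt_payable_rate(0.20)
print(f"{rate:.0%}")  # 90%
```

With a 20% time saving, the sketch yields a payable rate of 90%, which is the 10% reduction in the example.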

The way Matecat counts payable words in a project is shown in the Volume Analysis Report, which is generated when the project is created and can be downloaded from the same page.

Payable words are marked in bold. They are calculated by multiplying the words for each match type by that type's payable percentage and adding up the results:

  (2*0.6) + (104*0.6) + (7*0.3) + (199*0.8) = 224.9, rounded to 225

The report above shows the following values:

✔️Payable Words

The payable word count is the sum, across all match types, of the word count for each type multiplied by that type's payable percentage (225 in this example).
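As a check on the example figures, the calculation can be reproduced with a short Python sketch. The word counts and percentages come from the formula in this article. The row labels are placeholders, because the formula does not say which match type each term belongs to, and rounding the total (rather than each row) is an assumption that happens to reproduce 225.

```python
# (label, word count, payable percentage) for each term of the example formula.
# The labels are placeholders, not real Matecat match-type names.
rows = [
    ("type_a", 2, 0.6),
    ("type_b", 104, 0.6),
    ("type_c", 7, 0.3),
    ("type_d", 199, 0.8),
]

# Total words: the raw count with no weighting applied.
total_words = sum(words for _, words, _ in rows)

# Weighted words: each count multiplied by its payable percentage.
weighted = sum(words * rate for _, words, rate in rows)

# Assumption: the total is rounded to the nearest whole word.
payable_words = round(weighted)

print(total_words)         # 312
print(round(weighted, 1))  # 224.9
print(payable_words)       # 225
```

The raw counts add up to 312, which matches the Total Words figure for the same example.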

✔️Total Words

The total word count before any content is leveraged from translation memory matches, repetitions, or machine translation. This is similar to the word count that Microsoft Word would give for a .doc or .docx file (312 in this example).

✔️New Words

All words found in segments that:

  • have no fuzzy or exact match in the private TM key and/or in the public TM;
  • are not repeated in the project;
  • do not have a suggestion from a machine translation.

✔️Repetitions

The number of words in identical segments that occur more than once in the project.

For example, imagine that we find the following segments in our translation:

  • My house is blue.
  • My house is blue.

Segments 1 and 2 are identical, so the 4 words of the repeated segment are counted as repetitions.
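The counting in this example can be sketched as follows. This is a simplified illustration, not Matecat's implementation: the function name is hypothetical, and it assumes that words are split on whitespace, that segments are compared as exact strings, and that only the second and later occurrences of a segment count as repetitions.

```python
from collections import Counter


def count_repetition_words(segments):
    """Count the words in every occurrence of a segment after its first."""
    seen = Counter()
    repeated_words = 0
    for segment in segments:
        if seen[segment] > 0:
            # This exact segment has already appeared earlier in the project.
            repeated_words += len(segment.split())
        seen[segment] += 1
    return repeated_words


segments = ["My house is blue.", "My house is blue."]
print(count_repetition_words(segments))  # 4
```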

✔️Internal Matches

Internal matches are similar segments found in the document you are translating. For example, imagine that we find the following segments in our translation:

  • My house is blue.
  • My house is red.

Matecat recognizes that segment 2 is similar to segment 1 (3 words out of 4 are identical) so the following would occur during translation:

  • You translate segment 1.
  • The translation memory is updated with this translation.
  • You open segment 2 for translation.
  • A search is performed in the TM for segment 2 and an 80% fuzzy match is found.

Matecat, therefore, counts four new words for segment 1 but, because an 80% fuzzy match has been found in the TM, the four words in segment 2 are counted as Internal Matches (in terms of weighted words, these are counted as 2.4, or 60% of 4).

✔️Partial TM

In this case, Matecat calculates the similarity between each segment of the document to be translated and any similar segments found in the translation memory (fuzzy matches).

For example, imagine that we have the following segments in an EN>FR translation:

  • My house is blue.
  • My house is red.

and that the TM contains this:

  Source: My house is blue.
  Target: Ma maison est bleue.

For segment 1 (My house is blue), there will be a 100% match in the translation memory. For segment 2 (My house is red), there will be a 75% fuzzy match: segment 2 and the segment found in the translation memory differ only in the colour of the house (blue/red, bleue/rouge), so in 1 word out of 4.
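The "1 word out of 4" reasoning can be illustrated with a naive word-by-word comparison. This is a minimal sketch under stated assumptions, not the scoring used by Matecat or MyMemory: the function `word_match_ratio` is hypothetical, it compares words position by position, and it ignores case and the final full stop. Real fuzzy-match scores may differ.

```python
def word_match_ratio(segment: str, tm_source: str) -> float:
    """Share of word positions that hold the same word in both segments."""
    a = segment.rstrip(".").lower().split()
    b = tm_source.rstrip(".").lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


print(word_match_ratio("My house is blue.", "My house is blue."))  # 1.0
print(word_match_ratio("My house is red.", "My house is blue."))   # 0.75
```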

✔️100% TM

This is a 100% match between a segment in the source language found in the document to translate and an identical sentence found in the source language in the private translation memory.

✔️100% Public TM

This is a 100% match between a segment in the source language found in the document to translate and an identical sentence found in the source language in the public translation memory.

✔️100% TM in Context

This is more than a 100% match.

If you have a 100% context match (the corresponding label is “101%”), this means that both of the following conditions are met:

  • the segment in the document has a 100% match in the translation memory;
  • the segment in the document and the segment in the translation memory are both preceded by the same segment.

In practice, this means that context information is stored in the translation memory as metadata, so each segment stored in the TM contains the following information (which is not visible to users in the translation memory contents):

  • the source segment;
  • the corresponding translation;
  • the segment preceding the segment itself.
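The two conditions and the stored metadata can be sketched as a small lookup. This is an illustration only: the `TMEntry` class, the `match_label` function, and the sample "Welcome." context segment are hypothetical and do not describe how Matecat stores or queries its translation memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    source: str                     # the source segment
    target: str                     # the corresponding translation
    previous_source: Optional[str]  # the segment that preceded the source


def match_label(segment, previous_segment, tm):
    """Return "101%", "100%", or None for a segment looked up in the TM."""
    exact = [entry for entry in tm if entry.source == segment]
    if not exact:
        return None  # no exact match; fuzzy matching is out of scope here
    if any(entry.previous_source == previous_segment for entry in exact):
        return "101%"  # exact match with the same preceding segment
    return "100%"


tm = [TMEntry("My house is blue.", "Ma maison est bleue.", "Welcome.")]
print(match_label("My house is blue.", "Welcome.", tm))  # 101%
print(match_label("My house is blue.", "Goodbye.", tm))  # 100%
print(match_label("My house is red.", "Welcome.", tm))   # None
```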

When a segment has an in-context exact match, the target segment has a green bar to its right. This means that you do not need to translate or approve the segment.

A lock icon also appears on the left; translators can use it to unlock the segment and modify it if necessary.