Introducing Corpus: how to manage the Translation Memory and the Term Base



Introducing Corpus



Learning Objectives

After completing this activity, you will be able to:

  • Manage the Translation Memories and Term Bases of your different projects.

Duration: 15 minutes


Maintain the CAT tools to help Linguistic Producers deliver quality translations


Wezen includes CAT tools that help translators and reviewers work better and faster. It currently provides the following CAT tools:

  1. Translation Memory
  2. Term Base


Your responsibilities regarding the Translation Memory

  • Check that entries in the source-target language pairs you manage are correct and free of mistakes (spelling, spacing).
  • Add/delete Translation Memory entries.
  • Import/export Translation Memories.


Your responsibilities regarding the Term Base

  • Check that terms are correctly linked, so that a selected term in a specific language is translated correctly into another language.
  • Add/delete terms.
  • Import/export Term Bases.


Translation Memory


A Translation Memory (TM) is a database that stores source segments together with their possible translations, which users rely on when processing segments in the Translation Studio. To access the administration interface of the TM connected to the selected project, go to the “Translation Memory” window on your Homepage dashboard and select the Translation Memory you want to work on.



Inside a Translation Memory, entries are grouped by their source language. To view the entries for a given source language, click the corresponding flag or select the language in the “source language” filter.


An entry is composed of three main attributes:

  • Source: contains a source segment.
    For example: source segment in English (language code: en-US) "Style them with a statement blazer and kick-flare jeans for an alternative evening ensemble."


  • Translation: contains possible translations of the source segment. These possible translations can differ by a few words, a few characters, or be completely different.
    For example: possible translations in French (language code: fr-FR).
    Translation 1: "Portez-le avec un blazer et un jean évasé pour un ensemble de soirée alternatif."
    Translation 2: "Portez-le avec un blazer et un jean évasé, afin d'avoir le look parfait pour une soirée alternative."


  • Status: indicates which type of user validated the possible translation of the source segment in the Translation Studio. The possible values are:
      • Translated: the entry was proposed by a translator.
      • ApprovedTranslation: the entry was proposed or approved by a reviewer. It should be preferred over translations with the Translated status.
      • ApprovedSignOff: the entry was created or modified by a customer. It should be preferred over translations with the ApprovedTranslation and Translated statuses.
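The precedence between statuses can be pictured as a simple ranking. The Python sketch below is illustrative only: `TmEntry`, `STATUS_RANK` and `preferred_translation` are hypothetical names, not part of Wezen, and the sketch ignores any other criteria Wezen may apply, such as the Wezen Score.

```python
from dataclasses import dataclass

# Hypothetical ranking: a higher number means a more trusted status.
STATUS_RANK = {"Translated": 1, "ApprovedTranslation": 2, "ApprovedSignOff": 3}


@dataclass
class TmEntry:
    """One possible translation of a source segment, with its status."""
    source: str
    translation: str
    status: str


def preferred_translation(entries: list[TmEntry]) -> TmEntry:
    """Return the entry with the highest-ranked status; on a tie, the first one wins."""
    return max(entries, key=lambda entry: STATUS_RANK[entry.status])


candidates = [
    TmEntry("Hello", "Bonsoir", "Translated"),
    TmEntry("Hello", "Bonne matinée", "ApprovedTranslation"),
    TmEntry("Hello", "Bonjour", "ApprovedSignOff"),
]
print(preferred_translation(candidates).translation)  # prints "Bonjour"
```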


Other attributes of an entry are available in the Translation Memory management interface:

  • Target language: flag and language code corresponding to the language of the translation for that segment
  • Comments: displays any comments added to the translation of a segment, either during translation or by a Manager
  • Creator: displays the username of the user who created the translation of that segment
  • Creation date (Source): displays the date of the source segment’s creation
  • Creation date (Target): displays the date of the translation’s creation
  • Last editor: displays the username of the last user to edit this translation
  • Last edit date: displays the date on which the translation was last edited
  • Last user: displays the username of the last user to use this translation during the Localization process
  • Last usage date: displays the date on which the translation was last used during the Localization process


Entries have other fields as well; for more detailed explanations, please check the dedicated Helpdesk article.

Whenever a user sends a task to the next status in the workflow, the TM is updated with all the segments from that task:

  • If the source segment is not present in the TM, a new entry is added into the TM for the pair of source-target languages related to the user. "Source" will be filled with the source segment content, "Translation" with the target segment content, and the "Status" will depend on the type of the user who validated the target segment.
  • If the source segment is already present in the TM and if the target segment was validated with an existing entry in the TM, the "Number of usage" attribute of the used translation is incremented by one, and its Wezen Score is updated.
  • If the source segment is already present in the TM and if the target segment was validated without using an existing entry in the TM, a new possible translation is added to the source segment in the TM.
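The three update rules can be summarized in a short Python sketch. It is a simplified model, not Wezen's implementation: the TM for one language pair is assumed to be a plain dictionary, "validated with an existing entry" is approximated by an exact match on the target text, the starting usage count of 0 is an assumption, and the Wezen Score update is not modeled. `update_tm` and `update_tm_from_task` are hypothetical names.

```python
def update_tm(tm: dict, source: str, target: str, status: str) -> None:
    """Record one validated (source, target) pair in a TM for one language pair."""
    # Rule 1: an unknown source segment gets a new, empty entry.
    translations = tm.setdefault(source, {})
    if target in translations:
        # Rule 2: the target reuses an existing translation, so count one more usage.
        # (The Wezen Score update is not modeled here.)
        translations[target]["usage"] += 1
    else:
        # Rules 1 and 3: store the target as a new possible translation.
        translations[target] = {"status": status, "usage": 0}


def update_tm_from_task(tm: dict, segments: list[tuple[str, str]], status: str) -> None:
    """Apply update_tm to every segment of a task sent to the next workflow status."""
    for source, target in segments:
        update_tm(tm, source, target, status)
```

For example, after `update_tm_from_task(tm, [("Hello", "Bonjour"), ("Hello", "Bonjour")], "Translated")` on an empty `tm`, the "Bonjour" translation has a usage count of 1.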



Edit entries in the Translation Memory

To edit entries in the Translation Memory, you can click on them and select or type in the desired value. Editable fields in a Translation Memory entry are the source content, target content, comments and status.



Add/Delete entries in the Translation Memory

You can add an entry by creating a new source segment or by adding a translation to an existing source segment. To create a new source segment, click the “+” button in the toolbar. To create a new translation for an existing source segment, click the “+” button on the right side of the entry. However, TM entries are mainly added through the contributions of linguistic producers and customers as they process segments in the Translation Studio, so you will rarely need to add entries manually.



Deleting wrong translations ensures that the TM provides linguistic producers with quality translation suggestions. You can delete one or multiple entries at once by selecting them and clicking on "Delete" in the toolbar. Confirm the deletion by clicking on the "Confirm" button in the popup which will appear.



 

Import/Export Translation Memories


For now, Translation Memories can only be imported if they are in .tmx format, and exported as a .tmx file. To import entries into the Translation Memory using a .tmx file, please create a ticket on the Wezen Helpdesk.

To import entries into the Translation Memory using a .tmx file, please first make sure that:

1) The language codes in the <tuv xml:lang="[language code]"> tags are the same as those used on Wezen (e.g. fr-FR, en-US).

2) Each <tu> tag contains a <prop type="x-ConfirmationLevel"></prop> element whose value is one of:

  • Translated
  • ApprovedTranslation
  • ApprovedSignOff 
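Both checks can be automated before a file is imported. The following Python sketch uses the standard library's `xml.etree.ElementTree`; `check_tmx` is a hypothetical helper, not a Wezen tool, and the set of language codes you pass in is assumed to be the list of codes actually used on your Wezen instance.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ALLOWED_LEVELS = {"Translated", "ApprovedTranslation", "ApprovedSignOff"}


def check_tmx(xml_text: str, wezen_codes: set[str]) -> list[str]:
    """Return the problems found in a .tmx document; an empty list means none."""
    root = ET.fromstring(xml_text)
    problems = []
    for number, tu in enumerate(root.iter("tu"), start=1):
        # Check 1: every <tuv> uses a language code known to Wezen.
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            if lang not in wezen_codes:
                problems.append(f"tu #{number}: language code {lang!r} is not used on Wezen")
        # Check 2: the <tu> carries an x-ConfirmationLevel with an accepted value.
        levels = [
            (prop.text or "").strip()
            for prop in tu.findall("prop")
            if prop.get("type") == "x-ConfirmationLevel"
        ]
        if not levels or levels[0] not in ALLOWED_LEVELS:
            problems.append(f"tu #{number}: missing or invalid x-ConfirmationLevel")
    return problems
```

For example, `check_tmx(Path("memory.tmx").read_text(encoding="utf-8"), {"en-US", "fr-FR"})` (with `Path` from `pathlib`) returns an empty list when every translation unit passes both checks.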

Below is an example of a Translation Memory .tmx file that is supported by Wezen:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tmx>
    <header></header>
    <body>
        <tu usagecount="4" lastusagedate="20180207T175440Z" creationdate="20180207T174655Z" creationid="rkhau" changedate="20180207T175440Z" changeid="rkhau">
            <prop type="x-ConfirmationLevel">ApprovedSignOff</prop>
            <tuv xml:lang="en-US">
                <seg>Hello</seg>
            </tuv>
            <tuv xml:lang="fr-FR">
                <seg>Bonjour</seg>
            </tuv>
        </tu>
        <tu usagecount="4" lastusagedate="20180207T175440Z" creationdate="20180207T174655Z" creationid="rkhau" changedate="20180207T175440Z" changeid="rkhau">
            <prop type="x-ConfirmationLevel">Translated</prop>
            <tuv xml:lang="en-US">
                <seg>Hello</seg>
            </tuv>
            <tuv xml:lang="fr-FR">
                <seg>Bonsoir</seg>
            </tuv>
        </tu>
        <tu usagecount="4" lastusagedate="20180207T175440Z" creationdate="20180207T174655Z" creationid="rkhau" changedate="20180207T175440Z" changeid="rkhau">
            <prop type="x-ConfirmationLevel">ApprovedTranslation</prop>
            <tuv xml:lang="en-US">
                <seg>Hello</seg>
            </tuv>
            <tuv xml:lang="fr-FR">
                <seg>Bonne matinée</seg>
            </tuv>
        </tu>
    </body>
</tmx>
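If your translations come from another source, such as a spreadsheet export, a file with this structure can be generated by a script. This Python sketch builds the `<tmx>`, `<header>`, `<body>`, `<tu>`, `<prop>`, `<tuv>` and `<seg>` elements with `xml.etree.ElementTree`; `build_tmx` is a hypothetical helper, and optional attributes such as `usagecount` or `creationdate` are left out.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: list[tuple[str, str, str]], source_lang: str, target_lang: str) -> str:
    """Build a .tmx document from (source, target, confirmation level) triples."""
    tmx = ET.Element("tmx")
    ET.SubElement(tmx, "header")
    body = ET.SubElement(tmx, "body")
    for source, target, level in pairs:
        tu = ET.SubElement(body, "tu")
        ET.SubElement(tu, "prop", {"type": "x-ConfirmationLevel"}).text = level
        # One <tuv> per language, source first, each wrapping a <seg>.
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    return declaration + ET.tostring(tmx, encoding="unicode")
```

The result can be saved with `Path("memory.tmx").write_text(build_tmx(...), encoding="utf-8")` so that the file matches its UTF-8 declaration.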


Once your .tmx file is ready to be imported, simply head to the Translation Memory section of the target project, select the pair of languages, and click on the "Import" button to upload your .tmx file:


Below is the result of the uploaded .tmx file:




Term Base


The Term Base module is where all the term bases of a specific Wezen instance are stored. A Term Base (TB) can be seen as a multilingual dictionary where terms in different languages are linked with one another. To access the administration interface of the TB connected to the selected project, head to the "Term Base" section in the "Tools" panel. 


For a word in a given source language, its translations in other languages are defined in the TB. For example, "jacket" in English (en-US), "veste" in French (fr-FR) and "giacca" in Italian (it-IT) are linked with one another in the TB.

When translating from English to Italian, if the source content contains the word "jacket", the TB suggests "giacca" as a possible translation. When translating from Italian to French, the TB suggests translating "giacca" as "veste".
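Conceptually, each group of linked terms behaves like one record holding a term per language. The Python sketch below models that idea with a list of dictionaries; `TERM_BASE` and `suggest` are hypothetical, the case-insensitive match is an assumption, and unlike a real Term Base the sketch allows only one term per language in each group.

```python
# Hypothetical Term Base: each dict is one group of linked terms,
# keyed by the language codes used on Wezen.
TERM_BASE = [
    {"en-US": "jacket", "fr-FR": "veste", "it-IT": "giacca"},
]


def suggest(word: str, source_lang: str, target_lang: str) -> list[str]:
    """Return the target-language terms linked to `word` (case-insensitive match)."""
    return [
        group[target_lang]
        for group in TERM_BASE
        if group.get(source_lang, "").lower() == word.lower() and target_lang in group
    ]


print(suggest("jacket", "en-US", "it-IT"))  # prints ['giacca']
print(suggest("giacca", "it-IT", "fr-FR"))  # prints ['veste']
```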



Each term has a status:

  • To be approved: the term was created in the Translation Studio and needs to be approved by a manager.
  • Approved: the translation can be used.
  • Preferred: the translation must be used instead of the approved ones.
  • Rejected: the translation must not be used.
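Read as a usage rule, these statuses filter and order the terms a linguist should insert. The Python sketch below is a hypothetical illustration of that reading (`usable_terms` is not a Wezen function): Preferred terms come first, Approved terms next, and Rejected and To be approved terms are left out, the latter on the assumption that a term still awaiting a manager's approval should not be relied on yet.

```python
# Hypothetical reading of the statuses: a lower number means "insert first".
# "Rejected" and "To be approved" are absent, so those terms are dropped.
USABLE_ORDER = {"Preferred": 0, "Approved": 1}


def usable_terms(candidates: list[tuple[str, str]]) -> list[str]:
    """Keep usable (term, status) pairs and sort them, Preferred before Approved."""
    usable = [c for c in candidates if c[1] in USABLE_ORDER]
    usable.sort(key=lambda c: USABLE_ORDER[c[1]])  # stable sort: ties keep input order
    return [term for term, _ in usable]


print(usable_terms([("veste", "Approved"), ("blouson", "Rejected"), ("veston", "Preferred")]))
# prints ['veston', 'veste']
```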


A term can contain information such as a definition, notes, context and search volumes (SEO), which help linguistic producers work even more efficiently.



Add/Delete a Term in the Term Base


You can add a new linked term for a selected term by clicking on the “+” button that appears on the right when hovering over an entry, or delete a term by clicking on the bin button that appears next to the “+” button.


You can also add a new term without linking it to other terms by clicking the “+” button in the toolbar and filling in its information.

 

Edit information about a Term

You can edit a term and its information by clicking the different fields and pressing Enter to validate the changes.




Import/Export Term Bases


For now, Term Bases can only be imported if they are in .tbx format, and exported as a .tbx file. 

To import entries into the Term Base using a .tbx file, please first make sure that:

1) The language codes in the <langSet xml:lang="[language code]"> tags are the same as those used on Wezen (e.g. fr-FR, en-US).

2) If you want to attach additional attributes such as "definition", "context", "notes" or "status" to a term, each attribute is placed in a <descrip> tag wrapped in a <descripGrp> tag, as below:

<termEntry id="5acf6c7e544fc42185256896">
    <langSet xml:lang="en-US">
        <tig id="5acf6c7e544fc42185256898">
            <term>Hello</term>
            <descripGrp>
                <descrip type="definition"></descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="context"></descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="notes"></descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="status">Approved</descrip>
            </descripGrp>
        </tig>
    </langSet>
    <langSet xml:lang="fr-FR">
        <tig id="5acf6c7e544fc42185256897">
            <term>Bonjour</term>
            <descripGrp>
                <descrip type="definition"></descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="context">Used during morning</descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="notes"></descrip>
            </descripGrp>
            <descripGrp>
                <descrip type="status">Approved</descrip>
            </descripGrp>
        </tig>
    </langSet>
</termEntry>

Please note that the "status" values can be one of the following:

  • Preferred 
  • Approved
  • Rejected

Below is an example of a .tbx Term Base, containing the terms "Hello" (en-US) and "Bonjour" (fr-FR), that is supported by Wezen:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<martif type="TBX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="TBXcsV02.xsd">
    <martifHeader>
        <fileDesc/>
    </martifHeader>
    <text>
        <body>
            <termEntry id="5acf6c7e544fc42185256896">
                <langSet xml:lang="en-US">
                    <tig id="5acf6c7e544fc42185256898">
                        <term>Hello</term>
                        <descripGrp>
                            <descrip type="definition"></descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="context"></descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="notes"></descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="status">Approved</descrip>
                        </descripGrp>
                    </tig>
                </langSet>
                <langSet xml:lang="fr-FR">
                    <tig id="5acf6c7e544fc42185256897">
                        <term>Bonjour</term>
                        <descripGrp>
                            <descrip type="definition"></descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="context">Used during morning</descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="notes"></descrip>
                        </descripGrp>
                        <descripGrp>
                            <descrip type="status">Approved</descrip>
                        </descripGrp>
                    </tig>
                </langSet>
            </termEntry>
        </body>
    </text>
</martif>


Once your .tbx file is ready to be imported, simply head to the Term Base section of the target project, and click on the "Import" button to upload your .tbx file. 

In the dropdown list, select the languages for which the terms in your .tbx file should be imported.


Wezen includes a translation workbench, called the Translation Studio, in which translators work on their translation tasks. The following section shows how your maintenance of the CAT tools is reflected in the Translation Studio.



Use of the TM and TB inside the Translation Studio



To open the Translation Studio, click the "Open" button of a translation task (in the Tasks screen). Only the user assigned to the task can edit the target segments. The Translation Studio is an environment that includes Computer Assisted Translation (CAT) tools to speed up the translation process. Its interface differs depending on the status of the translation batch: when the batch is in the "Translation", "Review" or "Correction" step, all the CAT tools are present; when it is in "Validation" or "Post-Edition", the "Translation Memory" and "Term Base" are not visible.


The main sections you need to know are the following:

 


Translation Memory

Wezen fetches previously translated segments that fully or partially match the source segment currently being processed, and automatically fills in the target segment when a match is 70% or above, so that translators do not have to translate from scratch. As soon as a segment is validated by a user, the Translation Memory is updated.
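The auto-fill rule can be pictured as "find the closest TM source, and use its translation only if the score reaches 70%". The Python sketch below uses `difflib.SequenceMatcher.ratio()` from the standard library as a stand-in similarity measure; Wezen's actual matching algorithm is not described here and may differ. `auto_fill` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Optional

AUTO_FILL_THRESHOLD = 0.70  # matches of 70% or above fill the target segment


def auto_fill(source: str, tm: dict[str, str]) -> Optional[str]:
    """Return the translation of the closest TM source if it scores >= 70%, else None."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        # Character-level similarity in [0, 1]; a stand-in for Wezen's own score.
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_target if best_score >= AUTO_FILL_THRESHOLD else None


memory = {"Hello": "Bonjour", "Style them with a blazer.": "Portez-le avec un blazer."}
print(auto_fill("Hello!", memory))  # prints "Bonjour" (ratio 10/11, about 0.91)
```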

Translators and reviewers can pick another entry in the list by clicking the icon at the right of the entry.



You can also search for entries in the Translation Studio by typing words in the Translation Memory search bar. To switch between searching in the source language and in the target language, click this button:


NB: please note that customers do not have access to the Translation Memory tool in the Translation Studio.


Term Base

Wezen looks up the words of the source segment currently being processed in the Term Base. For each word, the possible translations are listed, some of which may have been marked as preferred, approved or rejected by the client. Term statuses appear next to the terms in the widget.

Translators and reviewers can pick an entry in the list and add it to the target segment by clicking the icon at the right of the entry. Clicking the “i” icon next to a term shows more details, such as the definition, notes, context and comments:


They can also search for entries in the Term Base by typing words in the Term Base search bar and toggling between source language search and target language search.

NB: please note that customers do not have access to the Term Base tool in the Translation Studio.