Caveats of machine translation

Posted on September 27, 2022

by Oliver Czulo, Venema Victor, Jo Havemann, Jennifer Miller and Dasapta Irawan

Translation tools (often referred to as CAT tools, for ‘computer-aided translation’) are a great means of streamlining some elements of the translation process, such as checking terminology or retrieving existing translations (from so-called translation memories). Modern versions of these tools allow for web-based, collaborative translation, giving collaborators such possibilities as revising and/or commenting on proposed translations, evaluating existing translations or adding machine translation (henceforth MT) support. Modern MT systems are based on artificial neural networks, which have boosted quality considerably since roughly the mid-2010s.

CAT tools, with or without added machine translation support, have been studied from various angles. While they generally increase efficiency and often ease the task of translation, as translators do not have to start from scratch, there are some caveats to keep in mind when working with them. Here are some of the more important ones.

A major problem which has been described is lack of consistency. This extends not only to the terminological level, as shown, e.g., by Čulo & Nitzke (2016); a system may also suddenly change its output style, for instance by switching between different forms of addressing readers. A problem which is sometimes also attributed to how CAT tools display source and target text (mostly in sentence-sized segments, aligned left-to-right) is that translators do not necessarily spot these inconsistencies: a sort of peephole effect arises, as they check sentence by sentence and thus do not easily perceive the text as a whole in their revision. Sentence-by-sentence evaluation is also the reason why MT systems often used to score better in evaluations than they deserved, and sometimes still do (see, e.g., Castilho 2021; Krüger 2022): when only translations of single sentences are checked, inconsistencies across sentences are not spotted and thus not penalised.
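As a rough illustration of how such terminological inconsistencies could be surfaced across a whole document rather than sentence by sentence, here is a minimal Python sketch. The function name `find_inconsistent_terms`, the glossary format and the sample segments are all invented for this example and do not correspond to any particular CAT tool. The matching is plain substring search, so it will miss inflected forms and may over-match (e.g. 'file' inside 'profile').

```python
from collections import defaultdict


def find_inconsistent_terms(segments, glossary):
    """Return source terms that were rendered with more than one target variant.

    segments: list of (source_sentence, target_sentence) pairs.
    glossary: dict mapping a source term to a list of possible target variants
              (a hypothetical format chosen for this sketch).
    """
    seen = defaultdict(set)
    for source, target in segments:
        source_lower = source.lower()
        target_lower = target.lower()
        for term, variants in glossary.items():
            # Only look at segments whose source contains the term.
            if term.lower() not in source_lower:
                continue
            for variant in variants:
                if variant.lower() in target_lower:
                    seen[term].add(variant)
    # Keep only terms with two or more different renderings in the document.
    return {term: sorted(found) for term, found in seen.items() if len(found) > 1}


# Invented sample data: 'file' is rendered once as 'Datei', once as 'Dokument'.
segments = [
    ("Save the file before closing.", "Speichern Sie die Datei vor dem Schließen."),
    ("The file is stored locally.", "Das Dokument wird lokal gespeichert."),
    ("Please restart your computer.", "Bitte starte deinen Rechner neu."),
]
glossary = {"file": ["Datei", "Dokument"]}

result = find_inconsistent_terms(segments, glossary)
print(result)
```

The point of the sketch is only that the check runs over all segments at once, which is exactly the view a segment-by-segment editor does not give you.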

A second very serious problem, as known from other fields of AI, is that neural MT systems reproduce biases that are implicitly or explicitly encoded in the training texts, a notable issue being gender bias. This shows when translating from a language that has little or no grammatical gender, such as English, into a language such as German, which differentiates between a grammatical ‘masculine’, ‘feminine’ and ‘neuter’ gender (which often, but not necessarily, coincide with (supposed) biological sex for nouns referring to humans): Try translating “sexy pianist” and “clever pianist” into German with MT systems like DeepL. At the time of writing the first version of these notes, the former translates into “sexy Pianistin” (feminine gender), the latter into “geschickter Pianist” (masculine gender). Also, gendering across a text can be wildly inconsistent. Moreover, reflecting the non-deterministic and adaptive nature of such systems, the results can vary not only between systems, but even for one system over time.

Third, watch out for missing or even spurious additional text. Koehn and Knowles (2017) describe some of the challenges of early neural machine translation research. Some of these have been addressed in the meantime, but an important one remains: MT hallucination, also called MT fiction. Neural MT systems basically operate by trying to predict the most likely next output based on the previous input (which, in principle, is the same mechanism that allows for search completion in a web search bar). Take a moment to reflect on the options you are given in a search query completion: some of them may be very fitting, others quite nonsensical. Modern MT systems have become very good at picking out the fitting options, but when they cannot ‘make sense’ of the input, they may omit something, simply try to ‘guess’, or even add material that is not there in the source text.
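One cheap way to get a first hint of omissions or additions is to compare the length of each source segment with its translation. The Python sketch below does this with word counts. The function name `flag_length_outliers`, the thresholds `low=0.5` and `high=2.0` and the sample segments are assumptions made for this illustration, not established values; sensible thresholds depend on the language pair. Such a check cannot replace reading the text, since a hallucinated sentence of normal length passes unnoticed.

```python
def flag_length_outliers(segments, low=0.5, high=2.0):
    """Flag segments whose target/source word ratio looks suspicious.

    segments: list of (source_sentence, target_sentence) pairs.
    low, high: illustrative thresholds (assumptions, tune per language pair).
    Returns a list of (segment_index, label, ratio) tuples.
    """
    flagged = []
    for index, (source, target) in enumerate(segments):
        source_words = len(source.split())
        target_words = len(target.split())
        if source_words == 0:
            continue  # nothing to compare against
        ratio = target_words / source_words
        if ratio < low:
            flagged.append((index, "possible omission", round(ratio, 2)))
        elif ratio > high:
            flagged.append((index, "possible addition", round(ratio, 2)))
    return flagged


# Invented sample data: segment 1 drops content, segment 2 adds content.
segments = [
    ("The samples were dried and weighed before analysis.",
     "Die Proben wurden vor der Analyse getrocknet und gewogen."),
    ("The samples were dried at room temperature and weighed before analysis.",
     "Die Proben wurden gewogen."),
    ("See Table 2.",
     "Siehe Tabelle 2 für weitere Einzelheiten zu den Messungen."),
]

flagged = flag_length_outliers(segments)
for index, label, ratio in flagged:
    print(f"segment {index}: {label} (target/source word ratio {ratio})")
```

Flagged segments are merely candidates for a closer look; a human still has to decide whether something was actually lost or invented.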

Last but not least, data ethics should be raised as an issue here. Note that for web-based CAT tools and/or machine translation systems (including those that you can plug into your locally installed CAT tool), the source text will be copied over to and processed by multiple other machines. Even if you have permission to produce a translation that is accessible under more liberal terms, this can technically be a violation of copyright for the source text if it falls under stricter copyright terms. Anonymization of people, which may not have been much of an issue for printed, narrowly distributed material, can also pose an issue in such settings, even if you choose to perform anonymization for the target text. Ecological matters may apply as well, giving rise to the question of how often and at which stage(s) MT should be used: it requires, after all, quite a bit of computing power. For a more in-depth discussion of ethics and the use of machine translation, see Moorkens (2022).

This is the second post to help people make and publish translations of scientific works. The first post gave some “theoretical background to translation”.

References

Castilho, Sheila. 2021. ‘Towards document-level human MT evaluation: On the issues of annotator agreement, effort and misevaluation’. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), 34–45. Online: Association for Computational Linguistics. https://aclanthology.org/2021.humeval-1.4.

Čulo, Oliver, and Jean Nitzke. 2016. ‘Patterns of Terminological Variation in Post-Editing and of Cognate Use in Machine Translation in Contrast to Human Translation’. Baltic Journal of Modern Computing 4 (2): 106–14. https://aclanthology.org/W16-3401.pdf

Koehn, Philipp, and Rebecca Knowles. 2017. ‘Six Challenges for Neural Machine Translation’. In Proceedings of the First Workshop on Neural Machine Translation, 28–39. Vancouver: Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-3204.

Krüger, Ralph. 2022. ‘Some Translation Studies Informed Suggestions for Further Balancing Methodologies for Machine Translation Quality Evaluation’. Translation Spaces, March. https://doi.org/10.1075/ts.21026.kru.

Moorkens, Joss. 2022. ‘Ethics and Machine Translation’. In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 121–40. Translation and Multilingual Natural Language Processing 18. Language Science Press. https://zenodo.org/record/6653406.

Top photo of robot by Hello Robotics. This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

Theoretical background to translation

Posted on September 19, 2022

by Oliver Czulo, Jennifer Miller, Jo Havemann, Venema Victor and Dasapta Irawan

This is the first of two blog posts with general notes on how to approach the task of translating science, touching upon the most prevalent basic notions and advice relevant to the task. The two main points presented in this and the upcoming post are (a) an introduction to a present-day functionalist view of translation, which provides for a wide range of purpose-driven strategic translation options, and (b) key caveats when making use of digital support tools for translation, including machine translation. These general notes are meant for people who read academic texts at a postgraduate level and have experience with scholarly publishing, but may have little to no experience in translation. As technological tools are nowadays omnipresent in translation processes, they are included here under basic background to translation.

Translation

Translation is a cluster concept (Tymoczko 2005) that is constituted by various cultural practices with complex overlapping similarities. This includes what is sometimes referred to as ‘translation proper’, i.e. ‘transferring’ a (mostly) written source text in one language to a target text in another language. Interpreting, i.e. the ‘transfer’ of (mostly) spoken language, is part of the cluster concept, just as localization – of software, video games and the like – or sur-/subtitling, transcreation etc. are. In the following, the terms translation and translate shall include all these practices.

On a side note: It is exactly this understanding of translation as a cluster of cultural practices which opens up the possibility of studying not just the linguistic differences between two texts, but the whole range of patterns of practices and power concerning translation, including, but not limited to, such questions as what is translated and who commissions translations, what conscious and subconscious translation strategies are being taught and applied, how censorship and translation interact, etc. This wiki page introduces key concepts and issues that inform the pragmatics of translating a specific text.

Functionalism and translation strategies

Functionalist theories of translation (see, e.g., Vermeer 1989) have highlighted that translation is a purposeful activity, i.e. it is text production with a goal and an audience, with a precursor, the source text, which may require different levels of adaptation to the target culture. Nord (1989) introduces the spectrum between documentary and instrumental translation: The former is meant to highlight the original make-up of the source text, with interlinear glosses being an extreme form; the latter aims at producing a text which is meant to act as a target culture text and should not be distinguishable from original texts. All in all, a functionalist approach to translation offers us a wide array of translation strategies, keeping in mind that, following Nord, we should remain loyal to both the creators of the source text and the intended audience of the target text.

Applied to the purpose of translating science, you might ask yourself, for instance, how to go about subtitling a video which introduces a scientific topic. While, of course, you will want to get the terminology and the science right, do think about what the idea of the source text is: Is it purely informative or does the video at hand also aim to entertain? Assuming it does, what is your goal in translation: Do you mostly care about the science or do you want to entertain as well? What you probably will not want is a ‘close’ translation in a structural sense, i.e. trying to mimic the syntactic or lexical structures of the source text – unless you are aiming, e.g., at documenting which linguistic strategies can be used in a certain language for edutainment videos. Another example is the translation of Bron Taylor’s book “Dark green religion” into German, where the author explicitly encouraged the translator Kocku von Stuckrad to add comments explaining how historical US-related circumstances compare to those in Germany, making the text more accessible to a German audience (von Stuckrad in Taylor 2020: 303).

Technical translation and cultural influences

A common misconception is that terminology (or language on the whole) in the natural and engineering sciences is near-‘objective’ in the sense that it allows for a ‘simple’ one-to-one transfer between languages. However, cultural influences abound in technical language as well, influencing terminology, phraseology, style, text structures, argumentation patterns etc. Cultural influence here does not solely refer to the larger setting of regional, national, areal or global cultures, but also to cultures of specific scientific fields and subfields (i.e., shared assumptions, traditions, practices, etc.). Even within a language, creating, e.g., something like a common terminology may be quite an undertaking, especially in younger fields of research (see, e.g., Avizienis et al. 2004 for the field of dependable and fault tolerant computing). Between languages, even the slightest differences in conceptualizations and uses can pose a challenge. On top of this, the influence of larger cultural contexts is omnipresent not just in the humanities or social sciences, with the discussion about the English master/slave terminology in computing and electrical engineering as a very prominent and illustrative example (Charboneau 2020). As pointed out above, these differences may extend to other linguistic levels such as phraseology, argumentation patterns or text structures. In some cases this gives rise to strategies of translation which are often subsumed under adaptation, i.e. making deep(er) changes to the make-up of a (stretch of) text in order to make it more adequate to the target culture and fitting to the purpose, which can be quite in line with Nord’s principle of loyalty. Whichever strategy you choose, be aware of these cultural factors even in technical language.

Translating science

Who can translate science?

Translation is, more often than not, very likely co-creation. Professional translators will have learned how to quickly adapt to the terminology, phraseology and style of a field, how to invent new terminology, how to perform effective research in cases of doubt and – actually one of the most challenging and frequent problems in translation – how to deal with faulty, ambiguous or badly formulated (stretches of) source texts; but technical expertise is still often required for translation, inside as much as outside of translating science. Many works are translated by (groups of) people with domain knowledge and the necessary linguistic competences, and it is not unusual to see MA or PhD translation students as well as career changers from fields entirely different from linguistics among them, given a certain background in their languages and cultures of interest.

This provides us with a number of options when it comes to the question of who could translate science: It could be scientists, alone or in groups with complementary domain or language skills; some institutions might even have translation services that can spare at least some time to translate, or help translate, science; or some stakeholders might have money on the side to commission translation. In all cases, however, the domain knowledge of scientists will be crucial, and should you be in the position to commission a translation, be prepared to answer questions on linguistic and other aspects of the field in question.

Aiding (commissioned) translation / translators

In any case, for a commissioned translation, be prepared as a scientist to act as the domain expert. You can aid the linguistic side of a (commissioned) translation if you can put some sort of terminology resource (e.g. any dictionary for your field that you have at hand) at the disposal of those who translate, or if you have a collection of texts (ideally in all languages involved) which you can make available so that term candidates and collocation patterns can be extracted quickly by means of the appropriate tools (see, e.g., on this wiki; professional translators should have acquired access to such tools). If you have commissioned a translation, the use of tools which allow for collaborative work can be a great help, e.g. in order to quickly respond to questions translators might have.
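To give an idea of what extracting term candidates from a collection of texts can look like in its very simplest form, here is a small Python sketch that counts recurring two-word sequences. The function name `term_candidates`, the tiny stopword list and the sample sentences are invented for this illustration; dedicated terminology extraction tools use considerably more refined linguistic and statistical methods.

```python
import re
from collections import Counter

# A deliberately tiny English stopword list, just for this sketch.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "are",
             "for", "on", "with", "by", "that"}


def term_candidates(texts, n=2, min_count=2):
    """Return word n-grams occurring at least min_count times across texts.

    N-grams that start or end with a stopword are skipped, since they are
    rarely useful term candidates.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens, allowing internal hyphens.
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(gram, count) for gram, count in counts.most_common()
            if count >= min_count]


# Invented sample texts standing in for a domain-specific collection.
texts = [
    "Fault tolerance is a means to attain dependability. "
    "Fault tolerance avoids service failures.",
    "Fault tolerance relies on error detection. "
    "Error detection identifies the presence of an error.",
]

candidates = term_candidates(texts)
print(candidates)
```

Even a crude list like this can give translators a starting point for questions to the domain expert, for example which of the recurring expressions are established terms.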

Literature

Avizienis, Algirdas, J-C Laprie, Brian Randell, and Carl Landwehr. 2004. ‘Basic Concepts and Taxonomy of Dependable and Secure Computing’. IEEE Transactions on Dependable and Secure Computing 1 (1): 11–33.

Charboneau, Tyler. 2020. ‘How “Master” and “Slave” Terminology Is Being Reexamined in Electrical Engineering – News’. Accessed 1 August 2022. https://www.allaboutcircuits.com/news/how-master-slave-terminology-reexamined-in-electrical-engineering/

Nord, Christiane. 1989. ‘Loyalität Statt Treue. Vorschläge zu einer funktionalen Übersetzungstypologie’. Lebende Sprachen, no. 3: 100–105.

Taylor, Bron. 2020. Dunkelgrüne Religion: Naturspiritualität und die Zukunft des Planeten. Translated by Kocku von Stuckrad. Leiden, Netherlands: Brill, Wilhelm Fink.

Tymoczko, Maria. 2005. ‘Trajectories of Research in Translation Studies’. Meta 50 (4): 1082–97.

Vermeer, Hans J. 1989. ‘Skopos and commission in translational action’. In Readings in Translation Theory, edited by Andrew Chesterman, 173–87. Helsinki: Oy Finn Lectura AB.

Top photo of keyboard made by Colin / Wikimedia Commons / CC BY-SA 4.0

Translate Science Blog
