Screenshot of a mock-up of the Translation Science Switchboard.

You know an article exists, but cannot read its language. So you go to our tool, paste the Digital Object Identifier of the article and get a list of translated versions. You manage your articles in a reference manager and notice that an article on your reading list is now also available in your mother tongue. You are really enthusiastic about a new article that was just published, which has policy implications for your country, and you want to translate it so that more people can read it. On our tool you find a partial translation made by a colleague from another university department; you jointly finish the translation, publish it on a repository and upload the link to our database.

These scenarios demonstrate that a translation finding tool would be really useful and could also stimulate the production of translations.

One of us started dreaming of such a tool while attending a climate conference in Peru. Colleagues from the local weather service were doing interesting work, but many did not speak much English. An important way they kept up to date was through the guidance reports written by the World Meteorological Organization (WMO), one of the oldest open science organizations. The WMO translates all its guidance reports into many languages because the weather services that control the WMO see this as a priority. A colleague at the conference told me that she sometimes translates important English articles into Spanish and emails them to her colleagues, just like Albert Einstein translated important studies into English. That made me wonder whether we could spread translations in a better way and thus also stimulate their production.

Lingua Franca

Translations are part of the open science movement. Translated scientific articles make science more accessible to regular people, science enthusiasts, activists, advisors, trainers, consultants, architects, doctors, journalists, planners, administrators, technicians and many scientists.

English as a common language has mostly made global communication within science easier. However, this has made communication with non-English communities harder. This goes both ways: it affects people who could benefit from scientific knowledge and people who have knowledge scientists should know.

For English-speakers it is easy to overestimate how many people speak English because we mostly deal with foreigners who do speak English. However, it is thought that only about one billion people speak English. That means that seven billion people do not.

Translated scientific articles speed up scientific progress by tapping into more knowledge and avoiding double work. They thus improve the quality of science. The additional two-way knowledge transfer aids innovation and tackling the big global challenges in the fields of climate change, agriculture and health. Translations can improve public disclosure, scientific engagement and science literacy.

Phone screenshot of the Translate Science Switchboard.

Open Source tool

We want to develop and deploy an open source tool to make it easier to find translations and thus make them more worthwhile to make. In its simplest form people should be able to search using a Digital Object Identifier, a title or the names of the authors and be presented with a list with links to translations. People or organizations who made or have translations should be able to upload lists with links. Users and volunteers should be given moderation tools.
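As a minimal sketch of the simplest form of this lookup, the Python snippet below stores links to translations under a normalized Digital Object Identifier. All names here (`Translation`, `TranslationIndex`, `normalize_doi`) and the example URL are hypothetical and only illustrate the idea; they are not part of any existing system, and a real version would sit on top of a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    language: str  # language code of the translation, e.g. "es"
    url: str       # where the translation can be read


# Prefixes people commonly paste along with a DOI (assumed list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip resolver prefixes and case so pasted DOIs compare equal."""
    doi = raw.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


class TranslationIndex:
    """In-memory stand-in for the database of known translations."""

    def __init__(self):
        self._by_doi = {}

    def add(self, doi, translation):
        self._by_doi.setdefault(normalize_doi(doi), []).append(translation)

    def find(self, doi):
        """Return all known translations for the article with this DOI."""
        return list(self._by_doi.get(normalize_doi(doi), []))
```

A user could then paste either a bare DOI or a full resolver link and get the same list back. Searching by title or author names would need a text index on top of this.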

Also searching by topic and a topic directory would be useful as translated articles tend to be the more important ones in a field. The database should also be accessible via an Application Programming Interface (API) so that other tools and webpages can automatically display information on any translations and inform us about new translations.

People or organizations who made or have translations should be able to upload lists with links. There were similar databases during the Cold War to keep up with Soviet research and we want to try to rescue their datasets and upload them to our database. Many research libraries, international organizations and research institutes (World Meteorological Organization, UK Met Office, …) have translated articles and reports, which should be included. Also translations of articles written before English was the Lingua Franca.

The expensive organizations maintaining these databases and making translations collapsed after the Cold War. In the internet age, we can maintain large knowledge bases cost effectively with global volunteers, as Wikipedia has demonstrated, and include many more languages. Also translating has become much easier as a reasonable first draft can often be provided by machine learning. And we can now network people who only occasionally make translations (of their own articles).

Not every contribution will be perfect. Users and editors of such a database should be able to indicate how good a translation is and need moderation tools. With versioning it should be easy to revert vandalism or spamming. We could green-list known scientific repositories and red-list known spammers.
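The green and red lists could work roughly as in the following Python sketch, which sorts a submitted link into "accept", "reject" or "review" based on its host name. The host names in the two sets and the function name `triage_link` are placeholder assumptions for illustration; which repositories and spammers end up on the lists would be for the editors to decide.

```python
from urllib.parse import urlsplit

# Placeholder lists; the real ones would be maintained by the editors.
GREEN_HOSTS = {"zenodo.org", "osf.io"}
RED_HOSTS = {"spam.example"}


def _matches(host, listed_hosts):
    """True if host is a listed host or a subdomain of one."""
    return any(host == h or host.endswith("." + h) for h in listed_hosts)


def triage_link(url, green=GREEN_HOSTS, red=RED_HOSTS):
    """Classify a submitted link as 'accept', 'reject' or 'review'."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or _matches(host, red):
        return "reject"   # unparsable link or known spammer
    if _matches(host, green):
        return "accept"   # known scientific repository
    return "review"       # unknown host: leave it to a human editor
```

Anything that is neither green-listed nor red-listed ends up in front of a human, so the lists only save the editors work and do not replace their judgement.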

If there are multiple translations for a language, editors or users should be able to rank them and indicate which one is best. This matters if only because external systems using our information may be designed to accept only one translation per language, as that will be the most typical case.
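One possible way to pick a single translation per language for such external systems is sketched below in Python. It assumes, purely for illustration, that users rate translations with numbers and that the highest mean rating wins; the rating scale, the tuple layout and the function names are hypothetical, and editors could just as well set the best version by hand.

```python
from statistics import mean


def mean_rating(ratings):
    """Average rating; unrated translations count as 0."""
    return mean(ratings) if ratings else 0.0


def best_per_language(candidates):
    """Pick the highest-rated translation for each language.

    candidates: iterable of (language, url, ratings) tuples.
    Ties go to the translation that was listed first.
    """
    best = {}
    for language, url, ratings in candidates:
        score = mean_rating(ratings)
        if language not in best or score > best[language][1]:
            best[language] = (url, score)
    return {language: url for language, (url, _) in best.items()}
```

For example, given two Spanish translations rated [4, 5] and [5, 5], the second one would be reported as the Spanish translation.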

A “talk page”, similar to Wikipedia’s, could be useful to allow users to point to problems, discuss which translations are best and which quality flags need to be set. It could possibly even be used to organize jointly making a better translation. This could be implemented with a commenting or forum system in a background tab.

Copying Wikipedia’s idea of a page with recent changes can help with quality control. Such a page can be filtered in several ways, e.g., for contributions by new people. In case someone finds that a participant made a problematic contribution, a look at their user page may reveal more problems.

Many more technical details of how such a system could look can be found on our Wiki.

Mockup of the search page showing all articles on the search term for which the database has translations.

Points to ponder

How hard would it be to make the system distributed, to have multiple servers that talk to each other and exchange data if they trust each other? We are doing this for science, but there are groups outside of science who could use a similar system; the nearest to us would be education and science communication. (Disciplinary) groups within science may be able to use their networks to promote the production of translations. That would make bulk download of our data a good idea to get a new server started.

It could be worthwhile to make a (private) backup of the known translations and regularly check for broken links. The backup can help the editors find the new location of the translation or to upload it elsewhere if the license allows for this.
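The regular check for broken links could look something like the Python sketch below, which uses only the standard library. The function names, the HEAD request and the rule that any status of 400 or higher counts as broken are assumptions for illustration; some servers reject HEAD requests, so a real checker would need to be more forgiving.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code          # server answered, e.g. with 404
    except (OSError, ValueError):
        return None                # DNS failure, timeout, malformed URL


def find_broken_links(urls, status_of=http_status):
    """Return the URLs that did not answer or answered with an error."""
    broken = []
    for url in urls:
        status = status_of(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken
```

The list of broken links could then be shown to the editors, who can use the backup to find the new location of the translation.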

It may be a good idea to have multiple types of links to translations. Literal translations, but also related works in another language, for example a PhD thesis in language X and a corresponding article in language Y. Sometimes people may write a summary of an article in another language, which could be valuable if there is no full translation. Also links to partial translations can still be valuable and showing them could promote their completion.
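The different types of links could be represented as in this small Python sketch. The four categories follow the paragraph above, but the names `LinkType`, `TranslationLink` and `needs_completion` are hypothetical and the final set of types is still open.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    FULL_TRANSLATION = "full translation"
    PARTIAL_TRANSLATION = "partial translation"
    SUMMARY = "summary"
    RELATED_WORK = "related work"  # e.g. a thesis matching an article


@dataclass(frozen=True)
class TranslationLink:
    original_doi: str    # DOI of the original article
    language: str        # language of the linked work
    url: str
    link_type: LinkType


def needs_completion(links):
    """Partial translations, which could be shown to invite finishing."""
    return [link for link in links
            if link.link_type is LinkType.PARTIAL_TRANSLATION]
```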

The road ahead

The above mainly describes the technical aspects of such a Translations Switchboard, but there is also a human aspect. We will need a community of editors for every language to check submitted URLs to avoid spam and select the best version in case multiple ones are available. So we need tools to build and organize this community. We will also need publicity so that people know about the service. Part of the advertising could function via integration of our system in others. We will need volunteers who contact possible sources of translations which could be integrated into the database and to promote the production of translations in their circles.

Designing and coding the full system described above would be a considerable task. If someone has experience with similar projects and would like to apply for funding: feel free to make the idea your own; we are also happy to be the science advisory team. For now we decided to start small: create a minimal system and add the data we know of to it. That way the idea becomes more concrete, which will hopefully help to find resources to build it and to fill the database. This first version will be coded using PHP, HTML, CSS and MariaDB.

You can already help us a lot by spreading this idea to increase the chance that people interested in contributing learn about it. Also feedback on the idea in the comments below is very valuable. If any of the above appeals to you, please get in touch on Mastodon or by email.

Mockup of a results page listing all translations we know of for a specific original article.

If you have not yet had a chance to read this previous blog post by my colleague, please do so. It accurately addresses the well-known dilemma faced in the current scholarly publishing landscape in science.

About 2000 languages are spoken in Africa, and these traditional and indigenous dialects are also a medium of choice in knowledge dissemination for many scientists on and off the continent.

As pointed out in the earlier mentioned blog post, many African scientists are proficient in the English language and regularly publish their scholarly communications in English. In 2018 alone, the AfricArXiv preprint repository for African scholarly work had 25 submissions in English.

It is however not lost on such scholars, myself included, that whereas we are multilingual, we face unilingual constraints in expressing our mostly written publications as well as sometimes in our spoken word presentations.

I believe that technology in its role as an enabler of positive change could play a vital role in bridging this gap through the use of Artificial Intelligence (A.I.) offering a service of providing a seamless translation platform for scientific work written in different official African languages.

One of the key tasks for such an A.I. system could be accepting English papers written by African researchers and offering a seamless translation service resulting in the output of as many African languages as possible, and vice versa, and in a manner that is structured to build on previous learning.

To quote my colleague in the previous blog post: “With the advancement of Natural Language Processing (NLP), it should be fairly easy for non-Indonesian [or African] speakers to understand articles written in Indonesian [or African local dialects]. Hence the burden to immediately use English as the main language of science could be lowered.”

There is a language bias in the current global scientific landscape that leaves non-English speakers at a disadvantage and prevents them from actively participating in the scientific process both as scientists and citizens. Science’s language bias extends beyond words printed in elite English-only journals.

https://www.frontiersin.org/articles/10.3389/fcomm.2020.00031/full

Hello all.

It’s Dasapta from Indonesia. Thank you, Victor, for inviting me to join The Translate Science Initiative. Although scientists come from every corner of the earth and live perfectly well using their own mother tongue, it is English that has been used as the lingua franca of science.

Conversely, many scientists in Africa, Asia, Latin America and Europe still publish their work in national journals, often in their mother tongue, which creates the risk that worthwhile insights and results might be ignored, simply because they are not readily accessible to the international scientific community. To overcome this dilemma, several initiatives now aim to strengthen the impact and quality of national journals with the goal of gaining greater international visibility for articles published in a language other than English.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1796769/

As someone born and raised in Indonesia, a non-English speaking country, it’s important for me to promote the use of the national language (Indonesian) instead of English in scholarly communications, because:

  • Most research in Indonesia is about local problems. Therefore it’s very logical that the main mode of dissemination should be in Indonesian.
  • Although many Indonesians take English courses from kindergarten or primary school onwards, English still is not used as the first language. Therefore it takes more time and effort to translate our research into English, while it could be shared faster if we used Indonesian.
  • With the advancement of Natural Language Processing (NLP), it should be fairly easy for non-Indonesian speakers to understand articles written in Indonesian. Hence the burden to immediately use English as the main language of science could be lowered.