LANGUAGES AND LINGUISTICS: Whole Latin Sentences - How and Why Does a 1988 Enya album leave Google Translate thoroughly confused?

{ Since the writing of this piece the problem described has been corrected - however the problems it revealed are an interesting insight into the problems of translating certain languages and so I have elected to leave this post in place, unedited }

Google Translate can be something of a revelation to many. It can sometimes feel like a magical box into which you can type a sentence in another language and be supplied with its translation into your desired language. If you do not know the language of the sentence you can use the Detect Language feature, which allows the Translate engine to work out what the source language is before translating it. Google Translate is, however, often ridiculed for its lack of accuracy – and this all relates to the problems faced by machine translation.

The production of an accurate translation relies on many factors - but particularly the nature of the input. With single words, even in its best reviewed and most accurate languages, Google Translate can be wildly inaccurate. This is in part because in many languages words can have multiple meanings. Furthermore, in many languages meaning can be altered by emphasis. A common example in British English would be the statement – “I never said he stole my money.” – which has seven potential meanings depending on which word is emphasized. For example, emphasizing ‘said’ could be a defence or deflection, implying the speaker didn’t make an explicit statement but may have implied it; emphasizing ‘I’ or ‘my’ could point towards another individual having made the statement, or towards another person’s money having been stolen, et cetera.

Meaning is therefore conveyed in more ways than through the direct text, and in these cases the translation engine must make certain assumptions. It works best with formalised language using simple, relatively short sentences. Longer sentences introduce further problems. In this video Tom Scott, as part of his Language Files series from 2015, explains the issues faced by machine translation, and why computers in general are not good at translation. In 2016 Google announced it was moving to neural machine translation – a method using an artificial neural network to make predictions – and emphasized that the method would look at entire sentences rather than words and concepts. It is worth noting that the majority of the languages available on the service have not been assessed against human translation. By 2020 Google Translate was using this form of machine translation for all of its available language pairings, with the sole exception of that between English and Latin. And it is this which makes a small and somewhat amusing example of very questionable translation, which I recently found, particularly interesting.

As someone who works with Latin texts regularly – and who has worked on transcribing unpublished Latin manuscripts, translating sources available in Latin, and back-checking past translations – I find Google Translate both strangely fascinating and, at times, horrifying. The most frustrating examples of this are when I see people simply using Google Translate with the assumption that whatever it responds with is absolutely correct. I have to admit to a mix of grudging admiration at what is being attempted, and despair at what can at times appear to be a massive overconfidence in the ability of machines to rapidly outpace human linguistic ability. It is also worth noting that such machine translation software would particularly struggle with certain rare language types – such as whistled speech (a real world example of which is found in Greece, although perhaps the most famous example is the speech of the alien Clangers in the TV series of the same name, which was occasionally able to use its slide-whistle speech to slide some more colourful expressions into its dialogue, as can be seen here). At times this view results in me deciding to test the software against my own knowledge of certain languages and texts to see if it has changed, and occasionally to ‘suggest corrections’ repeatedly in the probably forlorn hope that a particularly bad example might be corrected to something more accurate. It was having checked another simple sentence in a different language that I decided to see what Google Translate would make of a phrase that appeared occasionally as a marginal note in the library, and the personal writings, of my late father – ‘cursum perficio’ – the earliest appearance of which, in his notes, is in a marginal notation dated 1956. And it was here that my knowledge of Latin and my knowledge of Gaelic-language music unexpectedly collided.

The recording artist Enya – whose name in Irish Gaelic is Eithne Pádraigín Ní Bhraonáin, anglicised as Enya Patricia Brennan – is a familiar figure in the Celtic music scene, known for having crossover appeal and a reach outside of the general confines of mixed language Irish/English music. Her work has also featured in films, including her contribution to the soundtrack of the 2001 film The Lord of the Rings: The Fellowship of the Ring – the track ‘May It Be’, which was used over the closing titles and which saw her nominated for an Academy Award. I have long been a fan – having found her music through the works of other members of her family who make up the band Clannad (whose music, and its part in my passion for history, I’ve discussed previously on this blog). Like many artists with names unfamiliar to English speaking cultures she elected to use the phonetic spelling – a decision that would be followed by her sister Maire (Moya) Brennan for her solo work separate from Clannad. Enya’s second solo album was titled Watermark.*

So, with Google Translate open and Latin to English selected, let us look again at cursum perficio. Despite the above note that the translation engine is particularly inaccurate when working with only single words – especially in those languages where it uses the context within the sentence to assist in determining which form to use – in the first case, for the Latin ‘cursum’, the top suggested result is ‘Course’, which is approximately correct. Poetically it could equally be rendered as ‘journey’. It is the second part of the statement as a whole which does not result in what you might expect. Alone – while there are alternative suggested Latin terms given – the word ‘perficio’, even with altered endings, does not produce an accurate rendering, which would in fact be along the lines of ‘I complete’ or ‘I finish’, with a sense similar to the phrasing ‘to have perfected something’. That is not, however, what happens here. Instead, we get the result seen in this image - and the answer 'Watermark'.

And the strangeness does not stop when, following the suggestion, we give it the remainder of the admittedly very short Latin annotation we are dealing with. With some deliberate error to divide the two words, Google Translate gives us the answer ‘Course Watermark’, and when typed correctly this becomes simply ‘Watermark’. It is worth noting that the list of translation results for the word 'perficio' does in fact include the multiple possible accurate translations of the term. Why does the algorithm behind the translation software – often ridiculed for inaccuracy, including in cases where the problems stem from a lack of context or inflection to give additional meaning – provide such an unusual answer? Well, we have in fact already given the answer within this entry. It is the 1988 album by Enya, Watermark. This is not, however, because somehow, perhaps in a different set of languages, there is a connection between the individual terms cursum and perficio which confuses the algorithm over which language it is meant to be rendering an answer in. It is a known problem with the system that it can find it difficult to differentiate between Latin and Italian – and in some cases between other languages with common roots (including English and French, where there are numerous words which are spelt the same way, pronounced slightly differently, and have completely different meanings). However, digging through the code will not find a clear answer, because in essence the algorithm is doing what it is intended to do – it is giving what appears to be the correct translation for the term in question based on the data it already has. So what is the cause, and how does it relate to the album in question?
The answer is that the second track of the album is itself entitled Cursum Perficio – a track in which the lyrics, which are entirely in Latin, are delivered in a sharp, repetitive chant over a building instrumental track, returning again and again to the phrase ‘cursum perficio’: I complete the course, I complete the journey, or I finish the journey. The remaining lyrics refer to people always wanting more – translating as ‘a word to the wise, the more one has the more one desires…’. Enya herself cites the phrase ‘cursum perficio’ as having come from a documentary about Marilyn Monroe – and gives her own rendering ‘my journey ends here’. The choice of Latin for the lyrics was aesthetic, in that it fitted, with minimal adaptation, the instrumental that had already been composed. It appears that, since the track title in Latin ‘Cursum Perficio’ frequently and consistently appears all over the Internet in reviews, in news, on websites, catalogues, discographies, etc. alongside the album title in English ‘Watermark’, the translation software has taken this repeated association (including the fact that both may often appear together in the same sentence in quotation marks) to mean that the one is a translation of the other. Indeed, even giving Google Translate the complete Latin lyrics produces the same result – implying that what is being chanted is in fact the word ‘Watermark’ over and over.
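
For readers who like to see the idea in miniature, here is a toy sketch in Python of how a system that leans purely on co-occurrence could reach this conclusion. It is emphatically not Google's actual method, which is not public in this detail; the snippets of web text, the candidate list and the function name naive_translation are all invented for illustration. It simply counts how often each candidate English rendering appears in the same snippet as the Latin phrase, and picks the most frequent.

```python
from collections import Counter

# Invented example snippets standing in for web text (assumption: not real data).
SNIPPETS = [
    'Enya, "Watermark" (1988): track 2, "Cursum Perficio"',
    'Review: "Cursum Perficio" is a highlight of "Watermark"',
    'Discography: Watermark - Cursum Perficio - Storms in Africa',
    'Cursum perficio: Latin, "I complete the course"',
]

# Candidate English renderings (assumption: chosen by hand for this sketch).
CANDIDATES = ["Watermark", "I complete the course", "My journey ends here"]


def naive_translation(phrase, candidates, snippets):
    """Pick the candidate that most often shares a snippet with the phrase.

    A deliberately naive co-occurrence count: it has no notion of grammar
    or meaning, only of which strings tend to appear side by side.
    """
    phrase = phrase.lower()
    counts = Counter()
    for text in snippets:
        lowered = text.lower()
        if phrase not in lowered:
            continue  # snippet does not mention the Latin phrase at all
        for candidate in candidates:
            if candidate.lower() in lowered:
                counts[candidate] += 1
    if not counts:
        return None, counts
    best, _ = counts.most_common(1)[0]
    return best, counts


best, counts = naive_translation("cursum perficio", CANDIDATES, SNIPPETS)
print(best)
```

With these made-up snippets the album title shares a line with the Latin phrase three times and the accurate rendering only once, so the sketch settles on the album title. The real system is far more sophisticated, but the same basic weakness applies: frequent association is not the same thing as translation.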

What can we take from this? My own conclusion is simply that, despite more than one computer scientist of my acquaintance telling me that machine translation will soon replace that element of my work and my research, that is far from the case. It is particularly important to continue this kind of work, especially given that many endangered languages are unlikely to receive the attention given to more common languages when it comes to this type of translation system. And, as I have remarked, language death can mean the death of entire swathes of human knowledge and understanding which past societies have gone to great pains to preserve.

To close, here is the track which accidentally highlighted the inherent problems of these systems.

* (On a personal note – Enya’s album Watermark contains the solo piano piece Miss Clare Remembers which, together with another piano solo, No Holly For Miss Quinn, features in a forthcoming post in the Personal Experience series on my Brain Blog about the link between music, sound, and memory; both pieces were inspired by the titles of novels by the author Dora Jessie Saint, written under her pen name Miss Read)

Jude Seal's Notebook
