Recently, some Spotify users noticed that lyrics for the latest season of Coke Studio were being displayed differently inside the app. Instead of appearing in the native Arabic script used for Urdu and other regional languages, the lyrics appeared in Latin, or Roman, script. I can speak with a bit of inside knowledge to explain why and how this happened.
Spotify gets lyrics for songs from a service known as Musixmatch. This service provides lyrics to other major platforms as well, such as Apple Music, Deezer and social media platforms. Musixmatch itself grows its lyrics library using crowdsourced contributors. Internally, these contributors are known as ‘curators’ and ‘language specialists’. Until recently, I was also an Urdu language specialist for Musixmatch.
Previously, Musixmatch used native scripts as the canonical representations of lyrics. So all Urdu lyrics would find their primary representation in the native and officially designated Arabic script. Every South Asian language would also receive a secondary representation in Latin or Roman script, to allow for lyrics to be read in the same letters as used by English. An automated bot would help transliterate lyrics into Roman script, allowing lyrics to be read by a wider audience.
Recently, language specialists were assigned tasks to retranscribe lyrics for songs that had already been processed in native scripts. The task was to redo this work in Roman script. In some languages, these tasks came from artists and labels themselves. For Urdu, all songs pushed to our task queues to be retranscribed in Roman were Coke Studio songs. I was told directly that these changes were driven by clients of Musixmatch operating in South Asia.
But the real tragedy is that this is not isolated to a few songs. Musixmatch has since altered its official guidelines so that songs in Urdu and other major South Asian languages are transcribed in Roman script. Previously, requests to change the canonical representation of a song in Urdu or another South Asian language were handled individually, at the request of each artist or label. Now all South Asian songs will be displayed in Roman by default. As a result, transcriptions in native scripts will be considered a ‘translation’, and Roman will be considered ‘official’.
The goals of this policy are not clearly identified. It’s possible that Coke Studio requested that its lyrics be displayed in Roman by default. This would increase the addressable market of their songs from just Urdu speakers to a much wider audience, such as Hindi and Punjabi speakers in India. However, Musixmatch’s policy change only affected Urdu and other South Asian languages, and not languages such as Arabic, Chinese, Japanese or Korean. These languages continued to use their native scripts. So one wonders why a presumptive goal of reaching a wider audience would not apply to all languages equally.
Musixmatch have said internally that “the romanization decision was a difficult one for the company to make”. While this may only be a corporate policy change for Musixmatch, they do not seem to comprehend the consequences it may have for South Asian languages, bearing in mind the influence they have across the music industry.
Spotify and Apple Music users across South Asia will be forced to read lyrics to songs in their native languages in a secondary script. This also has knock-on effects. Features which rely on lyrics, such as ‘Spotify Lyrics Search’, work well with native scripts but will be subpar in Roman script. Because Roman transliterations are non-standard, users may not spell words in their languages the same way as Musixmatch does, so their searches may not match. These changes also make it less likely that Urdu results appear in search engines.
In pursuit of standardisation, Musixmatch are also attempting to create an official ‘Romanisation policy’ for South Asian languages. It’s worth bearing in mind that attempts to officially ‘Romanise’ Urdu have always been thwarted in the past. Now that we are in an era where it is becoming easier than ever to use native scripts, it seems extraordinary to promote Roman to canonical stature in such a widely-used cultural interface.
South Asian languages have porous boundaries. Urdu and Hindi are mutually intelligible when spoken but are written in different scripts. And while Musixmatch transliterates both languages into Roman, they do not transliterate into the Arabic or Devanagari scripts, which the two languages use natively. The situation is even stranger for languages like Punjabi, which are written in multiple scripts. Musixmatch only considers one canonical form of Punjabi, meaning that one way or another one segment of the population is systematically left out.
Urdu speakers (and for that matter, speakers of any language) should not have to settle for anything less than the representation of their official and native script, and certainly not a transliteration.