What does XML mean to you?

XML means a lot of different things to different people. Some think it is “the Answer to Life, the Universe, and Everything”; others describe it as generation X’s childhood sandpit dreams coming true. And although it might have already reached the mainstream, there is still a lot of urban myth floating about when it comes to the question of what XML actually is. So let us have a quick look at why everybody seems to love XML, why more and more large corporations already use XML to save a lot of money, and finally how XML is going to creep into the lives of everyday computer users – and to be honest, it’ll be a relief when that happens!

XML has been a bit of a buzzword over the last few years. Today it seems that the buzz has settled and the community of blog writers has moved on (probably to something even more fascinating) – but have they?

Actually no, they have not. The opposite is the case. It seems that the initial buzz about XML brought us a great number of experimental users and uses, and heaps of abbreviations too (one still wonders if it was XSLT, SAX, DOM or TMX that brought us RSS, CSS, and XLIFF). So the truth is that besides being a Petri dish for abbreviations, XML also went from fringe to mainstream, and the number of applications and XML-based extensions continues to grow. This is anything but surprising, considering that XML stands for “Extensible Markup Language” – which is a bull’s-eye name.

Without becoming too technical, let’s say what the language and the mark-up are about. The main reason for the hype is that XML makes a file truly “cross-platform”. OK, you are right – that sounds like yet another quote from people who hardly get away from their computers. So let us come from a different angle: the data recorded, for example, when men landed on the moon is stored in a format that today’s computers simply cannot read. If you want to find out how much fuel was left after landing on the moon, you’d have to use the original hardware (probably some kind of tape recorder the size of an average 4-bedroom house). And even if you can find the hardware, you will still have to look through millions of cubic metres of unmarked data streams to find what you are looking for. If this example sounds too absurd, then try something yourself: go and find one of those floppy disks with all your personal documents in the bottom drawer (written in the early 90s) and open those files with your current word processor. The effect is the same. And if you cannot find a prehistoric disk, then just try to remember the last time your word processor told you with a sardonic smile that the 20 MB Word document you wanted to open was “corrupted” and asked if you were happy just to go on without it. Now here is the buzz: with XML this would not have happened (…well, of course it still happens, but the effect is far less scary)!

The reason that XML copes much better in such situations is twofold: on the one hand, XML allows you to store all data as text. On the other hand, it incorporates an ingenious “kind of” filing system in which each single piece of information or cluster of information is wrapped in a more or less descriptive label (a “tag”) that explains what it is. So, for example, young Jane Doe’s name and birth date might be stored like this in XML:

<firstName>Jane</firstName>
<lastName>Doe</lastName>
<dateBirth>03-27-75</dateBirth>

With this kind of system, all you need is a tool with a fancy abbreviation and you are set to do all you want to do with it.
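
As a rough illustration of what such a tool does, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree module. The wrapping person element is an assumption added for this sketch, because a complete XML document needs exactly one outermost element; the three inner elements are the ones from the example above.

```python
import xml.etree.ElementTree as ET

# The <person> root element is hypothetical; the three child elements
# are taken from the Jane Doe example.
xml_text = """
<person>
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
  <dateBirth>03-27-75</dateBirth>
</person>
"""

root = ET.fromstring(xml_text)

# Each label (tag) tells us what the piece of text inside it means,
# so we can collect the data without guessing at its layout.
record = {child.tag: child.text for child in root}
print(record)
```

Because every value carries its own label, the program does not need to know in which order or at which position the data was written.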

So even if this example doesn’t look like much, the principle is impressive enough for XML to turn from being the favourite file format of the “Open Source” community to becoming the favourite file format of Microsoft. To prove this point: with its next Office Suite, Microsoft will say “goodbye” to its proprietary file formats (like MS Word’s .doc and Excel’s .xls) and “hello” to a tailor-made XML format (with its own set of new abbreviations).

“Show it all” courtesy of “Unicode support”

As translators, we face language and cross-cultural challenges as well as numerous computer issues. Our goal is that the struggle should be invisible to the final user, i.e. you, the client. But technology did not always play along, and users of translations could see our struggles. Remember the olden days (and maybe even more recent ones) when more and more Asian web pages popped up on the Internet, literally “overnight”? “Back then” you might have come across garble like this when viewing files that originated in Asian countries:

“【大å ç²è ¨Šã€’ç¾?國 é§é¦ å°¼æ‹‰å ¤§ä½¿é¤¨ç™¼è”.

But what looks like a random string of characters at first glance is actually (did you guess?) the following Chinese sentence:

“【大公網訊】美國駐馬尼拉大使館發表聲明稱,由於收到「可能的威脅資訊」”

The problem is: it is displayed wrongly.

I should explain that you just came across one of a translation company’s common nightmares: we’ve translated a sentence into Chinese, but the application in which it is being used is not able to display our translation correctly. The technical issue is that every computer application (like Word, Internet Explorer and even your accounting software) is only capable of displaying a certain range of character sets. Historically, there were fewer character sets, and the most common one was the American set called “ASCII”. It contained all characters necessary to display English sentences and punctuation. But because it didn’t contain, for example, French accents or German umlauts (or any Asian characters for that matter), ASCII was very limiting when it came to displaying translations. The first, uncoordinated solution brought the emergence of a raft of character sets, almost one for each language. Some were free and others were proprietary and had to be purchased. Due to deep dissatisfaction about this, Unicode was developed and agreed on as a standard. Unicode was originally designed around 65,536 characters and has since been extended to leave room for more than a million, covering almost every writing system in the world. Of course ‘our’ XML supports Unicode: every XML tool has to be able to read the Unicode encodings UTF-8 and UTF-16, and other character sets are allowed too. The ingenious part of this solution is that XML allows and even asks you to declare which character set the file content is based on. We love it for that, because we can identify at a glance if you are likely to encounter any problems with the translation and, if so, we can help you to solve them before you know it. The reason we can promise this has to do with the other issue we want to highlight today:
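
For the technically curious, the garble effect and the benefit of a declared character set can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the element names headline and word are hypothetical, the Chinese text is a shortened version of the sentence above, and Latin-1 stands in for “the wrong character set”.

```python
import xml.etree.ElementTree as ET

text = "【大公網訊】美國駐馬尼拉大使館發表聲明"

# Stored correctly: the text is written out as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Displayed wrongly: the same bytes are interpreted as Latin-1.
# Every Chinese character falls apart into three unrelated characters.
garbled = utf8_bytes.decode("latin-1")
print(len(text), len(garbled))

# An XML file states its character set in the first line, so the parser
# does not have to guess. (<headline> is a hypothetical element name.)
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<headline>" + text + "</headline>"
).encode("utf-8")
root = ET.fromstring(xml_bytes)
print(root.text == text)

# Another character set works as well, as long as it is declared.
latin_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<word>Übersetzung</word>"
).encode("iso-8859-1")
latin_root = ET.fromstring(latin_bytes)
print(latin_root.text)
```

The point of the sketch is the first line of each XML snippet: because the encoding is declared there, the reading program interprets the bytes the way the writer intended.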

Move it along – Portability

Portability doesn’t necessarily mean that you can take your XML file with you on holiday (although you actually can) but rather that XML can be exchanged between applications rather easily. The beauty of this is not only that different software applications can do lots of different things with the same data. XML also became something of a killer application because it can be used across all kinds of platforms, be it Windows, Linux, Apple or even your mobile phone. The other aspect of this is “what” you store: XML is so flexible that you can use it to “parallel store” source text with the translated target text in the same file.

So not only do you have a file that is quite likely to survive version changes and hardware development, but you also have options as to what you want to store. If you wanted to, you could, for example, store the content in 23 languages in the same file and publish your content in 23 languages from that one source to all the applications you want to. File management becomes a breeze and your publications can be easily managed across languages.
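
To make the “parallel store” idea a little more concrete, here is a minimal Python sketch. The catalogue, entry and text element names and the sample content are hypothetical, invented only for this illustration; real translation formats such as XLIFF or TMX use their own structures. The xml:lang attribute, however, is the standard XML way of marking the language of a piece of text.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the standard xml:lang attribute with its full namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical file: one entry, stored side by side in three languages.
xml_text = """
<catalogue>
  <entry id="greeting">
    <text xml:lang="en">Welcome</text>
    <text xml:lang="de">Willkommen</text>
    <text xml:lang="fr">Bienvenue</text>
  </entry>
</catalogue>
"""

def texts_for(root, lang):
    """Collect the text of every entry in the requested language."""
    result = {}
    for entry in root.iter("entry"):
        for text in entry.findall("text"):
            if text.get(XML_LANG) == lang:
                result[entry.get("id")] = text.text
    return result

root = ET.fromstring(xml_text)
print(texts_for(root, "de"))
print(texts_for(root, "fr"))
```

Publishing in another language then means asking the same file for a different language code, rather than maintaining a separate file per language.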
Christof Schneider
Lingo24 Translation Services

Felicia Bratu

Felicia Bratu is the operations manager of wintranslation, in charge of quality delivery and client satisfaction. As a veteran who has worked in many roles at the company since 2003, Felicia oversees almost every aspect of the company operations from recruitment to project management to localization engineering. She recently received certification as a Localization Project Manager as well as Post-Editing Certification for Machine Translation. Felicia holds a BSc. in Industrial Robotics from the University of Craiova, Romania.