Feed aggregator


How To Embed kasahorow Keyboard Layouts On Your Own Website

kasahorow - Sun, 04/07/2010 - 14:22

Ever wanted to type in most of the Ghanaian local languages? With kasahorow virtual keyboards, you can. You can embed them on your own website to let people type in the Ghanaian local languages. Imagine updating your Facebook status message in Akan, or using the kasahorow API to let visitors to your website search for local words or do simple translations. Don't you love kasahorow tools? :-)


Emotions and localisation

Friedel en ander frappanthede - Sat, 19/06/2010 - 09:37

I recently had to do a slightly more difficult piece of translation work for Pidgin. Part of it was an extension to the XMPP protocol that standardises emotions. With this extension, different chat programs can exchange information about the user's mood in a standard way. But the text shown to the user obviously has to be translated, and that turned out not to be easy at all.

More than 80 emotions are described in the specification, including a few physical states (such as "cold" and "sick"). I dived into translating, but quickly realised that this was not so easy. First, I don't know the nuances of all the English terms well enough, so naturally I consulted dictionaries. Before long, however, there were terms with multiple translations, some of which overlapped with the translations of other terms. Some of the terms I struggled with:

  • Amazed, In awe
  • Thankful, Grateful
  • Amorous, In love, Aroused
  • Dismayed, Dejected
  • Contented, Satisfied
  • Humbled: This is ambiguous. It can mean "vernederd" (humiliated) or "nederig" (humble), which is quite a big difference. The meaning is given in the specification, but how will users know which meaning is intended here?
  • Lucky and Happy: Unfortunately these both translate to "Gelukkig", and the distinction is usually derived from context. There will probably be no context where these translations are used; perhaps just a list of emotions to choose from.

The goal is of course not merely to get some translation, but something that is accurate, distinguishable from the others, and that conveys the right message to the person one is chatting with, who quite possibly uses a different chat program, maybe even in another language. It doesn't help if two terms are translated the same way and one sees something like "Gelukkig" twice in the list.
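The "same translation twice in the list" problem can be checked mechanically. The following is a minimal Python sketch, not part of Pidgin or any existing tool; the function name and the sample dictionary are hypothetical, and only the Lucky/Happy clash on "Gelukkig" comes from the post itself.

```python
from collections import defaultdict


def find_ambiguous_translations(translations):
    """Group source terms that ended up with the same translation.

    `translations` maps a source term to its translation. The comparison
    ignores case and surrounding whitespace.
    """
    by_target = defaultdict(list)
    for source, target in translations.items():
        by_target[target.strip().lower()].append(source)
    # Keep only translations that are shared by more than one source term.
    return {
        target: sorted(sources)
        for target, sources in by_target.items()
        if len(sources) > 1
    }


# Hypothetical sample; "Cold" and "Sick" are assumed English source terms.
sample = {
    "Lucky": "Gelukkig",
    "Happy": "Gelukkig",
    "Cold": "Koud",
    "Sick": "Siek",
}
print(find_ambiguous_translations(sample))
```

A check like this only finds exact clashes; near-synonyms such as "Thankful" and "Grateful" still need a human decision.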

Although the specification gives the impression that the list of emotions is based on a good deal of solid research on emotions (including across cultural boundaries), I wonder how useful this list is when it contains terms that are so close to one another. Although some of the pitfalls are specific to Afrikaans, it seems there are nonetheless a few terms here that can confuse translators, and users as well.

Later in the same file I came across a bunch of things from the OSCAR protocol (used in ICQ and AIM) with which users can indicate what they are busy with. There were these three translations: Surfing, Searching the web, Browsing the web. Frustration. Then there are also a few that I would not be interested in knowing someone is busy with. A few of these did not get my attention either.

Lessons from this
  • Internationalisation is hard when it involves emotions.
  • Programmers must provide comments to explain things as well as possible to translators.
  • These comments must be in the translation file. The link to the specification was only shared with translators later, on the mailing list. People who translate Pidgin in future will miss it.
  • Good dictionaries are fantastic. This work would have taken considerably longer if I had not repeatedly found good starting points in the dictionaries.
  • It helped a lot to ask for a second opinion.
  • These words cannot simply be dropped into a sentence. In Afrikaans one "kry warm" (gets hot) but "voel hartseer" (feels sad). Other languages may have even more reasons never to try to cram these into a sentence.

Exposing Our API

kasahorow - Mon, 14/06/2010 - 11:30

I'm sure you have asked several times: does kasahorow have an API I can work with to build this killer app? The answer is YES, we do have one.

In this post, I am going to introduce you to the API and show how you can use it for whatever purpose you have in mind. Bear in mind, though, that you cannot abuse it :-).


What's new in Virtaal 0.6.1

dB - Tue, 08/06/2010 - 05:31

I missed out a post on Virtaal 0.6.0 so I'll wrap both the major and 0.6.1 bugfix release together. For those not in the know, Virtaal is our Computer Aided Translation Tool (CAT) that we've been developing as part of the ANLoc project.

Our aim in Virtaal continues to be to have a simple clean interface, yet to present powerful features to translators. We seem to be doing the right thing, judging by the following comments from a recent review of Virtaal: "It’s clean interface and ease of use are the best virtues of this application. ... there [are] NO extra buttons, and the layout looks like a side-by-side sheet presentation. Beautiful. It also allows access to machine translation services such as Google, Moses and Opentran. Other features include highlighted diffs between the translation memory suggestions, a don’t-touch-your-mouse approach, and much more."

So what did we add in the 0.6.* versions of Virtaal? Let's have a look.

Welcome to Virtaal's welcome area

The most notable change in Virtaal 0.6.0 is the new welcome area. In early versions of Virtaal new users were faced with the "What now?" thought as they opened the tool and saw a blank screen. Since Virtaal has a very clean interface there aren't any hints about what the application does. There are no unused panes for TM or glossary entries. We realised that we could actually make use of this space to enhance usability and help newbies. In true Virtaal fashion we avoided adding a splash screen or tip of the day dialogue. What we developed is the following welcome screen and we hope that you like it.

The welcome screen is not meant to be just a pretty face; we want it to be really useful for the translator. As you can see it gives easy access to previous translations, guides and other help, so both the seasoned translator and the newbie are easily and quickly helped.

We hope to add other features to the welcome screen in future versions, hopefully emerging as a dashboard of sorts where we can show the state of work and activities currently in progress.

New and improved Machine Translation plugins

We added support for Microsoft Translator (or Bing Translator as they sometimes call it). You may recall that we did a special release of this plugin on Windows to allow translators to translate into Haitian Creole at the time of the Haitian earthquake. A recent study comparing the Google, Yahoo Babelfish and Microsoft Bing MT solutions seems to indicate that, for short texts, Microsoft and Yahoo may offer better results.

We've supported Apertium, the FOSS rule-based Machine Translation engine, for a long time now. Apertium recently created a new service API that mostly mimics Google's MT API. We've adapted the Virtaal plugin to use this new API. While most other MT engines are statistical, Apertium uses a rule-based approach. For the languages that Apertium supports it might present better MT suggestions than statistical MT services.

Improved format support

Virtaal uses the Translate Toolkit to provide support for various localisation formats. With this release we integrate support for OmegaT glossary files: you can now edit these directly in Virtaal instead of in a spreadsheet or word processor. We hope this leads to more reuse of terminology.

An XLIFF file can provide alt-trans entries and Virtaal will now display these in the suggestion dropdown. In the screenshot below you can see the suggested translation as the first entry provided by user 'admin'.

When you are working in Pootle with XLIFF files you will now be able to review suggestions off-line. XLIFF files supplied to you might also contain alt-trans entries with MT and TM suggestions; these can now also be seen when you translate.

In case you've forgotten, Virtaal can edit Qt Linguist .ts files, so you can translate pretty much any FOSS application in Virtaal. With this release we fixed some bugs relating to plural support in newer TS files, so we should be able to manage any file currently in the wild.

New languages, improved language features and language related bugs

A translation tool that isn't itself translated! We're proud to see a growing number of people contributing translations to Virtaal. We've added Bulgarian, Icelandic and Thai, and of course many other translations have been updated. Virtaal is now translated into 40 languages.

Virtaal running with Translate Toolkit, versions > 1.7.0, is able to detect your target language based on the 'Language-Team' header entry in your PO files. So your language pair selection is almost always going to be correct.

We now have better interaction with the Voikko backend of Enchant and improved autocorrect data for Polish (yes we do autocorrect using OpenOffice.org data files). We've also added a workaround for GNOME bug 569581 (Windows US intl layout, Afrikaans 'n).

Accessibility

We worked hard in this release to make sure that Virtaal works well in high contrast modes to assist people with visual disabilities.

The following before and after pictures show the changes in a High Contrast Inverse theme. While the changes are small it's worth realising that the tool was unusable for someone needing inverse colour schemes in order to use a computer. You will notice that the text input area is now properly rendering as light on dark. You can't see it here but we also made sure that the placeable colours, placeable highlighting and terminology colours all now work in inverse.


A raft of bugfixes and small features
  • Virtaal has a very good system to handle placeables. We've now made it possible to select placeables from the plural in the source, as well as to cycle through the placeables back to selecting the whole source text after you've moved through all placeables.
  • Support for proxy servers - Virtaal just didn't work in university labs; hopefully this provides enough support for most cases.
  • Reduced flickering in the editing area - stepping through large units in Virtaal produced too much flicker; now we will be gentle on the eye.
  • Use the most frequent word as autocomplete suggestion - we just weren't giving you the best autocomplete suggestion all the time; now we do.
  • Better handling of errors in the Open-Tran service - Open-Tran.eu has been down quite a lot recently and we get a few XMLRPC errors; these are now all caught.
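
The "most frequent word" idea for autocomplete can be sketched as follows. This is an illustrative Python sketch and not Virtaal's actual code; the function names, the minimum word length and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter


def build_word_counts(texts, min_length=4):
    """Count how often each word occurs; very short words are skipped."""
    counts = Counter()
    for text in texts:
        counts.update(word for word in text.split() if len(word) >= min_length)
    return counts


def suggest(prefix, counts):
    """Return the most frequent known word starting with `prefix`, or None."""
    candidates = [
        (count, word)
        for word, count in counts.items()
        if word.startswith(prefix) and word != prefix
    ]
    if not candidates:
        return None
    # Highest count wins; ties are broken alphabetically (an assumption).
    best = min(candidates, key=lambda item: (-item[0], item[1]))
    return best[1]


counts = build_word_counts(
    ["translate the translation", "translation memory", "translator"]
)
print(suggest("trans", counts))
```

Here "translation" occurs twice in the sample texts, so it is preferred over "translate" and "translator" for the prefix "trans".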

You can read the release notes for other minor bugs that were fixed in 0.6.0 and 0.6.1.

Typing Keyboard For Windows

kasahorow - Wed, 02/06/2010 - 14:00

Keyboard layouts for Windows for various African languages, e.g. Akan, Ga, Ewe, Hausa, etc.

Translation this coming Saturday

Friedel en ander frappanthede - Tue, 25/05/2010 - 23:01

A few friends convinced me to organise an event to get a few people together to do a bit of translating. Dwayne has kindly offered the Translate.org.za office for the event.

It's simply an occasion for those interested in Afrikaans localisation to learn the ropes a bit, socialise a bit, translate a bit, get inspired a bit, and so on. If anyone is interested in joining us, they should let me know as soon as possible, for the sake of arrangements and coordination.

Venue

The Translate.org.za office in Groenkloof, Pretoria. There are plenty of dictionaries, fast internet, and seating for a few people.

If people want to join virtually, we might be able to arrange something, although it will probably be easier at a next event. Feel free to let me know.

Time

Saturday, 29 May 2010

We start at 10:00 and carry on until about 12:30 or around lunchtime. After that, those who want to can socialise a bit while we have something light to eat.

What will be translated?

Suggestions of programs to translate are welcome. Although we can really work on just about anything someone wants to do, I'm guessing a few of us will want to work on these projects (some have already started):

Dasher
Xiphos
GNOME's games
gedit

Most of these programs run on Linux and Windows, and a few on OS X as well. There are already partial translations for all of these to work with.

I would also like to work on improving the upcoming GNOME 3, and I would like us to look at something that contributes to that.


Inaccessible translations

Friedel en ander frappanthede - Mon, 17/05/2010 - 11:27

I have been interested in accessibility for quite some time and believe there are many points of contact between accessibility and localisation. For example: what good is a translated program if there is no screen reader in the language concerned? The recent writing about accessibility around Planet GNOME probably inspired me to improve things in Virtaal.

Virtaal highlights certain elements in translations, such as XML tags and variables, with colours for the sake of readability. It is also easy to insert these placeables without typing. This can increase productivity considerably. Until recently, however, we still specified the colours as fixed values in the code, which of course meant that it did not work well with inverse themes (light text on a dark background).

With the release of Virtaal 0.6 I was delighted with the improvements we made. We now cooperate better with GTK themes, which means we work better in inverse themes. We also improved visual contrast for the inverse themes, and received very good feedback from a user who can now use Virtaal without problems.

Here are some of the improvements:

  • The URL and XML tags are lighter, to ensure that they can still be read.
  • When the search function finds nothing, it has a red background (similar to Firefox). This background is now darker, to ensure good contrast with the search text.
  • The term for which there is a suggestion ("Download") has a dark background.
  • The placeable with digits ("0.6.0") now has a dark background, and the foreground colour is fairly light.

The last two were difficult, because there must be good contrast between the foreground and the background, but the difference from the ordinary text must also be clear. Since the ordinary text already has optimal contrast, this means that we are in effect lowering the contrast.

All of this is done automatically when Virtaal runs in an inverse theme. I am glad that this works correctly without the user having to look for or configure anything. It also updates if the system theme changes while Virtaal is running.
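
To make the contrast trade-off concrete, here is a small Python sketch that uses the WCAG definitions of relative luminance and contrast ratio. It is not Virtaal's code: the post does not say how Virtaal detects an inverse theme, so `is_inverse_theme` (comparing the theme's text and background colours) is an assumption for illustration only.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) colour with 0-255 channels."""
    def channel(value):
        c = value / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) up to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def is_inverse_theme(text_colour, base_colour):
    """Assumed heuristic: a theme is inverse when its text is lighter than its background."""
    return relative_luminance(text_colour) > relative_luminance(base_colour)


white, black = (255, 255, 255), (0, 0, 0)
print(is_inverse_theme(white, black), round(contrast_ratio(white, black), 1))
```

White on black already has the maximum ratio of 21, which is why any highlight colour that must stand out from ordinary text necessarily has a lower contrast than the ordinary text itself.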


PanAfrLoc / Malagasy

PanAfriL10n - Sun, 16/05/2010 - 03:15
Malagasy 1. Classification / Classification Malagasy belongs to ...

Localisation: How we guess the target translation language in Virtaal

dB - Thu, 13/05/2010 - 13:40

In Virtaal, our desktop Computer Aided Translation (CAT) tool, we have a number of usability goals. One of those is trying to limit the configuration required to use the tool. Most of us think nothing of setting the target translation language in our CAT tool when requested. But we've always asked the question: can't the CAT tool work this out itself?

In this post I'll talk about how we've been able to correctly determine the target language for about 87% of the localisation files on a typical Linux system.

I'm a translator, how does this help me?

Most translators, who work in one language and one direction, are probably wondering why this is an issue. But anyone who translates in both directions, translates a number of languages or manages a number of translation teams will understand just how important this feature is. When they open a file, their language settings change to match it and should be correct.

The feature allows the CAT tool to configure itself without any intervention from the translator, apart from the simple act of opening a file for translation. But even a single language translator will benefit from this feature as a translator will examine other translations to see how someone translated the source text. In this case Virtaal's settings will change for this quick lookup and will change back when the real translation begins, all without the translator doing anything.

I personally review a number of translated languages. I like using Virtaal as it simply reconfigures itself to the target language when I open a file. Mostly I don't need to even check that the selected target language is correct. My Machine Translation, Translation Memory, terminology and spell checking are automatically enabled for the correct target language.

A little history and some background information

We've been building this language guessing for Virtaal for some time now; our aim is to do the right thing with minimal user input. When first run, Virtaal tries to determine the target language by examining the environment. This mostly involves looking at your locale. This was our first effort to get the language right.

The Translate Toolkit, on which Virtaal is built, allows us to determine the source and target languages of a number of file formats (TMX, XLIFF, Qt). Thus once we load a file we're able to look at the file metadata to determine the language pair. But this doesn't work on PO files since there isn't any target language information in the header.

The missing target language information in Gettext was why I proposed that we add a language header to Gettext PO files. Fortunately this idea was accepted upstream and it has been implemented in Gettext. However, we're still waiting for this new version of Gettext to be released and once released we'll still need to wait quite some time for it to gain wide adoption.

So while we waited for Gettext 0.18 to be released we implemented ngram matching techniques as another approach to guessing the target language. This works quite well, but we need a language model for each language that we want to guess. Ngrams are still useful for us in Virtaal: we add the ngram-guessed language to our language pair chooser, so if the target language is incorrectly indicated we'll still include the ngram-guessed pair in the language chooser list.
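
To illustrate what ngram matching means, here is a toy Python sketch. It is not the Translate Toolkit implementation: it uses character trigrams and cosine similarity as one common variant of the technique, and the two "language models" are built from single made-up sentences, whereas real models would be trained on far more text.

```python
from collections import Counter


def trigram_profile(text):
    """Count the character trigrams of a text, with normalised whitespace."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(profile_a, profile_b):
    """Cosine similarity between two trigram count profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[gram] * profile_b[gram] for gram in shared)
    norm_a = sum(n * n for n in profile_a.values()) ** 0.5
    norm_b = sum(n * n for n in profile_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(text, models):
    """Return the code of the model whose profile is closest to the text."""
    profile = trigram_profile(text)
    return max(models, key=lambda code: similarity(profile, models[code]))


# Toy models built from one hypothetical sentence per language.
MODELS = {
    "af": trigram_profile("die vertaling van die lêer is nie so maklik nie en dit is moeilik"),
    "en": trigram_profile("the translation of the file is not that easy and it is difficult"),
}

print(guess_language("dit is nie die lêer nie", MODELS))
print(guess_language("the file is not easy", MODELS))
```

The sketch also shows the limitation mentioned above: a language can only be guessed if a model for it exists in `MODELS`.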

Realising that we can't wait around for Gettext 0.18 to be released and for it to filter down into distribution over 1-2 years we decided to look at other ways in which we could more reliably determine the target language based on information in the file header.

Language-Team header analysis

We've looked at analysing the Gettext 'Language-Team' header entry to help determine the target language. To do this analysis our script msgunfmt'ed the 15,000+ MO files I have on my Fedora 12 installation. This created long lists of the potential Language-Team headers that we then ran through our guesser. We added information and improved the guesser as we identified patterns in the extracted headers.

In the analysis we found the following:

  • A Language-Team of English is almost always a false positive. E.g. "Kannada <en@li.org>": an English email address for the Kannada team is unlikely.
  • Small languages almost always get this header wrong. E.g. a Hawaiian translation has this header: "English <en@translate.freefriends.org>".
  • Some meta language translation projects don't distinguish between the languages that they are translating. This mostly affects Indic languages e.g. "<info.gist@cdac.in>" is used for a number of Indic translations.
  • Some projects use generic contact information. Examples include: wxWidget, Novell, Compiz and OpenSUSE. Technically there is nothing wrong with this and we can work around it if the actual target language is mentioned, but often the target language isn't mentioned.
  • Even with these issues we can safely guess 87% of the target languages from the headers with minimal false positives.

In the cases where we can't guess the language we're almost always dealing with: missing or default header information, English headers, or personal email addresses that we've excluded.

Here are some of the details of our analysis:

  • Analysed 15244 MO files.
  • Could not classify 7,5% (1133).
  • Incorrect language classification for 5,5% (848) of the files. Many of these are cases where translators have indicated regional variants, e.g. de vs de_DE, af vs af_ZA, bn vs bn_IN, or different encodings, e.g. sr vs sr@latin.
  • Only 1,8% (287) are true misclassifications. Most of these are due to incorrect language information in the headers. This probably indicates that the data is quite reliable more than it highlights any issue.

So combining this data we can safely and correctly guess 87% of the language teams based simply on the team header. We expect 5,5% to be incorrect or to not capture the regional and encoding information. We can't guess 7,5% of the headers.
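
The arithmetic behind these figures can be rechecked from the counts given above. The short Python sketch below does only that; the variable names are mine. Computed from the counts, the shares come out within about a tenth of a percentage point of the rounded figures quoted in the post.

```python
# Counts taken from the analysis above.
TOTAL = 15244
UNCLASSIFIED = 1133
INCORRECT = 848
TRUE_MISCLASSIFIED = 287


def share(count, total=TOTAL):
    """Percentage of the analysed MO files that `count` represents."""
    return 100.0 * count / total


# Files that were neither unclassifiable nor classified incorrectly.
correct = TOTAL - UNCLASSIFIED - INCORRECT

print(f"unclassified:       {share(UNCLASSIFIED):.1f}%")
print(f"incorrect:          {share(INCORRECT):.1f}%")
print(f"true misclassified: {share(TRUE_MISCLASSIFIED):.1f}%")
print(f"correct:            {share(correct):.1f}%")
```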

Even though we'll misguess some target languages, the translator will still be able to set their target language within Virtaal. This will allow them to correct any bad classifications and also ensure that when saved the file will use the correct Gettext 'Language' header. We won't need to guess the language again.

How does our guesser use the Language-Team header to guess the target language?

Our analysis of existing headers was to help build our actual Language-Team guesser. We guess the target language as follows:

  1. Before we even try to analyse Language-Team we look for the Language header, then for headers used by Poedit. These headers are likely to be correct and are set by users to actually indicate their target language. If we don't find those headers then we move on to the Language-Team analysis.
  2. Our first step with the Language-Team header is to check it against a number of regular expressions for common language team email addresses. Thus "<fr@li.org>" is easily identified as French. By using a regex we also future-proof the guesses and can detect teams that emerge later.
  3. Then we use snippets of contact information which are almost always email addresses and sometimes URLs. These are essentially team contacts that can't be detected with our regular expressions.
  4. Lastly, we use snippets of language names both in English and the target language, e.g. Dutch and Nederlands.
  5. If all that fails we give up guessing.
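
The steps above can be sketched roughly as follows. This is a simplified Python illustration and not the Translate Toolkit's team.py: the regular expression, the tiny snippet tables and the contact address are hypothetical, `X-Poedit-Language` is assumed to be the Poedit header meant in step 1, and skipping an "en" match is my own way of handling the "English is almost always a false positive" finding.

```python
import re

# Hypothetical pattern for team addresses such as "<fr@li.org>".
TEAM_EMAIL_REGEX = re.compile(r"<([a-z]{2,3}(?:_[A-Z]{2})?)@li\.org>")

# Hypothetical sample data; the real lists are far longer.
CONTACT_SNIPPETS = {"nl": ["team-nl@example.org"]}
LANGUAGE_SNIPPETS = {
    "nl": ["Dutch", "Nederlands"],
    "fr": ["French", "Français"],
    "kn": ["Kannada"],
}


def guess_target_language(headers):
    """Guess the target language from a dict of PO header entries."""
    # Step 1: explicit headers set by tools win. The value is returned
    # verbatim; a real implementation would normalise it to a language code.
    for key in ("Language", "X-Poedit-Language"):
        value = headers.get(key, "").strip()
        if value:
            return value
    team = headers.get("Language-Team", "")
    # Step 2: well-known team email patterns, ignoring suspicious "en" hits.
    match = TEAM_EMAIL_REGEX.search(team)
    if match and match.group(1) != "en":
        return match.group(1)
    # Step 3: known contact details that the regex cannot detect.
    for code, snippets in CONTACT_SNIPPETS.items():
        if any(snippet in team for snippet in snippets):
            return code
    # Step 4: language names in English or in the language itself.
    lowered = team.lower()
    for code, names in LANGUAGE_SNIPPETS.items():
        if any(name.lower() in lowered for name in names):
            return code
    # Step 5: give up.
    return None


print(guess_target_language({"Language-Team": "French <fr@li.org>"}))
print(guess_target_language({"Language-Team": "Kannada <en@li.org>"}))
```

Returning `None` in the last step leaves room for other strategies, such as the locale or ngram guessing, to take over.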

Can I see this in action?

You can see this in action if you run a recent version of Virtaal with Translate Toolkit 1.7.0 (which was released on 2010-05-12). Windows users will need to wait for a new release of Virtaal (>v0.6.0).

How can you help?

We think we've got most of the data sorted out, but if you can help us reduce the 5,5% misclassification and 7,5% unclassifiable entries, that would be great.

If you are a translator then please have a look at our team.py file and check that your team's email address (see LANG_TEAM_CONTACT_SNIPPETS) and your language name, variants or other defining information (see LANG_TEAM_LANGUAGE_SNIPPETS) are listed.

But probably the easiest and best way that you can help is to use a good localisation tool, such as Virtaal, Pootle or Poedit, that captures the target language information in the header. The next best thing is to make sure that you use very standard contact information for your team so that it's easy to guess your language.

Continuous integration for localisation - what about false positives?

Friedel en ander frappanthede - Wed, 12/05/2010 - 11:07

My colleague Dwayne has just written an interesting piece about using continuous integration for testing our product translations. Now I want to start thinking about how we can extend it with pofilter.

One difference between msgfmt and pofilter is that msgfmt has a few strict tests that translations must pass. pofilter, by contrast, has a bunch of quality tests, not all of which are matters of life and death. While the tests sometimes wrongly complain that something is wrong, I almost think of pofilter more as something like "lint": something that helps you clean up and improve the translations, but not all of it is necessarily critically important (some of it of course is).

That is why we are now talking about how to mark false positives so that pofilter will not complain about them in future (similar to how one can silence lint tests with comments). An entry in the lines that start with "#," would be ideal, but the gettext programs do not preserve any non-standard entries there. So it will probably have to be in the comments. This has the advantage that a translator can easily edit these fields like comments.

But if it is in the comments, what happens when translations are upgraded? If msgmerge or pot2po marks a unit fuzzy (the English changed), one would surely want to run the quality test again, but there will now be a marker saying that this must not happen. It would be ideal if we could detect a change in the msgid or msgstr and simply run the tests anyway.

I am thinking of inserting a hash value, but of course that will not look as nice, and is no longer easily editable by a human.
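
A rough Python sketch of the hash idea follows. Nothing like this exists in pofilter as described here: the marker format `pofilter-ignore: <check> <hash>`, the function names and the choice of a truncated SHA-1 are all assumptions for illustration, and "endpunc" is used only as an example check name.

```python
import hashlib


def unit_fingerprint(msgid, msgstr):
    """Short hash that changes whenever the msgid or the msgstr changes."""
    data = msgid.encode("utf-8") + b"\0" + msgstr.encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:8]


def make_marker(check_name, msgid, msgstr):
    """Build the (hypothetical) comment line that silences one check."""
    return f"pofilter-ignore: {check_name} {unit_fingerprint(msgid, msgstr)}"


def is_suppressed(check_name, msgid, msgstr, comments):
    """True only if a marker exists for this exact msgid/msgstr pair."""
    return make_marker(check_name, msgid, msgstr) in comments


marker = make_marker("endpunc", "Save", "Stoor")
print(is_suppressed("endpunc", "Save", "Stoor", [marker]))       # unchanged unit
print(is_suppressed("endpunc", "Save file", "Stoor", [marker]))  # msgid changed
```

Because the hash covers both strings, a stale marker simply stops matching after msgmerge or pot2po changes the unit, so the check runs again without anyone having to remove the marker. The cost is the one mentioned above: the marker is not something a person can write or edit by hand.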


Continuous integration, can it work for software localisation?

dB - Tue, 11/05/2010 - 22:29

At Translate.org.za we want to keep delivering the best FOSS localisation tools. To do that we've started using Continuous Integration (CI) in the development of Pootle, Virtaal and the Translate Toolkit. We're using a tool called Hudson to manage our CI process.

Since the tools that we develop are all focused on localisation we thought, "Wouldn't it be great if we could use CI to continuously check our translations?". I hope that you will start to use some of our scripts, or your own, to ensure that localisation is part of your CI build process.


The problem

Since we build localisation tools we pride ourselves on doing localisation well. But even we've made a few mistakes along the way, mistakes like:

  • Shipping broken translation files. There is nothing quite as frustrating as sending out an application that breaks because of a typo in the translation of a variable. The cost of fixing the issue and releasing a bug fix build is just too much for a small development team. We want to focus on cool new features, we'd rather not fix a bug that we could have caught with CI.
  • All text not present in the translation files. We work on string freezes and try hard not to change things while in freeze. So nothing hurts as much as discovering that a feature you added many months ago is not actually present in the new translation files. You now realise that you are about to release a feature that will only be in English. So now we must break string freeze and get the new files to translators with a lot of communication overhead. For translators it means updating their just completed translations to the new set of translations, they might not have the time. Many of these are simple steps but they require lots of overhead and because so many people are involved there is a real potential for other errors to occur. So we want to make sure that when we enter string freeze everything that we want to be translated is ready for translation. We'd rather not break string freeze simply because we forgot to add a file to POTFILES.in.
  • Broken XML file building. As we're using intltool we build some files (mimetype XML and .desktop) from our translations. We don't need to run this step very often, so infrequently in fact that we might only run it as we prepare to release. We'd like to catch any errors in the building of these files when the error occurs, not just before the release.

We want to apply CI to our localisations because we're not machines, we simply want to be able to forget about localisation issues while we work towards a release. We want to know that our code is always ready for localisation and that our localisations are always 100% technically correct. We don't want any surprises and we want to fix errors that occur when they occur.

We've managed to achieve this.

As you can see above we have a Hudson job called validate-translations that runs a number of localisation related build steps.

The solution to catching technical localisation errors

We run intltool as part of the build process to catch files that aren't being extracted for localisation and to check that the mimetype and .desktop files build from the translations. That part was easy. The harder part is making sure that committed translations are correct; for that we built a more elaborate script around the Gettext tools.

Hudson can monitor errors reported in the JUnit XML format. Our solution was to build a simple bash script that exercises Gettext's msgfmt command and outputs the results in a JUnit XML file. The script is simple. For each PO file that it finds it runs msgfmt -cv. Any errors are captured so that we can more easily fix them when we review the results.
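
The original script is written in bash and is not reproduced here. As an illustration of the same idea, here is a minimal Python sketch that runs `msgfmt -cv` on each PO file and records the results as JUnit-style XML. The function names, the suite name and the minimal XML layout are assumptions, and `-o /dev/null` is added so that no MO file is written (this assumes a Unix-like system).

```python
import subprocess
import xml.etree.ElementTree as ET


def run_msgfmt(po_path):
    """Run `msgfmt -cv` on one PO file; return (exit code, combined output)."""
    result = subprocess.run(
        ["msgfmt", "-cv", "-o", "/dev/null", str(po_path)],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr + result.stdout


def build_junit_report(po_paths, runner=run_msgfmt):
    """Return JUnit-style XML with one test case per PO file."""
    paths = list(po_paths)
    suite = ET.Element("testsuite", name="validate-translations")
    failures = 0
    for po_path in paths:
        case = ET.SubElement(suite, "testcase", classname="msgfmt", name=str(po_path))
        code, output = runner(po_path)
        if code != 0:
            failures += 1
            # Keep msgfmt's messages so the error can be reviewed in the report.
            failure = ET.SubElement(case, "failure", message="msgfmt -cv failed")
            failure.text = output
    suite.set("tests", str(len(paths)))
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")


# Example use (assumes a "po" directory and msgfmt on the PATH):
#   from pathlib import Path
#   report = build_junit_report(sorted(Path("po").rglob("*.po")))
#   Path("msgfmt-results.xml").write_text(report, encoding="utf-8")
```

Passing the runner in as a parameter keeps the report building separate from the call to msgfmt, which makes it easy to try the sketch without Gettext installed.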

Feel free to use the JUnit XML script for PO files within your own Hudson jobs.

Since starting this CI process we've seen good results.

As you can see above we solved 20 msgfmt errors over just three builds. More importantly, we can now safely modify our code and know that our CI will catch any localisation issues.

So what is next for CI and localisation?

At the moment we simply catch msgfmt errors, we will be looking to add the following:

  • PO file snippet - it would make it easier to find and fix the errors that we find if we have the snippet of PO that caused the error. Currently we only have the line number and have to first find that line in the PO file before we can even check what is causing the error. With the snippet we can make the full diagnosis while reviewing the Hudson test failure report.
  • pofilter checks - the Translate Toolkit has a number of checks (47 in fact) that catch technical localisation errors. We'd like to produce XML test result files that show those errors. The Translate Toolkit is very useful for human review, but we'll need to create a method to mark false positives that we wish to ignore in future test runs.
  • pocount - We want to count the translation status of a group of PO files. You might wonder why we'd want to do that. The reason is that many projects ship with translations that meet some level of completeness. For Virtaal, our Computer Aided Translation tool, we set that threshold at 75% complete for shipping translations. With pocount we should be able to automate this so that we return a test failure if a translation falls below this threshold. If you are able to compare the files that meet the threshold with the files listed in a LINGUAS file (a file that lists all shipped translations) then it's possible to raise an error when a new file needs to be added to the LINGUAS file to ensure that it's shipped. Similarly it would be possible to raise an error if an existing translation falls below the threshold, in which case it needs to be removed from the list of shipped localisations. Now there will be no risk of shipping incomplete translations or of forgetting to ship a new translation.
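
The pocount idea in the last item can be sketched as a simple comparison. The Python below is an illustration only: it does not call pocount, it assumes the completeness percentages have already been computed per language, and the function name and message wording are hypothetical. Only the 75% threshold and the role of the LINGUAS file come from the text above.

```python
THRESHOLD = 75.0  # Virtaal's shipping threshold mentioned above


def check_linguas(completeness, linguas, threshold=THRESHOLD):
    """Compare translation completeness with the languages listed in LINGUAS.

    `completeness` maps a language code to its percentage translated;
    `linguas` is the list of language codes currently shipped.
    Returns a sorted list of error messages (empty when all is well).
    """
    ready = {lang for lang, percent in completeness.items() if percent >= threshold}
    shipped = set(linguas)
    errors = []
    for lang in sorted(ready - shipped):
        errors.append(f"{lang}: meets {threshold:g}% but is missing from LINGUAS")
    for lang in sorted(shipped - ready):
        errors.append(f"{lang}: listed in LINGUAS but not at {threshold:g}%")
    return errors


# Hypothetical figures for three languages.
for error in check_linguas({"af": 100.0, "fr": 80.0, "th": 40.0}, ["af", "th"]):
    print(error)
```

Each returned message could become one failing test case in the CI report, so both forgotten and incomplete translations show up as build failures.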

I'll try to post new blog entries when we add some of these new features or scripts to our own build process.

PanAfrLoc / Nko

PanAfriL10n - Fri, 30/04/2010 - 00:16
N'Ko script The N'Ko (ߒߞߏ) script was created in 1948 by Sulemana Kante (Guinea) and is used mainly for Manding languages. It is most popular in Guinea and Cote d'Ivoire, with ...

PanAfrLoc / Arial

PanAfriL10n - Sat, 24/04/2010 - 19:56
IA IA friend ...

Provisionally done with localisation guide for Amharic

Friedel en ander frappanthede - Sat, 20/03/2010 - 09:21

After my visit to Limerick, Solomon and I have a draft copy of the localisation guide for Amharic. It is specifically intended as a way to help speakers of Amharic to better understand and handle some of the issues that arise specifically in localisation into Amharic. We hope that it will help translators to start working on software localisation.

We have now sent it out to a few people for comment. Feel free to let us know if anyone else wants to look at it and comment. We can hopefully publish it soon on the website of the African Network for Localisation.
