ANLoc Conference 2011 - Wiki Notes


Notes from some of the sessions at the ANLoc Conference 2011, which took place in Kenya, 21st - 23rd February 2011.

Data collection from Institutions

Problem Statement
• There are two kinds of institutions that hold OLD (Open Linguistic Data): proprietary and public institutions.
• The data exists, but either the institutions do not want to share it, or they have not considered sharing it because they lack the technology and infrastructure or are not aware of the possibility.

Research Question
• Why are these institutions not sharing their data?
• How can we get the data from these institutions (licenses, legal issues, etc.)?
• How can we acknowledge/recognize institutions that share their data?
• What are the metrics for sharing data from the institutions (availability, accessibility, comprehensiveness, standard format of the data, quality, etc.)?
Resources
• Legal/recognition
o Memorandum of Understanding
o Expressions of intent
o Collaborations
o Consortium
• Metrics
o Rating mechanisms
Suggestion
• An African OLD consortium that requires membership from the institutions
• Members have the right
o To attend meetings
o To have a vote on how data is managed
o To abide by the agreement of the consortium

Day 2: Spellchecker notes

Spellcheckers (using Hunspell engine)

information on a language

which you can consider to include in wordlist for Hunspell

as Hunspell

Hunspell, they could be included in applications that are still being localised
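
As an illustration of the word-list idea above, here is a minimal sketch (not from the session) of how raw text in a language could be turned into the contents of a Hunspell .dic file. The function name build_dic and the min_count threshold are assumptions made for this example. A usable Hunspell dictionary also needs a matching .aff file, which is not shown.

```python
import re
import unicodedata
from collections import Counter


def build_dic(text, min_count=2):
    """Return the contents of a Hunspell .dic file built from raw text.

    min_count is a hypothetical threshold: words seen fewer times are
    dropped, on the assumption that rare forms are more likely typos.
    """
    # Compose characters so that accented letters count as single letters.
    text = unicodedata.normalize("NFC", text).lower()
    # Runs of letters, optionally joined by an apostrophe or hyphen.
    words = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text)
    counts = Counter(words)
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    # A .dic file starts with the number of entries, then one word per line.
    return "\n".join([str(len(kept))] + kept) + "\n"


sample = "Habari yako. Habari gani? Nzuri sana, nzuri."
print(build_dic(sample))
```

With the sample text, only "habari" and "nzuri" occur twice, so only those two words are kept. Lowercasing everything is a simplification: it loses proper nouns, which a real word list would need to treat separately.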

SUB-PROJECT ON TOOLS

Problem Statement:
We are lacking certain tools or, if we have them at all, they are inadequate. We need tools for handling, creating and analysing linguistic data.
• Tools for harvesting; e.g., extraction, structuring, annotation
• Tools and methods for improving analytic ability; e.g., morphological engines, POS tagging, text normalization
• Tools and methods for improving corpus quality; e.g., orthographic correction, spelling correction, normalizing orthography
• Tools to manually review and correct automated processing; e.g., training POS tagging
• Tools to manually create seed data

Research Questions:
• Are the tools available known to end-users?
• Do the tools available support African languages?
• Are there fundamental issues to deal with in regard to African languages when it comes to the use of particular tools?
• What tools are necessary to make inputs and outputs (impact) measurable?

Resources:
Details soon ...

Subproject: Multimedia

ANLoc 2011, Nairobi, 23 February 2011

Subproject: Linguistic Data (LD) and Multimedia

Problem statement
• Exclusion of people with disabilities from the digital world
o Also banking and public services, e.g. ATMs don’t talk to you, it’s all text based
• Video games (education, training, entertainment) are not localized into underserved languages, incl. African languages.
o Leading to missed opportunities for
• Educators
• Business
• …
• Multimedia access in countries with low literacy rates is the key to access and interaction with content. Yet the opportunities presented by this situation are not explored.
o Compare impact of local radio with huge growth rates to that of local language newspapers with huge losses.
• Video games are cool, so take-up is virtually guaranteed. Yet few games have been localized into underserved languages. Localized games could be just for fun but still have an incredible impact.
• Angry Birds (on Android): everybody is playing it!!! – So why not localize it?
• Doctor - Patient interaction is difficult sometimes because patients don’t have enough knowledge about medical terminology, especially if doctors and patients do not speak the same language.

Research questions
• How to reduce access limitations for people with disabilities using LD and LT, especially multimodal?
• How can LD make the localization of video games easier?
• How could localized multimedia games help in education?
o Imagine a game in Akan, where the game “reads out” the numbers shown on the dice when playing a game like ludo.
• Can localized games be used for learning
o Languages
o Maths
o Strategy
o …
• What role does the fun factor play in the take-up of localized content and in the recognition of its importance?
• Which localized multimedia systems could help improve doctor-patient communication?

Resources
• LD for sign languages
• Video dictionary for sign languages
• Audio corpus
• E-Speak corpora for 12 ACALAN languages to support open text-to-speech / speech-to-text (starting with Swahili and Arabic)
o on (mobile) Android
o Keyword-based speech recognition corpora for 12 ACALAN languages (not sure what is meant here?)
• Recorded audio sermons (exist)
• MONEY & PEOPLE’s TIME
• Subtitles
• Transcription of audio/video
• Dubbing
• Encyclopedic dictionaries: audiovisual illustrations of African plants, animals (cow patterns: Songhay has ~150 words for cows)
• Create repository of animal sounds: e.g. how do dogs bark in Swahili, Akan, …

Terminology Notes

Terminology: Basically, it means the terms that have been decided on to designate concepts.

In deciding on a term, two problems are encountered: i) translators often do not see terms in context, so we or end-users can end up using the wrong terms; and ii) it is difficult to change a term that is seen as official but considered problematic.

Consider the word ‘OUTLINE’ in English and its translation into some African languages: does it encode a single concept even in the source language, English? No! That means there is a need to look into the conceptual meaning of each term.

Moving on, two questions also come to mind. These are:
i. What makes a good term?
ii. How do we develop a new term?

Basically, developing a new term involves three steps. These are:
i. Terminologist must understand the concept.
ii. Localizer must understand the concept/term that was developed by the terminologist.
iii. End-user must understand what localizer meant.

It is also important that the people in (i), (ii) and (iii) all have to learn the new concept. Terms can be made from the definition of source words; e.g. consider a disease and the symptoms that are associated with it; i.e. the symptoms may describe the disease. Other prominent means of creating terms are:
i. Borrowing
ii. Metaphor
iii. Descriptive phrase
iv. Political agenda

Working Group Report: Global Impact and games

Agenda
• Situation today
• Where we want to be in 2016
• How to get there

Situation today
• What we do is
o Not widely known
o Financially not viable / sustainable
o Not taken up
o Un-connected (country/region/continent/global)
• People do not know what is available in their language (people = end-users OR publishers OR politicians OR…)
• We are working in the background (it’s not always easily visible) as it is a service to others.
• We are the enablers but have not enabled / productized / pushed our work into the relevant space
• What we do is not always relevant

Where we want to be in 2016
• Connected globally
o Know what is going on, where to get help, where to promote our work
• Viable and sustainable
• Relevant, e.g.
o School and education
o Health
o Fun
• Reference point
o Guidelines
o Tools
o Best practice
• Understand “market” requirements; how can what we do contribute to
o Improve living standards
o Create employment opportunities
o Increase people’s income
o Keep people out of jail
o Ensure people’s survival
• Have a Marketing and a PR department
• Have a team creating products
• Be multi: -modal, -domain, and mobile

How to get there
• Work with
o Children
• And their teachers
o Free licenses for schools
o Public service (local, state, intra-state)
• Identify 3 (or more?) core application areas where local language access has made a real difference AND where it would make a significant difference, e.g.
o Computerize patients’ records in your language
o A really successful game
o Weather, market and sowing information for farmers
• Implement proper market research
• Create a global nonprofit network and events
• Work with organizations on the ground who would like to use the results of our work
• Look at Language Technologies not as something that has a value per se but as a service or enabler
• Highlight impact of existing work

Working Group Report: Open Linguistic Data and Localisation

This is about proposal writing, linking OLD to ANLoc

Question: If we had LD, how would that help us?

Link to ANLoc vision?
• Remove limitations for disabled people
o Speech might be language specific; colours and font size are not

Link to ANLoc (what we can do, not necessarily linked to the vision)
• Spellchecker project had need for OLD
o Results limited by availability of LD
• There are other activities doing this, investing heavily (because they have good reasons to do so), cross-reference
o TAUS Data Association
o ELRA/ELDA
o Linguistic Data Consortium (LDC)
• It’s necessary for
o Fonts
o Spellcheckers
o Thesauri

Link to Localisation
• Needed for
o Translation Memory (TM)
o MT
o Terminology

Which problems to solve
o Predictive text input (mobile phones)
o Translate English laws into local languages
o Person calls, uses voice prompts to respond (though this does not scale)
o See “Freedom games .....

Questions
• What is a reasonable amount of data to collect per language
• Needs to be open (to stimulate and enable activity)
• Needs investment
• Needs prioritization (which languages, domains; amount; tagging)
• Don’t forget speech; why not use radio programmes?

Observation
• People in Africa do not call up voicemail
• NEED a SUCCESS STORY!

Working Group Report: New Business Approaches

ANLoc 2011, Nairobi, 24 February, afternoon session, 15:10-15:45

Working Group New Business Approaches

About 14 participants (the LARGEST group:)

Introduction by Reinhard
Current mainstream l10n model does not work: focused on short-term financial return on investment; does not cover majority of languages and content of the world. Is there an alternative model?

The initial question:
1. What do you want to get out of your business?
2. How could that be achieved?
Moved on to a discussion on the form of organization most suitable
- Nonprofit Enterprise?
- For profit?
- Social Enterprise?
Finally discussed ways to reach
- more languages
- more content
- more people

Points discussed
- Need success stories such as local language local radio
- Effort has to be competitive to be sustainable
o Could it be nonprofit?
- Traditional l10n could find imaginative ways to deal with less viable content and languages, e.g.
o Microsoft’s Local Language Programme
- Work with government, influence policies
- Focus (less but higher impact)
- Content exchange (not just localization)
- Open standards, tools & technologies

Challenges for Next Generation Localisation
- Access
- Volume
- Localisation

Workshop on GNOME

Workshop on GNOME led by Claude

The discussion covered the following topics:

Different platforms from which you can contribute to GNOME translations:
Linux, MacOS and Windows (several GNOME programs are also cross-platform)

Release times of GNOME:
Every six months for core modules – for other modules, some developers choose to release more often

How to contribute to the GNOME project:
You have to be a member of a language team on l10n.gnome.org
Those who were not yet members were helped to create an account
Process of deciding which modules to localise – prioritising according to own needs
Process of downloading the po files for translation
Process of uploading files ready for review or commit
Committing requires a coordinator with specific rights
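
When prioritising modules, it can help to see how much of a downloaded po file is still untranslated. The following is a rough sketch, not part of the workshop; the function name po_stats is an assumption made for this example. It uses a deliberately simplified reading of the PO format: entries are assumed to be separated by blank lines, and plural forms and fuzzy flags are not handled.

```python
def po_stats(po_text):
    """Return (translated, untranslated) entry counts for PO-format text.

    Simplified: assumes blank lines separate entries; plural entries
    (msgid_plural / msgstr[n]) and fuzzy flags are not handled.
    """
    translated = untranslated = 0
    for block in po_text.strip().split("\n\n"):
        msgid, msgstr, field = "", "", None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("msgid "):
                field, msgid = "msgid", line[6:].strip('"')
            elif line.startswith("msgstr "):
                field, msgstr = "msgstr", line[7:].strip('"')
            elif line.startswith('"') and field == "msgid":
                msgid += line.strip('"')   # continuation of a long msgid
            elif line.startswith('"') and field == "msgstr":
                msgstr += line.strip('"')  # continuation of a long msgstr
        if not msgid:
            continue  # header entry (empty msgid) or comment-only block
        if msgstr:
            translated += 1
        else:
            untranslated += 1
    return translated, untranslated


sample_po = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "File"
msgstr "Faili"

msgid "Edit"
msgstr ""
'''
print(po_stats(sample_po))
```

For real work, the statistics shown on l10n.gnome.org or the standard gettext tools are the better source; this sketch only shows the idea.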

Bugzilla
How to report any problems encountered in any of the modules
Any suggestions you want to make to contribute to a module can also be channeled through Bugzilla

How to check the source code when you have trouble deciding how to translate a specific string

Web address for GNOME: http://l10n.gnome.org
