
I18N

The global nature and accessibility of the Internet have generated interest in globalization, i.e., making products such as documents and websites available in various languages and cultures. Globalization consists of two parts: internationalization and localization. Internationalization entails generalizing the product so that it can be localized to specific languages and cultural conventions without redesign. Although interest in globalization is growing, little research has been done to date on its various aspects.
We investigated the globalization process using XML-related tools. Our objectives included investigating and developing techniques for the efficient representation and storage of source data and their translations, support for translation reusability, and XML data compression to make translation reuse more efficient.
Our preliminary research revealed many interesting research questions arising from the globalization process. First, we designed the Internationalized Faculty Website (IFW), an XML-based system for creating internationalized products (both websites and documents) that stores faculty CVs (curricula vitae), and showed how some globalization activities may be structured.

Next, we developed a complete Globalization Framework (GF), a client/server-based internationalization system built with Cocoon, within which we tackled the following issues: (1) various representations of source data to be used in the globalization process, and techniques for entering source data and their translations; (2) various kinds of persistent storage for efficiently storing source data and their translations; (3) various representations of translation memories, and translation reusability; and (4) XML data compression.
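
To illustrate what translation reusability in issue (3) can mean in practice, here is a minimal sketch of an exact-match translation memory. It is not the representation used in GF, which is not described here; the class name `TranslationMemory`, its methods, and the sample entries are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical exact-match translation memory (an illustration, not GF's design).
class TranslationMemory {
    // Key: [source language, target language, source segment].
    private final Map<List<String>, String> entries = new HashMap<>();

    private static List<String> key(String from, String to, String segment) {
        // Trim so that surrounding whitespace does not defeat reuse.
        return Arrays.asList(from, to, segment.trim());
    }

    // Record an approved translation of a segment for later reuse.
    void store(String from, String to, String segment, String translation) {
        entries.put(key(from, to, segment), translation);
    }

    // Return a stored translation, or empty if the segment is new.
    Optional<String> lookup(String from, String to, String segment) {
        return Optional.ofNullable(entries.get(key(from, to, segment)));
    }

    public static void main(String[] args) {
        TranslationMemory tm = new TranslationMemory();
        tm.store("en", "pl", "Publications", "Publikacje");
        System.out.println(tm.lookup("en", "pl", " Publications "));
        System.out.println(tm.lookup("en", "fr", "Publications"));
    }
}
```

A real translation memory would typically also support approximate matches; this sketch only shows how a previously stored translation can be reused without retranslating the segment.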

The GF, an XML-based framework for building internationalized products, was designed with the following high-level requirements:
1. Multilingualism: support for issues such as right-to-left and vertical text rendering.
2. Efficiency: the framework will be efficient in terms of time and space.
3. Distribution: implementation of the framework allowing for cross-platform development.
4. Platform Independence: accessibility by users of heterogeneous systems.
5. Customizability: of both the framework and the final product.
6. Scalability: addition of new components to the framework does not hinder its performance.
7. Persistence and Reusability: of both source data and translations.
8. Low Coupling: loosely connected components, allowing existing components to be replaced and new components to be added without affecting the operation of other components.
9. Portability: data can be moved between different kinds of persistent storage and network nodes.

We used XML because of its support for Unicode, its separation of concerns (XML describes content rather than formatting), and the ease of converting it to other formats. We used schema-based XML documents, which must adhere to the grammar defined by a predefined schema. For the implementation we chose Java because of its support for internationalization and the availability of tools such as servlets, JSP, and JAXB.
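
The idea of a schema-based document can be sketched with the standard `javax.xml.validation` API. The schema below, with an `entry` element, a `title` child, and a required `lang` attribute, is a hypothetical example and not the schema used in GF; the class name `SchemaCheck` is likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example: checks a document against a small predefined schema.
class SchemaCheck {
    // Assumed schema: one CV entry with a title and a required language tag.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='entry'>"
      + "<xs:complexType>"
      + "<xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>"
      + "<xs:attribute name='lang' type='xs:language' use='required'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            // The schema itself is broken; this is a programming error.
            throw new IllegalStateException(e);
        }
        try {
            // With no ErrorHandler set, validation errors are thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // document does not adhere to the schema's grammar
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u017B is a non-ASCII letter, shown here as a Unicode escape.
        System.out.println(isValid("<entry lang='pl'><title>\u017Byciorys</title></entry>"));
        System.out.println(isValid("<entry><title>CV</title></entry>"));
    }
}
```

The second document in `main` omits the required `lang` attribute, so it is rejected by the validator rather than silently accepted.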

GF is a structured environment in which users are assigned separate roles, such as creator, translator, and verifier. Creators of a website submit a request that defines the kind of information to be made available on the website, provides complete source data in one or more languages, and specifies the languages into which the source data are to be translated. Once the request has been submitted, the creators are no longer involved in the internationalization process. The request is processed by the system administrator, who decides which translators can provide the required translations and, once the translations have been completed, which verifiers can approve or correct them. When the verification process is completed, the translations are made available to the creator. This approach therefore supports separation of concerns (SoC) but does not support cooperative development.
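
The role-based processing of a request can be sketched as a small state machine. This is only an illustration of the workflow described above, not GF's implementation; the class `RequestWorkflow`, the stage names, and the `advance` method are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of a request moving through the roles described above.
class RequestWorkflow {
    enum Role { CREATOR, ADMINISTRATOR, TRANSLATOR, VERIFIER }

    enum Stage { SUBMITTED, IN_TRANSLATION, TRANSLATED, IN_VERIFICATION, AVAILABLE }

    // The only role allowed to move a request out of each stage.
    private static final Map<Stage, Role> ACTOR = new EnumMap<>(Stage.class);
    static {
        ACTOR.put(Stage.SUBMITTED, Role.ADMINISTRATOR);    // assigns translators
        ACTOR.put(Stage.IN_TRANSLATION, Role.TRANSLATOR);  // completes translations
        ACTOR.put(Stage.TRANSLATED, Role.ADMINISTRATOR);   // assigns verifiers
        ACTOR.put(Stage.IN_VERIFICATION, Role.VERIFIER);   // approves or corrects
    }

    // A request exists only once a creator has submitted it.
    private Stage stage = Stage.SUBMITTED;

    Stage stage() {
        return stage;
    }

    void advance(Role actor) {
        Role expected = ACTOR.get(stage);
        if (expected == null) {
            throw new IllegalStateException("translations are already available");
        }
        if (actor != expected) {
            throw new IllegalArgumentException(
                stage + " can only be advanced by " + expected + ", not " + actor);
        }
        stage = Stage.values()[stage.ordinal() + 1];
    }

    public static void main(String[] args) {
        RequestWorkflow request = new RequestWorkflow();
        Role[] steps = {
            Role.ADMINISTRATOR, Role.TRANSLATOR, Role.ADMINISTRATOR, Role.VERIFIER
        };
        for (Role role : steps) {
            request.advance(role);
            System.out.println(role + " -> " + request.stage());
        }
    }
}
```

Because no stage lists `CREATOR` as its actor, a creator cannot advance a request after submitting it, which mirrors the separation of roles in the text.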
