What is Unicode?

By: Markus Kuhn

In the late 1980s, there were two independent attempts to create a single unified character set. One was the ISO 10646 project of the International Organization for Standardization (ISO); the other was the Unicode Project, organized by a consortium of (initially mostly US) manufacturers of multilingual software. Fortunately, the participants of both projects realized around 1991 that two different unified character sets were not exactly what the world needed. They joined their efforts and worked together on creating a single code table. Both projects still exist and publish their respective standards independently. However, the Unicode Consortium and ISO/IEC JTC1/SC2 have agreed to keep the code tables of the Unicode and ISO 10646 standards compatible, and they closely coordinate any further extensions. Unicode 1.1 corresponded to ISO 10646-1:1993, Unicode 3.0 corresponded to ISO 10646-1:2000, Unicode 3.2 added ISO 10646-2:2001, Unicode 4.0 corresponds to ISO 10646:2003, and Unicode 5.0 corresponds to ISO 10646:2003 plus its amendments 1–3. All Unicode versions since 2.0 are compatible: only new characters will be added, and no existing characters will be removed or renamed in the future.
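As a small illustration of what a single shared code table means in practice, the following Python sketch looks up the code point and official character name of a few characters using the standard library's unicodedata module. The helper name describe and the sample characters are chosen here purely for illustration and are not part of either standard. The names returned depend on the Unicode data bundled with the Python interpreter, which is normally a much later version than 5.0.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official name of one character.

    'describe' is a hypothetical helper written for this example.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Sample characters picked for illustration only.
samples = ["A", "\u00e9", "\u20ac"]
descriptions = [describe(ch) for ch in samples]

for line in descriptions:
    print(line)

# Version of the Unicode character database shipped with this interpreter.
print("Unicode data version:", unicodedata.unidata_version)
```

Because existing characters are never removed or renamed, a lookup like this should give the same code point and name for these characters regardless of which later Unicode version the interpreter ships with.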

The Unicode Standard can be ordered like any normal book, for instance via amazon.com for around 60 USD:

The Unicode Consortium: The Unicode Standard 5.0,
Addison-Wesley, 2006,
ISBN 0-321-48091-0.

If you work frequently with text processing and character sets, you definitely should get a copy. Unicode 5.0 is also available online.
