What is UCS? What is ISO 10646?

By: Markus Kuhn

The international standard ISO 10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards and guarantees round-trip compatibility with them: no information is lost if you convert any text string to UCS and then back to its original encoding.
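The round-trip property can be tried out with a short Python sketch. Python strings are sequences of Unicode code points, so decoding legacy bytes into a string and encoding them again exercises the conversion described above. The sample texts, the three codecs, and the helper names `round_trips` and `check_samples` are assumptions chosen for illustration; a handful of passing samples demonstrates the idea but does not prove the guarantee for every encoding.

```python
# Round-trip check: legacy encoding -> Unicode (UCS) string -> legacy encoding.
# The sample strings and the choice of codecs are illustrative assumptions.
SAMPLES = {
    "iso8859_1": "Grüße aus Köln",   # Latin-1, Western European
    "koi8_r": "Привет, мир",         # KOI8-R, Russian Cyrillic
    "shift_jis": "こんにちは",        # Shift_JIS, Japanese
}


def round_trips(legacy_bytes: bytes, codec: str) -> bool:
    """Return True if decoding to Unicode and re-encoding is lossless."""
    as_unicode = legacy_bytes.decode(codec)           # legacy bytes -> code points
    return as_unicode.encode(codec) == legacy_bytes   # code points -> legacy bytes


def check_samples() -> dict:
    """Build legacy byte strings from SAMPLES and test each round trip."""
    return {
        codec: round_trips(text.encode(codec), codec)
        for codec, text in SAMPLES.items()
    }


if __name__ == "__main__":
    for codec, ok in check_samples().items():
        print(f"{codec}: {'lossless' if ok else 'LOSSY'}")
```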

UCS contains the characters required to represent practically all known languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese and Korean Han ideographs as well as scripts such as Hiragana, Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer, Bopomofo, Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee, Mongolian, Ogham, Myanmar, Sinhala, Thaana, Yi, and others.

For scripts not yet covered, research on how best to encode them for computer usage is still going on, and they will be added eventually. These include not only historic scripts such as Cuneiform, Hieroglyphs and various Indo-European notations, but even some selected artistic scripts such as Tolkien’s Tengwar and Cirth.

UCS also covers a large number of graphical, typographical, mathematical and scientific symbols, including those provided by TeX, PostScript, APL, the International Phonetic Alphabet (IPA), MS-DOS, MS-Windows, Macintosh, OCR fonts, as well as many word processing and publishing systems. The standard continues to be maintained and updated, and ever more exotic and specialized symbols and characters will be added for many years to come.

ISO 10646 originally defined a 31-bit character set. The subsets of 2^16 (65,536) characters where the elements differ (in a 32-bit integer representation) only in the 16 least-significant bits are called the planes of UCS.
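Since a plane is defined by everything above the 16 least-significant bits, the plane of a code number can be found with a shift, and the position inside the plane with a mask. The following Python lines are a minimal sketch of that arithmetic; the function names `plane_of` and `offset_in_plane` are made up for this example.

```python
def plane_of(code_point: int) -> int:
    """Plane number: the bits above the 16 least-significant bits."""
    return code_point >> 16


def offset_in_plane(code_point: int) -> int:
    """Position within the plane: the 16 least-significant bits."""
    return code_point & 0xFFFF


if __name__ == "__main__":
    # Two code numbers in Plane 0, one in Plane 1, one in Plane 2.
    for cp in (0x0041, 0x20AC, 0x1D11E, 0x20000):
        print(f"U+{cp:04X}: plane {plane_of(cp)}, "
              f"offset 0x{offset_in_plane(cp):04X}")
```

Two code numbers belong to the same plane exactly when `plane_of` returns the same value for both.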

The most commonly used characters, including all those found in major older encoding standards, have been placed into the first plane (0x0000 to 0xFFFD; the remaining two positions of the plane, 0xFFFE and 0xFFFF, are reserved and never assigned to characters), which is called the Basic Multilingual Plane (BMP) or Plane 0. The characters that were later added outside the 16-bit BMP are mostly for specialist applications such as historic scripts and scientific notation. Current plans are that there will never be characters assigned outside the 21-bit code space from 0x000000 to 0x10FFFF, which covers 1,114,112 code positions (17 planes), a bit over one million.

The ISO 10646-1 standard was first published in 1993 and defines the architecture of the character set and the content of the BMP. A second part, ISO 10646-2, was added in 2001 and defines characters encoded outside the BMP. In the 2003 edition, the two parts were combined into a single ISO 10646 standard. New characters are still being added continuously, but existing characters are stable and will no longer be changed.
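The size of the code space follows directly from its limits. The Python sketch below checks that arithmetic and shows simple range tests for the BMP and for the full code space. The constant and function names are assumptions introduced for this example, not part of the standard.

```python
BMP_LAST = 0xFFFF            # last position of Plane 0
CODE_SPACE_LAST = 0x10FFFF   # upper limit of the 21-bit code space
PLANE_SIZE = 1 << 16         # 65,536 positions per plane


def in_code_space(code_point: int) -> bool:
    """True if the value lies in the range 0x000000 to 0x10FFFF."""
    return 0 <= code_point <= CODE_SPACE_LAST


def in_bmp(code_point: int) -> bool:
    """True if the value lies in Plane 0."""
    return 0 <= code_point <= BMP_LAST


def code_space_size() -> int:
    """Number of code positions from 0x000000 to 0x10FFFF inclusive."""
    return CODE_SPACE_LAST + 1


def plane_count() -> int:
    """Number of 2^16-sized planes that fit into the code space."""
    return code_space_size() // PLANE_SIZE


if __name__ == "__main__":
    print(code_space_size(), plane_count(), CODE_SPACE_LAST.bit_length())
    print(in_bmp(0x20AC), in_bmp(0x1D11E), in_code_space(0x110000))
```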

UCS assigns to each character not only a code number but also an official name. A hexadecimal number that represents a UCS or Unicode value is commonly preceded by “U+” as in U+0041 for the character “Latin capital letter A”. The UCS characters U+0000 to U+007F are identical to those in US-ASCII (ISO 646 IRV) and the range U+0000 to U+00FF is identical to ISO 8859-1 (Latin-1). The range U+E000 to U+F8FF and also larger ranges outside the BMP are reserved for private use. UCS also defines several methods for encoding a string of characters as a sequence of bytes, such as UTF-8 and UTF-16.
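Several of these points can be inspected with Python's standard `unicodedata` module, which exposes the Unicode character database (its names match the UCS names, written in capitals). The sketch below formats the U+ notation, looks up a name, checks that Latin-1 byte values map to identical code numbers, and tests for private-use positions. The helper names are invented for this example.

```python
import unicodedata


def u_plus(char: str) -> str:
    """Format a single character in the conventional U+XXXX notation."""
    return f"U+{ord(char):04X}"


def describe(char: str) -> str:
    """Return 'U+XXXX NAME'; characters without a name get a placeholder."""
    return f"{u_plus(char)} {unicodedata.name(char, '<no name>')}"


def latin1_matches_ucs() -> bool:
    """Each Latin-1 byte 0x00..0xFF decodes to the same UCS code number."""
    decoded = bytes(range(256)).decode("iso8859_1")
    return [ord(c) for c in decoded] == list(range(256))


def is_private_use(char: str) -> bool:
    """General category 'Co' marks private-use code positions."""
    return unicodedata.category(char) == "Co"


if __name__ == "__main__":
    print(describe("A"))
    print(latin1_matches_ucs())
    print(is_private_use("\ue000"), is_private_use("A"))
    # The same character, U+20AC, as a byte sequence in two encodings.
    euro = "\u20ac"
    print(euro.encode("utf-8").hex(), euro.encode("utf-16-be").hex())
```

The last line illustrates that the code number of a character and its byte representation are separate matters: U+20AC takes three bytes in UTF-8 and two bytes in UTF-16.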

The full reference for the UCS standard is

International Standard ISO/IEC 10646, Information technology -- Universal Multiple-Octet Coded Character Set (UCS). Third edition, International Organization for Standardization, Geneva, 2003.

The standard can be ordered online from ISO as a set of PDF files on CD-ROM for 112 CHF.

