Unicode and UTF-8 in C

By: Ramlak

Starting with GNU glibc 2.2, the type wchar_t is officially intended to hold only 32-bit ISO 10646 values, independent of the locale currently in use. glibc signals this to applications by defining the __STDC_ISO_10646__ macro, as specified by ISO C99. The ISO C multibyte conversion functions (mbsrtowcs(), wcsrtombs(), etc.) are fully implemented in glibc 2.2 and later and can be used to convert between wchar_t and any locale-dependent multibyte encoding, including UTF-8 and ISO 8859-1.
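As a minimal sketch of how an application might test for this guarantee, the function below reports the value of __STDC_ISO_10646__ when the implementation defines it. The function name iso10646_version is only an illustration and is not part of any library.

```c
#include <wchar.h>

/* Hypothetical helper: returns the value of __STDC_ISO_10646__
   (an integer of the form yyyymm) if the implementation defines it,
   or 0 if wchar_t is not guaranteed to hold ISO 10646 values. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A result of 0 means the program should not assume that a wchar_t value is a Unicode code point.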

For example, you can write

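  #include <stdio.h>
  #include <locale.h>

  int main(void)
  {
    /* "" selects the locale named by the environment (LC_ALL, LC_CTYPE, LANG). */
    if (!setlocale(LC_CTYPE, "")) {
      fprintf(stderr, "Can't set the specified locale! "
              "Check LANG, LC_CTYPE, LC_ALL.\n");
      return 1;
    }
    /* %ls converts the wide string to the locale's multibyte encoding. */
    printf("%ls\n", L"Schöne Grüße");
    return 0;
  }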

Run this program with the locale setting LANG=de_DE and the output will be in ISO 8859-1. Run it with LANG=de_DE.UTF-8 and the output will be in UTF-8. The %ls format specifier tells printf to convert the wide-character string argument into the locale-dependent multibyte encoding, as wcsrtombs would.
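The same conversion can also be done explicitly. The following is a sketch only, assuming the caller has already selected a locale with setlocale. The helper name wide_to_locale and its error convention are made up for this example; wcsrtombs itself is standard ISO C.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string into the multibyte encoding
   of the current LC_CTYPE locale. Returns the number of bytes written
   (not counting the terminating zero), or (size_t)-1 if a character
   cannot be represented in this locale or dst is too small. */
size_t wide_to_locale(const wchar_t *src, char *dst, size_t dstsize)
{
    mbstate_t state;
    const wchar_t *p = src;
    size_t n;

    if (dstsize == 0)
        return (size_t)-1;

    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    n = wcsrtombs(dst, &p, dstsize, &state);
    if (n == (size_t)-1)
        return n;                         /* unrepresentable character */
    if (p != NULL)
        return (size_t)-1;                /* dst filled up before the end */
    return n;
}
```

wcsrtombs sets the source pointer to NULL only when it has converted the terminating zero as well, which is why the helper treats a non-NULL pointer as a buffer that was too small.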

Many of C’s string functions are locale-independent and simply operate on zero-terminated byte sequences:

  strcpy strncpy strcat strncat strcmp strncmp strdup strchr strrchr
  strcspn strspn strpbrk strstr strtok

Some of these (e.g., strcpy) can be used equally for single-byte (ISO 8859-1) and multibyte (UTF-8) encoded character sets, because they need no notion of how many bytes long a character is. Others (e.g., strchr) depend on one character being encoded in a single char value and are of less use for UTF-8, although strchr still works fine if you just search for an ASCII character in a UTF-8 string.
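The difference between bytes and characters can be illustrated with a short sketch. Both function names below are invented for this example, and the counting routine assumes its input is valid UTF-8. Searching for an ASCII character with strchr is safe because every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte can never appear inside one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a UTF-8 string by
   skipping continuation bytes, which always look like 10xxxxxx.
   Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: byte offset of the first occurrence of an
   ASCII character in s, or -1 if it does not occur. */
long ascii_offset(const char *s, char ascii)
{
    const char *hit = strchr(s, ascii);
    return hit ? (long)(hit - s) : -1L;
}
```

For the string "Schöne Grüße" encoded in UTF-8, strlen reports 15 bytes while utf8_code_points reports 12 characters, and ascii_offset finds the space at byte offset 7.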

Other C functions are locale-dependent and work just as well in UTF-8 locales:

  strcoll strxfrm
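As a rough sketch of how strcoll might be used, the code below sorts an array of strings according to the collation rules of the current LC_COLLATE locale. The names compare_by_locale and sort_by_locale are invented for this example, and the caller is assumed to have selected a locale with setlocale beforehand. In the default "C" locale, strcoll orders strings the same way strcmp does.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two strings using the collation order of
   the current LC_COLLATE locale. */
static int compare_by_locale(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of string pointers in place. */
void sort_by_locale(const char **words, size_t count)
{
    qsort(words, count, sizeof *words, compare_by_locale);
}
```

When the same strings are compared many times, transforming each one once with strxfrm and comparing the results with strcmp may be cheaper than calling strcoll repeatedly.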
