Perl's Encoding::FixLatin equivalent in PHP

By: squeegee

I think this is a reasonable port of Perl's Encoding::FixLatin by Grant McLean, which converts a string with mixed encodings (ASCII, ISO-8859-1, CP1252, and UTF-8) to UTF-8.
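To make "mixed encodings" concrete, here is a small sketch of the kind of input this is aimed at. The sample strings and variable names are made up for illustration: the same word once in UTF-8 and once in ISO-8859-1, plus a CP1252 euro sign, glued into one string that no longer validates as UTF-8.

```php
<?php
// Illustrative sample data only (not from the original module).
$utf8_part   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1_part = "caf\xE9";     // "café" encoded as ISO-8859-1
$cp1252_part = "\x80";        // euro sign encoded as CP1252

$mixed = $utf8_part . ' ' . $latin1_part . ' ' . $cp1252_part;

// The pure UTF-8 piece validates; the combined string does not.
var_dump(mb_check_encoding($utf8_part, 'UTF-8'));
var_dump(mb_check_encoding($mixed, 'UTF-8'));
```

A string like `$mixed` is what you typically end up with when text from several sources is concatenated without converting it first.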

<?php

function init_byte_map(){
  global $byte_map;
  for($x=128;$x<256;++$x){
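    // Default: treat the byte as ISO-8859-1 (utf8_encode() is deprecated as of PHP 8.2)
    $byte_map[chr($x)]=mb_convert_encoding(chr($x),'UTF-8','ISO-8859-1');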
  }
  // CP1252 bytes that differ from ISO-8859-1, mapped to their UTF-8 sequences
  $cp1252_map=array(
    "\x80" => "\xE2\x82\xAC",  // EURO SIGN
    "\x82" => "\xE2\x80\x9A",  // SINGLE LOW-9 QUOTATION MARK
    "\x83" => "\xC6\x92",      // LATIN SMALL LETTER F WITH HOOK
    "\x84" => "\xE2\x80\x9E",  // DOUBLE LOW-9 QUOTATION MARK
    "\x85" => "\xE2\x80\xA6",  // HORIZONTAL ELLIPSIS
    "\x86" => "\xE2\x80\xA0",  // DAGGER
    "\x87" => "\xE2\x80\xA1",  // DOUBLE DAGGER
    "\x88" => "\xCB\x86",      // MODIFIER LETTER CIRCUMFLEX ACCENT
    "\x89" => "\xE2\x80\xB0",  // PER MILLE SIGN
    "\x8A" => "\xC5\xA0",      // LATIN CAPITAL LETTER S WITH CARON
    "\x8B" => "\xE2\x80\xB9",  // SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\x8C" => "\xC5\x92",      // LATIN CAPITAL LIGATURE OE
    "\x8E" => "\xC5\xBD",      // LATIN CAPITAL LETTER Z WITH CARON
    "\x91" => "\xE2\x80\x98",  // LEFT SINGLE QUOTATION MARK
    "\x92" => "\xE2\x80\x99",  // RIGHT SINGLE QUOTATION MARK
    "\x93" => "\xE2\x80\x9C",  // LEFT DOUBLE QUOTATION MARK
    "\x94" => "\xE2\x80\x9D",  // RIGHT DOUBLE QUOTATION MARK
    "\x95" => "\xE2\x80\xA2",  // BULLET
    "\x96" => "\xE2\x80\x93",  // EN DASH
    "\x97" => "\xE2\x80\x94",  // EM DASH
    "\x98" => "\xCB\x9C",      // SMALL TILDE
    "\x99" => "\xE2\x84\xA2",  // TRADE MARK SIGN
    "\x9A" => "\xC5\xA1",      // LATIN SMALL LETTER S WITH CARON
    "\x9B" => "\xE2\x80\xBA",  // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    "\x9C" => "\xC5\x93",      // LATIN SMALL LIGATURE OE
    "\x9E" => "\xC5\xBE",      // LATIN SMALL LETTER Z WITH CARON
    "\x9F" => "\xC5\xB8"       // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );
  foreach($cp1252_map as $k=>$v){
    $byte_map[$k]=$v;
  }
}

function fix_latin($instr){
  if(mb_check_encoding($instr,'UTF-8'))return $instr; // no need for the rest if it's all valid UTF-8 already
  global $nibble_good_chars,$byte_map;
  $outstr='';
  $char='';
  $rest='';
  while((strlen($instr))>0){
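    if(1==preg_match($nibble_good_chars,$instr,$match)){
      // leading run of ASCII or one well-formed UTF-8 sequence: copy as-is
      $char=$match[1];
      $rest=$match[2];
      $outstr.=$char;
    }elseif(1==preg_match('@^(.)(.*)$@s',$instr,$match)){
      // stray high byte: translate it through the byte map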
      $char=$match[1];
      $rest=$match[2];
      $outstr.=$byte_map[$char];
    }
    $instr=$rest;
  }
  return $outstr;
}

$byte_map=array();
init_byte_map();
$ascii_char='[\x00-\x7F]';
$cont_byte='[\x80-\xBF]';
$utf8_2='[\xC0-\xDF]'.$cont_byte;
$utf8_3='[\xE0-\xEF]'.$cont_byte.'{2}';
$utf8_4='[\xF0-\xF7]'.$cont_byte.'{3}';
$utf8_5='[\xF8-\xFB]'.$cont_byte.'{4}';
$nibble_good_chars = "@^($ascii_char+|$utf8_2|$utf8_3|$utf8_4|$utf8_5)(.*)$@s";

?>

Then just call fix_latin wherever you need it.
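For a quick feel of the behaviour without pasting the whole listing, here is a minimal self-contained sketch of the same idea. Note that `fix_latin_sketch` is a hypothetical name and a compact variant, not the function above: it makes a single pass with `preg_replace_callback` and, as an assumption, relies on mbstring's Windows-1252 table instead of the hand-written map, so bytes that CP1252 leaves undefined may come out differently.

```php
<?php
// Hypothetical compact variant, for illustration only.
function fix_latin_sketch(string $input): string
{
    // Group 1: a run of ASCII or one well-formed 2-, 3- or 4-byte UTF-8 sequence.
    // Group 2: any other single byte (no /u flag, so the pattern works on bytes).
    $pattern = '@([\x00-\x7F]+|[\xC0-\xDF][\x80-\xBF]'
             . '|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|(.)@s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            return $m[1]; // already ASCII/UTF-8: keep unchanged
        }
        // Stray high byte: assume it is CP1252 and convert it.
        return mb_convert_encoding($m[2], 'UTF-8', 'Windows-1252');
    }, $input);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $input;
}

// UTF-8 "café", ISO-8859-1 "café", then a CP1252 euro sign.
$mixed = "caf\xC3\xA9 caf\xE9 \x80";
$fixed = fix_latin_sketch($mixed);

var_dump(mb_check_encoding($fixed, 'UTF-8'));
echo $fixed, "\n";
```

Calling the full `fix_latin` looks the same: pass the suspect string in and use the return value. Like the original, this treats any byte pair that happens to form a valid UTF-8 sequence as UTF-8, so it is a best-effort repair rather than a guaranteed one.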

