Reading word by word from a file in PHP

By: David Sklar

You want to do something with every word in a file. Read in each line with fgets(), separate the line into words, and process each word:

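$fh = fopen('great-american-novel.txt','r') or die("can't open great-american-novel.txt");
// fgets() returns false at end of file; testing with !== false keeps a final line of "0"
while (($s = fgets($fh,1048576)) !== false) {
    // split on runs of whitespace and drop empty pieces
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    // process words
}
fclose($fh) or die("can't close great-american-novel.txt");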
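
If the "process words" step lives somewhere else in your program, the same loop can be wrapped in a generator so callers simply iterate over words. This is only a sketch, assuming PHP 7 or later; words_in_file() is a hypothetical helper name, and the temporary file exists just to make the example self-contained.

```php
<?php
// words_in_file() is a hypothetical helper name, not part of the recipe.
// It yields the words of a file one at a time, reading line by line.
function words_in_file(string $path): Generator {
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("can't open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            // split on runs of whitespace and drop empty pieces
            foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
                yield $word;
            }
        }
    } finally {
        // runs when the generator finishes or is destroyed early
        fclose($fh);
    }
}

// Usage: count how often each word appears in a small temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($tmp, "the quick brown fox\njumps over the lazy dog\n");
$counts = [];
foreach (words_in_file($tmp) as $word) {
    $counts[$word] = ($counts[$word] ?? 0) + 1;
}
unlink($tmp);
print_r($counts);
```

Because the generator reads one line at a time, memory use stays small even for a large file.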

Here's how to calculate average word length in a file:

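$word_count = $word_length = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (($s = fgets($fh,1048576)) !== false) {
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
      $word_count++;
      // strlen() counts bytes, so a multibyte character counts more than once
      $word_length += strlen($word);
    }
  }
  fclose($fh);
}

// avoid dividing by zero when the file is empty or could not be opened
if ($word_count > 0) {
  printf("The average word length over %d words is %.02f characters.",
         $word_count,
         $word_length/$word_count);
} else {
  print "No words found.";
}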

How you process every word depends on how "word" is defined. The code in this recipe uses the Perl-compatible regular-expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Splitting on a single space instead is useful when the words have to be rejoined with spaces afterward. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (a letter, digit, or underscore) and a nonword character (anything else). The most noticeable difference between using \b and \s to delimit words is in how words with embedded punctuation are treated. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock), not counting the space between 6 and o, which comes back as a piece of its own.
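
To see the difference between the two delimiters, you can split the same string both ways. The following is a minimal sketch, assuming PHP 7 or later; split_on_whitespace() and split_on_boundaries() are hypothetical helper names used only for illustration. The boundary version filters out pieces that contain nothing but whitespace, since splitting on \b returns the space between words as a piece of its own.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the recipe.

// Split on runs of whitespace, as the recipe's code does.
function split_on_whitespace(string $s): array {
    return preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Split at word boundaries, then drop pieces that are only whitespace.
function split_on_boundaries(string $s): array {
    $pieces = preg_split('/\b/', $s, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter($pieces, function ($p) {
        return trim($p) !== '';
    }));
}

print_r(split_on_whitespace("6 o'clock"));  // two pieces: 6 and o'clock
print_r(split_on_boundaries("6 o'clock"));  // four pieces: 6, o, ', and clock
```

Which definition is right depends on the job: whitespace splitting keeps contractions together, while boundary splitting separates punctuation from the letters around it.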
