Reading word by word from a file in PHP

By: David Sklar


You want to do something with every word in a file. Read in each line with fgets(), separate the line into words, and process each word:

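$fh = fopen('great-american-novel.txt','r') or die("Can't open file");
// fgets() returns false at end of file, so test its result directly
while (($s = fgets($fh)) !== false) {
    // split on runs of whitespace, discarding empty strings
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    // process words
}
fclose($fh) or die("Can't close file");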

Here's how to calculate average word length in a file:

$word_count = $word_length = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (($s = fgets($fh)) !== false) {
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
      $word_count++;
      $word_length += strlen($word);   // strlen() counts bytes, not multibyte characters
    }
  }
  fclose($fh);
}

// guard against division by zero when the file is empty or could not be opened
if ($word_count > 0) {
  printf("The average word length over %d words is %.02f characters.",
         $word_count,
         $word_length/$word_count);
}

Processing every word proceeds differently depending on how "word" is defined. The code in this recipe uses the Perl-compatible regular-expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Splitting on a single space instead is useful when the words have to be rejoined with spaces afterward.

The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (a letter, digit, or underscore) and a nonword character (anything else). The most noticeable effect of using \b instead of \s to delimit words is that words with embedded punctuation are treated differently. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock), not counting the space itself.
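
As a rough illustration of the difference described above, the following sketch splits the same string both ways. The variable names and the array_filter() step, which discards the whitespace-only piece that a \b split leaves behind, are assumptions added for this example and are not part of the original recipe.

```php
<?php
// Illustrative input taken from the paragraph above.
$s = "6 o'clock";

// Split on runs of whitespace: punctuation stays attached to its word.
$by_space = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Split at word boundaries. The space between "6" and "o" comes back
// as its own piece, so filter out pieces that are only whitespace.
$pieces = preg_split('/\b/', $s, -1, PREG_SPLIT_NO_EMPTY);
$by_boundary = array_values(array_filter($pieces, function ($p) {
    return trim($p) !== '';
}));

print implode(' | ', $by_space) . "\n";     // 6 | o'clock
print implode(' | ', $by_boundary) . "\n";  // 6 | o | ' | clock
```

Which definition fits depends on the task: whitespace splitting keeps contractions and hyphenated terms intact, while boundary splitting isolates punctuation so it can be dropped or counted separately.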

