Counting Lines, Paragraphs, or Records in a File using pc_split_paragraphs() in PHP

By: David Sklar

You want to count the number of lines, paragraphs, or records in a file. To count lines, use fgets(). Because it reads one line at a time, you can count the number of times it returns a line before reaching the end of the file:

$lines = 0;

if ($fh = fopen('orders.txt','r')) {
  while (! feof($fh)) {
    if (fgets($fh,1048576)) {
      $lines++;
    }
  }
}
print $lines;

To count paragraphs, increment the counter only when you read a blank line:

$paragraphs = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    $s = fgets($fh,1048576);
    if (("\n" == $s) || ("\r\n" == $s)) {
      $paragraphs++;
    }
  }
}
print $paragraphs;

To count records, increment the counter only when the line read contains just the record separator and whitespace:

$records = 0;
$record_separator = '--end--';

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    $s = rtrim(fgets($fh,1048576));
    if ($s == $record_separator) {
      $records++;
    }
  }
}
print $records;

In the line counter, $lines is incremented only if fgets() returns a true value. As fgets() moves through the file, it returns each line it retrieves. Once it has read the last line and there is nothing left to read, it returns false, so $lines doesn't get incorrectly incremented. Because EOF has been reached on the file, feof() returns true, and the while loop ends.
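One edge case of the truthiness check is worth noting: a final line that contains only 0 and has no trailing newline is read as the string "0", which PHP treats as false, so it is not counted. A minimal sketch of a stricter variant follows, assuming a hypothetical helper named count_lines() (not part of the original recipe) that compares the return value against false explicitly:

```php
<?php
// Hypothetical helper, not part of the original recipe.
// Returns the number of lines in $path, or false if the file cannot be opened.
function count_lines($path) {
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }
    $lines = 0;
    // fgets() returns false only when there is nothing more to read (or on
    // error), so a strict comparison still counts a line such as "0".
    while (($line = fgets($fh)) !== false) {
        $lines++;
    }
    fclose($fh);
    return $lines;
}

// Usage with a throwaway file whose last line is "0" with no newline
$tmp = tempnam(sys_get_temp_dir(), 'cnt');
file_put_contents($tmp, "first\nsecond\n0");
print count_lines($tmp);   // prints 3
unlink($tmp);
```

Putting the fgets() call in the loop condition also removes the need for a separate feof() test.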

The paragraph counter works fine on simple text but may produce unexpected results when presented with a long run of blank lines (each blank line is counted as a paragraph) or a file without two consecutive linebreaks (the count stays at zero). These problems can be remedied with functions based on preg_split(). If the file is small and can be read into memory, use the pc_split_paragraphs() function shown in the example below. This function returns an array containing each paragraph in the file.

pc_split_paragraphs( )
function pc_split_paragraphs($file,$rs="\r?\n") {
    $text = join('',file($file));
    $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                          PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
    return $matches;
}
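Because the function returns an array, the paragraph count is simply count() of the result. The sketch below repeats the function so that it runs on its own; the temporary file and its sample text are assumptions made for illustration only:

```php
<?php
// Repeated from the example above so this block is self-contained.
function pc_split_paragraphs($file,$rs="\r?\n") {
    $text = join('',file($file));
    $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                          PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
    return $matches;
}

// Hypothetical sample: three paragraphs, with an extra blank line
// between the second and the third.
$tmp = tempnam(sys_get_temp_dir(), 'par');
file_put_contents($tmp,
    "First paragraph.\nStill first.\n\nSecond paragraph.\n\n\nThird paragraph.\n");

$paragraphs = pc_split_paragraphs($tmp);
print count($paragraphs);   // prints 3
unlink($tmp);
```

Each returned paragraph keeps its final linebreak, so apply rtrim() to the elements if you need the bare text.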

The contents of the file are broken on two or more consecutive newlines and returned in the $matches array. The default record-separation regular expression, \r?\n, matches both Windows and Unix linebreaks. If the file is too big to read into memory at once, use the pc_split_paragraphs_largefile() function shown in the example below, which reads the file in 4K chunks.

pc_split_paragraphs_largefile( )
function pc_split_paragraphs_largefile($file,$rs="\r?\n") {
    global $php_errormsg;

    $unmatched_text = '';
    $paragraphs = array();

    $fh = fopen($file,'r') or die($php_errormsg);

    while(! feof($fh)) {
        $s = fread($fh,4096) or die($php_errormsg);
        $text_to_split = $unmatched_text . $s;

        $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text_to_split,-1,
                              PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

        // if the last chunk doesn't end with two record separators, save it
        // to prepend to the next section that gets read
        $last_match = $matches[count($matches)-1];
        if (! preg_match("/$rs$rs\$/",$last_match)) {
            $unmatched_text = $last_match;
            array_pop($matches);
        } else {
            $unmatched_text = '';
        }
        
        $paragraphs = array_merge($paragraphs,$matches);
    }
    
    // after reading all sections, if there is a final chunk that doesn't
    // end with the record separator, count it as a paragraph
    if ($unmatched_text) {
        $paragraphs[] = $unmatched_text;
    }
    return $paragraphs;
}

This function uses the same regular expression as pc_split_paragraphs() to split the file into paragraphs. When it finds a paragraph end in a chunk read from the file, it saves the rest of the text in the chunk in $unmatched_text and prepends it to the next chunk read. That way, the unmatched text becomes the beginning of the next paragraph in the file.
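When only the number of paragraphs is needed, the paragraphs themselves do not have to be collected at all. The following is a minimal sketch of a different technique from the preg_split() approach, assuming a hypothetical helper named count_paragraphs() that treats a paragraph as a run of non-blank lines. Note that, unlike the simple counter earlier, it also treats whitespace-only lines as blank. It reads one line at a time, so memory use stays small, and neither runs of blank lines nor a missing blank line at the end of the file change the result:

```php
<?php
// Hypothetical helper, not part of the original recipe.
// Counts runs of non-blank lines; returns false if the file cannot be opened.
function count_paragraphs($path) {
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }
    $paragraphs = 0;
    $in_paragraph = false;
    while (($line = fgets($fh)) !== false) {
        if (trim($line) === '') {
            // a blank (or whitespace-only) line ends the current paragraph
            $in_paragraph = false;
        } elseif (! $in_paragraph) {
            // first non-blank line after a gap starts a new paragraph
            $in_paragraph = true;
            $paragraphs++;
        }
    }
    fclose($fh);
    return $paragraphs;
}

// Usage with a throwaway file: mixed linebreak styles, several blank
// lines in a row, and no linebreak after the last paragraph
$tmp = tempnam(sys_get_temp_dir(), 'par');
file_put_contents($tmp,
    "One.\nStill one.\n\n\n\nTwo.\r\n\r\nThree, no trailing newline");
print count_paragraphs($tmp);   // prints 3
unlink($tmp);
```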

