File I/O

Filehandles

A filehandle is the name a Perl program uses for an I/O connection between the Perl process and the outside world. Filehandles are named like other Perl identifiers, but they have no prefix character. To avoid confusing them with reserved words, it is recommended to use uppercase letters in filehandle names.

To open a file, use the open function. It takes two arguments:

  • a filehandle to create
  • and the name of the file:
    open(MYDATA, "data.txt");
    open NEWDATA, "<moredata.txt";
    open RESULT, ">output.dat";
    open(LOG, ">>myprog.log");
    
    The sign in front of the file name specifies how the file is opened:
  • < - open for reading
  • > - create for writing (an existing file is emptied first)
  • >> - open for appending (the file is created if it does not exist)
    If there is no sign (as in the first example), the file is opened for reading.
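
    As a rough sketch of how the three modes differ, the script below writes a file, appends to it, and reads it back. The file name modes_demo.txt is a hypothetical scratch file chosen for illustration, print with a filehandle as its first argument writes to that file, and $! is Perl's built-in variable holding the system error message.

```perl
use strict;
use warnings;

# "modes_demo.txt" is a hypothetical scratch file used only for this sketch.
my $file = "modes_demo.txt";

# ">" creates the file, or empties it if it already exists.
open(OUT, ">$file") or die("Cannot create '$file': $!\n");
print OUT "first line\n";
close OUT;

# ">>" keeps the existing content and adds to the end.
open(OUT, ">>$file") or die("Cannot append to '$file': $!\n");
print OUT "second line\n";
close OUT;

# "<" (or no sign at all) opens the file for reading.
open(IN, "<$file") or die("Cannot read '$file': $!\n");
my @lines = <IN>;
close IN;

print "The file now has ", scalar(@lines), " lines\n";   # prints: The file now has 2 lines
unlink $file;    # remove the scratch file
```

    If the second open used > instead of >>, the first line would be lost and the file would end up with a single line.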

    We can reuse a filehandle by opening another file with a filehandle that is already open. If we do this, Perl automatically closes the file currently attached to that filehandle. To close a filehandle explicitly, use the close function. It takes one argument, the filehandle to be closed:

    close MYDATA;
    close(RESULT);
    

    If open cannot open the requested file, it returns a false value; otherwise it returns a true value. So we can write code like this:

    my $opened = open(MYDATA, "data.txt");
    if( ! $opened ){
       print "Error!";
       exit 1;
    }
    
    or like this (using unless, the negated form of if):
    unless( open LOGFILE, ">>myprog.log" ){
      die "Error";
    }
    
    or even shorter:
    open(RESULT, ">output.dat") or die("Error");
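
    The messages above say only that something failed. A minimal sketch of a more informative check is shown below, assuming the hypothetical path no_such_dir/no_such_file.txt does not exist. It uses Perl's built-in $! variable, which holds the reason reported by the operating system for the last failed system call.

```perl
use strict;
use warnings;

# Hypothetical file name that is assumed not to exist.
my $missing = "no_such_dir/no_such_file.txt";

my $message = "";
if( open(MISSING, "<$missing") ){
  # open returned a true value: the file exists after all.
  close MISSING;
}
else {
  # open returned a false value; $! explains why.
  $message = "Cannot open '$missing': $!";
  print STDERR "$message\n";
}
```

    The exact text placed in $! depends on the operating system, so the wording of the message may differ between machines.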
    

    Using filehandles

    Reading from files

    To read one line from an open file, use the <> operator with the filehandle placed inside it:
    $str = <MYDATA>;
    
    Note that the variable $str contains the complete line from the file, including the newline character at the end. To remove the newline, use the chomp function. The following example opens the file data.txt, reads all its lines, and prints each line preceded by its line number:
    use strict;
    open(MYDATA, "data.txt") or 
      die("Error: cannot open file 'data.txt'\n");
    my $line;
    my $lnum = 1;
    while( $line = <MYDATA> ){
      chomp($line);
      print "$lnum: $line\n";
      $lnum++;
    }
    close MYDATA;
    

    We can also read the complete file into an array. The line below shows how to do that:

    @lines = <MYDATA>;
    
    After this statement each element of the array @lines contains exactly one line of the file. Remember that each line still has the newline character at the end. To remove these characters, use the chomp function. Conveniently, chomp also accepts a list, so we can apply it to a whole array. In that case chomp removes the trailing newline from each element of the array. The following example illustrates this:
    open(MYDATA, $ARGV[0]) or die("Error: cannot open file '$ARGV[0]'\n");
    my @lines = <MYDATA>;
    chomp @lines;
    print "@lines";
    close MYDATA;
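
    The same behaviour can be tried without any file. In the sketch below the array contents are made-up sample data standing in for lines read from a file. The return value of chomp is the total number of characters it removed.

```perl
use strict;
use warnings;

# Sample data standing in for lines read from a file;
# the last element has no trailing newline.
my @lines = ("alpha\n", "beta\n", "gamma");

# chomp strips the trailing newline from every element
# and returns the total number of characters removed.
my $removed = chomp(@lines);

print "Removed $removed newlines\n";   # prints: Removed 2 newlines
print join(",", @lines), "\n";         # prints: alpha,beta,gamma
```

    An element that has no trailing newline, such as the last one here, is left unchanged.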
    

    Printing to a file

    By default, the print function we are already familiar with and the printf function write to the standard output. We can change this by giving a filehandle as the first argument; note that no comma follows the filehandle. This example shows how to read the file named by the first command-line argument, sort its lines, and print the result to the file named by the second argument:
    if( $#ARGV < 1 ){
      die("Not enough arguments\n");
    }
    open(INP, "<$ARGV[0]")  or die("Cannot open file '$ARGV[0]' for reading\n");
    open(OUTP, ">$ARGV[1]") or die("Cannot open file '$ARGV[1]' for writing\n");
    my @content = <INP>;
    print OUTP sort(@content); 
    close INP;
    close OUTP;
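
    printf accepts a filehandle in the same position as print. The following sketch writes a numbered list to a file; the file name report.txt and the list of items are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

# "report.txt" is a hypothetical output file used only for this sketch.
my $file  = "report.txt";
my @items = ("apple", "pear", "fig");

open(REPORT, ">$file") or die("Cannot open file '$file' for writing\n");
my $n = 1;
foreach my $item (@items){
  # No comma between the filehandle and the format string.
  printf REPORT "%2d. %s\n", $n, $item;
  $n++;
}
close REPORT;
```

    Putting a comma after REPORT would make Perl treat REPORT as part of the list to print rather than as the filehandle to print to.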
    

    Standard filehandles

    Perl has three predefined filehandles:
  • STDOUT - for standard output
  • STDIN - for standard input
  • STDERR - for error messages
    Thus, the print and printf functions write to the STDOUT filehandle by default. Knowing the filehandle for standard input, we can read everything the user types by using STDIN like an ordinary file:
    print "Please enter the file name: ";
    chomp( $fname = <STDIN> );
    printf(STDERR "You entered '%s'\n", $fname); 
    
    The last line prints the confirmation to the standard error stream. It is a separate stream from standard output, but by default both appear on the terminal.
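
    A small sketch that makes the two output streams visible is given below. The script name used in the note after it is hypothetical.

```perl
use strict;
use warnings;

# Explicit STDOUT is the same as a plain print.
print STDOUT "This is normal output\n";

# Error messages go to a separate stream.
print STDERR "This is an error message\n";
```

    If this is saved as streams.pl and run as perl streams.pl > out.txt in a typical shell, only the first line ends up in out.txt, while the error message still appears on the terminal.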

    Default values

    The diamond operator <> has an interesting feature. If no filehandle is given, it treats all command-line arguments (the elements of the @ARGV array) as file names, opens them one by one, and reads lines from them. Thus, the following example reads all lines from the files given as arguments to the script and prints them to the screen:
    while( my $str = <> ){
      print $str;
    }
    
    If there are no arguments on the command line, the diamond operator <> reads from the standard input; in this situation <> is the same as <STDIN>.

    Perl has a special default variable, $_. Many built-in functions and operators use it when no variable is given explicitly. For example, the following code does exactly the same as the previous example:

    while( <> ){ # reading a line into $_ variable
      print;     # printing $_ variable
    }
    
    Now let us print only those lines that contain the word print:
    while( <> ){ # reading a line into $_ variable
      if( /print/ ){
        print;     # printing $_ variable
      }
    }
    
    The second line of the code, if( /print/ ), is actually the same as if( $_ =~ /print/ ). That is, Perl matches the regular expression against the default variable $_.
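
    The default variable is not limited to reading input. As a rough sketch, the loop below uses $_ implicitly in foreach, in a pattern match, and in chomp. The sample strings and the @matched array are made up for illustration.

```perl
use strict;
use warnings;

# Sample data standing in for lines read from a file.
my @lines = ("print this\n", "skip this\n", "and print that\n");
my @matched;

foreach (@lines){          # $_ refers to each element in turn
  next unless /print/;     # same as: $_ =~ /print/
  chomp;                   # same as: chomp($_)
  push @matched, $_;
}

# Prints "print this" and "and print that", one per line.
print "$_\n" foreach @matched;
```

    Inside foreach, $_ is an alias for the current element, so the chomp call here also modifies the matching strings stored in @lines.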