gather.cgi Annotated Source

This CGI both collects information into the gather file and displays the results of the nightly batch script.

Main Definitions

use strict;
use CGI;
use File::stat;
use lib '/www/lib', '/usr/local/www/lib';
require '';
require './';
require '';

The page we display is based on a template, and we wish to process that template with chtml. We also use our own library to translate a browser User-Agent string into the name of the browser and operating system of the client system. The last library we load is used to make sure that the files to which we write are not also opened by another process.

sub grindtmpl ($$$$);
sub grindtab ($$$);
sub abort ($$);

These are subroutine templates so Perl knows what subroutines we're using, as well as how many arguments each takes.

Variable definitions

my $cgi = new CGI;
my $imgwid = 400;        # Maximum width the bars will be

We are using Perl's library, and we define a object. We also use a variable to determine how wide we would like the bars to be for the graph.

my $tmpl = $ENV{'SCRIPT_FILENAME'};
$tmpl =~ s{[^/]*$}{gather.tmpl};
my $data = $tmpl;
$data =~ s{[^/]*$}{};
my $file = chtml_procfile ($tmpl);

We require that the output template and data file be in the same directory as this cgi, so we create paths to those files based on the path to this script. We then process our template with chtml and save the resulting filename.

Reading Template

abort ($cgi, "Couldn't open template") unless open F, "<$file";
my $text = undef;
while (<F>) {
    $text .= $_;
close F;

We then read the template and save it for later use. Note we're assuming that the template is short enough to fit into memory.

Processing Data File

abort ($cgi, "Couldn't get data file") unless open F, "<$data";
my $ab = <F>;        # all browser
my $ap = <F>;        # all platform
my $wb = <F>;        # week browser
my $wp = <F>;        # week platform
close F;

Next we read the data file. We only need the first four lines, which contains the browser and platform counts over the history of the data file and for the past week. The format of the data file is described in detail in the batch process annotated source.

# First we fill in the tables with everything except the usage bars.
# We want the length of the bars to be proportional to usage, and the
# maximum bar length should be the same as the maximum single usage
# entry.  Since we won't know the maximum until we actually generate
# the code for the tables, we put in placeholders for this first pass.
my $max = 0;
$text =~ s{<!--GATH all-->(.*)<!--GATH /all-->}
            {grindtmpl ($1, $ab, $ap, \$max)}es;
$text =~ s{<!--GATH week-->(.*)<!--GATH /week-->}
            {grindtmpl ($1, $wb, $wp, \$max)}es;

The template is designed with two major blocks of code, one to display the statistics for all accesses, and the other to display statistics for the past week's accesses.

Since we want to make sure that the maximum bar length is not greater than the variable we set above, we pass a scalar reference to keep track of the maximum value.

# Now that we know what the maximum value is, we can compute the bar lengths
$text =~ s{<!--GATH usage (\d+)-->}
            {'<img src="1pix.gif" width="' . int ($imgwid * $1 / $max) .
                '" height="10">'}ge;

Now that we know the maximum value, we can insert the code for all the graph bars, appropriately scaled.

Saving Data to gather File

# Save the user agent string.  Note we only do so if we can lock the file!
my $ua = $ENV{'HTTP_USER_AGENT'};
my $gfile = $ENV{'SERVER_GATHER'} . '/webinfo_case_gather';
my $warnstr = '';

We save away the current user agent so we can include it in the statistics, as well as showing the user what we believe the client browser and operating system to be.

unless (lockfile ($gfile)) {
    $warnstr = " (but couldn't save results - stray lock)";
} else {

If we couldn't obtain a lock on the gather file, we warn about that and continue.

    # The process which will read the data from the gather directories
    # won't be able to clear the input file, so we use a timestamp created
    # by the data-reading process.  We clear the gather file if it was
    # read more recently than last written.
    my $gfilewhen = "$gfile.READ";

If we're able to obtain a lock on the gather file, we proceed. First, we define the name of the status file that the batch process uses to signal when it last read the gather file.

Clearing out gather file

    if (-f $gfile && -f $gfilewhen) {        # no gather file?  already empty!
        my $filest = stat $gfile;
        my $whenst = stat $gfilewhen;
        unlink $gfile if $filest->mtime < $whenst->mtime;

We test if both the gather file and the batch status file exist. If the gather file doesn't exist, we don't need to test the batch status file, since there's no data anyway. If there's no batch status file, the batch script has never run, so again we don't need to empty the gather file. However, if both exist and the gather file is older than the batch status file, then we know the gather file has been read and it can be removed.

Writing the data

    if (open F, ">>$gfile") {
        print F $ua, "\n";
        close F;
    } else {
        $warnstr = " (but couldn't save results - permission denied)";
    unlockfile ($gfile);

Next we append our user agent string to the gather file. However, if we can't open the file, we don't write the information and warn the user.

Displaying Browser Information

# Now figure out what this user is using
my ($browser, $platform) = whatbrowser ($ua);
$text =~ s/<!--GATH browser-->/$browser/;
$text =~ s/<!--GATH platform-->/$platform $warnstr/;

We then substitute the browser and operating system strings into the template.

print $cgi->header ('-content-length' => length $text), $text;

After we're done, we send the information back to the client. Since we're able to compute the length of the text, we pass that along too (this helps the client since it knows how much data will arrive, and it can give that feedback to the user; knowing the size also enables persistent connections to properly work.)

Processing the Template

sub grindtmpl ($$$$) {
    my ($tmpl, $br, $pl, $max) = @_;
    $tmpl =~ s{<!--GATH btmpl-->(.*)<!--GATH /btmpl-->}
                {grindtab ($1, $br, $max)}es;
    $tmpl =~ s{<!--GATH ptmpl-->(.*)<!--GATH /ptmpl-->}
                {grindtab ($1, $pl, $max)}es;

This function is called for each main section (one for all data, one for the past week.) Each main section is split into a plot for browser usage, and another for operating system. After substituting the appropriate data, the full template is returned. Note that we pass the maximum value variable directly to the grindtab() subroutine.

Processing individual tables

sub grindtab ($$$) {
    my ($tmpl, $datain, $max) = @_;
    chomp $datain;
    my %data = split /\|/, $datain;

This subroutine generates the table for one particular graph. We know that each data line consists of key and value pairs each separated with a vertical bar. If we create an array using the vertical bar as a separator, we can create a hash array directly from that array.

Computing sum and maximum

    my $sum = 0;
    foreach my $item (values %data) {
        $sum += $item;
        $$max = $item if $item > $$max;

We add the totals for this complete chart, as well as keep track of the maximum value.

Processing each entry

    my $out = undef;
    foreach my $item (sort {$data{$b} <=> $data{$a}} keys %data) {

In order to show the entries sorted by usage, we create an array of the names and use the sort function to arrange them. We then take this list and iterate over it with the foreach command.

        my $t = $tmpl;
        $t =~ s/<!--GATH tag-->/$item/;

Since we're going to take the template and change it to insert the data for the current entry, we need to make a local copy. The first thing we do is insert the item's name for this entry.

        my $val = $data{$item};
        my $cnt = $val * 100.0 / $sum;
        $cnt .= '.0' unless $cnt =~ s/(\.\d)(.*)/$1/;

We then get the value for the current item and figure out the percentage of this item out of the total we computed earlier. If the value is an integer, make sure we display that number with one decimal place. If the number isn't an integer, make sure to truncate to one decimal place.

        $t =~ s/<!--GATH count-->/$val ($cnt\%)/;
        $t =~ s/<!--GATH usage-->/<!--GATH usage $val-->/;
        $out .= $t;

After the percentage is computed, we insert it and the total value into our template. We also save away the total value into the template so we can use it to compute the graph's bar length at a later time.

Utility routines

sub abort ($$) {
    my ($cgi, $str) = @_;
    my $data = $cgi->start_html (-title => 'Error') .
                $cgi->h1 ('Error') . $str . $cgi->end_html . "\n";
    print $cgi->header ('-content-length' => length $data), $data;
    exit 1;

If we need to display an error, we want to make sure that we display the error with correct CGI syntax so the server doesn't generate an error. We also make sure to send the length of the content so persistent connections will still work.