Using the gather Directories

The gather directories are a place where web applications can save information to be collected later by a batch process. Note that this does not allow real-time updates; changes from all the web servers can only be seen after the batch process runs.

The host on which the batch process runs has access to all of the gather directories from the different web servers, but each web server can only view its own gather directory. You need to send mail to www-mgmt to get the batch process executed on a regular schedule. While developing the batch process, you can write a short CGI which calls the program on the host wwwdev.cac.washington.edu or wwwudev.cac.washington.edu, which also have access to all gather directories.

An example of how the gather directories can be used is in the gather Directory Case Study.

Uses for the gather Directories

The gather directories are used when an application receives data from users and either needs to publish the data or process the data and publish the results.

Writing to the gather Directories

The web server passes the SERVER_GATHER environment variable to all CGIs. Use this directory to write information you wish to later retrieve from the batch process. Note that there are other applications which also use this directory, so be sure to uniquely name your file. If you have several files, you should create a subdirectory in which you can save them.

You should not assume that the files or directories you use have already been created, so your CGI needs to create them if they do not already exist.

An example of a path you can use (using Perl syntax) is:

"$ENV{'SERVER_GATHER'}/myapp"
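As a minimal sketch of the advice above, the following helper builds the path from SERVER_GATHER and creates the subdirectory when it is missing. The function name gather_subdir and the application name "myapp" are placeholders for illustration, not part of the web server's interface.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Return the application's gather subdirectory, creating it if needed.
# "myapp" (passed in by the caller) is a placeholder; choose a name
# that is unique to your application.
sub gather_subdir {
    my ($app) = @_;
    my $base = $ENV{'SERVER_GATHER'}
        or die "SERVER_GATHER is not set\n";
    my $dir = "$base/$app";
    # make_path does nothing when the directory already exists, so it
    # is safe to call on every request.
    make_path($dir);
    -d $dir or die "Cannot create $dir\n";
    return $dir;
}
```

A CGI would call gather_subdir('myapp') once at startup and then write its files under the returned directory.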

Format of Data in gather Directories

Choose a format for the gather file that makes saving data fast. If you are gathering short data (perhaps one line per entry), then one possibility is to append each line to the gather file. If you are collecting a large amount of data, or data which is very unstructured, then you can either save the data in a single file using unique separator lines, or you can write each entry into an individual file; you should save these files in a gather subdirectory.
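The two formats described above might be sketched as follows. The names format_entry and write_entry_file, the tab-separated layout, and the file-naming scheme are all assumptions made for illustration; any scheme that keeps entries distinct would do.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Short data: build a single tab-separated line per entry, prefixed
# with a timestamp. Tabs and newlines inside fields are replaced with
# spaces so that one entry always occupies exactly one line.
sub format_entry {
    my @fields = @_;
    s/[\t\r\n]+/ /g for @fields;
    return join("\t", time(), @fields) . "\n";
}

# Large or unstructured data: write each entry to its own file.
# The name combines the time, the process ID, and a counter; O_EXCL
# makes the open fail rather than overwrite an existing file.
# Mode 0644 is requested so that a batch process running as a
# different user can read the file (subject to the umask).
my $counter = 0;
sub write_entry_file {
    my ($dir, $data) = @_;
    my $path = sprintf("%s/%d.%d.%d", $dir, time(), $$, $counter++);
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)
        or die "Cannot create $path: $!\n";
    print {$fh} $data or die "Cannot write $path: $!\n";
    close($fh)        or die "Cannot close $path: $!\n";
    return $path;
}
```

The line format suits data that the batch process reads in one pass; the one-file-per-entry format avoids having to invent separator lines that cannot appear in user input.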

Locking Issues

You need to be sure to use file locking for any files you write in the gather directories, since it's possible to have multiple CGIs and even the batch process running at the same time. If the CGI is a Perl script, then you can use the lockfile library. If you created a subdirectory and need to access the whole directory, you can lock the whole directory with the lockfile library.
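The interface of the lockfile library is not described here, so the sketch below uses Perl's built-in flock instead to show the general pattern of appending under an exclusive lock. The name append_locked is hypothetical. Whether flock is honored between different hosts depends on the filesystem, so treat this as an illustration of the idea rather than a replacement for the lockfile library.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Append one line to a gather file while holding an exclusive lock.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!\n";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!\n";
    # Another writer may have appended while we waited for the lock,
    # so move to the current end of the file before writing.
    seek($fh, 0, SEEK_END)    or die "Cannot seek $path: $!\n";
    print {$fh} $line         or die "Cannot write $path: $!\n";
    # close flushes the buffer and releases the lock.
    close($fh)                or die "Cannot close $path: $!\n";
    return 1;
}
```

A reader that wants a consistent view of the same file would take a shared lock (LOCK_SH) on it before reading.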

Batch Processes

The batch process collects data from all the gather directories and writes data to the web directories, which can then get pushed out the next evening. Another possibility is that the batch process summarizes data and then sends a report via email.

Access Restrictions

The design of the www.washington.edu cluster ensures the integrity of its web filesystem by running the server as a user different from the user which owns the files. If you have a batch process which needs to modify files in the web directories, then that process must execute as the user which owns those files. However, since that user is different from the user which owns the files in the gather directories, your batch process will not be able to modify or remove the files in the gather directories.

Writing into the Web Directories

Because the data changes automatically, it's best to have the batch process modify files directly in the production web directories (such as /www/world). Be sure that the directory in /usr/local/wwwdev or /usr/local/wwwudev uses a .wwwinstrc file to make sure the modified data files do not get removed by a wwwdinst or wwwuinst command. Information about these files can be found in the wwwdinst and wwwuinst documentation.

You should have the batch process write into the production directories because there is then no need to run wwwdinst or wwwuinst to install those files into production. The files will automatically be pushed out with the nightly push, assuming the batch process completes its work before the nightly push begins.

Reading from gather Directories

The paths to use in your batch process to read information from the gather directories depend on which server you are using. For www.washington.edu, you should use a path such as:

/gather/*/www.cac/myapp

A sample Perl code fragment to process all the files:

foreach my $file (glob "/gather/*/www.cac/myapp") {
    # do work on $file
}
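A slightly fuller sketch of the same loop is shown here, assuming each gather file holds one entry per line. The function name read_gather_files is hypothetical, and the glob pattern is passed in so that the same code can be tried against a test directory. Files are opened read-only, since the batch process cannot modify files in the gather directories.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Read every line from every gather file matching $pattern.
# Returns a hash reference mapping each file path to an array
# reference of that file's lines (without trailing newlines).
sub read_gather_files {
    my ($pattern) = @_;
    my %entries;
    foreach my $file (glob $pattern) {
        next unless -f $file;
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "Skipping $file: $!\n";
            next;
        }
        # A shared lock lets other readers in but waits for any CGI
        # that holds an exclusive lock while writing.
        flock($fh, LOCK_SH) or warn "Cannot lock $file: $!\n";
        chomp(my @lines = <$fh>);
        close($fh);
        $entries{$file} = \@lines;
    }
    return \%entries;
}
```

For www.washington.edu the call would be read_gather_files("/gather/*/www.cac/myapp"), after which the batch process can summarize the collected lines or write them into the web directories.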

More detailed code examples are in the gather case study.