

Specialized Cluster Serves the UW Home Page


By Frank Fujimoto, C&C Software Engineer

C&C operates a specialized cluster dedicated to publishing official UW information (including text, images, sounds, and software) through a variety of protocols, or methods of distribution. Among other things, this cluster supports the University of Washington's home page on the World Wide Web.

THE INFO CLUSTER

The concept behind the "Info" cluster is to have a single master copy of data, regardless of how many protocols and servers are used to distribute the information. One machine houses the master copies of files (Web pages, for example), and changes to these master copies are propagated to as many servers as are needed to meet performance, availability, and protocol requirements.

This strategy for managing online data is analogous to the "reference system" approach we use to keep track of software and configuration files on the hundreds of computers C&C manages. That software reference system stores master copies of system programs and configuration files, which are used to update each computer and to verify that the correct versions are in place.

There are, however, some differences between managing system files and managing data files to be exported to our constituency. Because many people are involved in preparing data (content) for the Info servers, certain constraints must be in place. Potential interference is eliminated by granting each information provider access only to the information for which that provider is responsible, not to other people's data.

The protocols embraced by the Info server concept include HTTP (for the World Wide Web), FTP, and IMAP.

Arguably, HTTP, the protocol used by the World Wide Web, is currently the most important protocol for publishing online information. The rest of this article focuses on the portion of the Info cluster that serves HTTP.

THE WWW PORTION

World Wide Web service for the UW home page and related pages is provided by a special set of machines that share the name www.washington.edu. This system differs in several key ways from the clusters serving the Uniform Access machines described in the previous article.

Instead of being a Homer-like time-sharing cluster for running interactive programs (such as Pine and UWIN), www.washington.edu (www for short) is a collection of World Wide Web servers that do nothing but export information via HTTP. Developers of Web documents on www do not have direct access to these computers, but they do have access to the master "Info" computer that distributes information to the www servers. One computer in this cluster handles compute-intensive pre-processing and post-processing. There is also a log-searching mechanism so that people developing Web pages can view the access and error logs.
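The article does not describe how the log-searching mechanism works. As a rough illustration, here is a minimal Python sketch of one way such a search could show each developer only the requests for his or her own pages. The Common Log Format, the function name search_log, and the path-prefix rule are all assumptions.

```python
import re

# Assumption: access logs use the Common Log Format, in which the request
# appears in quotes, e.g. "GET /admin/index.html HTTP/1.0".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)[^"]*"')

def search_log(lines, path_prefix, pattern=""):
    """Yield log lines whose requested path starts with path_prefix
    (a hypothetical per-developer restriction) and that also contain
    the optional search pattern."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like requests
        if match.group(1).startswith(path_prefix) and pattern in line:
            yield line
```

Passing a prefix that ends in a slash (for example "/admin/") keeps a developer from also seeing requests for an unrelated directory such as "/administration".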

DESIGN GOALS

The design of the Info cluster, in which distinct functions are allocated to specific dedicated machines and some computers only serve World Wide Web (HTTP) requests, contributes to the following:

Other features deemed important in the design of the www portion of the Info cluster include the following:

ARCHITECTURE

The Info cluster has similarities to a Uniform Access cluster, but is specifically designed to serve large amounts of information over the Internet. Its components are shown in the accompanying diagram. Four types of servers are involved.

[Figure: Schematic of the Info cluster showing the compute, master file, production, and back-room servers and the protocols. The Info cluster has four types of servers and differs from a Uniform Access cluster in that it is designed to export large amounts of data, including images and sound.]

The master file server. There is a dedicated file server for this cluster. All master copies of pages and scripts used by www.washington.edu are stored here, as are files exported by ftp.cac.washington.edu (via FTP) and by pine.cac.washington.edu (via IMAP). Unlike a file server in a Uniform Access cluster, where files are meant to be changed only by one user (the owner of the directory), this master file server lets files be changed by the group of people responsible for each directory. Information providers can reach these directories from several of our departmental computing clusters.
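The group-based rule above can be made explicit with a short Python sketch. In practice it would more likely be enforced with ordinary Unix group ownership and permissions; the directory names, group names, and the may_modify function below are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from master directories to the group of
# information providers responsible for each one.
DIRECTORY_GROUPS = {
    "/info/www/admissions": "admissions-web",
    "/info/www/news": "news-web",
    "/info/ftp/pub": "ftp-maintainers",
}

def may_modify(user_groups, path):
    """Return True if a user belonging to user_groups may change path.

    The deepest listed directory containing path decides; files outside
    every listed directory cannot be changed by providers at all."""
    target = PurePosixPath(path)
    best = None
    for directory, group in DIRECTORY_GROUPS.items():
        d = PurePosixPath(directory)
        if target == d or d in target.parents:
            if best is None or len(d.parts) > len(best[0].parts):
                best = (d, group)
    return best is not None and best[1] in user_groups
```

Comparing whole path components (rather than string prefixes) keeps "/info/www/newsletter" from being treated as part of "/info/www/news".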

Production servers. There are different production servers for different protocols. For example, World Wide Web browsers connect directly to the production web servers. These servers are analogous to the compute servers in a Uniform Access cluster in that they are the machines with which the end user directly interacts.

Back-room servers. These servers, intended primarily to support the www service, export large collections of data (such as PhotoCD images) as well as sound and video clips. They are called back-room servers because end-user clients do not connect to them directly. In addition, these computers are not in the Domain Name System (DNS) pool for the production servers (e.g., www.washington.edu). Although they are similar to application servers in a Uniform Access cluster, back-room servers are designed to export large amounts of data rather than to support a large number of simultaneous users.
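The article says only that the production servers share a DNS pool. Assuming that pool works like ordinary round-robin DNS (an assumption), this Python sketch shows why keeping the back-room servers out of it means clients never reach them directly. The addresses come from a documentation range and are hypothetical, not real UW hosts.

```python
from collections import deque

# Hypothetical addresses (RFC 5737 documentation range).
PRODUCTION = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
BACK_ROOM = ["192.0.2.21"]  # deliberately left out of the public pool

class RoundRobinPool:
    """Answer each query with every address in the pool, rotating the
    order so that successive clients tend to pick different servers."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        reply = list(self._records)
        self._records.rotate(-1)  # move the first address to the end
        return reply

# Only the production servers are published under the service name.
www_pool = RoundRobinPool(PRODUCTION)
```

A client that uses the first address in each reply lands on 192.0.2.11, then 192.0.2.12, then 192.0.2.13, and so on; the back-room address never appears in any reply.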

Compute server. The compute server for the Info cluster handles the compute-intensive pre-processing needed to generate pages (such as the cambot images), as well as any post-processing needed to gather data collected by all the servers in the cluster (such as the UW home page statistics). As with the back-room servers, this computer's function is similar to that of an application server in a Uniform Access cluster. It is also the system developers use to view information before it is put into production.
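Post-processing such as the home page statistics means combining data from every production server. Below is a minimal Python sketch of that kind of aggregation. It assumes each server's access log has already been gathered onto the compute server and uses the Common Log Format. The function name home_page_hits and the choice to count only 2xx responses are illustrative, not the article's actual procedure.

```python
import re
from collections import Counter

# Assumption: Common Log Format, where the quoted request is followed
# by the three-digit HTTP status code.
LOG_RE = re.compile(r'"[A-Z]+ (\S+)[^"]*" (\d{3}) ')

def home_page_hits(server_logs, page="/"):
    """Count successful (2xx) requests for one page on each server.

    server_logs maps a server name to an iterable of log lines."""
    totals = Counter()
    for server, lines in server_logs.items():
        for line in lines:
            m = LOG_RE.search(line)
            if m and m.group(1) == page and m.group(2).startswith("2"):
                totals[server] += 1
    return totals
```

The per-server counts can then be summed (for example, sum(totals.values())) to get a cluster-wide figure.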

[Photograph: the cambot's view of a rainy Red Square.]

Rain clings to the lens of the camera robot (or cambot) that captures this "almost live" view of Red Square for the UW home page on the Web. The image is generated by a computer in the www portion of the Info cluster that does the compute-intensive pre-processing needed to bring up an updated image about every minute. Click on the photo on the UW home page for more information.

ADVANTAGES OF SPECIALIZED CLUSTER SERVERS

All information sent out by any of the production and back-room servers resides on disks that are local to that server. This provides both speed and uninterrupted service. The systems would be more vulnerable to interruptions if a networked file server were used.

Master copies of files and scripts used by the production servers are on the file server. Once a night all changes are sent out from the file server to the production servers. Developers can also install individual changes on an as-needed basis. Because the back-room servers contain large amounts of data, there is only one copy of these files and they are local to the back-room server that provides the information.
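The article does not say which tool performs the nightly push. This minimal Python sketch shows the underlying idea under stated assumptions: it compares content digests between the master tree and a server's local copy and copies only new or changed files. The function name push_changes is hypothetical, the two trees are treated as local directories rather than separate machines, and deleted files are not handled.

```python
import hashlib
import shutil
from pathlib import Path

def digest(path):
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def push_changes(master, replica):
    """Copy every file under master that is missing from replica or
    differs in content; return the relative paths that were copied."""
    master, replica = Path(master), Path(replica)
    copied = []
    for src in sorted(master.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(master)
        dst = replica / rel
        if not dst.exists() or digest(src) != digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel.as_posix())
    return copied
```

Running the same function for a single file or directory corresponds to the as-needed installs the article mentions; running it over the whole tree corresponds to the nightly pass.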

The master file server for the Info cluster holds data for ftp.cac.washington.edu as well as for the www service and some other specialized information services. Accordingly, master copies of the FTP data are stored on this file server, and FTP access is provided by the FTP production servers. The same Info cluster compute server that provides image processing for the www service also performs tasks such as automatically generating index files for ftp.cac.washington.edu.
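As an illustration of automatic index generation, here is a minimal Python sketch that lists every file in an FTP directory tree along with its size. The "path<TAB>size" format, the INDEX file name, and the function name write_index are hypothetical; the article does not describe the real index format.

```python
from pathlib import Path

def write_index(root, index_name="INDEX"):
    """Write an index of every file under root (path, tab, size in
    bytes, sorted by path) to root/index_name and return its text.
    The index file itself is excluded from the listing."""
    root = Path(root)
    index_path = root / index_name
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path == index_path:
            continue
        rel = path.relative_to(root).as_posix()
        entries.append(f"{rel}\t{path.stat().st_size}")
    text = "\n".join(entries) + "\n"
    index_path.write_text(text)
    return text
```

Because the index skips itself, running the job again on an unchanged tree produces an identical file.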

SPECIAL HTTP ENVIRONMENTS

There are three HTTP environments in the www cluster (development, evaluation, and production), each with a particular function.

By design, the development and evaluation environments are not visible to the general public. Most information on www.washington.edu is created in the development environment and then moved to the production environment, but some services use all three environments.
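The path that information takes through the environments can be written down as a small set of allowed moves. This Python sketch assumes that evaluation, when it is used, sits between development and production. That ordering, along with the names Env, NEXT, and promote, is an assumption, not the article's documented procedure.

```python
from enum import Enum

class Env(Enum):
    DEVELOPMENT = "development"
    EVALUATION = "evaluation"
    PRODUCTION = "production"

# Only production is visible to the general public.
PUBLIC = {Env.PRODUCTION}

# Most pages go straight from development to production; some services
# pass through evaluation first (assumed ordering).
NEXT = {
    Env.DEVELOPMENT: {Env.EVALUATION, Env.PRODUCTION},
    Env.EVALUATION: {Env.PRODUCTION},
    Env.PRODUCTION: set(),
}

def promote(current, target):
    """Return target if moving from current to target is allowed."""
    if target not in NEXT[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```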

Within each environment there are several access trees that can present different information depending on where the client is located. For example, if a developer wants a set of services to be visible only to the UW community, and not to the entire Internet, the information would go in the UW access tree. Requests directed to the default HTTP port are automatically granted a level of access that depends on the originating computer.
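Choosing an access tree from the originating computer amounts to testing the client's address against a list of networks. Here is a minimal Python sketch using the standard ipaddress module. The network prefix is a hypothetical stand-in for the UW campus networks (a documentation range), and the tree names "uw" and "world" are illustrative.

```python
import ipaddress

# Hypothetical stand-in for the UW campus networks (documentation range).
UW_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def access_tree(client_ip):
    """Pick the access tree for a request arriving on the default port."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in UW_NETWORKS):
        return "uw"      # the UW access tree
    return "world"       # information visible to the entire Internet
```

Keeping the campus networks in a list makes it easy to add further address ranges without changing the lookup.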

The HTTP daemon has been enhanced to accommodate the different environments and access trees, as well as other features. These enhancements include:

FUTURE DIRECTIONS

The design of the Info cluster has been an evolutionary process driven by the needs of users and developers. Available hardware and software have also had a large impact on how the cluster and its services have grown.

In addition to providing an ever-growing amount of information, one of the future directions of the Info cluster is to provide more types of information. For example, the services provided by UWIN are developed in an environment that is similar to the Info cluster. Merging UWIN into the Info cluster would give producers of content for UWIN better access to tools to develop their information, and would also simplify management of the UWIN cluster.

Another area of development is secure transactions, which will allow confidential information to be visible only to the individuals who should have access to it.

As high-speed networking technology becomes more widespread, the production www servers will move from their current 10-megabit-per-second Ethernet connections to faster network links. The Info cluster will continue to grow and to evolve as required to provide users with the information they need, and to give developers the tools and support necessary to make that information available.

Frank Fujimoto worked in systems programming and administration at Stanford and Hewlett-Packard for about seven years before coming to the UW in 1991. His work managing special purpose clusters for C&C involves system software configuration and administration and helping the www cluster developers.



University of Washington Computing & Communications
Windows on Computing, No. 18, Spring 1996
newsltr@cac.washington.edu