IMPORTANT ANNOUNCEMENT

Development and support of  Willow is now discontinued. Willow was removed from production at UW on June 30, 1999.

Willow Technical Report: Architecture

Architecture Overview

Willow is designed to be a general-purpose bibliographic database front-end. It can be configured to act as an interface to virtually any network-accessible text-retrieval database. This flexibility is a direct consequence of Willow's architecture. The architecture is based loosely on the concept of Unix device-drivers. In Unix, any program can communicate with any physical input/output device (such as disks, printers, networks, terminals, keyboards etc.) through a few simple, standardized procedure calls. It does not matter that the physical devices may be wildly different from each other. This is because for each device somebody has written a program called a device-driver. The job of this program is to translate between the simple standardized procedure calls and the instructions that the hardware device expects to receive. Thus an application program can read data from a disk-drive with the exact same lines of code it uses to read from the keyboard, even though the two physical devices are entirely different from each other.

In Willow, instead of device-drivers, we have database-drivers. The Willow program itself does not know anything about the command syntax for any particular database. Instead, it knows how to communicate with a second program, the database driver. It is this program that actually knows how to make a connection to a particular database host, as well as the details of its commands, and how to parse the incoming stream of data.

Unlike Unix device-drivers, Willow and the database-drivers do not communicate via a procedure call interface. Instead, they use message-passing. More specifically, Willow opens a Unix pipe, and then splits off the driver (using standard Unix fork and exec functions). The two programs then send information back and forth in packets over the pipe (It is also possible to run the driver program on a separate computer, and communicate via a socket. See Willow for MS-Windows).

Though a Willow database-driver is hopefully easier to write than a Unix device-driver, it is no trivial undertaking. However, the Z39.50 driver promises to alleviate the need to customize a driver for every unique database. See Z39.50 Driver for details.

Willow Interface Implementation

The Willow interface is a Unix application, built with the Motif tool-kit, which is itself layered on top of the X Windows system. Because it is built with a standard user-interface tool-kit, the interface acts in a standard way. Thus anybody who is familiar with other Motif applications will already be familiar with the basics of manipulating the Willow user interface.

The interface is set-up via the Motif User Interface Language (UIL), rather than in the C language code. Thus it is relatively easy to change the look of the interface without having to touch (or understand) real code. Willow has been successfully ported to a variety of standard Unix workstations including DEC Ultrix, Sun, IBM RS/6000, and Hewlett Packard HP/UX.

When Willow is running there are as many as three separate computers involved. Willow itself is running on a Unix box somewhere, but the user is often not actually sitting in front of that machine. Instead she is using the X Window System to display Willow on her own screen (X-Terminal, other workstation, or PC/Mac running an X server). And the database that she is searching is usually on a third machine. At the UW a typical configuration is: user sitting at an NCD 17c X-Terminal, remotely running Willow on a DECstation 5000/240, connected to a BRS database (BRS Search is the commercial database engine we employ at the UW) running on an IBM RS/6000.

Like all X programs, Willow is event-driven. This means that Willow initially sets up lots of routines to handle all of the X Windows events it is interested in -- button clicks and key-strokes in various areas of the user-interface. After initial set-up, Willow spends most if its time waiting for the X system to pass incoming events to the specified handlers. In Willow's case though, not only are events coming in from the user-interface, but once it is connected to a database, packets are also coming in from the connection to the database-driver. The code to handle the packets is set up much like the code to handle X-events, i.e. a handler is declared for each type of packet that may come in.

Willow knows very little about interacting with databases -- it leaves that to the database drivers. For example, when the user selects a new database, Willow first sets up the pipe, forks the appropriate driver program, and sends a "connect" packet to it. Then Willow sits back and waits for events to come in -- either X-events such as button presses, or packet-events coming in from the driver. Eventually the driver will send a packet saying that the database has been started up, and is ready to search (or there was some specific problem starting it).

Searching is very similar. When the user presses the search button, Willow gathers up all the information the user has typed in, the names of the fields, and the settings of the limit buttons, and ships it all out to the driver. Willow lets the driver format all the information into the appropriate syntax for the current database system. Eventually the driver will send back a packet with some search result-statistics, followed by a stream of packets containing result-titles.

Data Flow in a Willow Session

Database-Driver Implementation

The database-drivers are simple Unix applications. They know nothing at all about X Windows (and consequently are much smaller than Willow itself). The only requirement for a driver is that it understands the protocol that Willow uses. The implementor decides how a driver translates between the Willow protocol and the commands that a given database is expecting. However, Willow comes with a model driver, which with some modification, can probably be used for most bibliographic databases. In an attempt to make the model driver portable, its interactions with the host database system are table-driven. This section describes the implementation of the model driver, which communicates with BRS Search on an IBM RS/6000 running AIX.

The driver communicates with the database host by opening a pipe, and forking a telnet connection to the appropriate machine. Thus any database that is on the Internet can have a driver customized for it. It is certainly possible to write drivers which use other methods to reach non-internet databases (for example by dialing a modem, or releasing carrier pigeons) but there are no current plans to do so.

Like Willow itself, the driver has two sources of incoming events. One is the pipe to Willow, over which packets will periodically appear. The other is the pipe to the telnet process, over which a stream of characters will appear. The driver sends back command strings to the database, mimicking the actions of a human user. In fact, as far as the database is concerned, there is a human user at the other end of the telnet connection (a rather fast typist though).

Protocol packets coming in over the pipe from Willow are taken care of by event-handlers, one for each packet type, just as in Willow itself. The stream of characters from the database on the other hand, is handled by a finite state automaton (FSA). An FSA consists of a set of states, and for each state a set of input strings which might be encountered. Encountering a given input string causes the automata to move to another state (or return to the current state), and send out an output string.

For programming purposes the FSA is converted to a table. Thus at the core of the database driver is an FSA, in table form. The program has a variable called state, which indicates the current row of the table. For each state, there is a list of possible input strings that are currently "interesting". For each input string there is a corresponding output string which should be sent as a response, and a next state (row) to move to.

The driver continually scans the telnet connection, looking to see if the any of the listed input strings for the current state show up. If one does, the driver sends back the appropriate output string, then updates the state variable to the specified next state, and starts scanning again. Thus the command syntax of the host-database is not built into the driver directly, instead it is stored in the table.

Driver Complications

As you might suspect, the driver is actually rather more complicated than it sounds. Incoming packets from Willow can also trigger output strings. For example, search requests obviously cannot be built into the table in advance, since they are dynamically created by the user. In addition, there is another optional item in each row of the table -- a procedure to be called when the input string is discovered. For example, when the driver sees the string which tells it that the search is complete, it will automatically call a routine named in the table which analyzes the text that came back from the database, extracts the search's vital statistics, and sends them back to Willow for display to the user. In fact, there are a number of similar routines for analyzing the data stream, and putting it into a form that Willow is expecting. Some of these data-parsing routines can get quite ugly. These routines will typically all have to be re-written in order to port the driver to a new database.

The biggest problem with the current driver-architecture is that remote databases can change at any time. For example, if the driver is scanning for the prompt "PRESS ANY KEY:", and one day that prompt changes to "PRESS A KEY:" 99% of the human users would never notice, but it would stop the database-driver dead in its tracks. Many databases come with an API (Application Program Interface). If one is available, it would make sense to use that to communicate with the database rather than trying to mimic a human user by sending command-strings. An API could also make the data-parsing much simpler.

Z39.50 Driver

A much better solution to the complexities of writing a custom driver for every database would be if a single driver could speak to a wide variety of different databases using a standard protocol. Z39.50 is an ANSI standard network protocol for bibliographic database access over the Internet. The Massachusetts Institute of Technology has successfully interfaced the UW model Willow driver to the Stanford Z39.50 API library. This driver translates between the Willow/driver protocol and the Z39.50 protocol. Thus Willow is now a full-featured Z39.50 search client, and can talk to a wide variety of Z39.50-speaking databases across the Internet. There is detailed information on using Willow with Z39.50 available.

List Browser Implementation

As described in Browsing Lists of Search Terms, Willow provides a simple-to-use interface for browsing the possible search terms in some database fields. However the implementation of this system is far from simple. Because BRS, the database engine used at UW, does not handle this type of browsing at the speeds that are required, we pre-build the lists. I.e. for each field of each database that we want to provide browsing capability for, we have to extract every existing value for that field, and sort them. There are a large number of lists (as many as five or six per database), they can each be quite long (the list of titles held by the University of Washington Libraries is almost 100 Megabytes), and some of them need to be updated quite frequently. We have also created a centralized network browse-list server which our BRS database driver can talk to. This is a custom text-search engine, optimized for doing extremely fast searches of sorted lists (Using a simple binary search turns out to be plenty fast). Willow makes a browse query on every user keystroke in the List Browser window, and the server returns one screen's worth of the list, centered around the first item in the list which matches the prefix the user has typed. See figure. (Actually, if the user is typing rapidly we wait until a pause to send the request). The user can also move through the list via the scrollbar. Motif is duped into thinking it is scrolling a huge file, when actually only a tiny fraction of the file is there at any time.

Note that the Willow program itself does not know anything about the BRS list-server. A list-browse is just a second type of search that gets sent down the pipe to the database-driver. It is the BRS driver which interacts with our list-server.

In the simplest case, a browse list consists of a simple list of homogenous items, i.e. just the subject terms. However Willow and our list-server can also handle multi-field lists, with a variety of display formats (do not confuse browse-list fields with the fields of the actual database). For example some of our lists also include a second field which indicates the number of records in which this term appears, i.e. how many hits you will get if you actually search this item. See figure. Sometimes the text we want to display to the user is not directly searchable in our database. For example, in our list of titles in the Libraries Catalog, we want the user to be able to browse through entries such as "Music (vocal & instrumental) in the Works of Kalidasa", but as you can imagine, inputting a string with all those special characters would not go over well with the database engine. Instead, we want to search the string "Music-vocal-instrumental-in-the-works-of-Kalidasa" which has no special characters, and is pre-bound at database load-time so that the retrieval is very fast. In order to support this, certain browse-list fields are invisible. So what the user sees in the List Browser is not necessarily what gets transferred back to the Search window.

Other parameters of the list-server indicate whether secondary fields should be displayed on the same line as the main field, or whether every field should get its own line, where and how the list field-names should be displayed, and whether the text should be displayed right or left justified.

List Browsing and the Z39.50 Driver

Our list-server is definitely a home-brew system, and is not likely to be supported by non-UW database servers. However, the Z39.50 driver provides a more generic solution to the list-browse problem, via that protocol's SCAN functionality. When Willow is connected to a Z39.50 database, it sends the exact same list-browse requests down the pipe to the driver, but instead of forwarding the requests to the BRS list-server, the Z39.50 driver will send a SCAN directive to the host-database. The driver then packages the results of that search into the form that Willow expects, and sends it back. Not all of the options described above are available with the SCAN-based browse lists.

Related Software

There are several other interesting pieces of software developed at the University of Washington that are closely tied to Willow.

Wilco

WILCO -- the Washington Information Looker-upper - Character Oriented, is, as the name implies, a character-cell version of Willow. The idea is to provide as much as possible of Willow's functionality using only standard text-terminal emulation, so it can be run at home via a modem, or on other non-X-capable devices. Wilco uses the exact same protocol (and much of the code) as Willow, so it can run with identical driver programs. Thus Wilco is also a full-featured Z39.50 search client. Wilco has a very similar interface to Pine, the UW-developed electronic mail interface for character-cell environments. Of course much of Willow's power comes from the ability to have several windows on the screen at the same time, and from the convenience of using a mouse. Naturally both of these features are missing in Wilco. However user-feedback indicates that Wilco is still considered an improvement over standard text-based database interfaces. To see Wilco, simply telnet to uwin.u.washington.edu, and select the UW Libraries Catalog.

Wilco Search Screen.

Willow for MS-Windows

Willow has successfully been ported to Microsoft Windows, and is now generally available.

Willow for MS-Windows prototype Search screen.

To help with porting to non-Unix platforms, the connection between Willow and the driver can be made a network socket, rather than a simple pipe. That way, the drivers can always run on a Unix box, and only the Willow interface itself needs to be ported.

Willow and the World Wide Web

Willow is designed to be compatible with the World Wide Web (WWW) and and any standard browser. The situation is analogous to the way image files work with Mosaic. With those, you click on a hypertext link which causes a file with a .gif extension to be downloaded. Mosaic knows that when it has a GIF file, it should start the xv program to view it. Similarly, you can pick a WWW hypertext link that causes a file containing a single db.conf entry to be downloaded (with the extension .dbcf), and if you have configured your browser properly, it will know to start a Willow session to "view" that file. One important difference however, is that if you select a second Willow link, the first Willow will offer to change to the new database for you, so you do not have to wait for an entire new Willow session to start up, nor clutter your desktop with Willows. Live Willow-launching links can be found on the University of Washington home pages.

As previously mentioned, Willow is normally configured to retrieve its help and database configuration files remotely from a Web server, via http.

Willow and WAIS

Willow can be used to search any database that speaks the standard Z39.50 information retrieval protocol (though these are not the only databases Willow can talk to). One notable example of a Z39.50 compatible system is Isite, formerly called free-WAIS. Isite provides a Z39.50 protocol engine that can be put in front of a standard search engine, such as WAIS. Isite is produced by the Clearinghouse for Networked Information Discovery and Retrieval (CNIDR).

Multimedia Extensions

In order to achieve 100% buzzword compatibility, a multimedia extension mechanism has been developed for Willow. Willow itself only deals with data in the form of plain ASCII text. However, more and more databases contain data in a more complex form -- images, sounds, animations, SGML, HTML, and Unicode, just to name a few. Rather than try to add a mode to Willow that can handle every possible data type, there are hooks that allow you to configure Willow so that it can start external programs to handle the non-standard portions of any retrieved information.

For example, many of our Libraries Catalog holdings are in non-Roman languages (i.e. Chinese, Japanese, Korean). We are developing a program that allows you to view library holdings in any of the languages in which they appear. When Willow retrieves a record that contains non-Roman text, it can start this viewer program to display it properly.

Similarly, we are experimenting with having Willow launch the Adobe Acrobat viewer to display page-images of the full text of articles in certain databases.

License Server

One complication the UW faced in the implementation of the database driver for our locally mounted BRS databases was the user-account problem. We wanted to support walk-in library users without forcing them to get personal accounts on the machine that runs BRS. Yet BRS requires each search session to run from a different account, so we could not just have a single public account. We implemented a general purpose network license server, and created pools of Willow accounts on the BRS host. When the user selects a BRS database, the BRS driver transparently contacts the license server and requests a BRS account and password. The account and password are then used to log into the host machine. The license server makes sure only one person is using an account at any time. We can also use the license server to make sure only a certain number of users are on a particular database at any time, which some contracts with database vendors require.

It is important to note the distinction between accounts on the database host system, and accounts on the Willow system. As discussed earlier, Willow itself is quite capable of running without accounts for individual users. The problem discussed here is with accounts on the machine where the database is running. Willow can also be configured to allow the user to use her own personal account and password on a database host, but this has become obsolete for us.

For increased security, the passwords are temporary -- they are generated on the fly by the license server, and can only be used for 30 seconds after they are issued. Also, the license server looks at the IP address of all requests, so we can deny non-UW requests for login-tickets to our licensed databases, but still allow a limited number of non-UW users into our Libraries Catalog.

Future Developments

Though Willow is now considered a mature system, there are still many enhancements we would like to add. For example, search history reference and sorting of result sets are not yet supported.

Much of the current development efforts are focused towards greater integration with the World Wide Web. An HTML-based help system is under development, to replace the current home-brew help system. We will also be experimenting with using an HTML display widget for display of retrieved records, and re-doing our multi-media extensions mechanism to be more compatible with that of typical Web browsers. Implementation of the Z39.50 EXPLAIN service is under development, to allow Willow's target databases to be self-configuring. Further down the road, we need to figure out an intuitive and simple interface for allowing the naive user to construct complex boolean searches, i.e. a painless way of specifying ANDs, ORs, and NOTs, or even natural language processing. Parallel searches of multiple databases would also be a nice feature.

Questions and comments about Willow to:
willow@cac.washington.edu