IMPORTANT ANNOUNCEMENT

Development and support of Willow is now discontinued. Willow was removed from production at UW on June 30, 1999.

Using Willow with the Z39.50 Information Retrieval Protocol

Section 0: Introduction
Section 1: What is Z39.50?
Section 2: Z39.50 Services: 2.1 INIT; 2.2 SEARCH; 2.3 SCAN; 2.4 PRESENT
Section 3: Configuring a Z39.50 Database
Section 4: Specifying Z39.50 Attributes
Section 5: MARC Record Formatting
Section 6: Creating a Z39.50 Database

Section 0: Introduction

This document provides an introduction to the Z39.50 protocol, followed by information on how to configure and use Willow's Z39.50 driver. Readers who are already familiar with the protocol may wish to go directly to Configuring a Z39.50 Database. Creating a Z39.50 Database gives some pointers for finding a Willow/Z39.50 compatible server for your own data. For current information on Z39.50 in general, see the Z39.50 resources pages at http://lcweb.loc.gov/z3950/agency and http://ds.internic.net/z3950/z3950.html

Section 1: What is Z39.50?

The Z39.50 Information Retrieval Service and Protocol Standard provides an interoperable way to search databases, retrieve database records, scan term lists, and perform ancillary tasks related to information retrieval (IR). It is a platform and operating system independent client/server protocol that addresses communication between an origin (client) and a target (server). It does not define user-interface behavior.

ANSI Z39.50-1992 was a revision of the (now obsolete) Z39.50-1988 protocol. The 1992 standard defined version 2 of the protocol and was bit-compatible with ISO 10162/10163, Search and Retrieve, SR. Z39.50-1995, the current revision of the standard since 1995, defines protocol versions 2 and 3.

The Library of Congress is the maintenance agency for the standard and is advised by the Z39.50 Implementors Group (ZIG). The ZIG maintains an active listserv (Z3950IW@NERVM.BITNET) and meets 3 or 4 times each year. ZIG membership is open to all interested parties.

The application protocol data units (APDUs) for the 1992 and 1995 revisions of the standard are expressed in Abstract Syntax Notation One (ASN.1) 1990. An HTML version of the 1995 ASN.1 is available at the Maintenance Agency Home Page.

Section 2: Z39.50 Services

Z39.50 interactions take place within the context of a Z39.50 service. Most implementors accept the INIT, SEARCH, and PRESENT services as forming the "core" of the protocol, and virtually all Z39.50 clients and servers support these. Other services, less widely implemented, are SCAN, SEGMENT, DELETE, ACCESS-CONTROL, RESOURCE-CONTROL, TRIGGER-RESOURCE-CONTROL, RESOURCE-REPORT, SORT, EXTENDED-SERVICES, and CLOSE. Willow currently supports version-2 INIT, SEARCH, PRESENT, SCAN, and RESOURCE-CONTROL.

Section 2.1: INIT

A Z39.50 session begins with the INIT service. The origin sends an InitRequest to the target. The InitRequest specifies which protocol versions it can use, which Z39.50 services it may wish to use during the session, and what size messages and records it can accept. Optionally, the InitRequest may identify the client software and include userid and password information.

The target responds with an InitResponse that confirms which protocol version will be used, which of the proposed Z39.50 services it will support, and whether or not the InitRequest has been accepted. Optionally, it may identify the server software.

Upon receiving an InitResponse, and assuming the InitRequest was accepted, the origin decides whether to continue with the session.
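
The negotiation described above can be pictured with a small Python sketch. The class names, field names, and the "highest common version" policy are illustrative assumptions; real InitRequest and InitResponse APDUs are ASN.1 structures with more fields, and this is not Willow's code.

```python
from dataclasses import dataclass

@dataclass
class InitRequest:
    versions: set          # protocol versions the origin can use
    services: set          # services the origin may wish to use
    max_record_size: int   # largest record the origin can accept

@dataclass
class InitResponse:
    accepted: bool
    version: int = 0
    services: frozenset = frozenset()

def respond(request, target_versions, target_services):
    """Target side: confirm one version and the proposed services it supports."""
    common = request.versions & target_versions
    if not common:
        return InitResponse(accepted=False)
    # Assumption: the target picks the highest version both sides share.
    return InitResponse(True, max(common),
                        frozenset(request.services & target_services))

# Hypothetical origin that proposes version 2 only.
req = InitRequest({2}, {"SEARCH", "PRESENT", "SCAN"}, 65536)
resp = respond(req, {2, 3}, {"SEARCH", "PRESENT"})
print(resp.accepted, resp.version, sorted(resp.services))
```

Here the origin would learn that SCAN was not confirmed and could decide whether the session is still worth continuing.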

Section 2.2: SEARCH

The SEARCH service allows many different types of queries. The most common is the type-1 RPN (Reverse Polish Notation) query. An RPN query consists of two operands followed by the operator that should be applied to them. The advantage of RPN is that it allows complex queries to be expressed unambiguously. For example, the expression 2 x 3 + 2 would normally be interpreted as "(2 x 3) + 2", but is ambiguous without the parentheses. The same query in RPN would be "2 2 3 x +" (that is, 2 + (2 x 3)), and allows only one possible interpretation. Operands may themselves be RPN queries, or they may consist of one or more search terms plus "attributes" that describe the desired characteristics for a term. The type-101 query adds proximity operators to the Boolean operators of type-1 queries, but few systems support type-101 queries at this time.
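
To see why an RPN expression has only one reading, the arithmetic example can be evaluated with a stack. This is a minimal sketch; eval_rpn is a hypothetical helper, and a real type-1 query uses attribute/term operands and Boolean operators rather than numbers.

```python
def eval_rpn(expression):
    """Evaluate a space-separated RPN arithmetic expression using a stack."""
    ops = {"+": lambda a, b: a + b, "x": lambda a, b: a * b}
    stack = []
    for token in expression.split():
        if token in ops:
            # An operator always applies to the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

print(eval_rpn("2 2 3 x +"))  # the example from the text: 8
print(eval_rpn("2 3 x 2 +"))  # an equivalent ordering: 8
```

No parentheses or precedence rules are needed: the position of each operator fixes which operands it consumes.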

Attributes are type/value pairs which collectively form an "attribute set". The most commonly implemented attribute set is "BIB-1"; this attribute set is public and registered, which means that implementors cannot arbitrarily change it. The BIB-1 set includes elements for "use" (an access point, roughly analogous to an index), "relation" (equal, greater than, etc.), "position" (first in field, first in subfield, any position), "structure" (word, phrase, etc.), "truncation" (left, right, etc.), and "completeness" (incomplete subfield, complete subfield, complete field). A list of the BIB-1 attributes and how to specify them in Willow can be found in the Specifying Z39.50 Attributes section below. Other public, registered attribute sets include STAS (scientific and technical) and GILS (government information). Implementors may design private attribute sets; specifications for these must be obtained from the owner.

Even given the relatively well-understood domain of bibliographic data, the semantics of the BIB-1 attribute set are problematic. In the past, targets often redefined queries they were unable to process ("fuzzying" the search). If, for example, an origin submitted a search asking for "Smith, John" as a personal author phrase and the target had no concept of "personal-name-as-author", it might map the query to "personal-name-as-subject-or-author". Almost certainly, this was not what the user expected. Implementors are aware of this problem, and several solutions are being tried. Profiles are being developed for certain classes of data (WAIS and GILS are already defined), BIB-1 semantics are being explicitly defined, and servers are no longer arbitrarily fuzzying searches. Version-3 adds a semanticAction flag that allows the origin to specify how much fuzzying it is willing to accept.

Section 2.2.1: Searching with Willow

Attribute Sets

Willow supports only the BIB-1 attribute set at this time.

Operators

AND, OR, and AND-NOT are supported. If more than one field box contains search terms, the field boxes are connected by AND. Operators may not appear as the first element in any field box. Proximity operators are not yet supported.

The assumed operator between words is usually AND (for the structure attribute WORD_LIST). Some servers will use a default operator other than AND, and ultimately choosing the default is up to the server. Some indexes, like TITLE_WP, are phrases by default (structure attribute PHRASE). The completeness attribute affects whether the phrase has to be the complete subfield, complete field, or incomplete subfield. You can override structure, completeness, and position attributes on a per-field basis in the database configuration file, but your user cannot do so on a per-search basis.

Regardless of your database configuration settings, the target's indexing practices will determine much of your search strategy. For example, OCLC WorldCat indexes subject headings as phrases at the complete subfield level. If you try to search "united states history" as a subject phrase (looking for documents with the subject heading "United States -- History"), you will fail since the indexing is

  history 
  united states

Similarly, "united states history" fails as a phrase search in WorldCat; there is no index for that database that spans subject subfields. Your only alternatives are to search "united states history" as a wordlist (which assumes AND between words), or to connect the subfield phrases with AND:

  united states AND history

Relational Operators

Many numeric fields support relational operators, listed below:

  =   equal to 
  <   less than 
  <=  less than or equal to 
  >   greater than 
  >=  greater than or equal to 
  ~=  not equal to 
  ^=  not equal to

Punctuation

The search term itself must be a simple string without any special characters (like "=()<>", tabs or blanks). If you want to include these characters, put quote marks around the search term. If you want a quote mark within a term surrounded by quote marks, you must double the quote mark within the quotes. For example, either query below

   'hogan''s hero''s' 
or 
   "hogan's hero's"

would search for the string "hogan's hero's". A quote mark may be either a single quote (') or a double quote (").
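
The quoting rule can be sketched in Python as follows. quote_term is a hypothetical helper, not part of Willow, and treating an embedded quote mark as a character that forces quoting is an assumption.

```python
SPECIAL = set("=()<> \t")

def quote_term(term, quote='"'):
    """Wrap a term in quote marks if needed, doubling embedded copies of that mark."""
    if quote not in ("'", '"'):
        raise ValueError("quote must be ' or \"")
    # Assumption: quote marks themselves also force the term to be quoted.
    needs_quotes = any(ch in SPECIAL or ch in "'\"" for ch in term)
    if not needs_quotes:
        return term
    return quote + term.replace(quote, quote * 2) + quote

print(quote_term("hogan's hero's", "'"))  # 'hogan''s hero''s'
print(quote_term("hogan's hero's", '"'))  # "hogan's hero's"
print(quote_term("history"))              # history
```

Only the quote mark used as the delimiter is doubled, which is why the double-quoted form needs no doubling for the apostrophes.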

With some targets, punctuation (or the lack of it) is significant. Among the more common practices are:

  requiring a comma between the surname and forename in an author search to indicate inverted order ("smith, john")
  requiring that a comma not appear between the surname and forename in an author search ("smith john")
  assuming the absence of a comma in an author search indicates direct order ("john smith")
  requiring double dashes between subheadings in a subject phrase search ("united states -- history")
  requiring that double dashes not appear between subheadings in a subject phrase search ("united states history")

Willow's Z39.50 driver normalizes search arguments by default. See Configuring a Z39.50 Database below for more details.

Truncation

Use '#' or '?' to indicate right truncation. For example, "rat#" will match "rat", "rate", "ratio", etc. If you submit a truncated search term and the server does not support truncation, you should get an error message.

A few servers truncate by default, so a search for "wash" also retrieves "washington". In these cases, you can use '|' to indicate no truncation. For example, "wash|" will retrieve "wash" only, even if the server truncates by default.

You can override the target's default truncation setting in the database configuration file on a per-field basis.
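
As an illustration of the truncation markers, the following sketch splits a term into its stem and a truncation mode. The function names and mode labels are hypothetical; the real matching is done by the target, not by the client.

```python
def split_truncation(term):
    """Return (stem, mode) for a term using the '#', '?' and '|' markers."""
    if term.endswith(("#", "?")):
        return term[:-1], "right"
    if term.endswith("|"):
        return term[:-1], "none"
    return term, "server-default"

def matches(term, word):
    """Toy matcher: right truncation is a prefix match, otherwise exact."""
    stem, mode = split_truncation(term)
    if mode == "right":
        return word.startswith(stem)
    # A real target applies its own default when no marker is given;
    # this sketch simply treats that case as an exact match.
    return word == stem

print(split_truncation("rat#"))
print(matches("rat#", "ratio"), matches("wash|", "washington"))
```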

Completeness

Like truncation, the completeness attribute defines how your search terms are mapped at the target. In RLIN's "Avery Index to Architectural Periodicals", an author phrase search for "clark, sally" retrieves documents by "Clark, F. R. S.". RLIN's default behavior is to map the forename in an author search to the initials and to the spelled-out form. To get an exact personal name search, you must set the completeness attribute to CFIELD (value = 3). Completeness is a database configuration per-field setting and cannot be changed on a per-search basis. Willow normally accepts the target's default.

Section 2.3: SCAN

The SCAN service provides a way to browse ordered lists of terms. The origin specifies a database, starting term, attributes, step size, number of terms requested, and preferred position in the response. The target should return the terms and/or an error diagnostic. Targets are free to support SCAN for some attributes and not for others, or not to support SCAN at all. SCAN performance can be erratic, especially if the target has to sift through a concatenated term list looking for those terms that match a particular request. Willow's SCAN has been extensively rewritten to comply with Z39.50-1995.
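
The roles of the starting term, the number of terms requested, and the preferred position can be sketched against an in-memory sorted list. This is a simplified model under stated assumptions: the scan function and its parameters are hypothetical, and attributes, step size, and diagnostics are left out.

```python
import bisect

def scan(terms, start, count, preferred_position=1):
    """Return up to `count` terms from the sorted list `terms`, placing the
    first term >= `start` at the 1-based `preferred_position` where possible."""
    anchor = bisect.bisect_left(terms, start)
    first = max(0, anchor - (preferred_position - 1))
    return terms[first:first + count]

index = ["rat", "rate", "ratio", "raven", "ray", "razor"]
print(scan(index, "ratio", 3))                        # ['ratio', 'raven', 'ray']
print(scan(index, "ratio", 3, preferred_position=2))  # ['rate', 'ratio', 'raven']
```

A target that has to build such a list on the fly from a concatenated index does far more work than this binary search, which is one reason SCAN response times vary.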

Section 2.4: PRESENT

PRESENT allows the origin to request records that were found by a search. Preferred record syntaxes are part of the INIT negotiation between origin and target. The protocol allows for many different record syntaxes, although only MARC and SUTRS (Simple Unstructured Text Record) are in widespread use today; Willow supports both MARC and SUTRS records. Some developers are beginning to implement GRS-1 (Generic Record Syntax), a syntax that allows encoding of arbitrarily complex records like image formats, SGML data, etc.

MARC is an encoding standard (Z39.2) for bibliographic data that has been in widespread use in libraries for 30 years. Within the library community, a set of content guidelines (USMARC Format for Bibliographic Data) and a set of data format rules (Anglo-American Cataloging Rules) have attached themselves to the MARC communications format specified in Z39.2.

It should be noted that the use of MARC as a record syntax in Z39.50 does not imply the use of USMARC content guidelines. Willow's MARC record formatting is table-driven, which gives a site administrator the flexibility required to design rational displays for MARC records that do not use USMARC content rules.

Section 3: Configuring a Z39.50 Database

This section provides supplemental information to the Database Configuration Files section of Administration for configuring Z39.50 databases. When using the Z39.50 driver, the following differences from that section apply.

HostName

A fully qualified domain name for the Z39.50 server host must be supplied with a "HostName" environment specifier. For example:

    env_name = HostName 
    env_val = fionavar.mit.edu

An exception is if you use the University of Washington's license manager system (see below), in which case the driver can acquire a login "ticket" from the license manager which will include a Z39.50 server name.

Authenticated Databases

Z39.50 databases which require user authentication can be handled in three different ways. They can be configured using the standard Willow method, where the user enters an id and/or password (see the "Manual Login Databases" section of "Customization"); you can hard-code the authentication information into the db.conf file, so the user does not have to type it; or you can use the UW license manager system, described below.

To hard-code the authentication information, use an "auth_id" environment specifier. For example:

    env_name = auth_id 
    env_val = uname/password

The value is a user name and a password separated by a slash, with no whitespace.
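
A quick way to sanity-check an auth_id value before putting it in db.conf is sketched below. parse_auth_id is a hypothetical helper, and splitting on the first slash is an assumption about how a password that itself contains a slash would be treated.

```python
def parse_auth_id(value):
    """Split an auth_id value of the form <user>/<password>."""
    if any(ch.isspace() for ch in value):
        raise ValueError("auth_id must not contain whitespace")
    # Assumption: the first slash separates the user name from the password.
    user, sep, password = value.partition("/")
    if not sep or not user:
        raise ValueError("expected <user>/<password>")
    return user, password

print(parse_auth_id("uname/password"))
```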

The third option is to use the UW license manager system, which can provide userid and passwords for connections to remote hosts as needed. With the UW license manager, all of the connection information can be maintained centrally on the license manager. When a user starts a database session, the driver contacts the license server and obtains the connection information and password. The license manager can be configured to limit the number of concurrent sessions, as some vendors require.

Z39.50Port

If the Z39.50 service is not on the standard port (210) you can specify it:

    env_name = Z39.50Port
    env_val  = 210

canon_rule

By default, the Z39.50 driver will try to normalize your search arguments before sending them over the wire. The normalization rule does the following:

  Within double quotes, no filtering.
  Comma, semicolon, exclamation point, and question mark become whitespace.
  Single quote is deleted.
  Double dash becomes a space.
  Period becomes whitespace except for the following attributes: "LCCN", "CODEN", "LOCAL_NUM", "DEWEY_NUM", "UDC_NUM", "LC_CALL", "NLM_CALL", "LOCAL_CALL", NULL.
  Parentheses are filtered unless preceded by one of the following operators: "OR", "||", "AND", "&&", "AND-NOT", NULL.
  Everything else is untouched.
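
The rules above can be approximated in a few lines of Python. This is a sketch for reasoning about what a term looks like after normalization, not the driver's implementation: normalize is a hypothetical name, the parenthesis rule is omitted, and the trailing NULL in the attribute list is assumed to be a list terminator rather than an exempt attribute.

```python
PERIOD_EXEMPT = {"LCCN", "CODEN", "LOCAL_NUM", "DEWEY_NUM",
                 "UDC_NUM", "LC_CALL", "NLM_CALL", "LOCAL_CALL"}

def normalize(term, use=None):
    """Apply the punctuation rules to `term`; `use` is the <use> attribute name."""
    out = []
    in_quotes = False
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == '"':
            in_quotes = not in_quotes      # quoted text passes through unfiltered
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch in ",;!?":
            out.append(" ")
        elif ch == "'":
            pass                           # single quote is deleted
        elif term.startswith("--", i):
            out.append(" ")                # double dash becomes one space
            i += 1
        elif ch == "." and use not in PERIOD_EXEMPT:
            out.append(" ")
        else:
            out.append(ch)
        i += 1
    return "".join(out)

print(normalize("united states -- history").split())
print(normalize("QA76.9", use="LC_CALL"), "|", normalize("QA76.9", use="TITLE"))
```

Note that replaced characters leave extra blanks behind; the sketch does not collapse them.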

To turn normalization off on a per-database basis, specify:

    env_name = canon_rule
    env_val  = none

t_field_cmd =

This works as it does with other databases, but with additional functionality: in addition to simple index names like "TITLE", you can specify other Z39.50 attributes. See the Specifying Z39.50 Attributes section below.

l_field_cmd = LANGUAGE = ENGLISH

The l_field_cmd value is also a string that gets sent to the driver. It is a way of abbreviating a search: insert the search string that would normally be submitted.

list_name

Specifies a browsable index using the browse translation functionality, which maps the Willow list-browsing protocol to the Z39.50 SCAN protocol. To enable browsing for a particular index, add a corresponding list_name line after a t_field_cmd entry, specifying the Z39.50 attributes desired (see below, Specifying Z39.50 Attributes). For example, to browse the list of full titles:

    t_field_name = Exact Titles 
    t_field_cmd  = TITLE_WP
    list_name    = TITLE_WP

Optionally, you may specify a starting term by appending "@<term>" to the list name. To begin scanning titles at 'a', for example:

    list_name = TITLE_WP@A

The translation may have problems with certain servers which use a collating sequence other than ASCII. The driver can support a pseudo-EBCDIC sequence for a browse list; this feature can be activated by appending :EBCDIC to the index name, e.g.:

    list_name = TITLE:EBCDIC

ASCII is the default; specifying :ASCII explicitly is also allowed.

preferred_syntax

Specifies the record syntax that you want the server to use. Willow currently supports USMARC, UKMARC, CANMARC, UNIMARC, and SUTR. preferred_syntax is an optional field; if it is not specified, the target may substitute its default record syntax. In the past, when almost all servers supported USMARC and few supported anything else, you almost always got USMARC as a default. Increasingly, targets are defaulting to a syntax other than USMARC -- some use GRS-1, which Willow cannot yet understand. It is now prudent to specify a preferred_syntax. Be sure to specify the correct "flavor" of MARC (USMARC in the United States, CANMARC in Canada, etc.).

SUTR Syntax

To enable the use of SUTR (Simple Unstructured Text Record) records, include the following in the database entry.

    env_name = preferred syntax 
    env_val  = SUTR

With SUTR records, the fields to display for the summary record are selected using the summary_fields field in a database entry. For example,

    summary_fields = TITLE,"AUTHOR:"

will match any line beginning with TITLE or AUTHOR for summary display.

Each field with an 'E' flag indicates that the summary display should end if this field is matched. If none of these fields are matched, the Summary_Field_Count tag will be used to determine how many fields will be used for the summary. It must be the last tag:

    summary_fields = Land boundaries:E,Coastline:E,Summary_Field_Count:8

summary_fields, title_fields

Willow requires title_fields and summary_fields to exist for each database, but for non-SUTR databases (i.e. MARC databases) their values are ignored by the Z39.50 driver so they can be left blank:

    title_fields =
    summary_fields =

For MARC databases the contents of the summary and full records are controlled by two files -- bibbrief for summaries and title lists, and bibfull for the full records. They are described more fully below, in MARC Record Formatting.

One stock bibbrief file and one stock bibfull file come with the z3950 driver (located in the z3950.format subdirectory, relative to the Willow*configURL or Willow*databaseConfigDir X Resources). You can specify a different pair of files with bib_suffix. For example:

    env_name = bib_suffix 
    env_val = mitcat

would cause your database to look for bibbrief.mitcat and bibfull.mitcat before the standard bibbrief and bibfull files. If one or both files are not found, Willow silently falls back to the standard file. If one or both are found, their contents OVERRIDE the default. So you should clone bibfull and bibbrief, make changes to the clones, and install them as your custom files with the custom bib_suffix.
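
The lookup order can be pictured with the following sketch. find_format_file and its parameters are hypothetical names, and the real driver resolves the directory from the X resources mentioned above.

```python
from pathlib import Path

def find_format_file(format_dir, base, suffix=None):
    """Prefer <base>.<suffix> when it exists; otherwise fall back to <base>."""
    if suffix:
        custom = Path(format_dir) / f"{base}.{suffix}"
        if custom.is_file():
            return custom
    # Silent fallback to the stock file, as described in the text.
    return Path(format_dir) / base
```

With bib_suffix set to mitcat, a call such as find_format_file(dir, "bibbrief", "mitcat") would return bibbrief.mitcat when that file is present and bibbrief otherwise, independently of what happens for bibfull.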

extension_fields

The z3950 driver supports the Willow multi-media extension mechanism. The extension_label and extension_program entries operate in exactly the same way as in other drivers. However, the extension_fields string is VERY DIFFERENT. It contains a mini bibfull file entry. Instead of whitespace, you use a comma. (You also must terminate the string with a comma. Perhaps a subsequent version will contain a smarter parser.) Here is an example of a Z39.50 extension entry:

    extension_fields  = PATH:@,|,|,690,|,~,PAGE:@,|,|,505,|,~, 
    extension_label   = Article View 
    extension_program = launcher

When you press the button labeled "Article View" the 'launcher' program is started and receives a full record with the MARC 690 field prefixed by "PATH: ", and the MARC 505 field prefixed by "PAGE: ". MARC Record Formatting below gives more information about setting up the MARC formatting description. The Multi-Media Extensions section of Administration gives more information on using the extension mechanism.

Section 4: Specifying Z39.50 Attributes

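The attribute-spec specifies the search index (also known as the USE Attribute), and optionally can also set Relation, Structure, Completeness, Position, and Truncation attribute parameters.

  <attribute-spec> ::= <use> 
       | <attribute-spec>'/'<attribute_type>':'<attribute_value>

  <attribute_type> ::=
       <relation>
      | <position>
      | <structure>
      | <truncation>
      | <completeness>

Where <use> is any name that matches the indexes defined for the database and <attribute_type> is any one of the non-use attributes, written as the one-letter prefix shown in its Syntax line below (R, P, S, T, or C). As the examples at the end of this section show, several '/'<attribute_type>':'<attribute_value> pairs may be chained after a single <use>.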

It is not necessary to specify non-use attributes unless a particular behavior is desired. Willow associates default values for <relation> and <structure> with each <use> attribute. Targets will generally ignore or supply server defaults for any missing non-use attributes. "Well-behaved" targets believe that you want the behavior you specify, so it is prudent to leave unspecified those attributes that are irrelevant to you.

  Attribute: <use> Integer: 1
  Syntax:    <value>
  Default:   none. <use> attribute is required.
  Example:   t_field_cmd = TITLE

Except for those noted below, <use> attributes ending in "_wp" default to PHRASE structure, and others to WORD_LIST. <use> attributes identified with '*' in the list below default to WORD structure and FIRST_IN_SUBFIELD position.

  <use>                         Integer
  PERS_NAME                        1
  AUTHOR_PN                        1
  PERS_NAME_WP                     1
  PAP                              1
  CORPORATION                      2
  ORGANIZATION                     2
  CAP                              2
  CORPORATION_WP                   2
  CONFERENCE                       3
  CONFERENCE_WP                    3
  TITLE                            4
  TITLE_WP                         4
  TWP                              4
  SERIES                           5
  SERIES_WP                        5
  UNFRM_TITLE                      6
  UNFRM_TITLE_WP                   6
  ISBN                             7
  ISSN                             8
  LCCN                             9
  BNBN                            10   *
  BGFN                            11   *
  LOCAL_NUM                       12   *
  DEWEY_NUM                       13   *
  UDC_NUM                         14   *
  BLISS_NUM                       15   *
  LC_CALL                         16   *
  NLM_CALL                        17   *
  NAL_CALL                        18   *
  MOS_CALL                        19   *
  LOCAL_CALL                      20   *
  SUBJECT                         21
  SUBJECT_WP                      21
  SP                              21
  RAMEAU_SUBJ                     22
  RAMEAU_SUBJ_WP                  22
  BDI_SUBJ                        23
  BDI_SUBJ_WP                     23
  INSPEC_SUBJ                     24
  INSPEC_SUBJ_WP                  24
  MESH_SUBJ                       25
  MESH_SUBJ_WP                    25
  PA_SUBJ                         26
  PA_SUBJ_WP                      26
  LC_SUBJ                         27
  LC_SUBJ_WP                      27
  RVM_SUBJ                        28
  RVM_SUBJ_WP                     28
  LOCAL_SUBJ                      29
  LOCAL_SUBJ_WP                   29
  DATE                            30
  PUBDATE                         31
  ACQ_DATE                        32
  KEY_TITLE                       33
  KEY_TITLE_WP                    33
  COLL_TITLE                      34
  COLL_TITLE_WP                   34
  PLL_TITLE                       35
  PLL_TITLE_WP                    35
  COVER_TITLE                     36
  COVER_TITLE_WP                  36
  TITLEPAGE                       37
  TITLEPAGE_WP                    37
  CAPT_TITLE                      38
  CAPT_TITLE_WP                   38
  RUN_TITLE                       39
  RUN_TITLE_WP                    39
  SPINE_TITLE                     40
  SPINE_TITLE_WP                  40
  OVAR_TITLE                      41
  OVAR_TITLE_WP                   41
  FORMER_TITLE                    42
  FORMER_TITLE_WP                 42
  ABBR_TITLE                      43
  ABBR_TITLE_WP                   43
  EXPN_TITLE                      44
  EXPN_TITLE_WP                   44
  PRECIS_SUBJ                     45
  PRECIS_SUBJ_WP                  45
  RSWK_SUBJ                       46
  RSWK_SUBJ_WP                    46
  SUBDIV_SUBJ                     47
  SUBDIV_SUBJ_WP                  47
  NATBIB_NUM                      48
  NATBIB_NUM_WP                   48
  LGLDEP_NUM                      49
  LGLDEP_NUM_WP                   49
  GOVTPUB_NUM                     50
  GOVTPUB_NUM_WP                  50
  MUSIC_PUB                       51
  MUSIC_PUB_WP                    51
  DB_NUM                          52
  DB_NUM_WP                       52
  LOCAL_ID                        53
  LOCAL_ID_WP                     53
  LANGUAGE                        54
  GEO_CODE                        55
  INST_CODE                       56
  NAME_TITLE                      57
  NAME_TITLE_WP                   57
  GEO_NAME                        58
  GEO_NAME_WP                     58
  PUB_PLACE                       59
  CODEN                           60
  MFORM_GEN                       61
  ABSTRACT                        62
  ABSTRACT_WP                     62
  NOTE                            63
  NOTE_WP                         63
  PUB_TYPE_EXACT                 500
  AUTHOR_TITLE                  1000
  AUTHOR_TITLE_WP               1000
  RECTYPE                       1001
  RECTYPE_WP                    1001
  NAME                          1002
  NAME_WP                       1002
  AUTHOR                        1003
  AUTHOR_WP                     1003
  PERS_AUTHOR                   1004
  PERS_AUTHOR_WP                1004
  CORP_AUTHOR                   1005
  CORP_AUTHOR_WP                1005
  CONF_AUTHOR                   1006
  CONF_AUTHOR_WP                1006
  STANDARD_ID                   1007
  STANDARD_ID_WP                1007
  LC_CHILD_SUBJ                 1008
  LC_CHILD_SUBJ_WP              1008
  PERSNAME_SUBJ                 1009
  PERSNAME_SUBJ_WP              1009
  TEXT_BODY                     1010
  TEXT_BODY_WP                  1010
  DB_ADD_TIME                   1011
  DB_ADD_TIME_WP                1011
  DB_MOD_TIME                   1012
  DB_MOD_TIME_WP                1012
  ATY_FMT_ID                    1013
  CONCEPT_TEXT                  1014
  CONCEPT_TEXT_WP               1014
  CONCEPT_REF                   1015
  CONCEPT_REF_WP                1015
  DEFAULT                       1016
  DEFAULT_WP                    1016
  ANY                           1017
  ANY_WP                        1017
  SERVER_CHOICE                 1017
  SERVER_CHOICE_WP              1017
  PUBLISHER                     1018
  RECORD_SOURCE                 1019
  EDITOR                        1020
  BIB_LEVEL                     1021
  GEO_CLASS                     1022
  INDEXED_BY                    1023
  MAP_SCALE                     1024
  MUSIC_KEY                     1025
  REL_PERIODICAL                1026
  REPORT_NUMBER                 1027
  STOCK_NUMBER                  1028
  THEMATIC_NUMBER               1030
  MATERIAL_TYPE                 1031
  DOC_ID                        1032
  HOST_ITEM                     1033
  CONTENT_TYPE                  1034
  ANYWHERE                      1035
  ATS                           1036
  SICI                          1037


  Attribute: <relation>   Integer: 2
  Syntax:    R:<attribute_value>
  Default:   EQUAL
  Example:   t_field_cmd = DATE/R:GREATER_OR_EQUAL

       <attribute_value>                     Integer
       EQUAL                                    3
       GREATER_OR_EQUAL                         4
       GREATER_THAN                             5
       LESS_THAN                                1
       LESS_THAN_OR_EQUAL                       2
       NOT_EQUAL                                6
       PHONETIC                               100
       RELEVANCE                              102
       STEM                                   101


  Attribute: <position>     Integer: 3
  Syntax:    P:<value>
  Default:   ANY_POSITION_IN_FIELD
  Example:   t_field_cmd = TITLE_WP/P:FFIELD

       <attribute_value>                     Integer
       ANY_POSITION_IN_FIELD                    3
       FFIELD                                   1
       FSUBFIELD                                2
       FIRST_IN_FIELD                           1  (synonym for FFIELD)
       FIRST_IN_SUBFIELD                        2  (synonym for FSUBFIELD)


  Attribute: <structure>    Integer: 4
  Syntax:    S:<value>
  Default:   varies by <use> attribute
             _wp <use> attributes default to PHRASE
             other <use> attributes default to WORD_LIST, 
             except where noted in the <use> section
  Example:   t_field_cmd = TITLE/S:WORD

       <attribute_value>                     Integer
       DATE_NORMALIZED                          5
       DATE_UNNORMALIZED                      100
       DOCUMENT_TEXT                          106
       FREE_FORM_TEXT                         105
       KEY                                      3
       LOCAL_NUMBER                           107
       NAME_NORMALIZED                        101
       NAME_UNNORMALIZED                      102
       PHRASE                                   1
       STRUCTURE                              103
       URX                                    104
       WORD                                     2
       WORD_LIST                                6
       YEAR                                     4
       WORD_ADJ                                 7


  Attribute: <truncation>   Integer: 5
  Syntax:    T:<value>
  Default:   none (server default)
  Example:   t_field_cmd = TITLE_WP/T:RIGHT_TRUNCATION

       <attribute_value>                     Integer
       DO_NOT_TRUNCATE                        100
       GLOB                                   102
       LEFT_AND_RIGHT                           3
       LEFT_TRUNCATION                          2
       PROCESS_#_IN_THE_SEARCH_TERM           101
       REGEXP                                 103
       RIGHT_TRUNCATION                         1


  Attribute: <completeness> Integer: 6
  Syntax:    C:<value>
  Default:   none (server default)
  Example:   t_field_cmd = TITLE_WP/C:CSUBFIELD

       <attribute_value>                     Integer
       CFIELD                                   3
       CSUBFIELD                                2
       COMPLETE_FIELD                           3 (synonym for CFIELD)
       COMPLETE_SUBFIELD                        2 (synonym for CSUBFIELD)
       INCOMPLETE_SUBFIELD                      1

Example 1:

For a subject search that groups subfields as phrases, but matches the subfield phrase wherever it occurs in the field:

  t_field_cmd = SUBJECT/S:PHRASE/P:ANY_POSITION_IN_FIELD

Example 2:

If the database had defined a default TITLE index as:

  use      structure   completeness  position 
  TITLE    WORD_LIST   CFIELD        ANY_POSITION_IN_FIELD

The following attribute-spec expressions are equivalent:

  TITLE 
  TITLE/S:WORD_LIST 
  TITLE/S:WORD_LIST/C:CFIELD 
  TITLE/S:WORD_LIST/C:CFIELD/P:ANY_POSITION_IN_FIELD 
  TITLE/C:CFIELD 
  TITLE/C:CFIELD/P:ANY_POSITION_IN_FIELD 
  TITLE/P:ANY_POSITION_IN_FIELD

This much redundancy is allowed to make it easy for a user to write an attribute spec.

NO whitespace should appear anywhere in an attribute spec. This is because the parser considers the attribute spec to be one token, and cuts it apart at the slashes.
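
The "one token, cut apart at the slashes" behavior can be sketched as follows. The lookup tables hold only a few entries copied from the lists in this section, the function name is hypothetical, and the sketch does not add the per-<use> defaults that Willow supplies; the driver's real tables live in the header files named below.

```python
USE = {"TITLE": 4, "TITLE_WP": 4, "SUBJECT": 21, "DATE": 30}
TYPES = {"R": 2, "P": 3, "S": 4, "T": 5, "C": 6}
VALUES = {
    "R": {"EQUAL": 3, "GREATER_OR_EQUAL": 4},
    "P": {"FFIELD": 1, "FSUBFIELD": 2, "ANY_POSITION_IN_FIELD": 3},
    "S": {"PHRASE": 1, "WORD": 2, "WORD_LIST": 6},
    "T": {"RIGHT_TRUNCATION": 1, "DO_NOT_TRUNCATE": 100},
    "C": {"INCOMPLETE_SUBFIELD": 1, "CSUBFIELD": 2, "CFIELD": 3},
}

def parse_attribute_spec(spec):
    """Turn an attribute spec into a list of (attribute type, value) integer pairs."""
    if any(ch.isspace() for ch in spec):
        raise ValueError("no whitespace allowed in an attribute spec")
    use, *rest = spec.split("/")
    pairs = [(1, USE[use])]                # the <use> attribute is type 1
    for part in rest:
        letter, _, value = part.partition(":")
        pairs.append((TYPES[letter], VALUES[letter][value]))
    return pairs

print(parse_attribute_spec("SUBJECT/S:PHRASE/P:ANY_POSITION_IN_FIELD"))
# [(1, 21), (4, 1), (3, 3)]
print(parse_attribute_spec("DATE/R:GREATER_OR_EQUAL"))
# [(1, 30), (2, 4)]
```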

All of these parameterizations are controlled through arrays in src/z3950/api/client/bib1.h and src/z3950/api/client/bib1attr.h. So if you hear of a new kind of attribute, you can add it there.

Section 5: MARC Record Formatting

The contents of the summary and full records are decoded from the MARC format received from the Z39.50 server. How the user sees them is controlled by two files: 'bibbrief' for summaries and 'bibfull' for full records.

These two files are stored as bibbrief and bibfull in the db-config/z3950.format directory. As discussed in Configuring a Z39.50 Database, it is also possible to specify alternate bibfull and bibbrief files for specific databases.

Conventions: Comment lines begin and end with a slash ('/'). To put a space in your output, write an at-sign ('@'). The vertical bar ('|') denotes "default" and matches anything. The tilde ('~') means "not" and matches all but those named.

Each file consists of lines of tab-separated fields:

  short:     The short name string to be displayed.
  ind1:      MARC first indicator. 
  ind2:      MARC second indicator. 
  tag:       MARC tag. 
  subfields: MARC subfields present.
  summary:   Number of characters reserved for the field in the title display (bibbrief).

Control is exercised as follows: if the tag, indicators and subfields in the record match the contents of a line, the short name entry will be prepended and the data from the record will be displayed. In bibbrief, the summary field tells how wide to make the field in the title display; since the title display is fixed width, the number of characters specified in the summary column is always reserved for that shortname, even if no data is present.

Bibbrief Format Examples

/short   ind1   ind2   tag   subfields   summary / 
AU:@ |    |     100     |      20

Twenty characters out of the author appear in the title line display. All subfields are displayed.

SU:@ |    |     600     ~w      ~

Subject does not appear in the title line display. All subfields except w are displayed.

Bibfull Format Examples

/short   ind1   ind2   tag   subfields   summary / 
AU:@ |    |     100    ~w       ~

Special Rules

Following are the special rules that are currently available.

R1:

Used to extract holdings data from the MIT catalog.

Parameters 
0      "R1"

R2:

Special labels for MeSH subject headings.

Parameters 
0      "R2"

R2 was created to format the MeSH subject headings (as they occur in OCLC FirstSearch MEDLINE) into a more reasonable form. R2 looks for a MARC 650 tag with the second indicator set to '2'; this identifies the heading as MeSH. R2 assumes that the first occurrence of the 650_2 should be labeled as a "MeSH Major Subject", and all of the a subfields will be grouped under that label. The second occurrence of the 650_2 is assumed to hold "MeSH Minor Subject", and each subfield a entry will be grouped under that label, with each subfield starting on a new line. Subsequent 650_2 are not labelled, and will appear on a separate line following the "MeSH Minor Subject" entries.

Example: the following three MARC fields:

650 2 $a Aortic Aneurysm, Abdominal--surgery (*SU)
 $a *Implants, Artificial 
650 2 $a Aorta--pathology (PA)/ultrastructure (UL) $a Dogs
 $a Microscopy, Electron, Scanning $a Stents 
650 2 $a Animal $a Support, Non-U.S. Gov't

would be formatted and shown to the user as:

MESH MAJOR SUBJECT: Aortic Aneurysm, Abdominal--surgery (*SU)
                    *Implants, Artificial 
MESH MINOR SUBJECT: Aorta--pathology (PA)/ultrastructure (UL) 
                    Dogs 
                    Microscopy, Electron, Scanning 
                    Stents 
                    Animal 
                    Support, Non-U.S. Gov't
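
The R2 grouping shown above can be sketched in Python. This is a rough illustration rather than Willow's implementation: the function name format_mesh is hypothetical, and each 650_2 field is assumed to arrive as a list of its $a subfield strings.

```python
# Hypothetical sketch of the R2 labelling: the first 650_2 field is the
# major subject, the second is the minor subject, later ones are unlabelled.
LABELS = ["MESH MAJOR SUBJECT: ", "MESH MINOR SUBJECT: "]
INDENT = " " * len(LABELS[0])


def format_mesh(fields):
    """fields: one list of $a subfield strings per 650_2 occurrence."""
    lines = []
    for occurrence, subfields in enumerate(fields):
        for index, text in enumerate(subfields):
            first_of_labelled = occurrence < len(LABELS) and index == 0
            prefix = LABELS[occurrence] if first_of_labelled else INDENT
            lines.append(prefix + text)
    return lines


record = [
    ["Aortic Aneurysm, Abdominal--surgery (*SU)", "*Implants, Artificial"],
    ["Aorta--pathology (PA)/ultrastructure (UL)", "Dogs",
     "Microscopy, Electron, Scanning", "Stents"],
    ["Animal", "Support, Non-U.S. Gov't"],
]
display = "\n".join(format_mesh(record))
```

With the three example fields as input, display reproduces the eight lines shown above, with the third field's entries falling under the minor subject label.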

R3-R7:

These varargs rules use '!' to separate the arguments. Each rule must be terminated with a '!'.

R3:

Substitutes a text string for the label defined in the bib file.

Parameters 
0   "R3" 
1   (decimal) field tag 
2   string for first occurrence of MARC field (if null, 
    use bibfile label) 
3   string for second occurrence of MARC field 
.   .
:   : 
n   string for n-1th occurrence of MARC field

Example:  AUTHOR:@  |  |  R3!700!!OTHER@AUTHORS:@!  |   ~   
  will display
    700 1  |a Mahin, John Lee, |d 1907-
    700 10 |a Arthur, Robert.
    700 10 |a Wayne, John, |d 1907-1979.
    700 10 |a Ross, Katherine.
  as:
        AUTHOR: Mahin, John Lee, 1907-
 OTHER AUTHORS: Arthur, Robert.
                Wayne, John, 1907-1979.
                Ross, Katherine.
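
How the R3 arguments map to labels can be sketched in Python. The function name r3_labels is hypothetical, and the behavior for occurrences beyond the last argument (no label at all) is inferred from the example above, not from a stated rule.

```python
def r3_labels(bib_label, rule, occurrences):
    """Return the label to print for each occurrence of the MARC field."""
    # The rule ends with '!', so the last piece of the split is empty.
    args = rule.split("!")[:-1]
    if args[0] != "R3":
        raise ValueError("not an R3 rule: " + rule)
    per_occurrence = args[2:]          # args[1] is the field tag
    labels = []
    for i in range(occurrences):
        if i < len(per_occurrence):
            # A null string means "use the label from the bib file".
            raw = per_occurrence[i] or bib_label
            labels.append(raw.replace("@", " "))
        else:
            labels.append("")          # assumption: later occurrences unlabelled
    return labels


labels = r3_labels("AUTHOR:@", "R3!700!!OTHER@AUTHORS:@!", 4)
```

For the four 700 fields in the example, labels holds "AUTHOR: ", "OTHER AUTHORS: " and two empty strings. Right-aligning the labels, as the display above does, is left out of this sketch.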

R4:

Extracts data from a fixed field.

Parameters
0       "R4"
1       (decimal) field tag
2       (decimal/hex/octal) beginning character offset
3       (decimal/hex/octal) ending character offset
        (leading 0x = hex; leading 0 = octal)
         must be >= beginning offset

Example:   OCLC@NO:@  |  |   R4!001!0!10!   |     ~
  will display "001 ocm20885381" as:
    OCLC NO: ocm20885381
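
The offset handling for R4 can be sketched in Python. The names parse_offset and apply_r4 are hypothetical, and treating the ending offset as inclusive is an assumption inferred from the example (offsets 0 through 10 return all eleven characters of the field).

```python
def parse_offset(text):
    """Decimal by default; leading 0x = hex; leading 0 = octal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)


def apply_r4(rule, field_data):
    """Extract the characters between the two offsets of an R4 rule."""
    # The rule ends with '!', so the last piece of the split is empty.
    name, _tag, begin_text, end_text = rule.split("!")[:-1]
    if name != "R4":
        raise ValueError("not an R4 rule: " + rule)
    begin, end = parse_offset(begin_text), parse_offset(end_text)
    if end < begin:
        raise ValueError("ending offset must be >= beginning offset")
    return field_data[begin:end + 1]   # assumption: ending offset is inclusive


oclc_number = apply_r4("R4!001!0!10!", "ocm20885381")
```

Note that a leading zero changes the base: "010" is read as octal 8, not decimal 10.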

R5:

Substitutes data in a field.

Parameters
0       "R5"
1       (decimal) field tag
2       data for first occurrence of MARC field
3       data for second occurrence of MARC field
.       .
:       :
n       data for n-1th occurrence of MARC field

Example:   HOLDINGS:@  |   |  R5!948!Check@at@ILL.@!  |  ~
  will display "948 $hLIBRARIES: STATE: LA  LIBRARY: LMJ" as:
    HOLDINGS: Check at ILL.

R6:

Extract and format date based on contents of leader byte 7 and MARC field 008.

Parameters
0       "R6"
1       (decimal) field tag (use 008)

Example:   YEAR:@     |  |  R6!008!   |   ~
  will cause dates to be formatted as:
     /07       008/6-14        display
       s       d19041905       YEAR: 1904-1905
       s       d181u18uu       YEAR: 181u-18uu
       m       i17651770       YEAR: 1765-1770
       m       r19911926       YEAR: 1991, 1926
       m       c19701958       YEAR: 1970, 1958
       m       d198606         YEAR: June, 1986
       m       s1903           YEAR: 1903
       s       c19869999       YEAR: 1986
       m       b               YEAR:
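
The table above can be read as a small decision procedure. The Python sketch below reproduces only the rows listed in the table. The source does not state the actual R6 logic, so the branching here is inferred from those rows and the function name format_year is hypothetical.

```python
import calendar


def format_year(leader07, f008):
    """leader07: 's' or 'm'; f008: the 008/6-14 slice, e.g. 'd19041905'."""
    code = f008[:1]      # date type code
    date1 = f008[1:5]
    date2 = f008[5:9]
    if code == "b" or not date1:
        return ""
    if code == "s":
        return date1
    if leader07 == "s":
        # Serial rows: 'd' shows a range, 'c' shows only the first date.
        return date1 if code == "c" else date1 + "-" + date2
    if code == "i":
        return date1 + "-" + date2
    if code == "d":
        # Monograph 'd' row: the second date carries a month number.
        return calendar.month_name[int(date2[:2])] + ", " + date1
    return date1 + ", " + date2      # 'r' and 'c' rows


ceased_serial = format_year("s", "d19041905")
detailed_date = format_year("m", "d198606")
```

Any combination of leader byte and date code not listed in the table falls through to one of these branches by accident rather than by design, so this sketch should not be relied on beyond the rows shown.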

R7:

Appends text string at the end of each subfield in a MARC field.

Parameters
0       "R7"
1       (decimal) field tag
2       string to append to each subfield except the last 
        in the MARC field
3       string to append to the last subfield in the MARC field

Example:  SUBJECT:@ |  |  R7!650!@--@!.!  |  ~
  will cause  
    "650 $aUnited States $aHistory $yCivil War 1861-1865" 
  to be formatted as:
    SUBJECT: United States -- History -- Civil War 1861-1865.
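
The R7 behavior amounts to joining the subfields with one string and ending with another. A minimal Python sketch, assuming the subfields arrive as a list of strings (the function name apply_r7 is hypothetical):

```python
def apply_r7(rule, subfields):
    """Append the R7 strings to the subfields of one MARC field."""
    # The rule ends with '!', so the last piece of the split is empty.
    name, _tag, between, last = rule.split("!")[:-1]
    if name != "R7":
        raise ValueError("not an R7 rule: " + rule)
    # '@' stands for a space, as elsewhere in the format files.
    between = between.replace("@", " ")
    last = last.replace("@", " ")
    return between.join(subfields) + last


subject = apply_r7(
    "R7!650!@--@!.!",
    ["United States", "History", "Civil War 1861-1865"],
)
```

Appending the first string to every subfield but the last is the same as joining on it, which is why a single join suffices here.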

Section 6: Creating a Z39.50 Database

This section gives some pointers for finding a Z39.50 compatible server to put your data into, to allow you to search with Willow. The development of Z39.50 software is progressing quite rapidly, so the information in this section may be out of date by the time you read it.

The best place to start looking for current Z39.50 information is the Z39.50 Resources page (http://ds.internic.net/z3950/z3950.html). The Library of Congress, in its capacity as the Z39.50 maintenance agency, also maintains a resource page (http://lcweb.loc.gov/z3950/agency/).

The Isite software package from CNIDR provides a text indexer, search engine, and Z39.50 communications stack to access databases. This is a versatile package that offers features such as field-specific searching, field maps, and Boolean operators. See http://vinca.cnidr.org/software/Isite/Isite.html for information on the current version.

The National Library of Canada has made their server code available, but it includes only the protocol engine -- you still have to integrate a search engine. Denmark's Index Data offers a version 3 protocol engine and API called YAZ; again, you must integrate your own search engine. There are also commercial search engines like OCLC's Newton that come with a Z39.50 protocol stack, but they tend to be fairly expensive for light-duty use.

WAIS is an extremely popular database system that speaks Z39.50. However, Willow speaks Z39.50 version 2 (1995), with some version 3 enhancements, while WAIS-0.2 and WAIS-0.3 speak Z39.50 (1988), which is not compatible with version 2.

At the University of Washington, we are currently using Willow to provide campus-wide access to a growing list of FirstSearch databases at OCLC (http://www.oclc.org/) and to the Zephyr database suite at RLG (http://www.rlg.org/toc.html). We are testing Z39.50 access to MEDLINE from servers at the National Library of Medicine. Most recently, we have begun testing with servers at SilverPlatter Information, who supply several hundred database titles worldwide.

Questions and comments about Willow to:
willow@cac.washington.edu