Chapter 4: Volume Server Architecture

Section 4.1: Introduction

The Volume Server allows administrative tasks and probes to be performed on the set of AFS volumes residing on the machine on which it is running. As described in Chapter 2, client applications use a distributed database holding volume location information, the VLDB, to locate these volumes. Volume Server functions are typically invoked either directly by authorized users via the vos utility or by the AFS backup system.
This chapter briefly discusses various aspects of the Volume Server's architecture. First, the high-level on-disk representation of volumes is covered. Next, the transactions used in conjunction with volume operations are examined, followed by the program implementing the Volume Server, volserver. The nature and format of the log file kept by the Volume Server round out the description. As with all AFS servers, the Volume Server uses the Rx remote procedure call package for communication with its clients.

Section 4.2: Disk Representation

For each volume on an AFS partition, there exists a file visible in the unix name space which describes the contents of that volume. By convention, each of these files is named by concatenating the prefix string "V", the numerical volume ID, and the suffix string ".vol". Thus, file V0536870918.vol describes the volume whose numerical ID is 0536870918. Internally, each per-volume descriptor file has such fields as a version number, the numerical volume ID, and the numerical parent ID (useful for read-only or backup volumes). It also has a list of related inodes, namely files which are not visible from the unix name space (i.e., they do not appear as entries in any unix directory object). The most important related inodes are:
  • Volume info inode: This field identifies the inode which hosts the on-disk representation of the volume's header. It is very similar to the information pointed to by the volume field of the struct volser_trans defined in Section 5.4.1, recording important status information for the volume.
  • Large vnode index inode: This field identifies the inode which holds the list of vnode identifiers for all directory objects residing within the volume. These entries are "large" because each must also hold the Access Control List (ACL) information for the corresponding AFS directory.
  • Small vnode index inode: This field identifies the inode which holds the list of vnode identifiers for all non-directory objects hosted by the volume.
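The naming convention and descriptor contents above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the struct vol_descriptor type, its field names and widths, and the helper functions vol_descriptor_name() and vol_id_from_name() are hypothetical and do not describe the real on-disk layout. The ten-digit zero padding is inferred from the example name V0536870918.vol.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory view of a per-volume descriptor file.
 * Field names and widths are illustrative, not the real on-disk layout. */
struct vol_descriptor {
    uint32_t version;           /* descriptor format version */
    uint32_t volume_id;         /* numerical volume ID */
    uint32_t parent_id;         /* parent ID, meaningful for RO/backup volumes */
    uint32_t info_inode;        /* inode holding the volume header */
    uint32_t large_index_inode; /* vnode index for directories (with ACLs) */
    uint32_t small_index_inode; /* vnode index for all other objects */
};

/* Build "V<id>.vol" into buf.  The ten-digit zero padding is inferred
 * from the example name V0536870918.vol.  Returns 0, or -1 if buf is
 * too small. */
int vol_descriptor_name(uint32_t volume_id, char *buf, size_t len)
{
    int n = snprintf(buf, len, "V%010lu.vol", (unsigned long)volume_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Inverse mapping: recover the volume ID from a descriptor file name.
 * Returns 0 on success, -1 if the name does not follow the convention. */
int vol_id_from_name(const char *name, uint32_t *volume_id)
{
    uint64_t id = 0;

    /* "V" + 10 digits + ".vol" is exactly 15 characters. */
    if (strlen(name) != 15 || name[0] != 'V' || strcmp(name + 11, ".vol") != 0)
        return -1;
    for (int i = 1; i <= 10; i++) {
        if (!isdigit((unsigned char)name[i]))
            return -1;
        id = id * 10 + (uint64_t)(name[i] - '0');
    }
    if (id > UINT32_MAX)
        return -1;
    *volume_id = (uint32_t)id;
    return 0;
}
```

For example, vol_descriptor_name(536870918, buf, sizeof buf) yields "V0536870918.vol", and vol_id_from_name() maps such a name back to its ID, which is the kind of lookup a tool scanning a partition for descriptor files would need.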
All of the actual files and directories residing within an AFS volume, as identified by the contents of the large and small vnode index inodes, are also free-floating inodes that do not appear in the conventional unix name space. This is why the vendor-supplied fsck program should not be run on partitions containing AFS volumes. Since the inodes making up AFS files and directories, as well as the inodes serving as their volume indices, are not referenced by any directory, the standard fsck program would throw away all of these "unreferenced" inodes. Thus, a special version of fsck is provided that handles partitions containing AFS volumes as well as standard unix partitions.
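The danger can be illustrated with a deliberately simplified reachability check. The sketch below is a toy model, not how any real fsck is written: the struct toy_fs type and count_unreferenced() are hypothetical. It marks every inode reachable from the root directory and counts the allocated inodes left unmarked, which is the set a conventional fsck would reclaim.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_INODES 16

/* Toy file system: link[d][i] is true when directory inode d has an
 * entry naming inode i.  Inode 0 is the root directory.  This is a
 * deliberately simplified model, not how any real fsck is written. */
struct toy_fs {
    size_t ninodes;
    bool allocated[MAX_INODES];
    bool link[MAX_INODES][MAX_INODES];
};

/* Depth-first walk marking every inode reachable through directories. */
static void mark_reachable(const struct toy_fs *fs, size_t ino, bool *seen)
{
    if (seen[ino])
        return;
    seen[ino] = true;
    for (size_t i = 0; i < fs->ninodes; i++)
        if (fs->link[ino][i])
            mark_reachable(fs, i, seen);
}

/* Count allocated inodes that no directory references: the ones a
 * conventional fsck would treat as orphans and reclaim. */
size_t count_unreferenced(const struct toy_fs *fs)
{
    bool seen[MAX_INODES] = { false };
    size_t orphans = 0;

    mark_reachable(fs, 0, seen);
    for (size_t i = 0; i < fs->ninodes; i++)
        if (fs->allocated[i] && !seen[i])
            orphans++;
    return orphans;
}
```

In a model where the root directory names only a V....vol descriptor file, the volume info inode and the two vnode index inodes are all counted as unreferenced, even though the volume depends on them.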

Section 4.3: Transactions

Each individual volume operation is carried out by the Volume Server as a transaction, but not in the atomic sense of the word. Logically, creating a Volume Server transaction can be equated with performing an "exclusive open" on the given volume before beginning the actual work of the desired volume operation. No other Volume Server (or File Server) operation is allowed on the opened volume until the transaction is terminated. Thus, transactions in the context of the Volume Server provide mutual exclusion without any of the normal atomicity guarantees. Instead, volumes maintain enough internal state to enable recovery from interrupted or failed operations via the salvager program, which is run whenever volume inconsistencies are detected and attempts to correct them.
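The "exclusive open" semantics can be sketched as a small table of open transactions. This is a hypothetical, single-threaded illustration: struct trans_table, trans_begin(), trans_end(), and the TRANS_BUSY/TRANS_FULL codes are invented names, and a real server would also guard the table with a lock. Note that ending a transaction rolls nothing back.

```c
#include <stdint.h>

#define MAX_TRANS 64
#define TRANS_BUSY (-1)   /* volume already has an open transaction */
#define TRANS_FULL (-2)   /* no free slot in the table */

/* Hypothetical table of open transactions, one slot per transaction. */
struct trans_table {
    int in_use[MAX_TRANS];
    uint32_t volid[MAX_TRANS];
};

/* "Exclusive open": grant a slot only if no transaction is already
 * open on volid.  Returns the slot index, TRANS_BUSY, or TRANS_FULL. */
int trans_begin(struct trans_table *t, uint32_t volid)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_TRANS; i++) {
        if (t->in_use[i] && t->volid[i] == volid)
            return TRANS_BUSY;          /* mutual exclusion */
        if (!t->in_use[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return TRANS_FULL;
    t->in_use[free_slot] = 1;
    t->volid[free_slot] = volid;
    return free_slot;
}

/* End a transaction.  Nothing is rolled back: changes the operation
 * already made stay on disk, which is why recovery relies on the
 * salvager rather than on an atomic commit. */
void trans_end(struct trans_table *t, int slot)
{
    t->in_use[slot] = 0;
}
```

A second trans_begin() on the same volume fails with TRANS_BUSY until trans_end() releases the first, while transactions on other volumes proceed independently.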
Volume transactions have timeouts associated with them. This guarantees that the death of the agent performing a given volume operation cannot remove the volume from circulation permanently. There are actually two timeout periods defined for a volume transaction. The first is the warning time, defined to be 5 minutes. If a transaction lasts for more than this time period without making progress, the Volume Server prints a warning message to its log file (see Section 4.5). The second is the hard timeout, which occurs 10 minutes after the most recent progress on the given operation. After this period, the transaction is unconditionally deleted, and the volume is freed for other operations. Transactions are reference-counted. A transaction is deemed to have made progress, and its internal timeclock field is updated, when:
  1. The transaction is first created.
  2. A reference is made to the transaction, causing the Volume Server to look it up in its internal tables.
  3. The transaction's reference count is decremented.
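The timeout rules above can be sketched as follows. The struct vtrans type and the function names are hypothetical; only the 5-minute warning time, the 10-minute hard timeout, and the three progress events come from the text. Treating a lookup as taking a reference (incrementing the count) is an assumption made so that the decrement in event 3 has something to balance.

```c
#include <time.h>

#define TRANS_WARN_SECS (5 * 60)    /* warning time: 5 minutes */
#define TRANS_DEAD_SECS (10 * 60)   /* hard timeout: 10 minutes */

/* Hypothetical transaction record; only the fields needed here. */
struct vtrans {
    int refcount;
    time_t last_progress;   /* the "timeclock" field */
};

/* Progress event 1: the transaction is created. */
void trans_created(struct vtrans *t, time_t now)
{
    t->refcount = 1;
    t->last_progress = now;
}

/* Progress event 2: a lookup (assumed here to take a reference). */
void trans_looked_up(struct vtrans *t, time_t now)
{
    t->refcount++;
    t->last_progress = now;
}

/* Progress event 3: the reference count is decremented. */
void trans_released(struct vtrans *t, time_t now)
{
    t->refcount--;
    t->last_progress = now;
}

enum trans_state { TRANS_OK, TRANS_WARN, TRANS_DEAD };

/* What a garbage-collection pass would decide for this transaction. */
enum trans_state trans_check(const struct vtrans *t, time_t now)
{
    double idle = difftime(now, t->last_progress);

    if (idle > TRANS_DEAD_SECS)
        return TRANS_DEAD;   /* delete the transaction, free the volume */
    if (idle > TRANS_WARN_SECS)
        return TRANS_WARN;   /* log a warning, keep the transaction */
    return TRANS_OK;
}
```

Because every progress event resets last_progress, a long-running operation that keeps touching its transaction never times out, while one whose agent has died is reclaimed at most ten minutes after its last activity.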

Section 4.4: The volserver Process

The volserver user-level program is run on every AFS server machine, and implements the Volume Server agent. It is responsible for providing the Volume Server interface as defined by the volint.xg Rxgen file.
The volserver process defines and launches five threads to perform the bulk of its duties. One thread implements a background daemon whose job it is to garbage-collect timed-out transaction structures. The other four threads are RPC interface listeners, primed to accept remote procedure calls and thus perform the defined set of volume operations.
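The thread layout described above (one garbage-collection daemon plus four RPC listeners) can be sketched with POSIX threads. This is an assumption-laden illustration: POSIX threads stand in for whatever threading package the actual volserver uses, and gc_daemon(), rpc_listener(), and launch_threads() are hypothetical names. The stand-in thread bodies return immediately so the sketch can join them.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_LISTENERS 4

/* Counters so the sketch can show which threads ran. */
static atomic_int gc_threads, listener_threads;

/* Stand-in for the garbage-collection daemon.  The real one would loop,
 * periodically reaping timed-out transactions; this returns at once. */
static void *gc_daemon(void *arg)
{
    (void)arg;
    atomic_fetch_add(&gc_threads, 1);
    return NULL;
}

/* Stand-in for an RPC listener.  The real one would block waiting for
 * incoming volume-operation calls; this returns at once. */
static void *rpc_listener(void *arg)
{
    (void)arg;
    atomic_fetch_add(&listener_threads, 1);
    return NULL;
}

/* Launch one daemon plus N_LISTENERS listeners, then join them.
 * Returns 0 on success, -1 if any thread could not be created. */
int launch_threads(void)
{
    pthread_t tids[1 + N_LISTENERS];
    int created = 0, rc = 0;

    if (pthread_create(&tids[created], NULL, gc_daemon, NULL) == 0)
        created++;
    else
        rc = -1;
    for (int i = 0; rc == 0 && i < N_LISTENERS; i++) {
        if (pthread_create(&tids[created], NULL, rpc_listener, NULL) == 0)
            created++;
        else
            rc = -1;
    }
    /* Join whatever was started, even on failure, so no thread leaks. */
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

Keeping the daemon separate from the listeners means timed-out transactions are reclaimed even when every listener is busy serving a long volume operation.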
Certain non-standard configuration settings are made for the RPC subsystem by the volserver program. For example, it chooses to extend the length of time that an Rx connection may remain idle from the default 12 seconds to 120 seconds. The reasoning here is that certain volume operations may take longer than 12 seconds of processing time on the server, and thus the default setting for the connection timeout value would incorrectly terminate an RPC when in fact it was proceeding normally and correctly.
The volserver program takes a single, optional command line argument. If a positive integer value is provided on the command line, it is used to set the debugging level within the Volume Server. By default, a value of zero is used, specifying that no special debugging output is generated and written to the Volume Server log file described below.
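The argument handling above can be sketched as a small parser. The function name parse_debug_level() is hypothetical, and rejecting malformed, extra, or non-positive arguments with -1 is an assumption; the text only states that a positive integer sets the level and that the default is zero.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parser for the volserver's single optional argument.
 * Returns the debug level (0 when no argument is given), or -1 if the
 * argument is not a positive integer (an assumed error policy). */
int parse_debug_level(int argc, char **argv)
{
    char *end;
    long level;

    if (argc < 2)
        return 0;                      /* default: no debugging output */
    if (argc > 2)
        return -1;                     /* only one argument is accepted */
    errno = 0;
    level = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0' ||
        level <= 0 || level > INT_MAX)
        return -1;
    return (int)level;
}
```

With this sketch, running the program with no arguments yields level 0, and an argument of "3" yields level 3.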

Section 4.5: Log File

The Volume Server keeps a log file, recording the set of events of special interest it has encountered. The file is named VolserLog, and is stored in the /usr/afs/logs directory on the local disk of the server machine on which the Volume Server runs. This is a human-readable file, with every entry time-stamped.
Whenever the volserver program restarts, it renames the current VolserLog file to VolserLog.old and starts a fresh log. A properly authorized individual can easily inspect the log file residing on any given server machine. This is made possible by the BOS Server AFS agent running on the machine, which allows the contents of this file to be fetched and displayed on the caller's machine via the bos getlog command.
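The restart behavior and time-stamped entries described above can be sketched with standard C file calls. The functions volser_log_open() and volser_log() are hypothetical, and the log directory is a parameter (rather than the fixed /usr/afs/logs) so the sketch can be tried anywhere. The strftime() format approximates the timestamps in the excerpt below.

```c
#include <stdio.h>
#include <time.h>

/* Rotate <dir>/VolserLog to <dir>/VolserLog.old and open a fresh log.
 * The text places the log in /usr/afs/logs; dir is a parameter so the
 * sketch can be tried elsewhere.  Returns NULL on failure. */
FILE *volser_log_open(const char *dir)
{
    char cur[4096], old[4096];
    int n1 = snprintf(cur, sizeof cur, "%s/VolserLog", dir);
    int n2 = snprintf(old, sizeof old, "%s/VolserLog.old", dir);

    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof cur || (size_t)n2 >= sizeof old)
        return NULL;
    (void)rename(cur, old);   /* may fail harmlessly if there is no previous log */
    return fopen(cur, "w");
}

/* Append one time-stamped, human-readable entry.
 * Returns 0 on success, -1 on failure. */
int volser_log(FILE *log, const char *msg)
{
    char stamp[64];
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);

    if (tm == NULL || strftime(stamp, sizeof stamp, "%a %b %e %H:%M:%S %Y", tm) == 0)
        return -1;
    if (fprintf(log, "%s %s\n", stamp, msg) < 0)
        return -1;
    return fflush(log) == 0 ? 0 : -1;   /* flush so the entry survives a crash */
}
```

Renaming rather than truncating keeps exactly one previous generation of the log available, which is often the one that explains why the server restarted.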
An excerpt from a Volume Server log file follows below. The numbers appearing in square brackets at the beginning of each line have been inserted so that we may reference the individual lines of the log excerpt in the following paragraph.
[1] Wed May 8 06:03:00 1991 AttachVolume: Error attaching volume /vicepd/V1969547815.vol; volume needs salvage
[2] Wed May 8 06:03:01 1991 Volser: ListVolumes: Could not attach volume 1969547815
[3] Wed May 8 07:36:13 1991 Volser: Clone: Cloning volume 1969541499 to new volume 1969541501
[4] Wed May 8 11:25:05 1991 AttachVolume: Cannot read volume header /vicepd/V1969547415.vol
[5] Wed May 8 11:25:06 1991 Volser: CreateVolume: volume 1969547415 (bld.dce.s3.dv.pmax_ul3) created
Line [1] indicates that the volume whose numerical ID is 1969547815 could not be attached on partition /vicepd. This error is probably the result of an aborted transaction which left the volume in an inconsistent state, or of actual damage to the volume structure or data. In this case, the Volume Server recommends that the salvager program be run on this volume to restore its integrity. Line [2] records the operation which revealed this situation, namely the invocation of an AFSVolListVolumes() RPC.
Line [4] reveals that the volume header file for a specific volume could not be read. Line [5], like line [2] above, identifies the operation that produced this message. Someone had called the AFSVolCreateVolume() interface function, and as a precaution, the Volume Server first checked whether such a volume was already present by attempting to read its header.
Having thus verified that the volume did not previously exist, the Volume Server allowed the AFSVolCreateVolume() call to continue its processing, creating and initializing the proper volume file, V1969547415.vol, and the associated header and index inodes.
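The check-then-create precaution seen in lines [4] and [5] can be sketched as follows. The function create_volume_file() is hypothetical, it assumes a POSIX system where fopen() sets errno on failure, and it creates only an empty descriptor file; writing the real header and creating the index inodes is omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: create the descriptor file for volid in the
 * partition directory part, but only if it does not already exist.
 * Returns 0 if created, 1 if the volume was already present, -1 on error. */
int create_volume_file(const char *part, uint32_t volid)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/V%010lu.vol", part, (unsigned long)volid);
    FILE *f;

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    /* Step 1: try to read an existing header (the source of a
     * "Cannot read volume header" message when it is absent). */
    f = fopen(path, "r");
    if (f != NULL) {
        fclose(f);
        return 1;                     /* volume already exists: refuse */
    }
    if (errno != ENOENT)
        return -1;                    /* some other, unexpected failure */

    /* Step 2: header absent, so create the descriptor file.  A real
     * server would also write the header and create the index inodes. */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fclose(f);
    return 0;
}
```

The two steps are not atomic: another process could create the file between them. On POSIX systems, open(2) with O_CREAT | O_EXCL would combine the check and the creation into one call.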