Chapter 2: File Server Architecture

Section 2.1: Overview

The AFS File Server is a user-level process that presides over the raw disk partitions on which it supports one or more volumes. It provides 'half' of the fundamental service of the system, namely exporting and regimenting access to the user data entrusted to it. The Cache Manager provides the other half, acting on behalf of its human users to locate and access the files stored on the file server machines.
This chapter examines the structure of the File Server process. First, the set of AFS agents with which it must interact is discussed. Next, the threading structure of the server is examined. Some details of its handling of the race conditions created by the callback mechanism are then presented. This is followed by a discussion of the read-only volume synchronization mechanism. This functionality is used in each RPC interface call and is intended to detect new releases of read-only volumes. File Servers do not generate callbacks for objects residing in read-only volumes, so this synchronization information is used to implement a 'whole-volume' callback. Finally, the fact that the File Server may drop certain information recorded about the Cache Managers with which it has communicated and yet guarantee correctness of operation is explored.

Section 2.2: Interactions

By far the most frequent partner in File Server interactions is the set of Cache Managers actively fetching and storing chunks of data files for which the File Server provides central storage facilities. The File Server also periodically probes the Cache Managers recorded in its tables with which it has recently dealt, determining if they are still active or whether their records might be garbage-collected.
There are two other server entities with which the File Server interacts, namely the Protection Server and the BOS Server. Given a fetch or store request generated by a Cache Manager, the File Server needs to determine if the caller is authorized to perform the given operation. An important step in this process is to determine what is referred to as the caller's Current Protection Subdomain, or CPS. A user's CPS is a list of principals, beginning with the user's internal identifier, followed by the numerical identifiers for all groups to which the user belongs. Once this CPS information is determined, the File Server scans the ACL controlling access to the file system object in question. If it finds that the ACL contains an entry specifying a principal with the appropriate rights which also appears in the user's CPS, then the operation is cleared. Otherwise, it is rejected and a protection violation is reported to the Cache Manager for ultimate reflection back to the caller.
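The CPS-against-ACL scan described above can be illustrated with a minimal sketch. The rights bits, the dictionary representation of an ACL, and the function name is_authorized() are assumptions made for illustration only; they are not taken from the File Server sources, and the sketch follows only the single-entry matching rule stated in the text.

```python
from typing import Dict, List

# Hypothetical rights bits; names and values are illustrative only.
READ = 0x1
WRITE = 0x2


def is_authorized(cps: List[int], acl: Dict[int, int], needed: int) -> bool:
    """Return True if some principal in the CPS has an ACL entry
    granting every right in `needed`."""
    for principal in cps:
        granted = acl.get(principal, 0)  # 0: principal has no ACL entry
        if (granted & needed) == needed:
            return True
    return False


# Made-up identifiers: user 1001, member of groups -204 and -310.
cps = [1001, -204, -310]
# Made-up ACL: group -204 may read; principal 42 may read and write.
acl = {-204: READ, 42: READ | WRITE}

can_read = is_authorized(cps, acl, READ)    # cleared via group -204
can_write = is_authorized(cps, acl, WRITE)  # rejected: protection violation
```

In this example the read request is cleared because group -204 appears both in the CPS and in the ACL with the needed right, while the write request is rejected because the only entry granting it names a principal outside the CPS.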
The BOS Server performs administrative operations on the File Server process. Thus, their interactions are quite one-sided, and always initiated by the BOS Server. The BOS Server does not utilize the File Server's RPC interface, but rather generates unix signals to achieve the desired effect.

Section 2.3: Threading

The File Server is organized as a multi-threaded server. Its threaded behavior within a single unix process is achieved by use of the LWP lightweight process facility, as described in detail in the companion "AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure Call Facility" document. The various threads utilized by the File Server are described below:
  • WorkerLWP: This lightweight process sleeps until a request to execute one of the RPC interface functions arrives. It pulls the relevant information out of the request, including any incoming data delivered as part of the request, and then executes the server stub routine to carry out the operation. The thread finishes its current activation by feeding the return code and any output data back through the RPC channel to the calling Cache Manager. The File Server initialization sequence specifies that at least three but no more than six of these WorkerLWP threads are to exist at any one time. It is currently not possible to configure the File Server process with a different number of WorkerLWP threads.
  • FiveMinuteCheckLWP: This thread runs every five minutes, performing such housekeeping chores as cleaning up timed-out callbacks, setting disk usage statistics, and executing the special handling required by certain AIX implementations. Generally, this thread performs activities that do not take unbounded time to accomplish and do not block the thread. If reassurance is required, FiveMinuteCheckLWP can also be told to print out a banner message to the machine's console every so often, stating that the File Server process is still running. This is not strictly necessary and is an artifact from earlier versions, as the File Server's status is now easily accessible at any time through the BOS Server running on its machine.
  • HostCheckLWP: This thread, also activated every five minutes, performs periodic checking of the status of Cache Managers that have been previously contacted and thus appear in this File Server's internal tables. It generates RXAFSCB_Probe() calls from the Cache Manager interface, and may find itself suspended for an arbitrary amount of time when it encounters unreachable Cache Managers.
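
As a rough illustration of the HostCheckLWP pass, the sketch below probes every recorded Cache Manager and discards the records of those that do not answer. The table layout, the check_hosts() name, and the policy of dropping a record after a single failed probe are all assumptions; the text states only that unresponsive hosts become candidates for garbage collection. The probe argument stands in for the probe call to the Cache Manager.

```python
from typing import Callable, Dict, List


def check_hosts(host_table: Dict[str, dict],
                probe: Callable[[str], bool]) -> List[str]:
    """Probe every recorded host; drop and return those that do not answer."""
    dropped = []
    for host in list(host_table):  # copy the keys: the table is mutated below
        if not probe(host):
            del host_table[host]   # assumed policy: drop on first failure
            dropped.append(host)
    return dropped


# Made-up host names; only "cm-a" answers the stand-in probe.
table = {"cm-a": {"callbacks": 3}, "cm-b": {"callbacks": 0}}
dropped = check_hosts(table, probe=lambda host: host == "cm-a")
```

Because a real probe may block for a long time on an unreachable host, it is natural for this work to live in its own thread rather than in FiveMinuteCheckLWP.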

Section 2.4: Callback Race Conditions

Callbacks serve to implement the efficient AFS cache consistency mechanism, as described in Section 1.1.1. Because of the asynchronous nature of callback generation and the multi-threaded operation and organization of both the File Server and Cache Manager, race conditions can arise in their use. As an example, consider the case of a client machine fetching a chunk of file X. The File Server thread activated to carry out the operation ships the contents of the chunk and the callback information over to the requesting Cache Manager. Before the corresponding Cache Manager thread involved in the exchange can be scheduled, another request arrives at the File Server, this time storing a modified image of the same chunk from file X. Another worker thread comes to life and completes processing of this second request, including execution of an RXAFSCB_CallBack() to the Cache Manager, which still hasn't picked up on the results of its fetch operation. If the Cache Manager blindly honors the RXAFSCB_CallBack() operation first and then proceeds to process the fetch, it will wind up believing it has a callback on X when in reality it is out of sync with the central copy on the File Server. To resolve the above class of callback race condition, the Cache Manager effectively double-checks the callback information received from File Server calls, making sure it hasn't already been nullified by other file system activity.
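One way such a double-check could work is sketched below. It is an assumption, not a description of the actual Cache Manager code: each cache entry keeps a count of callback breaks, a fetch remembers the count when it starts, and the callback delivered with the fetch reply is installed only if the count is unchanged. All class, method, and field names are hypothetical.

```python
from typing import Dict


class CacheEntry:
    def __init__(self) -> None:
        self.has_callback = False
        self.break_count = 0  # incremented on every callback break


class CacheManager:
    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}

    def _entry(self, fid: str) -> CacheEntry:
        return self.entries.setdefault(fid, CacheEntry())

    def begin_fetch(self, fid: str) -> int:
        """Remember how many breaks had been seen when the fetch was issued."""
        return self._entry(fid).break_count

    def on_callback_break(self, fid: str) -> None:
        """Handle a callback break arriving from the File Server."""
        entry = self._entry(fid)
        entry.break_count += 1
        entry.has_callback = False

    def finish_fetch(self, fid: str, seen_breaks: int) -> bool:
        """Install the callback only if no break arrived during the fetch."""
        entry = self._entry(fid)
        entry.has_callback = (entry.break_count == seen_breaks)
        return entry.has_callback


cm = CacheManager()

# Racing case: the break for X is processed before the fetch reply.
token_x = cm.begin_fetch("X")
cm.on_callback_break("X")
kept_x = cm.finish_fetch("X", token_x)

# Quiet case: nothing intervenes, so the callback is trusted.
token_y = cm.begin_fetch("Y")
kept_y = cm.finish_fetch("Y", token_y)
```

In the racing case the Cache Manager declines to record a callback on X, so its next access revalidates the chunk with the File Server instead of trusting stale data.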

Section 2.5: Read-Only Volume Synchronization

The File Server issues a callback for each file chunk it delivers from a read-write volume, thus allowing Cache Managers to efficiently synchronize their local caches with the authoritative File Server images. However, no callbacks are issued when data from read-only volumes is delivered to clients. Thus, it is possible for a new snapshot of the read-only volume to be propagated to the set of replication sites without Cache Managers becoming aware of the event and marking the appropriate chunks in their caches as stale. Although the Cache Manager refreshes its volume version information periodically (once an hour), there is still a window where a Cache Manager will fail to notice that it has outdated chunks.
The volume synchronization mechanism was defined to close this window, resulting in what is nearly a 'whole-volume' callback device for read-only volumes. Each File Server RPC interface function handling the transfer of file data is equipped with a parameter (a volSyncP), which carries this volume synchronization information. This parameter is set to a non-zero value by the File Server exclusively when the data being fetched is coming from a read-only volume. Although the struct AFSVolSync (defined in Section 5.1.2.2) passed via a volSyncP consists of six longwords, only the first one is set. This leading longword carries the creation date of the read-only volume. The Cache Manager immediately compares the synchronization value stored in its cached volume information against the one just received. If they are identical, then the operation is free to complete, secure in the knowledge that all the information and files held from that volume are still current. A mismatch, though, indicates that every file chunk from this volume is potentially out of date, having come from a previous release of the read-only volume. In this case, the Cache Manager proceeds to mark every chunk from this volume as suspect. The next time the Cache Manager considers accessing any of these chunks, it first checks with the File Server from which the chunks were obtained to see if they are up to date.
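The comparison performed by the Cache Manager can be sketched as follows. The CachedVolume class, the string chunk states, and the apply_vol_sync() function are hypothetical names introduced for illustration; recording the new creation date after a mismatch is likewise an assumption, since the text does not say when the cached value is refreshed.

```python
from typing import Dict


class CachedVolume:
    def __init__(self, creation_date: int) -> None:
        self.creation_date = creation_date  # last synchronization value seen
        self.chunks: Dict[str, str] = {}    # chunk id -> "valid" or "suspect"


def apply_vol_sync(volume: CachedVolume, sync_value: int) -> bool:
    """Return True if cached chunks from this volume are still current."""
    if sync_value == 0:
        # Zero: the data did not come from a read-only volume.
        return True
    if sync_value == volume.creation_date:
        return True
    # Mismatch: a new release exists, so every chunk becomes suspect.
    for chunk_id in volume.chunks:
        volume.chunks[chunk_id] = "suspect"
    volume.creation_date = sync_value  # assumed: remember the new release
    return False


# Made-up creation dates and chunk names.
vol = CachedVolume(creation_date=700000000)
vol.chunks = {"f1:0": "valid", "f2:0": "valid"}

same_release = apply_vol_sync(vol, 700000000)
new_release = apply_vol_sync(vol, 700086400)
```

After the second call both chunks are marked suspect, so each must be rechecked with the File Server before it is used again.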

Section 2.6: Disposal of Cache Manager Records

Every File Server, when first starting up, will, by default, allocate enough space to record 20,000 callback promises (see Section 5.3 for how to override this default). Should the File Server fully populate its callback records, it will not allocate more, since doing so would allow its memory image to grow in an unbounded fashion. Rather, the File Server chooses to break callbacks until it acquires a free record. All reachable Cache Managers respond by marking their cache entries appropriately, preserving the consistency guarantee. In fact, a File Server may arbitrarily and unilaterally purge itself of all records associated with a particular Cache Manager. Such actions will reduce its performance (forcing these Cache Managers to revalidate items cached from that File Server) without sacrificing correctness.
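A fixed-capacity callback table of this kind can be sketched as follows. The CallbackTable class and its method names are hypothetical, and breaking the oldest promise first is an assumption: the text does not say which callbacks the File Server chooses to break when it runs out of records.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


class CallbackTable:
    def __init__(self, capacity: int,
                 break_callback: Callable[[str, str], None]) -> None:
        self.capacity = capacity
        self.break_callback = break_callback  # notifies the Cache Manager
        self.records: "OrderedDict[Tuple[str, str], bool]" = OrderedDict()

    def add(self, host: str, fid: str) -> None:
        """Record a callback promise, breaking old ones if the table is full."""
        key = (host, fid)
        if key in self.records:
            self.records.move_to_end(key)  # refresh an existing promise
            return
        while len(self.records) >= self.capacity:
            # Assumed policy: break the oldest promise to free a record.
            (old_host, old_fid), _ = self.records.popitem(last=False)
            self.break_callback(old_host, old_fid)
        self.records[key] = True


broken: List[Tuple[str, str]] = []
table = CallbackTable(capacity=2,
                      break_callback=lambda h, f: broken.append((h, f)))
table.add("cm-a", "f1")
table.add("cm-b", "f2")
table.add("cm-c", "f3")  # table is full: the promise to cm-a is broken
```

The table never grows past its capacity, and the Cache Manager whose promise was broken simply revalidates its copy on next use, which costs performance but not correctness.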