[274] in Athena_Backup_System
Re: comments on crippled mode spec (diffs at the end)
daemon@ATHENA.MIT.EDU (Bill Cattey)
Wed Jun 26 00:56:03 1996
Date: Wed, 26 Jun 1996 00:55:54 -0400 (EDT)
From: Bill Cattey <wdc@MIT.EDU>
To: delgado@MIT.EDU
Cc: athena-backup@MIT.EDU
In-Reply-To: <9606211412.AA00955@ops-5.MIT.EDU>
I changed the Crippled mode spec to reflect your comments.
Note: I've changed ABS_SLAVEINT to ABS_SLAVESERVICE, although the old
terminology still occurs in the abs_master.tex and slave.tex documents.
ABS_SLAVEINT Actions is renamed to ABS_SLAVESERVICE Actions.
The abs_master section called ABS_SLAVEINT Interface Definition should
probably change to ABS_SLAVESERVICE Interface Definition.
(Or did I misunderstand what the correct and consistent naming should be?)
One comment:
Section: Dump
...
Add after step 4: Query the vldb or file server for the list
of volumes.
I did not incorporate this because the steps are operator actions.
The narrative before and after tells what the ABS Crippled Mode
Master does. If those sections are still not explicit enough, I'll change
them. They currently say that the Crippled Mode Master queries
the file server.
Double check and see if I've got the tape validation protocol right this time.
I have added a thread that tells how the Crippled Mode Master does
job abort and job ID handling.
I added a few lines to the previously empty Crippled Mode Dump log
format section, but I really need help here specifying exactly what it
should look like.
I've done up two diagrams: one showing a normal mode Master with Media
Slaves, and one showing the Crippled Mode Master with Media Slaves. I
hope these boxes and arrows will put a final end to our confusion. I'll tune
up these diagrams with Diane and integrate them appropriately into the
spec.
-wdc
Here are context diffs of the new crippled.tex for those who would
rather not look at the whole doc again.
*** /tmp/,RCSt1020951 Wed Jun 26 00:54:02 1996
--- crippled.tex Wed Jun 26 00:47:53 1996
***************
*** 1,5 ****
\documentstyle[11pt,fullpage,epsf,titlepage]{article}
! % $Header: /afs/athena.mit.edu/astaff/project/athena-backup/doc/draft/RCS/crippled.tex,v 1.2 96/06/26 00:53:04 wdc Exp Locker: wdc $
\begin{document}
\title{The Athena Backup System Crippled Mode Specification}
\author{Bill Cattey and Jonathon Weiss}
--- 1,5 ----
\documentstyle[11pt,fullpage,epsf,titlepage]{article}
! % $Header: /afs/athena.mit.edu/astaff/project/athena-backup/doc/draft/RCS/cripple.tex,v 1.1 96/06/11 13:27:30 wdc Exp Locker: wdc $
\begin{document}
\title{The Athena Backup System Crippled Mode Specification}
\author{Bill Cattey and Jonathon Weiss}
***************
*** 10,21 ****
\section{Introduction}
The Athena Backup System (ABS) was designed to provide a high
! level of functionality. The Master is a complex component
! capable of controlling many Slaves. The design decision
! was made to layer the Master on a relational database
! to provide a conceptually simple, but powerful and
! extensible back end for present and future functionality.
Deciding to allow so much complexity in the implementation
of critical services like backup and restore involves risk.
--- 10,27 ----
\section{Introduction}
The Athena Backup System (ABS) was designed to provide a high
! level of functionality. The goal is to build
! a new backup system that offers scalability,
! reliability, flexibility, efficient and easy data retrieval,
! almost completely automated operation with configurable error
! routing, and extensive built-in recovery.
+ To meet the goal, a system with multiple media slave servers
+ controlled by a central master server and informed by a
+ relational database was designed. Layering the Master on
+ a relational database provides a conceptually simple, but
+ powerful and extensible back end for present and future
+ functionality.
Deciding to allow so much complexity in the implementation
of critical services like backup and restore involves risk.
***************
*** 32,74 ****
the steps to take to prevent and recover from outages, it
was agreed that additional effort should be taken to ensure
the availability of key ABS services in the event of a
! prolonged database or Master outage. Crippled Mode does this.
Crippled Mode
provides the ability to dump and restore with the same level
of effort that the pre-ABS backup procedures required.
The Crippled Mode system and procedures dovetail into
! ABS: Dump logs made in Crippled Mode are merged
back into the ABS database when it comes back up.
The value-added functionality of keeping dump status and
schedules in a relational database is preserved. Critical
! services stay up in the face of catastrophic failures of
! complex service components.
! In order to perform dumps and restores, even in Crippled Mode,
! four things are needed:
! \begin {itemize}
! \item
! a program to read and write the tapes.
! \item
! a user interface.
! \item
! a way to convert commands entered at the UI into read
! and write operations to be performed on the tape.
! \item
! a way to gather status reports from the program that reads
! and writes tapes and properly act on them -- updating
! status records, or passing messages or requests for operator
! action to the user interface.
! \end {itemize}
!
! In Crippled Mode the standard tape slave will be used to
read and write the tapes. A special Crippled Mode master will
! talk to the tape slaves using the ABS\_SLAVE and ABS\_SLAVEINT
remote procedure call interfaces. The Crippled Mode
! master will contain the user interface.
\section{Media Slave}
--- 38,77 ----
the steps to take to prevent and recover from outages, it
was agreed that additional effort should be taken to ensure
the availability of key ABS services in the event of a
! prolonged database or Master outage. Crippled Mode
! addresses these concerns.
Crippled Mode
provides the ability to dump and restore with the same level
of effort that the pre-ABS backup procedures required.
The Crippled Mode system and procedures dovetail into
! ABS: dump logs made in Crippled Mode are merged
back into the ABS database when it comes back up.
The value-added functionality of keeping dump status and
schedules in a relational database is preserved. Critical
! services remain available in the face of catastrophic
! failures of complex service components.
+ In normal operation of ABS, the relational database
+ keeps track of all volumes
+ (including their full and incremental dump dates),
+ all tapes, all dump schedules, and allows
+ great flexibility in specifying dumps and restores.
! In Crippled Mode, the relational database is offline.
! Locating the tape relevant to a restore of a particular
! dump date of a particular volume or partition is a
! manual process. Scheduling of dumps is radically simplified:
! the operator specifies that the dump of a particular
! volume, or all the volumes in a partition, or all the
! partitions of a file server begins {\em now}.
! In Crippled Mode the standard tape slave is used to
read and write the tapes. A special Crippled Mode master will
! communicate with the tape slaves using the ABS\_SLAVE and
! ABS\_SLAVESERVICE
remote procedure call interfaces. The Crippled Mode
! Master will contain the user interface.
\section{Media Slave}
***************
*** 75,95 ****
The normal media slave will handle all of the I/O to
backup media during operation of the system in Crippled
Mode. Since this is its normal job, no additional
! development will have to be done to this part of the
system. As during normal operations, the media slave
exports the ABS\_SLAVE interface in order to receive
jobs. The media slave will return the results of those
! jobs to the ABS\_SLAVEINT interface exported by the
! Crippled Mode master. If the Crippled Mode master is
! run on a machine other than the one that normally
! runs the master, and hence with a different
! authentication key, it will be necessary to reconfigure
! the media slave allow connections from the new host/key.
\section{User Interface}
The user interface will be integral to the Crippled
! Mode master. When the Crippled Mode master is started,
the operator will be presented with a shell-like
interface.
--- 78,101 ----
The normal media slave will handle all of the I/O to
backup media during operation of the system in Crippled
Mode. Since this is its normal job, no additional
! development is needed for this part of the
system. As during normal operations, the media slave
exports the ABS\_SLAVE interface in order to receive
jobs. The media slave will return the results of those
! jobs to the ABS\_SLAVESERVICE interface exported by the
! Crippled Mode Master.
+ All calls made on the ABS\_SLAVE interface use an RPC
+ timeout value of 0. This causes the RPC to queue the
+ request with the Media Slave and return rather than
+ waiting for the call to complete. The ABS\_SLAVESERVICE
+ requests documented here are used to return status
+ information from the Media Slaves.
+
\section{User Interface}
The user interface will be integral to the Crippled
! Mode Master. When the Crippled Mode Master is started,
the operator will be presented with a shell-like
interface.
***************
*** 140,146 ****
\item
Determine which machine will run the Crippled Mode Master.
\item
! Reconfigure and/or restart the media slaves as necessary.
(See below.)
\item
Verify that the normal Master is indeed shut down.
--- 146,152 ----
\item
Determine which machine will run the Crippled Mode Master.
\item
! Reconfigure and signal the media slaves as necessary.
(See below.)
\item
Verify that the normal Master is indeed shut down.
***************
*** 148,161 ****
Start the Crippled Mode Master.
\end {enumerate}
! If the Crippled Mode master is
run on a machine other than the one that normally
runs the master, and hence with a different
authentication key, it will be necessary to reconfigure
- and restart
the media slaves allow connections from the new host/key.
\subsection{Performing a Dump}
When this procedure is followed, the Crippled Mode
--- 154,177 ----
Start the Crippled Mode Master.
\end {enumerate}
! If the Crippled Mode Master is
run on a machine other than the one that normally
runs the master, and hence with a different
authentication key, it will be necessary to reconfigure
the media slaves allow connections from the new host/key.
+ The Media Slave will re-read its configuration file
+ on receipt of a SIGHUP signal. Slaves that are already
+ dumping or restoring will switch over to the
+ Crippled Mode Master gracefully without interruption.
+ The Crippled Mode Master will, just as the normal mode
+ Master does, maintain slave address and port information
+ in a file and as returned by
+ {\it abs\_register\_device}. The slaves will send this
+ call to the Master when they notice that the Crippled
+ Mode Master has come online.
+
\subsection{Performing a Dump}
When this procedure is followed, the Crippled Mode
***************
*** 165,171 ****
the operator for action to mount tapes.
When a tape dump is complete, the new tape has been written
! and the Crippled mode Dump Log contains entries naming
relevant volumes and their dump dates.
In the case of a full server dump, the Crippled Mode Master
--- 181,187 ----
the operator for action to mount tapes.
When a tape dump is complete, the new tape has been written
! and the Crippled Mode Dump Log contains entries naming
relevant volumes and their dump dates.
In the case of a full server dump, the Crippled Mode Master
***************
*** 174,180 ****
In the case of a partition dump, the Crippled Mode Master
is responsible for obtaining a list of all volumes from the
! vile server volume location service.
\begin {enumerate}
\item
--- 190,196 ----
In the case of a partition dump, the Crippled Mode Master
is responsible for obtaining a list of all volumes from the
! file server volume location service.
\begin {enumerate}
\item
***************
*** 183,188 ****
--- 199,209 ----
\item
Issue the server dump, partition dump, or volume dump command.
\item
+ Mount the proper tape. (See below.)
+ \item
+ Respond to the user interface prompt to specify the Media
+ Slave to be used for the tape operation.
+ \item
Respond to the prompts for the server name, and/or partition
name, and/or volume name as necessary.
\item
***************
*** 189,196 ****
In the case of AFS, respond to the prompt and choose either
read/write or clone volume dumps.
\item
- Mount the proper tape. (See below.)
- \item
If there are more dumps to perform, return to step 1 above
otherwise, proceed as with normal ABS operation to wait
for job status.
--- 210,215 ----
***************
*** 206,214 ****
\begin {enumerate}
\item
! The Crippled Mode Master
! will expect a {\it validate\_media\_label} with {\it ABS\_NO\_LABEL}
! from the slave.
\item
The Crippled Mode Master will expect an {\it abs\_gen\_label}
from the slave.
--- 225,235 ----
\begin {enumerate}
\item
! The Crippled Mode Master may receive an
! {\it abs\_validate\_tape} request from the slave.
! In this case the Crippled Mode Master should report
! the tape's state as {\it ABS\_UNKNOWN\_MEDIA}. This will
! cause the slave to reject the tape and request another.
\item
The Crippled Mode Master will expect an {\it abs\_gen\_label}
from the slave.
***************
*** 220,226 ****
\end {enumerate}
If this sequence completes successfully, the {\it abs\_backup}
! operation is initiated.
The Crippled Mode Master marks the
tape as {\it RESERVED} when, in the expected course of
the backup operation, the slave
--- 241,248 ----
\end {enumerate}
If this sequence completes successfully, the {\it abs\_backup}
! operation is initiated. The Crippled Mode Master will
! assign the job an ID, to facilitate tracking the job.
The Crippled Mode Master marks the
tape as {\it RESERVED} when, in the expected course of
the backup operation, the slave
***************
*** 227,240 ****
sends {\it validate\_label} to the Crippled Mode
Master.
! When the dump completes, the {\it abs\_job\_status} call from
the slave will cause the Crippled Mode Master to write out
! appropriate dump log entries for every volume successfully dumped.
The tape label will be removed from the Crippled Mode Master's
active file which has the effect of calling the tape {\it VALID}
(i.e. not to be overwritten) for all future operations.
! Later, when the dump log is merged back
into the database, the operator will be expected to tell
what name was written physically on the tape so that the
Master will properly associate the internal and external
--- 249,267 ----
sends {\it validate\_label} to the Crippled Mode
Master.
! Proper Dump Log entries will be created by the Crippled
! Mode Master from
! {\it abs\_dump\_checkpoint} messages received from
! the slave in the course of performing the dump.
!
! When the dump completes, the {\it abs\_job\_done} call from
the slave will cause the Crippled Mode Master to write out
! appropriate Dump Log entries for every volume successfully dumped.
The tape label will be removed from the Crippled Mode Master's
active file which has the effect of calling the tape {\it VALID}
(i.e. not to be overwritten) for all future operations.
! Later, when the Dump Log is merged back
into the database, the operator will be expected to tell
what name was written physically on the tape so that the
Master will properly associate the internal and external
***************
*** 259,264 ****
--- 286,294 ----
\item
Issue the partition restore or volume restore command.
\item
+ Respond to the user interface prompt to specify the Media
+ Slave to be used for the tape operation.
+ \item
Respond to the prompt for the location of the restored
partition or volume.
\item
***************
*** 269,274 ****
--- 299,308 ----
for job status.
\end {enumerate}
+ The Crippled Mode Master will initiate the {\it abs\_restore}
+ operation and
+ assign the job an ID to facilitate tracking the job.
+
Finding the proper tape for a restore means reverting
to the old process of using grep or some utility to
search an ASCII file kept by the operators. The ABS
***************
*** 283,290 ****
relevant tape for a restore.
{\bf NOTE:} To make searching easier, the Crippled Mode
! Log and the ABS volume report have the same format.
\subsection{Returning from Crippled Mode to Normal Mode}
When this procedure is complete, the Media Slaves are
--- 317,352 ----
relevant tape for a restore.
{\bf NOTE:} To make searching easier, the Crippled Mode
! Dump Log and the ABS volume report have the same format.
+ \subsection{Aborting a Dump or Restore}
+
+ When a Media Slave notices the startup of a Master
+ (either normal or Crippled Mode) it sends information
+ about jobs that were in progress after the Master went
+ down. The Crippled Mode Master uses this information
+ to create a proper Dump Log entry, and records job IDs
+ of jobs that it did not initiate so that the operator
+ can still take control of those jobs and abort them if
+ necessary.
+
+ \begin {enumerate}
+ \item
+ If the operator does not know the ID of the job, issue
+ the list jobs command to find out the IDs of jobs running
+ on the Media Slaves.
+ \item
+ Issue the job abort command specifying the job ID.
+ \end {enumerate}
+
+ The Crippled Mode Master will send the {\it abs\_abort\_job}
+ request, and at the next opportunity, the affected
+ Media Slave will abort the job and signal that it has
+ done so.
+
+ Proper Crippled Mode log entries will be output with the
+ job
+
\subsection{Returning from Crippled Mode to Normal Mode}
When this procedure is complete, the Media Slaves are
***************
*** 306,315 ****
the Crippled Mode, reconfigure and restart them to
run under the normal mode Master.
\item
! Locate the Crippled Mode dump logs.
\item
Use the Crippled Mode Merge command to merge the
! Crippled Mode dump logs into the Master Database.
\item
Resume normal ABS operations.
\end {enumerate}
--- 368,377 ----
the Crippled Mode, reconfigure and restart them to
run under the normal mode Master.
\item
! Locate the Crippled Mode Dump Logs.
\item
Use the Crippled Mode Merge command to merge the
! Crippled Mode Dump Logs into the Master Database.
\item
Resume normal ABS operations.
\end {enumerate}
***************
*** 316,330 ****
\section{File Format for ABS Volume Report / Crippled Mode Log}
! \section{ABS\_SLAVEINT Actions}
In the same way that the normal mode ABS Master responds
to Remote Procedure Calls from the Slave, so too the
Crippled Mode Master responds.
! This section is based on the ABS\_SLAVEINT Interface
Definition section from the ABS Master Specification.
These routines are called exclusively by the Media Slave when it has
--- 378,417 ----
\section{File Format for ABS Volume Report / Crippled Mode Log}
+ In order to perform a Crippled Mode restore, the following
+ information is needed:
! \begin{itemize}
! \item
! The name of the volume
! \item
! The date of the dump
! \item
! The tape the dump is on
! \item
! The increment date (i.e., the time since which files have changed, for
! an incremental dump, or 0 if this is a full dump).
! \item
! The server name that houses the volume.
! \item
! The partition that houses the volume.
! \end{itemize}
+ \section{ABS\_SLAVESERVICE Actions}
+
In the same way that the normal mode ABS Master responds
to Remote Procedure Calls from the Slave, so too the
Crippled Mode Master responds.
! All calls made on the ABS\_SLAVE interface use an RPC
! timeout value of 0. This causes the RPC to queue the
! request with the Media Slave and return rather than
! waiting for the call to complete. The ABS\_SLAVESERVICE
! requests documented here are used to return status
! information from the Media Slaves.
!
! This section is based on the ABS\_SLAVESERVICE Interface
Definition section from the ABS Master Specification.
These routines are called exclusively by the Media Slave when it has
***************
*** 643,649 ****
This is to prevent
tapes with valid data from being inadvertently overwritten.
This call will return the status of the medium from the viewpoint of
! the Crippled Mode master.
A medium is considered useable for write by a slave under the following
circumstances
--- 730,736 ----
This is to prevent
tapes with valid data from being inadvertently overwritten.
This call will return the status of the medium from the viewpoint of
! the Crippled Mode Master.
A medium is considered useable for write by a slave under the following
circumstances
***************
*** 656,662 ****
\end{itemize}
! The Crippled Mode master does not have a database, but it does
maintain a file of the tapes it is writing dumps to. When a Crippled
Mode Dump is initiated, the tape Slave must have a tape that has
NO ABS Format label.
--- 743,749 ----
\end{itemize}
! The Crippled Mode Master does not have a database, but it does
maintain a file of the tapes it is writing dumps to. When a Crippled
Mode Dump is initiated, the tape Slave must have a tape that has
NO ABS Format label.