[270] in Athena_Backup_System
comments on crippled mode spec
daemon@ATHENA.MIT.EDU (delgado@MIT.EDU)
Fri Jun 21 10:12:11 1996
From: delgado@MIT.EDU
To: wdc@MIT.EDU
Cc: athena-backup@MIT.EDU
Date: Fri, 21 Jun 1996 10:12:03 EDT
Section: Intro
>Deciding to allow so much complexity in the implementation
>of critical services like backup and restore involves risk.
I'd like to see this re-worded. It can be read as saying
that the ABS vision was very simplistic and that the team
condoned the addition of complexity to the system. In reality
the ABS vision and requirements have imposed this so-called
complexity on the system. (Remember WE wanted scalability,
reliability, flexibility, efficient and easy data retrieval,
an almost completely automated system with configurable error
routing, and lots of built-in recovery.)
....>prolonged database or Master outage. Crippled Mode does this.
Change "Crippled Mode does this." to something like
"Crippled Mode was conceived to address these concerns."
....>services stay up in the face of catastrophic failures of
Change "stay up" to "are available"
Subsection: 4 components
I can somewhat understand why we are trying to use generic names
here, but I think this section would be more succinct if
we used RPC terminology and ABS interface names.
>a way to convert commands entered at the UI into read
>and write operations to be performed on the tape.
I am still confused about what this really is. Is this
supposed to be a method which accepts user input for
backup/restore requests and issues the appropriate ABSSLAVE
interface calls to instruct the slave to execute the request?
>a way to gather status reports from the program that reads
>and writes tapes and properly act on them -- updating
>status records, or passing messages or requests for operator
>action to the user interface.
I would just use the proper RPC terminology to shorten this;
e.g., "a program which exports the ABS_SLAVESERVICE interface
to handle slave service requests".
>In Crippled Mode the standard tape slave will be used to
>read and write the tapes. A special Crippled Mode master will
>talk to the tape slaves using the ABS\_SLAVE and ABS\_SLAVEINT
>remote procedure call interfaces. The Crippled Mode
>master will contain the user interface.
Change "talk to" to "communicate with"
Change ABS_SLAVEINT to ABS_SLAVESERVICE.
Section: Media Slave
Para. 1 - SLAVEINT => SLAVESERVICE
I would add a note that the slave reconfiguration on change of
master can occur without disrupting the normal operation
of the slave process (i.e., you do not have to restart
the slave process.)
Section: Crippled Mode Functions
>The operator is expected to know which tape is relevant and
>to mount it on the appropriate drive when prompted.
>The procedure for determining the proper tape in a dump
>or restore is described below in the Crippled Mode Procedures
>section.
It should also be mentioned that unlike the ABS Master, crippled
mode will not automatically choose a slave. The admin/operator
will be required to specify a slave host and device.
Need to specify how the crippled mode process finds the slave's port.
Will it support the ability to cancel executing jobs?
Section: Entering Crippled Mode
>authentication key, it will be necessary to reconfigure
>and restart
>the media slaves allow connections from the new host/key.
No, we don't have to restart slaves if the master changes.
We change their slave.conf file and send a HUP to them to
get them to re-read their config file. This allows
slaves that are already dumping/restoring to switch over to the
crippled mode master without interruption.
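To illustrate the reload-on-HUP behavior, here is a rough sketch in
Python. It is not the slave's actual code: the Slave class, the
"master <host>" line format for slave.conf, and the step() loop are all
made up for the example. The point is only that the signal handler sets
a flag and the running process picks up the new master on its next pass,
without restarting or dropping work in progress.

```python
import signal


class Slave:
    """Toy stand-in for a media slave; every name here is hypothetical."""

    def __init__(self, conf_path):
        self.conf_path = conf_path
        self.reload_requested = False
        self.master = None
        self.steps_done = 0
        self.load_config()

    def load_config(self):
        # Assumed file format: "master <hostname>" on a line of its own.
        with open(self.conf_path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[0] == "master":
                    self.master = parts[1]

    def on_hup(self, signum, frame):
        # Do no real work in the handler; just note that a reload is wanted.
        self.reload_requested = True

    def step(self):
        # One pass of the main loop: apply a pending reload, then carry on
        # with whatever dump/restore work is in progress.
        if self.reload_requested:
            self.reload_requested = False
            self.load_config()
        self.steps_done += 1


def install(slave):
    # SIGHUP exists on POSIX systems only.
    signal.signal(signal.SIGHUP, slave.on_hup)
```

After install(slave), editing slave.conf and running "kill -HUP <pid>"
would switch the toy slave to the new master on its next step().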
Section: Dump
>In the case of a partition dump, the Crippled Mode Master
>is responsible for obtaining a list of all volumes from the
>vile server volume location service.
Yes it is "vile" but I think it should be "file".
Add after step 4: Query the vldb or file server for the list
of volumes.
Tape validation protocol:
First step: The Crippled master could get an "abs_validate_tape"
request from a slave if a labeled medium is inserted in the
drive. In this case the Crippled master should report
the tape's state as "ABS_UNKNOWN_MEDIA". This will cause the slave
to reject the tape and request another. (Validation simply
returns the tape's state to the slave.)
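A rough Python sketch of this exchange, for illustration only. The
request names "abs_validate_tape" and "abs_gen_label" and the state
"ABS_UNKNOWN_MEDIA" come from the comments here; the classes, the return
values, and the label format are invented and are not the real RPC
signatures. The slave-side function also models the blank-tape case
covered in the next paragraph.

```python
ABS_UNKNOWN_MEDIA = "ABS_UNKNOWN_MEDIA"


class CrippledMaster:
    """Hypothetical crippled-mode master with no tape database behind it."""

    def __init__(self):
        self.next_label = 1

    def abs_validate_tape(self, label):
        # Without the database the master cannot vouch for any labeled
        # tape, so it always reports the state as unknown.
        return ABS_UNKNOWN_MEDIA

    def abs_gen_label(self):
        # Made-up label format, just so the example returns something.
        label = "CM%06d" % self.next_label
        self.next_label += 1
        return label


def slave_mount(master, tape_label):
    """Model of the slave side; tape_label is None for a blank tape."""
    if tape_label is None:
        # Blank tape: no validation request, only a label request.
        return ("accepted", master.abs_gen_label())
    state = master.abs_validate_tape(tape_label)
    if state == ABS_UNKNOWN_MEDIA:
        # Slave rejects the tape and asks the operator for another.
        return ("rejected", None)
    return ("accepted", tape_label)
```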
In the case of a blank tape, the slave does not issue an
"abs_validate_tape" request. It will only issue the
"abs_gen_label" request.
"abs_job_status" handling. We shouldn't be logging these
because they are only sent when the slave begins the dump
of a volume. We can display the info in the ui for
tracking purposes. The reason we don't want to
log these is that it's possible the tape might subsequently
experience media errors or validation failure, etc.
We log the volume/tape mappings only at job_done time or
at tape checkpoint time. It is only at this time that we are
assured that the tape is good.
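A minimal Python sketch of this display-now, log-later handling. The
JobTracker class and its method signatures are hypothetical. It also
simplifies by treating every volume reported so far for a job as good
once a job_done or checkpoint arrives.

```python
class JobTracker:
    """Hypothetical bookkeeping for the crippled-mode master."""

    def __init__(self):
        self.display = []  # lines shown in the UI for tracking
        self.pending = {}  # job id -> (volume, tape) pairs not yet confirmed
        self.log = []      # confirmed volume/tape mappings

    def abs_job_status(self, job_id, volume, tape):
        # Sent when the slave begins dumping a volume: show it, but do not
        # log it, because the tape may still fail later.
        self.display.append("job %s: dumping %s to %s" % (job_id, volume, tape))
        self.pending.setdefault(job_id, []).append((volume, tape))

    def commit(self, job_id):
        # Called at job_done or tape checkpoint time, the only points at
        # which the tape is known to be good.
        self.log.extend(self.pending.pop(job_id, []))

    def abort(self, job_id):
        # Media error or validation failure: drop the unconfirmed mappings.
        self.pending.pop(job_id, None)
```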
Section: ABS_SLAVESERVICE description
All these calls expect a job id. Does crippled mode assign
job ids, etc., like the Master does? If so, we need to describe
what it does and how. If not, we need to mention that job ids are
meaningless.