[119] in Athena_Backup_System
performance plan draft
daemon@ATHENA.MIT.EDU (Diane Delgado)
Thu Jul 13 12:49:07 1995
To: athena-backup@MIT.EDU
Date: Thu, 13 Jul 1995 12:48:58 EDT
From: Diane Delgado <delgado@MIT.EDU>
Here's a first very rough draft on our performance/configuration plan.
Please think about it and give comments. There are some gaps
where some of you have more expertise to fill in than I do.
There is also a copy in athena-backup/doc/admin/config-guide
-------------------------
Configuration and Performance Measuring Plan
This document outlines the issues and tasks involved in executing
performance measurements on the Master and Slave with the goal of
producing a set of recommended system configurations.
MASTER
Assumptions:
Assume that the machine is dedicated for the purposes of
running the ABS master. This reduces the number of variables
we need to examine and also reduces the hardware demands on
the system.
Oracle Configuration
All configuration/performance testing will be done using the
same "basic" Oracle configuration which will be required on
the production system to help increase reliability, recoverability,
and availability.
Basic Config
ARCHIVELOG mode is enabled. This ensures that redo logs
are archived to some safe, stable storage; they can then
be used during disaster recovery, along with the periodic
database backup, to restore the database state to the point in
time of the last committed transaction.
Redo Logs are mirrored with 1 copy.
Redo Logs are configured with 2 groups to allow the LGWR process
to immediately switch to an available group while the other
group is being archived.
Control Files are mirrored with 1 copy.
SGA - should reside in wired-down memory
Basic Config Open Issues:
Archiving - what secondary storage medium are we archiving to?
How do we ensure the secondary storage does not fill up
before the next dbms backup is initiated?
SGA - Will Solaris let us pin this down in memory so that it is
not swapped?
Minimum Disk Layout - from the required basic config, we need
to generate the basic layout for the logs and data
across different disks using a minimum configuration
which will ensure availability and recoverability.
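One way to approach the "does not fill up" question above is a periodic
free-space check on the archive destination. The following is only a
rough sketch, and it assumes the archive logs land on a disk filesystem
(the draft leaves the medium open). The function names, the log
generation rate, and the 1.5x margin are all made up for illustration.

```python
import shutil


def hours_until_full(free_bytes, log_bytes_per_hour):
    """Hours of archiving left at the given (measured) log generation rate."""
    if log_bytes_per_hour <= 0:
        return float("inf")
    return free_bytes / log_bytes_per_hour


def archive_space_ok(archive_dir, log_bytes_per_hour, hours_to_next_backup,
                     margin=1.5):
    """True if archive_dir should last until the next dbms backup.

    archive_dir is a hypothetical archive destination; margin is an
    assumed safety factor, not a value taken from the plan.
    """
    free = shutil.disk_usage(archive_dir).free
    remaining = hours_until_full(free, log_bytes_per_hour)
    return remaining >= hours_to_next_backup * margin
```

If the check fails, operations could either trigger an early dbms backup
or move archived logs to secondary media.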
Config Variations
The following parameters will be adjusted to determine their
optimal values for the anticipated access patterns to the ABS
database:
LOG_BUFFER - specifies the size of the Redo Log Buffer in
bytes, which resides in the SGA.
DB_BLOCK_BUFFERS - The size of the database block buffer cache
in number of buffers. Resides in the SGA.
DB_BLOCK_SIZE - The size of each db block cache buffer in bytes,
should be a multiple of the OS filesystem buffer block size.
SORT_AREA -
RETAINED_SIZE - minimum size of sort area
SIZE - maximum size of sort area.
Rollback Segment Parameters - We need to play around with
various scenarios to determine how many we should have
and how large each should be.
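Since LOG_BUFFER and the block buffer cache both live in the SGA, and the
SGA should stay in wired-down memory, it may help to estimate the SGA
size for each trial setting before running it. Below is a rough sketch.
The trial values, the other_bytes term (standing in for the rest of the
SGA), and the "no more than half of physical memory" rule are assumptions
for illustration, not recommendations.

```python
MB = 1024 * 1024


def sga_estimate_bytes(db_block_buffers, db_block_size, log_buffer,
                       other_bytes=0):
    """Rough SGA size: buffer cache + redo log buffer + everything else."""
    return db_block_buffers * db_block_size + log_buffer + other_bytes


def block_size_ok(db_block_size, fs_block_size):
    """DB_BLOCK_SIZE should be a multiple of the filesystem block size."""
    return db_block_size % fs_block_size == 0


def fits_in_memory(sga_bytes, physical_bytes, max_fraction=0.5):
    """max_fraction is an assumed ceiling, not a figure from the plan."""
    return sga_bytes <= physical_bytes * max_fraction


# Hypothetical trial settings, checked against the 64 meg minimum system.
trials = [
    {"db_block_buffers": 1000, "db_block_size": 2048, "log_buffer": 64 * 1024},
    {"db_block_buffers": 4000, "db_block_size": 4096, "log_buffer": 256 * 1024},
    {"db_block_buffers": 8000, "db_block_size": 8192, "log_buffer": 512 * 1024},
]
for t in trials:
    size = sga_estimate_bytes(**t)
    verdict = "ok" if fits_in_memory(size, 64 * MB) else "too large"
    print(t, round(size / MB, 1), "MB", verdict)
```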
Unix System Configuration
Our target platform for the Master is Solaris, so all of this section
will be Solaris-specific. The system should have a minimum of
2 disk controllers and 64 meg of memory. We should also have
? number of disks available during this phase and some (how much?)
extra memory so we can assess the impact of varying numbers of disks and
memory on the performance of the system.
Need to identify the system (probably plover?).
In addition to hardware, we need to identify the minimum Athena
environment which needs to run on this system. There are a few
issues here:
AFS Client - It's not necessary for the proper functioning of
the ABS to run the AFS Client cache mgr. In fact, we probably
don't want to do this since AFS cache mgr will compete for
precious memory resources with Oracle. AFS may also jeopardize
the stability of the host system.
Other AFS daemons - what's necessary and what's not?
Zephyr - The ABS needs to use zephyr to send messages, but
it does not need to receive any. What's the minimum zephyr
configuration?
Mail - The ABS needs to be able to send mail; it does not
need to receive mail.
Any others?
Performance Measuring
Much of the following information has been assembled by studying
the various performance tuning guides provided for Oracle on Unix.
Devise Test Suite
We need to generate a batch of operations using the VRT tests
which we believe will reflect the "important" DBMS operations.
This test suite will be run multiple times on various configurations
and performance statistics will be gathered. The statistics
will be examined and we will select an optimal configuration
based on this information.
We also need to develop a "simulated workload" scenario which
should reflect the expected mix of operations on a typical
workday.
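A simulated workload of this kind could be driven by a weighted random
mix of operations. The sketch below shows the idea only. The operation
names and weights are invented placeholders; the real mix has to come
from what we expect on a typical workday.

```python
import random

# Hypothetical operation names and relative weights (not measured values).
WORKLOAD_MIX = {
    "record_volume_dump": 60,
    "catalog_query": 25,
    "restore_lookup": 10,
    "tape_label_update": 5,
}


def generate_workload(mix, n, seed=0):
    """Return n operation names drawn according to the weights in mix.

    A fixed seed makes the sequence repeatable, so every system
    configuration is measured against the same workload.
    """
    rng = random.Random(seed)
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


workload = generate_workload(WORKLOAD_MIX, 1000)
```

Each name in the list would then be mapped to the corresponding VRT test
or DBMS operation by the test driver.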
Identify System Configurations
Here we identify the system configurations of interest to be
used during the performance measurement.
?? TBD
Tune Application Data Access
Before we execute performance measurements on the various system
configurations, the queries in the application must be tuned
so they use resources optimally.
TKPROF/EXPLAIN PLAN - These are Oracle tools used to gauge the
resources consumed by a query. We turn on this type of tracing
at the session level, run the test suite, and then examine the
results. The Oracle documentation provides sufficient information
to interpret the results and tune the SQL statements.
After the adjustments are made, the Test suite is re-run and
stats are gathered and examined. Further adjustments are
made as necessary.
Identify Statistics
After the application data access has been tuned we are ready
to examine the system configuration.
The following statistics should be gathered during the performance
monitoring for each run of the Test Suite.
SQLDBA Stats - These will be gathered using a simulated
workload rather than the test suite since they
need to be generated using the "expected" mix of operations,
which may not be the same as the selected Test Suite.
MONITOR I/O - this generates stats on buffer cache usage.
We can adjust the size of the Oracle buffer cache if the
stats indicate we are experiencing a low hit ratio.
REDO STATS - this generates stats on how many pending
write requests there are for the redo log buffer.
Ideally this should be 0, and we increase the size of
the redo log buffer if we see values larger than 0.
Swapping - monitor swapping activity - swapouts should ideally
be zero.
i/o request queue - examine the average queue length and % busy
for each disk. An average queue length greater than
2-4 implies a possible bottleneck. The solution may be to
spread some of the data files across multiple disks.
?? others ?? (there are many recommendations in the Oracle
for Unix Tuning Guide but I'm tired of reading
it for now - add these later).
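The checks listed above could be applied mechanically to the numbers
collected from each run. Here is a minimal sketch. The hit ratio uses the
usual Oracle tuning formula, 1 - physical reads / (db block gets +
consistent gets). The dictionary keys, the 0.90 hit ratio floor, and the
choice of 2 as the queue length limit (the low end of the 2-4 range) are
assumptions for illustration.

```python
def buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Fraction of logical reads satisfied from the buffer cache."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None  # no activity, ratio is undefined
    return 1.0 - physical_reads / logical_reads


def flag_run(stats, min_hit_ratio=0.90, max_queue_len=2.0):
    """Return a list of warnings for one test run.

    stats is a hypothetical dict holding the figures gathered by hand
    from the Oracle monitor output and the Unix tools.
    """
    warnings = []
    ratio = buffer_cache_hit_ratio(stats["db_block_gets"],
                                   stats["consistent_gets"],
                                   stats["physical_reads"])
    if ratio is not None and ratio < min_hit_ratio:
        warnings.append("low buffer cache hit ratio: try more DB_BLOCK_BUFFERS")
    if stats["redo_space_requests"] > 0:
        warnings.append("waits on redo log buffer: try a larger LOG_BUFFER")
    if stats["swapouts"] > 0:
        warnings.append("swapouts seen: system is short of memory")
    for disk, queue_len in stats["disk_queue_len"].items():
        if queue_len > max_queue_len:
            warnings.append("disk %s queue length %.1f: possible bottleneck"
                            % (disk, queue_len))
    return warnings
```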
Disaster Recovery -
Need to execute disaster recovery (restore from backup and
archivelogs) to determine the amount of time required
for various DBMS sizes and archive log sizes. We need to
determine how often the backups need to run to ensure that
a complete recovery can be executed within the required 4-hour time limit.
This will involve performing a routine backup of the system,
executing a day's worth of backup operations, and then terminating
the system ungracefully. We then attempt to restore the
ABS database using the backup copy and the archive logs,
noting the size of the database and logs and the time required
to perform the restore. This also requires that we identify
the type of medium used for the archive log storage (e.g.,
some high-speed, high-capacity tape, preferably).
From this we should generate some guidelines which can be
used by the operations staff as they monitor the growth of
the database so that they can more accurately predict the
recovery time and adjust the dbms backup scheme accordingly.
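As a starting point for such guidelines, recovery time could be modeled
as the time to restore the backup plus the time to apply the archive logs
written since that backup. The sketch below assumes both steps scale
linearly with size, which the measurements would have to confirm. All
rates are inputs to be measured, and the figures in the usage note are
invented.

```python
def recovery_hours(db_size, restore_rate, log_size, apply_rate):
    """Estimated hours to restore the backup and roll forward the logs.

    Sizes are in any consistent unit (e.g. MB); rates are per hour.
    """
    return db_size / restore_rate + log_size / apply_rate


def max_backup_interval_hours(db_size, restore_rate, log_rate, apply_rate,
                              limit_hours=4.0):
    """Longest gap between dbms backups that keeps recovery under the limit.

    log_rate is how fast archive logs accumulate (size per hour).
    Returns 0.0 if restoring the backup alone already exceeds the limit.
    """
    budget = limit_hours - db_size / restore_rate
    if budget <= 0:
        return 0.0
    return budget * apply_rate / log_rate
```

For example, with a 2000 MB database restored at 1000 MB/hour, logs
accumulating at 50 MB/hour and applied at 500 MB/hour, the restore takes
2 hours, leaving 2 hours for log apply, so backups would need to run at
least every 20 hours.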
Both dev and operations should participate in the execution
of the disaster recovery to help ensure that several people are
skilled at this before the system goes into production.
Recommended Routine Monitoring
As an end result of this exercise, we should also be able to generate
a set of guidelines for routine monitoring of the production system
which should assist operations staff to keep the system running
efficiently. This information will be incorporated as part of
the ABS Administrators Guide.
SLAVES
Identify Configurations:
The ABS team recommends that we investigate the use of some high
performance configurations which will greatly increase the amount of data
which we can back up in a given day. Some of this work involves examining
new high speed, high capacity tape devices and how they perform
on the various unix systems of interest. After we have identified some
promising technologies and configurations, we would ideally like to
run the ABS Slaves on them and produce guidelines for recommended
configurations.
In addition, we will have to include as part of the performance
measuring, recommendations for the hardware configurations which are
currently in use for AFS backups, in the event that we are not able
to purchase more efficient hardware.
The following hardware configurations are of interest:
(Dave and Brian fill this in).
Athena Config
- Identify the minimum athena/software configuration for a Tape Slave.
??
Application Tuning
- Identify the optimal buffersize for each type of tape device to
ensure data writes are streamed efficiently.
?? what else
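Finding the buffer size could start with a simple timing loop that writes
the same amount of data with several block sizes and compares throughput.
This is only a sketch: the function names are made up, it has been
thought through for ordinary files only, and a real tape device may need
different open flags or error handling. Random data is used so that
drives with hardware compression do not report inflated numbers.

```python
import os
import time


def time_writes(path, block_size, total_bytes):
    """Write about total_bytes to path in block_size chunks.

    Returns throughput in bytes per second.
    """
    block = os.urandom(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as out:
        for _ in range(total_bytes // block_size):
            # An unbuffered write may be short, so count what was accepted.
            written += out.write(block)
    elapsed = time.perf_counter() - start
    return written / elapsed if elapsed > 0 else float("inf")


def best_block_size(path, sizes, total_bytes):
    """Try each block size; return (fastest size, {size: bytes/sec})."""
    results = {size: time_writes(path, size, total_bytes) for size in sizes}
    return max(results, key=results.get), results
```

Each candidate size should be run several times, since a single pass is
easily skewed by other activity on the system.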
Identify Tests
The following tests will be run with each configuration:
- raw read/write - do we want to do this?
- back up a full tape's worth of volumes; note the amount of
data backed up and how long this took.
- attempt to restore volumes which are at various locations
on the tape (including the end of the tape). Note how
long it takes to seek to the appropriate location and
begin the restore.
?? what else
Identify Performance Statistics
?? Figure out what we need to look at besides total time to backup
n kbytes of data.