

performance plan draft

daemon@ATHENA.MIT.EDU (Diane Delgado)
Thu Jul 13 12:49:07 1995

To: athena-backup@MIT.EDU
Date: Thu, 13 Jul 1995 12:48:58 EDT
From: Diane Delgado <delgado@MIT.EDU>


Here's a first very rough draft on our performance/configuration plan.
Please think about it and give comments.  There are some gaps
where some of you have more expertise to fill in than I do.

There is also a copy in athena-backup/doc/admin/config-guide


-------------------------

Configuration and Performance Measuring Plan

This document outlines the issues and tasks involved in executing
performance measurements on the Master and Slave with the goal of
producing a set of recommended system configurations.



                               MASTER
Assumptions:

    Assume that the machine is dedicated for the purposes of
    running the ABS master.  This reduces the number of variables
    we need to examine and also reduces the hardware demands on
    the system.



Oracle Configuration

    All configuration/performance testing will be done using the
    same "basic" Oracle configuration that will be required on
    the production system to help increase reliability, recoverability,
    and availability.


    Basic Config

        ARCHIVELOG_MODE is enabled.  This ensures that redo logs
        are archived to some safe, stable storage; they can then
        be used during disaster recovery, along with the periodic
        database backup, to restore the database state to the point
        in time of the last committed transaction.

        Redo Logs are mirrored with 1 copy.


        Redo Logs are configured with 2 groups to allow the LGWR process
        to immediately switch to an available group while the other
        group is being archived.  (A sketch of this layout appears below.)

        Control Files are mirrored with 1 copy.

        SGA - should reside in wired-down memory
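
        As a rough illustration, the redo log and archiving pieces of
        this layout could be set up with SQL along the following lines.
        This is only a sketch: the file names, sizes, and disks are
        placeholders, "mirrored with 1 copy" is taken to mean two
        members per group on separate disks, and control file mirroring
        is specified with the CONTROL_FILES init.ora parameter rather
        than SQL.

            ALTER DATABASE ADD LOGFILE GROUP 1
                ('/disk01/oracle/abs/redo1a.log',
                 '/disk02/oracle/abs/redo1b.log') SIZE 1M;

            ALTER DATABASE ADD LOGFILE GROUP 2
                ('/disk01/oracle/abs/redo2a.log',
                 '/disk02/oracle/abs/redo2b.log') SIZE 1M;

            -- Issued with the database mounted but not open; once in
            -- ARCHIVELOG mode, a filled redo log must be archived
            -- before it can be reused.
            ALTER DATABASE ARCHIVELOG;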

    Basic Config Open Issues:

        Archiving - what secondary storage medium are we archiving to?
              How do we ensure the secondary storage does not fill up
              before the next dbms backup is initiated?

        SGA - Will Solaris let us pin this down in memory so that it is
              not swapped?

        Minimum Disk Layout - from the required basic config, we need
              to generate the basic layout for the logs and data
              across different disks using a minimum configuration
              which will ensure availability and recoverability.


    Config Variations
    
    The following parameters will be adjusted to determine their
    optimal values for the anticipated access patterns to the ABS
    database (a query to check their current values is sketched
    after this list):

        LOG_BUFFER - specifies the size of the Redo Log Buffer in
           bytes, which resides in the SGA.

       
        DB_BLOCK_BUFFERS - The size of the database block buffer cache
           in number of buffers.  Resides in the SGA.

        DB_BLOCK_SIZE - The size of each db block cache buffer in bytes;
           should be a multiple of the OS filesystem buffer block size.
           

        SORT_AREA parameters -
              SORT_AREA_RETAINED_SIZE - the size the sort area is reduced
                 to once the sort completes.
              SORT_AREA_SIZE - the maximum size of the sort area, in bytes.

        Rollback Segment Parameters - We need to experiment with
              various scenarios to determine how many rollback segments
              we should have and how large each should be.
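
        As a starting point, the current values of these parameters can
        be read back from the running instance with a query like the one
        below (a sketch; it assumes DBA access to the V$PARAMETER view).
        Most of these are changed by editing init.ora and restarting the
        instance; DB_BLOCK_SIZE is fixed when the database is created.

            SELECT name, value
              FROM v$parameter
             WHERE name IN ('log_buffer', 'db_block_buffers',
                            'db_block_size', 'sort_area_size',
                            'sort_area_retained_size');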


Unix System Configuration

   Our target platform for the Master is Solaris, so all of this section
   will be Solaris-specific.  The system should have a minimum of
   2 disk controllers and 64 meg of memory.  We should also have
   ? disks available during this phase and some extra memory (how
   much?) so we can assess the impact of varying the number of disks
   and the amount of memory on the performance of the system.

   Need to identify the system (probably plover?).


   In addition to hardware, we need to identify the minimum Athena
   environment which needs to run on this system.  There are a few
   issues here:

      AFS Client - It's not necessary for the proper functioning of
       the ABS to run the AFS Client cache mgr.  In fact, we probably
       don't want to do this, since the AFS cache mgr will compete with
       Oracle for precious memory resources.  AFS may also jeopardize
       the stability of the host system.
 

      Other AFS daemons - what's necessary and what's not?

      Zephyr - The ABS needs to use zephyr to send messages, but
      it does not need to receive any.  What's the minimum zephyr
      configuration?

      Mail - The ABS needs to be able to send mail; it does not
      need to receive mail.

      Any others?


Performance Measuring

   Much of the following information has been assembled by studying
   the various performance tuning guides provided for Oracle on Unix.


    Devise Test Suite
      
        We need to generate a batch of operations using the VRT tests
        which we believe will reflect the "important" DBMS operations.
        This test suite will be run multiple times on various configurations
        and performance statistics will be gathered.  The statistics
        will be examined and we will select an optimal configuration
        based on this information.

        We also need to develop a "simulated workload" scenario which
        should reflect the expected mix of operations on a typical
        workday.

    Identify System Configurations
       
        Here we identify the system configurations of interest to be
        used during the performance measurement.

        ?? TBD

    Tune Application Data Access
     
        Before we execute performance measurements on the various system
        configurations, the queries in the application must be tuned
        so they use resources optimally.

        TKPROF/EXPLAIN PLAN - These are Oracle tools used to gauge the
        resources consumed by a query.  We turn on this type of tracing
        at the session level, run the test suite, and then examine the
        results.  The Oracle documentation provides sufficient information
        to interpret the results and tune the SQL statements.
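
        For example, tracing and an explain plan for a single statement
        might be produced roughly as follows; the query, the STATEMENT_ID,
        and the table name are placeholders, and PLAN_TABLE is assumed to
        have been created with the utlxplan.sql script shipped with
        Oracle.  The resulting trace file is then formatted with the
        tkprof utility.

            -- Collect resource/timing statistics for everything this
            -- session executes.
            ALTER SESSION SET SQL_TRACE = TRUE;

            -- Record the access path chosen for one (placeholder) query.
            EXPLAIN PLAN SET STATEMENT_ID = 'abs_test_1' FOR
                SELECT * FROM backup_volume WHERE volume_name = 'user.foo';

            -- Display the recorded plan as an indented tree.
            SELECT LPAD(' ', 2*(LEVEL-1)) || operation || ' ' || options
                   || ' ' || object_name query_plan
              FROM plan_table
             START WITH id = 0 AND statement_id = 'abs_test_1'
             CONNECT BY PRIOR id = parent_id
                    AND statement_id = 'abs_test_1';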

        After the adjustments are made, the Test suite is re-run and
        stats are gathered and examined.  Further adjustments are
        made as necessary.

    Identify Statistics

        After the application data access has been tuned we are ready
        to examine the system configuration.

        The following statistics should be gathered during the performance
        monitoring for each run of the Test Suite.

        SQLDBA Stats - These will be gathered using a simulated
        workload rather than the test suite since they
        need to be generated using the "expected" mix of operations,
        which may not be the same as the selected Test Suite.
              
              MONITOR I/O - this generates stats on buffer cache usage.
              We can adjust the size of the Oracle buffer cache if the
              stats indicate we are experiencing a low hit ratio.

              REDO STATS - this generates stats on how many pending
              write requests there are for the redo log buffer.
              Ideally this should be 0, and we increase the size of the
              Redo Log Buffer if we see values larger than 0 (see the
              queries sketched below).
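
              Roughly the same numbers can also be read directly from
              the V$SYSSTAT view.  A sketch of the two checks above
              (statistic names as they appear in Oracle7):

                  -- Buffer cache hit ratio is
                  --   1 - physical reads / (db block gets + consistent gets);
                  -- a persistently low ratio argues for a larger
                  -- DB_BLOCK_BUFFERS.
                  SELECT name, value
                    FROM v$sysstat
                   WHERE name IN ('db block gets', 'consistent gets',
                                  'physical reads');

                  -- Waits for space in the redo log buffer; ideally 0.
                  -- A growing value argues for a larger LOG_BUFFER.
                  SELECT name, value
                    FROM v$sysstat
                   WHERE name = 'redo log space requests';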


       Swapping - monitor swapping activity - swapouts should ideally
             be zero.

       i/o request queue - examine the average queue length for each
             disk and % busy.  Average queue lengths greater than
             2-4 imply a possible bottleneck.  A solution may be to
             spread some of the data files across multiple disks.

       ?? others ??  (there are many recommendations in the Oracle
                      for Unix Tuning Guide but I'm tired of reading
                      it for now - add these later).



    Disaster Recovery -

        Need to execute disaster recovery (restore from backup and
        archive logs) to determine the amount of time required
        for various DBMS sizes and archive log sizes.  We need to
        determine how often the backups need to run to ensure that
        a complete recovery can be executed within the required
        4-hour window.

        This will involve performing a routine backup of the system,
        executing a day's worth of backup operations, and then
        terminating the system ungracefully.  We then attempt to
        restore the ABS database using the backup copy and the
        archive logs, noting the size of the database and logs and
        the time required to perform the restore.  This also requires
        that we identify the type of medium used for the archive log
        storage (preferably some high-speed, high-capacity tape).
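
        Roughly, the restore step itself (after the backed-up data
        files have been copied back into place) would be driven from
        SQLDBA along the lines below; Oracle prompts for each archived
        redo log it needs, so the archive destination must be
        accessible.  This is only a sketch of a complete recovery, not
        a point-in-time one.

            CONNECT INTERNAL
            STARTUP MOUNT
            RECOVER DATABASE
            ALTER DATABASE OPEN;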

        From this we should generate some guidelines which can be
        used by the operations staff as they monitor the growth of
        the database so that they can more accurately predict the
        recovery time and adjust the dbms backup scheme accordingly.

        Both dev and operations should participate in the execution
        of the disaster recovery to help ensure that several people are
        skilled at this before the system goes into production.



Recommended Routine Monitoring

As an end result of this exercise, we should also be able to generate
a set of guidelines for routine monitoring of the production system,
which should assist the operations staff in keeping the system running
efficiently.  This information will be incorporated as part of
the ABS Administrators Guide.
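
As one example of the kind of check such guidelines might include, the
free space remaining in each tablespace can be watched with a query
along these lines (a sketch; assumes access to the DBA_FREE_SPACE
dictionary view):

    SELECT tablespace_name, SUM(bytes)/1024/1024 free_mb
      FROM dba_free_space
     GROUP BY tablespace_name;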





                     SLAVES


Identify Configurations:

  The ABS team recommends that we investigate the use of some high
performance configurations which will greatly increase the amount of data
we can back up in a given day.  Some of this work involves examining
new high speed, high capacity tape devices and how they perform
on the various unix systems of interest.  After we have identified some
promising technologies and configurations, we would ideally like to 
run the ABS Slaves on them and produce guidelines for recommended 
configurations.
   
In addition, the performance measuring will have to include
recommendations for the hardware configurations currently in use
for AFS backups, in the event that we are not able to purchase more
efficient hardware.


The following hardware configurations are of interest: 

          (Dave and Brian fill this in).



Athena Config

  - Identify the minimum athena/software configuration for a Tape Slave.
     ?? 


Application Tuning

  - Identify the optimal buffer size for each type of tape device to
    ensure data writes are streamed efficiently.

  ?? what else  

Identify Tests

  The following tests will be run with each configuration:

     - raw read/write - do we want to do this?

     - back up a full tape's worth of volumes; note the amount of
       data backed up and how long it took.

     - attempt to restore volumes which are at various locations
       on the tape (including the end of the tape).   Note how
       long it takes to seek to the appropriate location and
       begin the restore.

    ?? what else

Identify Performance Statistics

  ?? Figure out what we need to look at besides total time to back up
     n kbytes of data.

