[444] in athena10

Login chroot design for Athena 10

daemon@ATHENA.MIT.EDU (ghudson@MIT.EDU)
Tue Aug 19 13:27:09 2008

Date: Tue, 19 Aug 2008 13:26:03 -0400 (EDT)
From: ghudson@MIT.EDU
Message-Id: <200808191726.m7JHQ3kp013052@outgoing-legacy.mit.edu>
To: athena10@mit.edu

Since Ken Arnold raised the point that login chroots would simplify
the update architecture, I'm going to take some time this week to
explore implementing them.

The basic idea would be a three-tiered logical volume setup:

  * login-master is the long-term, mutable chroot.
  * login-stable is a snapshot of login-master made after an update.
  * login is an ephemeral snapshot of login-stable used for a session.

(Assumption: LVM supports snapshots of snapshots.  If it does not,
then the corner cases get a little uglier.  We can create login from
login-master just before updating login-master, but what if someone
does a fast login and logout before the update completes?)
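As a rough illustration of the three-tier chain, here is a minimal
Python sketch that only builds the lvcreate command lines and does not
run them.  The volume group name "athena" and the 2G snapshot size are
placeholders, not decisions, and the second command only works if LVM
really does allow a snapshot of a snapshot, which is the unverified
assumption above.

```python
# Each entry is (snapshot name, origin volume), in creation order.
TIERS = [
    ("login-stable", "login-master"),  # taken after an update
    ("login", "login-stable"),         # taken at the start of a session
]


def snapshot_argv(vg, origin, name, size):
    """Return the lvcreate argv that snapshots vg/origin as vg/name."""
    return ["lvcreate", "--snapshot", "--name", name,
            "--size", size, f"/dev/{vg}/{origin}"]


def build_chain(vg, size):
    """Return the commands that build login-stable and then login."""
    return [snapshot_argv(vg, origin, name, size) for name, origin in TIERS]


if __name__ == "__main__":
    # "athena" and "2G" are hypothetical values for illustration only.
    for argv in build_chain("athena", "2G"):
        print(" ".join(argv))
```

The size passed to lvcreate here is the copy-on-write space reserved
for each snapshot, which is a separate question from how big
login-master itself needs to be.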

Some considerations:

  * We need to pick a size for the logical volumes--small enough that
    we don't overfill cluster machine disks and big enough that we
    don't run out of space for software packages.  /tmp and /var/tmp
    will be coming from the host so that's not an issue.  I'm not sure
    if snapshots 

  * There are some concurrency issues to worry about with
    login-stable, as Ken pointed out.

  * The cluster-software package wants to be installed in
    login-master, not on the host.  (So it gets removed from
    debathena-cluster.)

  * login-master probably wants debathena-workstation installed, not
    debathena-cluster.  (In particular, we don't want the login chroot
    to be trying to create its own interior login chroot!)

  * If we use the chroot via schroot in an /etc/X11/Xsession.d script,
    then the failsafe session won't use the chroot unless we go to
    extra effort to do so.  There are some positive and negative
    ramifications of that; it makes it easier to get root on the host,
    and it means cluster-software won't be available in a failsafe
    session.

So the task breakdown is:

  1. Verify that snapshots of snapshots work in LVM.  If they don't,
     back to the drawing board.

  2. Pick a discipline for handling the concurrency issues around
     login-stable.

  3. Implement a package which (a) creates and updates login-master,
     and (b) manages login-stable.

  4. Implement a package taking over the gdm PreSession and
     PostSession scripts.  In PreSession, snapshot login-stable to
     create login and set it up in schroot.  In PostSession, kill all
     user processes and destroy login.  Also add an Xsession script
     to schroot

  5. debathena-auto-update still needs to exist to keep the host up to
     date, but can be redesigned to stop interacting with gdm for the
     most part.  (It still needs to reboot the machine after a kernel
     upgrade; I forgot about that requirement in the current
     implementation.)
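To make steps 2 through 4 a little more concrete, here is a sketch of
one possible locking discipline, not a chosen design: sessions take a
shared flock while snapshotting login-stable, and the updater takes an
exclusive flock while replacing it.  The lock path, the volume group
name "athena", the 2G size, and all function names are hypothetical.
Commands are passed to a caller-supplied run function so the logic can
be exercised without touching LVM.

```python
import fcntl
from contextlib import contextmanager

VG = "athena"                               # hypothetical volume group
LOCK_PATH = "/var/lock/login-stable.lock"   # hypothetical lock file


@contextmanager
def stable_lock(path, exclusive):
    """Hold a shared or exclusive flock on path for the with-block."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def snapshot_argv(origin, name, size):
    return ["lvcreate", "--snapshot", "--name", name,
            "--size", size, f"/dev/{VG}/{origin}"]


def pre_session(run, lock_path=LOCK_PATH, size="2G"):
    """PreSession: snapshot login-stable as login under a shared lock."""
    with stable_lock(lock_path, exclusive=False):
        run(snapshot_argv("login-stable", "login", size))


def post_session(run, user):
    """PostSession: kill the user's processes, then destroy login."""
    run(["pkill", "-KILL", "-u", user])
    run(["lvremove", "--force", f"/dev/{VG}/login"])


def refresh_stable(run, lock_path=LOCK_PATH, size="2G"):
    """After updating login-master, replace login-stable exclusively.

    Assumes no login snapshot currently depends on login-stable; whether
    LVM permits this layering at all is still the open question.
    """
    with stable_lock(lock_path, exclusive=True):
        run(["lvremove", "--force", f"/dev/{VG}/login-stable"])
        run(snapshot_argv("login-master", "login-stable", size))
```

In real use, run could be subprocess.check_call; passing a list's
append method instead gives a dry run that just records the commands.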

I'm still a bit nervous that we'll discover software that can't be
made to work in a chroot.  This is definitely forging into less
charted territory, which can mean higher maintenance costs.  But it
does have substantial advantages for cleanup and updates.
