[4226] in Release_7.7_team
Lore of how Red Hat Network trades Security against Scalability
daemon@ATHENA.MIT.EDU (William Cattey)
Wed Feb 25 14:00:44 2004
Message-Id: <7C2FA857-67C4-11D8-ADB7-000A9596D0BC@mit.edu>
Cc: Dan Logcher <dlogcher@mit.edu>, Hal Abelson <hal@mit.edu>
From: William Cattey <wdc@MIT.EDU>
Date: Wed, 25 Feb 2004 13:57:40 -0500
To: release-team@mit.edu
In responding to a query from Theresa on related issues, I found myself
documenting a discovery about how Red Hat Network trades Security
against Scalability. That tradeoff hurt MIT a little last week, and it
could hurt MIT a LOT if, for example, the Athena update were replaced
with RHN Institute-wide.
I'm sending this lore out to a wider audience to help people who are
unfamiliar with one system or the other understand both better.
The fundamental tradeoff is this: because Red Hat insists that nobody
be allowed to perform an update through RHN without a certifying
credential, a problem with that credential can cause a systemic outage
that lasts until every system has been visited to update its
certificates.
Last week we had an outage that prevented all users of our Red Hat
Proxy into Red Hat Network from performing any updates. Updates stayed
blocked until two problems at the Red Hat end were fixed and each and
every machine had been hand-tooled with new certificates.
Events of the outage:
1. The Red Hat Certifying Authority Certificate expired.
2. Nobody could update starting Saturday Feb 14.
3. It took us three days to correct a Red Hat internal blunder that
kept their support people from talking to us.
4. It took another two days to re-install the CA (their baroque repair
procedures were an issue).
5. Hand tooling EVERY system using Red Hat Network was then required
to re-install the Red Hat CA Cert to get updates going again. The
announcement that the service was back up (including customer
instructions on how to do the hand tooling) was sent out on 20
February.
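Step 1 is the kind of failure that can be caught before it happens. As
a minimal sketch, assuming the CA certificate's expiry is available as
an X.509 notAfter string, a check like the following could flag a
certificate that is about to lapse. The function names, the 30-day
warning window, and the timestamps in the usage note are illustrative
assumptions, not anything Red Hat ships and not the actual expiry time
of their certificate.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return whole days until a certificate's notAfter time.

    not_after is a string in the form used in X.509 certificates,
    e.g. "Feb 14 00:00:00 2004 GMT". The result is negative once the
    certificate has expired.
    """
    if now is None:
        now = time.time()
    # ssl.cert_time_to_seconds parses the notAfter format into epoch seconds.
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // SECONDS_PER_DAY)


def needs_renewal(not_after, warn_days=30, now=None):
    """True if the certificate expires within warn_days (hypothetical policy)."""
    return days_until_expiry(not_after, now) < warn_days
```

For example, calling needs_renewal("Feb 14 00:00:00 2004 GMT") at any
time after mid-January 2004 returns True, which would have left weeks
to distribute a replacement before clients started failing.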
If the Athena Release Team were to go away, and the Athena update were
converted to "just use Red Hat Network", we would be at risk of another
such outage any time there was a problem with either the Red Hat
Certifying Authority Certificate or the individual certificate on
the client host.
Furthermore, the Red Hat Proxy server represents a single point of
failure. A client-driven update that pulls data from an enterprise
filesystem with replication is a more reliable infrastructure.
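To make the reliability point concrete, here is a rough sketch of a
client-driven pull that tolerates the loss of any one source. It is not
how the Athena update is actually implemented; fetch_update, the list
of sources, and the fetch callable are all hypothetical names chosen
for illustration.

```python
def fetch_update(sources, fetch):
    """Try each source in order and return (source, payload) from the
    first one that answers.

    sources is a list of replica identifiers and fetch is a callable
    that returns the update payload or raises OSError on failure.
    Both are assumptions made for this sketch.
    """
    errors = []
    for source in sources:
        try:
            return source, fetch(source)
        except OSError as exc:
            # Remember why this replica failed and move on to the next.
            errors.append((source, exc))
    raise RuntimeError(f"all {len(sources)} update sources failed: {errors}")
```

With a single proxy the list has one entry, so one failure stops every
client. With replicas, a client only fails when all of them are down.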
When the time comes to have a technical conversation with Red Hat, I'll
try to point out the value of a scalable, non-secure pathway through
their service to head off such failures.
-wdc