[146868] in North American Network Operators' Group
Re: OT: Traffic Light Control (was Re: First real-world SCADA attack
daemon@ATHENA.MIT.EDU (Thomas Maufer)
Wed Nov 23 21:43:04 2011
From: Thomas Maufer <tmaufer@gmail.com>
In-Reply-To: <20111124015931.GA24906@panix.com>
Date: Wed, 23 Nov 2011 18:41:58 -0800
To: Brett Frankenberger <rbf+nanog@panix.com>
Cc: NANOG <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
<unlurks>
I have to jump in on this thread. Traffic light controllers are a fun
category of technical artifacts. The weatherproof boxes that the relays
used to live in have stayed the same size for decades, but now the
controllers just take a teeny tiny circuit board rattling around in this
comparatively huge box. And it's full of software, dontcha know? So why
not have lots of newfangled features? Curiously, the people who make the
insides of the box have a WHOLE DIFFERENT way of thinking about "what a
traffic light controller should do?" - the "insider" people are in the
21st century, while the "outsider" people are in the early 20th century.
Lemme splain.
A particular traffic light controller that I tested in 2007 had an FTP
server inside it. I have no idea why. So I tried fuzzing it. 5 minutes
into the test, the test aborted because the DuT wouldn't restart
anymore. Upon investigation, we discovered that a particular FTP
sequence had triggered a bug that had a rather unfortunate
(side-)effect: The flash file system of the traffic light controller was
formatted or erased. As a bonus, the device also had crashed and it was
awaiting a ZMODEM file download since it didn't have a boot image any
more. We couldn't test anything else because we didn't have the special
serial cable to (re-)install the OS. Fail-safe? Not hardly: Not when it
has no software! It's a lump of highly refined sand, in a plastic case.
There are many lessons here, not least of which is: Ship the device with
the smallest possible attack surface! Why the heck was FTP enabled?
Clearly this device had never been subjected to any negative testing.
And these devices are meant to be networked, so that FTP bug will be
tickled someday, I just don't know when. Yes, it was reported to the
vendor, and no, I have no idea if they ever fixed it.
Also, in this thread I have seen several references to "fail-safe" or
"redundancy" features. In my experience, those are often some of the
weakest aspects of some systems. In one case, my testing rendered a
multi-million-dollar highly redundant VoIP soft switch useless by
constantly causing the primary to fail - and while the secondary was
being activated, there was a quiet period of 2-3 seconds during which
no calls went through. Shortly after the secondary had become the
primary, it failed again, continuing the cycle. Literally traffic
amounting to one packet (about 100 bytes, IIRC) per second of carefully
crafted SIP INVITEs could make this switch completely useless. The bug I
found involved SIP INVITE messages that could not be filtered... unless
you didn't want to accept VoIP phone calls at all, which calls into
question your purchase of the multi-million-dollar highly redundant soft
switch. That bug was fixed.
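To put rough numbers on that ping-pong cycle, here is a minimal simulation sketch. The model is an assumption, not a description of the actual switch: a crafted packet instantly crashes whichever unit is active, the standby needs a fixed takeover time (2.5 s here, the midpoint of the 2-3 second quiet period), and crafted packets that arrive during the takeover are simply lost. The function name and parameters are hypothetical.

```python
def simulate_failover(duration_ms, poison_interval_ms, failover_ms):
    """Return the fraction of time during which some unit is in service.

    Assumed model: a poison packet arrives every poison_interval_ms and
    crashes the active unit; the standby takes failover_ms to become
    active; poison packets arriving during the takeover have no effect.
    """
    up_ms = 0
    recovering_until = 0  # time at which the standby finishes taking over
    for t in range(duration_ms):
        in_service = t >= recovering_until
        if in_service and t % poison_interval_ms == 0:
            # Poison packet hits the active unit: crash, failover begins.
            recovering_until = t + failover_ms
            in_service = False
        if in_service:
            up_ms += 1
    return up_ms / duration_ms


if __name__ == "__main__":
    # One crafted packet per second, 2.5 s takeover, one minute of traffic.
    print(simulate_failover(60_000, 1_000, 2_500))
```

Under these assumptions the pair is in service for only about one sixth of the time, in half-second windows between a takeover completing and the next crafted packet arriving.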
Software is tricky stuff. The number of ways it can fail is practically
infinite, but there is generally only a small number of ways for it to
work correctly. Networked software is particularly challenging to write
because the software engineers don't get to control their inputs. The
intervening network can (does) fold, spindle, mutilate, truncate, drop,
reorder or duplicate packets and your code on the receiving end has to
try to understand what was intended by the sender. Oh, and the sender
might be following an older version of the standard (if one even exists)
or simply have included some bugs of their own. Because the coders are
so focused on making their code do what the MRD/PRD required - on a
tight schedule! - they have little time to imagine all the possible ways
their code might fail. Their error-handling routines are simply never
imaginative enough to handle real-world brokenness. It *is* possible to
test this stuff, but time pressures in release schedules don't leave a
lot of breathing room for developers to take on whole new classes of
tasks that are outside their expertise (security testing). So you end up
with a traffic light controller that erases its own flash file system
when it receives a slightly strange but completely legal FTP command, or
a highly redundant VoIP soft switch that is only good at ping-ponging
from primary to secondary CPUs. Don't even get me started on problems I
have found in carrier-class routers.
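As a rough illustration of what automated negative testing of that kind can look like, the sketch below damages a packet stream (drop, truncate, duplicate, reorder) and feeds the result to a toy command parser, checking only that the parser rejects bad input instead of raising. Everything here is hypothetical: `mangle`, `parse_command`, `negative_test`, the probabilities and the sample commands are made up for the example and do not correspond to any product or test tool mentioned above.

```python
import random


def mangle(packets, rng):
    """Return a damaged copy of a packet stream.

    Each packet is dropped, truncated, duplicated, or passed through;
    the surviving stream is then lightly reordered. The probabilities
    are arbitrary illustrative values.
    """
    out = []
    for pkt in packets:
        roll = rng.random()
        if roll < 0.1:
            continue                                        # drop
        if roll < 0.2:
            out.append(pkt[: rng.randrange(len(pkt) + 1)])  # truncate
        elif roll < 0.3:
            out.extend([pkt, pkt])                          # duplicate
        else:
            out.append(pkt)                                 # deliver intact
    for i in range(len(out) - 1):
        if rng.random() < 0.1:                              # reorder neighbours
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def parse_command(data):
    """Toy line parser: return (verb, argument), or None for bad input."""
    if not data.endswith(b"\r\n"):
        return None
    try:
        text = data[:-2].decode("ascii")
    except UnicodeDecodeError:
        return None
    verb, _, arg = text.partition(" ")
    if not verb.isalpha():
        return None
    return verb.upper(), arg


def negative_test(packets, seed=0, rounds=1000):
    """Feed mangled traffic to the parser; count inputs it rejected.

    Any exception escaping parse_command would abort the run, which is
    the failure this harness is looking for.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(rounds):
        for pkt in mangle(packets, rng):
            if parse_command(pkt) is None:
                rejected += 1
    return rejected


if __name__ == "__main__":
    sample = [b"USER anonymous\r\n", b"LIST\r\n", b"RETR file.txt\r\n"]
    print("rejected inputs:", negative_test(sample))
```

A real fuzzer goes much further (malformed fields, protocol state, timing), but even this much exercises error-handling paths that feature testing never touches.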
I don't need to name names: All software has bugs (except possibly the
code in the main computers on the Space Shuttle). Every engineer I have
ever known has tried to write their code well, but automated negative
testing has only recently caught up to where the engineers and QA staff
can focus on what they do best (write and test code that implements
features that someone can buy), and let purpose-built tools do the
negative testing for them, so their error-handling routines can be
robust, too. Fixing bugs is generally straightforward. Finding them has
always been the challenge.
~tom
</unlurks>
On 23 Nov 2011, at 17:59 , Brett Frankenberger wrote:
> On Wed, Nov 23, 2011 at 05:45:08PM -0500, Jay Ashworth wrote:
>>
>> Yeah. But at least that's stuff you have a hope of managing. "Firmware
>> underwent bit rot" is simply not visible -- unless there's, say, signature
>> tracing through the main controller.
>
> I can't speak to traffic light controllers directly, but at least some
> vital logical controllers do check signatures of their firmware and
> programming and will fail into a safe configuration if the
> signatures don't validate.
>
> -- Brett
>
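The kind of check Brett describes in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular controller works: the mode names and `select_mode` are hypothetical, and a plain SHA-256 digest stands in for a real signature. A bare digest catches accidental corruption (bit rot) but not deliberate tampering, which would need a keyed or public-key signature instead.

```python
import hashlib
import hmac

SAFE_MODE = "flash-red"      # hypothetical fail-safe output state
NORMAL_MODE = "run-program"  # hypothetical normal operating state


def select_mode(firmware_image, expected_sha256_hex):
    """Pick the operating mode based on a firmware integrity check.

    Returns NORMAL_MODE only if the image's SHA-256 digest matches the
    expected value; any mismatch falls back to SAFE_MODE.
    """
    actual = hashlib.sha256(firmware_image).hexdigest()
    if hmac.compare_digest(actual, expected_sha256_hex):
        return NORMAL_MODE
    return SAFE_MODE


if __name__ == "__main__":
    image = b"\x00\x01\x02 example firmware bytes"
    good = hashlib.sha256(image).hexdigest()
    rotted = bytes([image[0] ^ 0x01]) + image[1:]  # flip a single bit
    print(select_mode(image, good))    # intact image
    print(select_mode(rotted, good))   # corrupted image
```

Note that the check itself is code that has to survive: it only helps if it lives somewhere a bug like the flash erasure described above cannot reach.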