[122018] in North American Network Operators' Group
Re: Mitigating human error in the SP
daemon@ATHENA.MIT.EDU (David Hiers)
Thu Feb 4 01:09:40 2010
In-Reply-To: <20100203161409.GC3212@kallisti.us>
Date: Wed, 3 Feb 2010 22:08:57 -0800
From: David Hiers <hiersd@gmail.com>
To: Ross Vandegrift <ross@kallisti.us>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
You can completely implement Vijay's most impressive stuff and simply
move the problem to a different level of abstraction.
No matter what you do, it still comes down to some geek banging on
some plastic thingy. I'm as likely to screw up an "Extensible
Entity-Attribute-Relationship" as I am an ACL.
David
On Wed, Feb 3, 2010 at 8:14 AM, Ross Vandegrift <ross@kallisti.us> wrote:
> On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote:
>> Vijay Gill had some real interesting insights into this in a
>> presentation he gave back at NANOG 44:
>>
>> http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf
>>
>> His Blog article on "Infrastructure is Software" further expounds
>> upon the benefits of such an approach -
>> http://vijaygill.wordpress.com/2009/07/22/infrastructure-is-software/
>>
>> That stuff is light years ahead of anything anybody is doing today
>> (well, apart from maybe Vijay himself ;) ... but IMO it's where we
>> need to start heading.
>
> Vijay's stuff is fascinating. The vision is great. But in my
> experience, the vendors and implementations basically ruin the dream
> for anyone who doesn't have his pull.
>
> I'm sure my software is nowhere close to being as sophisticated as
> his, but my plans are pretty much in line with his suggestions. Some
> problems I've run into that I don't see any kind of solution for:
>
> 1) Forwarding-impacting bugs: IOS bugs that are triggered by SNMP are
> easily the #1 cause of our accidental service impact. Most seem to be
> race conditions that require real-world config and forwarding load -
> not something a small shop can afford to build a lab to reproduce. If
> we stuck to manual deployment, we might have made a few mistakes but
> would it have been worse? Maybe - but honestly, it could be a wash.
>
> 2) Vendor support is highly suspicious of automation: anytime I open a
> ticket, even unrelated to an automated software process, the first
> thing the vendor support demands is to disable all automation.
> Juniper is by far the best about this, and they *still* don't actually
> believe their own automation tools work. Cisco TAC's answer has
> always been "don't ever use SNMP if it causes crashes!" Procurve
> doesn't even bother to respond to tickets related to automation bugs,
> even if they are remotely triggerable crashes in the default config.
>
> 3) Automation interfaces are largely unsupported: I imagine vendor
> software development having one or two guys that are the masterminds
> for SNMP/NETCONF/whatever - and that's it. When I have a question on
> how to find a particular tool, or find a bug in an automation
> function, I can often go months on a ticket with people that have no
> idea what I'm talking about. What documentation exists is typically
> incomplete or inconsistent across versions and product lines.
>
> 4) Related tools prevent reliable error reporting: as far as I can
> tell, Net-SNMP returns random values if a request fails; if there's a
> pattern, I've failed to discern it. expect is similar. ScreenOS's
> SSH implementation always returns that a file copy failed. Procurve
> only this year implemented ssh key-based auth in combination with
> remote authentication. The best-of-breed seems to be an oft-pathetic
> collection of tools.
>
> 5) Management support: developing automation software is hard - network
> devices aren't nearly as easy to deal with as they should be. When I
> spend weeks developing features that later cause IOS to spontaneously
> reload, people that don't understand the relation to operational
> impact start to advocate dismantling the automation just like the
> vendors above.
>
> I'm sure we'll continue to build automated policy and configuration
> tools. I'm just not convinced it's the panacea that everyone thinks.
> Unless you're one of the biggest, it puts your network at someone
> else's mercy - and that someone else doesn't care about your
> operational expenses.
>
> Ross
>
> --
> Ross Vandegrift
> ross@kallisti.us
>
> "If the fight gets hot, the songs get hotter. If the going gets tough,
> the songs get tougher."
>        --Woody Guthrie
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAktpoNEACgkQMlMoONfO+HB6PACeLoFhmwv8K07Zq9tQDZgKcHYq
> 5nEAoMnrd2YLrSzGkA71N8vRgFWG/SL1
> =FQbw
> -----END PGP SIGNATURE-----
>
>