[142356] in North American Network Operators' Group
Re: BGP Design question.
daemon@ATHENA.MIT.EDU (Bret Palsson)
Thu Jun 23 02:08:25 2011
From: Bret Palsson <bret@getjive.com>
In-Reply-To: <5.1.0.14.2.20110623085011.036d3c08@efes.iucc.ac.il>
Date: Thu, 23 Jun 2011 00:07:00 -0600
To: Hank Nussbacher <hank@efes.iucc.ac.il>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
That's fine if you are running a website. When it comes to
telecommunications, a 15-minute outage is pretty huge, especially with
certain types of customers: emergency services, for example.
-Bret
On Jun 23, 2011, at 12:02 AM, Hank Nussbacher wrote:
> At 20:42 22/06/2011 -0700, Jason Roysdon wrote:
>
> Let me be a bit of a heretic here. How often does your router fail?
> Or your firewall? In the 25 years I have gone into customers I have
> found when they did a cross setup as proposed below by Bret and Jason,
> only one person truly knew the complete setup and if something broke
> only he was able to fix it. There is never complete printed
> documentation: routing design, IPs on all interfaces, subnetting
> schematic, etc. And if there was at one point, after 2 years it was
> outdated and never updated and only the *1* guy knew the changes in his
> head.
>
> In that kind of situation, when something stopped working they always
> had to call in the "guru" to fix it. On the other hand, a simple design
> of only *one* path (pick either the left or the right side of each
> ASCII diagram) meant that even junior network engineers, as well as
> technicians called in on an emergency with 4 hours' notice, were able
> to fix the situation much more quickly than with the "cross" design.
> And the MTBF on a single path solution, IMHO, is around 3-4 years. And
> if you need redundancy, keep a spare box on a shelf, completely loaded
> with the latest config so that it can be hot-swapped in within 15
> minutes of failure.
>
> This 1-path design is not for everyone. The vendors always recommend
> the "cross" design since they sell 2x the amount of boxes but I have
> found that life works fine with just a 1-path design as well.
>
> -Hank
>
>
>> I second the static routes, especially from a simplicity standpoint.
>> Add in a pair of layer two switches to simplify further:
>>
>>
>> +--------+    +--------+
>> | Peer A |    | Peer A |   <-Many carriers. Using 1 carrier
>> +---+----+    +----+---+     for this scenario.
>>     |eBGP          | eBGP
>>     |              |
>> +---+----+iBGP+----+---+
>> | Router +    + Router |   <- Routers. Not directly connected
>> +-+------+    +------+-+
>>   |                  |
>> +-+------+    +------+-+
>> |L2Switch|----|L2Switch|   <- Layer 2 switches, can be stacked
>> +--------+    +--------+
>>   |                  |
>> +-+------+    +------+-+
>> |Act. FW |----|Pas. FW |   <-Firewalls Active/Passive.
>> +--------+    +--------+
>>
>> You can lose all of the left leg, or all of the right leg, and still
>> be up. If you want to complicate things, you can add crossing links
>> between it all, but again, beyond BGP and VRRP, this is a very simple
>> design you can easily troubleshoot at 3AM. It's also much easier to
>> document the troubleshooting steps (so you can go on vacation and
>> someone else can solve without calling you) and test upgrades.
>>
>> You can nearly evenly split the traffic by having a VRRP VIP on each
>> edge router, with the other router backing up the first. The
>> firewalls can have two static routes, one to each VIP, and this will
>> roughly load-balance the traffic out on a packet basis. As you peer
>> with the same ISP, this will work just fine. If they have an outage,
>> your edge routers will learn, and even if the circuit drops it'll
>> know, and basically the VIP will just redirect traffic to the other
>> router.
>>
>> Now all your firewalls have to do is maintain stateful session
>> information, not OSPF.
>>
>> If you had two different ISPs (especially if they are not roughly
>> evenly connected), then not having intelligence of the BGP paths in
>> your firewalls can cause an extra hop when traffic hits the router
>> with the longer path, which will redirect it to the router with the
>> shorter path.
>>
>> Speaking from a Cisco/HSRP point of view, you could be more
>> intelligent (re: more complicated, and complication means harder
>> troubleshooting and more documentation needed) during problem periods
>> by having the VIP move routers automatically based on the WAN link
>> dropping and/or a route beyond it being lost (others can comment on
>> whether VRRP supports this). This would save one hop to the "broken"
>> router when the BGP path or WAN is down.
>>
>> Jason Roysdon
>>
>> On 06/22/2011 06:07 PM, Bret Palsson wrote:
>> > On Wed, Jun 22, 2011 at 5:33 PM, PC <paul4004@gmail.com> wrote:
>> >
>> >> Who makes the firewall?
>> >>
>> >>
>> > Juniper SSG. We use NSRP and replicate all the RTOs. We have hitless
>> > on the Firewalls, have for years. We're now peering with our own
>> > carriers vs. using our datacenter's mix.
>> >
>> > A static route from the junipers to the VIP (VRRP) is probably the
>> > way to go. I think.
>> >
>> >> To make this work and be "hitless", your firewall vendor must
>> >> support stateful replication of routing protocol data (including
>> >> OSPF). For example, Cisco didn't support this in their ASA product
>> >> until version 8.4 of code.
>> >>
>> >> Otherwise, a failover requires OSPF to re-converge -- and quite
>> >> frankly, will likely cause some state of confusion on the upstream
>> >> OSPF peers, loss of adjacency, and a loss of routing until this
>> >> occurs. It's like someone just swapped a router with the same IP to
>> >> the upstream device -- assuming your active/standby vendor's
>> >> implementation only presents itself as one device.
>> >>
>> >> However, once this is successful your current failover topology
>> >> should work fine -- even if it takes some time to failover.
>> >>
>> >> In my opinion though, unless the firewall is serving as "transit"
>> >> to downstream routers or other layer 3 elements, and you need to
>> >> run OSPF to it (and through it) as a result, it's often just easier
>> >> to static default route out from the firewall(s) and redistribute a
>> >> static route on the upstream routers for the subnets behind the
>> >> firewalls. It also helps ensure symmetrical traffic flows, which is
>> >> important for stateful firewalls and can become moderately
>> >> confusing when your firewalls start having many interfaces.
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Jun 22, 2011 at 4:27 PM, Bret Palsson <bret@getjive.com> wrote:
>> >>
>> >>> Here is my current setup in ASCII art. (Please view in a fixed
>> >>> width font.) Below the art I'll write out the setup.
>> >>>
>> >>>
>> >>> +--------+    +--------+
>> >>> | Peer A |    | Peer A |   <-Many carriers. Using 1 carrier
>> >>> +---+----+    +----+---+     for this scenario.
>> >>>     |eBGP          | eBGP
>> >>>     |              |
>> >>> +---+----+iBGP+----+---+
>> >>> | Router +----+ Router |   <-Netiron CERs Routers.
>> >>> +-+------+    +------+-+
>> >>>   |A   `.P    A.'    |P    <-A/P indicates Active/Passive
>> >>>   |      `.  .'      |       link.
>> >>>   |        ::        |
>> >>> +-+------+'  `+------+-+
>> >>> |Act. FW |    |Pas. FW |   <-Firewalls Active/Passive.
>> >>> +--------+    +--------+
>> >>>
>> >>>
>> >>> To keep this scenario simple, I'm multihoming to one carrier.
>> >>> I have two Netiron CERs. Each has an eBGP connection to the same
>> >>> peer. The CERs have an iBGP connection to each other.
>> >>> That works all fine and dandy. Feel free to comment, however, if
>> >>> you think there is a better way to do this.
>> >>>
>> >>> Here comes the tricky part. I have two firewalls in an
>> >>> Active/Passive setup. When one fails the other is configured
>> >>> exactly the same and picks up where the other left off. (Yes, all
>> >>> the sessions etc. are actively mirrored between the devices)
>> >>>
>> >>> I am using OSPFv2 between the CERs and the Firewalls. Failover
>> >>> works just fine; however, when I fail an OSPF link that has the
>> >>> active default route, ingress traffic still routes fine and dandy,
>> >>> but egress traffic doesn't. OSPF on both Netirons is set up to
>> >>> advertise that they are the default route.
>> >>>
>> >>> What I'm wondering is, if OSPF is the right solution for this.
>> >>> How do others solve this problem?
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Bret
>> >>>
>> >>>
>> >>> Note: Since lately ipv6 has been a hot topic, I'll state that
>> >>> after we get the BGP all figured out and working properly, ipv6 is
>> >>> our next project. :)
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
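
For a rough sense of the numbers traded in this thread (an MTBF of
around 3-4 years for a single path, a spare hot-swapped in within 15
minutes), the usual steady-state availability formula,
MTBF / (MTBF + MTTR), can be sketched as below. This is only an
illustration: the function names are made up for this sketch, and it
assumes every failure is repaired in exactly 15 minutes.

```python
# Rough availability arithmetic for the single-path design discussed above.
# Assumptions (not measurements): MTBF of 3 or 4 years and a 15-minute
# repair time, both taken from the figures quoted in the thread.

HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the path is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(mtbf_hours, mttr_hours):
    """Average downtime per year implied by the availability figure."""
    unavailable_fraction = 1.0 - availability(mtbf_hours, mttr_hours)
    return unavailable_fraction * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    mttr_hours = 0.25  # 15-minute hot swap
    for mtbf_years in (3, 4):
        mtbf_hours = mtbf_years * HOURS_PER_YEAR
        print(
            f"MTBF {mtbf_years} y, MTTR 15 min: "
            f"availability {availability(mtbf_hours, mttr_hours):.6%}, "
            f"average downtime "
            f"{downtime_minutes_per_year(mtbf_hours, mttr_hours):.2f} min/year"
        )
```

With a 3-year MTBF this averages out to about 5 minutes of downtime per
year, but the average hides that the downtime arrives as one contiguous
15-minute outage, which is the concern for telecom and emergency
services customers raised at the top of this message.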