[142356] in North American Network Operators' Group


Re: BGP Design question.

daemon@ATHENA.MIT.EDU (Bret Palsson)
Thu Jun 23 02:08:25 2011

From: Bret Palsson <bret@getjive.com>
In-Reply-To: <5.1.0.14.2.20110623085011.036d3c08@efes.iucc.ac.il>
Date: Thu, 23 Jun 2011 00:07:00 -0600
To: Hank Nussbacher <hank@efes.iucc.ac.il>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

That's fine if you are running a website. When it comes to
telecommunications, a 15-minute outage is pretty huge, especially with
certain types of customers: emergency services, for example.

-Bret
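Bret's objection can be put in rough numbers using the figures from Hank's message quoted below: an MTBF of about 3-4 years for a single-path design and a 15-minute swap to a pre-loaded spare. This sketch applies the standard steady-state availability formula A = MTBF / (MTBF + MTTR). It assumes the 15-minute swap is the entire repair time, which is optimistic because noticing the failure and getting to the box are ignored.

```python
# Steady-state availability A = MTBF / (MTBF + MTTR), using the figures
# from Hank's message. Assumption: the 15-minute swap is the whole repair
# time (no detection or travel time is included).

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the path is up, given mean time between failures
    and mean time to repair (both in hours)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_minutes_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected outage minutes per year: (1 - A) * minutes in a year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    mttr = 15 / 60  # 15-minute hot swap, expressed in hours
    for years in (3, 4):
        mtbf = years * HOURS_PER_YEAR
        a = availability(mtbf, mttr)
        down = expected_downtime_minutes_per_year(mtbf, mttr)
        print(f"MTBF {years} y: availability {a:.6%}, ~{down:.1f} min/year down")
```

Under those assumptions the average is only a few minutes of downtime per year, but each individual failure is still a full 15-minute outage, which is the point Bret is making about emergency-services customers.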

On Jun 23, 2011, at 12:02 AM, Hank Nussbacher wrote:

> At 20:42 22/06/2011 -0700, Jason Roysdon wrote:
>
> Let me be a bit of a heretic here.  How often does your router fail?
> Or your firewall?  In 25 years of visiting customers, I have found that
> when they used a cross setup like the one proposed below by Bret and
> Jason, only one person truly knew the complete setup, and if something
> broke, only he was able to fix it.  There is never complete printed
> documentation: routing design, IPs on all interfaces, subnetting
> schematic, etc.  And if there was at one point, after 2 years it was
> outdated and never updated, and only the *1* guy knew the changes in
> his head.
>
> In that kind of situation, when something stopped working they always
> had to call in the "guru" to fix it.  On the other hand, a simple design
> with only *one* path (pick either the left or right side of each of the
> ASCII arts) meant that even junior network engineers, as well as
> technicians called in for an emergency on 4 hours' notice, were able to
> fix the situation much more quickly than with the "cross" design.  And
> the MTBF of a single-path solution, IMHO, is around 3-4 years.  If you
> need redundancy, keep a spare box on the shelf, fully loaded with the
> latest config, so that it can be hot-swapped in within 15 minutes of a
> failure.
>
> This 1-path design is not for everyone.  The vendors always recommend
> the "cross" design since they sell twice as many boxes, but I have
> found that life works fine with just a 1-path design as well.
>
> -Hank
>
>
>> I second the static routes, especially from a simplicity standpoint.
>> Add in a pair of layer two switches to simplify further:
>>
>>
>>     +--------+    +--------+
>>     | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
>>     +---+----+    +----+---+    for this scenario.
>>         |eBGP          | eBGP
>>         |              |
>>     +---+----+iBGP+----+---+
>>     | Router +    + Router |  <- Routers. Not directly connected
>>     +-+------+    +------+-+
>>       |                  |
>>     +-+------+    +------+-+
>>     |L2Switch|----|L2Switch|  <- Layer 2 switches, can be stacked
>>     +--------+    +--------+
>>       |                  |
>>     +-+------+    +------+-+
>>     |Act. FW |----|Pas. FW |  <-Firewalls Active/Passive.
>>     +--------+    +--------+
>>
>> You can lose all of the left leg, or all of the right leg, and still
>> be up.  If you want to complicate things, you can add crossing links
>> between it all, but again, beyond BGP and VRRP, this is a very simple
>> design you can easily troubleshoot at 3AM.  It's also much easier to
>> document the troubleshooting steps (so you can go on vacation and
>> someone else can solve problems without calling you) and to test
>> upgrades.
>>
>> You can nearly evenly split the traffic by having a VRRP VIP on each
>> edge router, with the other router backing up the first.  The
>> firewalls can have two static routes, one to each VIP, and this will
>> roughly load-balance the traffic out on a per-packet basis.  As you
>> peer with the same ISP, this will work just fine.  If they have an
>> outage, your edge routers will learn of it, and even if the circuit
>> drops they'll know, and the VIP will simply redirect traffic to the
>> other router.
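Jason's two-VIP arrangement can be modelled in a few lines. In this minimal sketch each VRRP group is owned by the highest-priority live router, and the firewall alternates packets between its two static routes (one per VIP), as Jason describes. The VIP addresses, router names and priority values are hypothetical; real VRRP also breaks priority ties by IP address, and many platforms hash per flow rather than per packet, and neither is modelled here.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Optional, Set


@dataclass
class VrrpGroup:
    vip: str                    # virtual IP a firewall static route points at
    priorities: Dict[str, int]  # router name -> VRRP priority (higher wins)

    def master(self, up_routers: Set[str]) -> Optional[str]:
        """Return the live router with the highest priority, i.e. the one
        answering for the VIP. Priority ties are not modelled."""
        live = {r: p for r, p in self.priorities.items() if r in up_routers}
        if not live:
            return None
        return max(live, key=lambda r: live[r])


def forward(packets: int, groups: List[VrrpGroup], up_routers: Set[str]) -> Dict[str, int]:
    """Alternate packets across the static routes (one per VIP), per packet,
    and count how many reach each physical router."""
    counts: Dict[str, int] = {}
    next_hop = cycle(groups)
    for _ in range(packets):
        owner = next(next_hop).master(up_routers)
        if owner is None:
            continue  # no live router holds this VIP: the packet is dropped
        counts[owner] = counts.get(owner, 0) + 1
    return counts


# Hypothetical addressing: R1 normally owns .1, R2 normally owns .2,
# and each router backs up the other's VIP.
GROUPS = [
    VrrpGroup("10.0.0.1", {"R1": 200, "R2": 100}),
    VrrpGroup("10.0.0.2", {"R1": 100, "R2": 200}),
]

if __name__ == "__main__":
    print(forward(1000, GROUPS, {"R1", "R2"}))  # both legs up
    print(forward(1000, GROUPS, {"R2"}))        # left leg lost
```

With both routers up, the 1,000 packets split 500/500; with R1 down, R2 answers for both VIPs and receives all of them, so the firewall's static routes never need to change.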
>>
>> Now all your firewalls have to do is maintain stateful session
>> information, not OSPF.
>>
>> If you had two different ISPs (especially if they are not roughly
>> evenly connected), then not having knowledge of the BGP paths in your
>> firewalls can cause an extra hop when traffic hits the router with the
>> longer path, which will redirect it to the router with the shorter
>> path.
>>
>> Speaking from a Cisco/HSRP point of view, you could be more
>> intelligent (read: more complicated, and complication means harder
>> troubleshooting and more documentation needed) during problem periods
>> by having the VIP move routers automatically based on the WAN link
>> dropping and/or a route beyond it being lost (others can comment on
>> whether VRRP supports this).  This would save one hop to the "broken"
>> router when the BGP path or WAN is down.
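The behaviour Jason describes (moving the VIP when the WAN link or a route beyond it disappears) is usually done with interface or object tracking: a tracked failure lowers the active router's priority, and a standby configured to preempt takes over once its own priority is higher. This sketch models only that priority arithmetic. The router names, the tracked-object name `wan-R1`, and the priority and decrement values are hypothetical, and hello timers and priority ties are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Router:
    name: str
    priority: int = 100  # HSRP's default priority is 100
    preempt: bool = False
    # hypothetical tracked object name -> priority decrement when it is down
    tracks: Dict[str, int] = field(default_factory=dict)

    def effective_priority(self, down: Set[str]) -> int:
        """Configured priority minus the decrement of every tracked object that is down."""
        return self.priority - sum(dec for obj, dec in self.tracks.items() if obj in down)


def active_after_change(active: str, routers: Dict[str, Router], down: Set[str]) -> str:
    """The current active router keeps the VIP unless a preempt-enabled
    router now has a strictly higher effective priority."""
    prio = {n: r.effective_priority(down) for n, r in routers.items()}
    challengers = [
        n for n, r in routers.items()
        if n != active and r.preempt and prio[n] > prio[active]
    ]
    return max(challengers, key=lambda n: prio[n]) if challengers else active


ROUTERS = {
    "R1": Router("R1", priority=110, tracks={"wan-R1": 20}),
    "R2": Router("R2", priority=100, preempt=True),
}

if __name__ == "__main__":
    print(active_after_change("R1", ROUTERS, set()))       # WAN up: R1 stays active
    print(active_after_change("R1", ROUTERS, {"wan-R1"}))  # WAN down: 90 < 100, R2 preempts
    no_preempt = {
        "R1": Router("R1", priority=110, tracks={"wan-R1": 20}),
        "R2": Router("R2", priority=100),
    }
    print(active_after_change("R1", no_preempt, {"wan-R1"}))  # R2 cannot take over
```

The last case shows why preemption matters: without it on R2, R1 keeps the VIP even though its WAN link is down, and traffic takes the extra hop Jason mentions.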
>>
>> Jason Roysdon
>>
>> On 06/22/2011 06:07 PM, Bret Palsson wrote:
>> > On Wed, Jun 22, 2011 at 5:33 PM, PC <paul4004@gmail.com> wrote:
>> >
>> >> Who makes the firewall?
>> >>
>> >>
>> > Juniper SSG. We use NSRP and replicate all the RTOs. We have hitless
>> > failover on the firewalls, and have had it for years. We're now
>> > peering with our own carriers vs. using our datacenter's mix.
>> >
>> > A static route from the Junipers to the VIP (VRRP) is probably the
>> > way to go. I think.
>> >
>> >> To make this work and be "hitless", your firewall vendor must
>> >> support stateful replication of routing protocol data (including
>> >> OSPF).  For example, Cisco didn't support this in their ASA product
>> >> until version 8.4 of code.
>> >>
>> >> Otherwise, a failover requires OSPF to re-converge -- and quite
>> >> frankly, it will likely cause some confusion on the upstream OSPF
>> >> peers, loss of adjacency, and a loss of routing until this occurs.
>> >> It's like someone just swapped in a router with the same IP in front
>> >> of the upstream device -- assuming your active/standby vendor's
>> >> implementation only presents itself as one device.
>> >>
>> >> However, once this is successful, your current failover topology
>> >> should work fine -- even if it takes some time to fail over.
>> >>
>> >> In my opinion, though, unless the firewall is serving as "transit"
>> >> to downstream routers or other layer 3 elements, and you need to run
>> >> OSPF to it (and through it) as a result, it's often just easier to
>> >> point a static default route out from the firewall(s) and
>> >> redistribute a static route on the upstream routers for the subnets
>> >> behind the firewalls.  It also helps ensure symmetrical traffic
>> >> flows, which is important for stateful firewalls and can become
>> >> moderately confusing when your firewalls start having many
>> >> interfaces.
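PC's suggestion leaves the firewall with a very small routing table: its connected inside subnets plus a single static default toward the routers (for example the VRRP VIP mentioned earlier in the thread), while the routers carry a static route for the inside subnets that is redistributed upstream. This sketch shows only the firewall's side, as a longest-prefix-match lookup using Python's standard `ipaddress` module. The prefixes (documentation ranges) and next-hop labels are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical firewall routing table: two connected inside subnets plus
# one static default toward the routers' VRRP VIP.
ROUTES: List[Tuple[str, str]] = [
    ("192.0.2.0/25", "inside-1 (connected)"),
    ("192.0.2.128/25", "inside-2 (connected)"),
    ("0.0.0.0/0", "10.0.0.1 (VRRP VIP)"),
]


def lookup(dst: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    best = None  # (network, next_hop) of the most specific match so far
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


if __name__ == "__main__":
    for dst in ("192.0.2.10", "192.0.2.200", "198.51.100.7"):
        print(dst, "->", lookup(dst))
```

Anything not inside a connected subnet falls through to the /0 route, so the firewall needs no OSPF state at all; failover between the routers is handled by the VIP rather than by the firewall's routing table.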
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Jun 22, 2011 at 4:27 PM, Bret Palsson <bret@getjive.com> wrote:
>> >>
>> >>> Here is my current setup in ASCII art. (Please view in a fixed-width
>> >>> font.) Below the art I'll write out the setup.
>> >>>
>> >>>
>> >>>     +--------+    +--------+
>> >>>     | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
>> >>>     +---+----+    +----+---+    for this scenario.
>> >>>         |eBGP          | eBGP
>> >>>         |              |
>> >>>     +---+----+iBGP+----+---+
>> >>>     | Router +----+ Router |  <-Netiron CERs Routers.
>> >>>     +-+------+    +------+-+
>> >>>       |A   `.P    A.'    |P   <-A/P indicates Active/Passive
>> >>>       |      `.  .'      |      link.
>> >>>       |        ::        |
>> >>>     +-+------+'  `+------+-+
>> >>>     |Act. FW |    |Pas. FW |  <-Firewalls Active/Passive.
>> >>>     +--------+    +--------+
>> >>>
>> >>>
>> >>> To keep this scenario simple, I'm multihoming to one carrier.
>> >>> I have two Netiron CERs. Each has an eBGP connection to the same
>> >>> peer. The CERs have an iBGP connection to each other.
>> >>> That all works fine and dandy. However, feel free to comment if you
>> >>> think there is a better way to do this.
>> >>>
>> >>> Here comes the tricky part. I have two firewalls in an
>> >>> Active/Passive setup. When one fails, the other is configured
>> >>> exactly the same and picks up where the other left off. (Yes, all
>> >>> the sessions etc. are actively mirrored between the devices.)
>> >>>
>> >>> I am using OSPFv2 between the CERs and the firewalls. Failover
>> >>> works just fine; however, when I fail an OSPF link that has the
>> >>> active default route, ingress traffic still routes fine and dandy,
>> >>> but egress traffic doesn't. Both Netirons are set up to advertise a
>> >>> default route into OSPF.
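One factor worth ruling out in a setup like this is how quickly the firewall stops using a default route learned from a router that has failed. If the firewall sees its own interface go down, the adjacency typically drops at once, but if the failure is silent (the link stays up while the neighbor stops responding), the neighbor is only declared down once the OSPF dead interval expires, commonly 40 seconds with the usual 10-second hello on broadcast links. This sketch models only that timing; it is not a diagnosis of the egress problem described above, and the neighbor names and metrics are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

HELLO_INTERVAL = 10.0               # seconds, a common default on broadcast links
DEAD_INTERVAL = 4 * HELLO_INTERVAL  # 40 s: neighbor declared down after this much silence


@dataclass
class LearnedDefault:
    neighbor: str      # router originating 0.0.0.0/0 (hypothetical name)
    metric: int        # cost the firewall sees for this default
    last_hello: float  # time (s) the most recent hello arrived from this neighbor


def best_default(routes: List[LearnedDefault], now: float) -> Optional[str]:
    """Lowest-metric default among neighbors that have not timed out.
    A silently failed neighbor keeps attracting traffic until DEAD_INTERVAL passes."""
    alive = [r for r in routes if now - r.last_hello < DEAD_INTERVAL]
    if not alive:
        return None
    return min(alive, key=lambda r: r.metric).neighbor


if __name__ == "__main__":
    # CER-A fails silently at t=0; CER-B keeps sending hellos every 10 s.
    for now in (5.0, 30.0, 39.0, 41.0):
        routes = [
            LearnedDefault("CER-A", metric=1, last_hello=0.0),
            LearnedDefault("CER-B", metric=10, last_hello=now - now % HELLO_INTERVAL),
        ]
        print(f"t={now:>4}s  egress via {best_default(routes, now)}")
```

Between t=0 and t=40 the sketch keeps choosing CER-A even though it has stopped responding, which is the kind of window in which egress traffic is lost while a different path still carries ingress traffic.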
>> >>>
>> >>> What I'm wondering is whether OSPF is the right solution for this.
>> >>> How do others solve this problem?
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Bret
>> >>>
>> >>>
>> >>> Note: Since IPv6 has been a hot topic lately, I'll state that after
>> >>> we get the BGP all figured out and working properly, IPv6 is our
>> >>> next project. :)
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
