[88605] in North American Network Operators' Group
NANOG36-NOTES 2006.02.13 talk 7 QoS in MPLS environments
daemon@ATHENA.MIT.EDU (Matthew Petach)
Mon Feb 13 18:19:33 2006
Date: Mon, 13 Feb 2006 15:19:02 -0800
From: Matthew Petach <mpetach@netflight.com>
To: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: owner-nanog@merit.edu
Here are my notes from the MPLS QoS tutorial; wish I could have
been in two places at once to catch the ISPSec BOF as well.
I won't be taking notes at Eddie Deens, though, so it'll be up
to Ren's camera to capture the details for those following along
at home. < http://nanog.multiply.com/ >
Matt
2006.02.13
QoS in MPLS networks tutorial notes.
See notes for Agenda, outline, etc. at
http://www.nanog.org/mtg-0602/sathiamurthi.html
Traffic characterizations go beyond simple DiffServ
bit distinctions
Understand traffic types, sources, and the nature
of the traffic before applying QoS
Latency,
Jitter,
Loss
three traffic parameters to be tracked that influence
choices made when applying QoS
It's all about managing finite resources
rate control, queueing, scheduling, etc.
congestion management, admission control
routing control traffic protection
The QoS Triangle (no, not the Bermuda Triangle)
Identify Traffic Type
Determine QoS parameters
Apply QoS settings
2 approaches to QoS
fine-grained (per-flow) approach
or
aggregation of flows of the same traffic type and
source; they need to have the same characteristics so you
can consider them as an aggregated flow.
Best Effort is simplest QoS
Integrated services (Hard QoS)
Differentiated Services (soft QoS)
Best Effort is simple, traditional internet
Integrated services model, RFC 1633, guarantees per-
flow QoS
strict bandwidth reservations.
RSVP, RFC 2205, PATH/RESV messages
Admission controls
must be configured on every router along path
Works well on small scale. Scaling challenge with large
numbers of flows.
What about aggregating flows into integrated services?
DiffServ arch; RFC 2475
scales well with large flows through aggregation
creates a means for traffic conditioning (TC)
defines per-hop behaviour (PHB)
edge nodes perform TC
keeps core doing forwarding
tough to predict end to end behaviour
esp with multiple domains
how do you handle capacity planning?
Diff services arch slide with pictures of
traffic flow.
The TCA (traffic conditioning agreement) prepares the core for the
traffic flow that will be coming in; allows the core to do per-hop
behaviour.
IETF diffserv model
redefine ToS byte in IP header to differentiated services
code point (DSCP)
uses 6 bits to group traffic into behaviour aggregates.
Class Selector (CS0 through CS 7)
classifier; selects packets based on headers.
Classification and Marking
flows have 5 parameters; IP src, dest, precedence,
DSCP bits,
You can handle traffic metering via adjusting the
three flows.
3 parameters used by the token bucket:
committed information rate,
conform burst size and extended burst size
Policing vs shaping.
policing drops excess traffic; it accommodates bursts, and
anything beyond that gets dropped, or can be re-marked.
Shaping smooths traffic but increases latency.
buffers packets.
policing
uses the token bucket scheme
tokens added to the bucket at the committed rate
depth of the bucket determines the burst size
packets arriving when there are enough tokens in the bucket
are conforming
packets arriving when the bucket is out of tokens are
non-conforming; they are re-coloured, dropped, etc.
diagram of token bucket, very nice.
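
For reference, a minimal sketch of the single-rate token-bucket
policer described above (illustrative only; the class name and
units are my own, not from the slides):

    # Minimal single-rate token-bucket policer sketch (illustrative).
    # Tokens accrue at the committed rate (CIR); the bucket depth
    # (burst size) caps how many tokens can accumulate.
    import time

    class TokenBucketPolicer:
        def __init__(self, cir_bps, burst_bytes):
            self.rate = cir_bps / 8.0        # token fill rate, bytes/sec
            self.depth = burst_bytes         # bucket depth = burst size
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def police(self, pkt_len):
            now = time.monotonic()
            self.tokens = min(self.depth,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= pkt_len:
                self.tokens -= pkt_len
                return "conform"             # forward as-is
            return "exceed"                  # drop, or re-mark/colour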
shaping--uses the token bucket scheme as well
smooths through buffering
queued packets are transmitted as tokens become available.
1st aspect is traffic conditioning at the edge
2nd aspect is per-hop behaviour
PHB relates to resource allocation for a flow
resource allocation is typically bandwidth
queueing / scheduling mechanisms
FIFO/WFQ/MWRR (weighted)/MDRR (deficit)
congestion avoidance
RED (random early detection) / WRED (weighted random early detection)
Queueing/scheduling
needs some data mining to decide how to prioritize certain
classes of traffic.
de-queueing depends on weights assigned to different flows.
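
As a rough illustration of the weighted de-queueing just described,
a toy deficit-round-robin loop in Python (the quanta and packet
sizes are made-up examples, not vendor behaviour):

    # Toy deficit-round-robin scheduler: each class gets a quantum
    # (its weight) per round and may send packets up to its running
    # deficit; an empty queue forfeits its credit.
    from collections import deque

    def drr(queues, quanta, rounds):
        deficits = [0] * len(queues)
        sent = []                            # (class index, packet size)
        for _ in range(rounds):
            for i, q in enumerate(queues):
                if not q:
                    deficits[i] = 0
                    continue
                deficits[i] += quanta[i]
                while q and q[0] <= deficits[i]:
                    pkt = q.popleft()
                    deficits[i] -= pkt
                    sent.append((i, pkt))
        return sent

    # e.g. voice (small packets, big quantum) vs best effort:
    # drr([deque([200] * 5), deque([1500] * 5)], quanta=[1500, 500], rounds=10)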
Congestion avoidance technique
when there is congestion what should happen?
tail drop (hit max queue length)
drop selectively based on IP Prec/DSCP bits
Congestion control for TCP
adaptive
dominant transport protocol
Slide showing the problem of congestion; without a technique, you
have uncontrolled congestion, with a big performance impact
due to retransmissions.
TCP traffic and congestion
congestion vs slow-start
sender/receiver negotiate on it.
source throttles back traffic.
(control leverages this behaviour)
Global synchronization happens when many flows pass through
a congested link; each flow going through starts following
the same backoff and ramp up, leads to sawtooth curves.
RED
a congestion avoidance mechanism
works with TCP
uses packet drop probability and avg queue size
avoids global synchronization of many flows.
minimizes packet delay jitter by managing queue size
RED has minimum and maximum threshold; average queue
size is used to avoid dealing with transient bursts.
WRED combines RED with IP precedence or DSCP to
implement multiple service classes
each service class has its own min and max threshold and
drop rate.
nice slides of lower and higher thresholds for different
traffic types.
When is WRED used? Only when TCP is the bulk of the traffic.
It won't help UDP or other non-TCP IP traffic.
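
A rough sketch of the per-class WRED drop decision described above
(the thresholds and drop probabilities below are arbitrary
examples, not recommended values):

    # WRED sketch: each class has its own min/max thresholds and max
    # drop probability; the average (EWMA) queue depth drives the
    # drop decision, so transient bursts are tolerated.
    import random

    WRED_PROFILES = {                        # example values only
        "EF":  {"min": 40, "max": 50, "max_p": 0.02},
        "AF1": {"min": 20, "max": 40, "max_p": 0.10},
        "BE":  {"min": 10, "max": 30, "max_p": 0.20},
    }

    def wred_drop(avg_qlen, klass):
        p = WRED_PROFILES[klass]
        if avg_qlen < p["min"]:
            return False                     # below min threshold: enqueue
        if avg_qlen >= p["max"]:
            return True                      # above max threshold: drop
        frac = (avg_qlen - p["min"]) / (p["max"] - p["min"])
        return random.random() < frac * p["max_p"]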
MPLS and QoS, into DiffServ
avoid vendor CLI as much as possible for the talk.
stick with techniques only.
do classification and marking at the edge, then apply per-hop
behaviour (when to queue or drop packets) within
the core.
Within the MPLS domain, do you lose all the nice
classification information?
No, you tunnel information from IP DiffServ into MPLS
DiffServ.
MPLS DiffServ
doesn't introduce new QoS architecture
uses diffserv as defined for IP QoS (RFC 2475)
MPLS DiffServ is defined in RFC3270
uses MPLS shim header
show slide of diffserv scalability via aggregation
traffic enters at PE router, goes through P core,
comes out PE at other side.
MPLS scalability comes from
aggregation of traffic on the edge
processing of aggregate only in the core
deal with buckets only, thus can scale well.
the PE router has to put 2 labels on; next router
What's unchanged in MPLS diffserv?
traffic conditioning agreements
same classification, marking, shaping, policing still
happen at the edge
buffer management and packet scheduling mechanisms
used to implement PHB
PHB definitions
EF: low delay/jitter/loss
AF: low loss
BE: no guarantees (best effort)
what's NEW in MPLS diffserv?
Prec/DSCP field not visible to MPLS LSRs
info on diffserv must be made visible to LSR in
MPLS header using EXP field/label
how is DSCP mapped into EXP--some interaction between them.
EXP is 3 bits, S is 1 bit.
Typical mapping
Expedited forwarding: EF DSCP 6 bits to 3 bits of EXP bits.
101000 maps to 101
but then you lose bits of information.
IP DSCP is 6 bits while MPLS EXP = 3 bits (RFC 3270)
if 8 or fewer PHBs are used, map DSCP to EXP directly,
with E-LSPs with preconfigured mappings
If more than 8 PHBs are used, they need to be mapped into the label
and EXP; L-LSPs are needed
Both E-LSP and L-LSP can use LDP or RSVP for label
distribution.
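
To make the lossy 6-bit-to-3-bit mapping concrete, a tiny sketch
assuming the common convention of copying the top three DSCP bits
into EXP (the slides may have shown a different table):

    # Assumed default E-LSP mapping: keep only the three most
    # significant DSCP bits (the class-selector bits) as the EXP value.
    def dscp_to_exp(dscp):
        assert 0 <= dscp <= 0b111111         # DSCP is 6 bits
        return dscp >> 3                     # 3-bit EXP; dropped bits lose detail

    # EF (101110) -> 101, AF41 (100010) -> 100, BE (000000) -> 000
    assert dscp_to_exp(0b101110) == 0b101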
MPLS: flows associated with FEC mapped to one label
DS: flows associated with class, mappable to EXP
MPLS diffserv tunneling modes
Based on RFC 3270
Modes
uniform
short-pipe
pipe
how do you implement the modes? depends on your
engineering decisions.
uniform mode
assume the entire admin domain of the SP is under a single
diffserv domain
there's then a requirement to keep the colouring info the same
(uniform) when going from IP to MPLS, MPLS to MPLS, and back
to IP again.
in both the MPLS-to-MPLS and MPLS-to-IP cases, the PHB of the
topmost popped label is copied into the new top label,
or into the IP DSCP if no label remains.
Short pipe mode
assume an ISP network implementing a diffserv model
assumes customers implement a different policy.
note that the policy applied outbound on the egress
interface is based on the DSCP of the customer, hence the
short-pipe naming.
Pipe mode
same as short-pipe,
however, the SP wants to drive the outbound policy:
the PHB of the topmost popped label is copied to the new
top label
classification is based on the mpls-exp field (EXP=0) of the
topmost received MPLS frame
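
A loose sketch of how the three modes differ at label pop, as I
understood it (simplified, not a full RFC 3270 implementation;
field names are my own):

    # Behaviour when the topmost label is popped, per tunneling mode.
    def on_pop(mode, popped_exp, exposed):
        # exposed: the newly exposed header, e.g. {"kind": "mpls", "exp": 0}
        # or {"kind": "ip", "dscp": 46}
        if mode == "uniform":
            # SP marking propagates down into the exposed EXP or DSCP
            if exposed["kind"] == "mpls":
                exposed["exp"] = popped_exp
            else:
                exposed["dscp"] = popped_exp << 3   # assumed simple re-mapping
        elif mode == "pipe":
            # customer marking untouched; egress QoS driven by SP marking
            exposed["egress_phb"] = popped_exp
        elif mode == "short-pipe":
            # customer marking untouched; egress QoS uses customer DSCP
            exposed["egress_phb"] = exposed.get("dscp", 0) >> 3
        return exposed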
MPLS TE and DiffServ
is diffserv good enough to determine end to end quality
of service? nope.
what happens if there's no congestion, but a link
fails?
when link fails, and reroute happens across a new
link; the new link gets congested due to combined
traffic.
You may need to engineer your traffic onto a non-optimal
path to ensure enough bandwidth will be ready for it.
So you have BW optimization and congestion management
in parallel
TE + DiffServ
spread traffic around with more flexibility than IGP
supports.
MPLS labels can be used to engineer explicit paths
tunnels are uni-directional
How does MPLS TE work?
Explicit routing
constraint-based routing
admission control
protection capabilities
RSVP-TE to establish LSPs
ISIS and OSPF extensions to advertise link attributes
Diffserv aware TE
per-class constraint based routing
per class admission control
so best effort can go on one link, while low-latency
can be shifted along a different link.
Link BW distributed in pools of BW constraints (BC)
up to 8 BW pools
different BW pool models
Maximum Allocation Model (MAM)
Maximum Reservable Bandwidth (MRB)
BC0: 20% Best Effort (admission class 1)
BC1: 50% Premium (admission class 2)
BC2: 30% Voice (admission class 3)
Per class traffic engineering concept; all 3 sum to MRB
If for any reason part of the hard-reserved bandwidth isn't
being used, it's wasted; nobody gets to burst into it.
No sharing of unused capacity. But simple, independent.
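
A back-of-the-envelope sketch of per-class admission under MAM with
the example split above (hypothetical helper, not a router feature):

    # MAM: each class gets a hard slice of the maximum reservable
    # bandwidth (MRB); unused capacity in one pool is never shared.
    MRB = 1000                               # e.g. Mb/s on the link
    MAM_POOLS = {"BE": 0.20 * MRB, "Premium": 0.50 * MRB, "Voice": 0.30 * MRB}
    reserved = {"BE": 0, "Premium": 0, "Voice": 0}

    def mam_admit(klass, bw):
        if reserved[klass] + bw <= MAM_POOLS[klass]:
            reserved[klass] += bw
            return True
        return False                         # rejected even if other pools sit idle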
DS-TE BW Pools--Russian Dolls Model (RDM)
BW pool applies to one or more classes
Global BW pool (BC0) equals MRB
BC0...BCn used for computing unreserved BW for
class n
so BC0: MRB (best effort + premium + voice)
BC1: 50% premium + voice
BC2: 30% Voice
Downside is higher bandwidth class may push out some
lower traffic that was flowing already.
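
And the equivalent sketch for RDM's nested pools, reusing the
example numbers from the notes (again hypothetical):

    # RDM: nested ("Russian doll") constraints. BC0 covers everything,
    # BC1 covers premium + voice, BC2 covers voice only; a reservation
    # in class n must fit within BCn and every enclosing pool up to BC0.
    MRB = 1000
    RDM_BC = [MRB, 0.50 * MRB, 0.30 * MRB]   # BC0 >= BC1 >= BC2
    CLASS_OF = {"BE": 0, "Premium": 1, "Voice": 2}
    rdm_reserved = {"BE": 0, "Premium": 0, "Voice": 0}

    def rdm_admit(klass, bw):
        n = CLASS_OF[klass]
        for level in range(n + 1):           # check BC0 .. BCn
            in_pool = sum(r for k, r in rdm_reserved.items() if CLASS_OF[k] >= level)
            if in_pool + bw > RDM_BC[level]:
                return False
        rdm_reserved[klass] += bw
        return True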
Aggregate TE in diffserv network
DS TE and QoS
DiffServ-TE doesn't remove the need to configure
PHB QoS along the TE path; DiffServ TE operates in conjunction
with QoS mechanisms.
Traffic engineering is a huge field; so it's hard to cover
in a short period of time.
Summary:
QoS techniques
effective allocation of network resources
IP DiffServ
Service Differentiation
good starting point, but doesn't scale that well
MPLS and DiffServ
Builds scalable networks for service providers
DiffServ Tunnelling modes
Scalable and flexible QoS options
Supports Draft Tunneling Mode RFC
DiffServ TE
provides strict point-to-point guarantees
pipe models are your choice, how do you want to
architect your network? What are _your_ traffic
needs?
When you need to drop traffic, determine how you'll
drop traffic based on DSCP bits so you can set
watermarks on the traffic; some traffic more
lenient about drops, other traffic not so lenient
about drops.
Question: Fred W. from Bechtel.
With IPv6, there's a 20-bit flow label, rather than
the 8-bit agony of mapping the v4 DSCP bits; does that
give more flexibility, more choices, are there fewer
headaches associated with v6 QoS handling?
Short answer--the presenters aren't as focused on v6
development, so they don't have a concrete answer to
give there, sorry.
That wraps up the presentation/tutorial at 1715 hours
pacific time.