blog.rob.sh
CV Update
I am actively interested in new opportunities due to changing circumstances in my present role, and have therefore uploaded an up-to-date curriculum vitae to this site.
Tagged in: Work, Me
I've had a couple of mails relating to this PSN, which again references the research that Andy Davidson, Jonathan Oddy and I did last year. It seems that some of the sources of the initial mailing list posts we made are gone (particularly the merit.edu one that is referenced from both Juniper's site and most other places). For that reason, I've included both the mails that we sent to NANOG/C-NSP/J-NSP last year here.


Date: Fri, 16 Jan 2009 12:57:19 +0000
From: Rob Shakir 
To: cisco-nsp@puck.nether.net, nanog@nanog.org
Subject: BGP Session Teardown due to AS_CONFED_SEQUENCE in AS4_PATH
Message-ID: <20090116125718.GB26415@bronze.eng.gxn.net>


Strict RFC 4893 (4-byte ASN support) BGP4 implementations are vulnerable to a
session reset by distant (not directly connected) ASes. This vulnerability is a
feature of the standard, and unless immediate action is taken an increasingly
significant number of networks will be open to attack. Accidental triggering of
this vulnerability has already been seen in the wild, although the limited
number of RFC 4893 deployments has limited its effect.  

Summary:
It is possible to cause BGP sessions to remotely reset by injecting invalid data
into the AS4_PATH attribute provided to store 4-byte ASN paths. Since AS4_PATH
is an optional transitive attribute, the invalid data will be transited through
many intermediate ASes which will not examine the content. To be vulnerable, an
operator does not have to be actively using 4-byte AS support. This problem was
first reported by Andy Davidson on NANOG in December 2008 [0]; furthermore, we
have been able to demonstrate that a device running Cisco IOS release
12.0(32)S12 behaves as per this description.

Details:

When a prefix is learnt from a BGP neighbour that does not support 4-byte ASNs,
the AS4_PATH attribute is retained, and appended to UPDATE messages sent to
other neighbours [1, 3]. RFC4893 specifies that AS_CONFED_SEQUENCE and
AS_CONFED_SET are invalid in an AS4_PATH, the intention of which is to ensure
that an AS with a mix of AS4-aware BGP speakers, and AS4-unaware BGP speakers
does not propagate confederation AS paths outside of the confederation [1, 3].
Upon receiving an invalid BGP UPDATE message, a BGP speaker must send a
NOTIFICATION message [2, 6.3], after a NOTIFICATION message, the BGP connection
is closed [2, 4.5].

Analysis of the Reported Path:   

On 10th December 2008, a BGP update was propagated with illegal/invalid
confederation attributes in the AS4_PATH.  When this update was received by AS4
aware BGP speakers, the RFCs described above were interpreted literally and the
session was torn down. Because the illegal attributes were learned on a transit
session, an affected network can have global reachability impaired.

Please note that the analysis of this path describes what we expect to have
happened in this case, it has not been confirmed by any of the ASNs involved.

91.207.218.0/23 
	Path Attributes - Origin: Incomplete 
	Flags: 0x40 (Well-known, Transitive, Complete) 
	Origin: Incomplete (2) 
	AS_PATH: xx xx 35320 23456 (13 bytes) 
	AS4_PATH: (65044 65057) 196629 (7 bytes) 

In this data, the AS_PATH indicates that a prefix is announced by an AS4 speaker
(as indicated by AS23456) and propagated through by AS35320. The AS4_PATH data
shows that the AS4 originator is AS196629, the rest of this path is an
AS_CONFED_SEQUENCE [3, 5]. It would appear that in this case, AS196629 peers
with AS35320, which is AS4-aware on this border. The prefix is then propagated
through AS35320, with the AS4 aware routers appending their ASN to the
AS_CONFED_SEQUENCE. This is in contravention of RFC 4893 [1, 3]. The border
which announces this route to AS35320's upstream does not appear to be
AS4-aware. During normal announcements, the BGP speaker on a border with an
upstream ASN that is not part of the confederation will remove the left-most
AS_CONFED_SETs or AS_CONFED_SEQUENCEs that exist in the AS_PATH [3, 6.1] and
replace them with the confederation identifier. However, due to the fact that
both AS_CONFED_SET and AS_CONFED_SEQUENCE are invalid in an AS4_PATH, then no
such action is taken on the border between an AS4 aware AS, and a non-AS4 aware
AS. In addition, since the AS35320 border is not AS4 aware, then it does not
update the AS4_PATH.
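
The border behaviour described above can be illustrated with a short Python sketch. This is a simplified model, not any vendor's implementation: the function name, the tuple representation of path segments, and the assumed layout of the AS_PATH inside the confederation are all illustrative assumptions. It shows the leading confederation segments being stripped from the AS_PATH while the AS4_PATH is passed on untouched by an AS4-unaware border.

```python
CONFED_SEGMENTS = {"AS_CONFED_SEQUENCE", "AS_CONFED_SET"}

def advertise_outside_confederation(as_path, confed_id):
    """Strip leading confederation segments and prepend the confederation identifier.

    `as_path` is a list of (segment_type, [asns]) tuples, left-most segment first.
    """
    segments = [(kind, list(asns)) for kind, asns in as_path]
    # Remove the left-most AS_CONFED_SEQUENCE / AS_CONFED_SET segments.
    while segments and segments[0][0] in CONFED_SEGMENTS:
        segments.pop(0)
    # Prepend the confederation identifier to the remaining path.
    if segments and segments[0][0] == "AS_SEQUENCE":
        segments[0][1].insert(0, confed_id)
    else:
        segments.insert(0, ("AS_SEQUENCE", [confed_id]))
    return segments

# ASNs taken from the reported path; the segment layout is an assumption.
as_path = [("AS_CONFED_SEQUENCE", [65044, 65057]), ("AS_SEQUENCE", [23456])]
as4_path = [("AS_CONFED_SEQUENCE", [65044, 65057]), ("AS_SEQUENCE", [196629])]

sent_as_path = advertise_outside_confederation(as_path, 35320)
sent_as4_path = as4_path  # an AS4-unaware border forwards the attribute unexamined

print(sent_as_path)
print(sent_as4_path)
```

Under these assumptions the AS_PATH leaves the confederation clean, while the AS4_PATH still carries the confederation segment that RFC 4893 forbids.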

This malformed UPDATE is then sent to AS35320's upstream, if there are no
AS4-aware routers in the path between the AS35320 border, and an AS receiving
this update, the AS4_PATH will not have been analysed. The first AS4-aware
router to receive this update will reset the session towards the neighbour from
whom it receives the update. 

The border which announces this route to AS35320's upstream does not appear to
be AS4-aware; if it were a strict AS4 implementation it would reset the BGP
session due to the malformed AS4_PATH, and a broken implementation that treats
AS4_PATH as an equivalent of the AS_PATH would sanitise the AS4_PATH. This
allows the AS4_PATH containing an AS_CONFED_SET to be passed to neighbouring
networks.

This escape of an AS_CONFED_SET from a network with only partial AS4 support is
exactly the situation that RFC 4893 attempts to avoid by forbidding the presence
of an AS_CONFED_SET in the AS4_PATH. In the ideal world the neighbouring network
receiving an UPDATE containing this obviously malformed AS4_PATH would reset the
session, preventing further propagation and isolating the broken network.

Unfortunately the vast majority of networks do not support AS4 so pass on this
malformed AS4_PATH to their neighbours. The first AS4-aware router to receive
this update will reset the session towards the neighbour from whom it received
the update.

Cisco IOS Behaviour:

In a lab environment, a Cisco 7200 running IOS 12.0(32)S12, which is able to
support 4-byte ASNs, was peered with a Cisco 2811 running 12.4(19). When the BGP
session to the upstream 2811 is established by the 7200, the following log
messages are observed:

*Jan 16 11:29:58.531: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Up 
*Jan 16 11:30:02.595: %BGP-6-ASPATH: Invalid AS path (65044 65048 65062) 3.21 23456 received from 193.239.32.2: Confederation found in AS4_PATH
*Jan 16 11:30:02.595: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Down BGP Notification sent
*Jan 16 11:30:02.595: %BGP-3-NOTIFICATION: sent to neighbor 193.239.32.2 3/1 (update malformed) 27
 bytes E0111803 030000FE 140000FE 180000FE 26 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0050 0200 0000
 3540 0101 0240 020C 0205 3D25 2114 89F8 5BA0 5BA0 4003 04C1 EF20 02E0 1118 0303 0000 FE14 0000 
FE18 0000 FE26 0202 0003 0015 0000 5BA0 175B CFDA

The configuration on the 7200 is as follows:

router bgp 65123
no synchronization
bgp log-neighbor-changes
neighbor 193.239.32.2 remote-as 15653
no auto-summary

The BGP session will continue to be reset each time the invalid AS4_PATH is
received.
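
As a reading aid for the NOTIFICATION data in the log above, the following Python sketch decodes the AS4_PATH attribute value taken from that hex dump (the 24 bytes following the flags `E0`, type `11` and length `18`). The segment type codes 1 and 2 are those of RFC 4271, and 3 and 4 are those of the confederation specification; the function names are illustrative only.

```python
import struct

# Path segment type codes: 1-2 from RFC 4271, 3-4 from the confederation RFC.
SEGMENT_TYPES = {
    1: "AS_SET",
    2: "AS_SEQUENCE",
    3: "AS_CONFED_SEQUENCE",
    4: "AS_CONFED_SET",
}

def parse_as4_path(value: bytes):
    """Split an AS4_PATH attribute value into (segment type, [4-byte ASNs]) tuples."""
    segments = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        end = offset + 4 * count
        if end > len(value):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", value[offset:end]))
        segments.append((SEGMENT_TYPES.get(seg_type, f"UNKNOWN({seg_type})"), asns))
        offset = end
    return segments

def has_confed_segment(segments):
    """True if any segment is a confederation segment, which RFC 4893 forbids here."""
    return any(kind.startswith("AS_CONFED") for kind, _ in segments)

# Attribute value bytes copied from the NOTIFICATION data in the log.
value = bytes.fromhex("0303" "0000FE14" "0000FE18" "0000FE26" "0202" "00030015" "00005BA0")
segments = parse_as4_path(value)
print(segments)
print(has_confed_segment(segments))
```

The decoded segments correspond to the path `(65044 65048 65062) 3.21 23456` reported in the log message, with 3.21 being the asdot form of 196629.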

Possible Impact:

During a BGP conversation, it is expected that a neighbour's UPDATE messages are
sanitised by the immediate neighbour; during a 'normal' BGP conversation, if a
BGP speaker receives an invalid UPDATE, it will tear down the session, and this
invalid UPDATE will not propagate any further. In the case of optional
transitive attributes such as AS4_PATH, this invalid update can be transited
through many ASes, as the content of the invalid attribute in the UPDATE message
is not examined.

In a hypothetical scenario, an AS4 aware service provider (A) has a transit
provider (T) that is not AS4 aware. BGP speaker B, a large distance from A has a
bug affecting their equipment that introduces an AS_CONFED_SET in the AS4_PATH.
Since B's updates are propagated through to A via T, A will tear down the
session to T due to the malformed attribute. This is an out of proportion
reaction as the update may affect only one prefix in a full BGP table. If this
update is also propagated through A's other transit providers A may lose
full-table visibility until one of their transit providers filters the route.
Examining the UPDATE message to establish which route caused session teardown
may be a non-trivial activity.
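
The hypothetical scenario above can be modelled with a small Python sketch. It is a deliberately simplified model under stated assumptions: each AS is reduced to a single speaker with an `as4_aware` flag, AS4-aware speakers are assumed to follow RFC 4893 strictly, and the intermediate AS `X` is invented purely to put some distance between B and T.

```python
from dataclasses import dataclass

@dataclass
class Speaker:
    name: str
    as4_aware: bool

def first_reset(path, as4_path_valid):
    """Return (receiver, sender) for the first session reset along `path`, or None.

    `path` lists the speakers in propagation order, originator first. AS4-unaware
    speakers forward the optional transitive attribute without examining it.
    """
    if as4_path_valid:
        return None
    for sender, receiver in zip(path, path[1:]):
        if receiver.as4_aware:
            # A strict implementation sends a NOTIFICATION and closes the session.
            return (receiver.name, sender.name)
    return None

path = [
    Speaker("B", as4_aware=True),   # distant originator with the faulty equipment
    Speaker("X", as4_aware=False),  # hypothetical intermediate AS
    Speaker("T", as4_aware=False),  # A's transit provider
    Speaker("A", as4_aware=True),   # AS4-aware service provider
]

print(first_reset(path, as4_path_valid=False))
```

In this model the session that is reset is the one between A and T, even though neither of them introduced the invalid data.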


Conclusion:

Whilst this description may be applied to invalid data in any optional
transitive element, it has a greater impact with AS4_PATH due to the large
number of BGP speakers that currently do not examine any 4-byte ASN data in an
UPDATE. There has been a discussion of this matter on the IETF IDR mailing list
[4], however, due to availability of Cisco IOS containing AS4 support
(12.0(32)S12), and an observation of this problem 'in the wild', we believe that
it is of operational concern to those that are planning on deployment of
AS4-aware platforms [5].

Any input from the operational community relating to this problem is much
appreciated, either publicly, or privately.

Regards,
	Andy Davidson, NetSumo (andy.davidson@netsumo.com),
	Jonathan Oddy, Hostway UK (jonathan.oddy@hostway.co.uk),
	Rob Shakir, GX Networks (rjs@eng.gxn.net)

References:
[0]: Andy Davidson - 91.207.218.0/23 prefix in DFZ - AS3.21 / AS196629 -
    announced with AS_CONFED_SEQUENCE in AS4_PATH - propagated by 35320,
    http://markmail.org/message/3ofvjyggayfxezna
[1]: rfc4893: BGP Support for Four-octet AS Number Space
[2]: rfc4271: A Border Gateway Protocol 4 (BGP-4)
[3]: rfc3065: Autonomous System Confederations for BGP
[4]: Kaliraj Vairavakkalai, Juniper Networks, [Idr] RFC-4893 handling malformed
    AS4_PATH attributes,
    http://www.ietf.org/mail-archive/web/idr/current/msg03368.html
[5]: http://as4.cluepon.net/index.php/Software_Support

Thanks to Will Hargrave (LONAP) for assistance with this document.


Date: Wed, 21 Jan 2009 10:14:24 +0000
From: Rob Shakir 
To: nanog@nanog.org
Subject: Re: BGP Session Teardown due to AS_CONFED_SEQUENCE in AS4_PATH
Message-ID: <20090121101424.GB5577@bronze.eng.gxn.net>
References: <20090116125718.GB26415@bronze.eng.gxn.net>

Hi,

Further to the initial research sent to NANOG, after discussions with a number
of operators, we have compiled some recommendations on the handling of invalid
AS4_PATH attributes. 

Any feedback on these recommendations is appreciated:

As discussed on the IETF IDR list last month, there are concerns relating to the
treatment of AS_CONFED_SET/SEQUENCE in AS4_PATH as described in RFC4893 [0].
Since the last post to that thread the situation has been made more urgent with
the release of Cisco IOS 12.0(32)S12, which responds to malformed AS4_PATH
attributes by sending a NOTIFICATION to the neighbour, and tearing down the BGP
adjacency. This behaviour seems to be required by RFC4271 section 6.3, as there
is no alternative error handling defined in RFC4893. As posted last Friday [1],
and discussed on the IDR list, this strict implementation introduces a new
attack vector by which a BGP session can be torn down due to an attribute
populated by a distant BGP neighbour. These malformed attributes have already
been seen in the wild as a result of an error in Juniper's implementation of
RFC4893.

Following discussions with a number of operators, we have attempted to generate
some recommendations relating to the behaviour that would be operationally most
useful when treating the invalid data in the AS4_PATH optional transitive
attribute.

There are two cases to consider when an invalid AS4_PATH is received:
  (1) A path to the prefix is not already known from that neighbour;
  (2) A path to the prefix has already been learnt from that neighbour.

In case (1) we recommend that the BGP speaker should discard the UPDATE and log
the fact. The log entry should include the received AS_PATH and
AS4_PATH to aid in debugging.

In case (2) we recommend that the BGP speaker should treat the UPDATE as a
withdrawal of the existing path to the prefix. As per case (1), a log entry should be
raised to indicate that this has occurred.
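
The two recommended cases can be sketched in Python as follows. This is a minimal illustration rather than a real BGP implementation: the `rib_in` dictionary keyed on (neighbour, prefix), the function name and the returned strings are assumptions made for the example. The important property is that the session itself is left untouched in both cases.

```python
import logging

log = logging.getLogger("bgp")

def handle_invalid_as4_path(rib_in, neighbour, prefix, as_path, as4_path):
    """Apply the recommended handling for an UPDATE carrying an invalid AS4_PATH.

    `rib_in` maps (neighbour, prefix) to the path previously learnt from that
    neighbour. Returns the action taken; the BGP session is never torn down.
    """
    # Log both attributes in either case to aid debugging.
    log.warning(
        "invalid AS4_PATH from %s for %s: AS_PATH=%s AS4_PATH=%s",
        neighbour, prefix, as_path, as4_path,
    )
    if (neighbour, prefix) in rib_in:
        # Case (2): treat the UPDATE as a withdrawal of the existing path.
        del rib_in[(neighbour, prefix)]
        return "withdrawn"
    # Case (1): no path known from this neighbour, so discard the UPDATE.
    return "discarded"

rib_in = {("193.239.32.2", "91.207.218.0/23"): "xx xx 35320 23456"}
first = handle_invalid_as4_path(
    rib_in, "193.239.32.2", "91.207.218.0/23",
    "xx xx 35320 23456", "(65044 65057) 196629",
)
second = handle_invalid_as4_path(
    rib_in, "193.239.32.2", "91.207.218.0/23",
    "xx xx 35320 23456", "(65044 65057) 196629",
)
print(first, second)
```

The first call finds an existing path and withdraws it; the second finds none and simply discards the UPDATE.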

It is quite possible that in both cases this behaviour may result in the BGP
speaker no longer having a valid path to the destination. We foresee that this
lack of a prefix in a BGP speaker's routing table may cause some operational
load initially, however, we feel that this is acceptable, considering the
alternate behaviours.

Should a prefix be injected into the global table with an invalid AS4_PATH, and
should the newly advertised (invalid) path be selected by all upstreams
available to a given ASN then this ASN will lose reachability to the prefix.
Whilst this can be abused we do not see this as more serious than the existing
possibility of malicious injection and blackholing of a prefix by a 3rd party.
As long as the rejection of paths due to invalid AS4_PATHs is clearly reported
to the administrator the source of the problem can be clearly identified. 

We consider that attempting to extract a valid AS4_PATH or AS_PATH from the
invalid UPDATE is a mistake, since this allows the propagation of invalid BGP
data. In addition, incorrect implementation of this comparatively complex
mechanism by a vendor may result in loops. By explicitly not installing prefixes
with invalid AS_PATH or AS4_PATH into the routing table, the possibility of
loops caused by these invalid paths is avoided.

The defined behaviour in RFC4893 and RFC4271 has significantly harmful effects
and it seems only by virtue of the fact that the implementations of many vendors
do not strictly comply with the RFCs that this problem has not had the same
impact for every vendor. At the current time, however, one cannot deploy a
4-byte capable Cisco IOS device, or an OpenBGPD (current stable release) router
into the global table, without risking teardown of every session via which a
global table is learnt.

Further discussion of this issue would be much appreciated, as a common and
consistent approach to rectifying the problem will benefit network operators far
more than individual vendors implementing their own solutions. Should a
consensus be reached, an update to the RFC is required in order to ensure that
future implementations do not exhibit this harmful behaviour.

Kind regards,
       Andy Davidson (NetSumo), andy.davidson@netsumo.com
       Jonathan Oddy (HostWay), jonathan.oddy@hostway.co.uk 
       Rob Shakir (GX Networks), rjs@eng.gxn.net

[0]: http://www.ietf.org/mail-archive/web/idr/current/msg03368.html
[1]: http://www.merit.edu/mail.archives/nanog/msg14345.html

Many thanks to David Freedman (Claranet) for assistance in developing the
recommendations in this document.


In addition to this - it looks like there's some fairly interesting coverage of another Juniper PSN at this blog.
Grupetto Start to the Year
A lovely, albeit slightly chilly, start to 2010 with the Grupetto this morning. About 90km out to Windsor and back to London on one of our normal training routes. Mark also kindly took some photos, a few of which feature me.



Lots more similar rides to come whilst training for Paris-Roubaix and the Tour of Flanders.

Happy New Year!
Tagged in: Cycling
Visualising MPLS-TE Networks

For all network deployments, there is a requirement to present information relating to both topology and various utilisation statistics to a human operator. In many cases, this process has become so ingrained in network requirements that there are almost ubiquitous solutions for visualising the data - for example, link utilisation is almost always presented via some framework or tool powered by RRDTool. Other tools, such as network "weathermap" diagrams that link this utilisation information into an overview of a network topology, are also seen in many NOCs. For most common deployments, the problem of visualising data relating to a flat MPLS or IP network is therefore largely solved.

The problem of data presentation is somewhat altered in the case of an MPLS-TE network. Whilst link utilisation graphs and weathermap diagrams continue to provide useful information, the tunnels overlaid on the network also become of interest to operators. A first-line NOC engineer may not be aware of what is indicated by a certain LSR becoming a midpoint for a large number of tunnels, but this almost certainly identifies some form of failure or congestion that requires a reaction. In addition to identifying events such as TE LSP path changes to particular nodes, it is often useful to retain some insight into the constraints within which CSPF is currently selecting paths for LSPs. Solving these problems requires an additional set of visualisation tools. In some cases, these may be implemented using existing visualisation tools - however, this often results in overly-complex presentations. The intent of this post is to outline some of the challenges, and the interesting methods that appear to be available, for presenting data relating specifically to MPLS-TE networks. A number of beta-quality solutions in use within AS5413 are presented; however, this post does not intend to define a complete solution.

In order to properly consider the available solutions to any visualisation problem, the data to be presented to an operator needs to be defined. Ideally, the information that is required by an operator is likely to include the following:

  1. The path taken by specific TE LSPs throughout a network -- within a straight MPLS or IP network, the set of nodes through which traffic is forwarded is likely to be relatively easy to determine. Since paths within a TE deployment are dynamic according to constraints, rather than following the shortest path, the presentation of path information to a human is of operational benefit.
  2. The resource consumption on any TE-enabled path -- this requirement is likely to include the presentation of resource utilisation over a specific link by reservations associated with a TE LSP. However, it should also include a view of the resources consumed by a specific TE LSP, both in terms of reserved bandwidth and actual traffic forwarded on the interface.
  3. Efficiency of path selection across an MPLS-enabled topology -- in many cases, TE is deployed in order to ensure that service level guarantees are upheld. Since these guarantees are typically made in terms of loss, latency and jitter, the effect of TE path selection on forwarding between specific points in a TE network is of interest to an operator. Where the 'efficiency' of path selection is referred to, this should be interpreted as the proximity of path selection in the network to the optimal path throughout a network (this should be assumed to be the shortest-path as would be selected typically by an IGP).
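
One possible way to put a number on the 'efficiency' described in the third requirement is sketched below in Python. Expressing efficiency as the ratio of the IGP shortest-path cost to the cost of the path actually taken by the LSP is an assumption made for this example, not a definition used by any existing tool; the topology, node names and metrics are likewise hypothetical.

```python
import heapq

def shortest_cost(graph, src, dst):
    """Dijkstra's algorithm: cost of the shortest path from src to dst."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    raise ValueError(f"no path from {src} to {dst}")

def path_cost(graph, path):
    """Sum of the link metrics along an explicit hop-by-hop path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def path_efficiency(graph, lsp_path):
    """1.0 when the LSP follows a shortest path; lower as it deviates from it."""
    best = shortest_cost(graph, lsp_path[0], lsp_path[-1])
    return best / path_cost(graph, lsp_path)

# Hypothetical topology with symmetric IGP metrics.
graph = {
    "PE1": {"P1": 10, "P2": 10},
    "P1": {"PE1": 10, "PE2": 10},
    "P2": {"PE1": 10, "P3": 10},
    "P3": {"P2": 10, "PE2": 10},
    "PE2": {"P1": 10, "P3": 10},
}

print(path_efficiency(graph, ["PE1", "P1", "PE2"]))
print(path_efficiency(graph, ["PE1", "P2", "P3", "PE2"]))
```

A per-LSP figure of this kind could then be graphed over time alongside the loss, latency and jitter measurements.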
In order to address requirement 1 above, there are two types of visualisation tool that are of direct interest to an operator - a view based on a specific node, and a whole path view. For a specific LSP, a per-node view showing the egress interface for each LSP is relatively trivial to produce from the output of a command such as show ip rsvp reservation or similar. An example of this form of diagram (as produced by one of the tools built for AS5413 use) is below.

A significant limitation of producing static images such as the example above is that it is difficult to visualise changes between egress interfaces. The node in the example is a midpoint for a small number of nodes running MPLS TE - during the time at which the example output was gathered, a small number of test PEs had been deployed with automatic bandwidth adjustment on each TE LSP, and hence the number of TE reservations is relatively small. In order to determine egress paths for each LSP on the node, and hence determine the cause of any imbalance across equal-cost paths, a diagram such as this can be useful. However, to properly view the changes in paths over time, a relatively large number of diagrams of this nature need to be examined. It is likely that a better result would be achieved by having some form of dynamically updating graphic representation.

The second type of diagram requires a topological view of the network - it is almost impossible to represent all TE paths for which a specific node is the head-end in a network on a single diagram whilst retaining a clear, easy-to-parse image. Producing per-LSP diagrams is possible, but again produces some difficulty in representing the changes of path for an LSP. Diagrams can quickly become convoluted if all possible paths are included. For example, the diagram below shows all possible NHOP and NNHOP LSRs for a specific node within the AS5413 network topology. Where there is a secondary device via which all nodes can be seen, the topology becomes relatively complex very quickly. The problem of finding a computational method to produce layouts that are clear for human interpretation is relatively complex, with Graphviz being the standard OSS tool utilised for this purpose.
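
As an illustration of how such a NHOP/NNHOP diagram might be generated, the Python sketch below emits Graphviz DOT text for a node, its next-hops and its next-next-hops. It is not the tool used within AS5413: the adjacency data, node names and function name are hypothetical, and the output is plain DOT that would need to be passed to a Graphviz layout program such as `dot` to produce an image.

```python
def nhop_nnhop_dot(adjacency, node, graph_name="nhops"):
    """Return DOT text covering `node`, its NHOPs and its NNHOPs."""
    edges = set()
    for nhop in adjacency[node]:
        edges.add(tuple(sorted((node, nhop))))
        for nnhop in adjacency[nhop]:
            if nnhop != node:
                # Sorting each pair de-duplicates links seen from both ends.
                edges.add(tuple(sorted((nhop, nnhop))))
    lines = [f"graph {graph_name} {{"]
    lines.append(f'  "{node}" [shape=box];')  # highlight the node of interest
    for a, b in sorted(edges):
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical topology: E is three hops from A and so is left out.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

dot_text = nhop_nnhop_dot(adjacency, "A")
print(dot_text)
```

Limiting the graph to two hops from the node of interest keeps the diagram small, at the cost of needing one diagram per node.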

Should there be a requirement to scale a diagram such as the example above to show all possible nodes via which a TE LSP may be routed, and overlay the actual path taken, then a large number of diagrams are likely to be required. In addition, the information presented to a human is unlikely to be easy to parse. Again, a requirement would appear to exist for a dynamic means of displaying information, and the ability to adjust an image to show specific nodes or paths of interest would be advantageous.

The second requirement in the list above is somewhat easier to solve, as it mirrors existing visualisations that are utilised for a straight IP/MPLS network. A weathermap-type topology diagram, where each link is marked according to the reservations made on it, can easily display information relating to the size of reservations on specific interfaces. Additional information can be shown in such a topology diagram by adjusting the display of specific nodes to reflect the number of LSPs carried. Such a solution can be implemented with relatively minor changes to existing tools such as Network Weathermap. It should be noted that this is likely to require human interaction to develop a set of topologies within a network that are of interest to those using such a weathermap - for instance, certain NOC engineering teams may require specific topologies - or specific subsets of nodes. Displaying every PE within a relatively large network on a single diagram is unlikely to provide useful information to an engineering team.
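
The marking of links according to their reservations could be computed along the following lines. This Python sketch is independent of any particular weathermap tool: the colour thresholds, link names and bandwidth figures are hypothetical, and the per-LSP reservations are assumed to have been collected from the network already.

```python
def reservation_ratio(lsp_reservations_kbps, reservable_kbps):
    """Fraction of a link's reservable bandwidth consumed by TE LSP reservations."""
    if reservable_kbps <= 0:
        raise ValueError("reservable bandwidth must be positive")
    return sum(lsp_reservations_kbps) / reservable_kbps

def colour_for(ratio):
    """Map a reservation ratio to a display colour (thresholds are illustrative)."""
    if ratio < 0.5:
        return "green"
    if ratio < 0.8:
        return "amber"
    return "red"

# Hypothetical links: (per-LSP reservations in kbps, reservable bandwidth in kbps).
links = {
    "lsr1-lsr2": ([200_000, 150_000], 1_000_000),
    "lsr2-lsr3": ([400_000, 300_000], 1_000_000),
    "lsr1-lsr3": ([500_000, 450_000], 1_000_000),
}

link_colours = {
    name: colour_for(reservation_ratio(reservations, reservable))
    for name, (reservations, reservable) in links.items()
}
print(link_colours)
```

The same per-link data also gives the number of LSPs carried, which could drive the display of individual nodes.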

To monitor SLGs within a network (as per the third requirement above), typically, some form of graphing of an IP SLA probe, or some end-to-end ICMP graph (via a tool such as SmokePing) is used. It is likely that such implementations can continue to be utilised to support SLAs. However, the problem of displaying efficiency of path selection within a network is not as trivial to solve, and indeed is unlikely to ever be a requirement in a network where a packet is routed according to an SPF algorithm. This is not something that has been implemented within AS5413 currently; however, it would appear that some interesting approaches do exist - for example, Karol Kowalik and Martin Collier from Dublin City University present an eye-diagram approach to displaying the efficiency of path selection throughout a TE network. Within their paper, they note that this approach does not scale easily to TE networks above 30 nodes; however, since there appears to be no open reference implementation for this method, it is hard to evaluate its performance. This paper appears to be one of the few approaches to showing path efficiency through a TE network that has been discussed.

The post's intention, as discussed, is to present the small number of solutions that have currently been implemented at AS5413, and their limitations - clearly, these tools are not production quality, or complete at the current time. If there is interest in providing a non-commercial solution (as an alternative to solutions like OPNET, and the Cisco TE tools) that provides information for small to medium SPs, where such an investment does not seem justified, then there may be a possibility to push some collaborative OSS visualisation tools. If you or your organisation have any such interest, please let me know via the comments.

Tagged in: Code, Tech, MPLS
A quick personal post to break the silence here!

I'm currently very interested in hearing about any UK or EU-based network engineering or architecture opportunities that are out there, especially in SP networks that run MPLS with TE. If anyone has some such opportunity, or knows of something that they think might suit me -- please drop me a mail to rjs@rob.sh for a copy of my CV.

An outline of my CV is available on LinkedIn.

I'm hoping to find some time to put some technical articles together that can be posted here in the near future.
Tagged in: Tech, Work, ISP
