BGP Error Handling and Enhancements Post IETF-79

              · · ·

IETF 79 took place last week, and I think one of the great things to come out of the IDR work leading up to the meeting is that quite a few drafts have been written around the requirements for better error handling in BGP. I've been vocal about this before, of course, so it's not that surprising that I'm (yet again) banging the drum for this cause; however, we are finally getting somewhere. To that end, I wanted to air some views on a couple of the drafts that either have benefits to the operational community, or don't quite hit the mark.

The first of these is draft-chen-ebgp-error-handling. This is probably one of the most effective error handling drafts that I've seen in IDR, and it starts to put together a framework for a relatively generic means of handling an erroneous packet. In brief, it proposes that the "treat-as-withdraw" mechanism that we discussed as part of the 4-byte ASN issues last year be applied to any UPDATE from which the NLRI information can be parsed. What this achieves is that where a packet has an error in an attribute, or is malformed in a way that means it can still be parsed to some extent, the routing information it carries will not be propagated past the ingress border of an ASN. This is very useful behaviour, and begins to constrain the invalid routing information; however, it perhaps doesn't go far enough.

Both I and other representatives of Cable&Wireless have been relatively vocal in stating that we'd want this to be extended to iBGP, rather than constrained only to eBGP. Whilst there may be some argument that this behaviour is initially useful for ensuring that invalid routing information does not propagate between autonomous systems, there continues to be some discussion around the effects of doing this in multi-topology deployments. The major argument against deployment for iBGP is that it may result in an inconsistent RIB across an ASN. I obviously have an ongoing interest in this requirement, so will hopefully be putting together some further work on it over the next few weeks and months.
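To make the treat-as-withdraw idea a little more concrete, here is a minimal sketch in Python. It is not taken from the draft: the `Update` class, the `apply_update` function and the `suspect` set are all names I've made up for illustration, the RIB is just a dictionary, and a malformed attribute is modelled simply as `attrs=None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


class SessionReset(Exception):
    """Raised when an UPDATE cannot be salvaged and the session would be torn down."""


@dataclass
class Update:
    # Hypothetical, heavily simplified view of a received UPDATE message.
    nlri: Optional[List[str]]            # None models NLRI that could not be parsed
    attrs: Optional[Dict[str, str]]      # None models a malformed path attribute
    withdrawn: List[str] = field(default_factory=list)


def apply_update(rib: Dict[str, Dict[str, str]], update: Update, suspect: Set[str]) -> None:
    """Apply one UPDATE to the RIB, treating attribute errors as withdrawals."""
    if update.nlri is None:
        # Without the NLRI we cannot tell which prefixes are affected,
        # so in this sketch the only option left is to reset the session.
        raise SessionReset("NLRI could not be parsed")

    # Explicit withdrawals are processed as normal.
    for prefix in update.withdrawn:
        rib.pop(prefix, None)

    if update.attrs is None:
        # Treat-as-withdraw: drop the affected prefixes rather than the session,
        # and remember them so that they can be recovered later.
        for prefix in update.nlri:
            rib.pop(prefix, None)
            suspect.add(prefix)
        return

    # A well-formed UPDATE installs the routes and clears any earlier suspicion.
    for prefix in update.nlri:
        rib[prefix] = update.attrs
        suspect.discard(prefix)
```

The point of keeping the `suspect` set in this sketch is that the speaker knows exactly which prefixes it discarded, which is the knowledge the recovery mechanisms discussed further on rely upon.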

The other drafts of interest tend to focus on the problems that are likely to arise within implementations where 'treat-as-withdraw' is adopted. Primarily, Keyur Patel (Cisco) proposed a mechanism whereby one can determine the start and end of a route refresh. This is very useful for determining whether a local BGP speaker's RIB is consistent with the RIB of the remote neighbour; hence, where prefixes are being treated as withdrawn, it can be determined which prefixes were missing. Since the generation of erroneous UPDATE packets is often a transient issue, this lets us check that, during a complete route refresh, there were no prefixes that we could not parse, and hence gives us some view as to whether we have a complete RIB. In general, this is a means of recovery for those prefixes that are missing from a RIB.
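As a rough illustration of why explicit start and end markers help, here is a small Python sketch of one way a receiver might use them. This is my own simplification rather than the procedure from Keyur's draft: `RefreshTracker` and its method names are hypothetical, the RIB is a plain dictionary, and an UPDATE that had to be treated as withdrawn is modelled as `attrs=None`.

```python
from typing import Dict, Optional, Set, Tuple


class RefreshTracker:
    """Tracks one route refresh between a start marker and an end marker."""

    def __init__(self, rib: Dict[str, Dict[str, str]]) -> None:
        self.rib = rib                    # prefix -> attributes
        self.stale: Set[str] = set()      # held before the refresh, not yet re-advertised
        self.unparsed: Set[str] = set()   # treated as withdrawn during the refresh

    def on_start(self) -> None:
        # Everything we currently hold is unconfirmed until it is seen again.
        self.stale = set(self.rib)
        self.unparsed = set()

    def on_route(self, prefix: str, attrs: Optional[Dict[str, str]]) -> None:
        self.stale.discard(prefix)
        if attrs is None:
            # The UPDATE carrying this prefix was erroneous: treat as withdrawn.
            self.rib.pop(prefix, None)
            self.unparsed.add(prefix)
        else:
            self.rib[prefix] = attrs
            self.unparsed.discard(prefix)

    def on_end(self) -> Tuple[Set[str], Set[str]]:
        # Routes never re-advertised are no longer held by the neighbour.
        purged, self.stale = self.stale, set()
        for prefix in purged:
            self.rib.pop(prefix, None)
        # An empty second set suggests the RIB is complete for this neighbour.
        return purged, set(self.unparsed)
```

If the second set returned by `on_end()` is empty, the refresh completed without anything being discarded; otherwise it is exactly the list of prefixes that still need to be recovered.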

In the discussion of Keyur's draft above, I think one of the key issues that was highlighted, and changed in -01, was the ambiguity as to what should happen if the RIB continues to be updated in a manner that causes churn, i.e. when is the end of a route refresh actually reached in this case? From this discussion, there were further suggestions that there is a more optimal manner in which to handle this. Here it is important to remember that, when combined with Enke Chen's draft above, a local system knows which prefixes it was unable to receive, since it was able to parse the NLRI. For this reason, it is possible to request that specific prefixes be refreshed by the remote system. A single refresh carrying specific ORF filters would result in just these prefixes being re-sent by the remote system, with no requirement to consider sending the whole RIB. Where there is local inbound filtering. This is captured in a draft by Jie Dong (Huawei), draft-zeng-one-time-prefix-orf.

This draft actually gets my support for this and another reason: it helps solve a further issue around maintaining consistent L3VPN routing tables across a set of PEs, without requiring either the large memory consumption of holding the Adj-RIB-In for each peer in RAM, or a route refresh to be sent. A route refresh can result in a large number of prefixes being sent from a remote speaker, and where that speaker is an RR, this can result in sizeable load on a centralised resource. I've described this in a mail to the IDR mailing list; however, I'd hope to highlight it in another post that I've got planned for here.
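To show the shape of the targeted refresh idea, here is a hedged Python sketch. It does not attempt to reproduce the encoding in draft-zeng-one-time-prefix-orf or any real ORF wire format: `PrefixFilterEntry`, `build_one_time_request` and `routes_to_resend` are hypothetical names, and the filter entries are just in-memory objects that match a prefix exactly.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class PrefixFilterEntry:
    # Hypothetical in-memory filter entry; NOT a wire encoding.
    network: Network
    min_len: int
    max_len: int


def build_one_time_request(missing: Iterable[str]) -> List[PrefixFilterEntry]:
    """Build exact-match entries for the prefixes we failed to receive."""
    networks = {ipaddress.ip_network(p) for p in missing}  # de-duplicate
    ordered = sorted(
        networks,
        key=lambda n: (n.version, int(n.network_address), n.prefixlen),
    )
    return [PrefixFilterEntry(n, n.prefixlen, n.prefixlen) for n in ordered]


def routes_to_resend(
    adj_rib_out: Dict[str, Dict[str, str]],
    entries: List[PrefixFilterEntry],
) -> Dict[str, Dict[str, str]]:
    """What the remote speaker would re-send: only the requested prefixes."""
    wanted = {e.network for e in entries}
    return {
        prefix: attrs
        for prefix, attrs in adj_rib_out.items()
        if ipaddress.ip_network(prefix) in wanted
    }
```

Feeding the set of prefixes that were treated as withdrawn into `build_one_time_request` gives a request whose size scales with the number of missing prefixes rather than with the size of the remote RIB, which is the property that makes this attractive when the remote speaker is an RR.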

As usual, the motivation for this post is to highlight the drafts available at the moment that I believe solve practical problems that already exist in this protocol. Thoughts and comments are appreciated; contact details are available via the links on the right.