Since I'm taking the Computational Physics module in my third year at IC, we're using some libraries like GSL to provide "better-than-default" random number generators and suchlike. It turns out that those of us using a Mac don't get GSL installed by default with Xcode, or under OS X, and - unlike for Windows and Linux - there are no instructions on the course site. Here's a really quick way to ensure that you can link against it:
Which should let you link against GSL just fine :-)
|
|
|
I finally got myself an iPhone - and am loving it. It's great how I can now sync my calendars, and address book to my phone without having to worry at all about having six clones of each event on my calendar (which of course, makes it rather difficult to tell what I'm actually meant to be doing that day). However, a topic that has come up a couple of times in discussion with a few friends is that of the iPhone SDK. I feel that the big question here is, "Do Apple have enough incentive to make a fully featured iPhone SDK?".
With a friend moving to work at a VoIP start-up that deals with having a client on mobiles that allows SIP calls to be made, the big discussion we have been having is whether the iPhone SDK will allow a VoIP client to be implemented on it. My initial feeling on this is, no - not initially. It's been documented in a number of places that Apple are taking call revenue from the networks on each iPhone that is sold. If this is true, then it would be against Apple's interest to actually allow a functional SIP implementation to make it onto non-hacked iPhones - since they're going to lose money if users start making their calls via SIP rather than via the cell networks (this assumes that Apple get revenue based on all calls - not just based on the actual contract worth).
However, I can't say that I'm sure of this - Apple may only be taking their cut from the revenue that is generated from the subscription fee on each iPhone - rather than the additional calls, and in this case, they might feel that allowing users to utilise VoIP when they're in a hotspot area would be another cool feature that the iPhone can offer. They might also feel that a lot of the iPhone users aren't savvy enough to be using SIP very often - I guess something like Skype from hotspots (or iChat for voice, or Google Talk...) might be more popular with the less tech-savvy userbase. It's hard to call really.
Alternatively, Apple might just wait until the networks stop paying them revenue, and roll the SDK with better network support then, increasing sales on a device that they're no longer making such revenue on.
Either way, the iPhone really just slots in where a phone should, it works with wireless and a mobile-data mechanism without having to think about it at all (although I'd like an 802.1X implementation). Calendar data syncs, Mail accounts sync, my Music syncs, my contacts sync - it integrates with my (primarily Apple based) digital life very well.
I'm impressed with my iPhone - roll on the SDK so that I can start making it do even funkier things!
|
|
So, at the moment, I'm writing a presentation about the operation and the security implications of RFID. During the course of random searches around the internet, I've found that there's a lot of really, really cool work going on with respect to RFID. Even better than the output on the subject is who is studying it. Lots of really cool observations are coming out of the open-source-friendly community - some of the best material on the subject comes from presentations at CCC. Along with projects like OpenPCD, this output is pretty cool!
However, that's not really the point of this post. During the course of reading around, I've found that whilst there's a lot of information around - there's also a lot of FUD that surrounds that information. My presentation is trying to give people (with some physics background) a simple idea of what RFID is, and particularly how it works. Given that I've already done a quick summary of how RFID works, I figured I'd blog about it, so that I can add to the mush of material that you just can't reference online.
I'll discuss a high frequency system - since cards such as MIFARE (which e.g. Oyster uses) work at around 13.56MHz. The RFID system consists of two elements - the reader, and the tag. Tags come in a number of types - active, passive, and semi-passive. Really, it's the passive tags that I'm interested in. The image below shows the anatomy of a (simple) passive tag. It's composed of an antenna running around the card, an IC, and a substrate that they're both attached to.
The reader consists of a dipole-antenna, a transceiver, and some controlling electronics (this is hugely simplified, check OpenPCD for much more detail). Obviously, the consideration of the conversation between the reader, and the tag is the interesting part.
- The reader emits a signal from a dipole antenna at a fixed radio frequency. The magnetic field of the signal induces a current in the dipole antenna loop of the tag - hence powering it on.
- The tag is now powered up: a capacitor in the tag is charged, and the current is trapped using a diode. The resulting voltage across the capacitor powers up the IC within the tag. In the really simple case that we're considering, we'll assume that this tag just has a unique ID, and isn't doing anything interesting. The IC replays the unique tag ID (as a digital binary signal). The signal is fed into a transistor, causing the antenna to reflect/absorb more of the signal - hence modulating it.
- The modulated signal shows variations in the frequency which are dependent on the way in which the transistor responded to the input from the IC within the tag. This results in a load-modulated signal - the trace below shows the frequency distribution of the reflected signal from the tag.
The RFID tag information is contained completely in the sidebands of the signal, and the figure above shows that compared to the reader-generated centre frequency - these are extremely weak. The 90dB difference accounts for some of the reason why RFID transmission range is so limited.
- The modulated signal is received by the reader, where it is resolved into a tag ID.
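To put rough numbers on the sidebands and the 90dB figure described above, here is a small Python sketch. It assumes the tag load-modulates with a subcarrier at 1/16 of the carrier frequency (about 847.5 kHz, as used by ISO/IEC 14443 cards); that subcarrier value is an assumption and is not read from the trace. The function names are purely illustrative.

```python
CARRIER_HZ = 13.56e6             # reader carrier frequency from the text
SUBCARRIER_HZ = CARRIER_HZ / 16  # assumed ISO/IEC 14443 subcarrier (~847.5 kHz)


def sideband_frequencies(carrier_hz, subcarrier_hz):
    """Return the (lower, upper) sideband frequencies created by load modulation."""
    return carrier_hz - subcarrier_hz, carrier_hz + subcarrier_hz


def db_to_power_ratio(db):
    """Convert a level difference in dB to a linear power ratio."""
    return 10 ** (db / 10)


lower, upper = sideband_frequencies(CARRIER_HZ, SUBCARRIER_HZ)
print(f"sidebands at {lower / 1e6:.4f} MHz and {upper / 1e6:.4f} MHz")
print(f"90 dB is a power ratio of {db_to_power_ratio(90):.0e}")
```

Read this way, a 90dB gap means the sidebands carry roughly a billionth of the power of the carrier, which gives a feel for why the reader has to be so close to the tag.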
This isn't exactly the most exciting part of RFID, but it's the basics of how the technology works, in a fairly friendly format I hope, and without a spin that's trying to present any kind of agenda about RFID. There are a lot more interesting things to look at, and blog about, on this subject, so I'll probably discuss those at some other time. But for the time being - I'm interested in this, and hope that this helps someone one day - questions/comments/corrections are welcomed to my mail address.
|
|
For anyone interested, the slides for my RFID presentation are here.
|
|
30 23 28-31 * * [ "`date +\%m`" != "`date +\%m --date=tomorrow`" ] && /Users/rjs/bin/monthEnd.py 2>&1 >/dev/null
Pretty handy for running on the last day of the month - and should work on Linux.
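The same "is tomorrow in a different month?" test can also live inside the script itself, which avoids depending on the `--date` option of `date`. This is only a sketch: `is_last_day_of_month` is a name made up for illustration, and nothing here is taken from the real monthEnd.py.

```python
from datetime import date, timedelta


def is_last_day_of_month(day):
    """True when the day after `day` falls in a different month."""
    return (day + timedelta(days=1)).month != day.month


if __name__ == "__main__":
    if is_last_day_of_month(date.today()):
        print("last day of the month: run the month-end job here")
```

With this guard in place, the cron entry could simply run the script every day at 23:30 and let it decide whether there is anything to do.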
|
|
Since I've got a few moments, and I've decided to actually write down some rants rather than deciding that I can't be bothered to - I'm going to use some space to sing the praises of Django.
I've been using Django for a couple of years now - since around the autumn of 2005, and as such, feel that I've got a pretty good grasp of how the framework works. I haven't really hacked around that much with the innards of Django (although I did propose a patch), however, what I really like about this framework isn't particularly the internals, but just the whole philosophy that there seems to be in terms of building a web application.
Let's face it, when you sit down to write a web application, especially as a sysadmin or a network engineer - your primary motivation isn't to write a web application, your motivation is to get some data out in a very easily readable format to some audience. So, you start looking at how your database should be designed (or retrofitting your code around an existing database), and you start writing code for updating one table, or grabbing data from another. Stop. This is where Django comes in. With Django, you just rapidly develop your data into a set of models, which can be related in a number of different ways - almost everything that you want to do (for generic data handling) can then be done straight from a generic Django view - without you having to go around writing a lot of code for really generic things. It really speeds up web application development.
I love the fact that Django has things like overloadable save(), and delete() methods for each model - it sounds fairly trivial, but it's great to be able to have the application create the ports on a switch when it saves it into the database the first time - and remove them when the user removes the switch. I've found myself able to work on really quite complicated model layouts without getting too tied up in it, because it's just a case of breaking it down simply into models.
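The save()/delete() idea can be sketched without Django at all. The class below is a plain-Python stand-in and not Django's API: `Switch`, `PORTS` and `port_count` are hypothetical names, and in a real project the two methods would be overrides on a model class that also call the parent class's `save()` and `delete()`.

```python
PORTS = {}  # hypothetical stand-in for a ports table: switch name -> port names


class Switch:
    def __init__(self, name, port_count):
        self.name = name
        self.port_count = port_count
        self.saved = False  # mimics "has this row been written yet?"

    def save(self):
        """Create the switch's ports on the first save only."""
        if not self.saved:
            PORTS[self.name] = [
                f"{self.name}-port{i}" for i in range(1, self.port_count + 1)
            ]
            self.saved = True

    def delete(self):
        """Remove the ports when the switch itself is removed."""
        PORTS.pop(self.name, None)
        self.saved = False


sw = Switch("sw1", 4)
sw.save()
sw.save()  # a second save does not recreate or duplicate the ports
print(PORTS["sw1"])
sw.delete()
print("sw1" in PORTS)
```

The point of the pattern is that the side effects live with the model, so every code path that saves or removes a switch gets them for free.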
I've got a couple of Django based projects that I'm maintaining (hint: you're reading one of them). The other is a portal system for Catalyst2. I've been working on this system for almost 2 years now - it started out as a PHP application, and then morphed into Django when I discovered it. As a PHP app it was just unmanageable, because of the amount of code that I was having to write. Django's inbuilt admin application has let me just worry about actually presenting data to users - and keeping the boring stuff (like adding data to simple database tables) out of the way of the application. It's become a pretty complicated application.
The system tracks ISP assets, things like racks, servers, switches, routers, cabling...the list goes on. For servers it can control APC MasterSwitches via SNMP, it integrates into cacti for network graphs, it can access RANCID SVN repositories to obtain device configurations...I built an SNMP poller that collects traffic data, and produces billing data as both HTML and PDF. I've integrated it with our automated DNS and MX secondary system. The list is pretty huge (hence why this project is almost two years old!).
Before it sounds like I'm just trying to give a list of features that I've written into an application - I'll get to the point. Django and Python's flexibility means that I've been able to sit down and write features. I haven't had to sit down and write a lot of generic functions that add entries to a database, allow them to be updated, and then save them, and I haven't had to handle how my site is going to be templated - I've written features. This is the massive difference for me using Django.
Sure, I could get this functionality with just about any framework out there - BUT, Django does this really well - there are very, very few things that you come across and say "Oh, I don't like that" that you can't change. I didn't really like the authentication system that it uses by default - no worries, I can just replace it with an LDAP authentication system for our users, and use the Django one to provide application-specific privileges. Django does things easily, in a sane manner, that you can just code for. What's even better, is it fits in with people like me - who just want to get an application out - but also it fits in with those guys who want to produce just the frontend part of the system - the templating language allows really simple creation of complex pages, without a steep learning curve.
I didn't really plan this entry before I started, but I hope I've got across what I actually wanted to say: Django is a really great framework that's very, very flexible. It's also getting better - they're carefully considering what's added to it, and keeping it so that it's very database neutral, and can fit into many, many development styles.
I've written >5,000 lines of just Python code in my Django applications, and I'm still finding new and cool things that the framework can do - I really do recommend it for web application RAD!
|
|
Imperial College are currently implementing changes so that you need to access either POP3 or IMAP with SSL enabled. Since they didn't list Fetchmail on their new site, I figured I'd post my configuration (.fetchmailrc) here in case anyone else uses it:
poll icex.imperial.ac.uk
proto pop3
user "USERNAME"
password "PASSWORD"
is "LOCALADDRESS" here
ssl
sslfingerprint "7D:E8:74:1F:E8:B1:E6:15:A6:0C:02:2B:BA:89:BE:4D"
Enjoy.
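If the server certificate changes, the fingerprint in the configuration above has to be refreshed. The sketch below shows one way to compute a value in the same colon-separated format, assuming the fingerprint is the MD5 digest of the server's DER-encoded certificate (which the 16-byte value suggests). The function names are made up for illustration, and 995 is simply the usual POP3-over-SSL port.

```python
import hashlib
import ssl


def md5_fingerprint(der_bytes):
    """Format the MD5 digest of a DER-encoded certificate as colon-separated hex."""
    digest = hashlib.md5(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def server_fingerprint(host, port=995):
    """Fetch a server's certificate and return its fingerprint (needs network access)."""
    pem = ssl.get_server_certificate((host, port))
    return md5_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling `server_fingerprint("icex.imperial.ac.uk")` would then print something to compare against the configured value; a fingerprint obtained this way should still be checked through some independent channel before it is trusted.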
|
|
I was reading an entry posted by Brett Carr on Nominet's techblog today entitled "ipv6 It just works". Unfortunately, for IPv6, and for the sentiment behind this message (IPv6 can be run pretty easily!), in my experience, IPv6 - it doesn't just work!
It's easy to dismiss the previous sentence, given that many networks aren't designed to run IPv6, and there's kit out there that's just not IPv6-capable yet. When building the AS29636 network, we specified that IPv6-capability was one of the things that would be a requirement of the kit that was going into the new network, not just something that we'd like to have. We work to a similar specification at my current employer, which ensures that we can deploy IPv6 within a pre-agreed timeframe once we have some commercial drive for it (either from customers, or for business continuity reasons). I think that this is the best way for an SP network to be at the moment - there's no revenue in having IPv6 deployed (generally), but there might be lost revenue when a customer comes to your network with IPv6 as a requirement in their RFQ...
Returning to the reason that I started writing this post - the problems for IPv6 deployment don't just come from the fact that your hardware doesn't necessarily support it, and it isn't just that running IPv6 on your kit might have financial implications for the software licensing that you're going to be deploying (the arbitrary Cisco requirement for advipservices for IPv6 is a completely separate post). There are going to be issues where you don't necessarily expect them - which can be hard to debug, where IPv6 "should just work", it doesn't.
Without mentioning any specifics of a case that was brought to my attention in the last couple of weeks - a customer was having problems getting IPv6 traffic flowing across a layer 2 ethernet circuit. The expectation of this circuit is that you can put ethernet frames onto it (and it doesn't really matter what the ethertype is, just that they're valid ethernet frames) - and they are going to be punted down the link, to whatever you terminate the L2 circuit on. With IPv4, this not working would be a disastrous failure - the product just wouldn't be working. However, this particular circuit was not passing frames that contained IPv6 packets. As it turns out, the carrier's equipment in the path contained a firmware bug that was causing the frames containing IPv6 packets to be dropped - and hence, no neighbour-discovery, and no traffic flow between the two ends of the circuit.
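When chasing this kind of fault, it helps to classify captured frames by ethertype at each end of the circuit and see which types go missing in transit. The sketch below reads the ethertype from an untagged Ethernet II frame; the sample frame is hypothetical and the helper names are made up for illustration.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}


def ethertype(frame):
    """Return the EtherType of an untagged Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (value,) = struct.unpack("!H", frame[12:14])
    return value


def describe(frame):
    """Map a frame to a protocol name, or 'other' for unknown EtherTypes."""
    return ETHERTYPES.get(ethertype(frame), "other")


# Hypothetical frame: all-zero MAC addresses and no payload, for illustration only.
frame = bytes(6) + bytes(6) + struct.pack("!H", 0x86DD)
print(describe(frame))
```

A transparent L2 circuit should deliver the 0x86DD frames just as it does the 0x0800 ones; in the case described here, only the former vanished.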
This is just one isolated case - but the question is, where else in your network do you have a problem like this one? How much kit that may, right now, be considered something that shouldn't be interfering anywhere above L2, is going to exhibit this type of problem? How much load is this going to cause your NOC? How much time liaising with circuit suppliers and telcos is going to be spent actually deploying IPv6 on your network? I think these questions are starting to form a basis of why SPs should be starting to roll out IPv6 onto their networks now. The lack of a transition plan from IPv4 to IPv6, and the fact that IPv6 hasn't had widespread deployment testing across many platforms and transmission media, mean that deploying IPv6 in a rush across your network isn't necessarily going to be as easy as you've thought.
Whilst I applaud the fact that Nominet are ensuring that they're going to be ready to run the UK ccTLD with IPv6 nameservers, and that their infrastructure is ready - I don't think that IPv6 is going to be quite as easy to deploy as Brett found in his blog post.
|
|
It took me a few hours over the course of this week to build the RIPE whois server for some internal projects -- given that there seems to be a very limited amount of documentation for the build process, and threads on mailing lists, I'm going to post this here. I hope that it gets picked up by Google.
The first problem that is encountered is that the libtool that is included with the whois server does not support 'modern' tags, such as --tag=CC. This looks to be because the included libtool is somewhat dated. This can be easily fixed by using the system libtool:
[rjs@dbhost whoisserver-nightly]$ mv libtool libtool.old
[rjs@dbhost whoisserver-nightly]$ ln -s `which libtool` .
(NB: after doing this, you should specify --no-all --setup-db --setup-config --install --setup-tests, and _NOT_ --configure, otherwise the existing libtool will just replace the symlink that you've created)
The next problem is that there are a large number of definitions of yywrap() that conflict when they are being linked. On examination, these seem to be of the form:
int yywrap(){
return 1;
}
There are definitions in:
- src/modules/rpsl/syntax.c
- src/modules/rpsl/mnt_routes.lex.c
- src/modules/rpsl/mnt_routes6.lex.c
- src/modules/rpsl/mnt_routes_an.lex.c
that conflict with each other. Simply removing the yywrap function that only returns 1 from each of these files resolves this linker issue.
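Rather than editing each file by hand, the removal can be scripted. This is a rough sketch that only handles the trivial definition shown above; it works on a string, so reading and writing the four files is left to the caller, and `strip_trivial_yywrap` is a name made up for illustration.

```python
import re

# Matches only the trivial definition shown above: int yywrap(){ return 1; }
TRIVIAL_YYWRAP = re.compile(
    r"\bint\s+yywrap\s*\(\s*(?:void)?\s*\)\s*\{\s*return\s+1\s*;\s*\}\s*"
)


def strip_trivial_yywrap(source):
    """Return `source` with any trivial yywrap() definition removed."""
    return TRIVIAL_YYWRAP.sub("", source)


sample = "int x;\nint yywrap(){\nreturn 1;\n}\nint y;\n"
print(strip_trivial_yywrap(sample))
```

Anything more elaborate than a bare `return 1;` is deliberately left alone, so the script cannot silently delete a yywrap() that does real work.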
The next problem is that there is a multiple definition of a 'set_dynamic' function -- I believe that this is a function that's defined both in src/modules/pc/pc_commands.c and in the MySQL client library.
/usr/lib/mysql/libmysqlclient_r.a(array.o): In function `set_dynamic':
/home/mysqldev/rpm/BUILD/mysql-4.1.22/libmysql_r/array.c:175: multiple definition of `set_dynamic'
/home/rjs/tmp/whoisserver-nightly/src/.././src/librip.a(pc_commands.o):
/home/rjs/tmp/whoisserver-nightly/src/modules/pc/pc_commands.c:509: first defined here
/usr/bin/ld: Warning: size of symbol `set_dynamic' changed from 366 in
/home/rjs/tmp/whoisserver-nightly/src/.././src/librip.a(pc_commands.o) to 203 in
/usr/lib/mysql/libmysqlclient_r.a(array.o)
The first definition (/home/mysqldev/rpm/BUILD/...) is from the RPM package of the shared MySQL libraries. I've tried the compatibility libraries, as well as versions from MySQL 4.1, and MySQL 4.0. The only way I can find to correct this is to change the library path (-L) in the Makefiles from the directory provided by the MySQL development libraries (/usr/lib/mysql) to the shared libraries provided by MySQL (/usr/lib/):
[rjs@dbhost whoisserver-nightly]$ diff Makefile Makefile.orig
214c214
< MYSQL_LIBS = -L/usr/lib/ -lmysqlclient_r -lz -lcrypt -lnsl -lm
---
> MYSQL_LIBS = -L/usr/lib/mysql -lmysqlclient_r -lz -lcrypt -lnsl -lm
[rjs@dbhost whoisserver-nightly]$ diff src/Makefile src/Makefile.orig
378c378
< MYSQL_LIBS = -L/usr/lib/ -lmysqlclient_r -lz -lcrypt -lnsl -lm
---
> MYSQL_LIBS = -L/usr/lib/mysql -lmysqlclient_r -lz -lcrypt -lnsl -lm
This then allows the whoisd to compile.
However, when starting the server a number of segmentation faults are experienced:
[rjs@dbhost bin]$ ./whoisd_start --config=rip.config --crashes=1
Starting whois-server daemon with configuration
/home/rjs/whoistmp//conf/rip.config from /home/rjs/whoistmp//bin
./whoisd_start: line 165: 23118 Segmentation fault (core dumped) $NOHUP_NICENESS $WHOISRIP -p $pid_file -c ${CONFIG} >> $err_log 2>&1
mv: cannot stat `core': No such file or directory
./whoisd_start: line 165: 23145 Segmentation fault (core dumped) $NOHUP_NICENESS $WHOISRIP -p $pid_file -c ${CONFIG} >> $err_log 2>&1
mv: cannot stat `core': No such file or directory
081008 11:24:36 $WHOISD ended
This is because the MySQL connections do not work correctly out of the box -- what you will need to do is to go to src/SQL/ and create the DB manually:
[rjs@dbhost SQL]$ mysql -utest_db -pPASSWORD test_db < create.tables.sql
[rjs@dbhost SQL]$ mysql -utest_db -pPASSWORD test_db < main.index.1
As long as your $PREFIX/conf/sources.config is correct (correct U/P for the object database, not the admin db), and the rip.config has the admin DB specified correctly, the server should then start.
This was originally going to be an e-mail to ripe-dbm, but I seem to have fixed it during the course of writing the mail!
|
|
Last Friday, Andy Davidson, Jonathan Oddy, and I pushed out some research that has some quite worrying repercussions. Whilst I've heard from a lot of people privately about this matter, there's a big flaw here, and as Andy posted on his blog (which is much more informative than mine, I think!), this is a big problem.
The reason, I think, that we're getting limited public discussion of this exploit (I hesitate to call it an exploit, it's a flaw really, because it's actually a result of the RFC that the problem exists), is because the implementations of 4-byte AS support that are out there already are generally not standards compliant. Let's run down the list:
Maybe there are a couple of interesting points here. Why are most vendors not actually complying with this RFC? Does this mean that they've spotted what Andy, Jonathan and I have reported on, and dropped this requirement? If this is the case, then I wonder, when IETF IDR is so full of people with @juniper.net and @cisco.com addresses - why did this ever appear in the first place?
This is a serious flaw in the standards, and despite the fact that today, we reported on how the issue has actually come to pass, this is going to remain open, unless we fix the RFC.
The issue here is that, with most (if not all) BGP attributes, there's almost an expectation that the immediate neighbour will sanity-check what their peer has sent. If the problem is one hop away, you can generally interact directly with that neighbour and work out what it is; no-one else is harmed, as just one session is dropped, between two networks sharing some adjacency. A case in point is the problem that we saw with Cisco not obeying the RFC relating to sending UPDATES before KeepAlives in BGP conversations (CSCsu84268). As far as I saw, this bug only affected directly connected neighbours, and hence there was no major impact. Now, let's consider what happens in the case of the AS4_PATH problem we reported. AS4_PATH is optional transitive in BGP, hence, if you hand it to a non-AS4 speaker, the router will just transmit it along to the peers it advertises the route to. This is a reasonably neat solution, one would think, as AS4 information is transmitted, but it doesn't require every router in the path between two AS4 speakers to understand it, yet they can still get the same information as they could if the path was completely made up of AS4 speakers. Furthermore, by appending 23456 to AS_PATH, even the non-AS4 speakers understand that there was some AS in this path. However, this also means that if I announce, to a non-AS4 speaker, a completely invalid AS4_PATH, they don't know anything about it, or the contents, and hence can't sanity check it. This results in me being able to tunnel my AS4_PATH across the internet.
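The mechanics can be sketched in a few lines of Python. This is a simplified model based on my reading of RFC 4893, treating paths as flat lists of ASNs with no segment types; `looks_valid` is a hypothetical placeholder for whatever checks a real implementation applies, and the function names are made up for illustration.

```python
AS_TRANS = 23456  # stand-in ASN shown to routers that only understand 2-byte ASNs


def merge_paths(as_path, as4_path):
    """Rebuild the full path at an AS4 speaker from AS_PATH and AS4_PATH."""
    if len(as_path) < len(as4_path):
        return list(as_path)  # AS4_PATH is ignored if it is the longer one
    keep = len(as_path) - len(as4_path)
    return list(as_path[:keep]) + list(as4_path)


def first_hop_to_object(hops_speak_as4, as4_path, looks_valid):
    """Return the index of the first router that inspects, and rejects, AS4_PATH.

    Non-AS4 speakers pass the optional transitive attribute on untouched,
    so they never get the chance to object.
    """
    for index, speaks_as4 in enumerate(hops_speak_as4):
        if speaks_as4 and not looks_valid(as4_path):
            return index
    return None


def all_integers(path):
    return all(isinstance(asn, int) for asn in path)


# Documentation ASNs: 64496-64511 (2-byte) and 65536-65551 (4-byte).
print(merge_paths([64496, AS_TRANS, AS_TRANS], [65536, 65537]))
print(first_hop_to_object([False, False, True], ["garbage"], all_integers))
```

In the second call the bad attribute crosses two non-AS4 routers unexamined, and it is the third router, two hops away from whoever injected it, that ends up rejecting the update from its entirely innocent neighbour.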
Great, so now I've described, in some more chatty language, what we wrote on NANOG, and C-NSP. What does this mean to any operator? Well, if I take a prefix, originate it, and then announce it to the internet, then I can get the first AS4 speaker I find to tear down whatever session they learned my prefix on. If I combine this with injecting some ASNs into the path, so that some networks don't accept it (due to loop prevention), then I can probably work out a way to get _my_ copy of the update across to you. In IOS's logs, you can't even tell who originated that prefix, and it doesn't seem to show the whole AS_PATH/AS4_PATH either. Say I send two prefixes, one of which you learn via one transit provider and the other via another: I'll disconnect your full table connectivity.
This isn't even a bug, this is a flaw in the standard. I'd really like to get this fixed, and the way to do that is to get a bunch of operator experience/views, and take it to the IETF. So, if this concerns you (or you're going to need to deploy a new point release -- like 12.0(32)S12 is, or maybe need hardware support with 12.2SRE...), please put some pressure on your vendor, or drop me a note at rjs@eng.gxn.net.
|
|
Further to my previous post - I presented this issue at LINX65 - video and slides can be found below.
Video
Fixed Slides - LINX's PowerPoint install seems to have corrupted my slides on the day.
Comments and feedback are most welcome.
|
|
A quick personal post to break the silence here!
I'm currently very interested in hearing about any UK or EU-based network engineering or architecture opportunities that are out there, especially in SP networks that run MPLS with TE. If anyone has some such opportunity, or knows of something that they think might suit me -- please drop me a mail to rjs@rob.sh for a copy of my CV.
An outline of my CV is available on LinkedIn.
I'm hoping to find some time to put some technical articles together that can be posted here in the near future.
|
|
For all network deployments, there is a requirement to present information relating to both topology, and various utilisation statistics to some human operator. In many cases, this process has become so ingrained in network requirements that there are almost ubiquitous solutions to visualising the data - for example, link utilisation is almost always presented via some framework or tool powered by RRDTool. Other tools, such as network "weathermap" diagrams linking this utilisation information into an overview of a network topology are also seen in many NOCs. In most cases, the problem of visualising data relating to a flat MPLS or IP network is solved for most common deployments.
The problem of data presentation is somewhat altered in the case of an MPLS-TE network. Whilst link utilisation graphs, and weathermap diagrams may continue to provide useful information, suddenly the overlaid tunnels within a network are of interest to operators. Whilst a first-line NOC engineer may not be aware of what is indicated by a certain LSR becoming a midpoint for a large number of tunnels within a network, this almost certainly identifies some form of failure, or congestion that should be reacted to. In addition to identifying events such as TE LSP path changes to particular nodes, it is often useful to retain some insight into the constraints within which CSPF is currently selecting paths for LSPs. In order to solve these problems, there's a requirement for an additional set of visualisation tools. In some cases, these tools may be implemented using existing visualisation tools - however, this often results in overly-complex presentations. The intent of this post is to outline some of the challenges, and interesting methods that appear to be available for presenting data relating specifically to MPLS-TE networks - a number of beta-quality solutions in use within AS5413 are presented, however, this post does not intend to define a complete solution.
In order to properly consider the available solutions to any visualisation problem, the data to be presented to an operator needs to be defined. Ideally, the information that is required by an operator is likely to include the following:
- The path taken by specific TE LSPs throughout a network -- within a straight MPLS or IP network, it is likely that there is a relatively easily determined set of nodes through which traffic is forwarded. Since paths within a TE deployment are dynamic according to constraints, rather than following the shortest path, the presentation of path information to a human is of operational benefit.
- The resource consumption on any TE-enabled path -- this requirement is likely to include the presentation of resource utilisation over a specific link by reservations associated with a TE LSP. However, it should also provide a view of the resources consumed by a specific TE LSP, both in terms of reserved bandwidth, and actual traffic forwarded on the interface.
- Efficiency of path selection across an MPLS-enabled topology -- in many cases, TE is deployed in order to ensure that service level guarantees are upheld. Since typically, these guarantees are made in terms of loss, latency and jitter, the effect of TE path selection on forwarding between specific points in a TE network is of interest to an operator. Where the 'efficiency' of path selection is referred to, this should be interpreted as the proximity of path selection in the network to the optimal path throughout a network (this should be assumed to be the shortest-path as would be selected typically by an IGP).
In order to address requirement 1 above - there are two types of visualisation tool that are of direct interest to an operator - a view based on a specific node, and a whole path view. For a specific LSP, a per-node view showing the egress interface for each LSP is relatively trivial to produce from the output of a command such as show ip rsvp reservation or similar. An example of this form of diagram (as produced by one of the tools built for AS5413 use) is below.
A significant limitation of producing static images such as the example above, is that it is difficult to visualise changes between egress interfaces. The node in the example is a midpoint for a small number of nodes running MPLS TE - during the time at which the example output was gathered, a small number of test PEs had been deployed with automatic bandwidth adjustment on each TE LSP, and hence the number of TE reservations is relatively small. In order to determine egress paths for each LSP on the node, and hence determine the cause of any imbalance across equal-cost paths, a diagram such as this can be useful. However, to properly view the changes in paths over time, a relatively large number of diagrams of this nature need to be examined. It is likely that a better result would be achieved by having some form of dynamically updating graphic representation.
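The data behind a per-node view like this is small. As a sketch, assuming the reservation output has already been parsed into (LSP name, egress interface, reserved kbps) tuples, the grouping step might look like the following; the record format and all names are hypothetical and not taken from the AS5413 tools.

```python
from collections import defaultdict


def summarise_egress(reservations):
    """Group (lsp, egress_interface, reserved_kbps) records by egress interface."""
    summary = defaultdict(lambda: {"lsps": 0, "kbps": 0})
    for _lsp, interface, kbps in reservations:
        summary[interface]["lsps"] += 1
        summary[interface]["kbps"] += kbps
    return dict(summary)


# Hypothetical records, as might be parsed from per-node RSVP reservation output.
sample = [
    ("lsp-a", "Gi0/1", 5000),
    ("lsp-b", "Gi0/1", 2000),
    ("lsp-c", "Gi0/2", 1000),
]
for interface, totals in sorted(summarise_egress(sample).items()):
    print(interface, totals["lsps"], "LSPs,", totals["kbps"], "kbps reserved")
```

Storing one such summary per polling interval, rather than one rendered image, is what would make it possible to show changes between egress interfaces over time.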
The second type of diagram requires a topological view of the network - it is almost impossible to represent all TE paths for which a specific node is the head-end in a network on a single diagram whilst retaining a clear, easy-to-parse image. Producing per-LSP diagrams is possible, but again produces some difficulty in representing the changes of path for an LSP. Diagrams can quickly become convoluted if all possible paths are included. For example, the diagram below shows all possible NHOP and NNHOP LSRs for a specific node within the AS5413 network topology. Where there is a secondary device via which all nodes can be seen, the topology becomes relatively complex very quickly. The problem of finding a computational method to produce layouts that are clear for human interpretation is relatively complex, with Graphviz being the standard OSS tool utilised for this purpose.
Should there be a requirement to scale a diagram such as the example above to show all possible nodes via which a TE LSP may be routed, and overlay the actual path taken, then a large number of diagrams are likely to be required. In addition, the information presented to a human is unlikely to be easy to parse. Again, a requirement would appear to exist for dynamic means of displaying information, and the ability to adjust an image to show specific nodes or paths of interest would be advantageous.
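As an illustration of overlaying the path actually taken on a topology, the sketch below emits Graphviz DOT text with the LSP's links highlighted. The topology, node names and styling are all hypothetical, and the output still needs to be run through a Graphviz layout tool to become an image.

```python
def topology_to_dot(links, lsp_path):
    """Render an undirected topology as DOT text, highlighting the links an LSP uses."""
    used = {frozenset(pair) for pair in zip(lsp_path, lsp_path[1:])}
    lines = ["graph te_topology {"]
    for a, b in links:
        style = ' [color="red", penwidth=2]' if frozenset((a, b)) in used else ""
        lines.append(f'  "{a}" -- "{b}"{style};')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical four-node topology and LSP path.
links = [("pe1", "p1"), ("pe1", "p2"), ("p1", "pe2"), ("p2", "pe2")]
print(topology_to_dot(links, ["pe1", "p2", "pe2"]))
```

Regenerating the DOT for a chosen subset of nodes or LSPs is cheap, which is one route towards the adjustable views described above, though it does nothing by itself about the layout becoming cluttered.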
The second requirement in the list above is somewhat easier to solve, as it mirrors existing visualisations that are utilised for a straight IP/MPLS network. A weathermap-type topology diagram, where each link is marked according to the reservations made on it can easily display information relating to the size of reservations on specific interfaces. Additional information can be shown in such a topology diagram by adjusting the display of specific nodes to reflect the number of LSPs carried. Such a solution can be implemented with relatively minor changes to existing tools such as Network Weathermap. It should be noted that this is likely to require human interaction to develop a set of topologies with a network that are of interest to those using such a weathermap - for instance, certain NOC engineering teams may require specific topologies - or specific subsets of nodes. Displaying every PE within a relatively large network on a single diagram is unlikely to provide useful information to an engineering team.
To monitor SLGs within a network (as per the third requirement above), typically, some form of graphing of an IP SLA probe, or some end-to-end ICMP graph (via a tool such as SmokePing) is used. It is likely that such implementations can continue to be utilised to support SLAs. However, the problem of displaying efficiency of path selection within a network is not as trivial to solve, and indeed is unlikely to ever be a requirement in a network where a packet is routed according to an SPF algorithm. This is not something that has been implemented within AS5413 currently. However, it would appear that some interesting approaches do exist - for example, Karol Kowalik and Martin Collier from Dublin City University present an eye-diagram approach to displaying the efficiency of path selection throughout a TE network. Within their paper, they note that this approach does not scale easily to TE networks above 30 nodes; however, since there appears to be no open reference implementation for this method, it is hard to evaluate its performance. This paper appears to be one of the few approaches to showing path efficiency through a TE network that has been discussed.
The post's intention, as discussed, is to present the small number of solutions, and their limitations that have currently been implemented at AS5413 - clearly, these tools are not production quality, or complete at the current time. If there is interest in providing a non-commercial solution (as an alternative to solutions like OPNET, and the Cisco TE tools) that provides information for small to medium SPs, where such an investment does not seem justified, then there may be a possibility to push some collaborative OSS visualisation tools. If you, or your organisation have any such interest, please let me know via the comments.
|
|
I've had a couple of mails relating to this PSN, which again references the research that Andy Davidson, Jonathan Oddy and I did last year. It seems that some of the sources of the initial mailing list posts we made are gone (particularly the merit.edu one that is referenced from both Juniper's site and most other places). For that reason, I've included both the mails that we sent to NANOG/C-NSP/J-NSP last year here.
Date: Fri, 16 Jan 2009 12:57:19 +0000
From: Rob Shakir
To: cisco-nsp@puck.nether.net, nanog@nanog.org
Subject: BGP Session Teardown due to AS_CONFED_SEQUENCE in AS4_PATH
Message-ID: <20090116125718.GB26415@bronze.eng.gxn.net>
Strict RFC 4893 (4-byte ASN support) BGP4 implementations are vulnerable to a
session reset by distant (not directly connected) ASes. This vulnerability is a
feature of the standard, and unless immediate action is taken an increasingly
significant number of networks will be open to attack. Accidental triggering of
this vulnerability has already been seen in the wild, although the limited
number of RFC 4893 deployments has limited its effect.
Summary:
It is possible to cause BGP sessions to remotely reset by injecting invalid data
into the AS4_PATH attribute provided to store 4-byte ASN paths. Since AS4_PATH
is an optional transitive attribute, the invalid data will be transited through
many intermediate ASes which will not examine the content. To be vulnerable, an
operator does not have to be actively using 4-byte AS support. This problem was
first reported by Andy Davidson on NANOG in December 2008 [0], furthermore we
have been able to demonstrate that a device running Cisco IOS release
12.0(32)S12 behaves as per this description.
Details:
When a prefix is learnt from a BGP neighbour that does not support 4-byte ASNs,
the AS4_PATH attribute is retained, and appended to UPDATE messages sent to
other neighbours [1, 3]. RFC4893 specifies that AS_CONFED_SEQUENCE and
AS_CONFED_SET are invalid in an AS4_PATH, the intention of which is to ensure
that an AS with a mix of AS4-aware BGP speakers, and AS4-unaware BGP speakers
does not propagate confederation AS paths outside of the confederation [1, 3].
Upon receiving an invalid BGP UPDATE message, a BGP speaker must send a
NOTIFICATION message [2, 6.3], after a NOTIFICATION message, the BGP connection
is closed [2, 4.5].
Analysis of the Reported Path:
On 10th December 2008, a BGP update was propagated with illegal/invalid
confederation attributes in the AS4_PATH. When this update was received by AS4
aware BGP speakers, the RFCs described above were interpreted literally and the
session was torn down. Because the illegal attributes were learned on a transit
session, an affected network can have global reachability impaired.
Please note that the analysis of this path describes what we expect to have
happened in this case, it has not been confirmed by any of the ASNs involved.
91.207.218.0/23
Path Attributes - Origin: Incomplete
Flags: 0x40 (Well-known, Transitive, Complete)
Origin: Incomplete (2)
AS_PATH: xx xx 35320 23456 (13 bytes)
AS4_PATH: (65044 65057) 196629 (7 bytes)
In this data, the AS_PATH indicates that a prefix is announced by an AS4 speaker
(as indicated by AS23456) and propagated through by AS35320. The AS4_PATH data
shows that the AS4 originator is AS196629, the rest of this path is an
AS_CONFED_SEQUENCE [3, 5]. It would appear that in this case, AS196629 peers
with AS35320, which is AS4-aware on this border. The prefix is then propagated
through AS35320, with the AS4 aware routers appending their ASN to the
AS_CONFED_SEQUENCE. This is in contravention of RFC 4893 [1, 3]. The border
which announces this route to AS35320's upstream does not appear to be
AS4-aware. During normal announcements, the BGP speaker on a border with an
upstream ASN that is not part of the confederation will remove the left-most
AS_CONFED_SETs or AS_CONFED_SEQUENCEs that exist in the AS_PATH [3, 6.1] and
replace them with the confederation identifier. However, due to the fact that
both AS_CONFED_SET and AS_CONFED_SEQUENCE are invalid in an AS4_PATH, then no
such action is taken on the border between an AS4 aware AS, and a non-AS4 aware
AS. In addition, since the AS35320 border is not AS4 aware, then it does not
update the AS4_PATH.
This malformed UPDATE is then sent to AS35320's upstream, if there are no
AS4-aware routers in the path between the AS35320 border, and an AS receiving
this update, the AS4_PATH will not have been analysed. The first AS4-aware
router to receive this update will reset the session towards the neighbour from
whom it receives the update.
The border which announces this route to AS35320's upstream does not appear to
be AS4-aware; If it were a strict AS4 implementation it would reset the BGP
session due to the malformed AS4_PATH, and a broken implementation that treats
AS4_PATH as an equivalent of the AS_PATH would sanitise the AS4_PATH. This
allows the AS4_PATH containing an AS_CONFED_SET to be passed to neighbouring
networks.
This escape of an AS_CONFED_SET from a network with only partial AS4 support is
exactly the situation that RFC 4893 attempts to avoid by forbidding the presence
of an AS_CONFED_SET in the AS4_PATH. In the ideal world the neighbouring network
receiving an UPDATE containing this obviously malformed AS4_PATH would reset the
session, preventing further propagation and isolating the broken network.
Unfortunately the vast majority of networks do not support AS4 so pass on this
malformed AS4_PATH to their neighbours. The first AS4-aware router to receive
this update will reset the session towards the neighbour from whom it received
the update.
Cisco IOS Behaviour:
In a lab environment, a Cisco 7200 running IOS 12.0(32)S12, which is able to
support 4-byte ASNs, was peered with a Cisco 2811 running 12.4(19). When the BGP
session to the upstream 2811 is established by the 7200, the following log
messages are observed:
*Jan 16 11:29:58.531: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Up
*Jan 16 11:30:02.595: %BGP-6-ASPATH: Invalid AS path (65044 65048 65062) 3.21 23456 received from 193.239.32.2: Confederation found in AS4_PATH
*Jan 16 11:30:02.595: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Down BGP Notification sent
*Jan 16 11:30:02.595: %BGP-3-NOTIFICATION: sent to neighbor 193.239.32.2 3/1 (update malformed) 27
bytes E0111803 030000FE 140000FE 180000FE 26 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0050 0200 0000
3540 0101 0240 020C 0205 3D25 2114 89F8 5BA0 5BA0 4003 04C1 EF20 02E0 1118 0303 0000 FE14 0000
FE18 0000 FE26 0202 0003 0015 0000 5BA0 175B CFDA
The configuration on the 7200 is as follows:
router bgp 65123
no synchronization
bgp log-neighbor-changes
neighbor 193.239.32.2 remote-as 15653
no auto-summary
The BGP session will continue to be reset each time the invalid AS4_PATH is
received.
Possible Impact:
During a BGP conversation, it is expected that a neighbour's UPDATE messages are
sanitised by the immediate neighbour, during a 'normal' BGP conversation, if a
BGP speaker receives an invalid UPDATE, it will teardown the session, and this
invalid UPDATE will not propagate any further. In the case of optional
transitive attributes such as AS4_PATH, this invalid update can be transited
through many ASes, as the content of the invalid attribute in the UPDATE message
is not examined.
In a hypothetical scenario, an AS4 aware service provider (A) has a transit
provider (T) that is not AS4 aware. BGP speaker B, a large distance from A has a
bug affecting their equipment that introduces an AS_CONFED_SET in the AS4_PATH.
Since B's updates are propagated through to A via T, A will tear down the
session to T due to the malformed attribute. This is an out of proportion
reaction as the update may affect only one prefix in a full BGP table. If this
update is also propagated through A's other transit providers A may lose
full-table visibility until one of their transit providers filters the route.
Examining the UPDATE message to establish which route caused session teardown
may be a non-trivial activity.
Conclusion:
Whilst this description may be applied to invalid data in any optional
transitive element, it has a greater impact with AS4_PATH due to the large
number of BGP speakers that currently do not examine any 4-byte ASN data in an
UPDATE. There has been a discussion of this matter on the IETF IDR mailing list
[4], however, due to availability of Cisco IOS containing AS4 support
(12.0(32)S12), and an observation of this problem 'in the wild', we believe that
it is of operational concern to those that are planning on deployment of
AS4-aware platforms [5].
Any input from the operational community relating to this problem is much
appreciated, either publicly, or privately.
Regards,
Andy Davidson, NetSumo (andy.davidson@netsumo.com),
Jonathan Oddy, Hostway UK (jonathan.oddy@hostway.co.uk),
Rob Shakir, GX Networks (rjs@eng.gxn.net)
References:
[0]: Andy Davidson - 91.207.218.0/23 prefix in DFZ - AS3.21 / AS196629 -
announced with AS_CONFED_SEQUENCE in AS4_PATH - propagated by 35320,
http://markmail.org/message/3ofvjyggayfxezna
[1]: rfc4893: BGP Support for Four-octet AS Number Space
[2]: rfc4271: A Border Gateway Protocol 4 (BGP-4)
[3]: rfc3065: Autonomous System Confederations for BGP
[4]: Kaliraj Vairavakkalai, Juniper Networks, [Idr] RFC-4893 handling malformed
AS4_PATH attributes,
http://www.ietf.org/mail-archive/web/idr/current/msg03368.html
[5]: http://as4.cluepon.net/index.php/Software_Support
Thanks to Will Hargrave (LONAP) for assistance with this document.
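As an aside before the second mail: the AS4_PATH bytes in the IOS NOTIFICATION dump above can be decoded by hand, and the sketch below does the same in Python. It walks the path segments in the attribute value and flags any confederation segment, which is the condition that triggered the reset. The segment type codes (1 to 4) are the standard BGP ones; the function names are mine, and the byte string is my reading of the hex in the log (the 24 bytes that follow the `E0 11 18` attribute header), so treat it as an illustration rather than a reference decoder.

```python
import struct

# Path segment type codes (AS_SET / AS_SEQUENCE from the BGP-4 spec,
# the two confederation types from the confederations spec).
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4
CONFED_TYPES = {AS_CONFED_SEQUENCE, AS_CONFED_SET}


def parse_as4_path(value):
    """Split an AS4_PATH attribute value into (segment_type, [asn, ...]) tuples."""
    segments = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        end = offset + 4 * count  # every ASN in AS4_PATH is four octets
        if end > len(value):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", value[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def has_confed_segment(segments):
    """True if any segment is AS_CONFED_SEQUENCE or AS_CONFED_SET."""
    return any(seg_type in CONFED_TYPES for seg_type, _ in segments)


# Attribute value as read from the log dump above (assumption: header stripped).
AS4_PATH_VALUE = bytes.fromhex(
    "0303" "0000FE14" "0000FE18" "0000FE26"  # AS_CONFED_SEQUENCE, 3 ASNs
    "0202" "00030015" "00005BA0"              # AS_SEQUENCE, 2 ASNs
)

if __name__ == "__main__":
    segments = parse_as4_path(AS4_PATH_VALUE)
    print(segments, has_confed_segment(segments))
```

The decoded values line up with the `(65044 65048 65062) 3.21 23456` path in the `%BGP-6-ASPATH` log line, since 3.21 in asdot notation is 196629.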
Date: Wed, 21 Jan 2009 10:14:24 +0000
From: Rob Shakir
To: nanog@nanog.org
Subject: Re: BGP Session Teardown due to AS_CONFED_SEQUENCE in AS4_PATH
Message-ID: <20090121101424.GB5577@bronze.eng.gxn.net>
References: <20090116125718.GB26415@bronze.eng.gxn.net>
Hi,
Further to the initial research sent to NANOG, after discussions with a number
of operators, we have compiled some recommendations on the handling of invalid
AS4_PATH attributes.
Any feedback on these recommendations is appreciated:
As discussed on the IETF IDR list last month, there are concerns relating to the
treatment of AS_CONFED_SET/SEQUENCE in AS4_PATH as described in RFC4893 [0].
Since the last post to that thread the situation has been made more urgent with
the release of Cisco IOS 12.0(32)S12, which responds to malformed AS4_PATH
attributes by sending a NOTIFICATION to the neighbour, and tearing down the BGP
adjacency. This behaviour seems to be required by RFC4271 section 6.3, as there
is no alternative error handling defined in RFC4893. As posted last Friday [1],
and discussed on the IDR list, this strict implementation introduces a new
attack vector by which a BGP session can be torn down due to an attribute
populated by a distant BGP neighbour. These malformed attributes have already
been seen in the wild as a result of an error in Juniper's implementation of
RFC4893.
Following discussions with a number of operators, we have attempted to generate
some recommendations relating to the behaviour that would be operationally most
useful when treating the invalid data in the AS4_PATH optional transitive
attribute.
There are two cases to consider when an invalid AS4_PATH is received:
(1) A path to the prefix is not already known from that neighbour.
(2) A path to the prefix has already been learnt from that neighbour;
In case (1) we recommend that the BGP speaker should discard the UPDATE and log
the fact. The log entry should include the received AS_PATH and
AS4_PATH to aid in debugging.
In case (2) we recommend that the BGP speaker should treat the UPDATE as a
withdrawal of existing path to the prefix. As per case (1) a log entry should be
raised to indicate that this has occurred.
It is quite possible that in both cases this behaviour may result in the BGP
speaker no longer having a valid path to the destination. We foresee that this
lack of a prefix in a BGP speaker's routing table may cause some operational
load initially, however, we feel that this is acceptable, considering the
alternate behaviours.
Should a prefix be injected into the global table with an invalid AS4_PATH, and
should the newly advertised (invalid) path be selected by all upstreams
available to a given ASN then this ASN will lose reachability to the prefix.
Whilst this can be abused we do not see this as more serious than the existing
possibility of malicious injection and blackholing of a prefix by a 3rd party.
As long as the rejection of paths due to invalid AS4_PATHs is clearly reported
to the administrator the source of the problem can be clearly identified.
We consider that attempting to extract a valid AS4 or AS_PATH from the invalid
UPDATE is a mistake since this allows the propagation of invalid BGP data. In
addition, incorrect implementation of this comparatively complex mechanism by a
vendor may result in loops. By explicitly not installing prefixes with invalid
AS_PATH or AS4_PATH into the routing table, the possibility of loops caused by
these invalid paths is avoided.
The defined behaviour in RFC4893 and RFC4271 has significantly harmful effects
and it seems only by virtue of the fact that the implementations of many vendors
do not strictly comply with the RFCs that this problem has not had the same
impact for every vendor. At the current time, however, one cannot deploy a
4-byte capable Cisco IOS device, or an OpenBGP (current stable release) router
into the global table, without risking teardown of every session via which a
global table is learnt.
Further discussion of this issue would be much appreciated, as a common and
consistent approach to rectifying the problem will benefit network operators far
more than individual vendors implementing their own solution. Should a consensus
be reached an update to the RFC is required in order to ensure that future
implementations do not exhibit this harmful behaviour.
Kind regards,
Andy Davidson (NetSumo), andy.davidson@netsumo.com
Jonathan Oddy (HostWay), jonathan.oddy@hostway.co.uk
Rob Shakir (GX Networks), rjs@eng.gxn.net
[0]: http://www.ietf.org/mail-archive/web/idr/current/msg03368.html
[1]: http://www.merit.edu/mail.archives/nanog/msg14345.html
Many thanks to David Freedman (Claranet) for assistance in developing the
recommendations in this document.
In addition to this - it looks like there's some fairly interesting coverage of another Juniper PSN at this blog.
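For anyone who finds the two cases in the recommendations mail easier to read as code, here is a minimal sketch of the behaviour we proposed: discard the UPDATE if no path is known from that neighbour, treat it as a withdrawal if one is, and log the AS_PATH and AS4_PATH either way. The data structures (a dict keyed on neighbour and prefix, a list used as the log) and the function name are assumptions for the example and do not reflect any vendor's implementation.

```python
def handle_invalid_as4_path(adj_rib_in, neighbour, prefix, as_path, as4_path, log):
    """Apply the recommended handling for an UPDATE carrying an invalid AS4_PATH.

    adj_rib_in maps (neighbour, prefix) -> previously learnt path.
    Returns the action taken: "discard" or "treat-as-withdraw".
    """
    key = (neighbour, prefix)
    if key in adj_rib_in:
        # Case (2): a path was already learnt from this neighbour, so remove it.
        del adj_rib_in[key]
        action = "treat-as-withdraw"
    else:
        # Case (1): nothing known from this neighbour, so just drop the UPDATE.
        action = "discard"
    # Both cases log the received paths to aid debugging.
    log.append(
        f"{action}: invalid AS4_PATH from {neighbour} for {prefix} "
        f"(AS_PATH={as_path!r}, AS4_PATH={as4_path!r})"
    )
    return action


if __name__ == "__main__":
    rib = {("193.239.32.2", "91.207.218.0/23"): "previously learnt path"}
    log = []
    print(handle_invalid_as4_path(
        rib, "193.239.32.2", "91.207.218.0/23",
        "xx xx 35320 23456", "(65044 65057) 196629", log))
    print(log[0])
```

Note that the session itself is never touched in either branch, which is the whole point of the recommendation.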
|
|
After a late programme committee request, I presented on "Enhancing BGP" at UKNOF 16. The presentation was intended to be an update on the current drafts in the IDR working group, and give some encouragement to operators to get involved, and contribute.
I'll put the video up when Tom at PortFast and Brandon of Bogons have done their excellent job on it. For the meantime, the slides are linked below.
There's also a good add-paths presentation that John Scudder and Dave Ward gave at NANOG here
|
|
So, if we take a moment to look at the following responses to questions that the leaders of the three parties involved in the "Digital Debate" on YouTube gave, concerning the Digital Economy Bill:
Sure, we get some tired old rhetoric, as expected. However, the key point here is that both Labour and the Conservatives appear to believe that they've done the right thing with this bill. Combine this with the fact that Gordon Brown mentioned a broadband rollout as part of the few ideas that he could during one of the debates, and I think that it starts to become apparent that politicians do not understand the UK "Internet" industry. Where there can be comments as to who is being "looked after" by each party (Cameron even mentions that the bill is most important for the media producers (or "rights holders")) - I think the problem here is that politicians in the UK fundamentally do not understand how this industry operates. I think increasingly we are going to see the UK fall behind in terms of what we can roll-out due to impractical over-taxation, and ideas such as those put forward in the DEA.
Looking simply at two issues:
- Fibre Taxation - in the UK, if a business is to light up a fibre pair, as well as any standard taxes (e.g. VAT) that must be paid, then an additional VoA Business Rate is due on these fibres. This can be up to £500/pair/year outside of London, and £600/pair/year in the London metro region [source: Valuation Office Agency].
Let's look at what this does for the telecommunications industry in the UK, especially for small players. Since such a company probably does not have a DWDM system, the relatively cheap fibre runs are now taxed quite highly. Should such a company then want to start increasing their capacity, the additional costs are inflated due to taxation. Where larger players might be able to split these rates over a large number of DWDM channels (up to 32 or 64) a smaller provider might only have one channel - and hence the cost of infrastructure or customer links for smaller companies is inflated, due to the fact that they cannot justify the CapEx required for such multiplexing systems. Even for larger players, this isn't encouraging large scale fibre build out. If tax is paid per route-KM for every FTT{H,P,C} deployment, then this adds an additional overhead (in avoidable taxation!) to any such roll-out. Hardly an incentive for a commercial entity to begin such a deployment! Alongside the CapEx, OpEx, and business rates you are required to pay - the UK government will tax you just for lighting up the infrastructure they are encouraging you to build! This alone is not helping with any of the three parties' plans for any kind of broadband roll-out, especially to rural areas where there is no profit for commercial entities to roll out such technologies.
- Digital Economy Act - Andrew Cormack of JANET (UK) gave an excellent presentation at UKNOF relating to the DEA. There are two key points here:
- The government (and apparently the Tories) believe that this bill being pushed through in "wash-up" was the right thing to do. Contrast this with the fact that they also appear to be stating that the digital economy (and communications that such an economy provides) is key for Britain. I agree it's key, we're a services based economy, and if more services can be provided utilising the Internet, then one of two things will happen. Either the UK will not be equipped to deliver such services globally, and the "Digital Economy" will mean that these can then be out-sourced to other countries - or the UK will be in a position to grow the services that it can deliver, with the considerable skill of the UK workforce, into both global and European markets. Any bill, therefore, that affects the manner in which this "Digital Economy" (by which I'm now referring to ISPs and telcos) operates should, one would have thought, justify reasonable debate by a fully attended (?!) Commons!
- Westminster appears to have no idea as to who they are legislating for. I am not against ensuring that the creative industries are able to protect their rights - however, this needs to be done in a manner that can be policed without damaging another industry. As Andrew said in his presentation the Government is unsure of how many ISPs are in scope - stating it could be 5, 10, 20 or 450. How can the impact of legislation be considered, if the Government cannot identify the scope? In addition, whilst many rights holders, I would imagine, will say "well, there is very little that is being requested of the ISPs here!" - the technical challenges of implementing mechanisms whereby specific IP addresses, and users can be located, within the timeframes that such complaints appear to take to be progressed, should be costed. I believe that most people within the xSP industry are not going to say "We don't care about your rights as a content producer", however, how can the Government expect our industry to pay directly to police this? We don't care that customer X is pulling data A, B and C - really, once it comes down to working in a larger ISP, we care about getting bit X to endpoint Z whilst ensuring any commercial guarantees that we have made for bit X.
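Going back to the fibre taxation point for a second, the per-channel arithmetic is worth spelling out. The sketch below simply divides the quoted "up to £500/pair/year" rate (the outside-London figure above) across the number of DWDM channels lit on the pair. It assumes the rate is charged once per pair regardless of how many channels are carried, which is how I've described it above, and the function name is just for the example.

```python
def rate_per_channel(rate_per_pair, channels):
    """Annual rate attributable to each channel when the pair is shared."""
    if channels < 1:
        raise ValueError("a lit pair carries at least one channel")
    return rate_per_pair / channels


if __name__ == "__main__":
    for channels in (1, 32, 64):
        cost = rate_per_channel(500, channels)
        print(f"{channels:>2} channel(s): GBP {cost:.2f} per channel per year")
```

With a single channel the whole rate lands on one link, whereas a provider with 64 channels carries less than a sixtieth of it per link.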
Another concern following these points is that it appears that very few of the UK ISP industry are being directly consulted here. Whilst there may be involvement - it's not something that I have seen mention of particularly amongst smaller ISPs in the community. Government should remember that legislation such as this affects all enterprises within this sector, and hence should consider them. The role of incumbents within this country already affects the delivery of many services, we don't need further legislation to push things further into their favour.
The reason I feel the need to mention this, is that it aggravates me whilst seeing responses such as the above. Politicians cannot absolve themselves of blame for such issues being pushed through in what I feel is quite an undemocratic manner. I'm still not sure who I am going to vote for - but as far as I see it, the huge lack of understanding of the industry within which I work will mean that whoever is in power during the next Parliament will likely not be in the right place to make legislation that actually takes into account how this industry works. Because of this, the UK's economy will suffer - which is a great shame.
|
|
Checking out a few videos that people have linked me to recently I thought that this piece was amazing - really great speed. It also looks like the 5D and 7D are really quite awesome at doing 60fps HD! The video presentation over at vimeo is really cool too!
|
|
With IETF 79 happening last week - I think one of the great things that's coming out of the IDR work leading up to the meeting has been that quite a few drafts have been written around the requirements that exist in BGP for better error handling. I've been vocal about this before, of course, so it's not that surprising that I'm (yet again) banging the drum for this cause, however, we are getting somewhere finally. To that end, I was wanting to air some views on a couple of the drafts that either have benefits to the operational community, or don't quite hit the mark.
The first of these is draft-chen-ebgp-error-handling. This is probably one of the most effective error handling drafts that I've seen in IDR, and starts to put together a framework for a relatively generic means to be able to handle an erroneous packet. In brief, this proposes a means by which the "treat-as-withdraw" mechanism that we discussed as part of the 4-byte ASN issues last year is applied to any UPDATE from which the NLRI information can be parsed. What this achieves is that, for any packet that has an error in an attribute, or is malformed in a way that means it can still be parsed to some extent, the routing information will not be propagated past the ingress border of an ASN. This is very useful behaviour, and begins to constrain the invalid routing information; however, it doesn't perhaps go far enough. Both I and other representatives of Cable&Wireless have been relatively vocal about stating that we'd want this to be extended to iBGP, rather than constrained only to implementation for eBGP - whilst there may be some argument that this behaviour is initially useful for ensuring that invalid routing information does not propagate between autonomous systems - there continues to be some discussion around the effects of doing this in multi-topology deployments. The major argument against deployment for iBGP is that it may result in an inconsistent RIB across an ASN. I obviously have an ongoing interest in this requirement, so will hopefully be putting together some further work on it over the next few weeks and months.
The other drafts that are of interest are tending to focus on the problems that are likely within implementations whereby 'treat-as-withdraw' is adopted. Primarily, Keyur Patel (Cisco) proposed a mechanism whereby one can determine a start and end of a route refresh. This behaviour is very useful to determine whether a local BGP speaker's RIB is consistent with the RIB of the remote neighbour, hence where prefixes are being treated as withdrawn, it can be determined which prefixes were missing. Since often the generation of erroneous UPDATE packets can be a transient issue, this lets us ensure that during a complete route refresh, there are no prefixes that we could not parse, and hence have some view as to whether we have a complete RIB. In general, this is a means of recovery for those prefixes that are missing from a RIB.
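One way to picture why a marked start and end of a route refresh helps: at the start marker everything learnt from the neighbour is flagged as stale, each prefix that is re-advertised has its flag cleared, and whatever is still stale at the end marker was not re-sent. The sketch below is my own simplified model of that bookkeeping, with hypothetical class and method names and documentation prefixes; it is not the message format or procedure from Keyur's draft.

```python
class RefreshTracker:
    """Track which prefixes from one neighbour survive a demarcated route refresh."""

    def __init__(self, prefixes):
        self.rib = set(prefixes)
        self.stale = set()

    def begin_refresh(self):
        # Start-of-refresh marker: everything currently held is suspect.
        self.stale = set(self.rib)

    def receive(self, prefix):
        # A (re-)advertisement confirms the prefix and clears its stale flag.
        self.rib.add(prefix)
        self.stale.discard(prefix)

    def end_refresh(self):
        # End-of-refresh marker: anything still stale was not re-sent.
        missing = self.stale
        self.rib -= missing
        self.stale = set()
        return missing


if __name__ == "__main__":
    tracker = RefreshTracker({"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"})
    tracker.begin_refresh()
    tracker.receive("192.0.2.0/24")
    tracker.receive("203.0.113.0/24")
    print(sorted(tracker.end_refresh()))
```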
In the discussion of Keyur's draft above, I think one of the key issues that was highlighted, and changed in -01 was around the fact that there is some ambiguity as to what should happen should the RIB continue to be updated in a manner that causes churn - i.e. where is the end of a route refresh actually reached in this case? From this discussion, there were further suggestions that there is a more optimal manner to handle this. For this it is important to remember that when combined with Enke Chen's draft above, a local system can have knowledge of the prefixes that it was unable to receive, since it was able to parse the NLRI, for this reason, it is possible to request specific prefixes be refreshed by the remote system. In order to receive these, a single refresh with specific ORF filters in it would result in these being re-sent by the remote system, with no requirement to consider sending the whole RIB. Where there is local inbound filtering. This is captured in a draft by Jie Dong (Huawei) -- draft-zeng-one-time-prefix-orf. This draft actually gets my support for this and another reason, since it helps solve another issue around maintaining consistent L3VPN routing tables across a set of PEs, without requiring either large amounts of memory consumption by holding the Adj-RIB-In for each peer to be stored in RAM, or a route refresh be sent. This can result in a large number of prefixes being sent from a remote speaker, and where this is an RR, this can actually result in sizeable load on a centralised resource. I've described this in a mail to the IDR mailing list -- however, I'd hope to highlight this in another post that I've got planned for here.
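The targeted refresh idea can be sketched in the same spirit. If the local speaker remembers the prefixes it had to treat as withdrawn (possible because the NLRI parsed even though the attributes did not), it can ask for just those prefixes rather than the whole RIB. The request below is a plain dictionary standing in for a refresh carrying one-time prefix filters; the field names and function name are invented for illustration and are not the encoding from the draft.

```python
def build_targeted_refresh(neighbour, errored_prefixes):
    """Describe a refresh request limited to prefixes lost to treat-as-withdraw.

    Returns None when nothing needs to be re-requested.
    """
    prefixes = sorted(set(errored_prefixes))
    if not prefixes:
        return None
    return {
        "neighbour": neighbour,
        "one_time": True,      # filters apply to this refresh only (assumed field)
        "prefixes": prefixes,  # only these are expected back
    }


if __name__ == "__main__":
    request = build_targeted_refresh(
        "192.0.2.1", ["198.51.100.0/24", "203.0.113.0/24", "198.51.100.0/24"])
    print(request)
```

The appeal for the RR case described above is that the remote speaker re-sends a handful of prefixes instead of its full table.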
As usual, this post's motivation is more to attempt to highlight the drafts that are available at the moment that I believe solve practical problems that already exist in this protocol - thoughts and comments are appreciated, contact details are available via the links on the right.
|
|
I spoke at LINX71 about the testing that we (C&W) have been doing in the lab with 100GigE - we got a pre-production card and hence had a look at the technology for real. Thanks to LINX, the presentation video can be seen by clicking on the image below.
Once again, however, whatever LINX use as a presentation laptop didn't render my slides properly - even though I'd submitted PDF too! Hence the slides can be found on this site.
|
|
As I presented at UKNOF 18, I have now written an Internet-Draft to address the requirements of Network Operators for how BGP should handle errors in UPDATE messages. The draft can be found on the IETF site, and I'm currently seeking opinions as to whether this reflects an operational consensus! If you're an Operator (DFZ, MSE or otherwise), it would be great to hear from you!
I'll be presenting the draft at NANOG 51 in Miami on Tuesday - if you're there, feel free to ping me!
|
|
The video from the presentation I gave at NANOG, LINX and UKNOF has now been posted. You can find the video at the following URL - NANOG 51: BGP Error Handling or by clicking on the image below. The full slide deck is also on this site - here.
|
|
It's been quite a while since I updated this blog, very lax of me, sorry!
The lack of updates appears more indicative of how busy I appear to have been since presenting the error handling draft work at NANOG (which looks to be the last post!). Since January, I've presented at the IETF in Prague, and then again in Québec City - particularly on a number of aspects of the work that I've been documenting here for some time!
The good news is - we're making some significant progress. Over the last 6 months or so, the work that a number of operators have done, together with focused work from particular vendors, has been steering us towards how robust BGP needs to be to meet the operational requirements of the protocol right now. At IETF 80 in Prague, I presented at both the Global Routing Operations WG and Inter-Domain Routing, on the draft that I've described in the presentations linked in previous pages. For those that are interested, the slides for this are linked below.
The response, both at NANOG, and at the IETF meeting to this work has been very positive - I think as I've tried to characterise, there are a lot of operators that understand that this is an issue. Also - and perhaps somewhat surprisingly to me, there are a lot of vendors/implementors of protocols that also agree that this behaviour is very sub-optimal in numerous network deployments. There is significant appetite in the IDR working group to try and solve this issue in a deployable, scaleable manner - which is fantastic. Since BGP is the signalling glue for the Internet, and most modern IP networks, then it's really good that we are able to provide some focus for this issue, which, at the end of the day, will result in a more resilient set of networks.
In addition to such enthusiasm in the IETF IDR working group, GROW accepted the draft that I put together as a working group document - which is great, GROW's charter is almost to provide IDR work items that come from the operations area of the IETF. Pushing these requirements from GROW into IDR, whilst it might sound a bit like just internal workings of the IETF, gives some further credence to the fact that this is required by operators. Given all the discussions that I have had with operators about this issue, and how much of an issue I know this to be, I think that having the IETF process work on this the right way is great. This adoption means that the draft is now called draft-ietf-grow-ops-reqs-for-bgp-error-handling - and is progressing really well - I can't thank a number of people, including Bruno Decraene, Shane Amante, and David Freedman enough for their excellent discussions and suggestions on this subject - IMHO, such inter-operator collaboration is fantastic to see in terms of generally improving the operations, robustness, scalability and management of IP networks in general - and is of huge benefit to both the Internet and general network operations.
But, of course, a requirements draft alone is not going to solve the issues that exist in the protocol - however, it does give us a framework to work around. As such, the point of this post is to show any operators that might read here, and not the IETF mailing lists, what actual progress we've been making in the IETF!
- On the issue of preventing all errors having to be responded to with a NOTIFICATION - whilst we don't have a clear draft that says that this will happen with both eBGP and iBGP, there is a clear understanding within the IDR working group that this is the operational demand. The IDR chairs have tasked the WG to produce a single solution 'error handling' draft - this is likely to be heavily based on both the optional transitive error handling draft written by John Scudder (Juniper), and the eBGP errors draft written by Enke Chen and Keyur Patel (Cisco) - this combined error handling document is going to be the cornerstone of the changes that really meet the requirements laid out in the draft I wrote.
- Keyur Patel, Enke Chen and Alton Lo (Cisco) have been doing some fantastic work looking at the set of requirements outlined in my document for hitless session restart when non-recoverable errors occur - a number of comments from (amongst others) Chris Morrow prompted some revision of this section of the draft in the -01 version, describing which particular errors are deemed to be non-recoverable. It's safe to say that I've learnt quite a lot about what can go wrong in parsing streams like BGP messages over the last few months - and I've definitely got Alton, Keyur, Jeff Haas, and others to thank on this one. As such, I think the requirements that are now in the draft match up to what the operational requirements are - if you disagree, I'd love to hear from you!
Keyur et al's work has been focused around some discussions that we had in Miami, and then looking at how these ideas would scale (which I know a bunch of us discussed in Prague!) - if you're interested in this, then see the GR Notification and Accelerated Convergence for BGP Graceful Restart drafts - both of these essentially meet the requirement to perform some hitless session restart, whilst also looking to make this as scalable as possible.
- In terms of prefix recovery during an inconsistent RIB state, there are a couple of drafts that are doing this work - but there's still some opportunity for improvement. Deployment issues of ORF are holding up the two that I am co-authoring with Jie Dong (Huawei), and Jakob Heitz (Ericsson) et al - which are described in One-time Extended Community-based ORF and One-time Address-Prefix-based ORF. Alternatives to this exist in how we might implement Route Target Constraint, and also how we might look at being able to deploy other ROUTE REFRESH-based mechanisms. I think, whilst there are some options here, there are still some unanswered questions!
- The final requirement that is outlined in the requirements draft relates to how the BGP protocol can be managed. This has turned out to be one of the most complicated requirements - as I am not certain that there is a direct agreement as to how much should be integrated into the protocol. Whilst Tom Scholl (nLayer) suggested ADVISORY, a separate proposal, DIAGNOSTIC, offered much more functionality in terms of a query/answer set of mechanisms, along with some similar functionality in terms of giving a means to perform logging. As requested by the IETF IDR chairs at IETF80, Robert Raszuk, David Freedman and I sat down to write a draft combining both ADVISORY and DIAGNOSTIC. I presented the end result (OPERATIONAL) at IETF81 last week - I think we would all be very grateful for any comments (slides linked below)!
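To make the first two requirements above a little more concrete, here's a minimal Python sketch of the kind of decision a BGP speaker would make on receiving a broken UPDATE. This is only my own illustration of the idea, not text or an algorithm from any of the drafts mentioned - the `UpdateError` fields, the `Action` names and the dict-based RIB are all hypothetical. The assumption is that if we can still frame the message and still work out which prefixes it carried, we can withdraw just those prefixes rather than sending a NOTIFICATION; otherwise the error is non-recoverable and the session needs restarting (ideally hitlessly).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    TREAT_AS_WITHDRAW = "treat-as-withdraw"   # keep the session, drop the routes
    RESTART_SESSION = "restart-session"       # non-recoverable: restart the session


@dataclass
class UpdateError:
    """Hypothetical summary of what went wrong while parsing one UPDATE."""
    description: str
    nlri_parsed: bool      # could we still tell which prefixes were affected?
    framing_intact: bool   # can we still find the start of the next message?


def choose_action(err: UpdateError) -> Action:
    # Only avoid the session restart when the damage is contained to
    # attributes and we know exactly which prefixes it touched.
    if err.framing_intact and err.nlri_parsed:
        return Action.TREAT_AS_WITHDRAW
    return Action.RESTART_SESSION


def apply_update_error(rib: dict, prefixes: list, err: UpdateError) -> Action:
    """Apply the chosen action to a toy RIB (prefix -> peer name)."""
    action = choose_action(err)
    if action is Action.TREAT_AS_WITHDRAW:
        for prefix in prefixes:
            rib.pop(prefix, None)   # withdraw only the affected prefixes
    return action


rib = {"192.0.2.0/24": "peerA", "198.51.100.0/24": "peerA"}
bad_attr = UpdateError("malformed optional transitive attribute", True, True)
print(apply_update_error(rib, ["192.0.2.0/24"], bad_attr), sorted(rib))
```

The point of splitting `choose_action` out is that the policy (which errors are recoverable) is exactly the bit that is still being argued about; the mechanics of withdrawing prefixes are the easy part.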
Overall, I think we're really making some good progress on this one - I'm hopeful that the requirements draft can be cut and dried, and go to GROW WGLC prior to IETF 82 in Taipei! From then on, I think, as a community, we're really driving some good solutions that, at the end of the day, are going to improve the stability, robustness and operations of the Internet!
As always, comments (via the re-enabled form below) or via e-mail are greatly appreciated - I'm really keen to ensure that we're hitting the right requirements here!
|
|
At one of the Ericsson R&D days, Professor Scott Shenker - who’s an academic at the University of California, Berkeley - presented on a concept that he calls the ‘software defined network’. Now, if you haven’t seen the presentation - it’s definitely worth watching (it's on YouTube, here), and provides quite an engaging look at the problem of network scaling from the perspective of academia, especially in comparison to the more rigorous disciplines of computer science, like OS design.
Now, there are some interesting parallels with the ‘software defined network’ concept, and a couple of issues that I’ve been discussing, working on, or just had some interest in previously.
When considering a network whereby we have a decoupled control-plane, there are great parallels to the argument around centralised-management vs. distributed/dynamic management - insofar as the idea that a centralised control-plane has an overview of exactly what the network is doing has been considered in other places. I blogged about this issue previously - albeit through the guise of considering how one provides useful operational tools for MPLS-TE networks. The question in this case is whether providing dynamic path placement, controlled by a distributed set of nodes (essentially with each head-end LSR being responsible for its own path placement, within the constraints set by the network) is better than utilising a centralised, off-line computation mechanism whereby path placement is computed for all network elements, and then rolled out to them. There are some distinct advantages to the latter approach in terms of being able to take a more holistic approach to the problem - and considering the interdependencies of LSPs - however, it results in a complex (and therefore often expensive) centralised element of the system. However, in this case, we are not decoupling the network to the extent that the SDN would want to - we are merely computing the ways that traffic should flow; the actual signalling, FIB programming, and protocol configuration are still provided on a per element basis. This therefore leaves us with a set of distributed systems that have already had a complex additional layer deployed with them to solve the traffic placement problem - surely the worst of both worlds?
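As a toy illustration of the LSP interdependency point, here's a small Python sketch. Everything in it is made up for the example - the four-node topology, the capacities and the two LSPs - and the 'centralised' computation is just a brute force over signalling orders standing in for a real off-line optimiser, so treat it as a sketch under those assumptions. Each head-end picks the fewest-hop path that has enough spare bandwidth, which is a heavily simplified stand-in for CSPF.

```python
from collections import deque
from itertools import permutations

# Hypothetical topology: undirected links mapped to capacity.
LINKS = {("A", "B"): 10, ("A", "C"): 6, ("C", "B"): 10, ("D", "A"): 10}

# Hypothetical demands: (name, head-end, tail-end, bandwidth).
LSPS = [("LSP1", "A", "B", 6), ("LSP2", "D", "B", 8)]


def shortest_feasible_path(capacity, src, dst, demand):
    """BFS for the fewest-hop path where every link has >= demand spare."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for (u, v), spare in capacity.items():
            if spare < demand or node not in (u, v):
                continue
            nxt = v if node == u else u
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def place_in_order(lsps):
    """Distributed model: each head-end places its own LSP as it signals."""
    capacity, placed = dict(LINKS), {}
    for name, src, dst, demand in lsps:
        path = shortest_feasible_path(capacity, src, dst, demand)
        if path is None:
            continue  # this LSP stays down
        for u, v in zip(path, path[1:]):
            key = (u, v) if (u, v) in capacity else (v, u)
            capacity[key] -= demand
        placed[name] = path
    return placed


def place_centrally(lsps):
    """Off-line model: consider every order and keep the best placement."""
    return max((place_in_order(order) for order in permutations(lsps)), key=len)


print("head-end order:", place_in_order(LSPS))
print("centralised:   ", place_centrally(LSPS))
```

With LSP1 signalling first, it grabs the direct A-B link and LSP2 cannot be placed at all; the centralised view moves LSP1 via C so that both fit. Brute force obviously doesn't scale, which is rather the point about the centralised element being complex.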
The question that interests me, regarding the SDN, is whether pushing all path decision functionality to a central network control-plane results in simplification of the elements within the network (and through removing this complexity, adds some further robustness) or whether removing the means by which each node within the network is responsible for its own survivability results in a system whereby we introduce a SPoF, where all elements are affected by erroneous behaviour, rather than a subset.
Another issue that interests me in this area is around scalability - right now we have a tight coupling between where the control-plane functionality for a particular interconnection (be it a UNI or NNI) is deployed, and where the physical interconnection takes place. Sure, there are some interim layers that might exist (consider, for instance, the extension of a Layer 3 node by a Layer 2 - or even Layer 1 - domain to backhaul and/or aggregate connectivity) - however, essentially, even where we are able to do this, we have a single point of interconnection into our Layer 3 domain (be it IP/MPLS or pure IP). This interconnection point needs to maintain both connectivity back into the network - i.e. how do I get to each other exit point that I need to be aware of - and the functionality required to support the UNI. It therefore becomes a pinch-point quite quickly.
At the recent IETF armd working-group meeting in Québec City, this was something that was spoken about at some length, particularly focused on the scale concerns of network elements for the ‘Cloud’. In this case, let’s dispense with the poorly defined term ‘Cloud’, and define the problem as the interconnection to increasingly dense sets of hosts, which are requiring increasingly more Layer 4+ service nodes, which are increasingly becoming multi-tenant. The key point of the discussion was the problem that I described above - the existence of a single point of interconnection which must support any of the FHRP, address resolution, and PE-CE routing protocol functionality required for the interconnect, as well as meshing into the SP network topology. Whilst armd perhaps focuses more on the address resolution element of this, I (and a couple of other network operators - watch this space on this one) think that this is rather more generic. So, how does this tie into the SDN concept? The decoupling of the control-plane and the forwarding-plane provides us with a new toolset to solve this problem. If we can ‘outsource’ the routing decision functionality from the physical interconnection point to another element (which may or may not be centralised for the entire SP network), as the SDN would want to do, this starts to give us some flexibility to independently scale the control-plane. This starts to get around the problem that we have a very dense interconnection point physically, since we can just stack up control-planes to be able to provide the functionality we require there. The SDN using this as a ‘centralised’ element manager-type solution is also interesting, since this implies that we have some tolerance to latency between the network manager and the nodes - which means that it may be possible to place the control-plane in a physically disparate location to the forwarding plane - an interesting new concept.
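To illustrate what ‘outsourcing’ the routing decision might look like, here's a minimal Python sketch of the split. None of this is a real SDN API - the `ForwardingElement` and `ControlPlane` classes, their method names and the lowest-metric-wins rule are all hypothetical, chosen just to show where the decision sits. The forwarding element only holds a FIB and does longest-prefix-match lookups; everything that looks like a routing decision lives in a separate object that could, in principle, run somewhere else entirely.

```python
import ipaddress


class ForwardingElement:
    """Forwarding-only node: holds a FIB, makes no routing decisions."""

    def __init__(self, name):
        self.name = name
        self.fib = {}  # ip_network -> next hop

    def program(self, prefix, next_hop):
        # The only 'API' exposed towards the control-plane in this sketch.
        self.fib[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None
        # Longest-prefix match.
        return self.fib[max(matches, key=lambda net: net.prefixlen)]


class ControlPlane:
    """Decoupled decision logic serving one or more forwarding elements."""

    def __init__(self):
        self.elements = {}
        self.candidates = {}  # (element name, prefix) -> [(metric, next hop)]

    def attach(self, element):
        self.elements[element.name] = element

    def learn(self, element_name, prefix, next_hop, metric):
        key = (element_name, prefix)
        self.candidates.setdefault(key, []).append((metric, next_hop))

    def push(self):
        # Decide centrally (lowest metric wins), then program each FIB.
        for (element_name, prefix), routes in self.candidates.items():
            _, next_hop = min(routes)
            self.elements[element_name].program(prefix, next_hop)


edge = ForwardingElement("edge1")
cp = ControlPlane()
cp.attach(edge)
cp.learn("edge1", "10.0.0.0/8", "core1", metric=20)
cp.learn("edge1", "10.0.0.0/8", "core2", metric=10)
cp.learn("edge1", "10.1.0.0/16", "core3", metric=50)
cp.push()
print(edge.lookup("10.1.2.3"), edge.lookup("10.2.0.1"), edge.lookup("192.0.2.1"))
```

Since the `ControlPlane` only talks to elements through `program()`, nothing stops you running several of them for one dense interconnection point, which is the ‘stack up control-planes’ idea.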
There’s another benefit of such a disconnected control-plane - even if we just consider some smaller concept (that might look a bit more achievable than the entire-network deployment that Prof. Shenker proposes). At the moment, we’re seeing great demands for FTTx and deployment of more intelligent network elements closer and closer to the edge - this is motivating work like Seamless MPLS. However, this means deploying many relatively complex (and therefore expensive!) systems - perhaps something that works where one can offset costs of one element by additional revenue made by another, but where this isn’t possible, the startup costs of such a deployment are high. However, if we consider being able to deploy forwarding-only elements that have an API towards a central control-plane - then we can do two things. Firstly, the edge element can be cheaper - it need only perform those functions needed right at the edge - FIB programming, OAM and QoS - without any control or management functionality for these. This is a concept we’re already seeing in the industry, so nothing new I think. Secondly, if we have N of these forwarding elements, we can look at the idea of combining this with a hypervisor-esque virtualisation - at this point, our CPU resources can be timeshared - giving us statmux for CPU time, as the drive towards virtualisation for hosts has done. An interesting concept for lower initial cost builds where large numbers of elements are needed.
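The statmux argument is really just arithmetic, so here's a quick Python sketch of it. The demand figures are invented purely for illustration (arbitrary CPU units, sampled at the same instants for each element), and the function names are my own. With a control-plane per element, each box has to be sized for its own peak; with a shared, virtualised control-plane, we only need to size for the peak of the combined load.

```python
def dedicated_cpu(demand_by_element):
    """CPU needed if every element carries its own control-plane:
    each one must be provisioned for its own peak."""
    return sum(max(series) for series in demand_by_element.values())


def shared_cpu(demand_by_element):
    """CPU needed if control-planes are timeshared on common hardware:
    provision for the peak of the summed demand."""
    return max(sum(sample) for sample in zip(*demand_by_element.values()))


# Made-up control-plane CPU demand per element over four time samples.
demand = {
    "edge1": [10, 80, 10, 10],
    "edge2": [10, 10, 70, 10],
    "edge3": [10, 10, 10, 90],
}

print("dedicated:", dedicated_cpu(demand))
print("shared:   ", shared_cpu(demand))
```

The saving depends entirely on the peaks not coinciding - if every element is busy at the same moment (say, during a network-wide reconvergence), the two numbers converge, which is worth bearing in mind alongside the availability question.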
This discussion rambled on a bit for some initial thoughts, but there are definitely some interesting points that Prof. Shenker raises - I need to have a think about the availability question some more - I’d especially like to do this with a view to looking at how centralised management works within MPLS-TE (and probably even more importantly, MPLS-TP) networks. As always, I’m interested to discuss my view of this - clearly this is something being presented out of academia into the standardisation/design/R&D arena, and hence perhaps doesn’t have a clear, public, operational model yet - so it’s interesting to consider how it might apply to ‘real world’ networks!
|
|
|
|
|