Almost two years ago I wrote a post on this site entitled Some Initial Thoughts on the SDN. Clearly, since then, the SDN concept has gained some more legs (and entered a new stage of the hype cycle) - so, where are we right now?
Firstly, I think it's fair to say that the concept presented by Scott Shenker - a single centralised computational element controlling COTS OpenFlow-speaking switches - has fallen out of favour somewhat (based on discussions I have had with other network architects, engineers, and implementors). Somewhat as predicted, there are real challenges with this approach within high-scale, distributed networks:
Survivability - by centralising the network controller, we introduce a single point at which computation must be performed - which implies that the controller needs a real-time view of the network's state and infrastructure, and must react to changes to keep all paths working. As any network operator has observed, the detection and communication of failures, even within a single node, is not necessarily a reliable process - hence, removing the ability of nodes to autonomously calculate paths and observe path liveness seems a clear barrier to providing networks with the availability required of modern IP applications (e.g., linear TV and voice).
Scalability - whilst within 'steady state' operation a centralised controller is very likely to be able to keep up with processing requests for new paths and programming elements, this is unfortunately the "easy" part of the controller's job. From an operational perspective, the control-plane must scale to deal with the worst-case failure within acceptable time bounds. When we consider failure modes that result in large numbers of paths needing to be recomputed and reprogrammed, the scalability of the centralised model becomes very questionable: centralising computation in this case negatively impacts scalability and network performance, rather than enhancing it.
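The scalability concern can be made concrete with a back-of-envelope model (all the numbers below are illustrative assumptions, not measurements from any real network): compare a single controller recomputing every affected path serially against head-ends each recomputing only their own paths in parallel.

```python
# Toy model of post-failure recomputation load (all numbers are
# illustrative assumptions, not measurements from any real network).

def central_recompute_ms(total_paths, affected_fraction, per_path_ms):
    """A single controller recomputes every affected path serially."""
    return total_paths * affected_fraction * per_path_ms

def distributed_recompute_ms(total_paths, affected_fraction, per_path_ms, head_ends):
    """Each head-end recomputes only its own affected paths, in parallel."""
    paths_per_node = total_paths / head_ends
    return paths_per_node * affected_fraction * per_path_ms

# 100,000 paths, a failure invalidating 10% of them, 1 ms per computation,
# 500 head-end routers:
central = central_recompute_ms(100_000, 0.10, 1.0)               # ~10 seconds
distributed = distributed_recompute_ms(100_000, 0.10, 1.0, 500)  # ~20 ms
```

The real numbers will differ enormously by network, but the shape of the result holds: the centralised restoration time grows with the size of the whole network, whilst the distributed one grows only with the state held per node.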
One point that has been raised to me when I've expressed these thoughts is that transport networks have used centralised computation for many years. However, this is not directly analogous to the SDN controller concept. Transport networks that rely on centralised computation tend to perform "set and forget" computation: an A and a B path are programmed once, and in-band OAM chooses which path is used. Should the A path fail, it is not recomputed - avoiding the challenge of scaling to large numbers of path computations, but resulting in worse survivability than an IP network.
The other fundamental challenge around the controller concept is the fact that networks of any scale are inherently inter-domain -- even the smallest networks I have worked in have utilised different domains to separate operational elements (e.g., confederations), and the medium and large ones have had multiple platforms, as well as legacy platforms that need to interoperate.
However, clearly, these approaches might have applicability where one constrains the scope and scale of the network -- particularly, utilising this concept within closed datacentre environments might have some applicability (especially where global optimisation is desired).
So -- if the centralised control-plane/COTS forwarding-plane looks somewhat shaky as a view of the "SDN", is there any future? My answer: yes, there definitely should be - but perhaps it won't be the revolution that was originally predicted, and in my personal opinion it will be centred around two key concepts that we can take from the use cases being mooted for "SDN":
Network programmability - one of the frustrations being aired through SDN is how hard it is to interact with the network in order to make it more dynamic. Looking at the datacentre use case, how much of this would be a non-issue if the interfaces through which edge devices are configured weren't somewhat clunky (CLI-based screen-scraping...) or very non-standard (SNMP MIBs tend to be the least "standard" standards)? This is a traditional SP problem too - what would be called orchestration within the datacentre context is really just provisioning of new services, or sub-elements of services. A movement towards "SDN" concepts giving us better external programmability of the network would be advantageous to network operation, without requiring large amounts of infrastructure to be removed from the network (a business case that never really stacks up). Starting with extending existing services (e.g., provisioning of forwarding paths through technologies like PCE), or adding new ephemeral state to devices (extending the on-demand provisioning achievable through RADIUS for subscriber management interfaces to be more general, and not just at authentication time), would give these kinds of wins, and start to tease out more use cases where better orchestration and more dynamic provisioning of the network enhance service capabilities.
Global optimisation/orchestration - a few years ago (wow, 4 years ago!) I wrote something around Visualising MPLS-TE Networks, reflecting on the means by which TE-LSP placement and management could be achieved through off-line tools. MPLS-TE is one of those cases where it is possible to achieve some level of global optimisation of resource utilisation (considering forwarding paths on a global network view, rather than having each individual network element be greedy when selecting paths). Whilst this behaviour is not always of utility, for a subset of services such overall optimisation is advantageous - yet SPs cannot really use it today. My feeling is that, with the work we're doing on Segment Routing in the IETF, if we can solve one of the key issues with RSVP-TE (large amounts of mid-point state is not conducive to simple mid-point devices, and causes scaling issues during large network events), then the idea of having global controllers that can select more optimal (non-SPT) forwarding paths, or stitch multiple forwarding paths together, is something we can exploit. Again, it seems to me that starting by exploiting some of the path calculation tools we've used before (PCE again!) would give us a way to derive some of the benefits of resource-aware path placement, which may be globally computed, where we require it - exploiting a hybrid centralised and distributed control-plane for the network. If we develop this approach, and it is adopted in SP networks, then the next logical step is to consider non-forwarding resource utilisation within the network, to provide more globally efficient utilisation of these functions, and reduce overall unit cost.
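On the programmability point: the contrast is between screen-scraping a CLI and handing the device a structured, machine-validated document. A minimal sketch of the latter, building a NETCONF-style edit-config payload (the element names and interface naming below are invented for illustration - a real deployment would follow the device's published data model):

```python
import xml.etree.ElementTree as ET

def build_edit_config(interface, description):
    """Build a structured <config> payload instead of a CLI 'expect' script.

    The XML schema here is illustrative only, not any vendor's actual
    data model.
    """
    config = ET.Element("config")
    interfaces = ET.SubElement(config, "interfaces")
    iface = ET.SubElement(interfaces, "interface")
    ET.SubElement(iface, "name").text = interface
    ET.SubElement(iface, "description").text = description
    return ET.tostring(config, encoding="unicode")

payload = build_edit_config("ge-0/0/1", "customer-uplink")
```

Because the payload is a document rather than a scraped terminal session, it can be validated, diffed, and rolled back programmatically - which is the property the datacentre "orchestration" use case is really asking for.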
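The Segment Routing idea in the optimisation point can be sketched: a global controller computes a non-SPT path on its own view of the topology, and the only state required is a segment list imposed at the head-end - no RSVP-TE mid-point state. The topology, metrics, and node names below are invented for illustration:

```python
import heapq

def shortest_path(graph, src, dst, avoid=frozenset()):
    """Dijkstra over {node: [(neighbour, igp_metric), ...]}, skipping
    any directed link present in `avoid` (a set of (u, v) tuples)."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return list(reversed(path))
        for v, w in graph.get(u, []):
            if (u, v) in avoid:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return None

# Hypothetical four-node topology; A-B-D is the IGP shortest path.
graph = {
    "A": [("B", 1), ("C", 2)],
    "B": [("A", 1), ("D", 1)],
    "C": [("A", 2), ("D", 2)],
    "D": [("B", 1), ("C", 2)],
}
spt_path = shortest_path(graph, "A", "D")                      # the IGP SPT path
te_path = shortest_path(graph, "A", "D", avoid={("B", "D")})   # steer off congested B-D
# With node SIDs, the explicit path is just a segment list imposed at A:
segment_list = te_path[1:]
```

The controller keeps the global view and does the computation; the network carries only the label stack - which is exactly the split that makes a hybrid centralised/distributed control-plane plausible.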
Both of these concepts really result in more dynamic networks, which consider overall resource utilisation and efficiency to a greater extent. They're not new ideas - but if SDN means that they are re-examined, such that the way we instantiate them within the network is thought about again, then perhaps it gives us a good way forward to increase the efficiency of networks and hence realise some economic benefits (the primary motivator for technology change). Better still (for operators), this could be achievable by evolving the current infrastructure, without requiring wholesale changes in infrastructure or operational capabilities (albeit the chosen evolution path may open the door to larger changes in subsequent investment cycles).
I'm sure some will think that I am being overly pragmatic - and possibly even that I'm giving "SDN" too much credit. What I'd like to see are ways that we can use new technologies that are realisable, and that either enhance the quality of services delivered to users of networks, or simplify or reduce the cost of the infrastructure operated by SPs. To get there, we need evolution rather than revolution - whilst a coup d'état might be exciting at the time, such revolutions are often bloody, and result in a degradation of experience that I don't feel operators can afford, or service consumers will tolerate.
A question that came up at an event I was at yesterday: how will the time between the first (commercial) deployment of a telephony service and a regulated universal service obligation for telephony compare to the time between the first (commercial) Internet services being deployed and a USO for IP connectivity (e.g., broadband)?
Based on this, is the cycle time of the telephony regulatory bodies, and mechanisms through which changes are implemented within these bodies suitable for Internet services?
Answers on a postcard please.
On Friday, I presented at the Netnod meeting in Stockholm, Sweden - again about BGP error handling - this time presenting a bit of an update as to why this continues to be a problem for the Internet (and private BGP deployments) - and why this work is still really relevant. In addition, I tried to give an overview of what the solution space looks like. I'm not sure whether there's video, but as usual, the slides are linked below!
As usual, I'm happy to take questions, comments or further queries on this work - please just let me know!
At one of the Ericsson R&D days, Professor Scott Shenker - an academic at the University of California, Berkeley - presented on a concept that he calls the ‘software defined network’. Now, if you haven’t seen the presentation, it’s definitely worth watching (it's on YouTube, here), and provides quite an engaging look at the problem of network scaling from the perspective of academia, especially in terms of a comparison to the more rigorous disciplines of computer science, like OS design.
Now, there are some interesting parallels between the ‘software defined network’ concept and a couple of issues that I’ve been discussing, working on, or just had some interest in previously.
When considering a network with a decoupled control-plane, there are great parallels to the argument around centralised management vs. distributed/dynamic management - insofar as the idea that a centralised control-plane has an overview of exactly what the network is doing has been considered in other places. I blogged about this issue previously - albeit through the guise of considering how one provides useful operational tools for MPLS-TE networks. The question in this case is whether dynamic path placement, controlled by a distributed set of nodes (essentially with each head-end LSR being responsible for its own path placement, within the constraints set by the network), is better than utilising a centralised, off-line computation mechanism whereby path placement is computed for all network elements and then rolled out to them. The latter approach has some distinct advantages in being able to take a more holistic view of the problem - considering the interdependencies of LSPs - however, it results in a complex (and therefore often expensive) centralised element of the system. In this case, though, we are not decoupling the network to the extent that the SDN would want to - we are merely computing the ways that traffic should flow; the actual signalling, FIB programming, and protocol configuration is still provided on a per-element basis. This leaves us with a set of distributed systems that have already had a complex additional layer deployed with them to solve the traffic placement problem - surely the worst of both worlds?
The question that interests me, regarding the SDN, is whether pushing all path decision functionality to a central network control-plane results in simplification of the elements within the network (and, through removing this complexity, adds some further robustness), or whether removing the means by which each node within the network is responsible for its own survivability results in a system whereby we introduce an SPoF, where all elements are affected by erroneous behaviour, rather than a subset.
Another issue that interests me in this area is scalability - right now we have a tight coupling between where the control-plane functionality for a particular interconnection (be it a UNI or NNI) is deployed, and where the physical interconnection takes place. Sure, there are some interim layers that might exist (consider, for instance, the extension of a Layer 3 node by a Layer 2 - or even Layer 1 - domain to backhaul and/or aggregate connectivity) - however, essentially, even where we are able to do this, we have a single point of interconnection into our Layer 3 domain (be it IP/MPLS or pure IP). This interconnection point needs to maintain both connectivity back into the network - i.e. how do I get to each other exit point that I need to be aware of - and the functionality required to support the UNI. It therefore becomes a pinch-point quite quickly.
At the recent IETF armd working-group meeting in Québec City, this was something that was spoken about at some length, particularly focused on the scale concerns of network elements for the ‘Cloud’. In this case, let’s dispense with the poorly defined notion of Cloud, and define the problem as the interconnection of increasingly dense sets of hosts, which require increasingly more Layer 4+ service nodes, which are increasingly becoming multi-tenant. The key point of the discussion was the problem that I described above: the existence of a single point of interconnection which must support the FHRP, address resolution, and PE-CE routing protocol functionality required for the interconnect, as well as meshing into the SP network topology. Whilst armd perhaps focuses more on the address resolution element of this, I (and a couple of other network operators - watch this space on this one) think that the problem is rather more generic. So, how does this tie into the SDN concept? The decoupling of the control-plane and the forwarding-plane provides us a new toolset to solve this problem. If we can ‘outsource’ the routing decision functionality from the physical interconnection point to another element (which may or may not be centralised for the entire SP network), as the SDN would want to do, this starts to give us some flexibility to independently scale the control-plane. This starts to get around the problem that we have a very dense interconnection point physically, since we can just stack up control-planes to provide the functionality we require there. Using the SDN as a ‘centralised’ element manager-type solution is also interesting, since it implies some tolerance to latency between the network manager and the nodes - which means that it may be possible to place the control-plane in a physically disparate location from the forwarding plane - an interesting new concept.
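A toy model of that decoupling (class names and the programming API are invented for illustration - a real system would use a protocol such as OpenFlow or a vendor API): the forwarding element only holds a FIB and exposes a programming interface, while all routing decisions live in a controller that can sit anywhere the latency tolerance allows.

```python
class ForwardingElement:
    """Forwarding-only node: holds a FIB and exposes a programming API."""
    def __init__(self, name):
        self.name = name
        self.fib = {}

    def program(self, prefix, next_hop):
        self.fib[prefix] = next_hop

    def lookup(self, prefix):
        return self.fib.get(prefix)

class Controller:
    """Centralised control-plane: owns all routing state, and may run in a
    physically disparate location from the forwarding hardware."""
    def __init__(self):
        self.elements = {}
        self.routes = {}  # prefix -> {element name: next hop}

    def attach(self, element):
        self.elements[element.name] = element

    def announce(self, prefix, per_element_next_hops):
        # Decide once, centrally; push the result to each element's FIB.
        self.routes[prefix] = per_element_next_hops
        for name, nh in per_element_next_hops.items():
            self.elements[name].program(prefix, nh)

ctl = Controller()
edge = ForwardingElement("edge-1")
ctl.attach(edge)
ctl.announce("192.0.2.0/24", {"edge-1": "198.51.100.1"})
```

The interesting scaling property is that `Controller` instances can be stacked or resized independently of the physical interconnection point - exactly the flexibility the dense-interconnect problem is missing today.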
There’s another benefit of such a disconnected control-plane - even if we just consider some smaller concept (one that might look a bit more achievable than the entire-network deployment that Prof. Shenker proposes). At the moment, we’re seeing great demand for FTTx and the deployment of more intelligent network elements closer and closer to the edge - this is motivating work like Seamless MPLS. However, this means deploying many relatively complex (and therefore expensive!) systems - perhaps something that works where one can offset the costs of one element with additional revenue made by another, but where this isn’t possible, the startup costs of such a deployment are high. However, if we consider being able to deploy forwarding-only elements that have an API towards a central control-plane, then we can do two things. Firstly, the edge element can be cheaper - it need only perform those functions needed right at the edge - FIB programming, OAM and QoS - without any control or management functionality for these. This is a concept we’re already seeing in the industry, so nothing new I think. However, if we then have N of these forwarding elements, we can look at combining this with hypervisor-esque virtualisation - at this point, our CPU resources can be timeshared - giving us statmux for CPU time, as the drive towards virtualisation for hosts has done. An interesting concept for lower initial cost builds where large numbers of elements are needed.
This discussion rambled on a bit for some initial thoughts, but there are definitely some interesting points that Prof. Shenker raises - I need to have a think about the availability question some more - I’d especially like to do this with a view to looking at how centralised management works within MPLS-TE (and probably even more importantly, MPLS-TP) networks. As always, I’m interested to discuss my view of this - clearly this is something being presented out of academia into the standardisation/design/R&D arena, and hence perhaps doesn’t have a clear, public, operational model yet - so it’s interesting to consider how it might apply to ‘real world’ networks!
It's been quite a while since I updated this blog, very lax of me, sorry!
The lack of updates is more indicative of how busy I have been since presenting the error handling draft work at NANOG (which looks to be the last post!). Since January, I've presented at the IETF in Prague, and then again in Québec City - particularly on a number of aspects of the work that I've been documenting here for some time!
The good news is: we're making some significant progress. Over the last 6 months or so, the work that a number of operators have done, along with efforts from particular vendors, has focused us on how robust BGP needs to be to meet the operational requirements of the protocol right now. At IETF 80 in Prague, I presented at both the Global Routing Operations WG and Inter-Domain Routing, on the draft that I've described in the presentations linked in previous pages. For those that are interested, the slides for this are linked below.
The response to this work, both at NANOG and at the IETF meeting, has been very positive - as I've tried to characterise, there are a lot of operators that understand that this is an issue. Also - and perhaps somewhat surprisingly to me - there are a lot of vendors/implementors of protocols that agree that this behaviour is very sub-optimal in numerous network deployments. There is significant appetite in the IDR working group to try and solve this issue in a deployable, scalable manner - which is fantastic. Since BGP is the signalling glue for the Internet, and for most modern IP networks, it's really good that we are able to provide some focus for this issue, which, at the end of the day, will result in a more resilient set of networks.
In addition to such enthusiasm in the IETF IDR working group, GROW accepted the draft that I put together as a working group document - which is great: GROW's charter is, in essence, to provide IDR work items that come from the operations area of the IETF. Pushing these requirements from GROW into IDR, whilst it might sound like just the internal workings of the IETF, gives some further credence to the fact that this is required by operators. Given all the discussions that I have had with operators about this issue, and how much of an issue I know it to be, I think that having the IETF process work on this the right way is great. This adoption means that the draft is now called draft-ietf-grow-ops-reqs-for-bgp-error-handling - and it is progressing really well. I can't thank a number of people, including Bruno Decraene, Shane Amante, and David Freedman, enough for their excellent discussions and suggestions on this subject - IMHO, such inter-operator collaboration is fantastic to see in terms of improving the operations, robustness, scalability and management of IP networks in general - and is of huge benefit to both the Internet and general network operations.
But, of course, a requirements draft alone is not going to solve the issues that exist in the protocol - it does, however, give us a framework to work around. As such, the point of this post is to let any operators who read here, but not the IETF mailing lists, know what actual progress we've been making in the IETF!
- On the issue of preventing all errors having to be responded to with a NOTIFICATION - whilst we don't have a clear draft that says that this will happen with both eBGP and iBGP, there is a clear understanding within the IDR working group that this is the operational demand. The IDR chairs have tasked the WG to produce a single solution 'error handling' draft - this is likely to be heavily based on both the optional transitive error handling draft written by John Scudder (Juniper), and the eBGP errors draft written by Enke Chen and Keyur Patel (Cisco) - this combined error handling document is going to be the cornerstone of the changes that really meet the requirements laid out in the draft I wrote.
- Keyur Patel, Enke Chen and Alton Lo (Cisco) have been doing some fantastic work looking at the set of requirements outlined in my document around hitless session restart when non-recoverable errors occur - a number of comments from (amongst others) Chris Morrow prompted some revision of this section of the draft in the -01 version, describing which particular errors are deemed to be non-recoverable. It's safe to say that I've learnt quite a lot about what can go wrong in parsing streams like BGP messages over the last few months - and I've definitely got Alton, Keyur, Jeff Haas, and others to thank on this one. As such, I think the requirements that are now in the draft match up to the operational requirements - if you disagree, I'd love to hear from you!
Keyur et al's work has been focused around some discussions that we had in Miami, and then looking at how these ideas would scale (which I know a bunch of us discussed in Prague!) - if you're interested in this, see GR Notification, and Accelerated Convergence for BGP Graceful Restart - both of these essentially meet the requirement to perform a hitless session restart, whilst also looking to make this as scalable as possible.
- In terms of prefix recovery during an inconsistent RIB state, there are a couple of drafts that are doing this work - but there's still some opportunity for improvement. Deployment issues of ORF are holding up the two that I am co-authoring with Jie Dong (Huawei), and Jakob Heitz (Ericsson) et al - which are described in One-time Extended Community-based ORF and One-time Address-Prefix-based ORF. Alternatives to this exist in how we might implement Route Target Constraint, and also how we might look at being able to deploy other ROUTE REFRESH-based mechanisms. I think, whilst there are some options here, there's still some unanswered questions!
- The final requirement outlined in the requirements draft relates to how the BGP protocol can be managed. This has turned out to be one of the most complicated requirements, as I am not certain that there is direct agreement on how much should be integrated into the protocol. Whilst Tom Scholl (nLayer) suggested the ADVISORY message for this purpose, DIAGNOSTIC offered much more functionality in terms of a query/answer set of mechanisms, along with some similar functionality in terms of logging. As requested by the IETF IDR chairs at IETF80, Robert Raszuk, David Freedman and I sat down to write a draft combining both ADVISORY and DIAGNOSTIC. The end result (OPERATIONAL) I presented at IETF81 last week - I think we would all be very grateful for any comments (slides linked below)!
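The first point in the list above - not tearing down a session with a NOTIFICATION for every malformed UPDATE - can be sketched as a decision procedure (attribute type codes follow RFC 4271; the classification policy here is deliberately simplified for illustration, not the working group's final text):

```python
# BGP path attribute type codes (RFC 4271); which attributes count as
# "critical" is a simplified illustration of the policy question.
ORIGIN, AS_PATH, NEXT_HOP, MED = 1, 2, 3, 4
CRITICAL_ATTRS = {ORIGIN, AS_PATH, NEXT_HOP}

def handle_update(nlri, malformed_attrs):
    """Decide how to react to an UPDATE whose attributes failed validation.

    `nlri` is the list of prefixes in the UPDATE; `malformed_attrs` is the
    set of attribute type codes that failed to parse. The key property is
    that no branch resets the whole session.
    """
    if not malformed_attrs:
        return ("accept", nlri)
    if malformed_attrs & CRITICAL_ATTRS:
        # The routes cannot safely be used: withdraw just the prefixes
        # carried in this UPDATE ("treat-as-withdraw"), keeping the
        # session - and every other route learned over it - up.
        return ("treat-as-withdraw", nlri)
    # A non-critical attribute (e.g. a malformed MED): discard the
    # attribute and keep the routes.
    return ("attribute-discard", nlri)

action, prefixes = handle_update(["192.0.2.0/24"], {AS_PATH})
```

Contrast this with today's default, where the same malformed AS_PATH would drop every route learned over the session - the disproportionate blast radius that the requirements draft is trying to eliminate.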
Overall, I think we're really making some good progress on this one - I'm hopeful that the requirements draft can be cut and dried, and go to GROW WGLC prior to IETF 82 in Taipei! From then on, I think, as a community, we're really driving some good solutions that, at the end of the day, are going to improve the stability, robustness and operations of the Internet!
As always, comments (via the re-enabled form below) or via e-mail are greatly appreciated - I'm really keen to ensure that we're hitting the right requirements here!