For all network deployments, there is a requirement to present information relating to both topology and various utilisation statistics to some human operator. In many cases, this process has become so ingrained in network requirements that there are almost ubiquitous solutions for visualising the data - for example, link utilisation is almost always presented via some framework or tool powered by RRDTool. Other tools, such as network "weathermap" diagrams linking this utilisation information into an overview of a network topology, are also seen in many NOCs. For most common deployments, the problem of visualising data relating to a flat MPLS or IP network is solved.
The problem of data presentation is somewhat altered in the case of an MPLS-TE network. Whilst link utilisation graphs, and weathermap diagrams may continue to provide useful information, suddenly the overlaid tunnels within a network are of interest to operators. Whilst a first-line NOC engineer may not be aware of what is indicated by a certain LSR becoming a midpoint for a large number of tunnels within a network, this almost certainly identifies some form of failure, or congestion that should be reacted to. In addition to identifying events such as TE LSP path changes to particular nodes, it is often useful to retain some insight into the constraints within which CSPF is currently selecting paths for LSPs. In order to solve these problems, there's a requirement for an additional set of visualisation tools. In some cases, these tools may be implemented using existing visualisation tools - however, this often results in overly-complex presentations. The intent of this post is to outline some of the challenges, and interesting methods that appear to be available for presenting data relating specifically to MPLS-TE networks - a number of beta-quality solutions in use within AS5413 are presented, however, this post does not intend to define a complete solution.
In order to properly consider the available solutions to any visualisation problem, the data to be presented to an operator needs to be defined. Ideally, the information that is required by an operator is likely to include the following:
- The path taken by specific TE LSPs throughout a network -- within a straight MPLS or IP network, traffic is likely to be forwarded through a relatively easily determined set of nodes. Since paths within a TE deployment are dynamic according to constraints, rather than following the shortest path, the presentation of path information to a human is of operational benefit.
- The resource consumption on any TE-enabled path -- this requirement is likely to include the presentation of resource utilisation over a specific link by reservations associated with a TE LSP. However, it should also include a view of the resources consumed by a specific TE LSP, both in terms of reserved bandwidth, and actual traffic forwarded in the interface.
- Efficiency of path selection across an MPLS-enabled topology -- in many cases, TE is deployed in order to ensure that service level guarantees are upheld. Since these guarantees are typically made in terms of loss, latency and jitter, the effect of TE path selection on forwarding between specific points in a TE network is of interest to an operator. Where the 'efficiency' of path selection is referred to, this should be interpreted as the proximity of path selection in the network to the optimal path throughout a network (this should be assumed to be the shortest path, as would typically be selected by an IGP).
In order to address requirement 1 above, there are two types of visualisation tool that are of direct interest to an operator - a view based on a specific node, and a whole path view. For a specific LSP, a per-node view showing the egress interface for each LSP is relatively trivial to produce from the output of a command such as show ip rsvp reservation or similar. An example of this form of diagram (as produced by one of the tools built for AS5413 use) is below.
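As a rough illustration of the per-node view, the following Python sketch groups LSPs by egress interface and emits Graphviz DOT text for a single node. It assumes that the command output has already been parsed into (LSP name, egress interface) pairs - the parsing step, the node name, and the LSP and interface names are all hypothetical, and this is not the tool in use within AS5413.

```python
from collections import defaultdict


def group_by_egress(reservations):
    """Map each egress interface to the sorted list of LSPs leaving via it."""
    by_interface = defaultdict(list)
    for lsp_name, egress_interface in reservations:
        by_interface[egress_interface].append(lsp_name)
    return {intf: sorted(lsps) for intf, lsps in by_interface.items()}


def per_node_dot(node, reservations):
    """Return Graphviz DOT text: one edge from the node per egress interface."""
    lines = ["digraph per_node {", "  rankdir=LR;", f'  "{node}" [shape=box];']
    for intf, lsps in sorted(group_by_egress(reservations).items()):
        # "\\n" is a literal backslash-n, which DOT renders as a line break.
        label = "\\n".join(lsps)
        lines.append(f'  "{node}" -> "{intf}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical, already-parsed reservation state for one midpoint LSR.
EXAMPLE_RESERVATIONS = [
    ("pe1_to_pe3", "Te0/1"),
    ("pe2_to_pe3", "Te0/1"),
    ("pe1_to_pe4", "Te0/2"),
]

if __name__ == "__main__":
    print(per_node_dot("p1", EXAMPLE_RESERVATIONS))
```

The resulting text can be passed to the Graphviz dot tool to render an image; labelling each edge with the LSP names keeps the diagram readable only whilst the number of reservations is small.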
A significant limitation of producing static images such as the example above, is that it is difficult to visualise changes between egress interfaces. The node in the example is a midpoint for a small number of nodes running MPLS TE - during the time at which the example output was gathered, a small number of test PEs had been deployed with automatic bandwidth adjustment on each TE LSP, and hence the number of TE reservations is relatively small. In order to determine egress paths for each LSP on the node, and hence determine the cause of any imbalance across equal-cost paths, a diagram such as this can be useful. However, to properly view the changes in paths over time, a relatively large number of diagrams of this nature need to be examined. It is likely that a better result would be achieved by having some form of dynamically updating graphic representation.
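One possible building block for such a dynamic view is a comparison of successive snapshots of the per-node state. The sketch below is an assumption about how this might be done rather than an existing tool - it takes two hypothetical {LSP: egress interface} mappings gathered at different times, and reports only the LSPs whose egress interface changed.

```python
def egress_changes(previous, current):
    """Compare two {lsp: egress_interface} snapshots taken at different times.

    Returns a dict of lsp -> (old_interface, new_interface) for every LSP whose
    egress changed; None stands for "not present in that snapshot".
    """
    changes = {}
    for lsp in sorted(set(previous) | set(current)):
        old, new = previous.get(lsp), current.get(lsp)
        if old != new:
            changes[lsp] = (old, new)
    return changes


# Hypothetical snapshots from two successive polls of the same midpoint.
SNAPSHOT_T0 = {"pe1_to_pe3": "Te0/1", "pe2_to_pe3": "Te0/1", "pe1_to_pe4": "Te0/2"}
SNAPSHOT_T1 = {"pe1_to_pe3": "Te0/2", "pe2_to_pe3": "Te0/1", "pe5_to_pe3": "Te0/1"}

if __name__ == "__main__":
    for lsp, (old, new) in egress_changes(SNAPSHOT_T0, SNAPSHOT_T1).items():
        print(f"{lsp}: {old} -> {new}")
```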
The second type of diagram requires a topological view of the network - it is almost impossible to represent all TE paths for which a specific node is the head-end in a network on a single diagram whilst retaining a clear, easy-to-parse image. Producing per-LSP diagrams is possible, but again produces some difficulty in representing the changes of path for an LSP. Diagrams can quickly become convoluted if all possible paths are included. For example, the diagram below shows all possible NHOP and NNHOP LSRs for a specific node within the AS5413 network topology. Where there is a secondary device via which all nodes can be seen, the topology becomes relatively complex very quickly. The problem of finding a computational method to produce layouts that are clear for human interpretation is relatively complex, with Graphviz being the standard OSS tool utilised for this purpose.
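To give a feel for how quickly such a diagram grows, the sketch below derives the NHOP and NNHOP sets for a node from an adjacency list, and emits the corresponding links as Graphviz DOT. The topology and node names are hypothetical, the adjacency list is assumed to have been gathered elsewhere (for example from the IGP database), and the NNHOP set is taken here to be simply "the neighbours of each NHOP, other than the node itself".

```python
def nhop_and_nnhop(adjacency, node):
    """Return (NHOP set, NNHOP set) for `node`.

    NHOPs are the direct neighbours; NNHOPs are the neighbours of those
    neighbours, excluding `node` itself. A router may appear in both sets.
    """
    nhops = set(adjacency.get(node, ()))
    nnhops = set()
    for nhop in nhops:
        nnhops.update(n for n in adjacency.get(nhop, ()) if n != node)
    return nhops, nnhops


def neighbourhood_dot(adjacency, node):
    """Emit undirected Graphviz DOT for the links out to the NNHOPs of `node`."""
    nhops, _ = nhop_and_nnhop(adjacency, node)
    edges = set()
    for nhop in nhops:
        edges.add(frozenset((node, nhop)))
        for nnhop in adjacency.get(nhop, ()):
            if nnhop not in (node, nhop):
                edges.add(frozenset((nhop, nnhop)))
    lines = ["graph neighbourhood {"]
    for a, b in sorted(tuple(sorted(edge)) for edge in edges):
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical topology: two PEs joined by four P routers.
TOPOLOGY = {
    "pe1": ["p1", "p2"],
    "p1": ["pe1", "p2", "p3"],
    "p2": ["pe1", "p1", "p4"],
    "p3": ["p1", "pe2"],
    "p4": ["p2", "pe2"],
    "pe2": ["p3", "p4"],
}

if __name__ == "__main__":
    print(neighbourhood_dot(TOPOLOGY, "pe1"))
```

Even in this six-node example, a node with two neighbours yields five links in its two-hop neighbourhood, which hints at why layouts for real topologies become hard to read.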
Should there be a requirement to scale a diagram such as the example above to show all possible nodes via which a TE LSP may be routed, and overlay the actual path taken, then a large number of diagrams are likely to be required. In addition, the information presented to a human is unlikely to be easy to parse. Again, a requirement would appear to exist for a dynamic means of displaying information, and the ability to adjust an image to show specific nodes or paths of interest would be advantageous.
The second requirement in the list above is somewhat easier to solve, as it mirrors existing visualisations that are utilised for a straight IP/MPLS network. A weathermap-type topology diagram, where each link is marked according to the reservations made on it can easily display information relating to the size of reservations on specific interfaces. Additional information can be shown in such a topology diagram by adjusting the display of specific nodes to reflect the number of LSPs carried. Such a solution can be implemented with relatively minor changes to existing tools such as Network Weathermap. It should be noted that this is likely to require human interaction to develop a set of topologies with a network that are of interest to those using such a weathermap - for instance, certain NOC engineering teams may require specific topologies - or specific subsets of nodes. Displaying every PE within a relatively large network on a single diagram is unlikely to provide useful information to an engineering team.
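The data behind such a weathermap can be sketched fairly simply. The example below sums the bandwidth reserved by each LSP over every link on its path, compares the total to the reservable bandwidth of the link, and picks a colour band. The LSP records, link capacities and colour thresholds are all hypothetical values chosen for illustration, and no claim is made that Network Weathermap consumes data in this form.

```python
from collections import defaultdict


def reserved_per_link(lsps):
    """Sum reserved bandwidth (Mbps) and count LSPs on each directed link."""
    reserved = defaultdict(float)
    lsp_count = defaultdict(int)
    for lsp in lsps:
        hops = lsp["path"]
        for link in zip(hops, hops[1:]):
            reserved[link] += lsp["reserved_mbps"]
            lsp_count[link] += 1
    return dict(reserved), dict(lsp_count)


def colour_for(fraction):
    """Pick a weathermap-style colour band (thresholds are arbitrary)."""
    if fraction >= 0.9:
        return "red"
    if fraction >= 0.6:
        return "orange"
    return "green"


def link_status(lsps, reservable_mbps):
    """Return {link: (fraction reserved, colour, number of LSPs)}."""
    reserved, lsp_count = reserved_per_link(lsps)
    status = {}
    for link, capacity in reservable_mbps.items():
        fraction = reserved.get(link, 0.0) / capacity
        status[link] = (fraction, colour_for(fraction), lsp_count.get(link, 0))
    return status


# Hypothetical LSPs and per-link reservable bandwidth.
EXAMPLE_LSPS = [
    {"name": "lsp_a", "path": ["pe1", "p1", "pe2"], "reserved_mbps": 500},
    {"name": "lsp_b", "path": ["pe1", "p1", "pe2"], "reserved_mbps": 250},
    {"name": "lsp_c", "path": ["pe1", "p1", "p2"], "reserved_mbps": 200},
]
EXAMPLE_RESERVABLE = {("pe1", "p1"): 1000, ("p1", "pe2"): 1000, ("p1", "p2"): 1000}

if __name__ == "__main__":
    for link, (fraction, colour, count) in link_status(
        EXAMPLE_LSPS, EXAMPLE_RESERVABLE
    ).items():
        print(f"{link[0]} -> {link[1]}: {fraction:.0%} reserved, {colour}, {count} LSPs")
```

The per-link LSP count is returned alongside the reservation so that the same data could drive the node or link styling that reflects the number of LSPs carried.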
To monitor SLGs within a network (as per the third requirement above), typically, some form of graphing of an IP SLA probe, or some end-to-end ICMP graph (via a tool such as SmokePing) is used. It is likely that such implementations can continue to be utilised to support SLAs. However, the problem of displaying efficiency of path selection within a network is not as trivial to solve, and indeed is unlikely to ever be a requirement in a network where a packet is routed according to an SPF algorithm. This is not something that has been implemented within AS5413 currently, however, it would appear that some interesting approaches do exist - for example, Karol Kowalik and Martin Collier from Dublin City University present an eye-diagram approach to displaying the efficiency of path selection throughout a TE network. Within their paper, they note that this approach does not scale easily to TE networks above 30 nodes, however, since there appears to be no open reference implementation for this method, it is hard to evaluate its performance. This paper appears to be one of the few approaches to showing path efficiency through a TE network that has been discussed.
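A far cruder measure than the eye-diagram approach (and not one taken from that paper) is to compare the IGP cost of the path an LSP actually takes with the cost of the shortest path between the same two nodes. The sketch below computes this ratio, referred to here as "stretch", using Dijkstra's algorithm - the graph, metrics and node names are hypothetical, and a value of 1.0 simply means that the LSP follows a shortest path.

```python
import heapq


def shortest_cost(graph, source, target):
    """Dijkstra over {node: {neighbour: igp_metric}}; returns the lowest cost."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # target unreachable


def path_cost(graph, path):
    """Sum the IGP metrics along an explicit list of nodes."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def stretch(graph, lsp_path):
    """Ratio of the LSP path cost to the IGP shortest-path cost (1.0 = optimal)."""
    shortest = shortest_cost(graph, lsp_path[0], lsp_path[-1])
    if not shortest:
        raise ValueError("no shortest path with a non-zero cost to compare against")
    return path_cost(graph, lsp_path) / shortest


# Hypothetical topology with symmetric IGP metrics of 10 on every link.
GRAPH = {
    "pe1": {"p1": 10, "p2": 10},
    "p1": {"pe1": 10, "pe2": 10},
    "p2": {"pe1": 10, "p3": 10},
    "p3": {"p2": 10, "pe2": 10},
    "pe2": {"p1": 10, "p3": 10},
}

if __name__ == "__main__":
    print(stretch(GRAPH, ["pe1", "p2", "p3", "pe2"]))
```

Such a ratio says nothing about loss, latency or jitter directly, so it would complement rather than replace probe-based graphs.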
The post's intention, as discussed, is to present the small number of solutions, and their limitations that have currently been implemented at AS5413 - clearly, these tools are not production quality, or complete at the current time. If there is interest in providing a non-commercial solution (as an alternative to solutions like OPNET, and the Cisco TE tools) that provides information for small to medium SPs, where such an investment does not seem justified, then there may be a possibility to push some collaborative OSS visualisation tools. If you, or your organisation have any such interest, please let me know via the comments.
|
|
At one of the Ericsson R&D days, Professor Scott Shenker - who’s an academic at the University of California in Berkeley, presented on a concept that he calls the ‘software defined network’. Now, if you haven’t seen the presentation - it’s definitely worth watching (it's on YouTube, here), and provides quite an engaging look at the problem of network scaling from the perspective of academia, and especially in terms of a comparison to the more rigorous disciplines of computer science, like OS design.
Now, there are some interesting parallels with the ‘software defined network’ concept, and a couple of issues that I’ve been discussing, working on, or just had some interest in previously.
When considering a network whereby we have a decoupled control-plane, there are great parallels to the argument around centralised-management vs. distributed/dynamic management - insofar that the idea that a centralised control-plane has an overview of exactly what the network is doing is considered in other places. I blogged about this issue previously - albeit through the guise of considering how one provides useful operational tools for MPLS-TE networks. The question in this case is whether providing dynamic path placement, controlled by a distributed set of nodes (essentially with each head-end LSR being responsible for its own path placement, within the constraints set by the network) is better than utilising a centralised, off-line computation mechanism whereby path placement is computed for all network elements, and then rolled out to them. There are some distinct advantages to the latter approach in terms of being able to make a more holistic approach to the problem - and considering the interdependencies of LSPs - however, it results in a complex (and therefore often expensive) centralised element of the system. However, in this case, we are not decoupling the network to the extent that the SDN would want to - we are merely computing the ways that traffic should flow, the actual signalling, FIB programming, and protocol configuration is still provided on a per element basis. This therefore leaves us with a set of distributed systems, that have already had a complex additional layer deployed with them to solve the traffic placement problem - surely the worst of both worlds? 
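The interdependency between LSPs can be shown with a small sketch. The code below is a heavily simplified CSPF (a bandwidth constraint only, with no affinities, priorities or pre-emption) that places demands one at a time, much as independent head-ends would, reserving bandwidth as it goes. The topology, metrics and demand names are hypothetical.

```python
import heapq


def cspf(links, source, target, bandwidth):
    """Shortest path using only links with enough unreserved bandwidth.

    `links` maps (a, b) -> {"metric": int, "free": number}. Returns a node
    list, or None when no path satisfies the constraint.
    """
    queue = [(0, [source])]
    visited = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for (a, b), attrs in links.items():
            if a == node and b not in visited and attrs["free"] >= bandwidth:
                heapq.heappush(queue, (cost + attrs["metric"], path + [b]))
    return None


def place_in_order(links, demands):
    """Place each demand in turn, reserving bandwidth on the chosen path."""
    links = {key: dict(attrs) for key, attrs in links.items()}  # work on a copy
    placed = {}
    for name, source, target, bandwidth in demands:
        path = cspf(links, source, target, bandwidth)
        placed[name] = path
        if path:
            for hop in zip(path, path[1:]):
                links[hop]["free"] -= bandwidth
    return placed


# Hypothetical topology: a direct A-B link, plus a smaller detour via C.
LINKS = {
    ("A", "B"): {"metric": 10, "free": 100},
    ("A", "C"): {"metric": 10, "free": 50},
    ("C", "B"): {"metric": 10, "free": 50},
}
BIG = ("big", "A", "B", 80)
SMALL = ("small", "A", "B", 40)

if __name__ == "__main__":
    print(place_in_order(LINKS, [SMALL, BIG]))
    print(place_in_order(LINKS, [BIG, SMALL]))
```

With the small demand placed first, it takes the direct link and the large demand cannot be placed at all; with the large demand placed first, both fit. An off-line computation that sees every demand at once can choose the second ordering deliberately, which is the holistic advantage described above.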
The question that interests me, regarding the SDN, is whether pushing all path decision functionality to a central network control-plane results in simplification of the elements within the network (and through removing this complexity, adds some further robustness) or whether removing the means by which each node within the network is responsible for its own survivability results in a system whereby we introduce a SPoF, where all elements are affected by erroneous behaviour, rather than a subset.
Another issue that interests me in this area is around scalability - right now we have a tight coupling between where the control-plane functionality for a particular interconnection (be it a UNI or NNI) is deployed, and where the physical interconnection takes place. Sure, there are some interim layers that might exist (consider, for instance, the extension of a Layer 3 node by a Layer 2 - or even Layer 1 - domain to backhaul and/or aggregate connectivity) - however, essentially, even where we are able to do this, we have a single point of interconnection into our Layer 3 domain (be it IP/MPLS or pure IP). This interconnection point needs to maintain both connectivity back into the network - i.e. how do I get to each other exit point that I need to be aware of - and the functionality required to support the UNI. It therefore becomes a pinch-point quite quickly.
At the recent IETF armd working-group meeting in Québec City, this was something that was spoken about at some length, particularly focused on the scale concerns of network elements for the ‘Cloud’. In this case, let’s dispense with the poorly defined term ‘Cloud’, and define the problem as the interconnection to increasingly dense sets of hosts, which are requiring increasingly more Layer 4+ service nodes, which are increasingly becoming multi-tenant. The key point of the discussion was the problem that I described above - the existence of a single point of interconnection which must support any of the FHRP, address resolution, and PE-CE routing protocol functionality required for the interconnect, as well as meshing into the SP network topology. Whilst armd perhaps focuses more on the address resolution element of this, I (and a couple of other network operators - watch this space on this one) think that this is rather more generic. So, how does this tie into the SDN concept? The decoupling of the control-plane and the forwarding-plane provides us a new toolset to solve this problem. If we can ‘outsource’ the routing decision functionality from the physical interconnection point to another element (which may or may not be centralised for the entire SP network), as the SDN would want to do, this starts to give us some flexibility to independently scale the control-plane. This starts to get around the problem that we have a very dense interconnection point physically, since we can just stack up control-planes to be able to provide the functionality we require there. The SDN using this as a ‘centralised’ element manager-type solution is also interesting, since this provides some implication that we have some tolerance to latency between the network manager, and the nodes - which means that it may be possible to place the control-plane in a physically disparate location to the forwarding plane - an interesting new concept.
There’s another benefit of such a disconnected control-plane - even if we just consider some smaller concept (that might look a bit more achievable than the entire-network deployment that Prof. Shenker proposes). At the moment, we’re seeing great demands for FTTx and deployment of more intelligent network elements closer and closer to the edge - this is motivating work like Seamless MPLS. However, this means deploying many relatively complex (and therefore expensive!) systems - perhaps something that works where one can offset costs of one element by additional revenue made by another, but where this isn’t possible, the startup costs of such a deployment are high. However, if we consider being able to deploy forwarding-only elements that have an API towards a central control-plane - then we can do two things. Firstly, the edge element can be cheaper - it need only perform those functions needed right at the edge - FIB programming, OAM and QoS - without any control or management functionality for these. This is a concept we’re already seeing in the industry, so nothing new I think. Secondly, if we have N of these forwarding elements, we can look at the idea of combining this with a hypervisor-esque virtualisation - at this point, our CPU resources can be timeshared - giving us statmux for CPU time, as the drive towards virtualisation for hosts has done. An interesting concept for lower initial cost builds where large numbers of elements are needed.
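To make the split a little more concrete, here is a toy sketch of a forwarding-only element with an API towards a central control-plane. The class names, the program() call and the routes are all hypothetical - this is not a real SDN protocol or vendor API, just an illustration of where the decision making sits.

```python
import ipaddress


class ForwardingElement:
    """Edge element that only holds a FIB; it makes no routing decisions."""

    def __init__(self, name):
        self.name = name
        self.fib = {}

    def program(self, prefix, next_hop):
        """The (hypothetical) API the control-plane calls to install an entry."""
        self.fib[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Longest-prefix match against the programmed FIB."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None
        return self.fib[max(matches, key=lambda net: net.prefixlen)]


class Controller:
    """Central control-plane that holds the routing decisions and pushes them out."""

    def __init__(self):
        self.elements = {}

    def register(self, element):
        self.elements[element.name] = element

    def push_routes(self, routes):
        """`routes` maps element name -> {prefix: next_hop}."""
        for name, entries in routes.items():
            for prefix, next_hop in entries.items():
                self.elements[name].program(prefix, next_hop)


if __name__ == "__main__":
    controller = Controller()
    edge = ForwardingElement("edge1")
    controller.register(edge)
    controller.push_routes({"edge1": {"10.0.0.0/8": "core1", "10.1.0.0/16": "core2"}})
    print(edge.lookup("10.1.2.3"))
```

One controller instance serving N such elements is where the statmux-for-CPU argument comes from - the elements themselves carry no routing logic to run.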
This discussion rambled on a bit for some initial thoughts, but there are definitely some interesting points that Prof. Shenker raises - I need to have a think about the availability question some more - I’d especially like to do this with a view to looking at how centralised management works within MPLS-TE (and probably even more importantly, MPLS-TP) networks. As always, I’m interested to discuss my view of this - clearly this is something being presented out of academia into the standardisation/design/R&D arena, and hence perhaps doesn’t have a clear, public, operational model yet - so it’s interesting to consider how it might apply to ‘real world’ networks!