MBGP Instability

Preliminary Results from summer '98 MBGP stability study

...describe MBGP monitor...

DVMRP->MBGP Redistribution

A major contributor to the instability of the MBGP table is the use of the "redistribute dvmrp" command. Obviously, importing routes from a distance-vector protocol with so much known instability is going to import instability. However, even if the redistribute command is used with a route-map importing only a few (or even no) routes, it can cause instability for all routes in the DVMRP table, and perhaps even all routes in the MBGP table.

When the MBGP monitor is configured as an iMBGP peer, we see traces like the following. (This is an arbitrary route picked as an example; there are thousands of others.)
Time          Event   Route           AS-Path
16:01:44.538  report  209.249.0.0/17  none
16:01:59.244  report  209.249.0.0/17  10888,6461
16:03:00.371  report  209.249.0.0/17  none
16:03:23.680  report  209.249.0.0/17  10888,6461
16:04:20.464  report  209.249.0.0/17  none
16:04:35.518  report  209.249.0.0/17  10888,6461
16:05:36.452  report  209.249.0.0/17  none
16:05:51.252  report  209.249.0.0/17  10888,6461
16:06:54.125  report  209.249.0.0/17  none
16:07:09.034  report  209.249.0.0/17  10888,6461
16:08:14.550  report  209.249.0.0/17  none
16:08:29.342  report  209.249.0.0/17  10888,6461
16:09:36.567  report  209.249.0.0/17  none
16:09:51.493  report  209.249.0.0/17  10888,6461

The router was configured with "redistribute dvmrp route-map local-routes", where local-routes allows only one route from the DVMRP table. Even so, this completely unrelated DVMRP route was imported (the entries with no AS path) for about 15 seconds at a time, after which the route with the correct AS path returned for about a minute.
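The length of each wrongly-imported interval can be checked against the trace above. The following is a minimal sketch, assuming the trace is available as (time, AS-path) pairs; the name bogus_intervals and the data layout are illustrative only and are not part of the MBGP monitor.

```python
from datetime import datetime

# (time, AS-path) pairs copied from the trace above. "none" marks the
# entries with no AS path, i.e. the wrongly-imported DVMRP route.
TRACE = [
    ("16:01:44.538", "none"),
    ("16:01:59.244", "10888,6461"),
    ("16:03:00.371", "none"),
    ("16:03:23.680", "10888,6461"),
    ("16:04:20.464", "none"),
    ("16:04:35.518", "10888,6461"),
    ("16:05:36.452", "none"),
    ("16:05:51.252", "10888,6461"),
    ("16:06:54.125", "none"),
    ("16:07:09.034", "10888,6461"),
    ("16:08:14.550", "none"),
    ("16:08:29.342", "10888,6461"),
    ("16:09:36.567", "none"),
    ("16:09:51.493", "10888,6461"),
]


def bogus_intervals(trace):
    """Return the seconds spent in each "none" state before a real path returns."""
    durations = []
    started = None
    for stamp, as_path in trace:
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if as_path == "none":
            started = when
        elif started is not None:
            durations.append((when - started).total_seconds())
            started = None
    return durations


# Print one duration per wrongly-imported interval.
for seconds in bogus_intervals(TRACE):
    print(f"{seconds:.3f}")
```

On this trace the intervals fall between roughly 14.7 and 15.1 seconds, with one outlier of about 23.3 seconds.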

When the monitor is configured as an eMBGP peer, the eBGP timers help lessen the instability: normally, the route doesn't update before the wrongly-imported route gets removed from the table again. However, the monitor still sees multiple updates in which no information has changed. A likely explanation is that the route is marked as changed because of the changes that occurred "behind the scenes" before the timer allowed an update.

The wrongly-imported routes do not appear to have any route-map set operations applied to them; for example, they get the default local-pref (100) and weight (32768) and no community. This can cause trouble if you depend on set operations in the route-map for proper routing (which is generally the case).

These wrongly-imported routes are sometimes visible in the MBGP table on the router itself; "show ip mbgp | include 0.0.0.0.*32768" will sometimes show routes that you don't mean to be importing. Since the routes live in the table only briefly, whether or not you catch them is a matter of chance.

Ramifications of this problem

This problem makes it effectively infeasible to import DVMRP routes into MBGP. This severely affects the transition plan from the DVMRP-based MBone to a partial-DVMRP/partial-MBGP multicast infrastructure, and even affects the long-term expected state, in which there will still be DVMRP clouds at the leaves whose routes may need to be imported.

Other related results

It looks like there's some instability that's not due to this bug. In particular, lots of routes in AS 24 and AS 1800 tend to flip back and forth between two AS paths: either between 10888,24 and 10888,1800,24, or between 10888,1800 and 10888,24,1800. However, AS 24 is using DVMRP route redistribution, so it's not clear whether or not it is the same problem. It may be that this problem can apply to the entire MBGP table, whether or not the associated routes are also in the DVMRP table at the time.

Sample of related problem:
Time          Event     Route              AS-Path
18:56:38.195  reported  193.10.252.48/30   10888,24,1800,2603
18:56:38.196  reported  193.10.252.148/30  10888,24,1800,2603
18:56:38.197  reported  193.64.247.0/24    10888,24,1800,2603
18:56:38.197  reported  193.66.4.0/24      10888,24,1800,2603
18:56:38.198  reported  193.216.233.0/24   10888,24,1800,2603
18:56:46.987  withdraw  193.10.252.48/30
18:56:46.987  withdraw  193.10.252.148/30
18:56:46.988  withdraw  193.64.247.0/24
18:56:46.988  withdraw  193.66.4.0/24
18:56:46.988  withdraw  193.216.233.0/24

Picking a single route, we see:
Time          Event     Route             AS-Path             Community
18:55:47.035  reported  193.10.252.48/30  10888,24,1800,2603  10888:777
18:55:50.368  reported  193.10.252.48/30  10888,1800,2603     none
18:56:38.195  reported  193.10.252.48/30  10888,24,1800,2603  10888:777
18:56:46.987  withdraw  193.10.252.48/30

Note that the AS path flaps back and forth between one beginning 10888,24,1800 and one beginning 10888,1800. Note also that the route flaps back and forth between having community 10888:777 and not, meaning that whether or not it is being exported to DVMRP is also flapping.
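The flapping described above can be made explicit by comparing each report with the one before it. This is a minimal sketch, assuming the single-route trace is held as (time, event, AS-path, community) tuples; the name attribute_changes and the tuple layout are hypothetical and chosen only for illustration.

```python
# Events for 193.10.252.48/30, copied from the single-route trace above.
# Each tuple is (time, event, AS-path, community); a withdraw carries neither.
EVENTS = [
    ("18:55:47.035", "reported", "10888,24,1800,2603", "10888:777"),
    ("18:55:50.368", "reported", "10888,1800,2603", "none"),
    ("18:56:38.195", "reported", "10888,24,1800,2603", "10888:777"),
    ("18:56:46.987", "withdraw", None, None),
]


def attribute_changes(events):
    """Return (time, attribute, old, new) for each change between consecutive reports."""
    changes = []
    previous = None
    for stamp, event, as_path, community in events:
        if event != "reported":
            previous = None  # a withdraw resets the comparison
            continue
        current = {"as_path": as_path, "community": community}
        if previous is not None:
            for name in ("as_path", "community"):
                if previous[name] != current[name]:
                    changes.append((stamp, name, previous[name], current[name]))
        previous = current
    return changes


# Print one line per attribute that changed between consecutive reports.
for stamp, name, old, new in attribute_changes(EVENTS):
    print(f"{stamp} {name}: {old} -> {new}")
```

For this trace, both the AS path and the community change at each of the second and third reports, which matches the flapping noted above.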


Bill Fenner - <fenner@research.att.com>

This material is based upon work supported by the National Science Foundation under Grant No. 9729498. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).