On Deriving AS Links from BGP Snapshots

BGP snapshots dating back to November 1997 are available for public download, and are intended for use by the networking research community to analyse the nature of the prefix space advertised using BGP, and the paths through which these advertisements have travelled.

Much work has been done using this data, and links derived from the BGP data are often used in research. There are, however, subtle complexities in the process of deriving a set of links which I’ll discuss here.

BGP Data

First, the format of the BGP data may differ, depending on what sort of routing hardware/software you’re accessing. The two formats I’m familiar with, which are stored by different Route Views routers, are Cisco’s and MRT. The latter is considerably easier to process easily in bulk.

The Cisco output is human-readable, with one line representing a prefix and a path, with variable numbers of characters possible in some fields, and with columns padded by spaces to produce a readable table. Unfortunately, in some revisions of the software it appears possible for columns to run together where the data is too long for its own field, or for one line to be split across two. To overcome this I built a short AWK script, which I’ll link to in future, which parses out all the prefixes and paths in a given BGP snapshot in this format in a, hopefully, easier to understand format.

BGP Data to Unique Paths

From these BGP snapshots, it is easy to derive a set of unique paths. With a large list of prefixes and a path to origin for each, we need only strip off the prefix field and derive a unique set of paths from the remaining fields. Depending on how you’ve handled your BGP data, this could be as easy as:

_cat $bgpdata cut -d “ “ -f 2- scala com.sdstrowes.util.Uniq > $link_

At this point, I leave all “details” in place. By that, I mean Autonomous System (AS) Sets, or 32-bit AS numbers in asdot/asdot+ notation. Each of these adds a little complexity to the mix in the next step.

Deriving Links from Paths

Of the “details” already mentioned, AS Sets are the trickiest. AS Sets are identifiable points in the BGP routing system where multiple ASes have been advertised with an associated aggregate prefix, subsets of which will apply to each of those ASes. Such aggregation is rare within BGP though. A network participating in BGP is likely to want to peer with, or buy connectivity from, multiple networks. Since BGP operates on a ‘longest-prefix then shortest-path’ principle, if one upstream network aggregates the more-specific prefix, then BGP will always choose one of the other paths.

The sequence of steps my scripts follow is as follows:

  1. Strip off the braces surrounding AS Sets, using sed to remove any of these brace characters: “{}()”. In doing so, AS Sets which contain only one AS are reduced to an ordinary AS in an AS path. I interpret AS Sets with one member as possibly a router misconfiguration, or a set still configured where other members have since stopped advertising, or perhaps where those other networks have been filtered elsewhere in the network. I prefer to keep these as they’re likely to indicate a direct customer/provider link between the aggregating AS and the AS in the set.

  2. Identify the remaining AS Sets as those with a comma-separated list of AS numbers. Remove the entities in each path which contain “,”.

  3. Some AS numbers in the remaining data may be in asdot or asdot+ format. I prefer that all my pairs are listed as plain integers (“asplain” format). This is a really easy transformation:

    def deriveASNum(in: String) = { val asN = in.split(‘.’) if (asN.size > 1) asN(0).toInt * 65536 + asN(1).toInt else asN(0).toInt }

This set of simple transformations leaves a “clean” set of paths, with AS Sets containing more than one member having been removed. (The rationale for removing sets with more than one member is that it’s difficult to say anything of value about the links between these members.) With these clean paths, it’s trivial to walk along the paths to match up pairs of AS numbers, before outputting a final list of pairs representing AS-level connectivity.

Additional techniques, such as using an N-day (say, 5) window to “smooth” links or AS numbers accidentally missing at the time of the snapshot, or accumulating links over time, may help increase the accuracy of the set of links observed. The steps above are the steps I take to derive links for one BGP snapshot only.

What I also deliberately don’t attempt to do here is annotate the links with link types (that is, determine which links are provider-customer, which links are peering links, and which links indicate sibling AS numbers). This is a difficult task which has attracted much research. The output of this research has shown varying degrees of accuracy. I’m just not convinced we can achieve total accuracy without more networks opening up on their network policies, and offering high-level information on their connectivity, and offering their (BGP) views of the network.

Footnote

Posted by Stephen Strowes on Monday, May 3rd, 2010. You can follow me on twitter.

Recent Posts

(full archive)

All content, including images, © Stephen D. Strowes, 2000–2016. Hosted by Digital Ocean.