ACM Internet Measurement Conference (IMC) 2010: Day 2
Preamble
The notes that follow are a mixture of what each speaker said, bullets listed on slides (verbatim, with minor US->UK deltas because I was touch-typing), and thoughts of my own. If you were at IMC and you spot errors (likely), do feel free to point them out, and I’ll fix them.
The amount of text written for each is proportional to certain variables: How tired I was during the talk, how interested I was in the talk, how much I knew about the subject material, and how good the speaker was. Note also that these are pretty raw. Questions are omitted if the question was not clear; many responses from the speaker have been shortened for brevity.
Session 6: Topology
Short paper: Primitives for Active Internet Topology Mapping: Toward High-Frequency Characterization
Speaker: Robert Beverly
- “What is the topology of the Internet?”
- Poorly understood topology (interface, router, or AS level)
- Develop/analyse new primitives for active topology discovery
- How similar are traceroutes to the same destination BGP prefix?
- Use Levenshtein “edit” distance DP algorithm
- Using {0, 1, 2^32}
- 60% of traces to destinations in same BGP prefix have ED <= 3
- Fewer than 50% of random traces have ED <= 10
- Variance due to last-hop AS? For 60% of probes to same prefix, we get no additional information beyond leaf AS
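The edit-distance comparison above can be sketched as the classic Levenshtein DP over hop sequences. This is a generic illustration with uniform unit costs; it does not reproduce the talk's {0, 1, 2^32} cost set, and the example traces are made up:

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic program over two hop sequences
    # (lists of router IPs). Insert, delete, and substitute each cost 1.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete from a
                          d[i][j - 1] + 1,        # insert from b
                          d[i - 1][j - 1] + cost) # match/substitute
    return d[m][n]

# Two hypothetical traces to the same prefix, differing in one hop.
trace1 = ["10.0.0.1", "192.0.2.1", "198.51.100.7", "203.0.113.9"]
trace2 = ["10.0.0.1", "192.0.2.1", "198.51.100.8", "203.0.113.9"]
print(edit_distance(trace1, trace2))  # 1
```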
- Develop three primitives:
- Subnet centric probing
- Vantage point spreading
- Interface set cover
- Goal: Adapt granularity, discover internal structure
- Leverage BGP as coarse structure
- Follow least-common prefix: iteratively pick destinations within a prefix that are maximally distant in the subnetting sense
- Address distance is misleading: eg, 18.foo.bar is nearer to 19.0.0.0 than 18.0.0.0
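The misleading-distance point can be made concrete with a prefix-based distance: the position of the highest differing bit between two addresses. This is a hypothetical formulation for illustration, not necessarily the paper's exact metric:

```python
import ipaddress

def prefix_distance(a, b):
    # "Subnetting" distance: the bit position of the highest differing
    # bit, i.e. 32 minus the length of the common bit prefix.
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return x.bit_length()

# Numerically, 18.255.255.255 sits right next to 19.0.0.0, but in the
# subnetting sense it shares only a /7 with 19.0.0.0 versus a /8 with
# 18.0.0.0.
print(prefix_distance("18.255.255.255", "19.0.0.0"))  # 25 (common /7)
print(prefix_distance("18.255.255.255", "18.0.0.0"))  # 24 (common /8)
```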
- Vantage point spreading:
- discover AS ingress points and paths to the AS via multiple vantage points.
- Using BGP knowledge, maximise the number of distinct VPs per prefix.
- This is complementary to SCP
- Interface set cover
- As shown in preceding analysis, full traces are very inefficient
- Generalises doubletree [drfc05] without parametrisation
- Developed primitives for faster, more efficient probing
- SCP, ISC, VPS
- Significant load savings without sacrificing fidelity
- Q: You map to a two-dimensional space. Is it more abstract? Multi-dimensional? Is 2D enough?
- A:
Short paper: Resolving IP Aliases with Prespecified Timestamps
Speaker: Justine Sherry
Long paper: On the Impact of Layer-2 on Node Degree Distribution
Speaker: Pascal Mérindol
- ally, iffinder
- Goals:
- IP networks models and simulations
- Ground truth input for topology generation
- Topology discovery using mrinfo
- Uses IGMP messages: ASK_NEIGHBORS; NEIGHBORS_REPLY
- Output: All multicast interfaces of a given router; All multicast neighbours/links
- mrinfo applied recursively: Probe all neighbours; run daily
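The recursive application of mrinfo amounts to a breadth-first sweep of the multicast topology. A minimal sketch, where `probe(router)` stands in for one IGMP ASK_NEIGHBORS query and the toy topology replaces real replies:

```python
from collections import deque

def mrinfo_recursive(seed, probe):
    # BFS over routers: probe each discovered router once, record links,
    # and enqueue any neighbour not seen before.
    seen, queue, links = {seed}, deque([seed]), []
    while queue:
        r = queue.popleft()
        for n in probe(r):
            links.append((r, n))
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen, links

# Toy topology in place of real IGMP NEIGHBORS_REPLY responses.
topo = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
routers, links = mrinfo_recursive("A", lambda r: topo.get(r, []))
print(sorted(routers))  # ['A', 'B', 'C']
```

One probe is injected per router, which is the "network friendly" property mentioned below.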
- Limitations:
- Multicast scope
- IGMP filtering
- Non-compliant routers
- Advantages
- Network friendly probing: one probe injected per router
- Aliasing: no need to gather IP interfaces
- Forwarding independent; Backup links visible [IMC2009]
- Layer-two vision: Distinguish the IP logical layer over MAC
- The node degree distribution follows a heavy-tailed power law? What is the impact of L2 on scale-free and preferential attachment models?
- Using our dataset, we can infer several kinds of L2 device
- Focussing on Ethernet broadcast networks, we check our inference validity using three rules
- Symmetry, querier election, and subnet mask
- Three states: Coherent, incoherent, incomplete
- Most of the inferred L2 devices seem to be Ethernet broadcast devices such as switches
- Revisiting the node degree distribution
- L2-inferred graph (L2-aware)
- L3 graph (L2-agnostic)
- The number of edges/connections in the L3 graph is much larger than in the L2 graph
- Between 40% and 75% of routers are connected to an L2 device
- Most of the L3 connections rely on L2 Ethernet switches; even in the L2 graph, almost half of router-to-router connections go through an L2 node
- Why such a great tail shift? Must look at specific distributions in the L2-aware graph
- r: Router degree distrib
- s: Switch deg distrib
- b: L2 degree distrib of routers
- (missed one)
- A large degree router has more L2 neighbours
- Conclusion:
- mrinfo-rec is a useful tool for three reasons
- Describe a connected multicast topology at the router level
- can discover backup links (no forwarding dependence)
- able to natively infer L2 devices
- Q: What kind of applications perhaps beyond topology modelling rely on degree distributions, and how does the presence of switches change those models?
- A: If you consider routing, then if you look only at IP level, you might believe the topology to be much more resilient than the L2 connectivity allows
- Q: In the networks you measured, were you able to get any ground truth data to validate the results?
- A: No
- Q: All the major equipment vendors have standard config recommendations for customers, including how devices are interconnected for given configs. You should look to see whether your results are consistent with these recommendations.
- Q: Why is the blue (L3) heavier tail than the red (L2)
- A: This is what I’ve tried to figure out. In L3 you receive more logical edges than in practice (L2)
Short paper: Eyeball ASes: From Geography to Connectivity
Speaker: Reza Rejaie (?)
Short paper: Towards an AS-to-Organization Map
Xue Cai
- AS 6432, 22577, 36561, 15169, 10493: all are Google
- Point: To understand corporate reach, you need to understand organisations in the AS map
- AS to org relationships are important to understand business disputes in the internet
- Aim: Automate methods mapping ASes to orgs using WHOIS data
- Org id
- telephone number
- email domain
- Our false positive rate is okay, but false negative rate is pretty bad
- Link: isi.edu/ant
- Q: Tried to map multiple ASes to the same org. Did you look at multiple companies sharing the same AS?
- A: Yeah, this is a problem for our clustering. See a lot of joint ventures. Our methodology just now doesn’t solve this problem.
Session 7: Methodology II
Long paper: Comparing and Improving Current Packet Capturing Solutions based on Commodity Hardware
Lothar Braun
Short paper: High Speed Network Traffic Analysis with Commodity Multi-core Systems
Speaker: Francesco Fusco
- Cores are increasing, memory bandwidth per core is decreasing
- Multi queue NICs: Multiple RX/TX queues
- Current packet capture software does not exploit the increased parallelism
- Sequential polling from multiple queues
- Cache thrashing due to competition on the same socket; IRQ handling: packets are fetched from one core and processed by another one
- Suboptimal memory utilisation
- New packet descriptor (sk_buff) allocated for each incoming packet
- Unnecessary packet copy from the kernel to the user space.
- PF_RING + Threaded NAPI (TNAPI): A multi-queue aware packet processing framework
- Virtual capture devices are efficient data channels
- No sk_buff allocation (OS bypass)
- Memory mapping from kernel to user space
- 1:1 mapping between queues and CVDs
- Lock-free: 1 polling thread and 1 capture thread per queue
- CPU and IRQ affinity settings
- CPU affinity binds threads to cores: pthread_setaffinity_np offered by libpthread in linux
- IRQ affinity binds IRQs to cores
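The CPU-affinity idea (the talk names pthread_setaffinity_np from libpthread) can be shown at the process level with Python's Linux-only os.sched_setaffinity, which is analogous but not the same call:

```python
import os

# Linux-only sketch: pin the current process to a single core, then
# restore the original mask. The same pinning idea underlies binding
# one capture thread per RX queue.
allowed = os.sched_getaffinity(0)   # cores we are allowed to run on
target = min(allowed)               # pick one core (e.g. the lowest)
os.sched_setaffinity(0, {target})   # bind to that core only
print(os.sched_getaffinity(0))      # now a single-element set
os.sched_setaffinity(0, allowed)    # restore the original mask
```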
- Evaluation involved capture from four 1Gbit NICs. If the traffic is balanced, then we can capture 1Gbps per core.
- Conclusions:
- Modern commodity hardware offers opportunities for capturing from highly loaded multi-gigabit links
- NICs are offering new features to be exploited
- Dynamically configurable hardware for packet balancing
- Hardware packet filtering
- IEEE 1588 synchronisation and high precision time stamping
- Q: You do not discuss the core and the hard disk. Bottleneck is surely IO to disk?
- A: Fortunately we don’t have to capture the traffic to disk. But if you want to do stream analysis, you don’t want to store everything.
- Q: Would you compare your techniques with RouteBricks and PacketShader?
- A: They are really interesting papers and exploiting similar things. We don’t have any numbers to compare.
Long paper: Network Tomography on Correlated Links
Speaker: Denisa Ghita
Short paper: Scamper: a Scalable and Extensible Packet Prober for Active Measurement of the Internet
Speaker: Matthew Luckie
- Stand-alone packet prober designed for large-scale active measurement of the internet
- Implements
- TCP, UDP, ICMP, IPv4, IPv6
- traceroute: classic, paris, doubletree, mda, pmtud
- alias resolution: mercator, ally, radargun, prefixscan, bump
- TCP behaviour inference tool (TBIT): ecn, pmtud, others in dev
- ping
- sting
- Q: What’s your support for timestamping?
- A: Supports all timestamp options provided by operating systems
Session 8: Edge Networks
Long paper: Netalyzr: Illuminating The Edge Network
Speaker: Christian Kreibich
- “illuminating the edge network”
- Debugging tools are for experts, or limited
- Goal: A network connectivity test suite
- …that anyone can use
- …that is comprehensive
- …and enables longitudinal study
- Netalyzer … like breathalyser. Is our ISP sober enough to drive our traffic?
- Front-end at ICSI, back-end at EC2
- Results derived from 130,000 sessions over ~1 year
- Geek bias in dataset
- 11% of users are Comcast customers
- 12% of users employ OpenDNS
- 60% of users browse with Firefox
- DNS problems: 15% of sessions have an effective DNS MTU of 1472B (including 11% that explicitly advertise > 1472B)
- DNS wildcarding is very common
- 29% of sessions, 22% excluding OpenDNS and Comcast
- 43% of affected sessions see it on non-www names
- HTTP behaviour
- 8% of sessions are proxied
- Only 5% of sessions see caching: 42% of proxies do not cache if they could; 35% of caches store strongly uncacheable entities
- Anti-virus is noticeable
- 10% of sessions could not download “virus”
- <1% for .mp3, .exe, .torrent
- Measuring buffer sizes
- Measure end-to-end latency
- Send 10s flood of 1KB UDP datagrams
- Measure increase in latency: Infer buffer size
- Example: 128KB/s * 2s additional delay == 256KB buffer size
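The slide's arithmetic: once the uplink is saturated, the standing queue holds roughly (send rate × additional delay) bytes:

```python
# Buffer-size inference from the Netalyzr example: saturate the link
# with a UDP flood, then buffer ~= sustained rate * extra queueing delay.
rate_bytes_per_s = 128 * 1024      # 128 KB/s uplink
extra_delay_s = 2.0                # observed latency increase under load
buffer_bytes = rate_bytes_per_s * extra_delay_s
print(buffer_bytes / 1024)         # 256.0 (KB), matching the slide
```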
- Overbuffering is rampant
- Q: Overbuffering. What can we do about it? Where is it being done?
- A: Assumption: close to home, probably gateway device.
- Q: How small can the buffer be made?
- A: “Outwith our control”. Didn’t really answer the question.
- Q: DNS MTU. What is causing the bad behaviour?
- A: Can’t positively answer that. Often in the ISP’s resolver.
Short paper: An Experimental Study of Home Gateway Characteristics
Speaker: Seppo Hätönen
Long paper: Network Traffic Characteristics of Data Centers in the Wild
Speaker: Theophilus Benson
- “A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm.”
- Better understanding -> better techniques
- Better traffic engineering techniques
- Better QoS techniques
- Better energy saving techniques
- Examined 10 data centres: 3 classes: universities (edu)/private enterprise (prv)/clouds (cld)
- Collected:
- SNMP
- Packet traces
- Topology (in edu/prv classes)
- Data centre traffic is bursty
- Packet size distribution is bimodal: 200B / 1400B
- Intra-rack versus extra-rack results
- Clouds: Most traffic stays within a rack (75%): colocation of apps and dependent components
- Other DCs: > 50% leave the rack; unoptimised placement
- Q: You spoke about models of arrival processes. Did you look at other aspects of the arrival process and the model?
- A: No, not yet. This is an initial attempt.
- Q: How did you calculate demand on a link?
- A: SNMP
Short paper: A First Look at Traffic on Smartphones
Ratul Mahajan
- Exponential traffic growth. 10 times faster than fixed-line traffic. But we don’t know much about this traffic.
- Talk: Preliminary findings from capturing on device
- Browsing, email, media, and maps dominate traffic
- Small connection sizes lead to high overhead
- Throughput is bottlenecked by path loss and socket buffers at servers
- Tuning 3G radio timeout can significantly reduce power use with minimal performance impact
- 60% of traffic is browsing
- 10% media
- 10% messaging/email
- 8.5% maps
- Lower-layer protocols have high overhead
- RTTs are high
- Loss rates are high
- Throughput is low
- Packet loss is the culprit here. Reduce loss, get higher throughput.
- Possible also that servers (buffers) not tuned for high RTTs in this environment
- Q: Surprised sender window has such an impact.
- A:
- Q: Would people use things differently if performance were not limited?
- A: No (?)
- Q: What about wifi?
- A: Some people use wifi all the time. Some use 3G all the time.
Session 9: Wireless and Mobility
Short paper: Listen to Me If You Can: Tracking User Experience of Mobile Network on Social Media
Speaker: ??
Short paper: The Effect of Packet Loss on Redundancy Elimination in Cellular Wireless Networks
Speaker: Katherine Guo
Short paper: Performance Comparison of 3G and Metro-Scale WiFi for Vehicular Network Access
Speaker: Pralhad Deshpande
- In-depth evaluation of vehicular wifi access using a metro-scale wifi network
- head-to-head comparison of 3G and wifi exposing […]
- 40% of the time, wifi has no throughput on the long-drive test
- 45% of the time wifi is better
- 55% of the time 3G is better
- Wifi throughput is highly dependent on speed
- 3G much less so
- wifi median throughput can be up to 4x 3G median throughput, but this is drive specific
- tri-modal nature of wifi throughputs
- when available, wifi out-performs 3G
Session 10: Classification
Short paper: An Empirical Study of Orphan DNS Servers in the Internet
Speaker: Amogh Dhamdhere (non-author)
- An orphan DNS server is a DNS server that exists even though the domain it is contained in does not exist.
- orphan.com
- DNS entries added to .com TLD
- orphan.com NS ns1.orphan.com
- ns1.orphan.com A 1.2.3.4
- If orphan.com is deleted, what happens?
- its NS record is deleted
- the A record is often retained, and is accessible from .com
- Phishing and fast flux domains exploit orphans
- Detecting orphans Method 1:
- Look at all A records
- Find NS records corresponding to all A records
- If NS does not exist, DNS server in A record is an orphan
- NS records pointing to orphans contain orphan users. E.g., foo.com NS ns1.orphan.com : foo.com is an orphan user
- This method finds all orphans and their users in our zone files
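Method 1 reduces to a set computation over the zone's records. A minimal sketch with made-up record lists (the real analysis runs over full .com zone files):

```python
def find_orphans(a_records, ns_records):
    # An A record for a name server whose parent domain has no NS
    # record in the zone marks an orphan DNS server.
    domains_with_ns = {owner for owner, _ in ns_records}
    orphans = set()
    for name in a_records:
        parent = name.split(".", 1)[1]  # ns1.orphan.com -> orphan.com
        if parent not in domains_with_ns:
            orphans.add(name)
    return orphans

# orphan.com was deleted: its NS record is gone but the glue A record
# for ns1.orphan.com survives, so ns1.orphan.com is an orphan.
zone_a = ["ns1.orphan.com", "ns1.alive.com"]
zone_ns = [("alive.com", "ns1.alive.com")]
print(find_orphans(zone_a, zone_ns))  # {'ns1.orphan.com'}
```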
- Method 2:
- bar.com NS ns1.bar.us
- (??)
- Some of these may be orphans.
- Method 3:
- Look at feeds of malicious websites for two weeks
- Feeds contain malicious host names
- We perform NS lookups on their hostnames
- Average of 15,962 orphans each day
- Median lifetime of orphan: 8-9 days
- Most orphans are not used as DNS servers!
- 1.1% of domains using orphans are on phishing lists
- 1.3% on malware lists
- orphan host names themselves are rarely blacklisted
- Some orphans do host scam sites and send spam
- many malicious domains use them as DNS servers
- Overwhelmingly not used for malicious purposes; removing them may do more harm than good
Short paper: FlowRoute: Inferring Forwarding Table Updates Using Passive Flow-level Measurements
Speaker: Amogh Dhamdhere
- Routing protocol performance during routing events can affect end-to-end performance
- Transient loops and packet losses may occur during routing reconvergence
- Flowroute: A data-plane monitoring tool to work in conjunction with control plane monitors
- Infers forwarding table updates using flow-level measurements
Short paper: Digging into HTTPS: Flow-Based Classification of Webmail Traffic
Speaker: Dominik Schatzmann
- Classify HTTPS webmail traffic relying on flow-level traffic
- Distance toward closest SMTP, IMAP, or POP service
- Same subnet? Webmail > 90%; Non mail ~50%
- How long are services used?
- Shorter than 25s? Webmail ~= 20%; non mail ~= 90%
- Reading and answering emails takes some time
- Access patterns? webmail == periodic; non-mail == random
- Timing fingerprint based on coarse-grained flow data
- Result: Precision 79.2%
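The three flow-level features above can be combined as a simple vote; the thresholds and field names here are illustrative stand-ins, not the paper's classifier:

```python
def looks_like_webmail(flow):
    # Toy majority vote over the three features from the talk:
    # proximity to an SMTP/IMAP/POP service, session duration,
    # and periodic access pattern. Thresholds are hypothetical.
    votes = 0
    votes += flow["same_subnet_as_mail_server"]  # near a mail service
    votes += flow["duration_s"] >= 25            # longer than 25 s
    votes += flow["periodic_access"]             # periodic polling
    return votes >= 2

flow = {"same_subnet_as_mail_server": True,
        "duration_s": 180,
        "periodic_access": True}
print(looks_like_webmail(flow))  # True
```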
- Q: How short does the timer need to be to see this periodicity?
- A: We use the arrival time of new flows; the flow rate is not important. Using a timer of 1s.