IMC 2014 Notes
These are my scribbled notes from IMC 2014. They’re very incomplete and probably inaccurate in places, but they’re what I caught from papers interesting to me while I was in the conference hall. I don’t note down questions that contribute little, but I try to note questions/answers that add something beyond what was already covered in the talk. If I’ve misrepresented your work, please get in touch and I’ll fix things up!
Session 1: Interdomain Routing and Traffic
Inter-Domain Traffic Estimation for the Outsider
- aim to shift focus from connectivity to traffic
- traffic is all that matters for network engineering, anomaly detection, economics
- but few traffic datasets are publicly available
- analogy: popularity of paths from multiple connectivity measurements implies traffic volume
- urban planning: some streets are more central than others; predict path traffic based on road structure
- large traceroute datasets -> AS level connectivity -> apply structural analysis (rough sketch below)
- ground-truth checks against real traffic: one global tier-1, and one large IXP
- “new and adapted metrics from Space Syntax”
- ranking of AS links by traffic
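As a rough illustration of that structural-analysis step (my sketch, not the paper’s adapted Space Syntax metrics): rank AS links by edge betweenness centrality, so links on many shortest paths are predicted to carry more traffic. The graph below is invented.

```python
# Invented AS-level graph; edge betweenness centrality stands in for the
# paper's adapted Space Syntax metrics: links on many shortest paths are
# predicted to carry more traffic.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("AS1", "AS2"), ("AS2", "AS3"), ("AS2", "AS4"),
                  ("AS3", "AS5"), ("AS4", "AS5"), ("AS1", "AS6")])

ranking = sorted(nx.edge_betweenness_centrality(g).items(),
                 key=lambda kv: kv[1], reverse=True)
for link, score in ranking:
    print(link, round(score, 3))
```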
Q&A:
- Q: traceroutes are collected at different dates/times; how does that affect traffic estimation? A: effectively used a two-month sample, albeit two years apart for comparison
- comment: you could go back in time and correlate traffic dynamics with particular events
Challenges in Inferring Internet Interdomain Congestion
- explores the challenges in developing a system to characterise the extent of interdomain congestion
- prompted by public noise around peering disputes
- method: TSLP, Time Sequence Latency Probes; build a time series of latency probes (see the sketch after this list)
- want to avoid incorrectly inferring that a link is congested or uncongested given current interest
- happen to have a good view of Level3 (Dallas) indicating congestion on AT&T and Verizon
- challenge: AQM (active queue management) and WFQ (weighted fair queueing)
- challenge: inferring interdomain links
- challenge: asymmetric reverse paths; record-route options generally not supported
- congestion trends indicate Cogent and Level3 congestion through 2013–March 2014, after which congestion dropped to zero
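A minimal sketch of the TSLP idea as I understood it (not the authors’ tooling; thresholds and samples are invented): track per-interval minimum RTTs to the near and far ends of a suspected interdomain link, and flag sustained far-minus-near inflation as a standing queue.

```python
# Sketch, not the authors' code: decide whether a link shows sustained
# queueing from a series of (near_min_rtt_ms, far_min_rtt_ms) samples,
# one pair per measurement interval. Thresholds are invented.
def congested(rtt_samples, baseline_ms=3.0, inflation_ms=20.0, fraction=0.8):
    diffs = [far - near for near, far in rtt_samples]
    elevated = sum(1 for d in diffs if d > baseline_ms + inflation_ms)
    # Call the link congested if most intervals show the far side inflated.
    return elevated / len(diffs) >= fraction

# Toy series: eight intervals with ~50ms of far-side inflation, two without.
samples = [(5.0, 55.0)] * 8 + [(5.0, 8.0)] * 2
print(congested(samples))   # True
```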
Q&A:
- Assertion: 64MB queue necessary to satisfy the ~50ms inflation on a 10Gbit link (rough calc: correct)
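- sanity check of that rough calc: 10 Gbit/s × 50 ms = 0.5 Gbit ≈ 62.5 MB, so a ~64 MB buffer does indeed correspond to ~50 ms of queueing delay at 10 Gbit/s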
Inferring Complex AS Relationships
- Looking at more complex AS relationships than the standard simple model of p2c, p2p, s2s
- new types:
- partial transit
- hybrid (dual transit/peering)
- both can be inferred with a high level of confidence
- partial transit is p2c with restricted scope, implying hierarchy of providers
- hybrid implies ASes that establish different relationship types at different points of presence
- inference requires evaluating prefix export policies, based on how providers propagate prefixes (toy classification below)
- limitations: topology incompleteness (we can only model what we see); city-level geoloc (hybrid links within a city region may be hidden); difficult to neatly categorise more complex relationships
- model indicates 3.3% of links inferred to be partial transit, and 1.2% inferred to be hybrid relationships
- some data on the size of customer cones/traffic levels for hybrids
- hybrid relationships can be unintentional
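A toy version of that inference, under my reading of the talk (not the authors’ algorithm): label a link by which neighbour classes one side re-exports the other’s prefixes to.

```python
# Hedged sketch: partial transit as "p2c with restricted export scope".
def classify(export_classes):
    """export_classes: subset of {"providers", "peers", "customers"} that
    receive this neighbour's prefixes."""
    if export_classes == {"providers", "peers", "customers"}:
        return "p2c (conventional transit)"   # customer routes go everywhere
    if export_classes == {"customers"}:
        return "p2p (peering)"                # peer routes reach customers only
    if "customers" in export_classes:
        return "partial transit"              # restricted scope, e.g. no providers
    return "unclassified"

print(classify({"peers", "customers"}))       # partial transit
```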
Q&A:
- European ASes are over-represented in the results, partially explained by the different ecosystem
Peering at Peerings: On the Role of IXP Route Servers
- questions: what are IXP route servers, how do they work, what peering opportunities do they offer
- more peering leads to greater benefit for each member, but peerings require effort and coordination, and strain low-end routers
- IXPs offer route servers as a solution; an ISP establishes a single session with the route server, making peering easy (toy model below)
- route server filters prefixes on import (to avoid hijacking) and applies per-peer export filters
- route server prefix distribution is bimodal: lots of prefixes advertised to all members using the RS, and lots that are advertised to very few
- in terms of traffic and coverage, the data indicates the majority of ASes use the RS but still have bilateral peering agreements with other networks to exchange data
- using the RS can mask who is at fault when something fails (the RS or the other peer)
- thus, bilateral peering is used for traffic-intensive peering arrangements
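A toy model of the mechanics above (my sketch, not any IXP’s implementation): one multilateral session per member, with per-peer export filters applied when the RS re-advertises.

```python
# Toy route server (my sketch, not an IXP implementation): members announce
# prefixes over a single multilateral session; the RS applies per-peer
# export filters when re-advertising.
class RouteServer:
    def __init__(self):
        self.routes = {}     # prefix -> announcing member
        self.exports = {}    # member -> set of members it exports to

    def announce(self, member, prefix, export_to):
        self.routes[prefix] = member
        self.exports[member] = export_to

    def rib_for(self, member):
        # A member sees only prefixes whose announcer exports to it.
        return {p for p, m in self.routes.items()
                if m != member and member in self.exports[m]}

rs = RouteServer()
rs.announce("AS1", "10.0.0.0/24", export_to={"AS2", "AS3"})
rs.announce("AS2", "10.0.1.0/24", export_to={"AS1"})
print(rs.rib_for("AS3"))   # {'10.0.0.0/24'}: AS2 doesn't export to AS3
```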
Q&A:
- Q: are route servers built to be highly available? do peers have fallbacks? A: the IXPs they spoke to have 100% uptime, so … we don’t know.
- Q: What do RSes mean for net neutrality? A: the RS will expose peering policies the peers apply to the RS, but not if they have bilateral agreement; bilateral agreements more likely if you want to violate net neutrality
- Q: are IXPs in other regions trying to catch up with Europe?
Session 2: Understanding (Mobile) Broadband Networks
Measuring the Reliability of Mobile Broadband Networks
- measure the experienced reliability on network, data, and application layers; think about the user
- how do you define reliable?
- ability to register on the network and establish a session, and how long it is available
- data: ability to send/recv packets
- useful connection; for example, can it carry VoIP
- performance: throughput/goodput, to a reasonable bitrate
- reliability through multihoming
- operators often share radio access networks, permitting greater visibility into where failures happen
- interesting: 25% of connections are down for more than 10 minutes per day; the RAN (radio access network) is the dominant factor in downtime; high correlation between downtime and SNR
- “higher than expected” loss rates; loss runs indicate most (~60%) runs are 1 packet in length, with a spike around 5–6 packets, where the session is erroneously marked as “idle” despite sending packets
- downloads fail most often because of inability to establish a TCP connection
- multihoming can give 99.999% availability
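- (intuition for that figure: with independent failures, combined unavailability is the product of the individual ones, so two networks each ~99.7% available give roughly 1 − 0.003² ≈ 99.999%)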
Q&A
- are your results affected by the volume of traffic you’re sending? A: we’re aware operators may apply different policies for high volume users, but we maintained a low bitrate. That’s why we were demoted to “idle”, because we weren’t transferring a high enough bitrate
- have you looked at latency? A: we’re looking at it, as ongoing work
Behind the Curtain - Cellular DNS and Content Replica Selection
- DNS for CDN selection is usually the same as static networks, but
- client IPs are dynamically assigned, have no geographic anchor, and anycast routing is unstable
- cellular networks have fewer egress points than traditional ISPs, though increasing in tandem with the deployment of 4G and its low latencies
- measured 6 cell networks (4 in US, 2 in South Korea); app: namehelp mobile; 350 devices, 280K experiments; samples every 30 minutes for five months
- assertion: cellular DNS is a poor location signal
- cellular DNS is highly dynamic, leading to CDNs returning different sets of replicas on a regular basis
- anycast routed public DNS resolvers also suffer from unstable mappings
Q&A
- when did you collect the data? A: march 2014 through october 2014
- does EDNS(0) modify the result? A: probably not given how dynamic client IPs are, but this is ongoing work
When the Internet Sleeps: Correlating Diurnal Networks With External Factors
- we know traffic is diurnal (seen locally everywhere)
- what about IPv4 address usage, and can we see the global view?
- direct observation: count active addresses over time; find diurnal patterns; draw correlations on location, link type
- why study this?
- sleep reflects policy
- sleep correlates with things such as GDP
- sleep affects outage detection; must not confuse “sleep” with “down”
- … how big is the internet?
- contributions: new methods for analysis, and the application of those methods
- correlating diurnal patterns with many factors: ANOVA (analysis of variance; illustrated below)
- factors: GDP (strong correlation), electricity consumption (weak correlation), number of internet users per host, time of first block allocation, mean age of allocation (weak correlation; stricter policies on newer allocations may be enforcing recycling)
- link type: inferred from DNS (unexpected correlation: seemingly DSL lines correlate with diurnal patterns)
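To illustrate the ANOVA step (using SciPy’s one-way ANOVA, not necessarily the authors’ exact pipeline; the diurnal-amplitude numbers and GDP groupings are made up):

```python
# Illustration only: one-way ANOVA over made-up diurnal amplitudes,
# with countries binned into hypothetical GDP groups.
from scipy.stats import f_oneway

low_gdp = [0.82, 0.91, 0.78, 0.88]    # strong day/night swing
mid_gdp = [0.55, 0.62, 0.49, 0.58]
high_gdp = [0.21, 0.18, 0.30, 0.25]   # flat, always-on usage

f_stat, p_value = f_oneway(low_gdp, mid_gdp, high_gdp)
print(f"F={f_stat:.1f}, p={p_value:.4g}")   # small p: groups differ
```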
Q&A
- this work hasn’t been compared to the questionable 2012 internet census
- why is the US so stable? Perhaps DSL policies, folks don’t care so much about energy consumption, etc
Need, Want, Can Afford - Broadband Markets and the Behavior of Users
- goal: explore the impact of capacity, price, cost of upgrading, and connection quality on broadband users’ behaviour
- challenges: requires a large dataset across a range of broadband markets, and requires scale to isolate confounding factors
- dataset includes aqualab’s Dasu (worldwide), and FCC/SamKnows (US), covering 53,000 users in 160 countries
- monthly cost translated into $ using purchasing power parity (PPP)
Q&A
- did you look at usage-based or variable pricing? A: we weren’t focussing on this, but it’d be an interesting direction
Session 4: Mobile Systems and Networks
WiFi, LTE, or Both? Measuring Multi-homed Wireless Internet Performance
- IP ID monotonicity; Windows and iOS have distinct patterns (sketch below)
- TCP Timestamp Option: huh, Windows Phone has the TCP timestamp option disabled by default
- Clock frequency stability
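My sketch of why IP ID monotonicity fingerprints a stack: some OSes increment a global IP ID counter, so consecutive IDs from one host rise monotonically modulo 2^16, while others randomise. The ID sequences below are invented.

```python
# Invented ID sequences; some stacks use a global incrementing IP ID
# counter, so consecutive IDs rise monotonically modulo 2^16.
def mostly_monotonic(ids, tolerance=0.9):
    # Count increments, treating 16-bit wraparound as an increment.
    ups = sum(1 for a, b in zip(ids, ids[1:]) if (b - a) % 2**16 < 2**15)
    return ups / (len(ids) - 1) >= tolerance

print(mostly_monotonic([65530, 65534, 2, 8, 15]))    # True (wraps past 2^16)
print(mostly_monotonic([100, 60000, 5, 40000, 3]))   # False (randomised)
```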
Session 5: Theory Underpinnings
Node Failure Localization via Network Tomography
- under what conditions can this work uniquely localise failed nodes?
- how many failed nodes can be uniquely localised?
Efficient Large Flow Detection over Arbitrary Windows: An Algorithm Exact Outside An Ambiguity Region
- large flow detection: flows that consume more than some threshold; e.g., DoS attacks
- “arbitrary window model” checks “every possible time window in the past”; general solution, impossible for large flows to evade
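A hedged sketch of the problem setting rather than the paper’s algorithm: “large in every possible window” is naturally expressed as a leaky-bucket test, which a bursty flow cannot evade by timing its bursts around fixed window boundaries.

```python
# Sketch: a flow violates a "rate * window + burst" bound in *some*
# window iff a leaky bucket draining at `rate` ever exceeds `burst`.
def violates(packets, rate, burst):
    """packets: time-sorted (timestamp_s, size_bytes) for one flow."""
    level, last_t = 0.0, packets[0][0]
    for t, size in packets:
        level = max(0.0, level - rate * (t - last_t)) + size   # drain, then fill
        last_t = t
        if level > burst:
            return True    # some window ending at t exceeded the allowance
    return False

# 10 kB/s allowance with a 5 kB burst; a 20 kB spike in 0.1s violates it.
print(violates([(0.0, 4000), (0.1, 20000)], rate=10000, burst=5000))   # True
```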
Crossroads: A Practical Data Sketching Solution for Mining Intersection of Streams
- identify significant performance anomaly events in real-time in a large cell network
- this can be viewed as a conventional association-rule mining problem, iff it were possible to record everything
OFSS: Skampling for the Flow Size Distribution
- sampling & sketching
- consider flow size distribution
- state of the art of NetFlow flow sampling is the great destroyer of the flow size distribution
- simple sketch onto a counter array
- flow sampling requires a flow table, impacting performance; sketching is very fast, but may have collisions
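A minimal counter-array sketch in the generic count-min style (not OFSS itself), showing why sketching is fast and fixed-memory but collision-prone:

```python
# Generic counter-array sketch, not OFSS: hash each flow into a small
# array of counters; fast and fixed memory, but colliding flows share a
# counter, which skews the flow size distribution.
import hashlib

class CounterSketch:
    def __init__(self, width=1024):
        self.counters = [0] * width

    def _slot(self, flow_key):
        digest = hashlib.sha1(flow_key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.counters)

    def add(self, flow_key, packets=1):
        self.counters[self._slot(flow_key)] += packets

    def estimate(self, flow_key):
        return self.counters[self._slot(flow_key)]  # overestimates on collision

cs = CounterSketch()
for _ in range(42):
    cs.add("10.0.0.1->10.0.0.2:443")
print(cs.estimate("10.0.0.1->10.0.0.2:443"))   # 42 (barring collisions)
```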
Session 6: Shedding Light on the Web
Dissecting Web Latency in Ghana
- the web in developing countries is slow
- connection speeds are increasing as average page sizes are increasing; server locations, routing configuration, and submarine cable layouts do not help
- in the example presented, DNS resolution in 2012 was a large contributor, but this had halved by 2014; connect() time doubled between 2012 and 2014
- DNS lookup is the dominant factor; 15–40% contribution to average page load time
- redirects occur on 80% of websites; 20–25% contribution
- TLS/SSL has an increasingly large impact; 8–15% of requests required TLS/SSL
- require better caching schemes and/or new CDN architectures and/or redesigning web pages for better caching
Q&A
- how generalisable is this? A: this probably carries for other countries
Session 7: Internet Censorship
A Look at the Consequences of Internet Censorship Through an ISP Lens
- require data snapshots before and after censorship events
- examines consequences of internet censorship in the context of a medium-sized ISP in Pakistan
- data between October ’11 and August ’13
- Nov ’11: thousands of porn domains blocked; Sep ’12: YouTube blocked
- entire analysis based on Bro protocol logs
- traces split into SOHO and residential
- network dumps captured in ISP’s core network
- example: consistently receiving no DNS response for a domain implies censorship
- observation: no shift to public DNS resolvers for residential users
- observation: noticeable shift to public DNS resolvers for SOHO users
- collateral damage: after the YouTube block, Google Docs traffic also dropped noticeably
Q&A
- they did not have the ability to associate traffic to particular users
Censorship in the Wild: Analyzing Internet Filtering in Syria
- measuring censorship usually entails probing (generate requests, see what gets blocked)
- inherently limited by scale of measurement possible
- 600GB of logs from 7 Blue Coat SG-9000 proxies leaked from Syria in summer 2011 by Telecomix
- data has flow-level identifiers (with source IP removed or hashed), plus HTTP details, plus results of filtering decision on device
- broadly: 93.2% of requests allowed; 6.3% denied (5.3% network error, and 1% “policy denied” (7M) or “policy redirect” (2K)); 0.5% proxied, response is cached someplace
- observation: false positives on keyword filtering: “proxy”, for example, is a common word
- observation: Metacafe, Skype, Wikimedia, *.il, amazon.com, for example, blocked
- observation: social media: facebook.com often allowed, but not always; particular pages that may be politically sensitive on facebook are blocked
- observation: entire subnets filtered, representing Israel, Kuwait, Russia, etc
- anti-censorship tech: Tor was not filtered during the study (it is now); Google cache was still being used to access censored content
- ethical considerations: this is sensitive data; encrypted at rest; aggregated stats only; IRB approval
Q&A
- Tor traffic is identified as traffic traveling to known public Tor entry relays
Capturing Ghosts: Predicting the Used IPv4 Space by Inferring Unobserved Addresses
- how much space is actively used?
- data collection -> capture-recapture -> population estimates (estimator sketch below)
- collects IPv4 addresses from multiple (9?) different data sources
- regions that will run out first are LACNIC and APNIC, then AfriNIC, then RIPE, then ARIN
- estimates 1.2G IPv4 addresses used (45% of publicly routed space)
- 6.2M /24 subnets used (60% of publicly routed space)
- significant unused space (especially legacy)
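A sketch of the capture-recapture step using the classic two-sample Lincoln-Petersen estimator (the paper combines many more sources; the IPs below are toy data):

```python
# Toy two-sample capture-recapture estimate; the paper fuses many sources.
def lincoln_petersen(seen_a, seen_b):
    overlap = len(seen_a & seen_b)
    return len(seen_a) * len(seen_b) / overlap    # N ~= n1 * n2 / m

a = {"1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"}  # addresses seen by source A
b = {"3.3.3.3", "4.4.4.4", "5.5.5.5", "6.6.6.6"}  # addresses seen by source B
print(lincoln_petersen(a, b))   # 8.0: estimate of the total used population
```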
Session 9: Illuminating Malicious Behavior
Handcrafted Fraud and Extortion: Manual Account Hijacking in the Wild
- 20% of folks in the US believe their online accounts have been broken into
- Google’s hijack taxonomy: targeted / manual (low volume, manual work; today’s focus) / automated (high volume, not much damage) hijacking
- focus: credential theft, account exploitation, and remediation
- manual hijackers mainly use phishing to steal credentials
- phishing page efficiency: average success rate, 13.78%
- victims are lured to phishing pages via email; 99% of the HTTP requests to phishing pages have no Referer header
- 20% of decoy accounts accessed in less than 30 minutes; 50% within 7 hours
- number of accounts attempted per IP is really low, and really stable
Q&A
- IPs come from tor, public proxies, VPNs; all the places you might expect
Session 12: SSL and Heartbleed
The Matter of Heartbleed
- experiment did not exploit the vulnerability
- 45% of all sites support HTTPS; 60% of those support the heartbeat extension
- this doesn’t mean all those 60% were vulnerable, but estimate that 24-55% likely were
- 11% of HTTP hosts on IPv4 supported heartbeat, and 6% of those hosts were vulnerable
- attack scene: no evidence of attack prior to disclosure; first scan traffic 22 hours after disclosure, from the University of Latvia; observed 6000 probe attempts from 692 hosts
- only saw 11 hosts that hit all measurement points, therefore few hosts doing full internet scans
- two weeks after disclosure, 600,000 hosts remained vulnerable
- only 10.1% of vulnerable sites replaced their certs
- 14% of those who replaced their certs re-used their old private key
- 4% revoked their vulnerable certs
Forced Perspectives: Evaluating an SSL Trust Enhancement at Scale
- many SSL trust enhancements have been proposed to fix the CA trust model (DANE, Google’s Certificate Transparency, network probes: Convergence, Perspectives)
- how do we evaluate performance of these trust alternatives when they have few users?
- performed a university-scale case study of Convergence, with workloads synthesised from anonymised university-wide traces
- results on convergence notary performance; generated a workload by mapping one SSL handshake to one call to Convergence
- 0.06% increase in traffic relative to SSL; low-cost
- one server supports entire university’s traffic
- client overhead is minimal (~250ms)