A slightly technical blog by Stephen D. Strowes.
Random header image... Refresh for more!

ACM Internet Measurement Conference (IMC) 2010: Day 3

Preamble

The notes that follow are a mixture of what each speaker said, or bullets listed on slides (verbatim, with minor US->UK deltas because I was touch-typing), or thoughts of my own. If you were at IMC and you spot errors (likely), do feel free to point them out, and I’ll fix them.

The amount of text written for each is proportional to certain variables: How tired I was during the talk, how interested I was in the talk, how much I knew about the subject material, and how good the speaker was. Note also that these are pretty raw. Questions are omitted if the question was not clear; many responses from the speaker have been shortened for brevity.

Session 11: Methodology III

Long paper: Revisiting the Case for a Minimalist Approach for Network Flow Monitoring

Speaker: Vyas Sekar

  • Decouple collection and computation
    • Generic collection primitive
    • Generic data structure
  • Primitive 2: Sample and hold; If flow is already logged, update
  • Sample packet with prob p
  • If new flow -> create counter
  • Use cSamp [NSDI'08] to configure flow sampling capabilities
  • Hash-based coordination for non-overlapping sets of flows
  • Network-wide optimisation -> operators golas e.g., per-path guarantee
  • Putting the pieces together: “minimalist” approach
  • Flow sampling + sample & hold
  • Relative accuracy difference = accuracy (minimalist) – accuracy (app specific) / accuracy (app specific)
  • Hardware reqs are similar:
    • both need per-packet array/key-value updates
    • More than packet sampling, but with router capabilities
  • Memory for counteres
    • botleneck is SRAM (flow headers can be offloaded to DRAM)
    • We conservatively assume 4x more per-counter cost
  • Processing costs
    • Online cost lower for minimalist (don’t need per-app-instance)
    • OFfice cost is higher for minimalist (but can be reduced)
  • Reporting bandwidth: Higher for minimalist, but < 1% of network capacity
  • Q: Costs
  • A: Hardware costs. Full sampling is of similar complexity to current packet sampling. Another issues is SRAM size.

Long paper: Exact Temporal Characterization of 10 Gbps Optical Wide-Area Network

Speaker: Daniel A. Freedman

  • Understand novel behaviour of high-performance lightly loaded WAN links
  • Apprciate distortive impact of endpoint network adapters
  • Present instrumentation (named bifocals)
  • Network adapter distorts signatures/timestamps. Instead, let’s tap optical fibre directly.
  • Record analogue waveform of entire physical layer symbolstream in real time; extract discrete ethernet frames in later off-line processing
  • Packet chains are exponentially rare
  • What causes WAN burstiness?
    • Still an open question
    • But… Relatively insenstive to background traffic. Not due to a single router
  • Instrumentation: Precise and reproducable timing measurements, accessible and transparent
  • This is the first “ground truth”measurement of packet timings of network strreams in flight
  • Comparaitive refernce for less accurate tools (determine when/where applicable)
  • Q: If we had optical switches, would this behaviour still be there?
  • A: That’s two questions. Is it the conversion? I think no. (“ODE”?). Is it the design of the switches? I guess so. Doesn’t know enough about optical switches though. If the switches mimic the same architecture, then probably they’ll behave the same.
  • Q: You get a number of time peaks at different points. Did you find any time peaks that would correspnd to timings within the router device? Known clock frequencies, etc? Could start talking about where in the router these artifacts originate?
  • A: Haven’t seen precisely what you’re asking about.

Long paper: TraceNET: An Internet Topology Data Collector

Speaker: Kamil Sarac

  • tracenet is an extension to traceroute in internet topology collection studies
  • At each hop, traceroute to nodes in subnet hosting that hop’s ip addr
  • Network exploration
    • Trace collection
    • subnet exploration: start at a /30 subnet S which inclueds the IP address IP_v of the pivot interface v; Check if /30 mate of IP_v is likely to be in S; Grow S to a /X subnet (X = 29, 28…) which includes the IP address of v as long as IP addresses in subnet S satisfy our hueristics
  • Hueristics: 9 heuristics to improive accuracy of our inferences
  • Does IP address i belong to assumed subnet S? Yes with high probability, if it satisfies hueristics.
  • Probing overhead: Exact number of probes during a tracenet session depends on length of trace, configuration of the explored subnet, confiruation of the fringe subnets
  • Path fluctuations
  • LImitations: Depends on heuristics based on best practices in layer 3 network configuration underestimation due to subnets with sparsely utilised IP addresses requires router responsiveness
  • Q: How sure are you that the 9 heuristics cover all the cases?
  • A: We did our best to think of all cases! Blah blah blah.
  • Q: Did you get multiple responses from broadcast addresses?
  • A: We did not do it.

Session 12: Online Social Networks

Long paper: Understanding Latent Interactions in Online Social Networks

Speaker: Jing Jiang

  • Crawling process complicated by two things:
    • crawlers must revisit usrs often to build complete log; new visitors cause older visitors to fall off the list.
    • crawlers cannot visit each user to frequently; renren imposes multiple rate limits
  • Visible interactions show much reciprocity

Short paper: Measuring the Mixing Time of Social Graphs

Speaker: Abedelaziz Mohaisen

  • The Sybil Attack, John Douceur
  • Social graphs are not fast mixing
  • Some theoretical arguments in sybil defences are inaccurate
  • the applicability of social network-based sybil defenes is infeasible for some social graphs
  • there is a negative correlation between trust (hypothesized) and the underlying algorithmic property in these graphs—i.e., the mixing time. (Thanks to Aziz for correcting my hasty notes!)

Long paper: Estimating and Sampling Graphs with Multidimensional Random Walks

Speaker: Bruno Ribeiro

  • Measurement distortions

Short paper: The Impact of YouTube Recommendation System on Video Views

Speaker: Lixin Gao

Session 13: Path Properties and Dynamics

Long paper: Selecting Representative IP Addresses for Internet Topology Studies

Speaker: Xun Fan

  • Topology study needs targets. Where do you dervice your destination addresses for your traceroutes?
  • census (ping sweeps) tell us who responds. Use census to predict best representatives.
  • contribs:
    • new automated hitlist generation method, updated everyh 2-3 months
    • evaluation of the hitlist
    • learn about the internet from the hitlist
  • Completeness: Select one representative for all allocated /24′s Responsiveness: representative should be most likely to respond. Stability: representatives don’t change arbitrarily. Important for longitudinal study.
  • Representative predicition from census data: score IP addresses based on appearances in census data. Choose highest scoring rep per /24.
  • Dataset available: http://www.isi.edu/ant/.
  • Q: Your aim is not to map the core. But have you considered in fact doing this?
  • A: ???
  • Q: What is the best coverage both in terms of ASes and prefixes that you can get? How much can I probe?
  • A: 1/12th of the allocated address space

Short paper: Representative Speed Tests for Free: Estimating Achievable Download Speed from Passive Measurements

Speaker: Jeffrey Pang

  • How fast is the network? (From the perspective of the customers)
  • Benefits of passive measurement: No additional load on network,
  • millions of vantage points, captures actual “user experience”
  • Challenge: estimating network speed from flow records in the wild.
  • Solution: Only look at flows from non-rate-liimted applications and content-providers.
  • Goal: Detect rate-limited flows. Detecting if individual flows are rate-limited is not possible since we only have record flow records, not full packet streams.
  • Goal: Detect rate-limited flows-types
  • Contribution: A method to generate a throughput index. A set of creiteria that selects on y passive measurements that accuraely reflect achievable network speed
  • Q: Why didn’t you consider normal TCP behaviour?
  • A: Because at the flow level we do not have that information, do not know RTTs, etc.

Long paper: YouTube Traffic Dynamics and Its Interplay with a Tier-1 ISP: An ISP Perspective

Speaker: Vijay Kumar Adhikari

  • Q: If you are doing localisation, you would see large sites near to customers?
  • A: Some faff on cost of building, economics, hand-waving. ??

Short paper: On the Characteristics and Reasons of Long-lived Internet Flows

Speaker: Lin Quan

  • Long flows -> high probability these are computer-to-computer traffic
  • Contrib: Multi-time-scale analysis
    • New mechanism that csales to weeks of traffic
    • efficient processing and queries
  • show existence of long-lived flows
  • Find their characteristics and causes
  • Causes:
    • port based analysis
  • http://www.isi.edu/ant/
  • Q: What’s the smallest timescale you can say anything about the bustiness?
  • A: 10 mins
  • Q: Okay, 10 minutes is not enough to say anything about the burstiness
  • Q: How about using 2-tuple rather than 5-tuple; src-ip:dst-ip?
  • A: Um.

Session 14: Anomaly and Event Detection

Long paper: BasisDetect : A Model-based Network Event Detection Framework

Speaker: Paul Barford

  • Anomoly detection: DOS attacks, outages, flash crowds, etc
  • Goal: Fast and accurate identification
  • Challenges: Data selection, filter selection,, perspective, thresholds, etc
  • Little actual deployment and use
  • Can we exploit structure in both normal and anomalous traffic to identify anomalies?
  • Basic detect methodology. Three components:
    • anomoly dictionary construction
    • anomoly decomposition of link data
    • network-wide data fusion
  • In terms of single-link real world dataset: over 48% frewer false alarms
  • in terms of network-wide synthetic dataset: between 54-90% fewer false alarms
  • Q: If I understand, you’re looking for anomolies in the future that look like the old anomolies. Relies on signatures, won’t be able to see something you’ve never seen before?
  • A: Yes. False negatives are way better than false positives. We’re focussing on the latter, not enough on the former.
  • Q: Did you focus at all on false negatives?
  • A: A little.
  • Q: What about building your dictionary on normal traffic, and assuming anything else is anomolous?
  • A: ??

Short paper: Temporally Oblivious Anomaly Detection on Large Networks Using Functional Peers

Speaker: Kevin M. Carter

  • Assume baseline behaviour, aim of anomaly detection is to flag the unexpected
  • An anomaly is not necessarily malicious
  • Given a large network, utilise functional peers to identify anomalous hosts. Baseline over space rather than time.
  • Contribs:
    • remove temporality entirely from detection
    • scales to very-large networks, don’t have to maintain state for all hosts
    • anomalies can be automatically categorised into descriptive sets. Actionable intelligence.
  • This is NOT an IDS.
  • No questions fielded!

Long paper: What Happened in my Network? Mining Network Events from Router Syslogs

Speaker: Dan Pei

  • Contrib: SyslogDigest: proacitvely mine network events from router systlogs
  • take low-level minimally sturcture syslog messages -> prioitised high-level network events.
  • Grouping relevant messages together
    • Temporal relationship
    • spatial relationship
    • associations of message types
  • Presentation:
    • ranking
    • naming
  • Message template learning: Templates trte the signatures with the combination of frequent words
  • Decompose messages into works separated by whitespace
  • Construct a tree structure to express the template hierarchy based on the input messages
  • [...]
  • Rule-based grouping
    • “happening together”. Temporaly close within a sliding window
    • “together frequently”. Association rule mining.
  • Offers 2-3 orders of magnitude reduction from raw messages to events