ACM Internet Measurement Conference (IMC) 2010: Day 1
The notes that follow are a mixture of what each speaker said, or bullets listed on slides (verbatim, with minor US->UK deltas because I was touch-typing), or thoughts of my own. If you were at IMC and you spot errors (likely), do feel free to point them out, and I’ll fix them.
The amount of text written for each is proportional to certain variables: How tired I was during the talk, how interested I was in the talk, how much I knew about the subject material, and how good the speaker was. Note also that these are pretty raw. Questions are omitted if the question was not clear; many responses from the speaker have been shortened for brevity.
Conference Website: http://conferences.sigcomm.org/imc/2010/
Location: BMW Edge Amphitheatre, Melbourne, Australia
Dates: 1 – 3 November, 2010
- This is the 10th IMC.
- This IMC has the largest attendance of any IMC “in an overseas venue”
- 211 submissions (largest ever, 15% increase from last year)
- 110 full (previous, 115)
- 101 short (previous, 68)
- accepted: 24 long, 23 short
- short:long ratio seen in submissions preserved to selection, not by design
- 11 papers were accepted before the TPC met
- Best paper award went to: “Network Traffic Characteristics of Data Centers in the Wild”
Session 1: Services
Long paper: CloudCmp: Comparing Public Cloud Providers
Speaker: Ang Li
- CloudCmp stands for “Cloud compare”
- Framework to compare public cloud providers
- Systematic comparator of cloud providers
- Expecting massive growth
- Many players brings problem of choice
- Choosing the best cloud is hard.
- Requirements for comparison:
- relevant to application performance
- comprehensive along multiple dimensions (e.g., locations, times of day)
- fair (independent from underlying differences between cloud providers)
- Covers four common providers: Amazon, Rackspace Cloud, Windows Azure, Google App Engine
- Focus on nine end-to-end metrics
- each is simple
- each has predictive value
- abstracts away implementation detail, can show variance
- Snapshots from March to September
- Results are inherently dynamic
- Providers are anonymised :-(
- Four common services:
- Compute cluster (elastic: virtual instances)
- Storage (blob, table, queue)
- Intra-cloud network
- Wide-area network
- Instance performance, java benchmarks
- Cost effectiveness, monetary cost per benchmark
- Scaling performance, scaling latency (time taken to allocate new instance)
- Good performance results may be affected by how heavily loaded the cloud was by other work
- Larger instances are not cost effective if code cannot utilise those cores (self evident)
- Scaling latency: Linux instances win, < 100 seconds.
- Storage comparison:
- Covers blob, table, and queue storage services
- Compare read and write, and query on tables
- Metrics: latency, cost, time to consistency
- Wide area:
- Network latency to closest data centre
- Used 260 PlanetLab vantage points
- Q: How do you collect the data? How is your comparison fair across all four platforms?
- A: We use a bunch of public tools and custom tools to gather the data. For fairness, we deliberately chose metrics that are independent of the providers; no metric depends on the platform.
- Q: Follow up: How many samples? Different times?
- A: Tried many different samples, up to 100, at different times without much visible variation.
- Q: 10 minutes scaling time is acceptable? Flash crowds?
- A: Good compared to, say, setting up a new physical node. Obviously better launch times would be preferable.
- Q: Cost effectiveness: Are those results for single-threaded or multi-threaded benchmarks?
- A: We didn’t show multi-threaded benchmarks, but our results suggest they are still not competitive or cost effective. They are priced by core count, and are also bounded by memory and I/O.
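The “monetary cost per benchmark” metric from the notes above can be sketched roughly as below. This is my own illustration, not CloudCmp code: the instance names, prices and runtimes are invented placeholders, and it assumes cost is charged pro rata by running time.

```python
def cost_per_benchmark(price_per_hour, runtime_seconds):
    """Monetary cost of one benchmark run, assuming pro-rata hourly pricing."""
    return price_per_hour * runtime_seconds / 3600.0

# Hypothetical instance types: (price in $/hour, benchmark runtime in seconds).
# These numbers are placeholders, not measurements from the paper.
instances = {
    "small": (0.10, 120.0),
    "large": (0.40, 90.0),
}

costs = {name: cost_per_benchmark(price, runtime)
         for name, (price, runtime) in instances.items()}

# The most cost-effective instance is the one with the lowest cost per run.
cheapest = min(costs, key=costs.get)
```

With these placeholder numbers the larger instance finishes sooner but costs more per run, which is the “larger instances not cost effective” effect in miniature.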
Short paper: Comparing DNS resolvers in the wild
Speaker: Bernard Ager
Long paper: Improving content delivery using provider-aided distance information
Speaker: Ingmar Poese
Session 2: Security
Long paper: Detecting and characterizing social spam campaigns
Speaker: Hongyu Gao
- Large scale experiment to confirm and quantify spam campaigns on online social networks
- Uncover the characteristics of the campaign:
- They mainly use compromised accounts
- They mostly conduct phishing attacks
- Model each wall post as a (description, URL) pair.
- Build post-similarity graph. Edges connect wall posts that are “similar” in their model.
- This reduces the problem of identifying potential campaigns to identifying connected subgraphs and therefore clusters.
- Diurnal patterns: malicious posts follow a different pattern to “benign” posts, but the authors plot by percentages. The benign posts are at their most active around 3am, and late into the active portion of the diurnal cycle around 9pm.
- Q: Resiliency of technique: How long after campaign starts do you do your analysis?
- A: This is all offline. Done afterward.
- Q: Hashing dependencies. They can quite easily construct things to generate different hashes.
- Q: NLP along with URL. Have you tried using the URL alone?
- A: We haven’t. Don’t know answer.
- Q: bit.ly logs who clicks. Did you log this?
- A: We didn’t, but it’d be interesting. Don’t have enough data.
- Q: People delete spam posts. Does this affect the result?
- A: Yes, users might delete these posts. Our measurement sets a lower bound for the spam posts. There could be many more. This is what is still there.
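The “similarity graph, then connected subgraphs” step described above can be sketched with a union-find structure. This is only an illustration under my own assumptions: the similarity rule here (same URL or identical description) and the sample posts are hypothetical, and the authors' actual notion of “similar” is whatever their model defines.

```python
from collections import defaultdict

def cluster_posts(posts):
    """Group (description, url) posts into connected components.

    Assumed similarity rule: two posts are linked if they share a URL
    or have an identical description.
    """
    parent = list(range(len(posts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Link each post to the first post seen with the same description or URL.
    first_seen = {}
    for idx, (desc, url) in enumerate(posts):
        for key in (("desc", desc), ("url", url)):
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(posts)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())

# Hypothetical wall posts.
posts = [
    ("free ringtones", "http://a.example/x"),
    ("free ringtones", "http://b.example/y"),
    ("check this out", "http://b.example/y"),
    ("happy birthday", "http://c.example/z"),
]
clusters = cluster_posts(posts)
```

Posts 0, 1 and 2 end up in one cluster (a candidate campaign) even though posts 0 and 2 share neither a description nor a URL directly; post 3 stays on its own.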
Long paper: Detecting Algorithmically generated malicious domain names
Speaker: Sandeep Yadav
Long paper: Internet Background Radiation Revisited
Speaker: Eric Wustrow
Session 3: Economics
Short paper: On Economic Heavy Hitters: Shapley Value Analysis of the 95th-Percentile Pricing
Speaker: Rade Stanojevic
Short paper: Challenges in Measuring Online Advertising Systems
Speaker: Saikat Guha
Session 4: Methodology I
Long paper: Measurement of Loss Pairs in Network Paths
Speaker: Edmond Chan
Short paper: Measuring Path MTU Discovery Behaviour
Speaker: Matthew Luckie
- Common perception that PMTUD is unreliable.
- Similar measurement technique (TBIT)
- Take home points:
- Systems that advertise an MSS of 1380 (10.8% of population) fail at PMTUD disproportionately (27.1%)
Long paper: Demystifying Service Discovery: Implementing an Internet-Wide Scanner
Speaker: Derek Leonard
- Techniques for quickly discovering available services in the Internet benefit multiple areas
- Help characterise Internet growth
- Distance estimation
- Understanding how worms create massive botnets
- Discovering and patching security flaws.
- The paper chronicles the development of IRLScanner.
- Maximise politeness at remote networks
- Allow scanning in minutes or hours
- Definitions: Assume M local machines. In some set F there are n =
- Service discovery: Requests from local hosts are sent to targets in F, which are marked as alive if they respond.
- Formalise politeness: Formal analysis of service discovery algorithms has not previously been attempted.
- Permutation goal: Spread probes to a subnet evenly throughout F
- Define globally IP wide (GIW) to be a permutation that is IP-wide at all subnets.
- All networks are probed at a constant rate
- Internet-wide service discovery studies are sparse in the literature
- Time and resources seem to be constraints
- Overwhelming number of complaints thwarts researchers (bad publicity, legal threats)
- Each target address is classified into one of four categories:
- Open set (SYN-ACK)
- Closed set (RST)
- Dead (don’t respond at all)
- OS Fingerprinting: Use distinguishing characteristics of network traffic to infer interesting information
- Operating system is an important metric.
- Estimate the global impact of known vulnerabilities
- This has not been attempted internet-wide in the literature.
- Fingerprinted 39.6M servers.
- General purpose hosts dominate the set (82%): Windows: 50%; Linux: 40%
- Removed any network whose administrator complained.
- Blocking too many would render the measurements useless.
- 0.23% of the routable space blocked
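To make the permutation goal above concrete (spread the probes to each subnet evenly through the scan), here is a toy sketch using a bit-reversed counter. This is one simple permutation with that property, chosen by me for illustration; I am not claiming it is what IRLscanner implements. The 8-bit address space and the two-bit “subnets” are assumptions to keep the example small.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def probe_order(bits):
    """Visit every address in a 2**bits space exactly once,
    in bit-reversed counter order."""
    return [bit_reverse(i, bits) for i in range(1 << bits)]

BITS = 8         # toy address space of 256 addresses (assumption)
SUBNET_BITS = 2  # four toy "subnets", identified by the top two bits

order = probe_order(BITS)

# Which subnet each successive probe lands in.
subnets = [addr >> (BITS - SUBNET_BITS) for addr in order]

# Gaps (in probes) between successive visits to subnet 0.
visits = [i for i, s in enumerate(subnets) if s == 0]
gaps = {b - a for a, b in zip(visits, visits[1:])}
```

Because the fastest-changing counter bits become the top address bits, consecutive probes land in different subnets, and each subnet sees one probe every four probes rather than a burst of 64 back to back.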
Session 5: Wireless
Long paper: Measurement and Analysis of Real-world 802.11 Mesh Networks
Speaker: Katrina LaCurts
Long paper: Characterizing Radio Resource Allocation for 3G Networks
Speaker: Alexandre Gerber
- Focus on UMTS
- Limited radio resources in cell networks need to be efficiently managed.
- Allocation of resources triggered by user data transmission activity.
- Release of resources controlled by inactivity timers. Timeout value, called “tail time”.
- State promotions have promotion delay
- State demotions incur tail times (waste radio resources & energy)
- State occupation time and tail times
- half of time in DCH, half of time in FACH (near 100% of data transferred in DCH)
- Spend 7% of time being promoted.
- Promotion overhead == promotion time / total session duration
- What-if analysis for inactivity timers
- Streaming traffic: YouTube video streaming
- Under-utilisation of bandwidth leads to long DCH session, leads to poor battery use.
- Use fast dormancy to eliminate the tail on each chunk: Handset explicitly asks to be put into idle mode, saving battery.
- Conclusion: Most radio resource and energy is consumed when not actually transmitting data. The RRC state machine’s trade-off is hard to balance, as timers are globally and statically set. Hard to adapt to the diversity of traffic patterns.
- Two approaches to address the problem:
- Apps alter traffic patterns based on the state machine behaviour
- Apps cooperate with network in allocating radio resource.
- Q: Have you considered tuning values by users, rather than applications? i.e., tune by user behaviour?
- A: The fast dormancy stuff is good for the application to use, can predict based on past usage
- Q: Don’t think you can do this by application: multi-tasking. It’s the aggregate traffic that determines what you need to do.
- A: Agree.
- Q: Also, transmitting IP traffic during a voice-call can be cost-free. So, opportunistic or delay-tolerant applications can be used here.
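The “what-if analysis for inactivity timers” idea can be sketched with a deliberately simplified model. Everything here is my own assumption rather than the authors' simulator: only two states (IDLE and DCH, no FACH), zero promotion delay, instantaneous packets, and a made-up packet trace.

```python
def dch_time(packet_times, tail_timer):
    """Total seconds spent in the high-power state for sorted packet times.

    Simplified model: the radio is promoted instantly on each packet and
    stays up until `tail_timer` seconds after the most recent packet.
    """
    total = 0.0
    for current, following in zip(packet_times, packet_times[1:]):
        # Either the next packet arrives first, or the timer expires first.
        total += min(following - current, tail_timer)
    if packet_times:
        total += tail_timer  # final tail after the last packet
    return total

# Hypothetical trace (seconds): two short bursts separated by a long gap.
packets = [0.0, 1.0, 2.0, 30.0, 31.0]

# What-if: radio-on time under two different inactivity timer values.
results = {timer: dch_time(packets, timer) for timer in (5.0, 2.0)}
```

For this trace the radio is held up for 13 seconds with a 5 second timer and 7 seconds with a 2 second timer. A shorter timer wastes less tail time, but in the real state machine it would also cause more promotions, each with its own delay, which this sketch ignores.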
Long paper: On the Feasibility of Effective Opportunistic Spectrum Access
Speaker: Vinod Kone
- Spectrum scarcity is a big problem! Reasons:
- static spectrum allocation
- Most of spectrum is licensed.
- But 95% of the spectrum is idle? McHenry’s NSF report from ‘05.
- Opportunistic Spectrum Access (OSA). Key idea:
- Primary users (PU) - licensed users (e.g., cellular, TV)
- Secondary users (SU) - accesses spectrum when PU doesn’t
- Challenges: Unpredictable PU behaviour; Obeying PU disruption threshold
- Questions: How much of the available spectrum is accessible? Can OSA support existing applications?
- Results: Spectrum availability != accessibility; Accessible spectrum is very low
- OSA cannot support existing applications as is
- Frequent interruptions and high delay
- Spectrum traces from multiple locations (4 countries)
- Wide frequency covered (20MHz to 6GHz) for 1-2 weeks
- 15 popular service bands (TV, cellular, etc)
- Available: occupied less than 5% of the time
- Busy: occupied more than 95% of the time
- Partially available: occupancy in [5%, 95%]
- Extraction rate: % of available spectrum accessible by SU:
- No knowledge: 10%
- Statistical knowledge: Max of 35%
- i.e., availability != accessibility
- Low spectrum extraction == frequent interruptions.
- Frequency bundling: Combine multiple unreliable channels into one reliable channel.
- Key challenges: How to bundle the channels? How to access a bundle?
- How are channels correlated? There is a high percentage of channels that show low correlation. This means we can actually bundle the channels randomly.
- Can we minimise the blocking time by bundling? Yes.
- Significant partially available spectrum: ~26%
- Availability != accessibility.
- Frequent interruptions and high blocking times. OSA cannot support existing applications as-is.
- Q: How scalable is your bundling scheme?
- A: Since it is random, it is very scalable.
- Q: What if everyone uses the bundling scheme?
- A: ??
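The channel classification thresholds and the intuition behind frequency bundling can be sketched as follows. The 5% and 95% thresholds are from the talk; the occupancy values are hypothetical, and treating channels as statistically independent is my simplifying assumption (loosely motivated by the low correlation the speaker reported), not a result from the paper.

```python
def classify(occupancy):
    """Label a channel by the fraction of time the primary user occupies it."""
    if occupancy < 0.05:
        return "available"
    if occupancy > 0.95:
        return "busy"
    return "partially available"

def bundle_blocking_probability(occupancies):
    """Probability that every channel in the bundle is busy at once,
    assuming the channels are statistically independent."""
    probability = 1.0
    for occupancy in occupancies:
        probability *= occupancy
    return probability

# Hypothetical bundle of three partially available channels.
channels = [0.5, 0.5, 0.5]
labels = [classify(o) for o in channels]
blocked = bundle_blocking_probability(channels)
```

Each channel alone is blocked half the time, but under the independence assumption the bundle of three is fully blocked only 12.5% of the time, which is why combining unreliable channels can yield a more reliable one.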
// End day 1.