Passive and Active Measurement Conference (PAM) 2013, day 1

Preamble

The notes that follow are a mixture of what each speaker said, or bullets listed on slides, or thoughts of my own. If you were at PAM and you spot errors (likely), feel free to point them out, and I’ll fix them. Not all papers are covered, but a good bunch of them are.

Conference Website: http://pam2013.comp.polyu.edu.hk/

Dates: 18 – 19 March, 2013. 74 papers were submitted, of which 24 were accepted.

Session 1

Measurement Artifacts in NetFlow Data

Speaker: Rick Hofstede

NetFlow is widely deployed. How widespread are artifacts in NetFlow data? Analysis depends on the quality of the input data.
Case study: Cisco Catalyst 6500 (SUP720-3B); the authors discuss five artifacts (not a comprehensive list)
- Imprecise flow record expiration (active timeout, passive timeout, tcp termination (FIN/RST))
- TCP flows without flag information (fast-path hardware forwarding (most common) doesn’t export TCP flags)
- Invalid byte counters (counting of padding bytes in ethernet frames)
- Non-TCP flow records with TCP ACK set
- Gaps
Are the identified artifacts also present in flow data from other exporters?
Can the artifacts be identified in flow data without having access to exporter statistics?
Experimental Setup: three router vendors, including a range of Cisco Catalysts.
Artifact: imprecise flow record expiration.
- Active timeout: generate flows that should be long enough to generate an active flow timeout.
- Most Cisco routers don’t expire flows according to Cisco documentation; the Juniper router does not stabilise (confirming prior studies); FlowMon works very well.
- Idle timeout: send two packets separated by d-seconds to probe the idle timeout.
- Cisco routers behave very well according to spec; Juniper and FlowMon behave badly
Artifact: TCP flows without flag information
- 99.6% of TCP flow records from exporter 1, 2 and 4 contain no TCP flags.
Artifact: Invalid byte counts
- Cisco routers export flow records with invalid byte counts
Artifact: non-TCP flow records with TCP ACK set
- 1% of flow records are non-TCP with TCP ACK flag set
- Actually a bad-hack implemented by Cisco
Artifact: gaps
- Increase in traffic volume causes flow learn failures
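
My own aside: two of these artifacts can be spotted from the flow records alone, which is what the second research question asks about. A minimal sketch, assuming each record is a dict with hypothetical `protocol` and `tcp_flags` fields (these names are mine, not taken from any exporter's format):

```python
TCP_PROTO = 6   # IANA protocol number for TCP
ACK = 0x10      # ACK bit in the TCP flags byte

def find_flag_artifacts(records):
    """Split out TCP records with no flags and non-TCP records with ACK set."""
    tcp_without_flags = []
    non_tcp_with_ack = []
    for rec in records:
        if rec["protocol"] == TCP_PROTO:
            # A real TCP flow should carry at least one flag.
            if rec["tcp_flags"] == 0:
                tcp_without_flags.append(rec)
        elif rec["tcp_flags"] & ACK:
            # ACK makes no sense outside TCP (UDP, ICMP, ...).
            non_tcp_with_ack.append(rec)
    return tcp_without_flags, non_tcp_with_ack

# Hypothetical records, for illustration only.
records = [
    {"protocol": 6, "tcp_flags": 0x12},
    {"protocol": 6, "tcp_flags": 0},
    {"protocol": 17, "tcp_flags": 0x10},
    {"protocol": 17, "tcp_flags": 0},
]
no_flags, bogus_ack = find_flag_artifacts(records)
print(len(no_flags), len(bogus_ack))
```

The timeout, byte-counter and gap artifacts need known test traffic or exporter statistics to confirm, so they are left out of this sketch.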

Towards Fast and Efficient IP-level Network Topology Capture

Speaker: Thomas Bourgeau

Large-scale topology captures take on the order of days, which is too slow to capture network dynamics.
Standard approach is to use full traceroutes with no effort to limit probing redundancy.
Related: doubletree (Donnet et al): find all source/destination paths, and perform partial traceroutes
Related: set cover (Shavitt et al): obtain entire graph, perform fewer full traceroutes
This approach: NTC (Network Topology Capture)
- Obtain the entire graph (like set cover), then perform partial traceroutes (like doubletree)
Reduce the probing load by selecting partial traceroutes that observe the same link.
When a topological change is detected, the measurement agent tracks possible dynamism events along the path until b IPs are discovered in the backward and forward directions.
Experimental setup:
- Paris traceroute over planetlab testbed (230 agents, 800 destinations, every 1 hour across two months)
- Roughly 2% vertex dynamics; 20% edge dynamics
Results:
- Save at most 92% of measurement probes
- Coverage: high vertex coverage in the range 98-99%; high edge coverage 82-95%.
- Dynamics captured: capture at most 75% of dynamism events with 80% load reduction
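
To illustrate the set-cover idea mentioned under related work (this is a generic greedy heuristic of my own, not the NTC algorithm or Shavitt et al.'s method), here is a sketch that picks a small set of previously seen traces that still covers every known link. The trace names and hops are made up.

```python
def greedy_trace_selection(traces):
    """Pick traces greedily until every known link is covered.

    traces: dict mapping a trace name to its ordered list of hops.
    Returns the list of chosen trace names.
    """
    # Each trace observes the links between consecutive hops.
    links = {name: set(zip(hops, hops[1:])) for name, hops in traces.items()}
    uncovered = set().union(*links.values())
    chosen = []
    while uncovered:
        # Take the trace that covers the most still-uncovered links;
        # iterating in sorted order makes ties deterministic.
        best = max(sorted(links), key=lambda n: len(links[n] & uncovered))
        chosen.append(best)
        uncovered -= links[best]
    return chosen

# Hypothetical traces: t2 is fully redundant given t1.
traces = {
    "t1": ["a", "b", "c", "d"],
    "t2": ["a", "b", "c"],
    "t3": ["e", "c", "d"],
}
selected = greedy_trace_selection(traces)
print(selected)
```

Greedy selection is only an approximation to the minimum cover, but it shows why probing redundancy can be cut: here one of the three traces adds no new link.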

Detecting Third-party Addresses in Traceroute Traces with IP Timestamp Option

Speaker: Pietro Marchetta

Motivation: IP-level Internet topology is essential for emulation, simulation, management, resource allocation, etc.; BGP-derived AS-level topologies are incomplete, and traceroute is inaccurate.
Third party addresses: an address which does not belong to any interface on the actual IP path toward the destination. (Origin: ICMP response source address is set to the IP address for the interface on the router on which the router chooses to emit its response; no guarantee of being the interface the original traffic arrived on.)
Problem: Addresses may cause the inference of false AS-level links.
Question: is an IP address discovered by traceroute a third-party address or is it part of the actual traversed path?
Technique: send an ICMP echo request to Y with timestamps requested from YYYY; the response is considered classifiable only when it provides at least 1 timestamp but less than 4 timestamps in the ping reply. If classifiable, then target destination D with a UDP packet with timestamps requested from YYYY. If the response to the UDP packet contains at least one timestamp for Y, then Y is considered as on-path; otherwise, it is a third-party address.
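
As I understood it, the decision procedure boils down to the following. This is my own sketch of the rule as noted above; the function and argument names are hypothetical, and the actual probing (ping and UDP with the IP timestamp option) is assumed to have happened elsewhere.

```python
def classify_hop(ping_timestamps, udp_timestamps_for_hop):
    """Classify a traceroute hop Y from two timestamp counts.

    ping_timestamps: timestamps returned when pinging Y directly.
    udp_timestamps_for_hop: timestamps stamped by Y on the UDP probe
        sent toward the destination D.
    """
    # Only hops that stamp at least 1 but fewer than 4 timestamps
    # in the ping reply are considered classifiable.
    if not 1 <= ping_timestamps < 4:
        return "non-classifiable"
    # Y stamped the probe toward D, so it is on the forward path.
    if udp_timestamps_for_hop >= 1:
        return "on-path"
    return "third-party"

print(classify_hop(2, 1))
print(classify_hop(3, 0))
print(classify_hop(4, 2))
```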
Hop classifiability: 51% of IPs are considered classifiable; 47.6% are non classifiable.
Most classifiable hops appear in several paths from multiple vantage points toward multiple destinations. Paper considers one source with many destinations; many sources with one destination; many sources with many destinations.
AS loops: third party addresses appear to be the cause of 37% of AS loops.

FlowSense: Monitoring Network Utilization with Zero Measurement Cost

Speaker: Curtis Yu

SDNs allow centralised policy and reactive control of network. Reroute around congested links. Need to know when links are congested.
Active measurement: for example, injection of SNMP probes
Passive measurements: expensive instrumentation and infrastructure setup
SDN measurements: switch polling; additional control traffic
FlowSense: leverage existing control traffic to measure network. No additional traffic, network informs systems of changes. As accurate as switch polling.
OpenFlow messages have utilisation information: PacketIn on first packet in a flow; FlowRemoved contains duration of entry in flow table, and the amount of traffic matched. Can infer utilisation contributed by flow on link.
Post-hoc link utilisation. Log incoming utilisations from FlowRemoved notifications, and update checkpoints created at previous FlowRemoved timestamps.
In the median case, total utilisation is known after around 100 seconds.
However, their data indicates that 90% of the total utilisation can be reported after 10 seconds for 70% of the checkpoints.
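
A rough sketch of the checkpoint idea, as I understood it from the talk. The tuple layout and function name are my own assumptions, not FlowSense's actual data structures, and each flow is assumed to send at its average rate for its whole lifetime.

```python
def link_utilisation(flow_removed_events):
    """Build per-checkpoint utilisation (bytes/s) for one link.

    flow_removed_events: iterable of (removed_at, duration, byte_count)
    tuples, one per FlowRemoved message; duration must be > 0.
    """
    checkpoints = {}
    for removed_at, duration, byte_count in sorted(flow_removed_events):
        rate = byte_count / duration      # average rate of this flow
        start = removed_at - duration     # when the flow entry was installed
        # Each FlowRemoved creates a checkpoint at its own timestamp.
        checkpoints.setdefault(removed_at, 0.0)
        # Credit the flow to every checkpoint that falls within its
        # lifetime, including ones created by earlier FlowRemoved messages.
        for t in checkpoints:
            if start < t <= removed_at:
                checkpoints[t] += rate
    return checkpoints

# Two hypothetical overlapping flows: the checkpoint at t=10 is only
# complete once the second flow expires at t=15.
util = link_utilisation([(10.0, 10.0, 1000), (15.0, 10.0, 2000)])
print(util)
```

This is why the utilisation is post-hoc: a checkpoint is only final once every flow that was active at that instant has expired.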

Session 2

How to Reduce Smartphone Traffic Volume by 30%?

Speaker: Subhabrata Sen

What is the effectiveness/feasibility of redundancy elimination techniques for smartphone data traffic?
Study off-the-shelf RE techniques: their effectiveness if individually applied, when jointly applied, and their computational overhead.
Techniques:
- Caching (http): 17% reduction in traffic volume if caching is fully utilised.
- Delta encoding (http)
- File compression (http)
- Packet stream compression (application agnostic)
Effectiveness metric: compression ratio (CR) = traffic volume AFTER applying RE / traffic volume BEFORE applying RE
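
A quick illustration of the metric, using Python's `zlib` (DEFLATE, the algorithm underlying gzip) on a made-up, highly repetitive body. Lower CR is better; the payload here is an assumption for demonstration, not data from the study.

```python
import zlib

def compression_ratio(before: bytes, after: bytes) -> float:
    """CR = traffic volume after RE / traffic volume before RE."""
    return len(after) / len(before)

# Hypothetical response body with lots of redundancy.
body = b"<li class='item'>hello</li>\n" * 200
compressed = zlib.compress(body)
cr = compression_ratio(body, compressed)
print(f"CR = {cr:.3f}")
```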
Result: file compression results do not matter much
Result: delta encoding is slightly better than caching; caching handles zero delta, so delta encoding brings limited additional benefits
On under-utilisation of compression: many http requests do not contain Accept-Encoding; some servers do not compress even when Accept-Encoding has been sent by a client.
Result: additional reduction in smartphone traffic by more than 30% with reasonable smartphone utilisation.
Under-utilisation of compression is a key culprit; gzip compression brings good traffic reduction with the lowest overhead. Decompression on the phone is fast for most algorithms; the exceptions are 7-zip and, to a lesser extent, bzip2, which is reasonably slow. Packet stream compression (MODP) is very useful.

Modeling Cellular User Mobility Using a Leap Graph

Speaker: Seunjoon Lee

Short-term user mobility prediction allows mobile network providers to optimise resources (handover; pre-fetching).
Existing approaches include GPS, wifi; issues exist with coverage, privacy, energy-consumption. Need an extra layer of mapping to get base-station level data.
Challenges in handover detection: the active set of base stations is not unique for a single location, and a handset can exist in any combination of tens of sectors in densely covered regions. Not all handovers are due to mobility: load balancing, radio signal fluctuation.
Significant noise == incorrect mobility prediction. How to extract actual user mobility?
Mobility prediction using “leap graph”; adjacent base stations are potentially non-mobility induced; focus on non-adjacent base stations, “leap edges”.
Determining leap traces: identify overlapping sectors (via knowledge of configuration, empirical data from training period); create leap traces.
Mobility prediction on leap graphs:
- higher prediction accuracy with higher-order markov models
- benefit from destination information is marginal
The data analysis is not able to identify genuine mobility in handovers between adjacent sectors
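
To make the pipeline concrete, here is a simplified sketch of my own: drop handovers between adjacent sectors to get a leap trace, then train an order-k Markov predictor on those traces. The filtering rule, sector names and function names are assumptions for illustration, not the paper's exact construction.

```python
from collections import Counter, defaultdict

def leap_trace(handovers, adjacent):
    """Keep only sectors reached by a leap (a non-adjacent handover)."""
    trace = [handovers[0]]
    for sector in handovers[1:]:
        last = trace[-1]
        # Handovers between adjacent sectors may just be noise
        # (load balancing, signal fluctuation), so skip them.
        if sector != last and frozenset((last, sector)) not in adjacent:
            trace.append(sector)
    return trace

def train(traces, order=2):
    """Count which sector follows each run of `order` sectors."""
    model = defaultdict(Counter)
    for trace in traces:
        for i in range(order, len(trace)):
            model[tuple(trace[i - order:i])][trace[i]] += 1
    return model

def predict(model, recent):
    """Most frequent next sector after `recent`, or None if unseen."""
    counts = model.get(tuple(recent))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical data: sectors A and B overlap.
adjacent = {frozenset(("A", "B"))}
cleaned = leap_trace(["A", "B", "A", "C", "D"], adjacent)
model = train([cleaned, ["A", "C", "D"], ["A", "C", "E"]], order=2)
print(cleaned, predict(model, ["A", "C"]))
```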

Keynote: Endace and DAG Technology, 1995 – 2013

Ian Graham, endace

Session 3

Estimating TCP Latency Approximately with Passive Measurements

Speaker: Sriharsha Gangam

Passive measurements in the middle of the network. Decompose path latency of TCP flows.
Existing methods are accurate but expensive. SEQ/ACK matching.
ALE: Approximate Latency Estimator. Goal: configurable tradeoff between accuracy and overhead.
Sliding window of buckets (time intervals); each bucket is a counting bloom filter (CBF)
Controlling error with ALE parameters: increase W: higher coverage; decrease w: higher accuracy.
Process large and small latency flows simultaneously; absolute error is proportional to the latency.
ALE-E: ALE-Exponential: variable buckets of width w; larger, older buckets shift slower.
Error sources: bloom filters are probabilistic structures, with false positives and negatives. Artifacts from TCP: retransmits and out-of-sequence packets; ACK numbers not on SEQ boundaries (cumulative ACKs)
Evaluation: backbone link traces from CAIDA; ground truth/baseline comparison by emulating TCP state machine. Compare latencies measured by ALE and tcptrace. Compare overhead (memory, computation) introduced by ALE and tcptrace.
Memory overhead is interesting: tcptrace can take up to 468MB CSS, ALE-U(96) consumes 9.8MB regardless of sampling rate.
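
A toy version of the uniform-bucket idea, to show where the accuracy/overhead tradeoff comes from. This is my own sketch, not the authors' code: the class names, hash choice, filter size and the midpoint estimate (clamped at zero) are all assumptions.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.counts = [0] * size
        self.hashes = hashes

    def _slots(self, key):
        # Derive `hashes` slot indices from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counts)

    def add(self, key):
        for s in self._slots(key):
            self.counts[s] += 1

    def remove(self, key):
        for s in self._slots(key):
            self.counts[s] -= 1

    def __contains__(self, key):
        return all(self.counts[s] > 0 for s in self._slots(key))

class LatencyEstimator:
    """Sliding window of equal-width buckets; buckets[0] is the newest."""

    def __init__(self, bucket_width, num_buckets):
        self.w = bucket_width
        self.buckets = [CountingBloomFilter() for _ in range(num_buckets)]
        self.current_start = 0.0

    def _advance(self, now):
        # Shift the window: the oldest bucket falls off the end.
        while now >= self.current_start + self.w:
            self.buckets.pop()
            self.buckets.insert(0, CountingBloomFilter())
            self.current_start += self.w

    def on_data(self, now, flow, expected_ack):
        self._advance(now)
        self.buckets[0].add((flow, expected_ack))

    def on_ack(self, now, flow, ack):
        self._advance(now)
        key = (flow, ack)
        for age, bucket in enumerate(self.buckets):
            if key in bucket:
                bucket.remove(key)
                # Assume the segment was seen at the bucket's midpoint.
                midpoint = self.current_start - age * self.w + self.w / 2
                return max(0.0, now - midpoint)
        return None  # older than the window, or never seen

est = LatencyEstimator(bucket_width=0.01, num_buckets=10)
est.on_data(0.003, "flow-1", 1000)
latency = est.on_ack(0.047, "flow-1", 1000)
print(latency)
```

The true latency in this example is 44 ms; the estimate can be off by up to half a bucket width, which is why decreasing w improves accuracy while the window span W bounds the largest latency that can be measured. Memory is fixed by the number and size of the filters rather than by the number of flows.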

Effect of Competing TCP Traffic on Interactive Real-Time Communication

Speaker: Ilpo Jarvinen

How well does VoIP work in the presence of competing TCP traffic? Especially interesting is web traffic, with transient and parallel TCP connections.
Tested a variety of workloads: CBR-16kbps isolated; audio + bulk transfer; audio + web workload. Testing against a real HSPA network and a fixed server; multiple test iterations with wireless issues causing duplicates, reordering, consecutive losses, and long delay spikes.
Results for the isolated audio with no other traffic are good; audio + bulk transfer with deep buffering causes delay increase, and interactivity is destroyed (delays of over a second); audio with one or two http flows is acceptable with the initial window set to 3, but there is more delay with higher initial congestion windows or more flows.
Jitter filter: jitter filter “drops” late arriving audio packet, mimics time-bound playback of media. Not lost physically, only delayed too much to be useful.
Loss period level: loss period level is based on loss periods (rfc 3357) the codec encounters due to consecutive packets being “dropped”.
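
The two metrics can be sketched as follows. This is my own minimal reading of them: a fixed playout delay budget is assumed for the jitter filter, and the timings are invented.

```python
def jitter_filter(packets, playout_delay):
    """Mark each packet as unusable (True) if lost or too late.

    packets: list of (send_time, recv_time) in sequence order;
    recv_time is None for packets that never arrived.
    """
    return [recv is None or recv - send > playout_delay
            for send, recv in packets]

def loss_periods(dropped):
    """Lengths of runs of consecutive unusable packets."""
    periods, run = [], 0
    for d in dropped:
        if d:
            run += 1
        elif run:
            periods.append(run)
            run = 0
    if run:
        periods.append(run)
    return periods

# Hypothetical 20 ms audio stream hit by a delay spike and one real loss.
packets = [(0.00, 0.05), (0.02, 0.30), (0.04, 0.31),
           (0.06, 0.10), (0.08, None), (0.10, 0.16)]
dropped = jitter_filter(packets, playout_delay=0.15)
periods = loss_periods(dropped)
print(dropped, periods)
```

Packets two and three arrive, but too late to be played, so from the codec's point of view they form a loss period of length two.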
IP packet delay variation confirms that worst-case delay spikes occur during initial window.
Larger initial windows (of up to 10) are much worse for the competing media flow.

Performance Implications of Unilateral Enabling of IPv6

Speaker: Michael Rabinovich

Question: what are the implications of unilateral IPv6 deployment?
Plausible scenario: parallel v6 and v4 attempts, described in rfc 6555.
Plausible scenario: sequential v6 then v4 attempts; inherent delay penalty.
Macro-behaviour is the result of complex interactions: browser, OS, DNS resolvers
Experimental setup describes a sequence of DNS interactions, custom URLs to associate a DNS query with the resulting http request, and non-existent v6 addresses, to match the locations of DNS resolvers for a client, and measure the time to v4 failover. Ran a 28 day measurement.
Conclusion: no evidence of performance penalty for unilateral ipv6 enabling
Small increase in failure rate (from 0.0038% to 0.0064%)
Study limitation: one-second time measurement granularity.

Footnote

Posted by Stephen Strowes on Monday, March 18th, 2013. You can follow me on twitter.
