Passive and Active Measurement Conference (PAM) 2013, day 2
Preamble
The notes that follow are a mixture of what each speaker said, bullets listed on slides, and thoughts of my own. If you were at PAM and you spot errors (likely), feel free to point them out, and I’ll fix them. Not all papers are covered, but a good bunch of them are.
Conference Website: http://pam2013.comp.polyu.edu.hk/
Dates: 18 – 19 March, 2013
Session 4
Measuring Occurrence of DNSSEC Validation
Speaker: Matthaus Wander
- Paper presents a methodology to measure the occurrence of client-side DNSSEC validation and an analysis of the measurement in practice.
- Results: 4.6M DNS/HTTP requests, grouped by ID into 562 Bernoulli trials; inter-request delta time under 30s.
- In some trials, the DNSKEY query was missing, so the DNS signature could not be validated
- Cleansing: filter positive result when DNSKEY is missing (0.14%); filter duplicate results per IP address within 12 hours (49.5%); filter ID hash collisions (different client IP addresses with the same ID; less than 0.01%)
- Interesting: DNSSEC per country includes Sweden at 57.6% validation; Czech Republic at 30.7%; US at 13.1%. Most countries have a validation ratio of less than 5%.
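My reading of the methodology above, as a sketch: serve a name whose signature is deliberately broken, and watch whether the client's resolver fetches the DNSKEY and whether the client then completes the HTTP fetch. This is not the authors' code; the function name and the exact decision rule are my assumptions, consistent with the cleansing step that filters positives lacking a DNSKEY query.

```python
# Hypothetical trial-classification logic (assumed, not the paper's code).
# Each Bernoulli trial records: did we see a DNSKEY query from the resolver,
# and did the client complete an HTTP fetch for a name with a broken RRSIG?

def classify_trial(saw_dnskey_query: bool, fetched_broken_name: bool) -> str:
    """Classify one trial of client-side DNSSEC validation."""
    if not saw_dnskey_query:
        # No DNSKEY query: the signature cannot have been validated,
        # so any "validating" outcome would be spurious -> filter it.
        return "filtered"
    # A validating resolver refuses the broken name (SERVFAIL),
    # so the client never issues the HTTP request for it.
    return "validating" if not fetched_broken_name else "non-validating"

print(classify_trial(True, False))   # validating
print(classify_trial(True, True))    # non-validating
print(classify_trial(False, True))   # filtered
```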
On the state of ECN and TCP Options on the Internet
Speaker: Mirja Kuehlewind
- Presenting results of ECN and TCP option deployment via active probing of webservers; includes IPv6 deployment, and passive measurement of ECN deployment.
- Deployment problems for ECN: end systems and routers need to support it; middleboxes clear ECN marks or drop packets
- Active probing of the top-100,000 web servers. Send TCP SYN to target server for ECN negotiation, and SACK option, Timestamp option (TSOPT), window scaling option (WSOPT); capture the resulting SYN/ACK; send one data segment with CE codepoint set, capture the ACK, send FIN to close the connection.
- ECN deployment around 29.5%; much more prevalent on Linux servers than Windows servers.
- 90.9% of hosts are ECN capable; 8.2% replied with an ACK without ECN feedback; 0.9% sent no ACK at all.
- Middleboxes significantly affect ECN usability.
- 2.3% of servers supported IPv6 as of August 2012. ECN support (47.5%) and TCP option support (90%) are higher than in IPv4, but no increase was observed over time.
- Passive measurement of ECN adoption: flow data from four of the six border routers on SWITCH; check the ECN field in the IP header of continued flows
- Increase from 0.02% ECN capable sources in 2008 to 0.18% in 2012.
- Hosts and devices supporting ECN are seeing increased deployment but ECN is mostly not used.
- Burst loss study: analysis of loss patterns in typical internet usage scenarios in the absence of ECN. Active measurement and offline evaluation based on estimation of TCP retransmissions. Metric: burst loss == number of losses within one RTT after the first loss.
- Results: regular burst loss pattern for FTP download and YouTube traffic (large burst losses); web browsing shows irregular but few, small burst losses.
Measuring Query Latency of Top Level DNS Servers
Speaker: Jinjin Liang
- DNS root zone replication: 13 roots, uneven QoS, 319 anycast instances deployed worldwide
- Method: the King technique, which requires a controllable domain and tricks a resolver into visiting a fake nameserver.
- Latencies measured in South America and Africa are more highly variable, generally 3-6 times worse than Europe or North America
IPv6 Alias Resolution via Induced Fragmentation
Speaker: Matthew Luckie
- Problem: what is the topology of the IPv6 Internet? This paper tackles the initial work on the “alias resolution” problem for IPv6 to infer router-level topologies. Given two IPv6 addresses, determine whether they belong to the same router.
- Various IPv4 approaches do not work: address selection in ICMP responses is different, and IPv6 has no Record Route or IP Timestamp options, etc.
- IP_ID approach involves probing two addresses (A then B then A) to obtain a sequence of IP_ID values, potentially suggesting a shared counter and therefore aliasing.
- All prior work relies on source routing (RFC 5095 deprecates the source routing functionality required). Also, O(N^2) comparisons are required.
- Paper describes the Too-Big Trick (TBT): induce a remote router to originate fragmented packets.
- Probe 49,000 interfaces; 23,892 distinct v6 interfaces from CDN traceroutes, 25,174 distinct v6 interfaces from CAIDA traceroutes; interfaces in 2,617 ASes.
- Around 77% of interfaces respond to ping; around 30% do not send fragments after a packet-too-big notification; around 1% of interfaces become completely unresponsive.
- Around 70% return fragment identifiers after TBT. Of those, 60-70% return sequential IDs. (Unfortunately not the same as IPv4 IDs, so we can’t measure overlap of v4 and v6 addresses to the same set of routers). Remaining 30% use random IDs (confirmed as Juniper).
- Caveats: there is no velocity to IPv6 fragment ID counters (unlike the IPv4 IP-ID), so many routers will have low fragment ID values.
- This paper is intended to demonstrate technique and feasibility; the algorithm is inefficient (O(N^2))
Session 5
Measuring Home Networks with HomeNet Profiler
Speaker: Lucas DiCioccio
- How many and what type of devices connect to home networks? What is the wifi quality?
- NATs and firewalls prevent remote probing from outside; require collaboration from users. Recruiting is a hurdle: privacy, commitment, incentives. Solutions must be portable and require little commitment.
- Users can run one-shot homenet measurements, which provide a report to the user as an incentive. Performs: host scan; service scan (zeroconf, UPnP); wifi environment; performance (embeds Netalyzr); user surveys. The one-shot measurements are evaluated against periodic measurements in six homes in France, which run the wifi scan every 10 seconds, and the device scan every 10 minutes.
- Around 3,700 homenet profiler reports, across 46 countries, 210 ASes. Once cleaned, 2,400 distinct homes, 1,600 of those in France.
- Device scans use UDP discard (port 9) to populate the ARP cache and collect IP addresses and corresponding MAC addresses.
- Regarding one-shot device scans: In 90% of the cases you will only see 50% of the devices. Require at least two days of scans to discover all (?) devices (how frequent are those scans?)
- Homes can have up to 20 devices, but HNP detects at most 4 devices 75% of the time.
- Low RSSI values lead to underestimation of the number of nearby wifi networks; wifi networks with RSSI above -76 dBm will already be discovered in a one-shot wifi scan.
Trying Broadband Characterization at Home
Speaker: Mario A. Sanchez
- Home networks are increasingly complex. This presents challenges in network usability and resource management, and complicates broadband characterisation.
- UPnP adoption is increasing. A UPnP-enabled gateway can be used to infer cross-traffic.
- Measurements: Dasu, a platform for broadband measurement; passive measurements from netstat, cumulative bytes transferred on the access-link (UPnP); active upload/download throughput measurements, and discovered devices via UPnP.
- Note: devices which don’t respond to UPnP “don’t exist”, but their network usage can be inferred. Also, the same device can announce several UPnP services.
- For 16% of home networks, there are 3 or more UPnP devices in the home; for 65% of home networks there are 1 or more UPnP devices.
- Interesting: the number of devices in the network is high, but only a few regularly connect to the internet.
Session 6
Searching for Spam: Detecting Fraudulent Accounts via Web Search
Speaker: Marcel Flores
- Twitter spam: forced brevity, easily obscured content, non-symmetric social links.
- Existing methods are good at detecting spam, but they still let the first volley through.
- Key observation: users often use many interlinking sites (blogs, forums, etc), but spam accounts are often throwaways.
- Using existing search engines, can search and learn about a new user handle even prior to it sending any spam. Remove dupes from search results, white-list a set of known-helpful sites. If there are results left, declare the account legit.
- Filtered @mentions from the collected dataset to leave around 110,000 messages, each for a unique account. Run algorithm, check again in two weeks to determine whether the account has been suspended. 21.25% of observed accounts were suspended.
- On further analysis, achieves a true positive rate of 74.23%, a false positive rate of 10.67%. Taking into account manual inspection of false positives according to the search results, the true positive rate may be as high as around 79.2%, and the false positive rate as low as 4.5%.
Characterization of Blacklists and Tainted Network Traffic
Speaker: Jing Zhang
- Network reputation blacklists widely adopted in DNS, mail servers, anti-virus applications, etc
- What are the properties of blacklists (dynamism, consistency, overlap between different lists)
- Impact of reputation? What will happen if we apply filtering policies?
- Data collection: Reputation blacklists collected from publishers daily. Three broad classes of malicious network activity (spam, phishing/malware, active attacks)
- Data collection: Merit Network; 118.4TB of traffic with 5.7 billion flows and 175 million packets, represented as NetFlow records.
- Question: How stable are blacklists? Answer: very, but size of lists varies wildly between publishers.
- Question: How persistent are the blacklisted IPs? Some are updated aggressively with 500% turnover, others are actually pretty constant.
- Question: What is the distribution of malicious IPs over registries? Answer: APNIC and RIPE have more IPs that are involved in SPAM and active attacks; ARIN and RIPE are the most common regions for phishing and malware.
- Question: How many IPs in each blacklist are overlapped with others? Answer: the overlap within the same class of blacklists was significantly larger than across different classes.
- Question: what fraction of traffic carries a negative reputation? Answer: 40% of flows, or 17% of the bytes are tainted.
- Question: does any one list or class of lists have the greatest impact on the traffic? Answer: variance among the tainted traffic volumes, ranging from more than 10GB per hour down to tens of MB per hour. Each IP on the phishing/malware and active-attack lists contributed two orders of magnitude more tainted traffic.
- Question: what fraction of global blacklists are touched by local traffic? Answer: only a small fraction. Difference between global view and local view.
- Question: Are any IPs responsible for a disproportionately large fraction of tainted traffic? Answer: Top 50 IPs responsible for around 40% of tainted traffic.
- Question: how are these heavy hitters distributed across blacklists? Answer: Top 50 IPs contributed more than half of the tainted traffic for each blacklist. Contribution is even higher in phishing/malware lists. 60 CDN servers and 51 hosting company IPs.
Characterizing Large-scale Routing Anomalies: A Case Study of the China Telecom Incident
Speaker: Rahul Hiran
- 8th April 2010 China Telecom hijack
- Characterise the incident using only public data (routeviews, iplane)
- China Telecom announced 50,000 prefixes that it did not own
- Country-based analysis: Was any particular country targeted? Does the distribution of announced prefixes indicate a trend? Seems not. Distribution of hijacked prefixes does not deviate from global distribution.
- Subprefix analysis: 21% of prefixes were longer than the existing prefixes, but 95% of these belonged to China Telecom
- How did interception occur? One neighbour routes hijacked traffic to China Telecom while another neighbour does not, permitting traffic to pass through China Telecom en route to its destination.
End of conference