Status: In progress.
Abstract: OpenFlow assumes a logically centralized controller, which ideally can be physically distributed. However, current deployments rely on a single controller, which has major drawbacks including lack of scalability. In this project, we present HyperFlow, a distributed event-based control plane for OpenFlow. HyperFlow is logically centralized but physically distributed: it provides scalability while keeping the benefits of network control centralization. By passively synchronizing network-wide views of OpenFlow controllers, HyperFlow localizes decision making to individual controllers, thus minimizing the control plane response time to data plane requests. HyperFlow is resilient to network partitioning and component failures. It also enables interconnecting independently managed OpenFlow networks, an essential feature missing in current OpenFlow deployments. We have implemented HyperFlow as an application for NOX. Our implementation requires minimal changes to NOX, and allows reuse of existing NOX applications with minor modifications.
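To make the passive-synchronization idea concrete, the following is a minimal sketch of controllers exchanging state-changing events over a shared publish/subscribe channel and replaying them locally. The EventChannel and Controller classes, the event format, and the in-process queues are illustrative assumptions, not the actual HyperFlow/NOX implementation.

```python
# Sketch of HyperFlow-style passive synchronization between controllers.
# All names and the event format here are illustrative assumptions.
import json
import queue

class EventChannel:
    """Stand-in for the publish/subscribe medium the controllers share."""
    def __init__(self):
        self.subscribers = []

    def publish(self, event):
        for q in self.subscribers:
            q.put(event)

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

class Controller:
    def __init__(self, name, channel):
        self.name = name
        self.view = {}                      # local copy of the network-wide view
        self.channel = channel
        self.inbox = channel.subscribe()

    def local_event(self, key, value):
        """Handle an event observed on locally managed switches:
        update the local view, then publish it so peers can replay it."""
        self.view[key] = value
        self.channel.publish(json.dumps({"src": self.name, "key": key, "value": value}))

    def sync(self):
        """Replay events published by other controllers to converge the views."""
        while not self.inbox.empty():
            event = json.loads(self.inbox.get())
            if event["src"] != self.name:
                self.view[event["key"]] = event["value"]

# Two controllers converge on the same view; data plane requests are still
# answered locally by whichever controller manages the requesting switch.
ch = EventChannel()
c1, c2 = Controller("ctrl-1", ch), Controller("ctrl-2", ch)
c1.local_event("link:s1-s2", "up")
c2.sync()
assert c2.view["link:s1-s2"] == "up"
```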

We also have a number of projects that aim at improving the overall performance of the network using features provided by SDNs. OpenTCP is a framework we have developed to improve the performance of TCP flows by providing hints through an Oracle (a controller application with access to network-wide statistics). Currently, we are in the process of deploying OpenTCP in a 4000-node datacenter network to evaluate its performance. OpenBoot is a simple controller replication system that provides high levels of reliability and allows updating and patching controller applications without any interruption in network performance. OpenTM is another project we have completed that shows how we can take advantage of features available in OpenFlow for estimating traffic matrices.
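As an illustration of the OpenTM idea, the sketch below aggregates the per-flow byte counters that OpenFlow switches expose to the controller into a source-destination traffic matrix. The poll_flow_stats() helper and its record format are hypothetical placeholders for OpenFlow flow-statistics requests; the aggregation rule is an assumption, not OpenTM's exact estimation strategy.

```python
# Hypothetical sketch: build a traffic matrix from per-flow byte counters.
from collections import defaultdict

def poll_flow_stats(switch):
    # Placeholder: return a list of (src_ip, dst_ip, byte_count) tuples,
    # as would be extracted from an OpenFlow flow-statistics reply.
    raise NotImplementedError

def traffic_matrix(switches):
    """Aggregate per-flow byte counts into a source-destination matrix.
    If a flow is reported by several switches along its path, keep the
    largest count rather than summing, to avoid double counting."""
    tm = defaultdict(int)
    for sw in switches:
        for src, dst, byte_count in poll_flow_stats(sw):
            tm[(src, dst)] = max(tm[(src, dst)], byte_count)
    return tm
```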

Status: In progress.
Abstract: Among different classification techniques, Deep Packet Inspection (DPI) methods are the most accurate. DPI methods have two drawbacks, however: 1) they are inefficient, as they rely on complex regular expressions as signatures; and 2) they require manual intervention for signature generation, partly due to the complexity of signature constructs.

In this project, we present CUTE, an automatic traffic classification scheme that uses a set of weighted terms to tackle both of these problems. CUTE relies on matching term sets, which is considerably faster than regular expression matching. Moreover, using term sets increases the hit rate, since partial or slightly modified flows can still be classified, and makes it possible to classify traffic using only the first few bytes of a flow. Replacing regular expressions with a set of terms might introduce classification errors. However, our theoretical analysis shows that under conditions which hold in real protocols, by using appropriate weights for the extracted terms, we can find a reasonably small bound on the error probability. We have evaluated CUTE using two large sets of packet traces collected from two large ISPs on two continents. Our experimental evaluation shows that the weight function used in CUTE leads to very high accuracy and recall rates, even when only the first 50 bytes of each flow are used. Finally, CUTE is capable of labeling 60% of incomplete flows that other DPI techniques consider unknown.
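The sketch below illustrates classification with weighted term sets in the spirit of CUTE. The terms, weights, and threshold are made up for illustration; CUTE derives its terms and weights automatically from training traces rather than by hand.

```python
# Illustrative sketch of weighted term-set classification over a flow prefix.
PREFIX_LEN = 50  # classify using only the first bytes of a flow

# Hypothetical per-protocol signatures: term -> weight.
SIGNATURES = {
    "http": {b"GET ": 0.6, b"HTTP/1.": 0.5, b"Host:": 0.4},
    "smtp": {b"EHLO": 0.7, b"MAIL FROM": 0.6, b"220 ": 0.3},
}
THRESHOLD = 0.5  # minimum score required to assign a label

def classify(flow_bytes):
    """Score each protocol by summing the weights of its terms that occur in
    the flow prefix; report the best-scoring protocol above the threshold."""
    prefix = flow_bytes[:PREFIX_LEN]
    best_proto, best_score = "unknown", 0.0
    for proto, terms in SIGNATURES.items():
        score = sum(w for term, w in terms.items() if term in prefix)
        if score > best_score:
            best_proto, best_score = proto, score
    return best_proto if best_score >= THRESHOLD else "unknown"

print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"))  # -> http
```

Because terms are simple substrings, a flow that is truncated or slightly modified can still accumulate enough weight to be labeled, which is why term sets can classify partial flows that regular expressions miss.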
Status: In progress.
Abstract: Spam over Internet Telephony (SPIT) is a new form of spam delivered using the phone network. With the low cost of Internet telephony, SPIT has become an attractive alternative for spammers to carry out unsolicited marketing and phishing. SPIT is more intrusive than email spam as it demands immediate recipient attention.

In this project, we study individual and network-wide characteristics of communications in a phone network with the objective of identifying "SPITters". We collect and analyze data from one of the largest phone providers in North America. First, we consider phone communication as an indicator of social ties and propose a new technique, Loose Tie Detection (LTD), to identify outliers based on social ties. Second, we introduce Enhanced Progressive Multi Grey-Leveling (EPMG), which identifies outliers based on call density and reciprocity (i.e., how often a call in one direction can be associated with a call in the opposite direction). Finally, we propose SymRank, an adaptation of the PageRank algorithm that computes the reputation of subscribers based on both incoming and outgoing calls.
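The following sketch shows a PageRank-style reputation computation over a call graph. As a simplifying assumption (not necessarily the exact SymRank formulation), only reciprocated call pairs form ties, so a subscriber whose calls are never returned earns little reputation.

```python
# Illustrative PageRank-style reputation over a call graph; the reciprocity
# assumption and the parameters below are not the exact SymRank definition.
def symrank(calls, damping=0.85, iters=50):
    """calls: iterable of (caller, callee) pairs. Returns node -> score."""
    directed = set(calls)
    nodes = {v for pair in directed for v in pair}
    # Keep an undirected tie only when calls exist in both directions.
    ties = {v: set() for v in nodes}
    for a, b in directed:
        if (b, a) in directed:
            ties[a].add(b)
            ties[b].add(a)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(ties[u]) for u in ties[v] if ties[u])
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# A subscriber who places many unreturned calls ("spitter") ends up near the
# bottom of the reputation ranking.
scores = symrank([("alice", "bob"), ("bob", "alice"),
                  ("spitter", "alice"), ("spitter", "bob"), ("spitter", "carol")])
print(sorted(scores.items(), key=lambda kv: kv[1]))
```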

We evaluate the three techniques using a dataset collected from one of the largest North American phone companies, and find that they compute an overlapping set of outliers. Our experiments reveal that LTD and SymRank -- although seemingly independent approaches -- closely match with regard to outliers, thus showing that our techniques are effective in identifying SPITters. Based on this observation, we believe LTD can be used as a simple yet effective technique for identifying suspicious nodes by phone providers.
Status: Completed.
Abstract: Sharing personal content online is surprisingly hard despite the recent emergence of a huge number of content sharing systems and sites. These systems suffer from several drawbacks: each has its own way of providing access control, which cannot be used in other systems; moving to a new system is a lengthy process that requires registering and inviting all of one's friends to the new system; and the rules for access control are complicated and become more so as our networks of online friends grow.

We present the design of an access control scheme that makes sharing personal content easy. We have deployed this scheme on BitTorrent and Flickr, two popular systems for sharing content online. Our scheme is based on a simple insight -- we must decouple content delivery and sharing from managing social networking information. Our scheme lets people manage their social networks themselves (e.g., through their personal address books) while letting Web sites and Internet systems deliver content. For this, we introduce two new concepts -- social attestations and social access control lists (ACLs). At a high level, a social attestation is a small piece of meta-data issued by one person to another encapsulating a social relationship. The recipient can use this attestation to prove the social relationship to any online site or to any other user. People exchange these attestations once and then reuse them to gain access to their friends' personal content scattered across different sites and systems.
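The sketch below illustrates issuing and verifying a social attestation against a social ACL entry. For self-containment it authenticates the attestation with an HMAC under the issuer's secret; this is an assumption made only to keep the example runnable, whereas a deployed system such as Lockr would typically use public-key signatures so that any site or user can verify the attestation.

```python
# Hypothetical sketch of social attestations; field names are illustrative.
import hashlib
import hmac
import json

def issue_attestation(issuer, recipient, relationship, issuer_key):
    """The issuer signs a statement of the social relationship once;
    the recipient can then reuse it across sites and systems."""
    payload = json.dumps({"issuer": issuer, "recipient": recipient,
                          "relationship": relationship}, sort_keys=True)
    tag = hmac.new(issuer_key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_attestation(attestation, issuer_key, required_relationship):
    """A content site checks the attestation against a social ACL entry such
    as 'friends of alice may view this album'."""
    expected = hmac.new(issuer_key, attestation["payload"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["tag"]):
        return False
    claims = json.loads(attestation["payload"])
    return claims["relationship"] == required_relationship

alice_key = b"alice-secret"
att = issue_attestation("alice", "bob", "friend", alice_key)
print(verify_attestation(att, alice_key, "friend"))  # True: bob may view
```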

More information: Please visit the Lockr web page.
Status: Completed.
Abstract: Internet routers require buffers to hold packets during times of congestion. The buffers need to be fast, and so ideally they should be small enough to use fast memory technologies such as SRAM or all-optical buffering. Unfortunately, a widely used rule-of-thumb says we need a bandwidth-delay product of buffering at each router so as not to lose link utilization. This can be prohibitively large. In a recent paper, Appenzeller et al. challenged this rule-of-thumb and showed that for a backbone network, the buffer size can be divided by sqrt(N) without sacrificing throughput, where N is the number of flows sharing the bottleneck. In this work, we explore how buffers in the backbone can be reduced even further, to as little as a few dozen packets, if we are willing to sacrifice a small amount of link capacity. We argue that if the TCP sources are not overly bursty, then fewer than twenty packet buffers are sufficient for high throughput. Specifically, we argue that O(log W) buffers are sufficient, where W is the window size of each flow. We support our claim with analysis and a variety of simulations. The change we need to make to TCP is minimal---each sender just needs to pace packet injections from its window. Moreover, there is some evidence that such small buffers are sufficient even if we don't modify the TCP sources, so long as the access network is much slower than the backbone, which is true today and likely to remain true in the future. We conclude that buffers can be made small enough for all-optical routers with small integrated optical buffers.
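For a sense of scale, the following back-of-the-envelope computation compares the three sizing rules for an illustrative 10 Gb/s backbone link with a 250 ms RTT shared by 10,000 flows with a window of roughly 64 packets. The numbers are assumptions chosen only to show the orders of magnitude involved, not measurements from the paper.

```python
# Compare buffer-sizing rules: bandwidth-delay product, BDP/sqrt(N), O(log W).
import math

link_rate_bps = 10e9        # 10 Gb/s (assumed)
rtt_s = 0.25                # 250 ms round-trip time (assumed)
packet_bits = 1500 * 8      # 1500-byte packets
n_flows = 10_000            # flows sharing the bottleneck (assumed)
window_pkts = 64            # per-flow window size W (assumed)

bdp_pkts = link_rate_bps * rtt_s / packet_bits        # rule-of-thumb
appenzeller_pkts = bdp_pkts / math.sqrt(n_flows)      # BDP / sqrt(N)
tiny_pkts = math.ceil(math.log2(window_pkts))         # O(log W), constant ~1

print(f"bandwidth-delay product : {bdp_pkts:,.0f} packets")
print(f"BDP / sqrt(N)           : {appenzeller_pkts:,.0f} packets")
print(f"O(log W)                : ~{tiny_pkts} packets (a few dozen with constants)")
```

Even with generous constants, the O(log W) rule lands in the tens of packets, which is why buffers of this size become feasible for SRAM or integrated optical buffering.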


More information: Please visit the buffer sizing web page.
Status: Completed.
Abstract: Routing in mobile ad-hoc networks is hard because the topology can change very rapidly. By the time new paths are discovered, the network can change again -- and in extreme cases, packets circulate endlessly and the system is unstable. Most attempts to solve this problem have required that the topology changes slowly. We propose a routing algorithm called Volcano Routing Scheme (VRS) which will route packets successfully, even if the topology changes very rapidly. VRS doesn't need to discover routes, or exchange routing information; it simply balances the load locally between adjacent pairs of nodes, as sketched below. We show that under some loose conditions on network topology changes, VRS keeps the system stable. Simulations also suggest that VRS is stable for various models of mobility, different communication patterns, and different amounts of traffic in the network. Interestingly, we prove that when the network topology is static, packets follow the shortest path.

Another approach to alleviate the impact of topology changes in communication networks is multi-path routing. Multi-path routing has been studied thoroughly in the context of wired networks. It has been shown that using multiple paths to route messages between any source-destination pair of nodes (instead of using a single path) balances the load more evenly throughout the network. The common belief is that the same is true for ad hoc networks, i.e., multi-path routing balances the load significantly better than single-path routing. We show that this is not necessarily the case. We introduce a new model for evaluating the load balance under multi-path routing, when the paths chosen are the first K shortest paths (for a pre-specified K). Using this model, we show that unless we use a very large number of paths (which is very costly and therefore infeasible), the load distribution is almost the same as under single shortest-path routing. This is contrary to previous results, which assume that multi-path routing distributes the load uniformly.
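Returning to VRS, the following is a minimal sketch of the local-balancing idea on a small line topology: no routes are computed, and neighboring nodes simply even out their packet backlogs each round. The update rule shown (moving half the backlog difference between neighbors) is an illustrative simplification, not the exact VRS algorithm, and destination handling is omitted.

```python
# Illustrative local load balancing between adjacent nodes (VRS-like idea).
def balance_step(queues, edges):
    """queues: node -> packet backlog; edges: list of (u, v) adjacent pairs.
    Each adjacent pair moves packets from the fuller to the emptier queue."""
    for u, v in edges:
        transfer = (queues[u] - queues[v]) // 2   # half the difference
        queues[u] -= transfer
        queues[v] += transfer
    return queues

# A 4-node line topology: a burst of 12 packets at node a spreads out
# over a few rounds without any route discovery.
queues = {"a": 12, "b": 0, "c": 0, "d": 0}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
for _ in range(3):
    queues = balance_step(queues, edges)
print(queues)
```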
Status: Completed.
Abstract: Input Queued (IQ) switches have been studied extensively by researchers over the past two decades. The main problem concerning IQ switches is scheduling the switching fabric in order to transfer packets from input ports to output ports. Scheduling is relatively easy when all packets are of the same size. However, in practice, packets are of variable length. In current switch implementations, variable-length packets are segmented into fixed-length packets---also known as cells---for the purpose of scheduling. However, such cell-based switching comes with some significant disadvantages: (a) loss of bandwidth due to the existence of incomplete cells; and (b) the additional overhead of segmenting packets and re-assembling cells. This is a strong motivation to study packet-based scheduling, i.e., scheduling the transfer of packets without segmenting them. The problem of packet scheduling was first considered by Marsan et al. They showed that under any admissible Bernoulli IID (independent and identically distributed) arrival traffic, a simple modification of the Maximum Weight Matching (MWM) algorithm achieves 100% throughput. In this work, we first show that no work-conserving (i.e., maximal) packet-based algorithm is stable for arbitrary admissible arrival processes. Thus, the results of Marsan et al. depend strongly on the arrival distribution. Next, we propose a new class of waiting algorithms. We show that the waiting-MWM algorithm is stable for any admissible traffic, using the fluid limit technique. The algorithms presented in this work are distribution independent, or universal. The algorithms and proof methods may be useful in the context of other scheduling problems.
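The following sketch computes one cell time of Maximum Weight Matching over virtual output queue (VOQ) lengths, the building block underlying both the cell-based scheme and our packet-based variants. The packet-based "waiting" algorithm additionally holds an input-output pair matched until the packet at the head of its VOQ has been transferred completely; that bookkeeping is omitted here, so this is only an illustration of the matching step.

```python
# One cell-time of MWM scheduling with VOQ lengths as edge weights.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mwm_schedule(voq_lengths):
    """voq_lengths[i][j] = queued cells at input i destined to output j.
    Returns the (input, output) pairs to connect during this cell time."""
    weights = np.asarray(voq_lengths)
    rows, cols = linear_sum_assignment(weights, maximize=True)
    # Only connect pairs that actually have something to send.
    return [(int(i), int(j)) for i, j in zip(rows, cols) if weights[i, j] > 0]

voqs = [[3, 0, 1],
        [0, 2, 0],
        [4, 0, 5]]
print(mwm_schedule(voqs))   # -> [(0, 0), (1, 1), (2, 2)]
```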
Status: Completed.
Abstract: Routing messages between pairs of nodes is one of the most fundamental tasks in any distributed computing system. An Interval Routing Scheme (IRS) is a well-known, space-efficient strategy for routing messages in a network. In this scheme, each node of the network is assigned an integer label and each link at each node is labeled with an interval. The interval assigned to a link l at a node v indicates the set of destination addresses of the messages which should be forwarded through l at v. When studying interval routing schemes, there are two main problems to consider: a) Which classes of networks support a specific routing scheme? b) Assuming that a given network supports IRS, how good are the paths traversed by messages? The first problem is known as the characterization problem and has been studied for several types of IRS. In this work, we study the characterization problem for various schemes in which the labels assigned to the vertices are d-ary integer tuples (d-dimensional IRS) and the label assigned to each link of the network is a list of d 1-dimensional intervals. This is known as Multi-dimensional IRS (MIRS) and is an extension of the original IRS. We completely characterize the class of networks that support linear MIRS (which has no cyclic intervals) and strict MIRS (in which no interval assigned to a link at a node v contains the label of v). In real networks, link costs usually vary over time (dynamic cost links). We also give a complete characterization of the class of networks that support a type of MIRS which routes all messages on shortest paths in a network with dynamic cost links. The main criterion used to measure the quality of routing (the second problem) is the length of routing paths. In this work we also investigate this problem for MIRS and prove two lower bounds on the length of the longest routing path. These are the only known general results for MIRS. Finally, we study the relationship between various types of MIRS and the problem of drawing a hyper-graph. Using some of our results, we prove a tight bound on the number of dimensions of the space needed to draw a hyper-graph.
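As a small illustration of how messages are forwarded under a 1-dimensional IRS, the sketch below picks the outgoing link whose (possibly cyclic) interval contains the destination label. The four-node ring and its labeling are invented for illustration; a linear IRS would simply disallow the wrap-around case handled here.

```python
# Forwarding decision under a 1-dimensional Interval Routing Scheme.
def in_cyclic_interval(x, lo, hi):
    """True if label x lies in the interval [lo, hi], taken cyclically when
    lo > hi (i.e., the interval wraps around the label space)."""
    return lo <= x <= hi if lo <= hi else (x >= lo or x <= hi)

def next_link(node_links, dest):
    """node_links: list of (link_name, lo, hi) interval labels at this node.
    Returns the link whose interval contains the destination label."""
    for link, lo, hi in node_links:
        if in_cyclic_interval(dest, lo, hi):
            return link
    raise ValueError("no link covers the destination label")

# Links at node 0 of a 4-node ring (labels 0..3): the clockwise link covers
# destinations {1, 2}, the counter-clockwise link covers {3}.
links_at_node_0 = [("clockwise", 1, 2), ("counter-clockwise", 3, 3)]
print(next_link(links_at_node_0, dest=3))   # -> counter-clockwise
```

The space efficiency comes from storing only one interval per link instead of a full routing table entry per destination.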