Saturday, September 6, 2008

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

An abstract setting is considered in which multiple nodes compete for a single resource. The total resource requirement is the sum of the individual nodes' requirements. The goal is to control each node's demand, using only a binary feedback signal indicating the load on the resource, so that the resource is utilized efficiently (the system operates near its "knee"), fairly (the resource capacity is divided equally among the nodes), and without giving the nodes any information beyond the binary feedback.

The first control method proposed is a linear control, where the state equation is linear and the equation applied in each time slice is determined by the feedback signal. When an increase is signaled, an additive control moves the total allocation toward efficiency while also improving fairness; when a decrease is signaled, a multiplicative control reduces the total load while leaving fairness unchanged, so the combination converges to the efficient and fair operating point. It would be ideal to simultaneously minimize the time to convergence and the overshoot, but in reality these do not move in the same direction with respect to the control parameters. This result is somewhat expected, since taking big steps ensures a rapid approach to the goal but produces large oscillations. The paper finishes with a non-linear control alternative and discusses some practical issues.
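As a concrete illustration of this increase/decrease rule, here is a minimal simulation sketch; the capacity, the number of nodes, and the parameter values (an additive step of 1 and a decrease factor of 0.5) are my own assumptions, not values taken from the paper.

    def simulate_aimd(demands, capacity, a_increase=1.0, b_decrease=0.5, steps=50):
        """Every time slice, all nodes receive the same binary feedback:
        overloaded if total demand exceeds capacity, underloaded otherwise."""
        history = [list(demands)]
        for _ in range(steps):
            if sum(demands) > capacity:
                # multiplicative decrease: x_i <- b_D * x_i
                demands = [b_decrease * x for x in demands]
            else:
                # additive increase: x_i <- x_i + a_I
                demands = [x + a_increase for x in demands]
            history.append(list(demands))
        return history

    if __name__ == "__main__":
        # Two nodes that start far apart drift toward an equal share while
        # the total oscillates around the capacity.
        for step, xs in enumerate(simulate_aimd([1.0, 20.0], capacity=30.0)):
            if step % 10 == 0:
                print(step, ["%.2f" % x for x in xs])

With these parameters the gap between the two demands stays fixed on every additive increase and shrinks on every multiplicative decrease, so the allocation drifts toward the fair share, which is the intuition behind the paper's convergence result.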

This paper restricts its attention to a simple, confined setting, which helps to convey the important ideas. The overall approach was very reasonable and well explained. I did notice one error: in the expression on p. 12 for the time to reach the goal, the first case should apply when 0 < b and b ≠ 1, and the second case when b = 1, not b = 0. I am curious whether such a mechanism is implemented in today's Internet. Since the nodes in this paper would represent the routers, are routers able to adjust the rate at which packets are sent out? A more realistic model would involve resources forming a network structure, with the added complexity that the resource capacity itself is controlled according to the feedback provided by the resource it is using.
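For reference, here is my own reconstruction of that formula from the linear state equation (the notation x_0, x_goal, a, b is mine and may not match the paper exactly). With the control $x_{t+1} = a + b\,x_t$ applied every time slice, unrolling the recurrence gives

$$
x_t =
\begin{cases}
  b^{t}x_0 + a\,\dfrac{b^{t}-1}{b-1}, & 0 < b,\ b \neq 1,\\[1ex]
  x_0 + a\,t, & b = 1,
\end{cases}
\qquad\Longrightarrow\qquad
t =
\begin{cases}
  \log_b \dfrac{x_{\mathrm{goal}}(b-1)+a}{x_0(b-1)+a}, & 0 < b,\ b \neq 1,\\[1ex]
  \dfrac{x_{\mathrm{goal}}-x_0}{a}, & b = 1.
\end{cases}
$$

The second branch is the purely additive control, which is why its condition must be $b = 1$ rather than $b = 0$.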

Wednesday, September 3, 2008

On Inferring Autonomous System Relationships in the Internet

This paper proposes heuristic algorithms that infer AS relationships by constructing an annotated AS graph from BGP routing tables. It starts with a review of BGP using formal notation and a description of what happens when a route is imported into and exported from an AS.

An AS graph can be annotated by using directed arcs to denote provider-customer relationships and two different types of undirected arcs to denote sibling and peer relationships respectively. Based on this annotated graph and observation of the AS_path attribute, routes are categorized into customer, provider, and peer routes. The paper then assumes an export rule that specifies which types of routes are advertised to which types of neighboring ASes (the Selective Export Rule). A result of this rule is that AS relationships can be characterized: peers do not provide transit for each other, providers provide transit for customers, customers do not provide transit for providers, and siblings provide transit for each other. A path is defined to be valley-free iff a provider-customer edge can be followed only by provider-customer or sibling edges, and a peer edge can be followed only by provider-customer or sibling edges. Assuming the Selective Export Rule, it is shown that the AS paths in BGP routing tables are valley-free. Another important assumption for the heuristic is that the top provider is the AS with the highest degree among the ASes in an AS_path.
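To make the valley-free property concrete, here is a small sketch of the check in Python; the relationship labels ("c2p", "p2c", "peer", "sibling") and the function itself are my own illustration rather than code from the paper.

    def is_valley_free(edge_types):
        """edge_types: relationship of each consecutive AS pair along an
        AS_path, listed in the direction the path is traversed."""
        seen_downhill_or_peer = False   # set once a p2c or peer edge is traversed
        for t in edge_types:
            if t == "sibling":
                continue                 # sibling edges never break the pattern
            if t == "c2p":               # uphill (customer-to-provider)
                if seen_downhill_or_peer:
                    return False         # climbing again after going down: a valley
            elif t == "peer":
                if seen_downhill_or_peer:
                    return False         # at most one peer edge, before any downhill
                seen_downhill_or_peer = True
            elif t == "p2c":             # downhill (provider-to-customer)
                seen_downhill_or_peer = True
        return True

    # Uphill, one peer edge, then downhill: valley-free.
    print(is_valley_free(["c2p", "peer", "p2c", "p2c"]))   # True
    # Going down and then up again forms a valley.
    print(is_valley_free(["p2c", "c2p"]))                  # False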

Using the top provider as a reference point, the direction of transit is determined within an AS_path and, consequently, the links can be labeled sibling, provider-to-customer, or customer-to-provider. To account for the possibility that a misconfigured router corrupts the structure on which the labeling decision depends, a refined algorithm is presented that detects an asymmetry in the number of routing table entries supporting each direction of transit. Finally, peer relationships are detected using the reasoning that peering can only occur between the top provider and one of its neighbors, and that such a peer is likely to have a degree close to that of the top provider. A sketch of the basic labeling step follows.
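The sketch below reflects only my reading of this summary of the basic (unrefined) labeling step: the highest-degree AS on each AS_path is taken as the top provider, edges before it are labeled customer-to-provider and edges after it provider-to-customer. It omits the sibling and peer phases and the refinement for misconfigured routers, and all names are my own.

    from collections import defaultdict

    def infer_relationships(as_paths):
        # Degree of each AS in the undirected AS graph built from the paths.
        neighbors = defaultdict(set)
        for path in as_paths:
            for u, v in zip(path, path[1:]):
                neighbors[u].add(v)
                neighbors[v].add(u)
        degree = {asn: len(n) for asn, n in neighbors.items()}

        labels = {}  # (u, v) -> "c2p" means u is inferred to be a customer of v
        for path in as_paths:
            # Index of the top provider: the highest-degree AS on the path.
            top = max(range(len(path)), key=lambda i: degree[path[i]])
            for i, (u, v) in enumerate(zip(path, path[1:])):
                if i < top:
                    labels[(u, v)] = "c2p"   # uphill toward the top provider
                else:
                    labels[(u, v)] = "p2c"   # downhill after the top provider
        return labels

    print(infer_relationships([["AS1", "AS701", "AS2"]]))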

The format of this paper, using formal notation, helped me understand the ideas effectively and with little ambiguity compared to the tutorial. I am curious what the real-world consequences would be if every entity knew the full AS relationships, and what value this information holds. Also, which attributes of a route entry can be manipulated and which cannot? In particular, if the AS_path attribute can be changed freely and redistributed to other ASes, it would have huge consequences. Perhaps only prepending is allowed?

Interdomain Routing Tutorial

This paper discusses BGP, the mechanism used for making interdomain routing decisions. The basic assumption is that the network is an interconnection of many autonomous systems (ASes) whose operation is driven by financial interests. The relationship between two ASes can be a provider-customer relationship, where the provider is capable of providing global connectivity to its customers and charges a transit fee for it, or a peering relationship, where two ASes form an “alliance” to avoid paying transit to their respective ISPs for traffic between them. Between ASes, the agreement to forward packets to a destination prefix is made through route announcements, and BGP is the protocol used to exchange this routing information between border gateways. When announcing routes to different ASes, however, an AS filters them so that the announcements serve its financial objectives.

The design goals of BGP are to be scalable, to be able to enforce various routing policies, and to promote cooperation under competitive circumstances. IGPs were not suitable for the job because they do not scale well and cannot implement policies. The basic structure of a BGP announcement is an IP prefix followed by several attributes that can be used to implement routing policies. One important attribute is LOCAL_PREF, which an AS uses to prefer routes announced by customers over those from peers, and routes from peers over those from providers; of course, this is because it is the profitable thing to do. Since there are generally several BGP routers within an AS, connected to several different ASes, routing information must be exchanged among the internal BGP routers so that they have a unified view of the outside. This is done with iBGP, while BGP between ASes is called eBGP. After comparing the attributes of the route announcements for the same prefix, a BGP router selects the next hop, writes it to its forwarding table, and disseminates the choice to the other routers.
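Here is a toy sketch of that LOCAL_PREF policy, preferring customer routes over peer routes over provider routes; the numeric preference values and the simplified tie-break on AS_path length are my own assumptions, since real BGP route selection has several more steps.

    # Higher LOCAL_PREF wins; the values themselves are arbitrary choices.
    LOCAL_PREF_BY_RELATIONSHIP = {"customer": 200, "peer": 100, "provider": 50}

    def best_route(routes):
        """routes: list of dicts with 'neighbor_relationship' and 'as_path'."""
        def key(route):
            pref = LOCAL_PREF_BY_RELATIONSHIP[route["neighbor_relationship"]]
            # Prefer the highest LOCAL_PREF; break ties on shorter AS_path.
            return (-pref, len(route["as_path"]))
        return min(routes, key=key)

    routes_to_prefix = [
        {"neighbor_relationship": "provider", "as_path": ["AS7018", "AS3356"]},
        {"neighbor_relationship": "customer", "as_path": ["AS64500", "AS64501", "AS64502"]},
    ]
    # The customer route wins despite its longer AS_path.
    print(best_route(routes_to_prefix))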

This was a good tutorial that provided background on the multi-tiered internetwork and used it to motivate why BGP has particular features and attributes. It also explained well what goes on in a BGP session. In Figure 4-7 with the route reflectors, I think the route reflectors should still maintain fully-meshed iBGP sessions with each other rather than a tree-like structure, because if a route update comes from a client of R1 then, according to the update rule, R3 and its clients will not get the updated route. Also, the tutorial implies there can be BGP routers that have only iBGP sessions and no eBGP session. The only reason I can see for having such routers is that they could have an eBGP session with another AS but the inter-border link is temporarily down.

BGP seems suitable for enforcing policies, but since it has a rather simple route selection rule, it neglects to some extent the quality of service provided to customers. This may be an area for improvement in future research.

Monday, September 1, 2008

The Design Philosophy of the DARPA Internet Protocols

This paper provides the reasons why the elements of the early Internet were designed a particular way. The author's approach is to first present the high-level goals required of the network and then rationalize the many decisions that were made to implement those goals.

The fundamental goal of the network was to interconnect existing networks by sharing network resources. Some of the choices made in this regard were to use packet switching and not to design a unified multi-media system from scratch. Since the DARPA network was intended primarily for military use, the second-level goals were prioritized based on wartime needs, with survivability as the top priority and accountability of resources the lowest. But as priorities shift, some of the decisions made back then may eventually have painful consequences.

The only failure allowed to be perceived above the transport layer was a total partition, so state information must be stored somewhere in the network to recover from non-fatal failures. The choice was made to store this information with the host, so that the state information is lost only when the associated host is lost. This kind of “fate sharing” seems very reasonable, but one instance I can think of where replication of state information would be better is when some other host in the network could resume the interrupted job.

The use of datagrams as the building block gave the network the flexibility to support a variety of services with different requirements.

One thing that was very confusing about the TCP part was the EOL flag. Since the “buffer is returned to the user either when it is full, or an EOL is received”, I don't see why there would ever be two non-empty buffers: if the current buffer fills up before the out-of-order packet arrives, it will already have been emptied, and there will never be a need to use the second buffer. Maybe I'm not clear on what the buffering strategy is, or it's because I'm not familiar with the details of TCP.

Overall, I feel this paper was very effective at explaining why the components were designed as they are, and it also provided an opportunity to re-evaluate, from the perspective of modern-day requirements, the options that were not chosen at the time.

End to End Arguments in System Design

The main difficulty in deciding which functions should be implemented closer to the application and which at a lower level seems to originate from the trade-off between a leaner system that burdens the application developer and a function-rich system that may not be worth the cost.

The end-to-end argument is a design principle which holds that the implementation of specific functions should be pushed toward the higher-level layers of the system, where they are ultimately needed. The rationale is that the application is ultimately responsible for ensuring the correctness of a task's execution, and unless the function is implemented perfectly at the lower level, which is not very likely, the application can never be free of this burden and will eventually have to deal with the possibility of failure.

But it is necessary to implement at least some functionality at the lower level so that the application can complete a task with an acceptable amount of delay or error frequency. Deciding where to draw the line is important because of this trade-off between performance enhancement and the effort put into the low-level implementation. It was also interesting to see that implementing something at the lower level does not necessarily benefit performance, since services that do not need that specific functionality still have to pay for it in a way.

While I generally agree with the end-to-end argument, one question I want to ask is: if a function is in question, why not just implement it in the next layer up? If it is possible to have multiple layers of abstraction that are not strictly hierarchical, we could avoid problems such as an application being subject to an unwanted feature implemented at a lower level, by having it interact directly with that lower layer or with a different sibling layer. So another option, in my opinion, is to increase the granularity of the abstraction. From this perspective, I think the problem of deciding where to implement a function is not much different from the problem of deciding what kinds of layers we are going to have.