2.2 Yoid Tree and Mesh Topologies

Yoid generates two topologies per group--a shared tree (or just tree) topology and a mesh topology. Both figure prominently in a group's content distribution. The tree is the primary means of distributing content because it is the more efficient of the two.

But the tree is fragile--a single crashed, disconnected, or merely overloaded member can partition the tree, at least until the tree reconfigures itself. The mesh is not fragile. Its connectivity is rich enough that, with high probability, the simultaneous loss of a number of members will not partition it. The mesh serves a number of purposes, all of which are needed to compensate for the fragility of the tree. These purposes include:

Discovery of tree partitions.
Distribution of content, for instance when the tree is partitioned or when very robust distribution is required.
Detection of member unreachability.
Notification of member unreachability.
Verification of content reception (content checksum or signature).

The tree topology is not a subset of the mesh topology. The two topologies are created for and optimized for different purposes, and so their creation is largely independent. The tree is optimized for efficiency while the mesh is optimized for robustness.

The yoid stack of protocols includes a protocol for end-to-end reliable delivery (YDP, the Yoid Distribution Protocol, described later). The reliable component of YDP contains a sequence number space for identifying anything transmitted by any member. This is used to prevent looping when broadcasting over the mesh. Its use when multicasting over the tree is optional, but it is used whenever content reliability is required. When used in tree multicast, it can also detect and prevent loops in the tree.
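
The loop-prevention role of the sequence number space can be sketched as follows. This is a minimal illustration and not YDP itself: the function name `broadcast`, the `(origin, seq)` frame identifier, and the in-memory neighbor table are all assumptions made for the example. Each member remembers the identifiers it has already accepted and drops any repeat, so a flood over a topology that contains cycles still terminates.

```python
from collections import deque


def broadcast(neighbors, origin, seq):
    """Flood one frame over a graph with cycles, suppressing duplicates.

    neighbors: dict mapping each member to the set of its neighbors
               (hypothetical structure; mesh and tree links together).
    Returns (delivered, transmissions), where delivered counts how many
    times each member accepted the frame.
    """
    frame_id = (origin, seq)                 # assumed form of the identifier
    seen = {m: set() for m in neighbors}     # identifiers each member has accepted
    delivered = {m: 0 for m in neighbors}
    seen[origin].add(frame_id)
    delivered[origin] = 1
    queue = deque((origin, n) for n in neighbors[origin])
    transmissions = 0
    while queue:
        sender, receiver = queue.popleft()
        transmissions += 1
        if frame_id in seen[receiver]:
            continue                         # duplicate: drop it and do not forward
        seen[receiver].add(frame_id)
        delivered[receiver] += 1
        for n in neighbors[receiver]:
            if n != sender:                  # never send straight back to the sender
                queue.append((receiver, n))
    return delivered, transmissions
```

Without the `seen` check, the frame would circulate indefinitely around any cycle in the topology. With it, every member accepts the frame exactly once.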

Yoid Tree Topology

As stated above, yoid uses a shared tree for distribution. A transmitting member may be at any place in the tree (a leaf, the root, or some other branching point). A member can receive a content frame from any tree neighbor, and will forward the frame to all other tree neighbors. We use the word frame here instead of packet because members receive, replicate, and forward content in units of application frames, each of which may consist of multiple IP packets. In what follows, we use the word packet only when referring to IP packets, and we use the word frame when referring to the content units that the application and yoid control protocols deal in.
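
The forwarding rule just described (accept a frame from any tree neighbor, forward it to all the others) can be sketched as below. The function name `forward_on_tree` and the dictionary representation of the tree are assumptions for illustration only. The point of the sketch is that, on a loop-free tree, this rule delivers a frame from any source to every other member using one hop per member.

```python
def forward_on_tree(tree_neighbors, source):
    """Return the (sender, receiver) hops that carry one frame from source.

    tree_neighbors: dict mapping each member to the set of its tree
                    neighbors (hypothetical structure, assumed loop-free).
    """
    hops = []
    stack = [(source, None)]   # (member holding the frame, neighbor it came from)
    while stack:
        member, arrived_from = stack.pop()
        for n in sorted(tree_neighbors[member]):
            if n != arrived_from:          # forward to all neighbors except the sender
                hops.append((member, n))
                stack.append((n, member))
    return hops
```

For a tree of five members, any choice of source yields exactly four hops, whether the source is a leaf, the root, or a branching point.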

Figure 1 shows a yoid tree and gives much of the terminology associated with it. Each box in the tree represents a member (the rendezvous is not shown). The solid arrows represent the ``links'' of the tree. The directionality of the arrows does not refer to the flow of frames, which can be in either direction. Rather, it refers to the relationship between neighbor members in the tree. (Note that it is possible to constrain a tree so that only a given member or members can transmit. This information would be conveyed by the rendezvous at member join time, and enforced by all members.)

Figure 1: Yoid Tree and Terminology

A member may receive and transmit frames either via unicast IP or scoped IP multicast. In the current architecture, IP multicast is tightly scoped, typically by a hop count of 1 (i.e., to a single physical medium), or in some cases a hop count of 2. More relaxed scoping using administrative scoping should be possible but has not yet been thought through.

The relationship between two neighbor members over unicast IP is that of parent/child. Members attempt to find tree neighbors that are as close to them as possible (where closeness is determined by the latency between two members). Where multicast IP is used, a set of members are grouped as a cluster. One member of the cluster is elected the head, and is responsible for establishing a (unicast IP) parent neighbor, thus bridging the cluster with the rest of the tree. The other cluster members are called feet, and transmit and receive to/from the tree via the head. Cluster members are by definition close to each other, since they share a locally scoped IP multicast. The solid line arrows point from child to parent, or from foot to head.

Each tree must have a single root, which by definition is a member with no parent or head. (There may transitionally be zero or more than one root, but in steady-state there is exactly one.) All other members have exactly one parent or head.

Each member, at a given time, is a transit member or a leaf member, depending on whether it has multiple neighbors or a single neighbor respectively. A member may intentionally limit itself to being a leaf only. While not related to the tree topology per se, we mention here that a member may be an endhost or a yoid proxy server (or just server, where the context is clear). The distinction is primarily that an endhost contains the application that is using the tree, whereas a server doesn't. Typically, a server would be a box that has been installed in the network infrastructure for the explicit purpose of acting as a transit member in a tree.
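
The root, transit, and leaf definitions can be made concrete with a small sketch. The function name `classify` and the `parent_of` mapping are assumptions for illustration. For simplicity the sketch treats every link as a parent/child link, so a cluster foot is modeled as a child of its head.

```python
def classify(parent_of):
    """Derive roots and transit/leaf roles from child-to-parent links.

    parent_of: dict mapping each member to its parent or head, or to
               None for a member with neither (hypothetical structure).
    """
    neighbor_count = {m: 0 for m in parent_of}
    for child, parent in parent_of.items():
        if parent is not None:
            neighbor_count[child] += 1     # the child sees its parent
            neighbor_count[parent] += 1    # the parent sees its child
    # A root is, by definition, a member with no parent or head.
    roots = [m for m, p in parent_of.items() if p is None]
    # Multiple neighbors make a transit member; a single neighbor makes a leaf.
    roles = {m: ("transit" if c > 1 else "leaf") for m, c in neighbor_count.items()}
    return roots, roles
```

In steady state `roots` should contain exactly one member. A list of any other length corresponds to the transitional condition mentioned above.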

Figure 1 shows a sparse topology (low fan-out). This is intentional. While the tree management algorithm nominally allows any fan-out, a small fan-out appears ideal for a tree where all members are endhosts. A small fan-out maximizes the bandwidth available at each transmitting member, minimizes the amount of work each member must do, and minimizes forwarding delay at each member. A small fan-out also minimizes the amount of short-term topology change that must occur when a member quits the tree.

While testing and experimentation are needed to find optimal fan-outs for various traffic profiles, I tend to think in terms of a fan-out of two for endhost-based trees. For a cluster head member, the cluster itself counts as one ``neighbor'' for the purpose of determining fan-out.
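
A fan-out check along these lines might look like the sketch below. The function names and the `limit` parameter are hypothetical, and the sketch assumes that fan-out counts a member's children plus its cluster (if it heads one) but not its own parent, which the text does not state explicitly.

```python
def fan_out(unicast_children, heads_cluster):
    """Count fan-out, with an entire cluster counting as a single neighbor."""
    return len(unicast_children) + (1 if heads_cluster else 0)


def can_accept_child(unicast_children, heads_cluster, limit=2):
    """Return True if one more unicast child fits under the fan-out limit.

    limit=2 reflects the fan-out of two suggested for endhost-based trees;
    a server-based tree would presumably pass a larger value.
    """
    return fan_out(unicast_children, heads_cluster) < limit
```

Under these assumptions, a cluster head with one unicast child is already at a fan-out of two and would turn away further children.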

For trees where the transit members are mainly servers, a larger fan-out may make more sense, especially where for policy or other reasons it is desirable to limit endhosts to being leaf members. In this case, the fan-out may be large simply because the number of servers per endhost is small.

Yoid Mesh Topology

Each member maintains a small number of neighbors solely for the purpose of ensuring that there is a non-partitioned topology over which frames can be broadcast. These neighbors are called mesh neighbors. For the purpose of a robust broadcast to all members, both mesh and tree neighbors are used. In spite of this, mesh topology refers only to the topology consisting of mesh neighbors.

(The term multicast refers to delivery over the tree (with occasional transmission between non-tree neighbors). The term broadcast refers to delivery over the mesh-and-tree. Note that these terms are subtly different from their IP counterparts. With IP, multicast refers to transmission to a specific group of hosts (those that have joined the multicast group), whereas broadcast refers to transmission to all hosts. With yoid, on the other hand, the group of receivers is the same for both multicast and broadcast.)

To ensure a non-partitioned mesh topology, each member M establishes a small number of other members--three or four--as mesh neighbors. These members are randomly selected, with the exceptions that they 1) must not include members that are tree neighbors, and 2) must not include members that have already established a mesh link to member M. The reason for this latter restriction is to prevent trivial cliques, where three or four members all use each other as mesh neighbors, thus partitioning themselves from the rest of the mesh topology. It is this restriction that makes the mesh topology robust.
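
The two exclusions can be sketched as a candidate filter. This is only an illustration: the function name `pick_mesh_neighbors` and its arguments are hypothetical, and the sketch samples uniformly from a global membership list, which a real member does not have. It is meant to show the selection rule, not the discovery mechanism.

```python
import random


def pick_mesh_neighbors(member, members, tree_neighbors, mesh_out, n=3, rng=random):
    """Pick up to n mesh neighbors for member, honoring both exclusions.

    tree_neighbors: dict member -> set of its tree neighbors.
    mesh_out:       dict member -> set of members it has established
                    mesh links to (both structures are hypothetical).
    """
    # Exclusion 2: members that have already established a mesh link to us.
    linked_to_me = {m for m, outs in mesh_out.items() if member in outs}
    excluded = (
        {member}
        | tree_neighbors.get(member, set())   # exclusion 1: tree neighbors
        | linked_to_me
        | mesh_out.get(member, set())         # avoid choosing the same member twice
    )
    candidates = sorted(m for m in members if m not in excluded)
    return set(rng.sample(candidates, min(n, len(candidates))))
```

In a small group the candidate list may hold fewer than `n` members, in which case the member simply ends up with fewer mesh neighbors.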

Assuming that each member establishes the same number of mesh links $N$, each member will have on average $2N$ mesh links: $N$ established from the member to other members, and $N$ established from other members to it. Of course the actual number may be greater or less than $2N$ since mesh neighbors are randomly chosen. The number may be less than $N$ if the group is small.
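
The $2N$ average follows from counting link endpoints. In the short derivation below, the symbols $n$ (number of members), $L$ (total number of mesh links), and $d_i$ (number of mesh links at member $i$) are introduced here for illustration and do not appear elsewhere in the text.

```latex
\[
L = nN, \qquad
\sum_{i=1}^{n} d_i = 2L = 2nN, \qquad
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i = 2N .
\]
```

Each link has two endpoints, so it adds one to the count of two different members. The small-group case follows from the selection rules: a member cannot link to itself, and a pair of members shares at most one mesh link, so $d_i \le n - 1$. With $n = 3$ and $N = 4$, for example, no member can have more than two mesh links.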

Efficient random selection is achieved through a frame delivery mode called ``mesh anycast'', whereby a discovery message takes a random walk along the mesh, randomly stopping at some member (discussed later). In particular, no attempt is made to find mesh neighbors that are nearby (latency-wise).
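
A random walk of this kind can be sketched as follows. The stopping rule is an assumption: here the message stops after each hop with a fixed probability, and the names `mesh_anycast`, `stop_probability`, and `max_hops` are hypothetical. The actual mechanism is the one discussed later.

```python
import random


def mesh_anycast(mesh, start, stop_probability=0.2, max_hops=64, rng=random):
    """Walk randomly over the mesh and return (stopping member, path taken).

    mesh: dict mapping each member to the set of its mesh neighbors,
          in both link directions (hypothetical structure).
    """
    current = start
    path = [current]
    for _ in range(max_hops):              # hop limit so the walk always ends
        current = rng.choice(sorted(mesh[current]))
        path.append(current)
        if rng.random() < stop_probability:
            break                          # assumed rule: stop here at random
    return current, path
```

The walk pays no attention to latency, consistent with the text. The member it stops at may still be unsuitable (the originator itself, or one of its tree neighbors), so the originator would need to apply the exclusions described earlier and retry if necessary.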

Other Kinds of Member Relationships

In addition to tree and mesh neighbors, members maintain communications with still other members for various purposes. For instance, members search out and find a few other members that they can adopt as parents on a moment's notice should their current parent quit the tree or become unreachable. They maintain connections with these members, and stay up-to-date as to the members' appropriateness as parents.

The rendezvous needs to maintain knowledge of at least a smallish number of members (ten or so), and so a member may find itself in occasional communication with a rendezvous.

Paul Francis
Fri Oct 1 11:06:22 JST 1999