
2.3 Content Protocols

 

The stack of yoid protocols for transmitting content is shown in Figure 2. Yoid protocols run directly over UDP or TCP.

[Figure 2: Yoid Content Protocol Stack]

Yoid Identification Protocol (YIDP)

Starting from the bottom, the yoid protocol immediately encapsulated by UDP or TCP is YIDP, the Yoid Identification Protocol. The YIDP header is attached to every packet (as opposed to every application frame as with some other yoid protocols). Its purpose, essentially, is to make up for the identification capabilities long lost from IP and TCP/UDP. Host identification is primarily based on domain name, and not at all on IP addresses or TCP/UDP port numbers.

YIDP has specific mechanisms to deal with firewalls, NAT boxes, and dialup hosts (by which I mean hosts with dynamically assigned addresses and often no domain name). All this is necessary in yoid because, unlike today's situation where most hosts (i.e. clients) typically only initiate communications, yoid requires that all hosts be able to both initiate communications with others and allow others to initiate communications with them. This includes hosts that may be behind firewalls and NAT boxes, that have dynamically assigned addresses, and that may not have a domain name.

YIDP identifies

  1. the group the packet is related to,
  2. the sending member, and
  3. in the case of unicast IP, the receiving member, and in the case of clusters, the receiving cluster.
The group is identified by the Group ID, described above.

Members are identified using only the member domain name and the member YIDP port. The YIDP port number is unique among all member instances that share the same member domain name in a given group, and the member domain name must in turn be unique among all members in that group. Note in particular that we do not use anything at or below UDP/TCP for member identification.

The member name is syntactically a domain name. If a member has a globally unique domain name, then it can be used as the member name. If not, one will be assigned to it, per group, by the rendezvous. The domain name does not have to be a ``working'' domain name, in that it does not have to be discoverable by DNS. Indeed, a member with a working domain name may wish not to use it for privacy reasons (though if it has a global and permanent IP address, then much of the privacy benefit is lost).

The YIDP port is analogous to the port numbers of TCP and UDP. It specifies the path of protocols from YIDP up to and including the application for a given member.
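A minimal Python sketch of this identification scheme, assuming a simple in-memory registry at the rendezvous. The class names and the format of the names the rendezvous assigns (`m1.<group>.invalid`) are hypothetical; the text says only that an assigned name is syntactically a domain name and need not resolve in DNS.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Set


@dataclass(frozen=True)
class MemberId:
    """A member as YIDP identifies it: member domain name plus YIDP port.
    IP addresses and UDP/TCP port numbers are deliberately absent."""
    domain_name: str
    yidp_port: int


class Rendezvous:
    """Hypothetical per-group registry kept by the rendezvous.
    The format of assigned names is an illustrative assumption."""

    def __init__(self, group_id: str):
        self.group_id = group_id
        self.members: Set[MemberId] = set()
        self._serial = count(1)

    def join(self, yidp_port: int, domain_name: Optional[str] = None) -> MemberId:
        # A member without a globally unique domain name is assigned one for
        # this group. It need only be syntactically a domain name; nothing
        # requires it to resolve in DNS.
        if domain_name is None:
            domain_name = f"m{next(self._serial)}.{self.group_id}.invalid"
        member = MemberId(domain_name, yidp_port)
        # Several instances may share a domain name, but each (name, port)
        # pair must be unique within the group.
        if member in self.members:
            raise ValueError(f"{member} is already a member of {self.group_id}")
        self.members.add(member)
        return member
```

For example, `join(5000, "host.example.com")` and `join(5001, "host.example.com")` register two instances on one host, while a second `join(5000, "host.example.com")` is rejected.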

Because the various identifiers used by YIDP are large, they are all compressed into a single 64-bit tag. In the case of unicast IP, the tags differ in each direction: each member assigns the tag value that its neighbor must use when sending packets to it. The tag values are conveyed, using a simple negotiation protocol, when the two neighbors establish a connection.
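The receiver-assigned tag scheme can be sketched as a per-member lookup table. This is a minimal illustration, assuming tags are drawn at random from the 64-bit space; the text does not say how a member chooses its tag values, and the names `TagTable` and `YidpContext` are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Tuple

Member = Tuple[str, int]  # (member domain name, YIDP port)


@dataclass(frozen=True)
class YidpContext:
    """The full identifiers a 64-bit tag stands for."""
    group_id: str
    sender: Member
    receiver: Member


class TagTable:
    """Receiver-side tag table. Because each member chooses the tag that its
    neighbor must use when sending to it, a tag only has to be unique in the
    receiver's own table, and the two directions get independent tags."""

    def __init__(self) -> None:
        self._by_tag: Dict[int, YidpContext] = {}

    def assign(self, ctx: YidpContext) -> int:
        # Random 64-bit values are an assumption; any locally unique choice works.
        while True:
            tag = secrets.randbits(64)
            if tag not in self._by_tag:
                self._by_tag[tag] = ctx
                return tag

    def resolve(self, tag: int) -> YidpContext:
        # Called on every incoming packet: the tag replaces the full names.
        return self._by_tag[tag]
```

During connection setup, member A calls `assign` for traffic flowing from B to A and sends the result to B, and B does the same in the other direction, so neither side needs to coordinate tag values with the other.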

In the case of clusters (multicast IP), the same tag is used by all cluster members, and is assigned by the cluster head. It is conveyed by the cluster head at the time it becomes the cluster head. (When a new member becomes the cluster head, it normally adopts the same tag value.)

YIDP has several features that make it compatible with NAT/PAT boxes and firewalls. First, YIDP host identification is not messed up by NAT/PAT boxes because identification is independent of IP addresses or UDP/TCP port numbers, any of which may be modified by a NAT/PAT box.

Second, because YIDP headers contain full names, a YIDP-smart NAT box or firewall can handle connections initiated from either direction, not just from behind the firewall. This is possible because the yoid header on a packet arriving from outside the firewall contains the name of the target host behind the firewall. The firewall can do a DNS lookup on that name and determine the IP address to which it should forward the packet.
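The inbound forwarding decision of such a firewall can be sketched as follows. This is a hypothetical illustration: a dictionary stands in for the DNS lookup, and only packets that still carry full names (as during connection setup) are handled; tracking the compressed tags of established connections is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class YidpPacket:
    target_name: str   # member domain name carried in the YIDP header
    target_port: int   # YIDP port of the target member
    payload: bytes


class YidpAwareFirewall:
    """Hypothetical YIDP-smart firewall/NAT. `inside_names` is a stand-in for
    the DNS lookup described in the text (name -> inside IP address)."""

    def __init__(self, inside_names: Dict[str, str]):
        self.inside_names = inside_names

    def route_inbound(self, pkt: YidpPacket) -> Optional[str]:
        # The outside source address and UDP/TCP ports play no part in the
        # decision: only the target name from the YIDP header does.
        # Returns the inside address to forward to, or None to drop.
        return self.inside_names.get(pkt.target_name)
```

A real firewall would layer its own admission policy on top of this lookup; the point is that the target is found by name, so an outside host can initiate the connection.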

The Yoid Transport Layer: yTCP, yRTP, yMTCP, etc.

Next above YIDP are a number of protocols used between two neighbors or within a cluster to do such things as flow control, congestion control, sequencing, lost packet recovery (retransmissions) and the like. I envision four protocols at the yoid transport layer: the equivalents of TCP and RTP (yTCP and yRTP), and multicast equivalents of TCP and RTP (yMTCP and yMRTP, to give them names). Note that the multicast equivalents can be much simpler than the reliable multicast protocols being developed for the wider internet today, because of their strict scoping: they operate only within a single cluster rather than across the internet at large.

It is valid to run yoid over TCP rather than UDP, in which case there would not need to be any protocol running at the yoid transport layer. It is very convenient, however, to be able to run all of yoid over a single UDP port, primarily for the purpose of dealing with firewalls and other boxes that must filter on port number. For this reason, the preferred stack is yTCP (or other yoid transport) over YIDP over UDP.

Yoid Distribution Protocol (YDP)

Next in line is the Yoid Distribution Protocol (YDP). This protocol handles everything needed to move an application frame end to end over the tree-mesh with near-zero probability of loss. Reflecting the broad functionality of yoid, YDP is itself rich in functionality. It handles framing, determines the type of forwarding (multicast, broadcast, etc.), has a hop-by-hop pushback flow control mechanism, and has a hop count. YDP can also identify the final destination(s) of a frame and its original source. In essence, YDP is to a yoid topology what IP is to a router/host topology.

In addition to all this, YDP has a sequence number space that can sequence and uniquely identify every frame originated by any source. This sequence number is used not only to ensure end-to-end reliability and ordering of frame delivery, but also to prevent looping of frames and duplicate delivery of frames to members acting as transits. This allows frames to be transmitted over the tree, the mesh, or a combination of both (for instance, in response to a temporary partition of the tree).
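To see why a per-source sequence number is enough to stop looping, here is a minimal sketch of a broadcast flooded over a mesh that contains cycles. Each member forwards a frame only the first time it sees the pair (source, sequence number). Keeping an unbounded set of seen pairs is a simplification for illustration; a real member would track a window per source.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def broadcast(mesh: Dict[str, List[str]], origin: str, source: str, seq: int) -> Dict[str, int]:
    """Flood one frame, identified by (source, seq), over a mesh that may contain
    cycles. Returns how many times each member delivered it to its application."""
    seen: Dict[str, Set[Tuple[str, int]]] = {m: set() for m in mesh}
    delivered = {m: 0 for m in mesh}
    queue = deque([(origin, None)])          # (member, neighbor it came from)
    while queue:
        member, came_from = queue.popleft()
        if (source, seq) in seen[member]:
            continue                          # already delivered and forwarded: drop
        seen[member].add((source, seq))
        delivered[member] += 1
        for neighbor in mesh[member]:
            if neighbor != came_from:
                queue.append((neighbor, member))
    return delivered
```

On a mesh with several cycles, every member still delivers the frame exactly once and the flood terminates, because each member forwards it at most once.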

YDP Source Identification

Source identification in YDP is optional, but typically used. (Here I mean the true originating source, not the previous-hop sender identified by YIDP.) The source is identified by its domain name and YIDP port number, carried in an option to the YDP header. The fixed part of the header carries a 16-bit source identification tag. When a member transmits a source identification option, it also transmits a tag to associate with that option. Once its neighbor acknowledges receipt of the option, subsequent frames from that source carry only the tag.
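The option-then-tag exchange can be sketched as per-neighbor sender state. This is a hypothetical structure, assuming tags are allocated sequentially and never reused; the text specifies only the 16-bit tag and that the full option is sent until the neighbor acknowledges it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Source = Tuple[str, int]  # (domain name, YIDP port) of the originating source


@dataclass
class YdpHeader:
    src_tag: int                          # 16-bit tag in the fixed header
    src_option: Optional[Source] = None   # full identification, only until acked


@dataclass
class SourceTagSender:
    """Per-neighbor sender state (hypothetical structure)."""
    tags: Dict[Source, int] = field(default_factory=dict)
    acked: Set[int] = field(default_factory=set)

    def header_for(self, source: Source) -> YdpHeader:
        if source not in self.tags:
            if len(self.tags) >= 1 << 16:
                raise RuntimeError("16-bit source tag space exhausted")
            self.tags[source] = len(self.tags)   # sequential allocation: an assumption
        tag = self.tags[source]
        if tag in self.acked:
            return YdpHeader(tag)                # neighbor knows the tag: tag only
        return YdpHeader(tag, source)            # keep sending the option until acked

    def on_ack(self, tag: int) -> None:
        self.acked.add(tag)
```

Until `on_ack` is called for a tag, every frame from that source repeats the full option, so a lost option or acknowledgement costs only some header space.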

YDP Forwarding Modes

YDP has five forwarding modes: multicast (primarily over the tree), broadcast (over the mesh), two types of anycast (over the tree and over the mesh), and unicast. The two types of anycast differ from IP anycast. IP anycast targets a specific group of hosts (those with the anycast address) and typically selects the nearest one. YDP anycast, by contrast, simply causes a frame to walk the tree or mesh randomly until it randomly decides to stop (or the hop count expires). Anycast is used by the Yoid Tree Management Protocol (YTMP) to discover arbitrary members in the topology, allowing members to build up a knowledge base about other members.
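A minimal sketch of YDP-style anycast as a random walk, assuming a fixed per-hop stopping probability `stop_prob` (hypothetical; the text does not say how the frame decides to stop) and a hop limit standing in for the hop count.

```python
import random
from typing import Dict, List


def anycast_walk(topology: Dict[str, List[str]], start: str, hop_limit: int,
                 stop_prob: float, rng: random.Random) -> List[str]:
    """Carry one anycast frame on a random walk. Returns the path taken; the
    last member on it is the one the walk 'discovered'."""
    path = [start]
    previous = None
    for _ in range(hop_limit):                # hop count expiry
        if rng.random() < stop_prob:          # the frame randomly decides to stop
            break
        neighbors = topology[path[-1]]
        if not neighbors:
            break
        # Not stepping straight back is an assumption, not part of the text.
        choices = [n for n in neighbors if n != previous] or neighbors
        previous = path[-1]
        path.append(rng.choice(choices))
    return path
```

Passing the topology as the tree's adjacency gives tree anycast, and passing the mesh adjacency gives mesh anycast; the walk itself is the same.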

Unicast allows a frame to be routed over the tree to a specified target member. At first glance this might seem silly, since any member could just as easily transmit the frame directly to the target member using IP. Unicast over the tree is intended, however, for the case where many members are sending frames to a single member. Doing this over the tree has two advantages. First, the target member does not need to maintain a separate connection to every sending member. Second, the pushback flow control over the tree prevents the target member from being congested (should every member, for instance, decide to transmit at once).

For unicast to work, routing information must be installed in the members. Any member can cause routing information pointing to itself to be installed by multicasting a frame with itself identified as the source, and with a flag indicating that the reverse path back to the source should be remembered. When the topology changes, each member updates its new neighbor as to which ``sources'' are behind it. The intent here is that routing information is maintained for only a few members, or else there would be considerable traffic exchange at each topology change.
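The route installation described above can be sketched as reverse-path learning during a multicast: each member records the neighbor the flagged frame arrived from as its next hop back toward the source. This is a minimal illustration with hypothetical function names; the update exchanged between new neighbors after a topology change is not shown.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

Routes = DefaultDict[str, Dict[str, str]]   # member -> {target: next hop}


def install_reverse_path(tree: Dict[str, List[str]], source: str, routes: Routes) -> None:
    """Multicast a frame from `source` over the tree with the 'remember reverse
    path' flag set. Each member records the neighbor the frame arrived from as
    its next hop toward `source`."""
    stack = [(source, None)]
    while stack:
        member, came_from = stack.pop()
        if came_from is not None:
            routes[member][source] = came_from
        for neighbor in tree[member]:
            if neighbor != came_from:          # a tree has no cycles, so this suffices
                stack.append((neighbor, member))


def unicast_path(routes: Routes, start: str, target: str) -> List[str]:
    """Follow installed next hops from `start` until reaching `target`."""
    path = [start]
    while path[-1] != target:
        path.append(routes[path[-1]][target])
    return path
```

Every member holds one routing entry per advertised target, which is why the text keeps the number of such targets small.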

YDP Pushback Flow Control

Every member, acting as a transit, has a certain amount of buffer space for a given tree for storing frames in transit. This amount can be described in terms of space (e.g. 500 kbytes), or time (e.g. one minute), or a combination (e.g. 500 kbytes or one minute, whichever is greater). The amount, however, is the same for all members in the tree, and is one of the parameters given to a member by the rendezvous when it joins.
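The combined space/time limit works out as follows. This is a sketch under two assumptions not stated in the text: a time limit is converted to bytes at the tree's content rate, and a kbyte is 1000 bytes.

```python
from typing import Optional


def buffer_limit_bytes(content_rate: float, space_bytes: Optional[int] = None,
                       time_s: Optional[float] = None) -> float:
    """Per-tree transit buffer size. A time-based limit is converted to bytes
    at the tree's content rate (bytes/s); when both a space and a time limit
    are given, the larger one applies."""
    limits = []
    if space_bytes is not None:
        limits.append(space_bytes)
    if time_s is not None:
        limits.append(content_rate * time_s)
    if not limits:
        raise ValueError("need a space limit, a time limit, or both")
    return max(limits)
```

With "500 kbytes or one minute, whichever is greater", a tree carrying 4000 bytes/s gets 500,000 bytes (one minute is only 240,000 bytes), while a tree carrying 20,000 bytes/s gets 1,200,000 bytes.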

If the buffer fills, the member must either drop frames destined for outgoing neighbors or push back incoming neighbors. To decide which, each tree also has a minimum transmission rate associated with it. When the buffer fills, incoming neighbors are pushed back to this rate. If the buffer remains full (because an outgoing neighbor is even slower than this minimum rate), then frames for that neighbor are dropped.
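The two-step decision can be sketched as a function over current neighbor rates. The function name and the representation of rates as per-neighbor dictionaries are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def buffer_full_actions(incoming: Dict[str, float], outgoing: Dict[str, float],
                        min_rate: float, still_full: bool) -> Tuple[Dict[str, float], List[str]]:
    """Decide what a transit member does when its buffer for a tree fills.
    `incoming`/`outgoing` map neighbors to their current rates (bytes/s).
    Returns (rates to push incoming neighbors back to, outgoing neighbors
    whose frames get dropped)."""
    # Step 1: push every faster incoming neighbor back to the tree's minimum rate.
    pushback = {n: min_rate for n, rate in incoming.items() if rate > min_rate}
    # Step 2: if the buffer is still full, the culprit is an outgoing neighbor
    # slower than the minimum rate; drop frames for it rather than slow the tree further.
    drop = [n for n, rate in outgoing.items() if rate < min_rate] if still_full else []
    return pushback, drop
```

The minimum rate thus acts as a floor: no neighbor is ever pushed below it, and any neighbor that cannot keep up with it loses frames instead.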

The current pushback mechanism is a single bit transmitted to the neighbor being pushed back. When a member receives this bit set, it stops transmitting to that neighbor; when the bit is clear, it transmits. When frames are being received from several neighbors, the neighbor(s) chosen for pushback are those with the highest per-source transmission rate. In other words, if a member is receiving from two neighbors at roughly the same rate, but one neighbor is transmitting frames from fewer sources, that neighbor will be pushed back first.
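The selection rule reduces to ranking incoming neighbors by received rate divided by number of sources. A minimal sketch, in which how many neighbors to push back at once (`how_many`) is an assumption left to the caller:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IncomingNeighbor:
    name: str
    rate: float    # bytes/s currently received from this neighbor
    sources: int   # number of distinct sources in the frames it sends

    @property
    def per_source_rate(self) -> float:
        return self.rate / max(self.sources, 1)


def pushback_bits(neighbors: List[IncomingNeighbor], how_many: int = 1) -> Dict[str, bool]:
    """Set the single pushback bit (True = stop transmitting) for the
    `how_many` neighbors with the highest per-source rate."""
    ranked = sorted(neighbors, key=lambda n: n.per_source_rate, reverse=True)
    chosen = {n.name for n in ranked[:how_many]}
    return {n.name: n.name in chosen for n in neighbors}
```

Two neighbors each sending 1000 bytes/s, one carrying 10 sources (100 bytes/s per source) and one carrying 2 (500 bytes/s per source), result in the second one being pushed back.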

Note that this is by no means the only pushback mechanism or policy. For instance, the pushback could specify a transmission rate rather than simple on/off. Or, the pushback could specify certain sources (those sending the most). The best pushback mechanism is an issue for further study.

YDP Sequence Numbering

As mentioned above, every YDP header contains a sequence number. This number sequences the bytes transmitted by any source. Right now I'm assuming the sequence number is 64 bits in length. The sequence number is initialized by each source to a default low value (1024 in the current implementation), and continues until it reaches the maximum value (all ones). (In the astonishingly unlikely event that this maximum is reached, the member must quit the group and rejoin with a different YIDP port number.)
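A minimal sketch of per-source numbering under these parameters. It assumes, as in TCP, that a frame is stamped with the sequence number of its first byte and that the counter advances by the frame length. The text says the numbers sequence bytes but does not spell out the stamping rule.

```python
SEQ_INITIAL = 1024            # default starting value in the current implementation
SEQ_MAX = (1 << 64) - 1       # the all-ones value of a 64-bit field


class SourceSequencer:
    """Per-source YDP sequence numbering. Sequence numbers count bytes, so a
    frame is stamped with the number of its first byte (as in TCP)."""

    def __init__(self) -> None:
        self.next_seq = SEQ_INITIAL

    def stamp(self, frame_len: int) -> int:
        if frame_len <= 0:
            raise ValueError("frame must carry at least one byte")
        if self.next_seq + frame_len - 1 > SEQ_MAX:
            # Practically unreachable, but the text says what to do.
            raise OverflowError("sequence space exhausted: rejoin with a new YIDP port")
        seq = self.next_seq
        self.next_seq += frame_len
        return seq
```

Even at 10^9 bytes/s, exhausting 2^64 bytes would take roughly 585 years, which is why the text calls this event astonishingly unlikely.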

The YDP sequence number is of course used to ensure end-to-end reliability (in-order delivery of all frames). But it does more than that. It is used to prevent looping during broadcast. It is used to prevent frame loss during topology changes. It can even be used to help determine which members become neighbors in the tree.

More generally though, it allows any multicast frame to be transmitted to a non-tree neighbor without concern for looping. This means that non-tree neighbors can be used to augment multicast delivery when some tree neighbor is temporarily unable to forward multicast content. It allows yoid to respond very quickly to short-lived lapses in the tree without having to reconfigure the tree.

Exactly how this gets done depends on the nature of the content: its volume, how much gets buffered by each member, how much delay can be tolerated, and any constraints on the order in which the content can be received.

Without getting into details, I envision some kind of sequence number negotiation protocol between both tree and mesh neighbors whereby neighbors (or potential neighbors) tell each other what they have and what they would like to receive from the neighbor. (I don't get into details here in part because the full protocol hasn't been designed yet.)

As an example, consider a file distribution application whereby the entire file is buffered by each member, and there are no delay constraints to speak of (that is, while the file should be received as quickly as is reasonably possible, nothing breaks if it takes a few minutes more or less). What's more, the entire file must be received by each member, but the order in which the bytes are received is not important. In other words, the file is not used by an application until it is received in its entirety.

Here, when two members are considering becoming neighbors on the tree, they would exchange information on what sequence numbers each has already received. They would also determine which of the two was closer in the tree to the source of the file. Based on this, they would determine if they should become neighbors, and if so, which should send to the other and from what point in the file (see Section 2.5).
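One way the file-distribution decision could look, as a hedged sketch. It assumes each member reports the sequence-number ranges it already holds and its hop distance to the file's source, that the member closer to the source sends, and that the two link only if the sender holds bytes the receiver lacks. The actual decision rules are left to the text's later discussion, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]   # half-open [start, end) span of sequence numbers


@dataclass
class Candidate:
    name: str
    hops_to_source: int    # distance in the tree to the file's source
    received: List[Range]  # sorted, non-overlapping


def difference(a: List[Range], b: List[Range]) -> List[Range]:
    """Parts of the ranges in `a` not covered by `b` (both sorted, non-overlapping)."""
    out: List[Range] = []
    for start, end in a:
        cur = start
        for b_start, b_end in b:
            if b_end <= cur or b_start >= end:
                continue
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
            if cur >= end:
                break
        if cur < end:
            out.append((cur, end))
    return out


def plan_transfer(x: Candidate, y: Candidate) -> Optional[Tuple[str, str, List[Range]]]:
    """Return (sender, receiver, ranges to send), or None if linking gains nothing."""
    sender, receiver = (x, y) if x.hops_to_source <= y.hops_to_source else (y, x)
    todo = difference(sender.received, receiver.received)
    return (sender.name, receiver.name, todo) if todo else None
```

Because byte order does not matter for this application, the sender can start from the first missing range and fill gaps in any order.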

As another example, consider a voice conferencing application. Here, content is real-time so there is virtually zero buffering. A lost packet here or there may be tolerable, but multi-second loss of transmission (for instance, in order to detect that a neighbor is down and then respond through reconfiguration of the tree) is not tolerable.

By way of explanation, assume that member X at a given moment is transmitting content from source S to member Y, and that Y in turn is transmitting to member Z (see Figure 3). Assume that member Y becomes temporarily overloaded and stops receiving and transmitting. Z may not be able to detect this quickly, but X can because it receives no acknowledgement in the expected time (accepting here that this could be simply because of an unusually long network delay). X can broadcast a message indicating that Y is unreachable. This will quickly reach Z, which can in turn request from some non-tree neighbor that content for source S be transmitted to it.

[Figure 3: Fast Recovery From Overloaded Node]

Because this could happen to several members in different parts of the tree at the same time, the sequence numbers are needed to quell duplicates. If Z again hears from Y, it would tell the non-tree neighbor to stop transmission. If it does not hear from Y after a brief timeout, it could find a new tree neighbor (ideally the non-tree neighbor it is already receiving from, if appropriate).
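The recovery sequence in Figure 3 can be sketched from Z's point of view for a single source S. This is a hypothetical state machine: the message tuples, the timeout handling, and the rule of discarding frames that arrive ahead of a gap (leaving them for later repair) are assumptions. Sequence numbers count bytes and start at 1024, as described above.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class FastRecovery:
    """Member Z's state for one source S (hypothetical sketch)."""
    tree_parent: str                   # Y in Figure 3
    timeout: float                     # seconds to wait before replacing Y
    next_expected: int = 1024          # first sequence number not yet delivered
    backup: Optional[str] = None       # non-tree neighbor currently filling in
    down_since: Optional[float] = None
    delivered: bytearray = field(default_factory=bytearray)

    def on_frame(self, seq: int, payload: bytes) -> bool:
        """Accept a frame from any neighbor; sequence numbers quell duplicates."""
        end = seq + len(payload)
        if end <= self.next_expected or seq > self.next_expected:
            return False               # duplicate, or a gap left for later repair
        self.delivered += payload[self.next_expected - seq:]
        self.next_expected = end
        return True

    def on_unreachable(self, member: str, backup: str, now: float) -> Optional[Tuple[str, str, int]]:
        """Handle a broadcast saying `member` is unreachable (X's notice about Y)."""
        if member != self.tree_parent or self.backup is not None:
            return None
        self.backup, self.down_since = backup, now
        return ("request", backup, self.next_expected)   # "send me S from here on"

    def on_heard_from(self, member: str) -> Optional[Tuple[str, str]]:
        """Y is back: tell the non-tree neighbor to stop."""
        if member != self.tree_parent or self.backup is None:
            return None
        stop = ("stop", self.backup)
        self.backup, self.down_since = None, None
        return stop

    def tick(self, now: float) -> None:
        """Y stayed silent past the timeout: adopt the backup as tree neighbor."""
        if self.down_since is not None and now - self.down_since >= self.timeout:
            self.tree_parent = self.backup
            self.backup, self.down_since = None, None
```

If Y recovers and resends bytes that the backup already supplied, `on_frame` discards them, so the application sees each byte exactly once.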

There are other variations on this theme, but the basic idea is that members can use the sequence numbers and various negotiation or notification protocols to manage transmission over the tree and allow transmission over non-tree links. The best set of mechanisms will probably depend on the needs of the application and will require considerable work and experimentation to tease out.



Paul Francis
Fri Oct 1 11:06:22 JST 1999