IMRG IPMP Review Team Report
----------------------------

September 2004

The Internet Measurement Research Group (within the IRTF) convened a
small team to review the materials related to the IP Measurement
Protocol (IPMP). The members of the group (listed at the end of this
report) discussed IPMP and several larger issues. In particular, the
team reviewed the following two Internet-Drafts:

    draft-mcgregor-ipmp-03.txt
    draft-bennett-ippm-ipmp-01.txt

The goal of this effort was to chart a strawman course for moving
forward with some sort of measurement protocol (if possible).

Note: This message represents the group's consensus. However, that
does not mean that each member of the team agrees with each point in
this note. The group reached rough agreement, not unanimity.

The following are the high-order bits from the discussion.

The fundamental challenge that measurement protocols attempt to
address is to give researchers and operators fine-grained information
about the network characteristics they want to understand, in a
lightweight fashion. To this end, we suggest that the IPMP effort
should aim to develop tools that are:

  - implementable in reasonable timeframes on existing equipment,
    which means that they should not depend on ASIC development or
    new equipment purchases

  - deployable; ISPs would ideally want them, and at minimum not turn
    them off

  - useful to the ISPs in terms of their business rules and the
    questions they ask about their own networks

If the procedures or protocols are useful to the ISPs, one can expect
that they will be willing to collect the data, and may, under
appropriate rules, also allow researchers to collect data or share
collected data with them.

In the above context, the team found the motivation for IPMP given in
both documents to be lacking --- to the point where the team did not
feel that the current proposals are viable. Several related and
supporting points were discussed:

  * From the perspective of a vendor developing equipment and
    protocols or an ISP deploying them, the IPMP proposals on the
    table do not look viable. The fundamental goal of IPMP is to
    expose the structure of a network and many of its fine-scale
    characteristics. This is information that a service provider does
    not share with anyone else except - maybe - under NDA. Given that
    the protocols to obtain the information are fairly complex and
    involve a fair number of memory writes, a vendor will implement
    them if and only if its ISP customers ask for them, and the ISPs
    are not asking.

  * Making a better ping or traceroute is, on the one hand, too
    narrow and mechanistic a focus, and yet also too focused on what
    researchers might find compelling rather than on what operators
    would.

  * A tool to reverse-engineer a network is not needed by the ISPs.
    They already know the structure of their own networks.

That said, the team *strongly* believes that there is much room for
improvement in the state of network troubleshooting and debugging.
In particular:

  * Some service providers are asking for a solution to a problem
    that may yield data researchers would find valuable. Within its
    own network, a service provider is generally interested in
    locating the links that introduce variability. It may view them
    as under-provisioned for the offered load, as inappropriately
    routed, or whatever, but it is in fact interested in locating
    links that require upgrading in some form.

  * Some service providers are asking (in TIA and related fora) how
    they can deploy SLAs that cross ISP boundaries. These may be
    among ISPs that form business coalitions, such as Teleglobe has
    tried to set up with its transit network customers, or among
    regional networks such as US RBOCs that view transitive SLAs as a
    rational approach. The watchword in such consortia is "trust but
    verify"; it is in their interest to have a procedure or protocol
    that will allow them to isolate issues that may prevent them from
    meeting SLA guarantees in something resembling real time. Since
    those SLAs are one-way, this means accurate one-way delay and
    jitter measurements host to host, POP to POP, or CPE to CPE (a
    sketch of such a measurement follows this list).
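To make that measurement concrete, here is a minimal sketch in C of
the standard smoothed interarrival-jitter estimator, in the style of
RFC 3550. The names and units are illustrative assumptions on our
part; nothing here comes from the IPMP drafts. The useful property is
that the unknown clock offset between the two hosts cancels when
consecutive transit times are differenced, so jitter, unlike one-way
delay itself, does not require synchronized clocks.

    #include <stdint.h>

    /* One probe as seen by the receiver: the sender's timestamp
       carried in the packet, plus the receiver's own arrival
       timestamp. The units must match (e.g. microseconds); the
       epochs need not. */
    struct probe {
        uint32_t send_ts;
        uint32_t recv_ts;
    };

    /* RFC 3550-style smoothed jitter update. Each transit time
       (recv_ts - send_ts) includes an unknown clock offset, but the
       offset cancels in the difference of consecutive transits. */
    static double update_jitter(double jitter,
                                const struct probe *prev,
                                const struct probe *cur)
    {
        int32_t d = (int32_t)((cur->recv_ts - cur->send_ts)
                            - (prev->recv_ts - prev->send_ts));
        if (d < 0)
            d = -d;
        return jitter + ((double)d - jitter) / 16.0;
    }

One-way delay itself is harder: it requires the endpoints' clocks to
be synchronized (e.g. via GPS or a well-disciplined NTP hierarchy),
which is one reason a consortium of cooperating ISPs is the natural
setting for such measurements.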
In addition, in looking at the protocols themselves, we found
ourselves wondering how much could be learned by clever inference
from fairly simple data collection and black-box measurement, as
opposed to explicit reporting of values. For example, we note that
the intention of such procedures as CalTech's FAST and MIT's XCP
protocols is to detect and measure variable delays in the network and
to cause traffic to be sent in such a way as to maximize throughput
while minimizing such delays. This fundamental question is a direct
corollary to that raised in
http://www.nwfusion.com/research/2002/1216isptestside1.html, and to
that raised in the context of transitive Tier 2 network SLAs. These
operators would like to be able to identify the existence of an SLA
failure or other disturbance in the Force on a route, report its
magnitude, and isolate the disturbing device. To that end, we wonder
what can be done with the numbers measured by Dina Katabi's XCP
protocol.

Finally, the team wondered if a protocol that carries less global
information but more precision would be more deployable. For example,
if the stamps just consisted of an opaque ID, a TTL, and a simple
32-bit counter running on "the most stable local frequency source",
then the ISP (with the engineering documentation for its own gear)
could use database techniques to compute everything carried by the
current protocol. The stamps are simple enough that we can, with a
straight face, ask for them in multiple places within one box: input
and output framers, bus DMA engines, etc. (a sketch of such a stamp
follows below). We can envision that this would be an extremely
valuable tool for an ISP to understand (and diagnose) certain QoS
properties of its own network.

Note that globally parsable metadata in the stamps probably has
negative value to most ISPs because it reduces an ISP's ability to
keep its assets private. The barrier to deployment is not so much the
cost of the implementation as the indirect cost of leaking
proprietary topology information. At the same time, external
researchers could use inference techniques to get some of the same
information, including most dynamic properties such as queue depths.
External users get much less topology information, unless they make
an explicit arrangement with the ISP to obtain the annotations
associated with the opaque IDs.
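As a rough illustration of how small such a stamp could be, here is a
sketch in C. The field names, widths, and layout are our assumptions
for the sake of illustration; neither draft defines such a format.

    #include <stdint.h>

    /* A minimal 12-byte stamp appended at each stamping point. The
       ID is opaque to everyone but the ISP, whose engineering
       database maps it to a specific interface, framer, or DMA
       engine. */
    struct ipmp_stamp {
        uint32_t id;        /* opaque identifier of the stamping
                               point */
        uint8_t  ttl;       /* packet's TTL when the stamp was
                               written */
        uint8_t  pad[3];    /* alignment padding */
        uint32_t counter;   /* free-running 32-bit counter driven by
                               the most stable local frequency
                               source; no global epoch or frequency
                               is implied */
    };

Because the counter has no globally interpretable units and the ID is
opaque, an outside observer learns little topology from the stamps
themselves; the ISP joins them against its own engineering
documentation offline, which is exactly the privacy property argued
for above.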
In summary, the team came to two points of consensus:

  1) the protocol is inadequately motivated by the proposals, even
     though ISPs would like to be able to measure their own and their
     neighbors' networks;

  2) the protocol's complexity and intrusiveness are inadequately
     justified with respect to other, potentially more lightweight
     approaches that may be easier to deploy.

The main point is that to get a protocol deployed, ISPs need to ask
for it loudly enough and router vendors need to be able to implement
it easily enough, and neither is argued by these proposals.

Review team members:

  Guy Almes (Internet2)
  Fred Baker (Cisco)
  Paul Barford (UWisc)
  Christophe Diot (Intel Research)
  Ralph Droms (Cisco)
  Larry Dunn (Cisco)
  Matt Mathis (PSC)
  David Moore (CAIDA)
  Jennifer Rexford (AT&T Research)
  Neil Spring (Univ. of Washington)

Scribe / team shepherd: Mark Allman (ICIR)