NAME

ipsumdump - produce ASCII summary of network traffic or tcpdump(1) file


SYNOPSIS

ipsumdump [-r | -i | ...] [--src, --dst, --sport, --dport, ...] [other options] [files or interfaces]


DESCRIPTION

The ipsumdump program reads IP packets from one or more data sources, then summarizes those packets into a line-based ASCII file. The resulting summary dump is easy to process with text-based tools. (But see the --binary option, which generates a smaller binary file.)

Here are a couple of lines of ipsumdump output, from 'ipsumdump -sd /home/kohler/largedump.gz':

  !IPSummaryDump 1.2
  !creator "ipsumdump -sd /home/kohler/largedump.gz"
  !host max.lcdf.org
  !runtime 1000943858.353723 (Wed Sep 19 16:57:38 2001)
  !data ip_src ip_dst
  64.55.139.202 209.247.204.242
  18.26.4.9 64.55.139.202

The '-sd' option, which is equivalent to '--src --dst', tells ipsumdump to log source and destination IP addresses. '/home/kohler/largedump.gz' is a compressed tcpdump(1) file. Each data line represents a packet; a space separates the two addresses. The '!data' comment describes the contents of each data line.
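
Because the '!data' comment names the columns, a reader can be written without hard-coding the field order. The following Python sketch is not part of ipsumdump; the function name is made up for illustration, and it assumes that no field contains embedded spaces (so it would not handle --payload output):

```python
def parse_summary_dump(lines):
    """Yield one dict per data line, keyed by the names on the '!data' line."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("!"):
            # Comment lines start with '!'; only '!data' affects parsing here.
            if line.startswith("!data "):
                fields = line.split()[1:]
            continue
        # Assumption: fields are separated by single spaces and contain none.
        yield dict(zip(fields, line.split()))


sample = """\
!IPSummaryDump 1.2
!data ip_src ip_dst
64.55.139.202 209.247.204.242
18.26.4.9 64.55.139.202
"""
records = list(parse_summary_dump(sample.splitlines()))
print(records[0]["ip_src"], "->", records[0]["ip_dst"])
```

Keying each record by field name means the same reader keeps working if the dump is regenerated with a different set of content options.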


OPTIONS

Data Sources

Data source options tell ipsumdump what kind of data source to use: tcpdump(1) raw-packet files (--tcpdump), live network interfaces (--interface), NetFlow summary files (--netflow-summary), ipsumdump output files (--ipsumdump), or files containing tcpdump's textual output (--tcpdump-text).

Non-option arguments specify the files, or interfaces, to read. For example, 'ipsumdump -r eth0 eth1' will read two tcpdump(1) files, named ``eth0'' and ``eth1''; 'ipsumdump -i eth0 eth1' will read from two live network interfaces, ``eth0'' and ``eth1''.

Options that read files -- --tcpdump, --netflow-summary, --ipsumdump, and --tcpdump-text -- read from the standard input when you supply a single dash - as a filename, or when you give no filenames at all.

--tcpdump, -r
Read from one or more files produced by tcpdump(1)'s -w option (also known as ``pcap files''). Stop when all the files are exhausted. This is the default. Files (except for standard input) may be compressed by gzip(1) or bzip2(1); ipsumdump will uncompress them on the fly.

--interface, -i
Read from live network interfaces. When run this way, ipsumdump will continue until interrupted with SIGINT or SIGTERM (see SIGNALS, below). When stopped, ipsumdump appends a comment to its output file, indicating how many packets were dropped by the kernel before output.

--netflow-summary
Read from one or more NetFlow summary files. These are line-oriented ASCII files; blank lines, and lines starting with '!' or '#', are ignored. Other lines should contain 15 or more fields separated by vertical bars '|'. Ipsumdump pays attention to some of these fields:
  Field  Meaning                       Example
  -----  ----------------------------  ----------
  0      Source IP address             192.4.1.32
  1      Destination IP address        18.26.4.44
  5      Packet count in flow          5
  6      Byte count in flow            10932
  7      Flow timestamp (UNIX-style)   998006995
  8      Flow end timestamp            998006999
  9      Source port                   3917
  10     Destination port              80
  12     TCP flags (OR of all pkts)    18
  13     IP protocol                   6
  14     IP TOS bits                   0
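
As an illustration of the layout in the table above, here is a Python sketch that pulls out the listed fields from one NetFlow summary line. It is not ipsumdump's own parser. The dictionary keys are labels chosen for this example, and the sample line fills the fields that the table does not describe (2, 3, 4, and 11) with placeholder zeros:

```python
# Field index -> label. The indexes come from the table above; the labels
# are names invented for this sketch.
NETFLOW_FIELDS = {
    0: "ip_src", 1: "ip_dst", 5: "packets", 6: "bytes",
    7: "start", 8: "end", 9: "sport", 10: "dport",
    12: "tcp_flags", 13: "ip_proto", 14: "ip_tos",
}


def parse_netflow_line(line):
    """Return a dict of the documented fields, or None for ignored lines."""
    line = line.strip()
    if not line or line[0] in "!#":
        return None  # blank lines and '!' or '#' lines are ignored
    parts = line.split("|")
    if len(parts) < 15:
        raise ValueError("expected 15 or more '|'-separated fields")
    return {name: parts[index] for index, name in NETFLOW_FIELDS.items()}


# Placeholder zeros stand in for the fields the table does not describe.
line = "192.4.1.32|18.26.4.44|0|0|0|5|10932|998006995|998006999|3917|80|0|18|6|0"
flow = parse_netflow_line(line)
print(flow["ip_src"], flow["dport"], flow["bytes"])
```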

--ipsumdump
Read from one or more ipsumdump files. Any packet characteristics not specified by the input files are set to 0.

--format=format
Read from one or more ipsumdump files, using the specified default format. The format should be a space-separated list of content types; see ToIPSummaryDump(n) for a list.

--tcpdump-text
Read from one or more files containing tcpdump(1) textual output. It's much better to use the binary files produced by 'tcpdump -w', but if someone threw those files away and all you have is the ASCII output, you can still make do. Only works with tcpdump versions 3.7 and earlier.

--nlanr
Read from one or more NLANR-formatted trace files (fr, fr+, or tsh format). See http://pma.nlanr.net/Traces/.

--dag
Read from one or more DAG-formatted trace files. Use this option for new-style ERF dumps (which contain encapsulation type information), or for old-style dumps with ATM encapsulation. See http://dag.cs.waikato.ac.nz/.

--dag-ppp
Read from one or more DAG-formatted trace files with old-style PPP encapsulation. See http://dag.cs.waikato.ac.nz/.

Dump Contents

These options determine what actually goes in the dump; supply an arbitrary number of them. In the output, fields are separated by spaces. If you say '-sd', for example -- or the equivalent '--src --dst' -- the dump's data lines will contain an IP source address, a space, and an IP destination address:

    192.168.1.101 18.26.4.44

If you supply no dump content options, ipsumdump will not create a summary dump. This may be useful if you're only interested in creating a tcpdump file with --write-tcpdump.

--timestamp, -t
Include packet timestamp in the dump. Example: 1000212480.005813. For NetFlow summary input, the packet timestamp equals the flow-end timestamp.

--first-timestamp, -T
Include flow-begin timestamp in the dump. Example: 1000212479.001937. This is meaningful only for packet sources that include flow-begin timestamps, such as NetFlow summaries.

--src, -s
Include IP source address in the dump. Example: 192.168.1.101.

--dst, -d
Include IP destination address in the dump. Example: 18.26.4.44.

--sport, -S
Include TCP or UDP source port in the dump. Example: 8928. For packets that are neither TCP nor UDP, and for fragments after the first, this field is a single dash -.

--dport, -D
Include TCP or UDP destination port in the dump. Example: 80.

--length, -l
Include packet length in the dump, not including any link-level headers. Example: 72.

--protocol, -p
Include IP protocol in the dump. Can be T for TCP, U for UDP, I for ICMP, or a number for some other protocol.

--id
Include IP ID field in the dump. Example: 19371.

--fragment, -g
Include IP fragment test in the dump. The field value is F for first fragments, f for second and subsequent fragments, and . (a single period) for nonfragments.

--fragment-offset, --fragoff, -G
Include IP fragment offset in the dump. The field value is the fragment offset in bytes, possibly followed by a + suffix, indicating the MF (more fragments) flag. Examples: 0+ (fragment offset 0, more fragments forthcoming), 552 (fragment offset 552, this is the last fragment).
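
A consumer of this field has to split off the optional + suffix before converting the offset. A minimal Python sketch follows; the function name is hypothetical, and a dash is treated as "no information" on the assumption that the field came from a bad or uncaptured header:

```python
def parse_fragoff(field):
    """Return (offset_in_bytes, more_fragments), or None for a dash."""
    if field == "-":
        return None
    more_fragments = field.endswith("+")
    digits = field[:-1] if more_fragments else field
    return int(digits), more_fragments


print(parse_fragoff("0+"))   # (0, True)
print(parse_fragoff("552"))  # (552, False)
```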

--tcp-seq, -Q
Include TCP sequence number in the dump. Example: 4009339012. For non-TCP packets, and for fragments after the first, this field is a single dash -.

--tcp-ack, -K
Include TCP acknowledgement number in the dump. Example: 4009339012. For non-TCP packets, and for fragments after the first, this field is a single dash -.

--tcp-flags, -F
Include TCP flags byte in the dump. Each flag is represented by an uppercase letter. Example: PA (PSH and ACK are on, everything else is off). If no flags are on, prints . (a single period). For non-TCP packets, and for fragments after the first, this field is a single dash -.

Flag characters are F for FIN, S for SYN, R for RST, P for PSH, A for ACK, U for URG, E for ECE (flag bit 6), C for CWR (flag bit 7), and N for Nonce Sum (flag bit 8).
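
The flag letters map directly onto flag names, so decoding the field is a table lookup. The Python sketch below is illustrative only; the function name and the returned representation (a set of names) are choices made for this example:

```python
# Letter -> flag name, following the list above.
FLAG_NAMES = {
    "F": "FIN", "S": "SYN", "R": "RST", "P": "PSH", "A": "ACK",
    "U": "URG", "E": "ECE", "C": "CWR", "N": "NS",
}


def decode_tcp_flags(field):
    """Return the set of flag names, or None when the field is a dash."""
    if field == "-":
        return None   # non-TCP packet, or a fragment after the first
    if field == ".":
        return set()  # TCP packet with no flags on
    return {FLAG_NAMES[letter] for letter in field}


print(sorted(decode_tcp_flags("PA")))  # ['ACK', 'PSH']
```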

--tcp-opt, -O
Include TCP options in the dump. Single TCP option fields have the following representations:
    EOL, NOP        No representation
    MSS             'mss1400'
    Window scale    'wscale10'
    SACK permitted  'sackok'
    SACK            'sack95-98'; each SACK block
                    is listed separately
    Timestamp       'ts669063908:38382731'
    Other options   '98' (option 98, no data),
                    '99=0:5:10' (option with data, data
                    octets separated by colons)

Multiple options are separated by commas. Any invalid option causes the entire field to be replaced by a single question mark '?'. A period '.' is used for packets with no options (except possibly EOL and NOP).
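
The representations above are regular enough to parse with a few patterns. This Python sketch (requires Python 3.8 or later) is one possible reader, not the one ipsumdump uses. The tuple layout is invented for the example, and the data octets of unrecognized options are left as strings because the numeric base is not stated here:

```python
import re


def parse_tcp_opt(field):
    """Return a list of option tuples, or None for '?' and '-'."""
    if field in ("?", "-"):
        return None  # invalid options, or no TCP header information
    if field == ".":
        return []    # no options (except possibly EOL and NOP)
    options = []
    for item in field.split(","):
        if item == "sackok":
            options.append(("sackok",))
        elif m := re.fullmatch(r"mss(\d+)", item):
            options.append(("mss", int(m.group(1))))
        elif m := re.fullmatch(r"wscale(\d+)", item):
            options.append(("wscale", int(m.group(1))))
        elif m := re.fullmatch(r"sack(\d+)-(\d+)", item):
            options.append(("sack", int(m.group(1)), int(m.group(2))))
        elif m := re.fullmatch(r"ts(\d+):(\d+)", item):
            options.append(("ts", int(m.group(1)), int(m.group(2))))
        elif m := re.fullmatch(r"(\d+)(?:=(.*))?", item):
            # Other option: number, then optional colon-separated octets.
            data = m.group(2).split(":") if m.group(2) else []
            options.append(("other", int(m.group(1)), data))
        else:
            raise ValueError("unrecognized TCP option: " + item)
    return options


print(parse_tcp_opt("mss1400,sackok,ts669063908:38382731"))
```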

--tcp-sack
Include SACK-related TCP options in the dump, using the format given under --tcp-opt, above.

--payload-length, -L
Include length of packet payload in the dump. This is the length of the TCP or UDP payload, for TCP or UDP packets, or the length of the IP payload, for other IP packets. Example: 1000.

--payload
Include the actual packet payload in the dump. This is the TCP or UDP payload, for TCP or UDP packets, or the IP payload, for other IP packets. Output as a double-quoted C string; non-ASCII characters, and double-quotes and backslashes, appear as C backslash escapes. Example: ",25\r\n\000".

--capture-length
Include the length of captured IP data in the dump. This can be less than the full IP length (see --length), since many packet capture programs will store only part of each packet's data.

--packet-count, -c
Include packet count in the dump. Some kinds of logs -- such as NetFlow summary logs -- record information about flows, not packets. A flow represents multiple packets; the packet count says exactly how many. Example: 1. See also --multipacket, below.

Other Options

--output=file, -o file
Write the summary dump to file instead of to the standard output.

--write-tcpdump=file, -w file
Write processed packets to a tcpdump(1) file -- or to the standard output, if file is a single dash - -- in addition to the usual summary output.

--binary, -b
Write the summary dump in binary format. See below for more information.

--filter=filter, -f filter
Only include packets and flows matching a tcpdump(1) filter. For example, 'ipsumdump -f ``tcp && src net 18/8''' will summarize data only for TCP packets from net 18. (The syntax for filter is currently a subset of tcpdump's syntax.)

--bad-packets
Print lines like '!bad IP header length 4' for packets with no IP headers, bad IP headers, or bad TCP/UDP headers. (A bad header has an incorrect length or unexpected version, or is spread across multiple fragments.) The '!bad' line will immediately precede the normal output line. Whether or not --bad-packets is true, a dash '-' is printed for any piece of information that came from a bad header, or that came from a portion of the header that was not captured.

--anonymize, -A
Anonymize IP addresses in the output. The anonymization preserves prefix and class. This means, first, that two anonymized addresses will share the same prefix when their non-anonymized counterparts share the same prefix; and second, that anonymized addresses will be in the same class (A, B, C, or D) as their non-anonymized counterparts. The anonymization algorithm comes from tcpdpriv(1); it works like 'tcpdpriv -A50 -C4'.

If --anonymize and --write-tcpdump are both on, the tcpdump output file will have anonymized IP addresses. However, the file will contain actual packet data, unlike tcpdpriv output.
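
The two preservation properties can be checked mechanically. The Python sketch below is not part of ipsumdump, and its helper names are made up for illustration. It compares every pair of addresses from the plain and anonymized dumps in the EXAMPLES section, pairing them by output line:

```python
import ipaddress
from itertools import combinations


def common_prefix_len(a, b):
    """Number of leading bits that two IPv4 addresses share."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def address_class(a):
    """Classful address class, judged from the first octet."""
    first = int(ipaddress.IPv4Address(a)) >> 24
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"
    return "E"


# Original -> anonymized pairs, read off the two dumps under EXAMPLES.
mapping = {
    "64.71.165.130": "29.50.142.215",
    "18.26.4.44": "89.142.236.79",
    "192.150.187.11": "204.224.59.219",
    "207.171.182.16": "192.230.64.231",
    "192.168.1.101": "204.196.101.50",
}

for a, b in combinations(mapping, 2):
    assert common_prefix_len(a, b) == common_prefix_len(mapping[a], mapping[b])
for original, anonymized in mapping.items():
    assert address_class(original) == address_class(anonymized)
print("prefix lengths and classes are preserved")
```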

--no-promiscuous
Do not place interfaces into promiscuous mode. Promiscuous mode is the default.

--sample=p
Sample packets with probability p. That is, p is the chance that a packet will cause output to be generated. The actual probability may differ from the specified probability, due to fixed-point arithmetic; check the output for a '!sampling_prob' comment to see the real probability. Strictly speaking, this option samples records, not packets; so for NetFlow summaries without --multipacket, it will sample flows.

--multipacket
Supply this option if you are reading NetFlow or IP summaries -- files where each record might represent multiple packets -- and you would like the output summary to have one line per packet, instead of the default one line per record. See also --packet-count, above.

--collate
Sort output packets by increasing timestamp. Use this option when reading from multiple tcpdump(1) files to ensure that the output has sorted timestamps. Combine --collate with --write-tcpdump to collate overlapping tcpdump(1) files into a single, sorted tcpdump(1) file.

--interval=time
Process packets for time, an interval length in seconds (or give a suffix like '2m' or '1hr'). For --interface, ipsumdump will quit after it has run for time. For other options, ipsumdump will quit before writing a packet whose timestamp is more than time seconds later than the timestamp on the first packet it sees.

--limit-packets=count
Output at most count packets, then quit.

--map-address=addrs
addrs is a space- or comma-separated list of IP addresses and/or prefixes. When the summary dump completes, ipsumdump will write those addresses to the standard error, paired with their anonymized counterparts.

--record-counts=time
Useful when reading from interfaces. This option causes ipsumdump to write a comment recording the cumulative number of packets output, and the number of packets dropped by the kernel before ipsumdump could process them, every time seconds. (Or you can say, for example, '2m' for 2 minutes.) A sample comment:
  !counts out 0 kdrop 0

This says that ipsumdump has output 0 records, and the kernel reported 0 packet drops since ipsumdump began.
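
A monitoring script might extract these counters as they appear. A small Python sketch follows, assuming the comment always consists of alternating names and integer values as in the sample; the function name is hypothetical:

```python
def parse_counts(line):
    """Return {'out': n, 'kdrop': m} for a '!counts' comment, else None."""
    words = line.split()
    if not words or words[0] != "!counts":
        return None
    # Assumption: the remaining words alternate between names and integers.
    return {name: int(value) for name, value in zip(words[1::2], words[2::2])}


print(parse_counts("!counts out 0 kdrop 0"))  # {'out': 0, 'kdrop': 0}
```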

--random-seed=seed
Set the random seed deterministically to seed, an unsigned integer. By default, the random seed is initialized to a random value using /dev/random, if it exists, combined with other data. The random seed indirectly determines which packets are sampled, and the values of anonymized IP addresses.

--no-mmap
Do not use memory mapping when reading files. This may prevent crashes if you feed ipsumdump a corrupted file. See BUGS, below.

--quiet, -q
Do not print a progress bar to standard error. This is the default when ipsumdump isn't running interactively.

--config
Do not produce a summary. Instead, write the Click configuration that ipsumdump would run to the standard output.

--verbose, -V
Produce more verbose error messages.

--help, -h
Print a help message to the standard output, then exit.

--version, -v
Print version number and license information to the standard output, then exit.


SIGNALS

When killed with SIGTERM or SIGINT, ipsumdump will exit cleanly by flushing its buffers. If you want it to flush its buffers without exiting, kill it with SIGHUP.


EXAMPLES

The '-tsSdDp' option set covers the most commonly useful information about each packet: timestamp, source address, source port, destination address, destination port, and protocol. Invoking 'ipsumdump -i eth1 -tsSdDp' might produce output like this:

  !IPSummaryDump 1.2
  !creator "ipsumdump -i eth1 -tsSdDp"
  !host max.lcdf.org
  !runtime 1000967293.569808 (Wed Sep 19 23:28:13 2001)
  !data timestamp ip_src sport ip_dst dport ip_proto
  1000967303.641581 64.71.165.130 80 192.168.1.101 4450 T
  1000967303.670506 64.71.165.130 80 192.168.1.101 4450 T
  1000967303.882621 18.26.4.44 - 192.168.1.101 - I
  1000967304.253874 64.71.165.130 80 192.168.1.101 4442 T
  1000967304.390016 192.150.187.11 53 192.168.1.101 1299 U
  1000967304.425992 207.171.182.16 80 192.168.1.101 4451 T

Here is the same data, anonymized with -A:

  !IPSummaryDump 1.2
  !creator "ipsumdump --ipsumdump -A -tsSdDp"
  !host max.lcdf.org
  !runtime 1000968019.67508 (Wed Sep 19 23:40:19 2001)
  !data timestamp ip_src sport ip_dst dport ip_proto
  1000967303.641581 29.50.142.215 80 204.196.101.50 4450 T
  1000967303.670506 29.50.142.215 80 204.196.101.50 4450 T
  1000967303.882621 89.142.236.79 - 204.196.101.50 - I
  1000967304.253874 29.50.142.215 80 204.196.101.50 4442 T
  1000967304.390016 204.224.59.219 53 204.196.101.50 1299 U
  1000967304.425992 192.230.64.231 80 204.196.101.50 4451 T


BINARY FORMAT

Binary ipsumdump files begin with several ASCII lines, just like regular ipsumdump files. The line '!binary' indicates that the rest of the file, starting immediately after the newline, consists of binary records. Each record looks like this:

   +---------------+------------...
   |X|record length|    data
   +---------------+------------...
    <---4 bytes--->

The initial word of data contains the record length in bytes. (All numbers in the file are stored in network byte order.) The record length includes the initial word itself, so the minimum valid record length is 4. The high-order bit 'X' is the metadata indicator. It is zero for regular packets and one for metadata lines.
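
The framing can be sketched in Python as follows. This is an illustration rather than ipsumdump's own reader. It assumes that 'X' is the most significant bit of the 32-bit word and that the stream is positioned just after the '!binary' line; the function name is made up:

```python
import io
import struct


def read_records(stream):
    """Yield (is_metadata, data) for each binary record in the stream."""
    while True:
        word = stream.read(4)
        if len(word) < 4:
            return  # end of file
        (value,) = struct.unpack("!I", word)  # network byte order
        is_metadata = bool(value >> 31)       # high-order bit 'X'
        length = value & 0x7FFFFFFF           # includes the 4-byte word
        if length < 4:
            raise ValueError("record length below the minimum of 4")
        yield is_metadata, stream.read(length - 4)


# Build two records by hand: one metadata line and one 8-byte packet record.
text = b"!bad IP header length 4\n"
packet = bytes(8)
blob = (struct.pack("!I", 0x80000000 | (4 + len(text))) + text
        + struct.pack("!I", 4 + len(packet)) + packet)
for is_metadata, data in read_records(io.BytesIO(blob)):
    print(is_metadata, len(data))
```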

Regular packet records have binary fields stored in the order indicated by the '!data' line, as follows:

   Field Name     Length  Description
   timestamp         8    timestamp sec, usec
   first_timestamp   8    timestamp sec, usec
   ip_src            4    source IP address
   ip_dst            4    destination IP address
   sport             2    source port
   dport             2    destination port
   ip_len            4    IP length field
   ip_proto          1    IP protocol
   ip_id             2    IP ID
   ip_frag           1    fragment descriptor
                          ('F', 'f', or '.')
   ip_fragoff        2    IP fragment offset field
   tcp_seq           4    TCP sequence number
   tcp_ack           4    TCP ack number
   tcp_flags         1    TCP flags
   tcp_opt           ?    TCP options
   tcp_sack          ?    TCP SACK options
   payload_len       4    payload length
   count             4    packet count

Each field is Length bytes long. Variable-length fields have Length '?' in the table; in a packet record, these fields consist of a single length byte, followed by that many bytes of data.
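
As a sketch of how a reader might walk one packet record, the Python below splits the record data into raw byte strings using the lengths from the table above. It is not ipsumdump code, and the function and variable names are invented for this example. Interpreting each field (for instance, unpacking a port as a 16-bit integer in network byte order) is left to the caller:

```python
import ipaddress
import struct

# Fixed lengths in bytes, copied from the table above. None marks a
# variable-length field, which starts with a single length byte.
FIELD_SIZES = {
    "timestamp": 8, "first_timestamp": 8, "ip_src": 4, "ip_dst": 4,
    "sport": 2, "dport": 2, "ip_len": 4, "ip_proto": 1, "ip_id": 2,
    "ip_frag": 1, "ip_fragoff": 2, "tcp_seq": 4, "tcp_ack": 4,
    "tcp_flags": 1, "tcp_opt": None, "tcp_sack": None,
    "payload_len": 4, "count": 4,
}


def split_fields(names, body):
    """Split record data into {field name: raw bytes}, in '!data' order."""
    fields, pos = {}, 0
    for name in names:
        size = FIELD_SIZES[name]
        if size is None:
            if pos >= len(body):
                raise ValueError("record too short for field " + name)
            size = body[pos]  # single length byte precedes the data
            pos += 1
        if pos + size > len(body):
            raise ValueError("record too short for field " + name)
        fields[name] = body[pos:pos + size]
        pos += size
    return fields


# Hand-built record data for '!data ip_src sport tcp_opt'.
body = bytes([18, 26, 4, 44]) + struct.pack("!H", 80) + bytes([2, 1, 1])
fields = split_fields(["ip_src", "sport", "tcp_opt"], body)
print(ipaddress.IPv4Address(fields["ip_src"]),
      struct.unpack("!H", fields["sport"])[0])
```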

The data stored in a metadata record is just an ASCII string, ending with newline, same as in a regular ASCII IPSummaryDump file. '!bad' records, for example, are stored this way.


CLICK

The ipsumdump program uses the Click modular router, an extensible system for processing packets. Click routers consist of C++ components called elements. While some elements run only in a Linux kernel, most can run either in the kernel or in user space, and there are user-level elements for reading packets from libpcap or from tcpdump files.

Ipsumdump creates and runs a simple user-level Click configuration. However, you don't need to install Click to run ipsumdump; the libclick directory contains all the relevant parts of Click, bundled into a library.

If you're curious, try running 'ipsumdump --config' with some other options to see the Click configuration ipsumdump would run.

This is, I think, a pleasant way to write a packet processor!


COMPATIBILITY

Version 1.0 of the IPSummaryDump ASCII file format expressed 'ip_fragoff' fields in units of 8 bytes. In version 1.1 and later, these fields are expressed in bytes.

Version 1.1 used W for CWR in tcp_flags fields. Early Version 1.0 releases printed a number between 0 and 255 for tcp_flags, or used X and Y for ECE and CWR. Versions 1.2 and later use C for CWR.

The names of '!data' fields were formerly printed in quotes, and could contain spaces, like the following:

  !data 'timestamp' 'ip src' 'sport' 'ip dst' 'dport' 'ip proto'

ipsumdump still understands files with the old format.


BUGS

Ipsumdump can use the mmap(2) system call to access files, which often has better performance. Unfortunately, if ipsumdump memory-maps a corrupt file, it may crash with a segmentation violation.


SEE ALSO

tcpdump(1), tcpdpriv(1), click(1)

See http://www.pdos.lcs.mit.edu/click/ for more on Click.


AUTHOR

Eddie Kohler <kohler@icir.org>, based on the Click modular router.

Extensive feedback and suggestions from Vern Paxson <vern@icir.org>. Anonymization algorithm from tcpdpriv(1) by Greg Minshall.