ipsumdump - produce ASCII summary of network traffic or tcpdump(1) file
ipsumdump [-r | -i | ...] [--src, --dst, --sport, --dport, ...] [other options] [files or interfaces]
The ipsumdump program reads IP packets from one or more data sources, then summarizes those packets into a line-based ASCII file. The resulting summary dump is easy to process with text-based tools. (But see the --binary option, which generates a smaller binary file.)
Here are a couple lines of ipsumdump output, from 'ipsumdump -sd /home/kohler/largedump.gz':
!IPSummaryDump 1.2
!creator "ipsumdump -sd /home/kohler/largedump.gz"
!host max.lcdf.org
!runtime 1000943858.353723 (Wed Sep 19 16:57:38 2001)
!data ip_src ip_dst
64.55.139.202 209.247.204.242
18.26.4.9 64.55.139.202
The '-sd' option, which is equivalent to '--src --dst', tells ipsumdump to log source and destination IP addresses. '/home/kohler/largedump.gz' is a compressed tcpdump(1) file. Each data line represents a packet; a space separates the two addresses. The '!data' comment describes the contents of each data line.
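Because the '!data' comment names the columns, a reader can be written without hard-coding the field order. The following is a minimal sketch, not part of ipsumdump; the function name parse_summary_dump is hypothetical, and it assumes that no field contains a space (which may not hold for quoted fields such as payloads).

```python
def parse_summary_dump(lines):
    """Split an ASCII summary dump into '!' comments and per-packet records."""
    meta = {}      # comment name -> rest of the comment line
    fields = []    # column names from the most recent '!data' comment
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("!"):
            key, _, value = line[1:].partition(" ")
            if key == "data":
                fields = value.split()
            else:
                meta[key] = value
            continue
        # Data line: pair each space-separated value with its column name.
        records.append(dict(zip(fields, line.split())))
    return meta, records


sample = """\
!IPSummaryDump 1.2
!creator "ipsumdump -sd /home/kohler/largedump.gz"
!host max.lcdf.org
!runtime 1000943858.353723 (Wed Sep 19 16:57:38 2001)
!data ip_src ip_dst
64.55.139.202 209.247.204.242
18.26.4.9 64.55.139.202
"""

meta, records = parse_summary_dump(sample.splitlines())
print(meta["host"])
for rec in records:
    print(rec["ip_src"], "->", rec["ip_dst"])
```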
Data source options tell ipsumdump what kind of data source to use: tcpdump(1) raw-packet files (--tcpdump), live network interfaces (--interface), NetFlow summary files (--netflow-summary), ipsumdump output files (--ipsumdump), or files containing tcpdump's textual output (--tcpdump-text).
Non-option arguments specify the files, or interfaces, to read. For example, 'ipsumdump -r eth0 eth1' will read two tcpdump(1) files, named "eth0" and "eth1"; 'ipsumdump -i eth0 eth1' will read from two live network interfaces, "eth0" and "eth1".
Options that read files -- --tcpdump, --netflow-summary, --ipsumdump, and --tcpdump-text -- read from the standard input when you supply a single dash '-' as a filename, or when you give no filenames at all.
Input files may be compressed with gzip(1) or bzip2(1); ipsumdump will uncompress them on the fly.
Field  Meaning                       Example
-----  ----------------------------  ----------
0      Source IP address             192.4.1.32
1      Destination IP address        18.26.4.44
5      Packet count in flow          5
6      Byte count in flow            10932
7      Flow timestamp (UNIX-style)   998006995
8      Flow end timestamp            998006999
9      Source port                   3917
10     Destination port              80
12     TCP flags (OR of all pkts)    18
13     IP protocol                   6
14     IP TOS bits                   0
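The field numbers in the NetFlow summary table can be used to pick values out of a record by position. This sketch is illustrative only: the text here does not state the field separator, so the vertical bar used below is an assumption, and the dictionary keys (packets, bytes, start, end, ip_tos and so on) are names invented for this example. Fields 2-4 and 11 are not described in the table and are ignored.

```python
# Field positions from the NetFlow summary table; key names are hypothetical.
NETFLOW_FIELDS = {
    0: "ip_src", 1: "ip_dst", 5: "packets", 6: "bytes", 7: "start", 8: "end",
    9: "sport", 10: "dport", 12: "tcp_flags", 13: "ip_proto", 14: "ip_tos",
}


def parse_netflow_line(line, sep="|"):
    """ASSUMPTION: fields are separated by `sep`; adjust it to match real input."""
    parts = line.strip().split(sep)
    return {name: parts[i] for i, name in NETFLOW_FIELDS.items() if i < len(parts)}


# Made-up record assembled from the Example column; undescribed fields are empty.
line = "192.4.1.32|18.26.4.44||||5|10932|998006995|998006999|3917|80||18|6|0"
flow = parse_netflow_line(line)
print(flow["ip_src"], flow["dport"], flow["tcp_flags"])
```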
These options determine what actually goes in the dump; supply an arbitrary number of them. In the output, fields are separated by spaces. If you say '-sd', for example -- or the equivalent '--src --dst' -- the dump's data lines will contain an IP source address, a space, and an IP destination address:
192.168.1.101 18.26.4.44
If you supply no dump content options, ipsumdump will not create a summary dump. This may be useful if you're only interested in creating a tcpdump file with --write-tcpdump.
1000212480.005813. For NetFlow summary input, the packet timestamp equals the flow-end timestamp.
1000212479.001937. This is meaningful only for packet sources that include flow-begin timestamps, such as NetFlow summaries.
192.168.1.101.
18.26.4.44.
8928. For packets that are neither TCP nor UDP, and for fragments after the first, this field is a single dash '-'.
80.
72.
T for TCP, U for UDP, I for ICMP, or a number for some other protocol.
19371.
F for first fragments, f for second and subsequent fragments, and . (a single period) for nonfragments.
+ suffix, indicating the MF (more fragments) flag. Examples: 0+ (fragment offset 0, more fragments forthcoming), 552 (fragment offset 552, this is the last fragment).
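The two fragment fields can be decoded with a few lines of code. This is a small sketch under the conventions described above; parse_fragoff and FRAG_KINDS are hypothetical names, not part of ipsumdump.

```python
def parse_fragoff(text):
    """Return (offset, more_fragments) for a fragment-offset field such as '0+' or '552'."""
    more = text.endswith("+")          # '+' suffix means the MF flag was set
    return int(text.rstrip("+")), more


# Meaning of the one-character fragment descriptor.
FRAG_KINDS = {"F": "first fragment", "f": "later fragment", ".": "not a fragment"}

print(parse_fragoff("0+"))
print(parse_fragoff("552"))
print(FRAG_KINDS["f"])
```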
4009339012. For non-TCP packets, and for fragments after the first, this field is a single dash '-'.
4009339012. For non-TCP packets, and for fragments after the first, this field is a single dash '-'.
PA (PSH and ACK are on, everything else is off). If no flags are on, prints . (a single period). For non-TCP packets, and for fragments after the first, this field is a single dash '-'.
Flag characters are F for FIN, S for SYN, R for RST, P for PSH, A for ACK, U for URG, E for ECE (flag bit 6), C for CWR (flag bit 7), and N for Nonce Sum (flag bit 8).
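A flags field can be expanded into flag names with a lookup table. The sketch below follows the characters listed above; decode_tcp_flags is a hypothetical helper, and returning None for a dash is just one possible way to represent "not applicable".

```python
FLAG_NAMES = {
    "F": "FIN", "S": "SYN", "R": "RST", "P": "PSH", "A": "ACK",
    "U": "URG", "E": "ECE", "C": "CWR", "N": "Nonce Sum",
}


def decode_tcp_flags(field):
    """Expand a flags field such as 'PA' into a list of flag names."""
    if field == "-":
        return None            # non-TCP packet, or a fragment after the first
    if field == ".":
        return []              # TCP packet with no flags set
    return [FLAG_NAMES[ch] for ch in field]


print(decode_tcp_flags("PA"))
```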
EOL, NOP         No representation
MSS              'mss1400'
Window scale     'wscale10'
SACK permitted   'sackok'
SACK             'sack95-98'; each SACK block is listed separately
Timestamp        'ts669063908:38382731'
Other options    '98' (option 98, no data), '99=0:5:10' (option with data, data octets separated by colons)
Multiple options are separated by commas. Any invalid option causes the entire field to be replaced by a single question mark '?'. A period '.' is used for packets with no options (except possibly EOL and NOP).
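The option syntax above is regular enough to parse with simple prefix checks. The following is a sketch only: parse_tcp_opt is a hypothetical name, the two timestamp numbers are returned as an uninterpreted pair, and data octets of unknown options are kept as strings because the text does not say which number base they use.

```python
def parse_tcp_opt(field):
    """Parse a TCP options field into a list of (kind, value) tuples."""
    if field in ("?", "-"):
        return None                      # invalid option, or information unavailable
    if field == ".":
        return []                        # no options (except possibly EOL and NOP)
    options = []
    for item in field.split(","):
        if item.startswith("mss"):
            options.append(("mss", int(item[3:])))
        elif item.startswith("wscale"):
            options.append(("wscale", int(item[6:])))
        elif item == "sackok":           # must be tested before the 'sack' prefix
            options.append(("sackok", None))
        elif item.startswith("sack"):
            left, right = item[4:].split("-")
            options.append(("sack", (int(left), int(right))))
        elif item.startswith("ts"):
            first, second = item[2:].split(":")
            options.append(("ts", (int(first), int(second))))
        else:                            # '98' or '99=0:5:10'
            kind, _, data = item.partition("=")
            options.append((int(kind), data.split(":") if data else []))
    return options


print(parse_tcp_opt("mss1400,wscale10,sackok"))
```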
--tcp-opt, above.
1000.
",25\r\n\000".
1. See also --multipacket, below.
tcpdump(1) file -- or to the standard output, if file is a single dash '-' -- in addition to the usual summary output.
tcpdump(1) filter. For example, 'ipsumdump -f "tcp && src net 18/8"' will summarize data only for TCP packets from net 18. (The syntax for filter is currently a subset of tcpdump's syntax.)
'!bad IP header length 4' for packets with no IP headers, bad IP headers, or bad TCP/UDP headers. (A bad header has an incorrect length or unexpected version, or is spread across multiple fragments.) The '!bad' line will immediately precede the normal output line. Whether or not --bad-packets is true, a dash '-' is printed for any piece of information that came from a bad header, or that came from a portion of the header that was not captured.
If --anonymize and --write-tcpdump are both on, the tcpdump output file will have anonymized IP addresses. However, the file will contain actual packet data, unlike tcpdpriv output.
'!sampling_prob' comment to see the real probability. Strictly speaking, this option samples records, not packets; so for NetFlow summaries without --multipacket, it will sample flows.
tcpdump(1) files to ensure that the output has sorted timestamps. Combine --collate with --write-tcpdump to collate overlapping tcpdump(1) files into a single, sorted tcpdump(1) file.
'2m' or '1hr'). For --interface, ipsumdump will quit after it has run for time. For other options, ipsumdump will quit before writing a packet whose timestamp is more than time seconds later than the timestamp on the first packet it sees.
!counts out 0 kdrop 0
This says that ipsumdump has output 0 records, and the kernel reported 0 packet drops since ipsumdump began.
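A monitoring script could pick these counters out of the output as follows. This is a sketch that assumes the line always consists of '!counts' followed by name/value pairs, as in the example above; parse_counts is a hypothetical helper.

```python
def parse_counts(line):
    """Parse a '!counts' comment into a dict of integer counters."""
    tokens = line.split()
    if not tokens or tokens[0] != "!counts":
        raise ValueError("not a !counts line")
    # Remaining tokens alternate between counter name and value.
    return {name: int(value) for name, value in zip(tokens[1::2], tokens[2::2])}


print(parse_counts("!counts out 0 kdrop 0"))
```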
When killed with SIGTERM or SIGINT, ipsumdump will exit cleanly by flushing its buffers. If you want it to flush its buffers without exiting, kill it with SIGHUP.
The '-tsSdDp' option set covers the most commonly useful information about each packet: timestamp, source address, source port, destination address, destination port, and protocol. Invoking 'ipsumdump -i eth1 -tsSdDp' might produce output like this:
!IPSummaryDump 1.2
!creator "ipsumdump -i eth1 -tsSdDp"
!host max.lcdf.org
!runtime 1000967293.569808 (Wed Sep 19 23:28:13 2001)
!data timestamp ip_src sport ip_dst dport ip_proto
1000967303.641581 64.71.165.130 80 192.168.1.101 4450 T
1000967303.670506 64.71.165.130 80 192.168.1.101 4450 T
1000967303.882621 18.26.4.44 - 192.168.1.101 - I
1000967304.253874 64.71.165.130 80 192.168.1.101 4442 T
1000967304.390016 192.150.187.11 53 192.168.1.101 1299 U
1000967304.425992 207.171.182.16 80 192.168.1.101 4451 T
Here is the same data, anonymized with -A:
!IPSummaryDump 1.2
!creator "ipsumdump --ipsumdump -A -tsSdDp"
!host max.lcdf.org
!runtime 1000968019.67508 (Wed Sep 19 23:40:19 2001)
!data timestamp ip_src sport ip_dst dport ip_proto
1000967303.641581 29.50.142.215 80 204.196.101.50 4450 T
1000967303.670506 29.50.142.215 80 204.196.101.50 4450 T
1000967303.882621 89.142.236.79 - 204.196.101.50 - I
1000967304.253874 29.50.142.215 80 204.196.101.50 4442 T
1000967304.390016 204.224.59.219 53 204.196.101.50 1299 U
1000967304.425992 192.230.64.231 80 204.196.101.50 4451 T
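In the two dumps above, each real address is replaced by the same anonymized address every time it appears, while timestamps, ports, and protocols are unchanged. The sketch below checks that property on a few of the sample lines; address_mapping is a hypothetical helper written for this illustration and says nothing about how the anonymization itself works.

```python
def address_mapping(original_rows, anonymized_rows):
    """Pair addresses row by row; fail if one address maps to two different values."""
    mapping = {}
    for orig, anon in zip(original_rows, anonymized_rows):
        o, a = orig.split(), anon.split()
        for col in (1, 3):               # ip_src and ip_dst columns of -tsSdDp output
            if mapping.setdefault(o[col], a[col]) != a[col]:
                raise ValueError(f"{o[col]} maps to more than one address")
    return mapping


original = [
    "1000967303.641581 64.71.165.130 80 192.168.1.101 4450 T",
    "1000967303.882621 18.26.4.44 - 192.168.1.101 - I",
    "1000967304.390016 192.150.187.11 53 192.168.1.101 1299 U",
]
anonymized = [
    "1000967303.641581 29.50.142.215 80 204.196.101.50 4450 T",
    "1000967303.882621 89.142.236.79 - 204.196.101.50 - I",
    "1000967304.390016 204.224.59.219 53 204.196.101.50 1299 U",
]

mapping = address_mapping(original, anonymized)
print(mapping["192.168.1.101"])
```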
Binary ipsumdump files begin with several ASCII lines, just like regular ipsumdump files. The line '!binary' indicates that the rest of the file, starting immediately after the newline, consists of binary records. Each record looks like this:
+---------------+------------...
|X|record length|    data
+---------------+------------...
 <---4 bytes--->
The initial word of data contains the record length in bytes. (All numbers in the file are stored in network byte order.) The record length includes the initial word itself, so the minimum valid record length is 4. The high-order bit 'X' is the metadata indicator. It is zero for regular packets and one for metadata lines.
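The record framing can be read with a short loop. This is a sketch, not ipsumdump's own reader: read_records is a hypothetical name, it assumes the record length occupies the low 31 bits of the initial word (as the diagram suggests), and it expects a stream already positioned just after the '!binary' line. The sample bytes are constructed for the illustration.

```python
import io
import struct


def read_records(stream):
    """Yield (is_metadata, data) for each binary record in the stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return                               # end of input
        (word,) = struct.unpack("!I", header)    # network byte order
        is_metadata = bool(word >> 31)           # high-order bit 'X'
        length = word & 0x7FFFFFFF               # includes the 4-byte word itself
        if length < 4:
            raise ValueError("record length below the 4-byte minimum")
        data = stream.read(length - 4)
        if len(data) < length - 4:
            raise ValueError("truncated record")
        yield is_metadata, data


# Two made-up records: one packet record and one metadata line.
packet = bytes([64, 55, 139, 202, 209, 247, 204, 242])
note = b"!bad IP header length 4\n"
blob = struct.pack("!I", 4 + len(packet)) + packet
blob += struct.pack("!I", 0x80000000 | (4 + len(note))) + note

for is_meta, data in read_records(io.BytesIO(blob)):
    print(is_meta, len(data))
```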
Regular packet records have binary fields stored in the order indicated by the '!data' line, as follows:
Field Name       Length  Description
timestamp        8       timestamp sec, usec
first_timestamp  8       timestamp sec, usec
ip_src           4       source IP address
ip_dst           4       destination IP address
sport            2       source port
dport            2       destination port
ip_len           4       IP length field
ip_proto         1       IP protocol
ip_id            2       IP ID
ip_frag          1       fragment descriptor ('F', 'f', or '.')
ip_fragoff       2       IP fragment offset field
tcp_seq          4       TCP sequence number
tcp_ack          4       TCP ack number
tcp_flags        1       TCP flags
tcp_opt          ?       TCP options
tcp_sack         ?       TCP SACK options
payload_len      4       payload length
count            4       packet count
Each field is Length bytes long. Variable-length fields have Length '?' in the table; in a packet record, these fields consist of a single length byte, followed by that many bytes of data.
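Given the field names from the '!data' line, a packet record's data can be unpacked field by field. The sketch below takes its field lengths from the table above; decode_record is a hypothetical helper, and splitting each 8-byte timestamp into two unsigned 32-bit integers (seconds, microseconds) is an assumption. Variable-length fields are returned as raw bytes.

```python
import socket
import struct

# Field name -> struct format, with lengths as given in the field table.
FIXED = {
    "timestamp": "!II", "first_timestamp": "!II",   # ASSUMPTION: 32-bit sec + 32-bit usec
    "ip_src": "!4s", "ip_dst": "!4s",
    "sport": "!H", "dport": "!H", "ip_len": "!I", "ip_proto": "!B",
    "ip_id": "!H", "ip_frag": "!c", "ip_fragoff": "!H",
    "tcp_seq": "!I", "tcp_ack": "!I", "tcp_flags": "!B",
    "payload_len": "!I", "count": "!I",
}
VARIABLE = {"tcp_opt", "tcp_sack"}


def decode_record(data, field_names):
    """Decode one packet record's data according to the '!data' field order."""
    out, pos = {}, 0
    for name in field_names:
        if name in VARIABLE:
            n = data[pos]                            # single length byte
            out[name] = data[pos + 1:pos + 1 + n]    # followed by n bytes of data
            pos += 1 + n
            continue
        fmt = FIXED[name]
        values = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        if name in ("ip_src", "ip_dst"):
            out[name] = socket.inet_ntoa(values[0])
        else:
            out[name] = values[0] if len(values) == 1 else values
    return out


# Made-up record for '!data timestamp ip_src ip_dst sport dport ip_proto'.
names = ["timestamp", "ip_src", "ip_dst", "sport", "dport", "ip_proto"]
data = struct.pack("!II4s4sHHB", 1000967303, 641581,
                   socket.inet_aton("64.71.165.130"),
                   socket.inet_aton("192.168.1.101"), 80, 4450, 6)
rec = decode_record(data, names)
print(rec["ip_src"], rec["sport"], rec["ip_proto"])
```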
The data stored in a metadata record is just an ASCII string, ending with newline, same as in a regular ASCII IPSummaryDump file. '!bad' records, for example, are stored this way.
The ipsumdump program uses the Click modular router, an extensible system for processing packets. Click routers consist of C++ components called elements. While some elements run only in a Linux kernel, most can run either in the kernel or in user space, and there are user-level elements for reading packets from libpcap or from tcpdump files.
Ipsumdump creates and runs a simple user-level Click configuration. However, you don't need to install Click to run ipsumdump; the libclick directory contains all the relevant parts of Click, bundled into a library.
If you're curious, try running 'ipsumdump --config' with some other options to see the Click configuration ipsumdump would run.
This is, I think, a pleasant way to write a packet processor!
Version 1.0 of the IPSummaryDump ASCII file format expressed 'ip_fragoff' fields in units of 8 bytes. In version 1.1 and later, these fields are expressed in bytes.
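A tool that reads dumps of mixed versions could normalize the fragment offset to bytes like this. The sketch uses a hypothetical helper, fragoff_to_bytes, and assumes that a trailing '+' (if present) should be preserved unchanged; the version string would come from the file's '!IPSummaryDump' line.

```python
def fragoff_to_bytes(field, version):
    """Normalize an ip_fragoff field to bytes, keeping any '+' suffix."""
    suffix = "+" if field.endswith("+") else ""
    offset = int(field.rstrip("+"))
    if version == "1.0":
        offset *= 8                      # version 1.0 counted in units of 8 bytes
    return f"{offset}{suffix}"


print(fragoff_to_bytes("69", "1.0"))
print(fragoff_to_bytes("552", "1.2"))
```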
Version 1.1 used W for CWR in tcp_flags fields. Early Version 1.0 releases printed a number between 0 and 255 for tcp_flags, or used X and Y for ECE and CWR. Version 1.2 and later use C for CWR.
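Older tcp_flags encodings can be rewritten into the current letters. This is a sketch under stated assumptions: normalize_tcp_flags is a hypothetical helper, numeric values are treated as the low eight bits of the TCP flags with FIN as bit 0 (consistent with ECE being bit 6 and CWR bit 7), and the letters are emitted in ascending bit order, which may not match ipsumdump's own ordering.

```python
BIT_LETTERS = "FSRPAUEC"     # bit 0 (FIN) through bit 7 (CWR)


def normalize_tcp_flags(field):
    """Rewrite an old-style tcp_flags field using the version 1.2 letters."""
    if field in ("-", "."):
        return field
    if field.isdigit():                      # early 1.0: number between 0 and 255
        value = int(field)
        letters = "".join(ch for bit, ch in enumerate(BIT_LETTERS)
                          if value & (1 << bit))
        return letters or "."
    # Later 1.0 used X/Y for ECE/CWR; 1.1 used W for CWR.
    return field.translate(str.maketrans("XYW", "ECC"))


print(normalize_tcp_flags("24"))
print(normalize_tcp_flags("PAW"))
```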
The names of '!data' fields were formerly printed in quotes, and could contain spaces, like the following:
!data 'timestamp' 'ip src' 'sport' 'ip dst' 'dport' 'ip proto'
ipsumdump still understands files with the old format.
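A reader that wants to accept both forms of the '!data' line can lean on shell-style tokenizing. In this sketch, parse_data_line is a hypothetical helper, and mapping the old names to current ones by replacing spaces with underscores is an assumption based on the example above (for instance 'ip src' versus ip_src).

```python
import shlex


def parse_data_line(line):
    """Return field names from a '!data' comment in either the old or the current form."""
    tokens = shlex.split(line)           # handles the single-quoted old-style names
    if not tokens or tokens[0] != "!data":
        raise ValueError("not a !data line")
    return [name.replace(" ", "_") for name in tokens[1:]]


old = "!data 'timestamp' 'ip src' 'sport' 'ip dst' 'dport' 'ip proto'"
new = "!data timestamp ip_src sport ip_dst dport ip_proto"
print(parse_data_line(old) == parse_data_line(new))
```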
Ipsumdump can use the mmap(2) system call to access files, which often has better performance. Unfortunately, if ipsumdump memory-maps a corrupt file, it may crash with a segmentation violation.
tcpdump(1), tcpdpriv(1), click(1)
See http://www.pdos.lcs.mit.edu/click/ for more on Click.
Eddie Kohler <kohler@icir.org>, based on the Click modular router.
Extensive feedback and suggestions from Vern Paxson <vern@icir.org>.
Anonymization algorithm from tcpdpriv(1) by Greg Minshall.