`libpcapnav` Manual

Introduction

Table of Contents
What is libpcapnav?
How does it work?

Welcome! You're looking at the manual for libpcapnav. Thanks for reading this.

What is `libpcapnav`?

libpcapnav is a libpcap wrapper library that allows navigation to arbitrary locations in a tcpdump trace file between reads. The API is intentionally much like that of the pcap library. You can navigate in trace files both in time and space: you can jump to a packet roughly two-thirds of the way into the trace, or you can jump as closely as possible to a packet with a given timestamp, and then read packets from there. In addition, the API provides convenience functions for manipulating timeval structures.
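To make the link between navigating "in time" and "in space" concrete, here is a small sketch of the kind of timeval arithmetic involved: comparing two timestamps, subtracting them, and expressing a timestamp as a fraction of the time span a trace covers. The function names (tv_cmp, tv_sub, tv_to_seconds, tv_fraction) are made up for this illustration and are not libpcapnav's actual API.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helpers for illustration only; not libpcapnav's API. */

/* Returns -1, 0 or 1 if a is earlier than, equal to, or later than b. */
int tv_cmp(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_usec != b->tv_usec)
        return a->tv_usec < b->tv_usec ? -1 : 1;
    return 0;
}

/* res = a - b. Assumes a >= b and 0 <= tv_usec < 1000000 in both. */
void tv_sub(const struct timeval *a, const struct timeval *b,
            struct timeval *res)
{
    assert(tv_cmp(a, b) >= 0);
    res->tv_sec = a->tv_sec - b->tv_sec;
    res->tv_usec = a->tv_usec - b->tv_usec;
    if (res->tv_usec < 0) {          /* borrow one second */
        res->tv_sec -= 1;
        res->tv_usec += 1000000;
    }
}

double tv_to_seconds(const struct timeval *tv)
{
    return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/* Fraction of the span [start, end] at which t lies, clamped to [0, 1]. */
double tv_fraction(const struct timeval *start, const struct timeval *end,
                   const struct timeval *t)
{
    struct timeval span, part;

    if (tv_cmp(t, start) <= 0)
        return 0.0;
    if (tv_cmp(t, end) >= 0)
        return 1.0;
    tv_sub(end, start, &span);
    tv_sub(t, start, &part);
    return tv_to_seconds(&part) / tv_to_seconds(&span);
}
```

A fraction computed this way is only a first guess at a file position, since packets are rarely spread evenly over time. A timestamp-based jump still has to resynchronize and refine the position afterwards.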

Like libpcap, this library handles things through an opaque handle struct. For trace file navigation and reading packets, this handle is enough. If you need to apply BPF filters or write packets to disk, you can access the familiar pcap handle that is used internally.

How does it work?

At the core of libpcapnav is the ability to resynchronize to the sequence of packets contained in a tcpdump trace file at an arbitrary location of the file position indicator. The algorithm is based on Vern Paxson's method from the tcpslice tool, which basically works as follows: the point near which the file position indicator is to be synchronized with the packet sequence is undershot a little bit, as it is much easier to scan forwards to the desired location once the packet sequence has been detected.

The file is scanned from that initial offset in single-byte steps, at each step assuming a libpcap packet header is present and sanity-checking the values read. Several checks analyze this potential header for sane timestamps, capture lengths etc. If the header appears valid, the next packet header is examined in a similar fashion, at the offset that the checked header implies. If a sequence of three packets seems valid, the algorithm considers the file position pointer to be synchronized with the packet flow and scans as closely as possible to the desired location. If the synchronization point is supposed to be a packet with a given timestamp, some interpolation is done and the process repeated until the packet closest to the desired timestamp has been found.
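The single-byte scan and the chain-of-three check described above can be sketched roughly as follows. This is a minimal illustration, not the code of tcpslice or libpcapnav. All names (rec_hdr, limits, header_looks_sane, chain_length, resync) are hypothetical. It works on an in-memory buffer, assumes the classic 16-byte per-packet header in host byte order, and uses simple made-up plausibility limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-packet header as stored in a classic pcap trace (16 bytes). */
struct rec_hdr {
    uint32_t ts_sec;   /* timestamp, seconds */
    uint32_t ts_usec;  /* timestamp, microseconds */
    uint32_t caplen;   /* bytes of packet data stored in the file */
    uint32_t len;      /* length of the packet on the wire */
};

/* Hypothetical plausibility limits for the sanity checks. */
struct limits {
    uint32_t snaplen;  /* maximum capture length for this trace */
    uint32_t min_sec;  /* earliest plausible timestamp */
    uint32_t max_sec;  /* latest plausible timestamp */
};

/* Returns 1 if the header values are plausible, 0 otherwise. */
int header_looks_sane(const struct rec_hdr *h, const struct limits *lim)
{
    if (h->ts_usec >= 1000000)
        return 0;
    if (h->ts_sec < lim->min_sec || h->ts_sec > lim->max_sec)
        return 0;
    if (h->caplen == 0 || h->caplen > lim->snaplen)
        return 0;
    if (h->caplen > h->len)
        return 0;
    return 1;
}

/* Counts consecutive plausible headers starting at off, up to max_chain.
 * Each header's caplen tells us where the next header should start. */
int chain_length(const unsigned char *buf, size_t size, size_t off,
                 const struct limits *lim, int max_chain)
{
    int n = 0;

    assert(buf != NULL && max_chain > 0);
    while (n < max_chain && off <= size &&
           size - off >= sizeof(struct rec_hdr)) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof h);  /* avoid unaligned access */
        if (!header_looks_sane(&h, lim))
            break;
        n++;
        off += sizeof h + h.caplen;
    }
    return n;
}

/* Scans forward one byte at a time from start. Returns the first offset
 * that begins a chain of three plausible headers, or -1 if none is found. */
long resync(const unsigned char *buf, size_t size, size_t start,
            const struct limits *lim)
{
    size_t off;

    for (off = start; off + sizeof(struct rec_hdr) <= size; off++)
        if (chain_length(buf, size, off, lim, 3) == 3)
            return (long)off;
    return -1;
}
```

In this sketch, a caller would pass an offset slightly before the desired position as start, matching the deliberate undershoot described above, and then read forward from the returned offset.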

libpcapnav's algorithm contains a few modifications that are explained in gory detail in the Netdude Freenix paper, and briefly listed here:

libpcapnav doesn't use Vern's state-machine approach to determine definitive header matches. I've done a lot of my testing with a trace that was captured while NFS-copying another trace file, thus containing lots of "bogus" headers to make things fun. This data causes a number of nasty problems, such as large snaplens in the captured data (where a single packet may contain many smaller packets) or payload packets whose caplen causes the next packet to be read directly from the next valid header. Much of this should be caught through invalid timestamps, but this is not 100% reliable.
To rectify this, libpcapnav uses a different approach: once a header is found that does not instantly appear to be invalid, the chain of packets that it starts is followed, up to a maximum number of packets or until we're out of buffer space.
For this, buffers already containing data loaded from disk are used as much as possible, but when this buffer doesn't suffice, more data is loaded from disk. The hope is that most attempts will point to invalid headers anyway so that this additional load never happens unless we have good reason to believe we've actually found a good header. The difference between PCAPNAV_PERHAPS and PCAPNAV_DEFINITELY (explained in detail later in this document) is then based on the length of the chain found.
While checking headers, the best valid header (i.e. the one with the longest chain) is remembered, as well as the offset in the trace at which the successor of this packet starts, so that the successor isn't confused with a "new" good header.
The fun part, without doubt, is header clashes. A clash in this new system occurs when two headers have the same, maximum, chain length and the same level of reliability of the chain lengths. The chain search could have been stopped either because we were out of buffer space or because we hit the limit of packets we check; the latter is considered more reliable.
If we hit a clash, we simply forget the old best match and keep looking after the clash packet. If we cannot find any better headers afterwards, we return a clash; otherwise we return the best match found afterwards.
I've seen traces with rather strange final packet headers, containing invalid caplen/len field values and packet data. To make sure we don't miss the last few correct packet headers, I've added some padding space and thus start looking for the last packet in the trace a bit earlier in the file. As the last packet's timestamp and offset are buffered in the pcapnav_t handle anyway, this performance hit is probably negligible.
To find the last packet in a trace, we now go back a lot more from the end of a trace, then find a packet more reliably by using the chain approach described above, and then use pcap to iterate to the last valid packet. Slower, but safer.
A buffer abstraction was introduced to help reduce the number of local variables and parameters to functions. See pcapnav_buf.h.
The original tcpslice version used the PACKET_HDR_LEN macro, yielding the size of a struct pcap_pkthdr, even when the trace file at hand actually uses the extended, larger patched headers.
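The best-match bookkeeping and clash handling from the list above can be sketched like this. It is one possible reading of the description, not libpcapnav's implementation. All names (candidate, search, search_offer and so on) are hypothetical. The sketch assumes that candidate headers arrive in file order with their chain length already computed, that a chain stopped by the packet limit beats an equally long chain stopped by lack of buffer space, and that the search starts afresh after a clash.

```c
#include <assert.h>
#include <stddef.h>

enum result { RES_NONE, RES_MATCH, RES_CLASH };

/* One header that passed the initial sanity checks (hypothetical layout). */
struct candidate {
    size_t offset;     /* where the header starts in the trace */
    size_t next;       /* where its successor packet starts */
    int    chain_len;  /* number of plausible packets in its chain */
    int    hit_limit;  /* 1: stopped at the packet limit (more reliable),
                          0: stopped because buffer space ran out */
};

struct search {
    enum result      state;
    struct candidate best;    /* meaningful only if state == RES_MATCH */
    size_t           expect;  /* next packet offset of the chain we follow */
};

void search_init(struct search *s)
{
    s->state = RES_NONE;
    s->expect = 0;
}

/* a is strictly better than b: longer chain, or equal length but more
 * reliable (the tie-break is an assumption of this sketch). */
static int better(const struct candidate *a, const struct candidate *b)
{
    if (a->chain_len != b->chain_len)
        return a->chain_len > b->chain_len;
    return a->hit_limit > b->hit_limit;
}

/* Feed the next candidate, in file order, into the search. */
void search_offer(struct search *s, const struct candidate *c)
{
    assert(c->next > c->offset);

    /* The successor of a packet we already know about is not a "new"
     * good header; just move the expected offset along the chain. */
    if (s->state != RES_NONE && c->offset == s->expect) {
        s->expect = c->next;
        return;
    }
    /* Nothing found yet, or starting afresh after a clash. */
    if (s->state != RES_MATCH) {
        s->state = RES_MATCH;
        s->best = *c;
        s->expect = c->next;
        return;
    }
    /* Same chain length and same reliability: a clash. Forget the old
     * best match and keep looking after the clash packet. */
    if (c->chain_len == s->best.chain_len &&
        c->hit_limit == s->best.hit_limit) {
        s->state = RES_CLASH;
        s->expect = c->next;
        return;
    }
    if (better(c, &s->best)) {
        s->best = *c;
        s->expect = c->next;
    }
}
```

Once all candidates have been offered, state says whether nothing was found, a best match is available in best, or the search ended in an unresolved clash.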
