1 Introduction

This document describes the traces recorded during the research activities described in the paper "Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems", published at the ACM Internet Measurement Conference 2004 [1]. The traces were recorded by a modified version of the Mutella implementation [2] of the Gnutella protocol [3] in the period from 03/17/2004 to 04/23/2004. Events recorded in the traces include connection establishment and shutdown as well as all types of Gnutella messages (ping, pong, query, query hit) received at the measurement peer. To get a more complete view of the files shared by other peers, the measurement peer repeats every query it receives from remote peers and records the corresponding query hit messages. For more details on the traces, see [1].

2 Privacy Protection

To protect the privacy of Gnutella users, all IP addresses that appear in the traces are mapped to random IP addresses. The mapping procedure described below was designed to preserve as much information as possible, e.g., on geographical location and network structure. Some private or reserved IP addresses are not mapped; these addresses are also listed below.

2.1 Mapping Methodology

In order to preserve information about network structure, address mapping is applied only to the network id part of the IPv4 address. For example, for an IP address located in a class A network, the first 8 bits of the address (the network id) are replaced by a random network id of a class A network (i.e., a random number from [0, 127)), while the remainder of the address (bits 9-32) is preserved. The mapping is unique and consistent, that is, all original addresses located in the same network are mapped to the same unique network in the trace files. We make sure that no IP address is mapped to one of the addresses described in the next section.
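The mapping described above can be illustrated with a short Python sketch. It is a hypothetical reconstruction, not the tool used to anonymize the traces: the class name NetworkIdMapper, the use of Python's random module with rejection sampling, and the restriction to classful A, B and C addresses are assumptions. For brevity, the sketch also omits the exclusion of the unmapped ranges listed in the next section.

```python
import ipaddress
import random

# Hypothetical classful layout: class -> (network-id bits, lowest id, highest id).
# Class A ids are drawn from [0, 127), i.e. 0..126, as described in the text.
CLASSES = {
    "A": (8, 0, 126),               # 0.x.x.x   .. 126.x.x.x
    "B": (16, 0x8000, 0xBFFF),      # 128.0.x.x .. 191.255.x.x
    "C": (24, 0xC00000, 0xDFFFFF),  # 192.0.0.x .. 223.255.255.x
}


def address_class(addr: int) -> str:
    """Return the classful network class (A, B or C) of a 32-bit address."""
    first_octet = addr >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    raise ValueError("class D/E addresses are not handled in this sketch")


class NetworkIdMapper:
    """Sketch: replace the network id by a random id of the same class,
    keeping the host part unchanged (unmapped ranges are NOT excluded)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._forward = {}   # (class, original id) -> random id
        self._taken = set()  # (class, random id) pairs already assigned
        self._count = {cls: 0 for cls in CLASSES}

    def _fresh_id(self, cls: str) -> int:
        _, lo, hi = CLASSES[cls]
        if self._count[cls] > hi - lo:
            raise RuntimeError(f"all class {cls} network ids are in use")
        while True:  # rejection sampling keeps the mapping one-to-one
            candidate = self._rng.randint(lo, hi)
            if (cls, candidate) not in self._taken:
                return candidate

    def map(self, ip: str) -> str:
        addr = int(ipaddress.IPv4Address(ip))
        cls = address_class(addr)
        host_bits = 32 - CLASSES[cls][0]
        net_id = addr >> host_bits
        host = addr & ((1 << host_bits) - 1)
        key = (cls, net_id)
        if key not in self._forward:  # first address seen from this network
            new_id = self._fresh_id(cls)
            self._forward[key] = new_id
            self._taken.add((cls, new_id))
            self._count[cls] += 1
        return str(ipaddress.IPv4Address((self._forward[key] << host_bits) | host))


if __name__ == "__main__":
    mapper = NetworkIdMapper(seed=42)
    # Both addresses share the class B network 130.83, so they share the
    # same random prefix while their last two octets are preserved.
    print(mapper.map("130.83.1.2"), mapper.map("130.83.7.9"))
```

Keying the lookup table on the original network id is what makes the mapping consistent, while refusing to reuse an already assigned random id keeps it unique; only the network id bits change, so subnet structure inside each network survives.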
Note that, due to the mapping procedure, it is not possible to determine the geographical location of a peer from the addresses in the trace files. To preserve this information, a mapping between the random IP addresses and the country code of the geographical location (as determined by [4]) is provided in the file country.csv.gz that comes with the trace files. The file has the format IP; COUNTRY_CODE. If COUNTRY_CODE is empty, the geographical location of the IP address could not be determined using [4].

2.2 Unmapped IP Addresses

The measurement peer was running with IP address 129.217.16.87, which is not mapped in the trace files. The following address ranges are not mapped either: 0.0.0.0/8, 10.0.0.0/8, 14.0.0.0/8, 39.0.0.0/8, 127.0.0.0/8, 128.0.0.0/16, 169.254.0.0/16, 172.16.0.0/12, 191.255.0.0/16, 192.0.0.0/24, 192.0.2.0/24, 192.88.99.0/24, 192.168.0.0/16, 198.18.0.0/15, and 223.255.255.0/24.

3 Trace Format

3.1 Basic Trace Format

Each line in a trace file has the following format:

TIMESTAMP EVENT_TYPE EVENT_DESCR ...

TIMESTAMP is the POSIX timestamp (i.e., the number of seconds elapsed since 1970-01-01 00:00:00 UTC) of the event. EVENT_TYPE is one of the following:

CONNECT: A peer has successfully connected to the measurement peer.
CLOSE: A connection to a peer has been closed.
PING: A ping message has been received.
PONG: A pong message has been received.
QUERY: A query message has been received. Note: for convenience, the traces also contain QUERY entries for query messages generated by the measurement peer.
HIT: A query hit message has been received.

EVENT_DESCR provides a more detailed description of the event; see the next section for details.

3.2 Event Descriptors

The following descriptors are used:

BYTES: The number of bytes stored by a peer.
CLIENT: The software version of a client as specified during the handshake procedure.
CON: The IP address of the (directly connected) peer from which a message was received.
EXTENTION: The extension fields found in a message.
FILES: The number of files shared by a peer.
GUID: The globally unique identifier of a message.
HOPS: The number of overlay hops a message has traveled.
IP: The IP address of the peer that generated a message. Since this address is not known for ping and query messages, the IP: descriptor is always equal to the CON: descriptor for PING and QUERY events.
MODE: The mode of a peer as specified during the handshake procedure. Known modes are N (normal peer), U (ultrapeer), and L (leaf node).
NAME: The name of a file reported in a response message.
PAYLOAD: The payload size of a message as specified in the message header.
PORT: The TCP port used for connecting to a peer or downloading files from it.
RESULTS: The number of results found in a response message.
SHA1: The SHA-1 hash of a file reported in a response message.
SIZE: The size of a file reported in a response message.
SPEED: The connection speed of a peer.
STRING: The search string contained in a query message.
TTL: The maximum number of overlay hops a message is allowed to travel.
UP: The POSIX timestamp of the time a peer was started, as specified during the handshake procedure.

References

[1] A. Klemm, C. Lindemann, M. Vernon, and O. Waldhorst, "Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems", Proc. ACM Internet Measurement Conference (IMC 2004), Taormina, Italy, pp. 55-67, 2004.
[2] Mutella, http://mutella.sourceforge.net/.
[3] Gnutella Protocol Development, http://rfc-gnutella.sourceforge.net/.
[4] MaxMind GeoLite Country, rel. 08/2006, http://www.maxmind.com/app/geoip_country.
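As an illustration of the trace format of Section 3, the following Python sketch splits one trace line into timestamp, event type, and descriptors, and checks whether an address falls into one of the unmapped ranges of Section 2.2. It rests on assumptions this document does not state: descriptors appear as NAME:value pairs separated by whitespace, a value may contain spaces but never a word that starts with another descriptor name followed by a colon, and addresses are plain dotted quads without a port. Descriptors are kept as an ordered list of pairs rather than a dict, on the assumption that a HIT line may repeat NAME:/SIZE:/SHA1: for several files. The names parse_line, is_unmapped, TraceEvent and the sample line are hypothetical.

```python
import ipaddress
import re
from typing import List, NamedTuple, Tuple

# Descriptor names as listed in Section 3.2 (EXTENTION spelled as listed there).
DESCRIPTORS = ["BYTES", "CLIENT", "CON", "EXTENTION", "FILES", "GUID", "HOPS",
               "IP", "MODE", "NAME", "PAYLOAD", "PORT", "RESULTS", "SHA1",
               "SIZE", "SPEED", "STRING", "TTL", "UP"]

# A descriptor name counts only at the start of the text or after whitespace.
KEY_RE = re.compile(r"(?:^|(?<=\s))(" + "|".join(DESCRIPTORS) + r"):")

# Unmapped addresses from Section 2.2, including the measurement peer itself.
UNMAPPED = [ipaddress.ip_network(n) for n in (
    "129.217.16.87/32", "0.0.0.0/8", "10.0.0.0/8", "14.0.0.0/8",
    "39.0.0.0/8", "127.0.0.0/8", "128.0.0.0/16", "169.254.0.0/16",
    "172.16.0.0/12", "191.255.0.0/16", "192.0.0.0/24", "192.0.2.0/24",
    "192.88.99.0/24", "192.168.0.0/16", "198.18.0.0/15", "223.255.255.0/24")]


class TraceEvent(NamedTuple):
    timestamp: float                      # POSIX seconds
    event_type: str                       # CONNECT, CLOSE, PING, PONG, QUERY, HIT
    descriptors: List[Tuple[str, str]]    # (name, value) in order of appearance


def parse_line(line: str) -> TraceEvent:
    """Split 'TIMESTAMP EVENT_TYPE EVENT_DESCR ...' (assumed layout)."""
    ts, event_type, rest = (line.rstrip("\n").split(None, 2) + [""])[:3]
    matches = list(KEY_RE.finditer(rest))
    pairs = []
    for i, m in enumerate(matches):
        # A value runs until the next descriptor name (or end of line).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        pairs.append((m.group(1), rest[m.end():end].strip()))
    return TraceEvent(float(ts), event_type, pairs)


def is_unmapped(ip: str) -> bool:
    """True if the address lies in one of the ranges left unmapped."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in net for net in UNMAPPED)


if __name__ == "__main__":
    sample = "1079527200 QUERY CON:10.0.0.5 IP:10.0.0.5 TTL:7 HOPS:1 STRING:linux iso"
    event = parse_line(sample)
    print(event.event_type, event.descriptors)
    print(is_unmapped(dict(event.descriptors)["CON"]))
```

Splitting on known descriptor names rather than on whitespace lets values such as STRING: search strings keep their internal spaces; the trade-off is that a value containing text like " SIZE:" would be split incorrectly, so the actual trace files should be checked before relying on this approach.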