Staying Anonymous on a strange WiFi network can be harder than it first appears. Of course, you have taken the usual precautions such as spoofing your MAC address with a tool like macchanger. However did you ever consider what your DHCP client may be leaking on the Application Layer? This post is about two ways that the dhcpcd client will betray your anonymity to the users of a Local Area Network if left in its default configuration.

Thread Model

When I take my laptop to McDonalds and use their free WiFi and then walk down the street to Costa (that’s our European Starbucks), then I don’t want those two network operators to uniquely identify my laptop in both their network logs and compare notes. Additionally if I return to the same network 1 month later I do not want the network operator to be capable of associating my activity last month with my activity this month. These two concerns are important to different people for different reasons and even if they are not concerns for you as a reader I hope they at least put this post into context.

DHCP Background

In this post we will be analysing DHCP messages from packet captures so it is important to have a basic overview of how DHCP works. DHCP is specified by RFC 2131 and uses a client/server architecture. On a wireless network, there is usually a single DHCP server which typically runs on the same hardware as the WiFi Access Point. The DHCP client (dhcpcd or dhclient) runs on the machine who wants to connect to the network.

DHCP is built on top of the Bootstrap Protocol (BOOTP) and uses UDP as its transport protocol. DHCP messages from a client to a server are sent to the ‘DHCP server’ port (67), and DHCP messages from a server to a client are sent to the ‘DHCP client’ port (68).

To start the DHCP process, the client will broadcast a DHCPDISCOVER message to UDP port 67. The DHCP server will respond with a DHCPOFFER message on UDP port 68, which will include an IP address. The client will respond with a DHCPREQUEST, using the IP address from the previous DHCPOFFER message and finally the server sends back a DHCPACK.

aa

To save time on future connections to the same network, the DHCP client will skip straight to the DHCPREQUEST step, asking for the same IP address it had last time.

Kernel Version Leak

First off let’s use a network packet capture tool to observe dhcpcd in action. If you use dhcpcd as your regular DHCP client then running it manually with the -k switch will relinquish the current IP address and stop the dhcpcd daemon. This will cause dhcpcd the next time it starts up to go through the whole DHCPDISCOVER –> DHCPOFFER –> DHCPREQUEST –> DHCPACK process from the beginning and let us capture a full DHCP session.

† dhcpcd -k
† tcpdump -i any -XX -s0 -n src portrange 67-68 and udp | tee dhcpcd.capture
† dhcpcd

Where:

  • -i any to capture on all network interfaces
  • -XX to show captured packet contents and Ethernet header in both hex and ASCII
  • -s0 to capture whole messages and not truncate after a certain size
  • -n to not resolve IP addresses into hostnames
  • src portrange 67-68 and udp to only capture traffic on UDP ports 67 and 68

This is what the DHCPDISCOVER message looks like:

12:58:58.946920 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:05:66:7a:79:4e, length 353
  0x0000:  0004 0001 0006 001e ef0b 36cc 0000 0800  ..........6.....
  0x0010:  4500 017d 931d 0000 4011 e653 0000 0000  E..}....@..S....
  0x0020:  ffff ffff 0044 0043 0169 c5f5 0101 0600  .....D.C.i......
  0x0030:  59cf 1ac2 0000 0000 0000 0000 0000 0000  Y...............
  0x0040:  0000 0000 0000 0000 001e ef0b 36cc 0000  ............6...
  0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0070:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0080:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0090:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00a0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00b0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00c0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00d0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00e0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x00f0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0100:  0000 0000 0000 0000 0000 0000 0000 0000  ................
  0x0110:  0000 0000 0000 0000 6382 5363 3501 013d  ........c.Sc5..=
  0x0120:  13ff ef0b 36cc 0001 0001 1ff2 7cee 0005  ....6.......|...
  0x0130:  667a 794e 5000 7401 0139 0205 c03c 3964  ..6.P.t..9...<9d
  0x0140:  6863 7063 642d 362e 3131 2e33 3a4c 696e  hcpcd-6.11.3:Lin
  0x0150:  7578 2d34 2e34 2e38 2d68 6172 6465 6e65  ux-4.4.8-hardene
  0x0160:  642d 7231 3a78 3836 5f36 343a 4765 6e75  d-r1:x86_64:Genu
  0x0170:  696e 6549 6e74 656c 9101 0137 0f01 7921  ineIntel...7..y!
  0x0180:  0306 0c0f 1a1c 2a33 363a 3b77 ff         ......*36:;w.

You will probably notice an interesting string near the bottom of the message: dhcpcd-6.11.3:Linux-4.4.8-hardened-r1:x86_64:GenuineIntel. That is a lot of unnecessary and very specific information about our system that dhcpcd is sending without anyone telling it to. Even without the version numbers, there cannot be more than a handful of Hardened Gentoo users in any given geographic region. A network operator who looks through their logs will be able to single out all traffic belonging to a dhcpcd user across multiple browsing dates with reasonable certainty based on this alone.

This feature is called Vendor class identifier and is described in RFC 2132:

9.13. Vendor class identifier

   This option is used by DHCP clients to optionally identify the vendor
   type and configuration of a DHCP client.  The information is a string
   of n octets, interpreted by servers.  Vendors may choose to define
   specific vendor class identifiers to convey particular configuration
   or other identification information about a client.  For example, the
   identifier may encode the client's hardware configuration.

The dhcpcd man page advertises a -i switch to change the information sent in this field:

 -i, --vendorclassid vendorclassid
         Override the DHCPv4 vendorclassid field sent.  The default
         is dhcpcd-<version>:<os>:<machine>:<platform>.  For example
               dhcpcd-5.5.6:NetBSD-6.99.5:i386:i386
         If not set then none is sent.  Some badly configured DHCP
         servers reject unknown vendorclassids.  To work around it,
         try and impersonate Windows by using the MSFT vendorclassid.

Although it is tempting to impersonate a more commonly used DHCP client, (for example Windows using -i "MSFT 5.0"), be aware of other differences between the clients. For example Windows might send different Option fields to dhcpcd by default, or send the same ones in a different order. Things like that may still be used to uniquely identify a machine in a log file or packet capture. Additionally, differences in the Windows and Linux TCP/IP stack can be used by an active adversary to fingerprint a device link if one’s threat model extends that far.

Real MAC Address Leak

A more concerning feature of dhcpcd is that the first time it runs (ever), it will cache the MAC address of the wireless card and use it in all future connections. Lets use Wireshark to verify this:

† tshark -i wlp1s0 -V -j udp -f "portrange 67-68"

Where:

  • -i wlp1s0 is the network interface to capture on
  • -V to print a view of the packet details
  • -j udp to only capture UDP packets
  • -f "portrange 67-68" to only capture packets on ports 67 and 68

This is what the DHCPDISCOVER message looks like:

Frame 1: 395 bytes on wire (3160 bits), 395 bytes captured (3160 bits) on interface 0 Interface id: 0 (wlp1s0)
(...)
User Datagram Protocol, Src Port: 68, Dst Port: 67
    Source Port: 68
    Destination Port: 67
    Length: 361
(...)
Bootstrap Protocol (Discover)
    Message type: Boot Request (1)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
(...)
    Client MAC address: TalariNe_47:22:83 (44:d6:3d:47:22:83)
(...)
Option: (61) Client identifier
    Length: 19
    IAID: 3d472283
    DUID Type: link-layer address plus time (1)
    Hardware type: Ethernet (1)
    Time: 535985390
    Link layer address: 00:05:66:7a:79:4e
(...)

Notice how the Client MAC address field and Link layer address are completely different. Let’s try to disconnect from the network, run macchanger, reconnect and rerun dhcpcd; will it send the same Link layer address?

† ifconfig wlp1s0
wlp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 4e:19:93:0a:9b:38  txqueuelen 1000  (Ethernet)
(...)
† tshark -i wlp1s0 -V -j udp -f "portrange 67-68" > capture &
[1] 2272
Capturing on 'wlp1s0'
† dhcpcd -k
† dhcpcd
(...)
wlp1s0: leased 192.168.0.7 for 86400 seconds
wlp1s0: adding default route via 192.168.0.1
forked to background, child pid 2302
† kill 2272
[1]+  Done                    tshark -i wlp1s0 -V -j udp -f "portrange 67-68" > capture
† grep 'Link layer address' capture
        Link layer address: 4e:19:93:0a:9b:38
† killall wpa_supplicant && ifconfig wlp1s0 down
† macchanger -r wlp1s0
Current MAC:   4e:19:93:0a:9b:38 (unknown)
New MAC:       72:02:ed:12:7e:53 (unknown)
† wpa_supplicant -i wlp1s0 -c /etc/wpa_supplicant/wpa_supplicant.conf &> /dev/null
† tshark -i wlp1s0 -V -j udp -f "portrange 67-68"  > capture2 &
[1] 3044
Capturing on 'wlp1s0'
† dhcpcd
(...)
wlp1s0: leased 192.168.0.9 for 86400 seconds
wlp1s0: adding default route via 192.168.0.1
forked to background, child pid 3088
† kill 3044
[1]+  Done                    tshark -i wlp1s0 -V -j udp -f "portrange 67-68" > capture2
† grep 'Link layer address' capture2
        Link layer address: 4e:19:93:0a:9b:38

Yes, dhcpcd keeps sending the original MAC address even though we have used macchanger. This breaks both scenarios in our threat model. Someone with access to logs of network A and network B can trivially track our activity on both networks with a high level of certainty. Additionally, repeated visits to the same network can all by linked together in the same way. This is despite connecting to each network, on each occasion, with a different MAC address.

What is happening is that the first time it runs, dhdcpd will cache the MAC address in a file called /etc/dhcpcd.duid and resend it on every DHCP request, on every network one tries to connect to. Using a Live System will help, but only as long as the /etc/dhcpcd.duid file does not initially exist. If you roll your own live system image you may find yourself with exactly the same problem if the image is generated from a device which has used dhcpcd at any time in the past.

Funnily enough it is difficult to track down any mention of this DUID thing in the RFCs. It appears to have been introduced in RFC 3310 and is specifically for IPv6:

Each DHCP client and server has a DUID.  DHCP servers use DUIDs to
identify clients for the selection of configuration parameters and in
the association of IAs with clients.  DHCP clients use DUIDs to
identify a server in messages where a server needs to be identified.
See sections 22.2 and 22.3 for the representation of a DUID in a DHCP
message.

(...)

The DUID is carried in an option because it may be variable length
and because it is not required in all DHCP messages.  The DUID is
designed to be unique across all DHCP clients and servers, and stable
for any specific client or server - that is, the DUID used by a
client or server SHOULD NOT change over time if at all possible; for
example, a device's DUID should not change as a result of a change in
the device's network hardware.

So tracking a client between sessions is literally the point of this feature. It appears to be only relevant for IPv6 but dhcpcd will send it even in IPv4 only mode (the -4 switch). Additionally, the DHCP servers I am testing against do not seem to mention this DUID in their responses and may not even be using it.

Deleting /etc/dhcpcd.duid will force dhcpcd to regenerate it with the current MAC address.

But if either of these issues bother you I would recommend using a different DHCP client instead of relying on workarounds.