Discussion:
[lwip-users] broken DHCP/ARP interaction
Radouch, Zdenek
2014-05-13 19:19:50 UTC
Permalink
I am having an LwIP problem where the local ARP erroneously answers its own who-has request
and by doing that prevents DHCP from working correctly. I would appreciate if anyone
familiar with the ARP code could comment on how this could possibly happen (and of
course help me fix it :)).

The setup is an embedded WiFi device running FreeRTOS with LwIP, attempting to obtain
an IP address from the wireless network it just joined. I have a wireshark log I could provide
(it's big and hard to read) capturing the following:


1. The device associates with the network and requests IP address from the DHCP server

2. The DHCP server gives IP address to the device

3. The device runs an ARP IP check on the address

4. The device erroneously answers its own ARP request

5. The device consequently (ARP answered) rejects the IP address offered by the DHCP server

Obviously, the #4 is wrong, the device should have never answered its own ARP since the IP address
in question is still being validated (by that very ARP transaction).
I am not familiar with the code base so I have no idea how this could happen (appears obviously wrong).
I turned on some debugging in the stack and what I see simply confirms what I had decoded with wireshark.
This is easily reproducible (as in it happens every single time I run it) so I could instrument more of the stack,
if that would help with debugging.

Here is the log:

Aspen firmware: May 13 2014 13:58:05
[2.806917] : etharp_send_ip: sending packet 0x20004b34
[2.811492] : etharp_send_ip: sending packet 0x20004b34
[2.905013] : etharp_send_ip: sending packet 0x20004b34
[3.109013] : etharp_send_ip: sending packet 0x20004b34
[appln] listen: OK
[wlcm] got wifi message: 8 0 0x00000000
[wlcm] got event: scan result
[wlcm] Found better AP iRobot-Guest on channel 11
[wlcm] starting association to "default"
[wlcm] got wifi message: 10 0 0x00000000
[wlcm] got event: association result: success
[wlcm] got wifi message: 12 0 0x00000000
[wlcm] got event: authentication result: success
[net] configuring interface mlan (with DHCP client)
[4.824439] : dhcp_start(netif=0x20012024) ml1
[4.828588] : dhcp_start(): starting new DHCP client
[4.836495] : dhcp_start(): starting DHCP configuration
[4.841152] : dhcp_discover()
[4.843650] : transaction id xid(5542A27)
[4.847158] : dhcp_discover: making request
[4.850809] : dhcp_discover: realloc()ing
[4.854298] : dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, DHCP_SERVER_PORT)
[4.861214] : etharp_send_ip: sending packet 0x20004b7c
[4.865908] : dhcp_discover: deleting()ing
[4.869444] : dhcp_discover: SELECTING
[4.872686] : dhcp_discover(): set request timeout 2000 msecs
[4.877847] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.885014] : dhcp_recv(pbuf = 0x2000f7f8) from DHCP server 172.16.16.7 port 67
[4.891654] : pbuf->len = 310
[4.894150] : pbuf->tot_len = 310
[4.896998] : searching DHCP_OPTION_MESSAGE_TYPE
[4.901056] : DHCP_OFFER received in DHCP_SELECTING state
[4.905881] : dhcp_handle_offer(netif=0x20012024) ml1
[4.910375] : dhcp_handle_offer(): server 0x0B02460A
[4.914785] : dhcp_handle_offer(): offer for 0x641110AC
[4.919443] : dhcp_select(netif=0x20012024) ml1
[4.923440] : transaction id xid(169C39E2)
[4.927046] : etharp_send_ip: sending packet 0x20004b7c
[4.931723] : dhcp_select: REQUESTING
[4.934837] : dhcp_select(): set request timeout 2000 msecs
[4.939831] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.946993] : dhcp_recv(pbuf = 0x2000f1bc) from DHCP server 172.16.16.7 port 67
[4.953638] : pbuf->len = 310
[4.956135] : pbuf->tot_len = 310
[4.958980] : searching DHCP_OPTION_MESSAGE_TYPE
[4.963038] : DHCP_ACK received
[4.965703] : dhcp_check(netif=0x20012024) ml
[4.969530] : dhcp_check(): netif IP: 0x00000000
[4.973606] : etharp_find_entry: found empty entry 0
[4.978017] : etharp_find_entry: selecting empty entry 0
[4.982759] : etharp_request: sending ARP request.
[4.987007] : etharp_raw: sending raw ARP packet.
[4.991203] : dhcp_check(): set request timeout 500 msecs
[4.995991] : ethernet_input: dest:6c:ad:f8:e5:27:fe, src:6c:ad:f8:e5:27:fe, type:806
[5.003142] : etharp_update_arp_entry: 172.16.17.100 - 6C:AD:F8:E5:27:FE
[5.009217] : etharp_find_entry: found matching entry 0
[5.013879] : etharp_update_arp_entry: updating stable entry 0
[5.019116] : etharp_arp_input: incoming ARP reply
[5.023360] : dhcp_arp_reply()
[5.025938] : dhcp_arp_reply(): CHECKING, arp reply for 0x641110AC
[5.031513] : dhcp_arp_reply(): arp reply matched with offered address, declining
[5.038336] : dhcp_decline()
[5.040751] : transaction id xid(2EAB4956)
[5.044358] : etharp_send_ip: sending packet 0x20004b7c
[5.049038] : dhcp_decline: BACKING OFF
[5.052316] : dhcp_decline(): set request timeout 10000 msecs


Thanks for any pointers or help.
-Z
Zach Smith
2014-05-14 16:17:29 UTC
Permalink
I also have had a problem similar to this. I have lwip running in a wifi device and I was having problems because my router was echoing packets back to me. I can't remember if this interfered with ARP working correctly or if it was causing some other problem. But what I did is put a function in the low level ethernet_input() that filters out any Ethernet packets that have a src MAC address equal to my MAC address. I would also be curious to hear if other folks have had to do this?

I also had a problem similar to this with IPv6 not working when I was plugged into an old hub rather than a switch. IPv6 auto-configuration and duplicate address detection wasn't working right because I was getting my own transmitted packets echoed back to me. There is a thread about this on the lwip mailing list archives:
http://lists.nongnu.org/archive/html/lwip-users/2012-12/msg00020.html

It seems like you could be experiencing something similar maybe.

From: lwip-users-bounces+zsmith=***@nongnu.org [mailto:lwip-users-bounces+zsmith=***@nongnu.org] On Behalf Of Radouch, Zdenek
Sent: Tuesday, May 13, 2014 1:20 PM
To: lwip-***@nongnu.org
Subject: [lwip-users] broken DHCP/ARP interaction

I am having an LwIP problem where the local ARP erroneously answers its own who-has request
and by doing that prevents DHCP from working correctly. I would appreciate if anyone
familiar with the ARP code could comment on how this could possibly happen (and of
course help me fix it :)).

The setup is an embedded WiFi device running FreeRTOS with LwIP, attempting to obtain
an IP address from the wireless network it just joined. I have a wireshark log I could provide
(it's big and hard to read) capturing the following:


1. The device associates with the network and requests IP address from the DHCP server

2. The DHCP server gives IP address to the device

3. The device runs an ARP IP check on the address

4. The device erroneously answers its own ARP request

5. The device consequently (ARP answered) rejects the IP address offered by the DHCP server

Obviously, the #4 is wrong, the device should have never answered its own ARP since the IP address
in question is still being validated (by that very ARP transaction).
I am not familiar with the code base so I have no idea how this could happen (appears obviously wrong).
I turned on some debugging in the stack and what I see simply confirms what I had decoded with wireshark.
This is easily reproducible (as in it happens every single time I run it) so I could instrument more of the stack,
if that would help with debugging.

Here is the log:

Aspen firmware: May 13 2014 13:58:05
[2.806917] : etharp_send_ip: sending packet 0x20004b34
[2.811492] : etharp_send_ip: sending packet 0x20004b34
[2.905013] : etharp_send_ip: sending packet 0x20004b34
[3.109013] : etharp_send_ip: sending packet 0x20004b34
[appln] listen: OK
[wlcm] got wifi message: 8 0 0x00000000
[wlcm] got event: scan result
[wlcm] Found better AP iRobot-Guest on channel 11
[wlcm] starting association to "default"
[wlcm] got wifi message: 10 0 0x00000000
[wlcm] got event: association result: success
[wlcm] got wifi message: 12 0 0x00000000
[wlcm] got event: authentication result: success
[net] configuring interface mlan (with DHCP client)
[4.824439] : dhcp_start(netif=0x20012024) ml1
[4.828588] : dhcp_start(): starting new DHCP client
[4.836495] : dhcp_start(): starting DHCP configuration
[4.841152] : dhcp_discover()
[4.843650] : transaction id xid(5542A27)
[4.847158] : dhcp_discover: making request
[4.850809] : dhcp_discover: realloc()ing
[4.854298] : dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, DHCP_SERVER_PORT)
[4.861214] : etharp_send_ip: sending packet 0x20004b7c
[4.865908] : dhcp_discover: deleting()ing
[4.869444] : dhcp_discover: SELECTING
[4.872686] : dhcp_discover(): set request timeout 2000 msecs
[4.877847] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.885014] : dhcp_recv(pbuf = 0x2000f7f8) from DHCP server 172.16.16.7 port 67
[4.891654] : pbuf->len = 310
[4.894150] : pbuf->tot_len = 310
[4.896998] : searching DHCP_OPTION_MESSAGE_TYPE
[4.901056] : DHCP_OFFER received in DHCP_SELECTING state
[4.905881] : dhcp_handle_offer(netif=0x20012024) ml1
[4.910375] : dhcp_handle_offer(): server 0x0B02460A
[4.914785] : dhcp_handle_offer(): offer for 0x641110AC
[4.919443] : dhcp_select(netif=0x20012024) ml1
[4.923440] : transaction id xid(169C39E2)
[4.927046] : etharp_send_ip: sending packet 0x20004b7c
[4.931723] : dhcp_select: REQUESTING
[4.934837] : dhcp_select(): set request timeout 2000 msecs
[4.939831] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.946993] : dhcp_recv(pbuf = 0x2000f1bc) from DHCP server 172.16.16.7 port 67
[4.953638] : pbuf->len = 310
[4.956135] : pbuf->tot_len = 310
[4.958980] : searching DHCP_OPTION_MESSAGE_TYPE
[4.963038] : DHCP_ACK received
[4.965703] : dhcp_check(netif=0x20012024) ml
[4.969530] : dhcp_check(): netif IP: 0x00000000
[4.973606] : etharp_find_entry: found empty entry 0
[4.978017] : etharp_find_entry: selecting empty entry 0
[4.982759] : etharp_request: sending ARP request.
[4.987007] : etharp_raw: sending raw ARP packet.
[4.991203] : dhcp_check(): set request timeout 500 msecs
[4.995991] : ethernet_input: dest:6c:ad:f8:e5:27:fe, src:6c:ad:f8:e5:27:fe, type:806
[5.003142] : etharp_update_arp_entry: 172.16.17.100 - 6C:AD:F8:E5:27:FE
[5.009217] : etharp_find_entry: found matching entry 0
[5.013879] : etharp_update_arp_entry: updating stable entry 0
[5.019116] : etharp_arp_input: incoming ARP reply
[5.023360] : dhcp_arp_reply()
[5.025938] : dhcp_arp_reply(): CHECKING, arp reply for 0x641110AC
[5.031513] : dhcp_arp_reply(): arp reply matched with offered address, declining
[5.038336] : dhcp_decline()
[5.040751] : transaction id xid(2EAB4956)
[5.044358] : etharp_send_ip: sending packet 0x20004b7c
[5.049038] : dhcp_decline: BACKING OFF
[5.052316] : dhcp_decline(): set request timeout 10000 msecs


Thanks for any pointers or help.
-Z
________________________________
Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive from the recipient), please contact the sender by reply e-mail and delete all copies of the message.
Radouch, Zdenek
2014-05-14 18:08:21 UTC
Permalink
Thanks for the pointer; I'll check it out.

However, in this case the ARP does something that is outright wrong -- it answers an ARP query
for an IP address that has not been assigned yet -- the DHCP client is still trying to
ensure that it is OK to use the address (hence the ARP query). So the behavior is completely broken
with respect to temporal dependencies.

-Z


________________________________________
From: lwip-users-bounces+zradouch=***@nongnu.org [lwip-users-bounces+zradouch=***@nongnu.org] on behalf of Zach Smith [***@campbellsci.com]
Sent: Wednesday, May 14, 2014 12:17 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] broken DHCP/ARP interaction

I also have had a problem similar to this. I have lwip running in a wifi device and I was having problems because my router was echoing packets back to me. I can’t remember if this interfered with ARP working correctly or if it was causing some other problem. But what I did is put a function in the low level ethernet_input() that filters out any Ethernet packets that have a src MAC address equal to my MAC address. I would also be curious to hear if other folks have had to do this?

I also had a problem similar to this with IPv6 not working when I was plugged into an old hub rather than a switch. IPv6 auto-configuration and duplicate address detection wasn’t working right because I was getting my own transmitted packets echoed back to me. There is a thread about this on the lwip mailing list archives:
http://lists.nongnu.org/archive/html/lwip-users/2012-12/msg00020.html

It seems like you could be experiencing something similar maybe.

From: lwip-users-bounces+zsmith=***@nongnu.org [mailto:lwip-users-bounces+zsmith=***@nongnu.org] On Behalf Of Radouch, Zdenek
Sent: Tuesday, May 13, 2014 1:20 PM
To: lwip-***@nongnu.org
Subject: [lwip-users] broken DHCP/ARP interaction

I am having an LwIP problem where the local ARP erroneously answers its own who-has request
and by doing that prevents DHCP from working correctly. I would appreciate if anyone
familiar with the ARP code could comment on how this could possibly happen (and of
course help me fix it ☺).

The setup is an embedded WiFi device running FreeRTOS with LwIP, attempting to obtain
an IP address from the wireless network it just joined. I have a wireshark log I could provide
(it’s big and hard to read) capturing the following:


1. The device associates with the network and requests IP address from the DHCP server

2. The DHCP server gives IP address to the device

3. The device runs an ARP IP check on the address

4. The device erroneously answers its own ARP request

5. The device consequently (ARP answered) rejects the IP address offered by the DHCP server

Obviously, the #4 is wrong, the device should have never answered its own ARP since the IP address
in question is still being validated (by that very ARP transaction).
I am not familiar with the code base so I have no idea how this could happen (appears obviously wrong).
I turned on some debugging in the stack and what I see simply confirms what I had decoded with wireshark.
This is easily reproducible (as in it happens every single time I run it) so I could instrument more of the stack,
if that would help with debugging.

Here is the log:

Aspen firmware: May 13 2014 13:58:05
[2.806917] : etharp_send_ip: sending packet 0x20004b34
[2.811492] : etharp_send_ip: sending packet 0x20004b34
[2.905013] : etharp_send_ip: sending packet 0x20004b34
[3.109013] : etharp_send_ip: sending packet 0x20004b34
[appln] listen: OK
[wlcm] got wifi message: 8 0 0x00000000
[wlcm] got event: scan result
[wlcm] Found better AP iRobot-Guest on channel 11
[wlcm] starting association to "default"
[wlcm] got wifi message: 10 0 0x00000000
[wlcm] got event: association result: success
[wlcm] got wifi message: 12 0 0x00000000
[wlcm] got event: authentication result: success
[net] configuring interface mlan (with DHCP client)
[4.824439] : dhcp_start(netif=0x20012024) ml1
[4.828588] : dhcp_start(): starting new DHCP client
[4.836495] : dhcp_start(): starting DHCP configuration
[4.841152] : dhcp_discover()
[4.843650] : transaction id xid(5542A27)
[4.847158] : dhcp_discover: making request
[4.850809] : dhcp_discover: realloc()ing
[4.854298] : dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, DHCP_SERVER_PORT)
[4.861214] : etharp_send_ip: sending packet 0x20004b7c
[4.865908] : dhcp_discover: deleting()ing
[4.869444] : dhcp_discover: SELECTING
[4.872686] : dhcp_discover(): set request timeout 2000 msecs
[4.877847] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.885014] : dhcp_recv(pbuf = 0x2000f7f8) from DHCP server 172.16.16.7 port 67
[4.891654] : pbuf->len = 310
[4.894150] : pbuf->tot_len = 310
[4.896998] : searching DHCP_OPTION_MESSAGE_TYPE
[4.901056] : DHCP_OFFER received in DHCP_SELECTING state
[4.905881] : dhcp_handle_offer(netif=0x20012024) ml1
[4.910375] : dhcp_handle_offer(): server 0x0B02460A
[4.914785] : dhcp_handle_offer(): offer for 0x641110AC
[4.919443] : dhcp_select(netif=0x20012024) ml1
[4.923440] : transaction id xid(169C39E2)
[4.927046] : etharp_send_ip: sending packet 0x20004b7c
[4.931723] : dhcp_select: REQUESTING
[4.934837] : dhcp_select(): set request timeout 2000 msecs
[4.939831] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
[4.946993] : dhcp_recv(pbuf = 0x2000f1bc) from DHCP server 172.16.16.7 port 67
[4.953638] : pbuf->len = 310
[4.956135] : pbuf->tot_len = 310
[4.958980] : searching DHCP_OPTION_MESSAGE_TYPE
[4.963038] : DHCP_ACK received
[4.965703] : dhcp_check(netif=0x20012024) ml
[4.969530] : dhcp_check(): netif IP: 0x00000000
[4.973606] : etharp_find_entry: found empty entry 0
[4.978017] : etharp_find_entry: selecting empty entry 0
[4.982759] : etharp_request: sending ARP request.
[4.987007] : etharp_raw: sending raw ARP packet.
[4.991203] : dhcp_check(): set request timeout 500 msecs
[4.995991] : ethernet_input: dest:6c:ad:f8:e5:27:fe, src:6c:ad:f8:e5:27:fe, type:806
[5.003142] : etharp_update_arp_entry: 172.16.17.100 - 6C:AD:F8:E5:27:FE
[5.009217] : etharp_find_entry: found matching entry 0
[5.013879] : etharp_update_arp_entry: updating stable entry 0
[5.019116] : etharp_arp_input: incoming ARP reply
[5.023360] : dhcp_arp_reply()
[5.025938] : dhcp_arp_reply(): CHECKING, arp reply for 0x641110AC
[5.031513] : dhcp_arp_reply(): arp reply matched with offered address, declining
[5.038336] : dhcp_decline()
[5.040751] : transaction id xid(2EAB4956)
[5.044358] : etharp_send_ip: sending packet 0x20004b7c
[5.049038] : dhcp_decline: BACKING OFF
[5.052316] : dhcp_decline(): set request timeout 10000 msecs


Thanks for any pointers or help.
-Z
________________________________
Confidentiality Notice: This e-mail may contain confidential and privileged material for the sole use of the intended recipient(s). Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive from the recipient), please contact the sender by reply e-mail and delete all copies of the message
Simon Goldschmidt
2014-05-15 06:03:56 UTC
Permalink
"Radouch, Zdenek" wrote:
> However, in this case the ARP does something that is outright wrong -- it answers an ARP query
> for an IP address that has not been assigned yet -- the DHCP client is still trying to
> ensure that it is OK to use the address (hence the ARP query). So the behavior is completely broken
> with respect to temporal dependencies.

I can't reproduce this, but I do think it would be a bug. However, Sergio is right in that you should take cautions to not receive (loop-back) self-sent packets. While it might not break things like you see (it does not, for me), it can lead to other anomalties like receiving your own broadcast on broadcast sockets or seeing a reply to a Gratuitizs ARP you send (so IPv4 AutoIP and IPv6 address selection won't work).

Simon
Radouch, Zdenek
2014-05-30 22:14:42 UTC
Permalink
At long last, I understand the problem. The lesson I learned is that you can't trust the packet ether addresses -- they were being faked by the Cisco hardware (sending a packet with someone else's source address). The problem itself is a case of a poorly designed ARP proxy/optimization within the Cisco enterprise AP software.

The problematic sequence is as follows:
[here the AP/DHCP is a Cisco enterprise AP network, the device is a WiFi module running lwIP]

0. A device associates with a WiFi network (SSID)
1. The device requests IP address from a DHCP server
2. The DHCP server sends an IP address to the device
3. The device attempts to validate the IP address by sending out "ARP who has" for the IP address offered by the DHCP server.
This is the standard gratuitous ARP request to ensure that there is no device with the same IP address; no reply is expected.
4. The Cisco AP to which the WiFi device is connected sends (wrongly) an ARP response (for the offered IP address) to the WiFi device!
Note that at this point in time, the IP address has not yet been assigned to any interface, so it is an error for the AP to proxy ARP. Furthermore, it is wrong to send a proxy reply to the very device on behalf of which the proxy is sent.
5. The device sees an [unexpected] ARP reply (with what should be its IP address), and consequently rejects the IP address.
Note that the lwIP DHCP code does not notice that the ARP response contains its own MAC address in all fields; it simply acts on it.

It is not clear to me exactly how the Cisco AP ARP proxy module learns about the IP address, but it is clear that it uses a wrong algorithm. As is, the Cisco algorithm produces an ARP [proxy] reply before and without knowing that the address has actually been assigned. That is, it is not caching responses, it is synthesizing responses before any real device did (or could) send them. Instead, it should simply cache ARP responses; this would have guaranteed that at least on the assignment the proxy replies were valid.

Regarding the lwIP code, it could be more defensive. The "bad" ARP packet that causes the DHCP rejection has all four MAC addresses (ether src, ether dest, sender hardware, target hardware) the same, equal to the MAC address of the device itself.
I would think such packet should be discarded.

To conclude, no, the DHCP/ARP lwIP code is not broken per se, but it will not work when deployed against the Cisco Aironet Enterprise AP solution.

-Z



> -----Original Message-----
> From: lwip-users-bounces+zradouch=***@nongnu.org [mailto:lwip-
> users-bounces+zradouch=***@nongnu.org] On Behalf Of Simon
> Goldschmidt
> Sent: Thursday, May 15, 2014 2:04 AM
> To: Mailing list for lwIP users
> Subject: Re: [lwip-users] broken DHCP/ARP interaction
>
> "Radouch, Zdenek" wrote:
> > However, in this case the ARP does something that is outright wrong -
> -
> > it answers an ARP query for an IP address that has not been assigned
> > yet -- the DHCP client is still trying to ensure that it is OK to use
> > the address (hence the ARP query). So the behavior is completely
> broken with respect to temporal dependencies.
>
> I can't reproduce this, but I do think it would be a bug. However,
> Sergio is right in that you should take cautions to not receive (loop-
> back) self-sent packets. While it might not break things like you see
> (it does not, for me), it can lead to other anomalties like receiving
> your own broadcast on broadcast sockets or seeing a reply to a
> Gratuitizs ARP you send (so IPv4 AutoIP and IPv6 address selection
> won't work).
>
> Simon
>
> _______________________________________________
> lwip-users mailing list
> lwip-***@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/lwip-users
Sergio R. Caprile
2014-05-14 18:10:36 UTC
Permalink
Hi,
Quick&dirty workaround: disable ARP checking by defining
DHCP_DOES_ARP_CHECK to 0

Analysis:
I'm no expert in DHCP nor ARP, but I don't see anything similar to what
you are experiencing.
I've setup three scenarios:
1- different IP (static) prior to DHCP
2- same IP (static) prior to DHCP
3- 0.0.0.0
In scenarios 1 and 3, everything works "as expected"
In scenario 2, I see a "gratuitous ARP", but the address is accepted
anyway. Here is my capture:

No. Time Source Destination
Protocol Length Info
1 0.000000000 192.168.1.42 255.255.255.255
DHCP 350 DHCP Discover - Transaction ID 0xabcd0001

2 0.001705000 192.168.1.1 192.168.1.42
DHCP 342 DHCP Offer - Transaction ID 0xabcd0001

3 0.002564000 192.168.1.42 255.255.255.255
DHCP 350 DHCP Request - Transaction ID 0xabcd0002

4 0.040420000 192.168.1.1 192.168.1.42
DHCP 342 DHCP ACK - Transaction ID 0xabcd0002

5 0.040841000 3com_03:04:05 Broadcast
ARP 60 Gratuitous ARP for 192.168.1.42 (Request)

Frame 5: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on
interface 0
Ethernet II, Src: 3com_03:04:05 (00:01:02:03:04:05), Dst: Broadcast
(ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
Hardware type: Ethernet (1)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
[Is gratuitous: True]
Sender MAC address: 3com_03:04:05 (00:01:02:03:04:05)
Sender IP address: 192.168.1.42 (192.168.1.42)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.1.42 (192.168.1.42)

6 0.499760000 3com_03:04:05 Broadcast
ARP 60 Gratuitous ARP for 192.168.1.42 (Request)


I suggest you check your options and step the code to see where this
rejection takes place

--
Radouch, Zdenek
2014-05-14 20:27:42 UTC
Permalink
Well, as far as seeing something else, I, too, can see something else when I configure my network
differently with respect to where the DHCP server is, and I, too, can see things working.
But in my current configuration (which happens to be our corporate network with hundreds of different
machines connecting correctly all the time) the lwIP stack does not work.
A workaround that ignores the failure, or one that avoids a failure by not checking is not exactly a solution
(I know, you said dirty), so I need to find and fix the problem -- we can't afford deploying an embedded
product with something so blatantly broken.

The sequence I see looks like this:

"No.","Time","Source","Destination","Protocol","Length","Info"
"390","0.814720","0.0.0.0","255.255.255.255","DHCP","395","DHCP Discover - Transaction ID 0x5542a27"
"409","0.854480","172.16.16.7","255.255.255.255","DHCP","392","DHCP Offer - Transaction ID 0x5542a27"
"424","0.896977","0.0.0.0","255.255.255.255","DHCP","395","DHCP Request - Transaction ID 0x169c39e2"
"433","0.907728","172.16.16.7","255.255.255.255","DHCP","392","DHCP ACK - Transaction ID 0x169c39e2"
"443","0.930603","6c:ad:f8:e5:27:fe","Broadcast","ARP","87","Who has 172.16.20.182? Tell 0.0.0.0"
"447","0.932726","6c:ad:f8:e5:27:fe","6c:ad:f8:e5:27:fe","ARP","102","172.16.20.182 is at 6c:ad:f8:e5:27:fe"
"459","0.956479","0.0.0.0","255.255.255.255","DHCP","395","DHCP Decline - Transaction ID 0x2eab4956"

All is well until packet #447 -- the address has not been validated yet, the DHCP machine is still in
the DHCP_CHECKING state, so there should not be an ARP entry for the still invalid IP address (172.16.20.182).
I have stepped through enough code to understand the rejection: it happens in dhcp.c(960)
in dhcp_arp_reply(). Unfortunately I don't understand the ARP code, and I don't understand how it
interacts with the DHCP client, hence my looking for someone who knows the lwIP ARP design.

That is, I am looking for someone who could tell me what the lwIP ARP is expected to do in this exact
situation (DHCP ack while IP=0) and how exactly is the new IP address handed from the DHCP client
to the rest of the stack (including the ARP module).

-Z

________________________________________
From: lwip-users-bounces+zradouch=***@nongnu.org [lwip-users-bounces+zradouch=***@nongnu.org] on behalf of Sergio R. Caprile [***@gmail.com]
Sent: Wednesday, May 14, 2014 2:10 PM
To: lwip-***@nongnu.org
Subject: Re: [lwip-users] broken DHCP/ARP interaction

Hi,
Quick&dirty workaround: disable ARP checking by defining
DHCP_DOES_ARP_CHECK to 0

Analysis:
I'm no expert in DHCP nor ARP, but I don't see anything similar to what
you are experiencing.
I've setup three scenarios:
1- different IP (static) prior to DHCP
2- same IP (static) prior to DHCP
3- 0.0.0.0
In scenarios 1 and 3, everything works "as expected"
In scenario 2, I see a "gratuitous ARP", but the address is accepted
anyway. Here is my capture:

No. Time Source Destination
Protocol Length Info
1 0.000000000 192.168.1.42 255.255.255.255
DHCP 350 DHCP Discover - Transaction ID 0xabcd0001

2 0.001705000 192.168.1.1 192.168.1.42
DHCP 342 DHCP Offer - Transaction ID 0xabcd0001

3 0.002564000 192.168.1.42 255.255.255.255
DHCP 350 DHCP Request - Transaction ID 0xabcd0002

4 0.040420000 192.168.1.1 192.168.1.42
DHCP 342 DHCP ACK - Transaction ID 0xabcd0002

5 0.040841000 3com_03:04:05 Broadcast
ARP 60 Gratuitous ARP for 192.168.1.42 (Request)

Frame 5: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on
interface 0
Ethernet II, Src: 3com_03:04:05 (00:01:02:03:04:05), Dst: Broadcast
(ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
Hardware type: Ethernet (1)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
[Is gratuitous: True]
Sender MAC address: 3com_03:04:05 (00:01:02:03:04:05)
Sender IP address: 192.168.1.42 (192.168.1.42)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.1.42 (192.168.1.42)

6 0.499760000 3com_03:04:05 Broadcast
ARP 60 Gratuitous ARP for 192.168.1.42 (Request)


I suggest you check your options and step the code to see where this
rejection takes place

--
Sergio R. Caprile
2014-05-14 21:36:03 UTC
Permalink
I suggested the Q&D workaround as a way to check if that was really the
problem.
I guess your problem is that you are seeing your own messages, something
that may not be the case in standard Ethernets (you mentioned Wi-Fi,
didn't you?)
I suggest to follow the code execution and help the developers, I'm just
trying to help so I won't have these problems myself in the future, and
I'm willing to peek in the sources to give my opinion or propose a
patch. Unfortunately I can't duplicate your scenario so I can't dive deeper.
As far as I can follow the analysis, you are right, and the DHCP code
seems to rush to assign an IP when it should wait after the ARP
validation. Try to find that with a breakpoint and tell us what you see.
Try also following the maling list thread and keep us posted, someone
else might have submitted a patch or a (more elegant) workaround

--
Loading...