Discussion:
[lwip-users] recv ok.. send is super slow
Rishi Khan
2008-05-10 19:10:08 UTC
I am trying to test the performance of the lwip stack between two
linux machines. If the server is lwIP, the performance is on par with
the linux IP stack. If the client is lwIP, it is super slow. It seems
that lwIP only sends data after a long timeout (it does not seem tied
to tcp_slow/fast_tmr). Also, in the linux/linux case, the CPU
utilization goes way up, but in the lwip-to-linux case, the CPU stays low.
Is there any easy way to change this behavior? Attached is some simple
example code.
Kieran Mansley
2008-05-12 13:24:08 UTC
Post by Rishi Khan
I am trying to test the performance of the lwip stack between two
linux machines. If the server is lwIP, the performance is on par with
the linux IP stack. If the client is lwIP, it is super slow. It seems
that lwIP only sends data after a long timeout (it does not seem tied
to tcp_slow/fast_tmr). Also, in the linux/linux case, the CPU
utilization goes way up, but in the lwip-to-linux case, the CPU stays low.
Is there any easy way to change this behavior? Attached is some simple
example code.
A short packet capture (e.g. using wireshark) in the slow case would be
helpful.

Kieran
Rishi Khan
2008-05-13 13:07:29 UTC
I have actually fixed the send problem, and now recv is the slower
part. To fix the send problem, I changed the following options:
// this is to remove memory constraints for now
#define MEM_LIBC_MALLOC 1
// ran out a few times, so upped
#define MEMP_NUM_TCPIP_MSG_INPKT (16 * 10)
// same here
#define PBUF_POOL_SIZE (100 * 10)
// why is this originally 128? seems inefficient
#define PBUF_POOL_BUFSIZE 1500
// why was this 1024?
#define TCP_MSS 1500
// this is what KILLED the send -- probably could be just 65535
#define TCP_SND_BUF (2048 * 60)
// this is just to adjust for the bigger TCP_SND_BUF
#define MEMP_NUM_TCP_SEG (2 * TCP_SND_QUEUELEN)
// increase recv window
#define TCP_WND (8096 * 8)
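
A note on the TCP_MSS value: assuming plain Ethernet with a 1500-byte MTU and IPv4 and TCP headers without options (20 bytes each), the largest TCP payload that fits in one frame is 1460 bytes, so an MSS of 1500 may be larger than what can go out unfragmented. The arithmetic, as a minimal sketch (the helper name mss_for_mtu is made up for illustration and is not part of lwIP):

```c
/* Assumed header sizes: IPv4 and TCP, both without options. */
enum { IPV4_HEADER_LEN = 20, TCP_HEADER_LEN = 20 };

/* Largest TCP payload that fits in a single IP packet of the given MTU. */
static int mss_for_mtu(int mtu)
{
    return mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN;
}
```

For example, mss_for_mtu(1500) gives 1460.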

The original send problem turned out to be that lwIP would send one
packet, wait for the ACK, then send the next. This was because
TCP_SND_BUF was only 2048 in the unixsim contrib folder, which I used
as a basis.
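
To illustrate why a small send buffer stalls the sender: the stack cannot have more unacknowledged data outstanding than fits in the send buffer, so throughput is bounded by roughly one buffer per round trip. The sketch below is only a back-of-the-envelope model, not lwIP code; the function names and the round-trip time parameter are assumptions made for illustration.

```c
#include <stdint.h>

/* Whole MSS-sized segments that fit in the send buffer at once. */
static uint32_t segments_in_flight(uint32_t snd_buf, uint32_t mss)
{
    return snd_buf / mss;
}

/* Upper bound on throughput in bytes per second when at most snd_buf
 * bytes can be unacknowledged per round trip. rtt_us is the round-trip
 * time in microseconds and must be non-zero. */
static uint64_t max_bytes_per_sec(uint32_t snd_buf, uint32_t rtt_us)
{
    return (uint64_t)snd_buf * 1000000u / rtt_us;
}
```

With a 2048-byte buffer and a 1500-byte MSS, segments_in_flight returns 1, which matches the one-packet-then-wait pattern. With the enlarged buffer of 2048 * 60 bytes it returns 81.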

My test program linux-to-linux gets 117.8MB/s (940Gbps). It gets
113.2MB/s with TCP window scaling off, but this is off topic.
Linux_send-lwip_recv gets 923.3MB/s (738Gbps). Linux_send-lwip_recv
gets 92.3MB/s (738Gbps). lwip_both gets 92.3MB/s (738Gbps).

Linux is at 192.168.0.1. lwIP is at 192.168.0.9. (When lwIP is both the
sender and the receiver, lwIP is at 192.168.0.2.)

It seems that on the recv side, LWIP sends multiple requests for TCP
window_size updates. The window size drops significantly and
fluctuates between 30K and 60K. Also, there seems to be a lot of PSH
flags when lwip sends. The packets are truncated, but the data
payload is just '0x7a7a7a7 ....'

Finally, somehow linux can send packets bigger than the 1500 MTU. How
is this possible? Is it fragmenting the packets? The IP flags say
'don't fragment'.

Attached are the wireshark files of the first 500 packets of each
method. The "recv_window_update_500" file is the lwip_recv packet dump
filtered for tcp window updates only (first 500 times). lwip_both is
when both client and server are lwip.

Any ideas?
Kieran Mansley
2008-05-13 13:41:25 UTC
Post by Rishi Khan
The original send problem turned out that it would send one packet,
wait for the ack, then send the next. This was because the
TCP_SND_BUF was only 2048 in the unixsim contrib folder, which I used
as a basis.
The values used in the examples are often very conservative defaults,
and not tuned for performance. lwIP is in general targeted at low
memory systems rather than high performance ones, which is why the
defaults lean in that direction. Some time ago someone posted a list of
sensible values to use for performance. These may have made it onto the
lwIP wiki, but I'm not sure. Searching the mailing list archives might
help if not.
Post by Rishi Khan
My test program linux-to-linux gets 117.8MB/s (940Gbps). It gets
113.2MB/s with TCP window scaling off, but this is off topic.
Linux_send-lwip_recv gets 923.3MB/s (738Gbps). Linux_send-lwip_recv
gets 92.3MB/s (738Gbps). lwip_both gets 92.3MB/s (738Gbps).
I think you've confused Gbps and Mbps, and got a decimal place or two in
the wrong place in those figures, but I think I can make sense of them.
It looks like you're getting pretty good performance from lwIP. I
wouldn't expect it to beat the native Linux stack.
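
As a quick check on the units, and assuming MB means 10^6 bytes and Mb means 10^6 bits, multiplying the MB/s figures by 8 gives values that line up with the numbers in parentheses if those are read as Mbps. A minimal sketch (the function name is made up for illustration):

```c
/* Convert megabytes per second (10^6 bytes) to megabits per second
 * (10^6 bits). */
static double mbytes_to_mbits(double mbytes_per_sec)
{
    return mbytes_per_sec * 8.0;
}
```

Here 117.8 MB/s comes out at about 942.4 Mbps and 92.3 MB/s at about 738.4 Mbps.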
Post by Rishi Khan
It seems that on the recv side, LWIP sends multiple requests for TCP
window_size updates.
I think we have an outstanding bug about window updates. Might be worth
checking to see if that matches your observed behaviour.
Post by Rishi Khan
The window size drops significantly and
fluctuates between 30K and 60K. Also, there seems to be a lot of PSH
flags when lwip sends. The packets are truncated, but the data
payload is just '0x7a7a7a7 ....'
I'll try and find some time to take a look at your packet captures.
Post by Rishi Khan
Finally, somehow linux can send packets bigger than the 1500 MTU. How
is this possible? Is it fragmenting the packets? The IP flags say
'don't fragment'.
Most likely Linux is doing segmentation offload to your NIC, i.e.
passing a larger-than-MTU socket buffer to the NIC, which then fragments
it into MTU-sized chunks. This is a pretty standard feature in modern
NICs to reduce the overheads on the network stack.
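
With segmentation offload the split happens at the TCP level rather than through IP fragmentation, which would be consistent with the 'don't fragment' flag being set, and a capture taken on the sending host sees the buffer before the NIC has split it. The sketch below only illustrates the arithmetic of cutting a large buffer into MSS-sized segments. It is not how any particular NIC or the Linux stack implements it, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Number of segments needed to carry len bytes when each segment holds
 * at most mss bytes (the last segment may be shorter). */
static size_t segment_count(size_t len, size_t mss)
{
    return (len + mss - 1) / mss;
}

/* Payload length of segment idx (0-based) when len bytes are cut into
 * mss-sized pieces. Returns 0 if idx is past the last segment. */
static size_t segment_len(size_t len, size_t mss, size_t idx)
{
    size_t offset = idx * mss;
    if (offset >= len) {
        return 0;
    }
    size_t remaining = len - offset;
    return remaining < mss ? remaining : mss;
}
```

For a 65536-byte buffer and a 1460-byte MSS, this gives 45 segments: 44 full ones and a final segment of 1296 bytes.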

Thanks

Kieran
Rishi Khan
2008-05-13 14:24:01 UTC
Post by Kieran Mansley
Post by Rishi Khan
The original send problem turned out that it would send one packet,
wait for the ack, then send the next. This was because the
TCP_SND_BUF was only 2048 in the unixsim contrib folder, which I used
as a basis.
The values used in the examples are often very conservative defaults,
and not tuned for performance. lwIP is in general targeted at low
memory systems rather than high performance ones, which is why the
defaults lean in that direction. Some time ago someone posted a list of
sensible values to use for performance. These may have made it onto the
lwIP wiki, but I'm not sure. Searching the mailing list archives might
help if not.
I'll check. Thanks.
Post by Kieran Mansley
Post by Rishi Khan
My test program linux-to-linux gets 117.8MB/s (940Gbps). It gets
113.2MB/s with TCP window scaling off, but this is off topic.
Linux_send-lwip_recv gets 923.3MB/s (738Gbps). Linux_send-lwip_recv
gets 92.3MB/s (738Gbps). lwip_both gets 92.3MB/s (738Gbps).
I think you've confused Gbps and Mbps, and got a decimal place or two in
the wrong place in those figures, but I think I can make sense of them.
It looks like you're getting pretty good performance from lwIP. I
wouldn't expect it to beat the native Linux stack.
Sorry if there is any confusion here. For clarification, I mean MB is
megabytes or 1 million bytes. Gb is Gigabits or 1 billion bits or 125MB.
Post by Kieran Mansley
Post by Rishi Khan
It seems that on the recv side, LWIP sends multiple requests for TCP
window_size updates.
I think we have an outstanding bug about window updates. Might be worth
checking to see if that matches your observed behaviour.
Post by Rishi Khan
The window size drops significantly and
fluctuates between 30K and 60K. Also, there seems to be a lot of PSH
flags when lwip sends. The packets are truncated, but the data
payload is just '0x7a7a7a7 ....'
I'll try and find some time to take a look at your packet captures.
Post by Rishi Khan
Finally, somehow linux can send packets bigger than the 1500 MTU. How
is this possible? Is it fragmenting the packets? The IP flags say
'don't fragment'.
Most likely Linux is doing segmentation offload to your NIC. i.e.
passing a larger than MTU socket buffer to the NIC, which then fragments
it into MTU sized chunks. This is a pretty standard feature in modern
NICs to reduce the overheads on the network stack.
I guess I could do the same?

Thanks.

Rishi
Kieran Mansley
2008-05-13 14:49:04 UTC
Post by Rishi Khan
Post by Kieran Mansley
Most likely Linux is doing segmentation offload to your NIC. i.e.
passing a larger than MTU socket buffer to the NIC, which then fragments
it into MTU sized chunks. This is a pretty standard feature in modern
NICs to reduce the overheads on the network stack.
I guess I could do the same?
If you're writing a driver to work with the Linux TCP/IP stack, yes. If
you're writing a driver to work with lwIP, not without some changes.
lwIP doesn't currently support segmentation offload.

Kieran
