Cross-Platform Testing of SO_LINGER

Posted by Nybek on Mar 05, 2015

Introduction

In this post we look at the effects of setting SO_LINGER on various platforms. We assume you know how to set SO_LINGER and, in general, what it's supposed to do. The authoritative text on the subject is UNIX Network Programming, Vol 1 (3rd edition), and you will find a couple of extracts from it here. All the tests detailed here were carried out on blocking sockets. If you are interested in test results on non-blocking sockets, please see our follow-up post.

As a quick summary of our findings, the test results suggest the following advice if you want to write portable code using SO_LINGER:

  • To abort a connection, set SO_LINGER to {on, 0} on a Connected Socket, not on a Listening Socket. If you're developing for Windows, don't call shutdown() before close() as this causes Windows (both native and under Cygwin) to completely ignore the SO_LINGER setting.

  • Don't set SO_LINGER to {on, N} where N > 0.
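As a minimal sketch of the first piece of advice, assuming a POSIX sockets API, aborting a connection comes down to one setsockopt() call followed by close(). The helper name abort_connection() is hypothetical and not part of linger-tools; on Windows you would call closesocket() instead and cast the option pointer to const char *.

```c
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_LINGER, struct linger */
#include <unistd.h>       /* close */

/* Hypothetical helper: abort a connection by closing it with SO_LINGER
 * set to {on, 0}, so the kernel discards unsent data and sends an RST.
 *
 * Pass the Connected Socket (the descriptor returned by accept()), not
 * the Listening Socket, and do not call shutdown() beforehand: on Windows
 * (native and Cygwin) a preceding shutdown() makes SO_LINGER ineffective.
 *
 * Returns 0 on success and -1 on error (errno set). If setsockopt() fails,
 * the descriptor is left open so the caller can decide what to do. */
int abort_connection(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;    /* linger "on"...                         */
    lg.l_linger = 0;   /* ...with a zero timeout: abort on close  */

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1)
        return -1;
    return close(fd);
}
```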

A brief note on terminology before we proceed. Throughout the text we use the term Listening Socket to refer to the socket (file descriptor) that is returned by the call to socket() and then passed to listen(). This is in contrast to the term Connected Socket, used to refer to the socket (file descriptor) returned from accept().


Overview of Tested Platforms

The tests were carried out on 11 different platforms, but since operating systems from the same families produced the same results, we have grouped these results into just 6 groups. These are shown in the left-hand column of the following table:

Test Platforms

| OS Family | OS | Output of 'uname -a' (or similar) |
|---|---|---|
| BSD | FreeBSD | FreeBSD freebsd 10.1-RELEASE-p5 FreeBSD 10.1-RELEASE-p5 #0: Tue Jan 27 08:55:07 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 |
| BSD | NetBSD | NetBSD netbsd.nybek.com 6.1.5 NetBSD 6.1.5 (GENERIC) amd64 |
| BSD | OpenBSD | OpenBSD openbsd.nybek.com 5.5 GENERIC#271 amd64 |
| Darwin | OS X Yosemite | Darwin darwin.nybek.com 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64 |
| Illumos (OpenSolaris) | OmniOS | SunOS omnios 5.11 omnios-10b9c79 i86pc i386 i86pc |
| Illumos (OpenSolaris) | OpenIndiana | SunOS openindiana 5.11 oi_151a8 i86pc i386 i86pc Solaris |
| Linux | CentOS 7 | Linux centos7 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Linux | Debian Wheezy | Linux wheezy 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux |
| Linux | Debian Jessie | Linux jessie 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux |
| Windows (Cygwin) | Windows 7 (Cygwin) | CYGWIN_NT-6.1 win7 1.7.33-2(0.280/5/3) 2014-11-13 15:47 x86_64 Cygwin |
| Windows (Native) | Windows 7 | Windows 7 (64-bit) Service Pack 1 (kernel: 6.1.7601.22923) |

The text is written as though the sockets API on all the platforms is the same. The API on Windows, however, is slightly different. So, whenever we refer to close() or errno in the text, we are also speaking about the Windows equivalents: closesocket() and WSAGetLastError().
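One way to keep a single code path across both APIs is a thin shim. The names below (sock_t, sock_close, sock_errno, SOCK_EWOULDBLOCK, sock_set_linger, close_timed_out) are hypothetical, and the Windows branch is only a sketch that assumes WSAStartup() has already been called and that the program links against ws2_32.

```c
#ifdef _WIN32
#include <winsock2.h>              /* closesocket, WSAGetLastError, setsockopt */
typedef SOCKET sock_t;
#define sock_close(s)     closesocket(s)
#define sock_errno()      WSAGetLastError()
#define SOCK_EWOULDBLOCK  WSAEWOULDBLOCK
#else
#include <errno.h>                 /* errno, EWOULDBLOCK */
#include <sys/socket.h>            /* setsockopt, struct linger */
#include <unistd.h>                /* close */
typedef int sock_t;
#define sock_close(s)     close(s)
#define sock_errno()      (errno)
#define SOCK_EWOULDBLOCK  EWOULDBLOCK
#endif

/* setsockopt() takes a const char * on Windows and a const void * on
 * POSIX systems; the cast below is accepted by both. */
static int sock_set_linger(sock_t s, int onoff, int linger_secs)
{
    struct linger lg;

    lg.l_onoff = onoff;            /* u_short on Windows, int on POSIX */
    lg.l_linger = linger_secs;
    return setsockopt(s, SOL_SOCKET, SO_LINGER, (const char *)&lg, sizeof lg);
}

/* Non-zero if a sock_close() that returned rc failed because the linger
 * timeout expired (the EWOULDBLOCK case described by UNP). */
static int close_timed_out(int rc)
{
    return rc != 0 && sock_errno() == SOCK_EWOULDBLOCK;
}
```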

Results for SO_LINGER set to {on, 0}

According to UNIX Network Programming (page 202), we can expect the following:

If l_onoff is nonzero and l_linger is zero, TCP aborts the connection when it is closed (pp. 1019-1020 of TCPv2). That is, TCP discards any data still remaining in the socket send buffer and sends an RST to the peer, not the normal four-packet connection termination sequence...

Good news: aside from a couple of exceptions, this setting works as expected on all the platforms we tested. The exceptions are:

  1. On the BSDs and Darwin, setting SO_LINGER to {on, 0} on a Listening Socket does not have the desired effect. Rather than aborting the connection immediately, the call to close() blocks until either: (i) the kernel manages to send the whole payload followed by a FIN (and presumably until it actually receives an ACK for the FIN); or, (ii) 120 seconds elapse. After the 120-second interval, close() returns without aborting the connection (on the BSDs it returns -1 with errno set to EWOULDBLOCK; on Darwin it returns 0), and the kernel continues to manage the socket as though SO_LINGER was never set.

  2. On Windows, both native and under Cygwin, a call to shutdown() renders the SO_LINGER setting ineffective. So, if we set SO_LINGER to {on, 0} on a Windows socket, then call shutdown(), and then call close() before we have read an EOF off the socket, the connection will not be aborted. The effect will be as though SO_LINGER was never set.

The following table summarises the results:

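Effect of SO_LINGER set to {on, 0}

| Platform | Abort on close(): Listening Socket | Abort on close(): Connected Socket | Abort on close() after shutdown(): Listening Socket | Abort on close() after shutdown(): Connected Socket |
|---|---|---|---|---|
| BSD | No | Yes | No | Yes |
| Darwin | No | Yes | No | Yes |
| Illumos | Yes | Yes | Yes | Yes |
| Linux | Yes | Yes | Yes | Yes |
| Windows (Cygwin) | Yes | Yes | No | No |
| Windows (Native) | Yes | Yes | No | No |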

Results for SO_LINGER set to {on, 20}

This is how the setting is described in UNIX Network Programming (page 203):

If l_onoff is nonzero and l_linger is nonzero, then the kernel will linger when the socket is closed (p.472 of TCPv2). That is, if there is any data still remaining in the socket send buffer, the process is put to sleep until either: (i) all the data is sent and acknowledged by the peer TCP, or (ii) the linger time expires. If the socket has been set to nonblocking (Chapter 16), it will not wait for the close to complete, even if the linger time is nonzero. When using this feature of the SO_LINGER option, it is important for the application to check the return value from close, because if the linger time expires before the remaining data is sent and acknowledged, close returns EWOULDBLOCK and any remaining data in the send buffer is discarded.

Certainly not the clearest description in the book. Nevertheless, since we are ignoring nonblocking sockets here, we can distil this passage down to the following three behaviours:

  • Block on close: The call to close() blocks until all the data is sent and ACK'd, or until the linger time expires.
  • Abort on linger timeout: "if the linger time expires before the remaining data is sent and acknowledged... any remaining data in the send buffer is discarded". We also assume that an RST is then sent to the peer, though the above passage does not actually mention this.
  • Return EWOULDBLOCK on linger timeout: "if the linger time expires before the remaining data is sent and acknowledged, close returns EWOULDBLOCK".
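UNP's advice to check the return value of close() can be sketched as follows, together with a timer around the call of the kind needed to fill in the "Time spent blocking" column of the next table. linger_close() and enum close_outcome are hypothetical names, and we are assuming a POSIX system with clock_gettime(); this is not necessarily how linger-server took its measurements.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime, CLOCK_MONOTONIC */
#include <errno.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical classification of close() on a socket with SO_LINGER {on, N}. */
enum close_outcome {
    CLOSE_OK,             /* close() returned 0 (not proof the data was delivered) */
    CLOSE_LINGER_TIMEOUT, /* -1 with EWOULDBLOCK: how the BSDs report a linger timeout */
    CLOSE_FAILED          /* any other failure; errno is preserved */
};

/* Set SO_LINGER to {on, linger_secs}, close fd, and store in *elapsed_secs
 * how long close() blocked. If setsockopt() fails, fd is left open. */
enum close_outcome linger_close(int fd, int linger_secs, double *elapsed_secs)
{
    struct linger lg = { .l_onoff = 1, .l_linger = linger_secs };
    struct timespec start, end;
    int rc, saved_errno;

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1)
        return CLOSE_FAILED;

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = close(fd);
    saved_errno = errno;                  /* clock_gettime() may clobber errno */
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed_secs = (double)(end.tv_sec - start.tv_sec)
                  + (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    errno = saved_errno;
    if (rc == 0)
        return CLOSE_OK;
    return saved_errno == EWOULDBLOCK ? CLOSE_LINGER_TIMEOUT : CLOSE_FAILED;
}
```

Going by the results that follow, a check like this only helps on the BSDs: on the other platforms close() returns 0 even when the linger timeout expires, so CLOSE_OK does not by itself prove that the peer received all the data.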

So how did our test platforms measure up?

Effect of SO_LINGER set to {on, 20}

| Platform | Block on close() | Time spent blocking (secs) | Abort on linger timeout | Return EWOULDBLOCK on linger timeout |
|---|---|---|---|---|
| BSD | Yes | 20 | No | Yes |
| Darwin | Yes | 0.2 | No | No |
| Illumos | Yes | 20 | No | No |
| Linux | Yes | 20 | No | No |
| Windows (Cygwin) | Yes * | Until FIN is ACK'd | No | No |
| Windows (Native) | Yes * | 20 | Yes | No |

* On Windows, if close() is preceded by a call to shutdown(), the answer here becomes "No".

Two points are worth noting about the results:

  1. Calling shutdown() before close() renders the SO_LINGER setting ineffective on Windows (native and under Cygwin). This is the same problem we saw above with SO_LINGER set to {on, 0}.

  2. Unlike the case where the linger timeout was set to zero (above), the BSDs and Darwin did not show any difference in behaviour between setting SO_LINGER on a Listening Socket and on a Connected Socket.

Results by Platform

BSD

Setting SO_LINGER to {on, 0} on a Connected Socket will have the expected behaviour: it will cause the connection to be aborted when close() is called. See tests #04 and #22.

Setting SO_LINGER to {on, 0} on a Listening Socket has a very different effect; namely, it causes close() to block until either (i) the full payload + FIN is sent (and presumably until it's acknowledged), or (ii) 120 seconds elapses. After the 120 second timeout, the connection will still not abort. Rather, close() will return with -1, errno will be set to EWOULDBLOCK and the kernel will continue to manage the socket as though SO_LINGER was never set. See tests #03, #20, #33 and #34.

When SO_LINGER is set to {on, 20}, close() will block until either (i) the full payload in the socket send buffer is sent to the peer and acknowledged, or (ii) 20 seconds elapse. After the 20-second interval, close() will return -1, errno will be set to EWOULDBLOCK and the kernel will continue to manage the socket in the background as though SO_LINGER was never set. With this setting, it makes no difference whether you set it on a Connected Socket or a Listening Socket. See tests #06, #09, #26 and #30.

You may notice from the test result captures that FreeBSD appears to act differently from NetBSD and OpenBSD in many of the tests. This has nothing to do with SO_LINGER. The tests in which FreeBSD's results differ are those where the client sat idle for 5 minutes or more without reading from the connection. When FreeBSD does not see any traffic on the connection for exactly 60 seconds, it aborts the connection. This usually happens about 150 seconds after the connection is established, sometimes later, occasionally a lot later. You can see this behaviour in most of the tests that have the label idle-client in their names. See tests #01, #05, #08, etc.

From the tcpdump traces of test #21, you can see that NetBSD and OpenBSD send an RST to the peer after the connection has been cleanly closed. Our guess is that the RST is sent when close() is called, even though by then the connection has already been terminated by a shutdown() call and a FIN from the peer. The same unusual behaviour is also seen on Darwin and Illumos in test #21.

Darwin

As you will see, the implementation of SO_LINGER on Darwin is quite an unusual case. Its behaviour when SO_LINGER is set to {on, 0} is the same as that of the BSDs, except that close() returns 0 where the BSDs return -1 with errno set to EWOULDBLOCK. Darwin's behaviour when SO_LINGER is set to {on, 20} is wholly original.

Setting SO_LINGER to {on, 0} on a Connected Socket will cause the connection to be aborted when close() is called. See tests #04 and #22.

Setting SO_LINGER to {on, 0} on a Listening Socket causes close() to block until either (i) the full payload + FIN is sent (and presumably until it's acknowledged), or (ii) 120 seconds elapses. After the 120 second timeout, the connection will still not abort. Rather, close() will return with 0 and the kernel will continue to manage the socket as though SO_LINGER was never set. See tests #03, #20, #33 and #34.

Setting SO_LINGER to {on, 20} on either a Listening Socket or a Connected Socket will cause close() to block for 0.2 seconds before returning 0. The kernel then continues to manage the socket as though SO_LINGER was never set. See tests #06, #09, #26 and #30.

Darwin exhibits the same behaviour as NetBSD and OpenBSD with respect to test #21. Please read the end of the BSD section for an understanding of what this means.

Illumos (OpenSolaris)

Aside from a couple of minor quirks, SO_LINGER on Illumos has the same behaviour as the Linux implementation.

Setting SO_LINGER to {on, 0} causes the connection to be aborted when close() is called. See tests #03, #04, #20, #22, #33 and #34.

When SO_LINGER is set to {on, 20}, a call to close() will block until either (i) the full payload in the socket send buffer is sent to the peer and acknowledged, or (ii) 20 seconds elapses. After the 20 second interval, close() will return with 0 and the kernel will continue to manage the socket in the background as though SO_LINGER was never set. See tests #06, #09, #26 and #30.

The first difference between Illumos and the other platforms is that it won't allow you to set SO_LINGER on a Connected Socket if you have already called shutdown() on the socket. Any attempt to do so is met with an EINVAL. See tests #31 and #32.
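Portable code can avoid this EINVAL by applying SO_LINGER before calling shutdown(), as in the minimal sketch below (helper names hypothetical, POSIX API assumed). Note that the ordering does nothing for the Windows problem described earlier: there, any shutdown() before close() still makes the kernel ignore SO_LINGER.

```c
#include <sys/socket.h>   /* setsockopt, getsockopt, shutdown, struct linger */

/* Hypothetical helper: apply SO_LINGER, then half-close the connection.
 * The option is set *before* shutdown() because Illumos rejects
 * setsockopt(SO_LINGER) with EINVAL once shutdown() has been called.
 * Returns 0 on success, -1 on error (errno set). */
int linger_then_shutdown(int fd, int onoff, int linger_secs)
{
    struct linger lg = { .l_onoff = onoff, .l_linger = linger_secs };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1)
        return -1;
    return shutdown(fd, SHUT_WR);   /* stop sending; we can still read */
}

/* Hypothetical helper: read back the current SO_LINGER setting. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof *out;

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```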

Another quirk of the Illumos implementation occurs when SO_LINGER is set to {on, 0} on the socket and shutdown() is called. We call shutdown(), and after a time the peer reads an EOF off the socket and sends its own FIN. The unusual behaviour is that, after ACKing the peer's FIN, the Illumos kernel sends an RST. You can see this behaviour in the tcpdump traces of tests #19 and #21. The traces for test #21 show a similar behaviour for NetBSD, OpenBSD and Darwin.

When monitoring connections on an Illumos-based OS, be aware that the kernel has a different view of TCP states from the other platforms we tested. If you call close() on a connection and the FIN gets stuck in the socket send buffer because the peer is slow or idle, Illumos will consider the TCP connection to be in the ESTABLISHED state. All the other platforms will consider it to be in the FIN_WAIT1 state, even though the FIN has not actually been put on the wire yet. Keep this in mind if you are monitoring connections using netstat or similar tools.

Linux

When you set SO_LINGER to {on, 0} on Linux, the connection is aborted when close() is called. See tests #03, #04, #20, #22, #33 and #34.

When SO_LINGER is set to {on, 20}, a call to close() will block until either (i) the full payload in the socket send buffer is sent to the peer and acknowledged, or (ii) 20 seconds elapses. After the 20 second interval, close() will return with 0 and the kernel will continue to manage the socket in the background as though SO_LINGER was never set. See tests #06, #09, #26 and #30.

On Linux there is no difference between setting SO_LINGER on the Listening Socket and the Connected Socket. Also, calls to shutdown() before close() don't alter the SO_LINGER behaviour.
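The absence of a difference between the two sockets is consistent with Linux copying the SO_LINGER setting from the Listening Socket to each socket returned by accept(). The sketch below, a loopback-only miniature of the Listening Socket {on, 0} test, is one way to check that assumption; lsock_abort_demo() is a hypothetical name. On Linux we would expect the Connected Socket to report l_onoff = 1 and l_linger = 0, and the client's recv() to fail with ECONNRESET because an RST, not a FIN, arrives.

```c
#include <arpa/inet.h>    /* htonl */
#include <errno.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>       /* memset */
#include <sys/socket.h>
#include <unistd.h>       /* close */

/* Hypothetical demo: set SO_LINGER to {on, 0} on the Listening Socket,
 * accept a loopback connection, store the SO_LINGER value reported by the
 * Connected Socket in *seen, then close the Connected Socket.
 * Returns the errno from the client's recv() (0 if recv() saw a clean EOF),
 * or -1 if a setup step failed. */
int lsock_abort_demo(struct linger *seen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    int lfd, cfd, afd = -1, result = -1;
    char byte;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                        /* kernel picks a free port */

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        return -1;
    if (setsockopt(lfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto close_listener;

    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        goto close_listener;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        (afd = accept(lfd, NULL, NULL)) == -1)
        goto close_client;

    len = sizeof *seen;
    if (getsockopt(afd, SOL_SOCKET, SO_LINGER, seen, &len) == -1) {
        close(afd);
        goto close_client;
    }
    close(afd);   /* SO_LINGER was never set on afd directly */

    /* An RST makes recv() fail with ECONNRESET; a FIN makes it return 0. */
    result = (recv(cfd, &byte, 1, 0) == -1) ? errno : 0;

close_client:
    close(cfd);
close_listener:
    close(lfd);
    return result;
}
```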

Windows (Cygwin)

The SO_LINGER implementation on Cygwin is quite unusual. It shares the worst aspect of the native Windows implementation: SO_LINGER settings are ignored when shutdown() is called before close(). And it follows its own path when SO_LINGER is set to {on, N} with N > 0 (so long as shutdown() is not called before close()).

When SO_LINGER is set to {on, 0} the connection is aborted when close() is called. See tests #03, #04 and #33.

When SO_LINGER is set to {on, 0} and shutdown() is called before close(), close() returns immediately with 0 and the kernel continues to manage the socket as though SO_LINGER was never set. See tests #20, #22 and #34.

When SO_LINGER is set to {on, 20}, a call to close() will block until the full payload in the socket send buffer is sent to the peer and acknowledged. In other words, the 20 second timeout is completely ignored. When close() does eventually return, it returns with 0. See tests #06 and #09.

When SO_LINGER is set to {on, 20} and shutdown() is called before close(), the call to close() will return immediately with 0 and the kernel will continue to manage the socket as though SO_LINGER was never set. See tests #26 and #30.

Windows (Native)

At first glance the native Windows implementation of SO_LINGER seems to be close to perfect. Unlike all the other platforms, it does actually abort the TCP connection after the timeout when SO_LINGER is set to {on, N} with N > 0. From reading UNIX Network Programming, this is the behaviour we would expect of all platforms, but only Windows actually does it. Unfortunately, the Windows implementation also has a major flaw: any SO_LINGER setting is completely ignored if shutdown() is called before closesocket().

When SO_LINGER is set to {on, 0} the connection is aborted when closesocket() is called. See tests #03, #04 and #33.

When SO_LINGER is set to {on, 0} and shutdown() is called before closesocket(), closesocket() returns immediately with 0 and the kernel continues to manage the socket as though SO_LINGER was never set. See tests #20, #22 and #34.

When SO_LINGER is set to {on, 20}, a call to closesocket() will block until either (i) the full payload in the socket send buffer is sent to the peer and acknowledged, or (ii) 20 seconds elapses. After the 20 second interval, closesocket() will return with 0 and the connection will be aborted. See tests #06 and #09.

When SO_LINGER is set to {on, 20} and shutdown() is called before closesocket(), the call to closesocket() will return immediately with 0 and the kernel will continue to manage the socket as though SO_LINGER was never set. See tests #26 and #30.

Download Code and Result Data

The code for the command line tools used in the tests is available here: linger-tools.tar.gz

An updated version of this code is available on GitHub: linger-tools

The test result captures, including tcpdump traces, are available here: linger-test-results.tar.gz

Unfortunately, the server-side output of the first 11 tests on the Windows native platform is missing from the download data. The buffer on the Windows command prompt was too small, so only the output from the last 23 tests was captured. The results from the tests were manually recorded as they happened, so the included __results.txt files are accurate. At some point we might repeat these tests for completeness, but this is not currently a priority. The client-side output and the tcpdump traces tell the bulk of the story anyway.

About the Tests

In each test we use two command-line tools: linger-client and linger-server (win-linger-server for native Windows).

The linger-client command is simple. It sets the socket receive buffer (SO_RCVBUF) to 8K to control the TCP window and then it reads from the stream according to its command line options, of which there is only one of significance: -i.

$ linger-client -h
usage: linger-client [-i] hostname
    -h      Print usage and exit.
    -i      Interactive. Require user confirmation before each
            read of the stream.
                      
It has two modes of operation: automatic and interactive. In automatic mode, we start linger-client without any options. This causes the client to read 512 bytes from the stream once every 4 seconds until the 20K payload is fully read (about 160 seconds). Automatic mode is used in every test with slow-client in its name.
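A rough sketch of automatic mode follows. It is not the linger-client source: set_small_rcvbuf() and slow_drain() are hypothetical helpers that mirror the behaviour described above, assuming a POSIX API.

```c
#include <sys/socket.h>   /* setsockopt, recv, SO_RCVBUF */
#include <unistd.h>       /* sleep */

/* Shrink the socket receive buffer (e.g. to 8K) so the advertised TCP
 * window stays small. Call it before connect(): on Linux, for example, the
 * size must be set before connect() or listen() to take full effect, and
 * the kernel doubles the requested value for its own bookkeeping. */
int set_small_rcvbuf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes);
}

/* Automatic mode: read up to `chunk` bytes, pause `delay_secs`, and repeat
 * until EOF. Returns the total number of bytes read, or -1 on error. */
long slow_drain(int fd, size_t chunk, unsigned delay_secs)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    if (chunk == 0 || chunk > sizeof buf)
        chunk = sizeof buf;
    while ((n = recv(fd, buf, chunk, 0)) > 0) {
        total += (long)n;
        if (delay_secs > 0)
            sleep(delay_secs);
    }
    return n == 0 ? total : -1;
}
```

Called as slow_drain(fd, 512, 4) on a socket whose receive buffer was set to 8K before connect(), a 20K payload takes 20480 / 512 = 40 reads, or roughly 40 × 4 = 160 seconds, in line with the figure above.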

Interactive mode (-i) is used in all tests with idle-client or fast-client in their names. To simulate an idle client, we start linger-client in interactive mode (-i), wait for 5 or 30 minutes and then press ENTER until the full stream has been read. To simulate a fast client, one which completes its task before any timeouts expire, we start linger-client in interactive mode (-i) and then, without waiting, press ENTER until the full stream has been read.

The linger-server is also simple, in principle. It accepts a connection from the client, writes a 20K payload to the 50K socket send buffer and then attempts to terminate the connection in accordance with the command line options it was given. The linger-server command has a number of options:

$ linger-server -h 
Usage: linger-server [-s lsock|csock|csock_late] [-t linger_secs] [-w] [-S] [-T eof_wait_secs]
     -h             Print usage and exit.
     -s sock_type   The SO_LINGER option is applied to sock_type:
                        lsock - The listening socket
                        csock - The connected socket
                        csock_late - The connected socket. This requires -S to
                                be effective as it is applied after shutdown().
     -t secs        The SO_LINGER timeout in seconds. This only
                    has an effect when -s is set.
     -w             Wait for user confirmation before exiting.
     -S             Use shutdown() and wait for EOF.
     -T secs        Timeout waiting for EOF after shutdown().
                    Must be > 0.
                      
Combinations of these options will allow us to simulate various scenarios. The name of the test will indicate what specific options were used. For example, test
04__slow-client__linger-csock-0
tells us that the client was called as:
$ linger-client
                      
and the server was called as:
$ linger-server -s csock -t 0
                      
So, this tests what happens when SO_LINGER is set to {on, 0} on a Connected Socket.

As a second example, test:

34__idle-client__linger-lsock-0__shutdown-10
tells us that the client was called as:
$ linger-client -i
                      
and the server was called as:
$ linger-server -s lsock -t 0 -S -T 10
                      
This tests what happens when a client sits idle for 5 minutes while the server sets SO_LINGER to {on, 0} (on the Listening Socket), calls shutdown(), waits 10 seconds for an EOF and finally calls close() when no EOF arrives.
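The server-side ending exercised by this test (shutdown(), a bounded wait for EOF, then close()) might be sketched as below, assuming a POSIX API. wait_for_eof() and shutdown_wait_close() are hypothetical helpers, not the linger-server source, and SO_LINGER is assumed to have been set earlier on whichever socket -s selects. On Windows, recall, the shutdown() call alone is enough to make the kernel ignore that setting.

```c
#include <errno.h>
#include <poll.h>         /* poll, struct pollfd, POLLIN */
#include <sys/socket.h>   /* shutdown, recv */
#include <unistd.h>       /* close */

/* Wait up to timeout_secs for the peer's EOF, discarding any data it sends
 * first. For simplicity the timeout restarts whenever data arrives.
 * Returns 1 on EOF, 0 on timeout, -1 on error. */
static int wait_for_eof(int fd, int timeout_secs)
{
    char buf[512];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ready = poll(&pfd, 1, timeout_secs * 1000);
        if (ready == -1 && errno == EINTR)
            continue;
        if (ready == -1)
            return -1;
        if (ready == 0)
            return 0;                      /* no EOF before the timeout */

        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n == 0)
            return 1;                      /* peer has sent its FIN */
        if (n == -1)
            return -1;
    }
}

/* Roughly what "-S -T eof_wait_secs" describes: half-close, wait for EOF,
 * then close() whether or not the EOF arrived. Returns the wait_for_eof()
 * result (1, 0 or -1), or -1 if shutdown() or close() fails. */
int shutdown_wait_close(int fd, int eof_wait_secs)
{
    int got_eof = -1;

    if (shutdown(fd, SHUT_WR) == 0)
        got_eof = wait_for_eof(fd, eof_wait_secs);
    if (close(fd) == -1)
        return -1;
    return got_eof;
}
```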

If you are unsure of what command line options were used in any particular test, just look in the client-output or server-output directory of the test result data for that test; the commands themselves were captured along with their output.

References

Stevens, W.R., Fenner, B., and Rudoff, A.M. 2004. UNIX Network Programming, Volume 1 (3rd edition): The Sockets Networking API. Addison-Wesley, Boston, Massachusetts.