Sunday, April 5, 2009

Will a slow client bog down a single-threaded server?

Single-threaded code is attractive: it is shorter and much less complex. But if a single-threaded FastCGI daemon is really fast, what happens when a really slow connection shows up, for example, a web browser on a slow modem? Do other clients have to wait? I expected the answer to be no, but I wanted to test to be sure.

Background

Recently, while debugging an issue with a multi-threaded daemon and nanosleep(2), I was surprised to find that the single-threaded version of the daemon ran four times as fast (in wall-clock time) as the multi-threaded version. (The system and user times were the same.) That surprised me because there was very little locking: only the call to FCGX_Accept was synchronized, and the protected section of code was extremely fast, with no file reads and no database calls, just memory access.
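For context, the single-threaded daemon is just an accept loop. Here is a minimal sketch, assuming the fcgi development library (fcgiapp.h); it is not the actual daemon from the test, and the filler loop merely approximates the 94 kB response:

/* Minimal single-threaded FastCGI responder (sketch). A multi-threaded
 * version would run this loop in several threads, using FCGX_Accept_r
 * guarded by a mutex instead of FCGX_Accept. */
#include <fcgiapp.h>

int main(void)
{
    FCGX_Stream *in, *out, *err;
    FCGX_ParamArray envp;

    /* Blocks until the web server hands us the next request. */
    while (FCGX_Accept(&in, &out, &err, &envp) >= 0) {
        FCGX_FPrintF(out, "Content-Type: text/html\r\n\r\n");
        /* Emit roughly 94 kB of filler, like the benchmark page. */
        for (int i = 0; i < 2000; i++)
            FCGX_FPrintF(out, "<p>filler line %04d of the big test page</p>\n", i);
    }
    return 0;
}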

Initial Benchmark

  1. Created a FastCGI daemon that responds with a big web page (94 kB).
  2. Ran http_load with ten clients in parallel, making requests as fast as they could for 60 seconds.
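For reference, the full-speed run used the standard, documented http_load options (urls is a file listing the test page's URL):
$ http_load -parallel 10 -seconds 60 urls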
This gave me an initial benchmark:
  • 15,473 total fetches
  • 258 fetches/second
  • 2.5 x 10^7 bytes/second
  • 0.23 mean msecs/connect
  • 37.3 mean msecs/first-response
  • 15,473 HTTP 200 responses

Benchmark With Slow Client

The second test ran the same http_load call, but this time after starting a second http_load in another console, throttled so that the client received approximately 1,000 bytes/second. (This is much slower than even a dial-up modem; a 33.6 kbps modem can move roughly 4,000 bytes/second.) Note that the http_load option to do this is not documented in the manual page; the call is:
$ http_load -Throttle 10000 -parallel 5 -seconds 60 urls
Results:
  • 13,757 fetches
  • 229 fetches/second
  • 2.2 x 10^7 bytes/second
  • 0.23 mean msecs/connect
  • 41.8 mean msecs/first-response
  • 13,757 HTTP 200 responses
Here's the result from the throttled client:
  • 1 fetch
  • 0.01 fetches/second (ran for 90 seconds)
  • 1,079 bytes/second
  • 0.61 mean msecs/connect
  • 4.2 mean msecs/first-response
  • 1 HTTP 200 response
It was throttled so much that it could only complete one request during the entire 90 seconds.

Conclusion

After writing this up, I think the test is in fact worthless! Here's why: watching netstat, I can see that the Recv-Q of the socket fills up when running the throttled http_load. That means the data has already been transmitted over the network (well, between sockets; the test was run on a single machine) and is simply queued in the operating system's receive buffer, so the server was never actually waiting on the slow client. This is not an accurate test of a slow network. Drat, back to the drawing board.
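For anyone watching along, the queues appear in the Recv-Q and Send-Q columns of an ordinary netstat listing; on Linux, for example:
$ netstat -tn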

Addendum

After thinking about this a bit, I took a different approach:
  • for the throttled client, create a modified http_get utility that uses a socket with a minimal receive buffer (512 bytes) and sleeps one second between each read call (sketched below)
  • run the full-speed http_load at the same time.
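Here is a sketch of that throttled client, with a made-up server address (127.0.0.1:8080) and path (/big.html). The two essential moves are shrinking SO_RCVBUF before connecting, so the kernel advertises a tiny TCP window, and sleeping between reads:

/* Pathologically slow HTTP client (sketch). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int bufsize = 512;  /* minimal receive buffer; set before connect */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    const char req[] = "GET /big.html HTTP/1.0\r\n\r\n";
    write(fd, req, sizeof(req) - 1);

    /* Drain at most ~512 bytes per second; the server's send
     * queue backs up for real while we dawdle. */
    char buf[512];
    while (read(fd, buf, sizeof(buf)) > 0)
        sleep(1);

    close(fd);
    return 0;
}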
This seems like a decent simulation of a (really!) slow client connection. Watching netstat shows one socket with tens of thousands of bytes in its send queue (the web server) and another with 192 bytes in its receive queue (the throttled client). And the full-speed http_load ran just fine, processing 13,000 fetches in 60 seconds and transferring 2 x 10^7 bytes/second. So as long as the socket send buffer on the server is big enough to hold the entire response, there is no problem with using a single-threaded server, even for pathologically slow client connections: the daemon writes the whole response into the kernel's send buffer, moves on to the next request, and leaves the kernel to dribble the bytes out to the slow client.

Addendum 2

In fact, I suspect that with lighttpd, even if SO_SNDBUF is not large enough to hold the entire response, it will still fly.

lighttpd uses a select loop over many client sockets; if one socket's send buffer cannot hold the entire response and is backed up sending the data it does have to a slow client, I would expect the select loop to simply continue serving the other sockets, the ones connected to faster clients, that are ready to accept more data.
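To make that concrete, here is a toy version of the pattern; the data structures are invented for illustration and this is not lighttpd's actual code. The sockets are assumed non-blocking, each with a pending response; a slow client simply stays unwritable while the loop keeps feeding everyone else:

/* Toy select() write loop (sketch). fds[i] is a non-blocking client
 * socket, resp[i]/len[i] its pending response, off[i] bytes sent so far. */
#include <sys/select.h>
#include <unistd.h>

void pump(int *fds, const char **resp, size_t *off, size_t *len, int nfds)
{
    for (;;) {
        fd_set wset;
        FD_ZERO(&wset);
        int maxfd = -1, pending = 0;
        for (int i = 0; i < nfds; i++) {
            if (off[i] < len[i]) {  /* still has data to deliver */
                FD_SET(fds[i], &wset);
                if (fds[i] > maxfd) maxfd = fds[i];
                pending = 1;
            }
        }
        if (!pending)
            return;  /* every response fully delivered */

        /* Wait until at least one client can absorb more data. */
        if (select(maxfd + 1, NULL, &wset, NULL, NULL) <= 0)
            continue;

        for (int i = 0; i < nfds; i++) {
            if (off[i] < len[i] && FD_ISSET(fds[i], &wset)) {
                ssize_t n = write(fds[i], resp[i] + off[i], len[i] - off[i]);
                if (n > 0)
                    off[i] += (size_t)n;
                else if (n < 0)
                    off[i] = len[i];  /* error: give up on this client */
            }
        }
    }
}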
