Background
Recently, while debugging an issue with a multi-threaded daemon and nanosleep(2), I was surprised to find that the single-threaded version of the daemon ran four times as fast (in wall-clock time) as the multi-threaded version. (The system and user times were the same.) I was surprised because there was very little locking: only the call to FCGX_Accept was synchronized, and the protected section of code was very fast--no file reads, no database calls, just memory access.
Benchmark
- Created a FastCGI daemon that responds with a big web page (94 kB).
- Ran http_load with ten clients in parallel, making requests as fast as they could for 60 seconds.
Results:
- 15,473 total fetches
- 258 fetches/second
- 2.5 x 10^7 bytes/second
- 0.23 mean msecs/connect
- 37.3 mean msecs/first-response
- 15,473 HTTP 200 responses
Benchmark With Slow Client
The second test ran the same http_load call, but this time after first starting, in another console, a second http_load call throttled to receive approximately 1,000 bytes/second. (This is much slower than a modem, which can generally manage at least 33,000 bits/second.) Note that the http_load call to do this is not documented in the manual page; it is:
$ http_load -Throttle 10000 -parallel 5 -seconds 60 urls

Results for the full-speed http_load:
- 13,757 fetches
- 229 fetches/second
- 2.2 x 10^7 bytes/second
- 0.23 mean msecs/connect
- 41.8 mean msecs/first-response
- 13,757 HTTP 200 responses
Results for the throttled http_load:
- 1 fetch
- 0.01 fetches/second (ran for 90 seconds)
- 1,079 bytes/second
- 0.61 mean msecs/connect
- 4.2 mean msecs/first-response
- 1 HTTP 200 response
Conclusion
After writing this up, I think the test is in fact worthless! Here's why: watching netstat, I can see that the Recv-Q of the socket fills up when running the throttled http_load. I think this means the data has already been transmitted over the network (well, between sockets--the test was run on a single machine) and is queued somewhere in the operating system on the receiving side. This is not an accurate test of a slow network. Drat, back to the drawing board.
Addendum
After thinking about this a bit, I took a different approach:
- For the throttled client, create a modified http_get utility that uses a socket with a minimal receive buffer (512 bytes) and sleeps one second between each read call.
- Run the full-speed http_load at the same time.
In fact, I suspect that with Lighttpd, even if SO_SNDBUF is not large enough to hold the entire response, the fast clients will still fly. Lighttpd uses a select loop and multiple send sockets. If a send socket cannot hold the entire response and is backed up sending the data it does have to the client, I would expect the select loop to simply continue serving the other sockets--those that are ready to send more data and are connected to faster clients.