Golang: Handling tens of thousands of simultaneous HTTP requests even on Raspberry Pi
Hi! Today we will have fun with concurrent HTTP requests and see how to handle a really large number of them with minimal RAM consumption. To make it a bit crazy, I will use a Raspberry Pi 4B as the server and a MacBook Air as the client.
Some theory. HTTP works over TCP, and TCP port numbers range from 0 to 65535. Every connection from the client to the server’s single listening port needs its own local port on the client, so one client address can hold at most ~65k simultaneous TCP connections to that server. Let’s set the goal for today: send 60k requests simultaneously (I mean that we will have 60k active HTTP sessions at the same time), do some work, and respond to the client.
Attempt 1: Writing the server and client
Let’s start with something simple. Let the server receive a request, simulate waiting for a response from a DB (time.Sleep(time.Second * 120)) and respond with a “Hello” string. First of all, we are interested in stats about RAM usage and the current number of HTTP sessions. I will use the beautiful uilive lib to update the terminal with fresh data.
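A minimal sketch of such a server, assuming only the standard library: it prints the stats with fmt instead of uilive, and the names newHandler, stats and selfTest are made up for this illustration. To keep the example runnable anywhere, main serves on a free loopback port and sends a few requests to itself with a short delay; the setup described above would listen on :8080 and sleep for 120 seconds.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// active is the number of HTTP sessions currently being handled.
var active int64

// newHandler holds every request open for delay (a stand-in for a slow DB
// query) and then responds with "Hello".
func newHandler(delay time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&active, 1)
		defer atomic.AddInt64(&active, -1)
		time.Sleep(delay)
		io.WriteString(w, "Hello")
	})
}

// stats reports the session count and the memory the Go runtime has
// obtained from the OS.
func stats() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("active sessions: %d, RAM: %d MiB",
		atomic.LoadInt64(&active), m.Sys>>20)
}

// selfTest (hypothetical helper) serves on a free loopback port, sends n
// concurrent requests to itself and returns how many received "Hello".
func selfTest(n, delayMillis int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	srv := &http.Server{Handler: newHandler(time.Duration(delayMillis) * time.Millisecond)}
	go srv.Serve(ln)
	defer srv.Close()

	var ok int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get("http://" + ln.Addr().String())
			if err != nil {
				return
			}
			defer resp.Body.Close()
			if body, _ := io.ReadAll(resp.Body); string(body) == "Hello" {
				atomic.AddInt64(&ok, 1)
			}
		}()
	}
	// Print the stats while the requests are most likely still being held.
	time.Sleep(time.Duration(delayMillis/2) * time.Millisecond)
	fmt.Println(stats())
	wg.Wait()
	return int(ok), nil
}

func main() {
	// The setup from the article would instead be roughly:
	//   http.ListenAndServe(":8080", newHandler(120*time.Second))
	// with stats() printed periodically from a separate goroutine.
	ok, err := selfTest(50, 200)
	fmt.Println("responses:", ok, "error:", err)
}
```

The counter is updated atomically because every request runs in its own goroutine.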
The simple client looks like:
I disable keep-alive to guarantee that TCP connections will not get stuck or be reused.
Also, I introduced a cooldown after every 5k requests to keep the process stable.
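The two points above (no keep-alive, a cooldown between batches) can be sketched roughly like this. It is an illustration under assumptions, not the original client: the helper names fire and demo are invented, the cooldown length is a free parameter, and main targets a throwaway local server from net/http/httptest so the example runs without a Raspberry Pi.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// newClient builds an HTTP client that opens a fresh TCP connection for
// every request, so connections are never reused.
func newClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
}

// fire sends total concurrent GET requests to url, pausing for cooldown
// after every batch requests, and returns how many answered with 200 OK.
func fire(url string, total, batch int, cooldown time.Duration) int {
	client := newClient()
	var ok int64
	var wg sync.WaitGroup
	for i := 1; i <= total; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(url)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body) // drain the body before closing
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt64(&ok, 1)
			}
		}()
		if i%batch == 0 {
			time.Sleep(cooldown) // give both machines a moment to settle
		}
	}
	wg.Wait()
	return int(ok)
}

// demo (hypothetical helper) runs fire against a local test server.
func demo(total, batch, cooldownMillis int) int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello")
	}))
	defer srv.Close()
	return fire(srv.URL, total, batch, time.Duration(cooldownMillis)*time.Millisecond)
}

func main() {
	// Against the real server this would be closer to
	// fire("http://<server-address>:8080", 60000, 5000, cooldown).
	fmt.Println("successful responses:", demo(100, 25, 50))
}
```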
Let’s run the server part on the Raspberry Pi and the client part on the MacBook Air. After 1015 received requests, the server reported that it had exceeded the limit on file descriptors:
Linux has two limits on open files: soft and hard. I will raise both of them by adding the following lines to /etc/security/limits.conf:
* soft nofile 1000000
* hard nofile 1000000
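To check which limits a process actually ends up with, a small Go program can ask the OS directly. This is a minimal sketch using syscall.Getrlimit from the standard library (available on Linux and macOS); the function name openFileLimits is made up for the example.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimits returns the soft and hard limits on open file descriptors
// for the current process.
func openFileLimits() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimits()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

Note that Go 1.19 and later may raise the soft limit up to the hard limit at startup, so the soft value seen inside a Go process can be higher than what `ulimit -S -n` prints in the shell.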
Attempt 2: Rebooting the Raspberry Pi, restarting the server and running the client again
After 10233 requests we hit the same limit on the client side:
So, I will raise those limits too, but on my laptop I prefer to do it temporarily (until the next reboot):
sudo sysctl -w kern.maxfiles=1000000
sudo sysctl -w kern.maxfilesperproc=1000000
Additionally, the following must be run in the terminal session where the client is going to be started:
ulimit -S -n 1000000
Attempt 3: restarting the client part
Oops… We hit one more problem after the 16377th request:
Some more theory on why this happens: when you connect to a TCP port (:8080 in our case), the client also opens a local port, but only for that TCP session. Local ports come from the same 0 to 65535 range, and the OS has settings that define which part of this range is used for them. Let’s see what settings we have in macOS:
sysctl -a | grep portrange
As you can see, the actual range for outgoing TCP connections is 49152–65535. As a result, we have only 16384 ports available. Let’s push that limit. I am going to set the lowest port to 1024. I will do that temporarily (until the next reboot):
sudo sysctl -w net.inet.ip.portrange.first=1024
sudo sysctl -w net.inet.ip.portrange.hifirst=1024
Attention! Be careful when setting the starting port. Remember that ports 0–1023 are reserved for well-known services. For normal use, the default of 49152 is the optimal value.
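The arithmetic behind the port limit is easy to check. The sketch below counts the ports in a range with both endpoints included; portsInRange is just an illustrative helper.

```go
package main

import "fmt"

// portsInRange returns how many ports lie in [first, last], both inclusive.
func portsInRange(first, last int) int {
	if last < first {
		return 0
	}
	return last - first + 1
}

func main() {
	fmt.Println(portsInRange(49152, 65535)) // default macOS range: 16384
	fmt.Println(portsInRange(1024, 65535))  // after lowering the start: 64512
}
```

With the start lowered to 1024 there are 64512 local ports, which is enough for the 60k sessions we are aiming for.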
Finally, I got the result!
Amazing! All clients got their “Hello” string. The server side needed just 1230 MiB of RAM for that.
Trying to optimize RAM usage
Let’s try to replace the basic http package with fasthttp. This library was designed for very high loads. Its main advantage is that it minimizes allocations by reusing existing resources from request to request. If you are planning to use it, read about its main differences from net/http on GitHub first. So, this is a modified version of the server part:
After running the client against the new server part, I got just fantastic results!
The Raspberry Pi handled the 60K requests while using less than 800 MiB of RAM to hold them!
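The idea of reusing resources from request to request can be illustrated with sync.Pool from the standard library. This is only a sketch of the general pooling technique, not fasthttp’s actual code; bufPool and greet are names invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool keeps bytes.Buffer values around so that handling a request does
// not have to allocate a new buffer every time.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet builds a response body using a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a reused buffer may still hold old data
	defer bufPool.Put(buf) // hand the buffer back for the next request
	buf.WriteString("Hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so later reuse is safe
}

func main() {
	fmt.Println(greet("Raspberry Pi"))
	fmt.Println(greet("MacBook Air"))
}
```

The price of this approach is that pooled objects must not be kept after they are returned to the pool, which is one reason a library built this way has a different API from net/http.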
Adding the load
60K concurrent requests are cool, but in the real world we need to do something more than just return a string. Let’s add some dummy load. For example, our getHashOfRandomString() will compute a SHA-3 hash 10K times in a loop and return the base64 encoding of the result.
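A rough sketch of such a function follows. Two things here are assumptions: SHA-256 from the standard library is swapped in for SHA-3 so that the example has no external dependency, and the loop feeds each digest back into the next round, which may differ from the original code. The helper hashRounds is invented to keep the hashing part deterministic.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashRounds hashes seed, then hashes the resulting digest again, for
// rounds iterations in total, and returns the final value as base64.
func hashRounds(seed []byte, rounds int) string {
	sum := seed
	for i := 0; i < rounds; i++ {
		digest := sha256.Sum256(sum) // SHA-256 stands in for SHA-3 here
		sum = digest[:]
	}
	return base64.StdEncoding.EncodeToString(sum)
}

// getHashOfRandomString applies 10K hashing rounds to 32 random bytes.
func getHashOfRandomString() (string, error) {
	seed := make([]byte, 32)
	if _, err := rand.Read(seed); err != nil {
		return "", err
	}
	return hashRounds(seed, 10000), nil
}

func main() {
	h, err := getHashOfRandomString()
	if err != nil {
		fmt.Println("could not read random bytes:", err)
		return
	}
	fmt.Println(h)
}
```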
Where can the problem be?
If we have 60K open TCP sessions, our computer needs to maintain them, and that requires some free CPU time. For this study I decided to create just one worker that reads jobs from a channel, performs getHashOfRandomString() and sends the result to another channel (a different one for each HTTP session).
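The single-worker pattern described above might look like the sketch below. The type and function names (job, startWorker, submit, demo) are hypothetical, and strings.ToUpper stands in for the expensive getHashOfRandomString() call so that the result is easy to verify.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// job is one unit of work; the worker sends the result to reply.
type job struct {
	input string
	reply chan string // a separate channel for each HTTP session
}

// startWorker launches a single goroutine that processes jobs one by one.
func startWorker(jobs <-chan job, work func(string) string) {
	go func() {
		for j := range jobs {
			j.reply <- work(j.input)
		}
	}()
}

// submit is what an HTTP handler would call: it enqueues a job and blocks
// until the worker has produced the result for this session.
func submit(jobs chan<- job, input string) string {
	reply := make(chan string, 1)
	jobs <- job{input: input, reply: reply}
	return <-reply
}

// demo simulates n concurrent sessions sharing one worker.
func demo(n int) []string {
	jobs := make(chan job)
	startWorker(jobs, strings.ToUpper) // stand-in for the hashing function
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = submit(jobs, fmt.Sprintf("req-%d", i))
		}(i)
	}
	wg.Wait()
	close(jobs) // lets the worker goroutine exit
	return results
}

func main() {
	fmt.Println(demo(5))
}
```

Because only the worker goroutine runs the heavy function, the CPU-bound part is limited to one job at a time while the other goroutines just wait on their reply channels.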
Let’s run it and see how the number of active connections grows quickly and, after reaching 60K, smoothly decreases as the responses are generated.
So, we just created an artificial bottleneck: just 1 thread processes the logic and the remaining 3 handle the 60K HTTP sessions. As a result, we guarantee that all of the requests will be handled and responses returned. In the real world you can create a more sophisticated worker pool to use more of your machine’s resources and manage the concurrency.
Instead of a conclusion
The abilities of compiled languages are really fascinating! Just study your project’s needs and choose the right architecture.