This documentation was written after the fact, once effective results had been obtained.
We will retrace the road of tuning both client and server, but to make it easier to focus on a single side at a time, we will use one of the best configurations found for the other peer.
The main focus was to tune the server so it could handle a lot of connections.
Changes are made and ordered so that each one brings a noticeable gain. Some changes could have been made much earlier, but often with little impact at that point.
No tuning, just a fresh install, with the nginx home page.
A fresh nginx install, serving the default home page (a very small HTML file).
Input file small-1.txt:
new page0 0 get 10.128.0.0:80 /
/root/inject -b -d 60 -u 500 -s 20 -f small-1.txt -S 10.140.0.0-10.140.15.255:1024-65535
20932 hits/s
OK, that gives us a baseline: what we can get without even trying.
The default nginx configuration has only 4 workers. The system sees 24 CPUs. Let's get 24 workers!
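A quick sanity check on how many CPUs (threads) the kernel actually sees, before picking a worker count; nothing here is specific to this setup:
# either command reports the number of logical CPUs seen by the system
nproc
grep -c ^processor /proc/cpuinfo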
file: /etc/nginx/nginx.conf
-worker_processes 4;
+worker_processes 24;
/root/inject -b -d 60 -u 500 -s 20 -f small-1.txt -S 10.140.0.0-10.140.15.255:1024-65535
Getting some errors in /var/log/nginx/error.log
[...] accept4() failed (24: Too many open files)
Increase the number of open files. That's just memory, and memory is cheap. Instead of 1k (ulimit -n shows 1024), let's allow 1M files (1048576).
file: /etc/default/nginx
+ULIMIT="-n 1048576"
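To check that the new limit really reaches the nginx processes, a small sketch (it assumes the Debian-style init script picks up ULIMIT from /etc/default/nginx as configured above):
# the shell's own limit, still 1024 unless raised elsewhere
ulimit -n
# the limit applied to the running nginx master (and inherited by its workers)
grep 'open files' /proc/$(pgrep -o -x nginx)/limits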
New error…
[...] "/var/log/nginx/access.log" failed (28: No space left on device) while logging request [...]
No space left? Damn, why am I even logging my requests? That's some heavy disk I/O and should just be removed. Let's stop writing the useless access.log (but keep the error.log: there shouldn't be anything in it, and if there is, it will probably be useful).
file: /etc/nginx/nginx.conf
-access_log /var/log/nginx/access.log;
+access_log off;
Yet another error…
768 worker_connections are not enough
Let's allow A LOT of connections (we don't want this error to show up again anytime soon).
file: /etc/nginx/nginx.conf
-worker_connections 768;
+worker_connections 524288;
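For scale, the theoretical ceiling this gives is worker_processes times worker_connections; a quick check with the numbers from the configuration above:
# 24 workers x 524288 connections each
echo $((24 * 524288))    # 12582912, far more than this test will ever open at once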
Yeah, no more errors.
/root/inject -b -d 60 -u 500 -s 20 -f small-1.txt -S 10.140.0.0-10.140.15.255:1024-65535
47875 hits/s
Good… we are getting somewhere.
We have 24 processes that can handle connections; that's better than 4.
There might be some limitation around the single bound socket (e.g. the kernel locking the socket to check whether the accept queue is too long before accepting the connection… pure speculation, the code was not checked).
Let's try to replace the single listen directive with multiple IPs to listen on.
file: /etc/nginx/sites-enabled/default
-#listen 80;
+listen 10.128.0.0:80;
+listen 10.128.0.1:80;
[...]
+listen 10.128.0.23:80;
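For nginx to bind all these addresses, they first have to exist on the server; a hedged sketch of one way to add them (the /16 prefix length and the eth1 device name are assumptions, adjust to the actual setup):
# add the 23 extra addresses on the serving interface (10.128.0.0 is already configured)
for i in $(seq 1 23); do
    ip addr add 10.128.0.$i/16 dev eth1
done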
New input file small-24.txt:
new page0 0 get 10.128.0.0:80 /
new page1 0 get 10.128.0.1:80 /
[...]
new page23 0 get 10.128.0.23:80 /
/root/inject -b -d 60 -u 500 -s 20 -f small-24.txt -S 10.140.0.0-10.140.15.255:1024-65535
50743 hits/s
Good, it does help not to be limited to a single socket.
The overall CPU graph shows that one CPU is used much more than the others. Looking at the CPU#0 graph, we can see that a lot of its time is spent in soft-interrupts. We should try to spread the interrupts over the other CPUs too…
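Two standard ways to watch where the soft-interrupt time goes (mpstat comes from the sysstat package; neither command is specific to this setup):
# per-CPU utilisation, including the %soft column
mpstat -P ALL 1
# per-CPU softirq counters, with what changed in the last second highlighted
watch -d -n1 cat /proc/softirqs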
As we can see in /proc/interrupts, we have 24 interrupts for each interface (as many as CPUs - threads - seen by the system). A first approach would be to assign them to the CPUs in order:
eth1-TxRx-0 => cpu 0
eth1-TxRx-1 => cpu 1
[…]
eth1-TxRx-23 => cpu 23
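A minimal sketch of how such an assignment can be applied by hand, writing CPU bitmasks into /proc/irq/<n>/smp_affinity (run as root; the IRQ numbers are taken from /proc/interrupts, and if irqbalance is running it may override these settings):
# assign each eth1 queue interrupt to the next CPU, in order
cpu=0
for irq in $(awk '/eth1-TxRx-/ {sub(":","",$1); print $1}' /proc/interrupts); do
    printf '%x' $((1 << cpu)) > /proc/irq/$irq/smp_affinity
    cpu=$(( (cpu + 1) % 24 ))
done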
/root/inject -b -d 60 -u 500 -s 20 -f small-24.txt -S 10.140.0.0-10.140.15.255:1024-65535
53721 hits/s
Better.
Now that our network interrupts aren't a bottleneck anymore, we get a nice number of connections every second. Nginx just doesn't accept them fast enough. By default, nginx uses a mutex so that only one process accepts connections at a time. Well, who cares? What if every worker tried to accept? OK, most of them will fail, but what if they pick up a new socket too? That could speed things up.
file: /etc/nginx/nginx.conf
+accept_mutex off;
/root/inject -b -d 60 -u 500 -s 20 -f small-24.txt -S 10.140.0.0-10.140.15.255:1024-65535
97682 hits/s
Wow, that much was lost just to nginx locking itself and preventing the other workers from picking up new connections at the same time.
What if we use fewer workers?
file: /etc/nginx/nginx.conf
-worker_processes 24;
+worker_processes 16;
/root/inject -b -d 60 -u 500 -s 20 -f small-24.txt -S 10.140.0.0-10.140.15.255:1024-65535
126731 hits/s
What if we go down to 12?
file: /etc/nginx/nginx.conf
-worker_processes 16;
+worker_processes 12;
/root/inject -b -d 60 -u 500 -s 20 -f small-24.txt -S 10.140.0.0-10.140.15.255:1024-65535
138247 hits/s
So much for having as many workers as CPUs.
Not having as many workers as CPUs gives better performance. Yes: it leaves more CPUs free to handle the IRQs…
What if we deliberately split things: IRQs on some CPUs, workers on the others?
How to split? Let's try different splits.
Each core has 2 threads. Let's use one thread per core for IRQs, and the other for a worker.
irq 0-23 => cpu 0-11,0-11 ; workers => cpu 12-23
184769 hits/s
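To realise such a split, the IRQ side can reuse the loop above (with modulo 12, so the interrupts round-robin over threads 0-11 only), and the nginx processes can be confined to the remaining threads; a sketch assuming taskset (util-linux) is available, nginx's own worker_cpu_affinity directive being the config-file alternative:
# confine every running nginx process (master and workers) to threads 12-23
for pid in $(pgrep -x nginx); do
    taskset -pc 12-23 "$pid"
done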
We have 2 physical processors with 12 threads each. Let's try one processor for IRQs and one processor for workers.
irq 0-23 => cpu 0-5,12-17,0-5,12-17 (processor #0) ; workers => cpu 6-11,18-23 (processor #1)
190712 hits/s
Better.
What if we use the first 3 cores (both threads of each) of each processor for IRQs, and the last 3 for workers?
irq 0-23 => cpu 0-2,6-8,12-14,18-20,0-2,6-8,12-14,18-20 ; workers => cpu 3-5,9-11,15-17,21-23
187394 hits/s
Not as good.
Maybe now that we have a separation, we can add a few more workers again and concentrate the IRQs a bit more?
8 CPUs for IRQs, 16 workers.
Let's again try one thread per core for IRQs and its sibling for a worker… taking the first 4 cores of each processor for the IRQ side.
irq 0-23 => cpu 0-3,6-9,0-3,6-9,0-3,6-9 ; workers => cpu 4,5,10-23
153129 hits/s
Ouch. Not that good…
What about a single processor for IRQs… its first 4 cores (both threads of each)?
irq 0-23 => cpu 0-3,12-15,0-3,12-15,0-3,12-15 ; workers => cpu 4-11,16-23
218857 hits/s
Wow, much better. Just changing which threads handle what has a big impact.
OK, our nginx has 16 processes working on 16 CPUs. Why not bind each process to a single CPU, so they stop hopping from one to another?
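A hedged sketch of one way to get this one-worker-per-CPU binding, pinning each worker PID to its own thread (it assumes the 16 workers sit on the CPUs left free by the IRQ split above; nginx's worker_cpu_affinity directive, with one bitmask per worker, is the config-file way to do the same):
# pin the 16 workers, one per thread, on CPUs 4-11 and 16-23
cpus=(4 5 6 7 8 9 10 11 16 17 18 19 20 21 22 23)
i=0
for pid in $(pgrep -x -P "$(pgrep -o -x nginx)" nginx); do   # children of the master = the workers
    taskset -pc "${cpus[i]}" "$pid"
    i=$((i + 1))
done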
224544 hits/s
And better yet, with just CPU affinity.
Now that we have nice, quick data transfers, our nginx serves a single file about 200k times per second. Maybe it should cache that file instead of opening it from scratch each time. At that rate, it might make a difference.
file: /etc/nginx/nginx.conf
+open_file_cache max=1000;
236607 hits/s
The kernel shows some SYN flood messages…
TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.
Let's get that off our back (some of these options are not related to that message, but are included here too):
file: /etc/sysctl.conf
+net.ipv4.tcp_fin_timeout = 1
+net.ipv4.tcp_tw_recycle = 1
+net.ipv4.tcp_tw_reuse = 1
+net.ipv4.tcp_syncookies = 0
+net.core.netdev_max_backlog = 1048576
+net.core.somaxconn = 1048576
+net.ipv4.tcp_max_syn_backlog = 1048576
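These settings are read at boot; to load them into the running kernel right away (sysctl -p simply re-reads /etc/sysctl.conf):
# apply the new values without rebooting
sysctl -p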
Check how it holds up over a longer period:
/root/inject -b -d 600 -u 500 -s 20 -f small-$max.txt -S 10.140.0.0-10.140.15.255:1024-65535
236103 hits/s
OK, we can sustain 236k connections per second without hitting any limit in any log.