system:benches10gbps:firewall [2012/11/02 15:41] (current) ze
These benches start with the inject/httpterm configuration from the direct
benches, with 270-290k connections/s between a client and a server.
Monitoring graphs for the different benches can be found
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/|here]].
====== Gateway ======

A gateway should be as neutral as possible for the network traffic going
through it. If we can get 270k hits/s with client and server talking
directly, it would be nice to keep that rate while the traffic transits
via our gateway.

Having 6 servers, the most we can do is probably 2 clients hitting 3
servers.
===== Baseline =====
For a first baseline, we start with 2 clients hitting 3 servers
directly. No gateway involved.

If we check the different graphs to get an idea of the traffic going on,
we have (approximate readings from the graphs):

^ what ^ per client ^ per server ^ total ^ graph ^
^ conn/s | 400k | 266k | 800k | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/douves-client/tcp_stats_conn_out.png|cli1]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/muraille-client/tcp_stats_conn_out.png|cli2]] |
^ Gbps from cli/srv | 1.1/1.7 | 0.75/1.12 | 2.4/3.4 | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/douves-client/interfaces_eth1_bps.png|cli1]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/muraille-client/interfaces_eth1_bps.png|cli2]] |
^ Mpkt/s from cli/srv | 1.2/1.62 | 0.8/1.08 | 2.4/3.24 | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/douves-client/interfaces_eth1_pkt.png|cli1]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/00-baseline-direct/muraille-client/interfaces_eth1_pkt.png|cli2]] |

So we can get a little over 3Gbps, with 800k connections/s. That might
not be enough to reach the limit of a 10Gbps gateway, but it should
already be enough to give us a hint of some limits.
===== Gateway =====

Now that we have an idea of the traffic we can generate, let's see how
it gets handled by a single gateway.

For our first test, the gateway will use the same interface in and out.
That should theoretically give us about 5.6Gbps in and out of it.

We make sure our gateway forwards the packets, and doesn't send any
redirects (I use the same subnet, so the default configuration might
send redirects to avoid the useless hop through the gateway).

<code>
# gateway: /etc/sysctl.conf
net.ipv4.ip_forward = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.accept_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.eth1.accept_redirects = 0
net.ipv4.conf.eth1.send_redirects = 0
</code>

And something "heard" as being a good idea (checked in a later bench):
bigger NIC ring buffers.

<code>
# gateway: /etc/rc.local
ethtool -G eth1 rx 4096
ethtool -G eth1 tx 4096
</code>
As we don't have any process running here, only the kernel handling the
interrupts, the IRQ affinity is spread among all the processor threads:

<code>
eth1-TxRx-0 0
eth1-TxRx-1 1
eth1-TxRx-2 2
[...]
eth1-TxRx-22 22
eth1-TxRx-23 23
</code>
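The spreading itself can be scripted; a minimal sketch of the idea
(assuming the queue IRQs are named eth1-TxRx-<n> in /proc/interrupts and
that the kernel exposes /proc/irq/<irq>/smp_affinity_list — not the
exact script used for the bench):

```shell
#!/bin/sh
# Print one affinity assignment per eth1-TxRx queue, pinning queue n
# to CPU thread n.  Review the output, then pipe it to sh to apply.
gen_affinity() {
  awk '/eth1-TxRx-/ {
    irq = $1;  sub(/:/, "", irq)            # IRQ number (first column)
    q = $NF;   sub(/.*eth1-TxRx-/, "", q)   # queue index, from the name
    printf "echo %s > /proc/irq/%s/smp_affinity_list\n", q, irq
  }' "${1:-/proc/interrupts}"
}
gen_affinity          # inspect the assignments first
# gen_affinity | sh   # then apply them
```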

Results seem to indicate something very near the total of the in/out we
had earlier, which is explained by the fact that the traffic goes both
in and out of our gateway via the same interface.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/01-baseline-gateway/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/01-baseline-gateway/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

===== no rules =====

OK, without doing anything but forwarding the traffic, it gets through
pretty nicely. Let's just check that nothing changes if we have the
firewall up, but without any rules.

<code>
# listing the chains loads the filter, mangle and raw table modules
iptables -L
iptables -t mangle -L
iptables -t raw -L
</code>
With no rules, and the 3 tables filter, raw and mangle loaded, we
already get down from 5.7 to 4.9 (both Gbps and Mpkt/s). That's down
from 800k to just under 700k conn/s.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/02-baseline-norule/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/02-baseline-norule/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

====== Firewall ======

===== Conntrack =====
values.
<code>
# nat
iptables -t nat -L
# load the conntrack modules (ipv4)
iptables -I FORWARD -m state --state ESTABLISHED
iptables -D FORWARD -m state --state ESTABLISHED
# increase the max conntrack (default: 256k)
echo 33554432 > /proc/sys/net/netfilter/nf_conntrack_max
</code>
has on our connection rate.
average connection rate over 5 minutes: 150976
But looking at the graph, we see a breakdown. It starts at 180k for 120
seconds, then there is a drastic drop to an average of 135k for the rest
of the run.
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/03-firewall-conntrack/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/03-firewall-conntrack/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

As the conntrack count increases linearly up to about 21.2M, and
decreases from that point on, the rate seems to drop when the first
timeouts start to hit.
Conntracking itself gives a performance hit. Hitting the tracking
timeouts gives another one.

To make sure it is related, we checked which timeouts were at 120s, and
changed them:

<code>
cd /proc/sys/net/netfilter
grep 120 /proc/sys/net/netfilter/*
nf_conntrack_tcp_timeout_fin_wait:120
nf_conntrack_tcp_timeout_syn_sent:120
nf_conntrack_tcp_timeout_time_wait:120
echo 150 > nf_conntrack_tcp_timeout_fin_wait
echo 180 > nf_conntrack_tcp_timeout_syn_sent
echo 60 > nf_conntrack_tcp_timeout_time_wait
</code>

Testing with those values moved the break to 60 seconds: connections get
into time_wait, and now expire after 60s instead of 120s.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/04-firewall-conntrack2/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/04-firewall-conntrack2/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

Testing with nf_conntrack_tcp_timeout_time_wait set to 1s gives the low
performance directly, even though the conntrack count stays under 200k
instead of a few millions.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/05-firewall-conntrack3/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/05-firewall-conntrack3/rempart-firewall/interfaces_eth1_pkt.png|pkt]]
For our heavy connections, we clearly need to be able to *not* track
them.

===== notrack =====

Obviously, not tracking those requests would probably be a good idea.
Let's add the rules to do just that, and see if it helps.

<code>
*raw
-A PREROUTING -d 10.128.0.0/16 -p tcp -m tcp --dport 80 -j NOTRACK
-A PREROUTING -s 10.128.0.0/16 -p tcp -m tcp --sport 80 -j NOTRACK
-A PREROUTING -d 10.132.0.0/16 -p tcp -m tcp --dport 80 -j NOTRACK
-A PREROUTING -s 10.132.0.0/16 -p tcp -m tcp --sport 80 -j NOTRACK
-A PREROUTING -d 10.148.0.0/16 -p tcp -m tcp --dport 80 -j NOTRACK
-A PREROUTING -s 10.148.0.0/16 -p tcp -m tcp --sport 80 -j NOTRACK
</code>

That lets us get about the same results as without the firewall modules
loaded. CPU usage on the firewall seems slightly higher on the 2 threads
that handle most of the interrupts (I'd say from about 15-17% to
18-20%).

Here, one of the CPUs is used at 100%, with only about 4.1Gbps and
4.2Mpkt/s, a total of about 590k conn/s instead of our 800k without the
firewall.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/06-firewall-notrack/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/06-firewall-notrack/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

Going down to a single notrack rule gets us slightly better performance:

<code>
*raw
-A PREROUTING -j NOTRACK
</code>

That gives us about 620k conn/s.

[[http://www.hagtheil.net/files/system/benches10gbps/firewall/07-firewall-notrack2/rempart-firewall/interfaces_eth1_bps.png|bps]]
[[http://www.hagtheil.net/files/system/benches10gbps/firewall/07-firewall-notrack2/rempart-firewall/interfaces_eth1_pkt.png|pkt]]

===== simple rules =====

Let's get back to a configuration without nat loaded, and see how a few
matching rules affect our CPU usage and decrease the rates we get.

Earlier, we tried with filter, raw and mangle loaded at once.

Let's try with just filter loaded and no rules, then add useless
matches, like checking the source IP against IPs we are not even using.

<code>
# with n being the number of matching rules...
# (i is split across the last two octets so n can go beyond 256)
n=64
iptables -F ; for ((i=0;i<n;++i)) ; { iptables -A FORWARD -s 10.0.$((i/256)).$((i%256)) ; }
</code>

^ match rules ^ conn/s ^ pkt/s ^ graph ^
| 0 | 800k | 5.7M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0000/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0000/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 16 | 780k | 5.6M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0010/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0010/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 64 | 730k | 5.1M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0040/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0040/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 256 | 480k | 3.38M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0100/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0100/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 1024 | 148k | 1.05M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0400/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/08-firewall-simple-rules_0400/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |

===== other matches =====

Source matching has an impact on the traffic. Let's check other kinds of
matches.

Tests done with 256 match rules.

^ match rule ^ conn/s ^ pkt/s ^ graph ^
| -m u32 --u32 "0xc&0xffffffff=0xa0000`printf %02x $i`" | 67k | 480k | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-u32-src/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-u32-src/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| -p udp -m udp --dport 53 | 315k | 2.4M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-udp/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-udp/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| -p tcp -m tcp --dport 443 | 155k | 1.1M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-tcp-https/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-tcp-https/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| -p tcp -m tcp --dport 80 (does match) | 140k | 990k | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-tcp-http/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-tcp-http/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| -d 10.0.0.$i | 460k | 3.2M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-dst/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/09-firewall-rule-dst/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |

Different kinds of matches have different impacts; -d and -s have about
the same impact.

===== other configs =====

Tests done with 256 -s xxx matches, as that's the match that gave the
best performance so far.

^ config ^ conn/s ^ pkt/s ^ graph ^
| default | 480k | 3.38M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-txqueuelen1k/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-txqueuelen1k/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| ethtool -G eth1 {tx/rx} 512 | 505k | 3.6M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-ethtool512/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-ethtool512/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| ethtool -G eth1 {tx/rx} 64 | 450k | 3.2M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-ethtool64/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-ethtool64/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| ip link set eth1 txqueuelen 10000 | 470k | 3.3M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-txqueuelen10k/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/10-firewall-txqueuelen10k/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |

txqueuelen: no effect.

The rx/tx ring parameters do have an effect, and neither too big nor too
small is best.

===== tree search =====

netfilter allows using chains: with a few matches, you can jump to a
chain, and skip everything inside that chain when the jump's match
fails.

As seen previously, having a lot of matches in a single chain means each
packet is tested against every possible match.

Let's see how we could have a per-IP match for a whole /13 (yeah, that
means 512k different IPs).

Using iptables to generate that many entries is just too slow; it would
take days. Generating the ruleset as text and loading it with
iptables-restore performs far better (5-10 minutes instead of days).
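A sketch of that approach (a hypothetical generator, not the exact
script used for the bench): emit the whole ruleset in iptables-restore
format, so the kernel gets one table commit instead of 512k separate
iptables invocations:

```shell
#!/bin/sh
# Emit one -A FORWARD rule per IP of 10.128.0.0/13 (512k rules) in
# iptables-restore format: table header, rules, then a single COMMIT.
gen_rules() {
  echo '*filter'
  for a in $(seq 128 135); do        # 10.128.0.0/13 spans 10.128-10.135
    for b in $(seq 0 255); do
      for c in $(seq 0 255); do
        echo "-A FORWARD -s 10.$a.$b.$c/32"
      done
    done
  done
  echo 'COMMIT'
}
# gen_rules > /tmp/rules.v4
# iptables-restore < /tmp/rules.v4   # loads everything in one commit
```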

Having 512k rules to match against each packet would slow the traffic
down a lot. A way around it is to use a few matches to jump to a more
specific chain, then match a few more bits, and jump again.

The idea comes from Jesper Dangaard Brouer's slides about
[[http://www.slideshare.net/brouer/netfilter-making-large-iptables-rulesets-scale|Making large iptables rulesets scale]].
Unfortunately, the perl library has not yet been updated to build
against the wheezy iptables version.

Example of the rules that would be checked/matched for 10.139.5.43:
<code>
-A FORWARD -s 10.128.0.0/12 -j cidr_12_176160768 (match)
-A cidr_12_176160768 -s 10.136.0.0/14 -j cidr_14_176685056 (match)
-A cidr_14_176685056 -s 10.136.0.0/16 -j cidr_16_176685056
-A cidr_14_176685056 -s 10.137.0.0/16 -j cidr_16_176750592
-A cidr_14_176685056 -s 10.138.0.0/16 -j cidr_16_176816128
-A cidr_14_176685056 -s 10.139.0.0/16 -j cidr_16_176881664 (match)
-A cidr_16_176881664 -s 10.139.0.0/18 -j cidr_18_176881664 (match)
-A cidr_18_176881664 -s 10.139.0.0/20 -j cidr_20_176881664 (match)
-A cidr_20_176881664 -s 10.139.0.0/22 -j cidr_22_176881664
-A cidr_20_176881664 -s 10.139.4.0/22 -j cidr_22_176882688 (match)
-A cidr_22_176882688 -s 10.139.4.0/24 -j cidr_24_176882688
-A cidr_22_176882688 -s 10.139.5.0/24 -j cidr_24_176882944 (match)
-A cidr_24_176882944 -s 10.139.5.0/26 -j cidr_26_176882944 (match)
-A cidr_26_176882944 -s 10.139.5.0/28 -j cidr_28_176882944
-A cidr_26_176882944 -s 10.139.5.16/28 -j cidr_28_176882960
-A cidr_26_176882944 -s 10.139.5.32/28 -j cidr_28_176882976 (match)
-A cidr_28_176882976 -s 10.139.5.32/30 -j cidr_30_176882976
-A cidr_28_176882976 -s 10.139.5.36/30 -j cidr_30_176882980
-A cidr_28_176882976 -s 10.139.5.40/30 -j cidr_30_176882984 (match)
-A cidr_30_176882984 -s 10.139.5.40/32 -j cidr_32_176882984
-A cidr_30_176882984 -s 10.139.5.41/32 -j cidr_32_176882985
-A cidr_30_176882984 -s 10.139.5.42/32 -j cidr_32_176882986
-A cidr_30_176882984 -s 10.139.5.43/32 -j cidr_32_176882987 (match)
-A cidr_32_176882987 ...
-A cidr_28_176882976 -s 10.139.5.44/30 -j cidr_30_176882988
-A cidr_26_176882944 -s 10.139.5.48/28 -j cidr_28_176882992
-A cidr_24_176882944 -s 10.139.5.64/26 -j cidr_26_176883008
-A cidr_24_176882944 -s 10.139.5.128/26 -j cidr_26_176883072
-A cidr_24_176882944 -s 10.139.5.192/26 -j cidr_26_176883136
-A cidr_22_176882688 -s 10.139.6.0/24 -j cidr_24_176883200
-A cidr_22_176882688 -s 10.139.7.0/24 -j cidr_24_176883456
-A cidr_20_176881664 -s 10.139.8.0/22 -j cidr_22_176883712
-A cidr_20_176881664 -s 10.139.12.0/22 -j cidr_22_176884736
-A cidr_18_176881664 -s 10.139.16.0/20 -j cidr_20_176885760
-A cidr_18_176881664 -s 10.139.32.0/20 -j cidr_20_176889856
-A cidr_18_176881664 -s 10.139.48.0/20 -j cidr_20_176893952
-A cidr_16_176881664 -s 10.139.64.0/18 -j cidr_18_176898048
-A cidr_16_176881664 -s 10.139.128.0/18 -j cidr_18_176914432
-A cidr_16_176881664 -s 10.139.192.0/18 -j cidr_18_176930816
-A cidr_12_176160768 -s 10.140.0.0/14 -j cidr_14_176947200
</code>
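The chain names above are mechanical: cidr_<prefixlen>_<network base as
a 32-bit integer>, e.g. 10.128.0.0 is 10*2^24 + 128*2^16 = 176160768. A
small helper to compute them (hypothetical, plain POSIX shell
arithmetic):

```shell
#!/bin/sh
# chain naming used above: cidr_<len>_<network-base-as-u32>
ip2chain() {  # usage: ip2chain A.B.C.D prefixlen
  IFS=. read -r a b c d <<EOF
$1
EOF
  n=$(( (a << 24) + (b << 16) + (c << 8) + d ))
  # keep only the network bits for the given prefix length
  mask=$(( (0xffffffff << (32 - $2)) & 0xffffffff ))
  echo "cidr_$2_$(( n & mask ))"
}
ip2chain 10.128.0.0 12    # -> cidr_12_176160768
ip2chain 10.139.5.43 32   # -> cidr_32_176882987
```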

With at most 39 checks and 11 jumps, any IP within the /13 arrives in
its own chain (or a merged chain, if several IPs need the same rules).
Anything not even in the /12 gets just one check before moving on to the
next entries.

^ bits matched per level ^ check ^ match ^ conn/s ^ pkt/s ^ graph ^
| 2 | 39 | 11 | 560k | 3.9M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-2/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-2/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 3 | 51 | 8 | 595k | 4.2M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-3/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-3/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 4 | 73 | 6 | 580k | 4.0M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-4/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-4/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |
| 5 | 113 | 5 | 575k | 4.0M | [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-5/rempart-firewall/interfaces_eth1_bps.png|bps]] [[http://www.hagtheil.net/files/system/benches10gbps/firewall/11-fw-sourcetree-5/rempart-firewall/interfaces_eth1_pkt.png|pkt]] |

Note: such a high number of rules uses memory: 20GB+ of RAM in this
case.

===== nat =====

Earlier, we noticed that conntracking all our connections would be too
much. What if we could have a static 1:1 mapping that does not require
any tracking?

Well, iptables NOTRACK prevents any form of nat, so that can't be
done...

We will have to look for other solutions.

===== ipset =====

Some people mentioned ipset. Let's bench that.

<code>
# let's create some sets we might use
ipset create ip hash:ip
ipset create net hash:net
ipset create ip,port hash:ip,port
ipset create net,port hash:net,port
</code>
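The sets above are created empty; for a rule to actually match
something, entries have to be added (hypothetical example entries, one
per set type):

```shell
# populate the sets created above (example/hypothetical entries)
ipset add ip 10.128.0.1
ipset add net 10.132.0.0/16
ipset add ip,port 10.128.0.1,tcp:80
ipset add net,port 10.132.0.0/16,tcp:80
```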

Rules used for the different tests:
<code>
-A FORWARD -m set --match-set ip src
-A FORWARD -m set --match-set net src
-A FORWARD -m set --match-set net,port src,src
-A FORWARD -m set --match-set ip,port src,dst
</code>

Let's see how a few hash:ip matches affect our traffic:

^ # rules ^ conn/s ^ pkt/s ^
| 1 | 570k | 3.6M |
| 2 | 340k | 2.05M |
| 3 | 240k | 1.45M |
| 4 | 184k | 1.1M |

OK, so just a few ipset matches affect us A LOT. What about the other
hash types?

(tests done with 2 matches)

^ ipset ^ conn/s ^ pkt/s ^
| hash:ip | 340k | 2.05M |
| hash:net | 350k | 2.1M |
| hash:ip,port | 330k | 2M |
| hash:net,port | 330k | 2M |

Net or ip doesn't change much, and including the port adds only a light
overhead, considering the overhead we already have.

What about ipset bitmaps?

<code>
ipset create bip0 bitmap:ip range 10.136.0.0-10.136.255.255
ipset create bip1 bitmap:ip range 10.140.0.0-10.140.255.255
</code>

^ # rules ^ conn/s ^ pkt/s ^
| 2 | 550k | 3.5M |
| 4 | 320k | 1.9M |

Considering ipset is limited to 65k entries, and these results, I would
advise against using it, unless you really need the easy-to-manage
sets.

===== interface irq affinity =====

FIXME: add irq affinity tests with results

====== Conclusion ======

  * A lot of matching reduces performance.
  * u32 matches are costly.
  * If you can, try to match and segregate into different subchains, with something like 8 to 16 matches per chain (for src/dst matches; maybe fewer with heavier matches).
  * IRQ affinity can change performance under high load.