HAProxy

Doc

Install & enable

sudo apt install haproxy
sudo systemctl start haproxy
sudo systemctl enable haproxy

Config

  • /etc/haproxy/haproxy.cfg
  • Sections ref
    • global, defaults, frontend, backend
global
# global settings here
defaults
# defaults here
frontend
# a frontend that accepts requests from clients
backend
# servers that fulfill the requests

global section

  • Settings in the global section are process-wide and often OS-specific.
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
maxconn 4000 # maximum no of concurrent connections
user haproxy
group haproxy
daemon
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
  • maxconn parameter specifies the maximum number of concurrent connections
  • user and group specify the user and group the haproxy process runs as.
  • daemon parameter specifies that haproxy runs as a background process

Defaults section

defaults
log global # use global settings
mode http # http (ideal for load balancing web servers; enables HTTP-aware algorithms like url_param) / tcp
option httplog # enables HTTP logging of values like HTTP sessions, source address, timers, etc.
option dontlognull # disables logging of null connections (where no data is transferred, e.g. open-port scanning)
option forwardfor except 127.0.0.0/8 # otherwise the target web server doesn't receive the original client IP address
retries 3 # connection retry attempts when a connection is refused or times out
timeout connect 5s # period to wait for a successful connection
timeout client 50s # max inactivity period waiting for the client to send data, else timeout (normally keep client & server values the same)
timeout server 50s # max inactivity period waiting for the server to send data, else close the connection (this prevents deadlock)
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 500 /etc/haproxy/errors/500.http
  • TCP works at lower layers (see networking concepts and the OSI model); HTTP data is also carried over TCP, but mode http lets HAProxy see the details of each HTTP request

Note :

  • The corresponding X-Forwarded-For header must be configured in the web server's log format (e.g. %{X-Forwarded-For}i in the Apache log format)
  • For Nginx, check the Rocket.Chat setting (it uses the X-Forwarded-For header)
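
As a sketch, an Apache log format that records the forwarded client IP could look like the following (the proxy_combined name and log path are assumptions, not from the source):

```apache
# Log the original client IP passed by HAProxy instead of the proxy's IP
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_combined
CustomLog ${APACHE_LOG_DIR}/access.log proxy_combined
```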

Frontend section (with SSL)

  • defines the IP address and port on which the proxy listens
frontend www.mysite.com # or any name like main
bind 10.0.0.3:80 # or *:80
bind 10.0.0.3:443 ssl crt /etc/ssl/certs/mysite.pem
http-request redirect scheme https unless { ssl_fc }
use_backend api_servers if { path_beg /api/ }
default_backend web_servers

style 2 (https)

frontend www-https
bind *:443 ssl crt /etc/haproxy/mydomain.combined.pem
http-request set-header X-Forwarded-Proto https # reqadd is deprecated and was removed in HAProxy 2.x
default_backend app

Enable stats in frontend

  • Add the following in the frontend section
stats enable
stats show-legends
stats auth admin:password
stats hide-version
stats show-node
stats refresh 60s
stats uri /haproxy?stats


Backend section

  • The backend is a pool of real servers; it also defines the load-balancing (scheduling) algorithm

sample 1 (roundrobin with check interval)

roundrobin

  • Distributes each request sequentially around the pool of real servers
  • All the real servers are treated as equals without regard to capacity or load
  • (Disadvantage) Different requests take different amounts of time to process, so load can become uneven across servers even when requests are distributed equally
backend web_servers
balance roundrobin
server app1 10.0.1.3:80 check maxconn 50
server app2 10.0.1.4:80 check
server app3 10.0.1.5:80 check inter 2s rise 4 fall 3
server app4 10.0.1.6:80 backup
  • 'check' enables health (availability) checks
  • inter = 2s is the health-check interval
  • rise = 4: number of consecutive successful checks before the server is considered UP (so 4 checks at 2-second intervals = 8s)
  • fall = 3: number of consecutive failed checks before the server is considered DOWN

sample 2 (url_param)

URL Parameter

  • This static algorithm can only be used on an HTTP backend
  • The value of the specified URL parameter in the query string is hashed and divided by the total weight of the running servers
  • If the parameter is missing from the URL, the scheduler defaults to Round-robin scheduling
backend web_servers
balance url_param userid
server app1 10.0.1.3:80 check
server app2 10.0.1.4:80 check

So the algorithm hashes the userid parameter and always routes a given value to the same server

sample 3 (hdr)

  • Distributes requests to servers by checking a particular header name in each source HTTP request and performing a hash calculation divided by the weight of all running servers.
  • If the header is absent, the scheduler defaults to Round-robin scheduling.
backend web_servers
balance hdr(User-Agent)
server app1 10.0.1.3:80 check
server app2 10.0.1.4:80 check
  • e.g. requests from Firefox always go to app1
  • e.g. requests from Chrome always go to app2

sample 4 (first)

first

  • The first server with available connection slots receives the connection. Once a server reaches its maxconn value, the next server is used
backend web_servers
balance first
server app1 10.0.1.3:80 check maxconn 1000
server app2 10.0.1.4:80 check
  • So once app1 reaches its limit (1000 connections here), requests go to app2
  • Useful in cloud scenarios where an extra server is required/used on demand only when server1 is fully occupied

sample 5 (source)

source

  • The same client IP always reaches the same server as long as the server is up
  • It is generally used in TCP mode where cookies cannot be inserted
backend web_servers
balance source
server app1 10.0.1.3:80 check
server app2 10.0.1.4:80 check
  • Requests from the same client always go to the same server (a kind of source-IP hashing)

sample 6 (leastconn)

leastconn :

  • Unlike round-robin, servers are not treated as equals: the current number of active connections is taken into account
  • Ideal for environments where servers have different capacities, request processing times vary, etc.
  • Can use Weights
backend web_servers
balance leastconn
server app1 10.0.1.3:80 check
server app2 10.0.1.4:80 check
  • As the name implies, the server with the fewest active connections gets the new request
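
Weights can be combined with leastconn (or roundrobin). A sketch with assumed server names and IPs:

```haproxy
backend web_servers
    balance leastconn
    # at equal connection counts, 'big' receives roughly 3x the share of 'small'
    server big 10.0.1.3:80 weight 3 check
    server small 10.0.1.4:80 weight 1 check
```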

URI

This algorithm hashes either the left part of the URI (before the question mark) or the whole URI. This ensures that the same URI is always directed to the same server.

  • This is used with proxy caches and anti-virus proxies in order to maximize the cache hit rate. (Http backend)
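
A minimal sketch of a uri-balanced backend (server names, IPs, and the cache port 3128 are assumptions, e.g. Squid caches):

```haproxy
backend cache_proxies
    balance uri       # hashes the URI path; append 'whole' to include the query string
    server cache1 10.0.1.7:3128 check
    server cache2 10.0.1.8:3128 check
```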

Access Control List

  • blog ref
  • doc ref
  • to make decisions based on request/response/environmental status
  • syntax
    acl {name} {criteria} {flags} {operators} {pattern}
  • We can create named conditions and select a backend based on a condition expression

sample 1 (checking header)

acl host_1 hdr(host) -i mydomain.com

sample 2

acl url_static path_beg /static /images /img /css
acl url_static path_end .gif .png .jpg .css .js
acl host_www hdr_beg(host) -i www
acl host_static hdr_beg(host) -i img. video. download. ftp.
# now use backend "static" for all static-only hosts, and for static urls
# of host "www". Use backend "www" for the rest.
use_backend static if host_static or host_www url_static
use_backend www if host_www
  • Multiple acl lines can share the same name; if any one of them matches, the ACL is true (like url_static above)

Sticky session

  • HTTP is not a connected protocol: the session is totally independent of the TCP connections. Session information is saved on the web server, so cookies can be used for sticky sessions
  • Alternatives: use shared storage for sessions, save sessions in a database, or use a stateless (REST) API!

blog ref

Insert new cookie only if doesn't exists

backend web_servers
balance roundrobin
cookie SERVERUSED insert indirect nocache
server server1 10.0.1.3:80 cookie s01 check
server server2 10.0.1.4:80 cookie s02 check
  • For new requests, if the specified cookie (SERVERUSED here) doesn't exist, it is created with the respective value (s01/s02) based on the server picked by the algorithm
  • For requests with an existing cookie, the cookie value is compared with the config and the matching server is picked
  • nocache marks the response non-cacheable when a cookie is inserted, so shared caches on the internet don't cache it
  • Disadvantage: the client is pinned to the same server even if no real session is created by the application code, so it's better to use an existing cookie managed (created/deleted) by the application

Existing cookie

backend bk_web
balance roundrobin
cookie JSESSIONID prefix nocache
server server1 10.0.1.3:80 cookie s01 check
server server2 10.0.1.4:80 cookie s02 check
  • prefix prepends the server id to an existing cookie (e.g. JSESSIONID=s01~i12KJF23JKJ1EKJ21213KJ) in responses coming from the backend, and strips it from requests going to the backend, so the backend sees the same unchanged cookie and value
  • So requests are sticky only if the session cookie exists; otherwise roundrobin applies

Note

  • If both HAProxy and the web server are on the same machine, the web server should listen on a different port, not 80
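
For example (a sketch; names and port 8080 are assumptions): HAProxy takes port 80 and forwards to the local web server on another port:

```haproxy
frontend www
    bind *:80
    default_backend local_web

backend local_web
    # web server on the same machine, moved off port 80
    server local1 127.0.0.1:8080 check
```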

More ref

Full config

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
maxconn 4000 # added
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
option forwardfor # added
retries 3 # added
timeout connect 5s # modified from default ms to s
timeout client 50s # modified from default ms to s
timeout server 50s # modified from default ms to s
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 500 /etc/haproxy/errors/500.http
frontend www.mysite.com
bind *:80
bind *:443 ssl crt /etc/ssl/certs/mysite.pem
http-request redirect scheme https unless { ssl_fc }
default_backend sticky_java_servers
stats enable
stats show-legends
stats auth admin:password
stats hide-version
stats show-node
stats refresh 60s
stats uri /haproxy?stats
backend node_servers
balance leastconn
server app1 10.0.1.3:8081 check inter 2s rise 6 fall 3
server app2 10.0.1.4:8081 check inter 2s rise 6 fall 3
# server app2 10.0.1.3:8081 check maxconn 1500
backend sticky_java_servers
balance leastconn
cookie JSESSIONID prefix nocache
server app1 10.0.1.3:8080 cookie s01 check inter 2s rise 6 fall 3
server app2 10.0.1.4:8080 cookie s02 check inter 2s rise 6 fall 3
# server app2 10.0.1.3:8080 check maxconn 1500

Temp

systemctl disable firewalld
  • /etc/selinux/config
SELINUX=disabled

Theory

Why load balancing ?

  • Software/hardware has a maximum capacity; connections beyond that capacity cannot be processed. Load balancing is needed to increase the capacity of our service.

Solutions

  • using DNS
    • But it has issues like DNS caching, what happens if one server fails, etc.
  • using load balancer

HAProxy

It is a

  • TCP proxy : It can accept a TCP connection from a listening socket, connect to a server and attach these sockets together allowing traffic to flow in both directions
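
A sketch of plain TCP proxying (names, IPs, and the MySQL example are assumptions, not from the source):

```haproxy
frontend mysql_in
    mode tcp
    bind *:3306
    default_backend mysql_servers

backend mysql_servers
    mode tcp
    balance leastconn
    server db1 10.0.1.10:3306 check
    server db2 10.0.1.11:3306 check
```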


  • HTTP reverse-proxy: It presents itself as a server that sits in front of web servers and forwards client (e.g. web browser) requests to those web servers. Reverse proxies are typically implemented to increase security (e.g. hiding ports)

  • SSL terminator: Sets up the SSL/TLS connection (e.g. HTTPS). If our web servers have HTTPS enabled, HAProxy would appear to them like a man-in-the-middle attacker, so the SSL certificates must be defined on the HAProxy system

  • TCP normalizer: abnormal traffic such as invalid packets or incomplete connections (SYN floods) can be dropped

  • HTTP normalizer: Can be configured to process only valid/complete http requests. (This protects against a lot of protocol-based attacks).

  • Server load balancer: Load balance TCP whole connections or per HTTP requests. Server health checks, statistics/ monitoring.

  • Traffic Regulator: some rate limiting at various points, protect the servers against overloading, adjust traffic priorities based on the contents, and even pass such information to lower layers and outer network components by marking packets
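
A common rate-limiting sketch uses a stick-table (the window, threshold, and names here are assumptions): track each source IP's HTTP request rate and deny clients that exceed a limit:

```haproxy
frontend www
    bind *:80
    # track request rate per client IP over a 10s window
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # reject clients exceeding 20 requests per 10 seconds
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
    default_backend web_servers
```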

It is a fast, reliable, open-source project

(diagrams: HAProxy with Keepalived; HAProxy; NGINX)

SPOF

  • Single point of failure
  • Use keepalived/heartbeat/pacemaker, etc.