HSG5: HSG high availability (HA)
This sample scenario focuses on HSG high availability (HA) deployment, which provides seamless failover in case of hardware failure. This is particularly important for on-premise deployments at large venues, where maximum service uptime is required. NOTE: Internet link bonding and failover are covered in a separate topic.
Common use cases
- Large hotels
- Large shopping malls
- Large tourist attractions
- Airports, stadiums, etc.
- Connect eth0 (WAN) on both HSGs to the Internet routers (or switches or link balancers, depending on your WAN setup).
- Connect eth1 (LAN) on both HSGs to the LAN switch.
- Connect HSG eth2 to a management PC (configure the PC for DHCP, then open the mbox GUI at http://10.10.10.1 and log in with mboxadmin/Letthem0ut7&).
- HSG eth3 is reserved for a private LAN and is pre-configured to issue DHCP IPs.
- Connect the APs to the LAN PoE switch.
- Use the default VLAN1 as the management VLAN for the APs/WLC.
- APs get their DHCP IPs from the HSG in the 192.168.8.0/22 network.
- IPs 192.168.8.1 to 192.168.8.49 are reserved for the WLC or other devices.
- Add all VLANs on the switch (VLAN10, 20, 30, 40, 50), configure all switch ports in trunk mode, and permit all VLANs on each port (the default).
- Configure the APs to broadcast the desired SSIDs and assign each SSID to a pre-configured VLAN.
- The HSG sample config pre-configures VLAN10, 20, 30 and 40, with auto-roaming between VLANs enabled.
- Refer to the respective vendor docs for mapping SSIDs to VLANs on the AP.
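The AP management addressing above can be sanity-checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module to count the hosts in 192.168.8.0/22 and find what remains for DHCP once 192.168.8.1 to 192.168.8.49 are set aside. It assumes that block is the only reservation. The HSG's own interface or VIP addresses on this network are not given here, so the real DHCP scope on your box may be smaller. The function name `dynamic_range` is hypothetical.

```python
import ipaddress

# Values from the sample topology: AP management network and the block
# reserved for the WLC or other devices. Assumption: nothing else is reserved.
AP_NETWORK = ipaddress.ip_network("192.168.8.0/22")
RESERVED_FIRST = ipaddress.ip_address("192.168.8.1")
RESERVED_LAST = ipaddress.ip_address("192.168.8.49")


def dynamic_range(network, reserved_first, reserved_last):
    """Return (first, last, count) of host addresses left for DHCP
    after excluding the inclusive reserved block."""
    hosts = list(network.hosts())  # excludes the network and broadcast addresses
    free = [ip for ip in hosts if not (reserved_first <= ip <= reserved_last)]
    return free[0], free[-1], len(free)


first, last, count = dynamic_range(AP_NETWORK, RESERVED_FIRST, RESERVED_LAST)
print(f"{AP_NETWORK}: {AP_NETWORK.num_addresses} addresses")
print(f"DHCP range: {first} - {last} ({count} addresses)")
```

Running it prints `192.168.8.0/22: 1024 addresses` and `DHCP range: 192.168.8.50 - 192.168.11.254 (973 addresses)`. A /22 is large enough that reserving the first 49 hosts still leaves room for several hundred APs.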
3-Step deployment from sample config
- Upgrade your box to firmware version 20190606-1500 or later (follow this guide to upgrade the firmware).
- Download the sample configs for HSG-1 (primary) and HSG-2 (secondary).
- Follow this video guide to deploy each HSG by restoring its sample config.
- Follow this video guide to customize the landing page and login options, and to create scheduled reports.
NOTE: make sure the portal name remains "portal". If you are building your own config (e.g. because your VLAN and IP settings differ), it's recommended to restore the HSG5-1 config and modify it from there. After fully testing the new config on the primary (e.g. HSG-1), back up the HSG-1 config, restore it to HSG-2, and make only the minimum changes there (e.g. some interface IPs and the VRRP priority settings).
Sample config default settings
- The eth0 (WAN) port is pre-configured with static IP addresses. If you need to change the interface IP or routes, follow this guide.
- To enable auto-roaming between VLANs, seamless relogin is enabled for 1 day (see details on seamless relogin).
- The syslog server (user access logging) is enabled to collect DNS access logs, storing data for the last 5 days (see more details on DNS logging).
- User access records are stored for the last 90 days.
- User info (username and profile data) is kept with no time limit.
- Monthly auto backup is configured, keeping the last 3 backup files (see details on backup & restore).
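The retention defaults above reduce to simple date arithmetic plus a "keep the newest N" rotation. The sketch below only illustrates that arithmetic. It is not how the HSG prunes data internally, and the names `RETENTION_DAYS`, `retention_cutoff` and `rotate_backups` are hypothetical. It also assumes "last N days" means "on or after today minus N days".

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Retention windows from the sample config defaults (None = no time limit).
RETENTION_DAYS = {
    "dns_access_logs": 5,
    "user_access_records": 90,
    "user_info": None,
}
BACKUPS_TO_KEEP = 3  # monthly auto backup keeps the last 3 files


def retention_cutoff(category: str, today: date) -> Optional[date]:
    """Oldest date still retained for a category, or None if kept indefinitely."""
    days = RETENTION_DAYS[category]
    return None if days is None else today - timedelta(days=days)


def rotate_backups(backups: List[date],
                   keep: int = BACKUPS_TO_KEEP) -> Tuple[List[date], List[date]]:
    """Return (kept, deleted): keep the newest `keep` backups, drop the rest."""
    newest_first = sorted(backups, reverse=True)
    return newest_first[:keep], newest_first[keep:]


today = date(2024, 6, 30)
print(retention_cutoff("dns_access_logs", today))      # 2024-06-25
print(retention_cutoff("user_access_records", today))  # 2024-04-01
print(retention_cutoff("user_info", today))            # None

monthly = [date(2024, m, 1) for m in range(1, 7)]  # hypothetical Jan..Jun backups
kept, deleted = rotate_backups(monthly)
print([d.isoformat() for d in kept])  # ['2024-06-01', '2024-05-01', '2024-04-01']
```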
The HSG runs hotspot services, stores user accounts, and keeps hotspot sessions and user access records. Both HSG units therefore need to be kept "identical", so that failover is "stateful" and seamless for connected users, and all databases and records stay synced.
To achieve HSG high availability (HA) with "stateful" failover, we use the following key HSG features:
- VRRP. VRRP determines which unit is active and which is standby, and routes user traffic to the live (active) unit.
  - Provision a VLAN (on eth1) between the HA units for failover detection and data sync.
  - Configure VRRP groups on all participating VLANs, and set the respective priorities for the MASTER and SLAVE units.
  - VRRP attaches a virtual IP (VIP) to the VLAN interface of the active unit and creates a VRRP VIP host route on the MASTER (active) unit.
  - Configure interface tracking (usually on the WAN interface) so that VRRP automatically switches gateways in case of failure.
- HotSpot tracking. Hotspot tracking determines which unit should activate its hotspot service by checking whether that unit holds the VRRP VIP host route. Since the HSG can run multiple hotspot instances for multiple VLANs, one group of VLANs/hotspot instances can be active on the primary unit and another group on the secondary unit, achieving a load-sharing (active-active) HA deployment. NOTE: within each VLAN instance, the two units are still in active/standby mode.
  - Both primary and secondary units should have identical hotspot configs (e.g. the same set of "security hotspot xx" configs). Refer to the bottom of the page to review the config differences.
  - Don't enable dhcp-server on the hotspot VLAN interface (the hotspot engine is already a DHCP server, or you may relay to an upstream DHCP server).
  - Don't configure an IP address on hotspot VLAN interfaces; instead, configure "hotspot-server x.x.x.x...." under each hotspot instance config.
  - Configure "client-networks x.x.x.x" for the client IP subnet, and optionally configure the "client-dhcp xxx" scope.
  - Configure "client-static x.x.x.x". A client may get its IP from the active unit and still hold that IP after failover to the standby unit. To the standby (now active) unit, the client appears to have an IP from an external DHCP server, and this command authorizes those IPs.
  - On the active unit, tracking automatically activates the hotspot service for its VLAN, and the configured "hotspot-server IP" becomes the clients' default gateway and answers client DHCP requests. User traffic for this VLAN therefore passes through the active unit.
  - On the standby unit, tracking turns off the hotspot service for its VLAN, so the standby unit does not answer client DHCP requests and does not pass traffic.
  - On failover, the standby unit takes over the VRRP VIP (and therefore holds the VIP host route). Hotspot tracking then activates its hotspot service and creates a tunnel interface that takes over the "hotspot-server IP", so client traffic is routed to the standby (now active) unit.
- HotSpot database sync. This feature syncs user account databases, historical records and active "sticky" sessions (for seamless failover) from the primary to the secondary unit, so that both units stay identical and users see no disruption during failover.
  - Each box points to the VRRP-HA-VIP as the primary unit. We rely on VRRP to determine which unit is primary (the unit holding the VRRP-HA-VIP).
  - We use a dedicated hotspot sync username, mboxbackuprestore; assign it to the DEFAULT entity and set your own password.
- General notes. If you're issuing DHCP for AP management IPs on eth1, split the DHCP scopes between the two HSGs, because the DHCP services on both units are active.
- Special notes for running active-active scenarios:
  - Configure multiple VRRP groups with multiple VIPs split between the two gateways, so that each VLAN's traffic is load shared to its respective gateway. For example, in our sample config, VLAN10 and VLAN30 are MASTER/active on the primary unit, and VLAN20 and VLAN40 are MASTER/active on the secondary unit.
  - Configure one special VLAN with its own HA VRRP group for HA sync between the two gateways (with the VIP on the primary).
  - On both primary and secondary units, configure splash.ransnet.com to point to the sync VRRP VIP.
  - When the primary gateway is UP, it acts as the RADIUS and portal server for both the primary and secondary units.
  - When the primary is DOWN, the sync VRRP VIP is owned by the secondary unit, so the secondary unit points to its local RADIUS and local portal.
  - Set "no preempt" for the HA VRRP group, so that when the failed primary unit comes back online, it syncs with the secondary (active) unit instead of immediately trying to resume as primary. NOTE: keep "preempt" (the default) for the other VRRP groups, so that when both units are up, traffic is load shared across both units (active/active).
  - For each hotspot, enable "client-local-access yes". By default, client traffic can only go out via WAN/eth0, but while the primary is alive the portal is hosted on the primary unit, and clients connected to the secondary must reach it via the HA VLAN. This command permits that access.
  - On each unit, add RADIUS clients for the VRRP VIP and the peer IP, so that the RADIUS server permits the peer to use it during active/active or failover states.
  - In an active-active scenario, if the primary unit fails, existing users connected to it reconnect seamlessly to the secondary unit, because all user databases and sticky sessions are synced to the secondary. However, if the secondary unit fails, its sticky sessions are not synced to the primary, so existing users connected to the secondary have to log in again when they reconnect to the primary. If this is not acceptable, run in active/standby mode only, so that sticky sessions always live on the primary/active unit and failover is completely seamless.
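The VRRP behavior described above (priorities, interface tracking, and "preempt" versus "no preempt") can be modeled roughly as follows. This is a simplified sketch of one VRRP group, not HSG code. The class `Unit`, the function `elect_master`, the priorities 110/100 and the tracking penalty of 20 are all hypothetical. Modeling tracking as a fixed priority decrement is a common approach but an assumption here, and the sketch ignores VRRP's tie-break on IP address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """One HSG unit as seen by a single VRRP group (simplified model)."""
    name: str
    priority: int              # configured VRRP priority for this group
    wan_up: bool = True        # state of the tracked interface (usually WAN)
    alive: bool = True         # False simulates a hardware failure
    track_decrement: int = 20  # hypothetical priority penalty when WAN is down

    def effective_priority(self) -> int:
        return self.priority - (0 if self.wan_up else self.track_decrement)


def elect_master(units: List[Unit], current: Optional[Unit] = None,
                 preempt: bool = True) -> Optional[Unit]:
    """Pick the MASTER (VIP holder) for one VRRP group.

    With preempt, the live unit with the highest effective priority wins.
    Without preempt, a live current MASTER keeps the role even after a
    higher-priority unit comes back online.
    """
    live = [u for u in units if u.alive]
    if not live:
        return None
    if not preempt and current is not None and current.alive:
        return current
    return max(live, key=lambda u: u.effective_priority())


hsg1 = Unit("HSG-1", priority=110)
hsg2 = Unit("HSG-2", priority=100)
units = [hsg1, hsg2]

master = elect_master(units)
print(master.name)                                      # HSG-1 (110 > 100)
hsg1.wan_up = False                                     # tracked WAN on HSG-1 fails
master = elect_master(units, master)
print(master.name)                                      # HSG-2 (90 < 100)
hsg1.wan_up = True                                      # HSG-1 WAN recovers
print(elect_master(units, master).name)                 # HSG-1 (preempt)
print(elect_master(units, master, preempt=False).name)  # HSG-2 (no preempt)
```

The last two lines show why the HA group uses "no preempt": the recovered primary does not grab the VIP back, which gives it the chance to sync from the unit that stayed active.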
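Hotspot tracking and the active-active split can be sketched as a per-VLAN lookup. A unit runs the hotspot for a VLAN only while it holds that VLAN's VIP. Sticky sessions sync only from the holder of the HA sync VIP (the primary). The priority table below is hypothetical and only mirrors the sample split (VLAN10/30 on the primary, VLAN20/40 on the secondary). The sync VLAN's real ID is not given in this section, so it is labeled "HA". The sketch assumes preempt for every group, which matches the state when both units are up.

```python
from typing import Dict, Set

# Hypothetical VRRP priorities per group; the higher value holds the VIP.
PRIORITIES: Dict[str, Dict[str, int]] = {
    "VLAN10": {"HSG-1": 110, "HSG-2": 100},
    "VLAN20": {"HSG-1": 100, "HSG-2": 110},
    "VLAN30": {"HSG-1": 110, "HSG-2": 100},
    "VLAN40": {"HSG-1": 100, "HSG-2": 110},
    "HA":     {"HSG-1": 110, "HSG-2": 100},  # sync group, VIP on the primary
}


def vip_holders(alive: Set[str]) -> Dict[str, str]:
    """Map each VRRP group to the live unit holding its VIP (preempt assumed)."""
    holders = {}
    for group, prio in PRIORITIES.items():
        live = {unit: p for unit, p in prio.items() if unit in alive}
        if live:
            holders[group] = max(live, key=live.get)
    return holders


def active_hotspots(unit: str, alive: Set[str]) -> Set[str]:
    """Hotspot tracking: a unit serves a VLAN only while it holds its VIP route."""
    return {g for g, u in vip_holders(alive).items() if u == unit and g != "HA"}


def needs_relogin(failed: str, holders_before: Dict[str, str]) -> bool:
    """Sticky sessions sync only from the HA VIP holder to its peer, so users
    of a failed unit must log in again unless that unit was the primary."""
    return holders_before["HA"] != failed


both = {"HSG-1", "HSG-2"}
print(sorted(active_hotspots("HSG-1", both)))       # ['VLAN10', 'VLAN30']
print(sorted(active_hotspots("HSG-2", {"HSG-2"})))  # ['VLAN10', 'VLAN20', 'VLAN30', 'VLAN40']
before = vip_holders(both)
print(needs_relogin("HSG-1", before), needs_relogin("HSG-2", before))  # False True
```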
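The general note on splitting DHCP scopes for AP management IPs can be illustrated by dividing one range into two non-overlapping halves, one per HSG, so the two active DHCP services never hand out the same address. The input range 192.168.8.50 to 192.168.11.254 assumes the AP network 192.168.8.0/22 with only 192.168.8.1 to 192.168.8.49 reserved. Splitting into equal halves is just one hypothetical choice, and `split_scope` is not an HSG command.

```python
import ipaddress
from typing import Tuple

Range = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]


def split_scope(first: str, last: str) -> Tuple[Range, Range]:
    """Split an inclusive DHCP range into two non-overlapping halves."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    mid = lo + (hi - lo + 1) // 2  # first address of the second half
    to_ip = ipaddress.IPv4Address
    return (to_ip(lo), to_ip(mid - 1)), (to_ip(mid), to_ip(hi))


hsg1_scope, hsg2_scope = split_scope("192.168.8.50", "192.168.11.254")
print(f"HSG-1: {hsg1_scope[0]} - {hsg1_scope[1]}")  # HSG-1: 192.168.8.50 - 192.168.10.23
print(f"HSG-2: {hsg2_scope[0]} - {hsg2_scope[1]}")  # HSG-2: 192.168.10.24 - 192.168.11.254
```

Each unit then serves only its own half (486 and 487 addresses here). If one unit fails, the survivor can still serve new leases from its half, and existing leases from the failed unit's half stay valid until they expire.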
It's very important to note that there are minor config differences between the primary and secondary units. Refer to the CLI comparison below to see the differences. The complete CLI configs can be downloaded here (HSG5-1 and HSG-2) (or, if you follow step #3 to restore from the complete backup, you will have the CLI config too).