WAN resiliency is one of the most important considerations for any critical enterprise network. There are several configuration options available to achieve it. We will use the simple topology below to illustrate the options.
Option #1: Kernel route failover (default)
This option uses the kernel routing table (default routes learned from the ISP via DHCP) to perform failover. This is the default failover behavior for RansNet branch routers, and no special configuration is required.
By default, the router auto-assigns a route metric to each interface at bootup, in the order the interfaces are loaded. In other words, the interface loaded first (the one at the top of the CLI configuration) gets a lower metric: e.g. eth0 gets a lower metric than wwan0, and wwan0 a lower metric than wwan1. The route with the lower metric is preferred, so the interface carrying that route becomes the primary link.
Once the kernel detects that the primary link is down (e.g. eth0 loses its connection), it immediately withdraws all routes associated with that interface. The next kernel default route (e.g. via wwan0) then takes over, and traffic fails over to wwan0.
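The selection logic described above can be modelled in a few lines of Python. This is a minimal sketch, not the router's implementation: the `DefaultRoute` class, the `preferred_route()` helper, the metric values and the gateway addresses are hypothetical placeholders, and the model simply assumes that routes on a down interface are withdrawn and the lowest remaining metric wins.

```python
from dataclasses import dataclass


@dataclass
class DefaultRoute:
    interface: str
    gateway: str
    metric: int


def preferred_route(routes, down_interfaces):
    """Return the lowest-metric default route on an interface that is still up."""
    # Routes on a down interface are treated as withdrawn, as the kernel does.
    candidates = [r for r in routes if r.interface not in down_interfaces]
    if not candidates:
        return None  # no usable default route left
    return min(candidates, key=lambda r: r.metric)


# Hypothetical boot-order metrics: eth0 < wwan0 < wwan1 (gateways are placeholders).
routes = [
    DefaultRoute("eth0", "10.65.10.164", 20),
    DefaultRoute("wwan0", "10.64.189.185", 21),
    DefaultRoute("wwan1", "10.0.0.1", 22),
]

print(preferred_route(routes, set()).interface)              # eth0
print(preferred_route(routes, {"eth0"}).interface)           # wwan0
print(preferred_route(routes, {"eth0", "wwan0"}).interface)  # wwan1
```

Marking eth0 as down is equivalent to the kernel withdrawing its route, which is why failover in this option is effectively immediate.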
If you need to change the default route metric (e.g. you want wwan0 to be the primary link and eth0 the backup), configure a lower route-metric on wwan0.
CLI configuration for changing the interface route-metric:
!
interface eth0
enable
route-metric 21
!
interface wwan0
enable
route-metric 20
!
The default routes are shown below (use the command '# show ip route'). wwan0 has the lower route metric (20) and is therefore the preferred next hop, so traffic primarily routes out via wwan0 (SIM).
K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
K>* 0.0.0.0/0 [0/21] via 10.65.10.164, eth0, src 10.65.10.165, 02:21:25
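If you want to check the active primary link from a script, the preferred route can be picked out of the `show ip route` output. The sketch below assumes the line format shown in the sample above and the usual Quagga/FRR convention that `[0/20]` means [administrative distance/metric]; the function names are hypothetical, and lines for other route types are ignored.

```python
import re

# Matches kernel default-route lines such as:
# "K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25"
DEFAULT_ROUTE_RE = re.compile(
    r"^K\S*\s+0\.0\.0\.0/0\s+\[(\d+)/(\d+)\]\s+via\s+([\d.]+),\s+(\S+?),"
)


def parse_default_routes(output):
    """Return (distance, metric, gateway, interface) tuples for kernel default routes."""
    routes = []
    for line in output.splitlines():
        match = DEFAULT_ROUTE_RE.match(line.strip())
        if match:
            distance, metric, gateway, interface = match.groups()
            routes.append((int(distance), int(metric), gateway, interface))
    return routes


def primary_interface(output):
    """Interface of the route with the lowest (distance, metric), or None."""
    routes = parse_default_routes(output)
    return min(routes)[3] if routes else None


SAMPLE = """\
K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
K>* 0.0.0.0/0 [0/21] via 10.65.10.164, eth0, src 10.65.10.165, 02:21:25
"""

print(primary_interface(SAMPLE))  # wwan0
```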
NOTE
The advantage of this option is that failover is very fast, which works especially well for dual-SIM active/active scenarios: if the primary SIM1 connection is lost, the wwan0 interface is torn down (so its kernel route is withdrawn) and traffic fails over immediately to wwan1.
The disadvantage is that it does not detect upstream availability. For example, if the primary link is fiber broadband and the fiber link fails while the local connection to the HSA/UA is still up (eth0 remains UP), the kernel default route via eth0 remains in place. No failover occurs, and traffic continues to be routed out via eth0, where it is dropped.
Option #2: Policy-based routing (PBR) with upstream tracking
In this option, we use upstream tracking to determine whether the end-to-end connection over the primary link is actually up. It works particularly well when the primary link is a fiber/Ethernet connection to an external modem/ONT.
The solution is simple:
Use a policy-based route (PBR) to define the primary path and track its availability. The PBR takes precedence over any default route, whether kernel or static.
If tracking fails (primary path failure), the PBR route is removed and the backup link's default route takes over.
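The decision described above can be sketched as follows. This is a simplified model, not RansNet's implementation: `select_next_hop()` is a hypothetical helper, the tracking result is passed in as a boolean, and the remaining default routes are given as (metric, next hop) pairs. In the configuration example that follows, eth0 uses `dhcp nodefault`, so presumably the only default route left is the one via wwan0.

```python
import ipaddress


def select_next_hop(src_ip, pbr_sources, pbr_next_hop, track_up, default_routes):
    """Pick the next hop for a packet from src_ip under a tracked PBR rule.

    default_routes: list of (metric, next_hop) tuples for the remaining default routes.
    """
    src = ipaddress.ip_address(src_ip)
    matches_policy = any(src in ipaddress.ip_network(net) for net in pbr_sources)

    # A matching PBR rule wins over every default route, but only while its track is up.
    if matches_policy and track_up:
        return pbr_next_hop

    # Otherwise the lowest-metric default route (e.g. the backup link) is used.
    if not default_routes:
        return None
    return min(default_routes)[1]


LAN = ["192.168.8.0/22"]
BACKUP = [(20, "wwan0")]  # hypothetical backup default route

print(select_next_hop("192.168.9.20", LAN, "192.168.98.1", True, BACKUP))   # 192.168.98.1
print(select_next_hop("192.168.9.20", LAN, "192.168.98.1", False, BACKUP))  # wwan0
```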
Configuration example
!
hostname HSA-520
!
interface eth0
description "Connection to WAN"
enable
ip address dhcp nodefault
! (alternatively, configure a static IP address)
!
interface eth1
description "DO NOT configure"
enable
!
interface wwan0
enable
!
interface vlan 1 1
description "Default VLAN for all LAN ports"
enable
ip address 192.168.8.1/22
dhcp-server
router 192.168.8.1
dns 8.8.8.8 8.8.4.4
range 192.168.8.10 192.168.11.254
enable
!
ip name-server 8.8.8.8 8.8.4.4
!
ip pbr policy 100 src 192.168.8.0/22 remark LAN
!
ip pbr 100 nexthop 192.168.98.1 track icmp 1.1.1.1 15
!
firewall-access 100 permit outbound eth0
firewall-access 101 permit outbound wwan+
!
firewall-snat 100 overload outbound eth0
firewall-snat 101 overload outbound wwan+
NOTE
The PBR nexthop must be the WAN gateway IP (192.168.98.1 in this example). Do not configure eth0 as the nexthop.
This option works well if the primary link uses a static IP, or a dynamic IP with a fixed gateway IP. Some ISPs that offer dynamic WAN addressing also change the gateway IP from time to time; in that case this option cannot be used. For such links, it's best to use option #1 (with the caveat that the upstream can't be tracked) or option #3 below.
If you have multiple LAN networks, you can include all of them in the PBR src config (comma-separated), e.g.
ip pbr policy 100 src 192.168.8.0/22,10.10.10.0/24
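When the list of LAN networks gets long, a small helper can build the comma-separated `src` list and catch typos such as a prefix with host bits set. This is a hypothetical convenience script, not part of the router CLI; it only emits the command syntax shown above, and whether the `remark` keyword (used in the configuration example) can follow a multi-network list is an assumption.

```python
import ipaddress


def pbr_policy_line(policy_id, lan_networks, remark=None):
    """Build an 'ip pbr policy' line with a comma-separated src list."""
    # strict=True rejects prefixes with host bits set, e.g. 192.168.9.0/22.
    networks = [str(ipaddress.ip_network(n, strict=True)) for n in lan_networks]
    line = f"ip pbr policy {policy_id} src {','.join(networks)}"
    if remark:
        line += f" remark {remark}"
    return line


print(pbr_policy_line(100, ["192.168.8.0/22", "10.10.10.0/24"]))
# ip pbr policy 100 src 192.168.8.0/22,10.10.10.0/24
```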
Option #3: Multi-WAN (MWAN) link balancing
Multi-WAN (MWAN) link balancing is a more advanced traffic-steering approach that provides link aggregation and failover between links. MWAN also uses ping tracking to detect end-to-end upstream availability, and combines the routing metric with the tracking result to make failover decisions.
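How link state, tracking and metric might combine can be sketched as below. This is an assumed model for illustration only (the `Member` class and `active_members()` helper are hypothetical): a member is usable only if its interface is up and its tracking succeeds, and among usable members those sharing the lowest metric carry traffic. The metric values mirror those in the configuration example that follows.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    metric: int
    link_up: bool   # interface state (kernel view)
    track_ok: bool  # result of ping tracking


def active_members(members):
    """Names of usable members that share the lowest metric."""
    usable = [m for m in members if m.link_up and m.track_ok]
    if not usable:
        return []
    best = min(m.metric for m in usable)
    return [m.name for m in usable if m.metric == best]


healthy = [Member("eth0", 1, True, True), Member("wwan0", 2, True, True)]
upstream_down = [Member("eth0", 1, True, False), Member("wwan0", 2, True, True)]
balanced = [Member("eth0", 1, True, True), Member("wwan0", 1, True, True)]

print(active_members(healthy))        # ['eth0']
print(active_members(upstream_down))  # ['wwan0']
print(active_members(balanced))       # ['eth0', 'wwan0']
```

The last case corresponds to the equal-metric active/active setup; the second shows failover when eth0 is still up but its upstream tracking fails.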
Configuration example
!
interface eth0
description "ISP1 connection via fixed line"
enable
ip address dhcp
mwan-group 99
track 8.8.8.8 timer 5 5
metric 1
weight 1
!
interface wwan0
description "ISP2 connection via LTE"
enable
mwan-group 99
track 8.8.4.4 timer 10 10
metric 2
weight 1
!
mwan-rule 99 ip dst 0.0.0.0/0 group 99
In the above configuration example:
If eth0 is disconnected, the HSA/UA immediately withdraws the kernel default route for eth0 and fails over to wwan0, so failover is fast in this case, typically 2-3 seconds.
If the eth0 upstream link is down (eth0 is still connected, but tracking to 8.8.8.8 fails), tracking declares eth0 unusable after at most 25s (5s x 5 attempts) and then fails over to wwan0, so failover is slower in this case.
Fallback to eth0 requires tracking to confirm that the upstream has recovered (5s x 5 attempts), so fallback is also slower.
NOTE
To speed up failover and fallback, you can set shorter intervals and fewer attempts. However, for slow or unreliable links (especially mobile/LTE links), setting them too short is not recommended, as it may cause flapping (false failovers).
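The timing above follows from the tracker needing several consecutive failed probes. The sketch below assumes, as the text does for `timer 5 5`, that the first value is the probe interval in seconds and the second the number of attempts, and that failures must be consecutive; `time_to_declare_down()` is a hypothetical helper and the real MWAN tracker may differ in detail. It also shows why overly aggressive settings cause flapping on lossy LTE links.

```python
def time_to_declare_down(probe_results, interval_s, attempts):
    """Seconds until `attempts` consecutive failed probes, or None if never.

    probe_results: one bool per probe (True = reply received), sent every interval_s.
    """
    consecutive_failures = 0
    for probe_number, ok in enumerate(probe_results, start=1):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= attempts:
            return probe_number * interval_s
    return None


# Upstream already down when probing starts: eth0 with "timer 5 5".
print(time_to_declare_down([False] * 10, 5, 5))    # 25
# Same outage with wwan0's "timer 10 10" (same interpretation).
print(time_to_declare_down([False] * 20, 10, 10))  # 100

# A single lost reply on a lossy LTE link:
lte = [True, False, True, True, True]
print(time_to_declare_down(lte, 2, 1))  # 4 -> false failover (flap)
print(time_to_declare_down(lte, 5, 5))  # None -> link kept
```

Fallback presumably works the same way in reverse, counting consecutive successful probes before the link is declared usable again.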
For link balancing (e.g. active/active across both links), set the same metric on both links (optionally with different weights according to their respective bandwidth) to aggregate the total upstream capacity.
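To see how weights translate into traffic shares, the sketch below uses smooth weighted round-robin (the scheme nginx uses for upstream balancing). This is a swapped-in illustration only: the actual MWAN balancing algorithm, and whether it balances per connection or per packet, is not described here. The 3:1 weights are hypothetical, e.g. for a fixed line with roughly three times the LTE bandwidth.

```python
def smooth_wrr(weights, n):
    """Return the member chosen for each of n new connections."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    order = []
    for _ in range(n):
        # Every member earns its weight; the highest scorer is chosen and pays back the total.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


picks = smooth_wrr({"eth0": 3, "wwan0": 1}, 8)
print(picks.count("eth0"), picks.count("wwan0"))  # 6 2
```

The smooth variant interleaves the lighter link instead of sending long bursts to the heavier one, while still honouring the 3:1 ratio over each cycle.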
This option has a significant performance impact. If you only need a simple active/standby setup, option #1 or #2 is recommended instead.