WAN failover ensures an enterprise site stays connected to the Internet or central network when a primary uplink fails. SD-WAN enhances this by combining multiple links (fiber, broadband, 4G/5G) and dynamically steering traffic based on link status, health, and business policies.
We will use the simple topology below to illustrate the options.
This option utilizes the kernel routing table (default route learnt from ISP DHCP) to perform failover. This is the default failover behavior for RansNet branch routers. There's no special configuration required.
By default, the router auto-assigns a route metric to each interface at boot, in the order the interfaces are loaded. In other words, the interface that appears first (at the top) in the CLI configuration gets the lowest metric; e.g. eth0 gets a lower metric than wwan0, and wwan0 a lower metric than wwan1. The route with the lowest metric is the preferred route, so the interface carrying that route is the primary link.
Once the kernel detects that the primary link is down (e.g. eth0 loses connection), it immediately withdraws all routes associated with that interface. The next kernel default route (e.g. via wwan0) then takes over, and traffic fails over to wwan0.
If you need to change the default route metrics (e.g. you want wwan0 to be the preferred/primary link and eth0 the backup), assign a lower route-metric to wwan0.
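The selection logic described above can be modeled in a few lines of Python. This is a minimal sketch, not RansNet code: the helper names and the starting metric of 20 are hypothetical, since the exact values the router auto-assigns are not specified here. It only shows that the lowest-metric route on an up interface wins, and that withdrawing a down interface's route promotes the next one.

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    metric: int
    up: bool = True

def assign_boot_metrics(names, base=20):
    """Give each interface an increasing metric in config (boot) order.

    base=20 is a hypothetical starting value, not the router's actual default.
    """
    return [Interface(name, base + i) for i, name in enumerate(names)]

def preferred_uplink(interfaces):
    """Lowest-metric interface that is still up; None if every link is down."""
    candidates = [iface for iface in interfaces if iface.up]
    if not candidates:
        return None
    return min(candidates, key=lambda iface: iface.metric).name

ifaces = assign_boot_metrics(["eth0", "wwan0", "wwan1"])
print(preferred_uplink(ifaces))  # eth0
ifaces[0].up = False             # eth0 loses link, so its default route is withdrawn
print(preferred_uplink(ifaces))  # wwan0
```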
CLI configuration example
!
interface eth0
enable
route-metric 21
!
interface wwan0
enable
route-metric 20
!
The default routes shown by the command '# show ip route' are listed below. wwan0 has the lower route metric and is therefore the preferred next-hop, so traffic primarily routes out through wwan0 (SIM).
K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
K>* 0.0.0.0/0 [0/21] via 10.65.10.164, eth0, src 10.65.10.165, 02:21:25
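To check which route the kernel prefers, the 'show ip route' lines above can be parsed and compared by metric. A minimal sketch, assuming the bracketed field is [distance/metric] and that the line layout matches the sample output exactly; the regular expression and function names are hypothetical.

```python
import re

# Matches default-route lines such as:
# K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
# Assumption: the bracketed pair is [distance/metric].
DEFAULT_ROUTE_RE = re.compile(
    r"^\S+\s+0\.0\.0\.0/0\s+\[(?P<distance>\d+)/(?P<metric>\d+)\]\s+"
    r"via\s+(?P<gateway>[\d.]+),\s+(?P<iface>[^,\s]+),"
)

SAMPLE = """\
K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
K>* 0.0.0.0/0 [0/21] via 10.65.10.164, eth0, src 10.65.10.165, 02:21:25
"""

def default_routes(output):
    """Return (metric, interface, gateway) tuples for each default route line."""
    routes = []
    for line in output.splitlines():
        match = DEFAULT_ROUTE_RE.match(line.strip())
        if match:
            routes.append((int(match["metric"]), match["iface"], match["gateway"]))
    return routes

def preferred_default_route(output):
    """The lowest-metric default route is the preferred next-hop (None if none found)."""
    routes = default_routes(output)
    return min(routes) if routes else None

print(preferred_default_route(SAMPLE))  # (20, 'wwan0', '10.64.189.185')
```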
NOTE
The advantage of this option is that failover is very fast, and it works especially well for dual-SIM active/active scenarios: if the primary SIM1 connection drops, the wwan0 interface is torn down (so its kernel route is withdrawn) and traffic fails over immediately to wwan1.
The disadvantage is that it does not detect upstream availability. For example, if the primary link is fiber broadband and the fiber link fails but the local connection to the HSA/UA stays up (eth0 is still UP), the kernel default route via eth0 remains. No failover occurs, and traffic continues to be routed out through eth0 (and gets dropped).
The upstream host tracking approach uses PBR with SLA/ICMP tracking to verify end-to-end path reachability (not just interface up/down status). While the primary path is healthy, the PBR route takes precedence over the backup link's default route; when tracking fails, the PBR rule is removed and the backup default routes take effect. This works particularly well when the primary link is a fiber/Ethernet connection to an external modem/ONT.
The solution is simple:
Use a policy-based route (PBR) to define the primary path and track its availability (you may add advanced settings to track link SLA).
The PBR route takes precedence over any backup default route (whether kernel or static).
If tracking fails (primary path failure), the PBR route is removed and the backup link's default route takes effect.
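The precedence rule in the steps above can be expressed as a small decision function. This is a hypothetical model of the behavior as described, not the router's implementation: `select_path` and its arguments are illustrative names, and backup routes are given as (metric, interface) pairs.

```python
def select_path(tracking_ok, pbr_nexthop, backup_routes):
    """Pick the forwarding path for LAN traffic matched by the PBR policy.

    tracking_ok   -- result of the upstream ICMP/SLA tracking for the primary path
    pbr_nexthop   -- primary WAN gateway used by the PBR route
    backup_routes -- list of (metric, interface) default routes (kernel or static)
    """
    if tracking_ok:
        # While tracking succeeds, the PBR route precedes every default route.
        return ("pbr", pbr_nexthop)
    if not backup_routes:
        return None  # no usable path at all
    # Tracking failed: the PBR route is removed, so the best default route applies.
    _metric, iface = min(backup_routes)
    return ("default", iface)

print(select_path(True, "192.168.98.1", [(20, "wwan0")]))   # ('pbr', '192.168.98.1')
print(select_path(False, "192.168.98.1", [(20, "wwan0")]))  # ('default', 'wwan0')
```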
Step 1: Configure the WAN interface with a static IP, or, if it uses DHCP, select the "ignore default route" option.
Step 2: Configure PBR to track upstream connectivity by pinging an upstream IP (SD-WAN -> Traffic Steering).
CLI Configuration example
!
hostname HSA-520
!
interface eth0
description "Connection to WAN"
enable
ip address dhcp nodefault (or static IP)
!
interface eth1
description "DO NOT configure"
enable
!
interface wwan0
enable
!
interface vlan 1 1
description "Default VLAN for all LAN ports"
enable
ip address 192.168.8.1/22
dhcp-server
router 192.168.8.1
dns 8.8.8.8 8.8.4.4
range 192.168.8.10 192.168.11.254
enable
!
ip name-server 8.8.8.8 8.8.4.4
!
ip pbr policy 100 src 192.168.8.0/22 remark LAN
!
ip pbr 100 nexthop 192.168.98.1 track icmp 1.1.1.1 15
!
firewall-access 100 permit outbound eth0
firewall-access 101 permit outbound wwan+
!
firewall-snat 100 overload outbound eth0
firewall-snat 101 overload outbound wwan+
NOTE
The PBR nexthop must be the WAN gateway IP (192.168.98.1 in this example). Do not configure eth0 as the nexthop.
This option works well if the primary link uses a static IP, or a dynamic IP with a fixed gateway IP. Some ISPs that offer dynamic WAN links also change the gateway IP, in which case this option cannot be used. For such cases, it's best to use option #1 (with the caveat that upstream availability can't be tracked) or option #3 below.
If you have multiple LAN networks, include all of them in the PBR src config as a comma-separated list, e.g.:
ip pbr policy 100 src 192.168.8.0/22,10.10.10.0/24
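When the list of LAN networks grows, it can help to validate the prefixes before pasting them into the config. A minimal sketch using Python's standard `ipaddress` module; `pbr_policy_line` is a hypothetical helper, and it only reproduces the line format shown above.

```python
import ipaddress

def pbr_policy_line(policy_id, networks):
    """Build an 'ip pbr policy' line with a comma-separated src list.

    strict=True rejects prefixes with host bits set (e.g. 192.168.9.0/22),
    which would otherwise be a silent typo in the config.
    """
    nets = [ipaddress.ip_network(n, strict=True) for n in networks]
    return f"ip pbr policy {policy_id} src " + ",".join(str(n) for n in nets)

print(pbr_policy_line(100, ["192.168.8.0/22", "10.10.10.0/24"]))
# ip pbr policy 100 src 192.168.8.0/22,10.10.10.0/24
```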
Multi-WAN (MWAN) link balancing is a more advanced traffic steering approach that provides link aggregation and failover between links. MWAN also uses ping tracking to detect upstream end-to-end link availability (not just interface state).
MWAN can operate in active/standby mode, active/active (link balancing) mode, or both, and it combines routing metrics, weights, and ping tracking to make failover and link balancing decisions.
Active/Standby: Assign a lower metric to the active link and a higher metric to the standby link.
Active/Active: Assign equal metrics to both links, and optionally use weights to spread traffic over multiple links and increase aggregate throughput.
CLI configuration example (Active/Standby)
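The way metrics and weights interact in the two modes can be sketched as follows. This is a simplified model under stated assumptions, not the MWAN implementation: among links whose tracking succeeds, only the lowest-metric links carry traffic, and each one's share is proportional to its weight. How traffic is actually split (per connection, per flow) is not covered here.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    metric: int
    weight: int
    healthy: bool = True  # result of ping tracking

def traffic_shares(links):
    """Return {link name: fraction of traffic} for the links that carry traffic."""
    healthy = [link for link in links if link.healthy]
    if not healthy:
        return {}
    best = min(link.metric for link in healthy)
    members = [link for link in healthy if link.metric == best]
    total = sum(link.weight for link in members)
    return {link.name: link.weight / total for link in members}

# Active/standby: eth0 has the lower metric, so it carries all traffic.
print(traffic_shares([Link("eth0", 1, 1), Link("wwan0", 2, 1)]))  # {'eth0': 1.0}
# Active/active: equal metrics, traffic split 3:1 by weight.
print(traffic_shares([Link("eth0", 1, 3), Link("wwan0", 1, 1)]))  # {'eth0': 0.75, 'wwan0': 0.25}
```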
!
interface eth0
description "ISP1 connection via fixed line"
enable
ip address dhcp
mwan-group 99
track 8.8.8.8 timer 5 5
metric 1
weight 1
!
interface wwan0
description "ISP2 connection via LTE"
enable
mwan-group 99
track 8.8.4.4 timer 10 10
metric 2
weight 1
!
mwan-rule 99 ip dst 0.0.0.0/0 group 99
In the above configuration example:
If eth0 is disconnected, the HSA/UA immediately withdraws the kernel default route for eth0 and fails over to wwan0, so failover in this case is fast, typically 2-3 seconds.
If eth0's upstream link is down (eth0 is still connected, but tracking to 8.8.8.8 fails), tracking declares eth0 unusable after at most 25s (5s x 5 attempts) and traffic fails over to wwan0, so failover is slower in this case.
Fallback to eth0 is also determined by tracking confirmation (5s x 5 attempts), so fallback is likewise slower.
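The worst-case tracking delays above follow from multiplying the probe interval by the number of attempts. A quick calculation for both links in the example, assuming `timer 10 10` on wwan0 is read the same way as `timer 5 5` on eth0 (interval in seconds x attempts); the helper name is hypothetical.

```python
def worst_case_detection_s(interval_s, attempts):
    """Upper bound on seconds before tracking declares a link down (or back up)."""
    return interval_s * attempts

# Values taken from the 'track ... timer' lines in the example config.
TRACK_TIMERS = {"eth0": (5, 5), "wwan0": (10, 10)}

for iface, (interval_s, attempts) in TRACK_TIMERS.items():
    print(f"{iface}: up to {worst_case_detection_s(interval_s, attempts)} s")
# eth0: up to 25 s
# wwan0: up to 100 s
```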
NOTE
To speed up failover/fallback, you can set shorter intervals and fewer attempts. However, for slow and unreliable links (especially mobile/LTE links), overly short settings are not recommended, as they may cause flapping (false failovers).
If you want link balancing (i.e. active/active across both links), set the same metric for both links (optionally with different weights based on their respective bandwidth) so that the combined capacity of the upstream links can be aggregated.
This option has a significant performance impact. If you just need a simple active/standby setup, option #1 or #2 is recommended.
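The flapping trade-off can be illustrated with a simple consecutive-result counter. This is a hypothetical model, not the router's tracking code: a link changes state only after `attempts` consecutive probe results that disagree with its current state. On a lossy link that drops every fourth probe, a one-attempt threshold flaps repeatedly, while a threshold of three absorbs the isolated losses.

```python
def count_flaps(probe_results, attempts):
    """Count state changes when a link flips only after `attempts` consecutive
    probe results that disagree with its current state (True = probe succeeded)."""
    state_up = True
    streak = 0
    transitions = 0
    for ok in probe_results:
        if ok == state_up:
            streak = 0  # result agrees with current state; reset the counter
            continue
        streak += 1
        if streak >= attempts:
            state_up = ok
            streak = 0
            transitions += 1
    return transitions

# Hypothetical lossy LTE link: every fourth ping is lost.
lossy = [True, True, True, False] * 5
print(count_flaps(lossy, attempts=1))  # 9
print(count_flaps(lossy, attempts=3))  # 0
```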
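One way to pick weights is to make them proportional to each link's bandwidth and reduce them to small integers. A minimal sketch; the bandwidth figures are hypothetical, and it assumes the `weight` field accepts small positive integers, as in the example config.

```python
from functools import reduce
from math import gcd

def weights_from_bandwidth(bandwidth_mbps):
    """Reduce integer bandwidths (Mbps) to the smallest integer weights with the same ratio."""
    if any(bw <= 0 for bw in bandwidth_mbps.values()):
        raise ValueError("bandwidths must be positive integers")
    divisor = reduce(gcd, bandwidth_mbps.values())
    return {name: bw // divisor for name, bw in bandwidth_mbps.items()}

# Hypothetical links: 100 Mbps fiber on eth0, 50 Mbps LTE on wwan0.
print(weights_from_bandwidth({"eth0": 100, "wwan0": 50}))  # {'eth0': 2, 'wwan0': 1}
```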