SD-WAN over SSLVPN bonding (OSPF)

VPN "bonding" is part of our SD-WAN deployment technique for securely connecting multiple remote sites to HQ/DC over redundant WAN connections at the remote sites. It uses our CMG at the hub end as the VPN concentrator, and an HSA at each remote end.

There are three main options for achieving VPN "bonding", depending on the exact requirements.

  1. Use Multi-WAN (MWAN) to achieve WAN link redundancy, then build the VPN tunnel across MWAN. (See more details on MWAN and SSLVPN.)
    • This approach achieves link redundancy, but because it still builds only a single VPN tunnel, failover can take quite long: the VPN tunnel is persistent and cannot be "balanced" across multiple links at the same time.
    • So at any point in time, the VPN tunnel can use only one of the active links. After MWAN detects a link failure, the VPN tunnel has to be re-established across the failover (new) link. The total failover delay = MWAN link detection delay + VPN re-establishment delay.
    • This option is more suitable for deployments with a small number of remote sites, where each site's network routes can be learnt from the SSLVPN configuration (e.g. from the client net command).
    • NOTE: MWAN doesn't work well with dynamic routing: each time routes are learnt dynamically (via OSPF or BGP), MWAN is unaware of the newly learnt routes and will not deny traffic passing to them. For large deployments, we recommend option #2 below.
  2. Use OSPF load balancing/failover across dual/multiple VPN tunnels. This approach combines our Multi-VPN tunneling and dynamic routing capabilities.
    • We build dual/multiple tunnels, one VPN tunnel per WAN link, then run the OSPF dynamic routing protocol across the tunnels.
    • OSPF dynamically learns the routes for each remote site, load balances traffic across multiple paths (VPN tunnels), and fails over automatically between paths/tunnels.
    • To switch between active/active and active/standby mode, we simply tweak the OSPF link costs (e.g. set a higher "tap ospf cost xx" for the backup tap).
    • This design is the most scalable and is recommended for large deployments.
    • Another advantage of this approach is that the VPN gateways can be split across two physical CMGs for gateway redundancy.
    • We use PBR to map each tunnel's (tap's) traffic to its respective physical/LTE interface.
  3. Use the layer-2 LACP protocol to "bond" multiple VPN tunnels. This approach utilizes our Multi-VPN tunneling and Layer-2 bonding capabilities.
    • The HSA builds dual/multiple tunnels to the CMG, one VPN tunnel per WAN link, then uses LACP to "bond" these tunnels into one logical link.
    • Traffic is load balanced across the tunnels as if they were one logical link with aggregated bandwidth.
    • But note that if any of the links/tunnels has a lower speed (or inconsistent performance), it will impact the performance of the overall bonded link.
    • Unlike approach #2, where the hub-end SSLVPN server can use one VPN instance to terminate many remote tunnels, this approach requires a dedicated VPN instance per remote end, because LACP bonding is "point-to-point". So if you have many remote sites/tunnels, you need to run many VPN instances on the server.
    • NOTE: also unlike approach #2, LACP bonding requires both VPN tunnels to terminate on the same VPN gateway (CMG).
    • This approach is good for smaller deployments that require large throughput (aggregated bandwidth) between sites, and where the WAN links' performance is consistent and identical.
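The failover arithmetic in option 1 is simple but worth making explicit, because the two delays are serial rather than overlapping: the tunnel cannot be rebuilt until MWAN has declared the link down. Below is a minimal Python sketch; the timing values are hypothetical placeholders, not measured figures for MWAN or SSLVPN.

```python
from dataclasses import dataclass


@dataclass
class Option1Timings:
    """Hypothetical delay inputs in seconds (placeholders, not measured values)."""
    mwan_link_detection: float  # time for MWAN to declare the active WAN link down
    vpn_reestablishment: float  # time to rebuild the single VPN tunnel on the new link


def option1_failover_delay(t: Option1Timings) -> float:
    """Total failover delay for option 1 (MWAN + single VPN tunnel).

    The tunnel can only be rebuilt after MWAN has detected the failure,
    so the two delays happen one after the other and simply add up.
    """
    if t.mwan_link_detection < 0 or t.vpn_reestablishment < 0:
        raise ValueError("delays must be non-negative")
    return t.mwan_link_detection + t.vpn_reestablishment


if __name__ == "__main__":
    example = Option1Timings(mwan_link_detection=15.0, vpn_reestablishment=20.0)
    print(f"option 1 failover: {option1_failover_delay(example):.1f} s")  # option 1 failover: 35.0 s
```

In principle, option 2 takes the second term off the critical path: every WAN link already carries its own established tunnel, so OSPF only has to move traffic to a tunnel that is already up.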

In this section, we focus on VPN bonding using OSPF, for large deployment scenarios. VPN bonding with LACP will be covered in a separate topic.

In this design, we use an HSA with dual LTE/SIM to provide multiple WAN connections to the hub CMG, and build a VPN tunnel across each LTE connection. In a real live deployment, the HSA WAN ports can also use other WAN connections (e.g. MPLS, fiber, PPPoE).

A few key points to NOTE:

  1. SSLVPN must run in tap mode (layer 2 tunnel) to support OSPF
  2. On CMG (SSLVPN server)
    • Configure two VPN server instances (if both tunnels share the same gateway). Under "security sslvpn-server x", configure a unique port number for each instance, so that the remote client (HSA) imports two profiles and builds a separate tunnel to each instance.
    • When configuring "server address xxx", we recommend using a DNS name instead of a public IP, so that the client profile uses the DNS name to connect to the SSLVPN server. This allows the server's public IP to change later without clients having to re-import their VPN profiles.
    • Assign a tunnel-pool to each instance's tap interface, so that OSPF can run over the taps and advertise routes (the sample config enables OSPF on these subnets with "network 11.1.1.0/24 area 0" and "network 12.1.1.0/24 area 0").
    • Use "tap ospf priority 255" to make sure the CMG is ALWAYS the OSPF DR, so that it can receive routes from, and push routes to, all remote ends.
    • Use "tap ospf cost xx" to tweak the cost of each tunnel if you want to run the taps in active/standby mode; otherwise both taps keep an OSPF cost of 10 and load share (load balance) traffic across both. NOTE: if you change "tap ospf cost xx", do it on both ends (CMG and HSA) for the same VPN instance, to avoid asymmetric routing problems.
    • Configure firewall-input and firewall-access rules to permit the VPN tunnel and internal traffic to pass through.
  3. On HSA
    • Set the OSPF priority to 0 ("tap ospf priority 0") to make sure the HSA is NEVER elected DR.
    • Use the GUI (or mfusion) to configure the tap interfaces, and put them into the correct "lan" firewall zone.
    • Use PBR to map each tunnel to its respective WAN interface, e.g. lte0, lte1, WAN/DSL/MPLS, etc.
      1. We must not have a default route on the HSA. Otherwise, if PBR fails (e.g. the physical interface mapped to a tunnel goes down), the tunnel will be formed via the default route instead. The tunnel then stays up, but over whichever interface the default route uses, so the failover doesn't really happen.
      2. So, if any WAN interface gets its address via DHCP from the ISP (especially LTE/DSL), we must configure the HSA not to accept the ISP's default route. In the GUI, go to Network --> Interfaces, "Edit" the target interface, open "Advanced Settings" and uncheck "Use default gateway". Do this for all WAN interfaces.
      3. However, if there is no default route at all in the main routing table, PBR will not kick in (so none of the tunnels will form). We therefore define a dummy default route to kickstart PBR, making sure it has a higher distance than OSPF so that the default route injected via OSPF takes precedence, e.g. "ip route 0.0.0.0/0 nexthop lo distance 200".
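The interaction between PBR, an ISP-supplied default route and the dummy default route can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described in the three points above, not the HSA's forwarding code. The OSPF distance of 110 is an assumption (the common default on Quagga/FRR-style routers; check your platform), the ISP route's distance is a hypothetical value, and "lo" stands for the dummy route that has no real WAN path.

```python
from dataclasses import dataclass
from typing import Optional

OSPF_DISTANCE = 110   # assumption: usual OSPF default distance, verify on your HSA
DUMMY_DISTANCE = 200  # from "ip route 0.0.0.0/0 nexthop lo distance 200"


@dataclass(frozen=True)
class DefaultRoute:
    interface: str  # egress interface; "lo" marks the dummy route (no real path)
    distance: int   # administrative distance; lower wins


def best_default(routes: list[DefaultRoute], up: set[str]) -> Optional[DefaultRoute]:
    """Pick the usable default route with the lowest distance ("lo" is always usable)."""
    usable = [r for r in routes if r.interface == "lo" or r.interface in up]
    return min(usable, key=lambda r: r.distance) if usable else None


def tunnel_egress(mark: int, pbr: dict[int, str],
                  main_defaults: list[DefaultRoute], up: set[str]) -> Optional[str]:
    """WAN interface a tunnel packet with this fwmark leaves on, or None if none.

    Simplified model of the text: PBR only kicks in when the main table holds
    a default route; the PBR route for the mark wins while its interface is
    up; otherwise the packet falls through to the main table.
    """
    if not main_defaults:
        return None                  # no default at all: PBR never kicks in
    mapped = pbr.get(mark)
    if mapped in up:
        return mapped                # normal case: tunnel pinned to its own WAN link
    fallback = best_default(main_defaults, up)
    if fallback is None or fallback.interface == "lo":
        return None                  # tunnel drops, so OSPF can fail over to the other tap
    return fallback.interface        # tunnel silently moves to another link


if __name__ == "__main__":
    pbr = {1443: "3g-lte0", 2443: "3g-lte1"}    # fwmark -> WAN interface, as in the HSA sample
    dummy = DefaultRoute("lo", DUMMY_DISTANCE)
    isp = DefaultRoute("3g-lte1", 1)             # hypothetical DHCP default from the ISP
    ospf = DefaultRoute("tap1", OSPF_DISTANCE)   # default injected by the CMG via OSPF
    lte0_down = {"3g-lte1"}
    print(tunnel_egress(1443, pbr, [isp], lte0_down))        # 3g-lte1 (no real failover)
    print(tunnel_egress(1443, pbr, [dummy], lte0_down))      # None (tunnel drops, OSPF reroutes)
    print(best_default([dummy, ospf], {"tap1"}).interface)   # tap1 (OSPF beats the dummy)
```

The first case is the failure mode from point 1: with an ISP default present, tap1 quietly re-forms over lte1, so both tunnels share one link. With only the dummy route, the tunnel goes down instead and OSPF moves traffic to the surviving tap.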
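Why the CMG uses priority 255 and every HSA uses priority 0 can be illustrated with a simplified OSPF DR election in Python. The sketch follows the general OSPF rule (a priority-0 router is never DR; otherwise the highest priority wins and the highest router ID breaks ties), not the CMG/HSA implementation, and the HSA router IDs are hypothetical. Real OSPF (RFC 2328) also elects a BDR and does not pre-empt a DR that is already in place, which this sketch ignores.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OspfRouter:
    name: str
    router_id: str  # dotted-quad OSPF router ID
    priority: int   # 0..255; 0 means "never DR"


def elect_dr(routers: list[OspfRouter]) -> Optional[OspfRouter]:
    """Simplified DR election on one tap segment.

    Priority-0 routers are ineligible; among the rest, the highest priority
    wins and the numerically highest router ID breaks ties.
    """
    eligible = [r for r in routers if r.priority > 0]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r.priority, int(ipaddress.IPv4Address(r.router_id))))


if __name__ == "__main__":
    cmg = OspfRouter("cmg", "10.1.1.254", 255)                               # router-id from the CMG sample
    hsas = [OspfRouter(f"hsa-{n}", f"192.168.{n}.1", 0) for n in (2, 3)]     # hypothetical IDs
    print(elect_dr([cmg, *hsas]).name)  # cmg
    print(elect_dr(hsas))               # None: no HSA is eligible while the CMG is absent
```

With every HSA at priority 0, the CMG is the only eligible router on each tap segment, so it is the one that becomes DR and exchanges routes with all remote ends.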

NOTE:

  1. We can use OSPF load balancing (equal paths, equal costs) across dual/multiple tunnels, but if one of the links is slow or performs poorly, it will impact overall performance. Use "tap ospf cost" to change the link cost and switch between active/active and active/standby mode; the tap with the lower cost becomes the active path.
  2. Each CMG VPN instance can support hundreds of remote tunnels (remote OSPF neighbours). But since each tunnel runs in layer-2 mode, we recommend no more than 200 remote peers per instance, to keep each broadcast domain small.
  3. If you have a large number of remote sites, we recommend putting each tap into a different OSPF area ID, to minimize OSPF topology update overhead.
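The cost rule in note 1 and the sizing guideline in note 2 can be sketched in Python. Assumptions: the destination is reachable over every tap with the same cost beyond the tap, so only the tap cost decides; 10 is the default tap cost quoted earlier; and each VPN instance terminates one tunnel per remote site, as in the two-instance sample, so the 200-peer cap applies per instance. Both function names are hypothetical.

```python
import math


def active_taps(costs: dict[str, int]) -> list[str]:
    """Taps OSPF would use toward a destination reachable over every tap.

    Simplified model: only the lowest-cost taps carry traffic; if several tie,
    traffic is load shared across them (equal-cost multipath).
    """
    if not costs:
        return []
    best = min(costs.values())
    return sorted(tap for tap, cost in costs.items() if cost == best)


def instances_needed(remote_sites: int, tunnels_per_site: int = 2,
                     max_peers_per_instance: int = 200) -> int:
    """CMG VPN instances needed when each instance takes one tunnel per site."""
    if remote_sites < 0 or tunnels_per_site <= 0 or max_peers_per_instance <= 0:
        raise ValueError("invalid sizing input")
    return tunnels_per_site * math.ceil(remote_sites / max_peers_per_instance)


if __name__ == "__main__":
    print(active_taps({"tap1": 10, "tap2": 10}))  # ['tap1', 'tap2'] (active/active)
    print(active_taps({"tap1": 10, "tap2": 20}))  # ['tap1'] (tap2 is standby)
    print(instances_needed(450))                  # 6
```

Raising the cost on one tap (on both the CMG and the HSA, as noted earlier) is what turns the equal-cost case into active/standby.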


CLI CONFIG ON CMG (VPN Server)

-------------------------------------------------

mbox# show running-config
!
hostname mbox
!
interface eth0
 description "to ISP"
 enable
 ip address 118.189.158.34/29
!
interface eth1
 enable
 bridge-group 0
!
interface eth2
 enable
 bridge-group 0
!
interface bridge 0
 description "to LAN"
 enable
 ip address 10.1.1.254/24
 ip ospf priority 255
 dhcp-server range 10.1.1.10 10.1.1.100
!
ip dhcp-server start
!
ip default-gateway 118.189.158.1
!
!VPN config
security sslvpn-server 1
 server address 118.189.158.34 1443
 server tap-mode
 tunnel-pool 11.1.1.0 255.255.255.0
 tap ospf priority 255
 client client hsa-test-cmgisp
 start
!
security sslvpn-server 2
 server address 118.189.158.34 2443
 server tap-mode
 tunnel-pool 12.1.1.0 255.255.255.0
 tap ospf priority 255
 client client hsa-test-cmgisp
 start
!
firewall-input 010 permit all tcp dport 1443
firewall-input 011 permit all tcp dport 2443
firewall-input 100 permit all ospf
!
firewall-access 20 permit all src 10.1.1.0/24 remark "intranet traffic"
firewall-access 21 permit all dst 10.1.1.0/24 remark "intranet traffic"
!
!Dynamic routing config
ip access-list 10 permit 10.1.1.0 0.0.0.255 <--change this to map to your own LAN subnet
!
route-map HQ-LAN permit 10
 match ip address 10
!
router ospf
 router-id 10.1.1.254
 network 11.1.1.0/24 area 0
 network 12.1.1.0/24 area 0
 redistribute connected route-map HQ-LAN
!
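Adding VPN instances on the CMG follows a fixed pattern in the sample above: each instance gets its own port, tunnel-pool, firewall-input rule and OSPF network statement. The Python generator below makes that pattern explicit. It only reproduces lines that appear in the sample; the numbering (ports 1443, 2443, ..., pools 11.1.1.0/24, 12.1.1.0/24, ..., firewall-input rules 010, 011, ...) is the sample's convention rather than a requirement, the function name is hypothetical, and the interface, DHCP, firewall-access, access-list and route-map lines stay as in the sample.

```python
def cmg_vpn_stanzas(server_address: str, client: str, router_id: str,
                    tunnels: int = 2, route_map: str = "HQ-LAN") -> str:
    """Emit the per-instance CMG lines, numbered the way the sample config numbers them.

    Instance i (1-based) gets port i443, tunnel-pool (10+i).1.1.0/24 and
    firewall-input rule 010, 011, ...; each pool also becomes an OSPF network.
    """
    if not 1 <= tunnels <= 9:
        # Keeps ports, pools and rule numbers in the ranges the sample uses.
        raise ValueError("this sketch numbers 1 to 9 instances")
    servers, fw_input = [], []
    ospf = ["router ospf", f" router-id {router_id}"]
    for i in range(1, tunnels + 1):
        port = i * 1000 + 443          # 1443, 2443, ... (unique port per instance)
        pool = f"{10 + i}.1.1.0"       # 11.1.1.0, 12.1.1.0, ...
        servers += [
            f"security sslvpn-server {i}",
            f" server address {server_address} {port}",
            " server tap-mode",
            f" tunnel-pool {pool} 255.255.255.0",
            " tap ospf priority 255",  # CMG must win the DR election on every tap
            f" client client {client}",  # copied verbatim from the sample
            " start",
            "!",
        ]
        fw_input.append(f"firewall-input {9 + i:03d} permit all tcp dport {port}")
        ospf.append(f" network {pool}/24 area 0")
    fw_input.append("firewall-input 100 permit all ospf")
    ospf += [f" redistribute connected route-map {route_map}", "!"]
    return "\n".join(servers + fw_input + ["!"] + ospf)


if __name__ == "__main__":
    print(cmg_vpn_stanzas("118.189.158.34", "hsa-test-cmgisp", "10.1.1.254"))
```

To follow the DNS-name recommendation, pass a name (for example a hypothetical vpn.example.com) as server_address instead of the public IP.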

CLI CONFIG ON HSA (VPN Client, please remove any MWAN config)

-------------------------------------------------



mbox# show running-config
!
hostname mbox
!
ip route 0.0.0.0/0 nexthop lo distance 200 remark dummy-default
!
firewall-set 10 mark 1443 output tcp dport 1443 remark "mark tap1"
firewall-set 11 mark 2443 output tcp dport 2443 remark "mark tap2"
!
ip pbr policy 10 fwmark 1443
ip pbr policy 11 fwmark 2443
!
ip pbr route 10 0.0.0.0/0 nexthop interface 3g-lte0 remark "map tap1 to lte0"
ip pbr route 11 0.0.0.0/0 nexthop interface 3g-lte1 remark "map tap2 to lte1"
!
security sslvpn-client 1
 tap ospf priority 0
 track-host 11.1.1.1 1 tap1
 start
!
security sslvpn-client 2
 tap ospf priority 0
 track-host 12.1.1.1 1 tap2
 start
!
ip access-list 30 permit 192.168.2.0 0.0.0.255 remark "match local LAN subnet"
!
route-map ATM-LAN permit 10
 match ip address 30
!
router ospf
 network 11.1.1.0/24 area 0
 network 12.1.1.0/24 area 0
 redistribute connected route-map ATM-LAN
!

NOTE: if you're using physical WAN interfaces instead of LTE interfaces, you also need to add the nexthop IP for the interface, e.g.

!
ip pbr route 11 0.0.0.0/0 nexthop 202.120.13.100 interface eth0
ip pbr route 11 0.0.0.0/0 nexthop 100.20.33.3 interface br-vlan10
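On the HSA, each tunnel needs a matching set of lines: a firewall-set mark on the server port, a PBR policy for that mark, a PBR route to the tunnel's own WAN interface (with a nexthop IP only for physical WAN interfaces), and an sslvpn-client stanza. The Python sketch below generates them with the sample's numbering and grouping. The HsaTunnel structure and function name are hypothetical, the ports must match the CMG instances, the remark text is illustrative, and the "1" argument of track-host is copied from the sample as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HsaTunnel:
    port: int                      # SSLVPN server port of the matching CMG instance
    interface: str                 # WAN interface, e.g. "3g-lte0" or "eth0"
    track_host: str                # CMG tap address to track, e.g. "11.1.1.1"
    nexthop: Optional[str] = None  # needed for physical WAN, omitted for LTE


def hsa_tunnel_config(tunnels: list[HsaTunnel]) -> str:
    """Emit the per-tunnel HSA lines, grouped and numbered like the sample."""
    if len({t.interface for t in tunnels}) != len(tunnels):
        raise ValueError("map each tunnel to a different WAN interface")
    marks, policies, routes, clients = [], [], [], []
    for i, t in enumerate(tunnels, start=1):
        rule = 9 + i  # sample numbering: rules 10, 11, ... for tap1, tap2, ...
        via = (f"nexthop {t.nexthop} interface {t.interface}" if t.nexthop
               else f"nexthop interface {t.interface}")
        marks.append(f'firewall-set {rule} mark {t.port} output tcp dport {t.port} remark "mark tap{i}"')
        policies.append(f"ip pbr policy {rule} fwmark {t.port}")
        routes.append(f'ip pbr route {rule} 0.0.0.0/0 {via} remark "map tap{i} to {t.interface}"')
        clients += [f"security sslvpn-client {i}", " tap ospf priority 0",
                    f" track-host {t.track_host} 1 tap{i}", " start", "!"]
    # Dummy default route so that PBR kicks in (distance above OSPF).
    dummy = ["ip route 0.0.0.0/0 nexthop lo distance 200 remark dummy-default"]
    return "\n".join(dummy + ["!"] + marks + ["!"] + policies + ["!"] + routes + ["!"] + clients)


if __name__ == "__main__":
    print(hsa_tunnel_config([
        HsaTunnel(1443, "3g-lte0", "11.1.1.1"),                          # LTE: interface only
        HsaTunnel(2443, "eth0", "12.1.1.1", nexthop="202.120.13.100"),   # physical WAN: add nexthop
    ]))
```

The duplicate-interface check mirrors the design rule of one VPN tunnel per WAN link; two tunnels pinned to the same link would give no failover.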

TROUBLESHOOTING COMMANDS

-------------------------------------------------

show ip interface
show ip ospf neighbor
show ip ospf interface
show ip route ospf
tcpdump interface tap1
tcpdump interface tap2