Addressing Wi-Fi captive portal roaming
A Wi-Fi captive portal intercepts a new user's web connection requests and redirects them to a portal page for authentication before granting full Internet access.
One of the most common problems with captive portals is Wi-Fi roaming: users get kicked out and have to re-authenticate when they connect to a different AP. For example, a user connects to one AP, logs in to the portal, and is authorized, but is prompted to log in again after associating with another AP in the same network. A web search for "captive portal roaming problem" turns up many reports of this.
There are two common scenarios where you need captive portal roaming:
Users move around the Wi-Fi network, e.g. between network branches, retail outlets, F&B venues, or different areas of airports, shopping malls, and large event venues. As they move between physical locations, their devices must connect to different physical APs.
Large, dense Wi-Fi networks, e.g. hotels, shopping malls, dormitories, and hostels, where many APs are located close to each other. Particularly if Wi-Fi tuning is not optimized or interference is present, a user device can bounce (and therefore roam) between APs even when the user does not physically move.
A captive portal is typically provided by one of two methods, and each has its own way to address roaming:
By AP redirect. Some APs have a built-in feature to redirect users to a captive portal, either hosted externally by a service provider or hosted locally on the Wi-Fi controller. Vendors handle device roaming differently, but almost all of them rely on the device MAC address only, i.e. each hotspot user session is tracked by the device MAC alone.
If a user associates with one AP, gets a DHCP IP address (from a controller, firewall, switch, router, etc.), and logs in to the captive portal through that AP, the device MAC address is cached and propagated to all other APs within the same network, so that the same device gets "seamless" access when it associates with another AP. This approach can consume considerable AP resources and degrade the AP's performance on other tasks, especially in a very crowded network with many MAC addresses to store.
Some vendors use the controller to centrally store all authorized device MACs, and the new AP queries the controller database to grant seamless access. This approach may still trigger a portal pop-up if the query/response is slower than the device's Captive Network Assistant (CNA) detection. Users feel that they were kicked out and must log in again, although the authorization does complete in the background, just more slowly than the CNA pop-up appears. Other vendors use other methods, which are beyond the scope of this document; we focus on our gateway solution here.
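The race between the controller lookup and CNA detection can be pictured with a small model. This is only an illustrative sketch: the Controller class, the roam function, and the millisecond values are hypothetical and do not correspond to any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical central store of authorized device MACs."""
    authorized: set = field(default_factory=set)
    lookup_latency_ms: int = 50  # assumed round-trip time of an AP query

    def login(self, mac: str) -> None:
        # Sessions are tracked by MAC only, as in the AP-based approach.
        self.authorized.add(mac.lower())

    def is_authorized(self, mac: str) -> bool:
        return mac.lower() in self.authorized


def roam(controller: Controller, mac: str, cna_probe_delay_ms: int) -> str:
    """Return what the user experiences after associating with a new AP."""
    if not controller.is_authorized(mac):
        return "portal"  # never logged in, so the portal is expected
    if controller.lookup_latency_ms > cna_probe_delay_ms:
        # The device probed for a captive portal before the AP got its answer.
        return "portal-popup"
    return "seamless"
```

With a 50 ms lookup, a device that probes after 200 ms roams seamlessly, while a device that probes after 20 ms still sees the pop-up even though it is authorized.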
By gateway. Our HotSpot Gateway (HSG) issues a DHCP IP address to each user and redirects users to the local captive portal (or an external one if using cloud HSG) for authentication. By default, the gateway takes a "more secure" approach: after successful authentication, HSG tracks each hotspot user session by the combination of device MAC and IP address (MAC alone is not secure; remember that a MAC can be faked). In principle, a gateway is the simplest way to handle Wi-Fi captive portal roaming, since the gateway only needs to know MAC+IP, regardless of which brand of AP is used (the connection can even be wired/LAN). But there are still a few situations that need care, because the user session (MAC+IP) changes during roaming:
In the two roaming scenarios above, when a device roams or bounces to another AP, some devices (depending on device behavior) send a DHCP DISCOVER message when they associate with the new AP. If you are using the older HSG DHCP command (client-dhcp x.x.x.x y.y.y.y lease nn), the DHCP server issues a new, different IP each time it sees a DISCOVER message, so the device's IP address changes after it associates with the new AP. That is a simpler and more efficient DHCP implementation, but it breaks the hotspot user session: the session's MAC+IP has changed (despite the same MAC), and the user is prompted with the portal. The fix is to use the client-dhcp-server command. It uses the mboxOS native full-scale DHCP server to handle hotspot client DHCP requests and keeps a lease file that tracks each assigned MAC+IP, lease time, etc. Within the lease period and within the same network/VLAN, a client always gets the same IP address regardless of AP or location, and therefore keeps the same hotspot session (MAC+IP). We recommend client-dhcp-server for large networks where roaming is required, and client-dhcp only for simple or small networks.
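The difference between the two DHCP behaviors can be sketched as follows. This is a simplified model, not HSG or mboxOS code: the class names, the in-memory lease table (standing in for the lease file), and the choice to issue a fresh address after expiry are all assumptions made for illustration.

```python
import ipaddress


class SimpleAllocator:
    """Models the older behavior: every DISCOVER gets the next address."""

    def __init__(self, first_ip: str):
        self._base = ipaddress.IPv4Address(first_ip)
        self._next = 0

    def discover(self, mac: str, now: int) -> str:
        ip = str(self._base + self._next)
        self._next += 1
        return ip


class LeaseAllocator:
    """Models a lease-tracking server: same MAC, same IP within the lease."""

    def __init__(self, first_ip: str, lease_seconds: int):
        self._base = ipaddress.IPv4Address(first_ip)
        self._next = 0
        self._lease_seconds = lease_seconds
        self._leases = {}  # mac -> (ip, expiry time in seconds)

    def discover(self, mac: str, now: int) -> str:
        lease = self._leases.get(mac)
        if lease is not None and now < lease[1]:
            ip = lease[0]  # lease still valid: hand back the same address
        else:
            ip = str(self._base + self._next)
            self._next += 1
        self._leases[mac] = (ip, now + self._lease_seconds)
        return ip
```

If the session is the pair (MAC, IP), a second DISCOVER from the same device changes the pair under SimpleAllocator but leaves it intact under LeaseAllocator, which is why the user is not sent back to the portal.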
Roaming across VLANs. Some large Wi-Fi networks are divided into multiple VLANs, and users need to roam across VLANs with seamless re-login. A user in a different VLAN must get an IP address in a different subnet, so the hotspot session MAC+IP necessarily changes. For this scenario we recommend the client-sticky and portal-sticky features, which work much like AP-based roaming. When a user logs in from one AP (e.g. in VLAN 10), the device MAC is stored in a cookie table on HSG; when the user roams to another AP (e.g. in VLAN 20), HSG checks the MAC only and, if it is found already authorized in VLAN 10, authorizes the connection immediately. You can use client-sticky and portal-sticky together (check each link for the detailed differences). Sometimes client-sticky alone is not sufficient, because the device's CNA may fire before HSG finishes checking its cookie table (just as an AP controller lookup may be slower than CNA detection); enabling portal-sticky as well eliminates those anomalies.
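The MAC-only fallback described above can be sketched like this. The Gateway class and its fields are hypothetical and only model the cookie-table lookup; the sketch does not attempt to distinguish client-sticky from portal-sticky, and the IP addresses are made-up examples.

```python
class Gateway:
    """Toy model of a gateway that tracks sessions by (MAC, IP)."""

    def __init__(self, sticky: bool):
        self.sticky = sticky
        self.sessions = set()     # authorized (mac, ip) pairs
        self.sticky_macs = set()  # MACs remembered after login ("cookie table")

    def login(self, mac: str, ip: str) -> None:
        self.sessions.add((mac, ip))
        if self.sticky:
            self.sticky_macs.add(mac)

    def check(self, mac: str, ip: str) -> str:
        if (mac, ip) in self.sessions:
            return "allow"
        if self.sticky and mac in self.sticky_macs:
            # Known MAC with a new IP (e.g. a new VLAN): re-authorize silently.
            self.sessions.add((mac, ip))
            return "allow"
        return "portal"
```

A device that logs in with an address in VLAN 10 and then appears with an address in VLAN 20 is sent to the portal when sticky is off, and is allowed straight through when sticky is on.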
Device MAC keeps changing. In a very small percentage of cases, certain models of certain brands of mobile device randomize the device MAC address. Both gateway-based and AP-based captive portal solutions have problems with this kind of device, and there is almost no way to solve roaming issues for them. However, if it is possible to configure a static IP on these devices, consider using "layer-3 hotspot mode". This mode tracks the hotspot user session by IP address only. As long as the device IP does not change, the user can roam seamlessly across all APs within the network.
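As a rough illustration of why IP-only tracking survives MAC randomization, the sketch below compares the two session keys. Both function names are hypothetical. The first helper relies on the general convention that randomized (private) MAC addresses have the locally administered bit set in the first octet; it is a heuristic for spotting such devices, not something the gateway is documented to do.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def session_key(mac: str, ip: str, layer3: bool) -> tuple:
    """Session identity: IP only in layer-3 mode, otherwise MAC plus IP."""
    return (ip,) if layer3 else (mac.lower(), ip)
```

With a static IP, two different randomized MACs from the same device produce the same key when layer3 is True and different keys when it is False, so only the layer-3 session survives the MAC change.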