Troubleshooting slow Wi-Fi issues
In a large Wi-Fi deployment, it is common to hear users complain about slow Internet connections over Wi-Fi. If users report slowness and the Wi-Fi signal is weak, stop here: focus on Wi-Fi coverage issues or ask the user to move closer to the Wi-Fi zone. If the Wi-Fi signal is strong and yet the Internet connection is very slow (e.g. far slower than what is configured or intended), read on.
While some problems are obvious, intermittent problems are harder to identify and often require experience and relevant expertise to troubleshoot. Because our HotSpot Gateway (HSG) is the gateway to everything (router, captive portal, authentication, firewall, etc.), there is a natural tendency to suspect the HSG first. But if we jump into fire-fighting without knowing the root cause, we will often waste more time and cause more user dissatisfaction.
This doc highlights the common root causes and explains, step by step, how to isolate and fix them. (Note: this doc doesn't cover Wi-Fi coverage issues and assumes the complaining users are getting a decent Wi-Fi signal.)
An end-to-end user experience involves many components, and a problem can arise in any one of them. We need to follow a systematic approach: isolate which part is likely causing the problem, then drill in to identify the exact root cause so we can fix it accordingly.
We will use an on-premise design scenario for this troubleshooting guide. Other design scenarios follow similar principles.
Generally, there are a few components that can be causing the problem:
Internet link congestion or provider backbone congestion
Internet router (or link balancer, if any) is overloaded and unable to handle all the user connections
HotSpot Gateway (HSG) is overloaded and unable to handle all the connections
The wireless infrastructure is not performing well
Please follow the steps below to isolate which component is likely causing the problem.
Step #1: plug a PC into the ISP link for a direct speed test
This step checks whether the ISP link is delivering the committed bandwidth.
Plug the ISP link into your PC (and configure the PC with the correct IP settings) and run a direct speed test. Make sure you get the bandwidth promised by the service provider (e.g. X Mbps).
If the result is far below the SLA, stop here and escalate to the service provider.
(Note: even if you get the committed speed X here, it doesn't mean you will always get X, especially on non-guaranteed links. We have to accept this caveat for now, as it is usually out of our control.)
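The pass/fail decision in this step can be sketched in Python as below. This is only an illustration, not part of any HSG tooling: the function name isp_link_ok and the 80% tolerance are assumptions, so substitute whatever threshold your SLA actually defines.

```python
def isp_link_ok(measured_mbps, committed_mbps, tolerance=0.8):
    """Return True if the direct speed test reached at least
    `tolerance` (assumed 80%) of the committed bandwidth X."""
    if committed_mbps <= 0:
        raise ValueError("committed bandwidth must be positive")
    return measured_mbps >= tolerance * committed_mbps


# Example: committed bandwidth X = 100 Mbps (illustrative figure)
print(isp_link_ok(92.0, 100.0))  # True: close enough to the SLA, continue to step #2
print(isp_link_ok(40.0, 100.0))  # False: escalate to the service provider
```

Because non-guaranteed links vary over time, it may be worth repeating the test at different hours before escalating.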
Step #2: plug a PC into the Internet router for a direct speed test
This step checks for Internet router performance problems. If there is no separate Internet router (e.g. the HSG is also used as the Internet router), proceed to step #3.
Get the current bandwidth utilization of the router's WAN port (e.g. Y Mbps).
If Y is already close to X, the link is congested. Stop here; there is nothing else you can do on your side. Escalate to the customer and advise them to upgrade the link speed.
If Y is far lower than X, connect your PC to a spare LAN port on the router and run a direct speed test. Run this test alongside live user traffic, i.e. don't disconnect the port connecting to the live network, since that would cut off user traffic. Then add your speed test result to Y and make sure the sum is close to X. If it is not, the router is likely overloaded. Upgrade the router until this test passes (i.e. PC speed test result + Y is close to X).
(Note: many routers give good speed test results when there is no user traffic, but with live traffic and a large number of user connections their performance can drop significantly, because tracking each connection, performing address translation, etc. requires a lot of computing resources.)
If users no longer experience slowness, stop here; otherwise move on to the next test.
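The arithmetic in this step (is Y close to X, and if not, does the PC speed test result + Y come close to X) can be sketched as follows. The function name classify_device and the 90% meaning of "close" are assumptions for illustration; the guide itself does not define a numeric threshold.

```python
def classify_device(x_mbps, y_mbps, speedtest_mbps, close=0.9):
    """Classify the result of a speed test taken alongside live traffic.

    x_mbps:         committed link bandwidth (X)
    y_mbps:         current WAN port utilization (Y)
    speedtest_mbps: PC speed test result on a spare LAN port
    close:          fraction of X treated as "close to X" (assumed 0.9)
    """
    if y_mbps >= close * x_mbps:
        # Users already fill the link: nothing to fix on the device.
        return "link congested"
    if y_mbps + speedtest_mbps >= close * x_mbps:
        # The device can still push the remaining bandwidth.
        return "device ok"
    # Spare bandwidth exists but the device cannot deliver it.
    return "device overloaded"


print(classify_device(100, 95, 3))   # link congested
print(classify_device(100, 30, 65))  # device ok
print(classify_device(100, 30, 25))  # device overloaded
```

Y can move while the speed test is running, so reading the utilization just before and just after the test and using the average is a reasonable precaution.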
Step #3: plug a PC into the HSG for a direct speed test
This step checks for problems with the HSG. Note that this part focuses only on performance troubleshooting for the HSG; there is a separate guide on troubleshooting hotspot service availability.
Get the current bandwidth utilization of the HSG's WAN port (e.g. Y Mbps), then follow the tests below.
If Y is already close to X, the link is congested. Stop here; there is nothing else you can do on your side. Escalate to the customer and advise them to upgrade the link speed.
If Y is far lower than X, run the tests below:
Connect your PC to a spare LAN port on the HSG and run a direct speed test. Note: do not enable the hotspot service on this port; just configure it as a router port. Run the test alongside live user traffic, i.e. don't disconnect the port connecting to the live network, since that would cut off user traffic. Then add your speed test result to Y and make sure the sum is close to X. If it is not, the HSG likely has a performance issue. Upgrade the HSG until this test passes (i.e. PC speed test result + Y is close to X). If users no longer experience slowness, stop here; otherwise move on to the next test.
Connect a PC to a switch port assigned to the hotspot VLAN and run a speed test (again, alongside live user traffic). If the result is poor (e.g. similar to the slowness end users see, or far below the configured per-user bandwidth), there is a high chance that the HSG has a problem. Try the options below until this test gives good results:
upgrade the HSG to a bigger model, and/or
check the user VLAN sizing and split it into multiple smaller subnets (i.e. more hotspot instances), and/or
remove other optional features, e.g. stop macc, "client-local-dns off", stop syslog-server
If users no longer experience slowness, stop here; otherwise move on to the next step.
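For the VLAN sizing option above, splitting one large user subnet into several smaller ones can be planned with Python's standard ipaddress module. This is a planning sketch only: the 10.10.0.0/20 range and the /22 target size are made-up example values, and the helper name split_user_subnet is hypothetical. How each resulting subnet is mapped to a hotspot instance depends on your HSG configuration.

```python
import ipaddress


def split_user_subnet(cidr, new_prefix):
    """Split one user subnet into equal smaller subnets.

    Returns a list of (subnet, usable_hosts) tuples, where usable_hosts
    excludes the network and broadcast addresses.
    """
    network = ipaddress.ip_network(cidr)
    return [
        (str(subnet), subnet.num_addresses - 2)
        for subnet in network.subnets(new_prefix=new_prefix)
    ]


# Example values (assumptions): one /20 user VLAN split into four /22 subnets
for subnet, hosts in split_user_subnet("10.10.0.0/20", 22):
    print(subnet, hosts)
# 10.10.0.0/22 1022
# 10.10.4.0/22 1022
# 10.10.8.0/22 1022
# 10.10.12.0/22 1022
```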
Step #4: check Wi-Fi infrastructure setup
This step checks for problems with the Wi-Fi infrastructure.
In a large Wi-Fi deployment, the Wi-Fi infrastructure setup is a very complex subject. We will focus only on common end-user issues:
Users are "unable to get IP address". This can be caused by an insufficient power source for the AP or by too much AP interference (the client device keeps bouncing between APs).
Slow connection. This can be verified by pinging the default gateway (through Wi-Fi only, not passing through the HSG yet). It is usually caused by an AP performance issue (e.g. too many users connected per AP), an insufficient power source, or upstream/PoE switch performance issues.
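The ping check for the slow-connection case can be summarized with a small helper like the one below. It is a sketch under assumptions: the round-trip times are collected by hand from your ping tool (None marks a lost reply), and the 20 ms median and 2% loss limits are illustrative thresholds, not values from this guide or from any vendor.

```python
import statistics


def summarize_ping(rtts_ms, max_median_ms=20.0, max_loss_pct=2.0):
    """Summarize ping results to the default gateway over Wi-Fi.

    rtts_ms: list of round-trip times in ms, with None for each lost reply.
    Thresholds are assumed example values; tune them for your site.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping sample")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    median_ms = statistics.median(replies) if replies else None
    healthy = (
        median_ms is not None
        and median_ms <= max_median_ms
        and loss_pct <= max_loss_pct
    )
    return {"loss_pct": loss_pct, "median_ms": median_ms, "healthy": healthy}


# Two lost replies out of ten: the Wi-Fi leg itself looks suspect.
samples = [3.1, 2.8, None, 150.0, 4.0, 3.5, 2.9, 3.3, None, 3.0]
print(summarize_ping(samples)["healthy"])  # False
```

The median is used instead of the mean so that a single spike does not dominate the result; packet loss is reported separately because it often points to interference or an overloaded AP.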
A few things to check:
Check that there is a good power source, e.g. make sure the UPS has sufficient capacity to power all the related equipment. A UPS with insufficient capacity can degrade hardware performance or even cause equipment to hang.
Check PoE switch performance. The PoE switch is the heart of the Wi-Fi infrastructure and powers all the APs, so make sure it has enough power for every connected AP. Some cheap PoE switches look good on paper but in reality cannot power all the APs; or, even if the APs appear to be up, they don't perform optimally and cause Wi-Fi slowness. If you really have to use these cheap switches, don't populate all the PoE ports, and use more switches instead, e.g. on a 24-port switch, use only up to 16 ports (for 16 APs). Note that the switches may also perform VLAN trunking etc., so they do need good performance.
Check Wi-Fi settings. Use controller-based APs (with either an on-premise or a cloud-based controller) and perform the tuning below to minimize interference between APs:
Radio management: make sure adjacent APs run on different, non-overlapping channels to avoid interference between APs.
AP signal strength (power) management: there is a misconception that a stronger signal is always better, but that is not the case, especially when many APs are packed into a dense area. Tune the AP power to minimize interference. Ask your vendor to help you with this.
Client roaming: enable client roaming between APs, so that when a client hands off between APs the disassociation/association is fast and seamless to the user.
Load balancing: some intelligent APs can balance load within an AP (between radios) and between APs. It is very important to enable this feature in crowded areas with many users, to avoid AP congestion. Ask your vendor to help with this.
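Going back to the PoE switch check above, the "don't fully populate the ports" advice comes down to a power budget calculation, sketched below. All figures are assumptions for illustration: a 370 W total PoE budget, 15.4 W per AP (the per-port maximum a switch supplies under IEEE 802.3af), and a 20% safety margin. Take the real numbers from your switch and AP datasheets; the function name poe_budget_check is hypothetical.

```python
def poe_budget_check(switch_budget_w, ap_draw_w, ap_count, headroom=0.8):
    """Check whether a PoE switch can power `ap_count` APs.

    switch_budget_w: total PoE power budget of the switch, in watts
    ap_draw_w:       worst-case power draw per AP, in watts
    headroom:        fraction of the budget we allow ourselves to use
                     (assumed 0.8, i.e. keep a 20% safety margin)
    """
    usable_w = switch_budget_w * headroom
    required_w = ap_draw_w * ap_count
    return {
        "required_w": required_w,
        "usable_w": usable_w,
        "fits": required_w <= usable_w,
        "max_aps": int(usable_w // ap_draw_w),
    }


# Example: 24-port switch with an assumed 370 W budget, 15.4 W per AP
print(poe_budget_check(370, 15.4, 24)["fits"])     # False: fully populated is too much
print(poe_budget_check(370, 15.4, 16)["fits"])     # True: 16 APs fit with margin
print(poe_budget_check(370, 15.4, 24)["max_aps"])  # 19
```

A switch that passes this check on paper can still underperform, so treat the result as a lower bar rather than proof that the switch is adequate.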