Wireless Issue with Application



We've been having this issue with one of our customers, and we are in the process of upgrading their equipment, but I wanted to get some other opinions.

 

Our customer is using EMR software on a SQL-based server. The users are a mix of wired PCs and wireless laptops. All of a sudden, the wireless laptops began randomly dropping and receiving SQL timeout errors. We have worked with the third-party vendor, who made some tweaks to the SQL database, and we maxed out the RAM on the server, but the issue continues, and only on the wireless laptops.

 

They are currently using 4 APs on 2.4 GHz. My coworkers' first thought was that interference is the issue. We used a spectrum analyzer tool a few days ago, and although we did see some other wireless networks from surrounding buildings, their signal strengths were relatively low. As a test, we installed some 5 GHz APs in one section of the building. We had to purchase 5 GHz adapters for the laptops, as they only supported 2.4 GHz. Unfortunately, the issue continues on the 5 GHz APs. I will note that the 5 GHz APs are older models, but we installed them only for testing.

 

The bizarre thing about this issue is that it is completely random and inconsistent. Some laptops will have constant issues all day and then no issues the next day. Other laptops will have no issues at all and then suddenly start getting errors. There was no buildup, such as the problem starting on one or two machines and then increasing in frequency. On the laptop side, we have updated all drivers, including chipset, wireless NIC, etc., and the issues continued.

 

I'd love to hear any additional insight, and any suggestions for things we have not tried. Thank you!


So this only happens on wireless. Does it happen on all 4 of the APs, or just some of them? What model are the APs? What is the exact SQL/application error they get? When they get the error, do your wireless logs show that the user was disconnected from the network? Does the user lose access to other resources, or just the SQL server?

 

I assume you're running real APs, or are they just Wi-Fi routers being used as APs?


Thank you for the reply Budman.

 

The issue happens across the whole building, on all the APs. We originally had 3 APs spread throughout the building, but we added a 4th in one section because there seemed to be a lot of disconnects in that area; the 4th AP did help clear that up. We are using Cisco APs, but they are standalone units, not on a controller. We are about to quote a controller-based 5 GHz system to upgrade them. The 5 GHz spectrum in the building is clean, while we can pick up a fair number of outside 2.4 GHz networks.

 

One of the SQL Errors received: [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied. (Unspecified Error)

 

We looked at the logs on the APs, opened cases with Cisco, and modified the timeout settings. It looked like the clients were getting disassociated from the APs when they were supposed to roam to the next one. We made every change Cisco recommended, but the issue continues.

 

The connection errors only happen with this application. Internet connectivity is fine, Outlook access to the local Exchange Server is fine, and network shares on the file server are fine. It's only this one application. Note that they are not getting a Windows disconnect error; the error message comes solely from this one application.

 

To me, this makes it seem like the application is the problem, but then I keep coming back to the fact that the hardwired machines have zero issues.

 

To put it mildly, this has been a nightmare for all involved.
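Because the failure shows up only as the application's own connection error, one way to catch it in the act is to run a lightweight probe on an affected laptop that repeatedly opens a TCP connection to the SQL server and logs each failure or slow connect with a timestamp, for later comparison with the AP logs. A minimal sketch, assuming the default SQL Server port 1433 (a named instance or custom port would need a different port) and a hypothetical server name `emr-sql01`:

```python
import socket
import time
from datetime import datetime


def probe_once(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float, str]:
    """Try one TCP connect; return (succeeded, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # Only the TCP handshake is tested; no SQL login is attempted.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals, DNS failures, unreachable
        return False, time.monotonic() - start, str(exc)


def run_probe(host: str, port: int = 1433, interval: float = 5.0,
              count: int = 720, slow_s: float = 1.0) -> None:
    """Probe every `interval` seconds; print only failures and slow connects."""
    for _ in range(count):
        ok, elapsed, err = probe_once(host, port)
        stamp = datetime.now().isoformat(timespec="seconds")
        if not ok:
            print(f"{stamp} FAIL after {elapsed:.2f}s: {err}")
        elif elapsed > slow_s:
            print(f"{stamp} SLOW connect: {elapsed:.2f}s")
        time.sleep(interval)
```

Running `run_probe("emr-sql01")` on a wireless laptop (and, for comparison, on a wired PC) for about an hour gives timestamps that can be lined up against the AP's deauthentication entries. It only shows whether the network path to the server drops, not why the application itself times out.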


OK, let's dive a little more into your network setup. How are your VLANs set up? Do you have multiple VLANs, or is it just a flat network? What are you using to power your APs? Can you verify that the APs are getting enough power? Can you switch power sources (perhaps a different PoE switch, or a power injector)? How do you have your SSIDs set up? Are they all the same? Are there character differences (e.g., one AP has the SSID in uppercase and another in lowercase)? Are the APs on different channels? How big is the area you are trying to cover? Can you do a heat map (http://www.ekahau.com/wifidesign/ekahau-heatmapper)?
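SSIDs are case-sensitive, so "ClinicWiFi", "clinicwifi", and "ClinicWiFi " (trailing space) are three different networks to a client. A minimal sketch of the consistency check asked about above, assuming each AP's configured SSID has been copied into a dictionary by hand; the AP names and SSIDs are hypothetical:

```python
def ssid_mismatches(ssids_by_ap: dict[str, str]) -> dict[str, str]:
    """Return the APs whose SSID differs, byte for byte, from the most common one."""
    if not ssids_by_ap:
        return {}
    values = list(ssids_by_ap.values())
    reference = max(set(values), key=values.count)  # majority SSID
    return {ap: ssid for ap, ssid in ssids_by_ap.items() if ssid != reference}


# Hypothetical example: one AP differs in case, another has a trailing space.
aps = {"ap-front": "ClinicWiFi", "ap-center": "ClinicWiFi",
       "ap-back": "clinicwifi", "ap-corner": "ClinicWiFi "}
print(ssid_mismatches(aps))  # -> {'ap-back': 'clinicwifi', 'ap-corner': 'ClinicWiFi '}
```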


"It looked like the clients were getting disassociated from the APs when they were supposed to roam to the next one."

 

So this happens when the users move around, or when they are in place say at their desk?

 

How do you resolve the SQL Server name? Is the connection setting a hard-coded IP or DNS-based? Is the SQL Server in a cluster? What version of SQL Server?

 

And does this happen after they have a valid connection and are working in the application, and then all of a sudden they get this error? Or is it during the first connection?

 

More on the AP logs: do they show the affected user disassociating and then reassociating to a different AP, or to the same one? What is the RSSI from each AP the user can see at that location? Are they close enough that the client could bounce between them?


We originally set up the APs with 2 VLANs: one for the internal domain network and a second for customer guest Wi-Fi. Once the issue started, we disabled guest Wi-Fi access and set up a separate router attached directly to their cable modem (they have static IPs). The APs are powered by power injectors. As for the power questions, I'm not sure if there is a way to check, but I can look into that. Unfortunately, we don't have any PoE switches, just the injectors. They are using really old switches; perhaps those should be upgraded soon too. The SSID is the same on all 4 APs so that clients roam over as users move to different parts of the building. The APs alternate among channels 1, 6, and 11, the three non-overlapping channels. The building is slightly over 14,000 square feet. We did a heat map on 2.4 GHz and everything looked fine, with very strong signal throughout the building.
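Why 1, 6, and 11 are the usual choice in the 2.4 GHz band: channel centers are only 5 MHz apart (channel n, for n = 1 to 13, is centered at 2407 + 5n MHz), while each transmission occupies roughly 20 to 22 MHz. A quick check, treating each channel as 22 MHz wide (the 802.11b DSSS figure; 20 MHz OFDM channels give the same answer for these pairs):

```python
def center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz channels 1-13 (channel 14 is a special case)."""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 are handled here")
    return 2407 + 5 * channel


def overlaps(ch_a: int, ch_b: int, width_mhz: int = 22) -> bool:
    """Two equal-width channels overlap if their centers are closer than one width."""
    return abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz


for a, b in [(1, 6), (6, 11), (1, 3), (6, 8)]:
    print(a, b, center_mhz(a), center_mhz(b), overlaps(a, b))
# 1/6 and 6/11 are 25 MHz apart (no overlap); 1/3 and 6/8 are only 10 MHz apart.
```

With four APs and only three non-overlapping channels, at least two APs must share a channel, so it is worth checking that those two are as far apart physically as possible.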

 

 
Most of the time, the users are stationary when they input data. It's possible they walk while entering data, but from what I observed onsite, they are stationary. The users are spread across the whole building.

 

As for name resolution, it's configured by the third-party vendor. I know they have tried both the IP and the server hostname. This is a single-SQL-server environment. The version is SQL Server 2008 R2 with the latest service packs, as that is what the vendor requested.

 

The errors occur once they have started working, meaning they are on the network with good connectivity, and then they randomly get these SQL errors while using the application.

 

I can't remember the exact message in the AP logs; I'll see if I can sift through the emails we sent. I believe it said that the client times out and becomes disassociated from that particular AP. From what I notice, signal strength on the clients is full bars. (I hope that's what you meant by that question.)


What is the subnet of the server, of the client, and of the AP? Are they all the same?

 

They are all on the same subnet.
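A quick way to double-check the "same subnet" answer, including that every device uses the same mask, is Python's standard `ipaddress` module. The addresses and /24 prefix below are hypothetical placeholders for the server, an AP, and a wireless laptop:

```python
import ipaddress


def distinct_networks(*interfaces: str) -> set:
    """Map each 'address/prefix' to its network; a single result means one shared subnet."""
    return {ipaddress.ip_interface(i).network for i in interfaces}


# Hypothetical addresses: SQL server, an AP, and a wireless laptop.
nets = distinct_networks("192.168.10.5/24", "192.168.10.21/24", "192.168.10.140/24")
print(nets, len(nets) == 1)
```

A mismatched mask on one device (say /16 instead of /24) shows up as a second network in the result even when the addresses look similar.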


How many clients are on each AP? And you say they're 2.4 GHz; are they G or N? Would other clients be moving data over this connection?

 

Would it be possible to limit a single AP to a couple of test users of this application and see if they get the error? I wonder if the AP just has too many users. I'd love to see the actual logs showing the associations, disassociations, etc.

 

This is where a controller would make troubleshooting much easier. Are you looking to go with Cisco or another vendor? You mentioned you're looking at the controller route. Also, how many wireless users are we talking about in total? Can you see how many are on each AP at a typical time of day, etc.?

 

BTW, here is the problem with saying "latest", especially with Cisco ;)

 

These are all considered "latest" ;)

[screenshot not preserved]

 


It varies, but I did notice that most of the clients were connected to the center AP. It's almost as if they won't roam, even though we set the roaming mode on the wireless NICs to aggressive. At the moment they are spread out pretty evenly. These clients should be the only ones using this wireless network.
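For context on why clients can stick to the center AP: with standalone APs, each laptop's driver decides when to roam, and client roaming logic is commonly described as staying put until the current signal drops below some trigger level and then moving only if another AP is stronger by some margin. The sketch below illustrates that general idea only; the trigger and margin values are hypothetical, and real drivers (including whatever the "aggressive" setting changes) use their own vendor-specific logic.

```python
from dataclasses import dataclass


@dataclass
class RoamPolicy:
    trigger_dbm: int = -70  # hypothetical: only consider roaming below this RSSI
    margin_db: int = 8      # hypothetical: a candidate must beat the current AP by this much


def choose_ap(current: str, rssi_dbm: dict[str, int], policy: RoamPolicy) -> str:
    """Return the AP a client following this simplified policy would use next."""
    cur = rssi_dbm[current]
    if cur >= policy.trigger_dbm:
        return current                      # signal still "good enough": stay put
    best = max(rssi_dbm, key=rssi_dbm.get)  # strongest AP the client currently hears
    if rssi_dbm[best] - cur >= policy.margin_db:
        return best
    return current


readings = {"center": -66, "east": -55}  # hypothetical RSSI readings
print(choose_ap("center", readings, RoamPolicy()))                # stays on "center"
print(choose_ap("center", readings, RoamPolicy(trigger_dbm=-60)))  # moves to "east"
```

Under a policy like this, a client can hear a much stronger AP and still stay where it is, which fits both "full bars" and an uneven 4-13-3 split.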

 

Edit: We are using 802.11n on these APs.

 

It would be very difficult to limit the laptops because they move all over the building. Also, some of the laptops are shared, so one user may be on one side of the building and another user on the opposite side. I was thinking the same thing, but it's hard to catch unless we are constantly watching it.

 

Here is a small sample from one of the APs. I snipped out the MAC addresses.

Feb 12 11:52:33.606: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
Feb 12 12:03:27.483: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
Feb 12 12:07:41.888: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
Feb 12 12:11:22.375: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
Feb 12 12:30:26.677: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
Feb 12 12:56:34.109: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid
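With many of these entries, tallying them per station and reason (before the MAC addresses are snipped) makes it easier to see whether the same laptops are deauthenticated repeatedly. A minimal sketch that matches the `%DOT11-6-DISASSOC` format shown above; the regex is written against these sample lines only and may need adjusting for other message types or firmware versions:

```python
import re
from collections import Counter

DEAUTH_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+): %DOT11-6-DISASSOC: Interface (?P<radio>\S+), "
    r"Deauthenticating Station (?P<station>.+?) Reason: (?P<reason>.+)$"
)


def tally_deauths(lines):
    """Count deauthentication events per (radio, station, reason)."""
    counts = Counter()
    for line in lines:
        m = DEAUTH_RE.match(line.strip())
        if m:  # lines in any other format are ignored
            counts[(m["radio"], m["station"], m["reason"])] += 1
    return counts


sample = [
    "Feb 12 11:52:33.606: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid",
    "Feb 12 12:03:27.483: %DOT11-6-DISASSOC: Interface Dot11Radio1, Deauthenticating Station *Snip* Reason: Previous authentication no longer valid",
]
for (radio, station, reason), n in tally_deauths(sample).items():
    print(n, radio, station, reason)
```

With the MACs snipped, every entry collapses into one key; on the unedited log, each laptop's MAC gets its own count.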

We are going to be quoting a Cisco system; we haven't settled on the controller model and APs yet. The total number of users is around 20 at most at one time.

 

At times I've noticed even spreads, e.g. 7-7-6 (the 4th AP is in a corner, with maybe 1 or 2 clients on it), and at other times it looks like 4-13-3.

 

The firmware is the 15.3.3-JAB on all three.


If you want to go cheaper, I have had very good results with Ubiquiti. We currently have ~30 UAP-PROs deployed across our campus.

 

 

Regardless, a controller will help manage how many users are attached to an AP at any given time, and it can also do this based on signal strength. Your current setup relies solely on each client to decide which AP it connects to, based only on which one has the stronger signal.
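To illustrate the kind of decision a controller can make that standalone APs leave to the client: admit a client to the least-loaded AP among those it hears above a minimum signal level. This is a simplified, hypothetical policy, not how any particular Cisco or Ubiquiti controller implements load balancing; the AP names and the -67 dBm floor are assumptions.

```python
def assign_ap(heard_dbm: dict[str, int], load: dict[str, int], floor_dbm: int = -67) -> str:
    """Pick the least-loaded AP the client hears at or above floor_dbm.

    Ties on load go to the stronger signal. If no AP clears the floor,
    fall back to the strongest AP, which is what a client would pick on its own.
    """
    usable = [ap for ap, rssi in heard_dbm.items() if rssi >= floor_dbm]
    if not usable:
        return max(heard_dbm, key=heard_dbm.get)
    return min(usable, key=lambda ap: (load.get(ap, 0), -heard_dbm[ap]))


# Hypothetical: the 4-13-3 split mentioned above, new client near the center AP.
load = {"west": 4, "center": 13, "east": 3}
heard = {"west": -64, "center": -50, "east": -72}
print(assign_ap(heard, load))  # -> west
```

Left to itself, this client would join "center" (strongest signal) and make the imbalance worse; the policy instead sends it to "west", which is still above the floor.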

 

What are you using for authentication? RADIUS, WPA?

 

 

Take a look at this; not sure if it will help you or not:

https://supportforums.cisco.com/discussion/10073326/aironet-1240ag-error-previous-authentication-no-longer-valid-help

 

 

This may help you debug:

http://www.cisco.com/c/en/us/support/docs/wireless/aironet-1200-series/50843-debug-authen.html

 

 

You can also try switching firmware to see if there is any change.


Are those all different machines, or the same machine being deauthenticated? And you don't see them reauthenticate right away? Are you sure the client didn't just drop out of connectivity?

Those were each different machines. The log did not show them reauthenticating. I can't be sure whether they simply dropped out of connectivity.

 

Here are some more from the logs of one of the APs that may have more info. These are from 2 different ones


Apr 30 18:35:49.004	Warning	Packet to client *snip* reached max retries, removing the client
Apr 30 18:35:49.001	Information	Interface Dot11Radio0, Deauthenticating Station *snip* Reason: Previous authentication no longer valid
Apr 30 18:35:49.000	Warning	Packet to client *snip* reached max retries, removing the client
Apr 30 18:32:13.594	Information	Interface Dot11Radio0, Deauthenticating Station *snip* Reason: Sending station has left the BSS
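To test whether the application errors actually line up with these removals, one approach is to note the time of each SQL error on the laptops and check whether a "max retries"/deauthentication event appears shortly before it. A minimal sketch; the 60-second window is an arbitrary assumption, the SQL error times and the year are hypothetical (the AP log omits the year), and the AP and laptop clocks must be in sync (e.g., via NTP) for the comparison to mean anything:

```python
from datetime import datetime, timedelta


def errors_after_drop(drops, errors, window=timedelta(seconds=60)):
    """Return (error_time, drop_time) pairs where an error follows a drop within `window`."""
    drops = sorted(drops)
    pairs = []
    for err in sorted(errors):
        preceding = [d for d in drops if timedelta(0) <= err - d <= window]
        if preceding:
            pairs.append((err, preceding[-1]))  # most recent drop before the error
    return pairs


fmt = "%Y-%m-%d %H:%M:%S"
ap_drops = [datetime.strptime("2015-04-30 18:35:49", fmt)]  # from the log above, year assumed
sql_errors = [datetime.strptime(t, fmt) for t in ("2015-04-30 18:36:10", "2015-04-30 19:02:00")]
for err, drop in errors_after_drop(ap_drops, sql_errors):
    print(f"SQL error at {err:%H:%M:%S} came {(err - drop).seconds}s after a drop at {drop:%H:%M:%S}")
```

If most errors pair up with a preceding removal, the wireless drop is the likely trigger; errors with no nearby drop would point back toward the application or server.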

They are using WPA for authentication.

 

Thanks for the links. I checked the first one about Aironet extensions, but they are already disabled. I'll check out the debug link.


Yeah, that shows the client is gone: it didn't answer, so it was deauthenticated. You could increase the number of retries, etc.

 

interface Dot11Radio0
 rts threshold 512
 rts retries 128

RTS retries are set to 64. I'll update it to 128 later.

RTS Threshold is set to 2347.


Yeah, 2347 is the default; lowering it can help with disconnects.
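Why the default of 2347 effectively means "off": an RTS/CTS exchange precedes a frame only when the frame is larger than the RTS threshold, and with the classic 802.11 maximum frame size of 2346 bytes nothing qualifies. Lowering the threshold to 512, as suggested in the config above, makes every frame over 512 bytes reserve the medium first, at the cost of extra overhead per frame. A simplified illustration with an assumed mix of frame sizes (802.11n aggregated frames can exceed 2346 bytes, so treat this as an approximation):

```python
CLASSIC_MAX_FRAME = 2346  # classic (pre-802.11n) maximum frame size in bytes


def uses_rts(frame_bytes: int, rts_threshold: int) -> bool:
    """Simplified rule: RTS/CTS precedes frames strictly larger than the threshold."""
    return frame_bytes > rts_threshold


def share_protected(frame_sizes, rts_threshold: int) -> float:
    """Fraction of the given frames that would be preceded by RTS/CTS."""
    sizes = list(frame_sizes)
    return sum(uses_rts(s, rts_threshold) for s in sizes) / len(sizes) if sizes else 0.0


sample_sizes = [64, 300, 600, 1500, CLASSIC_MAX_FRAME]  # assumed frame-size mix
for threshold in (2347, 512):
    print(threshold, share_protected(sample_sizes, threshold))
# 2347 -> 0.0 (no RTS at all); 512 -> 0.6 (the 600, 1500 and 2346-byte frames)
```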
