Weird Problem, Stumped, DNS starts to fail after idle



Have you tried a different port on the switch? Have you tried resetting or restarting the switch? Do you have any event logs? Have you tried a different switch? If it's only the one computer, it could be anything the computer is connected to, or it could be the network card.

 

I will move the port on the switch right now. I can't reset the switch at this time, but I can schedule it for late tonight.

 

The cable/line/switch port is the same that the previous laptop was using. I highly doubt it is the switch/ARP in the switch, since I've disabled the LAN and put it on WIFI only and the same problem occurs. So I think that rules out the port/cable/switch/network card.

 

In the event logs, the ONLY entries showing errors or anything related to the issue are the NETLOGON and GroupPolicy ones in the first post. No other entries show symptoms or errors before it happens.


"RefID: 'LOCL'"

 

Why do you not have this syncing with something? I doubt it is a stratum 1 time server ;)

 

This would be unlikely to be your issue. But from that error it seems to be a time-related issue. Are you sure you're in exactly the same time zone and your AM/PM is not off?

 

You ran that command on your dc1. Run it from your workstation, so it shows the offset from the machine to the DCs.

 

Does remote desktop work if you use IP of the machine vs name?


What switch is handling the route tables between LANs? That is where you should be looking. It may not be the switch that the computer is directly attached to, although it may not be a bad idea to reboot that switch.


I actually ran that command from the workstation :o

 

Yes am/pm is not off, it is correct.

 

I will have to wait again until it starts the issues to test the IP vs machine name for RDP.


Well, the main gateway router is a Sonicwall TZ-215. It has been restarted since the problem started occurring, its uptime is 7 days, and the ARP table on it looks OK. There are some other Netgear fiber switches between 2 buildings leading back to the Sonicwall, however. I can reboot those late tonight.


Any public stratum 1 or 2 in your region - you can get a list here

 

http://support.ntp.org/bin/view/Servers/

 

Or you can just use the pool; maybe you will even hit my server if you use the pool. I have mine joined to the pool.

 

Done, I elected to set it up again with - /manualpeerlist:0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org

 

Client w32tm /monitor now shows:

 

SRV-DC01.<FQDN removed> *** PDC ***[172.16.50.6:123]:
    ICMP: 0ms delay
    NTP: +0.0000000s offset from SRV-DC01.<FQDN removed>
        RefID: ponderosa.piney0.com [66.228.35.252]
        Stratum: 3
SRV-DC02.<FQDN removed>[172.16.50.5:123]:
    ICMP: 1ms delay
    NTP: -7.6556870s offset from SRV-DC01.<FQDN removed>
        RefID: SRV-DC01.<FQDN removed> [172.16.50.6]
        Stratum: 2

Try 0.pool.ntp.org

 

I have it in the manual list - /manualpeerlist:0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org

 

Doubt this has anything to do with the issue since the time was correct to the PDC, but might as well fix it while I am working on stuff.

 

If anyone has any additional ideas, let me know.


So, my bad on the command. Run it from the workstation, pointing directly at your DC, with /stripchart

 

example

 

C:\>w32tm /stripchart /computer:10.206.163.27
Tracking 10.206.163.27 [10.206.163.27:123].
The current time is 5/21/2015 8:53:53 AM.
08:53:53 d:+00.1300875s o:+10.7614916s  [                           |                          @]
08:53:55 d:+00.1238433s o:+10.6851501s  [                           |                          @]
08:53:57 d:+00.1238433s o:+10.5963305s  [                           |                          @]
08:53:59 d:+00.1238433s o:+10.5075109s  [                           |                          @]
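
If you want to log this over a long idle period rather than eyeball it, the sample lines can be parsed. This is only a rough sketch, assuming the line format shown in the example above (time, `d:` delay, `o:` offset); the function names `parse_stripchart` and `worst_offset` are made up for illustration and are not part of w32tm.

```python
import re

# Matches one sample line of `w32tm /stripchart` output, e.g.
# "08:53:53 d:+00.1300875s o:+10.7614916s  [ ... ]"
# Assumption: the format is exactly as in the example posted above.
SAMPLE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"d:(?P<delay>[+-]\d+\.\d+)s\s+"
    r"o:(?P<offset>[+-]\d+\.\d+)s"
)


def parse_stripchart(text):
    """Return a list of (time, delay_seconds, offset_seconds) tuples.

    Header lines such as "Tracking ..." do not match and are skipped.
    """
    samples = []
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if m:
            samples.append((m["time"], float(m["delay"]), float(m["offset"])))
    return samples


def worst_offset(samples):
    """Largest absolute offset seen, in seconds (0.0 if there are no samples)."""
    return max((abs(offset) for _, _, offset in samples), default=0.0)
```

Feeding it the captured output from the workstation while the problem is happening, and again while it is not, gives two numbers that are easy to compare.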

 

you can also do a /tz

 

C:\>w32tm /tz
Time zone: Current:TIME_ZONE_ID_DAYLIGHT Bias: 360min (UTC=LocalTime+Bias)
  [Standard Name:"Central Standard Time" Bias:0min Date:(M:11 D:1 DoW:0)]
  [Daylight Name:"Central Daylight Time" Bias:-60min Date:(M:3 D:2 DoW:0)]

 

do that on both the dc and your workstation.

 

you can also add /packetinfo to the stripchart command and see the actual packets

 

[NTP Packet]
Leap Indicator: 0(no warning)
Version Number: 3
Mode: 4 (Server)
Stratum: 6 (secondary reference - syncd by (S)NTP)
Poll Interval: 0 (unspecified)
Precision: -6 (15.625ms per tick)
Root Delay: 0x0000.2330 (+00.1374512s)
Root Dispersion: 0x0000.42B1 (0.2605133s)
ReferenceId: 0x0A97A407 (source IP:  10.151.164.7)
Reference Timestamp: 0xD9085F0787E6FDD0 (151350 13:41:27.5308684s - 5/21/2015 8:41:27 AM)
Originate Timestamp: 0xD908624D073BDED3 (151350 13:55:25.0282573s - 5/21/2015 8:55:25 AM)
Receive Timestamp: 0xD9086254469F4FBB (151350 13:55:32.2758684s - 5/21/2015 8:55:32 AM)
Transmit Timestamp: 0xD9086254469F4FBB (151350 13:55:32.2758684s - 5/21/2015 8:55:32 AM)
[non-NTP Packet]
Destination Timestamp: Roundtrip Delay: 122802600 (+00.1228026s)
Local Clock Offset: 7186209800 (+07.1862098s)
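
For anyone wondering where the Local Clock Offset figure comes from, it can be reproduced from the timestamps in the packet with the standard NTP offset/delay formulas. This is a sketch only: the dump does not print the Destination Timestamp, so the code reconstructs it from the reported Roundtrip Delay, which is an assumption that holds here because the Receive and Transmit timestamps are identical. Times are written as seconds past 13:55:00 to keep the numbers readable.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (Originate Timestamp)
    t2: server receive time  (Receive Timestamp)
    t3: server transmit time (Transmit Timestamp)
    t4: client receive time  (Destination Timestamp)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Values from the packet dump above, as seconds past 13:55:00.
t1 = 25.0282573  # Originate Timestamp
t2 = 32.2758684  # Receive Timestamp
t3 = 32.2758684  # Transmit Timestamp
# Assumption: Destination Timestamp is not printed in the dump, so it is
# reconstructed as t1 + the reported Roundtrip Delay (valid since t2 == t3).
t4 = t1 + 0.1228026

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(f"offset={offset:+.7f}s delay={delay:+.7f}s")
```

The computed offset should agree with the Local Clock Offset line (+07.1862098s) to within rounding. The delay matches by construction, since it was used to rebuild t4.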

 

Yeah, they suck here: stratum 6. I have brought it up multiple times, but I don't have control over the AD stuff here, or it would be much better ;)


Everything looks good now with w32tm /stripchart: the offsets are 00.01s or less with the DC, and the /tz output matches between the client in question and the DCs.


So run this same test when you have the problem again, and check whether RDP works by IP vs name to the box you're trying to RDP to.

 

Just to validate that the tz is not changing and your times are correct, etc.

 

Do you plan on cleaning up the ISATAP, Teredo, and 6to4 stuff? I doubt you're actually using any IPv6, since all I saw was link-local, so I would probably just disable it across the board until you're ready to actually set it up.

 

For sure remove all the BS adapters for Teredo, ISATAP, and 6to4. Simple netsh commands can get rid of those while still leaving IPv6 intact on your actual interface. Those are all methods of getting to IPv6 when you don't have IPv6, which is really not going to work unless you allow it on your firewalls, etc. So it just clutters up your ipconfig /all ;)

 

see how clean ipconfig can look - even when using ipv6 ;)

post-14624-0-60735400-1432223397.png

 


I restarted all of our fiber switches last night (6 locations, aerial fiber) and this morning the system is still stable. However, its uptime is only 15 hours, so we'll see how it goes over the next day or two. I will be checking it throughout today and over the weekend.


And they are back...

 

System has been up and running idle for 17 hours 47 minutes

 

errors-1.jpg

 

The event log errors read from the bottom up; both GroupPolicy entries are the same. There are no entries around those times in the Application event log.

 

"The processing of Group Policy failed. Windows could not obtain the name of a domain controller. This could be caused by a name resolution failure"

and

"The computer was not able to set up a secure session with domain controller in domain <DOMAINAME> due to the following. The RPC server is unavailable"

 

Currently

 

I can remote desktop into it fine.

 

I can ping DNS names from command prompt and they resolve to IP and respond.

 

I cannot browse any sites in IE, even via IP, local or outside our network. It goes directly to a "This page could not be displayed"

 

Here's something new: when I do ipconfig /all, EVERYTHING is identical to the previous ipconfig /all, but suddenly "DHCP Enabled" is "Yes" and it is grabbing a "172.16.50.51" address from the DHCP scope. If I look at the network adapter properties and the TCP/IPv4 properties, it is 100% set to Obtain an IP address automatically. Nothing under Alternate Config.

 

Checking the servers I do see a DHCP lease for .51 pointing to this system and I do see a DNS entry pointing to .51 for this system, no duplicates for the previous static IP.

 

Rebooting the system now to see if it reverts back to the Static IP of .144

 

Reboot did not take it back to static IP. I am setting it back manually. I know for a fact I did not change the IP and no one has used the system.


Reset ip and Winsock

At an administrator command prompt

Netsh int ip reset c:\windows\temp\reset.txt

Netsh winsock reset

Restart computer


Are these the only two Dell Latitude 5550 laptops in the environment? What are you using as a DHCP server?


I actually already tried the Winsock and netsh IP resets early on.


Yes, only two. Both bought at the same time, just a month or so ago. One unit totally fine, the other has had this issue since day one.

 

DHCP server is a Server 2008 R2 DC


Try different drivers on the NIC. Try imaging it with the other laptop's image.

 

Tried the latest ones from Dell, and Intel.

 

Also, it did it with the Dell factory load, and I already did a fresh load of the OS.


Seen this once before, nearly the exact same symptoms. Check for an application leaking handles using Task Manager. I ended up scheduling a regular kill/restart of the (unfortunately necessary) third party service causing it. It was opening TCP sockets regularly to talk to a cloud service, not cleaning up after itself properly afterwards.

 

BTW if this is the same as what I experienced, you should still be able to RDP to the machine, but only using the IP address. Connecting via name causes a Kerberos check, but since the machine can't open new outbound TCP connections (existing ones stay fine) the Kerberos check fails and you get the time sync error.


Ok I am going to investigate this, I referenced this page on what to look for - http://blogs.technet.com/b/yongrhee/archive/2011/12/19/how-to-troubleshoot-a-handle-leak.aspx

 

This might be a good lead. Right now it's only showing around 42,000 handles. System is using 12k, and the next largest one is wcct.exe (Unified Wireless Application), which is located in C:\Program Files (x86)\Dell\Dell Unified Suite

 

Comparing the other Dell 5550, these numbers are about the same. However, the issue is not happening right now. I'm going to monitor closely.

Edit: Check this out.... recent article might be related - http://en.community.dell.com/support-forums/network-internet-wireless/f/3324/t/19620219

 

"Dell Unified Wireless Suite" is installed on both Laptops. This may be the best lead yet. Will update once I kill wcct.exe while the issue is happening and check the handle count first.

 

Edit: On second thought I am just going to uninstall the Dell Unified Wireless Suite and see if the issue stops. Will let you all know.

 

Edit2: When I uninstalled it and wcct.exe was killed, the System process also dropped from 12k to 2,000 handles. The entire system at idle is now running at 21k handles.

 

Edit3: Uninstalling it also removes the Wi-Fi driver, so I have to extract the Wi-Fi package and manually install the driver without the "Dell Unified Wireless Suite" app installing.


Nice find!! No free TCP sockets would explain why DNS (UDP) and ICMP still worked.
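
That pattern (name resolution fine, new TCP connections failing) is something you could probe for directly the next time it happens. A minimal sketch, assuming Python is available on the affected machine; `can_resolve` and `can_open_tcp` are hypothetical helper names, and the host and port in the usage comment are placeholders, not values from this thread.

```python
import socket


def can_resolve(name):
    """True if the name resolves to at least one address.

    Note: the answer may come from the local resolver cache.
    """
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def can_open_tcp(host, port, timeout=3.0):
    """True if a brand-new outbound TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, resolution failures and
        # local socket errors alike.
        return False


# Example usage (placeholder host/port, substitute your own DC):
#   print("resolves:", can_resolve("dc01.example.local"))
#   print("new TCP connection:", can_open_tcp("dc01.example.local", 389))
```

A failed TCP probe on its own does not prove socket exhaustion, since a closed port or a firewall looks the same from here. It is only suggestive when the same probe succeeds right after a reboot and fails once the machine has been idle for a while.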


Yeah got lucky on this find, I believe it is fixed, but I am going to just let it sit over the long weekend and if there are no problems by Tuesday I'll mark it as solved for random_n. Budman and the others do get A+ for effort however. :)


And seems you got some other stuff fixed up in the process, now pointing to legit timesource ;)  And did you clean up that ipv6 mess you had?
