The unsolvable computer problem...?



Hello all.

 

I'm posting here about a computer problem that, after 2 weeks, I still have not been able to fully resolve. Before I go any further, let me state that I am a 17-year veteran of running my own computer repair business, have been A+ certified for 12+ years, and have 20+ years of hands-on experience.  I have already spent 2 weeks doing diagnostics on this computer.  I say that so that we are all on the same page. :)

 

First off, the computer I have is my main office desktop.  It's an HP Pavilion p6654y Magnesium Gray Edition.  The specs are the following:

- Athlon II X4 630 CPU

- 16GB DDR3-1333 RAM (4x4GB)

- 1TB Boot Drive

- 3 TB GPT secondary drive.

- DVD-RW master / DVD-RW slave

- ATI Radeon R9 270A

- 802.11g Wifi
- Realtek PCIe gigabit NIC (onboard 10/100 Realtek also)

- Antec 550w PSU

- Win7 64Bit OS

 

I received this system from another business closing shop and decided to use it to upgrade my main office PC in Aug 2014.  I previously had a Radeon HD7850 in it, and had it working perfectly for about 8 months.  I would even LAN on it after hours when some guys came over to play CoD4.  Several weeks ago we had a LAN session and the system immediately started crashing with random bluescreen errors.  Each one was different, leading me to suspect the motherboard, PSU, or memory.

 

MemTest showed no errors, and my PSU passed just fine when connected to a PSU tester, so I narrowed it down to the motherboard.  By this point the crashing had spread from playing CoD to everyday use: the PC would never come out of sleep mode when I stepped away.  It also reached the point where, after a hard power shutdown, it would not POST when powered back on and would just stay on a black screen.

 

Finally I received a replacement motherboard and swapped it out.  At the same time I backed up my data and figured I would reinstall Windows to clean up the system in general.  After reformatting my hard drive and installing Win7 fresh, the system would crash in the middle of reinstalling drivers, namely the Radeon Catalyst drivers for the HD7850.  So to troubleshoot further, I reinstalled Windows again, this time installing each driver one at a time: reboot, wait, install the next, reboot, wait, etc.  Again, all drivers installed fine until I got to the video card drivers.  Halfway through the Catalyst driver install, the system would either 1.) bluescreen and reboot or 2.) completely power off and then hang at the POST screen again, not even booting from the hard drive.  At one point I noticed a few red artifact lines on the screen, so I figured it probably was the video card.

 

Well, I ordered a replacement R9 270A, which arrived the other day.  Before the card arrived, to continue troubleshooting, I ran memtest on the memory and it would periodically report memory failures.  Then subsequent tests would be just fine.  I swapped the power supply for another 500W unit, but the issues persisted.  I swapped out boot hard drives and reinstalled Windows but hit crashes in the middle of the OS install again.  I even swapped out memory, but the same issues persisted.

 

Finally the R9 arrived yesterday, so I stuck it in and proceeded to reinstall Win7 fresh.  I crossed my fingers when installing the Catalyst drivers.... and SUCCESS!  They installed without crashing or shutting down the PC!

However... the system is still randomly rebooting and giving random bluescreen crashes on that fresh install of Windows.  Sometimes it would loop 5-6 times: random reboot, boot back to the desktop, reboot again after a few minutes.  So at this point I swapped the new motherboard back out for the old one in case the new one I got was faulty.  Again, the SAME reboot/crash/bluescreen issue persists!

So as I type this I am memtesting memory on the old motherboard.  When testing all 4 sticks at once using Memtest86+ 5.01 (using all 4 cores), the test hangs at Test #7 at 41%, every time.  However, when I test all 4 sticks using only 1 core, everything passes.  And when I test each stick individually with 4 cores, it also passes every time.
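For anyone following along, the three memtest results above can be laid out as a small table and checked for a single common factor. This is only an illustrative Python sketch: the function name `factors_only_in_failures` and the text labels are made up for the example, and it assumes nothing beyond the three results described in this post.

```python
# Observations from the post: (sticks installed, cores used) -> did memtest pass?
results = {
    ("all four sticks", 4): False,      # hangs at Test #7, 41%
    ("all four sticks", 1): True,
    ("one stick at a time", 4): True,
}


def factors_only_in_failures(results):
    """Return the factor values that show up in failing runs but never in a passing run."""
    seen_passing = set()
    seen_failing = set()
    for (sticks, cores), passed in results.items():
        values = {("sticks", sticks), ("cores", cores)}
        if passed:
            seen_passing.update(values)
        else:
            seen_failing.update(values)
    return sorted(seen_failing - seen_passing)


suspects = factors_only_in_failures(results)
if suspects:
    print("Only seen in failing runs:", suspects)
else:
    print("No single factor explains the hang; it only shows up with the combination.")
```

With these three results, neither "all four sticks" nor "4 cores" is bad on its own, which is why the hang does not point cleanly at one stick or one core.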

 

I'm at an utter loss at this point to explain what is going on.  I've got a new motherboard and a new video card, have swapped out memory, PSUs, and hard drives, and have disconnected all optical drives and add-in cards to eliminate variables, but the problems still persist.  I know some p6654y systems had known motherboard faults with an HP recall, but mine is too far out of date to have a recall fix.  Maybe I have TWO failing motherboards???

 

Thoughts??  Ideas??  Suggestions?? 


Hello,

 

What utilities have you run to stress test the CPU?

 

Regards,

 

Aryeh Goretsky


Disable auto restart and tell us what the bluescreen(s) say. Or use BlueScreenView, if the system stays up long enough, to see what that says.

 

If it's crashing with more than 1 core, it may be a buggy CPU. CPUs can have faulty cores and cache.

 

This will make Windows crash. I would check and see if there's an updated BIOS for it as well. I would also try another (supported) CPU.


goretsky - I'm using Memtest86+ for the memory testing and tried Prime95 within Windows but that lasts a few minutes before it crashes.

 

John.D - I have disabled auto restart and checked the bluescreen 0x00... stop code, but it's different on each crash.  In my experience, when such random bluescreen errors appear frequently, it's usually faulty memory, or a faulty PSU corrupting data in RAM, causing totally unexpected crashes with no consistently repeating error.  I do agree that CPUs can be buggy; unfortunately I don't have any other AM3 CPUs here in the office at the moment.  I was looking on eBay for a Phenom II or Athlon II to get one quickly to test.  Also, HP does not have any BIOS newer than the one installed on both of these motherboards.  Thanks for the input!
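As a rough way to put numbers on "different on each crash", the stop codes can be tallied and matched against their symbolic names. The sketch below is Python and only an illustration: the name table covers a handful of common bug check codes, the helper names `summarize` and `looks_random` are made up, and the sample list at the bottom is hypothetical, not the codes from this machine.

```python
from collections import Counter

# A few well-known Windows bug check codes and their symbolic names.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x24: "NTFS_FILE_SYSTEM",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}


def summarize(codes):
    """Return one line per distinct stop code, most frequent first."""
    lines = []
    for code, count in Counter(codes).most_common():
        name = BUGCHECK_NAMES.get(code, "not in this table")
        lines.append(f"0x{code:08X} {name}: {count}")
    return lines


def looks_random(codes):
    """True if no single stop code accounts for more than half of the crashes."""
    if not codes:
        return False
    most_frequent = max(Counter(codes).values())
    return most_frequent * 2 <= len(codes)


# Hypothetical sample, for illustration only.
sample = [0x1E, 0x50, 0x0A, 0x1A]
print("\n".join(summarize(sample)))
print("No dominant stop code:", looks_random(sample))
```

A spread with no dominant code fits the memory/PSU/CPU theory better than a single driver; one code repeating over and over would point more toward a specific driver or device.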


I forgot to mention, if anyone else has any ideas on software tools/utilities to use for testing hardware, I'm open to ideas.  I use the Hiren's Bootdisk for the software tools found on there for stress testing components.


Can you zip/upload the dmp files you've got to somewhere like OneDrive?

 

And post the link?

 

I don't need the memory.dmp files (they're too big), just the smaller dmp files.
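If it saves some clicking, the small dumps can be zipped with a few lines of Python. This is just a sketch: the function name `zip_minidumps` is made up, and it assumes the dumps are in the default `C:\Windows\Minidump` folder (the big MEMORY.DMP normally sits in `C:\Windows` itself, so it is not picked up). Reading that folder may need an elevated prompt.

```python
import zipfile
from pathlib import Path


def zip_minidumps(dump_dir, zip_path):
    """Zip every *.dmp file found directly in dump_dir; return the file names added."""
    dump_dir = Path(dump_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in sorted(dump_dir.glob("*.dmp")):
            # Store only the file name, not the full Windows path.
            archive.write(dump, arcname=dump.name)
            added.append(dump.name)
    return added


# Example call on the affected PC (assumed default location):
# zip_minidumps(r"C:\Windows\Minidump", "minidumps.zip")
```

The resulting zip is what would get uploaded and linked.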


Could be a faulty processor. At this point it is one of the two common parts. While you are on the right track with testing memory, you have not tested or replaced the processor.


I forgot to mention, if anyone else has any ideas on software tools/utilities to use for testing hardware, I'm open to ideas.  I use the Hiren's Bootdisk for the software tools found on there for stress testing components.

 

Why not just try a next-gen game?


John.D - I don't have those dmp files at the moment.  Part of my testing was reformatting the hard drive and reinstalling Windows to see at what point it crashed and whether there was a pattern.  I'll be reinstalling again today, and if it crashes, I'll upload them.

 

sc302 - I agree it could be a faulty processor.  Though without another AM3 processor to test against, I can't confirm 100% that it is the processor at the moment.


Well you have just about replaced/eliminated everything else...

 

POST issues would be memory, processor, video, main board, or PSU.

 

You have eliminated main board and PSU, and it sounds like you have also eliminated memory.  What's left to confirm?


I can't tell for sure whether you have switched out the memory or just moved the sticks around a bit.

 

Never trust a memory-test program. Run with only one stick installed for a while and see if it helps.


Once you eliminate the impossible, whatever remains, however improbable, must be the truth....

 

It's your CPU.

 

-Forjo


Update:

 

Last night and early this morning I was rerunning Memtest86+ (5.01).  Each stick was tested by itself in both single-core and multi-core mode.  One stick had intermittent failures, so it was removed.  I replaced it with another memory stick and tested them as a group again to see if it would hang at the same spot as before.  It did.  I then used the original MemTest86 4.3.3 (not the plus version) and let the tests run for 10 hours.  This morning there was no hang in that program, and the memory passed all tests over 3 loops just fine.  So I'm chalking the Memtest86+ hang up to a possible bug in the program.

 

This time I booted into Windows (freshly installed copy) and started Prime95 in high CPU/low memory mode to focus on the CPU.  It ran for a few hours without a single hiccup, issue, or crash.  Since I don't have any other AM3 CPUs to drop in at the moment, I want confirmation that the CPU is the problem before I spend any more money on the system.  After a couple of hours of Prime95, I exited the program, and within 5 seconds, lo and behold, I got another bluescreen!  The stop error was 0x0000001E.

 

Stop code 0x0000001E is KMODE_EXCEPTION_NOT_HANDLED, a generic kernel-mode exception rather than anything specific to the file system.  It could be because Prime95 had an issue, or it could still be related to hardware.  When the system rebooted, it tried booting off my Win7 DVD, but then just halted on the black screen asking to press any key to boot from the disc.  So I had to do a hard power shutdown to turn it fully off.


CPU - You've changed everything else 

I'd guess it was the memory controller on the CPU if I had to take a bet on which particular part.....


Welcome to RMA hell. I used to have a Thai and Mandarin language book while I was on hold waiting for tech support. Months and months of calls, bills, frustration. All so a contractor can make some cashola at my pain and suffering.

I'm just a monkey for a real bad company.

This is definitely a bait post, but what the heck.

 

Once you know your CPU and your MEM are good, that leaves temperature and addons. Obviously if you ran a working memtest for several hours and it said all is good, then chances are very high that all is good. That will leave temperature as an issue if you remove every extra component. As a young jedi, I learned your master can give you damaged goods for RMA replacements just to Frak with your mind. It is a sick sick world.

 

You are lucky you can afford stuff like that, I would be using it to bitmine and occasionally escape from the depressing mortuary this life is.

 

Try running a Linux live distro; just pick one of the top 5. The live boot CDs all come with a "stable" memory tester. After memory testing, actually boot the live CD, drop to a shell, become root, then install the tools necessary to monitor temperature. Then install the CPU/memory tester and let it go. If it core dumps you will have something on the screen. If you see it dumping after it reaches a certain temperature in Celsius, then you know it is probably a heat problem.
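Something like the following Python sketch could do the temperature logging while the stress test runs in another terminal. It assumes the live kernel exposes sensors under `/sys/class/thermal` (values there are in millidegrees Celsius); on some boards nothing useful shows up there and you would need the lm-sensors package instead. The function names `read_temps` and `log_temps` are made up for the example.

```python
import time
from pathlib import Path


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or reporting garbage
        temps[zone.name] = millidegrees / 1000.0
    return temps


def log_temps(interval_s=5, samples=720, base="/sys/class/thermal"):
    """Print a timestamped reading every interval_s seconds, flushing each line."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_temps(base), flush=True)
        time.sleep(interval_s)


# Example: log_temps() gives roughly an hour of readings at 5-second intervals.
```

Each line is flushed as it is printed, so if the box locks up, the last line on the screen is roughly the temperature it died at.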

But seriously, if you pass the memory test just fine then it has to be a temperature or voltage rail issue, which could be on the MB itself or any one of the components such as the CPU or video card. If you can't even pass the memtest and a live CD stress test then you definitely have a hardware issue.

 

http://www.linuxmint.com/download.php

 

Good luck and please don't wipe your pc out and blame it on me, just trying to help.


Update:

 

I'll start off by saying that after running Prime95 under full CPU and memory load for 5+ hours today, I did not get any crashes.  That's leading me to think the CPU really isn't the cause of my problems.  Also, HWMonitor shows 16°C at idle and 49°C under max Prime95 load.  That was consistent for the entire 5+ hours, so I figured if the CPU was going to fail, it'd at least fail there.

 

Since narrowing down that one DDR3 stick last night/this morning that had some weird faults, things seem to have stabilized.  I haven't spent much time trying to fix the system today (other office duties instead) but no random reboots or crashes.  Yet.  The odd thing is that I had previously removed the offending memory stick along with its matched twin (G.Skill) and put two Kingston sticks in their place.  Both of those tested fine with Memtest86+, but the same reboot, bluescreen, and crash issues persisted even then.  That's why I was reluctant to blame the memory after fixing the artifact issues by replacing the video card.

 

I know a lot of these system boards had faults that presented symptoms very similar to what I was having.  But that aside, I'm beginning the full Win7 reinstall process and will try a bit of Call of Duty after work sometime tomorrow to see if that will cause it to bluescreen again.


Yep, a faulty memory stick would do this too. A friend of mine had some new 8GB sticks that failed the same day he put them in, and we spent the whole next day troubleshooting. Every symptom pointed to everything BUT the memory sticks (including Windows error codes).

 

Glad it is more or less sorted out now.


This topic is now closed to further replies.