Probable hardware failure

Started by Mr. Jess, Jul 01 2007 03:20 AM

#1
Mr. Jess

Posted 01 July 2007 - 03:20 AM

Member
12 posts

I've had my rig running in its current configuration for well over three years with the exact same hardware and no real issues. I decided I wanted to give Linux a try and gave Gentoo a whirl. For the most part I had success, except for some really bizarre and intermittent errors that seemed beyond my *nix savvy. I chalked this up to being a Linux noob, and eventually my frustrations compounded and I gave up and went back to Windows.

Windows XP Pro seemed to work relatively well for a while, until I noticed it would occasionally crash and flash a blue error screen long enough for me to see an error code but not long enough to copy it. I've had RAM die on some previous rigs and suspected this was the case, as the symptoms appeared similar. So I ran my rig through memtest86, and it threw up about a billion errors by the 5th test. I then tested each stick of RAM separately, and surprisingly each individual stick passed all the tests not just once but 9 times (thorough, to say the least) with no errors. So I popped all three sticks back in, ran memtest86 again, and it passed all the tests once before I stopped the test.

I booted up as normal, thinking this was rather bizarre, and unsurprisingly Windows crashed with the blue error screen within two hours. I rebooted the machine and Windows crashed almost immediately after logging on. The computer would do this several times in a row, but if I let it be for two or three hours, I would be able to work in Windows for a couple of hours before it crashed. I thought this might be a heat issue, so I took the side off my case to get better airflow and dusted out the insides, but with no real results. Windows would work for about two hours, crash, then continuously crash unless I let it be.

Just to verify that it wasn't a heat issue, I loaded up aida32 and watched my CPU and chipset temps while putting my computer under an extreme load. My CPU hit about 53°C and the chipset about 39°C at full load before it crashed. I ruled out temperature, as those aren't unreasonable temps. I did this a few more times and noticed that my system seemed to crash regularly when free RAM dropped to about 20%, which is cool because now I can consistently create the conditions that make my system crash. Before, it just appeared intermittent. I gave memtest86 another try with all sticks of RAM in, and it passed the first set of tests but started throwing errors near test 6 in the second set. To be ultra safe I ran all my spyware and antivirus programs, all coming up clean as one would expect.

That gets us up to date with my troubleshooting so far. I have a few theories. I'm guessing it is a mobo issue, specifically one of the RAM slots. I tested each individual stick of RAM in the same slot, the middle one. Since each stick passed 9 sets of tests in the center slot, I don't think it is the RAM, and I don't think it is the CPU either, as the cache is tested too. It will take a while to test each individual slot, as I want to be relatively thorough with each one, but I was looking for any advice or any free diagnostic tools and programs you'd recommend to help me troubleshoot this issue. If you think I'm heading in the wrong direction let me know, so I can get this system stable again and give Linux another try, as I suspect my Linux issues may actually have been hardware issues coinciding with my OS change.

Here are the important and relevant system specs:
AMD Athlon 1.1 GHz, T-bird core
Asus A7V motherboard
3 x 512 MB PC133 SDRAM
Running XP Pro, updated as of last Tuesday

Thanks in advance for any suggestions or diagnostic tools you can offer.

#2
anzenketh

Posted 01 July 2007 - 05:15 AM

BSOD Warrior/Computer Surgeon

Technician
2,854 posts

Hello Mr. Jess.

I would have to say your deduction is probably correct that it is the mobo, since the memory tested fine one stick at a time in a single slot and only gave errors once you put all the sticks in together.

#3
The Skeptic

Posted 01 July 2007 - 08:23 AM

Trusted Tech

Technician
4,075 posts

I think it's somewhat too early to judge that the motherboard is faulty.

You tested each RAM stick and all of them passed. For the test you used only one slot, which means that all the modules showed good in that particular slot. This leaves two more possibilities on the RAM side:

1: One or more of the other slots is faulty.
2: There is some compatibility problem between the RAM sticks that only shows up when all three are installed.

I would do the following:

1: Make sure that all the slots are clean and so are the module contacts. Use a soft brush to clean them.

2: Clear the BIOS settings: disconnect the power cable from the back of the computer. Open the side cover and carefully take out the CMOS battery (it looks like a silvery button). Keep it out for about 20 minutes. Reinstall in the reverse order and reboot. You will probably get a checksum error. If you do, enter the BIOS by pressing Del or F2 repeatedly when starting the computer until the BIOS setup screens show up. Set the time and date, save the new values and let the computer boot. Some computers use other keys to enter the BIOS. You can find the correct key by looking at the screen right after pressing the power button: look for the key you have to press to enter setup.

3: If the crash still occurs I would run memtest on the other two slots, one at a time. Since you tested the modules separately and found each of them to work well, it's enough to use only one module. Run it in one slot and then in the other.

4: With Aida or, preferably, with Everest, the successor of Aida, check and report the voltages of the power supply unit.

5: Run a check of the disk. Go to My computer > right click the OS local drive > properties > tools > error checking > check now. Check the two boxes and Start. Reboot if requested to do so and let the process run.

6: To make the details of the crash screen readable, go to Control Panel > System > Advanced > Startup and Recovery. Uncheck Automatically Restart. This will leave the blue screen in place so that you can report the details to us.
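The slot-versus-stick isolation behind step 3 can be written down as a small bookkeeping sketch. This is only an illustration: the `summarize` helper, the stick labels and the results table are hypothetical, not results reported in this thread. The idea is that a stick or slot becomes suspect when every test it took part in failed.

```python
from collections import defaultdict

def summarize(results):
    """results maps (stick, slot) -> True if memtest passed, False if it reported errors."""
    by_stick = defaultdict(list)
    by_slot = defaultdict(list)
    for (stick, slot), passed in results.items():
        by_stick[stick].append(passed)
        by_slot[slot].append(passed)
    # A stick or slot is suspect when every test it took part in failed.
    bad_sticks = sorted(s for s, r in by_stick.items() if not any(r))
    bad_slots = sorted(s for s, r in by_slot.items() if not any(r))
    return bad_sticks, bad_slots

# Hypothetical results: all three sticks pass in slot 2,
# and one known-good stick (A) is then moved through slots 1 and 3.
results = {
    ("A", 2): True, ("B", 2): True, ("C", 2): True,
    ("A", 1): False, ("A", 3): True,
}
bad_sticks, bad_slots = summarize(results)
print("suspect sticks:", bad_sticks)
print("suspect slots:", bad_slots)
```

With this made-up data no stick is flagged and only slot 1 is, which is why one known-good module is enough to test the remaining slots.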

#4
Mr. Jess

Posted 01 July 2007 - 11:47 AM

Topic Starter
Member
12 posts

Thanks for the tips, Skeptic. After testing two of the slots I ran into no errors, and this morning I woke up and tested the last slot, which did produce errors in memtest. I'm going to run through your steps, in hopes that a CMOS reset maybe resolves the issues with that one slot. I'm not a big fan of giving up a slot if I don't have to, but at least my computer will be able to function with two slots running.

I'll update you shortly on the results.

#5
Mr. Jess

Posted 01 July 2007 - 02:10 PM

Topic Starter
Member
12 posts

True Story:

So I unplugged my PSU, popped out my CMOS battery and waited 20 minutes. I then popped it back in, plugged the PSU back in and attempted to boot my machine. It starts up with a beep, fans whirling, then just powers off. This is pretty odd, since this isn't the first time I've messed with the battery, but it is the first time it has not booted. The really weird part is that it seems to power up just fine with a single short beep, then it shuts off. I checked my manual to be sure, and it does say that a single beep is a successful POST, but I don't get any display on my monitor before it shuts down. I'm inclined to think that it isn't POSTing, despite what my manual says, since in the past it usually took much longer to POST.

I'm running through some resources for my mobo now to see what I can find out about that beep code. Let me know if you have any suggestions.

#6
The Skeptic

Posted 01 July 2007 - 11:50 PM

Trusted Tech

Technician
4,075 posts

Check that everything sits firmly in place, including ram sticks, video card, power cable and the main connection of the power supply on the board. If needed, unplug every item and replug it after cleaning the contacts.

Run the computer with only one memory stick.

Unplug the computer, take out the cmos battery again and use a voltmeter to check it. It should show above 3 volts. Your computer must be quite old and it is possible that the battery is weak. Make sure that you install the battery in the correct polarity. The Plus sign should face upward (away from the motherboard).

#7
Mr. Jess

Posted 03 July 2007 - 07:02 PM

Topic Starter
Member
12 posts

Okay I got my computer back up and posting again. I guess the issue was a mobo specific problem, where any error or reset that puts the bios back to default will make the mobo check the rpm on the cpu fan. If it doesn't get any info, it assumes no fan is connected and powers off. So I bought another fan and got my computer back up and posting. Here are the results of the rest of your previous post.

1- There wasn't much dust or debris since I'm already a bit of a clean freak with my comps, but I cleaned the contacts with some alcohol and lightly brushed out a bit of dust in and around the sockets. All cleaned up.

2- checksum error, and I set up the date and time in the bios and adjusted the clock to 133 instead of the bios safety default of 100. Other than those two adjustments everything looked relatively in order for normal functioning with its defaults so I left them.

3- The crash did occur again, so I ran the RAM through a bunch of memtests. First, I should point out my findings from before resetting the BIOS, as I suspect an overlooked BIOS setting may be causing some of the errors. Pre-reset I found that one slot, the one closest to the CPU, was kaput, while the other two ran just fine. Post-reset I found that neither of the two slots that worked previously was working without error now. I tried all three sticks individually in each of the two slots, and each stick consistently produced errors on test 7, while some also produced errors on test 6. For all three of these sticks to have passed memtest individually pre-reset, and then to all produce errors consistently on test 7, indicated to me that a BIOS/mobo setting might be funky. I dropped the memory clock to 100 MHz and still got the same results. I then upped the voltage a notch, with no success. I bumped the clock back up to 133 MHz to test it with the slightly higher voltage, and predictably there were still errors. I tested 100 MHz and 133 MHz, both at normal and upped voltage, while changing a system performance setting down from its default of optimal to normal (according to my manual this setting tweaks certain memory functions), with the same errors occurring. I didn't go so far as to mess with the timings, as I began to suspect that my initial intuition was wrong, or at least that my approach was wrong. So for the time being I moved on to the next step.

4- Voltages as reported by aida32 (consistent with my mobo's own reporting too):
CPU core: 1.81 V
CPU aux: 0.10 V
+3.3 V: 3.63 V
+5 V: 4.95 V
+12 V: 11.92 V
-12 V: -11.89 V
-5 V: -4.98 V

5- system won't run long enough to get an error check of the disk complete. Same thing in safe mode.

6- An MCE (machine check exception) error message with the usual jazz about checking that all new hardware is installed correctly, turning off any memory caching or shadowing in the BIOS, etc.
0x0000009c (0x00000002, 0x8054D570, 0xF6002000, 0x0000017A)
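For anyone keeping notes on crashes like this, a stop line in that shape can be split into its code and four parameters with a few lines of Python. This is a minimal sketch: `parse_stop_line` and the small `STOP_CODES` table are made up for illustration and cover only a handful of well-known codes. Stop code 0x9C is documented by Microsoft as MACHINE_CHECK_EXCEPTION; the meaning of its four parameters depends on the processor, so the sketch does not try to interpret them.

```python
import re

# Small lookup of well-known stop codes; only a handful are listed here.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x0000009C: "MACHINE_CHECK_EXCEPTION",
}

def parse_stop_line(line):
    """Split a 'code (p1, p2, p3, p4)' stop line into integers (hypothetical helper)."""
    values = [int(h, 16) for h in re.findall(r"0x[0-9A-Fa-f]+", line)]
    if len(values) != 5:
        raise ValueError("expected a stop code followed by four parameters")
    code, params = values[0], values[1:]
    return code, STOP_CODES.get(code, "unknown"), params

code, name, params = parse_stop_line(
    "0x0000009c (0x00000002, 0x8054D570, 0xF6002000, 0x0000017A)"
)
print(hex(code), name, [hex(p) for p in params])
```

A machine check is raised by the CPU itself when it detects a hardware error, which fits with the memtest failures rather than pointing at a driver or software problem.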

I did notice my CPU was running hotter, at around 60°C, but I imagine that had more to do with ambient temperatures, since my testing today was at peak ambients of about 36°C, and no lower than 33°C, in my un-air-conditioned top-floor apartment. I don't figure this should matter too much as this CPU is rated for 90°C tops, but figured I might mention it.
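Going back to the voltages listed under item 4, a quick way to sanity-check them is to compare each reading with its nominal value. The sketch below assumes the usual ATX tolerances of +/-5% on the positive rails and +/-10% on the negative rails; those figures and the `out_of_tolerance` helper are assumptions brought in for illustration, not something stated in this thread. Sensor readings from software on older boards can also be off, so a multimeter on the PSU connector would be the way to confirm anything this flags.

```python
# Assumed tolerances: +/-5% on positive rails, +/-10% on negative rails (ATX figures).
TOLERANCE = {3.3: 0.05, 5.0: 0.05, 12.0: 0.05, -12.0: 0.10, -5.0: 0.10}

# Readings copied from the aida32 list under item 4 (nominal -> measured).
readings = {3.3: 3.63, 5.0: 4.95, 12.0: 11.92, -12.0: -11.89, -5.0: -4.98}

def out_of_tolerance(readings, tolerance):
    """Return (nominal, measured, percent deviation) for rails outside tolerance."""
    flagged = []
    for nominal, measured in readings.items():
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance[nominal]:
            flagged.append((nominal, measured, round(deviation * 100, 1)))
    return flagged

flagged = out_of_tolerance(readings, TOLERANCE)
for nominal, measured, pct in flagged:
    print(f"{nominal:+} V rail reads {measured} V ({pct:+}% from nominal)")
```

Under those assumed limits the only rail that stands out is +3.3 V, which reads about 10% high; the others are all within about 1% of nominal.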

Any suggestions? I'm at a loss, despite dinking with a few more bios settings for my ram.

#8
The Skeptic

Posted 03 July 2007 - 11:29 PM

Trusted Tech

Technician
4,075 posts

From a distance there isn't much more that I can suggest. It seems that either the BIOS itself or the memory controller is faulty. In either case the motherboard is usually replaced.

You can try to update the BIOS by downloading it from the motherboard manufacturer's site. If you try this, follow the instructions carefully. Doing the process incorrectly can destroy the motherboard.

There is also a possibility that the problem starts with the cpu. To test this option you have to take it off and install it on a motherboard that is known to be good.

#9
Mr. Jess

Posted 04 July 2007 - 06:55 AM

Topic Starter
Member
12 posts

Thanks for the help, Skeptic. Yeah, my BIOS is already up to date unfortunately. I tried troubleshooting with a few more BIOS memory settings, but I'm thinking the mobo is indeed trashed. Well, I got nearly 7 years out of the board. That is a lot of practical use, and it could be worse.

Thanks again.