More Errors

March 14, 2010

Being new here, I wanted to make sure I was posting in the right section. I didn't see an "oh sh*t, what's going wrong" forum...and I did see lots of threads with the word "errors" in them...so I figured I was in the right place

New to unRAID...duh...DIY server build; starting to notice some issues. First, transfer speeds are, at times, inexplicably slow. I started looking into network, NIC issues, etc. In the meantime, I had a couple of big file transfers (single files, in the 35-40G range) just flat-out fail...and now I'm starting to notice some strange behavior from my XBMC connected to the unRAID shares, and even odd issues with just bringing up the Web UI (like, it'll take a few seconds to display).

Well...here are the screen-shots; I suspect I have some drive, and perhaps other issues...but being new to unRAID, I'm not really sure the best course of action from here.

Any thoughts for this newbie?

MP

March 14, 2010

You should attach a complete syslog (check FAQs on how to obtain it) for the folks who are expert at this sort of thing. Having said that, you seem to be having a similar problem to the one I am (although by no means should you take this as fact... I just noticed similar error messages in both our logs). My thread may be of some interest to you, in that case.

More sort of a weeping, "Why me?! Why me!?" kind of interest, but... y'know...

March 14, 2010

Author

OK, syslog attached.

MP

syslog-2010-03-14.txt

March 14, 2010

The disk assigned as disk5 ( /dev/sdc ) is showing 162 "read" errors on the unRAID screen. They show in the syslog as read_stripe errors.

Each error looks like this:

Mar 12 10:20:43 unSERVER kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Mar 12 10:20:43 unSERVER kernel: ata4.00: irq_stat 0x40000001

Mar 12 10:20:43 unSERVER kernel: ata4.00: failed command: READ DMA EXT

Mar 12 10:20:43 unSERVER kernel: ata4.00: cmd 25/00:00:17:45:ed/00:04:2e:00:00/e0 tag 0 dma 524288 in

Mar 12 10:20:43 unSERVER kernel: res 51/40:00:9a:47:ed/00:00:2e:00:00/00 Emask 0x9 (media error)

Mar 12 10:20:43 unSERVER kernel: ata4.00: status: { DRDY ERR }

Mar 12 10:20:43 unSERVER kernel: ata4.00: error: { UNC }

and each results in a read_stripe error like this:

Mar 12 10:14:09 unSERVER kernel: md: disk5 read error

Mar 12 10:14:09 unSERVER kernel: handle_stripe read error: 786239736/3, count: 1
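To see at a glance which disk the md driver is complaining about, the read errors can be tallied per disk. This is only a rough sketch: the function name is made up for illustration, and it assumes the syslog lines look like the "md: diskN read error" lines quoted above.

```python
import re
from collections import Counter

# Matches the md-driver lines quoted above, e.g. "kernel: md: disk5 read error".
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error")


def count_md_read_errors(syslog_lines):
    """Return a Counter mapping disk name -> number of md read errors."""
    counts = Counter()
    for line in syslog_lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines copied from the syslog excerpts in this thread.
sample = [
    "Mar 12 10:14:09 unSERVER kernel: md: disk5 read error",
    "Mar 12 10:14:09 unSERVER kernel: handle_stripe read error: 786239736/3, count: 1",
    "Mar 12 09:08:55 unSERVER kernel: md: disk5 read error",
]
errors_per_disk = count_md_read_errors(sample)
print(errors_per_disk)
```

To run it against a saved syslog, pass the open file object instead of the sample list; iterating over a file yields its lines.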

You should get a SMART report on that drive (easy, since you have unMENU installed and it is a button on the disk-management page).

You will see many "re-allocated sectors" or sectors pending "re-allocation".

When these errors occur, unRAID fails to read from the disk and instead reconstructs what should be there by reading from parity and the other data disks. It then writes the contents of what would have been read back to the failing disk. That should let the SMART firmware on the disk re-allocate the sector elsewhere.
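As a toy illustration of that reconstruction, here is a sketch that assumes plain single-parity XOR across the data disks. The byte values and the `xor_blocks` helper are invented for the example; this is not unRAID's actual driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy 4-byte "sectors" on three data disks (made-up values).
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])
disk5 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Parity is the XOR of every data disk at the same position.
parity = xor_blocks([disk1, disk2, disk5])

# If disk5 cannot be read, XOR-ing parity with the remaining data disks
# yields the bytes that disk5 should have returned.
rebuilt_disk5 = xor_blocks([parity, disk1, disk2])
print(rebuilt_disk5 == disk5)
```

The same arithmetic shows why a second unreadable disk is a problem: with two unknowns at the same position, a single parity value is not enough to recover either one.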

Large modern disks have a reserve of several thousand sectors. If you do not see the re-allocated sector count increasing over the next months, you might be OK... Usually, once you see a number of bad sectors it indicates the surface of the disk is damaged and the damage causes dust particles which in turn cause more damage over time. Keep a close eye on the "errors" count on the unRAID management screen and the errors in the syslog.

Experience says that you should keep a very close eye on the disk. If you see continued errors it is slowly failing and a prime candidate for replacement. Best advice... think about replacing the drive, or, at least have a spare disk on hand to use as the replacement (start shopping the sales... before the disk runs out of sectors in its pool of spares)
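One way to keep that close eye on the disk is to save a SMART attribute report now and compare it with a later one. The sketch below assumes the usual `smartctl -A` table layout, with the raw value in the tenth column. The sample reports are invented for illustration (only the 952 figure echoes a number reported in this thread), and the function names are hypothetical.

```python
# Attributes worth watching on a disk that has started throwing media errors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(report_text):
    """Map attribute name -> raw value string from a `smartctl -A` style table."""
    attributes = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attributes[fields[1]] = fields[9]
    return attributes


def sector_counts(report_text):
    """Return the watched counters as integers."""
    attrs = parse_smart_attributes(report_text)
    return {name: int(attrs[name]) for name in WATCHED
            if name in attrs and attrs[name].isdigit()}


def growth(before, after):
    """Return only the counters that increased, with the size of the increase."""
    return {name: after[name] - before[name]
            for name in after if name in before and after[name] > before[name]}


# Invented sample reports, for illustration only.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   140    Pre-fail  Always   -           952
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           3
"""
AFTER = BEFORE.replace("           952", "           960")

changes = growth(sector_counts(BEFORE), sector_counts(AFTER))
print(changes)
```

An empty result over several weeks would suggest the disk has stabilized; a steadily growing count is the "slowly failing" case described above.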

Joe L.

March 14, 2010

Author

Joe, here's what my untrained eye sees: Disk 2, which is a 1T WD, has 952 reallocate sector errors thus far, and is reporting "overall bad" in the SMART view of my unMENU MyMain. Disk 5, the newly added 500G Seagate, doesn't even appear in the SMART view for some reason (how can I fix that so I can get a report?).

Is it possible the errors you're seeing are for Disk 2, not 5...or do you think there's trouble brewing with both?

At the very least, I agree I likely need at least 1 replacement; do you think that's all there is to it for now? What's the best way to "test" the drive's integrity...now that it's already part of the array?

Thanks,

MP

March 14, 2010

To my eye, the errors in your current syslog are for ata4.00, which maps to sd 4:0:0:0 ([sdc]), and they result in read failures on disk5.

The other errors reported in the SMART view on the myMain screen occurred some time in the past.

You need to get the actual smart status reports for all of your drives... As I said, you can get them from the Disk-management page in unMENU

Mar 12 09:08:49 unSERVER kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Mar 12 09:08:49 unSERVER kernel: ata4.00: irq_stat 0x40000001

Mar 12 09:08:49 unSERVER kernel: ata4.00: failed command: READ DMA EXT

Mar 12 09:08:49 unSERVER kernel: ata4.00: cmd 25/00:00:77:7f:4a/00:04:2c:00:00/e0 tag 0 dma 524288 in

Mar 12 09:08:49 unSERVER kernel: res 51/40:00:da:82:4a/00:00:2c:00:00/00 Emask 0x9 (media error)

Mar 12 09:08:49 unSERVER kernel: ata4.00: status: { DRDY ERR }

Mar 12 09:08:49 unSERVER kernel: ata4.00: error: { UNC }

Mar 12 09:08:49 unSERVER kernel: ata4.00: configured for UDMA/133

Mar 12 09:08:49 unSERVER kernel: ata4: EH complete

Mar 12 09:08:54 unSERVER kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Mar 12 09:08:54 unSERVER kernel: ata4.00: irq_stat 0x40000001

Mar 12 09:08:54 unSERVER kernel: ata4.00: failed command: READ DMA EXT

Mar 12 09:08:54 unSERVER kernel: ata4.00: cmd 25/00:00:77:7f:4a/00:04:2c:00:00/e0 tag 0 dma 524288 in

Mar 12 09:08:54 unSERVER kernel: res 51/40:00:da:82:4a/00:00:2c:00:00/00 Emask 0x9 (media error)

Mar 12 09:08:54 unSERVER kernel: ata4.00: status: { DRDY ERR }

Mar 12 09:08:54 unSERVER kernel: ata4.00: error: { UNC }

Mar 12 09:08:55 unSERVER kernel: ata4.00: configured for UDMA/133

Mar 12 09:08:55 unSERVER kernel: sd 4:0:0:0: [sdc] Unhandled sense code

Mar 12 09:08:55 unSERVER kernel: sd 4:0:0:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08

Mar 12 09:08:55 unSERVER kernel: sd 4:0:0:0: [sdc] Sense Key : 0x3 [current] [descriptor]

Mar 12 09:08:55 unSERVER kernel: Descriptor sense data with sense descriptors (in hex):

Mar 12 09:08:55 unSERVER kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00

Mar 12 09:08:55 unSERVER kernel: 2c 4a 82 da

Mar 12 09:08:55 unSERVER kernel: sd 4:0:0:0: [sdc] ASC=0x11 ASCQ=0x4

Mar 12 09:08:55 unSERVER kernel: sd 4:0:0:0: [sdc] CDB: cdb[0]=0x28: 28 00 2c 4a 7f 77 00 04 00 00

Mar 12 09:08:55 unSERVER kernel: end_request: I/O error, dev sdc, sector 743080666

Mar 12 09:08:55 unSERVER kernel: ata4: EH complete

Mar 12 09:08:55 unSERVER kernel: md: disk5 read error

Mar 12 09:08:55 unSERVER kernel: handle_stripe read error: 743080600/3, count: 1

Mar 12 09:08:55 unSERVER kernel: md: disk5 read error

Mar 12 09:08:55 unSERVER kernel: handle_stripe read error: 743080608/3, count: 1

Mar 12 09:08:55 unSERVER kernel: md: disk5 read error
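The failing sector can be read straight out of the "res" line. The sketch below assumes the twelve hex fields are, in order: status, error, sector count, LBA low/mid/high, then the "previous" (high-order) feature, sector count and LBA low/mid/high bytes, then device. That ordering is consistent with the log above, since it decodes to the same sector number the end_request line reports. The function name is made up for illustration.

```python
import re

# Twelve two-digit hex fields separated by "/" or ":" after the word "res".
RES_LINE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def failed_lba(log_line):
    """Return the 48-bit LBA reported in an ATA 'res' line, or None."""
    match = RES_LINE.search(log_line)
    if match is None:
        return None
    fields = re.split(r"[/:]", match.group(1))
    # Assumed field order: status, error, nsect, lbal, lbam, lbah,
    # hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device
    lbal, lbam, lbah = fields[3:6]
    hob_lbal, hob_lbam, hob_lbah = fields[8:11]
    # Most significant byte first.
    return int(hob_lbah + hob_lbam + hob_lbal + lbah + lbam + lbal, 16)


line = ("Mar 12 09:08:49 unSERVER kernel: res "
        "51/40:00:da:82:4a/00:00:2c:00:00/00 Emask 0x9 (media error)")
print(failed_lba(line))  # 743080666, matching "end_request: I/O error, dev sdc, sector 743080666"
```

Both retries in the excerpt report the same LBA, so the drive is repeatedly failing on one unreadable sector rather than on scattered ones.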

Now... disk2 may have already failed... the only way to know for sure is to look at the full SMART report, not the summary on the page written by bjp999.

Go to the disk-management page and get full SMART reports on ALL your drives.

In any case, since I have no idea how old your version of unMENU and myMain might be, the newest versions I have for unMENU 1.3 are here:

http://lime-technology.com/forum/index.php?topic=5568.0 You'll need the newer versions if you are on unRAID 4.5, and besides, lots of tiny improvements and bug-fixes have been made over the years.

The drive missing from your display might have already been fixed by bjp999 (probably related to the un-assigned slots in your array). If not, send him a PM. I too have an un-assigned slot in the middle of my array and the following slot does appear, so it might be fixed.

Joe L.

March 14, 2010

Joe L., the Smart View tells you most everything you need to know. unMisterPink, if you click on the link in the far righthand column called "sm", it will run a full SMART report. (The "sy" link will show you the relevant lines from the syslog about this disk - also pretty useful!)

The "Overall" column indicates that the smartctl program is reporting that the disk has failed. The reallocated sectors and pending sectors confirm that diagnosis.

This disk is DEFINITELY bad. Replace ASAP!

March 14, 2010

But all that has nothing to do with the errors in your syslog. The errors in the syslog are for disk5. It too is failing with media errors and unreadable sectors. It just has not yet gotten to the point where the SMART system marked it as failed (as it already has for Disk2)

You basically need to replace disk2 immediately, and disk5 if it keeps showing additional errors... (and the odds are good it will not magically repair itself).

March 14, 2010

When you go to replace the disks you will be faced with an issue. You MUST replace one at a time, and use the others to rebuild the one being replaced. Normally this is easy, since all the other disks are readable. However you have potentially 2 disks that cannot be read without errors.

If drive2 is stable (all of its bad sectors have been re-allocated and no more sectors are un-readable, even though SMART marks it as failed for having reached its failure threshold) but errors are still occurring on drive5, then I'd replace drive5 first, reading from disk2 to rebuild it, and then replace disk2.

If drive5 is stable, and errors are still occurring on drive 2 (I did not see any in the syslog you posted, but don't know if you were reading from it) then I'd replace drive2 first.

In BOTH cases you are relying on the other failing disk to reconstruct the one you are replacing. That is risky. I'd make copies of ANY critical files on both disks somewhere else BEFORE you do anything.

Also, get SMART reports on all the drives (Use the Disk-Management page since disk5 does not appear on the myMain page) before doing anything and post them as attachments here.

Only then can anybody recommend any process to replace the disks.

Also, you asked how to best test the disks now that they are part of the array. The best method is to perform a parity CHECK. In fact, that is one step you should perform after making copies of any critical files on a PC other than the unRAID array and before you replace the first of the two drives that are failing.

By performing the parity check all the sectors of all your disks will be read. If disk2 has any sectors it has not re-allocated that are still failing they'll show up in the syslog. If it does not show any read errors during the parity check, and disk5 does, then disk2 might be the more reliable drive and you'd want to replace disk5 first. For that reason, performing a manual parity check, after you make copies of any critical files elsewhere, is the best thing you can do right now.

Post a new syslog once the parity check is complete, but before you replace any drive. And also post the SMART reports for your drives, both BEFORE and after the parity check, in case new bad sectors are identified during the parity check. (We'd like to see two sets of SMART reports. Before parity check, and after)

Joe L.

March 14, 2010

In this situation (2 sick disks) I would NOT replace either one and let it rebuild... too much chance of a second failure while rebuilding.

I would get a new 1.5 TB drive, install it as a NEW drive, let it preclear and come online. Then COPY files from the 2 sick disks to it. Then remove the other two sick disks from the array.

March 14, 2010

Author

I'm ordering a new 1.5 tonight...and I'll run a Long Smart Test and post the results.

BTW Joe, I'm running 4.5.3 and I believe I have the latest version of unMENU (this is CDLehner from AVS).

MP

March 14, 2010

The long test will abort unless you disable disk spin-down.

March 14, 2010

In this situation (2 sick disks) I would NOT replace either one and let it rebuild... too much chance of a second failure while rebuilding.

I would get a new 1.5 TB drive, install it as a NEW drive, let it preclear and come online. Then COPY files from the 2 sick disks to it. Then remove the other two sick disks from the array.

I agree with this. I would not run any more diagnostics on the disks until you get the data backed up. If it were me, I'd shut the thing down and wait.

March 14, 2010

Author

Thanks for all the advice guys; nice to know I've found another great support community (I come from AVS). So...I actually have another 500G I haven't added to the array yet; and the 1.5 is on the way.

What do I do first? Should I get that 500G in there, and try to get data off Disk 5? Should I do a "replace" for Disk 5? Should I run tests...should I sit tight?

MP

March 14, 2010

Check your cables next time you shut down, make sure they are well seated. I just discovered the latch on one of my drives doesn't work no matter what cable I use. Even brushing lightly against it loosens it.

March 14, 2010

Thanks for all the advice guys; nice to know I've found another great support community (I come from AVS). So...I actually have another 500G I haven't added to the array yet; and the 1.5 is on the way.

What do I do first? Should I get that 500G in there, and try to get data off Disk 5? Should I do a "replace" for Disk 5? Should I run tests...should I sit tight?

MP

I'd not try to replace any drive just yet.

If you do have a free 500 gig drive, and a free port you can connect it to on a disk controller, you have the option of connecting the new disk, pre-clearing it, adding it to the array as a pre-cleared drive (so it does NOT have to re-compute parity), and then copying the files from the failing drive to the new one. You would still have to compute parity to remove the old 500 gig, reading from the larger bad drive, so that is still not a good idea.

I'd wait for the 1.5TB drive. Pre-clear it, install it, copy from both the failing drives to it, then remove BOTH failing drives, press restore and compute parity without both of them.

March 14, 2010

Author

OK, new drive should be here soon enough. I'll just sit tight, open the case up, drop both drives in, pre-clear and take it from there.

MP

March 15, 2010

If it were me, I think I would:

1 - Power down the array and wait for the 1.5T to arrive

2 - Open the case and mount both the 1.5T and the 500G

3 - Preclear the 1.5T (as a way to verify it is a good disk, not because it needs to be precleared). If the disk has reallocated sectors or other problems, you need to consider clean living and then try again.

4 - Mount the new disk. Joe L. has posted instructions on how to do this. You can also do it via unMenu. You can also mount the disk as a cache disk if you disable the mover script.

5 - Copy everything from the 2 failing disks to the 1.5T drive.

6 - Power down, remove the 2 old disks. Power up and boot unRAID.

7 - From the devices tab, assign your 2 good disks (1.5T and 500G) to disk slots; from the main tab, press the "Restore" button (to forget about the old disks and redefine the array to include just the currently configured drives).

8 - Start the array - wait for parity to build.

9 - Run a parity check. If parity errors - you have other problems.

Check all the smart reports. Hopefully you will have a clean and stable array.
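For step 5, it helps if one unreadable file does not abort the whole copy. The following is a rough sketch of a copy loop that records failures and carries on; `rescue_copy` is a made-up name, and any mount points passed to it are assumptions that depend on where the new disk ends up mounted in step 4.

```python
import os
import shutil


def rescue_copy(source_root, dest_root):
    """Copy a directory tree, skipping files that fail to read.

    Returns (copied, failed): lists of paths relative to source_root.
    """
    copied, failed = [], []
    for folder, _subdirs, files in os.walk(source_root):
        relative_folder = os.path.relpath(folder, source_root)
        target_folder = os.path.normpath(os.path.join(dest_root, relative_folder))
        os.makedirs(target_folder, exist_ok=True)
        for name in files:
            relative_path = os.path.normpath(os.path.join(relative_folder, name))
            try:
                shutil.copy2(os.path.join(folder, name),
                             os.path.join(target_folder, name))
                copied.append(relative_path)
            except OSError as error:
                # A media error on the failing disk surfaces as an I/O error here.
                failed.append(relative_path)
                print(f"could not copy {relative_path}: {error}")
    return copied, failed
```

It would be called once per failing disk, with the failing disk's mount point as the source and the new drive's mount point as the destination. A file listed in `failed` may leave a partial copy behind on the destination, so those are the ones to retry or restore from elsewhere.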

March 15, 2010

Author

Sounds like a good plan to me. I also plan to take the two "bad" drives and test them outside the array...once they're removed. Just to make sure I'm not throwing away disks that might have some life left in them.

CD

March 15, 2010

Author

On an unrelated matter...why, if I'm trying to respond by typing under a quote...or sometimes it happens with images in the post as well...does the text field keep "popping up" from the current line? I know that isn't a very good explanation, but hopefully someone knows exactly what I'm talking about...and how to fix it....because it's maddening.

CD

March 16, 2010

The only thing I can think of that remotely relates to your question (if I understand it correctly), is that you have to remember to first click inside the text box before starting to type there, and later too if you click elsewhere. I often scroll around, and cut and paste from previous messages, then scroll back and forget to click back inside the text box, and that results in strange screen jumps. Worse yet is when you start to correct a typo by using the Backspace key, and the browser goes Back to the previous web page, and you worry you may have just lost all of your typing. A Forward click has always (so far) restored my work.

March 16, 2010

On an unrelated matter...why, if I'm trying to respond by typing under a quote...or sometimes it happens with images in the post as well...does the text field keep "popping up" from the current line? I know that isn't a very good explanation, but hopefully someone knows exactly what I'm talking about...and how to fix it....because it's maddening.

CD

This happens to me. I agree, it's maddening. If I want to reply and there is a big quoted area it makes it scroll up, so I end up copying the text into notepad and back when I'm done editing. I'm using IE. Could that be the problem?

March 16, 2010

Author

Thank god I'm not crazy! I use IE too.

MP

March 16, 2010

Author

OK, got my new 1.5 today, and both it and the .5 are inside the box. I'm going to read back through the posts and see what my steps are. I think stop array and pre-clear the new disks is first.

EDIT- pre-clear is underway on the 1.5. Waiting it out, and then will do the same for the .5

MP

March 17, 2010

Author

OK, here are the pre-clear results for my .5 drive. What am I looking for here?

Thanks,

MP
