Possible disc errors but system reports as OK?

June 11, 2013

I am using RC12 with unmenu and have "unRAID Status Alert sent hourly by e-mail" installed. The emails report the system as being fine, however the syslog suggests that there are issues with the disc and I cannot get a SMART report for the drive. If I use http://Tower/Main the drive still has a green ball but reports 763 errors.

The drive in question was recently added to my array to replace a smaller drive. It was not brand new.

For now the array is accessible (I can read/write) and both web GUIs show the drives as green, but the syslog errors and the failure to produce a SMART report have me concerned about drive failure. I haven't rebooted the server yet.

I'm also concerned about the value of the email alerts - is this expected behaviour?

unraid_status_ok_email.txt

June 11, 2013

Can you show a screen-shot of the unRAID main page showing the errors?

Also, please attach a syslog to your next post. (you can download it easily from the syslog page in unMENU)

June 11, 2013

You can have thousands of "read" errors, and a drive will not be taken out of service. This is expected. The drive WILL be taken out of service on a single "write" failure. The linux kernel will often re-set the disk controller and retry and not even report the error upward to the calling process.

If all the indicators are green, then the disk is still being accessed.

What do you get when you attempt to get a smart report?

How are you attempting to get it?

Joe L.

June 11, 2013

Author

Thanks Joe,

Full syslog was too big, so only the last 2 days are attached for those who like a little light reading...

I have only tried to get the SMART report from the unmenu disk management page. I get the following ONLY for that drive - all other drives work:

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

June 11, 2013

As moderator I removed your syslog. Your email/password were in it. (I was the only downloader, so you are probably safe, but you should change your email password regardless)

You really need to turn OFF the debugging mode you currently have enabled as it puts your email/password in the syslog.

(probably part of the e-mail setup/config screen)

disk1 seems to have lots of "media errors" (un-readable sectors)

Does it respond and provide output to

hdparm -i /dev/sde

( I think it is /dev/sde )

or, if on a different controller than the rest of your disks you might try

smartctl -a /dev/sde

post its SMART report.

Another disk, /dev/hdb, also seems to have errors. Is this an IDE drive (not a SATA drive)? Is this the one not responding to the SMART report request?

Jun 10 17:47:02 Tower kernel: hdb: task_pio_intr: status=0x51 { DriveReady SeekComplete Error }

Jun 10 17:47:02 Tower kernel: hdb: task_pio_intr: error=0x04 { DriveStatusError }

Jun 10 17:47:02 Tower kernel: hdb: possibly failed opcode: 0xa1

Edit: It seems to be a WD-EADS drive according to your screen shot earlier. If so, it is a SATA drive and you have the disk controller set in IDE emulation mode. You need to set it to AHCI mode in your BIOS. (Take it out of legacy mode, or whatever they call it in your BIOS.) Right now it is running in a much slower mode designed to emulate IDE disks so it can boot Win-XP or prior.
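Kernel lines like the hdb errors quoted above can be tallied per device to see how often a disk is reporting problems. This is a minimal sketch, assuming a syslog in the format shown in this thread. `count_disk_errors` is a hypothetical helper name, and the pattern only catches lines of the form `kernel: hdX: ... error`, so other error formats would need their own patterns.

```shell
#!/bin/sh
# Sketch: count kernel error lines per disk device in a syslog file.
# count_disk_errors is a hypothetical helper, not an unRAID or unMENU tool.
# Assumption: error lines look like "kernel: hdb: ... Error" as in this thread.
count_disk_errors() {
  # $1: path to the syslog file to scan
  grep -E 'kernel: (hd|sd)[a-z]+: .*[Ee]rror' "$1" |
    # Keep only the device name (hdb, sde, ...)
    sed -E 's/.*kernel: ((hd|sd)[a-z]+): .*/\1/' |
    sort | uniq -c |
    # Print "device count" instead of uniq's padded "count device"
    awk '{ print $2, $1 }'
}
```

It could be run as `count_disk_errors /var/log/syslog` (assuming that is where the syslog lives), and prints one line per device with its error-line count.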

June 11, 2013

Author

oops... thanks for getting rid of the syslog for me...

I ran the hdparm command on a few drives. It does work on sde, and while most of what it reports means nothing to me, I note "Config = { Fixed }" on it, whereas all others seem to have other values.

smartctl -a /dev/sde didn't seem to work. -i did work however, and -H reports the following:

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: FAILED!

Drive failure expected in less than 24 hours. SAVE ALL DATA.

Failed Attributes:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x000f 001 001 051 Pre-fail Always FAILING_NOW 41698

I also tried to run a short test which says it should complete in 2mins, but some 10 mins later nothing has appeared on the console. I tried to run the same test and pipe it to a text file, but got the same as I see on the console (which is not test results, just "testing has begun..." type info).

So it looks like this drive is about to fail, and a second (/dev/hdb) is also having issues... If so, why does unraid continue to report the array as being "ok"? If both were to fail it would surely result in data loss.
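In the report above, Raw_Read_Error_Rate has a normalized VALUE (001) at or below its THRESH (051), which is the condition smartctl marks as a failed attribute. The same check can be applied to a full attribute table. This is a minimal sketch, assuming the ten-column ATA attribute layout shown above; `failed_attrs` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# Sketch: list SMART attributes whose normalized VALUE is at or below THRESH.
# Reads attribute rows (as printed by `smartctl -A`) on stdin.
# failed_attrs is a hypothetical helper name.
failed_attrs() {
  awk '
    # Attribute rows start with a numeric ID and have at least 10 columns:
    # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      value = $4 + 0
      thresh = $6 + 0
      # A threshold of 0 means the attribute can never be flagged; skip it.
      if (thresh > 0 && value <= thresh)
        printf "%s value=%d thresh=%d raw=%s\n", $2, value, thresh, $10
    }'
}
```

It could be used as `smartctl -A /dev/sde | failed_attrs`; no output means no attribute is currently at or below its threshold.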

June 11, 2013

Author

/dev/hdb is on a PCI-x controller. It's a generic card I bought on ebay running a Jmicron chip (BIOS v1.06.54), and its interface only has options for configuring RAID - nowhere to set AHCI. Curiously, /dev/sdj is also on the same controller, and I presume that because it's sdj it means it's been detected as SATA/AHCI. Strange that the other has not.

I had 2x spare slots on that controller, so moved it to the next but the controller spent 10mins searching for disks before I turned it off. Moved it to the next free slot and it was detected immediately and is now /dev/sdj. What used to be /dev/sdj is now /dev/sdk (both on the same controller).

Weird...

All lights are green and the error counts were cleared during the reboot. Disk1 was /dev/sde and is now /dev/sda, and I can now run SMART tests on it from unmenu (I couldn't before for some reason). The 'Status Report' suggests the drive is failing, yet unraid reports nothing but green lights. I triggered a short SMART test, but some 13 minutes later (when it should take only ~2) nothing has been displayed. Is this normal? Does it actually display the result of the test, or does it just update the SMART values (to view later in the 'Status Report')?

It seems like the only time the array isn't considered ok is when a drive actually fails or goes missing. How do people maintain awareness of the ACTUAL state of their array without manually checking?
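On the self-test question: a test started with `smartctl -t short` runs inside the drive in the background and smartctl returns immediately, so nothing more appears on the console. The result is recorded in the drive's self-test log, which `smartctl -l selftest /dev/sda` displays. For staying aware of drive health without checking by hand, one option is a small script run on a schedule. This is a minimal sketch, assuming ATA drives that print the "test result: PASSED" line seen in this thread; `check_health` and the `SMARTCTL` override variable are hypothetical names.

```shell
#!/bin/sh
# Sketch: report any drive whose SMART overall-health check is not PASSED.
# check_health is a hypothetical helper; SMARTCTL may be overridden to point
# at a different smartctl binary.
SMARTCTL=${SMARTCTL:-smartctl}

check_health() {
  status=0
  for dev in "$@"; do
    # `smartctl -H` prints the overall-health self-assessment result.
    if "$SMARTCTL" -H "$dev" 2>&1 | grep -q 'test result: PASSED'; then
      echo "$dev: PASSED"
    else
      echo "$dev: NOT PASSED (failing, or SMART data unreadable)"
      status=1
    fi
  done
  # Non-zero return if any drive did not report PASSED.
  return "$status"
}
```

A call such as `check_health /dev/sda /dev/sdj /dev/sdk` returns non-zero when any listed drive does not report PASSED, so a scheduled job could send an e-mail on failure. This checks only the overall-health result; it does not replace reading the full attribute table.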
