Read errors on one disk, what now?

June 15, 2015

Hey!

This is the first real problem I have had with unRAID, and I'm not sure how to handle it. I'm copying files from one disk share to another so that I can reformat some disks with XFS. I copied all files from disk4 to disk1 without any problems, changed the filesystem on disk4 to XFS, and copied all files from disk5 to disk4. Then I compared all files on disk5 with disk4 using rsync, so I know all files on disk4 are okay, but unRAID shows read errors on disk4. So what should I do now?
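
For readers who want to repeat the disk-to-disk comparison without rsync, here is a minimal sketch of a checksum-based tree comparison. It assumes the two disk shares are mounted somewhere readable (on unRAID typically under /mnt/diskN, but treat the paths as an assumption), and the helper names `file_digest` and `compare_trees` are hypothetical, not part of unRAID or rsync.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, target_root):
    """Return sorted relative paths that are missing or differ under target_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            source_path = os.path.join(dirpath, name)
            relative = os.path.relpath(source_path, source_root)
            target_path = os.path.join(target_root, relative)
            if not os.path.isfile(target_path):
                # File exists on the source disk but not on the target.
                problems.append(relative)
            elif file_digest(source_path) != file_digest(target_path):
                # File exists on both disks but the contents differ.
                problems.append(relative)
    return sorted(problems)
```

Calling `compare_trees("/mnt/disk5", "/mnt/disk4")` would then list every file that did not arrive intact; an empty list means the copies match. Note that this only checks files present on the source side, and reading every file twice is slow on large disks.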

diagnostics.zip is attached. The syslog is quite long and the errors occurred at the end of the log, so it is best to start from the end. The SMART report for disk4 is WDC_WD20EARS-00MVWB0_WD-WMAZA1724425 (sdk).

Thanks in advance for your help.

P.S. unRAID 6.0 RC 6a

nas-diagnostics-20150616-0135.zip

June 16, 2015

SMART looks ok. Run a parity check.

June 16, 2015

Author

Thanks for your reply. I'm running a non-correcting parity check now; this will take a while. I'll report back.

June 16, 2015

Have you never fallen to the floor one of your Samsungs?

June 16, 2015

Author

Pardon? I don't understand what you mean, sorry :-[

June 17, 2015

Author

Okay, status report:

The non-correcting parity check finished with 0 errors. The webGui shows 0 errors on all disks. I hate it when issues disappear on their own; now I have to worry about whether it will recur :-\

June 17, 2015

Have you never fallen to the floor one of your Samsungs?

I think he's asking if you dropped one of the Samsungs. Those drives report G-shock and one of them looks like it took a bump but is otherwise ok.

June 17, 2015

Increase parity check frequency and hopefully it will reveal itself.

June 17, 2015

Have you never fallen to the floor one of your Samsungs?

I think he's asking if you dropped one of the Samsungs. Those drives report G-shock and one of them looks like it took a bump but is otherwise ok.

Yes, exactly. A VERY BAD bump...

The SMART value for that attribute is 1, which is the worst possible value (i.e. a bump harder than the drive's specifications allow).

Also, one of your Seagates is reporting some reallocated sectors. I would check its surface with an HDD Regenerator live CD before putting it back into production. :-\

June 17, 2015

Author

I have a lot of funny SMART values, but no, I've never dropped this Samsung drive, and I have no clue why the raw value is that high. The other SMART data are fine and the drive has never shown any issues.

Regarding the 2 reallocated sectors on one of the Seagates: the SMART report has shown this value for years, even before I used the drive in my unRAID server. I precleared this drive twice before I put it in my array and the value has stayed constant, so I guess this is not an issue.

What I'm concerned about is the command timeout. The raw value is 38655361034, value is 100, worst 098, threshold 000; 098 >> 000 so I guess it is okay too.
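
One possible reading of that large raw number, offered as an assumption rather than a diagnosis: on many Seagate drives the Command_Timeout raw value is commonly interpreted as three packed 16-bit counters rather than a single integer. The sketch below splits the raw value that way and applies the usual rule that an attribute has failed only when its normalized value is at or below the threshold. Both helper names are hypothetical.

```python
def split_raw48(raw_value):
    """Split a 48-bit SMART raw value into three 16-bit words (high, middle, low)."""
    if not 0 <= raw_value < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return (
        (raw_value >> 32) & 0xFFFF,
        (raw_value >> 16) & 0xFFFF,
        raw_value & 0xFFFF,
    )


def attribute_failed(normalized, threshold):
    """Return True when the normalized value is at or below the threshold."""
    return normalized <= threshold


print(split_raw48(38655361034))   # -> (9, 10, 10)
print(attribute_failed(98, 0))    # -> False
```

Read this way, the raw value would be three small counters (9, 10 and 10) rather than 38 billion timeouts, and a worst value of 098 against a threshold of 000 is far from a failure. Whether this packing applies depends on the drive model, so check how the vendor defines the attribute.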

June 24, 2015

Author

So here we are again with errors. This time it is disk3, and there is still no problem in the SMART data.

Syslog and SMART report attached. What could be the reason? A cable? The SATA controller card? I don't know where to start, or even how.

nas-sylog-smart.zip

June 26, 2015

Check for BIOS and SATA card firmware updates.

Do the disks share a controller?

June 26, 2015

Author

Yes, disks 3 and 4 are connected to the mainboard, so they share the onboard controller. No BIOS update is available. There is no new firmware for the cheap 2-port SI3132 controller or the Adaptec 1430SA controller (the last update is from 2010).

Last night I ran a parity check: no sync errors, but again disk errors, this time on disk4. There is one thing I don't understand: a parity check is a read-only process, right? So why are there 37 writes on disk4? 37 writes, 37 errors, coincidence? The other data disks show 0 writes (see screenshot, starting with disk2; the first column is reads, the second writes, and the third errors).

June 26, 2015

The array is still accessible during parity operations. The parity check is read-only, but any number of other processes could be accessing the array at any time.

June 26, 2015

Author

True, but I ran the parity check at night, when all systems that could access the unRAID server were powered down, so there shouldn't be any access except from unRAID itself. Anyway, so it is just a coincidence that there are 37 writes and 37 errors.

Any ideas what I could do to find the reason for the random error?

June 26, 2015

What is the exact model power supply?

June 27, 2015

Author

It is a 400W Corsair power supply. CMPSU-400CX

June 27, 2015

The power supply has reached its limit. Get one with a 12 V rail rated above 40-50 A.
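
As a rough illustration of why drive count matters for the 12 V rail, the sketch below estimates peak 12 V current when all drives spin up at once. Every figure in it is an assumption for illustration only: about 2 A per 3.5-inch drive at spin-up, a 5 A allowance for the rest of the system, and a hypothetical 30 A rail to compare against. Check the drive datasheets and the PSU label for real numbers.

```python
def peak_12v_amps(drive_count, spinup_amps_per_drive=2.0, other_load_amps=5.0):
    """Estimate peak 12 V current when all drives spin up simultaneously."""
    if drive_count < 0:
        raise ValueError("drive_count must not be negative")
    return drive_count * spinup_amps_per_drive + other_load_amps


def headroom_amps(rail_amps, drive_count, **load_assumptions):
    """Return remaining 12 V capacity; a negative result means the rail is overloaded."""
    return rail_amps - peak_12v_amps(drive_count, **load_assumptions)


# Assumed 30 A rail; the per-drive and system loads are the defaults above.
for drives in (10, 11, 12):
    print(drives, peak_12v_amps(drives), headroom_amps(30.0, drives))
```

Under these assumed numbers each added drive costs roughly 2 A of headroom at spin-up, which is why a supply that was fine for years can start misbehaving after one more disk is added.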

June 27, 2015

Author

Oh, really? Before I bought this PSU a few years ago, I checked the forum (or the wiki?), and it was supposed to support up to 12 drives.

Anyway, this could be the reason: I added a new drive recently, and I never had any issues before. As a quick fix, I think I'll remove an old 1 TB drive from the array. Once I'm back to ten drives, the PSU should be fine again.

Thank you for your help.
