
Read errors on one disk, what now?


Just Me


Hey!

 

This is the first real problem I've had with unRAID and I'm not sure how to handle it. I'm copying files from one disk share to another so that I can reformat some disks with XFS. I copied all files from disk4 to disk1 without any problems, changed the filesystem on disk4 to XFS, and copied all files from disk5 to disk4. Then I compared all files on disk5 against disk4 using rsync, so I know all files on disk4 are okay, but unRAID shows read errors on disk4. So what should I do now?

 

diagnostics.zip attached. The syslog is quite long and the errors occurred at the end of it, so better start reading from the end  ;) The SMART report for disk4 is WDC_WD20EARS-00MVWB0_WD-WMAZA1724425 (sdk)

 

Thanks in advance for your help.

 

P.S. unRAID 6.0 RC 6a

nas-diagnostics-20150616-0135.zip

Link to comment

Have you never fallen to the floor one of your Samsungs?  :o

 

I think he's asking if you dropped one of the Samsungs. Those drives report G-shock and one of them looks like it took a bump but is otherwise ok.

 

Yes, exactly. A VERY BAD bump...  ???

The SMART value for that attribute is reporting 1, which is the worst possible value (i.e. a bump harder than the hard drive's specifications allow...)

 

Anyway, one of your Seagates is also reporting some reallocated sectors... I would check its surface with an HDD Regenerator live CD before putting it back into production...  :-\

Link to comment

I have a lot of funny SMART values but no, I've never dropped this Samsung drive, I have no clue why the raw value is that high. The other SMART data are fine and the drive never showed any issues.

 

Regarding the 2 reallocated sectors on one of the Seagates: the SMART report has shown this value for years now, even before I used the drive in my unRAID server. I precleared this drive twice before I put it in my array and the value has stayed constant, so I guess this is not an issue.

What I'm concerned about is the command timeout. The raw value is 38655361034, value is 100, worst 098, threshold 000; since 098 >> 000, I guess it is okay too.
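
A side note on that huge raw value: SMART raw fields are 48 bits wide, and some drives (Seagate models are commonly reported to do this) pack three separate 16-bit counters into the Command_Timeout raw field instead of one big number. Assuming that applies to this drive, which nothing in the thread confirms, a minimal Python sketch that splits the value could look like this (`split_raw48` is just a name made up for the example):

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low).

    Assumption: the drive packs three 16-bit counters into the raw field.
    That is a vendor-specific convention, not something SMART guarantees.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48 bits wide")
    high = (raw >> 32) & 0xFFFF
    mid = (raw >> 16) & 0xFFFF
    low = raw & 0xFFFF
    return high, mid, low


# Raw Command_Timeout value quoted in the post above.
high, mid, low = split_raw48(38655361034)
print(high, mid, low)  # prints: 9 10 10
```

Read that way, the number is three small counters (9, 10 and 10) rather than 38 billion timeouts. What each counter means depends on the vendor, so treat this as a hint and not a diagnosis.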

 

Link to comment

Yes, disk 3 and 4 are connected to the mainboard, so they share the onboard controller. No BIOS update is available. There is no new firmware for the cheap 2-port SI3132 controller or the Adaptec 1430SA controller either (the last update is from 2010).

 

Last night I ran a parity check: no sync errors, but again disk errors, this time on disk4. There is one thing I don't understand. A parity check is a read-only process, right? So why are there 37 writes on disk4? 37 writes, 37 errors, coincidence? The other data disks show 0 writes (see the screenshot, starting with disk2. The first column is reads, the second writes and the third errors.)

 

writes.jpg.d1a2cb4dcc94a7d024ae6ce688a55cf9.jpg

Link to comment

True, but I ran the parity check at night, when all systems that could access the unRAID server were powered down. So there shouldn't have been any access except from unRAID itself. Anyway, so it is just a coincidence that there are 37 writes and 37 errors.

 

Any ideas what I could do to find the reason for the random errors?

Link to comment

Oh really? Before I bought this PSU a few years ago I checked the forum (or the wiki?), and it was supposed to support up to 12 drives.

 

Anyway, this could be the reason: I added a new drive recently, and I never had any issues before that. As a quick fix I'll remove an old 1 TB drive from the array. Once I'm back to ten drives the PSU should be fine again.

 

Thank you for your help.

Link to comment

Archived

This topic is now archived and is closed to further replies.
