
[SOLVED] Drive in array disabled, bunch of sector errors during write


Solved by JorgeB


Yesterday I logged on to my UnRAID dashboard to find that one of my drives had been disabled 2 days ago due to a disk failure/error. The array itself is still online, and all data seems to be accessible.

 

My Server hardware:

Mainboard: MSI Z370 SLI plus

CPU: i5-8600k (stock, not overclocked)

RAM: 16GB dual channel (2x8GB)  DDR4 RAM (not sure about speed, but pretty standard RAM)

Drives: Main array: 4x 12TB Western Digital white label drives (shucked external WD mybook drives), 1x 3TB Seagate, 1x 1TB Seagate

Cache: 2x 500GB Samsung 860 EVO drives

PSU: Cooler Master G750M

UnRAID version: 6.9.2

All drives are connected through a Fujitsu D2607 / LSI SAS 9211-8i card in HBA mode.

 

The drive that failed was disk 3, which is the "last" of my 4x 12TB drives. Going through the logs myself, it's clear that the drive started spitting out sector errors like crazy while data was being written to it. I'm pretty sure this happened while I was manually moving files from my cache drive to the array using Krusader, as I've done in the past, but I'm not 100% sure. The drive in question was also completely new when I installed it in my server, and I only started writing data to it in the last couple of months, so the sectors that are producing errors have probably never been written to before.

 

A small snippet of the sector write errors in my syslog file:

Quote

Mar 19 23:10:38 Vault kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Mar 19 23:10:38 Vault kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Mar 19 23:10:38 Vault kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Mar 19 23:10:38 Vault kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Mar 19 23:10:38 Vault kernel: sd 4:0:1:0: [sdc] tag#1648 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=3s
Mar 19 23:10:38 Vault kernel: sd 4:0:1:0: [sdc] tag#1648 Sense Key : 0x2 [current] 
Mar 19 23:10:38 Vault kernel: sd 4:0:1:0: [sdc] tag#1648 ASC=0x4 ASCQ=0x0 
Mar 19 23:10:38 Vault kernel: sd 4:0:1:0: [sdc] tag#1648 CDB: opcode=0x8a 8a 00 00 00 00 02 f0 bf 34 e0 00 00 04 00 00 00
Mar 19 23:10:38 Vault kernel: blk_update_request: I/O error, dev sdc, sector 12628997344 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997280
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997288
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997296
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997304
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997312
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997320
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997328
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997336
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997344
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997352
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997360
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997368
Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997376
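The CDB and sense values in the snippet above can be decoded by hand. The following is a minimal sketch, assuming the standard SCSI WRITE(16) layout (opcode 0x8a, big-endian LBA in bytes 2-9, transfer length in bytes 10-13). The function name and the two small lookup tables are illustrative only and cover just the codes that appear in this log.

```python
# Sketch: decode the WRITE(16) CDB and the sense data from the syslog snippet.
# Only the codes seen in this log are listed; this is not a full SCSI decoder.

SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"}


def decode_write16(cdb_hex: str) -> dict:
    """Split a 16-byte WRITE(16) CDB into opcode, LBA and block count."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is accepted
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:10], "big"),      # starting logical block
        "blocks": int.from_bytes(cdb[10:14], "big"),  # transfer length in blocks
    }


# Values copied from the log lines for tag#1648.
info = decode_write16("8a 00 00 00 00 02 f0 bf 34 e0 00 00 04 00 00 00")
print(info["lba"], info["blocks"])   # 12628997344 1024
print(SENSE_KEYS[0x2])               # NOT READY
print(ASC_ASCQ[(0x04, 0x00)])        # LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE
```

The decoded LBA equals the sector reported in the `blk_update_request` line. The sense data reports the drive as not ready rather than reporting a medium error, which may help explain why the later SMART test came back clean.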

 

I've attached the full syslog file with all the errors. There are a bunch of them, so they should be easy to find.
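For anyone who wants to tally the errors rather than scroll through them, here is a rough sketch that counts `md: diskN write error` lines per disk and reports the lowest and highest sector. The regular expression is based only on the line format shown in the snippet above, and the function name and sample list are illustrative.

```python
import re

# Matches lines such as: "... kernel: md: disk3 write error, sector=12628997280"
MD_WRITE_ERR = re.compile(r"md: (disk\d+) write error, sector=(\d+)")


def summarize(lines):
    """Count md write errors per disk and track the affected sector range."""
    stats = {}
    for line in lines:
        m = MD_WRITE_ERR.search(line)
        if m is None:
            continue
        disk, sector = m.group(1), int(m.group(2))
        entry = stats.setdefault(
            disk, {"count": 0, "lowest": sector, "highest": sector}
        )
        entry["count"] += 1
        entry["lowest"] = min(entry["lowest"], sector)
        entry["highest"] = max(entry["highest"], sector)
    return stats


# Three lines copied from the snippet above, plus one non-matching line.
SAMPLE = [
    "Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997280",
    "Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997288",
    "Mar 19 23:10:38 Vault kernel: sd 4:0:1:0: [sdc] tag#1648 ASC=0x4 ASCQ=0x0",
    "Mar 19 23:10:38 Vault kernel: md: disk3 write error, sector=12628997296",
]

result = summarize(SAMPLE)
print(result)
```

To run it against a real log, pass an open file object instead of the sample list, since `summarize` accepts any iterable of lines.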

Since then I've stopped the array, rebooted the server and restarted the array. (I do have my full diagnostics zip from BEFORE I rebooted, and I'll attach it to this post in case it's needed.) After the reboot the array started up on its own, and UnRAID gave me a notification that the array was "good" and that there were "no read errors", but Disk3 is still disabled. Afterwards I started an extended SMART test on the disabled drive, which completed a couple of hours ago. It came back with no SMART errors. I've attached the SMART report as well.

 

At this point I'm not sure what my next step should be, any help would be appreciated.

 

 

 

 

vault-smart-20220322-1758.zip vault-syslog-20220321-1909.zip vault-diagnostics-20220321-2012-Anonymous.zip

  • Solution
4 minutes ago, Simplicity said:

one of my drives had been disabled 2 days ago

Enable system notifications so you're notified immediately when there's a problem.

 

The problem itself looks more like a power/connection issue. Replace or swap the cables to rule that out, and as long as the emulated disk is mounting you can rebuild on top.

 

 


Thank you for the quick reply. I was hoping this points to a power/connection issue so I'll try your suggestion. I'm guessing it will take a while to rebuild the disk so I'll report back when it's done.

 

In order to rule out a connection/power issue, would it be wise to swap the SATA and power connections with another disk after the rebuild is done? (I have 2 spare SATA connections from the HBA card that I can use in the meantime.) Specifically, I'm thinking of swapping the cables with my 1TB disk, which is the last disk in my array and doesn't have any data on it at the moment. My thinking is that after the rebuild is complete I could connect the possibly faulty cable/connection to that 1TB disk and try to fill it completely with data, as a test to see if any write issues come back. Or would that be a bad idea or not a good test? Of course I wouldn't try it with important data, I'd just copy a bunch of files that are already on another disk.


The disk finished rebuilding a couple of hours ago with no errors, and everything seems to be back up and running just fine. Before rebuilding, I reconnected the disk using one of the spare SATA cables from my HBA card. I haven't tested the possibly faulty SATA connector on another disk yet, but I'll do that some other time. I've also set up e-mail alerts as per your earlier suggestion.

 

Thank you for your help!

