Geck0, posted March 1:

Hi, the weekend has greeted me with read errors on Drive 1. Unraid was in the middle of a monthly parity check, which I've now paused until I've received some feedback. Under "Fix Problems", it states this:

"If the disk has not been disabled, then Unraid has successfully rewritten the contents of the offending sectors back to the hard drive. It would be a good idea to look at the S.M.A.R.T. Attributes"

The drive hasn't been disabled, and a short SMART test shows no errors. The disk log information is:

Feb 19 18:40:39 Nexus kernel: ata10: SATA max UDMA/133 abar m2048@0xfbe00000 port 0xfbe00380 irq 83
Feb 19 18:40:39 Nexus kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 19 18:40:39 Nexus kernel: ata10.00: ATA-11: ST14000VN0008-2JG101, SC60, max UDMA/133
Feb 19 18:40:39 Nexus kernel: ata10.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
Feb 19 18:40:39 Nexus kernel: ata10.00: Features: NCQ-sndrcv
Feb 19 18:40:39 Nexus kernel: ata10.00: configured for UDMA/133
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] 4096-byte physical blocks
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] Write Protect is off
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] Preferred minimum I/O size 4096 bytes
Feb 19 18:40:39 Nexus kernel: sdi: sdi1
Feb 19 18:40:39 Nexus kernel: sd 10:0:0:0: [sdi] Attached SCSI removable disk
Feb 19 18:41:04 Nexus emhttpd: ST14000VN0008-2JG101_ZHZ3DK2T (sdi) 512 27344764928
Feb 19 18:41:04 Nexus kernel: mdcmd (2): import 1 sdi 64 13672382412 0 ST14000VN0008-2JG101_ZHZ3DK2T
Feb 19 18:41:04 Nexus kernel: md: import disk1: (sdi) ST14000VN0008-2JG101_ZHZ3DK2T size: 13672382412
Feb 19 18:41:04 Nexus emhttpd: read SMART /dev/sdi
Feb 19 18:43:59 Nexus emhttpd: shcmd (209): /usr/local/sbin/set_ncq sdi 1
Feb 19 18:43:59 Nexus root: set_ncq: setting sdi queue_depth to 1
Feb 19 18:43:59 Nexus emhttpd: shcmd (210): echo 128 > /sys/block/sdi/queue/nr_requests
Mar 2 01:31:41 Nexus kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 2 01:31:41 Nexus kernel: ata10.00: irq_stat 0x40000001
Mar 2 01:31:41 Nexus kernel: ata10.00: failed command: READ DMA EXT
Mar 2 01:31:41 Nexus kernel: ata10.00: cmd 25/00:80:78:84:e8/00:01:29:06:00/e0 tag 18 dma 196608 in
Mar 2 01:31:41 Nexus kernel: ata10.00: status: { DRDY SENSE ERR }
Mar 2 01:31:41 Nexus kernel: ata10.00: error: { UNC }
Mar 2 01:31:41 Nexus kernel: ata10.00: configured for UDMA/133
Mar 2 01:31:41 Nexus kernel: sd 10:0:0:0: [sdi] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=6s
Mar 2 01:31:41 Nexus kernel: sd 10:0:0:0: [sdi] tag#18 Sense Key : 0x3 [current]
Mar 2 01:31:41 Nexus kernel: sd 10:0:0:0: [sdi] tag#18 ASC=0x11 ASCQ=0x4
Mar 2 01:31:41 Nexus kernel: sd 10:0:0:0: [sdi] tag#18 CDB: opcode=0x88 88 00 00 00 00 06 29 e8 84 78 00 00 01 80 00 00
Mar 2 01:31:41 Nexus kernel: I/O error, dev sdi, sector 26472907896 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 2
Mar 2 01:31:41 Nexus kernel: ata10: EH complete
Mar 2 01:31:48 Nexus kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 2 01:31:48 Nexus kernel: ata10.00: irq_stat 0x40000001
Mar 2 01:31:48 Nexus kernel: ata10.00: failed command: READ DMA EXT
Mar 2 01:31:48 Nexus kernel: ata10.00: cmd 25/00:00:f8:85:e8/00:02:29:06:00/e0 tag 6 dma 262144 in
Mar 2 01:31:48 Nexus kernel: ata10.00: status: { DRDY SENSE ERR }
Mar 2 01:31:48 Nexus kernel: ata10.00: error: { UNC }
Mar 2 01:31:48 Nexus kernel: ata10.00: configured for UDMA/133
Mar 2 01:31:48 Nexus kernel: sd 10:0:0:0: [sdi] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=6s
Mar 2 01:31:48 Nexus kernel: sd 10:0:0:0: [sdi] tag#6 Sense Key : 0x3 [current]
Mar 2 01:31:48 Nexus kernel: sd 10:0:0:0: [sdi] tag#6 ASC=0x11 ASCQ=0x4
Mar 2 01:31:48 Nexus kernel: sd 10:0:0:0: [sdi] tag#6 CDB: opcode=0x88 88 00 00 00 00 06 29 e8 85 f8 00 00 02 00 00 00
Mar 2 01:31:48 Nexus kernel: I/O error, dev sdi, sector 26472908280 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 2
Mar 2 01:31:48 Nexus kernel: ata10: EH complete

...and, more importantly, here are the diagnostics attached. I would appreciate it if somebody could cast their eye over this. I've taken Docker offline and paused the parity check.

nexus-diagnostics-20240302-0902.zip
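The two failed reads can be cross-checked by decoding the CDBs printed in the log. Below is a minimal Python sketch. It assumes the standard SCSI READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, and a 4-byte transfer length in bytes 10 to 13). It also assumes the 512-byte logical blocks reported for sdi. The function name `decode_read16` is hypothetical and is not part of Unraid or any kernel tool.

```python
# Decode SCSI READ(16) CDBs copied from the kernel log to recover the
# starting LBA and transfer length of each failed read.
# Assumes the standard READ(16) layout and 512-byte logical blocks.

LOGICAL_BLOCK = 512  # from "[sdi] 27344764928 512-byte logical blocks"


def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (start_lba, block_count) for a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length in blocks
    return lba, count


failed = [
    "88 00 00 00 00 06 29 e8 84 78 00 00 01 80 00 00",  # tag#18
    "88 00 00 00 00 06 29 e8 85 f8 00 00 02 00 00 00",  # tag#6
]

for cdb_hex in failed:
    lba, count = decode_read16(cdb_hex)
    print(f"LBA {lba}..{lba + count - 1}: {count} blocks, {count * LOGICAL_BLOCK} bytes")
```

The decoded start LBAs and byte counts line up with the log's own "sector" and "dma" fields. They also show the second failed read starting exactly where the first one ends. Both errors therefore sit in one contiguous range of 896 logical blocks, roughly 97% of the way through the 27344764928-sector disk, rather than being scattered across it.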
JorgeB, posted March 2:

It's logged as a disk error, but it may have been corrected by now. Try another parity check from the beginning, or run an extended SMART test on disk1.
Geck0 (author), posted March 2:

Hi Jorge, I cancelled the existing parity check and put the array into maintenance mode. I'm currently running an extended SMART test, and I'll report back after it completes. I have a new drive on standby if I need to swap it out. Thanks for taking the time to respond.
Geck0 (author), posted March 4:

Hi JorgeB, I've completed the extended SMART test, and it came back as "Completed without error".

[Extended SMART test results screenshot]

The parity check completed today and came back with no issues.

I was running a backup of my Nextcloud data and noticed in the logs that a number of Excel files had an MD5 hash difference from the last backup. All of them are on disk1. I've only just found them and still need to compare them to see whether there is an issue with the server-side copies, as it may be the backup drive that's at fault here. However, it makes me nervous that there are other issues as well, since I don't back up the entire drive, just the important data. I'm not great at reading SMART results. Is it worth swapping out the drive and performing a rebuild from parity?

Quoting JorgeB: "It's logged as a disk error, but it may be corrected now, try another parity check from the beginning or run an extended SMART test on disk1"

Do you mean corrected from parity, or reallocated sectors? I'm not sure what happens in this instance, but I'm concerned that corrupted files have been written to parity. Any input would be appreciated.
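One way to check whether the server copies or the backup copies changed is to hash both trees and compare them file by file. Below is a minimal Python sketch using the standard `hashlib` module. The function names and the directory paths in the final comment are hypothetical placeholders. The sketch only reports which files differ; it cannot tell which side holds the good copy.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks so large files are not loaded whole."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(server_root: Path, backup_root: Path) -> list[str]:
    """Return relative paths whose MD5 differs, or that are missing from the backup."""
    mismatches = []
    for src in sorted(server_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(server_root)
        dst = backup_root / rel
        if not dst.is_file() or md5_of(src) != md5_of(dst):
            mismatches.append(str(rel))
    return mismatches


# Example (hypothetical paths; adjust to the actual share and backup locations):
# for rel in compare_trees(Path("/mnt/disk1/nextcloud"), Path("/mnt/backup/nextcloud")):
#     print("differs:", rel)
```

Streaming in fixed-size chunks keeps memory use flat regardless of file size. Files that exist only in the backup are not reported, because the walk starts from the server side.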
JorgeB, posted March 4:

Geck0 said: "but have this concern that corrupted files have been written to parity"

There's no reason to think that. If the SMART test passed, the disk is OK for now. Keep monitoring it, and if there are more errors in the near future, you may consider replacing it.
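For the "keep monitoring" advice, the attributes usually watched on a suspect disk are the reallocated, reported-uncorrectable, pending and offline-uncorrectable counts (IDs 5, 187, 197 and 198). Below is a minimal Python sketch that extracts their raw values from saved `smartctl -A` text and flags any that grew between two snapshots. It assumes smartctl's default (non-brief) ATA attribute table, where the raw value follows the flag and six further columns. The function names are hypothetical.

```python
import re

# SMART attribute IDs commonly watched when a disk starts throwing errors.
WATCHED = {5: "Reallocated_Sector_Ct", 187: "Reported_Uncorrect",
           197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}

# One row of smartctl's default attribute table: ID, name, hex flag,
# then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED, then RAW_VALUE.
# Only the leading integer of the raw value is captured.
ROW = re.compile(r"^\s*(\d+)\s+\S+\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def watched_raw_values(smartctl_a_output: str) -> dict[int, int]:
    """Return {attribute_id: raw_count} for the watched attributes found in the text."""
    values = {}
    for line in smartctl_a_output.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED:
            values[int(m.group(1))] = int(m.group(2))
    return values


def increased(previous: dict[int, int], current: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Return {attribute_id: (old, new)} for watched counts that went up."""
    return {k: (previous.get(k, 0), v)
            for k, v in current.items() if v > previous.get(k, 0)}
```

Comparing snapshots rather than absolute values matches the advice in the thread: what matters is whether the counts keep rising, not a single non-zero reading.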
Geck0 (author), posted March 6:

Hi JorgeB et al., I've had an interesting week. Drive 5 started failing today. It kicked off with reallocated sectors, which increased from 17 to 126 within 4 hours and then to 215 after another 45 minutes. It also reported 1 pending sector, which later returned to normal. The disk then went offline after reporting "uncorrectable is 1" and entering "Disk 5 in error". Fortunately, I still had a brand-new 18TB drive on standby, already hooked up, and I've started a rebuild. The original disk can still be mounted, but I've left it alone for now in case the rebuild fails.

I've not had two drives with errors in the same week before. Can you advise whether there is anything else I should consider? I'm not aware that a faulty cable or disk controller could cause this issue; I'm just wondering if there is anything else to look at. The two drives this week are both IronWolf Pro and were purchased a couple of years apart. The one that is failing today is only a couple of years old. It failed the extended SMART test and dropped like a rock from there. I'm starting to rethink the quality of Seagate's drives.

nexus-diagnostics-20240306-1704.zip
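The reallocated-sector readings in the post show the failure accelerating rather than growing steadily. Going from 17 to 126 over about 4 hours is roughly 27 new reallocations per hour. Going from 126 to 215 over the next 45 minutes is roughly 119 per hour. Below is a minimal Python sketch of that arithmetic. The elapsed times are the approximate figures from the post, and the function name is hypothetical.

```python
def rates_per_hour(readings: list[tuple[float, int]]) -> list[float]:
    """Given (elapsed_minutes, reallocated_count) readings in time order,
    return the growth rate in sectors per hour between consecutive readings."""
    return [
        (c1 - c0) / ((t1 - t0) / 60)
        for (t0, c0), (t1, c1) in zip(readings, readings[1:])
    ]


# Readings from the post (times approximate): 17 at the start,
# 126 about 4 hours later, 215 about 45 minutes after that.
readings = [(0, 17), (240, 126), (285, 215)]
print(rates_per_hour(readings))
```

A rising rate like this points to a disk that is actively degrading. It is a stronger signal than any single non-zero count.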
itimpi, posted March 6:

Geck0 said: "I'm not aware that a faulty cable or disk controller could cause this issue, I'm just wondering if there is anything else to look at?"

The only possibility I can think of, other than the disk itself failing, might be some obscure power-related issue. I do not think, however, that it could cause the rapidly increasing reallocated sectors value.
JorgeB, posted March 6:

itimpi said: "Do not think, however, that could cause the rapidly increasing reallocated sectors value."

I have seen that before, with bad power causing reallocated sectors. Most likely it was just a bad disk, though. If it happens again to a different disk, then I would consider that.
Geck0 (author), posted March 6:

Okay, thanks for replying, guys. Appreciate it.