ShitTheBed Posted January 15 Share Posted January 15 Hello Unraiders, I came home to both parity disks being disabled, with a whopping 2000+ errors on each parity disk. I checked the logs and they both failed very close in time to each other. There is a plugin that is mentioned in the logs too... https://pastebin.com/9pg56jC2 The beggining of the errors starts with: Jan 15 01:31:56 VAULT kernel: ata2.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen Jan 15 01:31:56 VAULT kernel: ata2.00: failed command: WRITE FPDMA QUEUED Jan 15 01:31:56 VAULT kernel: ata2.00: cmd 61/08:80:80:08:00/00:00:00:00:00/40 tag 16 ncq dma 4096 out Jan 15 01:31:56 VAULT kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 15 01:31:56 VAULT kernel: ata2.00: status: { DRDY } Jan 15 01:31:56 VAULT kernel: ata2: hard resetting link It is an 8 disk array, raid 6. I am worried for my NAS! I find it very strange that both parity disks fail together when the array has being going for 2 years. I restarted twice, and both disks remain disabled. This message popped up: Adnd the 2k errors on each parity disk were zeroed away: Unraid version is: Version 6.12.6 2023-12-01 I have left the machine off whilst I post here. Any help would be appreciated. - ShitTheBed Quote Link to comment
itimpi Posted January 15 Share Posted January 15 The parity drives will remain disabled until you go through the process of rebuilding them. If you want to try rebuilding parity back to the same drives then the process is covered here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page. The error you posted looks rather like a cabling or power issue so you should check that before attempting any rebuild. The system’s diagnostics might also give clues as to what was happening. Quote Link to comment
ShitTheBed Posted January 15 Author Share Posted January 15 Hi Itimpi, Thank you for your prompt response! I have the diagnostics, I collected them before restart. Find them attached. Can you guide me on where to look for clues? Thanks again vault-diagnostics-20240115-1723.zip Quote Link to comment
JorgeB Posted January 15 Share Posted January 15 Jan 15 01:32:11 VAULT kernel: ata1: softreset failed (1st FIS failed) Jan 15 01:32:11 VAULT kernel: ata1: hard resetting link Jan 15 01:32:16 VAULT kernel: ata6: link is slow to respond, please be patient (ready=0) Jan 15 01:32:16 VAULT kernel: ata2: softreset failed (1st FIS failed) Jan 15 01:32:16 VAULT kernel: ata2: hard resetting link Jan 15 01:32:20 VAULT kernel: ata5: softreset failed (1st FIS failed) Jan 15 01:32:20 VAULT kernel: ata5: hard resetting link Jan 15 01:32:20 VAULT kernel: ata6: found unknown device (class 0) ### [PREVIOUS LINE REPEATED 1 TIMES] ### Jan 15 01:32:21 VAULT kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jan 15 01:32:21 VAULT kernel: ata1: softreset failed (1st FIS failed) Jan 15 01:32:21 VAULT kernel: ata1: hard resetting link Jan 15 01:32:26 VAULT kernel: ata6.00: qc timeout after 5000 msecs (cmd 0xec) Jan 15 01:32:26 VAULT kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x5) Jan 15 01:32:26 VAULT kernel: ata6.00: revalidation failed (errno=-5) Jan 15 01:32:26 VAULT kernel: ata6: hard resetting link Jan 15 01:32:30 VAULT kernel: ata5: softreset failed (1st FIS failed) Jan 15 01:32:30 VAULT kernel: ata5: hard resetting link Jan 15 01:32:36 VAULT kernel: ata6: softreset failed (1st FIS failed) Jan 15 01:32:36 VAULT kernel: ata6: hard resetting link Jan 15 01:32:46 VAULT kernel: ata6: softreset failed (1st FIS failed) Jan 15 01:32:46 VAULT kernel: ata6: hard resetting link Jan 15 01:32:51 VAULT kernel: ata2: softreset failed (1st FIS failed) Jan 15 01:32:51 VAULT kernel: ata2: limiting SATA link speed to 3.0 Gbps Jan 15 01:32:51 VAULT kernel: ata2: hard resetting link Jan 15 01:32:56 VAULT kernel: ata1: softreset failed (1st FIS failed) Jan 15 01:32:56 VAULT kernel: ata1: limiting SATA link speed to 3.0 Gbps Jan 15 01:32:56 VAULT kernel: ata1: hard resetting link Jan 15 01:32:56 VAULT kernel: ata2: softreset failed (1st FIS failed) Jan 15 01:32:56 VAULT kernel: ata2: reset failed, giving up Jan 15 01:32:56 VAULT kernel: ata2.00: disable device Problem with all 4 disks connected to the onboard controller at the same time, unless they share a splitter it could be a controllers issue. Quote Link to comment
ShitTheBed Posted January 15 Author Share Posted January 15 Very insightful. I see. The case im using is UNAS-810a. It comes with a built in controller. From the pic it looks like two x4 controllers: i will find out if the 4 ones with issues were on same controller. i take it that the two non parity disks were fine after the 'issue', but parity cant tolerate the interuption? Quote Link to comment
itimpi Posted January 15 Share Posted January 15 Just now, ShitTheBed said: take it that the two non parity disks were fine after the 'issue', but parity cant tolerate the interuption? Unraid starts disabling drives when they drop offline, but only disables the number equivalent to the number of parity drives. I think it was just that the parity drives were the first ones noticed. Quote Link to comment
ShitTheBed Posted January 15 Author Share Posted January 15 Aha i see! I am going to fully back up data first. Will see if i can source a new controller, it might be tricky as its part of the (expensive!) case. Im guessing a brown out aswell as a psu issue could cause this? Quote Link to comment
ShitTheBed Posted January 15 Author Share Posted January 15 UPDATE: I zerod in on which physical drives were disabled. As JorgeB highlighted, the log shows 4 going down. I've bolded them in the list below: ata1 => sdb 1TB SSD ata2 => sdc 1TB SSD ata5 => sdd HDD (parity 2) ata6 => sde HDD (parity) ata9 => sdf HDD ata10 => sdg HDD ata11 => sdh HDD ata12 => sdi HDD ata13 => sdj HDD ata14 => sdk HDD The top two SSD's form a 1tb mirrored cache. The bottom 8 HDD form an array using two of these backplanes: https://www.u-nas.com/xcart/cart.php?target=product&product_id=17703 I thought for sure the 4 effected drives were going to be 4 HDD's on one backplane. It wasn't! It was both SSD's (not on a backplane - they go directly into mobo) that form the cache and both parity drives (on one backplane). Now I will pop of case cover and check if drives share a PCIE card, or plug right into mobo, or both... Another twist - the four failed drives all plug directly into mobo! The other 6 HDD's plug into the PCIE card: I'm now thinking power issues. PSU or unstable power coming into PSU. I tell my wife about the issue, and how it happened at 1:31 am. She thinks she went into garage (where NAS is) at that time for midnight weatabix (pregnant). She checks fitbit, the watch that tracks sleep, and confirms she did indeed! We had an electrician in a week ago who installed lots of automatic lights in garage... I'm now thinking the new light set up isnt playing nice with the NAS. I will leave them always on for now, so no power spikes. I will look into how it is wired, and if I can get hardware to monitor power supply to NAS. I will do another update if I can confirm lights causing power spikes. Thanks for the leads guys! Quote Link to comment
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.