ZipsServer Posted August 26, 2022 (edited)
I recently moved my Unraid server to a new location and was having I/O issues on some of my external drives that seemed to have been caused by a loose internal USB connector. I think I fixed the external drive issues [so I went to add a parity drive], and then all of a sudden two of my array drives (disk3 first and then disk2) started to have I/O errors [when I tried to start the array and build parity]. I checked the cables and even switched drives around in my hot swap cages to see if it was a cable problem. I then tried running a btrfs scrub on disk3, which aborted itself. I turned off the machine to switch drives/cables around again, restarted, and now disk3 is unmountable.
The most recent diagnostics were taken with disk3 unmountable. The earlier diagnostics are from when I was having I/O errors but the disk was still mounted. It seems like a hardware problem somewhere, but it seems unlikely that I would have multiple drives fail like this.
EDIT: I feel like I am chasing my tail here, so I want to stop before I do any damage.
mastertower-diagnostics-20220826-1636.zip mastertower-diagnostics-20220826-1526.zip
Edited August 27, 2022 by ZipsServer: added details about parity drive
ZipsServer (Author) Posted August 26, 2022
One pass of Memtest completed without any errors.
JorgeB Posted August 27, 2022
Parity and disk3 are both disabled, which by itself is strange, and without parity disk3 can't be emulated. Do a new config with all disks except parity, start the array, then post new diags.
ZipsServer (Author) Posted August 27, 2022
Thanks! Here are the new diags. As you will see in the diags,
- I have my external drive pools disconnected for now.
- I ran an extended self test overnight on disk3 and it showed no errors.
Should I try to add parity back in now to see what happens?
mastertower-diagnostics-20220827-1451.zip
ZipsServer (Author) Posted August 27, 2022
... yesterday I was trying to move my most important photos/videos off disk3 and it appears that those files are now gone/missing.
trurl Posted August 27, 2022
3 hours ago, ZipsServer said: "move my most important photos/videos off disk3"
Move them where? The best idea is to copy (not move) to somewhere off the array. Moving means you are deleting from the source, which may not be working well, and it is better not to move or copy to other array disks unless the array is working well with redundancy.
Were any disks disabled while you were trying to do this move? Any changes to an emulated disk would be lost, since you only have the contents of the physical disks now.
All disks are mounted now, including disk3, which shows it is 88% full.
ZipsServer (Author) Posted August 28, 2022
I tried moving them from disk3 to disk9. I used "rsync -av --remove-source-files /mnt/disk3/folder-path /mnt/disk9/", which I entirely regret now. disk3 was not disabled at that time, but there were I/O errors, which is why I was trying to move those files off. rsync started to give errors that it couldn't copy the files and said something like "will try again".
It is also embarrassing to admit that I was running the array without a parity drive because I had issues adding one a month or so ago. I forget the exact issues that prevented me from adding the parity.
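For later readers, the copy-first approach trurl describes can be sketched as a small shell helper. This is only a minimal sketch, not something anyone in the thread ran: `copy_and_verify` is a hypothetical name, it copies with `rsync -a` and deliberately leaves out `--remove-source-files` (falling back to `cp -a` if rsync is not installed), and it assumes the destination is empty or holds only an earlier copy of the same folder.

```shell
#!/bin/sh
# Hypothetical helper: copy a folder, verify the copy, and never touch the source.
copy_and_verify() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Copy only. Nothing is deleted from the source, unlike --remove-source-files.
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src"/ "$dst"/ || return 1
    else
        cp -a "$src"/. "$dst"/ || return 1
    fi
    # Compare both trees byte for byte; diff exits non-zero on any difference.
    diff -r "$src" "$dst" >/dev/null || return 1
    echo "verified"
}
```

It would be called as `copy_and_verify /mnt/disk3/folder-path /path/off/array/folder-path`, where the second path is a placeholder. Source files would only be removed by hand afterwards, once "verified" has been printed and the array is healthy again.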
trurl Posted August 28, 2022
On 8/26/2022 at 4:48 PM, ZipsServer said: "external drives that seemed to have been caused by a loose internal USB connector."
USB is NOT recommended for the array or pools for many reasons, some of which you have already experienced.
ZipsServer (Author) Posted August 29, 2022
Yes, I know external disks are not recommended for the array or pools. It is an unfortunate stopgap measure at the moment. However, I am running all of those external pools in btrfs single disk mode, so there is no RAID.
I tried adding a parity disk back to the setup, but disk2 and disk3 are still erroring out when trying to build parity. I have attached new diags. This makes no sense, since the SMART test showed no problems and the disks do not error when there is no parity disk.
mastertower-diagnostics-20220828-2025.zip
ZipsServer (Author) Posted August 29, 2022
I am going to keep the parity disk out of the mix for the moment, move all the data off disk2 and disk3, reformat disk2 and disk3, and then try adding parity back in. Any thoughts or insight on this series of events or plans?
JorgeB Posted August 29, 2022
It doesn't look like a disk problem. Try updating the LSI firmware to the latest version; if the errors persist, try connecting disk3 to the onboard SATA (you can swap it with another disk).
trurl Posted August 29, 2022
12 hours ago, ZipsServer said: "Any thoughts or insight on this series of events or plans?"
If those hardware/firmware changes don't help, a better approach would be to rebuild parity without those disks first, then use Unassigned Devices to copy their data. No formatting until you have a working array with parity and with all your data.
ZipsServer (Author) Posted August 30, 2022 (edited)
Thanks everyone. Last night I ran an rsync command to copy all the contents from disk3 to another disk (disk8). That completed with zero errors, so it does seem to be something weird with adding the parity disk. I will update the LSI firmware and then retry adding parity.
EDIT: Updated the LSI firmware (wow, that was easier than the first time I did it years ago, thanks JorgeB!), but I am still running into the same issues with disk2 and disk3 when adding parity. Diags attached.
mastertower-diagnostics-20220829-2146.zip
Edited August 30, 2022 by ZipsServer
ZipsServer (Author) Posted August 30, 2022
Switched disk2 and disk3 to the SATA connections on the motherboard. There seem to be problems preventing disks from being unmounted and the array from stopping (failed command: READ FPDMA QUEUED). Diags attached. I am probably going to have to do a hard shutdown.
mastertower-diagnostics-20220829-2204.zip
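When the syslog fills up with errors like the one quoted above, it can help to tally them per ATA link to see whether they follow a disk or a port. The following is only a rough sketch: `count_ata_failures` is a hypothetical helper, and it assumes the kernel's usual libata line shape, where the link name (for example `ata5.00`, an illustrative number) precedes `failed command:`.

```shell
#!/bin/sh
# Hypothetical helper: count "failed command" lines per ATA link in a log file.
# Assumes lines such as "... ata5.00: failed command: READ FPDMA QUEUED".
count_ata_failures() {
    grep 'failed command:' "$1" |
        grep -o 'ata[0-9][0-9]*\.[0-9][0-9]*' |
        sort | uniq -c | sort -rn
}
```

A call such as `count_ata_failures /var/log/syslog` (assuming that is where the live log sits) prints one line per link with its count, busiest first. The ataN number is the kernel's link number, not the Unraid disk number, so it still has to be matched to a drive by hand.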
ZipsServer (Author) Posted August 30, 2022
Not sure what happened, but somehow the disks were unmounted and the array stopped.
mastertower-diagnostics-20220829-2208.zip
JorgeB Posted August 30, 2022
The syslog stopped writing because of spam, so I cannot see what happened after switching the disks to the onboard controller. Please reboot and post new diags after array start.
ZipsServer (Author) Posted August 31, 2022 (edited)
@JorgeB Same behavior this time; however, I think I got the diags before it spammed the syslog too much. The array runs perfectly normally without the parity disk. I have now copied all data from disk2/3 onto other disks in the array with no issues.
Googling some of the errors returns this, which suggests these problems are from bad SATA cables... but I am not sure how to interpret this in the context of these errors only happening when I add a parity drive...
mastertower-diagnostics-20220830-2152.zip
Edited August 31, 2022 by ZipsServer
JorgeB Posted August 31, 2022
It does look like a cable problem, but there were also issues when connected to the HBA with different cables. Did you also replace/swap the power cable?
ZipsServer (Author) Posted September 1, 2022 (edited)
I have not replaced any cables, I did not swap the power cables, nor have I used different SAS cables to connect the drives to the HBA. However, I did swap the drives around in the hot swap cages at the very beginning, before I posted in the forum, which would have swapped both SATA and power connections. I then also connected the drives straight to the motherboard as requested, which was the most recent configuration.
The fact that there are only errors when I try to add a parity disk still doesn't make sense to me. Is there any way to check the HBA or motherboard for problems?
Edited September 1, 2022 by ZipsServer
ZipsServer (Author) Posted September 1, 2022 (edited)
I swapped the parity disk and disk1 between hot swap cages. Now disk1, disk2, and disk3 are erroring. disk1, disk2, and disk3 are all in hot swap cage 1, which is connected to the HBA card on port/connector 0 via a SAS breakout cable. Maybe the SAS cable randomly went bad?
I could order a new HBA with new SAS cables. I probably need to do this anyway so I can get my external drives properly added to the array. Are there any other things to check before deciding to buy new equipment?
mastertower-diagnostics-20220831-2028.zip
Edited September 1, 2022 by ZipsServer
itimpi Posted September 1, 2022
Are you sure your PSU is up to handling that many drives? Do you have any splitters used in the power supply cabling to the drives?
JorgeB Posted September 1, 2022
7 hours ago, ZipsServer said: "I have not replaced any cables, I did not swap the power cables, nor have I used different SAS cables to connect the drives to the HBA."
Yes, but you connected disk3 to the onboard SATA, so the SATA cable must have been a different one, and the errors followed the disk. Was the power cable the same at that time?
ZipsServer (Author) Posted September 2, 2022
@JorgeB Correct, I connected disk3 to onboard SATA with a different generic SATA cable. The power cable was the same. I have not moved power cables around. disk3 and the other drives with errors are all in the same hot swap cage, so the power cables are connected to the hot swap cage.
@itimpi I don't have any brownout or similar problems when I spin up all the drives at once, so I doubt the power rating is the problem. It is an 8+ year old system that has been running 24/7 most of the time, so maybe the PSU is going bad? It is, however, connected to a battery backup with a power conditioner on it.
kizer Posted September 2, 2022
Are you running any power splitters, like a Molex to multiple SATA, or anything else out of the norm? I'm only asking because I had two drives go "bad" when it was just a bad cable feeding two different drives. A drive cage (the box that holds multiple drives) with a bad power connector or SATA connector can cause issues too.
trurl Posted September 2, 2022
26 minutes ago, ZipsServer said: "don't have any brownout or similar problems when I spin up all the drives at once"
I wouldn't expect your lights to flicker, if that's what you mean. That doesn't mean the PSU can supply the power required.
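The PSU question raised above comes down to simple arithmetic on the 12 V rail, which can be sketched as follows. This is a rough sketch only: `spinup_headroom_ma` is a hypothetical helper, every input has to be read off the actual PSU and drive labels or datasheets, and it ignores everything else drawing from the same rail.

```shell
#!/bin/sh
# Hypothetical helper: worst-case 12 V headroom if all drives spin up together.
# All values are in milliamps and must come from the PSU and drive labels.
spinup_headroom_ma() {
    rail_ma=$1        # 12 V rail rating of the PSU
    drives=$2         # number of drives that may spin up at the same time
    per_drive_ma=$3   # peak 12 V spin-up current of a single drive
    echo $(( rail_ma - drives * per_drive_ma ))
}
```

For example, `spinup_headroom_ma 30000 12 2000` uses placeholder figures (a 30 A rail and twelve drives at 2 A each), not measurements from this system. A result near zero or negative would suggest the rail is marginal during spin-up even when nothing visibly browns out.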