Unexplained disk errors when adding parity disk



I recently moved my Unraid server to a new location and was having I/O issues on some of my external drives that seemed to have been caused by a loose internal USB connector. I think I fixed the external drive issues [so I went to add a parity drive] and then all of a sudden two of my array drives (disk3 first and then disk2) started to have I/O errors [when I tried to start the array and build parity]. I checked the cables and even switched drives around in my hot swap cages to see if it was a cable problem. I then tried running a btrfs scrub on disk3, which aborted itself. I turned off the machine to switch drives/cables around again, restarted, and now disk3 is unmountable.

 

The most recent diagnostics is with disk3 unmountable. The earlier diagnostics is when I was having I/O errors, but the disk was still mounted.

 

It seems like a hardware problem somewhere, but it seems unlikely that I would have multiple drives fail like this. EDIT: I feel like I am chasing my tail here so I want to stop before I do any damage.

mastertower-diagnostics-20220826-1636.zip mastertower-diagnostics-20220826-1526.zip

Edited by ZipsServer
added details about parity drive
Link to comment
3 hours ago, ZipsServer said:

move my most important photos/videos off disk3

Move them where?

 

Best idea is to copy (not move) to somewhere off the array. Moving means you are deleting from the source, which may not be working well, and better if you don't move or copy to other array disks unless the array is working well with redundancy.
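The copy-first advice above can be sketched as a small shell helper. This is only a sketch under assumptions: `copy_then_verify` is a made-up name, the paths are placeholders, and it assumes `rsync` and `diff` are both available on the server. It copies without `--remove-source-files` and then compares the two trees, so nothing is deleted from a source disk that may be failing.

```shell
#!/bin/sh
# Sketch only: copy a folder, then confirm the copy matches, without deleting anything.
# copy_then_verify is a hypothetical helper, not part of Unraid or rsync.
copy_then_verify() {
    src=$1
    dst=$2

    # Plain archive copy. No --remove-source-files, so the source stays intact
    # even if the copy stops partway through with I/O errors.
    rsync -a "$src"/ "$dst"/ || { echo "copy failed" >&2; return 1; }

    # Re-read both trees and compare contents; diff exits non-zero on any mismatch.
    diff -rq "$src" "$dst" >&2 || { echo "verify failed" >&2; return 2; }

    echo "verified"
}
```

It could be called as `copy_then_verify /mnt/disk3/folder-path /path/off/the/array/folder-path`, where the second path is a placeholder for a destination outside the array. Source files would only be deleted by hand, and only after the helper prints `verified`.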

 

Were any disks disabled while you were trying to do this move? Any changes to an emulated disk would be lost, since you only have the contents of the physical disks now.

 

All disks mounted now, including disk3 which shows it is 88% full.

 

Link to comment

Tried moving them from disk3 to disk9. I used "rsync -av --remove-source-files /mnt/disk3/folder-path  /mnt/disk9/" which I entirely regret now. disk3 was not disabled at that time, but there were I/O errors which is why I was trying to move those files off.

 

rsync started to give errors that it couldn't copy the files and said something like "will try again".

 

It is also embarrassing to admit that I was running the array without a parity drive because I had issues adding one a month or so ago. I forget the exact issues that prevented me from adding the parity.

Link to comment

Yes, I know external disks are not recommended for the array or pools. It is an unfortunate stopgap measure at the moment. However, I am running all of those external pools in btrfs single disk mode so there is no RAID.

 

I tried adding a parity disk back to the setup, but disk2 and disk3 are still erroring out when trying to build parity. I have attached new diags. This makes no sense, since the SMART test showed no problems and the disks do not error when there is no parity disk.

mastertower-diagnostics-20220828-2025.zip

Link to comment
12 hours ago, ZipsServer said:

Any thoughts or insight on this series of events or plans?

If those hardware/firmware changes don't help, a better approach would be to rebuild parity without those disks first, then use Unassigned Devices to copy their data. No formatting until you have a working array with parity and with all your data.

Link to comment

Thanks everyone. Last night I ran an rsync command to copy all the contents from disk3 to another disk (disk8). That completed with zero errors. So it does seem to be something weird with adding the parity disk.
 

I will update the LSI firmware and then retry adding parity.

EDIT: Updated LSI (wow that was easier than the first time I did it years ago, thanks JorgeB!) but I am still running into the same issues with disk2 and disk3 when adding parity. diags attached

mastertower-diagnostics-20220829-2146.zip

Edited by ZipsServer
Link to comment

@JorgeB Same behavior this time, however I think I got the diags before it spammed the syslog too much.

 

The array runs perfectly normally without the parity disk. I have now copied all data from disk2/3 onto other disks in the array with no issues.

Googling some of the errors returns this, which suggests these problems are from bad SATA cables, but I am not sure how to interpret that when the errors only happen when I add a parity drive.
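One way to look for cabling trouble, as opposed to a failing disk, is the SMART attribute `UDMA_CRC_Error_Count`, which many SATA drives report and which is commonly associated with cable or connector problems when its raw value keeps rising. The sketch below is not something from this thread's diagnostics: `crc_error_count` is a hypothetical helper name, `/dev/sdX` is a placeholder, and it assumes the usual `smartctl -A` attribute table where the raw value is the tenth column. Some drives do not report this attribute at all, in which case it prints nothing.

```shell
#!/bin/sh
# Sketch only: print the raw UDMA_CRC_Error_Count from `smartctl -A` output on stdin.
# crc_error_count is a hypothetical helper; it assumes the standard ATA attribute
# table layout, where the attribute name is field 2 and the raw value is field 10.
crc_error_count() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Example (device name is a placeholder):
#   smartctl -A /dev/sdX | crc_error_count
```

Comparing the value before and after a parity build attempt would show whether the count is climbing on the disks that error.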

mastertower-diagnostics-20220830-2152.zip

Edited by ZipsServer
Link to comment
  • ZipsServer changed the title to Unexplained disk errors when adding parity disk

I have not replaced any cables, I did not swap the power cables, nor have I used different SAS cables to connect the drives to the HBA. However, I did swap the drives around in the hot swap cages at the very beginning before I posted in the forum, which would have swapped both SATA and power connections. I also connected the drives straight to the MB as requested, which was the most recent configuration.

The fact that there are only errors when I try to add a parity disk still doesn't make sense to me. Is there any way to check the HBA or MB for problems?

Edited by ZipsServer
Link to comment

I swapped the parity disk and disk1 between hot swap cages. Now disk1, disk2, and disk3 are erroring.

disk1, disk2, and disk3 are all in hot swap cage 1, which is connected to the HBA card on port/connector 0 via a SAS breakout cable. Maybe the SAS cable randomly went bad?

I could order a new HBA with new SAS cables. I probably need to do this anyway so I can get my external drives properly added to the array.

 

Any other things to check before deciding to buy new equipment?

mastertower-diagnostics-20220831-2028.zip

Edited by ZipsServer
Link to comment
7 hours ago, ZipsServer said:

I have not replaced any cables, I did not swap the power cables, nor have I used different SAS cables to connect the drives to the HBA.

Yes, but you connected disk3 to the onboard SATA, so the SATA cable must have been a different one, and the errors followed the disk. Was the power cable the same at that time?

Link to comment

@JorgeB Correct, I connected disk3 to onboard SATA with a different generic SATA cable. The power cable was the same. I have not moved power cables around. disk3 and the other drives with errors are all in the same hot-swap cage so the power cables are connected to the hot-swap cage.

@itimpi I don't have any brownouts or similar problems when I spin up all the drives at once, so I doubt the power rating is the problem. It is an 8+ year old system that has been running 24/7 most of the time, so maybe the PSU is going bad? Although it is connected to a battery backup with a power conditioner on it.

Link to comment

Are you running any power splitters, like a Molex to multiple SATA, or anything else out of the norm?

I'm only asking this because I had two drives go "bad" when it was just a bad cable to 2 different drives. 

 

Or 

 

A drive cage (the box that holds multiple drives) with a bad power connector or SATA connector can cause issues too.

Link to comment
