unraid intermittent drive failure

September 26, 2011

Hello,

I’ve been experimenting with unRAID over the last few weeks. I built a new machine to run it; the specs are:

Motherboard – Gigabyte LGA 1155 Intel H67

CPU – Core i3 2100 3.1GHz

RAM – Kingston 4GB kit

PSU – Corsair enthusiast series 650-Watt 80 Plus

HD – 7x WD 2TB Caviar Green SATA II, 64MB Cache

HD Tray – Supermicro 5-in-3

Storage Card - AOC-SASLP-MV8

I’m running unRAID v4.7 Basic.

I quickly assembled the parts and had a working unRAID test system. I precleared 3 drives, set up a parity drive as well as disk 1 and disk 2, set up a few shares, and started copying data. Disks 1 and 2 reside in the Supermicro cage (along with my other 3 drives, which are not in use yet). The parity drive is outside the Supermicro cage. These were all originally connected to the SAS card, but as you read through the story and debug steps, this changes.

Everything worked fine for a few weeks while I experimented with setting up PS3 media streaming, and playing around with other plugins.

I was convinced the system was fine, so one day I started to preclear other drives in anticipation of upgrading to Plus. I was very careful to make sure I was not preclearing a drive that was part of the active disk set, based on the serial number. I am 100% sure of this. However, as soon as I kicked off the preclear command, the entire array went offline.

(Note: all of the data on the unRAID system is test data. I have not deleted anything from my previous machines, so I am not out any data here. Please take that into account as part of your analysis of my situation.)

I had trouble accessing the main unraid menu, so I had no choice but to cold boot the machine. When unraid came back up, drive 1 was listed as dead. I removed the drive from the array, rebooted, and re-added the drive to the array. I then kicked off the rebuild. It started and ran for about 4 days, reaching as high as 75%, at which point all drives would go offline, and I would have to restart this process.

I thought maybe the disk was bad, so I precleared another disk and tried using that disk for the rebuild, but the same thing happened again.

I thought maybe the cables were not connected properly, so I removed and re-attached all cables, same result.

I thought maybe there is a problem with the SAS card, so I connected all drives to the MB directly, same result.

I thought maybe the problem was the Supermicro 5-in-3, so I removed disks 1 and 2 from the Supermicro cage (still connected directly to the MB, bypassing the SAS card), and IT WORKS. The drive rebuilt in about 10 hrs overnight, and the array has been working fine all weekend. The parity check comes back fine. Everything seems OK.

So my conclusion is that the Supermicro cage must be bad. However, I’m not sure why the problem would only show up when preclearing or when doing a disk rebuild. In all other situations (copying data to and from the NAS, etc.), it was fine. Does my conclusion that the cage is bad seem OK to you guys? Unfortunately I have no logs to track the errors I was seeing. I’m not an unRAID expert, but I remember the errors being along the lines of drive not found or not responding.

So should I just go out and buy another Supermicro? Is there anything else you would check or try to narrow down the problem, beyond what I have already done?

Also, a few bonus questions:

B1 – I assume that preclearing is not necessary for a disk that will be used for a rebuild, since every block on the disk will be written as part of the rebuild process, using data computed from disk 2 and the parity disk. Is this correct?
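
To make the reasoning behind B1 concrete, here is a minimal sketch of how I understand single-parity reconstruction to work. It is a toy model in Python, not unRAID's actual code, and it assumes parity is a plain byte-wise XOR across the data disks. The names `xor_blocks`, `disk1`, `disk2`, and `stale_replacement` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR several equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical toy contents standing in for two data disks.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])

# Assumption: parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2])

# A replacement disk with arbitrary leftover contents (i.e. not precleared).
stale_replacement = bytes([0xFF, 0xFF, 0xFF, 0xFF])

# The rebuild overwrites every byte with a value computed only from the
# surviving disk and parity; the stale contents are never read.
rebuilt_disk1 = xor_blocks([parity, disk2])

print(rebuilt_disk1 == disk1)
```

If that model is right, the old contents of the replacement disk never enter the calculation, which is why I think preclearing should only matter as a burn-in test here.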

B2 – I assume that the cache drive does not need to be precleared, since it will not be used for parity calculations, is that correct?

B3 – I assume I do not suffer from the Gigabyte HPA problems, as my disks are the same size as reported in another thread (1,953,514,552), and this size is the same as reported when connected to the SAS card as when connected directly to the MB. Does that assumption seem sound?

Thanks for any help you can provide,

-Rav

September 26, 2011

Disable all add-ons and see if it works. Post a syslog.