[Solved] Errors on Disk 8 (AGAIN!) but During Parity Rebuild



Sorry for another lengthy post... let this be a warning to others trying to make use of older hardware.

 

Sigh... as much as I like unRAID, my ongoing drive connection issues are driving me crazy. Today I did a scheduled weekly reboot of my unRAID server. Alas, when it came back I had 3 drives missing. I have 2 parity drives, but that only allows for recovery of 2 devices.

 

The array tried to start and appeared to do an extremely accelerated 'parity check/rebuild' for Disk 8, one of the 3 that went missing. It took about 20 minutes to 'parity rebuild' 8TB... it was obviously not doing anything, as there were no reads or writes happening on it or the rest of the drives in the array.

 

At the end of this 'accelerated' rebuild, Disk 8 went 'green' but its filesystem showed as not mountable and of course no used/free space was reported. Disk 12 (4TB HGST) and Disk 15 (the new 10TB that was still empty) were also shown as unmountable. In the usual spot on the Main tab, all 3 drives were listed as needing a 'format' by unRAID. I did NOT select the Yes box to allow formatting of the 3 drives.

 

At this point I made sure Disk Settings had the array set to NOT autostart. I powered down and pulled Disk 8, which has now lived on 3 separate 8TB drives that are all 2 - 3 months old and all test fine. I then attached the pulled drive to my Ubuntu system and it recognized it as 8TB with an XFS filesystem. Further checks revealed what appear to be intact contents... movies, TV shows, documents, pictures, etc. all seem fine.
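
In case it helps anyone doing a similar sanity check, this is roughly how I looked at the pulled drive on the Ubuntu box (device name and mount point are just examples - substitute your own). A read-only mount keeps the data untouched:

lsblk -o NAME,SIZE,FSTYPE,MODEL,SERIAL      # identify the pulled drive and its XFS partition
sudo mkdir -p /mnt/check
sudo mount -o ro /dev/sdX1 /mnt/check       # read-only mount; add ,norecovery if it complains about a dirty log
ls /mnt/check                               # browse the contents
sudo umount /mnt/check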

 

So for now my unRAID is powered down, and to avoid losing the apparently intact data on the pulled Disk 8, I'm using ddrescue on my Ubuntu system to copy the 'bad' 8TB to a new 8TB. This should take about 15 hrs. I'll also do the same with the 4TB HGST drive that went missing.
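
For anyone else in the same boat, the clone is being done along these lines (the device names and the map-file name are examples only - triple-check them, since ddrescue will happily overwrite the wrong target):

sudo ddrescue -f -n /dev/sdX /dev/sdY disk8.map     # pass 1: copy everything readable, skip scraping bad areas
sudo ddrescue -f -r3 /dev/sdX /dev/sdY disk8.map    # pass 2: retry the bad areas recorded in the map file

The -f is needed because the destination is a block device, and the map file lets the copy resume if it gets interrupted.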

 

The third drive that went missing was the new 10TB that had been added a few days ago; nothing had been copied to it yet, so it was still empty, but it now shows as missing. The other was one of my 4TB HGST drives (Disk 12), which is now just unmountable; I have yet to check whether Ubuntu can mount it. So my first steps towards recovery are:

 

1. Let ddrescue completely copy the 'bad' disk 8 to another new 8TB drive.

2. Use ddrescue to copy the 'bad' disk 12 to another 4TB drive.

 

While these copies complete, I've been researching options for a different case. Alas, it seems that with my current budget (I'm on disability income) I'm stuck with either a new Norco or a Rosewill. As much as I'm disappointed in my 6+ year old Norco RPC-4220, it's looking likely that I'm going to take a chance on a new one, though this time I'll go for the RPC-4224 model with 4 more drive bays. I already have a genuine LSI controller on the way, so it combined with a new Norco should hopefully rectify the connection issues.

 

I also sourced the Rosewill RSV-L4500, which can take 15 3.5" drives. I haven't purchased it yet, but as I can't trust my old Norco RPC-4220, I'm thinking this could be a temporary solution until I can save up the cash for a new Norco RPC-4224. As the Rosewill doesn't have a SAS/SATA backplane for any of the 15 bays, I'll need to grab some new mini-SAS (SFF-8087; host/initiator) to 4 SATA (target; individual drives) breakout cables. The cables I have are the reverse - 4 SATA (host) to SFF-8087 (target/mini-SAS backplane). I believe my existing cables are a 'crossover' breakout meant to hook up 4 discrete SATA ports from the motherboard/controller to a mini-SAS backplane. I could look at re-wiring my existing cables, but that's too much work.

 

I'm still trying to decide if I should get the Rosewill and new mini-SAS to 4 SATA breakout cables. Since they're only needed temporarily, I may just leave unRAID offline until I can afford the replacement Norco RPC-4224. My other thought was to acquire 4 of the hard drive cages that convert 3 x 5.25" bays to 5 x 3.5" bays - Supermicro, Icy Dock, iStarUSA and Norco all make these, but 4 of them cost more than a new Norco RPC-4224.

 

My last option is to temporarily create a new unRAID config that only uses direct cabling (no backplanes), albeit much reduced in capacity. I can transfer a bunch of the drives to USB enclosures and mount them using UD for the interim. I could create a dual-parity unRAID with a 1TB cache SSD, a 512GB SSD UD-mounted for Docker/VMs, and the new 10TB and 8TB drives for data.

 

Sad that I only ever used 8 bays out of the 20 when this case hosted my FreeNAS, but the unused bays are now wreaking havoc, likely due to oxidation and dirt buildup on the connectors over 6+ years. Any thoughts or suggestions, other than to stop trying to use my old Norco?

 

 

 

 


Thanks @Benson... I plan to do this. The mini-SAS SFF-8087 to 4 SATA breakout cables have been ordered and should be here tomorrow (Amazon Prime). I have 4 more of the reverse cables that I could likely have re-wired, but the effort and time aren't worth it, so I just ordered new ones.

 

I'm still not sure the OEM LSI adapter from China is functioning properly, but as mentioned I did order a genuine retail-packaged controller and it's on the way as well. Of course there's also the need for power splitters, which I have lots of... the backplanes took standard 4-pin Molex power connectors, so I'll have to use a bunch of Molex to SATA splitters to power all of the drives.

 

My ddrescue on the 8TB drive is 80% complete, and so far no errors have been found, so I may get lucky and just have to copy the data back to the array.

 

I did have one other thought - if the 8TB drive is OK all the way through, I might consider creating a new unRAID configuration once I'm direct-cabled to the drives. To my understanding, all the drives that have been members of my existing config (XFS-formatted by unRAID) can be re-imported into the new config without losing the data. I'll just have to let parity rebuild on the dual parity drives. I'll do more reading on this, but that seems like the easiest way to get back to a functional array.

 

2 hours ago, AgentXXL said:

To my understanding, all the drives that have been members of my existing config (XFS-formatted by unRAID) can be re-imported into the new config without losing the data. I'll just have to let parity rebuild on the dual parity drives. I'll do more reading on this, but that seems like the easiest way to get back to a functional array.

Sure, no matter what the file system is - sometimes I re-arrange drives and then rebuild parity. Please also take care with the power splitters/adapters; I like to build all of mine DIY. Good luck.

Edited by Benson

An update and more questions for anyone following this saga. The cables that will let me remove the failing backplanes haven't arrived yet, but they're out for delivery. In the meantime, I went ahead and did the disk re-enable procedure for both the 'bad' 8TB and 4TB drives.

 

I was expecting this to require a data rebuild, but as I mentioned a few posts ago, when the 3 disks went missing after my reboot the system was still configured to auto-start the array. Somehow it ran an extremely accelerated 'parity rebuild' of that 8TB drive - it took 22 mins compared to the normal 12 - 15 hrs. It didn't actually do anything, as no reads/writes were incrementing for any disks other than the parity drives, which only saw reads increasing. When it finished, the failed/missing Disk 8 went from a red X to a green dot but was still showing as unmountable.

 

As mentioned a few posts back, the 8TB seemed fine when attached to my Ubuntu system, and I was able to successfully clone it to another new 8TB drive. I then re-installed the original 8TB (with the matching serial number) back into the array, but in a bay that's attached to the motherboard SATA ports. Yet somehow, after re-installing the drive, it came back online on the next start of the array with no data rebuild. And it mounts fine, with all files appearing to be intact. Not a bad thing, but puzzling, as I still don't know how unRAID did this.

 

That left me with 2 disks that were unmountable and still needed rebuilding, and both show up under the 'Format' option below the array Start/Stop button on the Main tab. What's strange is that the 10TB disk has not been re-added to the array, as there's no sense rebuilding what was an empty drive. Stranger still, unRAID sees it as unmountable and needing a format even though it hasn't been re-added - the slot for that disk shows a red X and a 'Not installed' message.

 

The 4TB drive was also attached to a bay with a motherboard-connected SATA port. After the re-enable procedure it went ahead and did a data rebuild from parity. When finished, it still showed as unmountable and needing a format, just like the empty 10TB. I rebooted unRAID, but upon starting the array it still lists the 4TB as unmountable and needing a format.

 

I powered down and pulled the drive to see if my Ubuntu system sees it as valid, which it doesn't - it sees the XFS partition, but when I try to mount it, Ubuntu reports it as unmountable: 'structure needs cleaning'. I assume this means the filesystem was left in an unclean/damaged state from when the disk went missing, and that XFS wants a repair run before it will mount.
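
For reference, a quick way to see what XFS is unhappy about without changing anything on the disk (device name is an example; -n is a dry run that reports problems but writes nothing):

sudo xfs_repair -n /dev/sdX1                        # no-modify check of the XFS filesystem
sudo mount -o ro,norecovery /dev/sdX1 /mnt/check    # sometimes gets the data visible by skipping log replay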

 

I'm going to image the drive with ddrescue before trying any methods to clean the filesystem. I don't have any spare or new 4TB drives, so instead of using ddrescue for disk-to-disk cloning, I'll just go from the source disk to an image file. Are there any concerns with this? And are there any recommended procedures for cleaning the filesystem that are unRAID specific? I'm fairly certain the filesystem repair will be successful, and once the drive is re-installed in the array it should hopefully be seen and re-mounted properly.
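
For the curious, imaging to a file is the same ddrescue invocation with a regular file as the destination (the paths are examples) - the main concern is just having 4TB+ of free space on the destination filesystem:

sudo ddrescue -n /dev/sdX1 /mnt/storage/disk12.img /mnt/storage/disk12.map     # first pass to an image file
sudo ddrescue -r3 /dev/sdX1 /mnt/storage/disk12.img /mnt/storage/disk12.map    # retry bad sectors using the map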

 

When my cables arrive later today, I'll of course remove the backplanes and direct-wire the drives as suggested by @Benson. Once this is done and I power up, I'll confirm that all drives are 'green' and then proceed to do a New Config procedure. This will of course require the dual parity drives to be rebuilt. Hopefully after all this I'll finally have a more reliable system. I'll still have the suspect LSI controller in use, so perhaps I should wait to do the new config until I have the genuine replacement installed?

 

Any other concerns I should be aware of? Thanks again for all the help!

 

 

 

 

Edited by AgentXXL

Latest update: the new mini-SAS to 4 SATA cables arrived and I went ahead and removed the failing backplanes from my Norco enclosure. New cables and numerous Molex to SATA power splitters later, my array is back up and running. With a few caveats...

 

1st, unRAID's 'Maintenance' mode was partially successful in repairing the filesystem on the HGST 4TB drive. Alas, when I ran xfs_repair to implement the corrections, it didn't retain the original folder structure and everything got moved into numerically labelled folders inside a root folder called 'lost+found'.

 

2nd, when I re-added the 4TB back into its slot, unRAID considered it a replacement disk, so it's currently doing a rebuild from parity. As I ran the repair against the array device (/dev/md14), the parity data was kept valid even though everything got moved into the 'lost+found' folder in the root of the drive.
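
For anyone hitting the same situation, this is roughly what the check/repair amounts to from the unRAID command line with the array started in Maintenance mode - the GUI's Check button runs the same tool against the md device, which is what keeps parity in sync (md14 is my disk number; use your own):

xfs_repair -n /dev/md14     # dry run: reports what would be fixed, changes nothing
xfs_repair /dev/md14        # actual repair; if it refuses due to a dirty log it will suggest -L, which zeroes the log and can lose recent changes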

 

Once the rebuild from parity completes (in about 4 hrs), I'll then have to go into the 'lost+found' folder to rename and move items back to their original locations. Tedious, but at least it appears I didn't lose any data throughout this whole ordeal.

 

One other note: before I ran the filesystem check and subsequent repair, I did make a ddrescue image of the 4TB partition, so it should have the original folder/file names, but unfortunately the image also seems to be unmountable. I'll hold onto it in case I need to try further recovery, but I suspect I'll be able to rename and move the files and folders back to their original locations once the parity rebuild is complete.
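
If I do end up needing that image, the plan would be to loop-mount it read-only; since it was taken before the repair, it presumably needs the same treatment the disk did (paths are examples):

sudo losetup --find --show --read-only /mnt/storage/disk12.img    # attach the partition image to a loop device; prints e.g. /dev/loop0
sudo mkdir -p /mnt/image
sudo mount -o ro,norecovery /dev/loop0 /mnt/image                 # try mounting without log replay
sudo xfs_repair -n /dev/loop0                                     # or dry-run check the image via the loop device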

 

And once that step is complete, I'll then go ahead and do a 'New Config' so I can re-order and group drives. This isn't strictly necessary, but my OCD tendencies will only be sated once I do the new config.

 

The good news is that the drives that were throwing frequent UDMA CRC errors no longer appear to be having connection issues. No errors so far, and I'm halfway through the parity rebuild. This would seem to confirm that all of my connection issues were related to the backplanes in the enclosure. Now that everything is direct-cabled, all seems good!
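
For anyone watching for the same symptom, the counter in question is SMART attribute 199 (UDMA_CRC_Error_Count). It's cumulative and doesn't reset, so what matters is whether the raw value keeps climbing after re-cabling (example device name):

smartctl -A /dev/sdX    # look at the raw value of ID 199, UDMA_CRC_Error_Count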

 

Thanks again to the users who've helped me through this.

 

Edited by AgentXXL
