January 11
So I recently migrated my server to new hardware. Everything appeared to be up and running well. I had expanded my parity from 2x 4TB drives to 2x 6TB drives, and the rebuild was complete.
Today, I decided to complete the last step of the migration by moving the GPU from my old server to the new one. I shut down the new server, swapped over the card, and booted it back up. No other hardware or software changes were made.
Everything came back up fine, except that the array didn't start. (?!) I looked at the [Main] tab, and it says... well, see the screenshots, please.
I don't understand. The drive it says is "missing" is the one that's listed right there as "parity". Parity 2, which is currently empty, was previously another 6TB drive of the same model, which is now listed down under unassigned devices, although I have no way of knowing which one it was, since I have several of those drives. I was still sorting out how to handle the expansion of the array, so they were just hanging out in reserve.
1) Why would it just drop a parity drive like that?
2) How can I determine which drive was Parity 2?
3) Can I just add it back in at the top?
4) How can I prevent this from happening again?
EDIT: From looking through the logs, I'm fairly sure that (sdh) is/was Parity 2, since it's the only one that isn't flagged as 'pre-cleared'. I had run pre-clearing on the other drives to get them ready to add or swap in later.
EDIT 2: This makes no sense. Am I seeing that it's expecting both Parity and Parity 2 to be drive S/N ...DON5QL? That's not possible, right? Has something glitched here?
Edited January 11 by Elmojo: added info about Parity2
January 11 Author
2 hours ago, trurl said: RAID controller?
I assume you mean is it going bad? I don't think so, but I really wouldn't know how to check. All drives are showing as physically online, by which I mean that the lights on the front of the chassis are indicating that each drive is connected and online.
Also, the error appears to be something specific to Unraid, just from the message it's showing me on the [Main] screen. I mean, why does it show that the Parity drive is xxx5QL, but then it says that drive 5QL is 'missing'?
I thought maybe that drive got assigned to the wrong parity slot somehow, so I moved 5QL to Parity 2, and assigned drive xxxMAC (the one that I'm fairly sure was one of the parity drives before all this happened) to Parity. Now it says that Parity is "wrong" and is looking for 5QL, even though that drive is in the Parity 2 slot (see new screenshots for a clearer explanation). I tried flipping the order around, and it doesn't like that either.
I'm fairly sure that the correct assignment should be drive MAC (sdh) in the Parity slot, and drive 5QL (sdi) in the Parity 2 slot, but it says these are wrong.
At this point, can I just pick an arrangement and go with it? I know I'll have to let it rebuild the parity, again, but I just need this server back up and running ASAP.
I did not apply any of these changes, by the way.
January 11 Community Expert
Parity and parity2 do not have the same contents. They are calculated by different, independent algorithms so they can provide independent results (2 extra bits instead of 1) and so allow 2 rebuilds at the same time.
So if they aren't assigned exactly as before, both will have to be rebuilt.
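To illustrate why two parity drives cannot simply trade places, here is a minimal sketch of a RAID-6-style P+Q scheme. It is an assumption that this resembles what Unraid does internally; the thread only says the two parities use different, independent algorithms. The function names (`p_parity`, `q_parity`, `gf_mul`, `gf_mul2`) and the sample data are hypothetical and chosen for illustration only.

```python
from functools import reduce

def gf_mul2(x):
    """Multiply by 2 in GF(2^8) using the polynomial 0x11d (a common RAID-6 choice)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result

def p_parity(disks):
    """First parity: plain XOR of the bytes at each offset."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def q_parity(disks):
    """Second parity: XOR of each disk's bytes weighted by 2**slot in GF(2^8)."""
    out = bytearray(len(disks[0]))
    coeff = 1  # weight for slot 0; doubles (in the field) for each later slot
    for disk in disks:
        for offset, byte in enumerate(disk):
            out[offset] ^= gf_mul(coeff, byte)
        coeff = gf_mul2(coeff)
    return bytes(out)

# Hypothetical two-byte "disks", just to compare the two results.
disks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
print("P:", p_parity(disks).hex())
print("Q:", q_parity(disks).hex())
print("Q with slots reversed:", q_parity(disks[::-1]).hex())
```

In this sketch P and Q hold different bytes for the same data, so a drive written as one cannot be validated as the other. Q also changes when the data slots are reordered, while P does not.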
January 11 Author
Ok... I didn't know that, but how does that bear on this issue? I'm not being snarky, I genuinely don't understand.
The problem is that the machine just suddenly decided to change the parity assignments, and now it can't seem to agree on which drive was in which parity slot. As best as I can tell, it thinks that both slots, and neither one, were drive 5QL. That just sounds like a plain ol' bug to me.
Is there any sort of diagnostic (other than what I already submitted) that could show what it's thinking in the background? Is there a config file that tells it which drive is which parity or something? I'm just trying to understand how this could happen.
More urgently important at the moment, what do I need to do to get the server back up and functional? I have about 3 hours before this blows up in my face... 😬
January 11 Community Expert
My first thought was that a problem with the flash drive had kept it from recording your new drive assignments. And your diagnostics do indicate the flash has been repaired, perhaps automatically when it booted.
/boot
total 890440
drwx------ 3 root root  4096 Apr 12  2021 EFI-
-rw------- 1 root root  4096 Dec 31  1979 FSCK0000.REC
-rw------- 1 root root 24576 Dec 31  1979 FSCK0001.REC
-rw------- 1 root root  4096 Dec 31  1979 FSCK0002.REC
-rw------- 1 root root  4096 Dec 31  1979 FSCK0003.REC
-rw------- 1 root root  4096 Dec 31  1979 FSCK0004.REC
Then I saw the RAID controller and thought maybe it wasn't cleanly showing Unraid the drive serial numbers, so it was having trouble identifying them.
If you are sure of your drive assignments I guess you could New Config/Preserve All (and maybe make any missing assignments) before starting the array with the Parity Valid box checked. And then maybe do a parity check anyway just to be sure.
January 11 Author
Any of what you mentioned is possible, I guess. That worries me that you're seeing flash drive repairs. :/ I certainly didn't do anything manually, it must have been some sort of auto repair.
What did you see about the RAID controller that made you suspicious of it? Just the fact that it is present? It should be in HBA/IT mode, and only passing the drive data through to the OS. I'm not using any of the actual RAID features, unless I did something wrong. ;)
I'm not 100% sure of the drive assignments, more like 85-90%. What happens if I do the new config thing, and I get it wrong? Will it corrupt my array data, or will I just maybe have to rebuild the parity from scratch?
January 11 Community Expert
16 minutes ago, Elmojo said: It should be in HBA/IT mode
I don't think so
19:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02)
    DeviceName: Integrated RAID
    Subsystem: Dell PERC H330 Adapter [1028:1f44]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
January 11 Author
1 minute ago, trurl said: I don't think so
You don't think it is, or you don't think it should be?
January 11 Community Expert
Also, those 35000 numbers tacked onto the end in your screenshots. And these 35000 are also the filenames of the SMART reports instead of the serial number.
January 11 Author
1 minute ago, trurl said: Also, those 35000 numbers tacked onto the end in your screenshots. And these 35000 are also the filenames of the SMART reports instead of the serial number.
What about them? Please explain things. Don't assume I automatically know what you mean. You're just confusing me more! lol
January 11 Author
Ok, I understand that part now. You're saying that you don't think my PERC is in HBA mode. I'm trying to get into the iDRAC to check, but it seems to have hopped IPs on me when I rebooted. Stand by...
January 11 Community Expert
Just now, Elmojo said: What about them?
In the screenshots for your parity assignments, the "wrong" is showing you what it thinks should be assigned there. I don't know if that 35000 part at the end agrees with what the controller is currently telling it or not.
January 11 Community Expert
I thought perhaps @JorgeB might come into this thread since it was started 5 hours ago. It might be getting a little late in his timezone now. He has more experience with some of this hardware than I do.
January 11 Author
5 minutes ago, trurl said: In the screenshots for your parity assignments, the "wrong" is showing you what it thinks should be assigned there. I don't know if that 35000 part at the end agrees with what the controller is currently telling it or not.
I see! So it's not just the actual drive serial, but also the controller assignment, or whatever that number is, that could be tripping it up?
January 11 Author
I finally got into iDRAC (flippin' ethernet cable was unplugged!) and was able to check the status of the RAID controller. See screenshot.
All I know is what it tells me. 🤷
January 11 Author
So I have to do something... any suggestions?
This server must be running, one way or another, in the next hour. I'll be away all next week, and I have to be able to access my data remotely.
If I need to just discard the current parity and rebuild it, I'll do that. Please just direct me on whatever path is the least likely to cause any sort of data corruption on the array.
January 11 Community Expert
1 hour ago, trurl said: If you are sure of your drive assignments I guess you could New Config/Preserve All (and maybe make any missing assignments) before starting the array with the Parity Valid box checked. And then maybe do a parity check anyway just to be sure.
January 12 Author
Well, the array started. :) I'm running a parity check now; hopefully it will complete without errors. I told it that the parity was correct, as you suggested.
It still baffles me why this happened, and worries me that it could do it again the next time I reboot. Do you think it's related to the RAID controller?
Other than switching it to "HBA mode" in the UEFI, which Dell says should operate as "drive passthrough", I don't know what else to try. It doesn't appear that there's an option to flash this card to true IT mode like I did in my T630. I can't recall which RAID/PERC it was running, but there was an LSI firmware available that made it show up as a true HBA card to the machine.
Thanks for your help with this! If JorgeB comes around later, he may have some additional thoughts.
Edited January 12 by Elmojo
January 12 Community Expert
Recommend flashing the HBA to IT mode. It's currently using the MegaRAID driver, not the HBA driver, and that can cause this and other issues.
January 12 Community Expert
14 hours ago, Elmojo said: You don't think it is, or you don't think it should be?
Unraid thinks it is NOT in IT mode since it is using the megaraid driver.
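The driver check described above can be scripted. The following is a rough sketch that reads text in the format of `lspci -k` output (the sample is the controller listing posted earlier in this thread) and reports which kernel driver each device uses. The helper names `drivers_in_use` and `looks_like_raid_mode` are hypothetical. Treating any `megaraid` driver as "not IT mode" is a heuristic taken from the replies here, not an official test; `mpt3sas` is the Linux driver normally bound to SAS3 HBAs running IT firmware.

```python
import re

# Controller listing as posted in this thread (lspci -k style output).
SAMPLE = (
    "19:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 "
    "[Fury] [1000:005f] (rev 02)\n"
    "\tDeviceName: Integrated RAID\n"
    "\tSubsystem: Dell PERC H330 Adapter [1028:1f44]\n"
    "\tKernel driver in use: megaraid_sas\n"
    "\tKernel modules: megaraid_sas\n"
)

def drivers_in_use(lspci_text):
    """Map each device header line to its 'Kernel driver in use' value."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new PCI device entry.
            current = line.strip()
            continue
        match = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
        if match and current is not None:
            result[current] = match.group(1)
    return result

def looks_like_raid_mode(driver):
    """Heuristic from this thread: a megaraid* driver suggests RAID firmware."""
    return driver.startswith("megaraid")

for device, driver in drivers_in_use(SAMPLE).items():
    mode = "RAID firmware (not IT mode)" if looks_like_raid_mode(driver) else "other"
    print(f"{driver}: {mode}")
    print(f"  {device}")
```

On a live system the same function could be fed the captured output of `lspci -k` instead of the pasted sample.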
January 12 Author
14 hours ago, JorgeB said: Recommend flashing the HBA to IT mode
I would LOVE to, but I haven't found an option to do that for this adapter. Do you know otherwise?