-
Restarted unraid, the list of disk assignments has changed?!?
No backplane - the rackmount I used to have made too much noise. I now have a Fractal Design 7 XL case, which some people have gotten 18 disks into. I've only managed 16 plus an SSD. I think it's an 850W power supply, way overkill for normal Unraid use, but useful sometimes. The SATA power cables allow for 5 connections per cable, but the distance from the power supply usually doesn't let me use more than 4 of them for disks. I do sometimes attach an extension to that fifth SATA connector, but only for fans. A couple of the drives are powered off a Molex cable using SATA converters. You didn't mention whether "trust my array" is still a thing and, if so, whether it is advisable. Also, I'm not sure how to get Unraid to restore the data from my GPE disk to the empty MXN disk. Usually you replace a disk with another and it just happens. In this case Unraid seems to think MXN has been there all along. Edit: I've checked again, and now Unraid promises (threatens?) to rebuild the disk if I assign either GPE or MXN to slot 9. So I plan to pull GPE and put it away for safekeeping for the next couple of days while (I hope) Unraid recreates its contents on MXN. I appreciate you looking at this for me.
-
Restarted unraid, the list of disk assignments has changed?!?
New diagnostics after restarting with no disk9 are attached. Thanks for helping! new-tower-diagnostics-20260223-1337.zip
-
Restarted unraid, the list of disk assignments has changed?!?
FYI - the 22 TB drive that was ticking, and that has a lot of data on it, has a serial number ending in GPE. The empty disk's serial number ends in MXN.
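Since the sdX letters can move around between boots, it may help to look drives up by serial suffix instead. Here is a minimal sketch, assuming a Linux system that exposes the usual /dev/disk/by-id symlinks and that the link names end with the drive serial; the function name is made up for illustration.

```python
import os

def find_by_serial_suffix(suffix, by_id_dir="/dev/disk/by-id"):
    """Map whole-disk by-id link names ending in `suffix` to the device they point at.

    Assumption: link names look like "<bus>-<model>_<serial>" and partition
    links carry a "-partN" ending, as is typical under /dev/disk/by-id.
    """
    matches = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            # Skip partition links; only whole disks are of interest here.
            continue
        if name.endswith(suffix):
            # Resolve the symlink to the current device node (e.g. /dev/sdj).
            matches[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return matches
```

Calling `find_by_serial_suffix("GPE")` and `find_by_serial_suffix("MXN")` after each boot would show which device node each of the two drives currently has. A drive may appear under more than one link name, so several entries can point at the same device.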
-
-
Restarted unraid, the list of disk assignments has changed?!?
I've never seen a problem like this one before. This morning I noticed some 'ticking' coming from my Unraid system when none of the disks should have been spinning. The main menu confirmed nothing was spinning, but it really sounded like a disk. So I checked the log and found a lot of instances of this:
sd 11:0:79:0: Power-on or device reset occurred
[287068.089764] end_device-11:79: add: handle(0x0021), sas_addr(0x300062b2059b99d1)
[287068.093952] sd 11:0:79:0: [sdj] 42970644480 512-byte logical blocks: (22.0 TB/20.0 TiB)
[287068.093955] sd 11:0:79:0: [sdj] 4096-byte physical blocks
[287068.102641] sd 11:0:79:0: [sdj] Write Protect is off
[287068.102649] sd 11:0:79:0: [sdj] Mode Sense: 9b 00 10 08
[287068.112922] sd 11:0:79:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[287068.213670] sdj: sdj1
[287068.213807] sd 11:0:79:0: [sdj] Attached SCSI disk
[287076.845392] sd 11:0:79:0: device_block, handle(0x0021)
[287079.344353] sd 11:0:79:0: device_unblock and setting to running, handle(0x0021)
[287079.367803] sd 11:0:79:0: [sdj] Synchronizing SCSI cache
[287079.367852] sd 11:0:79:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
[287079.368197] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x300062b2059b99d1)
[287079.368201] mpt3sas_cm0: removing handle(0x0021), sas_addr(0x300062b2059b99d1)
[287079.368203] mpt3sas_cm0: enclosure logical id(0x500062b2059b99c0), slot(10)
[287079.368206] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[302907.901584] scsi 11:0:80:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[302907.901586] scsi 11:0:80:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
[302907.988422] sd 11:0:80:0: Attached scsi generic sg9 type 0
[302907.988464] end_device-11:80: add: handle(0x0021), sas_addr(0x300062b2059b99d1)
[302907.988468] sd 11:0:80:0: Power-on or device reset occurred
So I conclude that an HBA (I have 2) is losing comms with a drive: the port and drive go away, the HBA finds the drive again, and the drive reinitializes. Rinse and repeat. Now, at this time Unraid showed everything, including sdj, with a green ball. I ran a short SMART test on the drive, which passed with flying colors. So I then started wondering about cabling. I didn't find anything. I stopped the array and got out my stethoscope, thinking I would confirm that disk 9 (sdj) was the source of the clicking and try pulling and restoring power to it. After I pressed the stop array button, the ticking rapidly increased. By the time the array stopped, Unraid had given disk 9/sdj the red X. I used the stethoscope and confirmed that that disk was indeed the source of the ticking. So now I am getting concerned. I am far from convinced the drive has really failed; I decide to power-cycle the whole system and restart the array.
This is where it gets weird. When the system comes back up, of course the array isn't started. But disk 9 isn't sdj anymore! I happen to have another disk of the same make and model as sdj in the system; it's been precleared but has never been added to the array. But now disk 9 is that empty disk, sdr, and Unraid has it red X-ed out. And on the array operations tab, Unraid does not have a message that starting the array will trigger a parity check or that data will be overwritten. It seems willing to start the array with this completely wrong configuration. Back on the devices tab, I change the configuration to the way it is supposed to be, with sdj as disk 9 instead of the empty sdr. This triggers the message that starting the array like this will overwrite the disk. If parity is good, that would, I hope, overwrite the disk with the data that is already on it, but I don't want to take the risk yet. (Besides, it would take two days to overwrite the 22T drive.) What I really want to do is put sdj back on the list, get Unraid to trust that array, and continue diagnostics.
I know there used to be a "trust my array" procedure that I believe involved the "New Config" button, but for some reason I can't find it. So this is where I appeal for help. What is my best way ahead here? Should I do a "trust my array"? If so, can you point me at that procedure, because I can't find it? Or is another way ahead better? I really don't want to overwrite the disk because I think the disk is likely good. And while I love having the protection of the parity disks, I hate using them because you can never 100% absolutely trust them. (That's why we do periodic parity checks, after all.) If I end up having to do that, I'd rather have it write to my empty disk. As I said though, Unraid doesn't seem to think it needs to overwrite the empty disk if I start with that one. Diagnostics are attached. Help! tower-diagnostics-20260223-1315.zip
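To put a number on the "rinse and repeat" pattern, the reset and removal messages can be counted per device. This is a minimal sketch, assuming log lines shaped like the excerpt quoted in this post; the function name and regular expressions are my own and only cover those two message types.

```python
import re
from collections import Counter

# "sd H:C:T:L: Power-on or device reset occurred" (kernel timestamp optional)
RESET_RE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")
# "mpt3sas_cmN: removing handle(0x....), sas_addr(0x....)"
REMOVE_RE = re.compile(
    r"mpt3sas_cm\d+: removing handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)"
)

def summarize(lines):
    """Count device resets per SCSI address and removals per SAS address."""
    resets = Counter()
    removals = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = REMOVE_RE.search(line)
        if m:
            removals[m.group(2)] += 1
    return resets, removals
```

Passing it an open syslog file (any iterable of lines works) gives two counters. If one SAS address accounts for nearly all removals, that points at a single drive, cable, or port rather than the HBA as a whole.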
-
Current flash drive recommendations?
I recommend using a USB drive where the plug (the part that goes into the socket) is metal. The cheap ones are often plastic with just a trace for grounding. The metal acts as a tiny radiator. This is especially important when using those tiny drives that hardly extend past the port socket at all. If the socket is directly on the motherboard, I'd expect some fan airflow to reach it, but it doesn't hurt to check.
-
Unclean Shutdown - Not Gonna Pay the Penalty
I'd like to thank everybody who responded; I think I have a better understanding of what went on now. And now, for those who remember Paul Harvey, here is "the rest of the story". I didn't want this part to distract from my main question, so I didn't mention it before. After power was restored, I started the system to confirm that it would be complaining of an unclean shutdown, which it was. Before I could get around to doing anything about it, I got a call for help and had to leave for a couple of days. So I pressed the power button to turn the system off again (cleanly) and left. When I finally returned and powered the system on again, it came right up. It didn't mount the disks, but it wasn't complaining about unclean shutdowns anymore either. The array started on command with no issue. Has anybody heard of unclean shutdowns just going away like that? Could the "I am unclean" flag have been reset by the clean shutdown of an unclean array?
-
Unclean Shutdown - Not Gonna Pay the Penalty
Hmmm. So let's suppose I wrote something so that X amount of time after all the disks spin down, all the disks are synced, unmounted, remounted, then spun down again. Would you then think that, under the same circumstances as before (basically nothing happening), a subsequent loss of power would not result in parity errors? I would have thought the risk would be the same either way. (I'm not trying to argue with you, btw - I'm just trying to understand the situation as best I can.) Thanks for your help!
-
Unclean Shutdown - Not Gonna Pay the Penalty
JorgeB, I appreciate your reply. I find it vaguely terrifying. If you mean "there could be a few sync errors" in the sense that any Unraid system anywhere, at any time, might have a few sync errors nobody knows about, I agree with you. Undetected (as yet) hardware failures, as-yet undetected system software errors, the stray cosmic ray, sure. OTOH, if you mean there is something in my situation that raised the likelihood of errors, I am both terrified and confused. Here we have disks that have just passed a parity check (with no errors found or corrected), no other disk activity had occurred since (no sharing or network), the disks had long since spun down and entered their low-power state, there are no VMs at all, and the only Docker container (and Docker itself) is configured to not use the array. Please explain what you think might be a cause of parity errors. Perhaps I can ask it this way: suppose, instead of power failing, I had gone to the console and entered a powerdown command. (Or via the GUI, if you prefer.) What would those powerdown actions have done to prevent any parity errors that you think might have sneaked in, and what commands could I have entered to do the equivalent, without actually powering down? Thanks!
-
Unclean Shutdown - Not Gonna Pay the Penalty
Hi all, I've been putting together a secondary server (6.12.13 - I'm a slow adopter). I had just about got it where I wanted it when an unfortunate situation caused that server to lose power, so now it's in "unclean shutdown" mode. Now, I had basically just finished a parity check, and no writes had been made after that check completed. I use XFS exclusively and I know for a fact all the disks were spun down. Another parity check at this point would cost me two days that I can't really spare right now. I don't believe there is any realistic chance of a data issue here. I'm thinking the simplest way around this is to just start a parity check and then cancel it. Am I wrong? Any other suggestions? Thanks!
-
Replacing parity drives fails twice on two different disks.
After much experimentation, I concluded it must be the power supply. I replaced it and things seem to have settled down.
-
Mystery messages on console during boot not logged
Could be. I recently replaced my 10+ year old flash drive. The new one is 64G (what I happened to have), so obviously I'm not using the vast majority of the space. Is there anything I can do to resolve this? Something that will mark sectors bad?
-
Mystery messages on console during boot not logged
I was starting Unraid 6.12.6 today after an unclean shutdown. I also took the opportunity to plug in a new (precleared) disk. I glanced at the console during boot and saw line after line of this: nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn where n is a digit. At the end of the list was a message "These errors will not be automatically corrected". All the above is from memory because the screen cleared and boot continued before I could write it down, so consider all this as approximate. After the boot, the array seemed fine. I looked in the syslog; those messages are not there. I grepped around in /var and didn't find them anywhere else, either. I did find a bunch of BTRFS errors and warnings in the log, which is strange because I use XFS exclusively. Can anyone enlighten me as to (1) what these messages are and what they mean, (2) where, if anywhere, they are logged, and (3) what I should be doing about them? Thanks! tower-diagnostics-20240203-1141.zip
-
Looking for Power Supply recommendation for old RPC-4220 case
I have an opportunity to pick up an old but near-mint Norco RPC-4220 case for cheap. The problem is, the backplane in this beast requires 5 Molex power connectors, each one powering four 3.5-inch SATA drives. (It actually requires 6, the last one powering fans behind the backplane, but I could always swap the fans out.) The case is built for an ATX power supply, but I don't know of any modern power supplies that come with that many Molex connectors. They seem to be almost afterthoughts these days: you get one, maybe two cables with Molex connectors. The power supply I have on hand (an EVGA Supernova 850 G7) has three SATA power cables with four connectors each, and one cable with 3 Molex connectors. I know I could use Molex splitters to get more connectors, and/or use adapters to convert a SATA power cable to Molex, but they make me concerned since each connector ultimately ends up powering 4 drives. How would you deal with this case? Am I worrying over nothing, and should I just go ahead and adapt SATA power to Molex and/or use Molex splitters? Is there a different power supply that would be a better fit for this case? (I'm thinking 850 watts would be about what I need.) Or should I just pass on the case entirely? Thanks for your advice.
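A rough way to frame the adapter worry is to add up the 12 V current that one backplane connector has to carry. This is only a sketch. The 2.0 A per-drive spin-up figure and the 4.5 A per-rail limit for a SATA power connector (1.5 A per pin, three 12 V pins) are assumptions to be checked against the actual drive and connector datasheets, and the function names are made up for illustration.

```python
def connector_load_amps(drives_per_connector, amps_per_drive):
    """Worst-case current through one connector if all its drives draw at once."""
    return drives_per_connector * amps_per_drive

def within_rating(load_amps, rating_amps):
    """True if the load does not exceed the connector's current rating."""
    return load_amps <= rating_amps

# Assumed values, not measurements: replace with datasheet numbers.
SPINUP_AMPS_12V = 2.0   # assumed per-drive 12 V spin-up current
SATA_RAIL_LIMIT = 4.5   # assumed SATA power connector limit per rail

load = connector_load_amps(4, SPINUP_AMPS_12V)
print(f"12 V load per backplane connector: {load} A")
print(f"Within assumed SATA connector limit: {within_rating(load, SATA_RAIL_LIMIT)}")
```

With these assumed numbers, four drives spinning up together would draw 8 A on the 12 V rail, above the assumed 4.5 A limit of a single SATA power connector. The same arithmetic applies to a Molex splitter once its rating is known. Steady-state draw is lower than spin-up draw, so staggered spin-up, if the controller supports it, changes the picture.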
-
Replacing parity drives fails twice on two different disks.
Thanks for looking at it. The parity check/rebuild was not manually initiated; Unraid automatically started it after I assigned the second parity disk and started the array. Referring to the first syslog, you'll find it starting at line 2343 (15:13:20). The SATA link drops almost immediately afterwards and the drive goes offline. It almost makes me wonder about the power supply, but the system had been working fine and all I really did was replace two WD Red 10TB drives with 22TB drives. (The specs say they consume less than 2 watts more each, but that won't be peak.) The second syslog I uploaded looks like it got truncated somehow. I probably did get that log after it was too late.
-
Replacing parity drives fails twice on two different disks.
I am at my wit's end on this one. I'd been meaning to upgrade my parity drives (I use two) to larger ones for some time now. Around Christmastime WD had a pretty good price on a bundle of two 22T Red disks, so I went ahead and bought them. When I got them, I precleared both of them (just to get some burn-in use), then stopped the array, replaced the old disks with the new ones, and restarted to let the parity build. After a few minutes the rebuild failed on parity disk 2 with write errors. After trying some things with no luck, I let parity rebuild on just the new Parity 1 disk, which succeeded. I RMA'd the other disk. I now have a replacement for that "failed" disk and precleared it successfully. I then assigned it as Parity 2 and let a parity rebuild start. The rebuild stopped after 4.5 hours with write errors again. On top of that, I can neither stop the parity rebuild nor stop the array. This being the second new Parity 2 disk, it's getting hard to believe in disk errors now. I did the usual trick of starting in maintenance mode, unassigning, and reassigning. Then I moved the disk to a different slot in the case (so the adapter and cables are different) and tried the rebuild again. This time the parity rebuild failed in just a minute. Maybe Unraid just doesn't like 22T disks? I'm attaching two sets of diagnostics, one from after the first failed parity rebuild and one from after the most recent rebuild. Neither one contains a SMART report for the Parity 2 disk, so I got one while in maintenance mode and am uploading that too. That report looks clean. EDIT: I've removed the second diagnostics; they were taken too late and there's nothing useful there. Help! This has gotten beyond me. If I go back to WD and tell them that the disk they replaced my RMA'd disk with has the same problems the first one did, I expect they're going to want some hard evidence. first fail, tower-diagnostics-20240123-1521.zip tower-smart-20240123-1550.zip
jhyler
Members