CorserMoon

May 6

On 3/15/2024 at 4:29 PM, FearlessAttempt said:

I had this same issue. Leaving Runtime left to initiate shutdown (minutes) value blank seemed to be causing it.

Cam here to say that this fixed it for me.

March 2

OK, so for whatever reason, a valid music directory is required (I never had one set up[ in the past). I supposed I could have just removed that parameter as well.

March 2

EDIT: SOLVED.

I just upgraded from 6.12.4 (I think?) to 6.12.8. Upon reboot, plex docker is gone. Attempting to reinstall via Previous Apps results in this error:

Also, maybe unrelated, after clicking DONE, I get this error:

Any ideas?

October 26, 2023

WOO! Got it working by unassigning both cache drives, starting array, stopping array, reassigning cache drives, and starting array.

Thanks to THIS POST.

October 26, 2023

@JorgeB OK, I reseated the nvme drives and the missing one is back. I reassigned it to its original spot but I am unable to start the array due to the above error (Wrong Pool State). Here is what the GUI looks like:

Not sure how to overcome that error. Any help appreciated. Sorry to bug you.

October 25, 2023

Now getting this error when trying to start the array. Latest diags attached.

image.png.1279b575c0b69812fdb49c4d8cca2893.png

diagnostics-20231025-1554.zip

October 25, 2023

So I manually deleted many gigs of data off the drive, but free space according to the GUI didn't change, still 279GB free. I tried running Mover but it didn't seem to start because there is still data sitting on the cache drive that is configured to move onto the array when mover is invoked. I then rebooted the server and the free space didnt change and the files that I deleted are back. I am stuck and don't know what I am doing wrong.

EDIT: At this point it seems to make sense to reformat the pool (since I have the backup from the Backup/Restore Appdata plugin). Is there a guide on how to do this? And I also have the issue of the missing cache drive so not sure how to knock the cache pool back down to 1 drive again (it wont let me change the number of devices from 2 back to 1). Or maybe a better idea to just pop in a replacement ssd so I'm back up to 2 drives first and then reformat the pool?

Additional weird observations:

As stated in my OP, I was also trying to add new drives to the array. At that time I added them but paused the disk-clear when I noticed issues. I've since removed the new disks, returning those array slots to "unassigned" but now every time I reboot the server, all those drives are back and disk-clear starts!
I tried using one of the aforementioned HDDs to replace the missing cache drive and provide additional space so hopefully btrfs would be able to balance but cache pool still mounting as read-only and I received a new error: Unraid Status: Warning - pool BTRFS too many profiles (You can ignore this warning when a pool balance operation is in progress)

October 25, 2023

11 minutes ago, JorgeB said:

It's the way btrfs works, it would take more time than I have now to explain here, you can Google it, it's easy to find.

No.

If it just flashes and disappears you can ignore.

Thanks so much for your help.

Last questions for now: Would it make sense that 1 of the cache drives dying would lead to this full allocation issue? Could it be resolved by just replacing that 1 dead drive?

I'm just trying to figure out if I have 1 issue or multiple different issues.

October 25, 2023

13 minutes ago, JorgeB said:

It's not full,

But the result is the same, it won't be able to write any data until there's some free space to allocate new metadata chunks.

That's strange, works for me:

You need to free up some space first or the balance will likely fail.

So what is the difference between allocation and free space? What would cause allocation to fill and is there a way to monitor for that? It's just weird that all this starteed happening after one of the cache drives just disappeared. Would full allocation cause this?

I also just noticed that when the array is stopped and I am assigning/un-assigning disks, this error sporadically pops up briefly then disappears:

image.png.2721c06145e43a6144145529955cd156.png

EDIT: I tried to start the Mover process to move any extraneous data of the cache drive but the mover doesnt appear to be starting.

October 25, 2023

1 hour ago, JorgeB said:

Pool is missing a device and the other is fully allocated, so it's going read-only, disable docker/VM services and reboot, if the pool doesn't go immediately read-only you need to free up some space and rebalance, see here:

https://forums.unraid.net/topic/62230-out-of-space-errors-on-cache-drive/?do=findComment&comment=610551

I don't think it actually is full though. The "Super_Cache" pool has 2 1TB drives (super_cache and super_cache 2). 1 disappeared (aka missing) but everything was working fine after I acknowledged that it was missing, since the drives were mirrored (1TB actual space). I was having no issues with docker until this morning. I monitor that capacity closely and they were ~70% full before all this happened. GUI currently shows the remaining drive (super_cache 2) w/ 279GB free space.

Strangely, du -sh super_cache/ shows total size of 476GB. But regardless, it shouldn't be full.

side note, that link throws this error: You do not have permission to view this topic.

October 25, 2023

I recently dismantled a secondary, non-parity protected pool of several hdds. 2 of these drives are to replace the existing single parity drive of array and the remaining to be added to array storage. I have run into a lot of cascading issues which has resulted in the docker service not starting. Here is the general timeline:

Stopped array in order to swap a single 12tb parity drive for 2x14tb parity drives. As soon as the array stopped, one of my 2 cache drives (2x1tb nvme, mirrored) disappeared. Shows missing and not in disk dropdowns. My first thought is that it died.
Immediately restarted the array (without swapping the parity drives) and performed a backup of the cache pool to the array via the Backup/Restore Appdata plugin. Completed successfully. Everything, including docker, working normally.
Ordered new nvme drives to replace both.
Stopped array and successfully replaced swapped parity drive as outlined earlier. Parity rebuilt successfully.
Stopped array to add remaining HDDs to array storage. Added, started array, and disk-clear started automatically as expected.
Got notification "Unable to write to super_cache" (super_cache is the cache pool). Paused disk-clear and rebooted the server.
Same error upon reboot. In the interest if troubleshooting, I increased docker image size to see if that was the issue but the service still wouldn't start. I AM able to see/read files on cache drive but can't write to it. A simple mkdir command in appdata share errors saying it's a read-only file system.

My best guess is that both nvme drives failed? Or maybe the pci-e adapter they are in failed? Any thoughts or clues from the attached diagnostics as I wait for the replacement drives to arrive?

diagnostics-20231025-1118.zip

June 15, 2023

Thanks to help and recommendations from @JorgeB, I've learned that my cache pool (2 nvme drives set to mirror) have some uncorrectable errors (based on Scrub results). THIS older thread recommends backing the cache pool files onto the array, wiping/reformatting the drives, and moving the files back onto the cache pool.

What is the best practice for moving 600GB from these onto the array? Rsync via webUI terminal? Krusader? Something else?

And for the "wiping/reformatting" portion, is this the proper command?

blkdiscard /dev/nvmeX

image.png.21a845238d735d2200eb1d8b915f822d.png

June 15, 2023

1 hour ago, JorgeB said:

You can run with the array running since the pool in not mounted.

Do you recommend running a scrub on the cache pool?

June 15, 2023

That did it. Thank you @JorgeB.

June 15, 2023

21 minutes ago, JorgeB said:
If the log tree is the only issue this may help:
btrfs rescue zero-log /dev/nvme1n1p1
Then re-start array.

Should I stop the array first before running this command?

June 15, 2023

My Unraid server was non-responsive so I had to force reboot via IPMI. Upon reboot, I am getting the following error and the docker tab is showing no docker containers installed:

BTRFS: error (device nvme1n1p1) in btrfs_replay_log:2500: errno=-5 IO failure (Failed to recover log tree)

I came across THIS post which seems relevant but their error was slightly different. Thoughts on how to proceed? (diags attached)

EDIT: Here is another clue. The cache pool on which docker.img lives is showing unmountable:

corsermoon-diagnostics-20230615-1340.zip

July 14, 2022

1 minute ago, JorgeB said:

Filesystem went read-only to avoid writing corrupt data, rebooting should fix it but this usually indicates a hardware problem or some other kernel memory corruption.

OK, thanks for the insight. Bad storms last night and despite everything being plugged into UPS's, could have been flakey power issue.

July 14, 2022

Hi all. Woke up this morning to Organizr not working (throwing "not writeable" error) as well as many other dockers not operating as expected. Next step was checking the log file which is 100% full. All disks/pools/shares are green and readable though. Log filled up with BTRFS and rsyslog write errors (I am using syslog server). Before I reboot to clear the log file, wanted your expert eyes on.

executor-diagnostics-20220714-1053.zip

April 1, 2022

I ended up just sending a Power Off command via IPMI which essentially forced power off. After rebooting, the NIC came back up but I can't find in the logs what was holding up the shutdown. Have syslog server running as well, but the only entries I see for today are when I powered it back on. I don't see the powerdown command.

April 1, 2022

So earlier today I suddenly lost connection to my unraid box. After troubleshooting, determined that the NIC is dead (Mellanox ConnectX-2). So I IPMI'd into the motherboard and used the iKVM console to log into unraid via CLI and issued the command 'powerdown'. Problem is that it has been sitting at 'Shutdown Nginx gracefully...' for 30 minutes. Do I have any options besides power cycling it? Really trying to avoid that and the 30 hour parity check.

image.png.be6e4239dc7c8b6429269a93b837cbf2.png

December 31, 2021

26 minutes ago, ljm42 said:

OK if you are accessing by IP then DNS isn't the issue. Sorry, all of the tips I have are in the first two posts, I don't have any other ideas.

Sent from my GM1917 using Tapatalk

I'm thinking it is either weirdness with my gateway (ATT fiber gateway) or corruption/conflicts with the unraid routing table. I may try resetting the unraid network settings so see if that helps. I'm also in hte process of building a pfsense box and bypassing the gateway. Hopefully one of those fixes the issue.

December 31, 2021

2 minutes ago, ljm42 said:

Give me some examples of things you are trying to access. http://what

Sent from my GM1917 using Tapatalk

With only my router IP as the DNS, I can only access unraid (192.168.1.107) but no internet (http://www.google.com for example) and no other devices on my LAN such as 192.168.1.254 (router), 192.168.1.111 (managed switch) or 192.168.1.201 (Hubitat), etc. If I add 8.8.8.8 to the DNS record (so it's then 192.168.1.254,8.8.8.8) I can access unraid (192.168.1.107) and the internet (Google, etc), but still no other LAN IPs. Right now I'm at my in-laws on their network which is 192.168.68.x so that shouldn't be a conflict.

December 31, 2021

21 minutes ago, Nodiaque said:

Hello everyone, I seems to have a common issue and I cannot find the problem.

I've setup wireguard with 8.8.8.8 as dns. I have Host Access Enabled because if I don't, my pihole running on br0 cannot be contacted. Local server uses nat to no, peer type of access to Remote access to LAN.

I also added 2 rules in my pfsense

source: 10.253.0.0/24 (vpn)

destination: unraid ip

protocol: any

and

source: 10.253.0.0/24 (vpn)

destination: lan ip address

protocol: any

With that, I can access the Internet through my VPN and I can reach my unraid server, but I cannot access anything else on the network (neither docker container with there own IP or other device on the network). I don't have vlan, thus all my devices are on the same subnet, same as my server and my docker with fixed ips.

Is there a way to have that?

Thank you

Yea, similar issue to me (though I don't use pihole). I can only access unraid when i have the DNS set to my router but no internet and no LAN. If I add a public DNS like 8.8.8.8, I can then access internet, but still no LAN. I've read through dozens of threads and reddit posts and still have been unable to get local LAN access to work.

December 19, 2021

11 minutes ago, Frank1940 said:

Or the top of a VERY tall rack... 🤣 On a more serious thought, make sure that cabinet has adequate ventilation.

Yeah, that's why I originally went with an open rack. Will have to figure out proper ventilation without all the noise.

December 19, 2021

5 minutes ago, Shazster said:

Pheeew!!!. Good to hear. I couldn't help but forward the subject title to one of my friends who is a sysadmin professionally and father of three. He seemed quite impressed yours skipped the power button pressing phase entirely and went straight for drive yanking.

Yeah, I'm now in the process of looking into getting a locking cabinet...

CorserMoon

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by CorserMoon

UPS Settings wont load

Plex docker disappeared after 6.12.8 update, failing to reinstall from "previous apps"

Plex docker disappeared after 6.12.8 update, failing to reinstall from "previous apps"

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Disappearing cache drive, Unable to write to other cache drive, and now unable to start docker service.

Best practice for temporarily moving cache pool data onto array?

Error on cache drive pool, Unmountable: No file system. BTRFS: error, IO failure, Failed to recover log tree

Error on cache drive pool, Unmountable: No file system. BTRFS: error, IO failure, Failed to recover log tree

Error on cache drive pool, Unmountable: No file system. BTRFS: error, IO failure, Failed to recover log tree

Error on cache drive pool, Unmountable: No file system. BTRFS: error, IO failure, Failed to recover log tree

Log file filled up last night, btrfs errors then rsyslog write errors. All disks green. Afraid to reboot.

Log file filled up last night, btrfs errors then rsyslog write errors. All disks green. Afraid to reboot.

NIC died. Attempting to 'powerdown' from CLI. Hanging at 'Shutdown Nginx gracefully...'.

NIC died. Attempting to 'powerdown' from CLI. Hanging at 'Shutdown Nginx gracefully...'.

WireGuard quickstart

WireGuard quickstart

WireGuard quickstart

HELP - My Toddler just pulled 6 drives out of my server. Some parity, some not. I'm panicking.

HELP - My Toddler just pulled 6 drives out of my server. Some parity, some not. I'm panicking.