
Mortalic


Posts posted by Mortalic

  1. On 11/14/2023 at 8:56 AM, itimpi said:

    I would very much doubt if you can successfully run a parity check and an extended SMART test at the same time. You really want the SMART test to have exclusive access to the drive while it is running.

    This appears to be correct. I let parity finish (0 sync errors, somehow), gave it a reboot, and told all the drives to run extended SMART tests. Overnight the cache drive finished successfully; three of the others are at 80-90%, and the larger ones are all around 50%.

     

    Regarding the docker img recreate process...

    Is the process basically:

    1. back up the configurations
    2. reinstall the docker apps
    3. copy the configs back over the top

     

    EDIT:
    After parity success and reboot, there is still one error in the syslog:
     

    Nov 14 20:31:57 vault kernel: ata9.00: exception Emask 0x10 SAct 0x4 SErr 0x280100 action 0x6 frozen
    Nov 14 20:31:57 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error

     

    EDIT2: ran dmesg | grep ata9

    Looks like the replacement drive is the culprit, but I've got a new drive showing up tomorrow, so I'll know one way or the other whether that error gets solved.
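
    For anyone else trying to work out which physical drive sits behind a given ataN port, this is roughly what I used (the sysfs layout can differ slightly between kernels, and sdX is just a placeholder):

    # the errors name ata9, so find the block device attached to that port
    ls -l /sys/block/ | grep 'ata9/'
    # then confirm the serial number against what Unraid shows on the Main page
    smartctl -i /dev/sdX | grep -i serial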

     

    EDIT3: All extended SMART tests passed, even the stand-in drive showing a CRC SMART error... weird.

    New drive showed up to replace the stand-in drive; no more syslog errors.

    Parity run should be complete in a couple days.

    Renamed docker.img to docker_old.img and restarted the docker service.

    Redownloaded all my docker containers, and it appears everything picked up where it left off.
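
    For reference, the rename step from the command line was roughly this (path assumes docker.img lives in the default system share; adjust it to wherever yours actually is):

    # stop the docker service from Settings -> Docker first, then:
    mv /mnt/user/system/docker/docker.img /mnt/user/system/docker/docker_old.img
    # re-enable the docker service so Unraid creates a fresh docker.img,
    # then use Apps -> Previous Apps to reinstall containers with their old templates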

     

    This was a long road, but thank you, itimpi, for helping me out.

    On a side note, I also used GPT-4 to ask it some pretty specific questions at times, and it was pretty helpful in laying out ways to troubleshoot certain steps. Even when I dropped giant logs into its prompt it was good at parsing them to pick out and explain what was happening. I'd suggest that for anyone else running into issues.

  2. 8 hours ago, itimpi said:

    The test only increments in 10% amounts so it is quite normal for it to stick for a while at each value.  I normally estimate something up to 2 hours per 10% increment, but if it is taking longer than that it may not be a good sign.    You could check to see if anything is showing in the syslog.

    Hmmm, next morning and the parity drive still reports 10%, so that's not comforting. The actual parity check is still going, so perhaps that's getting in the way?

    Syslog does have some messages from yesterday I didn't notice, but nothing since then.

    I can't remember what time I started the extended SMART check, but it would have been a bit after the parity check, which was around this time frame:
     

    Nov 13 14:19:04 vault kernel: ata9.00: exception Emask 0x12 SAct 0x200000 SErr 0x280501 action 0x6 frozen
    Nov 13 14:19:04 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
    Nov 13 14:19:05 vault kernel: ata9.00: exception Emask 0x10 SAct 0x20000000 SErr 0x280100 action 0x6 frozen
    Nov 13 14:19:05 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
    Nov 13 14:19:05 vault kernel: ata9.00: exception Emask 0x10 SAct 0x20000 SErr 0x280100 action 0x6 frozen
    Nov 13 14:19:05 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
    Nov 13 14:19:06 vault kernel: ata9.00: exception Emask 0x10 SAct 0x38000 SErr 0x280100 action 0x6 frozen
    Nov 13 14:19:06 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
    Nov 13 14:20:02 vault root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share ** Ignored

     

  3. 5 hours ago, itimpi said:

    You can always delete the docker.img file and let Unraid recreate it and use Apps->Previous Apps to get containers back with their previous settings.

     

    however that message looks a little concerning as I would have expected the repair process to fix anything like that.  It is possible there really is a problem with the disk.    You might want to consider running an Extended SMART test on it to check it can complete that without error.

    I started the extended SMART test on the parity drive (16TB) about an hour ago and it's been at 10% the entire time. Is that normal?

     

    EDIT:

    Extended SMART test still at 10% after several hours.... starting to get nervous about it.
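
    In case it helps anyone later, the test's progress can also be polled from the command line instead of waiting on the GUI (sdX is whichever device the parity drive is):

    # reports the percentage of the test *remaining*, e.g. "90% of test remaining."
    smartctl -a /dev/sdX | grep -A 1 'Self-test execution status'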

  4. Ok, I'll do some reading there.  I'm not familiar with that process, so thank you for suggesting it.

     

    This is a different disk from the one that got me into all this trouble. It never actually failed, but disk1 kept throwing these crazy errors, so I replaced it with another disk I had lying around.

    I'll run extended SMART tests on all of them.

    Also, I kicked off a parity run; is it OK to let that run?

  5. 14 hours ago, itimpi said:

    This is the normal action at this point as Unraid has already failed to mount the drive.    Normally the -L option causes no data loss, and even when it does it is only the last file being written that tends to have a problem.

     

    The section of the online documentation accessible via the Manual link at the bottom of the Unraid GUI has this section covering repair, and it mentions you should use the -L option in point 5.

    Thank you, I ran it in maintenance mode. It only took a minute or two. Now my array came back up, mostly problem-free. Shares are all back (haven't validated any data yet), as well as my VMs; however, all my docker containers are missing.

    The syslog has this section, which is from when I ran it:

     

    Nov 12 18:47:33 vault kernel: XFS (md1p1): Internal error i != 0 at line 2798 of file fs/xfs/libxfs/xfs_bmap.c. Caller xfs_bmap_add_extent_hole_real+0x528/0x654 [xfs]
    Nov 12 18:47:33 vault kernel: CPU: 3 PID: 20234 Comm: fallocate Tainted: P O 6.1.49-Unraid #1
    Nov 12 18:47:33 vault kernel: Call Trace:
    Nov 12 18:47:33 vault kernel: XFS (md1p1): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_alloc_file_space+0x206/0x246 [xfs]
    Nov 12 18:47:33 vault kernel: CPU: 3 PID: 20234 Comm: fallocate Tainted: P O 6.1.49-Unraid #1
    Nov 12 18:47:33 vault kernel: Call Trace:
    Nov 12 18:47:33 vault root: truncate: cannot open '/mnt/disk1/system/docker/docker.img' for writing: Input/output error
    Nov 12 18:47:33 vault root: mount error

     

    Looking at that file, it exists, but it's owned by nobody... should I just chown and chmod it so it's owned by root and executable?

    -rw-r--r-- 1 nobody users 21474836480 Nov 13 14:11 docker.img
    root@vault:/mnt/disk1/system/docker#

    Man.... I think I've really lost everything that was on disk1.... now I'm getting XFS errors on boot, even though the array can start.

    It's telling me to:

    root@vault:~# xfs_repair /dev/md1p1
    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.  Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.  If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

     

    Attempting to mount the filesystem does not clear this up, and in fact when I go to /mnt it throws three errors:

    root@vault:~# cd /mnt
    root@vault:/mnt# ls
    /bin/ls: cannot access 'disk1': Input/output error
    /bin/ls: cannot access 'user': Input/output error
    /bin/ls: cannot access 'user0': Input/output error
    cache/  disk1/  disk2/  disk3/  disk4/  usb/  user/  user0/
    root@vault:/mnt#

     

    Is there anything I can do? I don't want to lose everything.  I've started copying what's left on disks 2, 3 and 4...

    If I run xfs_repair -L it seems like I could lose everything....
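
    This is roughly the sequence I'm looking at, based on the repair section of the manual (array in maintenance mode; -n is a read-only dry run, so it shouldn't change anything):

    # dry run first: report what would be repaired without writing to the disk
    xfs_repair -n /dev/md1p1
    # only if mounting to replay the log isn't possible: zero the log and repair
    xfs_repair -L /dev/md1p1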

    Please help

    Last night my Unraid server went unresponsive to the webUI, SSH, and even the local console after I plugged a monitor/keyboard in.

    After hard powering it off (I know), a parity check started but was crazy slow. I checked the SMART status on all the drives and they all checked out.

    This morning the server is unresponsive again. 

    What should I do?

     

    EDIT: I also tried tapping the power button, since that would normally power it down cleanly too, but that did not work.

     

    EDIT2:
    I was able to get an actual error after the most recent reboot:

    The log mentions XFS (md1p1): Internal error and XFS (md1p1): Corruption detected. Unmount and run xfs_repair.

    Is it safe to run xfs_repair?

     

    The log is verbose, but keeps repeating this too:
    [ 314.906880] I/O error, dev loop2, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
    [ 314.906880] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
    [ 314.906890] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
    [ 314.906895] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 1, corrupt 0, gen 0
    [ 314.906900] BTRFS warning (device loop2): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
    [ 314.906903] BTRFS: error (device loop2) in write_all_supers:4370: errno=-5 IO failure (errors while submitting device barriers.)
    [ 314.906933] I/O error, dev loop2, sector 1088608 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
    [ 314.907022] BTRFS info (device loop2: state E): forced readonly
    [ 314.907025] BTRFS: error (device loop2: state EA) in btrfs_sync_log:3198: errno=-5 IO failure
    [ 314.907025] BTRFS error (device loop2: state E): bdev /dev/loop2 errs: wr 0, rd 2, flush 1, corrupt 0, gen 0
    [ 314.908111] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
    [ 314.908117] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 3, flush 1, corrupt 0, gen 0
    [ 314.908138] I/O error, dev loop2, sector 1088608 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
    [ 314.908141] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 4, flush 1, corrupt 0, gen 0
    [ 314.908636] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
    [ 314.908644] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 5, flush 1, corrupt 0, gen 0

     

    EDIT3:

    I replaced disk1, which seemed to be coming up in the logs a lot, though with no errors, and SMART checked out OK. This has allowed me to boot, and performance appears normal; however, after the data rebuild there are no shares, docker can't start, and there are no VMs.

     

    But yes, now I can get into settings and pull diagnostics.

     

     

    vault-diagnostics-20231112-0706.zip

    I was in the process of trying to move stuff around from my cache drive as it was full, and somehow I deleted it. I see all my data is still there, but obviously I don't remember what all my shares were mapped to.

    What do I do?

     

    EDIT:
    I tried to add back some shares I knew by looking at their docker paths, like /mnt/user/media; however, it doesn't seem to actually add them.

    Possibly because the folder already exists?


    EDIT2: I think the issue wasn't the appdata folder being deleted... I think it disappeared because the cache drive filled up and nothing could write to it. So I guess it's solved?
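
    If it happens again, checking how full the pool is should confirm it quickly (mount point assumes the default cache pool name):

    # a full cache pool shows 100% use here
    df -h /mnt/cache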

    Hello, I've got two docker containers both trying to listen on 8443: unifi-controller and MineOS-Node. I figured I'd change the MineOS web port to 8444 via the Unraid edit page, and while it shows the change, the container still spins up on 8443.

    My apologies if this is a dumb question, but how can I change that value?
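
    For context, my understanding of what I'm after: the Unraid edit changes the host-side port, while the service inside the container keeps listening on 8443. Expressed as a plain docker run it would be something like this (image name and ports are just my guess at the setup):

    # host port 8444 forwards to the container's internal 8443,
    # leaving host 8443 free for unifi-controller
    docker run -d --name mineos-node -p 8444:8443 hexparrot/mineos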

    I need some education on how to edit game files for the Ark container. I added mods and made a few changes to the GameUserSettings.ini file; however, upon restart it seems not to save any of them. I've done some searching but have so far come up short. I don't really know what I'm searching for, I guess.

    Other relevant info: I'm using the ich777/steamcmd container on the current version. Ark works great, just all on default vanilla settings.
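
    From what I've read, the Ark server rewrites GameUserSettings.ini on shutdown, so edits made while it's running get clobbered; the plan I'm going to try is roughly this (container name and appdata path are just examples and will differ per setup):

    # stop the server first so it can't overwrite the file when it shuts down
    docker stop ark-se
    # edit the config under the container's appdata path
    nano /mnt/user/appdata/ark-se/ShooterGame/Saved/Config/LinuxServer/GameUserSettings.ini
    docker start ark-se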

     

    Thanks in advance

     

    The extended SMART test reported errors, but Unraid still shows the drive as healthy. Very strange.

    Anyway, I've got low activity on the system this morning, so I'm going to replace it.

     

    Any chance someone could explain what I should be looking at in that Extended SMART test? 
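
    For reference, this is roughly how I've been pulling the results myself (drive letter is just an example):

    # self-test log: a failed extended test shows its status and the LBA of the first error
    smartctl -l selftest /dev/sdX
    # full attribute table; 5, 197 and 198 are the ones most often cited as replace-the-drive signs
    smartctl -A /dev/sdX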

  12. 14 minutes ago, trurl said:

    Did the short SMART test ever complete? I would replace. Do you know how?

    It's still stuck at 90%, actually. I do know how; I had one drive I knew was throwing SMART errors that I used to practice with before I sent it for recycling.

    Thanks for asking though.

  13. 55 minutes ago, trurl said:

    I just responded on your other thread.

    Probably whatever problem you have there is the cause of your problem here.

     

    Would you like me to merge these threads for you? I don't see any point responding to this thread until you fix your other issues.

    That would be great, thank you.  My apologies for the confusion.

  14. 5 minutes ago, trurl said:

    Some of that might be connection issues, but there are these mixed in:

    
    Jul 28 18:29:04 vault kernel: ata6.00: cmd 25/00:28:a8:08:72/00:03:06:00:00/e0 tag 5 dma 413696 in
    Jul 28 18:29:04 vault kernel:         res 51/40:17:b0:09:72/00:02:06:00:00/e0 Emask 0x9 (media error)
    

    Also, this SMART attribute is something to watch out for on WD Reds, in fact, I have it added to the notifications for those disks I have:

    
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    266
    

    Do you have a replacement?

    Regarding the connection issues, like a sketchy SATA cable or something?

    Regarding the replacement, yeah, I've got a spare drive. Should I swap it out, or take a wait-and-see approach?
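
    For my own notes while deciding, these are the attributes I'm planning to watch (as I understand it, 199 tends to point at cabling, while 1, 5, 197 and 198 point at the disk itself; sdX is a placeholder):

    smartctl -A /dev/sdX | grep -E 'Raw_Read_Error_Rate|Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'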

  15. 19 minutes ago, trurl said:

    Diagnostics will give us a lot more information than just the results of that SMART test. It might even tell us more about the problems on your other thread(s). Be sure to get us the diagnostics without rebooting since syslog resets on reboot.

    Well, the SMART test has been stuck at 90% for quite a while, so I'm just going to put the diagnostics here now. The syslog looks like it's a bunch of read errors. What does that mean?

    vault-diagnostics-20200729-1251.zip

    I've been copying about 7TB of data over the last few days, and it doesn't seem to be utilizing the actual network speed.

     

    I'm just connected from one physical machine using the Unraid samba share, and hit copy/paste.

    I'm worried I've got a setting wrong somewhere and streaming media will also be subject to this.

     

    I'm using Ubiquiti 8-port switches and Cat5e.
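
    To rule out the network itself (as opposed to disk or SMB overhead), I figure an iperf3 run between the two machines is a reasonable first check (hostname is just an example; iperf3 needs to be installed on both ends):

    # on the Unraid server (listen side)
    iperf3 -s
    # on the client doing the copy (replace "vault" with the server's IP or hostname)
    iperf3 -c vault -t 30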

     

    array_screenshot2.png

    I've been slowly moving some old Hyper-V VMs to Unraid. In each case I was selecting the emulated CPU model. However, I just noticed that none of the VMs have emulated set; they are all set to host passthrough. Worse, I can't change them back to emulated, as it throws an error:
     

    VM creation error

    XML error: Non-empty feature list specified without CPU model

     

    I've been searching the forum and it seems other people get this error, but going the opposite direction. Am I doing something wrong? How do I not pass through the host CPU?
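
    In case it helps, my understanding is that the error appears when the XML still has <feature> entries under <cpu> but no <model>. A sketch of what I think the manual edit looks like, assuming a VM named "win10" (adjust the name, and back up the XML first):

    virsh edit win10
    # then replace the <cpu .../> block with an explicit emulated model, e.g.:
    #   <cpu mode='custom' match='exact' check='none'>
    #     <model fallback='allow'>qemu64</model>
    #   </cpu>
    # host passthrough, by contrast, looks like:
    #   <cpu mode='host-passthrough' check='none'/>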
