Mortalic

Everything posted by Mortalic

  1. This appears to be correct. I let parity finish (0 sync errors, somehow), gave it a reboot, and told all the drives to run extended SMART tests. Overnight the cache drive finished successfully; three of the others are at 80-90%, and the larger ones are all around 50%. Regarding the docker.img recreate process, is it basically: back up the configurations, reinstall the docker apps, then copy the configs over the top?
     EDIT: After the parity success and reboot, there is still one error in the syslog:
        Nov 14 20:31:57 vault kernel: ata9.00: exception Emask 0x10 SAct 0x4 SErr 0x280100 action 0x6 frozen
        Nov 14 20:31:57 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
     EDIT2: Ran dmesg | grep ata9, and it looks like the replacement drive is the culprit. I've got a new replacement drive showing up tomorrow, so I'll know one way or the other whether that error gets solved.
     EDIT3: All extended SMART tests passed, even on the stand-in drive that was showing a CRC SMART error... weird. The new drive showed up to replace the stand-in, and there are no more syslog errors. The parity run should be complete in a couple of days. I renamed docker.img to docker_old.img, restarted the Docker service, and redownloaded all my docker containers (rough sketch below); everything appears to have picked up where it left off. This was a long road, but thank you itimpi for helping me out. On a side note, I also used GPT-4 at times to ask it some pretty specific questions, and it was pretty helpful in laying out ways to troubleshoot certain steps. Even when I dropped giant logs into its prompt, it was good at parsing them to pick out and explain what was happening. I'd suggest that for anyone else running into issues.
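     For anyone hitting the same thing, the recreate boiled down to roughly this (a rough sketch, not exact commands; the path assumes the default system share location, so adjust to your setup):
        # stop the Docker service first (Settings > Docker > Enable Docker: No)
        mv /mnt/user/system/docker/docker.img /mnt/user/system/docker/docker_old.img
        # re-enable the Docker service so Unraid creates a fresh docker.img, then
        # reinstall each container from Apps > Previous Apps; the saved templates
        # keep the old settings and appdata is untouched, so the containers
        # resume where they left off.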
  2. Hmmm, it's the next morning and the parity drive still reports 10%, so that's not comforting. The actual parity check is still going, so perhaps that's getting in the way? The syslog does have some messages from yesterday I didn't notice, but nothing since then. I can't remember what time I started the extended SMART check, but it would have been a bit after the parity check, which was around this time frame:
        Nov 13 14:19:04 vault kernel: ata9.00: exception Emask 0x12 SAct 0x200000 SErr 0x280501 action 0x6 frozen
        Nov 13 14:19:04 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
        Nov 13 14:19:05 vault kernel: ata9.00: exception Emask 0x10 SAct 0x20000000 SErr 0x280100 action 0x6 frozen
        Nov 13 14:19:05 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
        Nov 13 14:19:05 vault kernel: ata9.00: exception Emask 0x10 SAct 0x20000 SErr 0x280100 action 0x6 frozen
        Nov 13 14:19:05 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
        Nov 13 14:19:06 vault kernel: ata9.00: exception Emask 0x10 SAct 0x38000 SErr 0x280100 action 0x6 frozen
        Nov 13 14:19:06 vault kernel: ata9.00: irq_stat 0x08000000, interface fatal error
        Nov 13 14:20:02 vault root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share ** Ignored
  3. I started the extended SMART test on the parity drive (16TB) about an hour ago and it's been at 10% the entire time. Is that normal?
     EDIT: The extended SMART test is still at 10% after several hours... starting to get nervous about it.
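     In case it helps anyone else waiting on one of these, this is roughly how I've been checking from the terminal (note that smartctl reports the percent of the test remaining, not completed; /dev/sdX is a placeholder for the parity drive):
        smartctl -a /dev/sdX | grep -A2 "Self-test execution status"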
  4. Ok, I'll do some reading there. I'm not familiar with that process, so thank you for suggesting it. This is a different disk than the one that got me into all this trouble. It never actually failed, but disk1 kept throwing these crazy errors, so I replaced it with another disk I had lying around. I'll run extended SMART tests on all of them. Also, I kicked off a parity run; is it OK to let that run?
  5. Thank you, I ran it in maintenance mode. It only took a minute or two. Now my array came back up mostly problem free. The shares are all back (I haven't validated any data yet), as well as my VMs; however, all my docker containers are missing. The syslog has this section from when I ran it:
        Nov 12 18:47:33 vault kernel: XFS (md1p1): Internal error i != 0 at line 2798 of file fs/xfs/libxfs/xfs_bmap.c. Caller xfs_bmap_add_extent_hole_real+0x528/0x654 [xfs]
        Nov 12 18:47:33 vault kernel: CPU: 3 PID: 20234 Comm: fallocate Tainted: P O 6.1.49-Unraid #1
        Nov 12 18:47:33 vault kernel: Call Trace:
        Nov 12 18:47:33 vault kernel: XFS (md1p1): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_alloc_file_space+0x206/0x246 [xfs]
        Nov 12 18:47:33 vault kernel: CPU: 3 PID: 20234 Comm: fallocate Tainted: P O 6.1.49-Unraid #1
        Nov 12 18:47:33 vault kernel: Call Trace:
        Nov 12 18:47:33 vault root: truncate: cannot open '/mnt/disk1/system/docker/docker.img' for writing: Input/output error
        Nov 12 18:47:33 vault root: mount error
     Looking at that file, it exists, but it's owned by nobody... should I just update chown and chmod to make it owned by root and executable?
        -rw-r--r-- 1 nobody users 21474836480 Nov 13 14:11 docker.img
        root@vault:/mnt/disk1/system/docker#
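     For reference, the repair itself was roughly this shape (a sketch, not necessarily the exact flags I used; the array was started in maintenance mode so /dev/md1p1 exists but isn't mounted):
        xfs_repair -v /dev/md1p1     # first pass, keeps the journal intact
        # only if mounting still fails and xfs_repair keeps refusing over the dirty log:
        # xfs_repair -L /dev/md1p1   # zeroes the log; last resort, can lose recent metadata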
  6. Man.... I think I've really lost everything that was on disk1.... now I'm getting xfs errors on boot, even though the array can start. It's telling me to:
        root@vault:~# xfs_repair /dev/md1p1
        Phase 1 - find and verify superblock...
        Phase 2 - using internal log
                - zero log...
        ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
     Attempting to mount the filesystem does not clear this up, and in fact when I go to /mnt it throws three errors:
        root@vault:~# cd /mnt
        root@vault:/mnt# ls
        /bin/ls: cannot access 'disk1': Input/output error
        /bin/ls: cannot access 'user': Input/output error
        /bin/ls: cannot access 'user0': Input/output error
        cache/  disk1/  disk2/  disk3/  disk4/  usb/  user/  user0/
        root@vault:/mnt#
     Is there anything I can do? I don't want to lose everything. I've started copying what's left on disks 2, 3 and 4... if I run xfs_repair -L it seems like I could lose everything.... Please help
  7. Yes, I added a new edit to the post, and included the diagnostics.
  8. Last night my Unraid server went unresponsive to the web UI, SSH, and even the local console after I plugged a monitor/keyboard in. After hard powering it off (I know), a parity check started but was crazy slow. I checked the SMART status on all the drives and they all checked out. This morning the server is unresponsive again. What should I do?
     EDIT: I also tried tapping the power button, because that would normally power it down cleanly too, but that did not work.
     EDIT2: I was able to get an actual error after the most recent reboot. The log mentions "XFS (md1p1): Internal error" and "XFS (md1p1): Corruption detected. Unmount and run xfs_repair." Is it safe to run xfs_repair? The log is verbose, but it keeps repeating this too:
        [ 314.906880] I/O error, dev loop2, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
        [ 314.906880] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
        [ 314.906890] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
        [ 314.906895] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 1, corrupt 0, gen 0
        [ 314.906900] BTRFS warning (device loop2): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
        [ 314.906903] BTRFS: error (device loop2) in write_all_supers:4370: errno=-5 IO failure (errors while submitting device barriers.)
        [ 314.906933] I/O error, dev loop2, sector 1088608 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
        [ 314.907022] BTRFS info (device loop2: state E): forced readonly
        [ 314.907025] BTRFS: error (device loop2: state EA) in btrfs_sync_log:3198: errno=-5 IO failure
        [ 314.907025] BTRFS error (device loop2: state E): bdev /dev/loop2 errs: wr 0, rd 2, flush 1, corrupt 0, gen 0
        [ 314.908111] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
        [ 314.908117] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 3, flush 1, corrupt 0, gen 0
        [ 314.908138] I/O error, dev loop2, sector 1088608 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
        [ 314.908141] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 4, flush 1, corrupt 0, gen 0
        [ 314.908636] I/O error, dev loop2, sector 564320 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
        [ 314.908644] BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 0, rd 5, flush 1, corrupt 0, gen 0
     EDIT3: I replaced drive1, which seemed to be coming up in the logs a lot, though it showed no errors and SMART checked out OK. This has allowed me to boot and performance appears normal; however, after the data rebuild there are no shares, Docker can't start, and there are no VMs. But yes, now I can get into settings and pull diagnostics.
     vault-diagnostics-20231112-0706.zip
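     For context on the loop2 lines: loop devices are backed by image files (on a stock Unraid setup loop2 is usually docker.img), and something like this shows which file a given loop device maps to:
        losetup -a | grep loop2    # lists each loop device with its backing file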
  9. Great advice. I was able to find some threads suggesting that a reboot would work in this situation, and it did; all my shares returned on reboot. So, to summarize: I don't think I accidentally deleted it after all, I think there was just an issue querying the shares as the cache drive filled up.
  10. I was in the process of trying to move stuff around from my cache drive because it was full, and somehow my brain deleted it. I see all my data is still there, but obviously I don't remember what all my shares were mapped to. What do I do?
      EDIT: I tried to add back some shares I knew by looking at their docker paths, like /mnt/user/media, however it doesn't seem to actually get added. Possibly because the folder already exists?
      EDIT2: I think the issue wasn't the appdata folder being deleted... I think it disappeared because the cache drive filled up and nothing could write to it. So I guess it's solved?
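      A quick sanity check that it really was a full pool rather than deleted shares (the path assumes a pool named cache):
        df -h /mnt/cache    # 100% use here would explain shares and appdata "disappearing"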
  11. Oh, that's probably the issue then. It is in host mode. I'll mess with that and see if that fixes it.
  12. Hello, I've got two docker containers both trying to listen on 8443: unifi-controller and MineOS-Node. I figured I'd change the MineOS web port to 8444 via the Unraid edit page, and while it shows the change, the container still spins up on 8443. My apologies if this is a dumb question, but how can I change that value?
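      For anyone finding this later: the catch here seems to have been the network type. With Host networking the container binds its ports directly and the Unraid port field is ignored; in Bridge mode the port mapping applies. Roughly equivalent to this (the image name is just a placeholder):
        docker run -d --name MineOS-Node --net=bridge -p 8444:8443 some/mineos-image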
  13. I need some education on how to edit game files for the Ark container. I added mods and made a few changes to the GameUserSettings.ini file; however, upon restart none of them seem to be saved. I've done some searching but have so far come up short; I don't really know what I'm searching for, I guess. Other relevant info: I'm using the ich777/steamcmd container on the current version. Ark works great, just all on default vanilla settings. Thanks in advance
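      One common cause I've since read about (so take this as a hedged guess): the Ark server writes GameUserSettings.ini back out when it shuts down, so edits made while it's running get overwritten. If that's what's happening, the rough fix is to stop the container before editing (the container name and appdata path below are only guesses for my setup):
        docker stop ARK-SE
        nano /mnt/user/appdata/ark-se/ShooterGame/Saved/Config/LinuxServer/GameUserSettings.ini
        docker start ARK-SE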
  14. I replaced the disk. Thanks everyone for the help.
  15. The extended SMART test reported errors, but Unraid still shows the drive as healthy. Very strange. Anyway, I've got low activity on the system this morning, so I'm going to replace it. Any chance someone could explain what I should be looking at in that extended SMART test?
  16. WDC_WD20EFRX-68EUZN0_WD-WCC4M2ZPD2U2-20200729-2059.txt
  17. The short SMART test completed without error.
  18. It's still stuck at 90%, actually. I do know how; I had one drive that I knew was throwing SMART errors, which I used to practice with before I sent it for recycling. Thanks for asking though.
  19. That would be great, thank you. My apologies for the confusion.
  20. Regarding the connection issues, like a sketchy SATA cable or something? Regarding the replacement, yeah I've got a spare drive. Should I swap it out, or take a wait and see approach?
  21. Well, the SMART test has been stuck at 90% for quite a while now, so I'm just going to post the diagnostics here. The syslog looks like it's full of read errors. What does that mean?
      vault-diagnostics-20200729-1251.zip
  22. Thanks. I am currently running the short SMART test on the drive; it looks like the diagnostics collect that data, so I'll wait for it to finish.
  23. I've been copying about 7TB of data over the last few days, and it seems not to be utilizing the actual network speed. I'm just connected from one physical machine to the Unraid Samba share and hit copy/paste. I'm worried I've got a setting wrong somewhere and that streaming media will also be subject to this. I'm using Ubiquiti 8-port switches and Cat5e.
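      Before blaming SMB, I'd like to rule out the wire itself; something like iperf3 (installable on Unraid via a plugin such as NerdTools) measures the raw link speed, copy protocol aside:
        iperf3 -s                      # on the Unraid server
        iperf3 -c <server-ip> -t 30    # on the desktop doing the copy; ~940 Mbit/s means gigabit is fine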
  24. Woke up this morning and noticed this disk has thrown 6974 errors. It still shows green. https://wiki.unraid.net/Troubleshooting#Hard_drive_failures This FAQ suggests a non-zero error count is nothing to worry about, but 6974 seems a bit more worrisome than merely "non-zero". Should I get a replacement? For what it's worth, I'm copying a ton of data to the array right now (and have been for the last few days).
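      For my own notes: the error count on the Main page is read/write errors Unraid hit on that disk, which isn't the same thing as the drive's own SMART verdict. Checking the counters that usually matter looks roughly like this (sdX is a placeholder):
        smartctl -a /dev/sdX | grep -Ei "reallocated|pending|uncorrect|crc"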
  25. I've been slowly moving some old Hyper-V VMs to Unraid. In each case I was selecting the Emulated CPU model. However, I just noticed that none of the VMs have Emulated set; they are all set to host passthrough. Worse, I can't change them back to Emulated, as it throws an error:
      VM creation error: XML error: Non-empty feature list specified without CPU model
      I've been searching the forum and it seems other people get this error, but while doing the opposite. Am I doing something wrong? How do I avoid passing the host CPU through?
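      Following up in case it helps someone with the same XML error: as far as I can tell it means the definition still lists <feature> entries while no CPU <model> is set. If the GUI refuses the change, the definition can be edited directly (the VM name is a placeholder and the <cpu> block is only a sketch):
        virsh edit Win10-VM
        # change the CPU element from host passthrough, e.g.
        #   <cpu mode='host-passthrough' check='none'/>
        # to an emulated model, e.g.
        #   <cpu mode='custom' match='exact' check='none'>
        #     <model fallback='allow'>qemu64</model>
        #   </cpu>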