FlexGunship

  1. No overclocking. It's a stock i7-6700 (non-K) with a stock cooler on a stock Dell mobo. I haven't run memtest recently, but I HAVE run memtest since the issue started. I have also removed each stick in turn (running with 48GB at a time) and noted a crash in every case. Also... I bought this processor, mobo, and RAM to address the crashing issue. I had the same issue on an i3-6100 previously, with 32GB of different RAM on an HP motherboard. I also bought an LSI SAS card to try to solve this issue, as the Dell mobo has a Marvell data controller, which is notorious for not working in unRAID (I'm told). To be fair, I don't know for a *fact* that the failure mode is identical to the i3-6100 HP days, but the rate of failure and the manifestation are the same.
  2. syslog I guess I had it running since last year... so it was huge. I trimmed anything before 2/12 for the purpose of this upload. If you need more of the log... or need me to start it clean, please let me know.
  3. Just updating that the syslog server is enabled, and I'm waiting for the next crash at this point.
  4. Polite bump. I'm still having this issue about every 24 to 36 hours. No lost data, but I would really appreciate any other insights anyone has.
  5. First question first - it seems to be an artifact of early PCIe SSDs. It's actually a single device on a single PCIe slot, but internally it has 4 devices mounted in RAID. In my experience, unRAID recognizes a "head" device and 3 "others". If I mount one of the "others" I get 240GB and a single device - if I mount the "head" device, the internal firmware of the SSD kicks in and mounts all devices as a single 1TB device. It took a while to figure out how to make it work; I don't pretend to understand the internal machinations. EDIT: Anyway, it made the cache pool tricky to manage when the array went down. So, I just nixed it. No deeper meaning. If, for any reason, you believe this is contributing, I can pull it. Previously my docker.img was on the array. Likewise, I can also collocate the appdata folder if you think it could be related. But the problem, again, existed long before this recent move to the non-array device. I've recently switched to ipvlan - I didn't notice a change for better or for worse afterwards. DOUBLE EDIT: Was it clear what the mode of failure was from the first syslog? It's not clear to me that there was evidence of failure in the diagnostics ZIP.
  6. Lol... do you know how many times I've mentally admonished someone for forgetting their diag.zip? My bad - apologies. athena-diagnostics-20230203-2333.zip
  7. syslog.txt Hi all, I've been battling this problem for a couple of years now, and I've never had a true resolution.
     Problem statement:
       • If running Docker (with containers running, or not), eventually my system will hard lock on a page fault or (rarely) a kernel panic
       • It doesn't seem to matter which containers I run, but ones which access the array more, or seem to use more computational power, will speed up the failure
       • I can only recover by physically powering off the server, rebooting, and letting the parity check run again - then I get another couple of days of use
     In the past, I've tried the following:
       • Remade the flash drive (I've used a total of 4 so far)
       • Blown away the docker image
       • Put the docker image on a single disk
       • Put the docker image on a disk that's not part of the array
       • Enabled or revoked privileged mode for every container
       • Limited the memory of each container such that the total sum is less than half my physical memory (64GB)
       • Swapped the mainboard and processor
       • Put all array disks on an LSI SAS controller
       • Swapped memory
       • Upgraded the power supply
     So, syslog is attached - hoping someone can help here. My next step is to pull one stick at a time of the 4 DIMMs in the system and assume one stick is bad. I don't have evidence of that, and I've already swapped it... but, evidence seems kind of RAMy. Thanks in advance. EDIT: I didn't immediately put this in the Docker support area because the last time I asked for help, someone pointed me towards a corrupted file system. It didn't resolve the issue, but that person was correct -- I don't know if the Docker thing is a symptom or the root cause.
  8. Diagnostics was my first stop. I'll set up a syslog server. I think I need to stream that log to another machine on the network to debug this, right? Will it work if I write the log to the flash and the system crashes during a write?
  9. I've been using Unraid for ~5 years. I almost always have a smoking gun when I have an issue, but this time, I'm totally lost. Hardware specs at the end. Basically, after a few days, my system goes unresponsive. I can't connect to the webui, or via ssh... and after connecting a monitor, the monitor display goes blank (not that there's simply nothing displayed; the output actually goes dead, and the monitor reports that it's searching for a signal). I can reboot, go through a parity check, and then things are fine for another few days. Logs are useless since I can't get to them. I'm not sure if there's an improved logging plugin that will keep crash files through a power-cycle -- that's what I really need. I thought it might've been a flaky Seagate drive (showing command timeouts), but after replacing and rebuilding -- same issue persists. Okay, system specs:
     Model: Custom
     M/B: Dell Inc. 09WH54 Version A00 - s/n: /84V2QD2/CN7220066A010Q/
     BIOS: Dell Inc. Version 1.3.6. Dated: 05/26/2016
     CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz
     HVM: Enabled
     IOMMU: Enabled
     Cache: 256 KiB, 256 KiB, 1 MB, 8 MB
     Memory: 64 GiB DDR4 (max. installable capacity 64 GiB)
     Network: bond0: fault-tolerance (active-backup), mtu 1500
              eth0: 1000 Mbps, full duplex, mtu 1500
     Kernel: Linux 5.15.46-Unraid x86_64
     OpenSSL: 1.1.1o
     Connected drives are in the attached screenshot. Primary disk controller is an LSI MPT SAS2. The only important note is that the 4 OCZ drives are actually a single PCIe-attached SSD. It presents as a single 1TB drive. It's an unassigned device that I only use for quick transfers (not a cache drive) and Plex transcoding. Unraid is basically agnostic to it. Any idea how I can further diagnose this?
  10. Hey all, that seemed to improve things, but I've seen the issue again. Sorry for the delay. I moved 3000 miles, my daughter was born, and I started a new job.
  11. Test in progress. Transcoding some Plex, copying some trivial files, etc. Will report back.
  12. Put it in maintenance mode and ran it on disk2 (removed the -n argument). Results below:
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
              - scan filesystem freespace and inode maps...
              - found root inode chunk
      Phase 3 - for each AG...
              - scan and clear agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 2
              - agno = 0
              - agno = 1
              - agno = 3
      Phase 5 - rebuild AG headers and trees...
              - reset superblock...
      Phase 6 - check inode connectivity...
              - resetting contents of realtime bitmap and summary inodes
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify and correct link counts...
      done
  13. Tried many times... the system hardlocks pretty aggressively. Even locally, with keyboard and mouse on the server, you can't navigate. The syslog is the best I've gotten so far.
  14. Same problem... diag.zip doesn't help since it refreshes on boot. I have the syslog attached below: syslog
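
For anyone following the "stream the log to another machine" idea from post 8: below is a rough sketch of a bare-bones receiver, not an official Unraid tool. It assumes the server has been set up to forward its syslog over UDP to the machine running this script. The port number (5514), the output file name, and the function names `append_message` and `serve` are all made up for illustration. Each message is flushed and fsynced as soon as it arrives, so the last lines before a hard lock should already be on the second machine's disk.

```python
import os
import socket
from datetime import datetime, timezone


def append_message(data: bytes, sender_ip: str, log_path: str) -> str:
    """Append one received syslog datagram to log_path and force it to disk."""
    # Syslog payloads are normally text; replace undecodable bytes instead of failing.
    text = data.decode("utf-8", errors="replace").rstrip("\r\n")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} {sender_ip} {text}\n"
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(line)
        log_file.flush()
        # fsync so the line is on disk even if the receiver itself loses power.
        os.fsync(log_file.fileno())
    return line


def serve(log_path: str, host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for UDP syslog datagrams forever (port 5514 is an assumed, unprivileged port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            append_message(data, sender_ip, log_path)
```

Calling `serve("unraid-remote-syslog.log")` on the second machine starts the listener; it runs until interrupted. UDP delivery is not guaranteed, so a message sent at the exact moment of the lockup can still be lost.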