Unraid

silasfelinus

Members

  1. Yeah, I was feeling overly optimistic. The server crashed almost immediately after I typed that last message. I'm now working on a theory that I was effectively DoSing myself with a personal project that was using an old build and hitting my mariadb container with too many log updates. If so, then killing the old server will have fixed my problem. 5 hours of uptime and counting. If not, I have the server logs configured and I will update.
  2. I’m probably just feeling overly optimistic, but I haven’t had another crash since the afternoon. I’ve gone longer, but I’ve got a feeling from the general responsiveness of my containers that one of the ssd drives I removed was the culprit (I suspect it was the 1tb drive that gave CRC errors on disconnect, though I would previously have wagered that the older 480gb drive was likelier to be at fault). In any case, I’m marking this as solved to save people unneeded effort, and I’ll install the syslog server and report back if my hope was unfounded.
  3. After two more crashes, I've removed two cache disks that were of questionable reliability and were being used exclusively for docker container processes. There were a couple hundred crc errors when I removed one of the disks, and now the system is running again, but with too little uptime to say anything conclusive. If this doesn't fix it, I'm not currently sure where I'll look next.
  4. Including the most recent diagnostics after the aforementioned reboot. alexandria-diagnostics-20241004-1344.zip
  5. As a data point, the server crashed again while I was typing this. Uptime was about 45 minutes. Accessing the NAS directly, the non-gui login prompt was still available, but it accepted root as the login, gave a 60-second timeout error, and then popped back to the login prompt without asking for a password. My server still claims to be online via Unraid Connect. [EDIT: I attempted to shut down via a soft reset and got the 90-second graceful notice, but 10 minutes later the message was hung on "Shutting down" and I forced a reset.]
  6. I've been experiencing server instability for about a month or more, with what feels like escalating frequency. At this point, I suspect a hardware problem, but I would appreciate any insight about where to look next for troubleshooting.
     Symptoms: after anywhere from 6-48 hours of uptime, the server becomes unreachable. If I can access unraid at all, the docker tab will claim to have crashed, but some docker services may still be reachable online (which is extremely odd). Sometimes I cannot access the web portal at all, but the server will claim to be online via Unraid Connect.
     What I have done:
       • deleted my docker image and reinstalled apps
       • rebuilt my boot drive (after finding a boot sector error, which I had hoped was the source)
       • disk checks in maintenance mode for all array disks, with and then without "-n" (x2)
     Given the regrettable regularity with which I've force-reset my system over the last month, file and docker corruptions have not been a surprise, but they are also making it challenging to identify the root cause of the problem. Today I had to force reset again. I saved my diagnostics and came here to see if anyone has any suggestions. This forum has been exceedingly helpful in the past. Thank you for your time.
     alexandria-diagnostics-20241004-1238.zip
  7. I haven't had an error since those initial IO's, so I've marked that as solved. I'll admit feeling a bit chagrined if it was a cable issue all along, but I'm so hoping that's all it was, thanks!
  8. Relocated ssd to a new spot on array cables next to the other ssds. It appeared on reboot, no errors reported on Main and all drives accounted for, but now log shows new I/O errors, possibly after triggering Move:
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/10:c8:70:5c:8f/00:00:23:00:00/40 tag 25 ncq dma 8192 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:d0:88:5c:8f/00:00:23:00:00/40 tag 26 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:d8:98:5c:8f/00:00:23:00:00/40 tag 27 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/10:e0:b0:5c:8f/00:00:23:00:00/40 tag 28 ncq dma 8192 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:e8:c8:5c:8f/00:00:23:00:00/40 tag 29 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: WRITE FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 61/40:f0:80:bc:e1/00:00:15:00:00/40 tag 30 ncq dma 32768 out
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:f8:10:5c:8f/00:00:23:00:00/40 tag 31 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3: hard resetting link
     Nov 13 11:10:29 alexandria kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Nov 13 11:10:29 alexandria kernel: ata3.00: configured for UDMA/133
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 CDB: opcode=0x28 28 00 23 8f 5b d8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597720 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 CDB: opcode=0x28 28 00 23 8f 5b e8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597736 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 CDB: opcode=0x28 28 00 23 8f 5c 20 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597792 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 CDB: opcode=0x28 28 00 23 8f 5c 38 00 00 20 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597816 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 CDB: opcode=0x28 28 00 23 8f 5c 60 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597856 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 CDB: opcode=0x28 28 00 23 8f 5c 70 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597872 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 CDB: opcode=0x28 28 00 23 8f 5c 88 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597896 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 CDB: opcode=0x28 28 00 23 8f 5c 98 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 CDB: opcode=0x28 28 00 23 8f 5c b0 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597936 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 CDB: opcode=0x28 28 00 23 8f 5c c8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: ata3: EH complete
     Nov 13 11:32:01 alexandria Plugin Auto Update: Checking for available plugin updates
     Nov 13 11:32:01 alexandria Docker Auto Update: Community Applications Docker Autoupdate running
     Nov 13 11:32:01 alexandria Docker Auto Update: Checking for available updates
     Nov 13 11:32:07 alexandria Plugin Auto Update: Checking for language updates
     Nov 13 11:32:07 alexandria Plugin Auto Update: Community Applications Plugin Auto Update finished
     Nov 13 11:32:49 alexandria Docker Auto Update: No updates will be installed
     Nov 13 11:33:10 alexandria webGUI: Successful login user root from fe80::729f:a1b:a061:3faf
     alexandria-diagnostics-20231113-1129.zip
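
For anyone cross-checking the log in the post above: the CDB lines show opcode 0x28, which is the SCSI READ(10) command. In that layout, bytes 2-5 hold the starting block address and bytes 7-8 the transfer length, both big-endian. Below is a minimal sketch, assuming only that layout; the function name `decode_read10` is made up for illustration. For the tag#20 entry it yields the same number as the sector in the following "I/O error" line.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) from a 10-byte READ(10) CDB.

    decode_read10 is a hypothetical helper name, not part of any tool.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDB copied from the tag#20 line in the log
lba, blocks = decode_read10("28 00 23 8f 5b d8 00 00 08 00")
print(lba, blocks)  # 596597720 8
```

All of the failed reads in this burst decode to addresses within a few hundred sectors of each other, so they look like one stalled batch of queued requests rather than errors scattered across the drive.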
  9. After replacing a string of older hard drives, removing one drive and reducing my array (possibly introducing some duplicate files in the process of trying to save data), and re-mapping cables (19 disk array, lots of cables), the system has finally been running without hard drive errors reported on Main, but today I noticed these in the log after a bunch of dockers refused to load via Traefik.
     The log is repeating these errors, with a long block of the rd error messages ending in the format: "rd 2044759, flush 0, corrupt 0, gen 0"
     Nov 13 09:41:54 alexandria kernel: BTRFS error (device sdm1: state EAL): bdev /dev/sdm1 errs: wr 44, rd 2044759, flush 0, corrupt 0, gen 0
     Nov 13 09:41:56 alexandria kernel: vethaf9fa70: renamed from eth0
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Interface vethbc0943c.IPv6 no longer relevant for mDNS.
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Leaving mDNS multicast group on interface vethbc0943c.IPv6 with address fe80::705a:8ff:fe30:bca5.
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria kernel: device vethbc0943c left promiscuous mode
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Withdrawing address record for fe80::705a:8ff:fe30:bca5 on vethbc0943c.
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered blocking state
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered disabled state
     Nov 13 09:41:57 alexandria kernel: device veth64d52cd entered promiscuous mode
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered blocking state
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered forwarding state
     Nov 13 09:41:57 alexandria kernel: eth0: renamed from veth304c008
     Nov 13 09:41:57 alexandria kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth64d52cd: link becomes ready
     Nov 13 09:41:58 alexandria kernel: btrfs_dev_stat_print_on_error: 142 callbacks suppressed
     Nov 13 09:41:58 alexandria kernel: BTRFS error (device sdm1: state EAL): bdev /dev/sdm1 errs: wr 44, rd 2044902, flush 0, corrupt 0, gen 0
     Thank you for any advice! Diagnostics attached.
     alexandria-diagnostics-20231113-0934.zip
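
The "errs:" figures in the BTRFS lines above are running totals for the device, so the useful signal is how fast they grow rather than their absolute size. Here is a rough sketch that pulls the counters out of two such lines and takes the difference. The regular expression is written against the exact line format quoted in this post, and `parse_errs` is a hypothetical helper name, not part of any btrfs tooling.

```python
import re

# Pattern written against the "bdev ... errs: ..." format quoted in the post
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(line):
    """Return (device, {counter: value}) for a BTRFS errs line, else None."""
    m = ERRS.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m["dev"], counters


# The first and last BTRFS lines from the log excerpt
first = parse_errs(
    "Nov 13 09:41:54 alexandria kernel: BTRFS error (device sdm1: state EAL): "
    "bdev /dev/sdm1 errs: wr 44, rd 2044759, flush 0, corrupt 0, gen 0"
)
last = parse_errs(
    "Nov 13 09:41:58 alexandria kernel: BTRFS error (device sdm1: state EAL): "
    "bdev /dev/sdm1 errs: wr 44, rd 2044902, flush 0, corrupt 0, gen 0"
)

delta = {name: last[1][name] - first[1][name] for name in first[1]}
print(first[0], delta)
# /dev/sdm1 {'wr': 0, 'rd': 143, 'flush': 0, 'corrupt': 0, 'gen': 0}
```

In this excerpt the read-error count on /dev/sdm1 rises by 143 within about four seconds while the other counters stay flat, which lines up with the "142 callbacks suppressed" message between the two entries.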
  10. I think we're good! I was worried when I switched cabling and the new drive was suddenly "unmountable". Instead, I started in maintenance and ran short smart tests on all the drives, Disk 9 took a while but finished (for the first time successfully), and now the drive is being rebuilt without errors (so far), and I'm even seeing data that went missing last night and I'd already written off. Thank you for the help! You saved my day!
  11. That tracks! I just checked cabling. Disk 9 and Disk 12 were connected on the same strand next to each other. I've got my NAS maxed with 16 drives in a 15 drive case (plus 4 SSDs), and cabling is a bit of a challenge. I even disconnected the cabling around 12 when I put in the last drive, and I'm fairly certain I didn't put that section back in the exact same cable layout. I'm waiting on the reboot and a fresh report.... Thank you for the response, it's renewed a smidgen of hope.
  12. I've replaced two hard drives in approximately two weeks, and yesterday a third disk died. I started a data rebuild, but had 5 million+ read errors on a different disk (Disk 12) and a flood of I/O errors ("xfs_repair: read failed: Input/output error can't read data block 0 for directory inode 3180500740 error 5") on the Disk 9 that is supposed to hold the emulated data. I stopped my data rebuild (to Disk 9) at 45% this morning after the errors on Disk 12 appeared. I ran short smart tests on everything, which found the I/O errors on 9 and would not let me complete the xfs-repair. Possibly regrettably: there was a jammed log on 12, and I saw the warning that I could lose "valuable metadata", but I deleted the metadata and had it complete xfs-repair. I just ran the disks in maintenance mode with a disk check, but it said it would take 3 days to complete, and those error messages on 12 reappeared. I stopped the test, and am now running an extended self-test on Disk 12. Any advice is appreciated. alexandria-diagnostics-20230804-0809.zip
  13. Excellent advice. I'll be doing so right after I print this out and show it to my wife. Just kidding (mostly). I hadn't actually thought my 32 GB DDR4 could be a bottleneck, but it makes sense that my 50+ containers could be overtaxing it. That's a remarkably simple solution. I'll throttle down the containers as my default, and report back if the problems persist after I scrape together the upgrade. I'd honestly missed your last line at first, not realizing it probably had the fix. Thanks for the help.
  14. I have removed nerdpack, and disabled deep scan and anything that said it could be resource intensive on my komga libraries. I'm unclear on what you meant by the mcelog package; was it something in Nerdpack?
     >Your best bet would be to boot in safe mode, install ca manually and start adding back plugins and docker containers to see if one of them is causing the problem.
     I really wish I could create a more structured environment to test, but the problems take even longer to appear when fewer apps are running, and running a hobbled server for the length of time it would take to test feels untenable. Thank you for the wise advice I may one day rue ignoring. At this point I have the network running and everything seems stable. I'm going to keep monitoring, and watch for the next spike and see what I see in the logs.
  15. Unraid 6.11. I'm getting out of memory issues if I run all my apps for over 1-2 days. The server was last running for 2 days, 18 hours, and I woke up to the "Out of memory errors detected on your server" error and instructions to post to this forum [Definitely not the first time, but this time I'm following the advice. Apparently, it takes me a while to ask for help]. 1.5 days ago, I had every app running, my CPU load spiked to 100%, and it settled once I killed Komga (a go-to troubleshooting step unfortunately, I love the app but it's clearly got a problem independent of this one). I kept Komga off, and the CPU load dropped to normal operating levels and continued that way, as far as I know, until sometime last night. Diagnostics attached, please let me know if I can offer any more info. Thanks! alexandria-diagnostics-20221006-0635.zip
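
Since the out-of-memory warning in the post above only shows up after a day or two, one way to watch for the build-up is to sample memory headroom periodically. A small sketch follows, assuming the standard Linux /proc/meminfo text format ("Name: value kB"). The helper names are hypothetical, and the sample numbers are illustrative only, not taken from the attached diagnostics.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])  # most fields are in kB
    return values


def available_percent(info):
    """Share of total memory the kernel reports as available, in percent."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


# Illustrative numbers only, not from the server in this thread
sample = "MemTotal:       32000000 kB\nMemAvailable:    1600000 kB\n"
info = parse_meminfo(sample)
print(f"{available_percent(info):.1f}% available")  # 5.0% available
```

On the server itself the text would come from reading /proc/meminfo, and logging the percentage every few minutes would show whether available memory drains steadily or drops when a particular container starts working.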
