silasfelinus

  • Posts: 27
  • Reputation: 2
  • Achievements: Noob (1/14)

  1. I haven't had an error since those initial I/O errors, so I've marked this as solved. I'll admit I'd feel a bit chagrined if it was a cable issue all along, but I'm really hoping that's all it was. Thanks!
  2. I relocated the SSD to a new spot on the array cables, next to the other SSDs. It appeared on reboot, with no errors reported on Main and all drives accounted for, but now the log shows new I/O errors, possibly after I triggered Move:
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/10:c8:70:5c:8f/00:00:23:00:00/40 tag 25 ncq dma 8192 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:d0:88:5c:8f/00:00:23:00:00/40 tag 26 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:d8:98:5c:8f/00:00:23:00:00/40 tag 27 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/10:e0:b0:5c:8f/00:00:23:00:00/40 tag 28 ncq dma 8192 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:e8:c8:5c:8f/00:00:23:00:00/40 tag 29 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: WRITE FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 61/40:f0:80:bc:e1/00:00:15:00:00/40 tag 30 ncq dma 32768 out
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3.00: failed command: READ FPDMA QUEUED
     Nov 13 11:10:29 alexandria kernel: ata3.00: cmd 60/08:f8:10:5c:8f/00:00:23:00:00/40 tag 31 ncq dma 4096 in
     Nov 13 11:10:29 alexandria kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Nov 13 11:10:29 alexandria kernel: ata3.00: status: { DRDY }
     Nov 13 11:10:29 alexandria kernel: ata3: hard resetting link
     Nov 13 11:10:29 alexandria kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Nov 13 11:10:29 alexandria kernel: ata3.00: configured for UDMA/133
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#20 CDB: opcode=0x28 28 00 23 8f 5b d8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597720 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#21 CDB: opcode=0x28 28 00 23 8f 5b e8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597736 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#22 CDB: opcode=0x28 28 00 23 8f 5c 20 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597792 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#23 CDB: opcode=0x28 28 00 23 8f 5c 38 00 00 20 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597816 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#24 CDB: opcode=0x28 28 00 23 8f 5c 60 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597856 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#25 CDB: opcode=0x28 28 00 23 8f 5c 70 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597872 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#26 CDB: opcode=0x28 28 00 23 8f 5c 88 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597896 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#27 CDB: opcode=0x28 28 00 23 8f 5c 98 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#28 CDB: opcode=0x28 28 00 23 8f 5c b0 00 00 10 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597936 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=30s
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 Sense Key : 0x5 [current]
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 ASC=0x21 ASCQ=0x4
     Nov 13 11:10:29 alexandria kernel: sd 4:0:0:0: [sdd] tag#29 CDB: opcode=0x28 28 00 23 8f 5c c8 00 00 08 00
     Nov 13 11:10:29 alexandria kernel: I/O error, dev sdd, sector 596597960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Nov 13 11:10:29 alexandria kernel: ata3: EH complete
     Nov 13 11:32:01 alexandria Plugin Auto Update: Checking for available plugin updates
     Nov 13 11:32:01 alexandria Docker Auto Update: Community Applications Docker Autoupdate running
     Nov 13 11:32:01 alexandria Docker Auto Update: Checking for available updates
     Nov 13 11:32:07 alexandria Plugin Auto Update: Checking for language updates
     Nov 13 11:32:07 alexandria Plugin Auto Update: Community Applications Plugin Auto Update finished
     Nov 13 11:32:49 alexandria Docker Auto Update: No updates will be installed
     Nov 13 11:33:10 alexandria webGUI: Successful login user root from fe80::729f:a1b:a061:3faf
     alexandria-diagnostics-20231113-1129.zip
  3. After replacing a string of older hard drives, removing one drive and shrinking my array (possibly introducing some duplicate files while trying to save data), and re-mapping cables (19-disk array, lots of cables), the system has finally been running without hard drive errors reported on Main. Today, though, I noticed these in the log after a bunch of Docker containers refused to load via Traefik. The log keeps repeating these errors, with a long block of read-error messages ending in the format "rd 2044759, flush 0, corrupt 0, gen 0":
     Nov 13 09:41:54 alexandria kernel: BTRFS error (device sdm1: state EAL): bdev /dev/sdm1 errs: wr 44, rd 2044759, flush 0, corrupt 0, gen 0
     Nov 13 09:41:56 alexandria kernel: vethaf9fa70: renamed from eth0
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Interface vethbc0943c.IPv6 no longer relevant for mDNS.
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Leaving mDNS multicast group on interface vethbc0943c.IPv6 with address fe80::705a:8ff:fe30:bca5.
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria kernel: device vethbc0943c left promiscuous mode
     Nov 13 09:41:56 alexandria kernel: br-de846b490fb5: port 13(vethbc0943c) entered disabled state
     Nov 13 09:41:56 alexandria avahi-daemon[7877]: Withdrawing address record for fe80::705a:8ff:fe30:bca5 on vethbc0943c.
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered blocking state
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered disabled state
     Nov 13 09:41:57 alexandria kernel: device veth64d52cd entered promiscuous mode
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered blocking state
     Nov 13 09:41:57 alexandria kernel: br-de846b490fb5: port 13(veth64d52cd) entered forwarding state
     Nov 13 09:41:57 alexandria kernel: eth0: renamed from veth304c008
     Nov 13 09:41:57 alexandria kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth64d52cd: link becomes ready
     Nov 13 09:41:58 alexandria kernel: btrfs_dev_stat_print_on_error: 142 callbacks suppressed
     Nov 13 09:41:58 alexandria kernel: BTRFS error (device sdm1: state EAL): bdev /dev/sdm1 errs: wr 44, rd 2044902, flush 0, corrupt 0, gen 0
     Thank you for any advice! Diagnostics attached: alexandria-diagnostics-20231113-0934.zip
  4. I think we're good! I was worried when I switched cabling and the new drive was suddenly "unmountable". Instead, I started in maintenance mode and ran short SMART tests on all the drives. Disk 9 took a while but finished (successfully, for the first time), and now the drive is being rebuilt without errors (so far). I'm even seeing data that went missing last night and that I'd already written off. Thank you for the help! You saved my day!
  5. That tracks! I just checked the cabling: Disk 9 and Disk 12 were connected on the same strand, next to each other. I've got my NAS maxed out with 16 drives in a 15-drive case (plus 4 SSDs), so cabling is a bit of a challenge. I even disconnected the cabling around Disk 12 when I put in the last drive, and I'm fairly certain I didn't put that section back in exactly the same layout. I'm waiting on the reboot and a fresh report. Thank you for the response; it's renewed a smidgen of hope.
  6. I've replaced two hard drives in roughly two weeks, and yesterday a third disk died. I started a data rebuild, but got 5 million+ read errors on a different disk, Disk 12, and a flood of I/O errors ("xfs_repair: read failed: Input/output error can't read data block 0 for directory inode 3180500740 error 5") on Disk 9, which is supposed to hold the emulated data. I stopped the data rebuild (to Disk 9) at 45% this morning after the errors on Disk 12 appeared. I ran short SMART tests on everything; they found the I/O errors on Disk 9, and I could not complete xfs_repair there. Possibly regrettably: there was a jammed log on Disk 12, and although I saw the warning that I could lose "valuable metadata", I deleted the metadata and had xfs_repair complete. I just ran a disk check in maintenance mode, but it said it would take 3 days to complete, and those error messages on Disk 12 reappeared. I stopped the check and am now running an extended self-test on Disk 12. Any advice is appreciated. alexandria-diagnostics-20230804-0809.zip
  7. Excellent advice. I'll be doing so right after I print this out and show it to my wife. Just kidding (mostly). I hadn't actually thought my 32 GB of DDR4 could be a bottleneck, but it makes sense that my 50+ containers could be overtaxing it. That's a remarkably simple solution. I'll throttle down the containers by default, and report back if the problems persist after I scrape together the upgrade. I'd honestly missed your last line at first, not realizing it probably had the fix. Thanks for the help.
  8. I have removed NerdPack and disabled deep scan and anything else that said it could be resource-intensive on my Komga libraries. I'm unclear on what you meant by the mcelog package; was it something in NerdPack?
     >Your best bet would be to boot in safe mode, install ca manually and start adding back plugins and docker containers to see if one of them is causing the problem.
     I really wish I could create a more structured environment to test in, but the problems take even longer to appear when fewer apps are running, and running a hobbled server for as long as testing would take feels untenable. Thank you for the wise advice I may one day rue ignoring. At this point I have the network running and everything seems stable. I'm going to keep monitoring, watch for the next spike, and see what shows up in the logs.
  9. Unraid 6.11. I'm getting out-of-memory issues if I run all my apps for more than 1-2 days. The server had last been running for 2 days, 18 hours, and I woke up to the "Out of memory errors detected on your server" error and instructions to post on this forum. [Definitely not the first time, but this time I'm following the advice. Apparently it takes me a while to ask for help.] 1.5 days ago, I had every app running, my CPU load spiked to 100%, and it settled once I killed Komga (unfortunately a go-to troubleshooting step; I love the app, but it clearly has a problem independent of this one). I kept Komga off, and the CPU load dropped to normal operating levels and, as far as I know, stayed there until sometime last night. Diagnostics attached; please let me know if I can offer any more info. Thanks! alexandria-diagnostics-20221006-0635.zip
  10. Postmortem: I misdiagnosed my problem. I had an influx of files that I couldn't access via the SMB share but could navigate in Krusader. I knew I'd copied from my cache files the day before and panicked, heading into emergency mode. As it happens, I was only seeing the problem because the files had mismatched case and I didn't have my lowercase settings configured properly for the drives when viewing them in Windows. When I first set up the server, everything came in lowercase, but the new files grabbed by Sab and QB were mixed case, and the timing unfortunately coincided with my diagnosis. And the cache files: yep, they were cache all right. From my user-created "network_cache" folder under mnt/user/network_cache. If I felt foolish yesterday, today I'm looking over the damage I caused trying to "fix" the problem (including learning how *not* to restore from backup), and I'm shaking my head. All learning lessons, of course. These are the battle stories I'll get to trot out at the server admin retirement home, yeah? Thanks for the assist, even though I reported the wrong problem!
  11. Thanks so much for replying! Nothing looks like zero length, just unreachable. Last question: is there any quick trick for discovering which files were affected, other than trying to copy everything and seeing what gives an error?
  12. Unraid 6.9.1. As usual, I learned my lesson after the fact. I was trying to deal with some obstinate cache folders that wouldn't leave, and I stupidly used Krusader to put them in user folders. I even knew that I mustn't combine user and disk folders, but foolishly wasn't thinking that "cache" was equivalent to "disk". So... now what? Sadly, a number of those files were queued qBittorrent downloads that were then moved to various directories, which makes it all the more awkward. For what it's worth, the individual files are all replaceable (I've already mentally written them off), and I have no problem with stopping my array, rebuilding parity, or anything else. Currently, I can view the files and parent directories on an SMB share, and can view inside folders using Krusader, but cannot open anything. Should I just delete everything I moved? Most importantly: have I done anything more permanently damaging to my file system by introducing this recursion? Thank you for your help! EDIT: I was wrong about my diagnosis. I moved files from a user-created cache folder to a user folder, which was completely safe. I was actually seeing symptoms of the case-handling mismatch between Windows and Linux. I'm still testing, but changing "Case Sensitive Names" from "Force Lower" to "Auto" corrected the problem in at least one share.
  13. The Plex server is active and heavy on transcoding, so it can stay on the Linux box. I like the idea of everything else being served from the NAS. That's a radical suggestion, as it goes, and it saves me a few problems with making sure the active computers are running daily. I've definitely been constrained in my thinking so far, since my system has evolved piece by piece and I'm used to divide and conquer. Sounds like Docker really might be the level-up I've been looking for. Thank you so much for the reply!
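In the SSD log from post 2, each "CDB: opcode=0x28 ..." line is a SCSI READ(10) command, and its bytes encode the same sector that the following "I/O error, dev sdd, sector ..." line reports. A minimal Python sketch of that decoding, assuming the standard READ(10) layout (bytes 2-5 hold the big-endian logical block address, bytes 7-8 the transfer length in blocks) and 512-byte logical sectors on this SSD; the function name is made up for illustration:

```python
def decode_read10_cdb(cdb_hex: str) -> tuple[int, int]:
    """Return (logical_block_address, block_count) from a SCSI READ(10) CDB.

    The CDB is given as space-separated hex bytes, as printed in the kernel log.
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("expected a 10-byte READ(10) CDB starting with 0x28")
    lba = int.from_bytes(cdb[2:6], "big")          # bytes 2-5: logical block address
    block_count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, block_count


# CDB bytes copied from the tag#20 and tag#22 entries in the log
for cdb_text in ("28 00 23 8f 5b d8 00 00 08 00", "28 00 23 8f 5c 20 00 00 10 00"):
    lba, blocks = decode_read10_cdb(cdb_text)
    # Byte count assumes 512-byte logical sectors
    print(f"sector {lba}, {blocks} blocks ({blocks * 512} bytes if 512-byte sectors)")
# sector 596597720, 8 blocks (4096 bytes if 512-byte sectors)
# sector 596597792, 16 blocks (8192 bytes if 512-byte sectors)
```

Both results match the "sector 596597720" and "sector 596597792" I/O error lines, and 8 blocks of 512 bytes is the same 4096-byte size that appears on several of the "ncq dma 4096 in" ATA lines.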
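In post 3, the BTRFS counters ("wr 44, rd 2044759, ...") are running totals for the device (the same numbers `btrfs device stats` reports), so the useful signal is how fast they grow between two log lines rather than their absolute size. A small Python sketch that parses two such lines and subtracts them; the regular expression assumes the exact "errs: wr N, rd N, flush N, corrupt N, gen N" wording seen in this log, and the helper names are hypothetical:

```python
import re

# Wording assumed from the log lines in post 3: "errs: wr N, rd N, flush N, corrupt N, gen N"
COUNTER_RE = re.compile(
    r"errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line: str) -> dict[str, int]:
    """Extract the five cumulative btrfs device error counters from one log line."""
    match = COUNTER_RE.search(line)
    if match is None:
        raise ValueError("no btrfs error counters found in line")
    return dict(zip(FIELDS, (int(value) for value in match.groups())))


def counter_delta(earlier: str, later: str) -> dict[str, int]:
    """How much each counter grew between two log lines for the same device."""
    before = parse_btrfs_errs(earlier)
    after = parse_btrfs_errs(later)
    return {name: after[name] - before[name] for name in FIELDS}


first = ("Nov 13 09:41:54 alexandria kernel: BTRFS error (device sdm1: state EAL): "
         "bdev /dev/sdm1 errs: wr 44, rd 2044759, flush 0, corrupt 0, gen 0")
second = ("Nov 13 09:41:58 alexandria kernel: BTRFS error (device sdm1: state EAL): "
          "bdev /dev/sdm1 errs: wr 44, rd 2044902, flush 0, corrupt 0, gen 0")
print(counter_delta(first, second))
# {'wr': 0, 'rd': 143, 'flush': 0, 'corrupt': 0, 'gen': 0}
```

For the 09:41:54 and 09:41:58 lines, that is 143 new read errors and no new write errors in about four seconds, which appears consistent with the "142 callbacks suppressed" line between them (142 suppressed messages plus the one printed).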
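On the question in post 11 about finding affected files without copying everything: if, as posts 10 and 12 suggest, the unreachable files were the mixed-case ones on shares set to "Force Lower", then listing every name that contains an uppercase letter should give a rough candidate list. This is a hypothetical Python sketch, not an Unraid tool. It assumes the share is readable at a local path (the path is a placeholder) and that forced-lowercase lookups only match names stored in lowercase on disk, which is an assumption about how the setting behaves. It only reads directory listings and changes nothing.

```python
import os
import sys


def find_mixed_case_paths(root: str) -> list[str]:
    """Return paths under root whose final name contains an uppercase letter."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name != name.lower():  # at least one character changes when lowercased
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical usage: python3 find_mixed_case.py /path/to/share
    if len(sys.argv) == 2:
        for path in find_mixed_case_paths(sys.argv[1]):
            print(path)
    else:
        print("usage: find_mixed_case.py SHARE_PATH")
```

A mixed-case directory is listed once, but under the same assumption everything beneath it would also be unreachable, so directory hits are the ones to check first.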