dexdiman Posted September 15, 2022

One of my disks keeps going offline, but this isn't reflected in the GUI or the logs. The disk shows up fine in the GUI with a green indicator saying the device is active, it passes SMART, and the disk logs don't show any errors. If I launch the CLI, type "mc", and navigate to /mnt/ to see the disks, it shows up red (?diskX), and if I browse the disk via the GUI it says "No listing: Too many files". This has been happening a couple of times a day. If I stop and start the array, the disk comes back fine until it eventually goes "down" again.

I first discovered this when Sonarr stopped downloading new episodes one day. When I investigated, the Sonarr logs were full of array read/write errors, and NZBGet also has tons of read/write errors. Usually when a disk goes down its data is emulated and at least still "visible", but this is totally different: the data is gone from the array until I stop and start the array and the disk starts working again.

I've tried a different SAS cable (internal and external) and plugging it into a different port on my HBA card. My disks are in a JBOD.
trurl Posted September 15, 2022

Attach diagnostics to your NEXT post in this thread.
dexdiman Posted September 16, 2022

coruscant-diagnostics-20220915-2008.zip
JorgeB Posted September 16, 2022

Since you didn't mention it: disk5 is disabled. There are also read errors on disk7. Both look cable-related, so replace the SATA cables, and also check the filesystem on disk5.
dexdiman Posted September 16, 2022

Yeah, I was rebuilding disk5 when I made the diagnostics file. I've already tried replacing the cable, but I'll try replacing it again and report back.
dexdiman Posted September 20, 2022

Over the weekend I did some testing:
Replaced internal SAS cable - same problem
Replaced external SAS cable - same problem
Plugged into another HBA - same problem
Swapped with a hot spare - same problem

It's currently rebuilding the data onto the hot spare, and the issue is happening. The GUI shows the drive rebuilding fine, but the CLI says the drive is offline.

coruscant-diagnostics-20220920-1328.zip
trurl Posted September 20, 2022

dexdiman said: "CLI says the drive is offline."

Those screenshots show the disk is invalid because it is rebuilding. The diagnostics show disk7 is mounted and 78% full. In what way is it "offline"?
dexdiman Posted September 21, 2022

trurl said: "In what way is it "offline"?"

That's the problem. The GUI and logs all say the drive is perfectly fine and responding, but:

If I try to view the contents of the drive via the GUI, I see this: [screenshot]
If I try to access the drive via Windows Explorer, I can't: [screenshot]

Anything on that drive is gone from the array (it's not being emulated or anything). For example, my movies share is missing everything on disk7 while disk7 is down.
Number of files when disk7 is working: [screenshot]
Number of files when disk7 is down: [screenshot]

I don't have screenshots of it, but Sonarr, Radarr, and NZBGet all show read/write errors and stop working when disk7 goes down. Nextcloud has issues as well.

The only reliable way to get disk7 working again after it has gone down is to stop and start the array. Then disk7 comes back and works for an unpredictable amount of time before going down again. An easy, reliable way to tell whether it has gone down is to launch the CLI, type mc, navigate to /mnt, and see whether disk7 appears in red with a ? in front of it.

Even rebuilding the disk produces what I showed in my previous post: the GUI and logs say disk7 is fine and being rebuilt, but the CLI shows a red ?disk7 and nothing on it is accessible.
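The manual mc check above can be scripted. The following is a minimal sketch, not an Unraid tool: check_array_disks is a hypothetical helper, and it assumes array disks are mounted as /mnt/disk1, /mnt/disk2, and so on, and that a disk which has dropped out in this way fails a simple directory read with an OS error (for example an I/O error) while the GUI still shows it as active.

```python
import glob
import os


def check_array_disks(mnt_root="/mnt", pattern="disk*"):
    """Hypothetical helper: try to read one entry from each array disk mount.

    Returns a dict mapping disk name -> None if the directory could be read,
    or a short error description if opening/reading it raised an OSError.
    Assumes the /mnt/diskN layout used in this thread.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(mnt_root, pattern))):
        name = os.path.basename(path)
        try:
            # Opening the directory and reading a single entry is a cheap way
            # to surface an error without listing a large directory.
            with os.scandir(path) as entries:
                next(entries, None)
            results[name] = None
        except OSError as exc:
            results[name] = f"{type(exc).__name__}: {exc.strerror or exc}"
    return results


if __name__ == "__main__":
    for disk, error in check_array_disks().items():
        print(f"{disk}: {'OK' if error is None else 'UNREADABLE (' + error + ')'}")
```

Run periodically (for example from a scheduled user script), this would flag the disk as soon as reads start failing instead of waiting for Sonarr or NZBGet to report errors.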
JorgeB Posted September 21, 2022 (marked as the solution)

Sep 19 23:36:23 Coruscant kernel: XFS (md7): Internal error rec.ir_free != frec->ir_free || rec.ir_freecount != frec->ir_freecount at line 1558 of file fs/xfs/libxfs/xfs_ialloc.c. Caller xfs_dialloc_ag_update_inobt+0xd0/0x121 [xfs]

Check the filesystem on disk7.
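To spot which array disks are logging XFS errors like the one above, a syslog can be filtered for kernel XFS lines. This is a minimal sketch: it assumes the line format quoted here (kernel: XFS (mdN): ...) and that mdN corresponds to diskN, as md7 and disk7 do in this thread. The optional p1 suffix is an assumption for releases that name the device mdNp1. The function name xfs_errors_by_disk and the script name are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches kernel lines like "... kernel: XFS (md7): Internal error ...".
# Assumption: array devices are named mdN, or mdNp1 on some newer releases.
XFS_LINE = re.compile(r"kernel: XFS \(md(\d+)(?:p\d+)?\): (.*)")


def xfs_errors_by_disk(lines):
    """Return {disk_number: [messages]} for XFS kernel lines that mention an error."""
    found = defaultdict(list)
    for line in lines:
        match = XFS_LINE.search(line)
        if match and "error" in match.group(2).lower():
            found[int(match.group(1))].append(match.group(2).strip())
    return dict(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python xfs_scan.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as log:
        for disk, messages in sorted(xfs_errors_by_disk(log).items()):
            print(f"disk{disk}: {len(messages)} XFS error line(s), first: {messages[0][:100]}")
```

A hit for a disk number is a hint to check that disk's filesystem, which is what the GUI and SMART status did not reveal here.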
dexdiman Posted September 23, 2022

JorgeB said: "Check filesystem on disk7."

I ran that on disk7, and today disk7 hasn't had any issues. I know I've run it before and it didn't fix the issue, so whatever. I'm just glad it's fixed. Thanks.
itimpi Posted September 23, 2022

dexdiman said: "I know I've ran that before but it didn't fix the issue so whatever."

I wonder if last time you forgot to remove the -n option, so the disk was checked but not repaired?
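The -n flag itimpi mentions runs xfs_repair in no-modify mode: it reports problems but writes nothing, so a check-only pass cannot fix the corruption. The sketch below makes the difference explicit. xfs_repair_argv is a hypothetical helper, and the device path is an assumption (typically /dev/mdN on releases from this period, /dev/mdNp1 on some newer ones), so confirm it on your system, and only run it with the array started in maintenance mode.

```python
import shlex
import subprocess
import sys


def xfs_repair_argv(disk_number, check_only=True, device_template="/dev/md{n}"):
    """Build an xfs_repair command line for an array disk (hypothetical helper).

    check_only=True adds -n (no-modify mode): problems are reported, nothing is
    written. Leaving it off is what actually repairs the filesystem.
    device_template is an assumption; some newer releases use "/dev/md{n}p1".
    """
    argv = ["xfs_repair"]
    if check_only:
        argv.append("-n")
    argv.append(device_template.format(n=disk_number))
    return argv


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python xfs_check.py 7   (array started in maintenance mode)
    disk = int(sys.argv[1])
    result = subprocess.run(xfs_repair_argv(disk))
    # With -n, xfs_repair exits 1 if it detected corruption and 0 if not.
    if result.returncode == 1:
        print("Corruption reported. Repair pass (without -n):")
        print(shlex.join(xfs_repair_argv(disk, check_only=False)))
```

The script deliberately only prints the repair command rather than running it, so the write step stays a conscious decision.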
dexdiman Posted September 26, 2022

itimpi said: "I wonder if perhaps last time you forgot to run without removing the -n option so that the disk got checked but not repaired?"

Yeah, maybe. Who knows. I work in IT infrastructure and see this kind of thing all the time: a user tries a bunch of troubleshooting, they call me, I try the same things they did, and one of them fixes the issue. LOL.