Ramshackleton

Everything posted by Ramshackleton

  1. Update #2: All affected drives now use Molex -> SATA adapters. All drives recognized, no clicking detected. Parity rebuild underway, looking MUCH better in terms of ETA, closer to the day-ish timeframe I was expecting. Hopefully this is the end of the saga. Thanks again for all the help!
  2. Update: Tonight, after fiddling around a bit, I decided to rip out my new power supply and replace it with the old one. After all, it's power-related, right? Long story short: drives started not showing up on boot, including some of the same ones that disappeared with the other power supply. Then I realized what the problem was: these were shucked drives that had previously been connected via Molex -> SATA converters, but were now on straight SATA connectors from the PSU, and they'd never been modified to deal with the 3.3 V pin issue. This probably wasn't my original problem, since I'd think a drive would either have power or not, so it wouldn't explain the clicking/power drops. Just goes to show how changing too many things at once can really complicate matters. I'm too tired to address it tonight, but I'll work on it tomorrow morning/afternoon and see if I can get it fixed. @trurl thanks again for all your help thus far.
  3. Thank you! I'm going to re-assess and revert back. Thank you so much for your help so far! Can't tell you how much I appreciate it.
  4. Corsair RM750x. I had another Corsair model before, also 750W; it was just older and didn't have modular plugs. The problem is that I now have an untested, poorly-thought-out power scheme, and the new PSU doesn't ship with nearly enough connectors for this many drives. In any case, I will clean this up ASAP.
  5. You know, I think you're onto something - all these problems started after I moved the guts of this server to a new case/power supply. I thought this would be better than my old setup, which was full of Molex -> SATA power adapters. This one does have a fair number of splitters, though - the drive in question is running on the same line as at least 4 other drives and several fans, maybe even more drives; it's a bit of a mess. I'm going to try to clean this up and will report back. In the meantime, am I safe to abort the parity build so I can shut down?
  6. Ok, so LGTF is the one I said was clicking (and I can hear it doing it right now as well). Note this was NOT the original parity drive - it was a data drive. It is mounted but apparently unhealthy. Is that the drive slowing things down? Should I try to salvage what's on it and replace it?
  7. That is pretty much what I see, but the ETA on the parity rebuild is still really long, currently hovering around 100 days. Diagnostics attached. tower-diagnostics-20220722-1427.zip
  8. Ok, I thought it was started when I took those diagnostics. It's been running all night. Just took the attached ones, thanks! tower-diagnostics-20220722-1129.zip
  9. Ok, I removed the LWTC drive, the old parity disk, diagnostics attached. tower-diagnostics-20220721-2236.zip
  10. I don't have a lost+found share, no. Data looks ok, but hard to know without a deep dive. I guess the question now is, am I safe to assign my new drive as parity, or will it try to take 255 days again?
  11. It mounted this time! Diagnostics attached. tower-diagnostics-20220721-1508.zip
  12. Sorry for being so thick guys, and thanks for all the help. Here's the output of the -L run, and new logs. (For reference, a rough sketch of the overall repair sequence is at the end of this post list.)

      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
      ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
              - scan filesystem freespace and inode maps...
      sb_fdblocks 121671163, counted 123624614
              - found root inode chunk
      Phase 3 - for each AG...
              - scan and clear agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - agno = 4
              - agno = 5
              - agno = 6
              - agno = 7
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
              - agno = 2
              - agno = 5
              - agno = 4
              - agno = 6
              - agno = 7
              - agno = 1
              - agno = 3
      Phase 5 - rebuild AG headers and trees...
              - reset superblock...
      Phase 6 - check inode connectivity...
              - resetting contents of realtime bitmap and summary inodes
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify and correct link counts...
      Maximum metadata LSN (1:1879329) is ahead of log (1:2).
      Format log to cycle 4.
      done

      tower-diagnostics-20220721-1350.zip
  13. So just with -v? I did that and got this:

      Phase 1 - find and verify superblock...
              - block cache size set to 1473832 entries
      Phase 2 - using internal log
              - zero log...
      zero_log: head block 1879314 tail block 1879310
      ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
  14. Phase 1 - find and verify superblock...
              - block cache size set to 1473832 entries
      Phase 2 - using internal log
              - zero log...
      zero_log: head block 1879314 tail block 1879310
      ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
              - scan filesystem freespace and inode maps...
      sb_fdblocks 121671163, counted 123624614
              - found root inode chunk
      Phase 3 - for each AG...
              - scan (but don't clear) agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - agno = 4
              - agno = 5
              - agno = 6
              - agno = 7
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
              - agno = 5
              - agno = 7
              - agno = 2
              - agno = 4
              - agno = 3
              - agno = 6
              - agno = 1
      No modify flag set, skipping phase 5
      Phase 6 - check inode connectivity...
              - traversing filesystem ...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - agno = 4
              - agno = 5
              - agno = 6
              - agno = 7
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify link counts...
      No modify flag set, skipping filesystem flush and exiting.

      XFS_REPAIR Summary    Thu Jul 21 10:53:02 2022

      Phase         Start           End             Duration
      Phase 1:      07/21 10:51:57  07/21 10:51:57
      Phase 2:      07/21 10:51:57  07/21 10:51:58  1 second
      Phase 3:      07/21 10:51:58  07/21 10:52:32  34 seconds
      Phase 4:      07/21 10:52:32  07/21 10:52:32
      Phase 5:      Skipped
      Phase 6:      07/21 10:52:32  07/21 10:53:02  30 seconds
      Phase 7:      07/21 10:53:02  07/21 10:53:02

      Total run time: 1 minute, 5 seconds
  15. Ah yes of course, silly me. Attached! Thanks! tower-diagnostics-20220721-0912.zip
  16. Just completed, said it completed without error. Are there logs to look at? I don't see a link on the web page. Yes, that's one of my reolink cameras trying to FTP video to the array. Disregard.
  17. Thanks for the quick reply, attached. tower-diagnostics-20220720-1032.zip
  18. Thanks all - I just did what @trurl suggested and started with no parity. Also, I've attached the diagnostics from earlier (which were saved on the array, whew), which probably have more useful info. Unfortunately, the LGTF drive didn't mount when I started the array; it's showing as unmountable. I suppose I should try to run some repairs on it? tower-diagnostics-20220719-0900.zip
  19. Agreed, I'm fine with losing whatever was written since. I just want a stable system back!
  20. 13 data disks, 1 parity (14TB), 1 cache, ~120TB total. I've been having issues since moving my system to a new rack-mount case about a month ago, presumably power-supply-related, but things had been pretty stable until the other day: disk activity was running really slowly, so I decided to reboot. When I did, the parity drive wasn't recognized. I jumped the gun, assumed it was dead, and ran to Best Buy to replace it with another 14TB WD drive (serial # ending in 94UG) to shuck.

      Once installed, it began rebuilding parity as normal. At this array size that process typically takes about 24 hours for me, but this time it was going VERY slowly, targeting hundreds of days. All the data seemed to be there, but Plex was very flaky and the system overall was too slow to be very usable. At this point I also heard one of the drives periodically clicking (in a bad way), but I couldn't tell which one. It clearly wasn't the old parity drive, because that was no longer connected. I tried running SMART tests and the Disk Speed Test plugin, but the results didn't make it obvious to me that anything was wrong. Eventually I did figure out which drive was clicking: the one with serial # ending in LGTF in the attached diagnostics. None of the tests flag it as problematic, but I did have trouble accessing certain files while browsing that drive directly, so something must be wrong with it. I'm just not sure whether that's my only problem. (A back-of-the-envelope look at the rebuild rate, plus a sketch of SMART checks worth running on the suspect drive, is at the end of this post list.)

      At that point I decided to revert to my original configuration: cancel the parity rebuild, put the old parity drive back in place, keep the clicky drive in as well, run whatever diagnostics I can, and if necessary move data off the clicky drive. The trouble is that the original parity drive (LWTC) is either unrecognized on most reboots, or occasionally recognized, but if I select it as parity to start the array, it's immediately de-selected and removed from the drop-down of potential parity drives. I can swap the new (94UG) drive back in as parity and start the array, but then I'm back to square one: unprotected, taking eons to build parity, with a clicky drive slowing me down for hundreds of days.

      Bottom line: it SEEMS like I have two drives in some kind of wonky state, so with only 1 parity I'm kind of screwed? I'm just looking for tips on what else to try or think about. Thanks in advance. tower-diagnostics-20220719-1136.zip tower-syslog-20220719-1649.zip
  21. Unraid v6.6.3. I've had issues with transfer speeds and even directory listings going slowly for the past couple of months. I was never sure exactly what was causing it, but I noticed that after a fresh restart of my Docker containers the problem tended to go away, at least for a little while. Recently the problem has gotten worse: when trying to watch a movie (via Plex, over my own gigabit LAN), the movie would pause every 10-20 seconds. Unfortunately I didn't take diagnostics at that moment, but a server reboot at that point did not fix the problem.

      I did see another thread suggesting that I move my Docker containers' config directories directly to /mnt/cache/appdata, which I did this morning. It didn't seem to address the issue, as shfs still reported reading around 1-2MB/sec on average. I don't know if that's enough to cause problems, and the CPU utilization for the process was usually in the 10% range, but I imagine it was probably worse last night when my movie was stuttering.

      Today's troubleshooting has left me scratching my head: I first shut down my Linux VM (which usually has somewhat high CPU utilization but doesn't seem to have much effect on disk IO), then started killing Docker containers one by one, keeping an eye on iotop to see if/when the IO would go down (roughly the approach sketched at the end of this post list). None of them touched it. Then I killed the Docker service - still no change. Then I restarted the Docker service, which in turn started all the containers back up, and sometime in the next 10-15 minutes disk utilization dropped back down to essentially zero. Perhaps this was a delayed reaction, but I can't understand why a reboot last night wouldn't have fixed the problem.

      I know this is fairly rambling, but I'm trying to mention anything potentially relevant. Many thanks in advance for ideas on what to try next. tower-diagnostics-20190701-1512.zip
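
A few sketches for later readers, referenced from the posts above.

First, the xfs_repair sequence from posts 12-14, as a minimal outline. This assumes the array is started in Maintenance mode, and /dev/md3 is purely a stand-in for whichever md device maps to the affected disk - it is not the exact command line used in the thread:

      # dry run first: -n makes no changes, -v adds detail
      xfs_repair -nv /dev/md3

      # if it reports a dirty log, mount the disk once so the log replays, unmount,
      # and repeat the dry run; only as a last resort zero the log, which can
      # orphan files into lost+found:
      xfs_repair -L /dev/md3

Running against the md device rather than the raw sdX partition is what keeps parity in sync while the repair writes.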
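Second, a back-of-the-envelope look at the rebuild ETA from post 20. A 14TB rebuild at a typical sustained ~150MB/s works out to roughly 14,000,000MB / 150MB/s ≈ 93,000s ≈ 26 hours, which lines up with the usual ~24-hour rebuild; an ETA of 100+ days implies an effective rate of about 14,000,000MB / (100 × 86,400s) ≈ 1-2MB/s, so a single drive struggling on reads is enough to drag the whole rebuild to a crawl.

To check the suspect (clicking) drive from the console, something like the following; /dev/sdX is a placeholder, and the attributes listed are the usual ones to watch rather than values taken from these diagnostics:

      # full SMART attribute/health dump
      smartctl -a /dev/sdX

      # extended self-test (takes hours on a 14TB drive), then re-check the report
      smartctl -t long /dev/sdX
      smartctl -a /dev/sdX

      # attributes worth watching: Reallocated_Sector_Ct, Current_Pending_Sector,
      # Offline_Uncorrectable, and UDMA_CRC_Error_Count (the last one usually
      # points at cabling or power rather than the drive itself)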
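Third, the container-by-container IO hunt from post 21, sketched loosely; the container name and the order you stop things in are illustrative:

      # show only processes currently doing IO, per process, with accumulated totals
      iotop -oPa

      # in another shell, stop containers one at a time and watch whether the IO stops
      docker ps --format '{{.Names}}'
      docker stop <container-name>    # repeat for each container, pausing between stops

If the IO continues with every container stopped and the docker service itself down, the writer is something outside Docker (a VM, a plugin, or a host process), and iotop will show its process name directly.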