wheel

Members

  • Posts: 201
  • Gender: Undisclosed

wheel's Achievements

Explorer (4/14)

1 Reputation

  1. Following a lot of mid-pandemic work on my unRAID towers, I’ve reached a point where I’m pretty comfortable I’ve done all I can to guard against catastrophic failure: finally converted all my ReiserFS drives to XFS, got everything protected by dual parity, and resolved a bunch of temperature issues. One thing still bugs me, though: two of these 21-drive towers are about a decade old (and a 13-drive tower about 7 years old), and the “well, unless your PSU fails and takes out everything at once” snippets I keep reading in unrelated threads, combined with the “capacity drops by 10% or so per year, up to a point” adage, have me thinking I may be skating on thin ice with all three of these PSUs (see the rough derating math sketched after this post list).

     What gives me pause about replacing all three (or at least the pair of roughly decade-old ones) immediately is the odd use case of unRAID (or maybe just mine specifically). All three towers were designed for their unRAID WORM purpose, and none of their parts had any previous life. Am I being extremely paranoid, or is replacement at this point a prudent idea? There have definitely been stretches (months, even years) where one tower or another was not powered on at all, or saw extremely minimal use (90% idle time when powered on). Could that usage pattern stretch out the normal “danger zone” timeline for replacing a PSU? Or is it not enough to ease larger concerns about something like built-up fan dust congealing and overheating the PSU regardless of how long it’s actually been in operation (and at what level of effort)?

     Any guidance on how concerned I should be (and how quickly I should replace what I have) would be greatly appreciated!

     PSU/system age specifics (all drives 3.5”, 5700–7200 RPM):

     • The 2011 21-disk tower runs a Corsair Enthusiast Series TX650 ATX12V/EPS12V 80 Plus Bronze, purchased in 2011.
     • The 2012 21-disk tower runs a Corsair Enthusiast Series TX650 ATX12V/EPS12V 80 Plus Bronze, purchased in 2012.
     • The 2015 13-disk tower runs a Corsair RM Series 650 Watt ATX/EPS 80 Plus Gold (CP-9020054-NA RM650), purchased in 2015.
  2. Damn, it does: all five drives are in the same 5-in-3 Norco SS-500 hot-swap rack module (from 2011, so... damn). Since the tower is four of those Norco SS-500s stacked on top of each other, I'm going to need to find a basically identically-sized 5-in-3 hot-swap cage replacement if something's dying in this one, and I'm not having any luck with a quick search this morning. Might start a thread in the hardware section if replacement's my solution.

     I'm guessing that with the SS-500 as the most likely culprit for the power issues there's no need for me to run extended SMART tests on the 4 drives throwing errors, but are there any other preventative measures I can take while figuring out the hot-swap cage replacement situation? My gut's telling me it's best just to keep the whole thing powered down for now, but that's a massive pain for family reasons.

     Thank you for confirming it's likely a power issue (bad connections feel way less likely considering the age involved, but I might try replacing the hot-swap cage's cables first just to be safe). Any other suggestions for making sure I really need to replace this thing before I put in the effort of replacing it would be really helpful!
  3. All four drives are Seagate SMR 8TB drives, which, considering the whole hard drive pricing thing going on for larger-sized drives, has me mildly concerned. It feels like a cabling issue, with 4 closely-related drives throwing up the same errors at the same time, but all of my drives are in 5-drive cages, so it's weird to see 4 affected instead of 5 (though it could definitely just be the connection of those 4 drives to my LSI card; I've never seen this sort of multi-drive issue in almost 10 years of operation). Diagnostics attached, because I'm scared to touch a damn thing at this point until someone looks at what I've got going on. Thank you in advance for any guidance provided!

     Edit: just checked the age on the drives, and three may have been purchased around the same time (around 2 years of power-on time), but one's less than a year old and definitely from a different purchase batch.

     Edit 2: Based on other threads I just checked, I went ahead and ran a short SMART test on each of the 4 affected drives (one way to kick these off from a terminal is sketched after this post list). Updated diagnostics file attached.

     tower-diagnostics-20210513-1436.zip
  4. Possibly a random question with a stupid-easy answer for a competent Linux head, but I’ve been searching for hours with no luck: is there an easy way in the GUI (or terminal) to determine which disks (by any unique identifier; I think I could reverse-engineer the rest of what I need from there) are connected directly to the SATA ports on my motherboard vs. the ones plugged into my LSI cards?

     When I initially set up this box (with 4 stacked 5-in-3 Norco hot-swap cages), I wasn’t paying attention to which cable ports on the back were associated with which drives (in left-to-right order), and when I compared it to another box using the same Norco hot-swap cages, I realized they probably changed production between my building the two boxes (both cage sets are SS-500s, but they have different port layouts on the back and different light colors up front), and online instructions aren’t really helping now. So my initial plan of just tracing the motherboard-connected plugs to the hot-swap cage cable ports fell flat, and now I’m trying to determine which of the swap cage trays are the ones connected directly to the board so I can use them specifically for parity drives (parity’s taking forever on this system and I’m following every step that promises even marginal improvement).

     Is there an easy way to just see which disks are on SATA1/2/3/4 (the four ports I have on the motherboard) and which are running through the LSIs (16 out of 20)? (One possible terminal approach is sketched after this post list.) Thanks for any help, and sorry if this is the dumbest question I’ve ever asked on here. Always appreciate the assistance!
  5. I was kind of hoping that’d be the case, but felt like it’d be safest to check when playing with Parity on a massive array I haven’t moved to dual Parity yet. Thanks for the help!
  6. Same situation as OP, but I’m physically moving my parity disk to a slot currently holding a data disk. Just completed an unrelated parity check, so the timing seems perfect. Anything I need to do differently, or does the usual swap disks / New Config / re-order in GUI / trust parity procedure work just as simply for (single) parity in 6.8.3? Thanks for any guidance!
  7. Yeah, I'm just reading tea leaves at this point and hoping there's something obvious I'm missing. I have at least two (could be three in a couple of days) theoretically fine 8TBs ready to roll, plus the original 6TB that was throwing errors before the rebuild (which may have nothing to do with the disk, now). The GUI shows the rebuild (listed as "Read-Check") as paused. I'm guessing my next steps, without a free slot to try, are going to be:

     • Cancel the rebuild ("Read-Check").
     • Stop the array, power down.
     • Place a disk (the old 6TB? another different 8TB?) into the Disk 12 slot.
     • Try a rebuild again today (since I'm guessing unRAID trying to turn the old 6TB into an 8TB but failing mid-rebuild means I can't simply re-insert the old 6TB and have unRAID automatically go back to the old configuration?).

     Any reasons why I shouldn't, other than the fact that I'm playing with fire again with another disk potentially dying while I'm doing all these rebuilds? I'm starting to think my only options are firedancing or waiting who knows how long for an appropriate hot-swap cage replacement and crossing my fingers that I'll physically rebuild everything fine (and I'm almost more willing to lose a data disk's data than risk messing up my entire operation).
  8. Unfortunately not - it's an old box (first built in 2011, I want to say?), with four 5-slot Norco SS-500 hot-swap cages stacked on each other in the front. Nothing ever really moves around behind the cages, and the only cable movement I can recall since I first built it was unplugging/replugging the cages' breakout cables when replacing the Marvell cards with LSIs back in December (and these issues with Disk 12 started occurring maybe a quarter of a year later). The hot-swap cage containing Disk 12's slot is the second up from the bottom and could be a massive pain to replace (presuming I can find a replacement of such an old model, or one that doesn't mess up the physical spacing of the other 3 hot-swap cages).

     Edit 2: any chance the rebuild stopping at *exactly* 6TB could be significant? Feels like a bizarre coincidence.
  9. Soooooo something may be up with the Disk 12 slot. That 6TB couldn't finish an extended SMART test, so I dropped what I was pretty sure was a fine 8TB (precleared and SMART-okay after being used in another box for a couple of years) into the slot for the rebuild. Had a choice between an SMR Seagate and a CMR WD and used the WD. Interestingly, the rebuild was exactly 75% complete (right at the 6TB mark) when the new 8TB in the Disk 12 slot started throwing up 1024 read errors and got disabled. My instinct is to throw another 8TB spare in the slot and try again, but something feels weird, so here are the diagnostics. Am I reaching the point where something's likely wrong with the hot-swap cage and I'm going to need to buy / replace that whole thing again?

     tower-diagnostics-20200605-0534.zip
  10. OK, running the extended test now - hate that it's consistently throwing up errors, and I need to replace a 6TB soon anyway, but I definitely don't want to throw out disks unnecessarily during what could be a weird economic time for getting new disks. Thanks for the quick response!
  11. The sync (vs. disk) correcting parity check was a total brain fart on my end, and I'm hoping it turned out okay (no error messages, but I'll go back and check the underlying data as soon as I can). I was just writing to Disk 12 when the GUI threw up a read error, so I immediately pulled diagnostics to send here. I have a precleared 8TB spare ready to replace Disk 12's 6TB, and I'm leaning towards just shutting down and throwing that thing in there to start a Disk 12 rebuild/upgrade now - any better-safe-than-sorry reasons I shouldn't? Thanks for all the guidance!

      tower-diagnostics-20200604-1054.zip
  12. Weird Disk 12 happenings again. I had an unclean shutdown when someone accidentally hit the power button on the UPS that powers two unRAID boxes. One booted back up and prompted me to run a parity check. The other (this one) weirdly gave me the option for a clean shutdown, which I took, then started back up. No visible issues, but I felt paranoid, so I ran a non-correcting parity check before modifying any files: ~200 read errors on Disk 12. I then ran a correcting parity check. I tried collecting diagnostics at every possible opportunity in case anything weird turns up that someone else might notice:

      • 5-27: right after the "unclean" / clean shutdown
      • 5-29: after the non-correcting parity check
      • 5-30: after the correcting parity check

      tower-diagnostics-20200530-2053.zip tower-diagnostics-20200529-2000.zip tower-diagnostics-20200527-1017.zip
  13. Thought I'd update in case it helps anyone else searching threads: the 3.3V tape trick worked, so I'm not sure what the root problem was, but if anyone has these drives working in some SS-500s but not others, rest assured the tape trick should work on those other SS-500 cages.
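Re: the PSU question in post 1 - a minimal sketch of the derating arithmetic that post leans on, assuming the "~10% of capacity per year" adage quoted there actually holds and using rough, made-up per-drive power figures. The spin-up/idle wattages and board draw below are guesses for illustration, not measurements from these towers.

```python
# Rough PSU headroom estimate, assuming the "~10% capacity loss per year"
# adage from the post, compounded yearly. All load figures are guesses.
RATED_WATTS = 650             # TX650 / RM650 label rating
DERATE_PER_YEAR = 0.10        # the adage from the post (assumption)
DRIVES = 21                   # the larger towers in the post
SPINUP_WATTS_PER_DRIVE = 25   # rough 3.5" spin-up peak (assumption)
IDLE_WATTS_PER_DRIVE = 8      # rough 3.5" idle draw (assumption)
BOARD_AND_CARDS_WATTS = 80    # motherboard/CPU/HBAs, rough guess

def derated_capacity(rated_watts: float, years: int) -> float:
    """Compound the yearly derating adage over the PSU's age."""
    return rated_watts * (1 - DERATE_PER_YEAR) ** years

spinup_load = DRIVES * SPINUP_WATTS_PER_DRIVE + BOARD_AND_CARDS_WATTS
idle_load = DRIVES * IDLE_WATTS_PER_DRIVE + BOARD_AND_CARDS_WATTS

for years in (0, 5, 10):
    capacity = derated_capacity(RATED_WATTS, years)
    verdict = "has headroom at spin-up" if capacity > spinup_load else "is short at spin-up"
    print(f"age {years:2d}y: ~{capacity:3.0f}W usable | "
          f"all-drive spin-up ~{spinup_load}W | idle ~{idle_load}W | {verdict}")
```

Under those assumptions a 650W unit that has aged a decade would be well under the simultaneous-spin-up load of a 21-drive tower, which is the scenario the "takes out everything at once" snippets worry about; the real numbers depend on the actual drives and whether spin-up is staggered.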
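Re: the short SMART tests mentioned in post 3's second edit - one possible way to kick them off from a terminal instead of the GUI, assuming smartmontools' smartctl is available (it ships with unRAID). The /dev/sdX device names are placeholders for the four affected drives.

```python
# Start a short SMART self-test on several drives, then print each
# drive's self-test log and overall health. Device names are placeholders.
import subprocess
import time

DRIVES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # placeholders

def smartctl(*args: str) -> str:
    """Run smartctl and return its output (smartctl uses nonzero exit
    codes for informational flags, so don't treat them as fatal)."""
    result = subprocess.run(["smartctl", *args],
                            capture_output=True, text=True)
    return result.stdout

# Kick off a short self-test on each drive (runs on the drive itself).
for dev in DRIVES:
    print(smartctl("-t", "short", dev))

# Short tests usually finish within a couple of minutes.
time.sleep(180)

# Print overall health and the self-test log for each drive.
for dev in DRIVES:
    print(f"===== {dev} =====")
    print(smartctl("-H", dev))
    print(smartctl("-l", "selftest", dev))
```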
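Re: the onboard-SATA-vs-LSI question in post 4 - a small terminal sketch that reads the /dev/disk/by-path symlinks and guesses the controller from the path name. The "-ata-" / "-sas-" naming is the usual Linux convention for AHCI ports and SAS HBAs, but it can vary by kernel and controller, so cross-check the printed /dev/sdX names against the identifications (with serial numbers) shown on unRAID's Main page before reassigning any parity slots.

```python
# Map whole disks to a likely controller type using /dev/disk/by-path.
import os

BY_PATH = "/dev/disk/by-path"

for name in sorted(os.listdir(BY_PATH)):
    if "-part" in name:          # skip partition entries, keep whole disks
        continue
    target = os.path.realpath(os.path.join(BY_PATH, name))  # e.g. /dev/sdf
    if "-ata-" in name:
        where = "onboard SATA port"
    elif "-sas-" in name:
        where = "SAS HBA (LSI)"
    else:
        where = "other/unknown controller"
    print(f"{target:<12} {where:<28} {name}")
```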