wheel

Everything posted by wheel

  1. Nice. So the seeming disk-after-disk issues associated with slot #12 are probably just coincidental? Both the 166k error drive from March and the swiftly-disabled disk this month were pretty old (the latter being a white label I got maybe 4 years ago?), so it makes sense, but the recurrence of #12 issues definitely caught my attention in a single-parity setup.
  2. 5/12 (diagnostics after the 299-sync-error non-correcting check), 5/13 (diagnostics after the correcting check), 5/14 (diagnostics after the final, non-correcting check). Hope these help figure out what's going on with the 12 slot (if anything!) tower-diagnostics-20200514-0549-FINAL-NONC-CHK.zip tower-diagnostics-20200513-0732-AFTER-CORR-CHK.zip tower-diagnostics-20200512-1054-AFTER-299ERROR-NONC-CHK.zip
  3. Added to the plan: extra diagnostics sets. I'll report back here with those in ~48 hours or so. Thanks a ton!
  4. That makes sense - trick is, I haven't run a correcting check since the one back in March described above. The check I ran after installing the replacement drive on Sunday/Monday was non-correcting, and that's the same one that's finishing up right now. It does sound like now's the time to run a correcting parity check, followed by another non-correcting check (two checks total, starting this morning) to make sure I don't have a bigger issue specific to Disk 12's hotswap cage, given the consistent issues across disks that may or may not be coincidentally occurring there. (Really, really hope I don't need to replace a middle-of-the-tower hotswap cage in a pandemic, but that's technically easier than moving everything to a new build...) Thank you both for the help and guidance, JB & trurl!
  5. Sounds like a plan: check's almost done and about to start another one. Presuming it's best to run a non-correcting one to be safe - or should I run this one as correcting, then run another to see if new (vs additional) sync errors appear? Edit: the sync errors stopped growing after they hit 299. Looks like they've stayed stable there overnight and the check's almost done, so definitely a lower volume of errors than last time Disk 12 (or its hotswap slot) started going screwy.
  6. Back to the game - but this time, with a fully-updated unraid 6.8.3 diagnostics set! I was writing to the rebuilt Disk 12 last night when the disk disabled itself with write errors. Had a hot spare 6TB sitting around and I'll be out of town later this week, so I figured I'd go ahead and replace it now with a known-good 6TB. The rebuild seemed to go fine, and I'm running the non-correcting parity check now - bam, at some point it picked up 216 sync errors. Just jumped to 217 while I was typing this. None of the errors are associated with any specific disk on the main page, but they're showing up in the summary at the bottom. Diagnostics attached; should I stop the non-correcting parity check? Any new info from the new diagnostics from an updated unraid version? Thanks for any help! Edit - 268 now, steadily growing a few errors at a time. tower-diagnostics-20200511-1913.zip
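     A quick way to see where those sync errors are being logged, from the console (a sketch - the exact md-driver wording varies between Unraid versions, so the grep is deliberately broad):
       # Parity-check findings from the md driver end up in the syslog:
       grep -i "md:" /var/log/syslog | grep -iE "error|incorrect|sync" | tail -n 100
       # Lower-level I/O errors, in case a specific device path is acting up:
       grep -i "I/O error" /var/log/syslog | tail -n 50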
  7. That's what's been sitting in my cart - just figured I'd confirm that people have successfully taped drives and installed them in SS-500s that otherwise weren't recognizing them (specifically SS-500s, like OP had issues with in this thread) before I dropped quarantine funds on something for which I might not have any actual use.
  8. Actually haven't ordered any yet: was doing my research first, and had always been under the (possibly mistaken) belief that hotswap cages bypassed the 3.3V issue (mostly because they always worked in the one box).
  9. Having a similar problem, but a little stranger - two boxes, each with 4 Norco SS-500s running to a pair of Genuine LSI 6Gbps SAS HBA LSI 9211-8i P20s. One box recognizes the WD80EMAZ drives totally fine (AMI ECS A885GM-A2, Phenom II X4 820). The other (AMI Supermicro X8SIL, Core i3 540) doesn't recognize a single EMAZ, despite recognizing EFZXs just fine.
     My first instinct was "oh, I've finally run into the taping issue people have complained about for years," but this thread makes it seem like taping my EMAZ drives and putting them in the Norco SS-500s will simply lead to unraid not starting up, and there's another solution (in the linked 2011 thread above). I've tried swapping boot order in the BIOS, but that doesn't change anything as far as recognition goes.
     Am I making this pre-taping research more complicated than it should be, and is the boot order change what fixes the "unraid won't start" problem that'll occur once I apply tape to my EMAZ drives and place them into the Norco SS-500 hot swap cages? If so, any idea why one set of cages is giving me trouble while another isn't? I can't find any posted reasons anywhere on the internet (yet) why the X8SIL motherboard would be giving me specific problems with EMAZ drives, but that seems like the only difference (same wires, same cages, same case, same number of drives in setup except the non-EMAZ-accepting box has a cache drive, too). Thanks for any clarification anyone can provide!
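     For anyone doing the same pre-taping research, a sketch of how to compare what each box actually enumerates (the WDC_WD80 pattern is just a guess at how the white-label models show up in the device names). Note that when the 3.3V pin issue is in effect, the drive typically never spins up at all, so it won't appear in either listing:
       # Disks the OS sees, by model/serial, filtered to the WD 8TB drives (pattern is an assumption):
       ls -l /dev/disk/by-id/ | grep -v part | grep -i "WDC_WD80"
       # Kernel-level detection messages, in case a drive spins up but never enumerates:
       dmesg | grep -iE "attached scsi disk|sd[a-z]" | tail -n 50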
  10. Thanks a ton, itimpi - took those steps, rebooted fine, GUI loaded fine, array started fine! One thing seems weird: on the dashboard, my parity drive's now showing a reallocated sector count of 3. I'd *just* run an errorless parity check right after installing IT-flashed LSI cards (to replace Marvell ones) before attempting to upgrade from 5.0.6 to 6.8.3. My instinct is to run another parity check right now to make sure the count stays constant at 3, but is there anything that could have happened during the upgrade (nothing touched inside the box) to cause the reallocated sectors that I should be cautious about before running that parity check?
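     A minimal way to keep an eye on that attribute between parity checks, from the console (a sketch; smartctl is available on Unraid and /dev/sdX is a placeholder for the parity drive):
       # /dev/sdX is a placeholder - substitute the parity drive's actual device.
       smartctl -a /dev/sdX
       # Just the worrying counters: reallocated, pending, and offline-uncorrectable sectors.
       smartctl -a /dev/sdX | grep -iE "Reallocated_Sector|Current_Pending|Offline_Uncorrectable"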
  11. Just tried method 1 (upgrading from 5.0.6 to 6.8.3, or apparently a half-step to 6.5.3?), and unRAIDServer.plg saved fine. I ran installplg unRAIDServer.plg, and it:
     * successfully wrote the INLINE file contents
     * downloaded infozip-6.0-i486-1.txz
     * verified infozip-6.0-i486-1.txz
     * installed infozip-6.0-i486-1.txz
     But then the plugin tries to download unRAIDServer-6.5.3-x86_64.zip from s3.amazonaws.com. First it tells me s3.amazonaws.com's certificate cannot be verified (unable to locally verify the issuer's authority), and when the HTTP request was sent, the response was 403 Forbidden. Given the steps the Method 1 upgrade plugin has already completed, should I just download the 6.8.3 zip from Limetech's site and move the bz* files into the root of my existing flash drive? If I do, can I do that through an SMB file transfer and a reboot, or do I need to turn off the server, remove the flash drive, and do anything manually to it that the plugin didn't get around to doing? If my fix is a simple addition of bz* files at this point, are there any other things the upgrade plugin would have taken care of automatically after the 6.5.3 zip was downloaded that I now need to take care of manually before trying to boot into 6.8.3? I'm running pure vanilla unRAID at 5.0.6 and have no expectations of adding any dockers/plugins to this box in the future, if that makes a difference. Thanks in advance for any guidance / next steps! I'll go ahead and leave the box on until I hear back here, in case staying online after taking the first few steps of unRAIDServer.plg has put me in a position where a reboot would be screwy now.
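     For reference, the manual route described above boils down to something like this (a sketch only: it assumes the flash drive is mounted at /boot, that the 6.8.3 zip has already been unpacked somewhere reachable, and that the release notes don't call for extra files beyond bz* on this flash layout):
       # Back up the existing boot files first:
       mkdir -p /boot/previous
       cp /boot/bz* /boot/previous/
       # Copy the new kernel/filesystem images from the unpacked 6.8.3 zip onto the flash root.
       # /path/to is a placeholder for wherever the zip was extracted:
       cp /path/to/unRAIDServer-6.8.3/bz* /boot/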
  12. I'm about to upgrade a box from unraid 5.0.6 to 6.8.3, and I've been reading posts voraciously to make sure I'm not missing out on anything important before making the leap. This is a pure-unraid box - never had any plugins running, never plan to have any docker stuff running. Just adding drives, storing stuff on them, replacing dead ones / upgrading to more space as necessary.
     The box has 19 8TB drives in play, and I'm about to upgrade its parity to 12TB and start replacing data drives with 12TBs. I went ahead and replaced Marvell SAS cards with Genuine LSI 6Gbps SAS HBA LSI 9211-8i P20 IT Mode ZFS cards (seem OK so far; installed, recognized all drives, running a non-correcting parity check now). It sounds like I may want to upgrade my single-core AMD Sempron to a multi-core chip for parity checks. Motherboard's an ECS A885GM-A2 (v1.1) AM3 AMD 880G SATA 6Gb/s ATX AMD, and the box has 2GB of Crucial 240-pin DDR3 1333 (PC3 10600) SDRAM (Desktop Memory Model CT25664BA1339). Existing (but not for long) parity drive is a WDC_WD80EFZX; the upgraded 12TB parity drive will be a shucked WD 12TB drive. 6 of the data drives are known-SMR drives (ST8000DM004), but I'm pretty sure the rest are CMR: 10 of the data drives are WDC_WD80EMAZ, two are WDC_WD80EFZXs, and one is a ST8000AS0002.
     Currently, my parity checks run consistently at about 65MB/s and take close to 35 hours (will confirm exact timing after this one completes; it's been going for 15 hours and has about 20 hours remaining according to the GUI). Having read that parity checks can be longer in V6 than V5 (especially on a single-core CPU), and knowing a jump from 8TB to 12TB should add some serious time to the parity check anyway, a potentially 2-day parity check or rebuild operation feels really weird (if not downright risky on a regular basis), so I'm leaning towards ordering a new CPU today in hopes of getting it sometime next week and upgrading everything then.
     I'm thinking the only easily-upgradable bottleneck is the CPU (presuming the 6 SMRs aren't what's killing my parity speeds) - am I missing something else / more impactful, though? I'm looking at AMD Phenom IIs on ebay and saw a warning that at least one model (Black) will ONLY work with DDR2 (not DDR3) ram, so I'm avoiding that one, but does anyone know a specific model of AMD Phenom II or Athlon II that'll be ideal for my hardware setup / future needs? I genuinely ONLY care about parity check / rebuild performance, and will absolutely not be adding a cache drive or docker apps / VMs / anything beyond vanilla unraid to this particular box. It looks like I'll be buying used regardless, so that'll be its own adventure, but identifying the right CPU for my specific situation (if a CPU upgrade's even my smartest move right now) is my primary concern, and I'm going to need some experienced help with it.
     Really appreciate any guidance anyone can provide, as I'm in way over my head as far as hardware goes and have been learning on the fly almost exclusively through reading forum posts. Thanks in advance!
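     Back-of-the-envelope math for the "how long will a 12TB check take" question, assuming the ~65MB/s average from above holds (it may not once the bottleneck changes):
       # Duration ≈ parity size / average speed; decimal units, as drive vendors spec them.
       # 65 MB/s is the assumed average taken from the post above.
       echo "8TB:  $(( 8000000 / 65 / 3600 )) hours"    # ~34 hours
       echo "12TB: $(( 12000000 / 65 / 3600 )) hours"   # ~51 hours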
  13. Bummed a monitor off a neighbor, got into BIOS and changed boot order back to flash drive, unraid booted up fine, but the 8TB drive isn't showing up at all now as a possible parity replacement. Edit: just ran ls -l /dev/disk/by-id and the 8TB isn't showing up in the list at all. Edit 2: when I needed to swap the BIOS boot order, the new entry (that I swapped back to the flash drive) was "SAS: #0200 ID0E LUN", if that helps. Edit 3: Confirmed the 8TB is fine - just threw it in a caddy on another system and all seems normal, with unraid 6.5.3 recognizing it as an unassigned drive. Any ideas why the 8TB isn't showing up at all? The system's definitely interacting with the drive, if only by virtue of changing the boot order when it was installed vs. not. Diagnostics from post-BIOS-change boot with 8TB drive in play attached. tower-diagnostics-20200416-0854.zip
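     A couple of extra checks worth running from the console while the 8TB is installed (a sketch; nothing here is specific to the SAS card):
       # Did the kernel ever attach the disk? Look for the detection messages:
       dmesg | grep -iE "attached scsi disk|ata[0-9]+|sas" | tail -n 100
       # Every block device the OS currently knows about:
       cat /proc/partitions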
  14. Box 2 successfully upgraded to 6.8.3, I successfully upgraded a couple of 4TBs to 6TBs with clean parity checks after, and now I’ve hit a new headache. I want to go ahead and upgrade my Parity Drive to 8TB for future failed disk replacements, but whenever I place my precleared (twice) WD 8TB in the hotswap cage slot which previously held my Box 2 Parity Disk and try to boot the system, it fails to boot (normal startup beep, four beeps that usually come after the hard drives are all read, and then the system seems to hang - never shows up as an attached device on my network). When I replace that new 8TB Parity Drive with the old 6TB Parity Drive, the system boots fine. Is there a chance that the 8TB is messing up the boot order in some way, or is it more likely the jump from 6 to 8TB is blowing past my power supply? Or even just something weird with the disk despite finishing two preclears fine? PSU is a CORSAIR Enthusiast Series TX650 650W ATX12V/EPS12V 80 PLUS BRONZE Certified Active PFC High Performance Power Supply; motherboard’s a SUPERMICRO MBD-X8SIL-F-O Xeon X3400 / L3400 / Core i3 series Dual LAN Micro ATX Server Board w/ Remote Management. 11 6TB drives, 8 4TB drives, a 6TB Parity, and a 1TB Cache. edit: Going to be pretty difficult to source a monitor to see where the boot’s hanging, but probably not impossible if it’s my only hope. I’ve attached diagnostics from the last successful boot (using the old 6TB Parity Drive) in case they help. tower-diagnostics-20200416-0011.zip
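     Rough spin-up math for the PSU question (heavily hedged: ~2A at 12V per 3.5" drive during spin-up is a typical spec-sheet figure, not measured on these exact models, and staggered spin-up changes the picture entirely):
       # 21 drives spinning up simultaneously, at an assumed 2A each on the 12V rail:
       echo "$(( 21 * 2 * 12 )) W peak 12V draw at spin-up (rough estimate)"   # ~504W before the board, CPU, and HBAs take their share of the TX650's budget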
  15. Ok, I’m about ready to do that (box array is stopped, moving/overwriting 6.8.3 bz* files to flash root) but wanted to make sure - is this magnitude of jump (6.3.5 to 6.8.3) something I can just do over SMB, or should I shut down the system, remove the flash drive, and manually add those bz* files to be safe? Thanks again for the help!
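     One way to sanity-check an SMB copy before rebooting (a sketch; it assumes checksums of the original bz* files were generated on the machine they were copied from):
       # On the Unraid console, checksum what actually landed on the flash:
       md5sum /boot/bz*
       # If these match the checksums of the same files in the extracted zip,
       # the SMB copy didn't truncate or corrupt anything on the way over.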
  16. Thanks, Squid - reviewed the thread, got ready to upgrade, and the “Update Plugin” operation gave me two errors:
     downloading: https://S3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.5.3-x86_64.zip ... failed (Invalid URL / Server error response)
     downloading: https://S3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.5.3-x86_64.zip download failure (Invalid URL / Server error response)
     Is there any way I can still upgrade 6.3.5 to 6.5.3 manually, and then upgrade further using the upgrade system from there?
  17. Slight speed bump on the 6.3.5 to 6.4+ upgrade on Box 2 (decided to go ahead and take care of that one first): Instructions say to install "Fix Common Problems" plugin. Couldn't find a direct URL or .plg in its forum thread, so in order to search for that, I tried adding Community Applications through the URL, but just got an error message that I need a higher version (6.4+) to install it. Is there an easy way to get past this catch-22 and install an older version of Fix Common Problems that it sounds like I'll need to upgrade from 6.3.5 to 6.4+, and from there onto even newer versions of unraid? Thanks in advance for any guidance!
  18. History on First Box ("Box 1"): History on Second Box ("Box 2"): Box 1's at 5.0.6: I'd held off on moving to 6+ until I had LSI cards, but I'm ready to roll with those cards in hand now. Box 2's at 6.3.5 and ready to upgrade so I can get better diagnostic info moving forward. Box 1 just had a parity check complete late last night. Box 2 threw up 166k errors during its last parity check about a week ago on a disk that'd been improperly reconstructed due to read errors the day prior, and no drives have been modified since. I've identified all corrupted files on that 166k error disk, and can easily replace them after the upgrades, but haven't done anything to them yet. Planning to move cables from the misread/error-causing disk if new diagnostics from upgraded Unraid can't provide a better idea of whether my read errors are coming from a bad drive or bad cabling. Both boxes share the same UPS, so I'm planning to upgrade one box at a time for safety and feel like I should prioritize the v5 box (but wanted to check just to be sure that's a good call). From what I've read, 6.8.3 is a safe update considering the hardware I'm dealing with on both boxes, but if I'm making a wrong assumption there or it's safer to upgrade to a lower but more-stable/tested version, I'd love to know before locking in! In light of all that, any suggestions on the best order in which to take the following steps would be greatly appreciated:
     A: Upgrade Marvell cards to LSI on Box 1
     B: Upgrade Unraid 5.0.6 to 6.8.3 on Box 1
     C: Run NC Parity Check on Box 1 (step even needed?)
     D: Run Correcting Parity Check on Box 2 (to get error count to zero since all files are replaceable? step even needed?)
     E: Upgrade Unraid 6.3.5 to 6.8.3 on Box 2
     F: Run NC Parity Check on Box 2 (step even needed?)
     (LAST STEP?): Move Cabling on Box 2 if New Diagnostics Don't Show Drive Faults; Run NC Parity Check Before Next Rebuild to Avoid Repeat of Corrupted Rebuild
  19. So Disk13 completed the extended SMART self-test without error. Since I'm probably going to end up upgrading a handful of other disks during the course of this mess, my new concern is why Disk13 threw up read errors during the Disk12 rebuild - and how to prevent that from happening again the next time I rebuild a disk. Any guidance on how best to trace that problem to its source and stop it from recurring would be greatly appreciated!
  20. Extended test's at 50% now, so - holding off! Been spot-checking D12, and already found a few files that won't open properly. Going to be a hunt, but I've got time for it. Thanks a ton for your patience and advice in such a weird time for everyone, JB.
  21. Well, the short SMART test on D13 came back fine, but the extended's been sitting on 10% for over 2 hours now, which feels weird on a 6tb. I'm going to let it keep rolling for a while, but I feel like this doesn't bode well for that 6tb having much life left in it. Am I safer off replacing that 6tb (if the extended SMART test fails) before upgrading unraid to a newer version? If so, since I just ran a non-correcting parity check, is any of the (now-corrupted) D12 data repairable through the old parity I haven't "corrected" yet? Or should I run a correcting parity check before replacing that 6tb?
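     For anyone else watching an extended test crawl, it can be polled from the console like this (/dev/sdX is a placeholder for Disk 13's device):
       # The self-test log shows completed tests plus the remaining percentage of one in progress:
       smartctl -l selftest /dev/sdX
       # The capabilities section includes the drive's own estimate of extended-test duration:
       smartctl -c /dev/sdX | grep -iA1 "Extended self-test"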
  22. Old disk 12 is still in the exact same shape, and I have an eSATA caddy on another tower I can hopefully easily use for the checksum compare on the two 12s over the network (about to do some reading on that). Also looking into the reiserfs thing - definitely news to me, and feeling like I should be better safe than sorry on all towers during this mess. (EDIT: File juggling is going to be tough until I can get some more drives in the mail. Hopefully their being reiserfs won’t screw me too hard during the crisis if external hard drives keep getting their shipment times pushed back as non-essential.) Any recommendations on how to confirm whether D13 needs replacing now with the unraid version still sitting at 6.3.5? Thanks again!
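     The checksum compare mentioned above, sketched out (assuming the old Disk 12 ends up mounted read-only at /mnt/old12 via the eSATA caddy and the rebuilt one sits at /mnt/disk12; both paths are placeholders):
       # Generate a sorted checksum list for each copy of the disk.
       # /mnt/disk12 and /mnt/old12 are assumed mount points - adjust to the real ones.
       cd /mnt/disk12 && find . -type f -exec md5sum {} + | sort -k2 > /tmp/rebuilt12.md5
       cd /mnt/old12  && find . -type f -exec md5sum {} + | sort -k2 > /tmp/old12.md5
       # Any line that only appears on one side is a missing or corrupted file:
       diff /tmp/old12.md5 /tmp/rebuilt12.md5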
  23. Damn. No special reason on the old version; vaguely remember planning to upgrade around 6.6.6(?) but read about some weird stuff going on and decided to hold off for a future version. Time flew by in between then and now (unraid’s mostly a set-and-forget thing for me). So I’m out of 6TBs but can upgrade one in another tower to an 8TB and get another 6TB to use and replace 13’s 6TB if needed. I’m guessing these are my next steps:
     (1) Confirm file integrity on D12 and D13
     (2) Identify whether disk 13 has a problem or if it’s related to the hotswap cage or wires or whatever (NOT sure on this one)
     (3) Upgrade to the last stable Unraid release, OR replace D13 and then upgrade to the last stable Unraid release
     On the right track? Thanks for the swift help, JB!
  24. Some history on this tower: So the LSI controllers have been in since December and doing fine. I’ve successfully upgraded at least 3, maybe more, 4TB drives to 6TB in the time since, always running a parity check before and a non-correcting parity check after. This is the first time since the LSI cards came in that I had a disk die on me (Disk 12, 2 sync errors on the GUI and the drive automatically disconnected). I was maybe 2 days max away from upgrading a random 4 to a 6 for space reasons anyway, so I went ahead and put the 6 in and started the rebuilding process.
     I’d been steadily adding files over the past month and a half or so since my last upgrade (and last parity check), but weirdly not many to the disk that’s now showing 166k errors (Disk 13). The first half or so of the parity check had zero errors. I checked it with about 3 hours left and saw the 166k errors, but let the check run to completion. No more errors popped up in the last 3 hours of the check, the sync error disk (13) isn’t disabled or marked in any negative way outside of the error count, and all files (including the ones added to that disk during the ~45 days of no parity checks) still seem to open fine.
     With all these factors in play, any suggestions on next steps here? Got a feeling hardware replacements are going to be a pain in this environment, but I’m swimming in free time if there are some time-intensive steps I can take to figure out what’s going wrong here and get things back to normal. Thanks in advance for any help or guidance! tower-diagnostics-20200326-0913.zip
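     One discriminator worth checking on Disk 13 from the console (a sketch; /dev/sdX is a placeholder for its device): CRC errors increment when data gets mangled on the cable/backplane path rather than on the platters, so a climbing value there points at wiring or the hotswap cage instead of the disk itself:
       # /dev/sdX is a placeholder - substitute Disk 13's actual device.
       smartctl -a /dev/sdX | grep -iE "UDMA_CRC_Error|Reallocated_Sector|Current_Pending"
       # Broad syslog filter for read errors logged during the rebuild (exact wording varies by Unraid version):
       grep -i "read error" /var/log/syslog | tail -n 100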
  25. For sure; that's what I'm going to do with this and my other unraid boxes, but since it's never happened before and it's happening twice in relatively quick succession now, I figured I should check in here in case that's a symptom of something else weird going on under the hood. I'm a ridiculously basic user with almost no linux experience, so I realize this could all be totally innocuous - but the fact that I've run this specific setup (network, unraid, kodi, no changes) for years with no issues, and now two strange things are happening concurrently, is kind of freaking me out. I'm really appreciating all the eyes on this and advice from everyone, though! This place is the best.