ratmice

Everything posted by ratmice

  1. So I noticed that one of my data disks (disk 7) and my cache disk are both showing read errors. The data disk's SMART report shows a few CRC errors from a long time ago (I think). The cache disk seems like it's dying. Would someone be kind enough to see if my assessment of the 2 disks is correct? Other recommendations would be greatly appreciated as well. Thanks for the help. One additional question: can I use btrfs for the cache (in case I want to create a pool later) while using XFS for data, or does that cause any problems? tower-diagnostics-20210110-1407.zip
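     For reference, a minimal sketch of pulling the relevant SMART attributes from the console (assuming smartmontools is on the box, as it normally is with Unraid; /dev/sdX is a placeholder for the actual device):

         smartctl -i -H /dev/sdX                                            # drive identity and the overall health verdict
         smartctl -A /dev/sdX | grep -Ei 'reallocated|pending|uncorrect|crc'   # attributes that usually separate a dying disk from cabling (CRC) trouble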
  2. OK, thanks. I think I will just not tempt fate and leave everything alone until the new controllers get here. That should be an adventure, as well. O_o
  3. So, one last question: now that the repair has proceeded and lots of items were placed in Lost+Found (indicating, I think, that there was a lot of corruption), should I bother trying to mount it? Or will that screw things up when I go to rebuild the disk? My thinking here is that if it is actually mountable, then Unraid will think that its current state is what it's supposed to be and adjust parity accordingly, thus giving me a screwed-up rebuild. As it is, the emulated disk appears to work OK.
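     If it helps, a hedged sketch of peeking at what the repair left behind without writing anything to the disk (the device number and mount point are illustrative; the mount is read-only, so the on-disk state is untouched):

         mkdir -p /mnt/check
         mount -o ro /dev/md17 /mnt/check         # read-only mount of the repaired filesystem
         ls /mnt/check/lost+found | head          # see how much xfs_repair had to salvage
         umount /mnt/check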
  4. That's what I thought; I can wait a few days for the controllers to arrive. Here goes nothing. Thanks again.
  5. So, I attempted to run xfs_repair and got this output; it seems like the disk is really borked. However, is there a way to attempt mounting a single disk while in maintenance mode, or is starting the array and having it choke on this disk enough to just jump to ignoring the log while repairing? I am too *nix illiterate to know this.

         root@Tower:~# xfs_repair -v /dev/md17
         Phase 1 - find and verify superblock...
                 - block cache size set to 349728 entries
         Phase 2 - using internal log
                 - zero log...
         zero_log: head block 449629 tail block 449625
         ERROR: The filesystem has valuable metadata changes in a log which needs to
         be replayed.  Mount the filesystem to replay the log, and unmount it before
         re-running xfs_repair.  If you are unable to mount the filesystem, then use
         the -L option to destroy the log and attempt a repair.  Note that destroying
         the log may cause corruption -- please attempt a mount of the filesystem
         before doing this.

     Also, 2 new controllers are on the way. If this attempt to repair disk 17 is unsuccessful, would the best course of action be to shut down the array, install the new controllers, and then try to rebuild disk 17 onto a new drive? Or would it be OK to do that now?
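     For what it's worth, a rough sketch of the sequence that error message describes, in the order it suggests (device name taken from the output above; the -L step is the destructive last resort it warns about):

         mkdir -p /mnt/disk17tmp
         mount /dev/md17 /mnt/disk17tmp && umount /mnt/disk17tmp   # mounting replays the log if it can
         xfs_repair -vL /dev/md17                                  # only if the mount fails: zero the log, then repair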
  6. Thanks for the prompt reply. As always, Johnny, you are a superb asset to the forums. I am going to replace the controllers ASAP. I currently have a SuperMicro X8SIL-F motherboard; it looks like I'm limited to x8 PCIe slots (but it seems the SASLP are x4 cards). Any recommendations for direct replacements for the SASLP controllers? It seems like an "LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA" might be a reasonable replacement; I'm just a bit fuzzy on the bus/lane deal. Are there more stable, proven replacements that have the SFF-8087 SAS connector so I can just plug and play?
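     As an aside, a small sketch for checking what link width a card actually negotiates once it is in a slot (the PCI address 03:00.0 is a placeholder; LnkCap is what the card can do, LnkSta is what it actually got):

         lspci | grep -Ei 'sas|raid'                        # find the HBA's PCI address
         lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'     # compare capable vs. negotiated width and speed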
  7. Here are the post-reboot diagnostics. Also, the disabled disk has a note that it is "Unmountable: no file system". Not sure if that is SOP for disabled disks or not. tower-diagnostics-20200625-1540.zip
  8. Unraid 6.7.2. So I woke up this morning to an array that has a disabled disk (single-parity system) and seven disks all with millions of read errors. The array isn't mounted and is unreachable from the network. In the Shares pane only the disk shares are showing up, none of the other shares. I pulled a diagnostics report (attached below) and now wonder what the safe thing to do is. I did start a read test as indicated on the Main page, but paused it almost immediately, not knowing if it would screw things up.

     Last night I rebooted the server, as I was having some trouble with the TV system as well. All seemed OK, and Plex was able to rescan the TV shows and find some new items that had been added. Watched a couple of episodes and went to bed.

     My inclination is that I need to shut it down and check cabling, reseat controllers, etc., as that seems the most likely cause for millions of read errors all of a sudden, but I don't want to do anything that might compromise the system. I need to figure out if all the disks are on the same controller, but I'd need to shut it down and get at the disks to see. In poking through the forums this morning, I see that Marvell controllers can be an issue, so I assume it's due to one of those. The trouble appears to start around 5:59 in the log, and there are indications that the controller is the issue. I am woefully deficient at understanding these logs, however. This server has been running faithfully for many years with the current HW, just FYI. Also, I am assuming that the system disabled that one particular disk just because it can only do one with single parity, and it randomly chose it when the array crapped out.

     Additionally, the system log tool has only one entry:

         Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134115360 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

     Any help in how to proceed would be greatly appreciated. I am not very knowledgeable about the deep inner workings of the UnRAID system, or Linux in general, but I am a quick learner. Hopefully some of you forum denizens will be able to help me out and point me in the right direction. Thanks for listening. tower-diagnostics-20200625-1429.zip
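     On the question of which disks hang off which controller, a hedged sketch that can be run from the console without opening the case (output details vary by system):

         ls -l /dev/disk/by-path/ | grep -v part    # each disk symlink includes the PCI address of its controller
         lspci | grep -Ei 'sas|sata|raid'           # match those PCI addresses to the actual controller cards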
  9. Thanks again, Johnnie. Just to be extra clear (paranoid): the UnRAID managed device number (mdX) should always be the same as the disk number, correct? So if I need to zero disk 16, I would use md16. Sorry for the cluelessness.
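     For context, the manual equivalent of the clear-drive step is roughly the following (device number taken from the post above; writing zeros through the md device keeps parity in sync, but pointing dd at the wrong device is destructive, so treat this purely as an illustration):

         dd if=/dev/zero of=/dev/md16 bs=1M status=progress    # zero the whole array device so it can be removed without invalidating parity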
  10. OK, so back again. I am trying to use the 'clear array drive' script in order to shrink my array. I added a drive to the array earlier today and shortly afterward realized that another drive was acting up. I am in the process of trying to remove the newly added drive by clearing it, so I can then redeploy it as the rebuild target for the dodgy drive. When I run the script, it finishes instantly and the 'clear-me' folder still remains on the drive in question. This drive was only added to the array and formatted (in order to add it), so it does not have any data on it. I don't see any pesky hidden files, so I am wondering how to get it to zero the drive?
  11. Thanks, Johnnie. You always seem to be around to answer these questions and I really appreciate it. Have a great day.
  12. Thanks for the explanation. Also, what happens if I screw up the exclusion/inclusion thing?
  13. Thanks. Just one stupid question: where the clear-and-remove option says "Make sure that the drive you are removing has been removed from any inclusions or exclusions for all shares, including in the global share settings," does this apply to settings that are set to "All" as well? So basically just change all the inclusions and exclusions to "None"?
  14. So, I had a precleared disk lying about and decided to add it to an open slot in my array. No problem there; it formatted and I was off to the races. However, just after that (of course), I noticed one of my older disks is showing signs of age. Is there an easy (safe) way to remove that newly added, empty disk and just rebuild the dodgy disk onto it, without having to rebuild parity? Nothing has been written to the array since adding the new disk.
  15. So, all of a sudden the buttons on the front panel of my Norco 4220 enclosure are not working anymore. I did play around with the connector, but they still seem dead. The MB is an X8SIL-F and has been working fine for years. Any help would be appreciated by this not-very-savvy user.
  16. So my UPS battery died yesterday and forced an unclean shutdown. When I started the array again, a parity check was performed automatically. It came back with ~1500 sync errors. The other interesting thing is that on the last parity check (~2 mos. ago) there were exactly the same number of sync errors. I have included the diagnostics. I was under the impression that, unless you specifically uncheck the option, parity checks were correcting. This does not seem to be the case, as I see NOCORRECT in the syslog. In the main UnRAID window the correcting box was checked when I went to look after it completed. My questions are: 1. Why wasn't a correcting check done, as I thought that was the default? 2. Does anyone see anything helpful in the syslog? 3. Should I go ahead and do a correcting check, or is something else warranted? 4. Can anyone tell if a particular disk may be the culprit? Thanks for any wisdom people are willing to impart. tower-diagnostics-20180619-2146.zip
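      As a side note, a small sketch for confirming from the console what kind of check ran (the syslog path is the usual Unraid location; the exact wording of the entries can differ between releases):

          grep -iE 'nocorrect|parity|mdcmd' /var/log/syslog    # NOCORRECT on the check line means it was read-only, not correcting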
  17. @Limy: truthfully, that sounds like a decent option; however, I am unsure that my current ability is up to the task. I have been avoiding VM use on my Mac, but may have to rethink it. Thanks for the answer.
  18. Thank you, John. Maybe I should give OSXFUSE another try. I only need to read, so that the data doesn't get dead-ended on the drives if something gets messed up.
  19. SSIA. I would really like a way to at least get at the data if the server craps out. I would prefer a native method, as VMs may be beyond my knowledge level at this point. In the past I tried OSXFUSE, but my knowledge was obviously inadequate to get it running. Any help would be appreciated.
  20. I thought it had remained offline through a previous power cycle. Anyway, it's zeroing and will be removed ASAP. Once again, thanks to all who contributed to this thread; I am always glad that this community has so many helpful and smart souls.
  21. Overnight the pre-failing disk was zeroed, and I removed it from the array this morning. All went well. An interesting side note is that the red-balled disk is now back online like nothing is wrong? Not sure what to make of that.
  22. So, this brings up an interesting dilemma. In terms of the probability of having another issue while extricating myself from the current situation, what would be the best way to proceed? 1. Zero and remove the emulated drive first - worried about the stress on the pre-failing drive while zeroing the emulated drive making it actually fail. 2. Zero and remove the pre-failing drive first - worried about the stress of zeroing making it actually fail. 3. Just bite the bullet and pull both, add the new one, do a new config, and live without parity protection for a day. 4. I suppose if 1 or 2 fails I can always fall back to 3. Am I waaaaaaay overthinking this?