elecgnosis

Everything posted by elecgnosis

  1. I started the array with a new config and almost immediately got errors on a different drive than I was having trouble with before. I'm going to go through the same Smart/Scrub checks, but I'm not expecting any surprises. Previously, I was running off of a voltage-regulating UPS. I didn't keep it during the move, so I wonder if that's the secret sauce here. I'm using an 850W power supply that has been rock-solid otherwise. I have a new UPS coming, but not for a couple weeks. If I still have trouble on the other side of that, I'll look into replacing the power supply. Marking this as solved for now.
  2. Thanks. Do you have any guidance for the UI not responding? I can force a shutdown if I have to, I'm just trying to avoid it.
  3. Forgot to attach the files; here they are. omni-diagnostics-20231110-1910.zip
  4. After moving recently, I started a new parity check to see whether my disks were still okay. When I checked in on it later, it seemed to have run for about six hours, then quit with read errors on disk 1 and connectivity problems with five other drives. There were also UDMA errors on a hot spare that didn't even have any data on it, besides a preclear record. I shut down the server without downloading diagnostics (sorry). I checked that the data and power cables were secure, moved some drives around in the server's slots, then powered back on. Next I unassigned the drive with read errors and mounted it in unassigned devices. I ran a BTRFS scrub, which returned an exit code of 0, meaning no errors. Then I ran an extended SMART scan on it and on all of the other drives that had connectivity problems. Hopefully those show up in the diagnostics I've included here. Today, I tried reassigning my hot spare to the array in the slot that the read-error disk had occupied. I accepted that the replacement drive would be rebuilt over and started the array. The operation didn't run for very long before the replacement drive came up with read errors, putting it in an unmountable state. I tried to stop the array, but none of the server buttons in Main are responding. I tried stopping the read-check as well, but nothing responds there either. I can click around the interface to do other things, just not this. I was able to pull both diagnostics and the system log. I also noticed that the system log had an error about /var/log being 97% full; I don't know if that's why the UI is acting the way it is. The last time I dealt with something like this, I remember finding a forum post saying that stopping loop2 could get it working again, but I haven't tried that yet. The BTRFS scrub leads me to believe that the original drive with the read error is okay. Should I just do a new config? Otherwise, I'll take any other steps you folks think I should try.
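     In case the details matter, this is roughly what I ran from the console for the checks mentioned above (a sketch; the device names and mount points are placeholders, not my actual ones):
        df -h /var/log                              # confirm how full the log filesystem is (it reported 97% for me)
        du -sh /var/log/*                           # see which log files are taking the space
        btrfs scrub start -B -r /mnt/disks/olddisk  # -B waits for completion, -r keeps it read-only; exit code was 0
        smartctl -t long /dev/sdb                   # extended SMART self-test on each drive that showed connectivity errors
        smartctl -a /dev/sdb                        # read the results once the test finishes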
  5. I ended up doing a new configuration, sacrificing parity. I then spent the last week doing BTRFS scrubs on all of the original array drives, even the first one that had reported errors. All finished without errors. I'm rebuilding parity now.
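     For anyone who finds this later, the per-disk scrubs were along these lines (just a sketch; on Unraid the array disks are mounted at /mnt/diskN, and the disk numbers here are placeholders):
        for n in 1 2 3 4; do
          btrfs scrub start -B /mnt/disk$n   # -B runs in the foreground so each scrub finishes before the next starts
          btrfs scrub status /mnt/disk$n     # summary of read/checksum errors, corrected vs. uncorrectable
        done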
  6. I found these steps to try, and after mounting the drive, it appears that the data is still there. However, there's 9TB of data to back up. I don't have 9TB of free space available on any individual drive in the array, though I guess I could still back it up if I distribute the directories across multiple drives. Or I could swap the drive and let parity rebuild the data for me. Unless any of you think there's another route to a fix here that doesn't involve waiting ten days to write 18TB of data, that's the option I'm going to go with. I will hold on to the troubled drive in case something happens during the parity rebuild.
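     For the record, the copy-off option would have looked something like this, one top-level directory per disk with enough free space (a sketch only; the directory and disk names are made up):
        rsync -a --progress /mnt/disks/olddisk/dirA/  /mnt/disk3/rescue/dirA/
        rsync -a --progress /mnt/disks/olddisk/dirB/  /mnt/disk5/rescue/dirB/
        rsync -ai --checksum --dry-run /mnt/disks/olddisk/dirA/ /mnt/disk3/rescue/dirA/   # spot-check a finished copy; lists anything that differs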
  7. I was away from home for a week and when I got back, I noticed I had an error in Unraid. Something was wrong with a two-year-old drive. I shut down the server, checked the cables, and also moved the drive to another slot. On the next startup, the drive it had swapped slots with also appeared to have a problem, so I shut down again and shuffled more drives around. This last startup, the one I posted diagnostics from, still showed the first problem drive as "Unmountable: Unsupported or no file system", but at least it was the only one that seemed to be bad. I'm not sure how to proceed. I could just replace it with a hot spare, but I've seen posts here where people in similar situations were able to repair or restore the file system and even add the drive back to the array. Diagnostics posted, please advise. my-diagnostics-20230715-1652.zip
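     In case it helps with the advice, these are the read-only checks I can run on it in the meantime (a sketch; /dev/sdX is a placeholder, and I'm assuming the disk is BTRFS like my others; for XFS it would be xfs_repair -n instead):
        smartctl -a /dev/sdX               # does the drive itself report reallocated/pending sectors or CRC errors
        btrfs check --readonly /dev/sdX1   # filesystem consistency check, run with the drive unmounted; writes nothing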
  8. That got it. Cables secured, moved drive 11 to another physical position, started back up. Data-rebuild running now. Good enough to call it solved for now, thank you.
  9. I've cancelled the data-rebuild in the UI, but it doesn't seem to do anything. Should I just use the shutdown button? Latest diagnostics attached. omni-diagnostics-20230511-1102.zip
  10. Thank you, I'll get to it. If I don't run into any other trouble, I'll mark as [SOLVED].
  11. I had an old, smaller drive I wanted to replace with a newer, larger drive. I precleared the new drive, shut down the entire server, removed the drive I wanted to replace, and set the new drive in its place. I also pulled one other drive out to check its serial number and re-seated it. Next I powered up the server, set the new drive to replace the old one in the array, and started the rebuild. I checked the server this morning and it appears that another older drive (not the one I re-seated) has a string of read errors. Checking some notes I had, it seems that mysterious read errors have hit this drive before, but they amounted to nothing, as re-seating it resolved those issues at the time. My problem now is, I want to re-seat or even re-position that drive in the server, but I don't know how to do that without ruining the data rebuild currently in progress. Do I need to stop the rebuild, shut down the server, re-seat/reposition/check cables or do whatever physical maintenance this seems to call for, then resume the rebuild, presuming that the "problem" drive is just having physical issues?
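     For what it's worth, this is how I try to tell seating/cable trouble from a genuinely failing disk before deciding (a sketch; /dev/sdX is a placeholder):
        smartctl -a /dev/sdX | grep -Ei 'udma_crc|reallocated|pending|uncorrect'
        # a climbing UDMA_CRC_Error_Count usually points at the cable or connector,
        # while reallocated/pending/uncorrectable sectors point at the disk itself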
  12. I've stopped the array and mounted the cache drive as an unassigned device. The DROP share is there in its entirety. I still don't know why Disk 11 became unmountable after the restart or why the cache drive wasn't recognized, but as for where the missing share went, that mystery is solved.
  13. I checked those old diagnostics and the share was set to include all of the drives. It was set to prefer the cache, and the comment at the end of its config file tells the story. As long as the data is still on that cache drive, I should be good, unless you see something else.

        # Generated settings:
        shareComment="..."
        shareInclude="disk1,disk2,disk3,disk4,disk5,disk6,disk7,disk8,disk9,disk10,disk11,disk12,disk13,disk14,disk15,disk16,disk17,disk18,disk19,disk20,disk21,disk22"
        shareExclude=""
        shareUseCache="prefer"
        ...
        # Share exists on no drives
  14. The share name was "DROP" and it should have been distributed across all disks. My memory may be failing me and it could all have been on the cache drive, but I will need to stop the array and mount that as an unassigned device to be sure. I was going to do that anyway so that I could replace the drive, but I didn't want to make any further changes before consulting the forum. Could high data usage lead to share or filesystem problems? Edit: Would old diagnostics show how the share was distributed? If so, I may be able to dig through the diagnostics I uploaded in my other post to find it.
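     (Where I'd look for that, as far as I understand the diagnostics layout: the share's settings are under shares/DROP.cfg inside the zip, and on the running server I can check directly, e.g.:
        cat /boot/config/shares/DROP.cfg        # include/exclude/cache settings for the share
        ls -d /mnt/disk*/DROP /mnt/cache/DROP   # which disks currently have a DROP directory at all
     The cfg shows how the share was configured rather than which files ended up where, so the ls is the closer answer to my own question.)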
  15. I wanted to restart my server a couple of days ago. After restarting and attempting to start the array, Disk 11 was unmountable. I had trouble with this drive a few weeks ago, but since I was able to see data on the drive when it was mounted as an unassigned device, I did a parity rebuild and thought everything was fine. First I tried swapping the drive for a hot spare, but the parity rebuild was just writing the unmountable drive's contents onto the new drive. After again checking the unmountable drive for file data, I did a parity rebuild. I noticed during the rebuild that the cache drive had failed to be recognized (separate problem, adding for context). The rebuild finished today and it was only then that I realized that an entire share was missing. I had intended to replace Disk 11 because of these intermittent failures, but I didn't want to go one step further without seeing if there's anything I can do to fix the share problem. diagnostics-20210304-0928.zip
  16. I had physically removed the original disk 11, but I hadn't reassigned the hot spare to the disk 11 slot. The hot spare was still in its own slot when I did the new config, so I reassigned it to slot 11, did another new config, and started up the array. Parity is resyncing. Everything seems to be fine now. Thanks, everyone.
  17. I put the original disk 11 back in the array. I set a new config with the same drive assignments. I started the array. Disk 11 still comes up as unmountable/no file system even though when I had it mounted in unassigned devices, all of the data was there. Not sure what to do now. Am I hosed? diagnostics-20210211-0101.zip
  18. I was able to mount the drive with the write error as an unassigned device. I am still able to access its files. I'm looking at this similar topic, and I think I understand the problem better: While there may not have been any mechanical failure or damage in the drive, its BTRFS filesystem was somehow corrupted? So, even after rebuilding, I will need to repair the file system on the new drive by using the Scrub command?
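     Writing it down so I get it right when the time comes (a sketch based on that topic; /mnt/diskX stands in for whichever slot the rebuilt drive ends up in):
        btrfs scrub start -B /mnt/diskX     # verifies checksums and repairs whatever it can from good copies
        btrfs scrub status /mnt/diskX       # totals of corrected vs. uncorrectable errors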
  19. I'm confused now. I chose to pull the drive that had the write error and replace it with my hot spare. When I started the array, as the drives were mounting, the new drive came up as Unmountable, though the rebuild is still happening. I haven't done anything with the original drive that had the read error. I have a bad feeling that the rebuild will result in an empty drive. Can you help me find out what's going on? Do I still have an opportunity to save that data?
  20. I wanted to know what possible causes could be and next steps. I think I have what I need to take action, so I'll mark this solved. @jonathanm and @JorgeB, thanks for your help. If I have trouble after this that I can't understand on my own, should I reply to this thread, open a new one, or reach out over PM?
  21. So I have two paths: Trust the disk's data (new config/re-sync parity) or trust the drive's condition (rebuild on top). Regardless of which option I go with, if another drive goes bad during either operation, I will lose the contents of both drives. If I go with rebuilding on top, would it be better to preclear the disk first? Is there any other way to validate the drive's condition?
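     On the validation question, the non-preclear checks I'm aware of look like this (a sketch; /dev/sdX is a placeholder, and the badblocks write test destroys everything on the drive, so only before a rebuild-on-top):
        smartctl -t long /dev/sdX     # extended self-test; review afterwards with smartctl -a
        badblocks -wsv /dev/sdX       # full write-and-verify pass over every sector (destructive)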
  22. So I need to troubleshoot all of my connections and maybe even my power supply. That said, it sounds like it may also be okay as is. How would I reset the red X without replacing the drive that UnRaid disabled?