nmills3

Everything posted by nmills3

  1. Is there a way to switch to a 3-drive pool with 2TB of usable storage without needing to switch to ZFS, or is that my only option?
  2. Well, it's not technically a new device; it's the old device being re-added after an extended period of time. Will the system just play nice with that, or will it try to do something strange since it will recognise it as a previously existing disk in the pool? Also, how hard should it be to convert the pool to RAID 5 after adding the drive back?
  3. So a while back I had to remove one disk from my 3-drive cache pool because of an issue that was causing write errors on that drive. Since then I've worked out that the drive is fine and that the problem was the SAS to 4x SATA cable I was using; I've now replaced the cable. How do I go about re-adding the previously removed cache drive to the cache pool? Before removing the drive, the pool was made up of 3x 1TB SSDs and I think I had it as a mirror (yes, that was dumb), giving a total capacity of 1.5TB. After removing the drive, the pool got converted into what I think is just a 2-disk mirror with 1TB of total storage. My goal is to re-add the removed drive to go back up to 3 disks, and then convert the pool into whatever RAID profile gives me 2TB of storage with one disk's worth of redundancy. Ideally there would be a way to do this while leaving the data in place, without having to move everything to the array and then back onto the cache pool, as the pool holds a lot of tiny files and would probably take a long time to move over and back again. (See the command sketch at the end of this list.)
  4. But when I removed the SSD and started the array a few days ago, it asked me if I wanted to remove the device and I said yes. Shouldn't that mean it was removed from the pool?
  5. Should I make this into a new post? I think this might be a bit outside the scope of the original issue at this point.
  6. So I think I found a bug in Unraid. It seems the SSD that I removed a few days ago was causing the cache to not mount properly. With that drive installed but not assigned to the cache pool (it was a 3-drive pool with the second slot unassigned), the pool would fail to mount. There was also an issue with the superblock size, but that was fixed with the command "btrfs rescue fix-device-size" after running a check on the pool. Then, when running a "btrfs filesystem show" command, I noticed it was listing the installed but unassigned drive as part of the pool, but with a different storage size (about 80GB less than the drives that were actually assigned to the pool). So I shut down and ripped that drive out again, and now the server starts fine and the pool seems to be functioning properly. So I think that when Unraid removed the disk from the pool the first time, it didn't actually remove it properly and was still trying to use it as part of the pool. (See the stale-member check at the end of this list.)
  7. @JorgeB so my situation has got even stranger. I just replaced the cable and transferred the server to a new case without the cheap hot-swap backplane, and now the normal drives all seem to be working fine, but my cache pool is completely dead. I moved everything over, checked that all the disks showed up, re-assigned a disk to the main array that had a faulty cable, and then started the array. Now the cache that was working this morning as a 2-drive RAID 1 is just showing two disks with the error "Unmountable: Unsupported or no file system". I have no idea what could have happened to them. I did a clean shutdown beforehand, everything seemed to be working fine since my last update and nothing seemed to be corrupted, and now this has happened. tower-diagnostics-20231111-1817.zip
  8. OK, I've disconnected both of the bad drives and it seems to be working now. The cache pool had enough space to just convert into a 2-drive RAID 1, and the main array has already been running with a missing disk for about 2 months, so everything seems to be working. I'm going to try to reinstall the SSD once the replacement cable shows up, but until then at least the server can limp along. I'll attach new diagnostics anyway, just in case there's anything useful. tower-diagnostics-20231109-1056.zip
  9. @JorgeB any other advice on what I can do would be appreciated.
  10. Also, ignore the drive errors from sde; that's an unassigned drive in a slot that I know is bad.
  11. So I just reseated the cables and restarted the server, and now most of my Docker containers that use appdata won't start, and a few of them have errors about a read-only file system. The syslog also has some checksum errors and a lot of I/O errors from the same drive as before. I'll attach a new diagnostic to this post. So, on a scale of 1-10, how screwed am I in terms of the data on the cache pool? (See the error-counter check at the end of this list.) tower-diagnostics-20231108-2045.zip
  12. I'll attach diagnostics just in case they're useful: tower-diagnostics-20231108-1938.zip
  13. OK, the array finally stopped. Hopefully my cache data still exists.
  14. So I just tried stopping the array so that it wouldn't keep accumulating errors while I wait for a replacement cable. The server just sat on the loading spinner for a while and wouldn't load any other pages. Now the web UI is responsive again, but the array still isn't stopped and the log is filled with this message: Nov 8 19:32:50 Tower kernel: BTRFS error (device sdf1): error writing primary super block to device 2. sdf is the first drive in the cache pool, but sdg, the second drive in that pool, is the one that was giving the CRC errors.
  15. Also, once I replace the cable, do I need to do anything else? If it was the array I'd run a parity check afterwards, but I don't know if the cache needs anything like that. (See the scrub sketch at the end of this list.)
  16. Well, it's using one of those SAS to 4x SATA adapters from the HBA. Is it likely that the cable suddenly went bad?
  17. I just started getting CRC errors on one of my cache drives. I've never had issues with this drive before, but in the last hour or so it has racked up about 4000 errors. I'm assuming I need to replace the drive, but I'm not sure how to replace a cache drive. It's part of a 3-drive RAID 1 pool, so am I safe to just stop the array, rip out the old drive and throw a new one in, or do I need to move all the files off the cache drives first? (See the replacement sketch at the end of this list.)
  18. I also don't think it helps that, since the recent change where notifications are hidden in the little drop-down in the top left, you're now expected to go off the colour of the bell to know what notifications are waiting for you. If I see a green bell I assume it's something I can just ignore, a yellow bell means it's something I need to look at in case a larger problem is starting, and a red bell means something has gone very wrong. So every time a parity check happens I get the yellow bell and think something is going wrong, and then it just turns out to be a parity check notification that could have been ignored.
  19. Don't some I/O-related errors show as warnings instead of errors? For example, I'm pretty sure that if a drive's "UDMA CRC error count" increases, that shows as a warning to the user and not an error in the notifications in the top right. Then, once the error count gets high enough, the disk gets disabled and an actual error notification is sent. Hence my example of a warning being an event that doesn't need user intervention. Once it gets to the point where the drive is disabled, you get the error notification, because at that point it falls into the "you need to fix it" category. I would also like to be clear that, for the purposes of warnings, errors and notices, I'm only talking about the notifications in the top left of the UI.
  20. So, for me personally, I think of warnings as "something has gone wrong but we can handle it without intervention", errors as "something has gone wrong and you need to fix it", and notices as "just so you know, this is happening." I think the issue is that this sits almost between two categories. I can see it being a warning when it's unexpected, like when the array wasn't shut down correctly and parity needs to be rebuilt, or when a drive is replaced. But for scheduled ones I have to disagree: as it's a scheduled event, the user should already be aware of it, should be expecting the performance hit, and will most likely have set the check to run at a time where that isn't an issue anyway. I think scheduled checks should be considered notices and non-scheduled checks should be warnings. Also, in response to your comment about it being a major array operation, I have a question: given that the mover could also be considered a major array operation, depending on how many files need to be moved, should it not also have a warning-level notification when it starts? Currently it has the potential to have a large performance impact on the array too, but doesn't have any notifications of any type from what I can find.
  21. Currently, when a scheduled parity check starts, it displays a notification in the warning category. This isn't really a warning of anything, though, as it's just a scheduled check. It would probably make more sense for it to be in the notices category, since the point of the notification seems to be to tell you that the parity check is starting.
  22. So I have my Unraid tower at 192.168.1.208 on my network, and I have a SWAG/nginx container running on the server using the br0 custom network so it appears as 192.168.1.252 on my network. All other devices on the network can ping and access 192.168.1.252, including VMs running in Unraid, but Unraid itself cannot ping that IP and gives me the error "Destination Host Unreachable". Is there an easy way to fix this? (See the macvlan note at the end of this list.)
  23. The smbd process also seems to be holding about 30,000 files open. Is that likely to be related? (See the check at the end of this list.)
  24. Thanks, so now this thread seems to be done. What do I mark as the solution? The solution is spread across about two pages and multiple posts.
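
Command sketch for the pool re-add and RAID conversion asked about above: on Unraid the usual route is to assign the drive back to a free pool slot in the GUI and let it handle the rest, but the underlying btrfs steps look roughly like the lines below. This is a minimal sketch only; /dev/sdX1 and /mnt/cache are placeholder names, and btrfs RAID 5/6 still carries known caveats, so treat the profile choice as a judgement call rather than a recommendation from this thread.

    # Sketch, not a tested procedure; /dev/sdX1 and /mnt/cache are placeholders.
    wipefs -a /dev/sdX1                      # clear the old btrfs signature on the returning drive
    btrfs device add /dev/sdX1 /mnt/cache    # add it back to the mounted pool
    # Convert data to raid5 (~2TB usable from 3x 1TB, one-disk redundancy), keep metadata raid1:
    btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/cache
    btrfs filesystem usage /mnt/cache        # confirm the new profiles and usable space

The balance runs in place on the mounted pool, which matches the wish to keep the data where it is rather than shuffling it through the array and back.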
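
Stale-member check for the "removed drive still showing up" issue above: compare what btrfs reports against what is actually assigned, then clear the leftover signature on the drive that should no longer be a member. The device names here are placeholders, and wipefs permanently removes the signature it targets, so double-check the device first.

    btrfs filesystem show /mnt/cache     # lists every device btrfs still considers part of the pool, with devids
    # If an unassigned drive appears in that list, its old superblock is most likely still present:
    wipefs -a /dev/sdY1                  # placeholder device: wipe the stale btrfs signature so it stops being detected as a pool member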
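
Error-counter check for the read-only/checksum situation above: btrfs keeps per-device counters that give a quick sense of how much damage a bad cable caused, assuming the pool is still mounted at /mnt/cache.

    btrfs device stats /mnt/cache        # read/write/flush/corruption/generation error counts per pool device
    dmesg | grep -i btrfs                # recent kernel-side btrfs messages (csum failures, I/O errors)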
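
Scrub sketch for the "does the cache need a parity check" question above: the btrfs equivalent is a scrub, which re-reads everything and verifies checksums, repairing from the other copy on redundant profiles; the pool's device page in the Unraid GUI should also offer a scrub button. Assumes the pool is mounted at /mnt/cache.

    btrfs scrub start /mnt/cache         # verify (and, on raid1, repair) checksums across the whole pool
    btrfs scrub status /mnt/cache        # progress and any uncorrectable errors
    btrfs device stats -z /mnt/cache     # optionally print and reset the per-device error counters afterwards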
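
Replacement sketch for the "how do I swap a cache drive" question above: in Unraid the usual path is to stop the array, change the assignment in that pool slot to the new drive, and start the array so the pool rebuilds. The raw btrfs operation underneath looks roughly like this, with the devid and device path as placeholders read from btrfs filesystem show:

    btrfs filesystem show /mnt/cache                      # note the devid of the failing drive
    btrfs replace start <devid> /dev/sdNEW1 /mnt/cache    # placeholder devid/path: rebuild onto the new drive while the pool stays online
    btrfs replace status /mnt/cache                       # watch rebuild progress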
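
Macvlan note for the "Unraid can't ping its own container" issue above: Docker's macvlan driver, which the custom br0 network uses, deliberately isolates the host from macvlan containers, so "Destination Host Unreachable" from Unraid itself is expected behaviour rather than a fault. Recent Unraid releases have a "Host access to custom networks" toggle in the Docker settings that works around it; the manual equivalent is a macvlan shim along these lines (the interface name, the spare .210 address and the br0 parent are illustrative assumptions, and the commands do not persist across reboots on their own):

    ip link add shim link br0 type macvlan mode bridge   # shim on the same parent interface as the container network
    ip addr add 192.168.1.210/32 dev shim                # a spare, unused LAN address for the shim (placeholder)
    ip link set shim up
    ip route add 192.168.1.252/32 dev shim               # reach the container's IP via the shim instead of br0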
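
Quick check for the roughly 30,000 open smbd files mentioned above, to see whether Samba really is the one holding them: count file descriptors per smbd process and compare with what Samba reports as locked. Nothing here is Unraid-specific.

    for p in $(pidof smbd); do echo "smbd $p: $(ls /proc/$p/fd | wc -l) fds"; done   # open descriptors per smbd process
    smbstatus --locked | wc -l                                                       # rough count of files Samba reports locked (includes header lines)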