RoachBot

Members
  • Posts: 20
Everything posted by RoachBot

  1. Thanks for the reply. I took a pause on fixing my server due to stress, but I'm back at it. It seems the PSU may have been the issue, but now I'm running into problems restoring the array.

Shorter summary: I rebuilt disk4 separately (see the longer summary) with no errors and no indication it was unmountable. Next, I synced parity1 with no errors. I then started the array normally (not maintenance mode) and disk4 was unmountable. I ran `xfs_repair`, and now every file/directory is in "lost+found". I still have the original disk4 and it mounts fine using Unassigned Devices. What is the best way to proceed if my main concern is avoiding data loss and corrupted files? I could try copying the original disk4's contents to the new disk4 (rough sketch below), but I'm worried it had the write errors mentioned in my original post.

Longer summary:
     1. Replaced power cables and checked connections.
     2. Replaced both drives (parity1 and disk4) and attempted a Data Rebuild/Parity Sync.
     3. Cancelled due to 20k+ read errors and the same I/O power reset errors.
     4. Replaced the PSU.
     5. Before restoring the drives, the button and text looked different from step 2 and seemed to indicate only a Parity Sync (no mention of Data Rebuild).
     6. Decided to split up the restore to make sure disk4 was rebuilt.
     7. Rebuilt disk4 in maintenance mode. Success: no errors, no indication of unmountable.
     8. Synced parity1 in maintenance mode. Success, no errors.
     9. Started the array normally, but disk4 was unmountable.
     10. Ran `xfs_repair` after doing some research. Every file/directory on disk4 is now in "lost+found".
     11. Mounted the original disk4 with Unassigned Devices without issue.
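For reference, this is roughly what I had in mind for the copy, assuming the original disk4 is mounted by Unassigned Devices at /mnt/disks/old_disk4 and the rebuilt disk is /mnt/disk4 (both paths are placeholders for my setup):

```
# Dry run first: -n lists what would be copied without writing anything
rsync -avXn /mnt/disks/old_disk4/ /mnt/disk4/

# Actual copy, preserving permissions/ownership/xattrs and showing progress
rsync -avX --progress /mnt/disks/old_disk4/ /mnt/disk4/
```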
  2. This is the second time I've had 2 drives disabled at once. Both times occurred during my monthly scheduled Parity Check, and both involved a Parity drive. This post is about the first time. Is my data already corrupt since there were write errors or were those ultimately corrected? Should I use 2 drives I have on hand and keep the old ones in case there is a 3rd disk failure during rebuild? Any ideas why I keep getting 2 drives disabled at once? obelisk-diagnostics-20230602-0443.zip
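One thing I'm considering in the meantime is a read-only xfs_repair pass with the array started in maintenance mode. It won't catch silent corruption inside files, but it should at least show whether the filesystem structure is damaged. A minimal sketch, assuming the affected data disk's md device is /dev/mdX (the device number depends on the disk and the Unraid version):

```
# -n = no modify: report problems only, change nothing.
# Run against the md device with the array in maintenance mode.
xfs_repair -n /dev/mdX
```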
  3. I replaced the corrupt file and successfully rebuilt Parity 2. So far, it's running without issue. I'm still not sure why 2 drives were disabled at the same time, and then again with 2 different drives. A new cable connected to the HBA controller didn't fix it, but maybe I had too many drives attached to one cable from the power supply. I replaced that particular splitter and distributed the power more evenly. Additionally, I discovered my Parity 2 drive had Type 1 Protection enabled, which probably contributed to the read errors during rebuild (see the sketch below). I'm not sure whether that had any impact on 2 drives getting disabled simultaneously. Anyway, I'll try to update if the problem persists. Thanks @JorgeB for all your help!
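For anyone curious, this is roughly how the protection setting can be checked on a SAS drive, and how it could be cleared if needed (the device name is a placeholder and sg3_utils needs to be installed; I'm not certain this was actually the cause of my errors):

```
# Show whether protection information is enabled (prot_en) and its type
sg_readcap --long /dev/sdX

# Reformat without protection information. WARNING: this is a low-level
# format and erases everything on the drive.
sg_format --format --fmtpinfo=0 /dev/sdX
```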
  4. Yes. Then I used the syslog to get the sector numbers and confirmed there were 128 (roughly as sketched below). Is it okay to do the following, or do you suggest an alternative?
     1. Start the array normally.
     2. Delete the corrupt file.
     3. Copy over the intact file.
     4. Re-sync Parity 2 using the data drives.
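For the sector count, something along these lines is what I used, assuming the errors are still in /var/log/syslog (the exact message format can differ between Unraid versions, so the grep pattern may need adjusting):

```
# Pull the unique failing sector numbers out of the syslog and count them
grep -i "read error" /var/log/syslog | grep -oE "sector[ =][0-9]+" | sort -u > /tmp/bad_sectors.txt
wc -l /tmp/bad_sectors.txt
```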
  5. Does Unraid report ALL read errors that occurred during the rebuild? As in, can I rely on the 128 being the only sectors with read errors? If yes: before rebuilding, I used this comment to figure out which files fell within those 128 sectors on disk3 and disk7 and copied the files (from P9GHWUKW and P9GHWK6W). There was only one file affected (rough sketch of the mapping below).
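In case it's useful, this isn't necessarily the same method as the linked comment, but one way to map a sector to a file is with xfs_bmap. The target value is a placeholder: take the sector from syslog and subtract the partition start offset so it's relative to the start of the filesystem; xfs_bmap -v reports each file's extents in the same 512-byte units:

```
#!/bin/bash
# Placeholder: syslog sector minus the partition start offset
TARGET=123456789

# Scan files on disk3 for an extent whose BLOCK-RANGE contains the target
find /mnt/disk3 -type f -print0 | while IFS= read -r -d '' f; do
    if xfs_bmap -v "$f" | awk -v s="$TARGET" '
        NR > 2 {
            n = split($3, r, /[.:]+/)             # BLOCK-RANGE looks like "96..103"
            if (n >= 2 && s+0 >= r[1]+0 && s+0 <= r[2]+0) found = 1
        }
        END { exit !found }'
    then
        echo "$f"
    fi
done
```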
  6. Perhaps I'm confused. Whenever I rebuild, the Unraid UI has a "Sync" button, so I thought re-syncing parity and rebuilding were synonymous. Did you mean rebuild the parity drive using the data drives?
  7. I re-synced with P9GHWUKW and P9GHWK6W as you mentioned here.
  8. I did another rebuild with the disks you mentioned and there were 101 read errors. The problem sectors were the same. obelisk-diagnostics-20230323-1137.zip
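To double-check that the sectors really overlapped, I compared the lists from the two attempts roughly along these lines (the saved syslog paths are placeholders, and the grep pattern may need adjusting to the exact log format):

```
# Unique failing sectors from each rebuild's saved syslog
grep -oE "sector[ =][0-9]+" /boot/logs/syslog-rebuild1.txt | sort -u > /tmp/rebuild1.txt
grep -oE "sector[ =][0-9]+" /boot/logs/syslog-rebuild2.txt | sort -u > /tmp/rebuild2.txt

# Sectors that appear in both lists
comm -12 /tmp/rebuild1.txt /tmp/rebuild2.txt
```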
  9. If you don't mind, how are you able to tell disk3 was rebuilt successfully and disk 7 wasn't?
  10. Can any of the intact disks be used to correct the 128 errors during rebuild?
      • YHKZ3J6D - disk 3 - missing any writes that occurred between Rebuild 1 and Rebuild 2
      • P9GHWUKW - disk 3 - not missing writes between rebuilds, but it had write errors (?) before being disabled
      • P9GHWK6W - disk 7 - not missing writes between rebuilds, but it had write errors (?) before being disabled
  11. I ran an extended test on the drives I purchased recently and they both passed. Attached are the SMART reports if that's useful. I found this thread while researching the SMART error. Perhaps it's related? HUS724030ALS640_P9GHWK6W_35000cca0581ce404-20230320-1142.txt HUS724030ALS640_P9GHWUKW_35000cca0581ce810-20230320-1957.txt
  12. I have disk 3 (YHKZ3J6D) intact from before Rebuild 1. It was disabled for read errors but is potentially missing any data written to the array between Rebuild 1 and Rebuild 2. Disk 7 (YVKU27RK) is not intact. It was precleared, re-inserted for Rebuild 2 (which ultimately had errors), and is currently in the array. I have disk 3 (P9GHWUKW) and disk 7 (P9GHWK6W) intact from before Rebuild 2. These are the used drives I acquired recently. They were part of the successful Rebuild 1 and potentially had new data written to them during normal array operation. They were disabled after having read errors as well as what looks like write errors. I'll run the extended smart tests you mentioned and report back. Thanks!
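For reference, the commands I plan to use for the extended tests are roughly these (the device names are placeholders for the two drives):

```
# Start the extended/long self-test on each drive
smartctl -t long /dev/sdb
smartctl -t long /dev/sdc

# Later: check progress and, once finished, the full report including the self-test log
smartctl -a /dev/sdb
smartctl -a /dev/sdc
```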
  13. I had 2 drives disabled at the same time by Unraid due to read errors. I successfully rebuilt the array with used 3TB drives I purchased off eBay. Then, a day or two later, both drives were disabled by Unraid. This made me think the problem isn't necessarily the drives and might be a cable or controller. I ran successful SMART tests on the original disabled drives and successfully precleared them using a different port. After replacing the cable and checking all connections, I tried to rebuild once more. Unfortunately, it finished but with 128 read errors on Parity 2. Parity 2 is on the same controller/port combination as the problem drives, but using the new cable. I've attached 2 diagnostics:
      • Before 2nd Rebuild (with 2 disabled drives and read/write errors)
      • After 2nd Rebuild (with 128 read errors)
Any advice on how I should proceed? Thanks!
obelisk-diagnostics-20230315-1832.zip
obelisk-diagnostics-20230317-2310.zip
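In case anyone wants to follow the cabling, one quick way to see which controller and port each drive sits on is to list the persistent by-path links (the exact naming varies by HBA, so this is just how I'm reading it on my system):

```
# Each disk appears with the PCI address and port encoded in the link name,
# which shows which drives share a controller/port
ls -l /dev/disk/by-path/
```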
  14. I'm having an issue where the binhex-rtorrentvpn container fails to restart and will not respond to STOP or KILL. It even blocks stopping the Unraid array or restarting the server gracefully. This happens about every other month and I can't figure out why. I usually find out because the GUI is not responding. Here is the container log from the last time it happened:

```
2021-11-11 13:15:41,950 DEBG 'rutorrent-script' stderr output: 2021/11/11 13:15:41 [error] 4896#4896: *2477 upstream timed out (110: Unknown error) while reading response header from upstream, client: 10.10.20.127, server: localhost, request: "GET /php/getplugins.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:7777", host: "10.10.20.100:9181", referrer: "http://10.10.20.100:9181/"
2021-11-11 13:16:53,790 DEBG 'start-script' stdout output: [info] Successfully assigned and bound incoming port '30168'
2021-11-11 13:17:40,840 WARN received SIGTERM indicating exit request
2021-11-11 13:17:40,841 DEBG killing watchdog-script (pid 222) with signal SIGTERM
2021-11-11 13:17:40,841 INFO waiting for logrotate-script, rutorrent-script, shutdown-script, start-script, watchdog-script to die
2021-11-11 13:17:41,842 DEBG fd 31 closed, stopped monitoring <POutputDispatcher at 22489151549104 for <Subprocess at 22489151546752 with name watchdog-script in state STOPPING> (stdout)>
2021-11-11 13:17:41,842 DEBG fd 35 closed, stopped monitoring <POutputDispatcher at 22489151549152 for <Subprocess at 22489151546752 with name watchdog-script in state STOPPING> (stderr)>
2021-11-11 13:17:41,843 INFO stopped: watchdog-script (terminated by SIGTERM)
2021-11-11 13:17:41,843 DEBG received SIGCHLD indicating a child quit
2021-11-11 13:17:41,843 DEBG killing start-script (pid 218) with signal SIGTERM
2021-11-11 13:17:42,845 DEBG fd 26 closed, stopped monitoring <POutputDispatcher at 22489151548720 for <Subprocess at 22489151546560 with name start-script in state STOPPING> (stdout)>
2021-11-11 13:17:42,845 DEBG fd 30 closed, stopped monitoring <POutputDispatcher at 22489151548576 for <Subprocess at 22489151546560 with name start-script in state STOPPING> (stderr)>
2021-11-11 13:17:42,845 INFO stopped: start-script (terminated by SIGTERM)
2021-11-11 13:17:42,845 DEBG received SIGCHLD indicating a child quit
2021-11-11 13:17:42,846 DEBG killing shutdown-script (pid 217) with signal SIGTERM
2021-11-11 13:17:42,846 DEBG 'shutdown-script' stdout output: [info] Initialising shutdown of process(es) 'nginx: master process,^/usr/bin/rtorrent' ...
2021-11-11 13:17:42,910 DEBG fd 16 closed, stopped monitoring <POutputDispatcher at 22489151547808 for <Subprocess at 22489151546176 with name rutorrent-script in state RUNNING> (stdout)>
2021-11-11 13:17:42,910 DEBG fd 20 closed, stopped monitoring <POutputDispatcher at 22489151547856 for <Subprocess at 22489151546176 with name rutorrent-script in state RUNNING> (stderr)>
2021-11-11 13:17:42,910 INFO exited: rutorrent-script (exit status 0; expected)
2021-11-11 13:17:42,910 DEBG received SIGCHLD indicating a child quit
2021-11-11 13:17:43,912 INFO waiting for logrotate-script, shutdown-script to die
2021-11-11 13:17:46,915 INFO waiting for logrotate-script, shutdown-script to die
2021-11-11 13:17:49,918 INFO waiting for logrotate-script, shutdown-script to die
```
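In case it helps with diagnosis, these are the kinds of commands I try when it hangs (the container name below is just what my template uses):

```
# What state does Docker think the container is in?
docker ps -a --filter name=binhex-rtorrentvpn

# Get the container's init PID and check whether it's stuck in
# uninterruptible sleep (state "D"), which would explain it ignoring SIGKILL
PID=$(docker inspect -f '{{.State.Pid}}' binhex-rtorrentvpn)
grep State /proc/$PID/status
ps -o pid,stat,wchan,cmd -p "$PID"
```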
  15. What's a simple list of what counts as a different filesystem (e.g. partition, volume, SMB share)? I think I'm confused because I would've guessed that a single drive with multiple partitions, all formatted with the same filesystem like 'xfs', would be able to use hard links across partitions. Thanks for your help!
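For anyone else who lands here, the behaviour is easy to see on the command line. A minimal example, assuming two separately mounted partitions at /mnt/part1 and /mnt/part2 (placeholder paths), even if both are formatted xfs:

```
touch /mnt/part1/file.txt

# Hard links cannot cross mount points; each mounted partition is its own
# filesystem, so this fails:
ln /mnt/part1/file.txt /mnt/part2/link.txt
# ln: failed to create hard link ...: Invalid cross-device link
```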
  16. This seems to have fixed it. Server stayed connected through the night. Thanks, Squid!
  17. After upgrading to Unraid 6.9.0, my server disconnects from the network during the night and I need to reboot it to access it. My server is tucked away, so I set up the syslog server to capture what happens leading up to it. Just in case it's relevant, I also set up ZNC irc bouncer the same night before problems began. Thanks in advance! syslog-127.0.0.1.log
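If it's useful, this is roughly how I've been scanning the captured syslog for the moment the NIC drops (the interface name is a guess for my hardware):

```
# Look for link state changes and NIC driver messages around the disconnect
grep -iE "eth0|link is (up|down)" syslog-127.0.0.1.log | tail -n 50
```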
  18. Thanks for the reply. I went ahead and applied the fix and things seem fine. For this path, did you mean '/boot/config/plugins..'? That's where I found it. Thanks again!
  19. I've installed multiple dockers via Community Applications and Docker Hub search. I keep getting warnings about them from the "Fix Common Problems" plugin. I've always used Community Applications to install, so I'm not entirely sure how the template system works. I've been ignoring the warnings because I want to use the Docker Hub version with the configuration and variables I set. Unfortunately, I keep getting new warnings (I'm assuming because the dockers have updated). I can press "Apply Fix", but I don't know how that will change my current setup. How do I stop these warnings while continuing to use the Docker Hub version with my config/variables?