privateer Posted March 8

I was in the middle of a scheduled parity check when, 66% of the way through, it encountered 2112 errors on one of the array drives and paused. I ran a short SMART test, which the drive passed, then tried to run an extended test several times. I may have accidentally cancelled it once, but the next time I saw it reach 50%, and then it looked like it had never been run. Right now, if I click to run a SMART test, nothing happens and nothing is shown in the log. Additionally, the drive with the red X is also displayed under Unassigned Devices.

When I check the logs, I see loads of errors like these:

Mar 7 21:03:37 Tower kernel: traps: lsof[7196] general protection fault ip:14fe13e634ee sp:6690f0b91e1b046c error:0 in libc-2.36.so[14fe13e4b000+16b000]
Mar 7 21:12:23 Tower kernel: traps: lsof[29820] general protection fault ip:151d585114ee sp:86baf5ee546675bc error:0 in libc-2.36.so[151d584f9000+16b000]
Mar 7 21:24:35 Tower kernel: traps: lsof[29136] general protection fault ip:152f3c6314ee sp:152a85e9f914a6b9 error:0 in libc-2.36.so[152f3c619000+16b000]
Mar 7 21:41:46 Tower kernel: traps: lsof[3668] general protection fault ip:14fd3f0824ee sp:780278895a35ec7 error:0 in libc-2.36.so[14fd3f06a000+16b000]

Any thoughts on what might be causing any of this? My diagnostics are attached.

tower-diagnostics-20230307-2048.zip
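For anyone reading trap lines like the ones above, here is a minimal Python sketch that pulls out the faulting process, the library, and the offset of the instruction pointer inside that library (`ip` minus the mapping base in the square brackets). The function name `fault_offsets` and the regular expression are illustrative assumptions, not part of Unraid or any diagnostics tool, and the sample text is just the four lines quoted in the post.

```python
import re

# Sample lines copied from the syslog excerpt in the post above.
LOG = """\
Mar 7 21:03:37 Tower kernel: traps: lsof[7196] general protection fault ip:14fe13e634ee sp:6690f0b91e1b046c error:0 in libc-2.36.so[14fe13e4b000+16b000]
Mar 7 21:12:23 Tower kernel: traps: lsof[29820] general protection fault ip:151d585114ee sp:86baf5ee546675bc error:0 in libc-2.36.so[151d584f9000+16b000]
Mar 7 21:24:35 Tower kernel: traps: lsof[29136] general protection fault ip:152f3c6314ee sp:152a85e9f914a6b9 error:0 in libc-2.36.so[152f3c619000+16b000]
Mar 7 21:41:46 Tower kernel: traps: lsof[3668] general protection fault ip:14fd3f0824ee sp:780278895a35ec7 error:0 in libc-2.36.so[14fd3f06a000+16b000]
"""

# Hypothetical pattern written for the line layout shown above.
TRAP = re.compile(
    r"traps: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def fault_offsets(text):
    """Return (process, library, hex offset into library) per trap line."""
    results = []
    for m in TRAP.finditer(text):
        # Offset of the faulting instruction relative to the library mapping.
        offset = int(m["ip"], 16) - int(m["base"], 16)
        results.append((m["proc"], m["lib"], hex(offset)))
    return results


for row in fault_offsets(LOG):
    print(row)
# Each of the four sample lines prints ('lsof', 'libc-2.36.so', '0x184ee')
```

All four faults land at the same offset inside libc-2.36.so. That shows the crashes are consistent with each other, but it does not by itself say what is causing them.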
JorgeB Posted March 8

Disk4 dropped offline. It's not logged as a disk problem; it looks more like a power/connection problem.
privateer Posted March 8 (Author)

5 hours ago, JorgeB said: Disk4 dropped offline, it's not logged as a disk problem, looks more like a power/connection problem.

A few quick follow-up questions: Do I have to shut down the server to deal with this? If not, should I stop the parity check, take the array offline, check the connections, and restart the array? Or what are the next steps?
JorgeB Posted March 8 (Solution)

I would recommend at least checking the cables, both power and SATA, or ideally replacing them to rule them out.
privateer Posted March 8 (Author)

Unplugged the cables and plugged everything back in. It seems to work right now. I'm going to wait a bit before marking this solved, just to make sure it doesn't pop up as a problem again in a few days.
privateer Posted March 9 (Author)

9 hours ago, JorgeB said: I would recommend at least checking the cables, both power and SATA, or ideally replacing them to rule them out.

Drive rebuild is underway, no issues there. I went through my logs and found another instance of this error:

Mar 8 15:09:36 Tower kernel: traps: lsof[26299] general protection fault ip:14ec3865f4ee sp:9d0015fe713527f1 error:0 in libc-2.36.so[14ec38647000+16b000]

Is this unrelated? If so, I'm happy to discuss it separately.
JorgeB Posted March 9

Unrelated, and a single error is no cause for concern for now.
privateer Posted March 10 (Author)

On 3/8/2023 at 8:31 AM, JorgeB said: I would recommend at least checking the cables, both power and SATA, or ideally replacing them to rule them out.

The data rebuild completed. However, I received a yellow notification (warning) saying it had finished successfully, but with the description 'Canceled'. From the archived notices:

10-03-2023 02:21 Unraid Data-Rebuild Notice [TOWER] - Data-Rebuild finished (0 errors) Canceled warning

However, when I go into the system log, I see this:

Mar 10 02:20:47 Tower kernel: md: sync done. time=130479sec
Mar 10 02:20:47 Tower kernel: md: recovery thread: exit status: 0

Is the 'Canceled' notice an error, a bug, or something else? I'm not sure how to reconcile these.
JorgeB Posted March 10

Looks like a notification issue. Other users have reported notifications showing info from the previous action, but I've not been able to reproduce it so far. The log shows it completed without error, so you're fine.
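To cross-check a notification against the syslog as described above, a small Python sketch like the following can extract the two md lines quoted in the thread. The helper name `summarize_sync` is hypothetical, the patterns are written only for the two lines shown, and treating an exit status of 0 as a clean finish follows the reading given in the reply above rather than any documented guarantee.

```python
import re

# The two syslog lines quoted earlier in the thread.
SYSLOG = """\
Mar 10 02:20:47 Tower kernel: md: sync done. time=130479sec
Mar 10 02:20:47 Tower kernel: md: recovery thread: exit status: 0
"""


def summarize_sync(text):
    """Return the sync duration and recovery exit status, or None if absent."""
    done = re.search(r"md: sync done\. time=(\d+)sec", text)
    status = re.search(r"md: recovery thread: exit status: (-?\d+)", text)
    if not done or not status:
        return None
    # Convert the raw second count into hours, minutes, and seconds.
    hours, rem = divmod(int(done.group(1)), 3600)
    minutes, secs = divmod(rem, 60)
    return {
        "duration": f"{hours}h {minutes}m {secs}s",
        "exit_status": int(status.group(1)),
    }


print(summarize_sync(SYSLOG))
# {'duration': '36h 14m 39s', 'exit_status': 0}
```

Here the rebuild ran for about 36 hours and the recovery thread exited with status 0, which matches the "0 errors" part of the notification rather than the 'Canceled' description.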