HidYn Posted September 1, 2019 (edited)
Hi Guys, Came home from a camping trip today to a horror show: I had disk errors and a disabled disk. Following a few guides from the forums, I'm trying to rebuild the disk that got disabled, but I'm getting read errors from my parity. All disks seem to pass S.M.A.R.T. fine. I do a parity check on the 1st of the month, and judging by the e-mail I was sent, it completed fine this morning. Should also mention I'm using an HP Micro Gen8 and seem to have the same issue as an earlier poster, with 2 disks appearing in unassigned devices. I didn't use my brain and take diagnostics before the initial reboot and the panic to try and fix the issue. Stupid of me, I know, but I've never had any issues for the last 2 years. I've attached diagnostics of the current state of play in the hope that someone can help. Any help would be really appreciated. You live and learn. Thanks in advance.
alpha-diagnostics-20190901-1828.zip
Edited September 1, 2019 by HidYn
JorgeB Posted September 2, 2019
There are no read errors in the diags posted; the rebuild was canceled almost immediately after array start.
HidYn Posted September 2, 2019 (Author)
4 minutes ago, johnnie.black said: There are no read errors on the diags posted, rebuild was canceled almost right after array start.
I stopped the rebuild when the reads/writes shot up to over 22 million and the read errors went to over 82k according to the GUI. I'll start a rebuild again and take a screenshot and another set of diagnostics. Thanks for the reply.
JorgeB Posted September 2, 2019
I see, the problem is that the log is being flooded with these:
Sep 1 16:53:20 alpha kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20180810/psparse-516)
Sep 1 16:53:20 alpha kernel: ACPI Error: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20180810/power_meter-338)
Sep 1 16:53:21 alpha kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20180810/exfield-393)
So the log didn't catch the start of the problem. I believe there's a way to stop that; you might want to google it.
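Until the ACPI messages are silenced at the source, one way to read a flooded log is to filter them out so any other entries stand out. The following is only a minimal sketch: the function name `drop_acpi_noise` is hypothetical, the match string is taken from the three lines quoted above, and the last sample line is an invented placeholder, not something from the posted diagnostics.

```python
import re

# Matches the flooding messages quoted above ("kernel: ACPI Error: ...").
ACPI_NOISE = re.compile(r"kernel: ACPI Error:")


def drop_acpi_noise(lines):
    """Return only the log lines that are not ACPI error noise."""
    return [line for line in lines if not ACPI_NOISE.search(line)]


sample = [
    r"Sep 1 16:53:20 alpha kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20180810/psparse-516)",
    "Sep 1 16:53:20 alpha kernel: ACPI Error: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20180810/power_meter-338)",
    "Sep 1 16:53:21 alpha kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20180810/exfield-393)",
    # Hypothetical placeholder line, NOT taken from the real diagnostics.
    "Sep 1 16:53:22 alpha kernel: placeholder for any non-ACPI message",
]

remaining = drop_acpi_noise(sample)
for line in remaining:
    print(line)
```

This only helps with reading a log that still contains the other entries. It can't recover messages that were already rotated out by the flood, so stopping the ACPI errors themselves is still the real fix.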
HidYn Posted September 2, 2019 (Author)
I ran it again and have attached a screenshot. 2 drives have appeared as unassigned devices. I'll look at removing the message that floods the log when I get in from work; thanks for the advice. Excuse the screenshot from an iPad, it's the best I can do whilst at work.
HidYn Posted September 2, 2019 (Author)
It’s just finished the rebuild and the drive is still listed as unmountable, and now I have 4 unassigned devices. No idea what’s going on.
JorgeB Posted September 2, 2019
No point in rebuilding with errors on other disk(s); this is likely a controller/cable/power problem.
JorgeB Posted September 2, 2019
BTW, the fix for the ACPI error is in this thread, under ACPI: https://forums.unraid.net/topic/59375-hp-proliant-workstation-unraid-information-thread/
HidYn Posted September 2, 2019 (Author)
3 hours ago, johnnie.black said: No point in rebuilding with errors on another disk(s), likely a controller/cable/power problem.
You might be on to something. I disabled the HPE Smart Array and am now rebuilding the disk again. Its estimate looks much more promising: 2 days to rebuild 8TB. Fingers crossed this sorts it, and thanks again for the help.
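As a rough sanity check on that estimate, 8TB in 2 days works out to an average of about 46 MB/s. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant rate over the whole rebuild, neither of which is stated in the thread, and the function name is hypothetical.

```python
def average_rate_mb_per_s(capacity_tb, days):
    """Average throughput implied by moving capacity_tb in the given days.

    Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
    and a constant rate for the whole rebuild.
    """
    total_bytes = capacity_tb * 10**12
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds / 10**6


rate = average_rate_mb_per_s(8, 2)
print(f"{rate:.1f} MB/s")  # prints "46.3 MB/s"
```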
itimpi Posted September 2, 2019
Note that if a disk is shown as unmountable before starting the rebuild, it will still have that status on completing the rebuild. The only way to clear an unmountable status (besides wiping the disk contents) is to run a file system repair. This can be done either on the emulated drive or on the rebuilt drive.
HidYn Posted September 6, 2019 (Author)
Still having issues. I left the array down and left it running some SMART extended tests. After a day of running, all the disks vanished and went to unassigned devices. I've attached the logs. Can you help me pin down exactly what I need to replace? Thanks again.
alpha-diagnostics-20190906-1740.zip
JorgeB Posted September 6, 2019
There are errors on all disks. It could be the miniSAS cable, the board (i.e., the SATA controller) or the PSU. I would start by replacing the miniSAS cable (or checking that it's correctly connected on the motherboard).
HidYn Posted September 6, 2019 (Author)
5 minutes ago, johnnie.black said: There are errors on all disks, could be the miniSAS cable, the board (i.e., the SATA controller) or the PSU, I would start with replacing the miniSAS cable (or checking it's correctly connected on the motherboard).
Thanks, I'll get on this now. Really do appreciate the help.