JorgeB Posted October 23, 2018 Diags are showing some ATA errors on the parity disk, likely a connection issue.
sawdustfarmer Posted October 23, 2018 Author My server is an HP Gen8 MicroServer. If it is a faulty SATA cable, then I have to replace all of them. The cable kit looks like this: https://www.serverworlds.com/hp-724493-001-microserver-g1610t-gen8-3-5-cable-kit-718089-001/
JorgeB Posted October 23, 2018 Change that disk to a different slot and see if the problem follows the disk.
sawdustfarmer Posted October 25, 2018 Author I did a preclear on my new 8TB, so I wasn't able to swap the drive slot until tonight (the preclear came back fine from the same bay Disk1 was in). I've swapped the parity to the bay that Disk1 was in, as it was spare. I started the array and have attached the diags. While I had the server off, I also followed these instructions to try to get rid of the ACPI error message: https://forums.unraid.net/topic/59375-hp-proliant-workstation-unraid-information-thread/ I added "rmmod acpi_power_meter" to "/boot/config/go" on the USB boot drive. tower-diagnostics-20181025-2214.zip
JorgeB Posted October 25, 2018 ATA errors followed the disk.
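For anyone who wants to run the same "does it follow the disk" comparison on their own diagnostics, here is a minimal Python sketch that tallies suspicious ATA lines per port in a syslog. The sample lines, the `ERROR_HINTS` list, and the function name `count_ata_errors` are assumptions for illustration, not taken from the diagnostics attached in this thread. The idea is to run it on the syslog from before and after the slot swap: if the errors move to whichever port the disk moved to, they followed the disk.

```python
import re
from collections import Counter

# Matches kernel lines such as "ata3.00: exception Emask ..." or
# "ata3: hard resetting link" and captures the port and the message.
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")

# Substrings treated as signs of trouble. This list is an assumption
# for the sketch and is not exhaustive.
ERROR_HINTS = ("exception Emask", "failed command", "hard resetting link", "SError")


def count_ata_errors(syslog_text):
    """Return a Counter mapping an ATA port (e.g. 'ata3') to its suspicious line count."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = ATA_LINE.search(line)
        if match and any(hint in match.group(2) for hint in ERROR_HINTS):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample only; real syslog lines will differ in detail.
    sample = "\n".join([
        "Oct 23 10:00:01 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
        "Oct 23 10:05:12 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "Oct 23 10:05:12 Tower kernel: ata3: hard resetting link",
    ])
    for port, hits in sorted(count_ata_errors(sample).items()):
        print(port, hits)
```

A count per port is only a rough signal; the full syslog still has to be read to see what kind of error each line reports.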
sawdustfarmer Posted October 25, 2018 Author What would you suggest I do now?
JorgeB Posted October 26, 2018 Use a different disk; it's either bad or has some compatibility issue with the server.
sawdustfarmer Posted October 26, 2018 Author At the moment I have disk1 removed because it was having errors. It’s being emulated by the parity, so I can’t take the parity drive out while disk1 is dead. Should I rebuild disk1 from the parity first, then replace the parity?
JorgeB Posted October 26, 2018 You can try that. The ATA errors will slow down the rebuild, but it should still finish successfully.
sawdustfarmer Posted October 26, 2018 Author Should I be concerned about this? I've just come back to the server after having the array online for 24 hours to errors on disk 2 and 3 now. Could something be wrong with the B120i controller found in the HP Gen8, causing all the errors? tower-diagnostics-20181026-2137.zip
JorgeB Posted October 26, 2018 The controller on those servers in AHCI mode is just a regular Intel SATA controller and works very well. The errors are similar to the previous ones; most likely there's a problem with the cable, board or power supply.
trurl Posted October 26, 2018 2 hours ago, sawdustfarmer said: "I've just come back to the server after having the array online for 24 hours to errors on disk 2 and 3 now." Do you mean you had to open the webUI before you knew you had this problem, or did you get a Notification by email or other agent? You really must have Notifications sent to you when the events occur.
sawdustfarmer Posted October 26, 2018 Author I’ll order a new SAS cable and power supply before I go any further; hopefully it’s not the motherboard. I don’t have email notifications set up, they just pop up in the web GUI. I’ll set up email notifications now, but there isn’t a lot I can do most of the time because I can’t access my server remotely. I do check the web GUI every time I’m on my PC, though.
trurl Posted October 26, 2018 If you get Notifications sent to you, then you will know if you need to look at the webUI. Many people have not looked until they knew something was wrong because they had multiple disk problems and Unraid couldn't emulate all their data anymore, which of course is too late for reliable recovery.
sawdustfarmer Posted November 17, 2018 Author So the parts I needed finally came in: I have a new power supply and a SAS cable with backplane. At the moment I have just replaced the power supply and removed an unnecessary Molex Y-splitter and Molex-to-SATA adapter (just in case they were the issue). Can you please check the attached diags for errors? I also have notifications to my phone set up through Pushbullet, so I shouldn't miss a thing. tower-diagnostics-20181117-1142.zip
sawdustfarmer Posted November 17, 2018 Author It also says "Unmountable disk present: Disk 1 • ()" and sees its partition size as 4TB; I physically removed that failing 4TB drive. Another thing to add: the parity doesn't seem to be emulating the missing files from the failing 4TB drive I removed.
JorgeB Posted November 17, 2018 Syslog is spammed with ACPI errors and cuts off, but try checking the filesystem on disk1.
sawdustfarmer Posted November 17, 2018 Author The ACPI error is to do with the HP Gen8 being on a UPS, I believe. The server is currently doing a read check and has about another 11 hours left. Disk1 is physically out of the system and has been for the last 2 weeks while the server was off. I just plugged Disk1 (the failing disk) in via USB and mounted it through Unassigned Devices, and I can see all the files are there through /mnt/disks in MC, so they're not lost. But shouldn't the files be emulated through parity anyway? There is no /mnt/disk1 in MC either.
JorgeB Posted November 17, 2018 The emulated disk isn't mounting, so no data from it will be available; a filesystem check should fix it.
sawdustfarmer Posted November 17, 2018 Author Sorry, I misunderstood you. You mean follow these instructions for BTRFS at this link? https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
JorgeB Posted November 17, 2018 If the data disk really is btrfs formatted, I would really need to see the syslog first, and likely your best option is going to be rebuilding parity with the old disk, assuming the old disk is OK.
sawdustfarmer Posted November 17, 2018 Author Would the diagnostics from my post on October 25 have the syslog details you need? The server has been turned off since then. Otherwise, the last time the failing Disk1 was in the system was in my post from October 22, and I copied all the files off it to an external USB drive.
JorgeB Posted November 17, 2018 I would need to see a syslog showing the mount attempt of the emulated disk, and resulting errors.
sawdustfarmer Posted November 17, 2018 Author I stopped the read check, turned off the server, and inserted Disk1 back into the system. Is it OK to start the array? Unraid thinks it's a new device.
JorgeB Posted November 17, 2018 No, don't do that, or it will rebuild on top of the old disk.