JorgeB, posted August 29, 2016: If it's not getting a lease, something is wrong with your network. You can rename network.cfg back, but I doubt it will make any difference.
Ziggy (author), posted August 29, 2016: Quoting JorgeB: "If it's not getting a lease something is wrong with your network, you can rename network.cfg back but doubt it will make any difference." Yeah, I'm starting to think you're right. I just got a DHCP lease, but pinging a Google DNS server returns about 73% packet loss, and I still can't load the web interface. I'm not sure why I'm not experiencing any issues on the other machine... Maybe the cable is dying... I'll move the hardware to my router and hook it up there to see what happens.
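A loss figure like the 73% mentioned above can be read out of ping's summary line. The sketch below is only an illustration: it assumes the common iputils-style summary format, and the function name `packet_loss_percent` is hypothetical, not something from this thread.

```python
import re
from typing import Optional

# Assumption: the summary line follows the common iputils format, e.g.
# "15 packets transmitted, 4 received, 73% packet loss, time 14021ms".
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return packet loss in percent, or None if no summary line is found."""
    match = SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    # Loss is the share of transmitted packets that never came back.
    return 100.0 * (sent - received) / sent
```

Feeding it the captured output of a run such as `ping -c 15 8.8.8.8`, once against the router and once against an outside host, would show whether the loss already appears on the first hop.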
Ziggy (author), posted August 29, 2016: Nope. It is definitely NOT an issue with my network. I moved the hardware and hooked it up straight to my router after testing the router and a new Cat5 cable with another device. It is possible that the PCI NIC got damaged; it's been gathering dust for a couple of years now... I don't have another NIC to test. EDIT: I'll go out to buy a new NIC, looks like I can find an el-cheapo one for 10 bucks.
Ziggy (author), posted August 29, 2016: Replacing the NIC did not resolve the issue. I'm getting an IP lease, but there's loss when pinging other devices. I am ready to start pulling my hair out at this point. I have NO idea what to try next. EDIT: Putting the other motherboard in resolves the network issue. I am 100% positive it is not a physical issue.
Ziggy (author), posted August 29, 2016: Sorry for the quadruple post... So I slid my old motherboard back in and am having severe issues mounting my disks now... Multiple disks are unmountable, including my two RAID 0 SSD cache drives... https://gyazo.com/bfc06adeaa77e8a7dc2bc71b13a0aa2f I'm assuming one of the filesystems became 'corrupt' because of all the hard resets I had to do to address the issues discussed above. I tried running xfs_repair on /dev/sde but it's been stuck at phase 1 for over an hour now. Output so far: "Phase 1 - find and verify superblock... bad primary superblock - bad magic number !!! attempting to find secondary superblock..." Attachment: ziggy_unraid-diagnostics-20160829-2141.zip
JorgeB, posted August 29, 2016: Quoting Ziggy: "...it's been stuck at phase 1 for over an hour now..." This can take several hours, up to the same time as a full scan of the disk.
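Before committing to a repair run that may take hours, xfs_repair can be run in read-only mode first. The following is a minimal sketch, not a procedure taken from this thread: `-n` (no-modify) and `-v` (verbose) are standard xfs_repair options, while the helper names and the device `/dev/md2` are hypothetical. It also assumes the common Unraid practice of checking array disks through the md device with the array started in maintenance mode, rather than through the raw /dev/sdX device.

```python
import subprocess
from typing import List


def xfs_check_command(device: str) -> List[str]:
    """Build a read-only xfs_repair call: -n = no-modify mode, -v = verbose."""
    return ["xfs_repair", "-n", "-v", device]


def run_xfs_check(device: str) -> int:
    """Run the read-only check (needs root) and return its exit status."""
    # check=False: a non-zero status is reported to the caller, not raised.
    return subprocess.run(xfs_check_command(device), check=False).returncode
```

For example, `run_xfs_check("/dev/md2")` would report problems without writing anything to the disk; an actual repair is the same call without `-n`.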
Ziggy (author), posted August 29, 2016: Quoting JorgeB: "This can take several hours, up to the same time as a full scan of the disk." Thank you for putting up with me! Alright, I'll let it run overnight. What about my two cache drives that are unmountable? They have BTRFS, how do I scan/repair those?
JorgeB, posted August 29, 2016: Quoting Ziggy: "Alright, I'll let it run overnight. What about my two cache drives that are unmountable? They have BTRFS, how do I scan/repair those?" Try one at a time to see if it mounts. The other has to be physically disconnected from the server; unassigning it is not enough.
Ziggy (author), posted August 29, 2016: Quoting JorgeB: "Try one at a time to see if it mounts, the other has to be physically disconnected from the server, not enough to unassign it." Nope, they're not mounting separately either (I tried disconnecting each of the SATA cables and then mounting the drives, no joy). For your information, I did put them in a RAID0.
JorgeB, posted August 29, 2016: Quoting Ziggy: "I did put them in a RAID0." Sorry, didn't notice that. The btrfs recovery tools are not the best, good luck: https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_BTRFS
Ziggy (author), posted August 30, 2016: Quoting JorgeB: "Sorry, didn't notice that, btrfs recovery tools are not the best, good luck: https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_BTRFS" Thanks a bunch. You weren't kidding about the btrfs tools, it took me many hours to recover the contents... I'm still stuck with disk two though. xfs_repair did manage to make it mountable again, but it's still spitting out errors in the CLI. On top of that, xfs_repair keeps crashing: https://gyazo.com/45901f8134be18ad90de7317af30b962 Attachment: ziggy_unraid-diagnostics-20160830-1523.zip
Ziggy (author), posted September 4, 2016: Bump, still experiencing this issue. All my user shares disappeared and I'm getting this error when executing 'ls' in /mnt/user/: https://gyazo.com/2bd24cb0091d09ef165932e6f6cf74ac
JorgeB, posted September 4, 2016: If xfs_repair keeps crashing you can try two things: upgrade to v6.2 (which includes a newer xfs_repair), or move all data to other disk(s) and format that disk.
Ziggy (author), posted September 4, 2016: Quoting JorgeB: "If xfs_repair keeps crashing you can try 2 things, upgrade to v6.2 (includes newer xfs_repair) or move all data to other disk(s) and format that disk." Hmm, upgrading to v6.2-rc4 did not resolve the issue. How would you recommend I go about moving the data to a new disk? Can I mv everything or should I use specialized tools?
JorgeB, posted September 5, 2016: You can use any tool you want; MC (Midnight Commander) is a popular one. Just make sure you always move from disk share to disk share, never from disk share to user share.
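The disk-share-to-disk-share rule above can be expressed as a small guard that is run before any move. This is a sketch under stated assumptions: it takes Unraid's usual mount points for granted (/mnt/diskN and /mnt/cache for individual disks, /mnt/user for user shares), and the names `is_disk_share` and `check_move` are hypothetical.

```python
import re

# Assumption: individual disks are mounted at /mnt/disk<N> or /mnt/cache,
# while user shares (the merged view) live under /mnt/user.
DISK_SHARE = re.compile(r"^/mnt/(disk\d+|cache)(/|$)")


def is_disk_share(path: str) -> bool:
    """Return True if the path lies on a single-disk share."""
    return bool(DISK_SHARE.match(path))


def check_move(src: str, dst: str) -> None:
    """Raise ValueError unless both ends of the move are disk shares."""
    for label, path in (("source", src), ("destination", dst)):
        if not is_disk_share(path):
            raise ValueError(f"{label} {path!r} is not a disk share")
```

With this in place, `check_move("/mnt/disk2/Movies", "/mnt/disk3/Movies")` passes silently, while a destination under /mnt/user raises before any data is touched. The check is purely textual, so paths containing `..` or symlinks are not resolved.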
Ziggy (author), posted September 9, 2016: I'm puzzled. I noticed that some of the disks in the array only went into error mode during a parity check. So I wiped the parity and rebuilt it, and after 48+ hours, the issues seem to have gone away... Is it possible that a corrupt parity caused the parity check algorithm to crash?
JorgeB, posted September 9, 2016: Quoting Ziggy: "Is it possible that a corrupt parity caused the parity check algorithm to crash?" Not really. Did you grab the diagnostics when that happened?
Ziggy (author), posted September 9, 2016: Quoting JorgeB: "Not really, did you grab the diagnostics when that happened?" Same behavior as originally described in this post: PCIe disks going offline during a parity check. Formatting the parity disk **seems** to have resolved the issue... Exact same hardware, only an upgrade to Unraid 6.2.
trurl, posted September 10, 2016: Quoting Ziggy: "Same behavior as originally described in this post. PCIe disks going offline during a parity check, formatting the parity disk **seems** to have resolved the issue... Exact same hardware, only an upgrade to Unraid 6.2" Haven't been following the thread, but formatting the parity disk is completely pointless since it doesn't have a filesystem. Perhaps you meant something other than actually formatting. Format means "write an empty filesystem to this disk". That is what it has always meant on every operating system you have ever used.
Ziggy (author), posted September 10, 2016: Quoting trurl: "Haven't been following the thread, but formatting the parity disk is completely pointless since it doesn't have a filesystem. Perhaps you meant something other than actually formatting. Format means "write an empty filesystem to this disk". That is what it has always meant on every operating system you have ever used." Okay, I simply reconstructed the disk by starting the array without the parity disk and then re-adding it. I was under the impression that this would format the drive, since I did not realize the parity disk does not have a filesystem. So I guess it just overwrote the disk. In any case, my box is still up and running after 3 days. I really thought it was a hardware issue and am bummed we'll probably never know what it was.
JorgeB, posted September 10, 2016: My guess is that whatever was causing your issues was fixed (or alleviated) by the new kernel and drivers in v6.2.
Ziggy (author), posted September 12, 2016: **sigh** Happened again, after 5 days of uninterrupted usage, coincidentally in the time-frame when the automated parity check was supposed to run. Seems to be the drives attached to that SATA controller again (see logs). I'm going to replace the board once more.
Ziggy (author), posted September 15, 2016: Replaced the motherboard with the exact same model and ran two MANUAL parity checks: no problems. Ran a scheduled parity check: problem (see logs). I have been able to reproduce this three times now; it seems like only the scheduled parity check triggers this behaviour... Any more advice? Attachment: ziggy_unraid-diagnostics-20160915-1843.zip
JorgeB, posted September 15, 2016: There's no difference between manual and scheduled parity checks. Errors continue to be on the 4 disks on the Marvell controller, so I would get a different controller. If a 4-port controller is enough, you can get an Adaptec 1430SA for around $20 on eBay.
Ziggy (author), posted September 15, 2016: Quoting JorgeB: "There's no difference between manual and scheduled parity checks. Errors continue to be on the 4 disks on the Marvell controller, I would get a different controller, if a 4 port controller is enough you can get an adaptec 1430sa for like 20$ on ebay." I'm just not inclined to believe that after 3 attempts, having randomly run between 1 and 3 manual parity checks before setting up a scheduled check, this is just a coincidence. First time, sure. Second time, eh, why not? But three times in a row, being able to reproduce it like that...? I don't know what to tell you. It could also be that it has something to do with mover, which is scheduled to run every two hours.