tazman Posted September 25, 2016

Hi, I am having massive problems with 6.2. See my boilerplate for my config.

After the upgrade from 6.1.9 everything seemed to be running well, and I added a second parity drive, which was built without problems. Then I noticed that the system became unresponsive on shares and then on the webGUI. Eventually I had to take it down with poweroff, as nothing else worked. Note that, out of habit, I tried powerdown first; the powerdown script from 6.1.9 was still installed.

After a restart everything looked normal in the log at first, but the array could not be started. It would sit on "Mounting disks...", and then messages about xfs errors, tainted mounts, and eventually kernel traces and kernel panics appeared, culminating with (not in the attached log):

Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: disabled

I reverted to 6.1.9. A memtest didn't show any errors. I was able to start the array, and no errors occurred.

Back on 6.1.9, xfs_repair -v (and -vn) reported the following error on ALL of my 14 data drives:

bad primary superblock - bad magic number !!!

I interrupted "attempting to find secondary superblock..." on a 3TB drive after it had been running without any results for 2 hours. The drive sdc1, which shows up with the "bad dir block magic!" error message in the log file, is the second parity drive (at least after I downgraded, which may have changed the drive numbering).

I am at a loss now about what to do, as I don't want to do something that makes things worse or even unrecoverable. I welcome your advice on what to do next.

Thanks, Thomas

syslog.txt
JorgeB Posted September 25, 2016

You should run xfs_repair with v6.2, as it contains a much newer version. You need to start the array in maintenance mode, and you should also use the md* device, e.g., for disk1: xfs_repair -v /dev/md1
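A hedged sketch of the per-disk sequence (assumes disk1 and that the array was started in maintenance mode first; the leading "echo" makes this a dry run that only prints the commands, so remove it to actually execute them):

```shell
# Dry-run sketch for one disk; adjust the number for other disks.
disk=1
echo xfs_repair -n "/dev/md$disk"   # -n = no-modify check; review output first
echo xfs_repair -v "/dev/md$disk"   # actual repair, if the check finds problems
```

Running the read-only check first is the safer order, since xfs_repair can make irreversible changes to a damaged file system.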
tazman Posted September 25, 2016 Author

Thanks. How do I find out which md* devices I have and which sd* devices they correspond to? unRAID lists the disks as sd*.
JorgeB Posted September 25, 2016

md1 is disk1, md2 is disk2, and so on.
tazman Posted September 25, 2016 Author

Thanks johnnie.black. I rebooted into 6.2 and ran xfs_repair against all 14 md devices. When I use sd* I still get the same error message as before, but I assume that was wrong in the first place and that those devices should have been sd*1, which produces the same result as md*. None of the checks returned any errors, and I am really wondering what was wrong in the first place. I will continue monitoring.
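Checking all 14 data disks by hand is tedious; a loop over md1..md14 (the numbering used in this thread) can generate the commands. A dry-run sketch that only prints each command (remove the "echo" to execute):

```shell
# Print a no-modify xfs_repair check for each of the 14 data disks.
# Assumes the array is started in maintenance mode before running.
for n in $(seq 1 14); do
  echo xfs_repair -n "/dev/md$n"
done
```

Reviewing the printed list before removing "echo" avoids accidentally repairing the wrong device.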
JorgeB Posted September 25, 2016

You have to use md* to maintain parity.
tazman Posted September 25, 2016 Author

Ok, good to know. I read that maintenance mode is needed for that as well. Anyhow, I have been wondering for a while: if you suspect that your device is faulty and you run corrections on it, doesn't "maintaining parity" make the parity data faulty as well? If something is really wrong with a disk, wouldn't it make more sense not to touch the parity disk at all, assuming that it still holds better/more accurate data that could be used to rebuild the faulty disk from scratch if needed?
itimpi Posted September 26, 2016

> Anyhow, I have been wondering for a while: if you suspect that your device is faulty and you run corrections on it, doesn't "maintaining parity" make the parity data faulty as well?

If the disk already corresponds to parity, then you will want it to keep on doing so.

> If something is really wrong with a disk, wouldn't it make more sense not to touch the parity disk at all, assuming that it still holds better/more accurate data that could be used to rebuild the faulty disk from scratch if needed?

This is true if a disk has really failed at the physical level, but in that case you will not be able to run a file system repair on the physical disk in the first place. However, if it is unmountable due to file system corruption, then that corruption is almost certainly already reflected in parity. That is why, when you rebuild a disk marked as unmountable, it is still shown as unmountable at the end of the rebuild process and still needs a repair tool to fix the file-system-level corruption. By running in maintenance mode when doing the repair, you ensure that any changes made to the physical disk are reflected in parity.
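The point about keeping parity in step can be illustrated with single-parity XOR arithmetic: parity is the XOR of all data disks, so any change a repair makes to a data disk must also be XORed into parity, or parity can no longer reconstruct the other disks. A toy sketch with made-up byte values (not real unRAID internals):

```shell
# Toy single-parity example: three hypothetical data-disk bytes.
d1=170; d2=85; d3=60
parity=$(( d1 ^ d2 ^ d3 ))       # parity = XOR of all data bytes

# A repair changes d1. A maintenance-mode repair via /dev/md1 keeps
# parity in step: XOR out the old d1 value, XOR in the new one.
d1=99
parity=$(( 170 ^ parity ^ d1 ))

# d1 can now still be reconstructed from parity and the other disks:
rebuilt=$(( parity ^ d2 ^ d3 ))
echo "$rebuilt"                  # prints the new d1 value: 99
```

Had the repair written to /dev/sdX1 directly, parity would still hold the old value and the reconstruction above would return stale data.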