oryhara Posted June 17, 2015
Updated: Second crash in as many days. Exact text of kernel panic:

Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xfffffffff81000000 (relocation range: 0xfffffffff80000000-0xffffffffff9fffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Transcribed from a picture on my cell phone, so I might have the wrong number of f's and 0's.

Woke up this morning to a stopped array with an unclean shutdown detected. I had left a terminal open tailing the syslog, but all it showed was disks spinning down around 3 AM. How can I find out what caused my raid to shut down uncleanly? It's on a UPS and connected, so if the power went out it should have shut down cleanly. It had been crashing nightly like this before the upgrade to 6.0, but I had hoped the upgrade would fix it.
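On the worry about the number of f's and 0's: addresses on a 64-bit x86 kernel are 16 hex digits long after the 0x prefix, so the transcription can be sanity-checked by counting digits. A small sketch in Python; the helper name digit_report is made up for illustration, and the strings are copied from the transcription above as-is:

```python
# Hypothetical helper: count the hex digits in each transcribed address.
# A 64-bit address has exactly 16 hex digits after the "0x" prefix.
EXPECTED_DIGITS = 16

def digit_report(addresses):
    """Return {address: number_of_hex_digits} for strings like '0xffff...'."""
    report = {}
    for addr in addresses:
        digits = addr[2:] if addr.lower().startswith("0x") else addr
        int(digits, 16)  # raises ValueError if a non-hex character slipped in
        report[addr] = len(digits)
    return report

# Values as transcribed from the phone picture (possibly mistyped).
transcribed = [
    "0xfffffffff81000000",
    "0xfffffffff80000000",
    "0xffffffffff9fffffff",
]

for addr, count in digit_report(transcribed).items():
    verdict = "ok" if count == EXPECTED_DIGITS else "check transcription"
    print(f"{addr}: {count} hex digits ({verdict})")
```

Any value that does not come out at 16 digits was most likely mistyped while copying from the photo, which does not change the meaning of the panic message itself.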
enetec Posted June 17, 2015
To me this looks like an unexpected self-reset caused by failing or misbehaving hardware... I would start with a thorough (read: long...) Memtest and a double-check of the fans and the CPU operating temperatures... Bad BIOS settings (e.g. overclocking, etc.) could also cause this (revert any changes, if there are any).
oryhara Posted June 17, 2015
I did a long memtest before the 6.0 upgrade. There is no overclocking on my system. The only change to the stock BIOS settings was to boot from a USB stick instead of a hard drive. The crash happened sometime between 0314 and 0600. Could the mover script, which runs at 0340, have caused this?
trurl Posted June 17, 2015
oryhara said: "I did a long memtest before the 6.0 upgrade. There is no overclocking on my system. Only change to stock BIOS was to boot from USB stick instead of a hard drive. It happened sometime between 0314 and 0600. Could the mover script run at 0340 have caused this?"

Mover doesn't do anything unusual. It just moves files from cache to other disk(s). Sounds like a hardware issue. What is the exact model of your power supply?
oryhara Posted June 17, 2015
100% sure it's a Silencer Mk II 950W, Mfr. PN PPCMK2S950, bought in 2011. I checked.
oryhara Posted June 18, 2015
Updated: Second crash in as many days. Exact text of kernel panic:

Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xfffffffff81000000 (relocation range: 0xfffffffff80000000-0xffffffffff9fffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Transcribed from a picture on my cell phone, so I might have the wrong number of f's and 0's.

Woke up this morning to another crash, this time with a kernel panic showing on the monitor attached to the raid. Kernel panic - not syncing: Fatal exception in interrupt.
RobJ Posted June 19, 2015
I suspect you may have a SAS card or other disk controller installed? Try making sure that you have the latest firmware for each disk controller, and that your BIOS is up to date. You may also want to pin down exactly when these panics occur, to see if they correlate with the mover or anything else.

If nothing else works, then you may want to swap in another disk controller if that's doable, just to see if a specific card is at fault. With an extra controller, you can move drives around and see if the behavior changes with any combination.

You say you have pictures; are there common code addresses listed between the different panics, and what are they? You can attach pictures here if you want.
oryhara Posted June 19, 2015
Here is a picture of the kernel panic. I have a PCI port multiplier SATA card, and 3 internal bridgeboards giving me 5 SATA ports each. My raid isn't full yet, but I designed it for up to 20 drives: 4 port multipliers with 5 drives each, and 4x 5-in-3 icydock sleds in a case with 12x 5 1/4" bays. Would it be a good idea to trade all of that for the X10SL7 motherboard and use SAS-to-SATA cables so that I can mount 24 SATA drives, and put it all in a NORCO RPC-4224?
oryhara Posted June 19, 2015
Oh, and this didn't happen when I disabled the cache drive, which I did after the second kernel panic while rebuilding the failed drive in 5.0RC8, before I upgraded to 6.0.
RobJ Posted June 19, 2015
You can never draw any conclusions or even suspicions from a single data point, so all I can say is that this particular panic occurred during write-related I/O to a ReiserFS formatted disk, specifically during file management, possibly a file deletion.

Now what we need is more data points, more panic pics, even if mental pics. Would you say that the attached picture is (1) exactly the same as all other panics, (2) very similar to the others (all Reiser disk write I/O), (3) similar (all have disk I/O), or (4) not at all alike? And roughly how many pics do you have, mental or camera?

I assume you will be doing further testing without that Cache drive, to prove it only happens when it's connected?

It's always nice to talk about new hardware, but in a way it's not relevant if you have no idea yet what the real issue is or which component is faulty.
oryhara Posted June 19, 2015
Mental pictures would say they are all alike, at least insofar as they all show "Kernel panic - not syncing". I'll take another phone pic if it happens again tonight.

I have left a terminal open tailing the syslog, and its last entry was at 3:13 AM, which led me to suspect the mover script and/or the cache drive, but invoking the mover manually did not cause a problem.

I need to move the Plex config off of the cache drive (installed to a folder named .plex) to another disk before I disable the cache drive, since its Docker container won't let me use a share for config. At least I think that is correct; I confess I did not fully understand the instructions for setting that up with Docker. But it does work, and this crash was happening in version 5, so I think it is not related to my Plex install, which I also had in version 5.

I might also try moving the cache drive from a bridgeboard to one of the motherboard's SATA ports, but only if disabling the cache drive fixes this kernel panic.
oryhara Posted June 22, 2015
Update: Friday evening I disabled the cache drive. Saturday morning I awoke to an "unclean shutdown detected" message. No screenshot. Sunday morning I saw this.
oryhara Posted June 22, 2015
Update: Monday morning I saw this:

I also left a tail on the syslog, and the last entry was dated 1:53 AM. I believe Plex starts its routine maintenance at 2 AM, so perhaps it is the culprit here. But this system worked for years with Plex and the three other apps (sabnzbd, sickbeard, couchpotato), so something else must have changed for it to be failing now.
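One way to keep track of the suspects is to compare the last syslog entry before each crash with the next job scheduled after it. Below is a rough sketch in Python that uses only the times mentioned in this thread. The job table and the function name are illustrative assumptions, not anything read from unRAID or Plex, and a job starting shortly after the last log entry is only a hint, not proof.

```python
from datetime import time

# Job start times as described in this thread (the poster's own estimates).
SCHEDULED_JOBS = {
    "mover": time(3, 40),
    "plex maintenance": time(2, 0),
}

def minutes_since_midnight(t):
    """Convert a datetime.time to whole minutes since midnight."""
    return t.hour * 60 + t.minute

def next_job_after(last_log_entry, jobs=SCHEDULED_JOBS):
    """Return (job_name, minutes_until_start) for the first job that starts
    at or after the last syslog entry, or None if no job starts later."""
    later = [
        (minutes_since_midnight(start), name)
        for name, start in jobs.items()
        if start >= last_log_entry
    ]
    if not later:
        return None
    start_minutes, name = min(later)
    return name, start_minutes - minutes_since_midnight(last_log_entry)

# Earlier crash: last syslog entry around 03:14.
print(next_job_after(time(3, 14)))  # ('mover', 26)
# Monday crash: last syslog entry at 01:53, with the cache drive disabled.
print(next_job_after(time(1, 53)))  # ('plex maintenance', 7)
```

The two crashes point at different jobs, which fits the idea that the trigger is whatever disk activity happens overnight rather than one specific script.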
Zonediver Posted June 22, 2015
Change your power supply and test again.
oryhara Posted June 22, 2015
I don't have a spare lying around, so I'll have to buy one. Do you think I need a larger one? How much power would I need for 24 3.5" drives?
archedraft Posted June 22, 2015
oryhara said: "I don't have a spare lying around; so I'll have to buy one. Do you think I need a larger one? How much power would I need for 24 3.5" drives?"

I have used this website in the past to get an idea. Keep in mind that it gives you a minimum guesstimate, and I would err on the side of caution and get a bigger PSU than it tells you. http://www.extreme.outervision.com/psucalculatorlite.jsp
oryhara Posted June 22, 2015
According to that calculator, I need 671 watts for 20 drives. I've only got 16 in there now, and a 950W supply. The system is still powered on after these kernel panics, which leads me to doubt that the PSU is causing any problem. But it's still under warranty, so I'll contact the manufacturer.
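For reference, the headroom implied by those two numbers can be worked out as below. This is only a sketch of the arithmetic in Python with a made-up function name; it compares the calculator's total estimate against the rated wattage and says nothing about per-rail limits or the extra current drives draw while spinning up.

```python
def psu_headroom(rated_watts, estimated_load_watts):
    """Return (spare_watts, spare_percent_of_rated) for a power supply."""
    spare = rated_watts - estimated_load_watts
    return spare, 100.0 * spare / rated_watts

# 950 W supply against the calculator's 671 W estimate for 20 drives.
spare_watts, spare_percent = psu_headroom(950, 671)
print(f"{spare_watts} W spare ({spare_percent:.1f}% of rated capacity)")
# prints "279 W spare (29.4% of rated capacity)"
```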
dgaschk Posted June 22, 2015
See "Check Disk Filesystems" in my sig.
oryhara Posted June 22, 2015
Are those instructions still relevant in unRAID 6.0.0? And do you want me to check every disk?
dgaschk Posted June 22, 2015
There is a GUI to do this, but the instructions still work. Check all disks except parity.
oryhara Posted June 23, 2015
Running the disk checks. Disks 1-3 are fine. Disk 4:

Fatal corruptions were found, Semantic pass skipped
6 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Mon Jun 22 22:36:54 2015
###########

I suppose I should run this with the --rebuild-tree option, then?
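With 16 disks to check, it may help to save each report to a file and scan it for the line that asks for --rebuild-tree. A minimal sketch in Python follows. The function name is hypothetical, and the pattern is taken only from the wording of the output pasted above, so other reiserfsck versions may phrase it differently.

```python
import re

# Pattern based solely on the message quoted in this thread.
REBUILD_PATTERN = re.compile(
    r"(\d+)\s+found corruptions can be fixed only when running with --rebuild-tree"
)

def corruptions_needing_rebuild(report_text):
    """Return how many corruptions the report says need --rebuild-tree (0 if none)."""
    match = REBUILD_PATTERN.search(report_text)
    return int(match.group(1)) if match else 0

# Report for disk 4 as pasted above.
disk4_report = """Fatal corruptions were found, Semantic pass skipped
6 found corruptions can be fixed only when running with --rebuild-tree
"""

print(corruptions_needing_rebuild(disk4_report))  # 6
```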
oryhara Posted June 23, 2015
Should that be done before or after I check the rest of the (15) drives? Or does it matter? In other news, the raid did not crash last night, but it was in maintenance mode, so I believe that does not count.
oryhara Posted June 25, 2015
Update: All disks have been checked. Disk 4 was run with --rebuild-tree. Do I need to check disk 4 again after the rebuild-tree?
This topic is now archived and is closed to further replies.