Justin_ Posted September 23, 2016
So guys, I am a bit freaked out because I can't access any of my files on my UnRAID (6.1.8 stable) NAS. All of the drives are reporting that they are good, and I can browse the file tree, but if I try to open a file it won't work and I just get an error. Hoping to get some help here. My NAS is built on an Intel server board with 2x Intel Xeon E5340 CPUs, 4x4GB ECC buffered RAM and 2x1GB DDR2 ECC buffered RAM, 8x4TB drives, 2x2TB drives, a 90GB Corsair FORCE SSD as the cache drive, and an IO Crest 16 port SATA/SAS HBA (SI-PEX40097) with a PCI slot blower cooling it. If I try to access the log files, it just attempts to load them forever (last photo). This system has been working just fine for the last ~6 months.
RobJ Posted September 23, 2016
What you omitted are your diagnostics! Please see "Need help? Read me first!" and attach the diagnostics zip.
Justin_ Posted September 23, 2016
Ok, here is the diagnostics file (on OneDrive, as it is too big to attach): https://1drv.ms/u/s!Ana78VDArzTrnzDt-XPucAsk062E
JorgeB Posted September 23, 2016
You had an interrupt error, and after that 8 of your disks dropped offline:
    Sep 21 23:21:39 Tower kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
    Sep 21 23:21:39 Tower kernel: Do you have a strange power saving mode enabled?
    Sep 21 23:21:39 Tower kernel: Dazed and confused, but trying to continue
Never seen this before. I'm sure Rob can provide more info, but rebooting should bring the array back. The question is whether this is a one-time error or whether it will happen again in the future.
RobJ Posted September 23, 2016
System boots on Sep 13 at 5am. Some comments:
* That 16 port board is actually 4 Marvell 9215's, with 4 ports each!
* The first (has the Corsair) and third (has Disk 5 and 6) are working; the other 2 are not, with 4 drives on each. Those are the 8 drives that have been dropped.
* There are numerous timing issues. Check for a newer BIOS; yours is from 2009.
* Plex failed to install; seek help in the support thread for your Plex:
    Sep 13 05:01:35 Tower emhttp: Installing Plex Media Server...
    Sep 13 05:01:39 Tower emhttp: Install failed: Failed integrity test
* The Cache drive is mounted read-only because it's formatted as NTFS and is being mounted with the ntfs module, which only supports read-only access. Either it needs to be formatted with a supported unRAID file system, or it needs to be mounted with the ntfs-3g module, which does support read-write operations.
* A parity check is started, due to an unclean shutdown.
* Network is bouncing up and down. It loses and regains the connection multiple times.
* Later, the Mover checks the Cache drive, and from the folders found, this is a Windows system drive! This should not be your Cache drive, and since it isn't working, unassign it!
* System runs fine until Sep 21 at 11:21pm, when major trouble happens with the machine:
    Sep 21 23:21:39 Tower kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
    Sep 21 23:21:39 Tower kernel: Do you have a strange power saving mode enabled?
    Sep 21 23:21:39 Tower kernel: Dazed and confused, but trying to continue
    Sep 21 23:22:06 Tower kernel: ata22.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Sep 21 23:22:06 Tower kernel: ata22.00: failed command: READ DMA EXT
    Sep 21 23:22:06 Tower kernel: ata22.00: cmd 25/00:40:68:12:c6/00:05:0e:01:00/e0 tag 13 dma 688128 in
    Sep 21 23:22:06 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Sep 21 23:22:06 Tower kernel: ata22.00: status: { DRDY }
    Sep 21 23:22:06 Tower kernel: ata22: hard resetting link
* You can completely ignore the syslog from that point on! The 2 controllers appear to have failed and their 8 drives became inaccessible, which is what causes all of the errors that follow.
* The good news is all of your data should be fine, and all of your drives are fine, once we can talk to them again. Usually a reboot (power off, then boot again) will fix everything.
* The bad news is something serious may be wrong with the power or CPU or motherboard. A BIOS update may improve things. Check also for a firmware update for the '16 port' card. And run a long Memtest from the boot menu, suggested time a good 24 hours. If none of those help, then you may need to look into a new power supply or a new motherboard and CPU.
* You have 6 great SATA ports on the motherboard, none of which you are using. They are your best ports, why not use them?
* You have another port or 2 on the motherboard available, but it would be best if, on next boot, you go into the BIOS settings and change the SATA support to either AHCI, if it's there, or a native SATA mode. Anything but the IDE emulating mode it is in now.
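The triage above (find the NMI, then see which ata ports start failing afterwards) can be sketched as a small script. This is only an illustration, assuming a Python 3 interpreter is available somewhere to run it against a saved copy of the syslog; the regular expressions and the `summarize` helper are hypothetical names written for this example, and the sample lines are copied from the excerpt quoted in this thread.

```python
import re

# Sample lines copied from the syslog excerpt quoted in this thread.
SAMPLE = """\
Sep 21 23:21:39 Tower kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
Sep 21 23:22:06 Tower kernel: ata22.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 21 23:22:06 Tower kernel: ata22.00: failed command: READ DMA EXT
Sep 21 23:22:06 Tower kernel: ata22: hard resetting link
"""

# Hypothetical patterns for this sketch: an NMI report, and three kinds of
# libata event (exception, failed command, link reset) keyed by port name.
NMI_RE = re.compile(r"kernel: .*NMI received")
ATA_RE = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (exception|failed command|hard resetting link)"
)


def summarize(lines):
    """Return (list of NMI lines, dict mapping ata port -> event count)."""
    nmi_lines = []
    port_events = {}
    for line in lines:
        if NMI_RE.search(line):
            nmi_lines.append(line.strip())
        match = ATA_RE.search(line)
        if match:
            port = match.group(1)
            port_events[port] = port_events.get(port, 0) + 1
    return nmi_lines, port_events


if __name__ == "__main__":
    nmi_lines, port_events = summarize(SAMPLE.splitlines())
    for line in nmi_lines:
        print("NMI:", line)
    for port, count in sorted(port_events.items()):
        print(port, "events:", count)
```

To use it on a real log, replace `SAMPLE.splitlines()` with the lines of the syslog file from the diagnostics zip. A count like this only shows which ports reported errors; it does not say whether the card, the power supply, or the motherboard is at fault.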
RobJ Posted September 23, 2016
I see Johnnie beat me! I also should have added that it is possible that it's the 16 port card that issued that NMI, and that it may be failing, not the other items I mentioned. But at this point, I can't tell which one caused it.
Justin_ Posted September 23, 2016
Ok thanks, I'll try those steps now. Why is it that UnRAID is still reporting that the discs are good if it can't access them?
Justin_ Posted September 24, 2016
Ok, a quick reboot and everything seems to be functioning. I will be keeping an eye on it though. Thanks everyone!
RobJ Posted September 24, 2016
Quoting Justin_: "Ok thanks, I'll try those steps now. Why is it that UnRAID is still reporting that the discs are good if it can't access them?"
That's an often-noticed problem, and it does create confusion. Modern operating systems work in layers, with considerable complexity hidden in the lower layers. The unRAID module on top has not yet discovered that the drives are 'missing', as that was all handled at much lower levels. It's more complex than that, but that's the simplistic answer.
Justin_ Posted September 24, 2016
I do have one more question: do I need to make UnRAID move the files off the cache drive before I reformat it?
RobJ Posted September 24, 2016
Quoting Justin_: "I do have one more question, do I need to make UnRAID move the files off the cache drive before I reformat it?"
You would need to transfer the files yourself, unless they are all files that the Mover would move. In that case, just run the Mover.
Justin_ Posted September 25, 2016
Ok, but how would I go about that? The SSD doesn't show up as a separate drive that I can just grab files off of in Explorer.
trurl Posted September 25, 2016
Quoting Justin_: "Ok but how would I go about that? the ssd doesn't show up as a separate drive that I Can just grab files off of in explorer"
Normally no drives show up, only user shares, to avoid the User Share Copy Bug. mc (Midnight Commander) is the simplest way to move files around. There is no need to get your PC and network involved in moving files between disks on the server. Be sure you don't mix disks and user shares when moving/copying files. Just copy from /mnt/cache to /mnt/disk#.
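The rule about not mixing disks and user shares can be expressed as a small guard around a copy. This is a rough sketch rather than a replacement for mc, and it rests on several assumptions: that Python 3.8 or newer is available on the server (it may not be on a stock install), that user shares live under /mnt/user, and that the folder name `Movies` in the usage note is purely hypothetical. The helper names are made up for this example.

```python
import shutil
from pathlib import PurePosixPath


def is_disk_path(path):
    """True for /mnt/cache/... or /mnt/diskN/... (N is a number)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[:2] != ("/", "mnt"):
        return False
    name = parts[2]
    return name == "cache" or (name.startswith("disk") and name[4:].isdigit())


def check_pair(src, dst):
    """Raise ValueError unless both paths are disk paths.

    This rejects user-share paths (assumed to live under /mnt/user), so a
    disk path and a user share are never mixed in one copy.
    """
    for label, path in (("source", src), ("destination", dst)):
        if not is_disk_path(path):
            raise ValueError(f"{label} {path!r} is not under /mnt/cache or /mnt/disk#")


def copy_tree(src, dst):
    """Copy a folder disk-to-disk after checking both paths."""
    check_pair(src, dst)
    # dirs_exist_ok requires Python 3.8+; existing files in dst are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

For example, `copy_tree("/mnt/cache/Movies", "/mnt/disk1/Movies")` would copy the folder, while passing `/mnt/user/Movies` as either argument raises `ValueError` before anything is touched. Since the cache drive in this thread is mounted read-only, copying (rather than moving) off it is the only option anyway.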