NAS RUNNING UNRAID NOT WORKING drives seem to be good but can't access files


Justin_

So guys, I am a bit freaked out because I can't access any of my files on my UnRAID (6.1.8 stable) NAS. All of the drives are reporting that they are good, and I can browse the file tree, but if I try to open a file it won't work and I just get an error. Hoping to get some help here. My NAS is built on an Intel server board with 2X Intel Xeon E5340 CPUs, 4X4GB ECC buffered RAM and 2X1GB DDR2 ECC buffered RAM, 8X4TB drives and 2X2TB drives, a 90GB Corsair FORCE SSD as the cache drive, and an IO Crest 16-port SATA/SAS HBA (SI-PEX40097) with a PCI slot blower cooling it. If I try to access the log files, it just tries to load them forever (last photo). This system has been working just fine for the last ~6 months.

Attached screenshots: Capture.png, Capture6.png, Capture2.png, Capture3.png, Capture5.png, Capture4.png

You had an interrupt error, and after that 8 of your disks dropped offline:

Sep 21 23:21:39 Tower kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
Sep 21 23:21:39 Tower kernel: Do you have a strange power saving mode enabled?
Sep 21 23:21:39 Tower kernel: Dazed and confused, but trying to continue

I've never seen this before, and I'm sure Rob can provide more info. Rebooting should bring the array back; the question is whether this is a one-time error or whether it will happen again in the future.
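Whether the NMI comes back after the reboot can be checked by scanning the syslog for the same kernel messages. Below is a minimal Python sketch, assuming Python 3 is available (it may need to be installed separately on unRAID) and that the live syslog is at /var/log/syslog; scan_syslog and scan_file are hypothetical helper names, and the patterns only match the message text quoted in this thread.

```python
import re
from collections import Counter

# Text of the NMI warning quoted in this thread.
NMI_RE = re.compile(r"NMI received for unknown reason")
# libata link resets such as "ata22: hard resetting link".
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")


def scan_syslog(lines):
    """Count NMI warnings and hard link resets per ata port in an iterable of lines."""
    nmi_count = 0
    resets = Counter()
    for line in lines:
        if NMI_RE.search(line):
            nmi_count += 1
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
    return nmi_count, resets


def scan_file(path="/var/log/syslog"):
    """Scan a syslog file; the default path is an assumption, pass a saved copy if needed."""
    with open(path, errors="replace") as f:
        return scan_syslog(f)
```

A nonzero NMI count, or resets clustered on the same group of ata ports, after a clean reboot would suggest the problem is recurring rather than a one-time event.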

The system boots on Sep 13 at 5am. Some comments:

* That 16 port board is actually 4 Marvell 9215 controllers, with 4 ports each!

* The first controller (with the Corsair) and the third (with Disks 5 and 6) are working; the other 2 are not, and they have 4 drives each. Those are the 8 drives that have been dropped.

* There are numerous timing issues. Check for a newer BIOS; yours is from 2009.

* Plex failed to install; seek help in the support thread for the Plex package you use:

Sep 13 05:01:35 Tower emhttp: Installing Plex Media Server...
Sep 13 05:01:39 Tower emhttp: Install failed: Failed integrity test

* The Cache drive is mounted read-only because it's formatted as NTFS and was mounted with the ntfs module, which only supports read-only access. Either it needs to be formatted with a supported unRAID file system, or it needs to be mounted with the ntfs-3g module, which does support read-write operations.

* A parity check is started due to an unclean shutdown.

* The network is bouncing up and down: it loses and regains the connection multiple times.

* Later, the Mover checks the Cache drive, and judging from the folders found, it is a Windows system drive! This should not be your Cache drive, and since it isn't working, unassign it!

* The system runs fine until Sep 21 at 11:21pm, when major trouble hits the machine:

Sep 21 23:21:39 Tower kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
Sep 21 23:21:39 Tower kernel: Do you have a strange power saving mode enabled?
Sep 21 23:21:39 Tower kernel: Dazed and confused, but trying to continue
Sep 21 23:22:06 Tower kernel: ata22.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 21 23:22:06 Tower kernel: ata22.00: failed command: READ DMA EXT
Sep 21 23:22:06 Tower kernel: ata22.00: cmd 25/00:40:68:12:c6/00:05:0e:01:00/e0 tag 13 dma 688128 in
Sep 21 23:22:06 Tower kernel:        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 21 23:22:06 Tower kernel: ata22.00: status: { DRDY }
Sep 21 23:22:06 Tower kernel: ata22: hard resetting link

* You can completely ignore the syslog from that point on! All of the later errors come from the 2 controllers apparently failing, which made their 8 drives inaccessible.

* The good news is that all of your data should be fine, and all of your drives are fine, once we can talk to them again. Usually a reboot (power off, then boot again) will fix everything.

* The bad news is that something serious may be wrong with the power supply, CPU, or motherboard. A BIOS update may improve things. Also check for a firmware update for the '16 port' card. And run a long Memtest from the boot menu; a good 24 hours is the suggested time. If none of those help, then you may need to look into a new power supply or a new motherboard and CPU.

* You have 6 great SATA ports on the motherboard, none of which you are using. They are your best ports, so why not use them?

* You have another port or two available on the motherboard. On the next boot, it would be best to go into the BIOS settings and change the SATA mode to AHCI if it's there, or to a native SATA mode; anything but the IDE emulation mode it is in now.
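The "cmd" line in the ata22 excerpt is the ATA taskfile that libata printed for the failed command. Assuming the field order is command / feature:count:LBA-low:LBA-mid:LBA-high / the high-order bytes of those same five registers / device (my reading of the libata error-handler output, so treat this as a sketch), a short decoder shows a 48-bit read of 1344 sectors, which matches the "dma 688128" byte count printed on the same line. Together with "status: { DRDY }" and "(timeout)", that fits a request that simply never completed rather than a drive reporting a media error. decode_lba48 and ATA_COMMANDS are hypothetical names.

```python
import re

# Assumed libata taskfile layout:
#   cmd CC/FF:NN:LL:MM:HH/hF:hN:hL:hM:hH/DD
# command / feature:nsect:lbal:lbam:lbah / hob_* (high-order bytes) / device
TF_RE = re.compile(
    r"cmd ([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2})",
    re.IGNORECASE,
)

# Only the opcode seen in this thread; other opcodes are shown as hex.
ATA_COMMANDS = {0x25: "READ DMA EXT"}


def decode_lba48(line, sector_size=512):
    """Decode a 48-bit (EXT) ATA taskfile from a libata 'cmd' log line."""
    match = TF_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    (cmd, _feat, nsect, lbal, lbam, lbah,
     _hfeat, hnsect, hlbal, hlbam, hlbah, _dev) = (int(g, 16) for g in match.groups())
    # The high-order ("hob") registers supply the upper bytes of count and LBA.
    sectors = (hnsect << 8) | nsect
    if sectors == 0:
        sectors = 65536  # a count of 0 means 65536 sectors for 48-bit commands
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": ATA_COMMANDS.get(cmd, f"0x{cmd:02x}"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }
```

For the ata22 line above, this returns READ DMA EXT at LBA 4542829160, 1344 sectors, 688128 bytes, which sits well inside a 4TB drive.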

Ok thanks, I'll try those steps now. Why is it that UnRAID is still reporting that the disks are good if it can't access them?

That's an often-noticed problem, and it does create confusion. Modern operating systems work in layers, with considerable complexity hidden in the lower layers. The unRAID module on top has not yet discovered that the drives are 'missing', as that was all handled at much lower levels. It's more complex than that, but that's the simplistic answer.
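One way to see that layering is to ask a lower layer directly. On Linux, the SCSI layer (which libata SATA disks sit behind) exposes a per-device state string at /sys/block/<dev>/device/state; a healthy disk normally reads "running", and one the kernel has given up on typically reads "offline", even while a higher-level tool may still show it as good. A minimal Python sketch under those assumptions; scsi_device_states and offline_devices are hypothetical names, and nothing here is an unRAID API.

```python
import os


def scsi_device_states(sys_block="/sys/block"):
    """Map block device names (e.g. 'sdb') to their SCSI-layer state string.

    Entries without a device/state attribute (loop, md, etc.) are skipped.
    """
    states = {}
    for name in sorted(os.listdir(sys_block)):
        state_path = os.path.join(sys_block, name, "device", "state")
        try:
            with open(state_path) as f:
                states[name] = f.read().strip()
        except (FileNotFoundError, NotADirectoryError):
            continue
    return states


def offline_devices(states):
    """Return device names whose state is anything other than 'running'."""
    return [name for name, state in states.items() if state != "running"]
```

On the affected server, offline_devices(scsi_device_states()) would list the sdX names the kernel no longer considers usable.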

Ok, but how would I go about that? The SSD doesn't show up as a separate drive that I can just grab files off of in Explorer.

Normally no drives show up, only user shares, to avoid the User Share Copy Bug.

mc (Midnight Commander) is the simplest way to move files around. There's no need to get your PC and network involved in moving files between disks on the server. Be sure you don't mix disks and user shares when moving/copying files; just copy from /mnt/cache to /mnt/disk#.
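For anyone who prefers a script to mc, the "don't mix disks and user shares" rule can be enforced in code: copy only between disk paths and refuse anything under the user-share mount. As commonly described, the User Share Copy Bug is data loss when a disk path and a user-share path turn out to be the same file. A minimal Python 3.8+ sketch, assuming user shares live under /mnt/user; copy_disk_to_disk is a hypothetical helper.

```python
import os
import shutil

USER_SHARE_ROOT = "/mnt/user"  # assumed user-share mount point


def _is_under(path, root):
    """True if path resolves to root or somewhere below it."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def copy_disk_to_disk(src, dst, user_root=USER_SHARE_ROOT):
    """Copy a directory tree between disk paths, e.g. /mnt/cache/... to /mnt/disk1/...

    Refuses any path under the user-share mount so disk and user-share
    paths are never mixed in one copy.
    """
    for p in (src, dst):
        if _is_under(p, user_root):
            raise ValueError(f"{p} is a user-share path; use /mnt/cache or /mnt/disk# paths")
    # dirs_exist_ok needs Python 3.8+; files already present in dst are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

For example, copy_disk_to_disk("/mnt/cache/Media", "/mnt/disk1/Media"), where Media stands in for a real share name. Reading from the cache works even though it is mounted read-only.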
