(SOLVED) [v 6.9.0] Changed hardware (case, RAM, GPU), disks disabled and/or unmountable



I am not sure what's wrong; I've attached diagnostics.

 

I changed to a new case and am now having issues with my disks. I have rebooted several times (using the GUI). So far I've had at least one of these issues:

 

  • Disk 5 is always disabled.
  • Disks are missing
  • Disks are unmountable
  • Disks are read-only
  • Disks have read errors

 

Current setup:

 

Case: SilverStone CS380 [disks are now connected to a SATA backplane]

Motherboard: Gigabyte GA-Z97-HD3

CPU: Intel Core i5-4690K

RAM: 32GB DDR3-1866 CL10 (Kingston HyperX Fury 4x8gb) [changed from 2x8gb Corsair Vengeance LPX]

PSU: Solid Gear 650W

GPU: EVGA GeForce GTX 660 2 GB [added; there was no GPU in the system before]

Disks: 1x 8TB WD White Label (EMAZ), 4x 4TB WD Red, 1x 4TB WD Blue [this is the always-disabled disk]

 

When I changed the case, I also changed the RAM and added a GPU.

 

I confirmed that the drives are fully seated in the bays.

Changed to new SATA cables.

 

The parity check said it would take 11 days.

 

I'd appreciate any help troubleshooting.

 

The system is currently shut down.

 

EDIT: Solved. Steps taken:

 

  1. Replaced power supply.
  2. Fixed Disk 5 unmountable state.
  3. Rebuilt Disk 5.

 

 

 

Edited by mxcherryred
removed attached diagnostics; added writeup of solution
Apr 29 09:10:50 MissionCtrl kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr 29 09:10:54 MissionCtrl kernel: ata1: COMRESET failed (errno=-16)
Apr 29 09:10:54 MissionCtrl kernel: ata1: hard resetting link
...
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: cmd 61/18:b8:a0:2c:23/00:00:75:00:00/40 tag 23 ncq dma 12288 out
Apr 29 09:10:56 MissionCtrl kernel:         res 61/04:00:00:00:00/00:00:00:00:00/00 Emask 0x401 (device error) <F>
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: status: { DRDY DF ERR }
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: error: { ABRT }
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: failed to read native max address (err_mask=0x1)
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: HPA support seems broken, skipping HPA handling
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: failed to enable AA (error_mask=0x1)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Apr 29 09:10:56 MissionCtrl kernel: ata2.00: configured for UDMA/133 (device error ignored)
Apr 29 09:10:56 MissionCtrl kernel: ata2: EH complete
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: exception Emask 0x0 SAct 0xf800 SErr 0x0 action 0x0
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: irq_stat 0x40000008
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: failed command: READ FPDMA QUEUED
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: cmd 60/38:58:d0:b2:7f/00:00:95:01:00/40 tag 11 ncq dma 28672 in
Apr 29 09:10:56 MissionCtrl kernel:         res 61/04:00:00:00:00/00:00:00:00:00/00 Emask 0x401 (device error) <F>
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: status: { DRDY DF ERR }
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: error: { ABRT }
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: failed to read native max address (err_mask=0x1)
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: HPA support seems broken, skipping HPA handling
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: failed to enable AA (error_mask=0x1)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Apr 29 09:10:56 MissionCtrl kernel: ata5.00: configured for UDMA/133 (device error ignored)
Apr 29 09:10:56 MissionCtrl kernel: ata5: EH complete
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: exception Emask 0x0 SAct 0xa000 SErr 0x0 action 0x0
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: irq_stat 0x40000008
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: cmd 60/18:68:90:f7:ea/00:00:5f:01:00/40 tag 13 ncq dma 12288 in
Apr 29 09:10:57 MissionCtrl kernel:         res 61/04:00:00:00:00/00:00:00:00:00/00 Emask 0x401 (device error) <F>
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: status: { DRDY DF ERR }
Apr 29 09:10:57 MissionCtrl kernel: ata3.00: error: { ABRT }

 

There are errors on multiple disks, and they are all on the onboard controller, so I would start by testing with a different power supply.
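As a rough way to confirm that the faults span several ports rather than one, the syslog excerpt can be tallied per ATA port. This is a minimal Python sketch, not an Unraid tool: the function name `ports_with_device_fault` is made up for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata5.00: status: { DRDY DF ERR }"
STATUS_RE = re.compile(r"kernel: (ata\d+)\.\d+: status: \{([^}]*)\}")


def ports_with_device_fault(lines):
    """Count status lines per ATA port whose flags include DF (device fault)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and "DF" in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the syslog excerpt in this thread.
sample = [
    "Apr 29 09:10:56 MissionCtrl kernel: ata2.00: status: { DRDY DF ERR }",
    "Apr 29 09:10:56 MissionCtrl kernel: ata5.00: status: { DRDY DF ERR }",
    "Apr 29 09:10:57 MissionCtrl kernel: ata3.00: status: { DRDY DF ERR }",
    "Apr 29 09:10:56 MissionCtrl kernel: ata2: EH complete",
]

faults = ports_with_device_fault(sample)
print(sorted(faults))  # ports that reported a device fault
```

Several ports faulting within a second or two of each other points at something the disks share (power, controller, backplane) rather than at any single disk.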

Error 14272 occurred at disk power-on lifetime: 6896 hours (287 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 08      00:33:19.293  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:33:19.293  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 08      00:33:19.287  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:33:19.287  IDENTIFY DEVICE

Agreed. Multiple disks are reporting device faults. To my knowledge, the odds of several disks independently developing this fault at the same time are astronomically low.
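For anyone wondering how `04 61` in the SMART log turns into "Device Fault; Error: ABRT", the two bytes are the ATA error (ER) and status (ST) registers, and each bit is a flag. Below is a minimal Python sketch of that decoding. The tables cover only the commonly printed bits, and the `decode` helper is a hypothetical name, not part of smartctl or the kernel.

```python
# Commonly reported bits of the ATA status and error registers.
# Only a subset of bits is listed here; this is an illustrative sketch.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, table):
    """Return the names of all flags in `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


# Values from the SMART error log above: ER = 0x04, ST = 0x61.
status_flags = decode(0x61, STATUS_BITS)
error_flags = decode(0x04, ERROR_BITS)

print(status_flags)  # flags set in the status register
print(error_flags)   # flags set in the error register
```

The same two values appear in the syslog as `res 61/04`, which is why the kernel prints `status: { DRDY DF ERR }` and `error: { ABRT }` for those commands.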

 

Definitely suspect the power supply. For further information, though: what power supply are you using? I'm concerned because a huge percentage of power supplies are usually cheap trash garbage, especially if it came with a case. (Some cases come with good PSUs, sure, but most ... don't.)

On 4/29/2021 at 7:05 PM, codefaux said:

a huge percentage of power supplies are usually cheap trash garbage, especially if it came with a case

Definitely was cheap; came as part of a barebones/combo kit and was 6+ years old.

 

Purchased and installed a new PSU. Attached are the diagnostics with the new power supply.

 

Disk 5 is still disabled and unmountable. The SMART report shows no errors. 

 

Not getting any other errors. I think I can focus on repairing Disk 5. Can I repair and re-enable it?

Edited by mxcherryred
removed diagnostics; don't need help anymore

 

May  1 23:03:33 MissionCtrl kernel: XFS (md5): Corruption warning: Metadata has LSN (1:1871537) ahead of current LSN (1:1871467). Please unmount and run xfs_repair (>= v4.3) to resolve.

This may have something to do with disk5 being disabled. My understanding is that Disabled means it knows it should be able to mount the disk, but cannot. This jibes. If you can bring the array up in Maintenance mode, use the webUI to start a filesystem check for that disk, under Check Filesystem Status. Like any filesystem repair operation, this carries an inherent risk of data loss, but it's one of those "pay a data specialist or accept the loss" kinds of things. Typically a single file loses changes or disappears. I have rarely had a file become internally inconsistent due to an xfs_repair operation.
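To see how far out of step the metadata is, the two LSNs in that warning can be pulled out and compared. This is a small Python sketch under the assumption that each LSN is printed as `cycle:block`; `parse_lsn_warning` is a made-up helper name, and the sample line is the one from the log above.

```python
import re

# Extracts both LSNs from the XFS corruption warning shown above.
LSN_RE = re.compile(
    r"Metadata has LSN \((\d+):(\d+)\) ahead of current LSN \((\d+):(\d+)\)"
)


def parse_lsn_warning(line):
    """Return ((meta_cycle, meta_block), (cur_cycle, cur_block)) or None."""
    match = LSN_RE.search(line)
    if match is None:
        return None
    meta_cycle, meta_block, cur_cycle, cur_block = map(int, match.groups())
    return (meta_cycle, meta_block), (cur_cycle, cur_block)


line = (
    "May  1 23:03:33 MissionCtrl kernel: XFS (md5): Corruption warning: "
    "Metadata has LSN (1:1871537) ahead of current LSN (1:1871467). "
    "Please unmount and run xfs_repair (>= v4.3) to resolve."
)

metadata_lsn, current_lsn = parse_lsn_warning(line)
# Tuples compare cycle first, then block.
is_ahead = metadata_lsn > current_lsn
block_gap = metadata_lsn[1] - current_lsn[1]  # only meaningful within one cycle
print(is_ahead, block_gap)
```

Here both LSNs are in the same cycle and differ only by a small number of blocks, which fits metadata written shortly before the log stopped being updated, for example during an unclean shutdown.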

 

All but two of the drives are showing SMART errors, which was part of what led us to suspect the power supply. If you can, make a note of how many errors each disk has logged in the SMART error log section of the UI; you'll have to click the Show button to see it. I bring it up because once a disk already has SMART errors logged, it can be easy to miss a few more.

 

If those counts don't increment and your logs stay clean for a while, it sounds like you're stable again. Congratulations.
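One simple way to keep track is to write the counts down twice and diff them. Here is a minimal Python sketch of that comparison. The disk labels and numbers are made-up placeholders, not values from the diagnostics, and `new_errors` is a hypothetical helper.

```python
def new_errors(before, after):
    """Return {disk: increase} for disks whose SMART error count went up."""
    return {
        disk: count - before.get(disk, 0)
        for disk, count in after.items()
        if count > before.get(disk, 0)
    }


# Hypothetical snapshots: error counts noted by hand from the SMART error log.
snapshot_after_psu_swap = {"disk1": 120, "disk2": 98, "disk3": 0}
snapshot_one_week_later = {"disk1": 120, "disk2": 101, "disk3": 0}

changed = new_errors(snapshot_after_psu_swap, snapshot_one_week_later)
print(changed)  # disks that logged new errors since the first snapshot
```

An empty result after a week or so of normal use would suggest the new power supply fixed the underlying problem.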

3 hours ago, mxcherryred said:

Disk 5 is still disabled and unmountable. The SMART report shows no errors. 

These are two separate independent states (although they can be related) and have different recovery actions.
 

A disk is disabled because a write to it has failed. When this happens, unRaid stops writing to it and starts 'emulating' it using the combination of the other drives plus parity. Does unRaid say it is emulating this drive? If so, then this section of the online documentation (available via the 'Manual' link at the bottom of the unRaid GUI) is relevant.
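To illustrate what 'emulating' means, here is a minimal Python sketch of single-parity reconstruction using XOR. It is a toy model with tiny made-up blocks, assuming one parity disk; it is not unRaid's actual driver code, and `xor_blocks` is a name chosen for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy contents of three data disks (made-up values).
data_disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]

# Parity is the XOR of all data disks.
parity = xor_blocks(data_disks)

# Pretend the disk at index 1 is disabled: rebuild its contents from the
# remaining data disks plus parity.
emulated = xor_blocks([data_disks[0], data_disks[2], parity])
print(emulated == data_disks[1])
```

This is also why every other disk has to be readable while one is emulated: each emulated read needs the matching block from all the remaining disks and from parity.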

 

An unmountable disk, by contrast, means that some level of file system corruption has been detected; handling this is covered here in the online documentation. Note that if the disk is being emulated, a file system repair operates on the emulated disk, not the physical one. This can be important because it leaves the physical disk untouched at this point, which keeps other recovery options available if the repair fails for some reason. I would recommend that you should try and fix the unmountable state first.

On 5/2/2021 at 3:01 AM, itimpi said:

These are two separate independent states (although they can be related) and have different recovery actions.

Thank you for the clarification.

 

On 5/2/2021 at 3:01 AM, itimpi said:

I would recommend that you should try and fix the unmountable state first.

This recommendation was very helpful.

 

I was able to resolve the issues.

