Device is disabled, contents emulated - smart test OK


siege801

Hello everyone,

 

Can anyone please give some guidance on what steps I need to take with this server and array? The disk in question is a 12TB Seagate IronWolf. There are three of these installed: two are parity drives, and the third is this problematic drive. My plan is to slowly swap out the 2TB drives with more of these (or larger, if price point dictates).

 

Diagnostics attached.

 

Many thanks in advance.

lucindraid-diagnostics-20230926-1409.zip


It's not logged as a disk problem and the disk looks healthy. It could be a power/connection problem or an issue with the LSI; I've been seeing some issues with LSI SAS2 controllers and large-capacity Seagate drives. But since you have more identical drives and this is the only one causing issues, power/connection is more likely.

  • 3 months later...

So I finally got around to this (newborn entered the house, and digital life went rightly out the window).

 

I pulled it open and checked all cables, re-seating specifically the cable connecting to the drive that was showing as disabled. Everything seemed fine.

I also added a new 2TB disk to replace the missing disk.

Currently the array is rebuilding, but it seems to be stuck. The drive stats all seem to be remaining the same, and the percentage complete hasn't moved in over 30 minutes.

 

Updated diagnostics attached. Any guidance would be greatly appreciated.

lucindraid-diagnostics-20240114-1312.zip


Update #1: Given the number of read errors on Parity disk #2, I figure that disk is not being detected correctly. I've powered down to re-check the cables again.

 

Update #2: Checked the cables on Parity #2, they seem fine. Booted back up and started the sync/rebuild, same issue with numerous read errors on Parity #2. I'm powering down now to swap cables with another disk.

 

Update #3: With the server up on the desk, I noticed what sounds like a disk repeatedly powering down / powering up coming from Parity #2. I've connected Parity #2 to a completely different power channel and the same sound occurs, and the same read errors occur. I'm beginning to suspect a failed disk. To be sure, I'll swap the data cable as well and try again.

 

Update #4: With the power cable swapped to a previously unused power channel, and the data cable swapped with a known-good line, the issue has remained with Parity Disk #2. Unless someone has suggestions, I can't think of any other cause than a failure on this disk. I'm going to pull it out of the server and connect it to a 3.5" disk dock and see if the power off/on sounds persist.

 

Update #5: Having connected the disk to a USB disk dock, it seems to be running stable. The disk is currently running a "long" smartctl test (smartctl -t long). The short test passed. I'm now beginning to suspect a power supply issue in the NAS.

 

Update #6: Sigh. I stopped the long test and connected the disk back to the NAS via a SATA cable directly to the motherboard (not the LSI RAID card). It seems to be stable. Maybe I'm back to a RAID cable issue?

 

Update #7: I plugged the disk back into the LSI on the same port. The disk is detected and can run a short test through unRAID's GUI. However, attempting to run the rebuild again returns numerous read errors on the disk (Parity #2). So I powered down again and swapped the base of the RAID cables on the LSI ports. (To clarify, the LSI has two ports for the 4-way split SATA cables. I've now moved Cable A from LSI port 1 to LSI port 2.) Parity Disk #2 still gives read errors, despite being on a different RAID port on the LSI. So I then scrambled the SATA cables across four disks on the same RAID channel, booted up, and attempted the rebuild. The rebuild bombed out in seconds and Parity Disk #2 immediately went offline. Can I be convinced the disk is failing/has failed yet?

 

Update #8: Ok, so the array rebuild completed with Parity Disk #2 offline. As a reminder, compared to where I started, Parity Disk #2 is now powered by a different power channel from the PSU, is on a different RAID port on the LSI, and is using a different SATA cable. I think I can be convinced the disk is at fault here?

 

 

HOWEVER... Having brought the array online, I'm faced with the new 2TB drive and the existing Disk #6 (12TB) showing as "Unmountable: Unsupported or no filesystem". I have no idea how to safely proceed from here. My outstanding issues are:

 

  1. Disks 5 and 6 showing as unmountable.
  2. Parity Disk #2 status unknown. How can I determine that the disk is faulty surely enough for warranty?

 

For now, I've pulled the array back down and will await the wisdom of the wizards in here. Please help. Updated diagnostics attached.

 

 

lucindraid-diagnostics-20240114-1641.zip

Edited by siege801
6 hours ago, siege801 said:

Parity Disk #2 status unknown. How can I determine that the disk is faulty surely enough for warranty?

For this, see if it can pass the Extended SMART test, either on Unraid or on another system. Failing that should be enough for an RMA.
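
If you'd rather do that from a terminal than the GUI, here is a minimal sketch. It assumes smartmontools is installed and uses /dev/sdX as a placeholder for the real device node; the helper function names and the DRY_RUN switch are made up for illustration. By default it only prints the commands so you can check them before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: /dev/sdX is a placeholder, substitute the real device node.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the extended (long) self-test; it runs on the drive in the background.
smart_extended() {
    local dev="$1"
    run smartctl -t long "$dev"
}

# Check the results once the test has had time to finish.
smart_report() {
    local dev="$1"
    run smartctl -H "$dev"            # overall health self-assessment
    run smartctl -l selftest "$dev"   # self-test log, including the long test result
}
```

Call `smart_extended /dev/sdX`, wait for the test to complete (a 12TB drive can take many hours), then call `smart_report /dev/sdX`. Set DRY_RUN=0 to actually execute the commands.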

36 minutes ago, itimpi said:

For this, see if it can pass the Extended SMART test, either on Unraid or on another system. Failing that should be enough for an RMA.

Thanks! Based on the above, do you agree it's likely failing?

 

Also, do you have any input regarding the two unformatted disks?


Update #9: I seem to be missing significant amounts of data on the resumed array, which I assume is because I need at least either the 12TB Parity #2 disk or the other 12TB array disk online and functional in order to rebuild the array with all data. I'm going to try connecting the Parity #2 disk directly to the motherboard's onboard SATA to see if I get read errors that way. Previously when I connected it this way (see update #6) I did not receive read errors, but given the slower interface, the rebuild was threatening to take days. Maybe that's what I'm forced to endure?

 

Update #10: Parity Disk #2 shows Parity device is offline. I've got it running an extended SMART test as per the suggestion above. I'm going to drop in on the Discord group to see what advice I can get.

Edited by siege801