[SOLVED] How to interpret Disk Status

November 23, 2015

Got an email from my server this morning indicating that Disk 8 is having problems. Main shows the following:

http://my.jetscreenshot.com/12412/20151123-ir3q-124kb.jpg

Details on the drive show the following: http://my.jetscreenshot.com/12412/20151123-h3e4-91kb.jpg

Not able to run SMART on the drive from within the console.

SMART report follows:

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.0.4-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /1:0:4:0
Product:              
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
Physical block size:  3807568608 bytes
Lowest aligned LBA:   14896
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
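
The identity fields in the report above (a 600 PB capacity, block sizes of hundreds of millions of bytes or more) are not believable for a real disk, which suggests the drive was not answering properly rather than describing itself. A minimal Python sketch of that sanity check follows. The function name and the size limits are assumptions made for illustration; they are not part of smartctl or unRAID.

```python
import re

# Hypothetical plausibility limits; adjust them for your own hardware.
MAX_PLAUSIBLE_CAPACITY = 100 * 10**12   # 100 TB, well above any single 2015-era disk
PLAUSIBLE_BLOCK_SIZES = {512, 4096}     # common sector sizes in bytes


def implausible_fields(info_text):
    """Return the names of identity fields whose values look like garbage."""
    problems = []
    cap = re.search(r"User Capacity:\s+([\d,]+) bytes", info_text)
    if cap and int(cap.group(1).replace(",", "")) > MAX_PLAUSIBLE_CAPACITY:
        problems.append("user capacity")
    for label in ("Logical block size", "Physical block size"):
        match = re.search(label + r":\s+(\d+) bytes", info_text)
        if match and int(match.group(1)) not in PLAUSIBLE_BLOCK_SIZES:
            problems.append(label.lower())
    return problems


# Lines copied from the report above.
sample = """\
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
Physical block size:  3807568608 bytes
"""
print(implausible_fields(sample))
```

If every field comes back flagged, as it does for this report, the output says more about the connection to the drive than about the drive's health.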

I assume my only option is to replace the drive? Would taking it offline and reformatting it do anything beneficial?

Thanks!

Jeff...

November 23, 2015

Community Expert

The disk is marked as disabled because a write to it failed. Once that happens, it stays in that state until you take appropriate recovery action.

It is possible the disk dropped offline due to a cabling/power problem. If so, rebooting the system should bring it back online and give you a chance to get a SMART status. If you still have problems, then the drive has probably really failed and needs replacing.

November 24, 2015

Author

Good call. Rebooted and the disk became responsive to SMART.

SMART ERROR LOG:

ATA Error Count: 327 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 327 occurred at disk power-on lifetime: 18071 hours (752 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 07 00 00 00  Error: ICRC, ABRT at LBA = 0x00000007 = 7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 07 00 00 40 ff      00:09:34.607  READ FPDMA QUEUED
  60 08 00 00 00 00 40 00      00:09:34.601  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:09:34.601  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:09:34.601  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:09:34.600  IDENTIFY DEVICE

Error 326 occurred at disk power-on lifetime: 18071 hours (752 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 07 00 00 00  Error: ICRC, ABRT at LBA = 0x00000007 = 7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 07 00 00 40 ff      00:09:34.297  READ FPDMA QUEUED
  60 08 00 00 00 00 40 00      00:09:34.260  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:09:31.588  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:09:31.588  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:09:31.587  IDENTIFY DEVICE

Error 325 occurred at disk power-on lifetime: 18071 hours (752 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:08:32.395  IDENTIFY DEVICE
  b0 da 00 00 4f c2 40 00      00:08:27.064  SMART RETURN STATUS
  25 00 01 ae 88 e0 40 00      00:08:26.622  READ DMA EXT
  ef 03 42 00 00 00 40 00      00:08:22.208  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 00 00      00:08:22.207  IDENTIFY DEVICE

Error 324 occurred at disk power-on lifetime: 18071 hours (752 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:07:30.212  IDENTIFY DEVICE
  b0 da 00 00 4f c2 40 00      00:07:24.880  SMART RETURN STATUS
  25 00 01 ae 88 e0 40 00      00:07:24.478  READ DMA EXT
  ef 03 42 00 00 00 40 00      00:07:20.045  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 00 00      00:07:20.044  IDENTIFY DEVICE

Error 323 occurred at disk power-on lifetime: 18071 hours (752 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:05:40.721  IDENTIFY DEVICE
  b0 da 00 00 4f c2 40 00      00:05:35.420  SMART RETURN STATUS
  25 00 01 ae 88 e0 40 00      00:05:34.977  READ DMA EXT
  ef 03 42 00 00 00 40 00      00:05:30.559  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 00 00      00:05:30.558  IDENTIFY DEVICE
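
The register values in the log above can be decoded by hand. The short Python sketch below does so, assuming the standard ATA bit layout for the error and status registers; the helper names are made up for illustration. ER = 0x84 breaks down into ICRC plus ABRT, which matches the text smartctl printed. ICRC is an interface CRC error, which is commonly read as a problem on the link between drive and controller (cable, connector, backplane) rather than on the platters, so it fits the cabling/power suggestion made earlier in the thread. The last two lines check the arithmetic in the log: 18071 power-on hours, and the 49.710-day wrap, which is consistent with a 32-bit millisecond counter.

```python
# Bit names for the ATA error and status registers. Bits not listed here are
# obsolete or depend on the command, so they are reported as "bitN".
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF", 3: "MCR", 2: "ABRT", 1: "NM"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC", 3: "DRQ", 0: "ERR"}


def decode(value, names):
    """List the names of the bits set in an 8-bit register, highest bit first."""
    return [names.get(bit, f"bit{bit}") for bit in range(7, -1, -1) if value & (1 << bit)]


def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


print(decode(0x84, ERROR_BITS))        # ['ICRC', 'ABRT']
print(decode(0x51, STATUS_BITS))       # ['DRDY', 'DSC', 'ERR']
print(lba28(0x07, 0x00, 0x00, 0x00))   # 7

print(divmod(18071, 24))               # (752, 23): 752 days + 23 hours
print(round(2**32 / 1000 / 86400, 3))  # 49.71 days for 2**32 milliseconds
```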

Ran a long and short test on the drive. No errors reported.

http://my.jetscreenshot.com/12412/20151124-lx58-35kb.jpg

Should I pull the drive, do a low-level format, reinsert it into the array and let it get rebuilt?

November 24, 2015

Community Expert

That SMART report you posted only contains part of the SMART information - we really want to see the current values of the SMART attributes on all your drives to see if any look suspicious. A syslog is also a good idea to see if any errors are being reported. If you post the file produced by Tools->Diagnostics it will include both of these (plus other useful diagnostic information).

A low-level format is unlikely to help on modern drives, so there is not much point in doing that. After checking the diagnostics output, if that all looks OK, then one can force a disk to rebuild onto itself by the following process:

1. Stop the array and unassign the problem disk.
2. Start the array with the disk unassigned. You should get a warning about the array being unprotected, but it should start OK. The purpose of this step is to make unRAID 'forget' the serial number of the problem disk.
3. Stop the array and reassign the problem disk. unRAID will now treat the disk as if it was a new replacement disk and tell you that starting the array will trigger a rebuild.
4. Start the array and let the rebuild go ahead.
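
Why the intermediate start with the disk unassigned matters can be shown with a toy model. The Python sketch below is only an illustration of the behaviour described in the steps above: the class, its fields, and the returned strings are all hypothetical and are not how unRAID is implemented. The model keeps a remembered serial number per slot and only offers a rebuild when the assigned serial differs from the remembered one.

```python
class ArraySlot:
    """Toy model of one array slot. Not unRAID's real implementation."""

    def __init__(self, serial):
        self.remembered = serial  # serial the array last started with
        self.assigned = serial    # serial currently assigned to the slot
        self.disabled = True      # slot was disabled after a failed write
        self.running = True

    def stop(self):
        self.running = False

    def assign(self, serial):
        # Assignments can only change while the array is stopped.
        if self.running:
            raise RuntimeError("stop the array before changing assignments")
        self.assigned = serial

    def start(self):
        self.running = True
        if self.assigned is None:
            self.remembered = None  # the old serial is forgotten here
            return "started unprotected"
        if self.assigned != self.remembered:
            self.remembered = self.assigned
            self.disabled = False
            return "rebuild"
        return "disabled" if self.disabled else "normal"


def rebuild_onto_itself(slot):
    """Walk the four steps and collect what each array start reports."""
    serial = slot.assigned
    results = []
    slot.stop()
    slot.assign(None)             # step 1: stop and unassign
    results.append(slot.start())  # step 2: start with the slot empty
    slot.stop()
    slot.assign(serial)           # step 3: stop and reassign the same disk
    results.append(slot.start())  # step 4: start, which triggers the rebuild
    return results


print(rebuild_onto_itself(ArraySlot("SERIAL-A")))  # ['started unprotected', 'rebuild']
```

In this model, stopping and starting the array without the unassign step leaves the slot reporting "disabled", because the remembered serial never changes.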

November 24, 2015

Author

Ah, sorry, thought I was including all that was needed. I've run the collection process. File attached.

hunternas-diagnostics-20151124-1048.zip

December 3, 2015

Author

Ok, so new drives arrived today (3 TB). I shut down the server, pulled the old drive and inserted the new drive in the appropriate slot. When the server rebooted, it started a rebuild without my asking it to! I stopped the rebuild, shut down the array and tried to do a preclear. Preclear could not see the disk (preclear_disk.sh -l). I get:

root@HunterNAS:/boot# pc.sh -l
====================================1.15
Disks not assigned to the unRAID array
  (potential candidates for clearing)
========================================
No un-assigned disks detected

I rebooted the server, pulled the new drive and rebooted the server. Then I shut down the server again, inserted the new drive and rebooted.

This is the Array Devices showing Disk 8 unassigned.

http://my.jetscreenshot.com/12412/20151203-8dis-68kb.jpg

With the Array off-line you see the new drive in the unassigned slot

http://my.jetscreenshot.com/12412/20151203-kx81-31kb.jpg

But when I open the preclear plugin no drive shows up

http://my.jetscreenshot.com/12412/20151203-gic9-38kb.jpg

And when I do a preclear_disk.sh -l I see the following:

http://my.jetscreenshot.com/12412/20151203-lwhd-20kb.jpg

So I can't preclear the disk.

However, if the disk gets assigned to Disk 8, then a rebuild is possible.

I must be missing something basic...

December 3, 2015

Author

Update: I used the second drive with the array stopped (so it would auto rebuild) and was able to get the disk found. Evidently the combination of events I was doing did not allow Unraid to forget the drive.

Is there a step-by-step process for replacing a drive when you have only one server and need to preclear the drive on the same server it's being replaced on?

Here's what I think the steps are...

1. I have drive bays, so I can turn off power to each drive individually. Turn off power to the affected drive. Originally I powered down the server; this was my failure point, because I have autostart of the array in the go file.

2. Replace bad drive with new drive.

3. Power on drive

4. Load the GUI and navigate to Settings>Preclear Disk; the new drive is displayed. Alternatively, you can go old school and telnet in to run preclear_disk.sh

5. When preclear finishes, go to Main>Array Operation and stop the array

6. Go to Main>Array Devices and assign the drive to the failed disk's slot.

7. Start the array (and be happy you didn't lose a second drive during this whole process! And also be happy that the next versions of Unraid have 2 disk tolerance in the roadmap!)

Correct? If you don't have the ability to power down a drive individually, I suppose you'd have to edit the go file and remove cd /boot/unmenu & /uu??

December 3, 2015

... When the server rebooted, it started a rebuild without my asking it to!

Not sure what you did but ... unRAID will NEVER rebuild a replacement disk automatically. You need to tell it which 'new' disk you want to use, this assignment is done when the array is off-line, and once you bring the array on-line it starts the rebuild process. These are all steps a user has to start/take.

December 3, 2015

Author

Yep, sorry, I rushed documenting that step. I powered up and assigned the drive, and when I did that, it started to rebuild. Then I realized that I needed to preclear first. After I had unassigned the drive, I could not do anything with it other than let the rebuild go.

December 3, 2015

Community Expert

... I have autostart of the array in the go file.

Autostart is in Disk Settings. It is not in go. Starting the webUI is in go, and the webUI will start the array if it is set to autostart.

...

I suppose you'd have to edit the go file and remove cd /boot/unmenu & /uu??

Those lines in go are starting unMenu, which has nothing to do with any of this. The emhttp line is the line that starts the webUI. You don't want to remove that line!

December 4, 2015

Author

Duh, of course on both points. Once I got the server going it's been flawless...how quickly we forget...
