
Is my drive failing?



I am very new to unRAID, so please bear with me.

 

Last night I attempted to do a parity check and immediately one of the disks became red. I ran a SMART test and the drive seemed fine, so I decided to proceed with a rebuild. This morning I noticed there were several errors on the drive I rebuilt, as well as on one of the other drives. When I attempted to do a parity check again, the drive immediately went back into red status. About half of my data is no longer listed in my shares, and one share is entirely gone; however, the data still seems to be there on the remaining drives. Is that normal? I'm not sure what steps to take, but here is my SMART file.
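(For reference, a report like the attached smart.txt is typically produced with smartctl; a minimal sketch, assuming the suspect disk is /dev/sdb, so substitute the device unRAID shows for the redballed drive:)

smartctl -a /dev/sdb > /boot/smart.txt   # full SMART report, saved to the flash drive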

 

I am unsure how I should proceed.

 

EDIT:

Also I've tried to get my syslog, but when I enter the command: cp /var/log/syslog /boot/syslog.txt

I get the error:

cannot create regular file '/boot/syslog.txt': No space left on device

 

I assume it's trying to write to my flash drive, and there is almost a gig of free space, so I'm uncertain what the problem is.

smart.txt


Nothing wrong with that drive. If a drive is redballed, a write to that drive failed...

 

Post a syslog for further help

http://lime-technology.com/forum/index.php?topic=9880.0

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
 3 Spin_Up_Time            0x0027   163   163   021    Pre-fail  Always       -       6816
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       317
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14942
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       317
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
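The attributes that usually signal real trouble are 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), and 198 (Offline_Uncorrectable), and all three are zero above. A minimal sketch for pulling just those out, with the device path as a placeholder:

smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'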


I'll check all the cabling, although I think it's fairly foolproof with my particular case. My power supply is more than ample for the number of drives, but it is getting older and it's possible it could be failing. Upon reboot of the server, the drive is now showing up with a blue ball. I'm not sure what that means...


Also I've tried to get my syslog, but when I enter the command: cp /var/log/syslog /boot/syslog.txt

I get the error:

cannot create regular file '/boot/syslog.txt': No space left on device

I assume it's trying to write to my flash drive, and there is almost a gig of free space, so I'm uncertain what the problem is.

MS-DOS is kind of brain-dead. You'll get the exact same message if there are too many files in the top-level directory of the flash drive. I seem to remember the FAT file system has a limit on that: "The number of root directory entries available for FAT12 and FAT16 is determined when the volume is formatted, and is stored in a 16-bit field."
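If that root-directory limit is the culprit, copying into a subdirectory should succeed, since only the root directory of a FAT16 volume has a fixed entry count; a sketch, assuming the flash is mounted at /boot:

ls /boot | wc -l     # count top-level entries (FAT16 roots are commonly capped at 512)
df -h /boot          # confirm free space really is available
mkdir -p /boot/logs  # subdirectories grow dynamically, avoiding the root limit
cp /var/log/syslog /boot/logs/syslog.txt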

I'll check all the cabling, although I think it's fairly foolproof with my particular case. My power supply is more than ample for the number of drives, but it is getting older and it's possible it could be failing. Upon reboot of the server, the drive is now showing up with a blue ball. I'm not sure what that means...

A blue ball indicates it thinks it is a new drive, which basically means the drive was NOT detected the last time you started the array.

 

Any case can have an intermittent connection... Sounds like you have one.

 

What specific make/model of power supply? How many "green" drives? How many non-green?

 

Joe L.


I'm also using two Supermicro AOC-SAT2-MV8 8-port SATA controllers, and I just noticed one of them appears not to be powered up (no lights are on).

 

Also, only 5 of my drives appear to be detected by the BIOS.

 

This is rather strange, though, as the drives still seemed operational within unRAID, at least on some level...

 

EDIT: Ah, never mind; it appears 5 of my drives are connected to one of the controllers and the other 2 directly to the motherboard. It just shows up strangely in the BIOS.
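(One way to cross-check what Linux itself sees, independent of the BIOS screens; a sketch, noting that the AOC-SAT2-MV8 is built around a Marvell controller chip, though the exact ID string may differ:)

lspci | grep -i marvell   # both SAT2-MV8 cards should appear here if powered and seated
ls /dev/sd?               # one entry per drive the kernel detected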


I've tried everything I can think of in terms of connectors/hardware, yet the hard drive still shows up as blue.

 

What would be my best bet to continue troubleshooting? I'm having a hard time identifying which hard drive is connected to which port. Is there an easy way to determine this? Can I just pull drives out and shuffle their order to determine if it's specific connectors that have failed? I have space for 13 more drives, which are currently unused, so I could move this particular drive to a different spot if I could identify it.

 

EDIT: Going to try and rebuild again. *fingers crossed*
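(Serial numbers are the usual way to map unRAID's drive slots to physical disks, since unRAID identifies each drive by its serial; a sketch, where the device name is a placeholder and /dev/disk/by-id may not exist on very old releases:)

ls -l /dev/disk/by-id/                 # symlink names embed each drive's model and serial
hdparm -I /dev/sda | grep -i serial    # or query a single drive directly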


I've tried everything I can think of in terms of connectors/hardware, yet the hard drive still shows up as blue.

 

What would be my best bet to continue troubleshooting? I'm having a hard time identifying which hard drive is connected to which port. Is there an easy way to determine this? Can I just pull drives out and shuffle their order to determine if it's specific connectors that have failed? I have space for 13 more drives, which are currently unused, so I could move this particular drive to a different spot if I could identify it.

 

EDIT: Going to try and rebuild again. *fingers crossed*

Blue indicates it thinks it is a new drive. (Basically, unRAID forgot the serial number of the "old" drive when you rebooted while it was not responding. Now that it is responding once more, it is thought to be "new" and therefore shown as blue.)

 

I'd not shuffle disks around right now if you intend to reconstruct the "failed" drive, as odds are you'll get to the point where reconstruction is not possible.

 


Did the rebuild and everything appears to be OK.

 

Disk1 showed 116 errors, but all my data and shares appear to be intact.

 

I ran a SMART test on disk1, but I can't see any problems.

 

Running another parity check now, but I'm uncertain if this will really do anything.

 

Still hoping my fiddling with all my connectors solved the issue.
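(Since the short tests keep coming back clean, a long (extended) self-test reads the entire surface and is more likely to catch marginal sectors; a sketch, with the device path as a placeholder:)

smartctl -t long /dev/sdb      # starts the test; the drive reports an estimated duration
smartctl -l selftest /dev/sdb  # review the self-test log once that time has passed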


Looks like the beginning of a problem - and likely the source of the errors showing on the disk in the unRAID UI. I've read here that disks have >1000 spare sectors... but once sectors start to go bad ("pending/offline"), they don't usually stop. Each one of those pending sectors indicates a read or write operation to that sector failed. Once a write is attempted to a pending sector, the drive will reallocate the sector and mark the bad sector as "offline".


It actually increases the reallocated sector count. Offline uncorrectable increases when an access fails; it only ever goes up. Pending can go up but then decrease when a subsequent read succeeds and the sector is not remapped. If the subsequent access also fails, then the sector is remapped on the next write: pending is decreased and reallocated goes up.
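(One way to watch that lifecycle play out is to log the three counters periodically and see pending sectors either clear or convert to reallocated ones; a sketch, with the device path and log location as placeholders:)

while true; do
  date
  smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
  sleep 3600   # check hourly
done >> /var/log/smart_watch.log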


Archived

This topic is now archived and is closed to further replies.
