
Is my drive failing?



I am very new to unRAID, so please bear with me.

 

Last night I attempted to do a parity check and immediately one of the disks became red. I ran a SMART test and the drive seemed fine, so I decided to proceed with a rebuild. This morning I noticed there were several errors on the drive I rebuilt, as well as on one of the other drives. When I attempted to do a parity check again, the drive immediately went back into red status. About half of my data is no longer listed in my shares, and one share is entirely gone; however, the data still seems to be there on the remaining drives. Is that normal? I'm not sure what steps to take, but here is my SMART file.
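(For reference, a report like the attached smart.txt is typically produced with smartctl; a minimal sketch, assuming the suspect disk is /dev/sdb, so substitute the device unRAID shows for the redballed drive:)

smartctl -a /dev/sdb > /boot/smart.txt   # full SMART report, saved to the flash drive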

 

I am unsure how I should proceed.

 

EDIT:

Also I've tried to get my syslog, but when I enter the command: cp /var/log/syslog /boot/syslog.txt

I get the error:

cannot create regular file '/boot/syslog.txt': No space left on device

 

I assume it's trying to write to my flash drive, and there is almost a gig of free space, so I'm uncertain what the problem is.

smart.txt


Nothing wrong with that drive. If a drive is redballed, a write to that drive failed...

 

Post a syslog for further help

http://lime-technology.com/forum/index.php?topic=9880.0

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
 3 Spin_Up_Time            0x0027   163   163   021    Pre-fail  Always       -       6816
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       317
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14942
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       317
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
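The attributes that usually signal real trouble are 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), and 198 (Offline_Uncorrectable), and all three are zero above. A minimal sketch for pulling just those out, with the device path as a placeholder:

smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'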


I'll check all the cabling, although I think it's fairly foolproof with my particular case. My power supply is more than ample for the number of drives, but it is getting older and it's possible it could be failing. Upon reboot of the server, the drive is now showing up with a blue ball. I'm not sure what that means...


Also I've tried to get my syslog, but when I enter the command: cp /var/log/syslog /boot/syslog.txt

I get the error:

cannot create regular file '/boot/syslog.txt': No space left on device

I assume it's trying to write to my flash drive, and there is almost a gig of free space, so I'm uncertain what the problem is.

MS-DOS is kind of brain-dead. You'll get the exact same message if there are too many files in the top-level directory of the flash drive. I seem to remember the FAT file system has a limit on that: "The number of root directory entries available for FAT12 and FAT16 is determined when the volume is formatted, and is stored in a 16-bit field."
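If that root-directory limit is the culprit, copying into a subdirectory should succeed, since only the root directory of a FAT16 volume has a fixed entry count; a sketch, assuming the flash is mounted at /boot:

ls /boot | wc -l     # count top-level entries (FAT16 roots are commonly capped at 512)
df -h /boot          # confirm free space really is available
mkdir -p /boot/logs  # subdirectories grow dynamically, avoiding the root limit
cp /var/log/syslog /boot/logs/syslog.txt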

I'll check all the cabling, although I think it's fairly foolproof with my particular case. My power supply is more than ample for the number of drives, but it is getting older and it's possible it could be failing. Upon reboot of the server, the drive is now showing up with a blue ball. I'm not sure what that means...

A blue ball indicates it thinks it is a new drive, which basically means the drive was NOT detected the last time you started the array.

 

Any case can have an intermittent connection... Sounds like you have one.

 

What specific make/model of power supply? How many "green" drives? How many non-green?

 

Joe L.


I'm also using two Supermicro AOC-SAT2-MV8 8-port SATA controllers, and I just noticed one of them appears not to be powered up (no lights are on).

 

Also, only 5 of my drives appear to be detected by the BIOS.

 

This is rather strange, though, as the drives still seemed operational within unRAID, at least on some level...

 

EDIT: Ah, never mind; it appears 5 of my drives are connected to one of the controllers and the other 2 directly to the motherboard. It just shows up strangely in the BIOS.
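(One way to cross-check what Linux itself sees, independent of the BIOS screens; a sketch, noting that the AOC-SAT2-MV8 is built around a Marvell controller chip, though the exact ID string may differ:)

lspci | grep -i marvell   # both SAT2-MV8 cards should appear here if powered and seated
ls /dev/sd?               # one entry per drive the kernel detected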


I've tried everything I can think of in terms of connectors/hardware, yet the hard drive still shows up as blue.

 

What would be my best bet to continue troubleshooting? I'm having a hard time identifying which hard drive is connected to which port. Is there an easy way to determine this? Can I just pull drives out and shuffle their order to determine if it's specific connectors that have failed? I have space for 13 more drives, which are currently unused, so I could move this particular drive to a different spot if I could identify it.

 

EDIT: Going to try and rebuild again. *fingers crossed*
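(Serial numbers are the usual way to map unRAID's drive slots to physical disks, since unRAID identifies each drive by its serial; a sketch, where the device name is a placeholder and /dev/disk/by-id may not exist on very old releases:)

ls -l /dev/disk/by-id/                 # symlink names embed each drive's model and serial
hdparm -I /dev/sda | grep -i serial    # or query a single drive directly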


I've tried everything I can think of in terms of connectors/hardware, yet the hard drive still shows up as blue.

 

What would be my best bet to continue troubleshooting? I'm having a hard time identifying which hard drive is connected to which port. Is there an easy way to determine this? Can I just pull drives out and shuffle their order to determine if it's specific connectors that have failed? I have space for 13 more drives, which are currently unused, so I could move this particular drive to a different spot if I could identify it.

 

EDIT: Going to try and rebuild again. *fingers crossed*

Blue indicates it thinks it is a new drive. (Basically, unRAID forgot the serial number of the "old" drive when you rebooted while it was not responding. Now that it is responding once more, it is thought to be "new" and therefore shown as blue.)

 

I'd not shuffle disks around right now if you intend to reconstruct the "failed" drive, as odds are you'll get to the point where reconstruction is not possible.

 


Did the rebuild and everything appears to be OK.

 

Disk1 showed 116 errors, but all my data and shares appear to be intact.

 

I ran a SMART test on disk1, but I can't see any problems.

 

Running another parity check now, but I'm uncertain if this will really do anything.

 

Still hoping my fiddling with all my connectors solved the issue.
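(Since the short tests keep coming back clean, a long (extended) self-test reads the entire surface and is more likely to catch marginal sectors; a sketch, with the device path as a placeholder:)

smartctl -t long /dev/sdb      # starts the test; the drive reports an estimated duration
smartctl -l selftest /dev/sdb  # review the self-test log once that time has passed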


Looks like the beginning of a problem - and likely the source of the errors showing on the disk in the unRAID UI. I've read here that disks have >1000 spare sectors... but once sectors start to go bad ("pending/offline"), they don't usually stop. Each one of those pending sectors indicates a read or write operation to that sector failed. Once a write is attempted to a pending sector, the drive will reallocate the sector and mark the bad sector as "offline".


It actually increases the reallocated sector count. Offline uncorrectable increases when an access fails; it only ever goes up. Pending can go up but then decrease when a subsequent read succeeds and the sector is not remapped. If the subsequent access also fails, then the sector is remapped on the next write: pending is decreased and reallocated goes up.
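(One way to watch that lifecycle play out is to log the three counters periodically and see pending sectors either clear or convert to reallocated ones; a sketch, with the device path and log location as placeholders:)

while true; do
  date
  smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
  sleep 3600   # check hourly
done >> /var/log/smart_watch.log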


Archived

This topic is now archived and is closed to further replies.
