Getting a lot of errors... maybe causing hard drives to red-ball and/or show as unformatted?



unRAID Pro 5.0.5 - 4 GB memory - 2.93 GHz processor - motherboard: Asus P5G41T-M LX PLUS

 

 

I've been running unRAID since 2011, and until a couple of months ago it was flawless. I didn't have to touch it; it just kept humming along with no shutdowns. For the last couple of months I keep getting red-ball / orange-ball drives. I'm a computer tech, so getting drives is not a problem. Now my 2 TB Red NAS drive is red-balled, and I got it not long ago. It can't be bad, so I think it's something else. On bootup I noticed my flash drive was corrupted. I fixed it, but I'm still getting a lot of weird errors I have never seen before, like { DRDY ERR }, and what look like memory errors? A little further investigation suggests my two port multiplier cards may be causing silent corruption. Maybe that's why this is happening? If so, I'll just get a better port multiplier, but it would have to be an 8-port one since there are only 4 ports on my motherboard.

 

Errors I've never seen before:

Aug 26 18:28:41 Tower kernel: ata6.00: status: { DRDY ERR }

Aug 26 18:28:41 Tower kernel: ata6.00: error: { ABRT }

Aug 26 18:28:41 Tower kernel: ata6.00: hard resetting link

Aug 26 18:28:41 Tower kernel: ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 310)

Aug 26 18:28:41 Tower kernel: ata6.00: configured for UDMA/133

Aug 26 18:28:41 Tower kernel: ata6: EH complete

Aug 26 18:28:41 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Aug 26 18:28:41 Tower kernel: ata6.00: failed command: READ DMA

Aug 26 18:28:41 Tower kernel: ata6.00: cmd c8/00:08:48:00:00/00:00:00:00:00/e0 tag 0 dma 4096
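For anyone triaging a log like the excerpt above, here is a minimal sketch that tallies the libata error lines per ATA port, so it is easy to see whether a single port (ata6 here) accounts for most of them. The function name count_ata_errors and the log path in the usage comment are assumptions for illustration; neither comes from unRAID itself.

```shell
# Hypothetical helper: count kernel ATA error lines per port in a syslog file.
# Only "exception", "failed command", "status:" and "error:" lines are counted;
# link resets and "EH complete" lines are ignored.
count_ata_errors() {
    # $1: path to a syslog file (the location is an assumption)
    awk '
        match($0, /kernel: ata[0-9]+/) && /exception|failed command|status:|error:/ {
            # Skip the 8 characters of "kernel: " to keep just the port name.
            port = substr($0, RSTART + 8, RLENGTH - 8)
            count[port]++
        }
        END { for (p in count) print p, count[p] }
    ' "$1" | sort -k2,2nr
}

# Usage (path assumed): count_ata_errors /var/log/syslog
```

Each output line is a port name followed by its count, highest first. A single port dominating the list points at that drive, cable, or controller port rather than at the whole system.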

syslog.zip

Link to comment

Disk 7 and Disk 9 were unable to mount because no Reiser file system was found on either. However, their situations are very different. Disk 7 (on the motherboard) looks good: a very good SMART report and no disk errors at all. Disk 9 (on the second port multiplier) looks bad, and was the cause of all of the disk errors. Its SMART report is very odd and very incomplete; it indicates the drive doesn't support most of SMART, so there aren't any SMART attributes to examine. The error flags seem to indicate either a bad drive (corrupted firmware) or a malfunction in the controller interface to the drive.

 

I do have one recommendation. I see ata_piix being used, which usually means that in your BIOS SATA settings you have the SATA mode set to IDE emulation. I strongly urge you to change that to AHCI, or anything but an IDE emulation mode. It should be slightly faster and a little safer.
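A quick way to see which SATA driver names show up in a syslog, as a rough sketch: the helper name sata_drivers_in_log and the log path in the usage comment are assumptions for illustration. It only searches for the two driver names mentioned here, ata_piix and ahci.

```shell
# Hypothetical helper: list which of the two SATA driver names appear in a log.
sata_drivers_in_log() {
    # $1: path to a syslog file (the location is an assumption)
    # -o prints only the matched text, -w matches whole words only.
    grep -owE 'ata_piix|ahci' "$1" | sort -u
}

# Usage (path assumed): sata_drivers_in_log /var/log/syslog
```

If only ata_piix is printed, the onboard controller is most likely still in an IDE emulation mode. After switching the BIOS to AHCI and rebooting, ahci would be expected to appear instead.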

Link to comment

I'm glad you say that about Disk 7, because that's the Red NAS drive I really love. It is interesting you say that. I went to run reiserfsck on every drive except parity, and all of them said FAILED TO OPEN FILESYSTEM and recommended I run --rebuild-sb. I think it could be my port multiplier cards. If I replace them with one of the cards on the known-good list, how do I fix this, or do you think it will be fine once I get rid of the port multipliers?

 

 

Note: I forgot to mention earlier that these are the two port multiplier cards I am using: SYBA SY-PCI40037 SATA II (3.0Gb/s) 1:5 (5x1) Port Multiplier PCI Mounting Card.

Link to comment

since sda is usb and sdb is parity

 

reiserfsck --check /dev/sdc.......... all the way up to sdk

Preferred is /dev/md?, optional (can invalidate parity) is /dev/sd?1, NEVER /dev/sd?.

The file system as unraid sets it up resides on the first partition, which would in your case have been /dev/sdc1. I hope you didn't actually run any other reiserfsck commands besides --check.

Link to comment

I just started running --check now. I guess I should have started with /dev/md1. Like I said, it has been running pretty well and I really haven't had to do much like this. unRAID is unlike a lot of what I deal with. Good thing for these forums and people like you. Were there any other errors, like memory errors? Should I get a different port multiplier? I was thinking about this one from the list of working ones: SYBA SI-PEX40071.

Link to comment

I don't see any evidence of memory issues.  The only thing that is suspicious about the second port multiplier is the type of disk errors associated with Disk 9, and it's in no way conclusive.  You might shift drives around, particularly Disk 7 and Disk 9, to see if the problems follow the drive or stay with the port.

 

For various reasons, port multipliers tend to be the last choice for adding SATA ports, mainly because of the inherent bottleneck. The SYBA SI-PEX40071 looks good, but do check the Newegg reviews; it isn't compatible with everything.

 

How is the reiserfsck check of Disk 7 and Disk 9?  And can you get a decent SMART report for Disk 9?

Link to comment

since sda is usb and sdb is parity

 

reiserfsck --check /dev/sdc.......... all the way up to sdk

This would always fail. If you are going to use the raw devices, then you need to use

 

reiserfsck --check /dev/sdc1.......... all the way up to sdk1

 

However, as has been mentioned, this can invalidate parity, so you should put the array into maintenance mode and use something like

 

reiserfsck --check /dev/md1

 

which maintains parity.  I think it MIGHT be possible to run the --check option without putting the array into maintenance mode but I am not sure.
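To make the device naming concrete, here is a small dry-run sketch that only prints the --check commands for the /dev/md devices and does not run anything. The helper name print_md_checks and the disk count passed to it are assumptions; adjust the count to the number of data disks in the array.

```shell
# Hypothetical dry-run helper: print (not run) one read-only check per data disk.
print_md_checks() {
    # $1: number of data disks in the array (an assumption; parity is not included)
    i=1
    while [ "$i" -le "$1" ]; do
        echo "reiserfsck --check /dev/md$i"
        i=$((i + 1))
    done
}

# Usage: print_md_checks 9
```

Printing first makes it easy to confirm that every command targets /dev/mdN, and never /dev/sdX, before anything is run by hand.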

 

 

Link to comment

since sda is usb and sdb is parity

 

reiserfsck --check /dev/sdc.......... all the way up to sdk

And those are the INCORRECT names. In unRAID they will never be the correct names to use for a reiserfsck command.

 

If you had run the --rebuild-sb command on those device names, you would have corrupted the file systems (possibly beyond repair).

Link to comment

Duly noted. I don't know what I was thinking; I will not run reiserfsck like that again. I ran reiserfsck the correct way, and all disks were free of corruption except Disk 7 and Disk 9. No surprises there. The text went a little like this (a picture of the actual message is attached, along with SMART reports):

 

The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive...etc

error.jpg.56d1ef7534835275b830ee40253e296d.jpg

disk7.txt

disk9.txt

Link to comment
  • 2 weeks later...

So I pulled Disk 7 and Disk 9 and ran Seagate SeaTools for DOS on both of them in a different computer. They both passed the drive tests and the SMART tests. What do I do at this point? Should I focus first on the red-balled Disk 7 or the orange, unformatted Disk 9? I know the array was good before both of these drives suddenly went south, so I am not worried about parity being wrong or anything like that. Should I take the red-balled Disk 7 out of the array, start it, stop it, and then re-add Disk 7, and hopefully it will rebuild Disk 9 once Disk 7 is green again? Or should I just wait until I get a new SATA controller? I'm assuming the controller is the cause, since I had never seen any of these errors on boot before and it takes forever to boot now.

 

thanks,

 

 

Link to comment
