[SOLVED] disc error appears


hey everyone.

 

after running my server for weeks without any trouble, i moved it to another position today. now (i don't know if it's a consequence of the move) my syslog has about 3000 lines filled with:

Jan 8 22:51:14 Tower kernel: mdcmd (5913): spindown 2

Jan 8 22:51:14 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5

 

it's unRAID 4.5.6

the disc temperature shows as "0°C", so maybe a cable has lost its connection?

the disc management page shows over 500 errors for this specific disc.

 

can someone translate the error message for me please?

 

thanks

 

 

summary:

tons of errors were caused by loose sata cable connections within my unraid server.

i wasn't aware of sata cables with locking clips. they solved all problems at once.

interesting find: the hdd with the really obvious loose cable connection worked fine.

the problems occurred on 2 hdds that didn't look loosely connected.

scroll down to my last post for further information.

Link to comment

When you browse the main web page, do any of the drives have a RED ball next to them? If so, then a write to the drive failed for whatever reason and is being simulated by the rest of the array and parity.
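
To show what "simulated by the rest of the array and parity" means, here is a minimal Python sketch. It assumes single XOR parity, which is how I understand unRAID's parity disk to work; the three tiny "disks" and their contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks"; a real array works on whole sectors.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# The parity disk holds the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# disk2 drops out (red ball): its contents are recomputed from the
# surviving disks plus parity, without touching disk2 at all.
simulated_disk2 = xor_blocks([disk1, disk3, parity])

print(simulated_disk2 == disk2)
```

The same calculation is what a rebuild onto a replacement drive would perform for every position on the disk, which is why every other drive has to be read without errors while it runs.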

 

It could be because of loose drive cables, either power or data. Or quite possibly the drive failed during/after the move. Sometimes questionable electronics will continue to work but then fail on the next power cycle.

Link to comment

good morning.

 

yes, those are my theories, too. and yes, there is a red light in front of the device.

i did some forum research and it seems to be a cable issue. well, at least it's what you all always suggest in this case.

 

i unplugged and replugged the cables of all drives (some connections are very loose and seem to fall out even without being touched).

i remember having a similar problem some months ago, all because of a loose cable.

 

but today the problem still exists and it looks like i have to reboot and unplug/replug again.

does anyone have good tips for loose cables? is it usually the connector or the cable? any advice?

on my drives all cables sit firmly. but my mainboard and especially my sata controller have very bad connectors. well, maybe it's the cable. i don't know.

 

i really don't want to rebuild my drive once a year just because of a cable connection.

 

 

here is a list of the red error entries in my syslog after a fresh start.

maybe something else comes up:

 

Jan 9 12:04:44 Tower kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU1._OSC] (Node f740e198), AE_ALREADY_EXISTS

Jan 9 12:04:44 Tower kernel: ACPI: Marking method _OSC as Serialized because of AE_ALREADY_EXISTS error

Jan 9 12:04:44 Tower kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU1._PDC] (Node f740e180), AE_ALREADY_EXISTS

Jan 9 12:04:44 Tower kernel: ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error

Jan 9 12:04:44 Tower kernel: processor LNXCPU:00: registered as cooling_device0

Jan 9 12:04:44 Tower kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU2._OSC] (Node f740e270), AE_ALREADY_EXISTS

Jan 9 12:04:44 Tower kernel: ACPI: Marking method _OSC as Serialized because of AE_ALREADY_EXISTS error

Jan 9 12:04:44 Tower kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU2._PDC] (Node f740e258), AE_ALREADY_EXISTS

Jan 9 12:04:44 Tower kernel: ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error

 

 

Jan 9 12:04:44 Tower kernel: i801_smbus 0000:00:1f.3: PCI INT B -> GSI 19 (level, low) -> IRQ 19

Jan 9 12:04:44 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x780100 action 0x6

Jan 9 12:04:44 Tower kernel: ata5.00: irq_stat 0x08000000

Jan 9 12:04:44 Tower kernel: ata5: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/40:00:20:00:00/00:00:00:00:00/40 tag 0 ncq 32768 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:a8:88:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/08:08:a8:88:e0/00:00:e8:00:00/40 tag 1 ncq 4096 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:a8:88:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

 

 

Jan 9 12:04:44 Tower kernel: ata5.00: qc timeout (cmd 0xec)

Jan 9 12:04:44 Tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Jan 9 12:04:44 Tower kernel: ata5.00: revalidation failed (errno=-5)

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

Jan 9 12:04:44 Tower kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jan 9 12:04:44 Tower kernel: ata5.00: configured for UDMA/133

Jan 9 12:04:44 Tower kernel: ata5: EH complete

Jan 9 12:04:44 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x6

Jan 9 12:04:44 Tower kernel: ata5.00: irq_stat 0x08000000

Jan 9 12:04:44 Tower kernel: ata5: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/20:00:a8:87:e0/00:00:e8:00:00/40 tag 0 ncq 16384 in

Jan 9 12:04:44 Tower kernel: res 40/00:04:a8:87:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

 

Jan 9 12:04:44 Tower kernel: usb 5-1: configuration #1 chosen from 1 choice

Jan 9 12:04:44 Tower kernel: ata5.00: qc timeout (cmd 0xec)

Jan 9 12:04:44 Tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Jan 9 12:04:44 Tower kernel: ata5.00: revalidation failed (errno=-5)

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

Jan 9 12:04:44 Tower kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jan 9 12:04:44 Tower kernel: ata5.00: configured for UDMA/133

Jan 9 12:04:44 Tower kernel: ata5: EH complete

Jan 9 12:04:44 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x780100 action 0x6

Jan 9 12:04:44 Tower kernel: ata5.00: irq_stat 0x08000000

Jan 9 12:04:44 Tower kernel: ata5: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/38:00:c8:87:e0/00:00:e8:00:00/40 tag 0 ncq 28672 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:20:87:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/20:08:20:87:e0/00:00:e8:00:00/40 tag 1 ncq 16384 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:20:87:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

Jan 9 12:04:44 Tower kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jan 9 12:04:44 Tower kernel: ata5.00: qc timeout (cmd 0xec)

Jan 9 12:04:44 Tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Jan 9 12:04:44 Tower kernel: ata5.00: revalidation failed (errno=-5)

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

Jan 9 12:04:44 Tower kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jan 9 12:04:44 Tower kernel: ata5.00: configured for UDMA/133

Jan 9 12:04:44 Tower kernel: ata5: EH complete

Jan 9 12:04:44 Tower kernel: ata5: limiting SATA link speed to 1.5 Gbps

Jan 9 12:04:44 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x780100 action 0x6

Jan 9 12:04:44 Tower kernel: ata5.00: irq_stat 0x08000000

Jan 9 12:04:44 Tower kernel: ata5: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/20:00:20:87:e0/00:00:e8:00:00/40 tag 0 ncq 16384 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:c8:87:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

Jan 9 12:04:44 Tower kernel: ata5.00: cmd 60/38:08:c8:87:e0/00:00:e8:00:00/40 tag 1 ncq 28672 in

Jan 9 12:04:44 Tower kernel: res 40/00:0c:c8:87:e0/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)

Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link

Jan 9 12:04:44 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 9 12:04:44 Tower kernel: ata5.00: qc timeout (cmd 0xec)

Jan 9 12:04:44 Tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Jan 9 12:04:44 Tower kernel: ata5.00: revalidation failed (errno=-5)

Jan 9 12:04:44 Tower kernel: ata5: hard resetting link
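
With thousands of lines like these, a per-port tally is easier to read than the raw log. The following Python sketch counts a few of the message types seen above. The function name `summarize_ata_events` and the chosen categories are illustrative assumptions, and the sample lines are copied from the log in this post.

```python
import re
from collections import Counter, defaultdict

# Matches "kernel: ata5: ..." as well as "kernel: ata5.00: ..."
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Message prefixes to count, mapped to a short label (illustrative choice).
CATEGORIES = {
    "exception": "exceptions",
    "failed command": "failed_commands",
    "hard resetting link": "hard_resets",
    "limiting SATA link speed": "speed_downgrades",
}


def summarize_ata_events(lines):
    """Count selected libata messages per ATA port."""
    summary = defaultdict(Counter)
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ataN message (e.g. usb, ACPI)
        port, message = match.groups()
        for prefix, label in CATEGORIES.items():
            if message.startswith(prefix):
                summary[port][label] += 1
                break
    return {port: dict(counts) for port, counts in summary.items()}


sample = [
    "Jan 9 12:04:44 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x3 SErr 0x780100 action 0x6",
    "Jan 9 12:04:44 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED",
    "Jan 9 12:04:44 Tower kernel: ata5: hard resetting link",
    "Jan 9 12:04:44 Tower kernel: ata5: limiting SATA link speed to 1.5 Gbps",
    "Jan 9 12:04:44 Tower kernel: usb 5-1: configuration #1 chosen from 1 choice",
]

print(summarize_ata_events(sample))
```

To run it against a saved copy of the log, pass an open file instead of the sample list, for example `summarize_ata_events(open("syslog.txt"))`. If every count lands on a single port, that points at one drive, cable, or connector rather than the whole controller.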

 

 

 

Link to comment

my rebuild is in progress - 50% done

but it seems my log is full of errors. can someone please check this? is my drive or its connection ok?

there are over 3000 lines of errors for less than 10 minutes of rebuilding.

 

http://zbug.de/syslog.txt

 

 

:(

 

 

thanks

 

CRC errors are typically caused by bad cables (or bad connections on cables), or an electrically noisy power supply. Basically, the communication between the disk and the disk controller is being corrupted. This can happen if a poorly shielded SATA cable runs tightly alongside a noisy power supply line.
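
To check whether a log points in this direction, you can pull the flags out of the `SError: { ... }` lines and count them. This is a rough Python sketch, not an official tool. The set `LINK_FLAGS` is my own assumption about which flags suggest trouble on the wire rather than on the platters, and the sample line is taken from the syslog posted above.

```python
import re
from collections import Counter

# Captures the flag list inside "SError: { ... }"
SERROR = re.compile(r"SError: \{ ([^}]*)\}")

# Assumption: these flags indicate corruption between disk and controller.
LINK_FLAGS = {"BadCRC", "Handshk", "Dispar", "10B8B"}


def count_serror_flags(lines):
    """Count how often each SError flag appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = SERROR.search(line)
        if match:
            counts.update(match.group(1).split())
    return counts


def looks_like_link_problem(counts):
    """True if any flag from LINK_FLAGS was seen at least once."""
    return any(counts[flag] > 0 for flag in LINK_FLAGS)


sample = [
    "Jan 9 12:04:44 Tower kernel: ata5: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }",
    "Jan 9 12:04:44 Tower kernel: ata5.00: status: { DRDY }",
]

flags = count_serror_flags(sample)
print(flags["BadCRC"], looks_like_link_problem(flags))
```

A clean result does not prove the cabling is good; it only means this particular log has no such flags.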

 

Joe L.

Link to comment

That's the reason why I took the time to stop by and ask.

Because (as already mentioned above) I can't translate the syslog.

 

 

Jan 10 18:13:40 Tower kernel: usb 2-1: new low speed USB device using uhci_hcd and address 17

Jan 10 18:13:40 Tower kernel: usb 2-1: device not accepting address 17, error -71

Jan 10 18:13:40 Tower kernel: hub 2-0:1.0: unable to enumerate USB device on port 1

 

And even if I could understand part of an issue, like the current USB problem, and also browse through the forums, there is no way for a non-Linux user to completely understand (and fix!) this without help.

 

Link to comment

Well. you did ask...

 

A visual inspection of a cable will tell you if anything is grossly wrong. (It is unplugged, or missing, or smoking, or charred.)

 

Any other test will need to look for abnormal symptoms.

Some you'll be able to detect from the unRAID management web page (array will not start, disk is disabled, read errors, parity errors).

 

Yet another class of error will only be seen in the system log, as in your example.

As far as knowing what it means: you, me, and a lot of other people in the world have little experience with some of the potential errors. For that, I just take part of the error message and do a Google search.

I searched for:

device not accepting address 17, error -71

 

Guess what, others have run into the same error. Some blame a loaded driver module that is not the best for THEIR hardware.

See here: http://www.linuxquestions.org/questions/fedora-35/usb-devices-no-longer-work-231930/#post1591425

 

Others say the messages are quite normal at boot as the server attempts to identify all the hardware you have in your server. Some of it, obviously, will not exist in your server but is used in other hardware configurations.

 

You might try the same type of search for other syslog messages and see if it helps you. It is not magic... and I know a lot of it is over your head... it is over mine too.
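
One more thing that makes searching easier: the negative numbers in these messages (`ioctl error: -5`, `errno=-5`, `error -71`) are ordinary Linux error numbers with the sign flipped. A small Python sketch can look them up. The function name `explain_kernel_error` is made up for illustration, and the lookup uses the error table of the machine running the script, so run it on a Linux box for meaningful results. On Linux, 5 is EIO (input/output error) and 71 is EPROTO (protocol error).

```python
import errno
import os
import re

# Matches "error: -5", "error -71" and "errno=-5"
ERR_NUMBER = re.compile(r"(?:error:?|errno=)\s*-(\d+)")


def explain_kernel_error(line):
    """Return (number, symbolic name, description) for a log line, or None."""
    match = ERR_NUMBER.search(line)
    if not match:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "unknown")
    return code, name, os.strerror(code)


for line in [
    "md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5",
    "ata5.00: revalidation failed (errno=-5)",
    "usb 2-1: device not accepting address 17, error -71",
]:
    print(explain_kernel_error(line))
```

Searching for the symbolic name together with the driver name (for example "EIO" and "ata") tends to give better hits than the bare number.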

 

Joe L.

Link to comment

I do recall you having a lot of trouble when you were initially setting up your server, and you were using some exotic motherboard at the time.

 

Keep one thing in mind: the battery on older motherboards may not hold its charge well. Once you unplug the power to move the case to another location, your "custom" BIOS setup is gone, and the motherboard will revert and boot with the "default" BIOS configuration once you plug the PSU in again. This may lead to "memory" problems, boot order problems, "USB" problems, etc.

 

So check your BIOS first to see if everything is as it should be (hopefully you kept some notes).

 

Then there is always the possibility of killing something by electrostatic discharge too.

 

Good luck

Link to comment

i had the battery changed 1 year ago, when i started the unraid server.

the bios settings are burned into my brain. i used to live inside my bios before everything was working properly.

 

i had a misunderstanding here. i didn't expect to get linux errors in my unraid syslog. this isn't really obvious for a newcomer. i thought those were specific unraid error codes.

 

while i use google for nearly every piece of information, i wouldn't even have considered putting my unraid error msg into it, when the only source for this information should be this forum/support of limetech ;-)

 

 

the usb problem is gone since i removed the usb extension cable between pc and keyboard.

 

the last open issue is the recovered disk. after rebuilding everything successfully, it continued to throw tons of errors when writing to the drive.

i still think the loose cable is the cause, but i'll let you know when my locking sata cables are delivered.

 

Link to comment

Here I am again.

 

Today I plugged in SATA cables with the locking feature.

[image: 45 cm locking SATA data cable]

 

 

It turned out that most of my problems (different red error msgs in syslog) are gone.

The system boots much quicker.

My transfer rate to all discs has gone up. (I guess it's related to the many lost data packets.)

Discs are much more responsive (even the ones that seemed to work fine before).

 

Therefore, if any of you are not using those cables (and I regret that I didn't know about them earlier), get them.

They cost about 0.80 - 1.20 Euro per cable at the moment.

 

Even if you don't have any issues right now, the price of those little gadgets is nothing compared to the trouble of disc errors, rebuilding, and figuring things out, because SATA cables will work loose from the vibrations of the discs themselves. Sooner or later.

 

 

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
