Disk or data cable problem?



Hi everybody,

 

A few months ago I came here asking for help disabling a supposedly damaged disk. I followed all the instructions except the last one: actually removing the disk from the array. I held off because of the summer holidays: I didn't want to rush it, or risk leaving the array partially broken while I was away with no access to the server.

Well, now I'm back. Last weekend a case fan became noisy, so I replaced it and rearranged the fans because the cables were too tight. After that, a new array test ran, and now I see 17 errors on the Parity disk and no errors on the supposedly damaged one.

What's going on here? Afterwards I ran a new extended SMART test on both disks, and now neither of them reports any issues (Disk3 reported issues before).

 

Can it be the cable? Did I accidentally move the cables while rearranging the case fans, so that now it's fine?

 

It's a bit weird that the issue has jumped from one disk to another.

 

This is the array status reporting 17 errors (not parity errors, the array shows as healthy):

Captura_de_pantalla_081619_114653_AM.jpg.e79633dc35d927b965ad00fdf451289c.jpg

 

These are the test results from the Parity drive:

Captura_de_pantalla_081619_114742_AM.jpg.22dd8be904118afc4a7f96fc099f24ce.jpg

 

These are the test results from the Disk3 drive. Here you can notice:

- Last test, completed without error.

- The previous one (#2) completed with a read failure.

- Further down in the error log, Error 19 happened at 42930 hours of age. No other errors since then (2000 hours more).

Captura_de_pantalla_081619_114922_AM.thumb.jpg.5dc435c97c9dd5a525c9db2503dd96c8.jpg

 

I've also attached the diagnostics file from my server in case somebody wants to help me. Thank you in advance.

 

 

galeon-diagnostics-20190816-1010.zip

Edited by almarma
add a link to the previous related post
Just now, johnnie.black said:

Disk3 is OK for now, but is more likely to fail again soon, parity is failing.

Thanks for your answer. But is it 100% sure it's the disk and not the cable?

What do you mean by "Parity is failing"? As far as I know, parity is still OK and no errors were corrected on parity:

Captura_de_pantalla_081619_125808_PM.thumb.jpg.d7f6d38a04493e5174ff9687882d0a41.jpg

On 8/16/2019 at 1:00 PM, johnnie.black said:

Parity disk is failing, and no, it's not a cable.

 

 

How can you be so sure? I'm only asking out of curiosity, to learn. I ran the extended SMART test and it passed, and there's not a single error logged in it.

 

On 8/16/2019 at 1:03 PM, johnnie.black said:

You do appear to have a cable problem on disks 4 and 5, replace cables.

Again, how did you find that out?

 

Thank you in advance

8 minutes ago, almarma said:

How can you be so sure? I'm only asking out of curiosity, to learn. I ran the extended SMART test and it passed, and there's not a single error logged in it.

Quote

Error 1 [0] occurred at disk power-on lifetime: 38981 hours (1624 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 c0 00 00 af 89 d9 88 e0 00  Error: UNC 192 sectors at LBA = 0xaf89d988 = 2945046920

You can see recent UNC @ LBA errors. These are media errors, i.e. bad/failing sectors, and they are confirmed in the log:

 

Quote

Aug 12 21:24:16 GALEON kernel: ata2.00: cmd 25/00:c0:50:d9:89/00:00:af:00:00/e0 tag 15 dma 98304 in
Aug 12 21:24:16 GALEON kernel:         res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)
Aug 12 21:24:16 GALEON kernel: ata2.00: status: { DRDY ERR }
Aug 12 21:24:16 GALEON kernel: ata2.00: error: { UNC }
Aug 12 21:24:16 GALEON kernel: ata2.00: configured for UDMA/133

This again shows UNC errors (media error).
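
As a rough cross-check, the address in the kernel "res" line can be decoded and compared with the LBA in the SMART error entry. The following is only a minimal Python sketch: the field order (command/feature:nsect:lbal:lbam:lbah, then the high-order bytes, then device) is my reading of how libata prints these lines, and decode_taskfile is a made-up helper name, not part of any tool.

```python
import re

# Two lines copied from the syslog excerpt quoted above.
CMD_LINE = ("Aug 12 21:24:16 GALEON kernel: ata2.00: cmd "
            "25/00:c0:50:d9:89/00:00:af:00:00/e0 tag 15 dma 98304 in")
RES_LINE = ("Aug 12 21:24:16 GALEON kernel:         res "
            "51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)")

# "cmd" or "res" followed by 12 hex bytes separated by "/" or ":".
TASKFILE = re.compile(r"\b(cmd|res) ([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})")


def decode_taskfile(line):
    """Hypothetical helper: extract the 48-bit LBA and sector count."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    kind, raw = match.groups()
    # Assumed layout: command/feature:nsect:lbal:lbam:lbah/
    #                 hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    _command, low, high, _device = raw.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":"))
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # For a "res" line the sector field is leftover register state,
    # so only the count from the "cmd" line is meaningful here.
    return {"kind": kind, "lba": lba, "sectors": hob_nsect << 8 | nsect}


cmd = decode_taskfile(CMD_LINE)
res = decode_taskfile(RES_LINE)
print(f"read of {cmd['sectors']} sectors starting at LBA {cmd['lba']:#x}")
print(f"error reported at LBA {res['lba']:#x} = {res['lba']}")
```

Under that assumption, the failing address in the kernel log is the same 0xaf89d988 that the SMART entry reports, which is why the two sources are read as confirming each other.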

 

Now, these errors can sometimes be intermittent, and if the disk passed the extended SMART test it is OK for now, but it's much more likely to fail again in the near future.

 

The other two disks show a few ATA connection/timeout errors in the log; these usually indicate a cable/connection problem.
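
To show how that distinction can be read out of a syslog, here is a small Python sketch that groups the "Emask 0x.. (reason)" text by ATA port. The grouping is only a rule of thumb and an assumption on my part: "media error" is counted as a disk problem, while "timeout", "ATA bus error" and "HSM violation" are counted as connection hints. The ata5 lines in the sample are invented for illustration and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

PORT = re.compile(r"kernel: (ata\d+)")
EMASK = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]+)\)")

MEDIA_REASONS = {"media error"}
# Assumption: these reasons usually point at cabling or the connection.
LINK_REASONS = {"timeout", "ATA bus error", "HSM violation"}

SAMPLE_LOG = [
    # Real lines from the excerpt quoted above.
    "Aug 12 21:24:16 GALEON kernel: ata2.00: cmd 25/00:c0:50:d9:89/00:00:af:00:00/e0 tag 15 dma 98304 in",
    "Aug 12 21:24:16 GALEON kernel:         res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)",
    # Hypothetical lines, made up to show what a timeout entry could look like.
    "Jan  1 00:00:00 EXAMPLE kernel: ata5.00: cmd 25/00:08:00:00:00/00:00:00:00:00/e0 tag 3 dma 4096 in",
    "Jan  1 00:00:00 EXAMPLE kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)",
]


def classify(lines):
    """Count media, link and other errors per ATA port."""
    counts = {}
    port = "unknown"
    for line in lines:
        port_match = PORT.search(line)
        if port_match:
            # The "res" line carries no port, so remember the last one seen.
            port = port_match.group(1)
        emask = EMASK.search(line)
        if emask is None:
            continue
        reason = emask.group(1)
        if reason in MEDIA_REASONS:
            kind = "media"
        elif reason in LINK_REASONS:
            kind = "link"
        else:
            kind = "other"
        counts.setdefault(port, Counter())[kind] += 1
    return counts


for port, kinds in classify(SAMPLE_LOG).items():
    print(port, dict(kinds))
```

A port that only collects "media" entries points at the disk itself, while a port that only collects "link" entries is the one where swapping the cable is worth trying first.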
