August 16, 2019

Hi everybody,

A few months ago I came here asking for help to disable a supposedly damaged disk. I followed all the instructions except the last one: actually removing the disk from the array. I held off because of the summer holidays; I didn't want to rush it, or risk leaving the array partially broken while I was away with no access to the server.

Now I'm back. Last weekend a case fan became noisy, so I replaced it and rearranged the fans because the cables were too tight. Afterwards a new array check ran, and now I see 17 errors on the Parity disk and none on the supposedly damaged one. What's going on here?

Since then I ran a new extended SMART test on both disks, and neither shows any issues now (Disk3 reported issues before). Could it be the cable? Did I accidentally move the cables while rearranging the case fans, so that it's fine now? It's a bit weird that the issue has jumped from one disk to another.

This is the array status reporting 17 errors (not parity errors; the array shows as healthy):
[screenshot]

These are the test results from the Parity drive:
[screenshot]

These are the test results from the Disk3 drive:
[screenshot]

Here you can see:
- The last test completed without error.
- The previous one (#2) completed with a read failure.
- Further down in the error log, Error 19 happened at 42930 power-on hours. There have been no other errors since (about 2000 hours later).

I've also attached the diagnostics file from my server in case somebody can help. Thank you in advance.

galeon-diagnostics-20190816-1010.zip

Edited August 16, 2019 by almarma: added a link to the previous related post
August 16, 2019 (Community Expert)

Disk3 is OK for now, but it's more likely to fail again soon. Parity is failing.
August 16, 2019 (Author)

johnnie.black said:
"Disk3 is OK for now, but it's more likely to fail again soon. Parity is failing."

Thanks for your answer. But are you 100% sure it's the disk and not the cable? What do you mean by "parity is failing"? As far as I know, parity is still OK and no errors were corrected on parity:
[screenshot]
August 16, 2019 (Community Expert)

almarma asked:
"What do you mean by 'parity is failing'?"

The parity disk is failing, and no, it's not a cable.
August 16, 2019 (Community Expert)

You do appear to have a cable problem on disks 4 and 5; replace those cables.
August 20, 2019 (Author)

On 8/16/2019 at 1:00 PM, johnnie.black said:
"The parity disk is failing, and no, it's not a cable."

How can you be so sure? I'm just asking out of curiosity, to learn. I ran the extended SMART test, it passed, and there isn't a single error logged.

On 8/16/2019 at 1:03 PM, johnnie.black said:
"You do appear to have a cable problem on disks 4 and 5; replace those cables."

Again, how did you find that out? Thank you in advance.
August 20, 2019 (Community Expert)

almarma said:
"How can you be so sure? I'm just asking out of curiosity, to learn. I ran the extended SMART test, it passed, and there isn't a single error logged."

Quote:

    Error 1 [0] occurred at disk power-on lifetime: 38981 hours (1624 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 c0 00 00 af 89 d9 88 e0 00  Error: UNC 192 sectors at LBA = 0xaf89d988 = 2945046920

You can see recent UNC @ LBA errors. These are media errors, i.e. bad/failing sectors, and they are confirmed in the log:

    Aug 12 21:24:16 GALEON kernel: ata2.00: cmd 25/00:c0:50:d9:89/00:00:af:00:00/e0 tag 15 dma 98304 in
    Aug 12 21:24:16 GALEON kernel: res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)
    Aug 12 21:24:16 GALEON kernel: ata2.00: status: { DRDY ERR }
    Aug 12 21:24:16 GALEON kernel: ata2.00: error: { UNC }
    Aug 12 21:24:16 GALEON kernel: ata2.00: configured for UDMA/133

This again shows UNC (media) errors.

These errors can sometimes be intermittent: if the disk passed the extended SMART test it's OK for now, but it's much more likely to fail again in the near future.

The other two disks (4 and 5) show a few ATA connection/timeout errors in the log; these usually point to a cable/connection problem.
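The register values quoted above can be decoded by hand. As a rough sketch (plain Python, not part of Unraid or smartctl), the snippet below parses the `res` line from the syslog excerpt, assuming libata's print order status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and decodes the `Emask` using the `AC_ERR_*` bit values from the Linux libata header (worth double-checking against your kernel version). It recovers the same LBA 0xaf89d988 = 2945046920, error register 0x40 (UNC) and status 0x51 that appear in the SMART error log. The helper names and the `likely_cause` rule of thumb are hypothetical simplifications of the reasoning in this post (UNC/media error points at the disk surface; bus/timeout errors point at cabling), not a diagnostic tool.

```python
import re

# ATA status register bits.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x01: "ERR"}
# ATA error register bits; ICRC = interface CRC error, UNC = uncorrectable data.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT",
              0x01: "AMNF"}
# Subset of libata's AC_ERR_* completion flags, printed as "Emask 0x..".
EMASK_BITS = {0x01: "device error", 0x02: "HSM violation", 0x04: "timeout",
              0x08: "media error", 0x10: "ATA bus error",
              0x20: "host bus error", 0x40: "system error"}

# The 12 taskfile bytes after "res", separated by '/' and ':'.
RES_RE = re.compile(r"\bres ([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})")
EMASK_RE = re.compile(r"Emask 0x([0-9a-f]+)")

# The "res" line from the syslog excerpt in the post.
EXAMPLE_RES = ("Aug 12 21:24:16 GALEON kernel: "
               "res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)")


def flags(value, table):
    """Names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(line):
    """Decode status, error and 48-bit LBA from a libata 'res' dump."""
    m = RES_RE.search(line)
    if m is None:
        raise ValueError("no libata 'res' taskfile found")
    # Byte order: status/error:nsect:lbal:lbam:lbah/
    #             hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    b = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    lba = (b[10] << 40 | b[9] << 32 | b[8] << 24
           | b[5] << 16 | b[4] << 8 | b[3])
    return {"status": flags(b[0], STATUS_BITS),
            "error": flags(b[1], ERROR_BITS),
            "lba": lba}


def decode_emask(line):
    """Names of the libata completion-error flags in 'Emask 0x..'."""
    m = EMASK_RE.search(line)
    return flags(int(m.group(1), 16), EMASK_BITS) if m else []


def likely_cause(error_flags, emask_flags):
    """Hypothetical rule of thumb only, not a diagnosis."""
    if "UNC" in error_flags or "media error" in emask_flags:
        return "media: bad/failing sectors on the disk"
    if ("ICRC" in error_flags or "ATA bus error" in emask_flags
            or "timeout" in emask_flags):
        return "link: check cable/connection"
    return "unclear"


if __name__ == "__main__":
    info = decode_res(EXAMPLE_RES)
    emask = decode_emask(EXAMPLE_RES)
    # ['DRDY', 'DSC', 'ERR'] ['UNC'] 0xaf89d988 2945046920
    print(info["status"], info["error"], hex(info["lba"]), info["lba"])
    # ['device error', 'media error'] -> media: bad/failing sectors on the disk
    print(emask, "->", likely_cause(info["error"], emask))
```

The decoded status also includes DSC (0x10), which the kernel's own `status: { DRDY ERR }` summary line does not list; the raw byte 0x51 is the same one shown in the ST column of the SMART log.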
August 20, 2019 (Community Expert)

Also, make sure it really passed the extended SMART test; people sometimes misinterpret the way the results are logged.
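One way to check the result mechanically, assuming the usual ATA layout of `smartctl -l selftest` output in which entry `# 1` is the most recent test and columns are separated by two or more spaces: the hypothetical helpers below parse the self-test table and report whether the newest extended test completed without error. Reading the table in the wrong order is one easy mistake to make, since an older failure listed as `# 2` sits right below a newer pass. The sample text is made up for illustration and is not taken from these disks.

```python
import re

# Made-up sample in the ATA `smartctl -l selftest` layout (not from these disks).
SAMPLE = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1250         -
# 2  Extended offline    Completed: read failure       90%      1100         123456
# 3  Short offline       Completed without error       00%      1000         -
"""

# "# N", description, status, remaining %, lifetime hours, first-error LBA.
# Assumption: columns are separated by two or more spaces.
ROW_RE = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(text):
    """Return self-test entries sorted newest first (# 1 is the most recent)."""
    entries = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # header or unrelated line
        num, test, status, remaining, hours, lba = m.groups()
        entries.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_pct": int(remaining),
            "hours": int(hours),
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return sorted(entries, key=lambda e: e["num"])


def latest_extended_passed(entries):
    """Expects parse_selftest_log() output (newest first).

    True/False for the newest extended test, None if none is logged.
    """
    for e in entries:
        if e["test"].startswith("Extended"):
            return e["status"] == "Completed without error"
    return None


if __name__ == "__main__":
    log = parse_selftest_log(SAMPLE)
    for e in log:
        print(e["num"], e["test"], "|", e["status"], "| at", e["hours"], "h")
    print("newest extended test passed:", latest_extended_passed(log))
```

On real output, pipe the text of `smartctl -l selftest /dev/sdX` into `parse_selftest_log`; the regex is tuned to the ATA table only, so SCSI/SAS or NVMe self-test logs would need a different parser.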