almarma Posted August 16, 2019

Hi everybody,

A few months ago I came here asking for help to disable a supposedly damaged disk. I followed all the instructions except the last one: actually removing the disk from the array. I held off because of the summer holidays: I didn't want to rush it or risk leaving the array partially broken while I was away with no access to the server.

Now I'm back. Last weekend a case fan became noisy, so I replaced it and rearranged the fans because the cables were too tight. After that, a new array test ran and I now see 17 errors on the Parity disk and no errors on the supposedly damaged one. What's going on here?

Afterwards I ran a new Extended SMART test on both disks, and now neither of them shows any issues (Disk3 reported issues before). Could it be the cable? Did I accidentally move the cables while rearranging the case fans, and now it's fine? It's a bit weird that the issue has jumped from one disk to another.

This is the array status reporting 17 errors (not parity errors, the array shows as healthy):

These are the test results from the Parity drive:

These are the test results from the Disk3 drive. Here you can notice:
- The last test completed without error.
- The previous one (#2) completed with a read failure.
- Down in the error log, Error 19 happened at 42930 hours of age. No other error until now (2000 hours more).

I've also attached the diagnostics file from my server in case somebody wants to help me.

Thank you in advance.

galeon-diagnostics-20190816-1010.zip

Edited August 16, 2019 by almarma: add a link to the previous related post
JorgeB Posted August 16, 2019

Disk3 is OK for now, but it is more likely to fail again soon. Parity is failing.
almarma (Author) Posted August 16, 2019

Just now, johnnie.black said:
Disk3 is OK for now, but is more likely to fail again soon, parity is failing.

Thanks for your answer. But is it 100% certain that it's the disk and not the cable? What do you mean by "Parity is failing"? As far as I know, parity is still OK and no errors were corrected from parity:
JorgeB Posted August 16, 2019

Quote
What do you mean with "Parity is failing"?

The parity disk is failing, and no, it's not a cable.
JorgeB Posted August 16, 2019

You do appear to have a cable problem on disks 4 and 5; replace the cables.
almarma (Author) Posted August 20, 2019

On 8/16/2019 at 1:00 PM, johnnie.black said:
Parit disk is failing, and no, it's not a cable.

How can you be that sure? I'm only asking out of curiosity, to learn. I ran the SMART extended test and it was OK, and there's not a single error logged in it.

On 8/16/2019 at 1:03 PM, johnnie.black said:
You do appear to have a cable problem on disks 4 and 5, replace cables.

Again, how did you find that out?

Thank you in advance
JorgeB Posted August 20, 2019

8 minutes ago, almarma said:
How can you be that sure? I just ask out of curiosity, to learn myself. Because I did the SMART extended test and it was ok and there's not a single error logged into it.

Quote
Error 1 [0] occurred at disk power-on lifetime: 38981 hours (1624 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 c0 00 00 af 89 d9 88 e0 00 Error: UNC 192 sectors at LBA = 0xaf89d988 = 2945046920

You can see recent UNC @ LBA errors. These are media errors, i.e. bad or failing sectors, and they are confirmed in the log:

Quote
Aug 12 21:24:16 GALEON kernel: ata2.00: cmd 25/00:c0:50:d9:89/00:00:af:00:00/e0 tag 15 dma 98304 in
Aug 12 21:24:16 GALEON kernel: res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)
Aug 12 21:24:16 GALEON kernel: ata2.00: status: { DRDY ERR }
Aug 12 21:24:16 GALEON kernel: ata2.00: error: { UNC }
Aug 12 21:24:16 GALEON kernel: ata2.00: configured for UDMA/133

This again shows UNC errors (media error).

These errors can sometimes be intermittent, and if the disk passed the extended SMART test it is OK for now, but it's much more likely to fail again in the near future.

The other two disks show a few ATA connection/timeout errors in the log; these are usually a cable/connection problem.
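For anyone who wants to cross-check the two excerpts above, here is a minimal Python sketch that decodes the failing LBA from both the SMART register row and the kernel `res` line. The function names are made up for illustration, and the field order of the `res` line (status/error:nsect:lbal:lbam:lbah, then the high-order bytes, then device) is my reading of the libata output, so treat it as an assumption rather than a reference.

```python
import re

# Matches the result taskfile in a libata error line, e.g.
#   res 51/40:7f:88:d9:89/00:00:af:00:00/e0 Emask 0x9 (media error)
# Assumed field order: status/error:nsect:lbal:lbam:lbah/
#                      hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_PATTERN = re.compile(
    r"res\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def lba_from_smart_registers(row: str) -> int:
    """Decode the 48-bit LBA from a SMART extended error log register row.

    Column layout taken from the header in the quoted log:
    ER, --, ST, COUNT (2 bytes), LBA_48 + LH LM LL (6 bytes), DV, DC.
    """
    tokens = row.split()
    lba_bytes = tokens[5:11]  # six hex bytes, most significant first
    return int("".join(lba_bytes), 16)


def lba_from_kernel_res(line: str) -> int:
    """Decode the 48-bit LBA from a kernel 'res' line (assumed field order)."""
    match = RES_PATTERN.search(line)
    if match is None:
        raise ValueError("no 'res' taskfile found in line")
    _error, _nsect, lbal, lbam, lbah = match.group(2).split(":")
    _hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = match.group(3).split(":")
    # High-order bytes first, then the low-order bytes.
    return int(hob_lbah + hob_lbam + hob_lbal + lbah + lbam + lbal, 16)


def is_media_error(line: str) -> bool:
    """True if a kernel log line flags an uncorrectable (UNC) media error."""
    return "media error" in line or "{ UNC }" in line
```

Fed the register row and the `res` line quoted above, both decoders should land on the same sector, which is what ties the SMART entry and the kernel message to one bad spot on the disk rather than to the link.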
JorgeB Posted August 20, 2019

Also, make sure it really passed the extended SMART test; sometimes people misinterpret the way the results are logged.
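One way to double-check is to look only at the newest entry in the self-test log. Below is a rough Python sketch, assuming the log text has the usual layout where each entry line starts with `#` and entry `# 1` is the most recent test; the helper names are hypothetical and this is not part of any Unraid or smartmontools tooling.

```python
def selftest_entries(log_text: str) -> list[str]:
    """Return the entry lines of a SMART self-test log, in the order logged.

    Assumption: entry lines start with '#', and the first one is the newest.
    """
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("#")
    ]


def latest_selftest_passed(log_text: str) -> bool:
    """True only if the newest self-test entry reports no error."""
    entries = selftest_entries(log_text)
    if not entries:
        raise ValueError("no self-test entries found")
    # An older passing entry further down the list does not count.
    return "Completed without error" in entries[0]
```

Paste in the self-test section of the SMART report; a result of `False` means the newest logged test did not finish cleanly, even if an older entry further down says it did.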