jang430 Posted August 10, 2019
Reads are 22,617,230,667,232 and writes are 22,617,230,667,232. And of course, there are thousands of errors:
Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk9 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk0 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195256
Not sure if it's my HBA controller, which I removed and installed back again. Before that, it was OK. I have no spare HBA. As for the SFF-8087 cables, I'm using 2 cables for 7 drives. It can't be that all drives are failing just because 1 of the cables is defective.
tower-diagnostics-20190810-1324.zip
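One way to read a log like the one above is to group the read errors by sector. The following is only a minimal sketch: the function name `disks_per_sector` and the regular expression are illustrative assumptions, written against the log lines quoted in the post rather than against any documented log format. If many disks report the same sector at the same moment, a shared component (HBA, cable, power) is a more plausible suspect than several disks failing at once.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the lines quoted in the post, e.g.
# "Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195248"
READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def disks_per_sector(log_text):
    """Map each failing sector to the sorted disk numbers that reported it."""
    hits = defaultdict(set)
    for match in READ_ERROR.finditer(log_text):
        disk, sector = int(match.group(1)), int(match.group(2))
        hits[sector].add(disk)
    return {sector: sorted(disks) for sector, disks in hits.items()}


# The log excerpt from the post, copied as-is.
sample = """\
Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk9 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk0 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195256
"""

for sector, disks in disks_per_sector(sample).items():
    print(f"sector {sector}: {len(disks)} disks -> {disks}")
```

For this excerpt, eight disks report sector 39195248 and six report sector 39195256, all within the same second.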
itimpi Posted August 10, 2019
It might be worth checking that the HBA is well seated in the motherboard slot. I once had similar problems with one that was not properly seated.
jang430 (Author) Posted August 10, 2019
BTW, I forgot to mention that it starts out OK, with no errors. I can even access shares; then, after a few minutes, those errors show up. I've already removed the HBA and seated it firmly once again. But OK, I will move it to a different slot this time. Any other suggestions from other members? I hope this is not the end of my HBA. Thanks, itimpi.
jang430 (Author) Posted August 11, 2019
@itimpi, reseating the HBA controller didn't help, but unrolling my SFF-8087 breakout cable did. I'm now able to access my array without problems. It also performed a parity check and found 0 errors. Could both my SFF-8087 cables have failed at the same time? To be honest, I highly doubt it, but I don't have additional cables to troubleshoot with. My cables have been a bit rolled up for a long time, and that never caused trouble before. All I did was remove and reseat my HBA controller. I don't know why this happened.
itimpi Posted August 11, 2019
No idea why unrolling the cables worked, unless they were putting some strain on the connectors so that vibration could make them momentarily break contact. I personally got relatively short cables to reduce clutter. I do know that it is recommended you do NOT try to tidy SATA cables by taping them together, as this tends to increase cross-talk, but I'm not sure how true this really is.
JonathanM Posted August 11, 2019
16 hours ago, itimpi said: I do know that it is recommended you do NOT try and tidy SATA cables by taping them together as this tends to increase cross-talk but not sure how true this really is.
Personally, I think the issue is the extremely poor retention at the connector, and any attempt to bundle the cables is more likely to pull one of the ends out of alignment. The connector must be completely square in all dimensions to make a proper link that will stay connected during normal system vibration.
jang430 (Author) Posted August 14, 2019
So I got a new set of cables, which just arrived today. No mess. I connected the cables, and now I have only 1 drive that says "mounting". Nothing is happening and the array has not started. Please help. Attaching diagnostics. BTW, I've already changed to a different PCIe slot.
tower-diagnostics-20190814-1218.zip
JorgeB Posted August 14, 2019
Lots of errors on parity. It's difficult to say if it's a disk problem, since it's not generating a complete SMART report. Try connecting it to a different cable/controller to rule that out.
jang430 (Author) Posted August 14, 2019
OK. I'm already using a new cable, but I will remove it from the case and try again. Is it possible that it's a PSU problem?
JorgeB Posted August 14, 2019
Could be, but unlikely, since only the parity disk has a problem. It's also suspicious that the SMART report is incomplete.
jang430 (Author) Posted August 14, 2019
So the parity drive has an issue? It's one of the older drives as well.
JorgeB Posted August 14, 2019
24 minutes ago, johnnie.black said: difficult to say if it's a disk problem since it's not generating a complete SMART report,
jang430 (Author) Posted August 14, 2019
Thanks. Will investigate further.
jang430 (Author) Posted August 14, 2019
I've done some further testing. I switched power connectors between drives: still the same problem. I connected the parity drive directly to the motherboard instead of the HBA controller (using a regular SATA cable): still the same problem. I'm attaching a screenshot here. My Disk 3 is also disabled, since I think it died before the parity drive did. Won't the array start without parity? Any suggestions on what to do next?
JorgeB Posted August 14, 2019
Please post current diags, with the parity disk on the MB controller.
jang430 (Author) Posted August 14, 2019
Hi. Here are the new diagnostics, with parity connected directly to the motherboard. I also swapped the PSU connector with another drive's.
tower-diagnostics-20190814-2334.zip
JorgeB Posted August 15, 2019
Now it's clearer: the parity disk is failing.
jang430 (Author) Posted August 15, 2019
So, as you can see, I have disk 3 currently disabled because the drive failed. What steps do I start with to recover as much as possible? :D
JorgeB Posted August 15, 2019
Disk 3 also appears to be failing. The best options, IMHO, are to manually copy everything you can from the old disk3 to a new disk, or to use ddrescue to clone it. When that's done, also get a new disk for parity, do a new config with the new disk3 and the remaining disks, and sync parity.
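The ddrescue option mentioned above is commonly run in two passes, and the sketch below only builds the command lines for that pattern without executing anything. The device paths and the mapfile path are placeholders, not values from this thread, and the helper name `ddrescue_passes` is hypothetical. The flags follow the GNU ddrescue manual as far as I know it: `-f` to allow writing to a device, `-n` to skip the scraping phase, `-d` for direct disc access, `-r3` for three retry passes. Double-check source and target before running anything like this, since the target is overwritten.

```python
import shlex


def ddrescue_passes(source, target, mapfile):
    """Return argv lists for a two-pass GNU ddrescue clone (nothing is run)."""
    base = ["ddrescue", "-f"]  # -f: permit overwriting the output device
    return [
        # Pass 1: copy the easy areas quickly, skipping the slow scraping phase.
        base + ["-n", source, target, mapfile],
        # Pass 2: direct access to the source, retrying bad areas up to 3 times.
        base + ["-d", "-r3", source, target, mapfile],
    ]


# Placeholder paths: replace them with the real old disk, new disk, and a
# mapfile stored on a third, healthy device.
commands = ddrescue_passes("/dev/sdX", "/dev/sdY", "/boot/disk3.map")
for argv in commands:
    print(shlex.join(argv))
```

Reusing the same mapfile for both passes is what lets the second pass resume where the first one stopped instead of starting over.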
jang430 (Author) Posted August 15, 2019
Great plan, @johnnie.black. I hope I can still get some files off disk3.
jang430 (Author) Posted August 18, 2019
While rebuilding parity, I am seeing a lot of errors, though the array is accessible. Is this normal? Could it finally be my Dell Perc H310 that is causing problems? Is there any way to find out? I'm already using new SFF-8087 cables.
tower-diagnostics-20190818-0437.zip
jang430 (Author) Posted August 18, 2019
I interrupted the parity rebuild and restarted the NAS. I don't see any parity now, but the array is working fine and I don't see any errors on the main page. What could this be?
tower-diagnostics-20190818-0538.zip
jang430 (Author) Posted August 18, 2019
I have accessed shows from the array. Without any heavy transfers or access, the array seems to be working fine. I am afraid to do anything heavy right now. I hope you can help point out a possible cause of these issues.
JorgeB Posted August 19, 2019
On 8/18/2019 at 5:38 AM, jang430 said: Could it finally be my Dell Perc H310 that is causing problems?
Looks like it:
Aug 18 09:55:38 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Try it in a different slot, and also make sure it's sufficiently cooled. You're also having problems with the parity disk, and that one is on the onboard SATA ports. It dropped offline, so there's no SMART report, but it looks more like a connection problem.
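To check whether the controller fault quoted above recurs, the syslog can be searched for the same message. This is a rough sketch: the pattern is an assumption modelled on the single line quoted in the post, and `controller_faults` is a hypothetical helper name. Other driver versions may word the message differently.

```python
import re

# Assumed pattern, based only on the one line quoted in the post.
FAULT = re.compile(
    r"(?P<when>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ctrl>mpt\dsas_cm\d+): SAS host is non-operational"
)


def controller_faults(log_text):
    """Return (timestamp, controller) pairs for each HBA fault line found."""
    return [(m.group("when"), m.group("ctrl")) for m in FAULT.finditer(log_text)]


# The line quoted in the post, copied as-is.
line = "Aug 18 09:55:38 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!"
print(controller_faults(line))  # [('Aug 18 09:55:38', 'mpt2sas_cm0')]
```

If the fault shows up again after the card is moved to another slot and given better airflow, that would point more firmly at the HBA itself.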
jang430 (Author) Posted August 19, 2019
The parity disk is brand new, though. It recently passed preclearing.
This topic is now archived and is closed to further replies.