clowncracker Posted September 12, 2024 Posted September 12, 2024 (edited) This is a followup to this thread: I'm still having issues with my server. After rebuilding disk 5, I opted to replace disk 8 because it had 96 reallocated sectors. I ran a parity check after rebuilding disk 5 without issue. As soon as I replaced disk 8, I started running into problems: read errors and rising UDMA CRC error counts on multiple drives simultaneously. I thought the issue might be the SAS card or cables, since the log was being spammed with the following: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) I replaced the SAS card and cables and I'm still running into issues. I just have no idea what to do at this point. Here are my diagnostics, taken after I replaced the cables/SAS card. I've also included my syslog from right after replacing drive 8, before I tried replacing the cables/SAS card. Edited September 21, 2024 by clowncracker removed diagnostic Quote
JorgeB Posted September 13, 2024 Posted September 13, 2024 Looks more like a power/connection issue, did you also check/replace the power cables? Quote
clowncracker Posted September 13, 2024 Author Posted September 13, 2024 (edited) On 9/13/2024 at 1:18 AM, JorgeB said: Looks more like a power/connection issue, did you also check/replace the power cables? I just tried replacing the power supply entirely and I'm still running into issues. Still seeing this in my log a lot and more read errors: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) Edited September 21, 2024 by clowncracker removed diagnostic Quote
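Editor's sketch: the repeated log line already prints its decoded fields, and those fields can be recovered from the hex word itself. The snippet below is a minimal illustration, assuming the bit layout the Linux mpt2sas/mpt3sas driver uses for log_info (bus type in bits 28-31, originator in bits 24-27, code in bits 16-23, sub-code in bits 0-15). The function names are made up for this example, and it does not say what code 0x08 means, only how the word splits and how often it appears.

```python
import re
from collections import Counter

# Originator names as printed by the driver (assumed mapping: 0=IOP, 1=PL, 2=IR).
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}

# Matches the hex word inside "log_info(0x31080000)".
LOG_INFO_RE = re.compile(r"log_info\((0x[0-9a-fA-F]+)\)")


def decode_log_info(value: int) -> dict:
    """Split a 32-bit log_info word into its bit fields (assumed layout)."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, f"unknown({originator:#x})"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def count_log_info(lines) -> Counter:
    """Count how many times each distinct log_info word appears in syslog lines."""
    counts = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)",
        "mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)",
    ]
    for word, hits in count_log_info(sample).items():
        print(f"{word:#010x} x{hits}: {decode_log_info(word)}")
```

Counting the lines before and after a hardware swap gives a rough way to tell whether the change made any difference to how often the controller logs the event.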
JorgeB Posted September 14, 2024 Posted September 14, 2024 Do you have a different controller you could test with? Could be a controller problem, or it doesn't like one or more of the disks. Quote
clowncracker Posted September 14, 2024 Author Posted September 14, 2024 (edited) 6 hours ago, JorgeB said: Do you have a different controller you could test with? Could be a controller problem, or it doesn't like one or more of the disks. This is the controller I'm using. I replaced it with an exact copy and swapped out the cables and I'm still having issues. Of note, I was using the old controller for years without issue. https://www.amazon.com/LSI-Logic-9207-8i-Controller-LSI00301/dp/B0085FT2JC Edited September 14, 2024 by clowncracker Quote
JorgeB Posted September 14, 2024 Posted September 14, 2024 That's fine as long as it's genuine, though it does look too cheap if it's new. Quote
clowncracker Posted September 14, 2024 Author Posted September 14, 2024 I could buy another one to test it, but looks genuine to me. It looks identical to the old one, same logo and the PCBs look the same. Unless I was using a counterfeit one for years without issue. https://www.ebay.com/itm/165949664051?_skw=9207-8i+it+mode&itmmeta=01J7RFKPNBDMB2EA8T23GDJE0X&hash=item26a35eeb33:g:GNQAAOSwkxFkwCjz Quote
JonathanM Posted September 14, 2024 Posted September 14, 2024 Counterfeit electronics often work fine, it's just that the QC is non-existent, and since they don't have the manufacturer's backing there is no real warranty. The people making them don't care how many work or not, as long as some work for a while. Do you have forced ventilation over the heatsink? Those cards get rather warm, and marginal chips can fail when they get too hot. Quote
clowncracker Posted September 14, 2024 Author Posted September 14, 2024 (edited) I didn't with that original card, but with the new card I added some because I read that can cause failures. Edited September 21, 2024 by clowncracker Quote
clowncracker Posted September 14, 2024 Author Posted September 14, 2024 (edited) I guess I'll bite the bullet and buy one from the art of the server. All of the drives having issues are stemming from the SAS card (old and new). After replacing the SAS cables and PSU, at this point it can only be the card or the motherboard is failing (I've tried both PCIe lanes). The motherboard dying seems unlikely, it's less than a year old at this point. Edited September 14, 2024 by clowncracker Quote
clowncracker Posted September 16, 2024 Author Posted September 16, 2024 (edited) On 9/14/2024 at 7:36 AM, JorgeB said: That's fine as long as it's genuine, though it does look too cheap if it's new. The new card came today, but the last time I started the system it caused a device to be disabled. At this point I'm afraid to start the array since I already have two drives that are now disabled. Is there an issue with me using a backup of my USB file to clear out the errors before I start the array? I'm afraid there is going to be data loss, even though I haven't actually tried to write anything to the array. I've attached diagnostics just in case. Is there anything that would indicate that the new SAS card wouldn't solve the problem? I did have a lot of UDMA CRC error counts when I turned the system on (not started the array), but I don't know if that's just leftover from the last SAS card. Edited September 21, 2024 by clowncracker removed diagnostic Quote
JorgeB Posted September 16, 2024 Posted September 16, 2024 If no data was written to the disabled disks once they got disabled, you could do a new config, trust parity and then run a correcting parity check instead, it would be safer for the data. I don't see any new disk errors for now, so the UDMA CRC errors are likely old, but monitor them to see whether they keep increasing; if they do, there's still a problem. Quote
clowncracker Posted September 16, 2024 Author Posted September 16, 2024 (edited) 13 minutes ago, JorgeB said: If no data was written to the disabled disks once they got disabled, you could do a new config, trust parity and then run a correcting parity check instead, it would be safer for the data. I don't see any new disk errors for now, so the UDMA CRC errors are likely old, but monitor them to see whether they keep increasing; if they do, there's still a problem. Disk 8 is emulated because it's new, but disk 10 had read/write errors from starting the array and trying to rebuild disk 8. Would a new config still be the best option? I still have the old disk 8 if that helps. The disk errors would have been in syslog-previous.txt. I restarted the server after the UDMA CRC errors came up (both boots with the new SAS card) just to see if I would get any more errors. Edited September 16, 2024 by clowncracker Quote
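Editor's sketch: the "monitor whether they keep increasing" advice amounts to comparing two snapshots of a drive's SMART attribute table. The snippet below is one way to do that in Python, assuming the attribute table was saved as text from `smartctl -A` at two different times. The sample tables are invented for illustration and are not taken from the diagnostics in this thread; the function names are hypothetical.

```python
# Attributes of interest, named as smartctl prints them.
WATCHED = ("Reallocated_Sector_Ct", "UDMA_CRC_Error_Count")

# Invented sample snapshots (NOT from the thread's diagnostics).
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       96
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       96
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       15
"""


def raw_values(smart_text: str, wanted=WATCHED) -> dict:
    """Extract the RAW_VALUE column for the watched attributes."""
    values = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10+ columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in wanted:
            if fields[9].isdigit():
                values[fields[1]] = int(fields[9])
    return values


def increased(before: dict, after: dict) -> dict:
    """Return how much each attribute grew between the two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    growth = increased(raw_values(BEFORE), raw_values(AFTER))
    if growth:
        print("Still increasing:", growth)
    else:
        print("No increase between snapshots.")
```

A CRC count that stays flat across a rebuild or parity check suggests the earlier errors were historical; a count that keeps climbing points to a connection problem that is still present.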
JorgeB Posted September 16, 2024 Posted September 16, 2024 1 hour ago, clowncracker said: Would a new config still be the best option? I still have the old disk 8 if that helps. Disk10 looks OK, so it should still be a good option, but you'd need to also use the old disk8, assuming that one is also OK. Quote
clowncracker Posted September 16, 2024 Author Posted September 16, 2024 (edited) I'm just concerned because the old disk 8 had 96 reallocated sectors. I've never used a new config before and, I'm not going to lie, the warning on this page scares me a bit. Neither of the parity drives is on the SAS controller, so it makes me feel like doing a new config isn't the best option. What are the benefits of building a new config vs using a USB backup from before all of these issues started vs just trying to rebuild disk 8/10? Ideally I want to go with the option that will protect the data, but also something that will verify the new SAS card will actually solve my problems. Edited September 21, 2024 by clowncracker Quote
JorgeB Posted September 17, 2024 Posted September 17, 2024 If the old disk8 is suspect, before attempting a new config, unassign disks 8 and 10 and post new diags after array start. Quote
clowncracker Posted September 17, 2024 Author Posted September 17, 2024 (edited) On 9/17/2024 at 12:47 AM, JorgeB said: If the old disk8 is suspect, before attempting a new config, unassign disks 8 and 10 and post new diags after array start. I installed the old disk 8 and unassigned them from the array. Here is the new config: Edited September 21, 2024 by clowncracker removed diagnostic Quote
Solution JorgeB Posted September 17, 2024 Solution Posted September 17, 2024 I assume this is old disk8? Device Model: TOSHIBA HDWE150 Serial Number: 76J7KC8JF57D If yes, there are a few reallocated sectors, so would recommend running an extended SMART test. Both emulated disks are mounting, assuming contents look correct, you can rebuild, also no controller errors so far, but any issues are more likely to appear under load, like during a rebuild. Keep old disk8 intact for now, and rebuild to the new disk, assuming there's no other spare, disk10 would need to be rebuilt on top, so you could try to rebuild only disk8 for now to see how it goes, keeping old disk10 intact as well. Quote
clowncracker Posted September 19, 2024 Author Posted September 19, 2024 (edited) On 9/17/2024 at 9:00 AM, JorgeB said: I assume this is old disk8? Device Model: TOSHIBA HDWE150 Serial Number: 76J7KC8JF57D If yes, there are a few reallocated sectors, so would recommend running an extended SMART test. Both emulated disks are mounting, assuming contents look correct, you can rebuild, also no controller errors so far, but any issues are more likely to appear under load, like during a rebuild. Keep old disk8 intact for now, and rebuild to the new disk, assuming there's no other spare, disk10 would need to be rebuilt on top, so you could try to rebuild only disk8 for now to see how it goes, keeping old disk10 intact as well. I ended up rebuilding over disks 8 & 10 without issue and my server is back online and operational. I appreciate all of the help. I guess my final question is: do you think the airflow for the LSI card is good, or should I add another NF-A4x20 over the heatsink? Edited September 19, 2024 by clowncracker Quote
JonathanM Posted September 20, 2024 Posted September 20, 2024 1 hour ago, clowncracker said: do you think the airflow for the LSI card is good Do you have any way to check the temperature of the heatsink in the final configuration, i.e. case closed? Quote
clowncracker Posted September 20, 2024 Author Posted September 20, 2024 I do not, maybe I should just buy the fan to be safe. I'd rather spend $15 now than have the card fail prematurely in the future. Quote
clowncracker Posted September 21, 2024 Author Posted September 21, 2024 I'm hoping this is good enough. Quote