Multiple disks failing (all Toshiba disks) and multiple disks with errors


abhi.ko


Hello all, I have multiple disks in the array failing and multiple disks with errors, out of the blue. It seems like my controller or cables are causing the issue, but I'm not sure; I checked everything recently when I added some RAM and all looked good. The server is currently shut down since there are now 2 failed disks. The attached diagnostics were taken before the second one failed.

There are multiple disks with errors as well. I replaced one failed disk, and while the parity sync was running I got a log full of errors: multiple disks were reporting errors, one failed at the beginning of the parity sync and the second one towards the end. I recently updated to 6.10 rc2, but the initial issue started while I was on 6.9 stable (referenced here), and the disk now in a failed state is the same one referenced in that thread. I set up a Win 11 VM yesterday, which was added fine and everything was working well, and then this started.

Please help. I have an HA virtual machine that is always running, so my home automation is not working either at the moment.

I am trying to determine the next course of action; all the hardware is pretty new.

tower-diagnostics-20220207-0650.zip

Edited by abhi.ko

I have an LSI 9300 16i controller with 4 SAS cables plugged into it, and it also has a power connection from the PSU. The PSU is a 1000W EVGA unit, which I thought should be plenty of power. Shouldn't it be?

I just reconnected all the cables to the LSI card and re-seated the card. I started the server and see that the two disks are in a disabled (error) state. I have dual parity; can I just rebuild them onto themselves?

Attached is a diagnostics output taken without starting the array; not sure if that helps.

tower-diagnostics-20220207-1833.zip

Edited by abhi.ko
Attached diagnostics

What do you suggest? Should I start the array with the two disabled disks and run diagnostics to post here? The array is mountable, though.

I am just worried about losing data. I keep getting reallocated sector errors, with the counts going up, on all the Toshiba disks in the array. It's weird that it is just those disks, since they are also spread over different (physical) trays in the case. So it is not like one tray/backplane has gone bad, and the other disks are not having the same errors. Have any issues been reported with Toshiba disks, especially with RC2?

Below is a screenshot of all the warnings I got when I turned the server on for a few minutes.

Any advice on next steps, please?

Screenshot_20220208-060916.png

Edited by abhi.ko
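Since the worry in this post is reallocated-sector counts climbing on the Toshiba disks, a small Python sketch (not an Unraid feature) can compare two saved smartctl attribute tables for the same disk, for example from diagnostics taken before and after a power-on. It watches attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) and locates the RAW_VALUE column from the table's header line, so it should handle both the default `smartctl -A` layout and the shorter `-f brief` layout. The function names are invented for illustration.

```python
"""Sketch: compare sector-health counters between two smartctl attribute tables."""

# SMART attribute IDs that usually climb when a disk is (or looks) unhealthy.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for the WATCHED attributes found in `text`."""
    values: dict[int, int] = {}
    raw_idx = None  # column index of RAW_VALUE, read from the table's header line
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["ID#"] and "RAW_VALUE" in fields:
            raw_idx = fields.index("RAW_VALUE")
            continue
        # Data rows start with a numeric ID; skip anything before the header.
        if raw_idx is None or len(fields) <= raw_idx or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[raw_idx]
        if attr_id in WATCHED and raw.isdigit():
            values[attr_id] = int(raw)
    return values


def grown_counters(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Map attribute name -> (old, new) for each WATCHED counter that increased."""
    old = parse_smart_attributes(before)
    new = parse_smart_attributes(after)
    return {
        WATCHED[attr_id]: (old.get(attr_id, 0), value)
        for attr_id, value in new.items()
        if value > old.get(attr_id, 0)
    }
```

For example, a result of `{"Reallocated_Sector_Ct": (8, 24)}` would mean 16 more sectors were remapped between the two snapshots.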

Okay, I will turn it on and listen to see if I hear anything. I have reconnected all the cables, re-seated the LSI card, and made sure all connections are tight.

Question: I have two drives that are disabled in the array. What are the next steps when I turn it on? Do I just unassign them, start the array, stop it again, reassign the same drives and let the parity resync run for both drives at once? Or do I do one drive at a time, or something else? I have dual parity, so if one more drive becomes disabled I will lose data, won't I?

I just rebuilt another old Seagate drive that failed last week; I'm not sure whether that is related. So I'm concerned that one of these drives with reallocated sectors will go bad before the parity sync finishes and cause me to lose data.

Any suggestions for next steps would be very helpful. As I asked, I can turn the array on and run/post diagnostics as a first step and then shut down the server, if you think more information would help and that is safer.

Edited by abhi.ko
2 hours ago, abhi.ko said:

I do not know why it is only those Toshiba drives

I had a few Toshiba drives do the same thing; it was power related (too many drives on one line). So it's possible. Only the Toshiba drives would start reallocating sectors; the WD drives would just fall out of the array. If you're using splitters, try eliminating them or using as few as possible.


Thank you @Michael_P

What do you mean by splitters? Like a SAS-to-SATA cable? I use a Norco 4224 case, which has a SAS backplane with 6 SAS connectors (1 per 4-drive tray). I have 8 SATA ports on my motherboard, which are connected using 2 SAS-to-4-SATA reverse breakout cables, and 16 drives go directly to the LSI 9300 16i card, using SAS connectors similar to this.

Both failed drives are on the SAS cables connected directly to the HBA card. Only one of the SATA-connected disks is showing errors.

Do you mean the reverse breakout cables when you say splitters?

3 hours ago, JorgeB said:

First post the diags after array start like asked, so we can see if the emulated disks are mounting.

Diagnostics attached. I also attached a picture I took of the monitor connected to the server; it seems to relate to the two disabled disks, but I included it in case it gives more info. All other disks mounted fine.

No sounds other than the normal boot-up and fan noises were noticed. Hopefully these diagnostics have enough information.

PXL_20220208_184049356.jpg

tower-diagnostics-20220208-1240.zip

18 hours ago, abhi.ko said:

PSU is a 1000W EVGA PSU, which should be plenty of power I thought, shouldn't it be?

Yes, but what matters is the capacity on the 12V rail.

6 hours ago, abhi.ko said:

Have any issues been reported with Toshiba disks, especially with RC2?

No, I have ~18 Toshiba 6TB disks with no issues.

1 hour ago, abhi.ko said:

What do you mean by splitters?

He means Molex/SATA power splitters. Since you use backplanes, it shouldn't be that issue.

By the way, it looks like a PSU problem.

Edited by Vr2Io
2 hours ago, JorgeB said:

Yep, check the filesystem on both disabled disks; then, if they mount, look for a lost+found folder. If there are a lot of files there, it's probably best to re-sync parity instead of rebuilding.
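To help with the "a lot of files in lost+found" judgement, here is a minimal Python sketch that counts what ended up in a disk's lost+found after the filesystem check. It assumes the array has then been started normally so the emulated disks are mounted at Unraid's usual /mnt/diskN paths; disk3 and disk7 below are hypothetical stand-ins for the two disabled slots, and what counts as "a lot" is still a judgement call.

```python
import os
from pathlib import Path


def count_lost_found(mount_point: str) -> int:
    """Count non-directory entries under <mount_point>/lost+found (0 if it is absent)."""
    lost_found = Path(mount_point) / "lost+found"
    if not lost_found.is_dir():
        return 0
    # os.walk descends into subfolders; `filenames` lists the non-directory entries.
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(lost_found))


if __name__ == "__main__":
    # Hypothetical slots for the two disabled disks; adjust to the real disk numbers.
    for mount in ("/mnt/disk3", "/mnt/disk7"):
        print(mount, count_lost_found(mount))
```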

Thank you @JorgeB, I will do it. Should I do something about the power situation in my case before that?

Based on the other comments here from @Michael_P and @Vr2Io (thank you both): yes, I am using power splitters to connect all 6 backplanes to a single PSU connector (picture attached), which I think might be causing all of this; please correct me if I am wrong. Should I get a different PSU? I currently have this one, which I believe is a single +12V rail PSU with an 83A max output. I have a total of 23 disks including parity and cache (the cache is an SSD), and the majority of the HDDs are 7200RPM. If I should change, do you have any recommendations?

Or should I change how they are powered?

PXL_20220208_203908855_3.thumb.jpg.73378f0308ad37b6cd9d6eb3c81c6c92.jpg

Edited by abhi.ko
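To put rough numbers on the single-rail question in this post, here is a back-of-the-envelope Python sketch. Only the 83 A rail rating and the drive count (23 disks minus the SSD cache, so 22 HDDs) come from the post. The 2.0 A and 2.5 A per-HDD spin-up currents are assumed typical values for 3.5" 7200 RPM drives (check each model's datasheet), the SSD is treated as drawing nothing on 12V, and the 15 A for the CPU, fans and HBA is a placeholder, not a measurement.

```python
def spinup_12v_load(hdd_count: int, amps_per_hdd: float = 2.0, other_12v_amps: float = 0.0) -> float:
    """Estimated peak 12V current (A) if every HDD spins up at the same moment."""
    return hdd_count * amps_per_hdd + other_12v_amps


def rail_headroom(rail_amps: float, load_amps: float) -> float:
    """Remaining 12V capacity in amps; a negative value means the rail is over budget."""
    return rail_amps - load_amps


if __name__ == "__main__":
    RAIL_AMPS = 83.0  # single +12V rail rating quoted in the post
    HDDS = 22  # 23 disks minus the SSD cache
    OTHER_12V = 15.0  # placeholder for CPU, fans, HBA; not measured
    for per_drive in (2.0, 2.5):  # assumed spin-up currents per HDD
        load = spinup_12v_load(HDDS, per_drive, OTHER_12V)
        print(f"{per_drive} A/drive -> {load:.1f} A, headroom {rail_headroom(RAIL_AMPS, load):.1f} A")
```

Under those assumptions even the 2.5 A case stays below 83 A, which would point more toward how the current reaches the backplanes than toward the rail rating itself; if the real per-drive figures are higher, rerun with them.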

Your fully modular PSU has 1 PERIF and 3 SATA power sockets. All four could be used for the backplanes (with modified cables), but that still leaves you two short of the 6 required.

220-G5-1000-X7_XL_5.png

If you know how to crimp Molex power plugs onto a cable, you can DIY it (any mistake could burn everything); otherwise you will need to buy 3 more Molex cables from EVGA or find another PSU that suits your needs.

https://www.moddiy.com/products/DIY-IDE-Molex-Power-EZ-Crimp-Connector-%2d-Black.html

Edited by Vr2Io
1 hour ago, Vr2Io said:

Your fully modular PSU has 1 PERIF and 3 SATA power sockets. All four could be used for the backplanes (with modified cables), but that still leaves you two short of the 6 required.

Thanks, but I am a little confused: this is all still on a single 12V rail regardless of which connector we plug into, right? So how does that distribute the load? Apologies if I am missing something obvious. Is the 83A rating enough to boot up the whole system? I thought that was the problem and that I needed a beefier PSU with more amperage on the single 12V rail.

If so, I have a few of these lying around. Shouldn't they do the trick? I would connect them to the SATA connectors from the PSU and spread the 6 backplanes across the 4 SATA/PERIF connectors. It's not 1:1, but distributing the load that way should help, right? Currently everything is on one connector to the PSU.
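The rail rating limits the total the PSU can deliver, but each cable and connector between the PSU and a backplane carries only the current of the drives behind it, and splitters on one connector push all six backplanes' current through a single cable. A minimal Python sketch of that split, assuming 4 HDDs per backplane (the Norco 4224's 4-drive trays, a slight overestimate for 22 HDDs) and an assumed 2.0 A spin-up per drive; `LIMIT` is a hypothetical per-cable figure, not an EVGA specification, so substitute the real rating.

```python
def per_cable_load(backplanes_per_cable: list[int], hdds_per_backplane: int = 4,
                   amps_per_hdd: float = 2.0) -> list[float]:
    """Estimated 12V spin-up current (A) carried by each PSU cable."""
    return [n * hdds_per_backplane * amps_per_hdd for n in backplanes_per_cable]


def over_limit(loads: list[float], cable_limit_amps: float) -> list[int]:
    """Indices of cables whose estimated load exceeds the assumed per-cable limit."""
    return [i for i, amps in enumerate(loads) if amps > cable_limit_amps]


if __name__ == "__main__":
    LIMIT = 10.0  # hypothetical per-cable/connector limit; use the real rating
    # Current setup: all 6 backplanes on one connector.
    # Proposed: 6 backplanes spread over the 4 SATA/PERIF outputs.
    for layout in ([6], [2, 2, 1, 1]):
        loads = per_cable_load(layout)
        print(layout, loads, "total", sum(loads), "over limit:", over_limit(loads, LIMIT))
```

The total stays at 48 A in both layouts; only the share per cable changes, which is why spreading the backplanes over more PSU outputs can help even though everything is on one 12V rail. If the adapters are SATA-to-Molex, keep in mind that SATA power connectors are commonly rated at 1.5 A per pin with three 12V pins (about 4.5 A), which is less than the assumed spin-up current of a full 4-drive backplane.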
