Multiple disk errors occurring when copying or otherwise using the system


Georg


Hi all,

I have some weird disk errors that occur when I, for instance, start copying files with Windows Explorer onto a network share or otherwise use the system. It is not clear to me what triggers them. They usually happen on multiple disks at the exact same time. They do not seem to affect the system too badly, but they usually result in a copy error. After a restart the error counts are cleared, as expected, but the errors then reoccur.

image.png.171eea7f9f8a196ff725883541bd9138.png    image.png.f76baf45860d18c76c57602bafa848b3.png

 

 

I am also not sure whether it is connected to reaching a high-water level mark. In those cases I get the following Windows copy error:

image.png.3af8196ae9cc69dee6ab9e3aaa6f5cb5.png

Do you know how to avoid this kind of error? Each time a high-water level is reached, I have to stop and restart the array at least once to be able to start the copying process again. Shouldn't it normally switch to another disk automatically?

 

A third thing that might play a role is that I sometimes see this and am then not able to view the log file:

image.png.76a775a163e689df780732864bf7e433.png

 

image.png.3a17df1b0d9b4b388bdb07457c4d546e.png

 

Those errors have happened several times now, so I have multiple diagnostic files:

atlas-diagnostics-20191128-2305.zip

atlas-diagnostics-20191125-2006.zip

 

Thanks in advance!

 

Regards,

Georg

 

Link to comment

It's affecting multiple disks on different controllers:

Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdj, sector 64
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdk, sector 64
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sde, sector 64
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00

 

This would suggest, for example, a power issue. Do you have another PSU you can use to test?
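For anyone who wants to read the excerpt above more closely, here is a minimal Python sketch (not part of the original diagnostics) that groups the affected devices by SCSI host number and decodes the logged CDB. It assumes the syslog line format shown above; the names `devices_by_host` and `decode_read16` are made up for illustration. Per the standard SCSI tables, opcode 0x88 is READ(16), Sense Key 0x5 is ILLEGAL REQUEST, and ASC 0x20 / ASCQ 0x00 is INVALID COMMAND OPERATION CODE.

```python
import re
from collections import defaultdict

# CDB lines copied from the log excerpt above.
LOG = """\
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
"""

# Matches "sd host:channel:target:lun: [sdX] tag#N CDB: opcode=0x.. <hex bytes>"
CDB_LINE = re.compile(
    r"sd (?P<host>\d+):\d+:\d+:\d+: \[(?P<dev>sd[a-z]+)\] tag#\d+ "
    r"CDB: opcode=0x[0-9a-f]+ (?P<cdb>(?:[0-9a-f]{2} ?)+)"
)

# Small lookup tables with only the values that appear in the excerpt.
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE"}


def decode_read16(cdb_hex):
    """Return (starting LBA, block count) from a 16-byte READ(16) CDB."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 16 or raw[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(raw[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(raw[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


def devices_by_host(log_text):
    """Map each SCSI host number to the sorted device names seen on it."""
    hosts = defaultdict(set)
    for m in CDB_LINE.finditer(log_text):
        hosts[int(m["host"])].add(m["dev"])
    return {host: sorted(devs) for host, devs in hosts.items()}


if __name__ == "__main__":
    print(devices_by_host(LOG))
    first = CDB_LINE.search(LOG)
    print(decode_read16(first["cdb"]))
    print(SENSE_KEYS[0x5], "/", ASC_ASCQ[(0x20, 0x00)])
```

The decoded starting LBA of 64 matches the "sector 64" reported by print_req_error, and the same read is rejected on devices attached to hosts 12 and 13 within the same second.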

Link to comment

First of all thank you for your answer!

 

13 hours ago, johnnie.black said:

This would suggest, for example, a power issue. Do you have another PSU you can use to test?

Well, not easily, but I could use the one from my PC. I think I will first try to update the firmware of the SAS 9300-16i, as that should be done anyway. Do you happen to have any tips for that? How likely is it that I kill the card, given that I have no experience with it at all? ^^ (I would try to follow this guide: https://www.broadcom.com/support/knowledgebase/1211161501344/flashing-firmware-and-bios-on-lsi-sas-hbas) I will probably try it on the weekend.

When switching the PSU, would you suggest a specific test, or should I just use it and see what happens? The one installed is a Corsair HX850i and the one currently in my PC is a Corsair RM750x.

Link to comment
  • 3 months later...
On 12/4/2019 at 8:13 AM, johnnie.black said:

LSI firmware update is very easy, like doing a bios update.

 

Sorry for the long delay...
I have successfully updated the HBA card, thinking at first that this was the solution, but it wasn't... I also tried another PSU, switching drives around, and changing cables. By doing so, I think I could narrow down the error to one SFF-8643 port of the SAS 9300-16i. On that port it is always the first bay/HDD of the four that has the error. I also got the impression that it occurs more often when writing to that disk than when reading from it, but this might be wrong.
Do you think there is a way to troubleshoot that error further?
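
One way to check whether the errors really cluster on a single slot is to tally the "critical target error" lines per SCSI address over a longer stretch of syslog. Below is a rough Python sketch under the assumption that the log lines look like the ones quoted earlier in this thread; `tally_errors` is a made-up helper name, and the sample lines are copied from that excerpt. Device letters such as sdj are not guaranteed to stay the same across reboots, so the tally keeps the host:channel:target:lun address next to the device name.

```python
import re
from collections import Counter

# "sd 13:0:2:0: [sdj] ..." ties a device name to its SCSI address.
ADDR = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")
# "print_req_error: critical target error, dev sdj, sector 64"
ERR = re.compile(r"print_req_error: critical target error, dev (sd[a-z]+), sector \d+")

# Sample lines taken from the excerpt earlier in the thread.
SAMPLE = [
    "Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 Sense Key : 0x5 [current]",
    "Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdj, sector 64",
    "Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 Sense Key : 0x5 [current]",
    "Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdk, sector 64",
]


def tally_errors(lines):
    """Count critical target errors per (SCSI address, device name)."""
    addr_of = {}        # last SCSI address seen for each device name
    counts = Counter()
    for line in lines:
        m = ADDR.search(line)
        if m:
            addr_of[m.group(2)] = m.group(1)
            continue
        m = ERR.search(line)
        if m:
            dev = m.group(1)
            counts[(addr_of.get(dev, "unknown"), dev)] += 1
    return counts


if __name__ == "__main__":
    for (addr, dev), n in sorted(tally_errors(SAMPLE).items()):
        print(addr, dev, n)
```

To run it on a real log, pass an open file instead of SAMPLE, for example the syslog text file from one of the diagnostics archives (the exact path inside the zip is not shown here, so adjust as needed).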
 

Link to comment
