Georg

Multiple disk errors occurring when copying or otherwise using the system


Hi all,

I have some weird disk errors that occur when, for instance, I start copying files with Windows Explorer onto a network share, or otherwise use the system. It is not clear to me what triggers them. They usually happen on multiple disks at exactly the same time. They do not seem to affect the system too badly, but they usually result in a copy error. After a restart the errors are cleared, of course, but they come back.

image.png.171eea7f9f8a196ff725883541bd9138.png    image.png.f76baf45860d18c76c57602bafa848b3.png

 

 

I am also not sure whether it is connected to reaching a high-water level. In those cases I get the following Windows copy error:

image.png.3af8196ae9cc69dee6ab9e3aaa6f5cb5.png

Do you know how to avoid this kind of error? Each time a high-water level is reached, I have to stop and restart the array at least once before I can start copying again. Shouldn't it normally switch automatically to another disk?

 

A third thing that might play a role is that I sometimes see this and am then unable to view the log file:

image.png.76a775a163e689df780732864bf7e433.png

 

image.png.3a17df1b0d9b4b388bdb07457c4d546e.png

 

These errors have happened several times now, so I have multiple diagnostics files:

atlas-diagnostics-20191128-2305.zip

atlas-diagnostics-20191125-2006.zip

 

Thanks in advance!

 

Regards,

Georg

 


It's affecting multiple disks on different controllers:

Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdj, sector 64
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdk, sector 64
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sde, sector 64
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 Sense Key : 0x5 [current]
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 ASC=0x20 ASCQ=0x0
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00
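To see the spread across controllers at a glance, the log lines above can be grouped by SCSI address. This is only a rough sketch: the first number in `sd 13:0:2:0` is the SCSI host number, which I am assuming maps to one controller instance here, and the helper names `devices_by_host` and `decode_read16` are made up for illustration. Opcode 0x88 is READ(16), so the CDB can be decoded as well to confirm it is the same read of sector 64 failing on every disk.

```python
import re
from collections import defaultdict

# Matches the SCSI address and device name, e.g. "sd 13:0:2:0: [sdj]".
SD_LINE = re.compile(r"sd (\d+):(\d+):(\d+):(\d+): \[(sd[a-z]+)\]")


def devices_by_host(log_text):
    """Group device names by SCSI host number (first field of the address)."""
    hosts = defaultdict(set)
    for match in SD_LINE.finditer(log_text):
        hosts[int(match.group(1))].add(match.group(5))
    return {host: sorted(devs) for host, devs in hosts.items()}


def decode_read16(cdb_hex):
    """Decode LBA and transfer length from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13
    }


# One line per affected disk, copied from the log excerpt above.
sample = """\
Nov 27 23:42:02 Atlas kernel: sd 13:0:2:0: [sdj] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 13:0:3:0: [sdk] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:2:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 27 23:42:02 Atlas kernel: sd 12:0:1:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
"""

print(devices_by_host(sample))
print(decode_read16("88 00 00 00 00 00 00 00 00 40 00 00 04 00 00 00"))
```

Two hosts with two disks each, all failing the identical read in the same second, is what points away from a single bad disk or cable.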

 

This would suggest, for example, a power issue. Do you have another PSU you can use to test?


First of all thank you for your answer!

 

13 hours ago, johnnie.black said:

This would suggest, for example, a power issue. Do you have another PSU you can use to test?

Well, not easily, but I could use the one from my PC. I think I will first try to update the firmware of the SAS 9300-16i, as that should be done anyway. Do you happen to have any tips for that? How likely is it that I kill the card, having no experience with this at all?^^ (I would try to follow this guide: https://www.broadcom.com/support/knowledgebase/1211161501344/flashing-firmware-and-bios-on-lsi-sas-hbas) I will probably try it on the weekend.

When switching the PSU, would you suggest a specific test, or should I just use it and see what happens? The one installed is a Corsair HX850i and the one currently in my PC is a Corsair RM750x.


An LSI firmware update is very easy, like doing a BIOS update.

 

After switching the PSU, just use the server normally and look at the log for errors similar to those above. You can also run a non-correcting parity check, which can probably cause the errors to appear sooner.
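If you would rather not scan the log by eye, something like the following could count the errors per disk. It is a minimal sketch: the pattern only matches the `print_req_error: critical target error` wording quoted earlier in this thread (other kernel versions may word it differently), and `count_target_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern taken from the wording of the log lines quoted in this thread.
ERR = re.compile(r"print_req_error: critical target error, dev (\w+), sector (\d+)")


def count_target_errors(lines):
    """Return a Counter mapping device name -> number of critical target errors."""
    counts = Counter()
    for line in lines:
        match = ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the excerpt earlier in the thread.
sample_lines = [
    "Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdj, sector 64",
    "Nov 27 23:42:02 Atlas kernel: print_req_error: critical target error, dev sdk, sector 64",
]

for dev, n in sorted(count_target_errors(sample_lines).items()):
    print(dev, n)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_target_errors(open("/var/log/syslog"))`; that path is an assumption and may differ on your system. An empty result after a parity check on the other PSU would be a good sign.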
