Lignumaqua Posted February 15

System went down this evening with, eventually, a disabled parity disk. I don't have the full log, but I would appreciate any advice on how to interpret the section below, which I was able to capture. Was this actually an HDD failure, or could it have been a failure of something else? (Shut down the system and am now rebuilding parity back to the same disk. No SMART errors reported.)

Feb 14 21:53:17 Tower kernel: sd 5:0:3:0: attempting task abort!scmd(0x000000004a167f29), outstanding for 15259 ms & timeout 15000 ms
Feb 14 21:53:17 Tower kernel: sd 5:0:3:0: [sde] tag#245 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Feb 14 21:53:17 Tower kernel: scsi target5:0:3: handle(0x000c), sas_address(0x4433221102000000), phy(2)
Feb 14 21:53:17 Tower kernel: scsi target5:0:3: enclosure logical id(0x500605b001048b70), slot(1)
Feb 14 21:53:48 Tower kernel: mpt2sas_cm0: In func: mpt3sas_scsih_issue_tm
Feb 14 21:53:48 Tower kernel: mpt2sas_cm0: Command Timeout
Feb 14 21:53:48 Tower kernel: mf:
Feb 14 21:53:48 Tower kernel: #011
Feb 14 21:53:48 Tower kernel: 0100000c
Feb 14 21:53:48 Tower kernel: 00000100
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel:
Feb 14 21:53:48 Tower kernel: #011
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 00000000
Feb 14 21:53:48 Tower kernel: 000000f6
Feb 14 21:53:48 Tower kernel:
Feb 14 21:53:58 Tower kernel: mpt2sas_cm0: sending diag reset !!
Feb 14 21:53:59 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
Feb 14 21:53:59 Tower kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
Feb 14 21:54:14 Tower kernel: mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(3428)
Feb 14 21:54:14 Tower kernel: mpt2sas_cm0: _config_request: command timeout
Feb 14 21:54:14 Tower kernel: mpt2sas_cm0: Command Timeout
Feb 14 21:54:14 Tower kernel: mf:
Feb 14 21:54:14 Tower kernel: #011
Feb 14 21:54:14 Tower kernel: 04000000
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel: 09000000
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel: d3000000
Feb 14 21:54:14 Tower kernel:
Feb 14 21:54:14 Tower kernel: #011
Feb 14 21:54:14 Tower kernel: ffffffff
Feb 14 21:54:14 Tower kernel: ffffffff
Feb 14 21:54:14 Tower kernel: 00000000
Feb 14 21:54:14 Tower kernel:
Feb 14 21:54:14 Tower kernel: mpt2sas_cm0: mpt3sas_base_hard_reset_handler: FAILED
Feb 14 21:54:14 Tower kernel: sd 5:0:3:0: task abort: FAILED scmd(0x000000004a167f29)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: attempting task abort!scmd(0x00000000bf95fe9f), outstanding for 72192 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: [sdb] tag#160 CDB: opcode=0x88 88 00 00 00 00 03 48 e6 4f a8 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: handle(0x000b), sas_address(0x4433221103000000), phy(3)
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: enclosure logical id(0x500605b001048b70), slot(0)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: No reference found at driver, assuming scmd(0x00000000bf95fe9f) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x00000000bf95fe9f)
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: attempting task abort!scmd(0x00000000b17000f0), outstanding for 72192 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: [sdc] tag#161 CDB: opcode=0x88 88 00 00 00 00 03 48 e6 4f a8 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:1: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Feb 14 21:54:14 Tower kernel: scsi target5:0:1: enclosure logical id(0x500605b001048b70), slot(3)
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: No reference found at driver, assuming scmd(0x00000000b17000f0) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: task abort: SUCCESS scmd(0x00000000b17000f0)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: attempting task abort!scmd(0x00000000f8eb2847), outstanding for 70142 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: [sdd] tag#247 CDB: opcode=0x88 88 00 00 00 00 02 ef c6 e7 18 00 00 01 00 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: enclosure logical id(0x500605b001048b70), slot(2)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: No reference found at driver, assuming scmd(0x00000000f8eb2847) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: task abort: SUCCESS scmd(0x00000000f8eb2847)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: attempting task abort!scmd(0x00000000bc8dfafc), outstanding for 70142 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: [sdd] tag#246 CDB: opcode=0x88 88 00 00 00 00 02 ef c6 e6 18 00 00 01 00 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: enclosure logical id(0x500605b001048b70), slot(2)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: No reference found at driver, assuming scmd(0x00000000bc8dfafc) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: task abort: SUCCESS scmd(0x00000000bc8dfafc)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: attempting task abort!scmd(0x0000000010fc6b1f), outstanding for 70142 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: [sdd] tag#162 CDB: opcode=0x88 88 00 00 00 00 02 ef c6 e5 18 00 00 01 00 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Feb 14 21:54:14 Tower kernel: scsi target5:0:2: enclosure logical id(0x500605b001048b70), slot(2)
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: No reference found at driver, assuming scmd(0x0000000010fc6b1f) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:2:0: task abort: SUCCESS scmd(0x0000000010fc6b1f)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: attempting task abort!scmd(0x000000009d34caf0), outstanding for 61718 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: [sdb] tag#163 CDB: opcode=0x88 88 00 00 00 00 00 00 01 c1 98 00 00 00 08 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: handle(0x000b), sas_address(0x4433221103000000), phy(3)
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: enclosure logical id(0x500605b001048b70), slot(0)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: No reference found at driver, assuming scmd(0x000000009d34caf0) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x000000009d34caf0)
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: attempting task abort!scmd(0x00000000491b1367), outstanding for 61718 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: [sdc] tag#164 CDB: opcode=0x88 88 00 00 00 00 00 00 01 c1 98 00 00 00 08 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:1: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Feb 14 21:54:14 Tower kernel: scsi target5:0:1: enclosure logical id(0x500605b001048b70), slot(3)
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: No reference found at driver, assuming scmd(0x00000000491b1367) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: task abort: SUCCESS scmd(0x00000000491b1367)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: attempting task abort!scmd(0x00000000868fe54a), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: [sdg] tag#170 CDB: opcode=0x88 88 00 00 00 00 03 2f 11 4d 68 00 00 00 08 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: handle(0x000e), sas_address(0x4433221107000000), phy(7)
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: enclosure logical id(0x500605b001048b70), slot(4)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: No reference found at driver, assuming scmd(0x00000000868fe54a) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: task abort: SUCCESS scmd(0x00000000868fe54a)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: attempting task abort!scmd(0x000000002aa71d1e), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: [sdg] tag#169 CDB: opcode=0x88 88 00 00 00 00 01 d1 fb c5 c0 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: handle(0x000e), sas_address(0x4433221107000000), phy(7)
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: enclosure logical id(0x500605b001048b70), slot(4)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: No reference found at driver, assuming scmd(0x000000002aa71d1e) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: task abort: SUCCESS scmd(0x000000002aa71d1e)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: attempting task abort!scmd(0x000000008cf2aa2f), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: [sdg] tag#168 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 c0 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: handle(0x000e), sas_address(0x4433221107000000), phy(7)
Feb 14 21:54:14 Tower kernel: scsi target5:0:5: enclosure logical id(0x500605b001048b70), slot(4)
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: No reference found at driver, assuming scmd(0x000000008cf2aa2f) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:5:0: task abort: SUCCESS scmd(0x000000008cf2aa2f)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: attempting task abort!scmd(0x0000000049c7fb2c), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: [sdb] tag#167 CDB: opcode=0x88 88 00 00 00 00 03 2f 11 4d 68 00 00 00 08 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: handle(0x000b), sas_address(0x4433221103000000), phy(3)
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: enclosure logical id(0x500605b001048b70), slot(0)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: No reference found at driver, assuming scmd(0x0000000049c7fb2c) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x0000000049c7fb2c)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: attempting task abort!scmd(0x000000008ffdff4a), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: [sdb] tag#166 CDB: opcode=0x88 88 00 00 00 00 01 d1 fb c5 c0 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: handle(0x000b), sas_address(0x4433221103000000), phy(3)
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: enclosure logical id(0x500605b001048b70), slot(0)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: No reference found at driver, assuming scmd(0x000000008ffdff4a) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x000000008ffdff4a)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: attempting task abort!scmd(0x00000000025932ab), outstanding for 50176 ms & timeout 30000 ms
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: [sdb] tag#165 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 c0 00 00 00 20 00 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: handle(0x000b), sas_address(0x4433221103000000), phy(3)
Feb 14 21:54:14 Tower kernel: scsi target5:0:0: enclosure logical id(0x500605b001048b70), slot(0)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: No reference found at driver, assuming scmd(0x00000000025932ab) might have completed
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x00000000025932ab)
Feb 14 21:54:14 Tower kernel: sd 5:0:3:0: attempting device reset! scmd(0x000000004a167f29)
Feb 14 21:54:14 Tower kernel: sd 5:0:3:0: [sde] tag#245 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Feb 14 21:54:14 Tower kernel: scsi target5:0:3: handle(0x000c), sas_address(0x4433221102000000), phy(2)
Feb 14 21:54:14 Tower kernel: scsi target5:0:3: enclosure logical id(0x500605b001048b70), slot(1)
Feb 14 21:54:44 Tower kernel: mpt2sas_cm0: In func: mpt3sas_scsih_issue_tm
Feb 14 21:54:44 Tower kernel: mpt2sas_cm0: Command Timeout
JorgeB Posted February 15

Full diags would be better, but it's not logged as a disk problem.
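For anyone who wants to tally a capture like the one above rather than read it line by line, here is a rough Python sketch. It only counts `task abort` outcomes per SCSI address and collects controller-level timeout or failure messages. The `SAMPLE` text is a handful of lines copied from the posted log, and the function name `summarize` and both regular expressions are my own assumptions about the message layout, not anything provided by the mpt3sas driver or by Unraid.

```python
import re
from collections import Counter

# A few lines copied from the posted log (abbreviated sample, not the full capture).
SAMPLE = """\
Feb 14 21:53:17 Tower kernel: sd 5:0:3:0: attempting task abort!scmd(0x000000004a167f29), outstanding for 15259 ms & timeout 15000 ms
Feb 14 21:54:14 Tower kernel: mpt2sas_cm0: mpt3sas_base_hard_reset_handler: FAILED
Feb 14 21:54:14 Tower kernel: sd 5:0:3:0: task abort: FAILED scmd(0x000000004a167f29)
Feb 14 21:54:14 Tower kernel: sd 5:0:0:0: task abort: SUCCESS scmd(0x00000000bf95fe9f)
Feb 14 21:54:14 Tower kernel: sd 5:0:1:0: task abort: SUCCESS scmd(0x00000000b17000f0)
"""

# Assumed patterns, written to match the lines as they appear in this thread.
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): task abort: (SUCCESS|FAILED)")
HBA_RE = re.compile(r"(mpt\dsas_cm\d+): (.*(?:Command Timeout|FAILED).*)")


def summarize(log_text):
    """Return (abort outcome counts per device, controller-level messages)."""
    aborts = Counter()
    hba_events = []
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            aborts[(match.group(1), match.group(2))] += 1
            continue
        match = HBA_RE.search(line)
        if match:
            hba_events.append(match.group(2))
    return aborts, hba_events


if __name__ == "__main__":
    aborts, hba_events = summarize(SAMPLE)
    for (device, outcome), count in sorted(aborts.items()):
        print(device, outcome, count)
    for event in hba_events:
        print("controller:", event)
```

On the sample lines, the only failed abort is for 5:0:3:0, and it sits next to a failed hard reset of the controller itself. Several drives timing out together alongside controller-level failures is the kind of pattern that points away from a single disk.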
Lignumaqua (Author) Posted February 15

Thanks. I suspect it was a problem with the LSI card, but a rather more serious one than the similar spin-down-related messages that have been posted a number of times on this forum, because it actually caused a major shutdown. I recently rebuilt this server, and this is a new (to me!) LSI card, so it could be the cause. I think I'll disable spin-down for now and see what happens.
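As a side note on the spin-down angle, the CDB bytes in the log can be decoded by hand. The Python sketch below handles only the two opcodes that appear in this capture: 0x85, which is ATA PASS-THROUGH (16) with the ATA command in byte 14, and 0x88, which is READ (16) with an 8-byte LBA and a 4-byte block count. The helper name `decode_cdb` and the small `ATA_COMMANDS` table are my own and are deliberately incomplete.

```python
# Minimal lookup of ATA command codes relevant here (deliberately incomplete).
ATA_COMMANDS = {
    0xE0: "STANDBY IMMEDIATE",
    0xE5: "CHECK POWER MODE",
}


def decode_cdb(hex_bytes):
    """Decode a 16-byte CDB given as space-separated hex, as printed in the log."""
    cdb = bytes.fromhex(hex_bytes)
    opcode = cdb[0]
    if opcode == 0x85 and len(cdb) == 16:
        # ATA PASS-THROUGH (16): protocol is bits 4..1 of byte 1, command is byte 14.
        return {
            "scsi_command": "ATA PASS-THROUGH (16)",
            "protocol": (cdb[1] >> 1) & 0x0F,
            "ata_command": cdb[14],
            "ata_command_name": ATA_COMMANDS.get(cdb[14], "unknown"),
        }
    if opcode == 0x88 and len(cdb) == 16:
        # READ (16): bytes 2..9 are the LBA, bytes 10..13 the transfer length.
        return {
            "scsi_command": "READ (16)",
            "lba": int.from_bytes(cdb[2:10], "big"),
            "blocks": int.from_bytes(cdb[10:14], "big"),
        }
    return {"scsi_command": "not decoded by this sketch"}


if __name__ == "__main__":
    # The command that timed out first on sde (5:0:3:0).
    print(decode_cdb("85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00"))
    # One of the reads that was outstanding on sdb.
    print(decode_cdb("88 00 00 00 00 03 48 e6 4f a8 00 00 00 20 00 00"))
```

The command that timed out on sde decodes to ATA command 0xE5, CHECK POWER MODE, a non-data command used to ask a drive about its power state. That would be consistent with the spin-down theory, though the log alone does not prove it.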
JonathanM Posted February 15

47 minutes ago, Lignumaqua said:
I have recently rebuilt this server and this is a new (to me!) LSI card

Do you have airflow directed at the card's heatsink? Those cards require active cooling; they were designed for server cases that force air across all slots.
Lignumaqua (Author) Posted February 15

That is an excellent question. I’ve used these cards before without issues, but this is a new case, so airflow will be different. There is a case ventilation extractor fan above it, and the case runs very cool overall, but I’m not specifically cooling the card. I’ll measure its temperature.