
[SOLVED] New SAS 9207-8i and now errors in the log file (EDIT: found an easier solution)


Zonediver


I replaced both of my Adaptec 1430SA cards with a new LSI card and new cables, and now I have errors in the log:

Sep 15 10:44:57 unraid kernel: print_req_error: I/O error, dev sdb, sector 7814036992
Sep 15 10:45:00 unraid kernel: print_req_error: I/O error, dev sdc, sector 7814036992
Sep 15 10:45:02 unraid kernel: print_req_error: I/O error, dev sdd, sector 7814036992
Sep 15 10:45:07 unraid kernel: print_req_error: I/O error, dev sde, sector 5860532992
Sep 15 10:45:10 unraid kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Sep 15 10:45:13 unraid kernel: print_req_error: I/O error, dev sdg, sector 7814036992
Sep 15 10:45:18 unraid kernel: print_req_error: I/O error, dev sdh, sector 5860532992
Sep 15 10:45:23 unraid kernel: print_req_error: I/O error, dev sdi, sector 5860532992

All eight drives on the controller are affected, but the system is working normally.

The errors only occur when I wake the system from sleep.

I have read that there can be issues with IDE/AHCI mode and/or sleep.

The question now is: how do I fix this, or can these errors be ignored?

Any advice will be welcome.

Thanks for your help

Edited by Zonediver

Here is some additional log context around the errors; maybe it makes things a bit clearer.

Could this be a cable problem?

The cables are brand new...

Sep 21 20:22:07 unraid kernel: sd 1:0:0:0: Power-on or device reset occurred
Sep 21 20:22:09 unraid kernel: sd 1:0:0:0: [sdb] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:09 unraid kernel: sd 1:0:0:0: [sdb] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:09 unraid kernel: print_req_error: I/O error, dev sdb, sector 7814036992
Sep 21 20:22:09 unraid kernel: sd 1:0:1:0: Power-on or device reset occurred
Sep 21 20:22:11 unraid kernel: sd 1:0:1:0: [sdc] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:11 unraid kernel: sd 1:0:1:0: [sdc] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:11 unraid kernel: print_req_error: I/O error, dev sdc, sector 7814036992
Sep 21 20:22:11 unraid kernel: sd 1:0:2:0: Power-on or device reset occurred
Sep 21 20:22:13 unraid kernel: sd 1:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:13 unraid kernel: sd 1:0:2:0: [sdd] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:13 unraid kernel: print_req_error: I/O error, dev sdd, sector 7814036992
Sep 21 20:22:13 unraid kernel: sd 1:0:3:0: Power-on or device reset occurred
Sep 21 20:22:17 unraid kernel: sd 1:0:3:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:17 unraid kernel: sd 1:0:3:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:17 unraid kernel: print_req_error: I/O error, dev sde, sector 5860532992
Sep 21 20:22:17 unraid kernel: sd 1:0:4:0: Power-on or device reset occurred
Sep 21 20:22:20 unraid kernel: sd 1:0:4:0: [sdf] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:20 unraid kernel: sd 1:0:4:0: [sdf] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:20 unraid kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Sep 21 20:22:20 unraid kernel: sd 1:0:5:0: Power-on or device reset occurred
Sep 21 20:22:22 unraid kernel: sd 1:0:5:0: [sdg] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:22 unraid kernel: sd 1:0:5:0: [sdg] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:22 unraid kernel: print_req_error: I/O error, dev sdg, sector 7814036992
Sep 21 20:22:22 unraid kernel: sd 1:0:6:0: Power-on or device reset occurred
Sep 21 20:22:26 unraid kernel: sd 1:0:6:0: [sdh] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:26 unraid kernel: sd 1:0:6:0: [sdh] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:26 unraid kernel: print_req_error: I/O error, dev sdh, sector 5860532992
Sep 21 20:22:26 unraid kernel: sd 1:0:7:0: Power-on or device reset occurred
Sep 21 20:22:32 unraid kernel: sd 1:0:7:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:32 unraid kernel: sd 1:0:7:0: [sdi] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:32 unraid kernel: print_req_error: I/O error, dev sdi, sector 5860532992
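To cross-check each CDB line against the print_req_error line that follows it, the bytes can be decoded by hand. Below is a minimal bash sketch, assuming the standard SBC layout of READ(16) (opcode 0x88: bytes 2-9 hold the big-endian LBA, bytes 10-13 the transfer length in blocks); the function name decode_read16 is hypothetical. For both CDB variants in this log, the decoded LBA equals the sector reported on the next line, and the transfer length is 8 blocks. For reference, the standard LBA counts of 4 TB and 3 TB drives are 7,814,037,168 and 5,860,533,168. If these disks are 4 TB and 3 TB models (the thread does not say), each failing read sits 176 sectors before the end of the disk.

```bash
#!/bin/bash
# Decode a SCSI READ(16) CDB (opcode 0x88) as printed in the kernel log.
# Assumed SBC byte layout: 0 = opcode, 1 = flags, 2-9 = LBA (big-endian),
# 10-13 = transfer length in blocks, 14 = group number, 15 = control.
decode_read16() {
  local -a b=("$@")   # 16 hex bytes, e.g. 88 00 00 00 00 01 d1 c0 ...
  if [ "${#b[@]}" -ne 16 ] || [ "$((16#${b[0]}))" -ne $((0x88)) ]; then
    echo "not a 16-byte READ(16) CDB" >&2
    return 1
  fi
  local lba_hex len_hex
  lba_hex=$(printf '%s' "${b[@]:2:8}")    # join bytes 2..9
  len_hex=$(printf '%s' "${b[@]:10:4}")   # join bytes 10..13
  echo "lba=$((16#$lba_hex)) blocks=$((16#$len_hex))"
}
```

For example, `decode_read16 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00` prints `lba=7814036992 blocks=8`, matching the sdb error line.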

 

Edited by Zonediver
6 minutes ago, itimpi said:

Since this is occurring on multiple drives I would not think it is a SATA cabling problem but it might be on the power side.

I have already considered that; unfortunately I have no spare power supply at hand to test with. The built-in power supply has more than one 12V rail, so the power supply could be to blame...

Edited by Zonediver
25 minutes ago, itimpi said:

The OP's error was related to waking from sleep; is yours? Also, is the system working OK other than logging these errors?

Yes, the error appears exactly once after waking up, on all 8 connected disks. Otherwise the server works perfectly.

But I'm not sure whether I should ignore that...

Edited by Zonediver
3 hours ago, Zonediver said:

Yes, the error appears exactly once after waking up, on all 8 connected disks. Otherwise the server works perfectly.

But I'm not sure whether I should ignore that...

I looked up that power supply and it has THREE +12V buses of 25 A each! I would not be surprised if it were having problems spinning up all of your hard drives at once! You need to research that PSU and see exactly what each of those +12V buses is assigned to supply. Plus, the total +12V current rating is 58 A.

Edited by Frank1940
1 hour ago, Frank1940 said:

I looked up that power supply and it has THREE +12V buses of 25 A each! I would not be surprised if it were having problems spinning up all of your hard drives at once! You need to research that PSU and see exactly what each of those +12V buses is assigned to supply. Plus, the total +12V current rating is 58 A.

I don't think that's the problem: the measured maximum power input (during boot) is 220 W for 2 s (whole server), so spinning up "all" disks can't exceed the maximum of 696 W available on the 12V rails...

The disks are grouped into 4 blocks (power cables) of 4 disks each; the last block has only 3 disks.

The 13 WD Reds are rated at 1.75 A max. on 12V, so we have a maximum of ~24 A on the 12V rails (288 W), or 6 A on each 12V output connector of the PSU.

This has been working since 2010, so why should it be a problem now? I only swapped the two Adaptecs for one LSI controller, and the power consumption is the same (~9.5 W).

Biggest problem: finding a PSU with 16 SATA power connectors... 😉
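The spin-up arithmetic can be redone with a small awk helper. This is a sketch using only the figures quoted in this thread (13 disks, 1.75 A each, 4 disks per cable, 58 A total +12V rating); spinup_budget is a hypothetical name, and it assumes every disk draws its rated 12V current at the same instant, with no staggered spin-up and no account of the short inrush peak. Worked exactly, 13 × 1.75 A is 22.75 A (273 W), a little under the rounded ~24 A, while a full 4-disk cable carries 4 × 1.75 A = 7 A at the rated figure.

```bash
#!/bin/bash
# Worst-case 12V spin-up load from datasheet ratings (not measured inrush).
# Args: disk count, rated 12V amps per disk, disks per power cable,
#       total +12V rating of the PSU in amps.
spinup_budget() {
  awk -v n="$1" -v amps="$2" -v per_cable="$3" -v psu="$4" 'BEGIN {
    total = n * amps
    printf "total=%.2fA power=%.0fW per_cable=%.2fA share=%.0f%%\n",
           total, total * 12, per_cable * amps, 100 * total / psu
  }'
}
```

With the thread's numbers, `spinup_budget 13 1.75 4 58` prints `total=22.75A power=273W per_cable=7.00A share=39%`, i.e. about 39 % of the PSU's total +12V rating.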

Edited by Zonediver
20 minutes ago, Zonediver said:

don't think that's the problem: the measured maximum power input (during boot) is 217 W for 2 s, so spinning up "all" disks can't exceed the maximum of 696 W available on the 12V rails...

It is extremely difficult to measure the inrush current required by a HD unless you use an oscilloscope setup to measure the actual current waveform directly on the +12V bus. It only lasts a few hundred milliseconds. I suggested that you investigate this situation, and you are convinced that it is not a problem. By the way, most PSUs use electronic overcurrent protection on each bus, and the delay before the trip protection activates is measured in milliseconds! And is the trip point set at 25.1 A or 30 A? All interesting things to consider...

14 minutes ago, Frank1940 said:

It is extremely difficult to measure the inrush current required by a HD unless you use an oscilloscope setup to measure the actual current waveform directly on the +12V bus. It only lasts a few hundred milliseconds. I suggested that you investigate this situation, and you are convinced that it is not a problem. By the way, most PSUs use electronic overcurrent protection on each bus, and the delay before the trip protection activates is measured in milliseconds! And is the trip point set at 25.1 A or 30 A? All interesting things to consider...

I found a review of the Enermax 700W power supply (in German), and it says:

OCP (overcurrent protection):

3.3V ... 30 A
5V ... 46 A
12V1 ... 40 A
12V2 ... 41 A
12V3 ... 38 A

Voltage from 5% to 110% load: 12.17 V to 11.99 V

That should be OK for a short phase (power-up/spin-up).

I would rule out the power supply, but you never know... 😉

Edited by Zonediver
On 9/22/2019 at 6:22 PM, Squid said:

Not necessarily the problem, but I wouldn't be surprised if LSI-based controllers do not support sleep (consistently). They are, after all, designed for servers, and it's not a common situation for a server to go to sleep.

Of course, I am perfectly aware of that. If this error can't be fixed, I will live with it.

The sleep function is essential for me; a server running 24/7 isn't an option...

Edited by Zonediver

Guys, I found the solution to this problem...

Long story short: it's a "logical" problem.

 

Description:

The hard disks have two operating states: "active/idle" and "standby".

For a successful sleep, the hard drives must be in standby.

But when the server wakes from sleep, all disks are in the "active/idle" state.

This state prevents the server from going to sleep again (this happens when the "wait until array inactive" setting is enabled in the sleep plugin), so the server will never go back to sleep.
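To see which state the disks are actually in, `hdparm -C` queries the drive's power mode and reports lines such as `drive state is:  standby` or `drive state is:  active/idle`, without being meant to wake the drive. Below is a minimal bash sketch of a check that succeeds only when every queried disk reports standby; the function name all_in_standby is hypothetical, and the exact wording of hdparm's output may vary between versions and controllers.

```bash
#!/bin/bash
# Read `hdparm -C` output on stdin; exit 0 only if at least one drive was
# reported and every reported drive state is "standby".
all_in_standby() {
  awk '/drive state is:/ { seen++; if ($NF != "standby") busy++ }
       END { exit (seen == 0 || busy > 0) }'
}
```

A possible usage, reusing the device filter from this thread: `hdparm -C $(ls /dev/sd*|grep '[a-z]$') | all_in_standby && echo "all disks in standby"`.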

 

The sleep plugin has a setting for running custom commands after wakeup.

I got this custom command from bonienl to put all HDDs into "standby":

hdparm -y $(ls /dev/sd*|grep '[a-z]$') >/dev/null 2>&1

This command seems to interfere with the initialization process; maybe the SAS 9207 is not fast enough, or the command is executed too soon.

And that's the reason for the errors in the logfile.

 

Solution:

I removed this command from the sleep plugin's custom command setting and now run it every 15 minutes via another plugin, "User Scripts", with a cron schedule.

The controller now has enough time to initialize the hard drives correctly after wakeup, and the errors are gone. Done 😉
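A rough sketch of the same idea as a standalone User Scripts job, assuming the plugin's custom cron schedule is set to `*/15 * * * *` (every 15 minutes). The function names whole_disks and standby_all are hypothetical; the loop issues the same `hdparm -y` once per disk instead of one call with all devices, and keeps only whole-disk nodes exactly like `ls /dev/sd* | grep '[a-z]$'` does.

```bash
#!/bin/bash
# Hypothetical body of a User Scripts job on an assumed cron schedule */15 * * * *.

# Print only whole-disk nodes (/dev/sdb, /dev/sdc, ...) and drop partitions
# (/dev/sdb1, ...): same filter as ls /dev/sd* | grep '[a-z]$'.
whole_disks() {
  local dev
  for dev in "$@"; do
    if [[ $dev =~ [a-z]$ ]]; then
      printf '%s\n' "$dev"
    fi
  done
}

# Send each whole disk to standby with hdparm -y, discarding its output.
standby_all() {
  local dev
  while read -r dev; do
    hdparm -y "$dev" >/dev/null 2>&1
  done < <(whole_disks /dev/sd*)
}
```

In the job itself, the script would end with a call to `standby_all`.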

 

Note: This custom command does not affect running hard disks!

Edited by Zonediver

EDIT: I found a better and easier solution...

Open the sleep plugin settings and enter the following under "Custom Commands after wakeup":

sleep 120;
hdparm -y $(ls /dev/sd*|grep "[a-z]$") >/dev/null 2>&1

This runs the standby command for all HDDs only once, 120 seconds after wakeup, which is exactly what I want.

Important: don't forget the semicolon 😉

Edited by Zonediver
