[SOLVED] New SAS 9207-8i and now errors in logfile - EDIT: Found an easier solution

Zonediver · September 15, 2019

I replaced both of my Adaptec 1430SA cards with a new LSI card and new cables, and now I have errors in the log:

Sep 15 10:44:57 unraid kernel: print_req_error: I/O error, dev sdb, sector 7814036992
Sep 15 10:45:00 unraid kernel: print_req_error: I/O error, dev sdc, sector 7814036992
Sep 15 10:45:02 unraid kernel: print_req_error: I/O error, dev sdd, sector 7814036992
Sep 15 10:45:07 unraid kernel: print_req_error: I/O error, dev sde, sector 5860532992
Sep 15 10:45:10 unraid kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Sep 15 10:45:13 unraid kernel: print_req_error: I/O error, dev sdg, sector 7814036992
Sep 15 10:45:18 unraid kernel: print_req_error: I/O error, dev sdh, sector 5860532992
Sep 15 10:45:23 unraid kernel: print_req_error: I/O error, dev sdi, sector 5860532992

All eight drives on the controller are affected, but the system is working normally.

The errors only occur when I wake the system from sleep.

I have read that there may be a problem with IDE/AHCI and/or sleep.

The question now is how to fix this, or can these errors be ignored?

Any advice will be welcome.

Thanks for your help

Edited October 1, 2019 by Zonediver

Zonediver · September 18, 2019

...just a "little" bump to this...

95 views and no hint? At least one from an expert? 😉

trott · September 18, 2019

I have the same issue; my fix is to not let the disks spin down.

Zonediver · September 18, 2019

5 minutes ago, trott said:

I have the same issue; my fix is to not let the disks spin down.

Thanks, but this isn't an option 😉

As I mentioned, everything is working fine, apart from the errors in the log... I think I will ignore them...

Edited September 18, 2019 by Zonediver

Zonediver · September 22, 2019

Here is some additional log output around the errors - maybe it makes things a bit clearer.

Could this be a cable problem?

Cables are brand new...

Sep 21 20:22:07 unraid kernel: sd 1:0:0:0: Power-on or device reset occurred
Sep 21 20:22:09 unraid kernel: sd 1:0:0:0: [sdb] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:09 unraid kernel: sd 1:0:0:0: [sdb] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:09 unraid kernel: print_req_error: I/O error, dev sdb, sector 7814036992
Sep 21 20:22:09 unraid kernel: sd 1:0:1:0: Power-on or device reset occurred
Sep 21 20:22:11 unraid kernel: sd 1:0:1:0: [sdc] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:11 unraid kernel: sd 1:0:1:0: [sdc] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:11 unraid kernel: print_req_error: I/O error, dev sdc, sector 7814036992
Sep 21 20:22:11 unraid kernel: sd 1:0:2:0: Power-on or device reset occurred
Sep 21 20:22:13 unraid kernel: sd 1:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:13 unraid kernel: sd 1:0:2:0: [sdd] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:13 unraid kernel: print_req_error: I/O error, dev sdd, sector 7814036992
Sep 21 20:22:13 unraid kernel: sd 1:0:3:0: Power-on or device reset occurred
Sep 21 20:22:17 unraid kernel: sd 1:0:3:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:17 unraid kernel: sd 1:0:3:0: [sde] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:17 unraid kernel: print_req_error: I/O error, dev sde, sector 5860532992
Sep 21 20:22:17 unraid kernel: sd 1:0:4:0: Power-on or device reset occurred
Sep 21 20:22:20 unraid kernel: sd 1:0:4:0: [sdf] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:20 unraid kernel: sd 1:0:4:0: [sdf] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:20 unraid kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Sep 21 20:22:20 unraid kernel: sd 1:0:5:0: Power-on or device reset occurred
Sep 21 20:22:22 unraid kernel: sd 1:0:5:0: [sdg] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:22 unraid kernel: sd 1:0:5:0: [sdg] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Sep 21 20:22:22 unraid kernel: print_req_error: I/O error, dev sdg, sector 7814036992
Sep 21 20:22:22 unraid kernel: sd 1:0:6:0: Power-on or device reset occurred
Sep 21 20:22:26 unraid kernel: sd 1:0:6:0: [sdh] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:26 unraid kernel: sd 1:0:6:0: [sdh] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:26 unraid kernel: print_req_error: I/O error, dev sdh, sector 5860532992
Sep 21 20:22:26 unraid kernel: sd 1:0:7:0: Power-on or device reset occurred
Sep 21 20:22:32 unraid kernel: sd 1:0:7:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep 21 20:22:32 unraid kernel: sd 1:0:7:0: [sdi] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
Sep 21 20:22:32 unraid kernel: print_req_error: I/O error, dev sdi, sector 5860532992
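
For reference, the sector in each print_req_error line can be read straight out of the CDB bytes logged just above it. Opcode 0x88 is a SCSI READ(16): bytes 2-9 hold the 64-bit logical block address and bytes 10-13 the transfer length in blocks. A minimal sketch in shell; the function names cdb_lba and cdb_blocks are made up for illustration and are not part of any tool:

```shell
# cdb_lba: take the 16 CDB bytes (hex, as printed by the kernel) as arguments
# and print the logical block address. Bytes 2-9 (counting from 0) are the
# big-endian LBA, which are the positional parameters $3..$10.
cdb_lba() {
    echo $(( 0x$3$4$5$6$7$8$9${10} ))
}

# cdb_blocks: print the transfer length in blocks (bytes 10-13, i.e. $11..$14).
cdb_blocks() {
    echo $(( 0x${11}${12}${13}${14} ))
}

# CDB from the sdb entry in the log above
cdb_lba    88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
cdb_blocks 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
```

For the sdb entry this prints 7814036992 and 8, which matches the sector in the print_req_error line. Each error therefore corresponds to one failed 8-block read logged right after the "Power-on or device reset occurred" message.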

Edited September 22, 2019 by Zonediver

itimpi · September 22, 2019

Since this is occurring on multiple drives I would not think it is a SATA cabling problem but it might be on the power side.

Zonediver · September 22, 2019

6 minutes ago, itimpi said:

Since this is occurring on multiple drives I would not think it is a SATA cabling problem but it might be on the power side.

I have already considered that - unfortunately I have no spare power supply at hand to test it. The built-in power supply unfortunately has more than "one" 12V rail - so it could be that the power supply is to blame...

Edited September 22, 2019 by Zonediver

itimpi · September 22, 2019

The OP error was related to waking from sleep - is yours? Also is the system working OK other than logging these errors?

Zonediver · September 22, 2019

25 minutes ago, itimpi said:

The OP error was related to waking from sleep - is yours? Also is the system working OK other than logging these errors?

Yes - the error appears exactly once after waking up, on all 8 connected disks. Otherwise the server works perfectly.

But I'm not sure if I should ignore that...

Edited September 22, 2019 by Zonediver

Frank1940 · September 22, 2019

3 hours ago, Zonediver said:

Yes - the error appears exactly once after waking up, on all 8 connected disks. Otherwise the server works perfectly.

But I'm not sure if I should ignore that...

I looked up that power supply and it has THREE +12V rails rated at 25 amperes each! I would not be surprised if it was having problems spinning up all of your hard drives at once!!! You need to research that PSU and see exactly what each of those +12V rails is assigned to supply. Plus, the combined +12V current rating is 58 amperes.

Edited September 22, 2019 by Frank1940

Zonediver · September 22, 2019

1 hour ago, Frank1940 said:

I looked up that power supply and it has THREE +12V rails rated at 25 amperes each! I would not be surprised if it was having problems spinning up all of your hard drives at once!!! You need to research that PSU and see exactly what each of those +12V rails is assigned to supply. Plus, the combined +12V current rating is 58 amperes.

I don't think it's a problem - the measured max. power input (during boot) is 220W for 2 seconds (whole server) - so the spin-up of "all" disks cannot exceed the maximum of 696W on the 12V rails...

The disks are grouped into 4 blocks (power cables) of 4 disks each; the last block has only 3 disks.

The 13 WD Reds are rated at max. 1.75A at 12V, so we have a maximum of ~24A on the 12V rails (288W), or 6A on each 12V output connector of the PSU.

This has been working since 2010 - why should it be a problem now? I only swapped the two Adaptecs for one LSI controller - the power consumption is the same (~9.5W).
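
The figures above can be checked with a few lines of shell arithmetic. This is only a sketch using the rated numbers quoted in this thread (13 disks at 1.75A, a combined 58A on the 12V rails); it says nothing about short spin-up peaks. Currents are in milliamps so that everything stays in integer arithmetic, and the variable names are illustrative:

```shell
disks=13              # number of WD Reds, from the post
ma_per_disk=1750      # rated max. 1.75A at 12V per disk
rail_total_ma=58000   # combined +12V rating quoted in the thread

total_ma=$(( disks * ma_per_disk ))        # rated current of all disks, in mA
total_w=$(( total_ma * 12 / 1000 ))        # the same as power at 12V, in W
budget_w=$(( rail_total_ma * 12 / 1000 ))  # combined 12V budget, in W

echo "disks: ${total_ma} mA, ${total_w} W of ${budget_w} W"
```

This gives 22750 mA (273W) against a 696W budget, a little under the rounded ~24A (288W) used above.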

Biggest problem: finding a PSU with 16 SATA power connectors... 😉

Edited September 22, 2019 by Zonediver

Frank1940 · September 22, 2019

20 minutes ago, Zonediver said:

I don't think it's a problem - the measured max. power input (during boot) is 217W for 2 seconds - so the spin-up of "all" disks cannot exceed the maximum of 696W on the 12V rails...

It is extremely difficult to measure the inrush current required by a HD unless you are using an oscilloscope set up to measure the actual current waveform directly on the +12V bus. It only lasts a few hundred milliseconds. I suggested that you investigate this situation and you are convinced that it is not a problem. By the way, most PSUs use electronic overcurrent protection on each bus, and the delay before the trip protection activates is measured in milliseconds! And is the trip point set at 25.1A or 30A? All interesting things to consider...

Zonediver · September 22, 2019

14 minutes ago, Frank1940 said:

It is extremely difficult to measure the inrush current required by a HD unless you are using an oscilloscope set up to measure the actual current waveform directly on the +12V bus. It only lasts a few hundred milliseconds. I suggested that you investigate this situation and you are convinced that it is not a problem. By the way, most PSUs use electronic overcurrent protection on each bus, and the delay before the trip protection activates is measured in milliseconds! And is the trip point set at 25.1A or 30A? All interesting things to consider...

I found a test report for the Enermax 700W power supply (in German) and it says:

OCP (overcurrent protection) trip points:

3.3V ... 30A

5V ... 46A

12V1 ... 40A

12V2 ... 41A

12V3 ... 38A

Voltage from 5% to 110% load: 12.17V - 11.99V

This should be OK for a short phase (power-up/spin-up).

I would rule out the power supply - but you never know ... 😉

Edited September 22, 2019 by Zonediver

Squid · September 22, 2019

Not necessarily the problem, but I wouldn't be surprised if LSI based controllers do not support sleep (consistently). They are after all designed for servers and it's not a common situation for a server to go to sleep


Zonediver · September 22, 2019

On 9/22/2019 at 6:22 PM, Squid said:

Not necessarily the problem, but I wouldn't be surprised if LSI based controllers do not support sleep (consistently). They are after all designed for servers and it's not a common situation for a server to go to sleep

Of course, I am aware of that. If this error cannot be fixed, I will live with it.

The sleep function is essential for me - running the server 24/7 isn't an option...

Edited October 1, 2019 by Zonediver

Zonediver · September 25, 2019

Guys, I found the solution to this problem...

Long story short: It's a "logical" problem.

Description:

The hard disks have two operating states: "active/idle" and "standby".

For a successful sleep, the hard drives must be in standby.

But when the server wakes from sleep, all disks are in the "active/idle" state.

This state prevents the server from going to sleep the next time (the "wait until array inactive" setting must be enabled in the sleep plugin).

Left like that, the server will never go back to sleep.
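
The current state of a drive can be checked with hdparm -C, which reports it in a line such as "drive state is:  standby". A minimal sketch that pulls out just the state; drive_state is a made-up helper name, and the sample text only imitates the format of hdparm's output so the function can be tried without touching a real disk:

```shell
# drive_state: read `hdparm -C` output on stdin and print only the state
# (for example "active/idle" or "standby").
drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}

# Sample text in the format hdparm -C prints (not taken from a real run).
sample='
/dev/sdb:
 drive state is:  active/idle'

printf '%s\n' "$sample" | drive_state

# On a real system (as root) the tool itself would be piped in:
#   hdparm -C /dev/sdb | drive_state
```

Running this right after a wakeup should show whether the disks really are in "active/idle", as described above.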

The sleep plugin has a setting for running custom commands after wakeup.

I got this custom command from bonienl to set all HDDs to "standby":

hdparm -y $(ls /dev/sd*|grep '[a-z]$') >/dev/null 2>&1

This command seems to interfere with the initialization process after wakeup - maybe the SAS 9207 is not fast enough, or the command is executed too soon.

And that's the reason for the errors in the logfile.

Solution:

I removed this command from the sleep plugin's custom command setting and now run it every 15 minutes via another plugin, "User Scripts", with a cron schedule.

The controller now has enough time to correctly initialize the hard drives after a wakeup and the errors are gone - and done 😉

Note: This custom command does not affect running hard disks!

Edited October 1, 2019 by Zonediver

Zonediver · October 1, 2019

EDIT: I found a better and easier solution...

Open the sleep plugin settings and enter the following under "Custom Commands after wakeup":

sleep 120;
hdparm -y $(ls /dev/sd*|grep "[a-z]$") >/dev/null 2>&1

This runs the standby command for all HDDs only once, 120 seconds after the wakeup - and this is exactly what I want.

Important: Don't forget the semicolon 😉
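
A slightly more defensive variant of the same idea, offered as a sketch rather than a tested replacement: it picks whole-disk device names with a case pattern instead of parsing ls output, and keeps the selection separate from the hdparm call. The function name whole_disks is an assumption for illustration; the 120-second delay is taken from the command above.

```shell
# whole_disks: print only the arguments that look like whole disks
# (/dev/sdb, /dev/sdaa), skipping partitions such as /dev/sdb1.
whole_disks() {
    for dev in "$@"; do
        case "$dev" in
            /dev/sd*[0-9]) ;;              # ends in a digit: a partition, skip it
            /dev/sd[a-z]*) echo "$dev" ;;  # whole disk
        esac
    done
}

# Dry run on a fixed list, so nothing is spun down:
whole_disks /dev/sda /dev/sda1 /dev/sdb /dev/sdb1 /dev/sdaa

# With the function defined in the same script, the real call would be:
#   sleep 120; hdparm -y $(whole_disks /dev/sd*) >/dev/null 2>&1
```

The dry run prints /dev/sda, /dev/sdb and /dev/sdaa, which is the same set the ls/grep version would select from that list.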

Edited October 1, 2019 by Zonediver
