SSD cache drive errors - problem? [SOLVED]

nraygun · August 30, 2019

I had my 1TB SSD cache drive on my flashed H200 and started seeing errors such as this:

Aug 1 06:00:50 server kernel: print_req_error: critical target error, dev sdc, sector 1996910981

So, as someone recommended, I pulled it off the backplane and installed it on a PCI card: https://www.amazon.com/gp/product/B01452SP1O/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

I thought all was well, but recently saw this:

Aug 29 02:00:09 server kernel: print_req_error: I/O error, dev sdc, sector 979775864

I ran the Check Filesystem Status check and it didn't indicate any errors. I then ran a SMART extended test and it showed "Completed without error".

Do I need to be concerned? Do I need to replace the SSD? It's relatively new (purchased 5/2019).

Or is everything OK?

[SOLVED]

Added a script to wake up the unassigned backup drive 15 minutes before the backup script runs. The "Default spin down delay" setting in "Disk Settings" does not appear to work on unassigned disks. The script is as follows:

#!/bin/bash
# Read one 4 KiB block straight from the disk; iflag=direct bypasses the cache, forcing a spin-up.
dd if=/dev/sdc bs=4096 count=1 of=/dev/null iflag=direct

Edited November 20, 2019 by nraygun

JorgeB · August 30, 2019

Please post the diagnostics: Tools -> Diagnostics

nraygun · August 30, 2019

Here you go. Thanks!

flores-diagnostics-20190830-1253.zip

JorgeB · August 30, 2019

sdc is the 2TB WD unassigned disk. The errors appear to happen when Unraid is trying to spin it down. They should be harmless, but you can try disabling spin down for that disk or connecting it to a different controller if possible.

nraygun · August 30, 2019

Doh! I'm embarrassed - I thought it was the cache drive. Maybe things moved around when I moved the drives around?
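
Kernel names like sdc are assigned at boot and can change when drives are moved between slots or controllers, so it helps to check which physical drive a name currently refers to. A minimal sketch, assuming a udev-populated /dev/disk/by-id directory; the ids_for_device helper and the BY_ID_DIR override are made up for illustration:

```bash
#!/bin/bash
# List the persistent names (model and serial) that currently point at a
# kernel device name such as "sdc".
# BY_ID_DIR is a hypothetical override so the helper can be tried on a test directory.
BY_ID_DIR="${BY_ID_DIR:-/dev/disk/by-id}"

ids_for_device() {
    local want="$1" link
    for link in "$BY_ID_DIR"/*; do
        # Skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        # Compare the link target's basename (e.g. "sdc") with the requested name
        if [ "$(basename "$(readlink -f "$link")")" = "$want" ]; then
            basename "$link"
        fi
    done
}

# Usage: ./which_disk.sh sdc
if [ -n "${1:-}" ]; then
    ids_for_device "$1"
fi
```

Partition links (for example ones ending in -part1) resolve to sdc1 rather than sdc, so they are not listed.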

Not sure I even needed to put the SSD on a PCI card. Oh well. At least now I have an extra drive bay for more storage.

The sdc drive is my backup drive for borg. It's old and probably needs to be replaced.

Thanks for taking a look!

Edited August 30, 2019 by nraygun

nraygun · October 30, 2019

So I replaced the drive in question and I still get the error. It looks like the error happens right at the start of my backup script at 1am.

I'm going to try replacing the cable next.

I don't think I can turn off spin-down since the drive is not part of the array.

Any other ideas?

Oct 30 01:00:05 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00 
Oct 30 01:00:05 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 e8 68 00 00 08 00 
Oct 30 01:00:05 server kernel: print_req_error: I/O error, dev sdc, sector 980871272
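
For reference, the CDB line can be decoded by hand. Opcode 0x28 is the SCSI READ(10) command: bytes 2 to 5 hold the starting block address and bytes 7 to 8 the number of blocks, both big-endian. A small sketch (the decode_read10 name is made up for illustration) that turns the hex bytes from the log into decimal:

```bash
#!/bin/bash
# Decode a READ(10) CDB as printed by the kernel, e.g.
# "28 00 3a 76 e8 68 00 00 08 00". Prints "<start block> <block count>" in decimal.
decode_read10() {
    local -a b
    read -r -a b <<< "$1"
    # READ(10) is exactly 10 bytes long and starts with opcode 0x28
    if [ "${#b[@]}" -ne 10 ] || [ "${b[0]}" != "28" ]; then
        echo "not a READ(10) CDB" >&2
        return 1
    fi
    # Bytes 2-5: logical block address; bytes 7-8: transfer length
    local lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
    local len=$(( 16#${b[7]}${b[8]} ))
    echo "$lba $len"
}

decode_read10 "28 00 3a 76 e8 68 00 00 08 00"   # prints "980871272 8"
```

The decoded address matches the sector in the print_req_error line, so the failed request was an ordinary 8-block read rather than a write.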

nraygun · November 7, 2019

I replaced the cable and I got a few days with no errors but today I got the same error.

Not sure what to do, if anything. Ignore it?

Nov 7 01:00:24 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00 
Nov 7 01:00:24 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 ea 78 00 00 08 00 
Nov 7 01:00:24 server kernel: print_req_error: I/O error, dev sdc, sector 980871800

JorgeB · November 7, 2019

did you try:

On 8/30/2019 at 2:40 PM, johnnie.black said:

but you can try disabling spin down for that disk or connecting it to a different controller if possible.

nraygun · November 7, 2019

Not sure I know how. The disk in question is unassigned so I can't seem to find where to disable spin down for it.

Can you tell me where it would be for an unassigned disk?

Would the setting Settings/DiskSettings/Default spin down delay affect ALL disks (whether in the array or not)?

Edited November 7, 2019 by nraygun

JorgeB · November 7, 2019

2 minutes ago, nraygun said:

Would the setting Settings/DiskSettings/Default spin down delay affect ALL disks (whether in the array or not)?

I believe yes but not sure.

nraygun · November 7, 2019

I've had this out there but nobody has answered yet.

Guess I'll just try it and see what happens tomorrow morning when the backup script runs that uses this unassigned disk.

Thanks for the help!

nraygun · November 8, 2019

So far, so good. No errors this morning. But then again, I didn't get errors the first few days after changing out the cable.

Will continue to monitor.

nraygun · November 10, 2019

Went just a couple of days without error, then I got this today:

Nov 10 01:00:05 flores kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 10 01:00:05 flores kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 f4 90 00 00 08 00
Nov 10 01:00:05 flores kernel: print_req_error: I/O error, dev sdc, sector 980874384

Any other suggestions? Maybe the power supply? This is running in a Dell R710 with only one of the power supplies installed. Maybe I should try installing the other one too?

nraygun · November 12, 2019

I installed the other power supply and this morning I see the error happened, but not at the start of the script.

Nov 12 02:00:46 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 12 02:00:46 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3b 76 b4 40 00 00 08 00
Nov 12 02:00:46 server kernel: print_req_error: I/O error, dev sdc, sector 997635136

Does anyone have any suggestions? This is the backup drive and if I get errors on it, I'd be concerned that I don't have a good backup at any given moment.

JorgeB · November 12, 2019

If possible try a different controller and/or disk

nraygun · November 12, 2019

The disk I'm using is brand new. I bought it recently because I thought the older drive I was using was the problem.

I still get the same problem on this new drive.

If I change controllers, will I have to rebuild the array? I'm currently using a flashed H200 controller.

Any suggestions on a replacement? It's in the special internal slot in the R710, not in the regular PCI slots.

Edited November 12, 2019 by nraygun

JorgeB · November 12, 2019

30 minutes ago, nraygun said:

If I change controllers, will I have to rebuild the array?

No, as long as it's an HBA. But if it's an LSI HBA, it's already a recommended controller. I still think it might be spin down related.

Doesn't the server have any onboard SATA ports?

nraygun · November 12, 2019

I don't believe the Dell R710 has onboard SATA ports.

I was thinking it was spin down too, but I turned off spin down in Disk Settings.

And now that I look at the logs again, there was another occurrence of the issue at ~11am in addition to the one at ~2am. It's getting worse? This is after I installed the second power supply. I think this was during a "borg check" command.

Nov 12 11:02:11 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 12 11:02:11 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 01 0d b8 00 00 08 00
Nov 12 11:02:11 server kernel: print_req_error: I/O error, dev sdc, sector 69048

I see that I have an older version of Borg. I'll try updating it to the latest 1.1.10.

nraygun · November 13, 2019

The newer version of Borg had no effect. I still got an error at the start of the script.

I'll try adding a script to do an "ls -R" before the script runs to see if I can wake up the drive.

Edited November 14, 2019 by nraygun

nraygun · November 14, 2019

I see that my script ran 15 minutes before the backup script.

Unfortunately, I still get the error.

Next, I'll try reseating the controller in the internal slot. I'll clean off the contacts too with isopropyl.

Edited November 14, 2019 by nraygun

nraygun · November 14, 2019

So wait a minute.

I just went into the Main area of unRaid to check the filesystem of this unassigned drive.

The drive was spun down!

I tried to spin it up in the test menu, but it seemed to not want to spin up. I hit the spin up button a few times and it finally woke up. I got no errors in the log during this event.

I have the "Default spin down delay" set to Never in Disk Settings.

Maybe I need to add a short write sequence to my "wake up" script? Any suggestions?
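
The spin state can also be checked from the command line instead of the Main page. This is only a sketch: it assumes hdparm is installed and that the controller passes the power-mode query through to the drive, which may not hold for every HBA. The function names are made up for illustration.

```bash
#!/bin/bash
# Report whether a disk is spun down, based on `hdparm -C` output.

# Read hdparm -C output on stdin and print just the state,
# e.g. "standby" or "active/idle".
parse_drive_state() {
    awk -F': *' '/drive state is/ { print $2 }'
}

# Query the drive and print its state (needs root and hdparm).
disk_state() {
    hdparm -C "$1" | parse_drive_state
}

# Usage: ./disk_state.sh /dev/sdX
if [ -n "${1:-}" ]; then
    echo "$1: $(disk_state "$1")"
fi
```

Running this shortly before the backup starts would show whether the drive is in standby at that point.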

nraygun · November 14, 2019

I'll try this script at 12:45am just before the 1am start of the backup script:

#!/bin/bash
ls -R /mnt/disks/backup1/
dd if=/dev/sdc bs=4096 count=1 of=/dev/null iflag=direct

Crossing fingers...

nraygun · November 14, 2019

Looks like the unassigned drive spins down after an hour regardless of settings in Disk Settings.

Once spun down, I ran the ls -R. It didn't wake up.

But it did wake up with the dd command. I found the command somewhere; the "iflag=direct" option makes the read bypass the cache and go directly to the hardware.
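
A slightly more defensive variant of the wake-up read, as a sketch only. It takes the device path as an argument so a stable /dev/disk/by-id/ link can be used instead of a fixed sdX name, and it refuses to run against anything that is not a block device. The wake_disk name is made up for illustration.

```bash
#!/bin/bash
# Wake a disk by reading one 4 KiB block directly from the device.

wake_disk() {
    local dev
    # Resolve symlinks such as /dev/disk/by-id/... to the real device node
    dev=$(readlink -f "$1") || return 1
    if [ ! -b "$dev" ]; then
        echo "wake_disk: $1 is not a block device" >&2
        return 1
    fi
    # iflag=direct opens the device with O_DIRECT, so the read bypasses the
    # page cache and has to be served by the drive itself.
    dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct
}

# Usage: ./wake_disk.sh /dev/disk/by-id/<link for the backup drive>
if [ -n "${1:-}" ]; then
    wake_disk "$1"
fi
```

The ls -R probably did not wake the drive because the directory listing could be answered from cached data; that is an assumption, not something the logs confirm.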

I changed the script just to have the dd command only.

Crossing fingers again to see what happens in the morning.

nraygun · November 15, 2019

The script didn't appear to run. And it also seemed like the edit to the script didn't take yesterday.

So I deleted the script and added a new one with only the dd command and scheduled it for 12:45am before the 1am backup script runs.

Will have to wait until tomorrow to see if all this ran and the effect on the issue.

Crossing fingers again...

nraygun · November 16, 2019

I still got the error.

But I'm not sure of the sequence of the scripts.

The wakeup script says it ran on Friday at 12:45am and the backup script says it ran on Saturday at 1am.

Shouldn't they both say they ran on Saturday?

I changed the start time of the scripts to be 1am for the wakeup and 1:15am for the backup. I expect them both to say they ran on Sunday tomorrow morning.

Stay tuned.
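
One way to take the guesswork out of the run order is to have each script write its own timestamped marker to a shared log. A minimal sketch; RUN_LOG and log_run are hypothetical names, and the path should be changed to somewhere that survives between runs:

```bash
#!/bin/bash
# Append timestamped markers to a shared log so the order of the wake-up
# and backup runs is unambiguous.
# RUN_LOG is a hypothetical location; adjust as needed.
RUN_LOG="${RUN_LOG:-/tmp/backup-sequence.log}"

log_run() {
    # One line per event: "YYYY-MM-DD HH:MM:SS <message>"
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$RUN_LOG"
}

log_run "wakeup: start"
# ... the wake-up (or backup) commands go here ...
log_run "wakeup: done"
```

With the same two log_run calls in the backup script (using a "backup:" prefix), the log shows directly whether the wake-up finished before the backup began.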

Edited November 16, 2019 by nraygun
