SSD cache drive errors - problem? [SOLVED]


nraygun


My 1TB SSD cache drive was connected to my flashed H200, and I started seeing errors such as this:

Aug 1 06:00:50 server kernel: print_req_error: critical target error, dev sdc, sector 1996910981

So, as someone recommended, I removed it from the backplane and installed it on a PCI card: https://www.amazon.com/gp/product/B01452SP1O/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

 

I thought all was well, but recently saw this:

Aug 29 02:00:09 server kernel: print_req_error: I/O error, dev sdc, sector 979775864

I ran the Check Filesystem Status check, and it didn't indicate an error. I then ran a SMART extended test, and it reported "Completed without error".

 

Do I need to be concerned? Do I need to replace the SSD? It's relatively new (purchased 5/2019).

Or is everything OK?

 

[SOLVED]

The "Default spin down delay" setting in "Disk Settings" does not appear to apply to unassigned disks, so I added a script that wakes up the unassigned backup drive 15 minutes before the backup script runs. The script is as follows:

#!/bin/bash
# Read one 4 KiB block directly from the drive (bypassing the cache) so that it spins up
dd if=/dev/sdc bs=4096 count=1 of=/dev/null iflag=direct
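Device letters such as sdc are assigned by the kernel and can change when drives are moved between ports or controllers. A slightly more defensive sketch of the same wake-up read, assuming the drive is addressed through a stable /dev/disk/by-id/ link (the ata-EXAMPLE_BACKUP_DRIVE name is a hypothetical placeholder, not a real ID) and that a nonzero exit status is a useful signal when the wake-up fails:

```shell
#!/bin/bash
# Wake an unassigned drive by reading one block straight from the device.
# ASSUMPTION: the by-id name below is a hypothetical placeholder; use the
# entry that `ls -l /dev/disk/by-id/` shows for the backup drive.
DISK="/dev/disk/by-id/ata-EXAMPLE_BACKUP_DRIVE"

wake_disk() {
  local dev="$1"
  # iflag=direct opens the device with O_DIRECT, so this 4 KiB read skips
  # the page cache and has to be serviced by the drive itself.
  if dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct status=none; then
    echo "woke $dev"
  else
    echo "could not read $dev" >&2
    return 1
  fi
}

wake_disk "$DISK"
```

Because wake_disk is the last command, the script's exit status reflects whether the read succeeded, so a scheduler log would show a failed wake-up instead of hiding it.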

 

Edited by nraygun

Doh! I'm embarrassed: I thought it was the cache drive. Maybe the drive letters changed when I moved the drives around?

Not sure I even needed to put the SSD on a PCI card. Oh well. At least now I have an extra drive bay for more storage.

The sdc drive is my Borg backup drive. It's old and probably needs to be replaced.

Thanks for taking a look!

Edited by nraygun
  • 1 month later...

So I replaced the drive in question, and I still get the error. It looks like the error happens right at the start of my backup script at 1am.

I'm going to try replacing the cable next.

I don't think I can turn off spin-down since the drive is not part of the array.

Any other ideas? 

Oct 30 01:00:05 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00 
Oct 30 01:00:05 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 e8 68 00 00 08 00 
Oct 30 01:00:05 server kernel: print_req_error: I/O error, dev sdc, sector 980871272
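The CDB bytes in these log lines describe the command that failed. Opcode 0x28 is the SCSI READ(10) command, whose bytes 2-5 hold the starting logical block address (big-endian) and bytes 7-8 the number of blocks to read. A small decoding sketch (decode_read10 is a hypothetical helper written for this thread, not an existing tool):

```shell
#!/bin/bash
# Decode a READ(10) CDB as printed by the kernel after "CDB: opcode=0x28".
# READ(10) layout: byte 0 opcode (0x28), byte 1 flags, bytes 2-5 logical
# block address (big-endian), byte 6 group number, bytes 7-8 transfer
# length in blocks (big-endian), byte 9 control.
decode_read10() {
  if (( $# != 10 )); then
    echo "expected 10 CDB bytes, got $#" >&2
    return 1
  fi
  local b=("$@")
  if [[ "${b[0]}" != "28" ]]; then
    echo "opcode 0x${b[0]} is not READ(10)" >&2
    return 1
  fi
  # Concatenate the hex bytes and let bash arithmetic convert from base 16.
  local lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
  local blocks=$(( 16#${b[7]}${b[8]} ))
  echo "READ(10) lba=$lba blocks=$blocks"
}

# CDB from the Oct 30 log entry; prints: READ(10) lba=980871272 blocks=8
decode_read10 28 00 3a 76 e8 68 00 00 08 00
```

Applied to each CDB posted in this thread, the result is an 8-block read whose address equals the sector in the print_req_error line that follows it, so every logged failure is a read rather than a write.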

 

  • 2 weeks later...

I replaced the cable and went a few days with no errors, but today I got the same error again.

Not sure what to do, if anything. Ignore it?

Nov 7 01:00:24 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00 
Nov 7 01:00:24 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 ea 78 00 00 08 00 
Nov 7 01:00:24 server kernel: print_req_error: I/O error, dev sdc, sector 980871800

 


I'm not sure how. The disk in question is unassigned, so I can't find where to disable spin-down for it.

Can you tell me where it would be for an unassigned disk?

Would the "Default spin down delay" setting under Settings/DiskSettings affect ALL disks, whether in the array or not?

Edited by nraygun

It went just a couple of days without errors, and then I got this today:

Nov 10 01:00:05 flores kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 10 01:00:05 flores kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 76 f4 90 00 00 08 00
Nov 10 01:00:05 flores kernel: print_req_error: I/O error, dev sdc, sector 980874384

Any other suggestions? Could it be the power supply? This is running in a Dell R710 with only one of the power supplies installed. Maybe I should try installing the other one too?


I installed the other power supply, and this morning I see the error happened again, but not at the start of the script.

Nov 12 02:00:46 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 12 02:00:46 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3b 76 b4 40 00 00 08 00
Nov 12 02:00:46 server kernel: print_req_error: I/O error, dev sdc, sector 997635136

Does anyone have any suggestions? This is the backup drive, and if it's throwing errors, I'm concerned that I may not have a good backup at any given moment.


The disk I'm using is brand new. I bought it recently because I thought the older drive I was using was the problem.

I still get the same problem on this new drive.

If I change controllers, will I have to rebuild the array? I'm currently using a flashed H200 controller.

Any suggestions for a replacement? It's in the dedicated internal slot in the R710, not in one of the regular PCI slots.

Edited by nraygun

I don't believe the Dell R710 has onboard SATA ports.

I suspected spin-down too, but I turned off spin-down in Disk Settings.

Looking at the logs again, there was another occurrence of the issue at ~11am in addition to the one at ~2am. Is it getting worse? This is after I installed the second power supply. I think this one happened during a "borg check" command.

Nov 12 11:02:11 server kernel: sd 3:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 12 11:02:11 server kernel: sd 3:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 01 0d b8 00 00 08 00
Nov 12 11:02:11 server kernel: print_req_error: I/O error, dev sdc, sector 69048

I see that I have an older version of Borg. I'll try updating to the latest version, 1.1.10.


So wait a minute.

I just went into the Main area of unRaid to check the filesystem of this unassigned drive.

The drive was spun down!

I tried to spin it up in the test menu, but it didn't seem to want to spin up. I hit the spin-up button a few times, and it finally woke up. No errors appeared in the log during this event.

I have the "Default spin down delay" set to Never in Disk Settings.

Maybe I need to add a short write sequence to my "wake up" script? Any suggestions?
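Before adding writes to the wake-up script, it may help to confirm whether a read actually woke the drive. One way, assuming hdparm is installed and the H200 passes ATA commands through to the drive, is hdparm's -C option, which queries the drive's power mode rather than reading data. A sketch that wraps and parses it (parse_power_state and drive_state are hypothetical helper names, and the by-id path in the usage comment is a placeholder):

```shell
#!/bin/bash
# Check whether a drive is spun up, via hdparm's power-mode query (-C).
# ASSUMPTION: the HBA passes this ATA command through to the drive; if it
# does not, hdparm may fail or report nothing useful.

# Pull the state word out of `hdparm -C` output, whose relevant line looks
# like " drive state is:  standby" (other values include "active/idle").
parse_power_state() {
  awk -F': +' '/drive state is/ { print $2 }'
}

drive_state() {
  hdparm -C "$1" 2>/dev/null | parse_power_state
}

# Hypothetical usage, with a placeholder by-id path:
#   drive_state /dev/disk/by-id/ata-EXAMPLE_BACKUP_DRIVE
```

Logging drive_state just before and just after the wake-up read would show whether the drive was in standby and whether the read brought it back.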


It looks like the unassigned drive spins down after an hour regardless of the Disk Settings configuration.

Once it had spun down, I ran "ls -R" on it. It didn't wake up.

It did wake up with the dd command, though. I found this command online; the "iflag=direct" option bypasses the cache, so the read goes directly to the hardware.

I changed the script to contain only the dd command.

Crossing fingers again to see what happens in the morning.
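If the one-hour spin-down comes from a standby timer stored in the drive itself (an assumption; it could equally come from Unraid or a plugin), hdparm -S 0 asks the drive to disable that timer. A hedged sketch, where disable_standby_timer is a hypothetical helper and the path in the usage comment is a placeholder:

```shell
#!/bin/bash
# Ask a drive to disable its own standby (spin-down) timer with hdparm -S 0.
# ASSUMPTIONS: the spin-down comes from the drive's timer (not verified),
# the HBA passes the command through, and the setting may not survive a
# power cycle, so it might need re-running at boot.
disable_standby_timer() {
  local dev="$1"
  # -b follows symlinks, so /dev/disk/by-id/ links are accepted.
  if [[ ! -b "$dev" ]]; then
    echo "$dev is not a block device" >&2
    return 1
  fi
  # -S 0 sets the standby timeout to zero, which hdparm documents as
  # "timeouts disabled".
  hdparm -S 0 "$dev"
}

# Hypothetical usage, with a placeholder by-id path:
#   disable_standby_timer /dev/disk/by-id/ata-EXAMPLE_BACKUP_DRIVE
```

If the drive still spins down after this, the timer is more likely being enforced by software on the host than by the drive.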


The script didn't appear to run, and it also seemed that yesterday's edit to the script hadn't taken effect.

So I deleted the script, added a new one containing only the dd command, and scheduled it for 12:45am, before the 1am backup script runs.

I'll have to wait until tomorrow to see whether it all ran and whether it affects the issue.

Crossing fingers again...


I still got the error.

But I'm not sure the scripts ran in the right order.

The wakeup script says it ran on Friday at 12:45am and the backup script says it ran on Saturday at 1am.

Shouldn't they both say they ran on Saturday?

I changed the start times of the scripts to 1am for the wakeup and 1:15am for the backup. Tomorrow morning, I expect them both to show that they ran on Sunday.

Stay tuned.
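Rather than relying on two separately scheduled scripts firing in the right order, the wake-up and the backup could be chained in one script on a single schedule. A minimal sketch, assuming the existing backup is a standalone executable script (DISK and BACKUP_SCRIPT are hypothetical placeholders, and the 30-second wait is an arbitrary margin, not a measured spin-up time):

```shell
#!/bin/bash
# Hypothetical combined job: wake the backup drive, wait, then run the
# backup, so the two steps always run in order from a single schedule.
# ASSUMPTIONS: DISK and BACKUP_SCRIPT are placeholders, and the 30-second
# wait is an arbitrary margin rather than a measured spin-up time.
DISK="/dev/disk/by-id/ata-EXAMPLE_BACKUP_DRIVE"
BACKUP_SCRIPT="/path/to/borg-backup.sh"
SPINUP_WAIT=30

run_backup() {
  local dev="$1" script="$2" delay="$3"
  # Same O_DIRECT read as the wake-up script, so the drive must respond.
  if ! dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct status=none; then
    echo "wake-up read failed on $dev; skipping backup" >&2
    return 1
  fi
  sleep "$delay"
  "$script"
}

run_backup "$DISK" "$BACKUP_SCRIPT" "$SPINUP_WAIT"
```

Skipping the backup when the wake-up read fails is a deliberate choice in this sketch; removing the early return would run the backup regardless.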

Edited by nraygun
