Formatting Cache Drive Failed


Matt4982


Hey guys, I just installed an SSD in my Unraid setup. The drive came from a working laptop. When I try to format it, either through Unraid or the Unassigned Devices plugin, it fails. Here's a copy of the log that comes up.

 

Feb 15 11:45:39 Unraid kernel: ata14.00: status: { DRDY ERR }
Feb 15 11:45:39 Unraid kernel: ata14.00: error: { ABRT }
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: configured for UDMA/133
Feb 15 11:45:39 Unraid kernel: ata14: EH complete
Feb 15 11:45:39 Unraid kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 15 11:45:39 Unraid kernel: ata14.00: irq_stat 0x40000001
Feb 15 11:45:39 Unraid kernel: ata14.00: failed command: READ DMA
Feb 15 11:45:39 Unraid kernel: ata14.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 10 dma 4096 in
Feb 15 11:45:39 Unraid kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
Feb 15 11:45:39 Unraid kernel: ata14.00: status: { DRDY ERR }
Feb 15 11:45:39 Unraid kernel: ata14.00: error: { ABRT }
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: configured for UDMA/133
Feb 15 11:45:39 Unraid kernel: ata14: EH complete
Feb 15 11:45:39 Unraid kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 15 11:45:39 Unraid kernel: ata14.00: irq_stat 0x40000001
Feb 15 11:45:39 Unraid kernel: ata14.00: failed command: READ DMA
Feb 15 11:45:39 Unraid kernel: ata14.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 11 dma 4096 in
Feb 15 11:45:39 Unraid kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
Feb 15 11:45:39 Unraid kernel: ata14.00: status: { DRDY ERR }
Feb 15 11:45:39 Unraid kernel: ata14.00: error: { ABRT }
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: supports DRM functions and may not be fully accessible
Feb 15 11:45:39 Unraid kernel: ata14.00: NCQ Send/Recv Log not supported
Feb 15 11:45:39 Unraid kernel: ata14.00: configured for UDMA/133
Feb 15 11:45:39 Unraid kernel: ata14: EH complete
Feb 15 11:45:39 Unraid kernel: sdb: unable to read partition table
Feb 15 11:45:39 Unraid unassigned.devices: Reload partition table result: BLKRRPART failed: Input/output error /dev/sdb: re-reading partition table
Feb 15 11:45:39 Unraid unassigned.devices: Formatting disk '/dev/sdb' with 'xfs' filesystem.
Feb 15 11:45:39 Unraid unassigned.devices: Format disk '/dev/sdb' with 'xfs' filesystem failed! Result: 
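
In case it's useful, I can also pull the drive's SMART data from the Unraid terminal. Something like this (assuming the SSD is still showing up as /dev/sdb like in the log above) should show whether the drive itself is reporting problems; happy to post the output if it helps:

# Full SMART/identity info and error log for the SSD (device name taken from the log above)
smartctl -a /dev/sdb

# Kick off a short self-test, then check the result a few minutes later
smartctl -t short /dev/sdb
smartctl -l selftest /dev/sdb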

4 minutes ago, johnnie.black said:

You're using a controller with port multipliers, and those are prone to issues. It looks like there are no onboard SATA ports? If so, the drive should work fine on a cheap 2-port ASMedia controller.

 

It will also work on the HBA, but TRIM won't work.

 

 

Do you have a suggestion for a card without port multipliers? I'm not entirely sure what the difference would be.

 

Unfortunately, with the 12-bay R510 the internal SATA ports are disabled and you'd have to splice cables to get power, so that's why I was looking at PCIe cards that provide both data and power.

3 minutes ago, Matt4982 said:

These are the two cards I've tried.

The diagnostics were with one of those?

 

Weird, as it's reported in the syslog as a 10-port SATA controller, but it's likely the problem. Sorry, I can't recommend a known working card of that type, since they aren't commonly used with Unraid.

 

You could always get an NVMe device; those would work without issues, but of course they're more expensive.

28 minutes ago, johnnie.black said:

The diagnostics were with one of those?

 

Weird, as it's reported in the syslog as a 10-port SATA controller, but it's likely the problem. Sorry, I can't recommend a known working card of that type, since they aren't commonly used with Unraid.

 

You could always get an NVMe device; those would work without issues, but of course they're more expensive.

Yep, the second one is the one currently in the system. Of course the array is also running on an H200 flashed to IT mode.


Alright, so I got a new card and switched to an NVMe drive. After some struggles it let me format and get set up; however, the Docker containers are now crawling. I looked through the logs and I'm seeing tons of timeouts for the drive. Not sure if I'm missing something that needs to be done.

 

Feb 22 00:15:50 Unraid kernel: nvme nvme0: I/O 927 QID 16 timeout, completion polled
Feb 22 00:15:50 Unraid kernel: nvme nvme0: I/O 928 QID 16 timeout, completion polled
Feb 22 00:15:50 Unraid kernel: nvme nvme0: I/O 929 QID 16 timeout, completion polled
Feb 22 00:24:32 Unraid kernel: nvme nvme0: I/O 24 QID 0 timeout, completion polled
Feb 22 00:25:34 Unraid kernel: nvme nvme0: I/O 5 QID 0 timeout, completion polled
Feb 22 00:46:03 Unraid kernel: nvme nvme0: I/O 2 QID 0 timeout, completion polled
Feb 22 00:47:04 Unraid kernel: nvme nvme0: I/O 29 QID 0 timeout, completion polled
Feb 22 00:48:12 Unraid kernel: nvme nvme0: I/O 929 QID 16 timeout, completion polled
Feb 22 00:48:12 Unraid kernel: nvme nvme0: I/O 930 QID 16 timeout, completion polled

unraid-diagnostics-20190222-0836.zip
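
If it helps, I can also grab what the NVMe device itself is reporting. I believe something along these lines from the terminal (assuming the drive shows up as nvme0, as in the log) would dump its health data and any related kernel messages:

# SMART/health data for the NVMe drive (smartctl handles NVMe devices too)
smartctl -a /dev/nvme0

# Any kernel messages about the NVMe controller since boot
dmesg | grep -i nvme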


Ok, new version installed. Now I'm getting this error over and over:

 

Feb 22 10:24:50 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:50 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:51 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:51 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:52 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:52 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:53 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 15997 start 0
Feb 22 10:24:53 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 15997 start 0
Feb 22 10:24:53 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:53 Unraid kernel: BTRFS info (device nvme0n1p1): no csum found for inode 12700 start 0
Feb 22 10:24:54 Unraid kernel: btrfs_dev_stat_print_on_error: 122 callbacks suppressed
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7316, flush 0, corrupt 0, gen 0
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7317, flush 0, corrupt 0, gen 0
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7318, flush 0, corrupt 0, gen 0
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7319, flush 0, corrupt 0, gen 0
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7320, flush 0, corrupt 0, gen 0
Feb 22 10:24:54 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7321, flush 0, corrupt 0, gen 0
Feb 22 10:24:55 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7322, flush 0, corrupt 0, gen 0
Feb 22 10:24:55 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7323, flush 0, corrupt 0, gen 0
Feb 22 10:24:55 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7324, flush 0, corrupt 0, gen 0
Feb 22 10:24:55 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 7325, flush 0, corrupt 0, gen 0

unraid-diagnostics-20190222-1626.zip
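
For what it's worth, I think the error counters can also be checked straight from the terminal. Something like this (assuming the cache pool is mounted at /mnt/cache, the Unraid default) should show whether the read error count is still climbing, and a scrub would re-read and verify everything on the device:

# Cumulative per-device error counters for the cache filesystem
btrfs device stats /mnt/cache

# Optionally re-read and verify all data and metadata, then check progress
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache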


Looks like there are still issues; the problems start here:

Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Feb 22 10:14:57 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0

 

These are read errors, though I'm not seeing any earlier error about the device dropping. There is this:

Feb 22 10:14:47 Unraid kernel: nvme nvme0: failed to set APST feature (-19)

I'm not sure if that's a reason for concern. Either there's a problem with the device, or there are still some issues with the current kernel. If you can replace it with a Samsung NVMe device, those are known to work without issues.
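
If you want to rule out NVMe power management, one workaround that gets suggested for that APST message is disabling it with a kernel parameter. Roughly, you'd edit /boot/syslinux/syslinux.cfg on the flash drive, add the parameter to the append line of the default boot entry, and reboot. Your existing entry may differ slightly, so treat this as a sketch:

label Unraid OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot

No guarantee it fixes the read errors, it just takes APST/power states out of the picture.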


So, at this point I feel pretty hosed and this might be beyond my knowledge. The appdata folder is on the cache drive, and since it's getting read errors, most of the Docker containers aren't working. I tried to use the mover, but it doesn't appear to be working. I also tried going in through Midnight Commander and moving the appdata, but it keeps timing out. I did a backup via CA Backup and Restore prior to all this, so now I'm wondering what the easiest method would be to just restore the appdata onto the array.
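
I was thinking of just copying whatever is still readable off the cache by hand, something along these lines (the destination is just a guess at one of my array disks, and Docker would be stopped first so nothing is writing to appdata), but I don't know if that even makes sense with the drive throwing read errors:

# -a preserves ownership/permissions/timestamps, -X keeps extended attributes
rsync -aX --progress /mnt/cache/appdata/ /mnt/disk1/appdata/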


If I move them manually from the zip, do I need to do anything special to retain the proper permissions? The zip is currently on a different source.

 

I worry about doing a restore through CA Backup and Restore, since some of my Dockers are already off the cache drive and the backup will now be a few days old.
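
I'm guessing that after extracting the zip I'd need to reset ownership to the usual Unraid defaults, maybe something along these lines (the filename is just a placeholder, and nobody:users is what my other shares use, so treat this as a guess):

# Extract the backup into the appdata share on the array, then reset ownership/permissions
unzip appdata-backup.zip -d /mnt/user/appdata/
chown -R nobody:users /mnt/user/appdata/
chmod -R u+rwX,g+rwX /mnt/user/appdata/

Or is there a proper way to restore permissions that I'm missing?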

