wolfinabox Posted January 25, 2022 (edited)

Hi! I'm thinking one of my cache SSDs might be failing, but I'd appreciate a second set of eyes on the info (it's not even a year old so hopefully I can RMA it if necessary).

I'd been noticing some "READ_FPDMA_QUEUED" and "WRITE_FPDMA_QUEUED" errors popping up in the logs for this particular disk, but I thought it might be a bad SATA cable (was just using whatever old cables I had on hand). I replaced it with a brand new cable as soon as possible, and also swapped to a different SATA port in the process, but during a BTRFS scrub, the disk is still getting the same errors (full log for during the scrub attached):

Jan 24 20:38:03 boxserver kernel: ata12.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
Jan 24 20:38:03 boxserver kernel: ata12.00: irq_stat 0x40000008
Jan 24 20:38:03 boxserver kernel: ata12.00: failed command: READ FPDMA QUEUED
Jan 24 20:38:03 boxserver kernel: ata12.00: cmd 60/08:f8:f0:5f:2f/00:00:07:00:00/40 tag 31 ncq dma 4096 in
Jan 24 20:38:03 boxserver kernel: res 41/40:08:f0:5f:2f/00:00:07:00:00/00 Emask 0x409 (media error) <F>
Jan 24 20:38:03 boxserver kernel: ata12.00: status: { DRDY ERR }
Jan 24 20:38:03 boxserver kernel: ata12.00: error: { UNC }
Jan 24 20:38:03 boxserver kernel: ata12.00: supports DRM functions and may not be fully accessible
Jan 24 20:38:03 boxserver kernel: ata12.00: supports DRM functions and may not be fully accessible
Jan 24 20:38:03 boxserver kernel: ata12.00: configured for UDMA/133
Jan 24 20:38:03 boxserver kernel: ata12: EH complete
Jan 24 20:38:03 boxserver kernel: ata12.00: Enabling discard_zeroes_data

17 read errors during the scrub, 0 corrected/uncorrected/unverified though.

The smart report for that drive also shows "Errors occurred - Check SMART report" (smart report attached).

Since the issue is persisting, I'm thinking now that it's the drive itself sadly, is there anything else I should check?
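For readers comparing their own logs, here is a minimal Python sketch that decodes the first two bytes of the `res` line in the log above (the status and error registers). The bit tables are an assumption based on the usual ATA register layout and the names the kernel prints in its `status: { ... }` and `error: { ... }` lines; the function names are hypothetical. It is only a reading aid, not a diagnostic tool.

```python
import re

# Assumed bit names, following the labels the kernel prints in its
# "status: { ... }" and "error: { ... }" lines.
STATUS_BITS = {0x80: "Busy", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}

# The result taskfile starts with "res <status>/<error>:" in hex.
RES_LINE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):")


def names(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(line):
    """Decode status and error registers from a kernel 'res' line, or return None."""
    match = RES_LINE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return names(status, STATUS_BITS), names(error, ERROR_BITS)


line = ("Jan 24 20:38:03 boxserver kernel: res 41/40:08:f0:5f:2f/"
        "00:00:07:00:00/00 Emask 0x409 (media error) <F>")
print(decode_res(line))  # (['DRDY', 'ERR'], ['UNC'])
```

The decoded values match the kernel's own `{ DRDY ERR }` and `{ UNC }` lines. UNC is the drive reporting an uncorrectable read, whereas an interface CRC problem would typically show up as ICRC instead.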
Build is here, and the cache SSDs are currently in raid 0 (appdata gets backed up to array regularly, VMs don't have important data).

EDIT: I've pulled all the data from the cache to the array (using mover), and during that there were many of these same errors, only from /dev/sdg (the same drive). All files seemed to make it over though, so I removed sdg from the pool (and reformatted sdf, the other cache drive, into a single drive pool) and transferred the cache contents back over to sdf no problem. Signs point to that drive being bad.

Attachments: syslog.txt, Samsung_SSD_870_EVO_1TB_S6PTNZ0R608029R-20220124-2050.txt

Edited January 25, 2022 by wolfinabox
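To check whether the errors really come from a single device, one option is to tally the "failed command" lines per ATA device in a saved syslog. The sketch below is a rough illustration in Python; the function name is hypothetical and the regular expression assumes lines shaped like the ones quoted in this thread.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: ata12.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+?)\s*$")


def count_failed_commands(lines):
    """Return a Counter keyed by (ata device, command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Small demo using lines in the format shown in the thread.
sample = [
    "Jan 24 20:38:03 boxserver kernel: ata12.00: failed command: READ FPDMA QUEUED",
    "Jan 24 20:38:03 boxserver kernel: ata12.00: status: { DRDY ERR }",
    "Jan 24 20:38:03 boxserver kernel: ata12: EH complete",
]
for (device, command), n in count_failed_commands(sample).most_common():
    print(f"{device}  {command}: {n}")
```

For a real log, pass an open file instead of the sample list, for example `count_failed_commands(open("syslog.txt", errors="replace"))`. If every entry carries the same `ataN.NN` prefix, that fits the observation that only one drive is affected.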
JorgeB Posted January 25, 2022

SMART test is failing so device needs to be replaced.
wolfinabox (Author) Posted January 25, 2022

9 hours ago, JorgeB said:
    SMART test is failing so device needs to be replaced.

Gotcha, thought so, just wanted to be sure. I suppose since those results are from the disk itself, they can't really be caused by the interface/cable anyway, makes sense! Will work on replacing that, ty! Now to find out which identical SSD it is in my tower...
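One way to tell identical drives apart is the serial number, which appears in the attached SMART report's filename (S6PTNZ0R608029R) and is normally printed on the drive's label. The Python sketch below assumes a typical Linux system where udev creates symlinks under /dev/disk/by-id whose names include the model and serial; the function name and that directory layout are assumptions, not something confirmed in this thread.

```python
import os


def find_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Map each by-id link name containing `serial` to the device it resolves to."""
    matches = {}
    for name in sorted(os.listdir(by_id_dir)):
        if serial in name:
            # realpath follows the symlink, e.g. to /dev/sdg
            matches[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return matches


# Only run the demo where the assumed directory actually exists.
if os.path.isdir("/dev/disk/by-id"):
    for link, device in find_by_serial("S6PTNZ0R608029R").items():
        print(f"{link} -> {device}")
```

With the serial confirmed in software, the failing drive can be picked out in the case by reading the labels rather than by guessing from cable positions.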
JorgeB Posted January 25, 2022

8 minutes ago, wolfinabox said:
    I suppose since those results are from the disk itself, they can't really be caused by the interface/cable anyway, makes sense!

Correct. A full device write might fix it, at least for some time, but if it does, it's difficult to predict for how long.