wolfinabox

Everything posted by wolfinabox

  1. Gotcha, thought so, just wanted to be sure. I suppose since those results are from the disk itself, they can't really be caused by the interface/cable anyway, makes sense! Will work on replacing that, ty! Now to find out which identical SSD it is in my tower...
  2. Hi! I'm thinking one of my cache SSDs might be failing, but I'd appreciate a second set of eyes on the info (it's not even a year old, so hopefully I can RMA it if necessary).

     I'd been noticing some "READ_FPDMA_QUEUED" and "WRITE_FPDMA_QUEUED" errors popping up in the logs for this particular disk, but I thought it might be a bad SATA cable (was just using whatever old cables I had on hand). I replaced it with a brand new cable as soon as possible, and also swapped to a different SATA port in the process, but during a BTRFS scrub, the disk is still getting the same errors (full log for during the scrub attached):

         Jan 24 20:38:03 boxserver kernel: ata12.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
         Jan 24 20:38:03 boxserver kernel: ata12.00: irq_stat 0x40000008
         Jan 24 20:38:03 boxserver kernel: ata12.00: failed command: READ FPDMA QUEUED
         Jan 24 20:38:03 boxserver kernel: ata12.00: cmd 60/08:f8:f0:5f:2f/00:00:07:00:00/40 tag 31 ncq dma 4096 in
         Jan 24 20:38:03 boxserver kernel: res 41/40:08:f0:5f:2f/00:00:07:00:00/00 Emask 0x409 (media error) <F>
         Jan 24 20:38:03 boxserver kernel: ata12.00: status: { DRDY ERR }
         Jan 24 20:38:03 boxserver kernel: ata12.00: error: { UNC }
         Jan 24 20:38:03 boxserver kernel: ata12.00: supports DRM functions and may not be fully accessible
         Jan 24 20:38:03 boxserver kernel: ata12.00: supports DRM functions and may not be fully accessible
         Jan 24 20:38:03 boxserver kernel: ata12.00: configured for UDMA/133
         Jan 24 20:38:03 boxserver kernel: ata12: EH complete
         Jan 24 20:38:03 boxserver kernel: ata12.00: Enabling discard_zeroes_data

     17 read errors during the scrub, 0 corrected/uncorrected/unverified though. The SMART report for that drive also shows "Errors occurred - Check SMART report" (SMART report attached).

     Since the issue is persisting, I'm thinking now that it's the drive itself, sadly. Is there anything else I should check?

     Build is here, and the cache SSDs are currently in raid 0 (appdata gets backed up to array regularly, VMs don't have important data).

     EDIT: I've pulled all the data from the cache to the array (using mover), and during that there were many of these same errors, only from /dev/sdg (the same drive). All files seemed to make it over though, so I removed sdg from the pool (and reformatted sdf, the other cache drive, into a single drive pool) and transferred the cache contents back over to sdf no problem. Signs point to that drive being bad.

     Attachments: syslog.txt, Samsung_SSD_870_EVO_1TB_S6PTNZ0R608029R-20220124-2050.txt
  3. So I made a few changes to my Unraid server and Win10 VM recently, and I'm not sure if those are causing this error, or if it's actually a real problem.

     I swapped out my two random 250GB cache SSDs (raid0) for two 1TB 870 EVOs (raid0) recently, using the mover method to move all my data onto and back off the array. That all seemed to go smoothly, and all my VMs and docker containers started right back up afterwards. I also resized my Win10 VM's vdisk from 250GB to 500GB using the option in Unraid's VM page, and then resized the partition in Windows to make use of it.

     Now I'm getting a BTRFS checksum error in the system logs, only (from what I can tell) for that vdisk file. (The only inode that shows up in the entire syslog is 68179, which I confirmed to be the vdisk.) This was while the VM was running, and (from just using it) it seemed to still run fine (I could read/write to the disk in Windows).

     I don't really care too much about the VM (it's just for remote gaming, I could have another up and running in an hour), but I'm more concerned about whether this is actually a real issue, or possibly caused by me resizing the disk. And is it fixable (if it's not a real problem, can I tell BTRFS "it's okay" somehow, or should I just delete and recreate)?

     I've attached the diagnostics and syslog, and this is the output from running a corrective BTRFS scrub (said Uncorrectable: 27). Any insight would be appreciated!

     Attachments: boxserver-diagnostics-20210807-1246.zip, syslog.txt
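
For anyone wanting to check a full syslog for the kind of ATA errors quoted in post 2, here is a minimal Python sketch that tallies "failed command" and "error: { ... }" lines per ATA device. The function name `summarize_ata_errors` and both regexes are illustrative assumptions, written against the exact line shapes shown in that excerpt; other kernels or log formats may need different patterns. `SAMPLE` holds a few lines copied from the post.

```python
import re
from collections import Counter

# Assumed patterns, based only on the log lines quoted in post 2.
# e.g. "... kernel: ata12.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+)$")
# e.g. "... kernel: ata12.00: error: { UNC }"
ERROR_FLAGS = re.compile(r"kernel: (ata\d+\.\d+): error: \{ (.+?) \}")


def summarize_ata_errors(lines):
    """Count failed commands and error flags, keyed by (device, value)."""
    failed = Counter()
    flags = Counter()
    for line in lines:
        line = line.rstrip()
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = ERROR_FLAGS.search(line)
        if match:
            # A single line can carry several flags, separated by spaces.
            for flag in match.group(2).split():
                flags[(match.group(1), flag)] += 1
    return failed, flags


# Lines copied from the excerpt in post 2.
SAMPLE = [
    "Jan 24 20:38:03 boxserver kernel: ata12.00: failed command: READ FPDMA QUEUED",
    "Jan 24 20:38:03 boxserver kernel: ata12.00: status: { DRDY ERR }",
    "Jan 24 20:38:03 boxserver kernel: ata12.00: error: { UNC }",
    "Jan 24 20:38:03 boxserver kernel: ata12: EH complete",
]

if __name__ == "__main__":
    failed, flags = summarize_ata_errors(SAMPLE)
    for (device, command), count in failed.items():
        print(f"{device}: {count} x failed {command}")
    for (device, flag), count in flags.items():
        print(f"{device}: {count} x error flag {flag}")
```

To run it over a real log, pass an open file instead of `SAMPLE`, for example `summarize_ata_errors(open("syslog.txt"))`. Repeated `UNC` (uncorrectable media error) flags on one device that persist across cable and port swaps are consistent with the conclusion in the post that the drive itself is at fault.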