Neldonado, posted May 6, 2023 (edited May 7, 2023):
A few weeks ago I started getting these notifications: "fstrim: /mnt/mediacache: the discard operation is not supported". I went to poke around and checked my logs to find a wall of this:
BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 106017132, rd 82906, flush 0, corrupt 0, gen 0
What causes this, and what should I do to remedy the situation? Shut down, reseat cables, start back up? I've uploaded my diagnostics: skynet-diagnostics-20230506-0817.zip
Neldonado, posted May 7, 2023:
I've got roughly 1.2TB of data on this cache pool. Would it be safe to run the mover and get the data off?
JorgeB, posted May 7, 2023:
The syslog rotated, so I cannot see the beginning of the problem, but it looks like this device dropped offline a few days ago:
May 4 04:43:00 Skynet kernel: BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 67401451, rd 44447, flush 0, corrupt 0, gen 0
Reboot and post new diags after array start.
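For reference, the counters in a line like the one above can be pulled apart with a small shell filter. This is only a sketch: `parse_btrfs_errs` is a hypothetical helper name, not a btrfs-progs command, and the pattern assumes the exact `errs: wr ..., rd ..., flush ..., corrupt ..., gen ...` layout seen in this thread.

```shell
#!/bin/sh
# parse_btrfs_errs: hypothetical helper (not part of btrfs-progs).
# Reads syslog lines on stdin and, for each btrfs "errs:" line, prints
#   <device> <wr> <rd> <flush> <corrupt> <gen>
parse_btrfs_errs() {
    sed -n -E 's|.*bdev (/dev/[^ ]+) errs: wr ([0-9]+), rd ([0-9]+), flush ([0-9]+), corrupt ([0-9]+), gen ([0-9]+).*|\1 \2 \3 \4 \5 \6|p'
}

# Sample line copied from the post above.
line='May 4 04:43:00 Skynet kernel: BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 67401451, rd 44447, flush 0, corrupt 0, gen 0'
printf '%s\n' "$line" | parse_btrfs_errs
# prints: /dev/sdaa1 67401451 44447 0 0 0
```

Comparing the write counter in this May 4 line (67401451) with the one in the first post (106017132) shows it was still climbing, which fits a device that is not accepting writes.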
Neldonado, posted May 7, 2023:
New diagnostics: skynet-diagnostics-20230507-0831.zip
Neldonado, posted May 7, 2023:
Looks like the drive is throwing the same errors. I imagine it'll drop offline any minute.
JorgeB, posted May 8, 2023:
No device errors so far; the ones you see logged are btrfs bringing that device up to sync. SMART looks good. You should now run a scrub, and if it happens again, replace the cables. Also take a look here for better pool monitoring so you're notified if there's a problem.
Neldonado, posted May 8, 2023:
3 hours ago, JorgeB said: "No device errors so far; the ones you see logged are btrfs bringing that device up to sync. SMART looks good. You should now run a scrub, and if it happens again, replace the cables. Also take a look here for better pool monitoring so you're notified if there's a problem."
Are these the same errors (see picture)? sdx is the other drive in this cache pool.
BTRFS error (device sdl: state EA): parent transid verify failed on 316334 9286912 wanted 36459 found 36425
Uploading diagnostics again. Something's weird going on: I replaced all my cables a month or two ago, and I just started noticing these errors out of nowhere.
skynet-diagnostics-20230508-0436.zip
JorgeB, posted May 8, 2023:
20 minutes ago, Neldonado said: "you should now run a scrub"
Make sure all errors are corrected.
Neldonado, posted May 8, 2023:
So is this what I want to do?
btrfs scrub start -B -d -r /dev/sdaa1
and
btrfs scrub start -B -d -r /dev/sdx1
JorgeB, posted May 8, 2023:
You can use the GUI: click on the first pool member and scroll down to the scrub section.
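For anyone who prefers the command line to the GUI, a rough equivalent is sketched below. Two things here are assumptions: the pool is mounted at `/mnt/mediacache` (taken from the fstrim message in the first post), and `run` is a hypothetical wrapper that only prints each command unless `DRY_RUN=0` is set. The `-r` flag in the commands asked about earlier puts the scrub in read-only mode, so it would report errors without repairing them; it is left out here. A scrub started on the mount point covers every device in the pool, so one command is enough.

```shell
#!/bin/sh
# Assumption: pool mount point, taken from the fstrim message in the thread.
POOL=/mnt/mediacache

# run: hypothetical wrapper. Prints the command by default; set DRY_RUN=0
# to actually execute it (needs root and btrfs-progs).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# -B: stay in the foreground and print statistics when finished.
# -d: print separate statistics for each device (used together with -B).
# No -r, so the scrub is allowed to repair errors where a good copy exists.
run btrfs scrub start -B -d "$POOL"

# Check on a scrub (running or finished), with per-device statistics.
run btrfs scrub status -d "$POOL"
```

Printing by default is a deliberate choice for a sketch like this: a scrub of a multi-terabyte pool takes a long time, so it seems safer to review the exact command first.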
Neldonado, posted May 8, 2023:
14 minutes ago, JorgeB said: "You can use the GUI: click on the first pool member and scroll down to the scrub section."
So I do that, and it refreshes and says aborted?
UUID: xxxx
Scrub started: Mon May 8 05:57:58 2023
Status: aborted
Duration: 0:00:00
Total to scrub: 3.04TiB
Rate: 0.00B/s
Error summary: no errors found
JorgeB, posted May 8, 2023:
Post new diags to see if there's something there.
Neldonado, posted May 8, 2023:
New diagnostics: skynet-diagnostics-20230508-0655.zip
JorgeB, posted May 8, 2023:
I see that the scrub is aborting but not why it is aborting. Reboot and try again; if the issue persists, it is best to back up and recreate the pool.
Neldonado, posted May 8, 2023:
6 minutes ago, JorgeB said: "I see that the scrub is aborting but not why it is aborting. Reboot and try again; if the issue persists, it is best to back up and recreate the pool."
Tons of errors being corrected… this is all good I hope? Looks like I've got some downtime before it's finished.
JorgeB, posted May 8, 2023:
As long as they are all corrected, it's good. The disk that dropped offline before needs to be synced up.
Neldonado, posted May 10, 2023:
On 5/8/2023 at 7:47 AM, JorgeB said: "As long as they are all corrected, it's good. The disk that dropped offline before needs to be synced up."
Finished scrubbing; diagnostics from after it finished are attached. I also noticed my log is full. What should the next steps be?
skynet-diagnostics-20230510-1535.zip
JorgeB, posted May 11, 2023:
The pool should be fixed. Reboot to clear the log, and see the link I posted above to reset the pool stats and keep monitoring.
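The pool stats mentioned here are presumably the per-device error counters that `btrfs device stats` reports. A minimal sketch of checking and resetting them follows. The mount point `/mnt/mediacache` is assumed from the first post, `nonzero_stats` is a hypothetical helper, and the sample text is illustrative: it reuses the counters from the first post in the layout the command prints, and is not real output from this server.

```shell
#!/bin/sh
# nonzero_stats: hypothetical helper. Reads `btrfs device stats` output on
# stdin and prints only the counters that are not zero.
nonzero_stats() {
    awk '$2 != 0'
}

# Illustrative sample in the layout `btrfs device stats` prints.
sample='[/dev/sdaa1].write_io_errs    106017132
[/dev/sdaa1].read_io_errs     82906
[/dev/sdaa1].flush_io_errs    0
[/dev/sdaa1].corruption_errs  0
[/dev/sdaa1].generation_errs  0'
printf '%s\n' "$sample" | nonzero_stats

# On the server itself (as root), with the mount point as an assumption:
#   btrfs device stats /mnt/mediacache | nonzero_stats
#   btrfs device stats -z /mnt/mediacache   # print, then reset counters to zero
```

After a reset, any counter that turns nonzero again points to a new problem rather than leftovers from the earlier dropout, which is what makes the reset useful for monitoring.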
Neldonado, posted May 15, 2023:
On 5/11/2023 at 12:53 AM, JorgeB said: "The pool should be fixed. Reboot to clear the log, and see the link I posted above to reset the pool stats and keep monitoring."
OK, it's been almost 4 days and I started getting errors again. Diagnostics attached.
skynet-diagnostics-20230515-0334.zip
JorgeB, posted May 15, 2023:
The disk dropped offline again. Check/replace cables or swap with a different disk.
Neldonado, posted May 15, 2023:
These cables are relatively new. Is there any way to see if it's just a bad disk or a power issue?
JorgeB, posted May 15, 2023 (marked as solution):
Swap both the SATA and power cables with a different disk, then see where the problem follows.
Neldonado, posted May 30, 2023:
Calling this closed for now. I swapped the drives around and haven't noticed anything.