September 1, 2021 For the past 2 months I have had issues with my monthly parity checks (a few thousand errors, but no individual drive errors), and my Docker service seems to go offline, so I have to reinstall all the containers. Some shares go offline as well. I've attached the diagnostics; can anyone make sense of what is going on? tower-diagnostics-20210901-1302.zip
September 1, 2021 Community Expert There are issues with the SAS2LP controller; those controllers are not recommended for v6. If possible, replace it with an LSI HBA. If you are not using ECC RAM, it is also a good idea to run memtest.
September 1, 2021 Author Ah, yeah, they have caused a bit of an issue for me in the past. I'm trying to find an affordable upgrade now. Any suggestions?
September 1, 2021 Author Thanks, I ordered 2 SAS9211-8I cards today. After getting them into IT mode, should it be pretty easy to just plug and play?
September 2, 2021 Community Expert 7 hours ago, Sirkyle said: "after getting them on IT mode should it be pretty easy to just plug n play?" Yep.
September 8, 2021 Author OK, I got my new cards, installed them, and ran a non-correcting parity check just to make sure everything went OK. As soon as I started the parity check, the user shares all dropped offline. The check picked up a few errors, and I now have a drive that's unassigned. Fresh diagnostics attached. tower-diagnostics-20210908-0619.zip
September 8, 2021 Author Looks like the unassigned device is my cache drive, although it's using a different letter code: SDE and SDU.
September 8, 2021 Author OK, I think I have another breakout cable I can try. I'll report back in a bit.
September 8, 2021 Community Expert 1 hour ago, Sirkyle said: "although its using a different letter code SDE, and SDU" The letter code can always change between boots, as it is assigned dynamically during the boot process. It can also change if a drive drops offline temporarily, then reconnects and is given a different letter.
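Editor's note: because the sdX letters are assigned dynamically, a drive is easier to track by a name that does not change. The following is a minimal sketch, assuming a Linux system where udev populates `/dev/disk/by-id` with symlinks to the current device nodes. The function name `map_stable_ids` is made up for illustration and is not part of Unraid.

```python
import os

def map_stable_ids(by_id_dir="/dev/disk/by-id"):
    """Return {stable_id: current_kernel_name}, e.g. {"ata-MODEL_SERIAL": "sde"}.

    Assumes by_id_dir holds symlinks that point at device nodes.
    Returns an empty dict if the directory does not exist.
    """
    if not os.path.isdir(by_id_dir):
        return {}
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if not os.path.islink(link):
            continue  # only symlinks carry the id -> device mapping
        # Resolve the symlink and keep just the device name (sde, sdu, ...)
        mapping[name] = os.path.basename(os.path.realpath(link))
    return mapping

if __name__ == "__main__":
    for stable_id, dev in map_stable_ids().items():
        print(f"{dev:10} {stable_id}")
```

Running this before and after a reboot would show the same identifier pointing at a different letter if the drive was renamed, as happened here with SDE and SDU.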
September 8, 2021 Author 57 minutes ago, itimpi said: "The letter code can always change between boot as it is assigned dynamically during boot process. It can also happen if a drive drops of line temporarily and then reconnects and is given a different letter." So swap the cable, reboot, and see what happens?
September 8, 2021 Author OK, cables swapped, and no unassigned drive is showing up. User shares are back. Do you think I need to run any sort of parity check?
September 8, 2021 Community Expert Parity hasn't got anything to do with cache, of course, but it sounds like you never got to finish the parity check you wanted to do earlier to test your new controller.
September 8, 2021 Author Just now, trurl said: "Parity hasn't got anything to do with cache of course, but sounds like you never got to finish the parity check you wanted to do earlier to test your new controller." It did complete, with 2020 errors. It was after the test that I noticed the cache issue.
September 8, 2021 Community Expert 5 hours ago, Sirkyle said: "non-correcting parity check" Doesn't look like it was non-correcting:
Sep 7 12:21:36 Tower kernel: mdcmd (36): check
Sep 7 12:21:36 Tower kernel: md: recovery thread: check P Q ...
Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206897064
Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206897072
...
Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206899896
Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206899904
Sep 7 12:35:11 Tower kernel: md: recovery thread: stopped logging
9 minutes ago, Sirkyle said: "2020 errors" You can't allow sync errors. Post new diagnostics, then run a NON-correcting parity check.
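Editor's note: the "PQ corrected" lines are what show that the check wrote to parity. Below is a minimal sketch for spotting them in a syslog, assuming the line format matches the excerpt quoted in this thread. The helper name `corrected_sectors` is hypothetical. Because the kernel reports "stopped logging" after a number of entries, the count it returns is a lower bound and will not match the total error count.

```python
import re

# Matches lines such as:
#   md: recovery thread: PQ corrected, sector=206897064
# The format is taken from the excerpt in this thread; other releases may differ.
CORRECTED = re.compile(r"md: recovery thread: (\w+) corrected, sector=(\d+)")

def corrected_sectors(lines):
    """Return the sector numbers from every 'corrected' line found."""
    sectors = []
    for line in lines:
        match = CORRECTED.search(line)
        if match:
            sectors.append(int(match.group(2)))
    return sectors

sample = [
    "Sep 7 12:21:36 Tower kernel: mdcmd (36): check",
    "Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206897064",
    "Sep 7 12:35:11 Tower kernel: md: recovery thread: PQ corrected, sector=206897072",
    "Sep 7 12:35:11 Tower kernel: md: recovery thread: stopped logging",
]

found = corrected_sectors(sample)
if found:
    print(f"parity was written: at least {len(found)} corrected sectors logged")
else:
    print("no 'corrected' lines found in these lines")
```

Any non-empty result means the run was a correcting check, whatever the checkbox was thought to be set to.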
September 8, 2021 Author Apologies, I could have sworn I had unchecked the option to write corrections to parity. New diagnostics are attached, and I'm starting the non-correcting check now. tower-diagnostics-20210908-1151.zip
September 8, 2021 Author I notice that when I start a parity check, a lot of my user shares go offline. I'll post more diagnostics when it's done, probably in 11 hours.
September 8, 2021 Community Expert Just now, Sirkyle said: "I notice when I start a parity check a lot of my user shares go offline. Ill post more diagnostics when its done, probably 11 hours" Go ahead and post them now; it's probably a symptom of a disconnected disk.
September 8, 2021 Community Expert
Sep 8 11:23:30 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
Unrelated, but you should close all browsers to your server after a reboot and start with a new browser session.
September 8, 2021 Community Expert
Sep 8 11:53:46 Tower kernel: XFS (sde1): log I/O error -5
Sep 8 11:53:46 Tower kernel: XFS (sde1): xfs_do_force_shutdown(0x2) called from line 1196 of file fs/xfs/xfs_log.c. Return address = 000000009975f2dd
Sep 8 11:53:46 Tower kernel: XFS (sde1): Log I/O Error Detected. Shutting down filesystem
Sep 8 11:53:46 Tower kernel: XFS (sde1): Please unmount the filesystem and rectify the problem(s)
Why are you using an SMR drive as cache? Model Family: Seagate BarraCuda 3.5 (SMR)
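Editor's note: the "-5" in the first XFS line is a negated Linux errno value, and errno 5 is EIO (input/output error), which fits a device that stopped responding. Here is a minimal sketch that pulls the device and error name out of such lines, assuming the message format shown in the excerpt above. The helper name `xfs_log_io_errors` is made up for illustration.

```python
import errno
import re

# Matches lines such as:  XFS (sde1): log I/O error -5
XFS_IO_ERROR = re.compile(r"XFS \((\w+)\): log I/O error (-?\d+)")

def xfs_log_io_errors(lines):
    """Return [(device, errno_name)] for each XFS log I/O error line."""
    found = []
    for line in lines:
        match = XFS_IO_ERROR.search(line)
        if match:
            code = abs(int(match.group(2)))  # kernel logs the errno negated
            name = errno.errorcode.get(code, str(code))
            found.append((match.group(1), name))
    return found

sample = [
    "Sep 8 11:53:46 Tower kernel: XFS (sde1): log I/O error -5",
    "Sep 8 11:53:46 Tower kernel: XFS (sde1): Log I/O Error Detected. Shutting down filesystem",
]

print(xfs_log_io_errors(sample))  # [('sde1', 'EIO')]
```

An EIO on the filesystem log says the write could not reach the device; it does not by itself say whether the cause is the cable, the controller port, or the drive.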
September 8, 2021 Author 19 minutes ago, trurl said:
Sep 8 11:53:46 Tower kernel: XFS (sde1): log I/O error -5
Sep 8 11:53:46 Tower kernel: XFS (sde1): xfs_do_force_shutdown(0x2) called from line 1196 of file fs/xfs/xfs_log.c. Return address = 000000009975f2dd
Sep 8 11:53:46 Tower kernel: XFS (sde1): Log I/O Error Detected. Shutting down filesystem
Sep 8 11:53:46 Tower kernel: XFS (sde1): Please unmount the filesystem and rectify the problem(s)
"Why are you using an SMR drive as cache? Model Family: Seagate BarraCuda 3.5 (SMR)"
To be honest, I'm not really sure what that is.
September 8, 2021 Community Expert The cache device dropped again. If you swapped both the power and SATA cables, it could be a device problem. You can also try connecting it to the onboard SATA ports, but I've used the same disk model with an LSI without issues.
September 8, 2021 Author So, what's the recommendation? I have some brand new drives around, but they're 4TB. I don't have any other small drives. I swapped the cables from the backplane to the LSI. I'm happy to format the cache drive and try some specific diagnostics for that. Or should I keep letting the non-correcting parity check roll?