timekiller · Posted November 19, 2021

So I finally took everyone's advice here and replaced my Marvell-based controller cards (IO Crest 16 Port) with 2 LSI 9201-16i cards. In addition I needed to shuffle some disks around, so when I installed the new cards I also wound up having to do a new config and start a parity rebuild. It's been running for about 33 hours and everything was going great until about 30 minutes ago, when I got an error deleting a file. Investigation shows that I lost /mnt/user - "Transport endpoint is not connected". Interestingly, /mnt/user0 is still connected and the array is accessible from there. Of course all of my Docker containers and shares use /mnt/user, so now the entire server is effectively offline. I stopped all my Docker containers to hopefully avoid further issues there. I assume a reboot will fix this, but 1) I'd like to know what happened here, and 2) I don't want to have to restart the parity rebuild. There are currently an estimated 9 hours left and it appears to be running fine. Do I have any options beyond rebooting and starting over, or going 9+ hours without my server? Diagnostics attached: storage-diagnostics-20211119-0937.zip
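For context on the symptom: "Transport endpoint is not connected" (ENOTCONN) is what accessing a FUSE mount typically returns when the userspace daemon behind it (shfs, in Unraid's case) has died while the kernel mount entry remains. A minimal, generic sketch of how one might probe whether a mount point still answers; the paths used are illustrative, not from the diagnostics:

```shell
# A dead FUSE mount stays listed in /proc/mounts, but any syscall against
# it fails with ENOTCONN because the backing daemon is gone.
# Probe a path and report whether its mount still responds:
check_mount() {
  if stat "$1" >/dev/null 2>&1; then
    echo "$1: reachable"
  else
    echo "$1: not reachable (backing daemon may have died)"
  fi
}

check_mount /tmp   # a healthy local path reports "reachable"
```

On the affected server, `check_mount /mnt/user` would have reported unreachable while `check_mount /mnt/user0` still answered, matching what the poster observed.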
JorgeB · Posted November 19, 2021

AFAIK only a reboot will fix it. Also, it was caused by NFS, so disable it if you don't need it, or change everything over to SMB.
timekiller · Posted November 19, 2021 (Author)

2 minutes ago, JorgeB said: AFAIK only a reboot will fix it. Also, it was caused by NFS, so disable it if you don't need it, or change everything over to SMB.

My desktop is Linux, so I definitely need NFS. I've never seen NFS take the array offline before; any idea how this happened?
JorgeB · Posted November 19, 2021

Not really, just that NFS crashed, bringing down the user shares.
timekiller · Posted November 19, 2021 (Author)

Just now, JorgeB said: Not really, just that NFS crashed, bringing down the user shares.

Thanks, at least it's not hardware-related this time! So I can diagnose this better in the future, where did you find that? I'm looking through the diagnostics file and don't see it.
JorgeB · Posted November 19, 2021

Nov 19 09:28:27 Storage kernel: Call Trace:
Nov 19 09:28:27 Storage kernel: fh_getattr+0x45/0x5f [nfsd]
Nov 19 09:28:27 Storage kernel: fill_post_wcc+0x2c/0x94 [nfsd]
Nov 19 09:28:27 Storage kernel: fh_unlock+0x12/0x33 [nfsd]
Nov 19 09:28:27 Storage kernel: nfsd3_proc_rmdir+0x4a/0x4f [nfsd]
Nov 19 09:28:27 Storage kernel: nfsd_dispatch+0xb0/0x11e [nfsd]
Nov 19 09:28:27 Storage kernel: svc_process+0x3dd/0x546 [sunrpc]
Nov 19 09:28:27 Storage kernel: ? nfsd_svc+0x27e/0x27e [nfsd]
Nov 19 09:28:27 Storage kernel: nfsd+0xef/0x146 [nfsd]
Nov 19 09:28:27 Storage kernel: ? nfsd_destroy+0x57/0x57 [nfsd]
Nov 19 09:28:27 Storage kernel: kthread+0xe5/0xea
Nov 19 09:28:27 Storage kernel: ? __kthread_bind_mask+0x57/0x57
Nov 19 09:28:27 Storage kernel: ret_from_fork+0x22/0x30
Nov 19 09:28:27 Storage kernel: ---[ end trace 83e36f2bb8ca0fa2 ]---

nfsd is the NFS daemon.
timekiller · Posted November 19, 2021 (Author)

9 minutes ago, JorgeB said: nfsd is the NFS daemon.

Yup, I saw the call trace and my eyes skimmed right past the nfsd stuff. Thanks!
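For anyone doing the same skim through a diagnostics syslog, one way to avoid missing the module names is to let grep pull them out of the call trace. A sketch, assuming the syslog format shown above; the sample file below just reproduces a few lines from this thread's trace, and the path /tmp/syslog_sample.txt is illustrative:

```shell
# Recreate a small sample of the syslog from this thread so the pipeline
# below is self-contained; on a real server you'd point grep at the
# syslog inside the diagnostics zip instead.
cat > /tmp/syslog_sample.txt <<'EOF'
Nov 19 09:28:27 Storage kernel: Call Trace:
Nov 19 09:28:27 Storage kernel: fh_getattr+0x45/0x5f [nfsd]
Nov 19 09:28:27 Storage kernel: svc_process+0x3dd/0x546 [sunrpc]
EOF

# Show the frames after "Call Trace:" and list the kernel modules they
# implicate (the bracketed suffixes); here that's nfsd and sunrpc.
grep -A 12 'Call Trace' /tmp/syslog_sample.txt | grep -o '\[[a-z_]*\]' | sort -u
```

With the full trace from the thread, this prints `[nfsd]` and `[sunrpc]`, which points straight at the NFS daemon without reading every frame.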