Shares are gone again.


tucansam


root@ffs2:/mnt# ls
/bin/ls: cannot access 'user': Transport endpoint is not connected
RecycleBin/  disk11/  disk14/  disk4/  disk7/  disks/    user0/
disk1/       disk12/  disk2/   disk5/  disk8/  remotes/
disk10/      disk13/  disk3/   disk6/  disk9/  user/
root@ffs2:/mnt# ls user
/bin/ls: cannot access 'user': Transport endpoint is not connected
root@ffs2:/mnt#
 

 

All shares gone, not accessible at all.  'ls /mnt/user' fails.  Happened a few weeks ago too.  Reboot fixed it.  Never did find out what caused it.
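For anyone who wants to catch this state automatically: "Transport endpoint is not connected" is the Linux error text for ENOTCONN, which on a FUSE mount usually means the process backing the mount has gone away. Below is a minimal sketch of a check that tells that case apart from a path that is simply missing. The function name `mount_status` and the injectable `stat` parameter are made up for illustration and are not part of Unraid.

```python
import errno
import os


def mount_status(path, stat=os.stat):
    """Classify a mount point as 'ok', 'missing', or 'stale'.

    'stale' means stat() failed with ENOTCONN, the errno behind
    "Transport endpoint is not connected".
    The stat parameter exists only so the check can be exercised
    without a real broken mount (an assumption of this sketch).
    """
    try:
        stat(path)
    except FileNotFoundError:
        # The path does not exist at all.
        return "missing"
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # The mount point is there, but nothing is serving it.
            return "stale"
        raise
    return "ok"


if __name__ == "__main__":
    for candidate in ("/mnt/user", "/mnt/user0"):
        print(candidate, mount_status(candidate))
```

Run from cron or a user script, a result of `stale` for `/mnt/user` would line up with the `ls` failure shown above and gives a timestamp for when the shares actually dropped.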

 

Diags attached.

 

My SSD, mounted with unassigned devices, is fully accessible.

 

Stopping and starting the array had no effect; it takes a reboot to get it back up.

 

Any ideas?

 

 

ffs2-diagnostics-20210905-1528.zip

Edited by tucansam
4 hours ago, Squid said:

Do you have an expander?  On a quick glance, it looks like everything attached to it dropped and then reconnected...

 

As a matter of fact, I do have an expander.  However, the first time this problem occurred was before the 2nd HBA for the expander was installed.  Right now nothing is connected to the expander.

 

I have an LSI HBA and all of the array members, except the parity disks, are connected to that HBA.  I have a second LSI that will eventually connect to the expander.  The expander is reserved for when I get a JBOD completed for expansion.

 

Edited by tucansam
  • 4 months later...

This has now happened again.  The array stopped on its own at some point during the night.  The GUI still worked, but restarting the array had no effect: the web interface let me click the button to start the array, but after several minutes of waiting it never started.  Rebooting from the GUI had no effect either.  I had to 'sudo reboot' from the shell, and now it's doing a parity check.
 

This is probably the fourth or fifth time now that this has occurred.  At the suggestion of you good folks, I have Unraid logging to a syslog server, so now I have logs after a reboot.

 

Here's a section that looks pertinent:

 

Feb  1 17:47:14 ffs2 emhttpd: read SMART /dev/sdl
Feb  1 17:47:33 ffs2 emhttpd: read SMART /dev/sdp
Feb  1 17:47:42 ffs2 emhttpd: read SMART /dev/sdv
Feb  1 17:47:49 ffs2 emhttpd: read SMART /dev/sdt
Feb  1 17:47:56 ffs2 emhttpd: read SMART /dev/sdx
Feb  1 17:48:03 ffs2 emhttpd: read SMART /dev/sdu
Feb  1 17:48:15 ffs2 emhttpd: read SMART /dev/sdj
Feb  1 17:48:24 ffs2 emhttpd: read SMART /dev/sdb
Feb  1 17:48:37 ffs2 emhttpd: read SMART /dev/sdc
Feb  1 17:49:48 ffs2 emhttpd: read SMART /dev/sdw
Feb  1 18:37:55 ffs2 ntpd[1686]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Feb  1 18:37:55 ffs2 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2002.0 try https://www.rsyslog.com/e/2359 ]
Feb  1 20:33:05 ffs2 emhttpd: spinning down /dev/sdr
Feb  1 20:33:05 ffs2 emhttpd: spinning down /dev/sdq
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdm
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdh
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdj
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdg
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdw
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdd
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdt
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sde
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdu
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdb
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdx
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdf
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdv
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdc
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sds
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdn
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdl
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdi
Feb  1 20:33:12 ffs2 emhttpd: spinning down /dev/sdp
Feb  2 03:00:01 ffs2 CA Backup/Restore: It doesn't appear that the array is running.  Exiting CA Backup
Feb  2 07:00:01 ffs2 crond[1707]: exit status 3 from user root /usr/local/sbin/mover &> /dev/null
Feb  2 09:04:10 ffs2 webGUI: Successful login user root from 192.168.0.105
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874446): /usr/local/sbin/set_ncq sdq 1
Feb  2 09:04:28 ffs2 kernel: mdcmd (37): set md_num_stripes 768
Feb  2 09:04:28 ffs2 kernel: mdcmd (38): set md_queue_limit 80
Feb  2 09:04:28 ffs2 kernel: mdcmd (39): set md_sync_limit 5
Feb  2 09:04:28 ffs2 kernel: mdcmd (40): set md_write_method
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874447): echo 128 > /sys/block/sdq/queue/nr_requests
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874448): /usr/local/sbin/set_ncq sdp 1
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874449): echo 128 > /sys/block/sdp/queue/nr_requests
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874450): /usr/local/sbin/set_ncq sde 1
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874451): echo 128 > /sys/block/sde/queue/nr_requests
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874452): /usr/local/sbin/set_ncq sdf 1
Feb  2 09:04:28 ffs2 emhttpd: shcmd (874453): echo 128 > /sys/block/sdf/queue/nr_requests
 

Any ideas?
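Since the full log covers more than a week, one way to narrow down when things went quiet is to scan it for long silences between consecutive entries, like the jump from 20:33:12 on Feb 1 to 03:00:01 on Feb 2 in the excerpt above. This is only a rough sketch: the names `parse_stamp` and `find_gaps`, the 30-minute threshold, and the year are all assumptions (classic syslog timestamps carry no year, so one has to be supplied).

```python
import re
from datetime import datetime, timedelta

# Matches the leading "Feb  1 17:47:14 " style timestamp of a syslog line.
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year=2022, threshold=timedelta(minutes=30)):
    """List (gap, line_before, line_after) for every silence >= threshold.

    year is an assumption of this sketch; the log itself does not state it.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip blank or malformed lines
        if prev_time is not None and stamp - prev_time >= threshold:
            gaps.append((stamp - prev_time, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = stamp, line
    return gaps


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for gap, before, after in find_gaps(handle):
            print(f"{gap} of silence\n  last:  {before}\n  first: {after}")
```

A gap by itself proves nothing, since an idle server with spun-down disks logs very little, but the last line before a long silence is a reasonable place to start reading.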

 

17 hours ago, tucansam said:

Yes.  Right at 0900 is when I noticed the array was stopped.  Everything up to the point of the server time error looked normal.  The log covers more than a week and I didn't want to post it all, but I can.

Not likely to tell anything.

 

Latest diagnostics might tell us at least your current configuration and hardware.

1 hour ago, trurl said:

Are you sure you have adequate power? Cooling?

 

Yeah, for sure cooling is fine.  The main server's PS is a bit old, been running for years.  But it's a bulletproof Corsair model and I've got no indications of power issues.  Tons of drives, and even during parity checks everything just hums along; I have never had issues before.

 

The last major change was the addition of six disks on an SAS expander.  But now that I've added those disks to the array, I can't exactly undo it if it broke the system somehow.....

 

