ajeffco

Members
  • Posts

    169
  • Joined

  • Last visited

Everything posted by ajeffco

  1. While I know some people appear to still be having trouble with it, I wanted to give feedback that since the 6.6.1 upgrade I have experienced no crashes.
     root@tower:~# uptime
      00:54:53 up 10 days, 2:25, 1 user, load average: 0.01, 0.11, 0.24
     root@tower:~#
  2. I'm using it for backups also:
     Time Machine from 2 MacBook Pros
     Synology Hyper Backup
     Proxmox VM backups from 4 nodes, 13 VMs
     In addition to the media center stuff (including the sab download target). I'm using cache disks on my array; I wonder if that makes a difference.
  3. root@tower:~# uptime
      08:09:56 up 3 days, 9:40, 1 user, load average: 2.50, 2.59, 2.15
     No trouble since the 2 changes.
  4. Hello, Another night of no crash.
     root@tower:~# uptime
      11:28:15 up 1 day, 12:58, 1 user, load average: 0.01, 0.00, 0.00
     @Frank76 Sorry to hear you're still having problems. I've run Synology backups manually and let them run by schedule and haven't had trouble since the two changes. I've also run 2 MacBook Time Machine backups at the same time as a manual Synology backup scan each day since the changes, and it hasn't crashed. I'm fairly certain at least one of my crashes occurred when there was no I/O going to the unraid rig at all.
  5. Good Morning, My unraid rig has survived the night without crashing! I'll be watching it closely and will report back if anything happens.
     @Frank76 I also turned off the tunable (enable Direct IO), which had been enabled.
  6. Thanks Tom, Updating, will give feedback in the morning. Al
  7. Good Morning, The system crashed again overnight. There is nothing on the console beyond the login prompt. Fresh diagnostic attached. tower-diagnostics-20180927-0900.zip
  8. Changed. And the clients are fine during/after the change. So far.
  9. I disabled the direct_io tunable on the global shares page earlier this morning after getting everything back in place; I'll report back if anything happens.
  10. Thanks for responding...
      --- /etc/exports ---
      # See exports(5) for a description.
      # This file contains a list of all directories exported to other computers.
      # It is used by rpc.nfsd and rpc.mountd.
      "/mnt/user/downloads" -async,no_subtree_check,fsid=116 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/home" -async,no_subtree_check,fsid=118 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/movies" -async,no_subtree_check,fsid=119 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/tv" -async,no_subtree_check,fsid=117 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      --- end of exports ---
      There are likely reads and writes occurring to/from the server; I can't say the transfer sizes. The clients are as follows:
      /mnt/user/downloads : radarr, sonarr, sabnzbd and plex
      /mnt/user/home : plex
      /mnt/user/movies : radarr, plex
      /mnt/user/tv : sonarr, plex
      Not shown in exports and not shared in any form (NFS, SMB, AFP) is /mnt/user/synology, which is an rsync target for Synology Hyper Backup. According to my Synology Hyper Backup logs, the nightly backup had finished successfully about 2 minutes before the first timestamp on the kernel message. Also not shown in exports but shared via SMB is /mnt/user/sort, which would have no traffic at all at that time of day.
      On the clients the first indication of a problem is "03:48:07,970::ERROR::[misc:1634] Cannot change permissions of /downloads". My servers sync time via NTP, so it's in the same timeframe although not exactly the same.
      Here's the fstab entry for the /mnt/user/downloads export; the same settings are used on all NFS clients.
      tower:/mnt/user/downloads /downloads nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800,soft,_netdev 0 0
      Let me know if I can get you anything else, and thanks again. Al
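      For anyone comparing setups, a quick way to double-check this kind of configuration (assuming the stock nfs-utils tools are present on both ends; these are the standard commands, nothing unraid-specific) is to confirm what the server is actually exporting and what options the clients actually negotiated:

      # on the server: show the active export table with effective options
      exportfs -v
      # on a client: list the exports the server advertises
      showmount -e tower
      # on a client: show the mount parameters actually in use (vers, proto, timeo, etc.)
      nfsstat -m

      The negotiated options reported by nfsstat -m can differ from what's in fstab (the NFS version, for example), which is worth checking before chasing the soft/actimeo settings.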
  11. Stopping and starting the array does not resolve the issue; it takes a full reboot of the server.
  12. Forgot to mention. When this happens the web GUI works, ssh works. Just noticed also that /mnt/user is missing!
      root@tower:~# df
      df: /mnt/user: Transport endpoint is not connected
      Filesystem       1K-blocks        Used    Available Use% Mounted on
      rootfs            16367896      708240     15659656   5% /
      tmpfs                32768         428        32340   2% /run
      devtmpfs          16367912           0     16367912   0% /dev
      tmpfs             16448628           0     16448628   0% /dev/shm
      cgroup_root           8192           0         8192   0% /sys/fs/cgroup
      tmpfs               131072         524       130548   1% /var/log
      /dev/sda1          1000336      451424       548912  46% /boot
      /dev/loop0            8320        8320            0 100% /lib/modules
      /dev/loop1            4992        4992            0 100% /lib/firmware
      /dev/md1        2930266532  1651302708   1278054652  57% /mnt/disk1
      /dev/md3        3907018532  3466271080    439927960  89% /mnt/disk3
      /dev/md4        3907018532  3467094748    439014820  89% /mnt/disk4
      /dev/md5        3907018532  3485190236    421011284  90% /mnt/disk5
      /dev/md6        3907018532  3483896356    421228428  90% /mnt/disk6
      /dev/md7        2930266532  1100410644   1829428332  38% /mnt/disk7
      /dev/md9        3907018532  3474464252    431616164  89% /mnt/disk9
      /dev/md10       3907018532  3437929976    468108360  89% /mnt/disk10
      /dev/md11       3907018532  3192044520    713767432  82% /mnt/disk11
      /dev/md12       3907018532  3485902824    420362712  90% /mnt/disk12
      /dev/md13       3907018532  2182583140   1722297612  56% /mnt/disk13
      /dev/md14       3907018532  3485206232    420995080  90% /mnt/disk14
      /dev/md15       3907018532  3355538164    550484364  86% /mnt/disk15
      /dev/md16       3907018532  3368869408    537176896  87% /mnt/disk16
      /dev/md17       2930266532  1085994852   1843976364  38% /mnt/disk17
      /dev/sds1        878906148    43783224    834069176   5% /mnt/cache
      shfs           59582040512 43729910024  15836218248  74% /mnt/user0
      /dev/md2        3907018532     7210884   3898767788   1% /mnt/disk2
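      For what it's worth, "Transport endpoint is not connected" on a FUSE mount generally means the userspace process backing it (shfs, in unraid's case) has died while the mount point is still registered. A rough way to confirm that (a sketch, assuming the process is named shfs and /mnt/user is the affected mount) is:

      # is the user-share FUSE process still alive?
      ps aux | grep [s]hfs
      # is the stale mount still listed?
      mount | grep /mnt/user
      # lazy-detach the dead mount point (cleanup only; it does not restart shfs --
      # in my case a full reboot was still required, as noted above)
      fusermount -uz /mnt/user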
  13. Hello, Running unraid 6.6.0 stable on a server that mostly serves NFS shares. NFS appears to be crashing; below is the first indication of a problem in the log file. All clients lock up with "nfs: server tower not responding, timed out" from that point forward. I have a coworker running unraid who has had the same issue, and while we initially thought it was just NFS, all CIFS and rsync shares become unavailable as well when this happens. When this happens unraid becomes 100% unusable for file operations for any client! This appears to have been reported already at [ 6.6.0-RC4 ] NFS CRASHES; I submitted another report since this is 6.6.0 stable.
      HOW TO REPRODUCE: Reboot and just wait. My coworker has had this happen a few times; this is my first occurrence.
      Sep 26 03:48:41 tower kernel: ------------[ cut here ]------------
      Sep 26 03:48:41 tower kernel: nfsd: non-standard errno: -103
      Sep 26 03:48:41 tower kernel: WARNING: CPU: 2 PID: 12478 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
      Sep 26 03:48:41 tower kernel: Modules linked in: md_mod nfsd lockd grace sunrpc bonding mlx4_en mlx4_core igb sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp ast ttm kvm_intel drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd agpgart glue_helper intel_cstate intel_uncore ipmi_ssif intel_rapl_perf syscopyarea mpt3sas i2c_i801 i2c_algo_bit i2c_core ahci sysfillrect pcc_cpufreq libahci sysimgblt fb_sys_fops raid_class scsi_transport_sas wmi acpi_power_meter ipmi_si acpi_pad button [last unloaded: md_mod]
      Sep 26 03:48:41 tower kernel: CPU: 2 PID: 12478 Comm: nfsd Not tainted 4.18.8-unRAID #1
      Sep 26 03:48:41 tower kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.0a 02/08/2018
      Sep 26 03:48:41 tower kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
      Sep 26 03:48:41 tower kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b 9a 18 a0 c6 05 9c 06 01 00 01 e8 8a ec ec e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25
      Sep 26 03:48:41 tower kernel: RSP: 0018:ffffc9000c743db8 EFLAGS: 00010286
      Sep 26 03:48:41 tower kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
      Sep 26 03:48:41 tower kernel: RDX: 0000000000000000 RSI: ffff88087fc96470 RDI: ffff88087fc96470
      Sep 26 03:48:41 tower kernel: RBP: ffffc9000c743e08 R08: 0000000000000003 R09: ffffffff82202400
      Sep 26 03:48:41 tower kernel: R10: 000000000000087f R11: 000000000000a9e4 R12: ffff8802b01ea808
      Sep 26 03:48:41 tower kernel: R13: ffff8807febb2a58 R14: 0000000000000002 R15: ffffffffa01892a0
      Sep 26 03:48:41 tower kernel: FS: 0000000000000000(0000) GS:ffff88087fc80000(0000) knlGS:0000000000000000
      Sep 26 03:48:41 tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Sep 26 03:48:41 tower kernel: CR2: 00001501e0097000 CR3: 0000000001e0a005 CR4: 00000000001606e0
      Sep 26 03:48:41 tower kernel: Call Trace:
      Sep 26 03:48:41 tower kernel: nfsd_open+0x15e/0x17c [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd_write+0x4c/0xaa [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd3_proc_write+0xad/0xdb [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
      Sep 26 03:48:41 tower kernel: svc_process+0x4b5/0x666 [sunrpc]
      Sep 26 03:48:41 tower kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd+0xeb/0x142 [nfsd]
      Sep 26 03:48:41 tower kernel: kthread+0x10b/0x113
      Sep 26 03:48:41 tower kernel: ? kthread_flush_work_fn+0x9/0x9
      Sep 26 03:48:41 tower kernel: ret_from_fork+0x35/0x40
      Sep 26 03:48:41 tower kernel: ---[ end trace 0df913a547279c0d ]---
      tower-diagnostics-20180926-0904.zip
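      For the next occurrence, a minimal set of things worth capturing on the server before rebooting (assuming the stock command-line tools are present; nothing here is unraid-specific) would be:

      # is nfsd still registered with the portmapper?
      rpcinfo -p localhost
      # how many nfsd threads does the kernel think are running?
      cat /proc/fs/nfsd/threads
      # any TCP sessions still open on the NFS port?
      netstat -an | grep :2049
      # the kernel messages around the warning above
      dmesg | tail -n 100

      That should help tell whether nfsd itself has died or whether it is stuck returning errors because the underlying /mnt/user mount has gone away (see the "Transport endpoint is not connected" output in the previous post).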
  14. Yea, used an old laptop HDD. Thanks, Al
  15. Yeah, I know all that. And after I wrote that I thought "Well, that's basically not unRAID anymore, it's more like Tom's Docker Manager" :) Thanks tho.
  16. I hate to resurrect a somewhat old thread, but it's the only one I could find. I would *LOVE* to have the ability to create a cache-only unRAID rig just to run docker/VM on a multi-disk BTRFS cache pool, without having to create/waste data disks. I'm using a 30-drive array now, with 4 SSDs as cache only and just the docker/VM images on the cache pool. I'd move those 4 SSDs to a new rig running a cache-only setup in a heartbeat.
  17. Yeah, I'm not running any VMs or using it in that fashion; it's media storage only. But I have just copied a very large amount of data to it, had the first few drives at 98/99%, and then added drives and am currently balancing out the drive utilization. I figured after I'm done it would be good to defragment the drives that had gotten near 100%. Thanks Johnnie, Al
  18. Hello, Using the BTRFS file system for data drives, does anyone know if there is any danger in running "btrfs filesystem defragment /mnt/disk#"? Thank you, Al
  19. I've completed the process of reformatting my drives to xfs. When I emptied disk 2, as expected disk 3 emptied also. Interestingly, when I mounted the original 1TB disk to look at it, it also shows empty instead of showing as a 1TB drive reporting a 4TB size. Not sure why, out of 24 total drives moved from the Linux/BTRFS machine, these 3 whacked out. Unfortunately, since I didn't capture a syslog we'll never know. Lesson learned. Thanks again for the help. Al
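      For reference, the invocation I'd expect to use on a whole data disk (a sketch; the disk number is a placeholder, and the caveat is the stock btrfs one, not anything unraid-specific):

      # recurse through the filesystem and report each file as it's defragmented
      btrfs filesystem defragment -r -v /mnt/disk3
      # see how data/metadata allocation looks before and after
      btrfs filesystem df /mnt/disk3

      Worth knowing: without -r the command only operates on the metadata of the path you name, and defragmenting unshares extents, so if any snapshots or reflinked copies exist on that disk the space usage can grow noticeably.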
  20. Ok. I've started migrating data off of disk 2. Will be interesting to see what happens to disk 3. Thanks jonnie.black
  21. I'm in the process of migrating data from the btrfs-formatted drives to xfs-formatted drives. When I get to the last 3 drives that are part of the original issue, is there any advice as to what I should do? Al
  22. I had a thought. Assuming Drive 3 has been in a bad state since it was brought in, does that mean any data written to the drive since then wasn't lost? Or would it be gone regardless? I used rsync to copy from Linux/BTRFS/SMB to unraid, and rsync verified everything written. That, however, would go to /mnt/user/wherever... Just trying to figure out what I might've lost that might need to be recovered.
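      For context, a rough sketch of the kind of per-disk rsync copy that fits this migration (the disk numbers are placeholders, and the flags are one reasonable choice, not a prescribed procedure):

      # copy everything from a source (btrfs) disk to a target (xfs) disk,
      # preserving permissions, ownership, and timestamps
      rsync -avh --progress /mnt/disk2/ /mnt/disk5/
      # second pass with checksums; --dry-run only lists differences without copying,
      # so any output means something didn't transfer cleanly
      rsync -avhc --dry-run /mnt/disk2/ /mnt/disk5/

      Copying disk-to-disk like this (rather than mixing /mnt/user and /mnt/diskX paths in the same command) keeps the user-share layer out of the transfer.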
  23. Oh. I'm not in any rush, just didn't know if there was a "formal" process for them to keep track of these types of issues.