Everything posted by ajeffco

  1. While I know some people appear to still be having trouble with it, I wanted to give feedback that since the 6.6.1 upgrade I have experienced no crashes.

     root@tower:~# uptime
      00:54:53 up 10 days, 2:25, 1 user, load average: 0.01, 0.11, 0.24
     root@tower:~#
  2. I'm using it for backups also: Time Machine from 2 MacBook Pros, Synology Hyper Backup, and Proxmox VM backups from 4 nodes (13 VMs), in addition to the media center stuff (including the sab download target). I'm using cache disks on my array; I wonder if that makes a difference.
  3. root@tower:~# uptime
      08:09:56 up 3 days, 9:40, 1 user, load average: 2.50, 2.59, 2.15

     No trouble since the 2 changes.
  4. Hello, another night of no crashes.

     root@tower:~# uptime
      11:28:15 up 1 day, 12:58, 1 user, load average: 0.01, 0.00, 0.00

     @Frank76 Sorry to hear you're still having problems. I've run Synology backups manually and let them run by schedule, and haven't had trouble since the two changes. I've also run 2 MacBook Time Machine backups at the same time as a manual Synology backup scan each day since the changes, and it hasn't crashed. I'm fairly certain that at least one of my crashes occurred when there was no I/O going to the unraid rig.
  5. Good Morning, My unraid rig has survived the night without crashing! I'll be watching it closely and will report back if anything happens. @Frank76 I also turned off the Tunable (enable Direct IO), which had been enabled.
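     Side note in case it helps anyone following along: after flipping the tunable in the GUI I like to double-check that the change actually persisted. This is only a sketch, assuming the global share settings are stored on the flash at /boot/config/share.cfg (the exact file and key names may differ between releases):

     # look for any Direct IO related setting in the persisted global share config
     root@tower:~# grep -i direct /boot/config/share.cfg
     # confirm when the config file was last written
     root@tower:~# ls -l /boot/config/share.cfg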
  6. Thanks Tom, Updating, will give feedback in the morning. Al
  7. Good Morning, The system crashed again overnight. There is nothing on the console beyond the login prompt. Fresh diagnostic attached. tower-diagnostics-20180927-0900.zip
  8. Changed. And the clients are fine during/after the change. So far...
  9. I disabled the direct_io tunable on the global shares page earlier this morning after getting everything back in place, will report back any events.
  10. Thanks for responding...

      --- /etc/exports ---
      # See exports(5) for a description.
      # This file contains a list of all directories exported to other computers.
      # It is used by rpc.nfsd and rpc.mountd.
      "/mnt/user/downloads" -async,no_subtree_check,fsid=116 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/home" -async,no_subtree_check,fsid=118 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/movies" -async,no_subtree_check,fsid=119 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      "/mnt/user/tv" -async,no_subtree_check,fsid=117 10.10.10.0/24(sec=sys,rw,no_root_squash,insecure)
      --- end of exports ---

      There are likely reads and writes occurring to/from the server; I can't say what the transfer sizes are. The clients are as follows:
      /mnt/user/downloads : radarr, sonarr, sabnzbd and plex
      /mnt/user/home : plex
      /mnt/user/movies : radarr, plex
      /mnt/user/tv : sonarr, plex

      Not shown in exports and not shared in any form (NFS, SMB, AFP) is /mnt/user/synology, which is an rsync target for Synology Hyper Backup. According to my Synology Hyper Backup logs, the nightly backup had finished successfully about 2 minutes before the first timestamp on the kernel message. Also not shown in exports but shared via SMB is /mnt/user/sort, which would have had no traffic at all at that time of day.

      On the clients the first indication of a problem is "03:48:07,970::ERROR::[misc:1634] Cannot change permissions of /downloads". My servers sync time via NTP, so it's in the same timeframe although not exactly the same.

      Here's the fstab entry for the /mnt/user/downloads export; the same settings are used on all NFS clients.
      tower:/mnt/user/downloads /downloads nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800,soft,_netdev 0 0

      Let me know if I can get you anything else, and thanks again. Al
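      For completeness, here's roughly how I double-check the exports and a client mount by hand when things look suspect (hostnames, paths, and options are just the ones from my setup above):

      # on the server: confirm what nfsd is actually exporting right now
      root@tower:~# exportfs -v
      root@tower:~# showmount -e localhost

      # on a client: confirm the share is visible, then test-mount it with the same options
      $ showmount -e tower
      $ sudo mount -t nfs -o noatime,nolock,tcp,actimeo=1800,soft tower:/mnt/user/downloads /mnt/test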
  11. Stopping and starting the array does not resolve the issue; it takes a full reboot of the server.
  12. Forgot to mention: when this happens the web GUI works and ssh works. I also just noticed that /mnt/user is missing!

      root@tower:~# df
      df: /mnt/user: Transport endpoint is not connected
      Filesystem 1K-blocks Used Available Use% Mounted on
      rootfs 16367896 708240 15659656 5% /
      tmpfs 32768 428 32340 2% /run
      devtmpfs 16367912 0 16367912 0% /dev
      tmpfs 16448628 0 16448628 0% /dev/shm
      cgroup_root 8192 0 8192 0% /sys/fs/cgroup
      tmpfs 131072 524 130548 1% /var/log
      /dev/sda1 1000336 451424 548912 46% /boot
      /dev/loop0 8320 8320 0 100% /lib/modules
      /dev/loop1 4992 4992 0 100% /lib/firmware
      /dev/md1 2930266532 1651302708 1278054652 57% /mnt/disk1
      /dev/md3 3907018532 3466271080 439927960 89% /mnt/disk3
      /dev/md4 3907018532 3467094748 439014820 89% /mnt/disk4
      /dev/md5 3907018532 3485190236 421011284 90% /mnt/disk5
      /dev/md6 3907018532 3483896356 421228428 90% /mnt/disk6
      /dev/md7 2930266532 1100410644 1829428332 38% /mnt/disk7
      /dev/md9 3907018532 3474464252 431616164 89% /mnt/disk9
      /dev/md10 3907018532 3437929976 468108360 89% /mnt/disk10
      /dev/md11 3907018532 3192044520 713767432 82% /mnt/disk11
      /dev/md12 3907018532 3485902824 420362712 90% /mnt/disk12
      /dev/md13 3907018532 2182583140 1722297612 56% /mnt/disk13
      /dev/md14 3907018532 3485206232 420995080 90% /mnt/disk14
      /dev/md15 3907018532 3355538164 550484364 86% /mnt/disk15
      /dev/md16 3907018532 3368869408 537176896 87% /mnt/disk16
      /dev/md17 2930266532 1085994852 1843976364 38% /mnt/disk17
      /dev/sds1 878906148 43783224 834069176 5% /mnt/cache
      shfs 59582040512 43729910024 15836218248 74% /mnt/user0
      /dev/md2 3907018532 7210884 3898767788 1% /mnt/disk2
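      For anyone hitting the same thing: "Transport endpoint is not connected" on a mount point usually means the FUSE process behind it has died. A couple of quick checks I'd run at that point (assuming the user-share filesystem is the shfs process, which is what the df output above suggests):

      # is the user-share FUSE process still running?
      root@tower:~# pgrep -a shfs
      # is /mnt/user still listed as mounted even though it's unreachable?
      root@tower:~# grep '/mnt/user' /proc/mounts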
  13. Hello, I'm running unraid 6.6.0 stable on a server that serves mostly NFS shares, and NFS appears to be crashing. Below is the first indication of a problem in the log file; from that point forward all clients lock up with "nfs: server tower not responding, timed out". I have a coworker running unraid who has had the same issue, and while we initially thought it was just NFS, all CIFS and rsync shares become unavailable as well when this happens. When this happens, unraid becomes 100% unusable for file operations for any client! This appears to have been reported already at [ 6.6.0-RC4 ] NFS CRASHES; I submitted another report since this is 6.6.0 stable.

      HOW TO REPRODUCE: Reboot and just wait. My coworker has had this happen a few times; this is my first occurrence.

      Sep 26 03:48:41 tower kernel: ------------[ cut here ]------------
      Sep 26 03:48:41 tower kernel: nfsd: non-standard errno: -103
      Sep 26 03:48:41 tower kernel: WARNING: CPU: 2 PID: 12478 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
      Sep 26 03:48:41 tower kernel: Modules linked in: md_mod nfsd lockd grace sunrpc bonding mlx4_en mlx4_core igb sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp ast ttm kvm_intel drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd agpgart glue_helper intel_cstate intel_uncore ipmi_ssif intel_rapl_perf syscopyarea mpt3sas i2c_i801 i2c_algo_bit i2c_core ahci sysfillrect pcc_cpufreq libahci sysimgblt fb_sys_fops raid_class scsi_transport_sas wmi acpi_power_meter ipmi_si acpi_pad button [last unloaded: md_mod]
      Sep 26 03:48:41 tower kernel: CPU: 2 PID: 12478 Comm: nfsd Not tainted 4.18.8-unRAID #1
      Sep 26 03:48:41 tower kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.0a 02/08/2018
      Sep 26 03:48:41 tower kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
      Sep 26 03:48:41 tower kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b 9a 18 a0 c6 05 9c 06 01 00 01 e8 8a ec ec e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25
      Sep 26 03:48:41 tower kernel: RSP: 0018:ffffc9000c743db8 EFLAGS: 00010286
      Sep 26 03:48:41 tower kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
      Sep 26 03:48:41 tower kernel: RDX: 0000000000000000 RSI: ffff88087fc96470 RDI: ffff88087fc96470
      Sep 26 03:48:41 tower kernel: RBP: ffffc9000c743e08 R08: 0000000000000003 R09: ffffffff82202400
      Sep 26 03:48:41 tower kernel: R10: 000000000000087f R11: 000000000000a9e4 R12: ffff8802b01ea808
      Sep 26 03:48:41 tower kernel: R13: ffff8807febb2a58 R14: 0000000000000002 R15: ffffffffa01892a0
      Sep 26 03:48:41 tower kernel: FS: 0000000000000000(0000) GS:ffff88087fc80000(0000) knlGS:0000000000000000
      Sep 26 03:48:41 tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Sep 26 03:48:41 tower kernel: CR2: 00001501e0097000 CR3: 0000000001e0a005 CR4: 00000000001606e0
      Sep 26 03:48:41 tower kernel: Call Trace:
      Sep 26 03:48:41 tower kernel: nfsd_open+0x15e/0x17c [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd_write+0x4c/0xaa [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd3_proc_write+0xad/0xdb [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
      Sep 26 03:48:41 tower kernel: svc_process+0x4b5/0x666 [sunrpc]
      Sep 26 03:48:41 tower kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
      Sep 26 03:48:41 tower kernel: nfsd+0xeb/0x142 [nfsd]
      Sep 26 03:48:41 tower kernel: kthread+0x10b/0x113
      Sep 26 03:48:41 tower kernel: ? kthread_flush_work_fn+0x9/0x9
      Sep 26 03:48:41 tower kernel: ret_from_fork+0x35/0x40
      Sep 26 03:48:41 tower kernel: ---[ end trace 0df913a547279c0d ]---

      tower-diagnostics-20180926-0904.zip
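      A side note for anyone reading the trace: errno 103 is ECONNABORTED ("Software caused connection abort"), which is why nfsd flags it as non-standard; that suggests the filesystem underneath the export returned the error rather than NFS itself. A quick way to confirm the errno name on any Linux box with Python handy:

      $ python3 -c "import errno, os; print(errno.errorcode[103], os.strerror(103))"
      ECONNABORTED Software caused connection abort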
  14. Yea, used an old laptop HDD. Thanks, Al
  15. Yeah, I know all that. And after I wrote that I thought, "Well, that's basically not unRAID anymore, it's more like Tom's Docker Manager" :) Thanks tho.
  16. I hate to resurrect a somewhat old thread, but it's the only one I could find. I would *LOVE* to have the ability to create a cache-drive-only unRAID rig just to run Docker/VMs on a multi-disk BTRFS cache pool, without having to create/waste data disks. I'm using a 30-drive array now, with 4 SSDs as cache only and just the Docker/VM images on the cache pool. I'd move those 4 SSDs in a heartbeat to a new rig running a cache-only setup.
  17. :sigh: I meant is there a way to have PRECLEAR do this... And yeah, as I mentioned, this drive just came out of another server that had heavy usage; the 2TB is nearly full, but I don't need the data since it was already moved (via teracopy) to the unraid system I'm moving the drive to. It has had preclear run on it in the not too distant past. If it were a new drive, I'd preclear it, no question.
  18. I've got a drive that came from a second unraid server. Is there a way to have unraid just clear the drive and write the signature, without doing the full reads/writes? The drive has been in service for a while and I trust it already; I just need it moved between rigs.
  19. Thanks Joe for looking at this for me. Yea, the temp thing is my fault. My unraid setup is in an Antec P182 case. I hadn't been using the lower drive cage yet. I put them into the lower drive cage, and didn't realize that the fan that cools them had died at some point. I was watching the clear on screen though, and started looking at the system to see what the deal was. I should have shut it down first and then looked. When I looked back up, it was cooking. So, shut it down, and replaced the 120mm fan, and it peaked at 38C during the last preclear. Thanks again.
  20. You'll help me make the report even better. Glad I could help. Ok, it finally finished... Here's the output of the clears, nearly the same as before.

      /dev/sdb
      ========================================================================1.6
      == invoked as: ./preclear_disk.sh /dev/sdb
      == ST3500641NS
      == Disk /dev/sdb has been successfully precleared
      == with a starting sector of 63
      == Ran 1 cycle
      ==
      == Using :Read block size = 8225280 Bytes
      == Last Cycle's Pre Read Time : 3:11:47 (43 MB/s)
      == Last Cycle's Zeroing time : 3:02:53 (45 MB/s)
      == Last Cycle's Post Read Time : 7:47:55 (17 MB/s)
      == Last Cycle's Total Time : 14:03:44
      ==
      == Total Elapsed Time 14:03:45
      ==
      == Disk Start Temperature: 35C
      ==
      == Current Disk Temperature: 37C,
      ==
      ============================================================================
      ** Changed attributes in files: /tmp/smart_start_sdb /tmp/smart_finish_sdb
      ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
      Airflow_Temperature_Cel = 63 65 45 In_the_past 37
      Temperature_Celsius = 37 35 0 ok 37
      Hardware_ECC_Recovered = 61 64 0 ok 110939896
      Current_Pending_Sector = 1 1 0 near_thresh 4294967294
      Offline_Uncorrectable = 1 1 0 near_thresh 4294967294
      No SMART attributes are FAILING_NOW
      4294967294 sectors were pending re-allocation before the start of the preclear.
      4294967294 sectors were pending re-allocation after pre-read in cycle 1 of 1.
      4294967294 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
      4294967294 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
      0 sectors had been re-allocated before the start of the preclear.
      0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
      ============================================================================

      /dev/sdc
      ========================================================================1.6
      == invoked as: ./preclear_disk.sh /dev/sdc
      == ST3500641NS
      == Disk /dev/sdc has been successfully precleared
      == with a starting sector of 63
      == Ran 1 cycle
      ==
      == Using :Read block size = 8225280 Bytes
      == Last Cycle's Pre Read Time : 3:12:33 (43 MB/s)
      == Last Cycle's Zeroing time : 3:06:19 (44 MB/s)
      == Last Cycle's Post Read Time : 7:48:40 (17 MB/s)
      == Last Cycle's Total Time : 14:08:40
      ==
      == Total Elapsed Time 14:08:40
      ==
      == Disk Start Temperature: 33C
      ==
      == Current Disk Temperature: 36C,
      ==
      ============================================================================
      ** Changed attributes in files: /tmp/smart_start_sdc /tmp/smart_finish_sdc
      ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
      Reported_Uncorrect = 1 1 0 near_thresh 602
      High_Fly_Writes = 70 72 0 ok 30
      Airflow_Temperature_Cel = 64 67 45 In_the_past 36
      Temperature_Celsius = 36 33 0 ok 36
      Hardware_ECC_Recovered = 58 60 0 ok 32622883
      Current_Pending_Sector = 1 1 0 near_thresh 4294967295
      Offline_Uncorrectable = 1 1 0 near_thresh 4294967295
      No SMART attributes are FAILING_NOW
      4294967295 sectors were pending re-allocation before the start of the preclear.
      4294967295 sectors were pending re-allocation after pre-read in cycle 1 of 1.
      4294967295 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
      4294967295 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
      2 sectors had been re-allocated before the start of the preclear.
      2 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
      ============================================================================

      And I just saw that the /boot/preclear_reports directory holds multiple runs, so I'm posting them all... If I had looked earlier, I would have posted the first set.
  21. I rebooted for an ACPI-related problem, which is now fixed. I'm currently running preclear again and will post the results with the files when they're completed later tonight. Thank you Joe.
  22. Hello, I've run preclear on two drives that I've added to my array from a Windows 7 PC, where they'd been running for quite a while (a couple of years at least). They were configured for RAID-0 on the Intel mobo "raid adapter" in that machine. Any advice on the output? I'm still looking around, but thought I'd ask here as these forums are an excellent resource. I'm running preclears again and will post the next output. Running preclear 1.6.

      ========================================================================1.6
      == invoked as: ./preclear_disk.sh /dev/sdb
      == ST3500641NS
      == Disk /dev/sdb has been successfully precleared
      == with a starting sector of 63
      == Ran 1 cycle
      ==
      == Using :Read block size = 8225280 Bytes
      == Last Cycle's Pre Read Time : 3:15:25 (42 MB/s)
      == Last Cycle's Zeroing time : 3:01:25 (45 MB/s)
      == Last Cycle's Post Read Time : 8:15:55 (16 MB/s)
      == Last Cycle's Total Time : 14:33:53
      ==
      == Total Elapsed Time 14:33:53
      ==
      == Disk Start Temperature: 27C
      ==
      == Current Disk Temperature: 35C,
      ==
      ============================================================================
      ** Changed attributes in files: /tmp/smart_start_sdb /tmp/smart_finish_sdb
      ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
      Airflow_Temperature_Cel = 64 73 45 In_the_past 36
      Temperature_Celsius = 36 27 0 ok 36
      Hardware_ECC_Recovered = 61 87 0 ok 110955072
      Current_Pending_Sector = 1 1 0 near_thresh 4294967294
      Offline_Uncorrectable = 1 1 0 near_thresh 4294967294
      No SMART attributes are FAILING_NOW
      4294967294 sectors were pending re-allocation before the start of the preclear.
      4294967294 sectors were pending re-allocation after pre-read in cycle 1 of 1.
      4294967294 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
      4294967294 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
      0 sectors had been re-allocated before the start of the preclear.
      0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
      ============================================================================

      ========================================================================1.6
      == invoked as: ./preclear_disk.sh /dev/sdc
      == ST3500641NS
      == Disk /dev/sdc has been successfully precleared
      == with a starting sector of 63
      == Ran 1 cycle
      ==
      == Using :Read block size = 8225280 Bytes
      == Last Cycle's Pre Read Time : 3:19:59 (41 MB/s)
      == Last Cycle's Zeroing time : 2:58:11 (46 MB/s)
      == Last Cycle's Post Read Time : 7:53:12 (17 MB/s)
      == Last Cycle's Total Time : 14:12:29
      ==
      == Total Elapsed Time 14:12:29
      ==
      == Disk Start Temperature: 33C
      ==
      == Current Disk Temperature: 34C,
      ==
      ============================================================================
      ** Changed attributes in files: /tmp/smart_start_sdc /tmp/smart_finish_sdc
      ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
      Reported_Uncorrect = 1 1 0 near_thresh 602
      Airflow_Temperature_Cel = 66 67 45 In_the_past 34
      Temperature_Celsius = 34 33 0 ok 34
      Hardware_ECC_Recovered = 59 72 0 ok 195743418
      Current_Pending_Sector = 1 1 0 near_thresh 4294967295
      Offline_Uncorrectable = 1 1 0 near_thresh 4294967295
      No SMART attributes are FAILING_NOW
      4294967295 sectors were pending re-allocation before the start of the preclear.
      4294967295 sectors were pending re-allocation after pre-read in cycle 1 of 1.
      4294967295 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
      4294967295 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
      2 sectors had been re-allocated before the start of the preclear.
      2 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
      ============================================================================
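      While waiting on advice, the attributes I'm mainly watching are the pending/reallocated sector counts. A quick way to pull just those straight from SMART, outside of preclear (device names are from my setup, and this assumes smartctl is available on the box):

      root@tower:~# smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
      root@tower:~# smartctl -H /dev/sdb    # overall SMART health self-assessment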