Posts posted by jeffreywhunter

  1. I'm seeing a recurring log entry (every 4 minutes) from a Docker container (GS-Server).  Such a useful docker, and I love how well it works, except for this log issue.  I've spoken with the Docker's creator, who is unable to recreate the error, and with Goodsync tech support.  Goodsync says the problem is a "known issue" with Linux that requires some kind of update to fix (they said to google it, but I've not been able to find anything specific and recent).

     

    So I'm posting this in hopes that someone is aware of a fix.  While everything works fine, my log is filling up with useless data!  I've attached a copy just FYI...

     

    Thanks in advance!

     

    hunternas-syslog-20200210-2052.zip

  2. I've had a disk (HGST HUS726060ALA640 - 6TB) throw read errors, causing the array health report to fail.  I ran an extended SMART self-test, which passed.  The last SMART test result is attached.  I'm not sure how to read these reports.  I see 3 errors in the SMART Extended Comprehensive Error Log, which seems to indicate Error 10 and an LBA number.

     

    Should I replace this disk?  It's the parity disk and the newest disk in my array.

     

    This is the alert I get in email:

    Event: Unraid Status
    Subject: Notice [HUNTERNAS] - array health report [FAIL]
    Description: Array has 13 disks (including parity & cache)
    Importance: warning
    
    Parity - HGST_HUS726060ALA640_AR11021EHJAHKB (sdc) - active 33 C (disk has read errors) [NOK]
    Disk 1 - ST4000DM000-1F2168_W30086L5 (sdl) - active 38 C [OK]
    Disk 2 - Hitachi_HDS5C3020ALA632_ML0220F31HKTDN (sdd) - standby [OK]
    Disk 3 - Hitachi_HDS5C3020ALA632_ML0220F30M33ZD (sde) - standby [OK]
    Disk 4 - Hitachi_HDS5C3020ALA632_ML0220F30MWAYD (sdf) - active 35 C [OK]
    Disk 5 - ST5000DM000-1FK178_W4J0YEEN (sdg) - active 36 C [OK]
    Disk 6 - Hitachi_HUA723030ALA640_MK0371YVG9PYUA (sdj) - active 40 C [OK]
    Disk 7 - Hitachi_HDS5C3020ALA632_ML0220F30YS4XD (sdk) - standby [OK]
    Disk 8 - Hitachi_HUA723030ALA640_MK0373YVHH0A4C (sdh) - active 41 C [OK]
    Disk 9 - Hitachi_HUA723030ALA640_MK0361YHJ0N8JD (sdi) - active 42 C [OK]
    Disk 10 - Hitachi_HDS5C3020ALA632_ML0220F311GHND (sdn) - standby [OK]
    Disk 11 - ST5000DM000-1FK178_W4J0B3G1 (sdm) - active 34 C [OK]
    Cache - Samsung_SSD_850_EVO_500GB_S2RANX0H811137T (sdb) - active 31 C [OK]
    
    Parity is valid
    Last checked on Mon 18 Nov 2019 09:02:01 AM CST (yesterday), finding 0 errors.
    Duration: 17 hours, 31 minutes, 36 seconds. Average speed: 95.1 MB/s
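
    For reference, this is roughly how I ran the extended test and pulled the error logs afterwards (device sdc taken from the alert above; exact output varies by smartctl version):

    # Start an extended (long) SMART self-test on the parity disk
    smartctl -t long /dev/sdc
    # Once it completes, review the full report and the extended error log
    smartctl -a /dev/sdc
    smartctl -l xerror /dev/sdc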

     

    hunternas-smart-20191119-0106.zip

  3. I'm seeing the following error in my log, but with no reference to any bad files.

     

    BLAKE2 hash key mismatch,  is corrupted

     

    This was the only entry in the log.  Is there somewhere else I need to check?

     

    Also, if we do see errors, do we need to reformat the disk or just recopy the indicated files?  I do see errors on a couple of the other disks with file names indicated.
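
    If the file name were known, one manual check (just a sketch, not the File Integrity plugin's own mechanism; the paths here are made up) would be to recompute the BLAKE2 hash with b2sum and compare it against a known-good copy:

    # Recompute BLAKE2 hashes for a suspect file and a known-good copy
    b2sum "/mnt/disk1/some/suspect-file"
    b2sum "/mnt/backup/known-good-copy"
    # Matching sums mean the file is intact and only the stored hash is stale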

     

    Thanks in advance!

  4. Having problems with preclear on 6.6.7.  It starts up fine, but a few hours later I come back and it's stopped.  The screenshot shows the preclear screens/log and the system log.  I can't tell what is causing it to stop.  Could it just be the disk failing?  Seems I'd have an error or something that would tell me...

     

    Thanks in advance!

    hunternas-diagnostics-20190312-0656.zip

  5. UR 6.5.5.  I've been using File Integrity for a long time.  No real issues. Today, I received an email notice with the following error displayed.

    Jan 6 14:58:06 HunterNAS bunker: error: BLAKE2 hash key mismatch, /mnt/disk1/My Backups/_gsdata_/a9ebea1d339a02ae40d547cf8068834d.tib is corrupted
    Jan 6 14:58:08 HunterNAS bunker: error: BLAKE2 hash key mismatch, /mnt/disk1/My Backups/_gsdata_/3bef27d30a51b20186a922805386f65a.tib is corrupted

     

    This happened back in Oct with the same files.  

    Oct 21 15:09:53 HunterNAS bunker: error: BLAKE2 hash key mismatch, /mnt/disk1/My Backups/_gsdata_/a9ebea1d339a02ae40d547cf8068834d.tib is corrupted
    Oct 21 15:09:54 HunterNAS bunker: error: BLAKE2 hash key mismatch, /mnt/disk1/My Backups/_gsdata_/3bef27d30a51b20186a922805386f65a.tib is corrupted

    These are backups of one of my workstations, so they're large files.  Anything to worry about?  Diagnostics file attached.
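
    One thing I can check myself is whether the files were legitimately rewritten after their hashes were recorded; GoodSync updating a .tib after its hash was stored would produce exactly this kind of mismatch.  A quick look at the timestamps:

    # Compare each file's last-modified time against the dates in the bunker errors
    stat -c '%y %n' "/mnt/disk1/My Backups/_gsdata_/"*.tib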

     

    Thanks in advance!

    hunternas-diagnostics-20190106-2330.zip

  6. Ok, I'll reinstall and try that.  Will keep you posted.  I've been thinking about how a backup job might trigger this: is there some aspect of backing up via FTP that could cause ProFTPd to develop a problem/memory leak over time?  On a nightly basis I might back up 1,000 files, mostly documents, sometimes music/video files.  If you had an idea, I could build a test case that would push the boundaries...
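
    For example, something along these lines could simulate a nightly run by hammering the server with many small uploads (host, credentials and target path are placeholders):

    #!/bin/bash
    # Untested sketch: upload 1000 small files over FTP to stress proftpd
    for i in $(seq 1 1000); do
      head -c 64K /dev/urandom > "/tmp/testfile_$i"
      curl -s -T "/tmp/testfile_$i" "ftp://tower/backup/test/" --user ftpuser:ftppass
      rm "/tmp/testfile_$i"
    done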

  7. On 5/19/2018 at 1:44 AM, SlrG said:

    @jeffreywhunter

    I really want to help you solve your problem. But please do as I say: remove my proftpd plugin and see if the server still crashes. Then we will know for sure whether it is related to proftpd and can start digging in only that direction. Do you use FTP for purposes other than your personal backups? Because if only you are accessing your server, you don't need to jail users. In that case you could use unRAID's internal FTP to keep your backup jobs running while the plugin is gone.

     

    Looking at your log, and given that it is hourly: the memory goes down, but in the last log entry before the crash it is back up to 6.6 GiB again. Maybe if you shorten the logging interval to every 5 minutes, or every minute, we will get a better view. It could also be helpful to increase logging in proftpd as described here and re-enable the transfer log as described here.

     

    Ok, I've followed your directions.  I've been up for more than 4 days, far longer than I've ever gone (it usually crashes within 24 hours).  I think it has to do with FTP file transfers.  As I've mentioned, I have about 12 GoodSync FTP jobs that run nightly to back up various directories on my PC and such.  So evidently something about ProFTPd causes my system to crash.  What diagnostics can I conduct to isolate the problem?
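
    As a starting point, I could log proftpd's per-process memory while the nightly jobs run; a rough, untested sketch:

    # Record proftpd memory usage every minute during the backup window
    while true; do
      date >> /boot/proftpd_mem.log
      ps -o pid,rss,vsz,etime,cmd -C proftpd >> /boot/proftpd_mem.log
      sleep 60
    done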

     

    FYI, I've included my latest diagnostics.

     

    Thanks in advance!

    hunternas-diagnostics-20180524-1644.zip

  8. 14 hours ago, bobbintb said:

     

    I'm not quite sure how to use this. What I'm looking for really is the same GUI experience I get with Goodsync in Windows in my UnRaid NAS. I haven't used the command line Linux version. I'm fine with command line but like the ease of visualizing the information and setting it up that the GUI provides. I doubt anything like that is available at the moment though. I might look into making a Windows based docker image if I can't find anything else. I'm just using a VM right now but it's not as efficient on resources.

     

    There would be no need for a GUI Goodsync version that runs in the UnRaid WebGui (although that would be cool).  What I'm looking for is someone to set up a docker with the Linux server version of Goodsync, as described here: http://www.goodsync.com/for-linux

     

    Once the Goodsync docker is running, you'd use the Windows interface of Goodsync to run the backups.  It would talk to the Goodsync Linux server/docker, which would handle the file management on the Linux side.
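
    Conceptually, the container would just run the GoodSync Linux server and expose its connection port; a purely hypothetical sketch (the image name, port and paths are placeholders, not an existing image):

    # Hypothetical: run a GoodSync Linux server image against the array shares
    docker run -d --name=gs-server \
      -p 33333:33333 \
      -v /mnt/user:/data \
      hypothetical/goodsync-server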

  9. @SlrG - So I've built a little log using User Scripts.  It works really well.  I'm trying to uncover what's causing my server to crash and to rule ProFTPd in or out as the source.  What I'm finding (log attached) is that the problem appears at night, while backups are running.  I have about 10 jobs that back up via ProFTPd.  They start around midnight and end around 6am.  If you look at my log, you see that during the day memory usage is pretty stable.  But at night, when the backup jobs run, memory usage expands until (and this is the frustrating part, only SOMETIMES) it crashes.  Perhaps it has to do with how much data is moving?  No idea how to diagnose that.

     

    One other important note: it appears it often crashes on the job that backs up my workstation image.  That can be anything from a small file (48 MB) to a large file (10 GB), depending on changes, etc.  I have a 512 GB cache drive, so that shouldn't be a problem.

     

    Perhaps this log will shed some light.  Perhaps there are some additional commands I can add to my log to help diagnose the cause?  I'm not saying it's ProFTPd, but maybe something related to backups?

     

    My log script is pretty simple:

     

    #!/bin/bash
    # Append a timestamped snapshot of overall memory usage to the flash drive
    echo '======Start=======' >> /boot/log_free_memory.log
    date >> /boot/log_free_memory.log
    free -h >> /boot/log_free_memory.log
    echo '======End=======' >> /boot/log_free_memory.log

    And I run it hourly.  In the attached log, the server crashed after the 4:47AM log dump.  I was out all day and didn't get the server restarted until 4PM.
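
    For example (an untested sketch), I could extend it to also capture the biggest memory consumers and proftpd's footprint on each run:

    #!/bin/bash
    # Untested sketch: memory snapshot plus top consumers and proftpd stats
    LOG=/boot/log_free_memory.log
    {
      echo '======Start======='
      date
      free -h
      # ten largest processes by resident memory
      ps axo rss,vsz,comm --sort=-rss | head -n 11
      # per-process stats for proftpd, if it is running
      ps -o pid,rss,vsz,etime,cmd -C proftpd
      echo '======End======='
    } >> "$LOG"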

     

    Thanks in advance for taking a look!  log_free_memory.log

  10. 4 hours ago, SlrG said:

    If you still have both icons in your settings, it's a bit odd.

     

    The Apache plugin webserver stopped running with the latest versions and will get no update.

     

     

    @SlrG

    I only have the ProFTPd icon in settings.  But in your comment to Meaux you said the Apache plugin wasn't working and you can't edit the file.  I edited my config just to test; it looked like the edit took, but then I got an error saving.  Is that the current behavior?  If the Apache plugin is not working properly, could it be the cause of the problems I'm having?

     

    I've deleted the plugin from my system and disabled the reboot cron.  I'll let you know if that solves my problem.

     

    I see there are a couple of other Apache plugins out there.  Do you recommend using them?  Or is the editor Meaux is using your new recommendation?

  11. On 2/15/2017 at 11:04 AM, gubbgnutten said:

     

    No, you're not experiencing the same, "missing csrf_token" and "wrong csrf_token" are completely different things. You probably have a web browser active somewhere with a stale web ui. Close or reload it/them.

     

    I just experienced this 'wrong csrf_token' today when I rebooted with several browser windows open.  The Web GUI reconnected to the server after the reboot completed.  I thought it was pretty cool that it did, until I saw all the wrong csrf_token messages.  Closing the browsers solved the problem, obviously, but my simple question is: why does this happen?  Is it a security issue?

  12. 1 hour ago, John_M said:

     

    It's like he said:

    
    root@Mandaue:~# ls -l /sbin/reboot
    lrwxrwxrwx 1 root root 4 Apr 21 19:00 /sbin/reboot -> halt*
    root@Mandaue:~# ls -l /usr/local/sbin/reboot
    /bin/ls: cannot access '/usr/local/sbin/reboot': No such file or directory
    root@Mandaue:~#

     

    Sorry, I should have been clearer.  My script is using

    /usr/local/sbin/reboot

    and it works, so I was wondering if the path was incomplete.  Evidently either works?
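
    A quick way to see which binaries exist on a given box:

    # On John_M's system only /sbin/reboot exists (a symlink to halt);
    # on mine /usr/local/sbin/reboot works too
    ls -l /sbin/reboot /usr/local/sbin/reboot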

  13. On 3/15/2018 at 8:38 AM, nuhll said:

    Edit2: Okay, it worked once; now it's not working anymore. Any idea? (If I click run script, it's just empty.)

     

     

    Did you get this figured out?  Is what you have working now? 

     

    You probably already know that in the scripts sub-directory for the cron job you want to run, the script must live in a file simply called script.  The first time I tried to set up a script, I created it in the cron directory (i.e. nightly_reboot) as a file called nightly_reboot.cron; User Scripts ignored that file, created an empty file called script in the nightly_reboot folder, and ran that (empty) file instead.
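
    As I understand it, the layout the User Scripts plugin expects looks like this (nightly_reboot is just my example name):

    # Each user script lives in its own folder, in a file named exactly "script"
    mkdir -p /boot/config/plugins/user.scripts/scripts/nightly_reboot
    printf '#!/bin/bash\n/usr/local/sbin/powerdown -r\n' \
      > /boot/config/plugins/user.scripts/scripts/nightly_reboot/script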

  14. I want a clean reboot on a daily basis, and I don't want to cause unclean shutdowns.  I run Plex, MySQL, Pydio and other dockers.  Will this script work to cleanly shut down the dockers, then reboot?  I'd put it in the User Scripts plugin...

     

    # Stop the docker service (and all containers) before rebooting cleanly
    /etc/rc.d/rc.docker stop
    /usr/local/sbin/powerdown -r

    Thanks in advance!

  15. @SlrG

    I've run several backups and noticed no problems with proftpd.  I do see that the process is only alive when an actual FTP session is active, i.e. it only shows up as a process when a backup is running.  I'm not seeing anything strange.  Is there something else I can monitor?

     

    Is it possible that the docker.img file has some runaway logging going on?  How would I investigate that?
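
    One way to look (a sketch; docker.img is loop-mounted at /var/lib/docker while the service runs) is to check for oversized container log files:

    # Find the largest container logs inside docker.img
    du -ah /var/lib/docker/containers/ 2>/dev/null | grep -- '-json.log' | sort -h | tail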

  16. 5 hours ago, SlrG said:

     

    Then I would monitor the shfs usage as described here, plus a syslog tail like you did before.

     

     

    I took a look at this; shfs on my system:

    top - 08:03:26 up 15:15,  3 users,  load average: 0.69, 0.66, 0.72
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  4.0 us, 37.4 sy,  0.3 ni, 58.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
    GiB Mem :   15.369 total,    2.964 free,    3.849 used,    8.557 buff/cache
    GiB Swap:    0.000 total,    0.000 free,    0.000 used.   10.351 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    13276 root      20   0  747644 274744    644 S   0.3  1.7   5:23.63 shfs

     

    I've started htop and added a memory meter just to see if the memory is growing.  So far the max I've seen is 4.44G of 15.4G, and it's steady.

    root@HunterNAS:~# free -m
                  total        used        free      shared  buff/cache   available
    Mem:          15738        3684        3292         606        8761       10855
    Swap:             0           0           0
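
    To watch shfs specifically, I could log just its resident memory over time (an untested sketch):

    # Log shfs resident memory (kB) once a minute to spot slow growth
    while true; do
      echo "$(date '+%F %T') $(ps -o rss= -C shfs | awk '{s+=$1} END {print s}') kB" >> /boot/shfs_rss.log
      sleep 60
    done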

    Thoughts?

  17. On 5/7/2018 at 10:21 AM, SlrG said:

    @jeffreywhunter

    Well do you really need the transfer log? It doesn't seem to be correctly configured on your system anyway, so why not disable it completely by setting it to NONE?

    That should stop the message from spamming your log.

     

    I can't imagine the ProFTPd plugin being responsible for your lockups, but to be absolutely sure, you will have to uninstall it and run for some time without the plugin. If the system still crashes, there is another underlying problem.

     

    Looking at your log (though I am no expert in this and might be wrong), there seems to be a memory leak: you are running out of memory, and thus the system crashes.

     

    Maybe you could follow the tips from here, check your shfs usage, and try changing the disk cache ratio?

     

     

     

    I took your advice and disabled the transfer log (TransferLog             NONE).  It ran for about 3 days, then crashed again.  I had turned on Fix Common Problems troubleshooting mode and have attached the file.  I'm not sure what to look for.  The last few entries were:

     

    May 10 04:12:50 HunterNAS root: Fix Common Problems Version 2018.04.25
    May 10 04:12:51 HunterNAS root: Fix Common Problems: /var/log currently 2 % full
    May 10 04:12:51 HunterNAS root: Fix Common Problems: rootfs (/) currently 8 % full
    May 10 04:20:25 HunterNAS kernel: mdcmd (149): set md_write_method 0
    May 10 04:20:25 HunterNAS kernel: 
    May 10 04:22:51 HunterNAS root: Fix Common Problems: Capturing diagnostics.  When uploading diagnostics to the forum, also upload /logs/FCPsyslog_tail.txt on the flash drive

     

    Attached are the Fix Common Problems troubleshooting logs.  I appreciate any help!

    hunternas-diagnostics-20180507-1131.zip

    FCPsyslog_tail.txt

     

    One last bit: I had found some posts about OOM issues here, and based on them I changed vm.dirty_background_ratio to 1% and vm.dirty_ratio to 2%, but I still had the problem.
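
    For reference, this is how those values get applied at runtime (to survive a reboot on Unraid they would also need to go in the go file on the flash drive):

    # Lower the dirty-page writeback thresholds (values from the post I found)
    sysctl vm.dirty_background_ratio=1
    sysctl vm.dirty_ratio=2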

     

    Thanks in advance!
