
numerica


Posts posted by numerica

  1. OK, I fixed the br3 DHCP issue and the samba election issue.  What I noticed from the logs is this:

     

    Every once in a while I get this in the logs; I don't know exactly what it is or what it means:

     

    May  1 05:23:47 Azat-NAS kernel: hpsa 0000:03:00.0: scsi 8:0:3:0: resetting physical  Direct-Access     ATA      Samsung SSD 860  PHYS DRV SSDSmartPathCap- En- Exp=1
    May  1 05:23:59 Azat-NAS kernel: hpsa 0000:03:00.0: device is ready.
    May  1 05:23:59 Azat-NAS kernel: hpsa 0000:03:00.0: scsi 8:0:3:0: reset physical  completed successfully Direct-Access     ATA      Samsung SSD 860  PHYS DRV SSDSmartPathCap- En- Exp=1

     

    I think that's the HP SCSI driver `hpsa` resetting a disk, and a few seconds later the reset completes.

     

    However, when the system hung, that reset never completed:

     

    May  7 17:10:27 Azat-NAS kernel: hpsa 0000:03:00.0: scsi 8:0:3:0: resetting physical  Direct-Access     ATA      Samsung SSD 860  PHYS DRV SSDSmartPathCap- En- Exp=1
    May  7 17:10:32 Azat-NAS nmbd[17557]: [2021/05/07 17:10:32.620474,  0] ../../source3/nmbd/nmbd_incomingdgrams.c:302(process_local_master_announce)
    May  7 17:10:32 Azat-NAS nmbd[17557]:   process_local_master_announce: Server UBUNTU at IP 192.168.3.84 is announcing itself as a local master browser for workgroup WORKGROUP and we think we are master. Forcing election.
    May  7 17:10:32 Azat-NAS nmbd[17557]: [2021/05/07 17:10:32.620638,  0] ../../source3/nmbd/nmbd_become_lmb.c:150(unbecome_local_master_success)
    May  7 17:10:32 Azat-NAS nmbd[17557]:   *****
    May  7 17:10:32 Azat-NAS nmbd[17557]:   
    May  7 17:10:32 Azat-NAS nmbd[17557]:   Samba name server AZAT-NAS has stopped being a local master browser for workgroup WORKGROUP on subnet 192.168.3.78
    May  7 17:10:32 Azat-NAS nmbd[17557]:   
    May  7 17:10:32 Azat-NAS nmbd[17557]:   *****
    May  7 17:10:50 Azat-NAS nmbd[17557]: [2021/05/07 17:10:50.641126,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
    May  7 17:10:50 Azat-NAS nmbd[17557]:   *****
    May  7 17:10:50 Azat-NAS nmbd[17557]:   
    May  7 17:10:50 Azat-NAS nmbd[17557]:   Samba name server AZAT-NAS is now a local master browser for workgroup WORKGROUP on subnet 192.168.3.78
    May  7 17:10:50 Azat-NAS nmbd[17557]:   
    May  7 17:10:50 Azat-NAS nmbd[17557]:   *****
    May  7 17:18:06 Azat-NAS nginx: 2021/05/07 17:18:06 [error] 4363#4363: *6615103 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.3.80, server: , request: "GET /Main HTTP/1.1", upstream: 

     

    So this seems like the culprit here: either the `hpsa` driver or the HP H240 HBA card I am using is malfunctioning.  This issue did not happen on 6.8.3.  Perhaps it's the spinning down of drives?  I don't know, I'll experiment.

     

    If anyone has any experience with this driver, how to update it, or any info whatsoever, please reply and let me know.  Thanks!  
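
    In the meantime, here is roughly what I plan to leave running to keep an eye on it (just a sketch; it assumes the syslog lives at /var/log/syslog and that writing a small note file to /boot is acceptable, so adjust paths and the timeout as needed):

        #!/bin/bash
        # Follow the syslog and note every hpsa "resetting physical" event.
        # 60 seconds later, check whether a "completed successfully" line has
        # appeared in the recent log; if not, record that too, so a stalled
        # reset is easy to spot after the fact.
        LOG=/boot/hpsa-watch.log    # assumption: keep the notes on the flash drive
        tail -F -n 0 /var/log/syslog | while read -r line; do
            case "$line" in
                *hpsa*"resetting physical"*)
                    echo "$(date): hpsa reset started" >> "$LOG"
                    (
                        sleep 60
                        if ! tail -n 200 /var/log/syslog | grep -q "reset physical  completed successfully"; then
                            echo "$(date): no completion logged within 60s" >> "$LOG"
                        fi
                    ) &
                    ;;
            esac
        done

    I'll also grab the driver version with `modinfo hpsa` so I know exactly what 6.9.1 ships.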

  2. Today, at around 23:20 EDT, my unRAID OS locked up for the third time since I upgraded to 6.9.1 from 6.8.3 on March 14th, I believe.  This did not happen on 6.8.3; that was rock solid.

     

    The first time was on April 1st at 02:00 EDT and the second time was on April 4th at around 21:40 EDT.  I blamed it on a Docker instance, so I shut it down.  Everything was operating smoothly until today, when it happened again.  The server was running 2 VMs at the time, one using 8 threads, another using 4.

     

    Please see the following list of attachments:

     

    • "unraid1.PNG" for a screenshot of the dashboard in locked up state.
    • "pidlist.txt" for the processes running at the time.  I tried to `kill - 9 6962` thinking that maybe it's one of the VMs, but it refused to die.  The CPU usage of that VM was 76.3% which I believe is only a % of a single thread, so that couldn't have been the issue.  These VMs were stable for months before update.
    • "ps.txt" and "pstree.txt" for the process tree. 
    • "top1.txt", "top2.txt", "top3.txt" for a snapshot of top within a few seconds.
    • "chron.txt" for the chron tasks (I have no custom ones).
    • "systemdevices.txt" for a list of the devices.
    • "unraid2.PNG" for the list of plugins.  The only one that makes sense to me to try running without is Doron's Spindown Plugin.  It doesn't really work for me anyway.

     

    I tried to look at the log, but it was unresponsive.  😔

     

    I use my computer and unRAID server for work, so this is a big deal to me.  I have to hard reset it each time (reboot doesn't work), and rebuilding parity is a big inconvenience on top of everything.  I have come to rely on the server to store all my data, and when this happens, all the VMware VMs that I use, as well as my main computer (which relies on network shares to store data), become unusable (explorer.exe gets very confused and has an emotional breakdown).  I have to figure out this issue and fix it.  Do I have to downgrade to 6.8.3?  How do I do that?  If I can somehow resolve this with 6.9.1, can someone advise me how?  I can't have technology be unstable in my life; that role is already fulfilled by people 😋
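
    For the next lockup, I'm thinking of leaving something like the following running in the background, so there is something on disk to look at after the hard reset (just a sketch; the output location and the one-minute interval are my own guesses, and writing to the flash drive too often is probably not great for it):

        #!/bin/bash
        # Dump a batch-mode top and a process tree once a minute, each into a
        # timestamped file, so the last snapshots survive a hard reset.
        OUT=/boot/lockup-logs        # assumption: keep them on the flash drive
        mkdir -p "$OUT"
        while true; do
            ts=$(date +%Y%m%d-%H%M%S)
            top -b -n 1 > "$OUT/top-$ts.txt"
            ps auxf     > "$OUT/ps-$ts.txt"
            sleep 60
        done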

     


    unraid1.PNG

    unraid2.PNG

    pidlist.txt top1.txt top2.txt top3.txt chron.txt ps.txt pstree.txt systemdevices.txt

  3. 1 hour ago, SimonF said:

    That's the result that should be seen.  Have you been able to check whether the drive has spun down by touching it, or by feeling for heat after it has been spun down for a while?

     

    The way I have been determining it is by comparing what the box sounds like with and without the mechanical drives spinning.  After putting the drives on standby, I hear no reduction in noise.
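
    In case it helps, here is what I plan to run so I'm not relying only on my ears (a sketch; it assumes smartctl and sdparm are both available, and I'm not certain `-n standby` behaves the same for SAS drives as it does for SATA ones):

        # Ask the drive itself, without waking it: with -n standby, smartctl
        # skips the check and exits with status 2 if the drive reports a
        # low-power state (how well this works for SAS drives is my assumption).
        smartctl -i -n standby /dev/sdf; echo "smartctl exit code: $?"

        # The same question via SCSI sense data, as used earlier in the thread.
        sdparm --command=sense /dev/sdf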

  4. 9 hours ago, SimonF said:

    I have an ST4000NM0023 and it spins down fine, but it is connected via an LSI controller.

     

    What output do you get from `sdparm -C sense /dev/sdX` when unRAID thinks they are spun down?

     

    Sounds like your drives are going into an idle state, as it takes 10-15 seconds for mine to spin up.

     

        /dev/sdf: SEAGATE   ST3000NM0023      0003
    Additional sense: Standby condition activated by command

     

    That's the output.  Thanks for helping!
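
    Since yours take 10-15 seconds to spin up, I'll try timing an uncached read on one of mine while unRAID shows it as spun down (a sketch; /dev/sdf is just the drive from the output above):

        # A truly spun-down drive should take several seconds to answer a
        # direct (uncached) read; an idle-but-still-spinning one should be
        # close to instant.
        time dd if=/dev/sdf of=/dev/null bs=1M count=1 iflag=direct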

  5. Hey Doron, 

     

    Thank you for taking the time to make this plugin.  I'm sorry if you've already answered this a million times; I've tried to look through the thread, but it's difficult to follow along.

    I have an HP H240 HBA and Constellation ES.3 ST3000NM0023 drives.  I have been following this plugin since you released it last year, and I know that the ES.3s were a problem.  I cried a little and then figured I'd be patient and see if there were any fixes in the future 😭😉

     

    In 6.8.3, unRAID would never even pretend that the drives were spun down, and in the log I would get that red/green zebra response that I've seen in this thread.  In 6.9.1, with your plugin installed (v0.85), unRAID "spins down" the drives: the drives make a weird noise and the green dot becomes gray, but they appear to still be spinning and making spinning noise.  The weird thing is that if I try to access a drive that is "spun down" (and yet still spinning), it takes a couple of seconds for it to "wake up" and allow me access to the files within.

     

    Maybe I am misunderstanding what "spinning down" is, but I assumed that the drives would literally stop spinning.  Also, I did check whether the drives are the source of the noise, and yes, they are; everything else in the system is too quiet to make any audible noise with the side panel on.  Is there anything you can do to help me troubleshoot, or is it a lost cause?  Thanks, and sorry again if you've answered this before.
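
    If it's useful for troubleshooting, I'm happy to try stopping one drive by hand and report what actually happens (just a sketch of what I'd run; I'm not claiming this is what the plugin does internally):

        # Send a SCSI STOP UNIT directly, wait, then check the reported state.
        sdparm --command=stop /dev/sdf       # ask the drive to stop spinning
        sleep 15
        sdparm --command=sense /dev/sdf      # should report a standby condition

        # Bring it back afterwards (or just read something from it).
        sdparm --command=start /dev/sdf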

  6. I've had unRAID 6.8.3 set up and working for quite some time and everything was fine.  I went to the Web UI, clicked Apps, and it said something along the lines of "FATAL FLASH DRIVE ERROR".

     

    I rebooted the server and now the Web UI is not working.  The server is still on the same IP address as before, but when I try to access it I get an "ERR_CONNECTION_REFUSED".

     

    I am able to access GUI mode, but when I try to open the Web UI from there, I am unable to do so.  The `localhost` server does not seem to be running.  I checked `nginx` in the terminal and it gave me an error regarding the SSL cert.  https://i.imgur.com/bcRgmHW.jpg

     

    I checked that the `/boot` drive is attached, but when I try to access it, it does not appear to have anything in it:  https://i.imgur.com/HZRpmZA.jpg

     

    The server seems to be healthy otherwise, with plenty of RAM and so on, so that's probably not the issue.  https://i.imgur.com/0Z0kPIf.jpg

     

    My array does not start automatically when the server reboots, so I cannot access any of my files.  I need them for work, so I need to fix this ASAP.  Is this a flash drive issue?  It has been plugged into the server for months now and I haven't done anything to it.  How do I figure out whether the flash drive is the issue?

     

    Is there a way to at least start the array from the terminal?  I need access to my shares.  Please help 😭😭
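
    Here is roughly what I've checked (or plan to check) from the terminal, in case it points someone at the problem (a sketch; nothing here is specific to unRAID beyond the `/boot` mount point):

        # Is the flash drive actually mounted at /boot, and is it readable?
        grep ' /boot ' /proc/mounts
        ls -la /boot

        # Any USB or filesystem errors since boot?
        dmesg | grep -iE 'usb|fat|i/o error' | tail -n 50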
