• [6.9.0-beta22] No spin down of pool HDDs


    John_M
    • Solved Minor






    Retesting with 6.9.0-beta24, pool HDDs still don't spin down after the timeout period. I deliberately set a short timeout of 30 minutes to test it, and 3 hours later they were still spinning. When the disks are spun down manually, the GUI initially shows the correct status, but after navigating away and later returning to the Main or Dashboard page it shows the same problem as before (i.e. spun-down pool disks are shown as active but without temperatures).

    Link to comment

    To reproduce the GUI problem, I manually spin down the drives in the pool using the down arrow icon for the pool on the Main page, and the GUI updates to show grey balls and no temperatures. I then navigate away to Tools -> System Log and study it for a while, noticing that shcmd has issued the hdparm -y command to spin down each drive in the pool. When I navigate back to Main, the balls are green again but there are still no temperatures. Dashboard likewise shows green balls, no temperatures, and "active" status. The actual status can be confirmed with:

    hdparm -C /dev/sdX
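
The same check can be repeated across several disks. Below is a minimal Python sketch, assuming `hdparm` is installed and the script runs as root. The helper names `parse_drive_state` and `query_drive_state` are hypothetical, and the parser relies only on the `drive state is:` line that `hdparm -C` prints.

```python
import re
import subprocess
from typing import Optional

# Matches the status line printed by `hdparm -C`, e.g. " drive state is:  standby".
_STATE_RE = re.compile(r"drive state is:\s*(\S+)")


def parse_drive_state(output: str) -> Optional[str]:
    """Return the reported power state (e.g. 'standby'), or None if no state line is found."""
    match = _STATE_RE.search(output)
    return match.group(1) if match else None


def query_drive_state(device: str) -> Optional[str]:
    """Run `hdparm -C <device>` and return the parsed state (needs root and hdparm)."""
    result = subprocess.run(
        ["hdparm", "-C", device], capture_output=True, text=True, check=False
    )
    return parse_drive_state(result.stdout)
```

For example, calling `query_drive_state("/dev/sdX")` for each pool member and comparing the result with what the GUI shows makes the mismatch described above easy to spot.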

     

    Link to comment
    35 minutes ago, bonienl said:

    Are your drives connected directly to SATA ports on your motherboard or to some HBA controller?

     

    Here's the problem. The 'hdparm' command apparently issues a read just before spinning the drive down. This was the cause of the first bug (it never used to do that). emhttpd has a background poller that checks disk I/O stats every second. If it sees that there has been disk I/O since the last check, it concludes that the disk is spinning. The bug was that emhttpd uses hdparm to spin down the disk, but it captures the disk's I/O count before issuing hdparm. So the next time it checks, that extra read makes emhttpd think there is new disk I/O.
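
The ordering problem described here can be illustrated with a small simulation. This is only a sketch in Python, not emhttpd's actual code: `FakeDisk`, `Poller`, and the `snapshot_after` flag are hypothetical names, and taking the baseline after the spin-down is just one way to show the difference, not necessarily how the bug was fixed.

```python
from dataclasses import dataclass


@dataclass
class FakeDisk:
    """Stand-in for a drive: counts I/Os and tracks its spin state."""
    io_count: int = 0
    spinning: bool = True

    def spin_down(self) -> None:
        # Per the post: the spin-down command issues one read just before standby.
        self.io_count += 1
        self.spinning = False


class Poller:
    """Hypothetical poller that infers 'spinning' from changes in the I/O count."""

    def __init__(self, disk: FakeDisk) -> None:
        self.disk = disk
        self.last_count = disk.io_count
        self.shows_active = disk.spinning

    def spin_down(self, snapshot_after: bool) -> None:
        if snapshot_after:
            self.disk.spin_down()
            self.last_count = self.disk.io_count  # baseline includes the extra read
        else:
            self.last_count = self.disk.io_count  # baseline taken too early
            self.disk.spin_down()
        self.shows_active = False

    def poll(self) -> bool:
        # Any change in the counter since the last check is read as "disk is active".
        if self.disk.io_count != self.last_count:
            self.shows_active = True
        self.last_count = self.disk.io_count
        return self.shows_active
```

With `snapshot_after=False`, the first `poll()` after a spin-down reports the disk as active even though `disk.spinning` is false, which matches the symptom reported in this thread. With `snapshot_after=True`, the poller keeps showing the disk as spun down.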

     

    Clear as mud, right? The solution is in the next post down ;)

    Link to comment
    13 minutes ago, limetech said:

    The solution is for webGui to not use hdparm directly to spin down, but instead to use 'cmdSpindown=idx', where 'idx' is the value of [diskN][idx].

    That is, it can use:

     

    emcmd "cmdSpindown=$disk['name']"

    Link to comment

    I did a quick test and found that I need to specify the device name (current implementation), e.g.

     

    emcmd "cmdSpindown=$disk['name']"

     

    It doesn't work with "idx". Can you have a look?

    Link to comment

    LOL. This is the current implementation of the GUI.

    It doesn't use hdparm for array and pool devices, only for unassigned devices.

     

     

    Link to comment
    9 hours ago, bonienl said:

    Are your drives connected directly to SATA ports on your motherboard or to some HBA controller?

    All four are connected to an LSI SAS HBA (actually a cross-flashed Dell Perc H310). The other four ports control disks in the main array and they spin down and indicate correctly in the GUI, as they've always done. All disks are SATA.

    Link to comment

    I have an Adaptec HBA, and this controller has the 'nasty' habit of spinning a disk back up, for no apparent reason, after it has been spun down. Hence my question.

     

    You are saying that the 4 disks on this controller that are assigned to the array work as expected, but the 4 pool disks do not?

     

    Have you tried stopping the Docker and VM services and then spinning down the pool with those services disabled?

     

    Link to comment
    56 minutes ago, bonienl said:

    You are saying that the 4 disks on this controller that are assigned to the array work as expected, but the 4 pool disks do not?

    Yes, exactly this.

    56 minutes ago, bonienl said:

    Have you tried stopping the Docker and VM services and then spinning down the pool with those services disabled?

    I haven't but I will and report back. There's nothing on this pool of hard disks that's related to Docker or VMs though. It's just a chunk of extra storage that previously was just four unassigned disks. All the Docker and VM stuff is still on my SSD cache pool, unchanged.

    Link to comment
    1 hour ago, bonienl said:

    Have you tried stopping the Docker and VM services and then spinning down the pool with those services disabled?

    Ok. Here's what I did. First I rebooted to get a clean set of diagnostics. I started the array, then disabled the Docker and VM services. I then manually spun down the "extra" pool of four HDDs (sdf, sdg, sdh, sdi) using the icon associated with that pool. The Main page updated, showing grey balls and no temperatures for the four disks. I switched to Tools->System Log and noticed that hdparm -y had been issued to the four disks. (I also noticed a lot of nmbd error messages, seemingly associated with stopping the Docker service.) Switching back to the Main page shows green balls but no temperatures for the four "extra" pool disks. Dashboard shows green balls and "active" and no temperatures. So, no difference. I grabbed diagnostics.

     

    lapulapu-diagnostics-20200709-1924.zip

    Edited by John_M
    nmbd errors, not smbd
    Link to comment
    20 minutes ago, John_M said:

    Ok. Here's what I did. First I rebooted to get a clean set of diagnostics. I started the array, then disabled the Docker and VM services. I then manually spun down the "extra" pool of four HDDs (sdf, sdg, sdh, sdi) using the icon associated with that pool. The Main page updated, showing grey balls and no temperatures for the four disks. I switched to Tools->System Log and noticed that hdparm -y had been issued to the four disks. (I also noticed a lot of nmbd error messages, seemingly associated with stopping the Docker service.) Switching back to the Main page shows green balls but no temperatures for the four "extra" pool disks. Dashboard shows green balls and "active" and no temperatures. So, no difference. I grabbed diagnostics.

     

    lapulapu-diagnostics-20200709-1924.zip

    In this state what does 'hdparm -C /dev/sdf' show?

    Link to comment
    26 minutes ago, limetech said:

    In this state what does 'hdparm -C /dev/sdf' show?

    root@Lapulapu:~# hdparm -C /dev/sdf
    
    /dev/sdf:
     drive state is:  standby
    root@Lapulapu:~# 

     

    Link to comment
    32 minutes ago, John_M said:
    
    root@Lapulapu:~# hdparm -C /dev/sdf
    
    /dev/sdf:
     drive state is:  standby
    root@Lapulapu:~# 

     

    Thank you. I can reproduce it. There is something different about how disk statistics are recorded...
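
For anyone who wants to watch the I/O counters themselves, here is a minimal Python sketch. It assumes the counters of interest are the per-device read and write totals in Linux's `/proc/diskstats`. The thread does not say where emhttpd gets its statistics, so that source is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_diskstats(text: str) -> Counters:
    """Map device name -> (reads completed, writes completed) from /proc/diskstats text."""
    counters: Counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Layout: major, minor, name, reads completed, reads merged,
        # sectors read, ms spent reading, writes completed, ...
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def devices_with_new_io(before: Counters, after: Counters) -> List[str]:
    """Return the devices whose read or write counters changed between two snapshots."""
    return sorted(dev for dev, counts in after.items() if before.get(dev) != counts)
```

Taking one snapshot with `parse_diskstats(open("/proc/diskstats").read())` before a manual spin-down and another a few seconds later, then passing both to `devices_with_new_io`, shows which disks registered I/O in between.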

    Link to comment

    Pool HDDs now spin down automatically after the chosen timeout and the GUI correctly shows their status. Thank you.

     

    [Screenshot attached]

    Link to comment

    I'm still having pool spin-down issues on one of my servers and can't figure out what the problem is. It shouldn't be controller related, as the disks are on the Intel SATA ports, but I can't reproduce the problem on my test server.

     

    No disks from any pool spin down automatically. Most of the pools are used once daily. If I spin them down manually, they spin down and stay spun down until the pool is accessed. However, after a few seconds or minutes the status changes to active and the temps are not correct: first it's all sequential numbers, then it displays the same temp as my cache device, generating disk overheating notifications for all disks. During this time the disks remain spun down. Once they spin up, the temps start showing correctly, but the disks then remain spun up unless manually spun down. Any ideas?

     

    [Two screenshots attached]

    tower1-diagnostics-20200728-1441.zip

    Link to comment
    5 hours ago, johnnie.black said:

    first it's all sequential numbers, then it displays the same temp as my cache device, generating disk overheating notifications for all disks

    I never saw anything like that in the GUI. The temperatures were either correct or not displayed (asterisks). My first thought is that maybe something has become corrupted on your flash. That's the only plausible explanation I can come up with, but I guess you've already eliminated that.

    Link to comment




