• [unRAID 6.10.0-rc1] - emhttpd: read SMART events keep spinning up array disks


    danioj
    • Urgent

    I am raising this bug report to formally capture discussions on the 6.10.0-rc1 release thread relating to a possible bug.

     

    In summary, I have found that emhttpd keeps issuing read SMART events consistently to my array disks, which is spinning them up. The effect is that my usually spun down Array is pretty much spun up 24x7.

     

    Here are the relevant discussions in that thread.

     

    There was some suggestion that the CATurboWrite Plugin or my Marvel Controller (which only has 3 disks of 16 connected to it) could be at fault but I have since removed the plugin and have confirmed that this issue appears to also be experienced by another user @Mathervius who doesn't have a Marvel controller.

     

    Diagnostics are attached.

     

    Chosen minor but really feels a bit more than that. Having the array spun up all the time isn't desirable at all.

     

     

    unraid-diagnostics-20210902-1852.zip




    User Feedback

    Recommended Comments



    I thought that the idea was UnRaid would only try and read SMART data from drives that it thinks are spun up.   If this is correct the real problem could be in how UnRaid decides drives are spun down not working as expected.

     

    • Thanks 1
    Link to comment
    11 hours ago, itimpi said:

    I thought that the idea was UnRaid would only try and read SMART data from drives that it thinks are spun up.   If this is correct the real problem could be in how UnRaid decides drives are spun down not working as expected.

     


    My sentiments exactly.

     

    Something like this, there has to be more people impacted by it if it is related to unRAiD itself. 

     

    I actually saw @nuhll post some logs regarding the USB  backup feature of MyServers that appeared to have similar regular disk spin up events. 
     

    Open of course to the fact that it could be something else causing it - just not sure what. 

    Link to comment
    21 hours ago, danioj said:

    I actually saw @nuhll post some logs regarding the USB  backup feature of MyServers that appeared to have similar regular disk spin up events. 

     

    Can you point me to that conversation?

    Link to comment

    I dont understand such questions, besides the fact that we talked about this you literally just had to click my username and the latest topic.. but anyway... ;)

     

    I also noticed the smart thing, but since it only show when it spin down and not when it spin up im not sure if it really JUST spins up because of that smart

     

    is it normal that it reads SMART multiple times a day?

    Edited by nuhll
    Link to comment
    6 hours ago, nuhll said:

    is it normal that it reads SMART multiple times a day?

    Monitoring of SMART attributes is a standard built-in feature of Unraid that is done frequently on active disks as if any monitored attribute changes you want to know ASAP as it can indicate a disk with problems.  I would expect SMART to be monitored VERY frequently on that basis.

    Link to comment

    I have exact same issue, but im not sure if it is manufacture reason or format difference reason. 

    here are some observation I have these days

     

    Link to comment

     

    On 9/3/2021 at 6:00 PM, danioj said:

    I actually saw @nuhll post some logs regarding the USB  backup feature of MyServers that appeared to have similar regular disk spin up events. 

     

    20 hours ago, ljm42 said:

    Can you point me to that conversation?

     

    19 hours ago, nuhll said:

    I dont understand such questions, besides the fact that we talked about this you literally just had to click my username and the latest topic.. but anyway... ;)

     

    I asked, because the discussion we had about Flash Backup was in no way related to array drive spinup.

    Link to comment

    It's really strange behavior for my array. 

     

    The disks would not spin down for ~3 days and then they did spin down for ~3 days. Now they are back to not spinning down again.

    Link to comment

    @ljm42 what are your / LT initial thoughts on this report!?

     

    I am happy to stay on the RC and help debug but I think we need a pragmatic and consistent approach that multiple users can follow. 

    Link to comment

    I decided to do a bit of an investigative test.

     

    I amended all my nightly activities (backups etc) so that they were more consistent and completed at ~5am. This would prompt the array to spin down due to inactivity shortly after (there was no-one up using data on the disks at the time).

     

    As expected, the array spun down. Then, for some reason I cannot fathom, unRAID started reading SMART data from the disks about an hour later.

     

    Sep  6 05:47:17 unraid emhttpd: spinning down /dev/sdg
    Sep  6 05:47:17 unraid emhttpd: spinning down /dev/sdd
    Sep  6 05:47:17 unraid emhttpd: spinning down /dev/sde
    Sep  6 05:47:17 unraid emhttpd: spinning down /dev/sdf
    Sep  6 05:47:19 unraid emhttpd: spinning down /dev/sdh
    Sep  6 05:47:19 unraid emhttpd: spinning down /dev/sdn
    Sep  6 05:47:19 unraid emhttpd: spinning down /dev/sdp
    Sep  6 05:48:18 unraid emhttpd: spinning down /dev/sdj
    Sep  6 05:48:18 unraid emhttpd: spinning down /dev/sdc
    Sep  6 05:48:18 unraid emhttpd: spinning down /dev/sdi
    Sep  6 05:54:22 unraid emhttpd: spinning down /dev/sdl
    Sep  6 06:00:02 unraid emhttpd: spinning down /dev/sdo
    Sep  6 06:00:05 unraid emhttpd: spinning down /dev/sdm
    Sep  6 06:00:05 unraid emhttpd: spinning down /dev/sdr
    Sep  6 06:00:07 unraid emhttpd: spinning down /dev/sds
    Sep  6 06:00:09 unraid emhttpd: spinning down /dev/sdq
    Sep  6 07:37:02 unraid emhttpd: read SMART /dev/sdi
    Sep  6 07:37:23 unraid emhttpd: read SMART /dev/sdn
    Sep  6 07:37:43 unraid emhttpd: read SMART /dev/sdp
    Sep  6 07:37:56 unraid emhttpd: read SMART /dev/sdq
    Sep  6 07:38:11 unraid emhttpd: read SMART /dev/sdr
    Sep  6 07:38:33 unraid emhttpd: read SMART /dev/sdm
    Sep  6 07:38:45 unraid emhttpd: read SMART /dev/sds
    Sep  6 07:38:57 unraid emhttpd: read SMART /dev/sdo
    Sep  6 07:39:11 unraid emhttpd: read SMART /dev/sdd
    Sep  6 07:39:48 unraid emhttpd: read SMART /dev/sde
    Sep  6 07:40:03 unraid emhttpd: read SMART /dev/sdf
    Sep  6 08:00:04 unraid emhttpd: read SMART /dev/sdj
    Sep  6 08:00:04 unraid emhttpd: read SMART /dev/sdh
    Sep  6 08:00:04 unraid emhttpd: read SMART /dev/sdg
    Sep  6 08:00:04 unraid emhttpd: read SMART /dev/sdc
    Sep  6 08:00:04 unraid emhttpd: read SMART /dev/sdl

     

    This is weird because it's not just one disk, it's all of them.

    Edited by danioj
    Link to comment

    I have noticed the same behavior with my array (and unassigned drives). Right after drive goes to sleep, "read SMART" command comes and everything wakes up. 

     

    Sep 6 18:16:40 Server emhttpd: spinning down /dev/sdc
    Sep 6 18:16:56 Server emhttpd: read SMART /dev/sdc
    
    Sep 6 18:59:10 Server emhttpd: spinning down /dev/sdh
    Sep 6 18:59:10 Server SAS Assist v0.86: Spinning down device /dev/sdh
    Sep 6 18:59:23 Server emhttpd: spinning down /dev/sdg
    Sep 6 18:59:23 Server SAS Assist v0.86: Spinning down device /dev/sdg
    Sep 6 18:59:54 Server emhttpd: read SMART /dev/sdh
    Sep 6 19:00:09 Server emhttpd: read SMART /dev/sdg

     

    Link to comment
    On 9/5/2021 at 8:28 PM, ljm42 said:

     

     

     

     

    I asked, because the discussion we had about Flash Backup was in no way related to array drive spinup.

    He just saw it in my logs like he said...

     

    anyway the flash backup thing is still bugged... i dont get why we cant just say "manual " and or "once a month" and finish.

    Edited by nuhll
    Link to comment
    7 hours ago, nuhll said:

    He just saw it in my logs like he said...

     

    I'm still confused. If you see a log entry that shows flash backup spinning up the array drives, please show me :) 

    Link to comment
    4 hours ago, ljm42 said:

     

    I'm still confused. If you see a log entry that shows flash backup spinning up the array drives, please show me :) 


    Somewhat detracting from the conversation guys but what is meant by that statement is that while @nuhll was posting about one issue I saw in the logs similar entries to the ones I had.
     

    Unrelated conversations.

    • Like 1
    Link to comment
    On 9/6/2021 at 7:58 AM, danioj said:

    @ljm42 what are your / LT initial thoughts on this report!?

     

    I am happy to stay on the RC and help debug but I think we need a pragmatic and consistent approach that multiple users can follow. 

     @ljm42 any thoughts on my suggestion? Is there any feedback from the team on this?

    Edited by danioj
    Link to comment

    The log snippet you posted shows that something is asking emhttpd for smart values but I'm not sure how to tell what it is. It probably isn't a VM or docker container, I'd guess either a plugin, the webgui, or emhttpd itself. Any chance you could run in safe mode for a little while to see if you can reproduce it there?

     

    If it doesn't happen in safe mode that would imply a plugin. If it does happen in safe mode that would imply the webgui or emhttpd itself is triggering it.

    Link to comment
    1 hour ago, ljm42 said:

    The log snippet you posted shows that something is asking emhttpd for smart values but I'm not sure how to tell what it is. It probably isn't a VM or docker container, I'd guess either a plugin, the webgui, or emhttpd itself. Any chance you could run in safe mode for a little while to see if you can reproduce it there?

     

    If it doesn't happen in safe mode that would imply a plugin. If it does happen in safe mode that would imply the webgui or emhttpd itself is triggering it.

     

    Done. Just rebooted into safe mode now and manually spun the array down. Ill just leave it alone now.

     

    I can get a good 5 hours of safe mode in before the family sit down for the night and wan their media server back.

    Edited by danioj
    • Like 1
    Link to comment
    1 hour ago, danioj said:

     

    Done. Just rebooted into safe mode now and manually spun the array down. Ill just leave it alone now.

     

    I can get a good 5 hours of safe mode in before the family sit down for the night and wan their media server back.

     

    OK, this has taken a new turn for me. As part of this test I also disabled Plex (as this the only Docker I could think could have the potential to read disks and cause them to spin up).

     

    What I have observed now seems to be related to mover. I have mover set to run every hour. Mostly this captures the odd changed file I want protected or my daily camera feed backup.

     

    Well, it just so happens the daily camera feed backup conceded with this test.

     

    Now I have my camera feed share isolated to one drive. This to me should mean, that when mover moves these files, that single disk spins up along with the parity disks and they are written. As I understand it, not every drive needs to be spun up even when data is being written.

     

    What I am observing here is that the whole array spun up when the mover operation started and in addition to the single disk and parity drives being written to - ALL my array disks are getting significant reads at the same time!!

     

    The SMART entries in the log coincide with this and it seems to explain the regularity of the SMART events. My question is, why  does mover need to read all my disks at the same time it is writing to a single array disk and parity?

     

    Screen Shot 2021-09-09 at 4.24.35 pm.png         425124317_ScreenShot2021-09-09at4_36_50pm.png.ee3a898d1032690dce82d29aab9fd687.png

    Edited by danioj
    Link to comment

    The mover has now stopped. In total there was about ~100GB of data to move at a guess. 

     

    The picture below shows the effect on the array after moving that 100GB. All those reads in addition to the writes to the single disk and parity.

     

    I imagine all the disks will down spin down, until mover initiates again in an hour. At which time, if the disks are spun down and it has to something to move, it will spin up the whole array to do so and at the same time read SMART values of each disk and read each drive will it writes.

     

    This is starting to come together.

     

    Unless I am completely missing something fundamental in how unRAID works here, this shouldn't be happening right?

     

    2080783335_ScreenShot2021-09-09at4_51_48pm.png.e85cd4f4ebd730a63fc2dbebc562483c.png

    Edited by danioj
    Link to comment

    Feel like I am spamming - but is my thread so I'll go ahead.

     

    I can now replicate what is happening. Here is what I did:

     

    • I spun down the array manually.
    • I opened my personal user share on my iMac.
    • This share has files which are spread across 4 disks. As expected those 4 disks spun up so the share could be read.
    • I copied a small 1MB picture to the share. The file was written to Cache as expected.
    • I initiated mover manually.
    • All disks spun up.
    • SMART was read from each spun up disk (the initial reported problem).
    • The file was written to one of the disks and each disk was read.
    • Mover stopped.

     

    I have done this 5 times now. Each with the same result.

     

    Like I posted above. If indeed unRAID needs every disk spun up and read to make a write to a single disk (and I have missed this behaviour all these years) then there is no problem at all and this is completely on me for being ignorant. I didn't think that was the case though.

    Edited by danioj
    Link to comment
    12 minutes ago, danioj said:

    Feel like I am spamming - but is my thread so I'll go ahead.

     

    I can now replicate what is happening. Here is what I did:

     

    • I spun down the array manually.
    • I opened my personal user share on my iMac.
    • This share has files which are spread across 4 disks. As expected those 4 disks spun up so the share could be read.
    • I copied a small 1MB picture to the share. The file was written to Cache as expected.
    • I initiated mover manually.
    • All disks spun up.
    • SMART was read from each spun up disk (the initial reported problem).
    • The file was written to one of the disks and each disk was read.
    • Mover stopped.

     

    I have done this 5 times now. Each with the same result.

     

    Like I posted above. If indeed unRAID needs every disk spun up and read to make a write to a single disk (and I have missed this behaviour all these years) then there is no problem at all and this is completely on me for being ignorant. I didn't think that was the case though.

     

    Downgraded to 6.9.2 and the behaviour is the same.

     

    When mover initiates, all disks spin up to be read as part of the write operation irrespective of the number of disks being written to. When a disk spins up SMART data is read as designed. As I have mover initiating every hour (and mover logging is off) queue the endless cycles of SMART reads as disks are spun up.

     

    I feel a bit stupid here. Years and years of unRAID use and I never noticed this is how things work.

    Link to comment
    31 minutes ago, danioj said:

    Downgraded to 6.9.2 and the behaviour is the same.

    This has been happening for some time, I changed how I work with the mover/array because of that.

    Link to comment
    30 minutes ago, ChatNoir said:

    Do you have Turbo Mode / Recontruct Write activated ? or Squid's Auto Turbo Write Mode plugin ?

    It might explain why all drive spin up ?


    *Penny Drops*
     

    Yes I do. I remember now back in 6.2 when this option was officially introduced.

    I don’t have the plug-in *anymore* but I have reconstruct write enabled - and if I recall correctly that’s the mode that spins up the disks to calculate parity in a quicker way than the traditional to speed up writes. 
     

    *slaps forehead really hard*
     

    This whole thread is a non event and has to do with my settings. God I’m pissed at myself. 
     

    To anyone who has invested / wasted any time at all as a result of this thread - I’m sorry!!

    Link to comment



    Join the conversation

    You can post now and register later. If you have an account, sign in now to post with your account.
    Note: Your post will require moderator approval before it will be visible.

    Guest
    Add a comment...

    ×   Pasted as rich text.   Restore formatting

      Only 75 emoji are allowed.

    ×   Your link has been automatically embedded.   Display as a link instead

    ×   Your previous content has been restored.   Clear editor

    ×   You cannot paste images directly. Upload or insert images from URL.


  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.