• Samsung 980 temp wrong


    ElectroBlvd
    • Closed

    I couldn't find a thread on this forum about this issue, but I see countless people on other forums saying Unraid reports their Samsung SSDs running hot when they aren't. I have the same issue. My notifications are flooded with "cache drive over temp....cache drive returned to normal....cache drive over temp....cache drive returned to normal...." and it takes a while to clear them all out. There are so many of them that I can't even sift through them to find notifications about an actual problem! It's annoying! It's frustrating! And it's been doing this for 2 years now! When is this going to be fixed?!?!? I was told this is a known issue with the current Linux kernel Unraid uses, and that once Unraid is updated to a newer kernel the issue will go away. There have been many updates since then. I'm assuming none of those updates changed the kernel, since the issue still persists! Why did I pay $130 for software that has an annoying bug like this that is going unaddressed??




    User Feedback

    Recommended Comments

    If your problem is incorrect temp reporting, which is specific to the newer Samsung NVMe devices like the 980, this thread from 6 months ago has a fix.

    But are you sure you don't simply need to change the temp warning ranges for your NVMe?

    The default temp thresholds in Unraid are 45C (warning) and 55C (critical) for all storage devices, which is too low for NVMe and will definitely trigger warnings under a reasonable load.

    I usually change this for my NVMe devices to 55-65C (which is still a bit below spec for most NVMe drives) and rarely get temp warnings. This change can be made by clicking on the device name (e.g. Cache, Cache 2) from the MAIN tab.
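
    To make the threshold change concrete, here is a minimal Python sketch of how a warning/critical pair classifies a reading. The function name `classify_temp` and the "greater than or equal" comparison are assumptions for illustration only, not how Unraid itself implements the check.

```python
def classify_temp(temp_c, warning_c=55, critical_c=65):
    """Classify a drive temperature (Celsius) against two thresholds.

    Hypothetical helper: the >= comparison is an assumption, not
    taken from Unraid's own code.
    """
    if temp_c >= critical_c:
        return "critical"
    if temp_c >= warning_c:
        return "warning"
    return "normal"


if __name__ == "__main__":
    # Compare the 45/55 defaults with the raised 55/65 pair.
    for reading in (41, 52, 58):
        default = classify_temp(reading, warning_c=45, critical_c=55)
        raised = classify_temp(reading)
        print(f"{reading}C: default={default}, raised={raised}")
```

    With the 45/55 pair, a drive sitting at 52C under load keeps tripping the warning; with 55/65 it stays quiet. A bogus 84C reading would trip either pair, so raising the thresholds only helps when the readings themselves are real.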


    On 1/13/2023 at 10:29 PM, tjb_altf4 said:

    If your problem is incorrect temp reporting, which is specific to the newer Samsung NVMe devices like the 980, this thread from 6 months ago has a fix.

    But are you sure you don't simply need to change the temp warning ranges for your NVMe?

    The default temp thresholds in Unraid are 45C (warning) and 55C (critical) for all storage devices, which is too low for NVMe and will definitely trigger warnings under a reasonable load.

    I usually change this for my NVMe devices to 55-65C (which is still a bit below spec for most NVMe drives) and rarely get temp warnings. This change can be made by clicking on the device name (e.g. Cache, Cache 2) from the MAIN tab.


     

    My 980s stay between 34-41C even under load. But Unraid constantly reports that one or the other is at 84C for 30 minutes, and then it drops back to 34C. When I check the drive info, only one sensor is reporting 84C while the other sensor is at 34-41C. Both drives have pretty decent-sized heatsinks with heat pipes and a fan, along with thermal pads. I can touch the heatsink when it says it's 84C and it feels cool to the touch. I read the post you linked and am wondering if this is an actual fix or a band-aid for the problem. In other words, does it make the drive report the correct temp, or does it just make it report a false temp to stop the notifications? I would like to know if my NVMe actually does start overheating. But I'm tired of getting flooded by false readings.
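
    The two-sensor mismatch described above can be checked with a short script. This is a rough Python sketch that parses per-sensor lines and flags a large gap between them. The sample text is made up for illustration, using the numbers from this post and modeled on the "Temperature Sensor N: X Celsius" layout that smartctl prints for NVMe drives. The exact format, the helper names, and the 30C spread limit are all assumptions.

```python
import re

# Illustrative sample only; values taken from the post, not real tool output.
SAMPLE = """\
Temperature Sensor 1:               38 Celsius
Temperature Sensor 2:               84 Celsius
"""


def sensor_temps(text):
    """Return {sensor_number: temp_c} for each per-sensor line in text."""
    pattern = re.compile(r"^Temperature Sensor (\d+):\s+(\d+) Celsius", re.MULTILINE)
    return {int(num): int(temp) for num, temp in pattern.findall(text)}


def sensors_disagree(temps, max_spread_c=30):
    """True if the hottest and coolest sensors differ by more than max_spread_c.

    The 30C default is an arbitrary assumption for this sketch.
    """
    if len(temps) < 2:
        return False
    return max(temps.values()) - min(temps.values()) > max_spread_c


if __name__ == "__main__":
    temps = sensor_temps(SAMPLE)
    print(temps, "suspicious" if sensors_disagree(temps) else "plausible")
```

    A large spread between sensors on a drive with a cool heatsink suggests a bad reading rather than real overheating, but it does not prove it, so treat the flag as a hint only.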

     

    EDIT: Also, I built this server about 2 years ago and it's been doing this from day one. I googled for a solution for days, and the only info I could find was that it's a kernel issue with reading the Samsung NVMe temp sensors incorrectly. I was told the issue would be addressed in the next update....it wasn't....then I was told the issue would be addressed with a kernel update....2 years later the issue is still there. If the fix is a simple "add this line to the boot option", I don't understand why it isn't included in the boot option by default instead of me having to add it manually.

    Edited by ElectroBlvd

    It's a Samsung hardware problem. You can either add the kernel arg as a band-aid fix, or update the NVMe's firmware to fix it at the source.

    Both solutions are detailed in the linked thread.




