MaxwellHouse Posted November 4, 2021
Hello all, I just updated to 6.10.0-rc2 and noticed that the temperature of my NVMe cache SSD now spikes from 31 C to 84 C. I use this cache drive for Docker containers. I first noticed the issue after I 1) upgraded to rc2, and 2) installed a new container (Jellyfin). After removing the new container the problem seemed to go away until tonight. Any thoughts? Should I downgrade or is my NVMe faulty? Thanks, Brian
Vr2Io Posted November 4, 2021
You could stop the array and check whether the temperature drops a lot. This could be normal for an NVMe drive without active cooling.
MaxwellHouse Posted November 4, 2021
Yes! As soon as I stop the array the temperature drops back to normal. You can see how the SSD in question is mounted in the attached picture; I'm not sure why the chassis didn't come with active cooling for both M.2 slots. The other M.2 slot is right above it, with a nice heatsink and fan.
Michael_P Posted November 4, 2021
Put a heatsink on it.
Vr2Io Posted November 4, 2021
Right, add a third-party heatsink. I suggest not using the onboard one; I've found that almost all of them miss the drive or are hard to align, whether the board is from Asus, Gigabyte, or EVGA.
MaxwellHouse Posted November 4, 2021
Ok, very good. I'll head to the store at some point to get a heatsink. Thanks!
DivideBy0 Posted November 4, 2021
I have a heatsink on mine and it makes a HUGE difference.
MaxwellHouse Posted November 4, 2021
It usually sits at 31 - 32 C, but since the rc2 update I've noticed it spiking up to 84 C. It's probably a good idea to get a heatsink on it regardless.
DivideBy0 Posted November 4, 2021
84 C? That's a big difference, more than double. Could it be a bad sensor reading? I can't explain why rc2 would make it jump like that.
MaxwellHouse Posted November 4, 2021
I know, it's weird. At first I thought it was a docker container I was trying to install (it's my Docker cache drive) but I still see the issue. You bring up a good point... it is an older NVMe that I had used before. I suppose I could touch it to see for sure! :-) Otherwise it might be time to upgrade to a 1TB drive w/ heatsink!
DivideBy0 Posted November 4, 2021
3 hours ago, MaxwellHouse said: I know, it's weird. At first I thought it was a docker container I was trying to install (it's my Docker cache drive) but I still see the issue. You bring up a good point... it is an older NVMe that I had used before. I suppose I could touch it to see for sure! 🙂 Otherwise it might be time to upgrade to a 1TB drive w/ heatsink!
Revert to your previous Unraid version and see if the issue persists. If it doesn't, then it is clearly a bad reading somewhere in the kernel.
MaxwellHouse Posted November 5, 2021
Good call. I've been running rc1 again for a while. No issues... Will update tomorrow morning. Hmmmmmm
MaxwellHouse Posted November 5, 2021
Well, after reverting I had no issues. So just to make sure, I went back to rc2, but this time I ran "Update Assistant" before going from rc1 to rc2. It's been up for about 90 minutes without any issues.
Ulf Thomas Johansen Posted November 8, 2021
On 11/4/2021 at 1:20 AM, MaxwellHouse said: Hello all, I just updated to 6.10.0-rc2 and noticed that the temperature of my NVMe cache SSD now spikes from 31 C to 84 C. I use this cache drive for Docker containers. I first noticed the issue after I 1) upgraded to rc2, and 2) installed a new container (Jellyfin). After removing the new container the problem seemed to go away until tonight. Any thoughts? Should I downgrade or is my NVMe faulty? Thanks, Brian
Confirming the same: I'm running two new M.2 drives (one for Docker and one for VMs) and both are reporting 84-degree spikes. I had no such reports on rc1, but have had several a day since rc2. Both have heatsinks and normally operate in the 35-42 range. Even more strange is that it always spikes directly to 84 - never more, never less - before normalizing. I'm running a Ryzen 5600G rig on an Asus ROG Strix X570-F board. Attaching the latest log entries:

08-11-2021 08:11 Unraid Dockers disk message Notice - Dockers disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) normal
08-11-2021 07:39 Unraid Dockers disk temperature Alert - Dockers disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) alert
08-11-2021 01:27 Unraid Virtuals disk message Notice - Virtuals disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) normal
08-11-2021 01:27 Unraid Dockers disk message Notice - Dockers disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) normal
08-11-2021 00:56 Unraid Dockers disk temperature Alert - Dockers disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) alert
07-11-2021 23:54 Unraid Virtuals disk temperature Alert - Virtuals disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) alert
07-11-2021 22:53 Unraid Virtuals disk message Notice - Virtuals disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) normal
07-11-2021 22:22 Unraid Virtuals disk temperature Alert - Virtuals disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) alert
07-11-2021 20:21 Unraid Dockers disk message Notice - Dockers disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) normal
07-11-2021 19:50 Unraid Dockers disk temperature Alert - Dockers disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675515B (nvme1n1) alert
07-11-2021 17:49 Unraid Virtuals disk message Notice - Virtuals disk returned to normal temperature Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) normal
07-11-2021 16:18 Unraid Virtuals disk temperature Alert - Virtuals disk overheated (84 C) Samsung_SSD_980_1TB_S649NF0R675513Z (nvme0n1) alert

//UlfThomas
Edited November 8, 2021 by Ulf Thomas Johansen
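The "always exactly 84" observation can be checked mechanically by tallying the temperature in each alert line per device. The following is only a minimal sketch: it assumes the notifications have been saved as plain text with one entry per line in the wording shown in the log above, and the function name `tally_alert_temps` and the file name `alerts.txt` are made up for illustration.

```shell
#!/bin/sh
# Tally "overheated (NN C)" alerts per device and temperature.
# Assumption: one notification per line, worded like the entries in the post above.
tally_alert_temps() {
    # Keep only alert lines, reduce each to "<device> <temp>", then count duplicates.
    sed -nE 's/.*overheated \(([0-9]+) C\) [^ ]+ \(([a-z0-9]+)\).*/\2 \1/p' \
        | sort | uniq -c \
        | awk '{ print $1, $2, $3 }'   # normalise to "<count> <device> <temp>"
}

# Hypothetical usage:
#   tally_alert_temps < alerts.txt
```

If every alert really reports the same value, each device shows up on a single output line; more than one line per device would mean intermediate alert temperatures were logged after all.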
Vr2Io Posted November 8, 2021
1 hour ago, Ulf Thomas Johansen said: Even more strange is that it always spikes directly to 84 - never more, never less - before normalizing.
Really weird, no middle temperature.
Ulf Thomas Johansen Posted November 8, 2021
2 minutes ago, Vr2Io said: Really weird, no middle temperature.
Indeed - which leads me to speculate that it might be a misread and not an actual temperature reading, perhaps?
Vr2Io Posted November 8, 2021
6 minutes ago, Ulf Thomas Johansen said: Indeed - which leads me to speculate that it might be a misread and not an actual temperature reading, perhaps?
Not sure, but I think you could troubleshoot further by applying different loads to the NVMe and checking whether you get intermediate temperatures. Some thoughts so far: the rc2 release notes mention ACPI:
[rc2] Enabled additional ACPI kernel options
[rc2] Updated out-of-tree drivers
[rc2] Enabled TPM kernel modules (not utilized yet) - note this is for Unraid host utilizing physical TPM, not emulated TPM support for virtual machines.
Ulf Thomas Johansen Posted November 8, 2021
3 minutes ago, Vr2Io said: Not sure, but I think you could troubleshoot further by applying different loads to the NVMe and checking whether you get intermediate temperatures.
Any suggestions as to how I would do this? Just plain copy jobs?
Vr2Io Posted November 8, 2021
22 minutes ago, Ulf Thomas Johansen said: Any suggestions as to how I would do this? Just plain copy jobs?
Please use the Docker disk (stop Docker first) or the VM disk (stop the VMs first) and run the test below, adjusting the count value for different loads:
dd if=/dev/random of=/mnt/xxx/test.bin bs=1MB count=1024
Edit: please also type sensors at the command prompt, check whether the NVMe reports its temperature, and post the output here.
Edited November 8, 2021 by Vr2Io
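The test suggested above (write load via `dd`, temperatures via `sensors`) can be combined into one script so that readings are captured while the write is still running. This is a rough sketch rather than a tested Unraid tool: the function names `nvme_temps` and `run_write_test` are invented here, the 2-second polling interval is arbitrary, and the parser assumes the usual `sensors` layout with one reading per line and a blank line between chips. Note that, unlike the plain `dd` command, the sketch deletes the test file when it finishes.

```shell
#!/bin/sh
# Filter `sensors` output (on stdin) down to "<chip>\t<label>\t<temp>" for NVMe chips only.
nvme_temps() {
    awk -F':' '
        NF == 0              { chip = ""; next }   # blank line ends a chip section
        /^nvme-pci-/         { chip = $0; next }   # NVMe chip header, e.g. nvme-pci-0300
        /^[^ ]/ && NF == 1   { chip = ""; next }   # header of some other chip
        chip == ""           { next }              # reading that belongs to another chip
        NF < 2 || $1 == "Adapter" { next }         # adapter line or "(crit = ...)" continuation
        {
            split($2, f, " ")                      # first field after the colon is the reading
            t = f[1]
            gsub(/[^0-9.-]/, "", t)                # strip "+" and the degree unit
            printf "%s\t%s\t%s\n", chip, $1, t
        }
    '
}

# Write <count> blocks of random data to <target> and print timestamped
# NVMe temperatures every 2 seconds until dd finishes.
run_write_test() {
    local target count dd_pid ts
    target=$1
    count=${2:-1024}
    dd if=/dev/random of="$target" bs=1MB count="$count" 2>/dev/null &
    dd_pid=$!
    while kill -0 "$dd_pid" 2>/dev/null; do
        ts=$(date +%T)
        sensors | nvme_temps | awk -v ts="$ts" '{ print ts "\t" $0 }'
        sleep 2
    done
    wait "$dd_pid"
    rm -f "$target"    # clean up the test file (an addition to the original command)
}

# Hypothetical usage, with the same placeholder path as in the post:
#   run_write_test /mnt/xxx/test.bin 1024
```

Keeping the parser separate from the load generator means the same filter can be pointed at `sensors` output that was saved earlier, for example from a run on rc1.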
Ulf Thomas Johansen Posted November 8, 2021
1 hour ago, Vr2Io said: Edit: please also type sensors at the command prompt, check whether the NVMe reports its temperature, and post the output here.
Will perform tests later today. This is the output of 'sensors':

amdgpu-pci-0a00
Adapter: PCI adapter
vddgfx: 906.00 mV
vddnb: 993.00 mV
edge: +33.0°C
power1: 1000.00 uW

nvme-pci-0300
Adapter: PCI adapter
Composite: +41.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +41.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C)

nct6798-isa-0290
Adapter: ISA adapter
in0: 1.15 V (min = +0.00 V, max = +1.74 V)
in1: 1000.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in2: 3.38 V (min = +0.00 V, max = +0.00 V) ALARM
in3: 3.31 V (min = +0.00 V, max = +0.00 V) ALARM
in4: 1.01 V (min = +0.00 V, max = +0.00 V) ALARM
in5: 2.04 V (min = +0.00 V, max = +0.00 V) ALARM
in6: 360.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in7: 3.38 V (min = +0.00 V, max = +0.00 V) ALARM
in8: 3.33 V (min = +0.00 V, max = +0.00 V) ALARM
in9: 896.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in10: 1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in11: 496.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in12: 1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in13: 392.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in14: 328.00 mV (min = +0.00 V, max = +0.00 V) ALARM
Array Fan: 463 RPM (min = 0 RPM)
Array Fan: 1124 RPM (min = 0 RPM)
SYSTIN: -62.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
CPU Temp: +30.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +79.0°C sensor = thermistor
AUXTIN1: -62.0°C sensor = thermistor
MB Temp: +26.0°C sensor = thermistor
AUXTIN3: +84.0°C sensor = thermistor
PECI Agent 0 Calibration: +32.5°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

nvme-pci-0900
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +34.9°C (low = -273.1°C, high = +65261.8°C)
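Since the thread suspects a misread rather than real heat, one further check is to read the raw values the kernel exposes through its hwmon sysfs interface, which is where `sensors` gets its numbers. The sketch below assumes the NVMe driver registers hwmon devices whose `name` file contains `nvme` and reports temperatures in millidegrees Celsius in `temp*_input` files; the function name `nvme_hwmon_temps` and its optional directory argument are made up for illustration and for testing against a fake directory tree.

```shell
#!/bin/sh
# Print "<hwmon dir>\t<label>\t<degrees C>" for every temperature input of NVMe hwmon devices.
# Assumption: values in temp*_input are millidegrees Celsius, as is usual for hwmon.
nvme_hwmon_temps() {
    local root dir input label milli
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        # Skip anything that is not an NVMe hwmon device.
        [ -r "$dir/name" ] && [ "$(cat "$dir/name")" = "nvme" ] || continue
        for input in "$dir"/temp*_input; do
            [ -r "$input" ] || continue
            # Use the driver's label (e.g. "Composite") when present, else the file name.
            label=$(cat "${input%_input}_label" 2>/dev/null) || label=${input##*/}
            milli=$(cat "$input")
            awk -v d="${dir##*/}" -v l="$label" -v m="$milli" \
                'BEGIN { printf "%s\t%s\t%.1f\n", d, l, m / 1000 }'
        done
    done
}

# Hypothetical usage on the live system:
#   nvme_hwmon_temps
```

If these raw values never approach 84 while an alert is firing, that would point at whatever path the notification uses rather than at the drive. Comparing against the drive's own SMART data, for example with `smartctl -A /dev/nvme0` if smartmontools is available, is another cross-check; the device path there is an assumption based on the `nvme0n1` name in the log.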
MaxwellHouse Posted November 8, 2021
Thank you guys for the detailed follow-up! I did see issues once again after upgrading to rc2. Same exact issue where it spikes to 84. I've downgraded back to rc1 and haven't had the issue again.
Ulf Thomas Johansen Posted November 8, 2021
3 hours ago, Vr2Io said: dd if=/dev/random of=./test.bin bs=1MB count=10240
Tested with the above command whilst extracting sensor data. It does indeed report increasing temperatures:

Composite: +42.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +42.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +46.9°C (low = -273.1°C, high = +65261.8°C)

Composite: +43.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +52.9°C (low = -273.1°C, high = +65261.8°C)

Composite: +43.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +53.9°C (low = -273.1°C, high = +65261.8°C)

Composite: +43.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +54.9°C (low = -273.1°C, high = +65261.8°C)

Composite: +43.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +54.9°C (low = -273.1°C, high = +65261.8°C)

Composite: +43.9°C (low = -273.1°C, high = +81.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +46.9°C (low = -273.1°C, high = +65261.8°C)

Edited November 8, 2021 by Ulf Thomas Johansen
MaxwellHouse Posted November 8, 2021
Hey, I ran the dd command while monitoring sensors on rc1 and noted that Sensor 2 went up by 1 °C during the course of the write. It came back down once the dd command had finished. I'll reinstall rc2 and do the same exercise.
Vr2Io Posted November 8, 2021
Since I don't have an NVMe drive under Unraid, I can't test for differences between rc1 and rc2 or do any troubleshooting myself.
MaxwellHouse Posted November 8, 2021
No worries! Thank you very much for your suggestions to try to troubleshoot this issue! I upgraded to rc2 and re-ran the random file write. Very interesting results on Sensor 2... see here: There is definitely more of a delta. Incidentally, I increased the write command to 2048 counts and the Sensor 2 temp peaked at 53.9 C. Hmm....
EDIT: After reverting back to rc1 I noticed a delta of 5 C increase when writing 2048 counts.

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +30.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +30.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +41.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +45.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +44.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +44.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C)

root@maxwell:~# sensors | grep nvme-pci-0400 -A 5
nvme-pci-0400
Adapter: PCI adapter
Composite: +31.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
Sensor 1: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +41.9°C (low = -273.1°C, high = +65261.8°C)

Edited November 8, 2021 by MaxwellHouse
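Comparing rc1 and rc2 by eye across repeated `sensors` dumps is error-prone, so the delta can be computed from the saved samples instead. A minimal sketch, assuming the repeated output was captured to a text file with one reading per line as `sensors` normally prints it; the function name `sensor_delta` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Summarise one sensor label across many saved `sensors` samples read from stdin.
# Prints the sample count, the lowest and highest reading, and their difference.
sensor_delta() {
    local label
    label=${1:-Sensor 2}
    awk -F':' -v want="$label" '
        $1 == want {
            split($2, f, " ")            # first field after the colon is the reading
            t = f[1]
            gsub(/[^0-9.-]/, "", t)      # strip "+" and the degree unit
            t += 0                       # force numeric comparison
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END {
            if (n) printf "samples=%d min=%.1f max=%.1f delta=%.1f\n", n, min, max, max - min
        }
    '
}

# Hypothetical usage, one capture file per release:
#   sensor_delta "Sensor 2" < rc1-samples.txt
#   sensor_delta "Sensor 2" < rc2-samples.txt
```

Applied to the eight Sensor 2 readings pasted above (41.9 up to 47.9 and back to 41.9), this gives a delta of 6.0 degrees for that run.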