Loss of access to NVMe SSD when reading SMART information


I just installed a 1TB Crucial P5 NVMe SSD (CT1000P5SSD8), and I lose access to the disk whenever Unraid reads its SMART data (for example, when I click on the disk in the web UI). I know the problem is triggered by reading the SMART information, because I can reproduce the same error from the console with smartctl.

In syslog:

Oct 25 14:09:52 UNBuly kernel: DMAR: DRHD: handling fault status reg 2
Oct 25 14:09:52 UNBuly kernel: DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr ffbf0000 [fault reason 06] PTE Read access is not set
Oct 25 14:10:31 UNBuly kernel: nvme nvme0: I/O 193 QID 23 timeout, aborting
Oct 25 14:10:52 UNBuly kernel: nvme nvme0: I/O 29 QID 0 timeout, reset controller
Oct 25 14:11:01 UNBuly kernel: nvme nvme0: I/O 193 QID 23 timeout, reset controller
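
To line up the two kinds of messages above, here is a minimal Python sketch (my own illustration, not something Unraid ships) that pulls the DMAR fault fields and the nvme timeout events out of syslog lines. The regexes are written against the exact lines quoted here and are an assumption; other kernel versions may word these messages differently. QID 0 is the NVMe admin queue, which is where smartctl's identify and log-page requests go, so the "I/O 29 QID 0 timeout" line fits with the SMART read hanging. The log does not show that 03:00.0 is the P5 itself; `lspci -s 03:00.0` would confirm which device that is.

```python
import re

# Matches the IOMMU (Intel VT-d "DMAR") fault line, e.g.
# "DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr ffbf0000 [fault reason 06] ..."
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<bdf>[0-9a-f:.]+)\]"
    r".*fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>[0-9a-f]+)\] (?P<text>.*)"
)

# Matches the nvme driver's command timeout line, e.g.
# "nvme nvme0: I/O 29 QID 0 timeout, reset controller"
NVME_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<cmd>\d+) QID (?P<qid>\d+) timeout, (?P<action>.+)"
)


def parse_events(lines):
    """Return DMAR faults and nvme timeouts found in syslog lines, in log order."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        m = DMAR_RE.search(line)
        if m:
            events.append({"kind": "dmar_fault", **m.groupdict()})
            continue
        m = NVME_RE.search(line)
        if m:
            event = {"kind": "nvme_timeout", **m.groupdict()}
            # QID 0 is the NVMe admin queue (identify, get-log-page, ...).
            event["admin_queue"] = event["qid"] == "0"
            events.append(event)
    return events
```

Assuming the log is at /var/log/syslog, `with open("/var/log/syslog") as f: print(parse_events(f))` lists the matching events in order, which makes it easy to see the DMAR fault on 03:00.0 arriving just before the nvme timeouts.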

The disk disappears from the system (/dev/nvme0n1 no longer exists), and I don't get it back until I power the machine off and on again.

smartctl displays this information before freezing:

# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       CT1000P5SSD8
Serial Number:                      21xxxx
Firmware Version:                   P4CR311
PCI Vendor/Subsystem ID:            0x1344
IEEE OUI Identifier:                0x00a075
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00a075 013084ec4c
Local Time is:                      Mon Oct 25 14:09:52 2021 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     78 Celsius
Critical Comp. Temp. Threshold:     81 Celsius
Namespace 1 Features (0x08):        No_ID_Reuse

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.25W       -        -    0  0  0  0        0       0
 1 +     3.00W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0800W       -        -    3  3  3  3    10000    2500
 4 -   0.0050W       -        -    4  4  4  4    12000   35000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    2,535,487 [1.29 TB]
Data Units Written:                 709,366 [363 GB]
Host Read Commands:                 2,922,852
Host Write Commands:                2,890,773
Controller Busy Time:               76
Power Cycles:                       12
Power On Hours:                     5
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               47 Celsius
Thermal Temp. 1 Transition Count:   1

After displaying the last line, the command hangs and I lose access to the disk.
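
For anyone reading the SMART/Health block above: NVMe counts "Data Units" in thousands of 512-byte blocks (512,000 bytes per unit), which is where smartctl's bracketed figures come from (2,535,487 units is roughly 1.3 TB). The minimal Python sketch below (the helper names are hypothetical, not part of smartctl) parses saved `Key: value` lines and does that conversion. It works on text that was already captured, since re-running smartctl on this drive is exactly what triggers the hang.

```python
# NVMe reports "Data Units Read/Written" in units of 1000 512-byte blocks,
# so one data unit is 512,000 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512


def parse_health(text):
    """Split smartctl 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def data_units_to_bytes(value):
    """Convert a value such as '2,535,487 [1.29 TB]' to a byte count."""
    units = int(value.split()[0].replace(",", ""))
    return units * BYTES_PER_DATA_UNIT
```

Usage: paste the saved output into a string, call `parse_health` on it, and pass `fields["Data Units Read"]` to `data_units_to_bytes`.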

Thanks in advance.

un-diagnostics-20211025-1554.zip

  • 2 weeks later...

Solved by replacing the 1TB Crucial P5 NVMe SSD with a 1TB Samsung 970 Evo Plus.

However, the same Crucial P5 in the same computer works fine with Ubuntu 21.04, so I suspect something is wrong with the Unraid 6.9.2 kernel (5.10.28).

I found another post about the same problem with a different NVMe drive:

photo6044204684664747524.jpg

Edited by buly