Cache drive failure


Solved by JorgeB


Hi there. My Unraid server has been a bit problematic for a while now with various errors, and this seems to be the most irritating one.
All my Docker containers went offline today, and I got Execution Error 403 when I tried to start them. After a reboot, which took a lot longer than usual, my cache drive is now gone. It is a single Kingston A1000 480GB, and now I wonder what the smartest next step is. My motherboard only has one M.2 slot, but some available SATA ports.

I will remove the drive and try it in a USB enclosure to see if I can connect to it. If all the data is gone, I don't actually know what is lost. I haven't written to the shares recently, so there should be no data loss there, but I suppose all the appdata from Plex and Jellyfin might be gone.

plexworm-diagnostics-20240114-2200.zip

On 1/15/2024 at 12:55 PM, JorgeB said:

Try power cycling the server, not just rebooting, and/or using a different SATA port/cables.

Thank you! That power cycle actually seems to have fixed it. It has now been up and running for quite a while. Could this be an indication of a failing M.2 drive, or something else?

Here is a SMART report for the M.2 drive. The drive is about 5 years old by now.


smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.64-Unraid] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SA1000M8480G
Serial Number:                      50026B72821050AE
Firmware Version:                   E8FK11.L
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Total NVM Capacity:                 480,103,981,056 [480 GB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          480,103,981,056 [480 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 2821050ae5
Local Time is:                      Tue Jan 16 08:33:23 2024 PST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x04):         Ext_Get_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     94 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.90W  0.0790W       -    0  0  0  0      600     600
 1 +     7.90W  0.0790W       -    0  0  0  0      600     600
 2 +     7.90W  0.0790W       -    0  0  0  0      600     600
 3 -   0.1000W  0.0790W       -    3  3  3  3     1000    1000
 4 -   0.0050W  0.0790W       -    4  4  4  4   400000   90000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          100%
Percentage Used:                    7%
Data Units Read:                    10,739,194 [5.49 TB]
Data Units Written:                 26,557,014 [13.5 TB]
Host Read Commands:                 74,886,228
Host Write Commands:                397,426,169
Controller Busy Time:               2,368
Power Cycles:                       3,297
Power On Hours:                     4,272
Unsafe Shutdowns:                   30
Media and Data Integrity Errors:    0
Error Information Log Entries:      156
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               35 Celsius

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0        156     0  0x001c  0x0005      -            6     0     -  Invalid Field in Command
  1        155     0  0x0018  0x0005      -           12     0     -  Invalid Field in Command
  2        154     0  0x4010  0x0005      -            6     0     -  Invalid Field in Command
  3        152     0  0x0010  0x0005      -            6     0     -  Invalid Field in Command
  4        151     0  0x0014  0x0005      -           12     0     -  Invalid Field in Command
  5        150     0  0x0000  0x0005      -            6     0     -  Invalid Field in Command
  6        149     0  0x001d  0x0005      -           12     0     -  Invalid Field in Command
  7        147     0  0x0004  0x0005      -           12     0     -  Invalid Field in Command
  8        144     0  0x1008  0x0005      -            6     0     -  Invalid Field in Command
  9        143     0  0x1005  0x0005      -           12     0     -  Invalid Field in Command
 10        142     0  0x1014  0x0005      -            6     0     -  Invalid Field in Command
 11        137     0  0x1011  0x0005      -           12     0     -  Invalid Field in Command
 12        130     0  0x1005  0x0005      -           12     0     -  Invalid Field in Command
 13        129     0  0x1011  0x0005      -            6     0     -  Invalid Field in Command
 14        116     0  0x6004  0x0005      -            6     0     -  Invalid Field in Command
 15        109     0  0x002c  0x0203      -            0     0     -  Invalid Queue Identifier

Self-tests not supported
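For anyone skimming a report like the one above, only a handful of fields usually matter for judging drive health. The sketch below pulls those fields out of a saved report; the here-doc is a trimmed copy of the values in this post. On a live server you would pipe `smartctl -a /dev/nvme0` instead (the `/dev/nvme0` path is an assumption and may differ on your system).

```shell
# Illustrative only: filter the key health indicators from a saved
# smartctl NVMe report. The sample text is copied from the report above.
report='Percentage Used:                    7%
Media and Data Integrity Errors:    0
Unsafe Shutdowns:                   30
Error Information Log Entries:      156'

# Zero media/integrity errors and single-digit wear are good signs;
# a high unsafe-shutdown count mostly reflects power problems, not wear.
printf '%s\n' "$report" | grep -E 'Percentage Used|Media and Data|Unsafe Shutdowns'
```

In this case the drive shows 7% wear and zero media errors, which fits JorgeB's read that outright failure is unlikely.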


30 minutes ago, Beijergard said:

Could this be an indication of a failing m2 drive or something else? 

Unlikely, but when NVMe devices drop, usually only a power cycle will bring them back. This may help if it keeps dropping:

On the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot"

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.
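After rebooting, a quick way to confirm both flags actually took effect is to check the running kernel's command line. The snippet below simulates that check against the example append line; on the live server you would replace the hard-coded string with the contents of `/proc/cmdline`:

```shell
# Sketch: verify both NVMe power-saving flags made it onto the kernel
# command line. On the server: cmdline=$(cat /proc/cmdline)
cmdline='append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off'

for flag in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
  # Pad with spaces so we only match whole, space-delimited parameters.
  case " $cmdline " in
    *" $flag "*) echo "present: $flag" ;;
    *)           echo "missing: $flag" ;;
  esac
done
```

Setting `nvme_core.default_ps_max_latency_us=0` disables NVMe autonomous power-state transitions and `pcie_aspm=off` disables PCIe link power management; both are common workarounds for NVMe drives that disappear until a power cycle.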

