(Solved) NVMe missing after fstrim failed


FQs19

Hoping someone can help me with a problem that I have with a Cache pool disk going missing after fstrim failed. 

 

I have the Dynamix SSD TRIM plugin (2020.06.21) installed and had it set to run daily, with no issues for almost a year on the current hardware. I have turned it off while I troubleshoot the missing disk.

 

Here is the error I received:

fstrim: /mnt/nvmecache: FITRIM ioctl failed: Input/output error
Followed by a Warning about the Cache Pool BTRFS missing device.
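
For anyone searching their own syslog for the same failure, here is a minimal Python sketch that pulls the mount point and reason out of lines like the one above. The pattern is built only from the message quoted in this post; the function name `find_fitrim_failures` is made up for illustration, and other fstrim versions may word the message differently.

```python
import re

# Pattern based on the fstrim failure line quoted in this post.
# Assumption: other fstrim versions may format the message differently.
FITRIM_FAILURE = re.compile(
    r"fstrim: (?P<mount>\S+): FITRIM ioctl failed: (?P<reason>.+)"
)


def find_fitrim_failures(log_text):
    """Return (mount_point, reason) pairs for each fstrim failure found."""
    return [
        (m.group("mount"), m.group("reason").strip())
        for m in FITRIM_FAILURE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "fstrim: /mnt/nvmecache: FITRIM ioctl failed: Input/output error\n"
    for mount, reason in find_fitrim_failures(sample):
        print(f"{mount}: {reason}")
```

An Input/output error here only says the trim request could not be completed on that mount; on its own it does not say whether the drive, the slot, or something else is at fault.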

 

I disabled the Trim command and rebooted the server, but the disk is still missing. 

I haven't gone into the motherboard BIOS yet, since I haven't touched it in months. I figured I would post here first and see what others say. I have four NVMe devices installed: two are on a DIMM.2 module and two are on the motherboard. I forget which ones are where, but I can confirm if that info is needed.

 

My hardware is as follows:

- Ryzen Threadripper 3960X

- ASUS ROG Zenith II Extreme Alpha TRX40 Gaming (BIOS version 1402)

- PNY Nvidia Quadro RTX 4000

- two Corsair MP600 Gen4 NVMe

- two Sabrent Rocket 4.0 Gen4 NVMe

- LSI Broadcom SAS 9300-8i in IT mode

- Several hard disks

 

Plugins installed:

- CA Auto Update

- CA Backup/Restore

- CA Cleanup

- Dynamix Active Streams

- Dynamix Cache Directories

- Dynamix SSD TRIM

- Dynamix System buttons

- Dynamix System Information

- Dynamix System Statistics

- Dynamix System Temperature

- Fix Common Problems

- GPU Statistics

- My Servers

- NERD Tools (with only iperf and perl installed)

- NVIDIA Driver

- plexstream

- PreClear Disks

- Tips and Tweaks

- Unassigned Devices

- Unassigned Devices Plus

- UnBALANCE

- User Scripts

 

Only Docker running is PMSLinux.

No VMs. 

I have two cache pools. One pool has both Corsair MP600s and the other has both Rocket 4.0s.

unRAID version 6.9.2 is installed.

I've attached the diagnostics file and syslog for review.

 

Any help is appreciated. 

threadripper19-diagnostics-20210912-1903.zip threadripper19-syslog-20210913-0001.zip


Only one of the Sabrent devices is showing up in the SMART information, so the other one is offline. There is no sign in the syslog that it was ever seen while the system was booting, so it may well have failed. It might be worth power-cycling the server to see if it comes back online, and if it does, try starting the array and posting new diagnostics.

 

FYI:  the diagnostics includes the syslog so normally no need to post it separately.
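
A quick way to cross-check which NVMe controllers the kernel currently sees is to read them from sysfs. The sketch below is only an illustration: it assumes the usual Linux layout under `/sys/class/nvme` with `model`, `serial` and `state` attributes, and the function name `list_nvme_controllers` is made up for this example.

```python
from pathlib import Path


def _read_attr(ctrl_dir, attr):
    """Read one sysfs attribute, or return 'unknown' if it is absent."""
    path = ctrl_dir / attr
    return path.read_text().strip() if path.is_file() else "unknown"


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return one dict per NVMe controller the kernel has registered."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # no NVMe class directory: nothing detected
    return [
        {
            "name": ctrl.name,
            "model": _read_attr(ctrl, "model"),
            "serial": _read_attr(ctrl, "serial"),
            "state": _read_attr(ctrl, "state"),
        }
        for ctrl in sorted(root.iterdir())
    ]


if __name__ == "__main__":
    for ctrl in list_nvme_controllers():
        print(ctrl["name"], ctrl["model"], ctrl["serial"], ctrl["state"])
```

On a server with four NVMe drives installed, seeing fewer than four entries would match a device that was never detected at boot.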

10 hours ago, itimpi said:

Only one of the Sabrent devices is showing up in the SMART information, so the other one is offline. There is no sign in the syslog that it was ever seen while the system was booting, so it may well have failed. It might be worth power-cycling the server to see if it comes back online, and if it does, try starting the array and posting new diagnostics.

 

FYI:  the diagnostics includes the syslog so normally no need to post it separately.

I'm going to power off the server, pull both Sabrent NVMe drives, re-seat them, then start the server and go into the BIOS to see if they show up there. There's also a new motherboard BIOS that I might update to.

If I can't get both Sabrents to show up, I'll move them to different M.2 slots and see if they show up then.

Hopefully I can. I'll post diagnostics if I get them both working. 

 

Thanks for the help and the info on the syslog being included in the diagnostics

  • Solution

@itimpi

 

So I shut down my unRAID server and pulled the cover to remove the NVMe drives, but found that the Sabrent drives are on the motherboard, under the motherboard heatsink. My Corsair MP600s are on the DIMM.2 module.

So I just restarted the server, went into the BIOS, and saw that the motherboard recognized all four NVMe drives, including the two Sabrent ones.

I then rebooted the server into GUI mode and now see both Sabrent drives detected.

I'm attaching the diagnostic file for your review. 

I would love to know if you see anything wrong with my server or perhaps know what could've happened. It had been a long time since that server was rebooted. Maybe that could've had an effect. 

 

Also, do you think it is safe for me to re-enable the Dynamix SSD TRIM plugin?
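
This does not answer whether it is safe, but one check that can be done first is whether the kernel still reports discard (TRIM) support for the drive. Below is a minimal Python sketch that reads `discard_max_bytes` from sysfs, where a value of 0 means the block device does not support discard. The device name `nvme0n1` is only a placeholder and the function names are made up for illustration.

```python
from pathlib import Path


def discard_max_bytes(device, sysfs_block="/sys/block"):
    """Return the discard_max_bytes value for a block device, or None."""
    attr = Path(sysfs_block) / device / "queue" / "discard_max_bytes"
    if not attr.is_file():
        return None  # device not present, or attribute not exposed
    return int(attr.read_text().strip())


def supports_discard(device, sysfs_block="/sys/block"):
    """True when the kernel reports a non-zero discard limit."""
    value = discard_max_bytes(device, sysfs_block)
    return value is not None and value > 0


if __name__ == "__main__":
    # Placeholder device name; substitute the actual NVMe namespace.
    print(supports_discard("nvme0n1"))
```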

 

I appreciate your help with this so much.

Thank you

threadripper19-diagnostics-20210913-1632.zip
