July 20, 2019
It went down 12 hours ago and I'm not sure why. I only discovered it when the Docker containers weren't running and reported something about a read-only file system (cache). This is a 2-month-old SSD and it had been working fine all along. I've attached a copy of the syslog (the red entries are at the very bottom) and the diagnostics. I have stopped my Docker service. Should I shut down my server as well?
tower-syslog-20190721-0314.zip tower-diagnostics-20190721-0317.zip
Edited July 20, 2019 by CyberMew
July 20, 2019 Community Expert
Apparently, it went missing about here:

Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting
Jul 20 15:52:07 Tower kernel: nvme nvme0: I/O 491 QID 5 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 465 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 466 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 467 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 468 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 469 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 470 QID 4 timeout, aborting
Jul 20 15:52:35 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, reset controller
Jul 20 15:53:06 Tower kernel: nvme nvme0: I/O 27 QID 0 timeout, reset controller
Jul 20 15:54:10 Tower kernel: nvme nvme0: Device not ready; aborting reset
Jul 20 15:54:10 Tower kernel: nvme nvme0: Abort status: 0x7

The drive does not show up in the SMART reports in the diagnostics file, meaning that it is missing in action.
By the way, that Internet speed test is spamming your syslog. It would probably be wise to turn it off until this issue is resolved.
I am not an expert on exactly how to proceed at this point, but I would be tempted to reboot the server and see if the drive comes back online. (You might want to wait a few hours first and see if anyone else has seen anything similar.) If it does come back, get a new diagnostics file and attach it to a new post; it may contain some information about the state of the drive.
Edited July 20, 2019 by Frank1940
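For anyone tracing a similar drop-out, a small script can make the pattern in the excerpt above easier to see when the syslog is noisy. The following is only a sketch: the regular expression is written against the exact line format quoted in this thread, the function name `summarize_nvme_timeouts` is made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the ones quoted in this thread, e.g.
#   Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting
# Assumption: the message format is the same on your kernel.
NVME_TIMEOUT = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) "
    r"QID (?P<qid>\d+) timeout, (?P<action>.+)$"
)


def summarize_nvme_timeouts(lines):
    """Count NVMe I/O timeout events per (device, action) pair."""
    counts = Counter()
    for line in lines:
        match = NVME_TIMEOUT.search(line.rstrip())
        if match:
            counts[(match.group("dev"), match.group("action"))] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting",
    "Jul 20 15:52:07 Tower kernel: nvme nvme0: I/O 491 QID 5 timeout, aborting",
    "Jul 20 15:52:35 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, reset controller",
    "Jul 20 15:54:10 Tower kernel: nvme nvme0: Device not ready; aborting reset",
]

for (device, action), count in sorted(summarize_nvme_timeouts(SAMPLE).items()):
    print(f"{device}: {count} x timeout, {action}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_nvme_timeouts(open(path))`, where `path` points at the extracted syslog from the diagnostics zip. A burst of "aborting" lines followed by "reset controller" is the sequence seen here just before the device disappeared.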
July 21, 2019 Author
Yeah, very weird. There is no red ball or anything; it just silently disappears. Does this mean the data on the drive is gone?
July 21, 2019
When I see I/O errors, I always check my cables and replace them just to rule them out. Just a thought.
July 22, 2019 Author
Alright, I will shut down the server and hope for the best. Thanks! I will post new diagnostics by tomorrow.
July 23, 2019 Author
OK, I restarted the server and luckily everything seems to be working fine. I have attached the diagnostics again.
tower-diagnostics-20190724-0213.zip tower-syslog-20190724-0214.zip
July 23, 2019 Community Expert
The NVMe device dropped offline earlier. There are no cables to check; just make sure it's well seated. It might happen again, though.
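Since the device may drop offline again without any warning in the UI, one rough way to notice is to check whether the kernel still lists the controller. This is a minimal sketch, assuming a Linux system that exposes NVMe controllers under `/sys/class/nvme`; the function name `list_nvme_controllers` is hypothetical, and whether a dropped controller vanishes from that directory or lingers in a failed state may vary.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_dir="/sys/class/nvme"):
    """Return the names of NVMe controllers the kernel currently exposes.

    An empty list means the directory is missing or holds no controllers.
    """
    root = Path(sysfs_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


controllers = list_nvme_controllers()
if controllers:
    print("NVMe controllers present:", ", ".join(controllers))
else:
    print("No NVMe controllers found")
```

Run periodically (for example from a scheduled script), this gives an early hint that the drive has gone missing before the Docker containers start failing.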
July 23, 2019 Community Expert
And I believe there is usually a screw to secure it. Be sure to look for the screw.
July 24, 2019 Author
Thanks. I'm pretty sure I screwed it down tight a couple of months ago, but I will still double-check the screw in case it somehow came loose. So far it's been running for 24 hours and all seems to be fine 🤞 Let's see what happens during the next parity check next month; that's when the read/drive errors usually appear again.
July 26, 2019
Very interesting read. I have also experienced drop-outs a couple of times, but in my case it was the RAM. IIRC, RAM and NVMe modules are seated in quite similar ways. Reseating the RAM solved the issue, but it's very annoying, especially if I'm not at home and my family wants to use the media server. I wonder if you guys have ever had this kind of connection issue?
May 28, 2020
On 7/24/2019 at 10:31 AM, CyberMew said:
Thanks. I'm pretty sure I screwed it down tight a couple of months ago, but I will still double-check the screw in case it somehow came loose. So far it's been running for 24 hours and all seems to be fine 🤞 Let's see what happens during the next parity check next month; that's when the read/drive errors usually appear again.

Did you ever discover what was wrong? I'm having the same issue as we speak.
May 31, 2020 Author
Unfortunately, no, I didn't. Right after I turned my machine off, I (IIRC) updated my BIOS on the next boot and it was back to normal. The drive was still secured, so I didn't touch it. I haven't had this issue since.