CyberMew Posted July 20, 2019
It went down 12 hours ago and I'm not sure why. I only discovered it when the Docker containers weren't running and reported something about a read-only file system (cache). This is a 2-month-old SSD and it had been working fine all along. Attached a copy of the syslog (the red log entries are all the way at the bottom) and diagnostics. I have stopped my Docker service. Should I shut down my server as well?
tower-syslog-20190721-0314.zip tower-diagnostics-20190721-0317.zip
Frank1940 Posted July 20, 2019
Apparently, it went missing about here:
Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting
Jul 20 15:52:07 Tower kernel: nvme nvme0: I/O 491 QID 5 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 465 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 466 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 467 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 468 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 469 QID 4 timeout, aborting
Jul 20 15:52:13 Tower kernel: nvme nvme0: I/O 470 QID 4 timeout, aborting
Jul 20 15:52:35 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, reset controller
Jul 20 15:53:06 Tower kernel: nvme nvme0: I/O 27 QID 0 timeout, reset controller
Jul 20 15:54:10 Tower kernel: nvme nvme0: Device not ready; aborting reset
Jul 20 15:54:10 Tower kernel: nvme nvme0: Abort status: 0x7
It does not show up in the SMART reports in the Diagnostics file, meaning that it is missing in action. By the way, that Internet speed test is spamming your syslog. It would probably be wise to turn it off until this issue is resolved.
I am not an expert on exactly how to proceed at this point, but I would be tempted to reboot the server and see if the drive comes back online. (You might want to wait a few hours and see if anyone else has seen anything like this.) If it does come back, get a new diagnostics file and attach it to a new post; that may contain some information about the state of the drive.
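Editor's note: the timeout and reset lines quoted in this post can be tallied with a short script, which may help when the syslog is long or noisy. This is only a rough sketch: the function name `summarize_nvme_events` and the regular expressions are made up for illustration and are written against the exact message wording quoted above, so other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns written against the lines quoted in this thread, e.g.
#   "Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting"
# The wording is an assumption taken from this syslog only.
TIMEOUT_RE = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>aborting|reset controller)"
)
NOT_READY_RE = re.compile(r"kernel: nvme (?P<dev>nvme\d+): Device not ready")


def summarize_nvme_events(lines):
    """Count timeout actions per device and collect devices whose reset failed."""
    counts = Counter()
    failed_reset = set()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m["dev"], m["action"])] += 1
            continue
        m = NOT_READY_RE.search(line)
        if m:
            failed_reset.add(m["dev"])
    return counts, failed_reset


# A few of the lines from the post above, used as sample input.
SAMPLE = """\
Jul 20 15:52:05 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, aborting
Jul 20 15:52:07 Tower kernel: nvme nvme0: I/O 491 QID 5 timeout, aborting
Jul 20 15:52:35 Tower kernel: nvme nvme0: I/O 994 QID 14 timeout, reset controller
Jul 20 15:54:10 Tower kernel: nvme nvme0: Device not ready; aborting reset
""".splitlines()

if __name__ == "__main__":
    counts, failed = summarize_nvme_events(SAMPLE)
    for (dev, action), n in sorted(counts.items()):
        print(f"{dev}: {n} x {action}")
    for dev in sorted(failed):
        print(f"{dev}: controller reset failed (device not ready)")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_nvme_events(open("syslog.txt"))`, where `syslog.txt` is a placeholder for the extracted syslog.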
CyberMew (Author) Posted July 21, 2019
Yeah, very weird. There is no red ball or anything; it just silently disappeared. Does this mean the data on the drive is gone?
CyberMew (Author) Posted July 21, 2019
Does anyone else have any further input on how to proceed?
Harro Posted July 21, 2019
When I see I/O errors, I always check my cables and replace them, just to rule them out. Just a thought.
CyberMew (Author) Posted July 22, 2019
Alright, I will shut down the server and hope for the best. Thanks! Will post new diagnostics by tomorrow.
Frank1940 Posted July 22, 2019
Whatever you do, don't do any formatting of any disk!
CyberMew (Author) Posted July 23, 2019
OK, I restarted the server and luckily everything seems to be working fine. I have attached the diagnostics again.
tower-diagnostics-20190724-0213.zip tower-syslog-20190724-0214.zip
JorgeB Posted July 23, 2019
The NVMe device dropped offline earlier. There are no cables to check, so just make sure it's well seated, but it might happen again.
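Editor's note: since the drive may drop offline again, one way to notice sooner is to check periodically whether the controller is still listed by the kernel. The sketch below is a rough illustration, not an Unraid feature. It assumes a Linux system that exposes NVMe controllers under `/sys/class/nvme` with a `state` attribute; the function name `nvme_controller_states` is hypothetical.

```python
from pathlib import Path


def nvme_controller_states(sysfs_root="/sys/class/nvme"):
    """Return {controller_name: state} for each controller under sysfs_root.

    The state is None when the attribute cannot be read. An empty dict means
    no controllers are listed (or the directory does not exist at all).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    states = {}
    for entry in sorted(root.iterdir()):
        try:
            # Assumption: the driver exposes a one-line "state" attribute here.
            states[entry.name] = (entry / "state").read_text().strip()
        except OSError:
            states[entry.name] = None
    return states


if __name__ == "__main__":
    found = nvme_controller_states()
    if not found:
        print("No NVMe controllers listed; the device may have dropped offline.")
    for name, state in found.items():
        print(f"{name}: {state if state is not None else 'state unreadable'}")
```

A controller that has vanished from this listing, as nvme0 apparently did here, would match the "missing from the SMART reports" symptom described earlier in the thread.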
Frank1940 Posted July 23, 2019
And I believe there is usually a screw to secure it. Be sure to look for the screw.
CyberMew (Author) Posted July 24, 2019
Thanks. I'm pretty sure it was already screwed down tight a couple of months ago, but I will still double-check the screw in case it somehow came loose. So far it's been running for 24 hours and all seems to be fine 🤞 Let's see what happens during the next parity check next month; that's when the read/drive errors will usually appear again.
Fireball3 Posted July 26, 2019
Very interesting read. I also experienced drop-outs, but in my case it was the RAM, a couple of times. IIRC the connection techniques of RAM and NVMe are quite similar. Reseating the RAM solved the issue, but it's very annoying, especially if I'm not at home and my family wants to use the media server. I wonder if you guys have ever had these kinds of connection issues?
cyberspectre Posted May 28, 2020
On 7/24/2019 at 10:31 AM, CyberMew said: Thanks. I'm pretty sure it was already screwed down tight a couple of months ago, but I will still double-check the screw in case it somehow came loose. So far it's been running for 24 hours and all seems to be fine 🤞 Let's see what happens during the next parity check next month; that's when the read/drive errors will usually appear again.
Did you ever discover what was wrong? I'm having the same issue as we speak.
CyberMew (Author) Posted May 31, 2020
Unfortunately no, I didn’t. After I turned my machine off, I (IIRC) updated my BIOS on the next boot and it was back to normal. The drive was still secured, so I didn’t touch it. I haven’t had this issue since.