DeadLights Posted October 23, 2022
Disk1 has had 64 read errors. The syslog contains many entries like this one: "kernel: md: disk1 read error, sector=3915746784". SMART looks good from what I can tell. Attached: diagnostics-20221023-1255.zip
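Editor's sketch: before deciding whether errors like these point at the disk, it can help to see how many there are per disk and whether the sectors cluster. The snippet below is a minimal illustration, not part of the original post. It assumes the syslog lines follow the format quoted above; the function name `summarize_read_errors` and the second sector number in the sample are hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   kernel: md: disk1 read error, sector=3915746784
READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error, sector=(\d+)")


def summarize_read_errors(lines):
    """Return (error count per disk, sorted unique sectors per disk)."""
    counts = Counter()
    sectors = {}
    for line in lines:
        match = READ_ERROR.search(line)
        if match is None:
            continue  # not an md read-error line
        disk, sector = match.group(1), int(match.group(2))
        counts[disk] += 1
        sectors.setdefault(disk, set()).add(sector)
    return counts, {disk: sorted(found) for disk, found in sectors.items()}


if __name__ == "__main__":
    # Hypothetical sample: only the first message comes from the post;
    # the timestamps and the second sector are made up for illustration.
    sample = [
        "Oct 22 09:40:39 CloudDefServer kernel: md: disk1 read error, sector=3915746784",
        "Oct 22 09:40:39 CloudDefServer kernel: md: disk1 read error, sector=3915746792",
        "Oct 22 09:40:39 CloudDefServer kernel: NVRM: No NVIDIA GPU found.",
    ]
    counts, sectors = summarize_read_errors(sample)
    for disk, count in counts.items():
        print(disk, count, sectors[disk][0], sectors[disk][-1])
```

To run it against a real log, pass an open file object instead of the sample list; where the syslog lives depends on the system, so that path is left to the reader.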
trurl Posted October 23, 2022
Syslog is being spammed with these:
Oct 22 09:40:39 CloudDefServer kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 246
Oct 22 09:40:39 CloudDefServer kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 246
Oct 22 09:40:39 CloudDefServer kernel: NVRM: The NVIDIA Quadro 4000 GPU installed in this system is
Oct 22 09:40:39 CloudDefServer kernel: NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please
Oct 22 09:40:39 CloudDefServer kernel: NVRM: visit http://www.nvidia.com/object/unix.html for more
Oct 22 09:40:39 CloudDefServer kernel: NVRM: information. The 510.73.05 NVIDIA driver will ignore
Oct 22 09:40:39 CloudDefServer kernel: NVRM: this GPU. Continuing probe...
Oct 22 09:40:39 CloudDefServer kernel: NVRM: No NVIDIA GPU found.
Oct 22 09:40:39 CloudDefServer kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 246
Maybe you don't need this plugin? nvidia-driver.plg - 2022.08.04
DeadLights Posted October 23, 2022
1 hour ago, trurl said: syslog spammed with these [...] maybe you don't need this plugin? nvidia-driver.plg - 2022.08.04
I noticed that. The read errors were dated well beforehand, so I thought it possible that they may not be related. Anyway, I've since removed the plugin. I feel the need to say that I inherited this setup after accepting a new position. It's not long for this world, but I do need it to survive during the transition. Edited October 23, 2022 by DeadLights
JorgeB Posted October 24, 2022
It's logged as a disk problem; run an extended SMART test.
DeadLights Posted October 24, 2022
3 hours ago, JorgeB said: It's logged as a disk problem, run an extended SMART test.
I noticed that many of the errors occurred when the device powered on. Is it possible that I just need to replace the cables? Nothing else jumps out at me. I've attached the report: smart-20221024-1246.zip
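Editor's sketch: one rough way to look for link-level trouble in a SMART report is to check the "SATA Phy Event Counters" section for non-zero CRC counters, since CRC errors on the SATA link are often (not always) associated with cabling or connector problems. The snippet below is a minimal illustration under that assumption; it parses text laid out like the report posted later in this thread, and the function names are hypothetical. Zero CRC counts do not rule a cable out.

```python
import re

# One counter row: ID, size in bytes, value, description.
COUNTER_ROW = re.compile(r"^\s*(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_phy_counters(report_text):
    """Return {counter_id: (value, description)} from the Phy counter section."""
    counters = {}
    in_section = False
    for line in report_text.splitlines():
        if "SATA Phy Event Counters" in line:
            in_section = True  # rows of interest follow this header
            continue
        if not in_section:
            continue
        match = COUNTER_ROW.match(line)
        if match:
            counter_id, _size, value, description = match.groups()
            counters[counter_id] = (int(value), description)
    return counters


def nonzero_crc_counters(counters):
    """Keep counters whose description mentions CRC and whose value is above zero."""
    return {
        counter_id: entry
        for counter_id, entry in counters.items()
        if "CRC" in entry[1] and entry[0] > 0
    }


if __name__ == "__main__":
    # Rows copied from the report in this thread (abbreviated).
    report = """SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0009  2           11  Transition from drive PhyRdy to drive PhyNRdy
0x000b  2            0  CRC errors within host-to-device FIS
"""
    suspicious = nonzero_crc_counters(parse_phy_counters(report))
    print(suspicious if suspicious else "no non-zero CRC counters")
```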
JorgeB Posted October 24, 2022
Run the extended test; the short test does not do a surface scan.
DeadLights Posted October 24, 2022
26 minutes ago, JorgeB said: Run the extended test, short does not do a surface scan.
Sorry, I did. I didn't realize I had posted the wrong log. Forgive my ignorance: where are the extended test logs saved?
ChatNoir Posted October 24, 2022
1 hour ago, DeadLights said: Sorry, I did. I didn't realize I posted the wrong log. Forgive my ignorance where do the extended scans logs save?
In Main, click on the drive name, go to the Self-Test tab, and run the SMART extended self-test. You might have to disable spin-down on that drive (Settings tab).
DeadLights Posted October 25, 2022
Okay, the extended test reported no errors. Does this test also generate a log? I've got a good backup of the data. That's something the previous tech didn't have, so I was about to have a heart attack.
JorgeB Posted October 25, 2022
2 minutes ago, DeadLights said: Does this test also generate a log?
The result will be in the SMART report.
Solution DeadLights Posted October 25, 2022
18 minutes ago, JorgeB said: The result will be in the SMART report.
I guess I'm a bit confused over here. I ran the extended test before I made this post. I ran it again for good measure, and the only thing I see is this, which was included in my first post:
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     34038         -
# 2  Extended offline    Completed without error       00%     33986         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    36 Celsius
Power Cycle Min/Max Temperature:     29/41 Celsius
Lifetime    Min/Max Temperature:     19/59 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (19)

Index    Estimated Time   Temperature Celsius
  20    2022-10-25 00:59    36  *****************
 ...    ..(476 skipped).    ..  *****************
  19    2022-10-25 08:56    36  *****************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           11  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           12  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4     13873775  Vendor specific
Edited October 25, 2022 by DeadLights
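Editor's sketch: the extended test result discussed here lives in the "SMART Extended Self-test Log" table of the report, which is easy to overlook in a long dump. The snippet below is a minimal illustration of pulling those rows out of the report text. It assumes rows shaped like the two in this post; the function name `parse_self_tests` and the dictionary keys are hypothetical.

```python
import re

# A self-test row, e.g.:
#   # 1  Extended offline    Completed without error       00%     34038         -
# The description and status are captured together because the column
# spacing may be lost when a report is pasted into a forum post.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_self_tests(report_text):
    """Return one dict per self-test row found in the report text."""
    tests = []
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        num, desc_status, remaining, hours, lba = match.groups()
        tests.append({
            "num": int(num),
            "extended": desc_status.startswith("Extended"),
            "passed": "Completed without error" in desc_status,
            "remaining_percent": int(remaining),
            "lifetime_hours": int(hours),
            # "-" means no failing LBA was recorded for this test.
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return tests


if __name__ == "__main__":
    report = """SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     34038         -
# 2  Extended offline    Completed without error       00%     33986         -
"""
    for test in parse_self_tests(report):
        print(test["num"], test["extended"], test["passed"], test["lifetime_hours"])
```

Comparing `lifetime_hours` against the drive's current power-on hours shows how recently each test ran, which helps confirm that the row being read belongs to the test that was just started.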
JorgeB Posted October 25, 2022
33 minutes ago, DeadLights said: I see is this and it was included in my first post:
You're right, I'm sorry. Either I was looking at different diags or I misread. Anyway, the disk is OK for now.
DeadLights Posted October 26, 2022
22 hours ago, JorgeB said: You're right, I'm sorry, either I was looking at different diags or misread, anyway disk if OK for now.
Not a problem at all. I appreciate the help and the confirmation that the errors were most likely benign.