brandon3055 Posted May 21, 2023

Hi guys, just wondering if someone can confirm my suspicion here.

I recently built a new Unraid NAS, and it's been running great for a few weeks now. At least until 2 nights ago, when the system randomly became unresponsive. I could still access mounted shares just fine, but the web UI, SSH and my Docker apps were all unresponsive. In the end, I had to do a hard reset.

This prompted me to finally get remote syslog up and running, as well as Telegraf. So when it happened again last night, I actually got some useful information. This is what the syslog shows immediately before the lockup (sdc is my boot USB):

May 21 20:53:07 Data kernel: sd 2:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
May 21 20:53:07 Data kernel: sd 2:0:0:0: [sdc] tag#0 Sense Key : 0x3 [current]
May 21 20:53:07 Data kernel: sd 2:0:0:0: [sdc] tag#0 ASC=0x11 ASCQ=0x0
May 21 20:53:07 Data kernel: sd 2:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 43 72 12 00 00 f0 00
May 21 20:53:07 Data kernel: critical medium error, dev sdc, sector 4420114 op 0x0:(READ) flags 0x84700 phys_seg 30 prio class 2
May 21 21:03:50 Data kernel: sd 2:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
May 21 21:03:50 Data kernel: sd 2:0:0:0: [sdc] tag#0 Sense Key : 0x3 [current]
May 21 21:03:50 Data kernel: sd 2:0:0:0: [sdc] tag#0 ASC=0x11 ASCQ=0x0
May 21 21:03:50 Data kernel: sd 2:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 47 6f 1a 00 00 f0 00
May 21 21:03:50 Data kernel: critical medium error, dev sdc, sector 4681498 op 0x0:(READ) flags 0x84700 phys_seg 30 prio class 2
May 21 21:08:19 Data kernel: sd 2:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
May 21 21:08:19 Data kernel: sd 2:0:0:0: [sdc] tag#0 Sense Key : 0x3 [current]
May 21 21:08:19 Data kernel: sd 2:0:0:0: [sdc] tag#0 ASC=0x11 ASCQ=0x0
May 21 21:08:19 Data kernel: sd 2:0:0:0: [sdc] tag#0 CDB: opcode=0x28 28 00 00 44 f2 ea 00 00 f0 00
May 21 21:08:19 Data kernel: critical medium error, dev sdc, sector 4518634 op 0x0:(READ) flags 0x84700 phys_seg 30 prio class 2
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: *387236 open socket #24 left in connection 11
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: *387297 open socket #25 left in connection 17
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: *387219 open socket #5 left in connection 20
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: *387245 open socket #6 left in connection 21
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: *387242 open socket #28 left in connection 23
May 21 21:37:04 Data nginx: 2023/05/21 21:37:04 [alert] 18639#18639: aborting
May 21 21:43:53 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:43:53 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:43:53 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:43:53 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:43:53 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:43:53 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:43:53 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:43:53 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:43:53 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:43:53 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:44:05 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:44:05 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:44:05 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:44:05 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:44:05 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:44:05 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:44:05 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:44:05 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5
May 21 21:44:05 Data kernel: SQUASHFS error: xz decompression failed, data probably corrupt
May 21 21:44:05 Data kernel: SQUASHFS error: Failed to read block 0x807744: -5

The Telegraf data seems to support this. The last thing it shows is a sharp spike in ioWait.

I initially assumed this was caused by a Docker container I installed a few hours before the first lockup. But this data seems to point squarely at my boot USB. The USB is a Cruzer Fit 16GB, which worked flawlessly in my previous Unraid NAS for several years. The first thing I did the first time this happened was create a flash backup, so worst case I can recover. I'm just looking for a second opinion.

I have attached my syslog and diagnostics from immediately after the last lockup.

Attachments: data-diagnostics-20230521-2249.zip, syslog-10.0.0.133.log
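For anyone reading the kernel lines above: opcode 0x28 is a SCSI READ(10) command, Sense Key 0x3 is MEDIUM ERROR, and ASC 0x11 with ASCQ 0x0 is "unrecovered read error". The following is a minimal sketch (the function name `decode_read10` is made up for illustration) that decodes the logged CDB bytes and checks them against the sector numbers the kernel reported. It assumes the device uses 512-byte logical blocks, so the LBA equals the kernel sector number.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        # Bytes 2..5: starting logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks requested, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB bytes and failing sectors copied from the syslog excerpt.
logged = [
    ("28 00 00 43 72 12 00 00 f0 00", 4420114),
    ("28 00 00 47 6f 1a 00 00 f0 00", 4681498),
    ("28 00 00 44 f2 ea 00 00 f0 00", 4518634),
]

for cdb_hex, sector in logged:
    info = decode_read10(cdb_hex)
    match = "matches" if info["lba"] == sector else "differs from"
    print(f"LBA {info['lba']} ({info['blocks']} blocks) {match} logged sector {sector}")
```

Each request asked for 240 blocks starting at the sector the kernel flagged, so the three failures are reads of three different regions of sdc rather than retries of one spot.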
itimpi Posted May 22, 2023

Not sure why you think that suggests a problem with the flash drive? The snippet you posted refers to sdc, whereas the flash drive is sda. It COULD be the flash drive, but until you get something pointing more directly to it, that may be a hasty conclusion.
brandon3055 Posted May 22, 2023

2 minutes ago, itimpi said:
Not sure why you think that suggests a problem with the flash drive? The snippet you posted refers to sdc, whereas the flash drive is sda. It COULD be the flash drive, but until you get something pointing more directly to it, that may be a hasty conclusion.

The boot drive is definitely sdc. I currently have 2 other flash drives installed, which are using sda and sdb. One of those is my dummy array (I'm using a raidz2 pool as my main storage).
itimpi Posted May 22, 2023

I was going by the diagnostics posted, which had a flash drive as sda and showed a 4TB drive as sdc and no other flash drives. Not sure why they do not agree. If you are correct, it may well be the flash drive.
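One way to settle which sdX letter the boot flash has at a given moment is to look at what is mounted at /boot, which is where Unraid mounts the flash drive. Below is a minimal sketch that scans text in the /proc/mounts format; the function name `device_for_mount` and the sample mount table are hypothetical, not taken from the attached diagnostics.

```python
from typing import Optional


def device_for_mount(mounts_text: str, mount_point: str = "/boot") -> Optional[str]:
    """Return the device mounted at mount_point, or None if nothing is."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device, mount point, fs type, options, ...
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    return None


# Hypothetical sample; on a live system read the real table instead:
#   mounts_text = open("/proc/mounts").read()
sample = """\
rootfs / rootfs rw 0 0
/dev/sdc1 /boot vfat rw,noatime 0 0
"""

print(device_for_mount(sample))
```

Device letters can change between boots, so it is worth running the check on the same boot that produced the log being read.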
brandon3055 Posted May 25, 2023

On 5/22/2023 at 2:57 PM, itimpi said:
I was going by the diagnostics posted, which had a flash drive as sda and showed a 4TB drive as sdc and no other flash drives. Not sure why they do not agree. If you are correct, it may well be the flash drive.

I think the sdc errors were a bit of a red herring. I have done some more investigation, and it looks like the issue is the last Docker container I added. It seems to have a memory leak or something, because it slowly consumes more and more RAM until the system eventually locks up. The thing that threw me off is Telegraf. It looks like there is around a gig of RAM free, but apparently that's not the case.
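The thread does not establish why the Telegraf figure was misleading. As a general cross-check, the kernel's own numbers in /proc/meminfo can be read directly, since a single "free" value on a dashboard may not correspond to MemFree or MemAvailable. This is a minimal sketch; the function names and the sample values are hypothetical and not measurements from this system.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def available_percent(info: dict) -> float:
    """Share of total RAM the kernel estimates is available, in percent."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


# Hypothetical sample; on a live system read the real file instead:
#   text = open("/proc/meminfo").read()
sample = """\
MemTotal:       32768000 kB
MemFree:         1048576 kB
MemAvailable:    1638400 kB
Shmem:           4096000 kB
"""

info = parse_meminfo(sample)
for key in ("MemTotal", "MemFree", "MemAvailable", "Shmem"):
    print(f"{key}: {info[key] / 1024:.0f} MiB")
print(f"available: {available_percent(info):.1f}%")
```

Logging these fields at intervals alongside per-container memory usage would show whether the container's growth lines up with the drop in available memory.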