
6.12.3 - Intermittent System Issues, Root Cause Troubleshooting Help Needed


I would really appreciate some help on how to proceed with troubleshooting. I've been having intermittent issues with Unraid over the last 1.5 years. Any time I resolve what seems to be the immediate issue, another one crops up, and I am now back where I started. I am concerned that, for all my effort, I am only treating symptoms and not the underlying root cause.

The system was stable for over 3 years before these issues started, with no hardware changes. I run a bare-bones setup: an Intel NUC with one internal HDD, one M.2 cache drive, and one external USB HDD. No parity drive (I acknowledge the data loss risk in doing this).

Troubleshooting attempts so far: checked and repaired a bad filesystem, replaced a bad HDD (new HDD with a new USB cable), wiped and rebuilt the array and dockers from scratch, moved key directories to the cache drive, reviewed system and drive logs, ran SMART tests on the drives, ran Memtest86, ran a disk repair on the Unraid thumb drive, and updated the BIOS.

Issues: Unraid UI slow or unresponsive, docker start errors, disk errors, disks missing, docker permission errors, Unraid updates failing (intermittent network error), TPM errors, and other intermittent odd issues.

If someone could advise on how to solve what I assume is a root problem causing these issues, I'd greatly appreciate it.

Logs as of today are attached.

nuc-diagnostics-20230730-1219.zip

Edited by wlsn0chrs

Looks like your USB drive disconnected. This is one of many reasons USB is not recommended for array or pool devices. Since you have no parity, there is nothing to get out of sync; a reboot may reset the connection.

  • Author

Thank you, that appears to have resolved the external USB HDD not mounting. After rebooting it has remounted and docker is running.

 

Checking the system log, I now have some BTRFS warnings again. Does this indicate the cache drive is failing? Any advice on how to resolve the issue?

 

A few weeks ago I had a BTRFS warning for a different sector on the cache drive, which mapped to the docker.img file. I removed the file, ran a balance and a scrub, checked the filesystem status, and ran a SMART test on the cache drive. For good measure, I ran Memtest86 a few times. All OK. I then reinstalled the previous dockers via CA. This took several attempts, with the installers failing mid-download and reporting a "Network failure", until it finally succeeded. Everything seemed OK after that. The cache drive is an internal M.2 NVMe SSD.

I tried to find which file maps to 144457728, but the command returns nothing in the terminal.

Quote

root@nuc:~# find /mnt/cache -inum 144457728
root@nuc:~# 

 

Quote

Jul 30 18:13:23 nuc kernel: BTRFS warning (device loop2): checksum verify failed on logical 144457728 mirror 1 wanted 0x0d7e520c found 0x6946c52d level 0
Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144457728 (dev /dev/loop2 sector 298528)
Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144461824 (dev /dev/loop2 sector 298536)
Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144465920 (dev /dev/loop2 sector 298544)
Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144470016 (dev /dev/loop2 sector 298552)
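The `find -inum` search above comes back empty because 144457728 is not an inode number. In these messages `logical` and `off` are byte addresses inside the btrfs filesystem on loop2, and `ino 0` together with `level 0` suggests a metadata leaf rather than file data, so there may be no file to map it to. The sketch below only does arithmetic on the log lines to illustrate this. `parse_corrected` is a hypothetical helper name, and the 512-byte sector size is an assumption.

```shell
#!/bin/sh
# parse_corrected (hypothetical helper): read one "read error corrected" line
# and print "<off> <sector> <delta>", where delta = sector * 512 - off.
# Assumption: the kernel reports sectors in 512-byte units.
parse_corrected() {
    printf '%s\n' "$1" |
    sed -n 's/.* off \([0-9][0-9]*\) (dev [^ ]* sector \([0-9][0-9]*\)).*/\1 \2/p' |
    while read -r off sector; do
        echo "$off $sector $(( $sector * 512 - $off ))"
    done
}

# First and last corrected reads from the log above.
parse_corrected 'Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144457728 (dev /dev/loop2 sector 298528)'
parse_corrected 'Jul 30 18:13:23 nuc kernel: BTRFS info (device loop2): read error corrected: ino 0 off 144470016 (dev /dev/loop2 sector 298552)'
```

Both lines give the same delta (8388608 bytes, or 8 MiB), and the four offsets in the log are 4096 bytes apart, so the repair covered one contiguous 16 KiB range, which matches the default btrfs node size. For addresses that do belong to file data, `btrfs inspect-internal logical-resolve <logical> <mount point>` is the usual lookup; for loop2 the mount point would be wherever docker.img is mounted (assumed here to be /var/lib/docker).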

 

nuc-diagnostics-20230730-2156.zip


loop2 is the docker image; you should recreate it. Also run a correcting scrub on the pool and post the result.

  • Author

1. Recreated the docker image per your linked instructions. Ran scrub w/ no errors found. Logs attached.

Quote

UUID: 5ef80807-f297-43ce-9616-72ee886e3443
Scrub started: Sun Aug 6 22:02:27 2023
Status: finished
Duration: 0:00:20
Total to scrub: 7.66GiB
Rate: 392.14MiB/s
Error summary: no errors found

 

2. Reinstalled docker apps via CA. During the installs, many of the dockers had to retry their downloads repeatedly. Two dockers fail with "Error: local error: tls: bad record MAC" and still will not install. I will retry later in the week (see screenshot).

Screenshot 2023-08-06 at 9.25.57 PM.png

 

3. The system log is being spammed with "kernel: tpm tpm0: A TPM error (257) occurred attempting get random".

 

4. In the archived notifications I have two disk warnings: "Array has 1 disk with read errors warning Disk 2 - WD_Game_Drive_57583332443331443556374A-0:0 (sda) (errors 1)". However, the disk log and attributes do not reflect this.

 

nuc-diagnostics-20230806-2203.zip


Diags don't show any read errors, did you reboot?

  • Author

Yes, unfortunately the machine was inadvertently unplugged last week, around 8/1. I've enabled the local syslog server to avoid losing logs going forward.

 

Does the disk read error count get reset on a system reboot? I've gone through my dump of old log files and unfortunately don't have any covering the periods of the listed disk read errors. Screenshot of the archived disk error notifications below.

Screenshot 2023-08-07 at 9.07.35 AM.png

 

 

8 minutes ago, wlsn0chrs said:

Does the disk read error count get reset on a system reboot?

Yes. If it happens again, post new diags.

  • 4 weeks later...
  • Author

Docker image deleted and recreated. System rebooted 3 days ago. Please find logs attached.

 

Network error when trying to update to 6.12.4:

Quote

plugin: unRAIDServer-6.12.4-x86_64.zip download failure: Network failure

 

BTRFS error:

Quote

BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 51800, gen 0
BTRFS warning (device loop2): csum failed root 327 ino 1297 off 143360 csum 0xcfbb2b59 expected csum 0x11bf5696 mirror 1
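The `errs:` line carries the same five per-device counters that `btrfs device stats` reports (write, read, flush, corruption, generation). A minimal sketch for pulling the non-zero ones out of a kernel log line follows; `nonzero_errs` is a hypothetical helper name, not an Unraid or btrfs command.

```shell
#!/bin/sh
# nonzero_errs (hypothetical helper): take one "bdev ... errs:" kernel line
# and print every counter whose value is not zero as name=value.
nonzero_errs() {
    printf '%s\n' "$1" |
    sed -n 's/.* errs: //p' |      # keep only "wr 0, rd 0, ..."
    tr ',' '\n' |                  # one counter per line
    awk '$2 + 0 != 0 { print $1 "=" $2 }'
}

# The error line quoted above.
nonzero_errs 'BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 51800, gen 0'
```

For the line above this prints `corrupt=51800`. The read and write counters are zero, so the device returned data without reporting an I/O error, but the data did not match its stored checksum.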

 

nuc-diagnostics-20230901-1009.zip

The docker image started detecting new corruption just after start, and the cache filesystem also has corruption. That suggests a hardware problem; start by running memtest.

  • 3 weeks later...

Recreate the pool to make sure it's a new filesystem, then monitor for new errors. If more corruption errors appear, there's likely still a hardware issue.
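Monitoring for new errors after recreating the pool can be as simple as counting BTRFS warning and error lines per device in the syslog. A minimal sketch, assuming the log is at /var/log/syslog (adjust the path if the local syslog server writes elsewhere); `btrfs_error_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# btrfs_error_counts (hypothetical helper): print "<device> <severity> <count>"
# for every BTRFS warning/error line in the given log file.
# Info lines such as "read error corrected" are deliberately not counted.
btrfs_error_counts() {
    grep -E 'BTRFS (warning|error) \(device [^)]+\)' "$1" |
    sed -E 's/.*BTRFS (warning|error) \(device ([^):]+)[^)]*\).*/\2 \1/' |
    sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Assumed log location; skip quietly if the file is not there.
if [ -r /var/log/syslog ]; then
    btrfs_error_counts /var/log/syslog
fi
```

No output means no warning or error lines were logged. Any count on the freshly created pool's device, or on loop2, would be a reason to post new diags.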

  • 3 weeks later...

@wlsn0chrs did you ever solve this? I have very similar issues

  • 3 weeks later...
  • Author

@Yivey_unraid I have not resolved this yet. A few things came up and I haven't gotten around to recreating the pool. Please let me know if you figure out what is going on with your setup.
