-
UNRAID 7.3.0 --> 7.3.1 array start/performance related issues
Adding another data point: for roughly the last month I've been seeing a similar class of errors with an internal LSI 9305-16i / SAS3224 using the mpt3sas driver. I initially thought it might be related to the recent Unraid 7.3.x upgrades.

Controller firmware: mpt3sas_cm0: LSISAS3224: FWVersion(16.00.12.00)
SAS expander: vendor_id: INTEL; product_id: RES3FV288; FWVersion(B057)

The repeated errors are "attempting task abort", "timeout 30000 ms", "UNKNOWN(0x2003) Result: hostbyte=0x03", "I/O error", and "Power-on or device reset occurred". The most visible failures are on cache SSDs behind the SAS/expander path, with BTRFS correcting reads afterward. SMART and Unraid disk health do not indicate a normal disk failure. I monitor these types of errors aggressively; many users may never notice them, because they do not change drive health status or produce the usual alerts.

I downgraded to 7.3.0 with no hardware or firmware changes. The system initially booted cleanly, but I later saw the same class of post-boot errors on 7.3.0 as well, including task aborts/resets on the cache SSDs and a BTRFS corrected read. So in my case the downgrade did not fully resolve it, but 7.3.x is still not entirely ruled out.
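Because these errors don't change drive health status, one way to watch for them is to count the matching kernel messages. A minimal sketch, assuming the messages land in `/var/log/syslog` (pass another file as the first argument otherwise). The function name `count_hba_errors` is hypothetical, and the `BTRFS.*corrected` pattern is an assumption about how the corrected-read messages are worded.

```shell
#!/bin/bash
# count_hba_errors [LOGFILE]: print "<count><TAB><pattern>" for each error string.
# The default /var/log/syslog path is an assumption; point it at the log you monitor.
count_hba_errors() {
  local log="${1:-/var/log/syslog}" pattern
  for pattern in \
    'attempting task abort' \
    'timeout 30000 ms' \
    'hostbyte=0x03' \
    'I/O error' \
    'Power-on or device reset occurred' \
    'BTRFS.*corrected'; do
    # grep -c prints 0 (and exits 1) when nothing matches, which is fine here.
    printf '%s\t%s\n' "$(grep -c -- "$pattern" "$log")" "$pattern"
  done
}
```

For example, `count_hba_errors /var/log/syslog` run on a schedule (e.g. from User Scripts) makes a rising count visible even when no alert fires.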
-
Docker reporting 26gb of 41gb img file
Hi all, just bumping this one up. Does anyone have any thoughts on what I'm missing and where the disk space is being used?
-
Kboogie started following Docker update process very slow after upgrade to v7 and Docker reporting 26gb of 41gb img file
-
Docker reporting 26gb of 41gb img file
I've been searching and troubleshooting for hours, but the usual methods don't seem to resolve it. I'd appreciate the help. The Docker service reports 26 GB of usage, but the image file is 41 GB. I'm trying to find the missing 15 GB but not having any luck.

Ran Spaceinvaders script: nothing identified in local volumes.

TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          31        31        24.35GB   2.402GB (9%)
Containers      32        10        984.1MB   233.6MB (23%)
Local Volumes   79        4         745.8kB   745.8kB (100%)
Build Cache     0         0         0B        0B

Prune Docker images:

root@unraid:~# docker image prune -a
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Total reclaimed space: 0B

Check log size:

du -ah /var/lib/docker/containers/ | grep -v "/$" | sort -rh | head -60 | grep .log

Nothing larger than 90 MB.

unraid-diagnostics-20250408-2219.zip
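To narrow down where the remaining space inside the image goes, here is a sketch that lists the largest entries under the Docker root and the largest container logs. It assumes docker.img is mounted at `/var/lib/docker` (the path the log check above already uses). `largest_children` and `largest_json_logs` are hypothetical helper names, and `*-json.log` is the file name used by Docker's default json-file logging driver. Matching log files by name also avoids the `head -60` cut-off in the pipeline above, which can drop log files once large directories fill the first 60 lines; note too that `grep .log` matches any character followed by "log", not just the extension.

```shell
#!/bin/bash
# largest_children DIR [N]: print the N largest entries directly under DIR (KiB), biggest first.
largest_children() {
  local dir="$1" n="${2:-10}"
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# largest_json_logs DIR [N]: print the N largest json-file container logs under DIR (KiB).
largest_json_logs() {
  local dir="$1" n="${2:-10}"
  find "$dir" -type f -name '*-json.log' -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, run `largest_children /var/lib/docker`, then repeat on the biggest subdirectory (typically the storage driver's directory). `docker ps -a --size` (per-container writable layers) and, if the image is btrfs-formatted, `btrfs filesystem usage /var/lib/docker` are also worth comparing against the 26 GB that Docker reports.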
-
Docker update process very slow after upgrade to v7
I tested every bonding mode, from the simplest to the more complex ones. Each seemed to work, since network traffic was confirmed, but I had the slow Docker update issue in all cases, so I'm assuming there are larger issues at play. At this point I'm going to give up on it, as the juice doesn't seem worth the squeeze. Thanks for the advice.
-
Docker update process very slow after upgrade to v7
Thank you! I disabled link aggregation and Docker performance returned to normal. However, speed tests on the aggregated and non-aggregated configurations show the same results. Any thoughts on how to troubleshoot the aggregation configuration?
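One way to check the aggregation from the Unraid side is the kernel's bonding status file, assumed here to be `/proc/net/bonding/bond0` (i.e. the bond is named `bond0`). In 802.3ad mode each member should report MII status "up" and the same Aggregator ID; different IDs usually mean the switch did not place the ports in one LACP group. Also, a single TCP stream normally uses only one member link, so an ordinary speed test is expected to look the same with and without aggregation. A sketch with hypothetical function names:

```shell
#!/bin/bash
# bond_summary [FILE]: print the bonding mode and, per member, MII status, speed and aggregator ID.
bond_summary() {
  local file="${1:-/proc/net/bonding/bond0}"
  awk -F': ' '
    /^Bonding Mode:/                 { print "mode=" $2 }
    /^Slave Interface:/              { slave = $2 }
    slave != "" && /^MII Status:/    { print slave " mii=" $2 }
    slave != "" && /^Speed:/         { print slave " speed=" $2 }
    slave != "" && /^Aggregator ID:/ { print slave " aggregator=" $2 }
  ' "$file"
}

# distinct_aggregators [FILE]: number of different aggregator IDs among the members.
distinct_aggregators() {
  bond_summary "$1" | awk -F'aggregator=' 'NF == 2 { print $2 }' | sort -u | grep -c .
}
```

In 802.3ad mode, `distinct_aggregators` printing anything other than 1 is a hint to look at the switch-side LACP configuration.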
-
Docker update process very slow after upgrade to v7
Thank you. I ran ping and nslookup tests, and both results look as expected. Would you recommend any other tests?
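One further test that targets the slow "Check for updates" step is to time an HTTPS request to the registry itself, split into DNS lookup, connect and total time. A sketch assuming Docker Hub's registry endpoint `https://registry-1.docker.io/v2/`; an unauthenticated request there is expected to return 401, which is fine because only the timing matters. `time_request` is a hypothetical name. Comparing curl's `-4` and `-6` runs could reveal a slow IPv6 path.

```shell
#!/bin/bash
# time_request URL [extra curl options...]: print DNS lookup, connect and total time in seconds.
time_request() {
  local url="$1"
  shift
  curl -s -o /dev/null --max-time 60 \
    -w 'dns=%{time_namelookup} connect=%{time_connect} total=%{time_total}\n' \
    "$@" "$url"
}

# Example (not run here):
#   time_request https://registry-1.docker.io/v2/
#   time_request https://registry-1.docker.io/v2/ -4
#   time_request https://registry-1.docker.io/v2/ -6
```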
-
Docker update process very slow after upgrade to v7
I've noticed that the Docker update process has been very slow since updating to v7. Running "Check for updates" on the Docker page takes several minutes, when it used to take about 10 seconds. Individual container updates are also very slow: a 30 MB download may take a minute or two. I ran a speed test on the system; internet connectivity is fine and containers are running as expected. Any thoughts on what might be causing this? Many thanks!

[edit] I checked the usual things. I installed the Docker speed check app (all good) and confirmed that all appdata and system files are located on the SSD cache. I did recently switch the network configuration from standard to link aggregation, but I have a UniFi switch that supports it; it is configured correctly and appears to perform as expected in network tests.

unraid-diagnostics-20250223-1200.zip
-
Kboogie started following Wrong Pool State on upgrade to 7.0.0
-
Wrong Pool State on upgrade to 7.0.0
I was unable to start the array because of this error on a cache pool and had to revert to v6. unraid-diagnostics-20250220_0908.zip
-
[Plugin] CA User Scripts
I'm working on an rclone script and cannot get it to read from the log. It's been really challenging. I finally got it writing the log correctly, but now I get an error when I try to extract the transferred and deleted counts from the log (last two lines). The error is:

grep: /config/logs/rclone_sync_log.txt: No such file or directory
grep: /config/logs/rclone_sync_log.txt: No such file or directory

But when I look in the log, it is being written correctly. Thanks for any suggestions.

#!/bin/bash
# Define your rclone remote and paths
LOCAL="unraid:/array/"
REMOTE="Synology:/backup/"
LOG_FILE="/config/logs/rclone_sync_log.txt"
#LOG_FILE="/tmp/user.scripts/tmpScripts/Backup Plex Libraries/rclone_sync_log.txt"

# Perform a dry run and save output to log file
#rclone sync --dry-run "$LOCAL" "$REMOTE" -v > "$LOG_FILE"
docker exec Nacho-Rclone-Native-GUI rclone sync --dry-run "$LOCAL" "$REMOTE" -v --log-file "$LOG_FILE"

# Function to extract the first group of numbers
extract_first_number() {
  echo "$1" | grep -Eo '[0-9]+' | head -1
}

# Extract Transferred and Deleted counts from log file
transferred_count=$(grep 'Transferred: [0-9]*' "$LOG_FILE" | tail -1 | extract_first_number)
deleted_count=$(grep 'Deleted: [0-9]*' "$LOG_FILE" | tail -1 | extract_first_number)
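A likely cause, offered as a guess: rclone runs inside the `Nacho-Rclone-Native-GUI` container, so `--log-file "$LOG_FILE"` writes to `/config/logs/...` in the container's filesystem, while the two `grep` lines run on the Unraid host, where that path does not exist. Separately, `extract_first_number` reads `"$1"`, which is empty when the function is fed through a pipe, so it would print nothing even once the file is found. A minimal sketch that reads the log back through `docker exec` (assuming the image provides `cat`) and parses stdin instead; `read_container_log` is a hypothetical name. Alternatively, drop `--log-file` and capture rclone's log on the host with `2>&1`, since rclone writes its log to stderr by default.

```shell
#!/bin/bash
CONTAINER="Nacho-Rclone-Native-GUI"
LOG_FILE="/config/logs/rclone_sync_log.txt"   # path inside the container, not on the host

# Print the log as the container sees it (assumes the image provides `cat`).
read_container_log() {
  docker exec "$CONTAINER" cat "$LOG_FILE"
}

# Print the first integer found on stdin.
extract_first_number() {
  grep -Eo '[0-9]+' | head -n 1
}

# Usage (not run here):
#   log=$(read_container_log)
#   transferred_count=$(printf '%s\n' "$log" | grep 'Transferred: [0-9]*' | tail -n 1 | extract_first_number)
#   deleted_count=$(printf '%s\n' "$log" | grep 'Deleted: [0-9]*' | tail -n 1 | extract_first_number)
```

If the matching log lines carry rclone's date prefix, the first integer on the line would be the year, so it is worth checking a sample line first.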
-
Script to check rclone dry-run results prior to running (grep problems)
I'm building a script to confirm that a sync will transfer more files than it deletes before I run rclone sync for real. It all works very nicely, but I cannot seem to grep the Deleted and Transferred values from the output. A simple test script is below, and the full script with system alerts etc. below that. Can anyone tell me what I'm doing wrong with the grep? It always returns an empty value, even though the pattern works when I test it in regex validators.

#!/bin/bash
# Example rclone output
rclone_output="Deleted: 24603 (files), 3457 (dirs), 5.602 TiB (freed)
Transferred: 22682 / 22682, 100%"

# Function to extract the first group of numbers
extract_first_number() {
  echo "$1" | grep -Eo '[0-9]+' | head
}

# Parse Transferred and Deleted counts, extracting only the first group of numbers
transferred_count=$(echo "$rclone_output" | grep 'Transferred: [0-9]+' | tail -1 | extract_first_number)
deleted_count=$(echo "$rclone_output" | grep 'Deleted: [0-9]+' | tail -1 | extract_first_number)

# Output the extracted values
echo "Expected Deleted: 24603, Extracted Deleted: $deleted_count"
echo "Expected Transferred: 22682, Extracted Transferred: $transferred_count"

Here's the larger script with the logic and the alerts; someone might find it helpful.
#!/bin/bash
# Define your rclone remote and paths
LOCAL="unraid:/array/"
REMOTE="Synology:/backup/"
BACKUPNAME="Backup Array"

# Perform a dry run (rclone writes its -v log and stats to stderr, hence 2>&1)
rclone_output=$(docker exec Nacho-Rclone-Native-GUI rclone sync --dry-run "$LOCAL" "$REMOTE" -v 2>&1)

# Wait for 5 seconds
sleep 5

# Function to extract the first group of numbers from stdin
extract_first_number() {
  grep -o '[0-9]\+' | head -n 1
}

# Parse Transferred and Deleted counts, extracting only the first group of numbers (0 if the line is missing)
transferred_count=$(echo "$rclone_output" | grep 'Transferred: [0-9]*' | tail -n 1 | extract_first_number)
deleted_count=$(echo "$rclone_output" | grep 'Deleted: [0-9]*' | tail -n 1 | extract_first_number)
transferred_count=${transferred_count:-0}
deleted_count=${deleted_count:-0}

echo "Transferred count ($transferred_count) and Deleted count ($deleted_count)"

# Check if Transferred count is greater than Deleted count
if [ "$transferred_count" -gt "$deleted_count" ]; then
  echo "Syncing as Transferred count ($transferred_count) is greater than Deleted count ($deleted_count)"
  # Execute the sync command and capture the output (including rclone's stderr log)
  sync_output=$(docker exec Nacho-Rclone-Native-GUI rclone sync "$LOCAL" "$REMOTE" -v --transfers=1 2>&1)
  if [ $? -eq 0 ]; then
    # Parse the final Transferred and Deleted counts from the sync output, ensuring they are integers
    final_transferred_count=$(echo "$sync_output" | grep -oP 'Transferred: *\d+ / \d+,' | tail -n 1 | awk '{print $2}')
    final_deleted_count=$(echo "$sync_output" | grep -oP 'Deleted: *\d+ ' | tail -n 1 | awk '{print $2}')
    final_transferred_count=${final_transferred_count:-0}
    final_deleted_count=${final_deleted_count:-0}
    # Trigger Unraid system alert on success
    #logger "rclone sync successful: Transferred $final_transferred_count files, Deleted $final_deleted_count files"
    /usr/local/emhttp/webGui/scripts/notify -i normal -s "$BACKUPNAME" -d "Rclone Backup was successful, $final_transferred_count files transferred, $final_deleted_count deleted."
  else
    # The sync command itself returned an error
    #logger "rclone sync failed"
    /usr/local/emhttp/webGui/scripts/notify -i warning -s "$BACKUPNAME" -d "Rclone Backup failed: rclone sync returned an error."
  fi
else
  echo "Skipping sync as Transferred count ($transferred_count) is not greater than Deleted count ($deleted_count)"
  #logger "rclone sync skipped: Transferred count ($transferred_count) is not greater than Deleted count ($deleted_count)"
  /usr/local/emhttp/webGui/scripts/notify -i normal -s "$BACKUPNAME" -d "Rclone Backup was skipped as Transferred count ($transferred_count) is not greater than Deleted count ($deleted_count)"
fi
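Two things likely make the simple test script return nothing. Plain `grep` uses basic regular expressions, where `+` is a literal plus sign, so `'Transferred: [0-9]+'` never matches the sample text (use `grep -E`, or `[0-9][0-9]*`). And `extract_first_number` reads `"$1"`, which is empty when the function is fed through a pipe. A sketch that reads stdin, keys on the label, and falls back to 0 when the line is missing; `count_after` is a hypothetical name, and skipping the byte-total line assumes rclone always prints a unit after that number (e.g. `0 B / 0 B`).

```shell
#!/bin/bash
# count_after LABEL: read rclone stats text on stdin and print the integer that follows
# "LABEL:" on the last line where it is a plain count; print 0 if there is none.
# The "( /| \(|$)" guard skips lines such as "Transferred:  1.234 GiB / ..." (assumed format).
count_after() {
  local label="$1" value
  value=$(grep -E "${label}"':[[:space:]]+[0-9]+( /| \(|$)' | tail -n 1 |
    sed -E "s/.*${label}:[[:space:]]+([0-9]+).*/\1/")
  echo "${value:-0}"
}

rclone_output="Deleted: 24603 (files), 3457 (dirs), 5.602 TiB (freed)
Transferred: 22682 / 22682, 100%"
echo "Extracted Deleted: $(printf '%s\n' "$rclone_output" | count_after Deleted)"
echo "Extracted Transferred: $(printf '%s\n' "$rclone_output" | count_after Transferred)"
```

Defaulting to 0 keeps a later `[ "$a" -gt "$b" ]` comparison from failing with "integer expression expected" when rclone omits a line.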
-
Parity Drive keeps failing
Thanks so much for the help. Could I drop the drive on the SAS controller from the pool and then reintroduce it without reformatting?
-
Parity Drive keeps failing
That could be it. One of the cache drives is on the SAS controller, so it was acting up too. Here are the results:

[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/sdi1].write_io_errs    8027649
[/dev/sdi1].read_io_errs     215782
[/dev/sdi1].flush_io_errs    266
[/dev/sdi1].corruption_errs  185739
[/dev/sdi1].generation_errs  0

I'm running a scrub now, although it does not seem to be progressing. I wonder if that is related to the ongoing parity rebuild.

UUID:             3e318b48-677a-4a28-8e79-9fefe817451e
Scrub started:    Fri Mar 31 07:12:03 2023
Status:           running
Duration:         0:02:35
Time left:        0:00:00
ETA:              Fri Mar 31 07:14:38 2023
Total to scrub:   257.94GiB
Bytes scrubbed:   0.00B (0.00%)
Rate:             0.00B/s
Error summary:    no errors found
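The counters that matter can be pulled out of `btrfs device stats` automatically. A small sketch (hypothetical function name `nonzero_btrfs_stats`; `/mnt/cache` as the pool mount point is an assumption) that prints only the non-zero counters:

```shell
#!/bin/bash
# nonzero_btrfs_stats: read `btrfs device stats` output on stdin and print only non-zero counters.
nonzero_btrfs_stats() {
  awk 'NF == 2 && $2 + 0 > 0 { print $1, $2 }'
}

# Usage (not run here):
#   btrfs device stats /mnt/cache | nonzero_btrfs_stats
```

Once the underlying problem is fixed and a scrub comes back clean, `btrfs device stats -z` resets the counters, so any new errors stand out.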
-
Parity Drive keeps failing
It looks like you are right. I took the controller card out of the machine, reseated it, and restarted the parity build. We're 9 hours in and it seems to be operating as expected. The problems did start after I reconnected the cables; maybe that jarred the PCI connection?

However, I am still having Docker issues. Containers are not running and the Docker page will not load. One strange thing I noticed is that the docker.img file exists on both the cache drive and disk 3, with different modification dates. Isn't it supposed to be a single file?

Disk 3: 2023-03-29 22:42
Cache: 2023-03-30 21:44

Edit: VMs also seem to be broken and will not start. unraid-diagnostics-20230331-0636.zip
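A user share presents a merged view of the same path on every disk and pool, so two physical copies of docker.img can sit behind what looks like one file. A sketch for listing every copy with its modification time; `list_copies` is hypothetical, and the `system/docker/docker.img` location is an assumption based on Unraid's default.

```shell
#!/bin/bash
# list_copies RELPATH ROOT...: print "<mtime>  <path>" for every ROOT/RELPATH that exists.
list_copies() {
  local rel="$1" root
  shift
  for root in "$@"; do
    [ -e "$root/$rel" ] && stat -c '%y  %n' "$root/$rel"
  done
  return 0   # a missing copy on the last root is not an error
}

# Usage (not run here):
#   list_copies system/docker/docker.img /mnt/cache /mnt/disk*
```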
-
Parity Drive keeps failing
Thanks, Jorge. It's been running happily for so long, though! Here are the diagnostics from after the parity build failed, in case anything stands out. Is there any way to validate that it's the card? Thanks for the help. unraid-diagnostics-20230330-2037.zip
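One way to check whether the failures line up with the card is to group disks by the PCI address in their `/dev/disk/by-path` links. If every disk that throws errors shares the HBA's address, that points toward the card, its slot or its cabling, although it does not prove it. A sketch with a hypothetical function name; comparing the addresses with `lspci` output identifies which one is the SAS controller.

```shell
#!/bin/bash
# disks_by_controller [DIR]: print "<pci-address> <device>" for each whole disk, sorted.
disks_by_controller() {
  local dir="${1:-/dev/disk/by-path}" link name
  for link in "$dir"/pci-*; do
    [ -L "$link" ] || continue                      # also skips the literal glob when nothing matches
    name=$(basename "$link")
    case "$name" in *-part[0-9]*) continue ;; esac  # skip partition links
    # Link names look like pci-0000:03:00.0-sas-phy0-lun-0; field 2 is the PCI address.
    printf '%s %s\n' "$(printf '%s' "$name" | cut -d- -f2)" "$(basename "$(readlink -f "$link")")"
  done | sort
}
```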
-
Parity Drive keeps failing
My parity drive failed. I thought it was a disk problem, so I replaced the disk and started a rebuild. The rebuild failed within 5 minutes. I took all the drives out of the enclosure, reordered them, and reseated all the cables. I started the cache rebuild and it failed again within 5 minutes. I am also seeing additional strange behavior:

* Array takes a long time to start
* Array will not stop ("Retry unmounting disk share(s)")
* Docker takes a long time to load applications

I recently swapped an SSD cache drive for an HDD cache drive, but there were no system-related files on that cache. I also recently added a new SSD to the system cache pool and converted it to RAID1. That's all I can see. Nothing stands out in Fix Common Problems, and I googled a bunch of things but couldn't find anything similar. Any thoughts? unraid-diagnostics-20230329-2015.zip
Kboogie
Members