enmesh-parisian-latest

Posts posted by enmesh-parisian-latest

  1. 7 hours ago, Frette said:

    I am having a similar issue where being on 6.12.6 (and previous 6.12 versions) is causing my Unraid server to crash. I've gone through and cleaned up what I thought might be the issue, like NVIDIA drivers etc. I've even removed hardware thinking that might be the issue, but the server still keeps locking up and I can't even log in to the physical device.

     

    My next option is to downgrade to a more stable version; everything worked fine with 6.11.6.

     

     

    syslog-192.168.1.34.log 258.32 kB · 2 downloads

    nrem-diagnostics-20231214-1049.zip 194.23 kB · 1 download

    Did you switch the docker network type to ipvlan? 

     

    One of my kids accidentally kicked my server right in the motherboard. The mobo appears dead, so I upgraded the whole case, mobo, RAM & CPU, and migrated my RAID card that connects my drive array. Trouble is, one of my array disks is now disabled.

     

    I pulled the disabled disk out and inserted it again, and swapped drive bays, but it's not coming back online. It seems like too much of a coincidence that the drive would fail at the same time the case was damaged. Could the RAID card be damaged? Are there any tests I can do (e.g. something like the SMART check sketched at the end of this post)?

    tobor-server-diagnostics-20231205-2347.zip
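
    In case it helps, this is the kind of check I had in mind; a minimal sketch, assuming the disk still shows up as a /dev/sdX device (behind the Adaptec card smartctl may need an explicit device type such as -d sat):

    # Minimal disk check sketch (assumes the disk is visible as /dev/sdX -- confirm with lsblk first)
    smartctl -H /dev/sdX          # overall SMART health verdict
    smartctl -t short /dev/sdX    # start a short self-test (takes a couple of minutes)
    smartctl -l selftest /dev/sdX # read the self-test log once it has finished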

  3. 5 hours ago, 19dubgs3109 said:

    Any updates on this?

    I think I've got a very similar issue. I am currently not at home and my Unraid machine crashed. I will have someone reboot it and try to change the settings remotely. But any update on whether this worked for you would be much appreciated 👍

     

    Hey, the ipvlan switch fixed the main crashes (I was getting one every 1-2 days). There's still some nagging problem causing a random crash every month or so, but I think that's somehow related to my CPU.

  4. On 7/18/2023 at 12:25 AM, JorgeB said:

    If there are any more issues, grab and post diags before rebooting.

    I'm still rebuilding parity, but I noticed some kernel errors in the system log last night:

     

    Jul 19 02:38:07 tobor-server kernel: PMS LoudnessCmd[31931]: segfault at 0 ip 000014da6a0d7060 sp 000014da658460d8 error 4 in libswresample.so.4[14da6a0cf000+18000] likely on CPU 47 (core 13, socket 1)
    Jul 19 02:38:07 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32119]: segfault at 0 ip 0000150d92c2f060 sp 0000150d8e5b80d8 error 4 in libswresample.so.4[150d92c27000+18000] likely on CPU 23 (core 13, socket 1)
    Jul 19 02:38:08 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32151]: segfault at 0 ip 00001498864b8900 sp 0000149881cd00d8 error 4 in libswresample.so.4[1498864b0000+18000] likely on CPU 16 (core 4, socket 1)
    Jul 19 02:38:08 tobor-server kernel: Code: cc cc cc cc cc cc cc cc cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 7c 66 2e 0f 1f 84 00 00 00 00 00 <f3> 0f 10 06 f3 0f 5a c0 f2 0f 11 07 f3 0f 10 04 06 48 01 c6 f3 0f
    Jul 19 02:38:40 tobor-server kernel: PMS LoudnessCmd[32179]: segfault at 0 ip 000014ae7be78060 sp 000014ae779440d8 error 4 in libswresample.so.4[14ae7be70000+18000] likely on CPU 11 (core 13, socket 0)
    Jul 19 02:38:40 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:39:22 tobor-server kernel: PMS LoudnessCmd[34204]: segfault at 0 ip 000014b820278060 sp 000014b81bf970d8 error 4 in libswresample.so.4[14b820270000+18000] likely on CPU 47 (core 13, socket 1)
    Jul 19 02:39:22 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:39:23 tobor-server kernel: PMS LoudnessCmd[36896]: segfault at 0 ip 000014e50e890060 sp 000014e50a00b0d8 error 4 in libswresample.so.4[14e50e888000+18000] likely on CPU 42 (core 8, socket 1)
    Jul 19 02:39:23 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06

     

    Is this some clue to the original problem?
     

    tobor-server-diagnostics-20230719-1146.zip

  5. 3 hours ago, JorgeB said:

    Parity is never formatted; what is the current issue? Logs look normal and the cache is mounting.

    It's true that everything appears to be working now, but with my cache drives and parity failing within two days of each other, I feel like something bigger is the problem; I'm only addressing the symptoms and haven't found the cause. I'm hoping the diagnostics and logs can help identify it. Attached is the system log, though it's missing the period where my parity failed.

    syslog-10.0.0.200.log

    Hey, I've been having issues since 6.12.0 (now on 6.12.3). The system was regularly crashing, which I posted about here. While attempting to apply a recommended fix, it became clear that the docker image was corrupted, leading me to realise the problem was bigger: the cache drive partition was corrupted (it was only operating in read-only mode). I cleared and reformatted the cache drives and then began transferring my data back, when I noticed the parity drive was no longer readable.

    I couldn't generate a SMART report or perform any checks on the parity drive, so I shut down, checked cables and connections, and rebooted. The parity drive was no longer visible in the UI, so as an experiment I switched the parity drive's bay with another drive. Now the parity drive is back and can generate SMART reports, but it needs to be formatted and the parity rebuilt.

     

    I'm now rebuilding the parity, but I get the feeling I might be missing some bigger issue. I've attached diagnostics and a SMART report for the parity drive; is there anything here I should be worried about?

     

    As a small side note, I noticed that FCP is reporting "Write Cache is disabled" on the parity drive and disks 1-22; however, I have 23 disks in my array (plus the parity). It seems odd that one disk would not be reporting the same "Write Cache is disabled"... (a quick way to compare the disks directly is sketched at the end of this post).

    tobor-server-diagnostics-20230717-1503.zip WD161KFGX-68AFSM_2VGD275B_3500605ba011718e8-20230717-1500.txt
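
    On the write-cache question, this is roughly how I'd compare the disks; a minimal sketch, assuming SATA devices (SAS drives would need sdparm instead of hdparm):

    # Query the write-cache state of each disk (sketch; device names are placeholders -- check lsblk)
    for d in /dev/sd?; do
        echo "== $d =="
        hdparm -W "$d"       # prints "write-caching = 0 (off)" or "= 1 (on)"
    done
    # Enabling it on a single disk would be: hdparm -W1 /dev/sdX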

  7. 59 minutes ago, JorgeB said:
    Jul 14 06:51:44 tobor-server kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
    Jul 14 06:51:44 tobor-server kernel: ? _raw_spin_unlock+0x14/0x29
    Jul 14 06:51:44 tobor-server kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

     

    Macvlan call traces are usually the result of having dockers with a custom IP address and will end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).

    Interesting, I've never considered that. I have plenty of containers with custom IPs. I'll switch it now and report back. Thanks!
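
    For anyone else hitting this, a quick way to confirm the crashes line up with macvlan before making the switch; a minimal sketch, assuming the default Unraid syslog location:

    # Look for macvlan call traces in the current syslog (sketch; /var/log/syslog is the default path)
    grep -i "macvlan" /var/log/syslog | tail -n 20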

    My Unraid box has been stable for years; however, with the latest 6.12.x updates something is causing random crashes. The whole system becomes unresponsive, with no output to the monitor and no terminal access. It's occurred 4 times now, on both 6.12.0 and 6.12.1. Attached is my most recent diagnostics, generated after my last crash (sometime in the last hour or two).

     

    Any advice would be fantastic, thanks

    tobor-server-diagnostics-20230629-1742.zip

    I have an hourly rsync script running in the User Scripts plugin; it used to create files/directories with permissions of 655 and 755, owned by nobody/users. Now, with the latest Unraid OS update to 6.10.2, it's creating files/directories as 600 and 755, owned by root/root. Any ideas how to fix this and what has changed? My containers running as nobody/users can't access the files.
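
    A workaround sketch that should sidestep whatever changed: force the ownership and modes in the rsync call itself (placeholder paths; --chown needs rsync 3.1+ and the script running as root, and the modes are just examples):

    # Workaround sketch: set ownership/modes explicitly so the result doesn't depend on the environment the script runs in.
    # Paths are placeholders; adjust the modes to whatever the containers expect.
    rsync -a \
      --chown=nobody:users \
      --chmod=D755,F644 \
      /mnt/user/source/ /mnt/user/destination/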

  10. 39 minutes ago, JorgeB said:

    Looks like a controller problem:

     

    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
    Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
    Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

     

    If possible, use one of the recommended controllers, like an LSI HBA; you also have filesystem corruption on multiple disks.

     

    I'm using an Adaptec RAID 71605, which has served me well for years, although I did a forced reboot and the controller was giving me a high-pitched alert beep indicating it had overheated. I'll shut down and let it cool off a bit, then try again. The missing drives were back, but the parity drive is still listed as being in an error state.

     

    Can you suggest how best to deal with the filesystem corruption? I assume this is somehow related to the RAID controller messing up.

     

    EDIT: Regarding the fs corruption, I'm following the instructions here: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
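
    For reference, the check from those instructions boils down to something like this; a minimal sketch, assuming XFS disks and the array started in maintenance mode (the /dev/mdX number matches the disk slot):

    # Dry-run filesystem check (sketch; repeat for each disk reporting corruption)
    xfs_repair -n /dev/md1   # -n = check only, report problems without writing anything
    # If fixable errors are reported, the same command without -n (or the webGUI check) performs the repair.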

    As mentioned, I first received an email about the parity drive being in an error state, followed by 5 other disks. Several Docker containers are reporting they have no read access to the appdata directory, which is on the cache drive.

     

    The array is currently stalled at "Unmounting disks" as I try to stop it.

    Any ideas?

     

    tobor-server-diagnostics-20211021-1329.zip

     

     ls -la /mnt/
    /bin/ls: cannot access '/mnt/disk18': Input/output error
    /bin/ls: cannot access '/mnt/disk16': Input/output error
    /bin/ls: cannot access '/mnt/disk10': Input/output error
    /bin/ls: cannot access '/mnt/disk7': Input/output error
    /bin/ls: cannot access '/mnt/disk6': Input/output error
    total 16
    drwxr-xr-x 26 root   root  520 Sep 15 18:57 ./
    drwxr-xr-x 21 root   root  440 Oct 21 12:12 ../
    drwxrwxrwx  1 nobody users  66 Oct 17 04:30 cache/
    drwxrwxrwx  7 nobody users 108 Oct 17 04:30 disk1/
    d?????????  ? ?      ?       ?            ? disk10/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk11/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk12/
    drwxrwxrwx  5 nobody users  69 Oct 10 04:30 disk13/
    drwxrwxrwx  5 nobody users  69 Oct 17 04:30 disk14/
    drwxrwxrwx  5 nobody users  53 Oct 17 04:30 disk15/
    d?????????  ? ?      ?       ?            ? disk16/
    drwxrwxrwx  4 nobody users  36 Oct 17 04:30 disk17/
    d?????????  ? ?      ?       ?            ? disk18/
    drwxrwxrwx  5 nobody users  67 Oct 17 04:30 disk19/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk2/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk3/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk4/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk5/
    d?????????  ? ?      ?       ?            ? disk6/
    d?????????  ? ?      ?       ?            ? disk7/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk8/
    drwxrwxrwx  7 nobody users 109 Oct 17 04:30 disk9/
    drwxrwxrwt  2 nobody users  40 Sep 15 18:57 disks/
    drwxrwxrwt  2 nobody users  40 Sep 15 18:57 remotes/
    drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user/
    drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user0/

     

  12. === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x80)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(  101) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   2) minutes.
    Extended self-test routine
    recommended polling time: 	 (1250) minutes.
    SCT capabilities: 	       (0x003d)	SCT Status supported.
    					SCT Error Recovery Control supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   ---    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0004   135   135   ---    Old_age   Offline      -       108
      3 Spin_Up_Time            0x0007   080   080   ---    Pre-fail  Always       -       397 (Average 397)
      4 Start_Stop_Count        0x0012   100   100   ---    Old_age   Always       -       49
      5 Reallocated_Sector_Ct   0x0033   100   100   ---    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000a   100   100   ---    Old_age   Always       -       0
      8 Seek_Time_Performance   0x0004   133   133   ---    Old_age   Offline      -       18
      9 Power_On_Hours          0x0012   100   100   ---    Old_age   Always       -       6700
     10 Spin_Retry_Count        0x0012   100   100   ---    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       49
     22 Helium_Level            0x0023   100   100   ---    Pre-fail  Always       -       100
    192 Power-Off_Retract_Count 0x0032   100   100   ---    Old_age   Always       -       2787
    193 Load_Cycle_Count        0x0012   100   100   ---    Old_age   Always       -       2787
    194 Temperature_Celsius     0x0002   100   100   ---    Old_age   Always       -       22 (Min/Max 18/50)
    196 Reallocated_Event_Count 0x0032   100   100   ---    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0022   100   100   ---    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   ---    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   100   100   ---    Old_age   Always       -       0
    
    Read SMART Log Directory failed: scsi error medium or hardware error (serious)
    
    Read SMART Error Log failed: scsi error medium or hardware error (serious)
    
    Read SMART Self-test Log failed: scsi error medium or hardware error (serious)
    
    Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

     

     

    I tried plugging the drive into my Adaptec RAID card, but the disk didn't even appear.

    I ran smartctl -a /dev/sdv and the output is pasted above.

     

    There are a lot of errors in the drive log too, for example:

     

    Jun 5 11:18:17 t-server kernel: blk_update_request: I/O error, dev sdv, sector 23437770624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    Jun 5 11:18:17 t-server kernel: Buffer I/O error on dev sdv, logical block 2929721328, async page read

     

    The recurring and most common error is:

     

    Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Send/Recv Log Emask 0x1
    Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Non-Data Log Emask 0x1

     

  13. 12 minutes ago, JorgeB said:

    Same or different controller? It's failing to initialize. Currently it's on the Intel SCU controller; these are known to not be as rock solid as the standard SATA controller. If not yet done, try swapping with a disk on the Intel SATA controller; if the same happens there, it could be a power or disk problem.

    Same controller, just a different port on the motherboard. I'll try switching power and controllers.

  14. 5 hours ago, Afool2cry said:

    I am having the same problem ... the GUI refuses connections and I see the above errors repeating in the log.

     

    Has anyone been able to find a fix yet?

     

    Traceback (most recent call last):
    File "/usr/bin/beet", line 33, in <module>
    sys.exit(load_entry_point('beets==1.4.9', 'console_scripts', 'beet')())
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1266, in main
    _raw_main(args)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1249, in _raw_main
    subcommands, plugins, lib = _setup(options, lib)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1144, in _setup
    lib = _open_library(config)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1201, in _open_library
    get_path_formats(),
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 619, in get_path_formats
    path_formats.append((query, template(view.as_str())))
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 571, in template
    return Template(fmt)
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 581, in __init__
    self.compiled = self.translate()
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 614, in translate
    func = compile_func(
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 155, in compile_func
    prog = compile(mod, '<generated>', 'exec')
    ValueError: Name node can't be used with 'None' constant

     

     

    I added the :nightly tag to the container and it's working again.

     

    https://github.com/linuxserver/docker-beets/issues/80
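
    For anyone not sure where that tag goes: it's just appended to the image name in the container's Repository field; a sketch, assuming the standard linuxserver image name from the template:

    # Switch the container to the nightly build by changing the image tag (sketch; image name assumed from the linuxserver template)
    docker pull linuxserver/beets:nightly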

    This may sound odd, but could the error be linked to GitHub being down? I noticed that while the error was spamming, I couldn't load my Plugins page; I looked into that further and found it can occur when GitHub is down. 12 hours later, GitHub is fine, the Plugins page can be accessed, and the errors have stopped. In addition, several people appear to get the error at the same time.
