Posts posted by enmesh-parisian-latest
-
Thanks, rebuilding now, everything looks fine so far
-
Hi, so I haven't gone through this process before; it seems a little scary.
The device is "disabled, contents emulated". Are you suggesting the disk may be fine and doesn't need replacing? If I start the array and view the disk contents, everything looks OK.
Try rebuilding?
-
One of my kids accidentally kicked my server in the motherboard. The mobo appears dead, so I upgraded the whole case, mobo, RAM & CPU, and migrated my RAID card connecting my drive array. Trouble is, one of my array disks is now disabled.
I pulled the disabled disk out and reinserted it, and swapped drive bays, but it's not coming back online. It seems like too much of a coincidence that the drive would fail at the same time the case was damaged. Could the RAID card be damaged? Are there any tests I can do?
-
5 hours ago, 19dubgs3109 said:
Any updates on this?
I think I've got a very similar issue. I am currently not at home and my Unraid machine crashed. I will have someone reboot it and try to change the settings remotely. But any update on whether this worked for you would be much appreciated 👍
Hey, the ipvlan switch fixed the main crashes (I was getting one every 1-2 days). There's still some nagging problem causing a random crash every month or so, but I think that's somehow related to my CPU.
-
On 7/18/2023 at 12:25 AM, JorgeB said:
If any more issues occur, grab and post diags before rebooting.
I'm still rebuilding parity, but I noticed some kernel errors in the system log last night:
Jul 19 02:38:07 tobor-server kernel: PMS LoudnessCmd[31931]: segfault at 0 ip 000014da6a0d7060 sp 000014da658460d8 error 4 in libswresample.so.4[14da6a0cf000+18000] likely on CPU 47 (core 13, socket 1)
Jul 19 02:38:07 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32119]: segfault at 0 ip 0000150d92c2f060 sp 0000150d8e5b80d8 error 4 in libswresample.so.4[150d92c27000+18000] likely on CPU 23 (core 13, socket 1)
Jul 19 02:38:08 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32151]: segfault at 0 ip 00001498864b8900 sp 0000149881cd00d8 error 4 in libswresample.so.4[1498864b0000+18000] likely on CPU 16 (core 4, socket 1)
Jul 19 02:38:08 tobor-server kernel: Code: cc cc cc cc cc cc cc cc cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 7c 66 2e 0f 1f 84 00 00 00 00 00 <f3> 0f 10 06 f3 0f 5a c0 f2 0f 11 07 f3 0f 10 04 06 48 01 c6 f3 0f
Jul 19 02:38:40 tobor-server kernel: PMS LoudnessCmd[32179]: segfault at 0 ip 000014ae7be78060 sp 000014ae779440d8 error 4 in libswresample.so.4[14ae7be70000+18000] likely on CPU 11 (core 13, socket 0)
Jul 19 02:38:40 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:39:22 tobor-server kernel: PMS LoudnessCmd[34204]: segfault at 0 ip 000014b820278060 sp 000014b81bf970d8 error 4 in libswresample.so.4[14b820270000+18000] likely on CPU 47 (core 13, socket 1)
Jul 19 02:39:22 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:39:23 tobor-server kernel: PMS LoudnessCmd[36896]: segfault at 0 ip 000014e50e890060 sp 000014e50a00b0d8 error 4 in libswresample.so.4[14e50e888000+18000] likely on CPU 42 (core 8, socket 1)
Jul 19 02:39:23 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Is this some clue to the original problem?
-
3 hours ago, JorgeB said:
Parity is never formatted, what is the current issue? Logs look normal and cache is mounting.
It's true everything appears to be working now, but with my cache drives and parity failing within two days of each other, I suspect something bigger is wrong. I'm only addressing the symptoms and haven't found the cause of the problems; I'm hoping the diagnostics and logs can help identify it. Attached is the system log; however, it's missing the period when my parity failed.
-
Hey, I've been having issues since 6.12.0 (now on 6.12.3). The system was regularly crashing, which I posted about here. While attempting to apply a recommended fix, it became clear that the docker image was corrupted, which led me to realise the problem was bigger: the cache drive partition was corrupted (it was only operating read-only). I cleared and reformatted the cache drives and then began transferring my data back, when I noticed the parity drive was no longer readable.
I couldn't generate a SMART report or perform any checks on the parity drive, so I shut down, checked cables and connections, and rebooted. The parity drive was no longer visible in the UI, so as an experiment I switched the parity drive bay with another drive. Now the parity drive is back and can generate SMART reports, but it needs to be formatted and parity rebuilt. I'm now rebuilding the parity, but I get the feeling I might be missing some bigger issue. I've attached diagnostics and a SMART report for the parity drive; is there anything here I should be worried about?
As a small side note, I noticed that FCP is reporting "Write Cache is disabled" on the parity drive and disks 1-22; however, I have 23 disks in my array (plus the parity), so it seems odd that one disk would not report the same "Write Cache is disabled" warning...
tobor-server-diagnostics-20230717-1503.zip WD161KFGX-68AFSM_2VGD275B_3500605ba011718e8-20230717-1500.txt
-
59 minutes ago, JorgeB said:
Jul 14 06:51:44 tobor-server kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jul 14 06:51:44 tobor-server kernel: ? _raw_spin_unlock+0x14/0x29
Jul 14 06:51:44 tobor-server kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces are usually the result of having dockers with a custom IP address and will end up crashing the server, switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).
Interesting, I've never considered that. I have plenty of containers with custom IPs. I'll switch it now and report back. Thanks
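(For anyone searching later: the GUI setting above is the actual fix. Purely as an illustrative sketch, the equivalent idea from the docker CLI looks roughly like this; the network name, subnet, gateway, and parent interface below are placeholders, not values from my setup.)

```shell
# Sketch only: recreate a custom-IP docker network with the ipvlan driver
# instead of macvlan. All names and addresses here are assumptions.
docker network rm br0 2>/dev/null        # remove the old macvlan network
docker network create -d ipvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 \
  br0
# Reattach a container with its static IP on the new ipvlan network:
docker run -d --network br0 --ip 192.168.1.50 --name web nginx
```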
-
Hey, so I had a crash yesterday, and one just before these logs were generated. Is there anything here which could identify the problem?
-
Thanks, I'll report back on the next crash
-
My Unraid box has been stable for years; however, with the latest 6.12.x updates something is causing random crashes. The whole system becomes unresponsive, with no output to the monitor and no terminal access. It's occurred 4 times now, on both 6.12.0 and 6.12.1. Attached are my most recent diagnostics, generated after my last crash (sometime in the last hour or two).
Any advice would be fantastic, thanks
-
Ok thanks, still trying to work this out though
-
I have an hourly rsync script running in the userscripts plugin. It used to create files/directories with permissions of 655 and 755, owned by nobody/users. Now, with the latest Unraid OS update to 6.10.2, it's creating files/directories with 600 and 755, owned by root/root. Any ideas how to fix this and what has changed? My containers running as nobody/users can't access the files.
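A workaround I'm considering (a sketch only, assuming the root cause is a stricter umask when the script runs as root; the paths, user/group, and modes below are placeholders, not my real layout):

```shell
#!/bin/bash
# Sketch: force the ownership/permissions this script used to produce.
# umask 022 makes new files 644 and new dirs 755 instead of 600/700.
umask 022
rsync -a \
  --chown=nobody:users \
  --chmod=D755,F644 \
  /mnt/user/source/ /mnt/user/backup/   # --chown needs rsync >= 3.1
```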
-
39 minutes ago, JorgeB said:
Looks like a controller problem:
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed
If possible use one of the recommended controllers, like an LSI HBA. You also have filesystem corruption on multiple disks.
I'm using an Adaptec RAID 71605, which has served me well for years, although after a forced reboot the controller was giving a high-pitched alert beep indicating it had overheated. I'll shut down and let it cool off a bit, then try again. The missing drives were back, but the parity drive is still listed as being in an error state.
Can you suggest how best to deal with the filesystem corruption? I assume it's somehow related to the RAID controller messing up.
EDIT: regarding fs corruption I'm following the instructions here: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
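(For anyone following along, the wiki procedure boils down to roughly these commands. This is a sketch: /dev/md1 is a placeholder for whichever array disk shows corruption, and the array must be started in maintenance mode first.)

```shell
# Check-only pass: report XFS problems without writing anything.
xfs_repair -n /dev/md1

# Actual repair, once you've reviewed the check output:
xfs_repair /dev/md1

# If xfs_repair insists on -L (zero the log), be aware that zeroing the
# log can lose the most recent metadata updates:
# xfs_repair -L /dev/md1
```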
-
As mentioned, I first received an email about the parity drive being in an error state, followed by 5 other disks. Several docker containers are reporting they have no read access to the appdata directory, which is on the cache drive.
The array is currently stalled at "Unmounting disks" as I try to stop it.
Any ideas?
tobor-server-diagnostics-20211021-1329.zip
ls -la /mnt/
/bin/ls: cannot access '/mnt/disk18': Input/output error
/bin/ls: cannot access '/mnt/disk16': Input/output error
/bin/ls: cannot access '/mnt/disk10': Input/output error
/bin/ls: cannot access '/mnt/disk7': Input/output error
/bin/ls: cannot access '/mnt/disk6': Input/output error
total 16
drwxr-xr-x 26 root   root  520 Sep 15 18:57 ./
drwxr-xr-x 21 root   root  440 Oct 21 12:12 ../
drwxrwxrwx  1 nobody users  66 Oct 17 04:30 cache/
drwxrwxrwx  7 nobody users 108 Oct 17 04:30 disk1/
d?????????  ? ?      ?       ?            ? disk10/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk11/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk12/
drwxrwxrwx  5 nobody users  69 Oct 10 04:30 disk13/
drwxrwxrwx  5 nobody users  69 Oct 17 04:30 disk14/
drwxrwxrwx  5 nobody users  53 Oct 17 04:30 disk15/
d?????????  ? ?      ?       ?            ? disk16/
drwxrwxrwx  4 nobody users  36 Oct 17 04:30 disk17/
d?????????  ? ?      ?       ?            ? disk18/
drwxrwxrwx  5 nobody users  67 Oct 17 04:30 disk19/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk2/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk3/
drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk4/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk5/
d?????????  ? ?      ?       ?            ? disk6/
d?????????  ? ?      ?       ?            ? disk7/
drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk8/
drwxrwxrwx  7 nobody users 109 Oct 17 04:30 disk9/
drwxrwxrwt  2 nobody users  40 Sep 15 18:57 disks/
drwxrwxrwt  2 nobody users  40 Sep 15 18:57 remotes/
drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user/
drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user0/
-
12 minutes ago, JorgeB said:
Even SMART report is failing, looks like a disk issue.
The disk is under warranty; I forked out for a replacement and will RMA it soon. Thanks for the input.
-
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 101) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: (1250) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 100   100   ---    Pre-fail Always  -           0
  2 Throughput_Performance  0x0004 135   135   ---    Old_age  Offline -           108
  3 Spin_Up_Time            0x0007 080   080   ---    Pre-fail Always  -           397 (Average 397)
  4 Start_Stop_Count        0x0012 100   100   ---    Old_age  Always  -           49
  5 Reallocated_Sector_Ct   0x0033 100   100   ---    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000a 100   100   ---    Old_age  Always  -           0
  8 Seek_Time_Performance   0x0004 133   133   ---    Old_age  Offline -           18
  9 Power_On_Hours          0x0012 100   100   ---    Old_age  Always  -           6700
 10 Spin_Retry_Count        0x0012 100   100   ---    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   ---    Old_age  Always  -           49
 22 Helium_Level            0x0023 100   100   ---    Pre-fail Always  -           100
192 Power-Off_Retract_Count 0x0032 100   100   ---    Old_age  Always  -           2787
193 Load_Cycle_Count        0x0012 100   100   ---    Old_age  Always  -           2787
194 Temperature_Celsius     0x0002 100   100   ---    Old_age  Always  -           22 (Min/Max 18/50)
196 Reallocated_Event_Count 0x0032 100   100   ---    Old_age  Always  -           0
197 Current_Pending_Sector  0x0022 100   100   ---    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0008 100   100   ---    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a 100   100   ---    Old_age  Always  -           0

Read SMART Log Directory failed: scsi error medium or hardware error (serious)
Read SMART Error Log failed: scsi error medium or hardware error (serious)
Read SMART Self-test Log failed: scsi error medium or hardware error (serious)
Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)
I tried plugging the drive into my Adaptec RAID card, but the disk didn't even appear.
I ran smartctl -a /dev/sdv and the output is pasted above. There are a lot of errors in the drive log too, for example:
Jun 5 11:18:17 t-server kernel: blk_update_request: I/O error, dev sdv, sector 23437770624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jun 5 11:18:17 t-server kernel: Buffer I/O error on dev sdv, logical block 2929721328, async page read
The recurring and most common error is:
Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Send/Recv Log Emask 0x1
Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Non-Data Log Emask 0x1
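(A sketch of further checks one could run here; /dev/sdv is this drive, but the self-test and queue-depth steps are my assumptions for triage, not established fixes from this thread.)

```shell
smartctl -a /dev/sdv           # full SMART report, as pasted above
smartctl -t short /dev/sdv     # queue a ~2 minute short self-test
# After it finishes, read the self-test log (if the drive can serve it):
smartctl -l selftest /dev/sdv

# Assumption/experiment only: the NCQ log errors can sometimes be
# sidestepped by capping the queue depth, to see if behaviour changes.
echo 1 > /sys/block/sdv/device/queue_depth
```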
-
12 minutes ago, JorgeB said:
Same or different controller? It's failing to initialize. Currently it's on the Intel SCU controller; these are known to not be as rock solid as the standard SATA controller. If not yet done, try swapping with a disk on the Intel SATA controller; if the same happens there, it could be a power or disk problem.
Same controller, just a different port on motherboard. I'll try switching power and controllers.
-
The disk appears in Unassigned Devices, but I can't seem to perform any operation on it. I swapped SATA cables and moved to another mobo port, but still no dice. The disk is only about 18 months old; any ideas?
tobor-server-diagnostics-20210604-2105.zip tobor-server-smart-20210604-2103.zip
-
5 hours ago, Afool2cry said:
I am having the same problem ... the GUI refuses connections and I see the above errors repeating in the log.
Has anyone been able to find a fix yet?
Traceback (most recent call last):
File "/usr/bin/beet", line 33, in <module>
sys.exit(load_entry_point('beets==1.4.9', 'console_scripts', 'beet')())
File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1266, in main
_raw_main(args)
File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1249, in _raw_main
subcommands, plugins, lib = _setup(options, lib)
File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1144, in _setup
lib = _open_library(config)
File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1201, in _open_library
get_path_formats(),
File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 619, in get_path_formats
path_formats.append((query, template(view.as_str())))
File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 571, in template
return Template(fmt)
File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 581, in __init__
self.compiled = self.translate()
File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 614, in translate
func = compile_func(
File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 155, in compile_func
prog = compile(mod, '<generated>', 'exec')
ValueError: Name node can't be used with 'None' constant
I added the :nightly tag to the container and it's working again
-
-
This may sound odd, but could the error be linked to GitHub being down? I noticed that while the error was spamming, I couldn't load my plugins page; looking into it further, I found that can happen when GitHub is down. 12 hours later, GitHub is fine, the plugins page can be accessed, and the errors have stopped. In addition, several people appear to get the error at the same time.
-
-
Same problem here on 6.8.3
-
Having this problem too. I get an error every 1-2 seconds, all day, every day.
-
Frequent crashes since 6.12.0 appear to have corrupted cache & parity
in General Support
Posted
Did you switch the docker network type to ipvlan?