enmesh-parisian-latest

Posts posted by enmesh-parisian-latest

  1. 7 hours ago, Frette said:

    I am having a similar issue where being on 6.12.6 (and previous 6.12 versions) is causing my Unraid server to crash. I've gone through and cleaned up what I thought might be the issue, like NVIDIA drivers etc. I've even removed hardware thinking that might be the issue, but the server still keeps locking up and I can't even log in to the physical device.

     

    My next option is to downgrade to a more stable version; everything worked fine with 6.11.6.

     

     

    syslog-192.168.1.34.log 258.32 kB · 2 downloads

    nrem-diagnostics-20231214-1049.zip 194.23 kB · 1 download

    Did you switch the docker network type to ipvlan? 

     

    One of my kids accidentally kicked my server right in the motherboard. The mobo appears dead, so I upgraded the whole case, mobo, RAM & CPU, and migrated my RAID card that connects my drive array. Trouble is, one of my array disks is now disabled.

     

    I pulled the disabled disk out and inserted it again, and swapped drive bays, but it's not coming back online. It seems like too much of a coincidence that the drive would fail at the same time the case was damaged. Could the RAID card be damaged? Are there any tests I can do (e.g. something like the SMART check sketched at the end of this post)?

    tobor-server-diagnostics-20231205-2347.zip
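
    In case it helps, this is the kind of check I had in mind; a minimal sketch, assuming the disk still shows up as a /dev/sdX device (behind the Adaptec card smartctl may need an explicit device type such as -d sat):

    # Minimal disk check sketch (assumes the disk is visible as /dev/sdX -- confirm with lsblk first)
    smartctl -H /dev/sdX          # overall SMART health verdict
    smartctl -t short /dev/sdX    # start a short self-test (takes a couple of minutes)
    smartctl -l selftest /dev/sdX # read the self-test log once it has finished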

  3. 5 hours ago, 19dubgs3109 said:

    Any updates on this?

    I think I've got a very similar issue. I am currently not at home and my Unraid machine crashed. I will have someone reboot it and try to change the settings remotely. But any update on whether this worked for you would be much appreciated 👍

     

    Hey, the ipvlan switch fixed the main crashes (I was getting one every 1-2 days). There's still some nagging problem causing a random crash every month or so, but I think that's somehow related to my CPU.

  4. On 7/18/2023 at 12:25 AM, JorgeB said:

    If there are any more issues, grab and post diags before rebooting.

    I'm still rebuilding parity, but I noticed some kernel errors in the system log last night:

     

    Jul 19 02:38:07 tobor-server kernel: PMS LoudnessCmd[31931]: segfault at 0 ip 000014da6a0d7060 sp 000014da658460d8 error 4 in libswresample.so.4[14da6a0cf000+18000] likely on CPU 47 (core 13, socket 1)
    Jul 19 02:38:07 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32119]: segfault at 0 ip 0000150d92c2f060 sp 0000150d8e5b80d8 error 4 in libswresample.so.4[150d92c27000+18000] likely on CPU 23 (core 13, socket 1)
    Jul 19 02:38:08 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32151]: segfault at 0 ip 00001498864b8900 sp 0000149881cd00d8 error 4 in libswresample.so.4[1498864b0000+18000] likely on CPU 16 (core 4, socket 1)
    Jul 19 02:38:08 tobor-server kernel: Code: cc cc cc cc cc cc cc cc cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 7c 66 2e 0f 1f 84 00 00 00 00 00 <f3> 0f 10 06 f3 0f 5a c0 f2 0f 11 07 f3 0f 10 04 06 48 01 c6 f3 0f
    Jul 19 02:38:40 tobor-server kernel: PMS LoudnessCmd[32179]: segfault at 0 ip 000014ae7be78060 sp 000014ae779440d8 error 4 in libswresample.so.4[14ae7be70000+18000] likely on CPU 11 (core 13, socket 0)
    Jul 19 02:38:40 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:39:22 tobor-server kernel: PMS LoudnessCmd[34204]: segfault at 0 ip 000014b820278060 sp 000014b81bf970d8 error 4 in libswresample.so.4[14b820270000+18000] likely on CPU 47 (core 13, socket 1)
    Jul 19 02:39:22 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jul 19 02:39:23 tobor-server kernel: PMS LoudnessCmd[36896]: segfault at 0 ip 000014e50e890060 sp 000014e50a00b0d8 error 4 in libswresample.so.4[14e50e888000+18000] likely on CPU 42 (core 8, socket 1)
    Jul 19 02:39:23 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06

     

    Is this some clue to the original problem?
     

    tobor-server-diagnostics-20230719-1146.zip

  5. 3 hours ago, JorgeB said:

    Parity is never formatted; what is the current issue? Logs look normal and the cache is mounting.

    It's true that everything appears to be working now, but with my cache drives and parity failing within two days of each other, I feel like something bigger is the problem; I'm only addressing the symptoms and haven't found the cause. I'm hoping the diagnostics and logs can help identify it. Attached is the system log, though it's missing the period where my parity failed.

    syslog-10.0.0.200.log

    Hey, I've been having issues since 6.12.0 (now on 6.12.3). The system was regularly crashing, which I posted about here. While attempting to apply a recommended fix, it became clear that the docker image was corrupted, leading me to realise the problem was bigger: the cache drive partition was corrupted (it was only operating in read-only mode). I cleared and reformatted the cache drives and then began transferring my data back, when I noticed the parity drive was no longer readable.

    I couldn't generate a SMART report or perform any checks on the parity drive, so I shut down, checked cables and connections, and rebooted. The parity drive was no longer visible in the UI, so as an experiment I switched the parity drive's bay with another drive. Now the parity drive is back and can generate SMART reports, but it needs to be formatted and the parity rebuilt.

     

    I'm now rebuilding the parity, but I get the feeling I might be missing some bigger issue. I've attached diagnostics and a SMART report for the parity drive; is there anything here I should be worried about?

     

    As a small side note, I noticed that FCP is reporting "Write Cache is disabled" on the parity drive and disks 1-22; however, I have 23 disks in my array (plus the parity). It seems odd that one disk would not be reporting the same "Write Cache is disabled"... (a quick way to compare the disks directly is sketched at the end of this post).

    tobor-server-diagnostics-20230717-1503.zip WD161KFGX-68AFSM_2VGD275B_3500605ba011718e8-20230717-1500.txt
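
    On the write-cache question, this is roughly how I'd compare the disks; a minimal sketch, assuming SATA devices (SAS drives would need sdparm instead of hdparm):

    # Query the write-cache state of each disk (sketch; device names are placeholders -- check lsblk)
    for d in /dev/sd?; do
        echo "== $d =="
        hdparm -W "$d"       # prints "write-caching = 0 (off)" or "= 1 (on)"
    done
    # Enabling it on a single disk would be: hdparm -W1 /dev/sdX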

  7. 59 minutes ago, JorgeB said:
    Jul 14 06:51:44 tobor-server kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
    Jul 14 06:51:44 tobor-server kernel: ? _raw_spin_unlock+0x14/0x29
    Jul 14 06:51:44 tobor-server kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

     

    Macvlan call traces are usually the result of having dockers with a custom IP address and will end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).

    Interesting, I've never considered that. I have plenty of containers with custom IPs. I'll switch it now and report back. Thanks!
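
    For anyone else hitting this, a quick way to confirm the crashes line up with macvlan before making the switch; a minimal sketch, assuming the default Unraid syslog location:

    # Look for macvlan call traces in the current syslog (sketch; /var/log/syslog is the default path)
    grep -i "macvlan" /var/log/syslog | tail -n 20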

    My Unraid box has been stable for years; however, with the latest 6.12.x updates something is causing random crashes. The whole system becomes unresponsive, with no output to the monitor and no terminal access. It's occurred 4 times now, on both 6.12.0 and 6.12.1. Attached is my most recent diagnostics, generated after my last crash (sometime in the last hour or two).

     

    Any advice would be fantastic, thanks

    tobor-server-diagnostics-20230629-1742.zip

    I have an hourly rsync script running in the User Scripts plugin; it used to create files/directories with permissions of 655 and 755, owned by nobody/users. Now, with the latest Unraid OS update to 6.10.2, it's creating files/directories as 600 and 755, owned by root/root. Any ideas how to fix this and what has changed? My containers running as nobody/users can't access the files.
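
    A workaround sketch that should sidestep whatever changed: force the ownership and modes in the rsync call itself (placeholder paths; --chown needs rsync 3.1+ and the script running as root, and the modes are just examples):

    # Workaround sketch: set ownership/modes explicitly so the result doesn't depend on the environment the script runs in.
    # Paths are placeholders; adjust the modes to whatever the containers expect.
    rsync -a \
      --chown=nobody:users \
      --chmod=D755,F644 \
      /mnt/user/source/ /mnt/user/destination/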

  10. 39 minutes ago, JorgeB said:

    Looks like a controller problem:

     

    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
    Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
    Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
    Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

     

    If possible, use one of the recommended controllers, like an LSI HBA; you also have filesystem corruption on multiple disks.

     

    I'm using an Adaptec RAID 71605, which has served me well for years, although I did a forced reboot and the controller was giving me a high-pitched alert beep indicating it had overheated. I'll shut down and let it cool off a bit, then try again. The missing drives were back, but the parity drive is still listed as being in an error state.

     

    Can you suggest how best to deal with the filesystem corruption? I assume this is somehow related to the RAID controller messing up.

     

    EDIT: Regarding the fs corruption, I'm following the instructions here: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
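
    For reference, the check from those instructions boils down to something like this; a minimal sketch, assuming XFS disks and the array started in maintenance mode (the /dev/mdX number matches the disk slot):

    # Dry-run filesystem check (sketch; repeat for each disk reporting corruption)
    xfs_repair -n /dev/md1   # -n = check only, report problems without writing anything
    # If fixable errors are reported, the same command without -n (or the webGUI check) performs the repair.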

    As mentioned, I first received an email about the parity drive being in an error state, followed by 5 other disks. Several Docker containers are reporting they have no read access to the appdata directory, which is on the cache drive.

     

    The array is currently stalled at "Unmounting disks" as I try to stop it.

    Any ideas?

     

    tobor-server-diagnostics-20211021-1329.zip

     

     ls -la /mnt/
    /bin/ls: cannot access '/mnt/disk18': Input/output error
    /bin/ls: cannot access '/mnt/disk16': Input/output error
    /bin/ls: cannot access '/mnt/disk10': Input/output error
    /bin/ls: cannot access '/mnt/disk7': Input/output error
    /bin/ls: cannot access '/mnt/disk6': Input/output error
    total 16
    drwxr-xr-x 26 root   root  520 Sep 15 18:57 ./
    drwxr-xr-x 21 root   root  440 Oct 21 12:12 ../
    drwxrwxrwx  1 nobody users  66 Oct 17 04:30 cache/
    drwxrwxrwx  7 nobody users 108 Oct 17 04:30 disk1/
    d?????????  ? ?      ?       ?            ? disk10/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk11/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk12/
    drwxrwxrwx  5 nobody users  69 Oct 10 04:30 disk13/
    drwxrwxrwx  5 nobody users  69 Oct 17 04:30 disk14/
    drwxrwxrwx  5 nobody users  53 Oct 17 04:30 disk15/
    d?????????  ? ?      ?       ?            ? disk16/
    drwxrwxrwx  4 nobody users  36 Oct 17 04:30 disk17/
    d?????????  ? ?      ?       ?            ? disk18/
    drwxrwxrwx  5 nobody users  67 Oct 17 04:30 disk19/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk2/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk3/
    drwxrwxrwx  6 nobody users  88 Oct 17 04:30 disk4/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk5/
    d?????????  ? ?      ?       ?            ? disk6/
    d?????????  ? ?      ?       ?            ? disk7/
    drwxrwxrwx  5 nobody users  51 Oct 17 04:30 disk8/
    drwxrwxrwx  7 nobody users 109 Oct 17 04:30 disk9/
    drwxrwxrwt  2 nobody users  40 Sep 15 18:57 disks/
    drwxrwxrwt  2 nobody users  40 Sep 15 18:57 remotes/
    drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user/
    drwxrwxrwx  1 nobody users 108 Oct 17 04:30 user0/

     

  12. === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x80)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(  101) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   2) minutes.
    Extended self-test routine
    recommended polling time: 	 (1250) minutes.
    SCT capabilities: 	       (0x003d)	SCT Status supported.
    					SCT Error Recovery Control supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   ---    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0004   135   135   ---    Old_age   Offline      -       108
      3 Spin_Up_Time            0x0007   080   080   ---    Pre-fail  Always       -       397 (Average 397)
      4 Start_Stop_Count        0x0012   100   100   ---    Old_age   Always       -       49
      5 Reallocated_Sector_Ct   0x0033   100   100   ---    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000a   100   100   ---    Old_age   Always       -       0
      8 Seek_Time_Performance   0x0004   133   133   ---    Old_age   Offline      -       18
      9 Power_On_Hours          0x0012   100   100   ---    Old_age   Always       -       6700
     10 Spin_Retry_Count        0x0012   100   100   ---    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       49
     22 Helium_Level            0x0023   100   100   ---    Pre-fail  Always       -       100
    192 Power-Off_Retract_Count 0x0032   100   100   ---    Old_age   Always       -       2787
    193 Load_Cycle_Count        0x0012   100   100   ---    Old_age   Always       -       2787
    194 Temperature_Celsius     0x0002   100   100   ---    Old_age   Always       -       22 (Min/Max 18/50)
    196 Reallocated_Event_Count 0x0032   100   100   ---    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0022   100   100   ---    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   ---    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   100   100   ---    Old_age   Always       -       0
    
    Read SMART Log Directory failed: scsi error medium or hardware error (serious)
    
    Read SMART Error Log failed: scsi error medium or hardware error (serious)
    
    Read SMART Self-test Log failed: scsi error medium or hardware error (serious)
    
    Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

     

     

    I tried plugging the drive into my Adaptec RAID card, but the disk didn't even appear.

    I ran smartctl -a /dev/sdv and the output is pasted above.

     

    There are a lot of errors in the drive log too, for example:

     

    Jun 5 11:18:17 t-server kernel: blk_update_request: I/O error, dev sdv, sector 23437770624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    Jun 5 11:18:17 t-server kernel: Buffer I/O error on dev sdv, logical block 2929721328, async page read

     

    The recurring and most common error is:

     

    Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Send/Recv Log Emask 0x1
    Jun 5 13:56:36 t-server kernel: ata5.00: failed to get NCQ Non-Data Log Emask 0x1

     

  13. 12 minutes ago, JorgeB said:

    Same or different controller? It's failing to initialize. Currently it's on the Intel SCU controller; these are known to not be as rock solid as the standard SATA controller. If not yet done, try swapping with a disk on the Intel SATA controller; if the same happens there, it could be a power or disk problem.

    Same controller, just a different port on the motherboard. I'll try switching power and controllers.

  14. 5 hours ago, Afool2cry said:

    I am having the same problem ... the GUI refuses connections and I see the above errors repeating in the log.

     

    Has anyone been able to find a fix yet?

     

    Traceback (most recent call last):
    File "/usr/bin/beet", line 33, in <module>
    sys.exit(load_entry_point('beets==1.4.9', 'console_scripts', 'beet')())
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1266, in main
    _raw_main(args)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1249, in _raw_main
    subcommands, plugins, lib = _setup(options, lib)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1144, in _setup
    lib = _open_library(config)
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 1201, in _open_library
    get_path_formats(),
    File "/usr/lib/python3.8/site-packages/beets/ui/__init__.py", line 619, in get_path_formats
    path_formats.append((query, template(view.as_str())))
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 571, in template
    return Template(fmt)
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 581, in __init__
    self.compiled = self.translate()
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 614, in translate
    func = compile_func(
    File "/usr/lib/python3.8/site-packages/beets/util/functemplate.py", line 155, in compile_func
    prog = compile(mod, '<generated>', 'exec')
    ValueError: Name node can't be used with 'None' constant

     

     

    I added the :nightly tag to the container and it's working again.

     

    https://github.com/linuxserver/docker-beets/issues/80
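
    For anyone not sure where that tag goes: it's just appended to the image name in the container's Repository field; a sketch, assuming the standard linuxserver image name from the template:

    # Switch the container to the nightly build by changing the image tag (sketch; image name assumed from the linuxserver template)
    docker pull linuxserver/beets:nightly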

    This may sound odd, but could the error be linked to GitHub being down? I noticed that while the error was spamming, I couldn't load my Plugins page; I looked into that further and found it can occur when GitHub is down. 12 hours later, GitHub is fine, the Plugins page can be accessed, and the errors have stopped. In addition, several people appear to get the error at the same time.
