RIDGID

April 10, 2021

Good call.

Correctable Memory ECC @ DIMMC1(CPU1) - Asserted

Repeated ad nauseam in the event log. I believe I've isolated the bad stick and removed it, though it was in H1, not C1, so I will monitor for additional issues.

Marking this solved as I now know to look at the syslog and IPMI events. Thanks for the assistance.

April 10, 2021

Here is my syslog from the most recent crash. Looking at this bit

Apr  9 06:52:48 Supermicro kernel: mce: [Hardware Error]: Machine check events logged
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: TSC 6d1be24ad4e8c 
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: ADDR c4fce24c0 
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: MISC 40381286 
Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1617965568 SOCKET 0 APIC 0
Apr  9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 offset:0x4c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)

I'm guessing bad memory stick?
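
One way to check whether the corrected errors keep landing on the same DIMM is to tally the EDAC summary lines by memory controller, channel and slot. The following is only a minimal sketch: the regular expression is written against the single line format quoted above, and the function name is made up for illustration. Mapping a channel/slot pair to a silkscreen label such as C1 or H1 depends on the motherboard and is not attempted here.

```python
import re
from collections import Counter

# Matches the per-error summary line printed by the EDAC core, as quoted above.
# Assumption: other kernels or drivers may word this line differently.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*?"
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+)"
)

def count_corrected_errors(lines):
    """Return a Counter keyed by (mc, channel, slot) holding CE totals."""
    totals = Counter()
    for line in lines:
        m = EDAC_CE.search(line)
        if m:
            key = (int(m["mc"]), int(m["channel"]), int(m["slot"]))
            totals[key] += int(m["count"])
    return totals

sample = [
    "Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR",
    "Apr  9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on "
    "CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 offset:0x4c0 "
    "grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:0 ha:0 "
    "channel_mask:8 rank:1)",
]

for (mc, channel, slot), n in count_corrected_errors(sample).items():
    print(f"MC{mc} channel {channel} slot {slot}: {n} corrected error(s)")
```

If one (mc, channel, slot) key dominates the totals across a whole syslog, that points at a single stick or slot rather than a general problem.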

syslog

April 6, 2021

Thanks, that is great info I never would have picked up on myself!

Looks like my libvirt.img file got moved to disk 11 somehow; I've moved it back onto the cache/system folder. No clue how it could have happened, but I'm glad to fix it and will keep an eye on it in the future.

As for the docker.img, a few years back I had an issue with it getting filled (I believe by radarr or rutorrent logs or something) and I probably increased the size hoping to fix the issue, as I had a 2TB cache at the time. It's been that way so long I forgot 50G wasn't the default. Any advantage to making it smaller other than saving 30G on the cache?

April 6, 2021

I do not have the syslog unfortunately, but I will going forward. Attached are diagnostics with the array running (none of my dockers or VMs are running during the parity check, though). It looks like the syslog is where the useful info will be, so I will probably have to reproduce the issue.

supermicro-diagnostics-20210406-0921.zip

April 6, 2021

Recently I have had Unraid go fully unresponsive on me a couple of times. WebUI gone, no SSH, can't ping, not visible to my router, no video output, but still powered on and active. Trying to restart from IPMI gives the error below.

1664122049_Screenshot2021-04-05181406.png.1e900ad09b591d85ca185e001cf3b0f7.png

First unclean shutdown was 31Mar, 12TB parity check finished 02Apr @ 5am. Crashed again 04Apr, parity check started at 2pm. Server unresponsive again this afternoon 05Apr 6pm. When I rebooted I got a notification that the parity check finished with 0 errors (Average speed: nan B/s); I assume it failed. I've now rebooted and the parity check is currently running.

I've attached my diagnostics, maybe someone smarter than me could lend some insight as to what may be causing these crashes? Nothing in the logs jumped out at me, but I am not quite sure what to look for.

Only two notable changes I've made recently to the otherwise stable server are:

1. Changing the server name via Settings>Identification

2. Upgrading to 6.9.1

supermicro-diagnostics-20210405-1821.zip

July 11, 2020

Could someone help me understand what this process means and why it is taking up so much of my CPU (20-90% at times, 1-2% with more or less everything idle)?

"/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 -o noatime,allow_other -o remember=0"

I'm having issues with high CPU usage associated with sonarr/radarr and figuring out this process seems to be a starting point. Any insight appreciated.

definer5-diagnostics-20200711-1018.zip

May 27, 2020

Came here looking for help getting to the OpenVPN-AS webui but I will probably just give WireGuard a shot instead.

May 23, 2020

A couple of days ago Plex and some other dockers I had running became sluggish and unresponsive. I went to stop and restart the docker service but was unable to get it to stop. I didn't think much about it and went to stop the array, which hung trying to unmount the cache drive. The diagnostics show the following 5 lines repeating.

May 20 22:51:06 DefineR5 emhttpd: Retry unmounting disk share(s)...
May 20 22:51:11 DefineR5 emhttpd: Unmounting disks...
May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): umount /mnt/cache
May 20 22:51:11 DefineR5 root: umount: /mnt/cache: target is busy.
May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): exit status: 32

At this point I opted to reboot the system. I've attached the diagnostics from that shutdown.
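
"target is busy" from umount means something still has a file open, or its working directory, on that filesystem. A rough way to see which processes those are is to walk the symlinks under /proc, as in the sketch below. The helper name and the `proc_root` parameter are illustrative assumptions. It needs root to read other users' file descriptors, and kernel-level users such as loop-mounted image files will not show up this way.

```python
import os

def holders(mount_point, proc_root="/proc"):
    """Yield (pid, path) for open files or working directories under mount_point."""
    mp = mount_point.rstrip("/")
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        base = os.path.join(proc_root, pid)
        links = [os.path.join(base, "cwd")]
        try:
            fd_dir = os.path.join(base, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or its fd directory is not readable
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == mp or target.startswith(mp + "/"):
                yield int(pid), target

# Example use on the server itself (run as root):
#   for pid, path in holders("/mnt/cache"):
#       print(pid, path)
```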

Upon rebooting, my cache drive--a 1TB SanDisk SSD--was missing. Taking a trip to the BIOS, I see that the SSD is indeed missing, sort of...

https://imgur.com/DdHYURN.jpg

You can see that it does not show up in the storage config (it is the only device plugged into the mobo sata) but interestingly does show up as a boot device.

Since it is still sort of recognized I am not sure it is dead-dead. It seems mostly dead, but mostly dead is slightly alive.

I pulled the drive and put it into a USB enclosure and hooked it back into my server. From the system log on plugging it in:

May 23 11:04:38 DefineR5 kernel: usb 2-6: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
May 23 11:04:38 DefineR5 kernel: usb-storage 2-6:1.0: USB Mass Storage device detected
May 23 11:04:38 DefineR5 kernel: scsi host11: usb-storage 2-6:1.0
May 23 11:04:39 DefineR5 kernel: scsi 11:0:0:0: Direct-Access TO Exter nal USB 3.0 6101 PQ: 0 ANSI: 6
May 23 11:04:39 DefineR5 kernel: sd 11:0:0:0: Attached scsi generic sg18 type 0
May 23 11:04:49 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
May 23 11:05:16 DefineR5 kernel: ....ready
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write Protect is off
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Mode Sense: 47 00 00 08
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:08:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:08:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:11:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:11:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:11:17 DefineR5 kernel: ldm_validate_partition_table(): Disk read failed.
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:14:18 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:14:18 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:14:18 DefineR5 kernel: sds: unable to read partition table
May 23 11:14:18 DefineR5 rc.diskinfo[10409]: SIGHUP received, forcing refresh of disks info.
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
May 23 11:17:18 DefineR5 kernel: ..
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 6f c8 1a 00 00 00 08 00
May 23 11:17:21 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 1875384832
May 23 11:17:22 DefineR5 kernel: .not responding...
May 23 11:17:25 DefineR5 kernel: sd 11:0:0:0: [sds] Attached SCSI disk
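
For reading the repeated error lines above: opcode 0x28 is a SCSI READ(10), sense key 0x2 is NOT READY, and ASC 0x4 / ASCQ 0x1 is "logical unit is in process of becoming ready", so the drive accepts the read but never finishes spinning up as far as the enclosure is concerned. The sketch below decodes the CDB bytes from the log; the function name is made up for illustration and the lookup table covers only the one sense combination seen here.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks

# Only the (sense key, ASC, ASCQ) combination that appears in the log above.
SENSE = {
    (0x2, 0x04, 0x01): "NOT READY: logical unit is in process of becoming ready",
}

print(decode_read10("28 00 00 00 00 00 00 00 08 00"))  # (0, 8)
print(decode_read10("28 00 6f c8 1a 00 00 00 08 00"))  # (1875384832, 8)
print(SENSE[(0x2, 0x04, 0x01)])
```

The second decoded address matches the "sector 1875384832" reported by the kernel a line later, so both the read of sector 0 and the read near the end of the disk fail the same way.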

Is there anything I can do to potentially recover the data off this drive? It is, stupidly, not backed-up. Losing the data wouldn't be the end of the world, but I'd rather not have to reconfigure all of my dockers and lose my plex logs and torrent client state if possible.

definer5-diagnostics-20200520-2251.zip

December 26, 2019

Upgrading to Unraid 6.8 seems to have completely fixed this issue for me.

December 5, 2019

31 minutes ago, trurl said:

Linux is case-sensitive so /mnt/user/Media is a different path than /mnt/user/media.

Using the wrong upper/lower case in a docker mapping is a very common way to get shares with default settings accidentally created.

Yeah, that is almost certainly what happened, but I think we got that cleared up. The problem is, the default settings have the same high-water mark and split level as any share I may have accidentally created, but something is still breaking the 'rules' and overfilling disks.

I really do not know how Unraid, behind the scenes, handles filling multiple disks and what happens as they approach their high-water mark. My best guess as to what is happening in my case is something along these lines: ruTorrent received a bunch of files, checked for free space, then allocated them to a location; since torrents often download simultaneously and some of them approach the size of the min free space limit, somehow this caused too much data to be written to one disk. Alternatively, Radarr may have decided to copy certain files instead of hardlinking them, resulting in double the space usage.

I realize both dockers should only see the share and are not 'aware' that there even are multiple disks, but I can't come up with much else, especially given that downloading is forced to stop when a given disk gets 100% filled.
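
The first guess above can be put into a toy model. This is not a description of how Unraid actually allocates files. It only illustrates the hypothesis that a free-space check made when a file is created says nothing about how large that file later grows. All names and numbers are invented for the example.

```python
def final_free_concurrent(free_gb, min_free_gb, sizes_gb):
    """All downloads are created before any of them has written data."""
    # Every check sees the same untouched free space, so all of them pass.
    started = [s for s in sizes_gb if free_gb >= min_free_gb]
    return max(free_gb - sum(started), 0)

def final_free_sequential(free_gb, min_free_gb, sizes_gb):
    """Each download finishes before the next one is checked."""
    for size in sizes_gb:
        if free_gb >= min_free_gb:
            free_gb = max(free_gb - size, 0)
    return free_gb

# One disk with 150 GB free, a 100 GB minimum, three 60 GB downloads.
print(final_free_concurrent(150, 100, [60, 60, 60]))  # 0
print(final_free_sequential(150, 100, [60, 60, 60]))  # 90
```

In the concurrent case the disk ends up full even though every individual check respected the minimum, which is the behavior the guess would predict.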

December 5, 2019

Sorry for the late reply. Everything you said is quite clear and I appreciate the lengthy response.

Looking through my diagnostics I still do not see any true "duplicates" that have been appended with a (1). There were three CFGs which were capitalized showing "# Share exists on no drives" that correspond with non-capitalized versions that do exist. I've deleted these configs and they were recreated automatically (with no capitalization) and default share settings.

I then rebooted the server, moved files around so no disks were over their high-water mark, and then let downloading continue. Disk 7 filled right back up to 20.5KB free and downloading halted.

On 12/3/2019 at 8:55 AM, trurl said:

I don't really see how this could cause the symptoms you are having, since default share settings are highwater, split any, 0 minimum, and cache-no.

This is my sentiment as well; I was quite surprised that changing a share config initially seemed to fix the problem. I am thinking the issue has more to do with how ruTorrent and radarr and/or sonarr are handling files and less to do with share configuration. I will probably try to set up a different download client (or duplicate the ruTorrent docker) and see if the problem persists.

I am going to toss a final diagnostics on here, I think my share settings should be fine, but if there are still .cfg duplicates with a (1) could you let me know where you are seeing them?

definer5-diagnostics-20191205-1831.zip

December 3, 2019

I guess I spoke too soon... Same problem different disk. This time disk 7 blew right past the high-water mark and now has 20kb free. I will move the data to another disk and see if it gets overfilled again, here is a fresh diagnostics if anyone has any thoughts.

definer5-diagnostics-20191203-0321.zip

November 30, 2019

Hey, problem seems to be fixed, data is going to multiple disks and none have gone past the high-water mark. Thank you so much, I absolutely would never have figured that out on my own.

November 29, 2019

Thanks, I am going to use the unbalance plugin to spread the files out and see if incoming data is still being sent exclusively to disk13. Will update when it's finished.

November 29, 2019

Okay, so I restarted and it was no longer considered unclean (idk what happened) and the array started fine without needing a full parity check. The 'media' share was set up again, so now /boot/config/shares has media.cfg and /mnt/user has a 'media' folder.

November 29, 2019

My last restart (about a week ago) was unclean--I had lost webui access and had to issue a 'reboot' command via ssh--but the parity check after restarting was clean and everything has been more or less fine since.

November 29, 2019

Uhh so I deleted the .cfg, stopped my containers, and went to stop the array before reboot. It hung for a few minutes on "retry unmounting user shares" and now is at "Stopped. Unclean shutdown detected." which will trigger a parity check if I bring it back online.

November 29, 2019

19 minutes ago, johnnie.black said:

Check all your disks, if media and Media exists on any of them, as a top folder.

No Media folder on any disk.

12 minutes ago, trurl said:

According to your diagnostics, your media(1).cfg has settings. This is likely the Media.cfg file you are seeing. But there is no share by that name (no files). The media share itself has default settings, and that is the share that has the files. You might try deleting Media.cfg, reboot and make settings for the media share to see if that clears up the confusion.

I am not seeing "media(1).cfg" anywhere. Diagnostics file has a M---a.cfg and m---a.cfg file, but my only share in /mnt/user is "media" and in /boot/config/shares I only have Media.cfg (which must correspond to my actual media share). I am hesitant to delete the .cfg as the majority of my files are in that share.

November 29, 2019

Hmm, you're right. In /boot/config/shares there is a Media.cfg (but not a media.cfg) while in user shares, as in my screenshot, the share is named "media".

I would note that I haven't edited any of my main shares in over a year and this problem only started very recently.
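
The mismatch described here (a Media.cfg next to a share folder named media) is easy to miss by eye. A small sketch of a check that pairs share folders with config files differing only in letter case follows. The function name and the idea of feeding it two directory listings are assumptions for illustration, not an Unraid tool.

```python
def case_mismatches(cfg_files, share_dirs):
    """Pair each share folder with a .cfg whose name differs only in letter case."""
    by_folded = {}
    for cfg in cfg_files:
        stem = cfg[:-4] if cfg.endswith(".cfg") else cfg
        by_folded.setdefault(stem.casefold(), []).append(stem)
    pairs = []
    for share in share_dirs:
        for stem in by_folded.get(share.casefold(), []):
            if stem != share:
                pairs.append((share, stem + ".cfg"))
    return pairs

# Listings as described in the post: config names vs. folders under /mnt/user.
print(case_mismatches(["Media.cfg", "download.cfg"], ["media", "download"]))
# [('media', 'Media.cfg')]
```

On the server the two lists could come from `os.listdir("/boot/config/shares")` and `os.listdir("/mnt/user")`.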

November 29, 2019

I mean I don't think so, here's a screen of my shares. Could you elaborate?

November 29, 2019

Data comes into the download share and later gets moved to the media share; both are allocated "High-water", split level "Automatically split any directory as required".

November 29, 2019

I have an array of 16 drives (1 parity) + cache; for some reason disk13 has recently been getting filled to 100%. My shares are set up as high-water with 100GB min free space and not to use the cache drive. The array still has over 14TB of free space, but disk 13 is getting overfilled. I have deleted and/or moved files from the disk manually but it fills right back up when new data is coming in.

Syslog is attached, any help would be greatly appreciated.

~~(Solved)~~

definer5-diagnostics-20191129-1834.zip

November 22, 2019

Background: I have a pretty basic unraid server for radarr/sonarr >> Rutorrent (linuxserver docker) >> Plex running on an i7-6700k / 16gb. Pretty much everything I know about unraid is thanks to Spaceinvaderone.

I made the critical mistake of adding some very large lists to my radarr library in monitored status. About 1000 torrents got queued up in rutorrent (which was already seeding around 3k torrents). Things seemed to be okay for a while; around 200 files were successfully downloaded and processed via radarr. Then rutorrent went unresponsive: 502 gateway, whole server running like molasses, CPU/RAM both at 100% according to the dashboard. I restart the container but after a few hours it becomes inaccessible again. System log shows:

Nov 15 20:42:01 DefineR5 kernel: Out of memory: Kill process 5960 (rtorrent main) score 781 or sacrifice child
Nov 15 20:42:01 DefineR5 kernel: Killed process 5960 (rtorrent main) total-vm:14805204kB, anon-rss:12587032kB, file-rss:0kB, shmem-rss:0kB
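
To put the numbers in that "Killed process" line into perspective, the kB figures (units of 1024 bytes) can be converted to GiB. A minimal sketch, with the regular expression written against the line format quoted above and a made-up function name:

```python
import re

# Matches the kernel's "Killed process" line as it appears in the log above.
KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)

def parse_oom_kill(line):
    """Return pid, name and memory figures in GiB, or None if no match."""
    m = KILLED.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 * 1024
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_gib": int(m["vm"]) / kib_per_gib,
        "anon_rss_gib": int(m["rss"]) / kib_per_gib,
    }

line = (
    "Nov 15 20:42:01 DefineR5 kernel: Killed process 5960 (rtorrent main) "
    "total-vm:14805204kB, anon-rss:12587032kB, file-rss:0kB, shmem-rss:0kB"
)
info = parse_oom_kill(line)
print(f"{info['name']} (pid {info['pid']}): "
      f"{info['anon_rss_gib']:.1f} GiB resident, {info['total_vm_gib']:.1f} GiB virtual")
# rtorrent main (pid 5960): 12.0 GiB resident, 14.1 GiB virtual
```

Roughly 12 GiB resident in rtorrent alone on a 16GB machine leaves little for everything else, which fits the dashboard showing RAM at 100%.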

After it kills the process, the server is still hung up on something that only stopping the container will fix. I check logs in appdata, the most relevant one at
appdata\rutorrent\log\php\error.txt:

[17-Nov-2019 07:00:00] NOTICE: fpm is running, pid 11005
[17-Nov-2019 07:00:00] NOTICE: ready to handle connections
[17-Nov-2019 19:06:51] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[17-Nov-2019 19:15:43] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[17-Nov-2019 19:25:04] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[17-Nov-2019 19:40:17] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[17-Nov-2019 20:02:24] NOTICE: Terminating ...
[17-Nov-2019 20:02:27] NOTICE: exiting, bye-bye!
[18-Nov-2019 04:07:56] NOTICE: fpm is running, pid 313
[18-Nov-2019 04:07:56] NOTICE: ready to handle connections
[18-Nov-2019 15:19:47] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

I tried to raise the pm.max_children setting to 20 by editing the www2.conf file but that failed with the following error:

[22-Nov-2019 02:30:47] ERROR: [pool www] the chdir path '/usr/local/emhttp' does not exist or is not a directory
[22-Nov-2019 02:30:47] ERROR: failed to post process the configuration
[22-Nov-2019 02:30:47] ERROR: FPM initialization failed

I am a bit out of my depth here. As it stands I have 800 partially downloaded torrents (most 20-99% complete), many of which throw

hash check on download completion found bad chunks, consider using "safe_sync".

when checked, despite having pieces.sync.always_safe.set=1 at the bottom of my otherwise stock config.

Even with most of the torrents stopped it keeps crashing. Twice now it has also taken the entire Unraid webui with it, and I have had to ssh in to reboot, which then requires a 20+hr parity check.

I feel like I am probably missing something simple, but it increasingly seems like an uphill battle and I am officially lost.

Posts posted by RIDGID

[Solved] Unraid becomes unreachable

/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 -o noatime,allow_other -o remember=0

[Support] Linuxserver.io - OpenVPN AS

Help with "mostly dead" cache drive

[SOLVED] Unraid filling disk to 100%

Need help with Rutorrent crashing unraid webui