RIDGID

Everything posted by RIDGID

  1. Good call. "Correctable Memory ECC @ DIMMC1(CPU1) - Asserted" was repeated ad nauseam in the event log. I believe I've isolated the bad stick and removed it, though it was in H1 not C1, so I will monitor for additional issues (see the IPMI/EDAC sketch after this list). Marking this solved as I know to look at the syslog and IPMI events now. Thanks for the assistance.
  2. Here is my syslog from the most recent crash. Looking at this bit:
     Apr 9 06:52:48 Supermicro kernel: mce: [Hardware Error]: Machine check events logged
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: TSC 6d1be24ad4e8c
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: ADDR c4fce24c0
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: MISC 40381286
     Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1617965568 SOCKET 0 APIC 0
     Apr 9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 offset:0x4c0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
     I'm guessing bad memory stick? syslog
  3. Thanks, that is great info I never would have picked up on myself! Looks like my libvirt.img file got moved to disk 11 somehow; I've moved it back onto the cache/system folder. No clue how it could have happened, but I'm glad to fix it and will keep an eye on it in the future. As for the docker.img, a few years back I had an issue with it getting filled (I believe by radarr or rutorrent logs or something) and I probably increased the size hoping to fix the issue, as I had a 2TB cache at the time. It's been that way so long I forgot 50G wasn't the default. Any advantage to making it smaller other than saving 30GB on the cache?
  4. I do not have the syslog unfortunately, but I will keep it moving forward. I've attached diagnostics with the array running (none of my dockers or VMs run during the parity check, though), but it looks like the syslog is where the useful info will be, so I will probably have to reproduce the issue. supermicro-diagnostics-20210406-0921.zip
  5. Recently I have had Unraid go fully unresponsive on me a couple of times. WebUI gone, no SSH, can't ping it, not visible to my router, no video output, but still powered on and active. Trying to restart from IPMI gives the error below. The first unclean shutdown was 31 Mar; the 12TB parity check finished 02 Apr @ 5am. It crashed again 04 Apr and a parity check started at 2pm. The server went unresponsive again this afternoon, 05 Apr around 6pm. When I rebooted I got a notification that the parity check finished with 0 errors (Average speed: nan B/s), so I assume it failed. I've now rebooted and the parity check is running currently. I've attached my diagnostics; maybe someone smarter than me could lend some insight as to what may be causing these crashes? Nothing in the logs jumped out at me, but I am not quite sure what to look for. The only two notable changes I've made recently to the otherwise stable server are: 1. Changing the server name via Settings>Identification 2. Upgrading to 6.9.1 supermicro-diagnostics-20210405-1821.zip
  6. Could someone help me understand what this process means and why it is taking up so much of my CPU (20-90% at times, 1-2% with more or less everything idle): "/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 -o noatime,allow_other -o remember=0" I'm having issues with high CPU usage associated with sonarr/radarr, and figuring out this process seems like a starting point (see the I/O-tracing sketch after this list). Any insight appreciated. definer5-diagnostics-20200711-1018.zip
  7. Came here looking for help getting to the OpenVPN-AS webui but I will probably just give WireGuard a shot instead.
  8. A couple days ago Plex and some other dockers I had running became sluggish and unresponsive. I went to stop and restart the docker service but was unable to get it to stop. I didn't think much about it and went to stop the array, which got hung up trying to unmount the cache drive. The diagnostics show the following 5 lines repeating:
     May 20 22:51:06 DefineR5 emhttpd: Retry unmounting disk share(s)...
     May 20 22:51:11 DefineR5 emhttpd: Unmounting disks...
     May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): umount /mnt/cache
     May 20 22:51:11 DefineR5 root: umount: /mnt/cache: target is busy.
     May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): exit status: 32
     At this point I opted to reboot the system. I've attached the diagnostics from that shutdown. Upon rebooting, my cache drive--a 1TB SanDisk SSD--was missing. Taking a trip to the BIOS I see that the SSD is indeed missing, sort of... https://imgur.com/DdHYURN.jpg You can see that it does not show up in the storage config (it is the only device plugged into the mobo SATA) but interestingly does show up as a boot device. Since it is still sort of recognized I am not sure it is dead-dead. It seems mostly dead, but mostly dead is slightly alive. I pulled the drive, put it into a USB enclosure, and hooked it back into my server. From the system log on plugging it in:
     May 23 11:04:38 DefineR5 kernel: usb 2-6: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
     May 23 11:04:38 DefineR5 kernel: usb-storage 2-6:1.0: USB Mass Storage device detected
     May 23 11:04:38 DefineR5 kernel: scsi host11: usb-storage 2-6:1.0
     May 23 11:04:39 DefineR5 kernel: scsi 11:0:0:0: Direct-Access TO Exter nal USB 3.0 6101 PQ: 0 ANSI: 6
     May 23 11:04:39 DefineR5 kernel: sd 11:0:0:0: Attached scsi generic sg18 type 0
     May 23 11:04:49 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
     May 23 11:05:16 DefineR5 kernel: ....ready
     May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
     May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write Protect is off
     May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Mode Sense: 47 00 00 08
     May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
     May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
     May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
     May 23 11:08:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
     May 23 11:08:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
     May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
     May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
     May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
     May 23 11:11:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
     May 23 11:11:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
     May 23 11:11:17 DefineR5 kernel: ldm_validate_partition_table(): Disk read failed.
     May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
     May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
     May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
     May 23 11:14:18 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
     May 23 11:14:18 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
     May 23 11:14:18 DefineR5 kernel: sds: unable to read partition table
     May 23 11:14:18 DefineR5 rc.diskinfo[10409]: SIGHUP received, forcing refresh of disks info.
     May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
     May 23 11:17:18 DefineR5 kernel: ..
     May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
     May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
     May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 6f c8 1a 00 00 00 08 00
     May 23 11:17:21 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 1875384832
     May 23 11:17:22 DefineR5 kernel: .not responding...
     May 23 11:17:25 DefineR5 kernel: sd 11:0:0:0: [sds] Attached SCSI disk
     Is there anything I can do to potentially recover the data off this drive (see the ddrescue sketch after this list)? It is, stupidly, not backed up. Losing the data wouldn't be the end of the world, but I'd rather not have to reconfigure all of my dockers and lose my Plex logs and torrent client state if possible. definer5-diagnostics-20200520-2251.zip
  9. Upgrading to Unraid 6.8 seems to have completely fixed this issue for me.
  10. Yeah, that's almost certainly what happened, but I think we got that cleared up. The problem is, the default settings have the same high-water mark and split level as any share I may have incidentally created, yet something is still breaking the 'rules' and overfilling disks. I really do not know how Unraid, behind the scenes, handles filling multiple disks and what happens as they approach their high-water mark. My best guess as to what is happening in my case is something along the lines of: ruTorrent received a bunch of files, checked for free space, then allocated them to a location; since torrents often download simultaneously and some of them approach the size of the min free space limit, somehow too much data got written to one disk (see the disk-free sketch after this list). Alternatively, Radarr may have decided to copy certain files instead of hardlinking them, resulting in double the space usage. I realize both dockers should only see the share and are not 'aware' that there even are multiple disks, but I can't come up with much else, especially given that downloading is forced to stop when a given disk gets 100% filled.
  11. Sorry for the late reply. Everything you said is quite clear and I appreciate the lengthy response. Looking through my diagnostics I still do not see any true "duplicates" that have been appended with a (1). There were three CFGs which were capitalized, showing "# Share exists on no drives", that correspond with non-capitalized versions that do exist (see the case-collision sketch after this list). I've deleted these configs and they were recreated automatically (with no capitalization) and default share settings. I then rebooted the server, moved files around so no disks were over their high-water mark, and then let downloading continue. Disk 7 filled right back up to 20.5KB free and downloading halted. This is my sentiment as well; I was quite surprised changing a share config initially seemed to fix the problem. I am thinking the issue has more to do with how ruTorrent and Radarr and/or Sonarr are handling files and less to do with share configuration. I will probably try to set up a different download client (or a duplicate ruTorrent docker) and see if the problem persists. I am going to toss a final diagnostics on here; I think my share settings should be fine, but if there are still .cfg duplicates with a (1), could you let me know where you are seeing them? definer5-diagnostics-20191205-1831.zip
  12. I guess I spoke too soon... Same problem, different disk. This time disk 7 blew right past the high-water mark and now has 20KB free. I will move the data to another disk and see if it gets overfilled again; here is a fresh diagnostics if anyone has any thoughts. definer5-diagnostics-20191203-0321.zip
  13. Hey, problem seems to be fixed, data is going to multiple disks and none have gone past the high-water mark. Thank you so much, I absolutely would never have figured that out on my own.
  14. Thanks, I am going to use the unBALANCE plugin to spread the files out and see if incoming data is still being sent exclusively to disk 13. Will update when it's finished.
  15. Okay, so I restarted and it was no longer considered unclean (I don't know what happened), and the array started fine without needing a full parity check. The 'media' share was set up again, so now /boot/config/shares has media.cfg and /mnt/user has a 'media' folder.
  16. My last restart (about a week ago) was unclean--I had lost webui access and had to issue a 'reboot' command via ssh--but the parity check after restarting was clean and everything has been more or less fine since.
  17. Uhh, so I deleted the .cfg, stopped my containers, and went to stop the array before rebooting. It hung for a few minutes on "retry unmounting user shares" and now is at "Stopped. Unclean shutdown detected." which will trigger a parity check if I bring it back online (see the fuser/lsof sketch after this list for finding what keeps a mount busy).
  18. No Media folder on any disk. I am not seeing "media(1).cfg" anywhere. Diagnostics file has a M---a.cfg and m---a.cfg file, but my only share in /mnt/user is "media" and in /boot/config/shares I only have Media.cfg (which must correspond to my actual media share). I am hesitant to delete the .cfg as the majority of my files are in that share.
  19. Hmm, you're right. In /boot/config/shares there is a Media.cfg (but not a media.cfg), while in user shares, as in my screenshot, the share is named "media". I would note that I haven't edited any of my main shares in over a year and this problem only started very recently.
  20. I mean I don't think so, here's a screen of my shares. Could you elaborate?
  21. Data comes into the download share and later gets moved to the media share; both are allocated "High-water", split level "Automatically split any directory as required".
  22. I have an array of 16 drives (1 parity) + cache, and for some reason disk 13 has recently been getting filled to 100%. My shares are set up as high-water with 100GB min free space and not to use the cache drive. The array still has over 14TB of free space, but disk 13 is getting overfilled. I have deleted and/or moved files from the disk manually, but it fills right back up when new data comes in. Syslog is attached; any help would be greatly appreciated. (Solved) definer5-diagnostics-20191129-1834.zip
  23. Background: I have a pretty basic Unraid server for radarr/sonarr >> ruTorrent (linuxserver docker) >> Plex running on an i7-6700k / 16GB. Pretty much everything I know about Unraid is thanks to Spaceinvaderone. I made the critical mistake of adding some very large lists to my Radarr library in monitored status. About 1000 torrents got queued up in ruTorrent (which was already seeding around 3k torrents). Things seemed to be okay for a while; around 200 files successfully downloaded and were processed by Radarr. Then ruTorrent went unresponsive (502 gateway), the whole server was running like molasses, and CPU/RAM were both at 100% according to the dashboard. I restarted the container, but after a few hours it became inaccessible again. The system log shows:
     Nov 15 20:42:01 DefineR5 kernel: Out of memory: Kill process 5960 (rtorrent main) score 781 or sacrifice child
     Nov 15 20:42:01 DefineR5 kernel: Killed process 5960 (rtorrent main) total-vm:14805204kB, anon-rss:12587032kB, file-rss:0kB, shmem-rss:0kB
     After it kills the process, the server is still hung up on something that only stopping the container will fix (see the container memory-cap sketch after this list). I checked the logs in appdata; the most relevant one is appdata\rutorrent\log\php\error.txt:
     [17-Nov-2019 07:00:00] NOTICE: fpm is running, pid 11005
     [17-Nov-2019 07:00:00] NOTICE: ready to handle connections
     [17-Nov-2019 19:06:51] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
     [17-Nov-2019 19:15:43] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
     [17-Nov-2019 19:25:04] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
     [17-Nov-2019 19:40:17] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
     [17-Nov-2019 20:02:24] NOTICE: Terminating ...
     [17-Nov-2019 20:02:27] NOTICE: exiting, bye-bye!
     [18-Nov-2019 04:07:56] NOTICE: fpm is running, pid 313
     [18-Nov-2019 04:07:56] NOTICE: ready to handle connections
     [18-Nov-2019 15:19:47] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
     I tried to raise the pm.max_children setting to 20 by editing the www2.conf file, but that failed with the following error:
     [22-Nov-2019 02:30:47] ERROR: [pool www] the chdir path '/usr/local/emhttp' does not exist or is not a directory
     [22-Nov-2019 02:30:47] ERROR: failed to post process the configuration
     [22-Nov-2019 02:30:47] ERROR: FPM initialization failed
     I am a bit out of my depth here. As it stands I have 800 partially downloaded torrents (most 20-99% complete), many of which throw "Hash check on download completion found bad chunks, consider using \"safe_sync\"." when checked, despite having pieces.sync.always_safe.set=1 at the bottom of my otherwise stock config. Even with most of the torrents stopped it keeps crashing. Twice now it has also taken the entire Unraid WebUI down with it, and I have had to SSH in to reboot, which then requires a 20+ hour parity check. I feel like I am probably missing something simple, but it increasingly seems like an uphill battle and I am officially lost.
  24. Very new to Unraid and Linux systems in general. I'm still getting set up, preclearing disks, transferring data, and getting a feel for the lay of the land. I grabbed the Fix Common Problems plugin and after a reboot I'm getting this error: "/var/log is getting full (currently 100 % used)" (see the /var/log sketch after this list). I downloaded the diagnostic folder and took a look at the syslog, which is being flooded with this error:
     Nov 16 13:37:07 DefineR5 kernel: ACPI BIOS Error (bug): Could not resolve [\_GPE._L6F.PGRT], AE_NOT_FOUND (20180531/psargs-330)
     Nov 16 13:37:07 DefineR5 kernel: ACPI Error: Method parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20180531/psparse-516)
     Nov 16 13:37:07 DefineR5 kernel: ACPI Error: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20180531/evgpe-509)
     This really doesn't mean much to me, so I did some googling and found a few people having the same issue: http://jhshi.me/2015/11/14/acpi-error-method-parseexecution-failed-_gpe_l6f/index.html#.W-8Ys-hKiUl https://superuser.com/questions/1117992/acpi-exception-ae-not-found-while-evaluating-gpe-method-floods-syslog It seems to be a known issue with Skylake chipsets on ASRock boards. At least one person had it fixed by updating their BIOS, so I will try that first. Beyond that I don't really know how to troubleshoot this issue. I don't love the idea of blindly issuing commands that I do not understand, and looking at the posted solutions I wouldn't even know what to change or where to issue them. I will update my BIOS and report back if the issue persists. Edit: The BIOS update seems to have fixed it. Sorry for creating an unnecessary post.
  25. Getting more RAM is not an issue; I've been meaning to do it for the past 5 years anyhow. Yes, the 240GB would be for the VM and cache. Not sure how big of a cache is really needed; after the initial setup and transfers I probably won't ever need to write more than 100GB per day. I don't know if running the mover once per day is a lot or a little. I could possibly get a bigger SSD... Still wondering if it's possible to use multiple disks to expand the cache. Alternatively, say I have two 240GB SSDs: could I use one as a cache and add the other to the array to run my VM off of?
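
For the correctable ECC errors in post 1, a minimal sketch of how the suspect DIMM can be cross-checked from the Unraid console. This assumes ipmitool and dmidecode are available and that the kernel's EDAC driver is loaded; the exact sysfs layout varies by kernel version.

    # Dump the BMC event log and filter for ECC entries (the same data the IPMI web interface shows)
    ipmitool sel elist | grep -i ecc

    # Per-DIMM correctable-error counters from the kernel EDAC driver;
    # a count that keeps climbing points at the module to pull
    grep -H . /sys/devices/system/edac/mc/mc*/dimm*/dimm_ce_count

    # Map EDAC channel/slot numbers to the board's silk-screened DIMM labels
    dmidecode -t memory | grep -E 'Locator|Size'

Checking the counters again after removing or swapping a stick is a quick way to confirm the right module was pulled.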
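
On post 6: shfs is the FUSE process that assembles /mnt/user out of the individual disks, so any container I/O routed through a /mnt/user path is also billed to the shfs process. A rough sketch for seeing what is driving it, assuming iotop is installed (it is not part of stock Unraid and may need to come from a plugin):

    # Accumulated I/O per process, showing only processes that are actually doing I/O;
    # whichever container is hammering /mnt/user should appear alongside shfs
    iotop -oPa

    # One-shot snapshot of the top CPU consumers, shfs included
    top -b -n 1 -o %CPU | head -20

As a side note, container path mappings that point at /mnt/cache or /mnt/diskX directly bypass the FUSE layer entirely, which is a common workaround when shfs overhead is the bottleneck, though mixing /mnt/user and direct disk paths for the same data has its own pitfalls (hardlinks, mover behaviour).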
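
On the failing cache SSD in post 8: when a drive still enumerates but throws read errors, the usual first step is to image whatever is still readable onto a healthy disk and work on the copy. A sketch using GNU ddrescue, which is not part of stock Unraid and would need to come from a community plugin or another Linux machine; the device names are placeholders, so confirm them with lsblk first.

    # Identify the failing SSD and a spare destination of equal or larger size
    lsblk -o NAME,SIZE,MODEL,SERIAL

    # First pass: copy everything that reads cleanly, skip problem areas, record progress in a map file
    ddrescue -f -n /dev/sdX /dev/sdY rescue.map

    # Second pass: go back and retry only the bad areas, up to 3 times
    ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map

The map file makes the copy resumable, so the run can be interrupted and restarted without re-reading the areas that already copied cleanly.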
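
On the overfilled disks in posts 10-12: Unraid's minimum-free-space setting is generally described as being checked only when a new file is created, and the system cannot know how large that file will grow, so several torrents pre-allocated at once can each pass the check individually and still overrun the disk together. A small sketch for watching how close each array disk is to the limit; the 100 GiB threshold mirrors the share setting mentioned in the posts and is just an example.

    # Human-readable fill level of every array disk
    df -h /mnt/disk*

    # Flag any disk whose free space has dropped below roughly 100 GiB
    for d in /mnt/disk*; do
        avail_kb=$(df --output=avail "$d" | awk 'NR==2 {print $1}')
        [ "$avail_kb" -lt $((100 * 1024 * 1024)) ] && echo "$d is below 100GiB free"
    done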
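
On the capitalized share configs in posts 11 and 18-19: share names are case-sensitive on disk, so "Media" and "media" can coexist as separate configs even though the web UI appears to show a single share. A sketch for spotting configs and on-disk share folders that differ only in capitalization; the paths assume the standard Unraid layout.

    # Share config files that collide when case is ignored
    ls /boot/config/shares/ | sort -f | uniq -Di

    # Top-level share folders across array disks and cache that collide when case is ignored
    ls -d /mnt/disk*/* /mnt/cache/* 2>/dev/null | awk -F/ '{print $NF}' | sort -u | sort -f | uniq -Di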
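
On the "target is busy" / "retry unmounting user shares" hangs in posts 8 and 17: before resorting to an unclean reboot, it is usually possible to see exactly which process still has files open on the stuck mount. A minimal sketch, using /mnt/cache as the example mount point:

    # Processes holding files open on the mount (-m = look at the mount, -v = verbose details)
    fuser -vm /mnt/cache

    # Passing a mount point to lsof lists every open file on that filesystem
    lsof /mnt/cache

Stopping whatever shows up there, often a container or a shell sitting in a directory on the mount, normally lets the array finish stopping cleanly.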
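
On the rtorrent out-of-memory kills in post 23: one way to keep a runaway container from dragging the whole server down is to cap its memory at the Docker level, so the kernel's OOM killer only ever reaps processes inside that container. A sketch, assuming the container is named rutorrent and that an 8 GB cap suits a 16 GB host; both values are placeholders.

    # Cap memory (and disallow additional swap) on the running container
    docker update --memory=8g --memory-swap=8g rutorrent

    # Check current usage against the new limit
    docker stats --no-stream rutorrent

On Unraid the same cap can be made persistent by adding --memory=8g to the container's Extra Parameters field in its template.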
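
On the "/var/log is getting full" warning in post 24: Unraid keeps /var/log on a small RAM-backed tmpfs, so a message that floods the syslog (like the ACPI GPE storm in that post) fills it quickly. A short sketch for finding and clearing the offender without a reboot:

    # How full the log filesystem is, and which files are responsible
    df -h /var/log
    du -sh /var/log/* 2>/dev/null | sort -h | tail

    # Reclaim the space by truncating (not deleting) the flooded log in place
    : > /var/log/syslog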