RIDGID

Members
  • Posts

    26
  • Joined

  • Last visited

Converted

  • Gender
    Undisclosed



  1. Good call. "Correctable Memory ECC @ DIMMC1(CPU1) - Asserted" is repeated ad nauseam in the event log. I believe I've isolated the bad stick and removed it, though it was in H1, not C1, so I will monitor for additional issues. Marking this solved, as I now know to look at the syslog and IPMI events. Thanks for the assistance.
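For anyone else chasing a flaky stick the same way: the event list can be dumped with `ipmitool sel elist` and the ECC lines tallied per slot to see which DIMM is actually complaining. A minimal sketch (the sample lines are stand-ins shaped like the event text above, not my real SEL):

```python
import re
from collections import Counter

def count_ecc_by_dimm(sel_text):
    """Tally correctable-ECC events per DIMM slot from IPMI SEL text."""
    # Matches lines like: "Correctable Memory ECC @ DIMMC1(CPU1) - Asserted"
    pattern = re.compile(r"Correctable Memory ECC @ (DIMM\w+\(CPU\d+\))")
    return Counter(pattern.findall(sel_text))

sample_sel = """\
Correctable Memory ECC @ DIMMC1(CPU1) - Asserted
Correctable Memory ECC @ DIMMC1(CPU1) - Asserted
Correctable Memory ECC @ DIMMH1(CPU1) - Asserted
"""
print(count_ecc_by_dimm(sample_sel))  # DIMMC1(CPU1) shows up twice here
```

In practice you'd pipe `ipmitool sel elist` output into this instead of a sample string.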
  2. Here is my syslog from the most recent crash. Looking at this bit:

Apr 9 06:52:48 Supermicro kernel: mce: [Hardware Error]: Machine check events logged
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: TSC 6d1be24ad4e8c
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: ADDR c4fce24c0
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: MISC 40381286
Apr 9 06:52:48 Supermicro kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1617965568 SOCKET 0 APIC 0
Apr 9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 offset:0x4c0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)

I'm guessing a bad memory stick? syslog
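That last EDAC line already names the suspect location; a quick way to pull the fields out of a syslog line like it (regex written against the exact format above, so treat it as a sketch):

```python
import re

def parse_edac_ce(line):
    """Extract socket/home-agent/channel/slot from an EDAC CE memory-error line."""
    m = re.search(r"CPU_SrcID#(\d+)_Ha#(\d+)_Chan#(\d+)_DIMM#(\d+)", line)
    if not m:
        return None
    socket, ha, chan, slot = map(int, m.groups())
    return {"socket": socket, "ha": ha, "channel": chan, "slot": slot}

line = ("EDAC MC0: 1 CE memory read error on "
        "CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 ...)")
print(parse_edac_ce(line))  # channel 3, slot 0
```

Mapping channel/slot to a physical DIMM label (C1, H1, etc.) still comes down to the board manual, since the silkscreen ordering varies by motherboard.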
  3. Thanks, that is great info I never would have picked up on myself! Looks like my libvirt.img file got moved to disk 11 somehow; I've moved it back onto the cache/system folder. No clue how it could have happened, but I'm glad to fix it and will keep an eye on it in the future. As for the docker.img, a few years back I had an issue with it getting filled (I believe by Radarr or ruTorrent logs or something) and I probably increased the size hoping to fix the issue, as I had a 2TB cache at the time. It's been that way so long I forgot 50G wasn't the default. Any advantage to making it smaller other than saving 30G on the cache?
  4. I do not have the syslog, unfortunately, but I will going forward. I've attached diagnostics with the array running (none of my dockers or VMs are running during the parity check, though). It looks like the syslog is where the useful info will be, so I will probably have to reproduce the issue. supermicro-diagnostics-20210406-0921.zip
  5. Recently I have had Unraid go fully unresponsive on me a couple of times: WebUI gone, no SSH, can't ping it, not visible to my router, no video output, but still powered on and active. Trying to restart from IPMI gives the error below. The first unclean shutdown was 31 Mar; the 12TB parity check finished 02 Apr @ 5am. It crashed again 04 Apr, and a parity check started at 2pm. The server went unresponsive again this afternoon, 05 Apr at 6pm. When I rebooted I got a notification that the parity check finished with 0 errors (Average speed: nan B/s), so I assume it failed. I've now rebooted and the parity check is currently running. I've attached my diagnostics; maybe someone smarter than me could lend some insight as to what may be causing these crashes? Nothing in the logs jumped out at me, but I am not quite sure what to look for. The only two notable changes I've made recently to the otherwise stable server are:
1. Changing the server name via Settings > Identification
2. Upgrading to 6.9.1
supermicro-diagnostics-20210405-1821.zip
  6. Could someone help me understand what this process means and why it is taking up so much of my CPU usage (20-90% at times, 1-2% with more or less everything idle) "/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 -o noatime,allow_other -o remember=0" I'm having issues with high CPU usage associated with sonarr/radarr and figuring out this process seems to be a starting point. Any insight appreciated. definer5-diagnostics-20200711-1018.zip
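From what I've since gathered, shfs is Unraid's FUSE filesystem that merges the data disks into /mnt/user, so it burning CPU whenever containers hammer user shares is at least plausible. The trailing `-o` arguments are ordinary FUSE mount options; splitting the command line apart makes them easier to look up. A sketch, with my own readings of the options in the comments (the shfs-specific `-disks` numbers are internal and not documented, so I leave those alone):

```python
import shlex

cmd = ("/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 "
       "-o noatime,allow_other -o remember=0")

# Collect everything passed via -o into one flat list of mount options.
tokens = shlex.split(cmd)
opts = []
for i, tok in enumerate(tokens):
    if tok == "-o":
        opts.extend(tokens[i + 1].split(","))

# noatime      -> don't update access times on reads (less write traffic)
# allow_other  -> let users other than the mounting user access the mount
# remember=0   -> FUSE forgets inodes immediately instead of caching them
print(opts)
```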
  7. Came here looking for help getting to the OpenVPN-AS webui but I will probably just give WireGuard a shot instead.
  8. A couple days ago Plex and some other dockers I had running became sluggish and unresponsive. I went to stop and restart the docker service but was unable to get it to stop. I didn't think much of it and went to stop the array, which got hung up trying to unmount the cache drive. The diagnostics show the following 5 lines repeating:

May 20 22:51:06 DefineR5 emhttpd: Retry unmounting disk share(s)...
May 20 22:51:11 DefineR5 emhttpd: Unmounting disks...
May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): umount /mnt/cache
May 20 22:51:11 DefineR5 root: umount: /mnt/cache: target is busy.
May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): exit status: 32

At this point I opted to reboot the system. I've attached the diagnostics from that shutdown. Upon rebooting, my cache drive--a 1TB SanDisk SSD--was missing. Taking a trip to the BIOS, I see that the SSD is indeed missing, sort of... https://imgur.com/DdHYURN.jpg You can see that it does not show up in the storage config (it is the only device plugged into the mobo SATA) but interestingly does show up as a boot device. Since it is still sort of recognized, I am not sure it is dead-dead. It seems mostly dead, but mostly dead is slightly alive. I pulled the drive, put it into a USB enclosure, and hooked it back into my server. From the system log on plugging it in:

May 23 11:04:38 DefineR5 kernel: usb 2-6: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
May 23 11:04:38 DefineR5 kernel: usb-storage 2-6:1.0: USB Mass Storage device detected
May 23 11:04:38 DefineR5 kernel: scsi host11: usb-storage 2-6:1.0
May 23 11:04:39 DefineR5 kernel: scsi 11:0:0:0: Direct-Access TO Exter nal USB 3.0 6101 PQ: 0 ANSI: 6
May 23 11:04:39 DefineR5 kernel: sd 11:0:0:0: Attached scsi generic sg18 type 0
May 23 11:04:49 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
May 23 11:05:16 DefineR5 kernel: ....ready
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write Protect is off
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Mode Sense: 47 00 00 08
May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:08:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:08:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:11:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:11:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:11:17 DefineR5 kernel: ldm_validate_partition_table(): Disk read failed.
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
May 23 11:14:18 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
May 23 11:14:18 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
May 23 11:14:18 DefineR5 kernel: sds: unable to read partition table
May 23 11:14:18 DefineR5 rc.diskinfo[10409]: SIGHUP received, forcing refresh of disks info.
May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
May 23 11:17:18 DefineR5 kernel: ..
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 6f c8 1a 00 00 00 08 00
May 23 11:17:21 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 1875384832
May 23 11:17:22 DefineR5 kernel: .not responding...
May 23 11:17:25 DefineR5 kernel: sd 11:0:0:0: [sds] Attached SCSI disk

Is there anything I can do to potentially recover the data off this drive? It is, stupidly, not backed up. Losing the data wouldn't be the end of the world, but I'd rather not have to reconfigure all of my dockers and lose my Plex logs and torrent client state if possible. definer5-diagnostics-20200520-2251.zip
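In case it helps anyone in the same boat: since the drive does attach eventually, the usual move is to image it with GNU ddrescue, which copies while skipping and retrying unreadable regions, then work on the image. The core idea is roughly this (a toy sketch only; the paths are placeholders, and real recovery should use ddrescue with a mapfile rather than this):

```python
import tempfile

CHUNK = 4096  # read granularity; real tools start larger and shrink on errors

def rescue_image(src_path, dst_path, size):
    """Copy src to dst chunk by chunk, zero-filling chunks that fail to read."""
    bad = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        offset = 0
        while offset < size:
            src.seek(offset)
            try:
                data = src.read(CHUNK)
            except OSError:  # unreadable region: pad with zeros and keep going
                data = b""
                bad += 1
            want = min(CHUNK, size - offset)
            if len(data) < want:
                data = data.ljust(want, b"\x00")
            dst.write(data)
            offset += CHUNK
    return bad

# Tiny demo on a healthy scratch file (real use: a raw /dev node as src).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 10000)
src = f.name
dst = src + ".img"
bad = rescue_image(src, dst, 10000)
print(bad)  # 0 unreadable chunks on a healthy file
```

ddrescue additionally records which regions failed so later passes can retry just those, which this sketch omits entirely.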
  9. Upgrading to Unraid 6.8 seems to have completely fixed this issue for me.
  10. Yeah, almost certainly what happened, but I think we got that cleared up. The problem is, the default settings have the same high-water mark and split level as any share I may have incidentally created, but something is still breaking the 'rules' and overfilling disks. I really do not know how Unraid, behind the scenes, handles filling multiple disks and what happens as they approach their high-water mark. My best guess as to what is happening in my case is something along the lines of: ruTorrent received a bunch of files, checked for free space, then allocated them to a location; since torrents often download simultaneously and some of them approach the size of the min-free-space limit, somehow too much data got written to a disk. Alternatively, Radarr may have decided to copy certain files instead of hardlinking them, resulting in double the space usage. I realize both dockers should only see the share and are not 'aware' that there even are multiple disks, but I can't come up with much else, especially given that downloading is forced to stop when a given disk gets 100% filled.
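For what it's worth, my (possibly imperfect) understanding of the high-water method is: writes target a disk whose free space is above the current mark, and when every disk falls below the mark, the mark halves. A toy model of that reading, definitely not Unraid's actual code:

```python
def high_water_pick(free, mark):
    """Pick a disk index under a simplified high-water scheme.

    free: list of free space per disk; mark: current high-water mark.
    Returns (disk_index, mark_in_effect). Toy model only.
    """
    while mark > 0:
        for i, f in enumerate(free):
            if f > mark:
                return i, mark
        mark //= 2  # every disk is below the mark: halve it and try again
    # Degenerate fallback: just take the emptiest disk.
    return max(range(len(free)), key=lambda i: free[i]), 0

# Three disks (units arbitrary); mark nominally starts at half the largest disk.
print(high_water_pick([10, 4, 6], 5))  # disk 0 is the only one above the mark
print(high_water_pick([4, 4, 4], 5))   # all below 5, so the mark halves to 2 first
```

The key point either way: the allocation decision is made per write, so it doesn't by itself protect against several simultaneous writers landing on the same disk.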
  11. Sorry for the late reply. Everything you said is quite clear and I appreciate the lengthy response. Looking through my diagnostics I still do not see any true "duplicates" that have been appended with a (1). There were three CFGs which were capitalized, showing "# Share exists on no drives", that correspond with non-capitalized versions that do exist. I've deleted these configs and they were recreated automatically (with no capitalization) and default share settings. I then rebooted the server, moved files around so no disks were over their high-water mark, and then let downloading continue. Disk 7 filled right back up to 20.5KB free and downloading halted. This is my sentiment as well; I was quite surprised changing a share config initially seemed to fix the problem. I am thinking the issue has more to do with how ruTorrent and Radarr and/or Sonarr are handling files and less to do with share configuration. I will probably try to set up a different download client (or a duplicate ruTorrent docker) and see if the problem persists. I am going to toss a final diagnostics on here; I think my share settings should be fine, but if there are still .cfg duplicates with a (1), could you let me know where you are seeing them? definer5-diagnostics-20191205-1831.zip
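The check-then-write race I'm suspecting is easy to show in miniature: two downloads each check free space against the min-free limit, both pass, both write, and the disk ends up past the limit anyway. A toy illustration with made-up numbers:

```python
# Toy model: the min-free check happens once, *before* writing,
# so simultaneous writers can each pass the check and jointly overfill a disk.
MIN_FREE = 10

def check_min_free(free, size):
    """Return True if the min-free check passes at the moment it is made."""
    return free - size >= MIN_FREE

free = 25
a_ok = check_min_free(free, 12)  # torrent A: 25 - 12 = 13 >= 10, passes
b_ok = check_min_free(free, 12)  # torrent B sees the same stale figure, also passes
if a_ok:
    free -= 12
if b_ok:
    free -= 12
print(free)  # 1: well under MIN_FREE, even though each check individually passed
```

Which would fit the symptom of a disk repeatedly ending up with a few KB free despite sane share settings.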
  12. I guess I spoke too soon... Same problem, different disk. This time disk 7 blew right past the high-water mark and now has 20KB free. I will move the data to another disk and see if it gets overfilled again; here is a fresh diagnostics if anyone has any thoughts. definer5-diagnostics-20191203-0321.zip
  13. Hey, problem seems to be fixed, data is going to multiple disks and none have gone past the high-water mark. Thank you so much, I absolutely would never have figured that out on my own.
  14. Thanks, I am going to use the unbalance plugin to spread the files out and see if incoming data is still being sent exclusively to disk 13. Will update when it's finished.
  15. Okay, so I restarted and it was no longer considered unclean (idk what happened), and the array started fine without needing a full parity check. The 'media' share was set up again, so now /boot/config/shares has media.cfg and /mnt/user has a 'media' folder.