
RIDGID

Posts posted by RIDGID

  1. Good call. 

    Correctable Memory ECC @ DIMMC1(CPU1) - Asserted

    Repeated ad nauseam in the event log. I believe I've isolated the bad stick and removed it, though it was in H1, not C1, so I will monitor for additional issues.

     

    Marking this solved as I now know to look at the syslog and IPMI events. Thanks for the assistance.
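
    When an event like the one above repeats many times, a per-DIMM tally makes it easier to see whether one slot dominates. A minimal sketch, assuming the event log has been exported as plain text with one event per line; the function name `count_ecc_events` and the export step are assumptions, not part of any IPMI tool:

```python
import re
from collections import Counter

# Matches the DIMM label and CPU in event text such as
# "Correctable Memory ECC @ DIMMC1(CPU1) - Asserted".
DIMM_EVENT = re.compile(r"Correctable Memory ECC @ (DIMM\w+)\((CPU\d+)\)")


def count_ecc_events(lines):
    """Count correctable ECC events per (DIMM label, CPU) pair."""
    counts = Counter()
    for line in lines:
        match = DIMM_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical input: the same event text repeated three times.
sample = ["Correctable Memory ECC @ DIMMC1(CPU1) - Asserted"] * 3
print(count_ecc_events(sample))
```

    The label in the event text and the silkscreen label on the board may not line up, as happened here with C1 versus H1, so the tally is a starting point rather than a verdict.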

  2. Here is my syslog from the most recent crash. Looking at this bit

    Apr  9 06:52:48 Supermicro kernel: mce: [Hardware Error]: Machine check events logged
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: TSC 6d1be24ad4e8c 
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: ADDR c4fce24c0 
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: MISC 40381286 
    Apr  9 06:52:48 Supermicro kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1617965568 SOCKET 0 APIC 0
    Apr  9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 offset:0x4c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)

    I'm guessing bad memory stick?
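
    The last EDAC line already carries the location fields. A minimal sketch that pulls them out, assuming the line layout shown above and 4 KiB pages (so that page and offset combine into the physical address); `parse_edac_ce` is a hypothetical helper name:

```python
import re

# Fields taken from the "EDAC MCn: ... CE ..." line format shown in the syslog.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*?"
    r"channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:(?P<page>0x[0-9a-f]+) offset:(?P<offset>0x[0-9a-f]+)"
)


def parse_edac_ce(line):
    """Return controller, channel, slot and address of a corrected error, or None."""
    match = EDAC_CE.search(line)
    if match is None:
        return None
    page = int(match["page"], 16)
    offset = int(match["offset"], 16)
    return {
        "mc": int(match["mc"]),
        "count": int(match["count"]),
        "channel": int(match["channel"]),
        "slot": int(match["slot"]),
        # Assumes 4 KiB pages: address = page * 4096 + offset.
        "address": (page << 12) | offset,
    }


line = (
    "Apr  9 06:52:48 Supermicro kernel: EDAC MC0: 1 CE memory read error on "
    "CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xc4fce2 "
    "offset:0x4c0 grain:32 syndrome:0x0)"
)
info = parse_edac_ce(line)
print(info["channel"], info["slot"], hex(info["address"]))
```

    The computed address matches the ADDR c4fce24c0 line in the same log entry. Which physical slot corresponds to channel 3, slot 0 depends on the motherboard, so that mapping still has to come from the board manual.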

    syslog

  3. Thanks, that is great info I never would have picked up on myself!

     

    Looks like my libvirt.img file got moved to disk 11 somehow; I've moved it back onto the cache/system folder. No clue how it could have happened, but I'm glad to fix it and will keep an eye on it in the future.

     

    As for the docker.img, a few years back I had an issue with it filling up (I believe from radarr or rutorrent logs or something) and I probably increased the size hoping to fix the issue, as I had a 2TB cache at the time. It's been that way so long I forgot 50G wasn't the default. Is there any advantage to making it smaller other than saving 30G on the cache?

  4. Recently I have had unraid go fully unresponsive on me a couple of times. WebUI gone, no SSH, can't ping, not visible to my router, no video output, but still powered on and active. Trying to restart from IPMI gives the error below.

     

    1664122049_Screenshot2021-04-05181406.png.1e900ad09b591d85ca185e001cf3b0f7.png

     

    First unclean shutdown was 31Mar; the 12TB parity check finished 02Apr @ 5am. It crashed again 04Apr, and a parity check started at 2pm. The server was unresponsive again this afternoon, 05Apr at 6pm. When I rebooted I got a notification that the parity check finished with 0 errors (Average speed: nan B/s), so I assume it failed. I've now rebooted again and the parity check is currently running.

     

    I've attached my diagnostics, maybe someone smarter than me could lend some insight as to what may be causing these crashes? Nothing in the logs jumped out at me, but I am not quite sure what to look for.

     

    Only two notable changes I've made recently to the otherwise stable server are:

     

    1. Changing the server name via Settings>Identification 

     

    2. Upgrading to 6.9.1

    supermicro-diagnostics-20210405-1821.zip

  5. Could someone help me understand what this process does and why it is taking up so much of my CPU (20-90% at times, 1-2% with more or less everything idle)?

     

    "/usr/local/sbin/shfs /mnt/user -disks 65535 2048000000 -o noatime,allow_other -o remember=0"

     

    nt8uy8l.jpg

     

    I'm having issues with high CPU usage associated with sonarr/radarr and figuring out this process seems to be a starting point. Any insight appreciated. 

     

    definer5-diagnostics-20200711-1018.zip

  6. A couple of days ago Plex and some other dockers I had running became sluggish and unresponsive. I went to stop and restart the docker service but was unable to get it to stop. I didn't think much about it and went to stop the array, which got stuck trying to unmount the cache drive. The diagnostics show the following 5 lines repeating.

     

    May 20 22:51:06 DefineR5 emhttpd: Retry unmounting disk share(s)...
    May 20 22:51:11 DefineR5 emhttpd: Unmounting disks...
    May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): umount /mnt/cache
    May 20 22:51:11 DefineR5 root: umount: /mnt/cache: target is busy.
    May 20 22:51:11 DefineR5 emhttpd: shcmd (94604): exit status: 32

    At this point I opted to reboot the system. I've attached the diagnostics from that shutdown.
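
    "target is busy" means something still holds the mount. Tools such as lsof or fuser can report the holders; the same idea can be sketched in Python by walking /proc. This is a minimal sketch, assuming a Linux /proc layout and enough privileges to read other processes' entries; `find_open_under` is a hypothetical helper, and it does not detect loop-mounted images or kernel-held references:

```python
import os


def find_open_under(mount_point, proc_root="/proc"):
    """Return {pid: [paths]} for processes with a cwd or open file under mount_point."""
    if not os.path.isdir(proc_root):
        return {}
    root = mount_point.rstrip("/")
    prefix = root + "/"
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == root or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    # /mnt/cache is the mount point from the log above.
    for pid, paths in sorted(find_open_under("/mnt/cache").items()):
        print(pid, paths[:3])
```

    Running something like this before rebooting would show which process to stop, which is usually gentler than a forced shutdown.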

     

    Upon rebooting, my cache drive--a 1TB sandisk ssd--was missing. Taking a trip to the bios I see that the SSD is indeed missing, sort of... 

     

    https://imgur.com/DdHYURN.jpg

     

    You can see that it does not show up in the storage config (it is the only device plugged into the mobo sata) but interestingly does show up as a boot device. 

     

    Since it is still sort of recognized I am not sure it is dead-dead. It seems mostly dead, but mostly dead is slightly alive.

     

    I pulled the drive and put it into a USB enclosure and hooked it back into my server. From the system log on plugging it in:

     

    May 23 11:04:38 DefineR5 kernel: usb 2-6: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
    May 23 11:04:38 DefineR5 kernel: usb-storage 2-6:1.0: USB Mass Storage device detected
    May 23 11:04:38 DefineR5 kernel: scsi host11: usb-storage 2-6:1.0
    May 23 11:04:39 DefineR5 kernel: scsi 11:0:0:0: Direct-Access TO Exter nal USB 3.0 6101 PQ: 0 ANSI: 6
    May 23 11:04:39 DefineR5 kernel: sd 11:0:0:0: Attached scsi generic sg18 type 0
    May 23 11:04:49 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
    May 23 11:05:16 DefineR5 kernel: ....ready
    May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
    May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write Protect is off
    May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Mode Sense: 47 00 00 08
    May 23 11:05:16 DefineR5 kernel: sd 11:0:0:0: [sds] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
    May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
    May 23 11:08:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
    May 23 11:08:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
    May 23 11:08:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
    May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
    May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
    May 23 11:11:17 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
    May 23 11:11:17 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
    May 23 11:11:17 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
    May 23 11:11:17 DefineR5 kernel: ldm_validate_partition_table(): Disk read failed.
    May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
    May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
    May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
    May 23 11:14:18 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 0
    May 23 11:14:18 DefineR5 kernel: Buffer I/O error on dev sds, logical block 0, async page read
    May 23 11:14:18 DefineR5 kernel: sds: unable to read partition table
    May 23 11:14:18 DefineR5 rc.diskinfo[10409]: SIGHUP received, forcing refresh of disks info.
    May 23 11:14:18 DefineR5 kernel: sd 11:0:0:0: [sds] Spinning up disk...
    May 23 11:17:18 DefineR5 kernel: ..
    May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 Sense Key : 0x2 [current]
    May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 ASC=0x4 ASCQ=0x1
    May 23 11:17:21 DefineR5 kernel: sd 11:0:0:0: [sds] tag#0 CDB: opcode=0x28 28 00 6f c8 1a 00 00 00 08 00
    May 23 11:17:21 DefineR5 kernel: print_req_error: I/O error, dev sds, sector 1875384832
    May 23 11:17:22 DefineR5 kernel: .not responding...
    May 23 11:17:25 DefineR5 kernel: sd 11:0:0:0: [sds] Attached SCSI disk

    Is there anything I can do to potentially recover the data off this drive? It is, stupidly, not backed up. Losing the data wouldn't be the end of the world, but I'd rather not have to reconfigure all of my dockers and lose my plex logs and torrent client state if possible.
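
    The repeated kernel lines can be decoded with the standard SCSI tables: sense key 0x2 is NOT READY, ASC/ASCQ 0x04/0x01 is "logical unit is in process of becoming ready", and opcode 0x28 is READ(10). A minimal sketch of that decoding; the lookup tables are deliberately tiny and `decode_read10` is a hypothetical helper name:

```python
# Small excerpts of the standard SCSI sense key and ASC/ASCQ tables.
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY"}


def decode_read10(cdb_hex):
    """Return (starting LBA, block count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command descriptor block")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


print(SENSE_KEYS[0x2], "/", ASC_ASCQ[(0x04, 0x01)])
print(decode_read10("28 00 00 00 00 00 00 00 08 00"))
print(decode_read10("28 00 6f c8 1a 00 00 00 08 00"))
```

    The second CDB decodes to LBA 1875384832, the same sector the kernel reports. Read this way, the log shows the drive answering but never becoming ready, for the first sector and for one near the end alike, rather than failing on one particular bad region. That alone does not say whether the data is recoverable.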

     

    definer5-diagnostics-20200520-2251.zip

  7. 31 minutes ago, trurl said:

    Linux is case-sensitive so /mnt/user/Media is a different path than /mnt/user/media.

     

    Using the wrong upper/lower case in a docker mapping is a very common way to get shares with default settings accidentally created.

    Yeah, almost certainly what happened but I think we got that cleared up. The problem is, the default settings have the same high-water mark and split level as any share I may have incidentally created, but something is still breaking the 'rules' and overfilling disks. 

     

    I really do not know how unraid, behind the scenes, handles filling multiple disks and what happens as they approach their high-water mark. My best guess as to what is happening in my case is something along these lines: ruTorrent received a bunch of files, checked for free space, then allocated them to a location. Since torrents often download simultaneously and some of them approach the size of the min free space limit, this somehow caused too much data to be written to a disk. Alternatively, Radarr may have decided to copy certain files instead of hardlinking them, resulting in double the space usage.

     

    I realize both dockers should only see the share and are not 'aware' that there even are multiple disks, but I can't come up with much else, especially given that downloading is forced to stop when a given disk gets 100% filled.
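
    For reference, Unraid's documentation describes high-water allocation roughly like this: start with a mark equal to half the size of the largest data disk, choose the lowest-numbered disk whose free space is above the mark, and halve the mark when no disk qualifies. A minimal sketch of that description, not Unraid's actual code; `pick_disk_high_water` and its parameters are hypothetical names, sizes are in GB, and treating minimum free space as a simple filter is an assumption:

```python
def pick_disk_high_water(free_gb, largest_disk_gb, min_free_gb=0):
    """Return the index of the chosen disk (0 = disk1), or None if none qualifies.

    free_gb lists the free space of each data disk in disk-number order.
    """
    mark = largest_disk_gb // 2
    while mark > 0:
        for index, free in enumerate(free_gb):
            # Lowest-numbered disk above both the mark and the minimum free space.
            if free > mark and free > min_free_gb:
                return index
        mark //= 2  # no disk is above the mark, so lower it and try again
    return None


# Hypothetical array state: two nearly full disks, 8000 GB largest disk.
free_gb = [150, 120]
print(pick_disk_high_water(free_gb, largest_disk_gb=8000, min_free_gb=100))
```

    In this sketch the choice is made once, when a file is created and before its final size is known. If three 60 GB downloads were all created before any data was written, each would be sent to the first disk and together exceed its 150 GB of free space. Whether that is what happened here is unconfirmed.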

     

  8. Sorry for the late reply. Everything you said is quite clear and I appreciate the lengthy response. 

     

    Looking through my diagnostics I still do not see any true "duplicates" that have been appended with a (1). There were three CFGs which were capitalized, showing "# Share exists on no drives", that correspond with non-capitalized versions that do exist. I've deleted these configs and they were recreated automatically (with no capitalization) and default share settings.

     

    I then rebooted the server, moved files around so no disks were over their high-water mark, and then let downloading continue. Disk 7 filled right back up to 20.5KB free and downloading halted.

     

    On 12/3/2019 at 8:55 AM, trurl said:

    I don't really see how this could cause the symptoms you are having, since default share settings are highwater, split any, 0 minimum, and cache-no.

     

    This is my sentiment as well; I was quite surprised changing a share config initially seemed to fix the problem. I am thinking the issue has more to do with how ruTorrent and radarr and/or sonarr are handling files and less to do with share configuration. I will probably try to set up a different download client (or duplicate ruTorrent docker) and see if the problem persists.

     

    I am going to toss a final diagnostics on here. I think my share settings should be fine, but if there are still .cfg duplicates with a (1), could you let me know where you are seeing them?

    definer5-diagnostics-20191205-1831.zip

  9. 19 minutes ago, johnnie.black said:

    Check all your disks, if media and Media exists on any of them, as a top folder.

    No Media folder on any disk.

     

    12 minutes ago, trurl said:

    According to your diagnostics, your media(1).cfg has settings. This is likely the Media.cfg file you are seeing. But there is no share by that name (no files). The media share itself has default settings, and that is the share that has the files. You might try deleting Media.cfg, reboot and make settings for the media share to see if that clears up the confusion.

    I am not seeing "media(1).cfg" anywhere. Diagnostics file has a M---a.cfg and m---a.cfg file, but my only share in /mnt/user is "media" and in /boot/config/shares I only have Media.cfg (which must correspond to my actual media share).  I am hesitant to delete the .cfg as the majority of my files are in that share.
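
    Names that differ only by case are easy to miss by eye. A minimal sketch that groups them, assuming the names have already been collected (for example, the folder names under /mnt/user and the .cfg base names under /boot/config/shares); `case_collisions` is a hypothetical helper:

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that are identical except for upper/lower case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Keep only groups with more than one distinct spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical input mixing share folder names and .cfg base names.
print(case_collisions(["media", "Media", "downloads", "appdata"]))
```

    An empty result for the combined list would mean every share name and config name agree on capitalization.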

  10. I have an array of 16 drives (1 parity) + cache; for some reason disk13 has recently been getting filled to 100%. My shares are set up as high-water with 100GB min free space and not to use the cache drive. The array still has over 14TB of free space, but disk 13 is getting overfilled. I have deleted and/or moved files from the disk manually, but it fills right back up when new data comes in.

     

    Syslog is attached, any help would be greatly appreciated.

     

    (Solved)

     

    definer5-diagnostics-20191129-1834.zip

  11. Background: I have a pretty basic unraid server for radarr/sonarr >> Rutorrent (linuxserver docker) >> Plex running on an i7-6700k / 16gb. Pretty much everything I know about unraid is thanks to Spaceinvaderone.

     

    I made the critical mistake of adding some very large lists to my radarr library in monitored status. About 1000 torrents got queued up in rutorrent (which was already seeding around 3k torrents). Things seemed to be okay for a while; around 200 files successfully downloaded and were processed via radarr. Then Rutorrent went unresponsive: 502 gateway, whole server running like molasses, CPU/RAM both at 100% according to the dashboard. I restarted the container but after a few hours it became inaccessible again. System log shows:

    Nov 15 20:42:01 DefineR5 kernel: Out of memory: Kill process 5960 (rtorrent main) score 781 or sacrifice child
    Nov 15 20:42:01 DefineR5 kernel: Killed process 5960 (rtorrent main) total-vm:14805204kB, anon-rss:12587032kB, file-rss:0kB, shmem-rss:0kB
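
    The sizes in the "Killed process" line are in kB, which hides how large they are. A minimal sketch that converts them to GiB, assuming the line format shown above; `parse_oom_kill` is a hypothetical helper:

```python
import re

OOM_KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and memory figures (GiB) from an OOM kill line."""
    match = OOM_KILLED.search(line)
    if match is None:
        return None
    kb_per_gib = 1024 ** 2
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_gib": round(int(match["total_vm"]) / kb_per_gib, 2),
        "anon_rss_gib": round(int(match["anon_rss"]) / kb_per_gib, 2),
    }


line = (
    "Nov 15 20:42:01 DefineR5 kernel: Killed process 5960 (rtorrent main) "
    "total-vm:14805204kB, anon-rss:12587032kB, file-rss:0kB, shmem-rss:0kB"
)
print(parse_oom_kill(line))
```

    The anon-rss figure works out to about 12 GiB resident in rtorrent alone, on a machine with 16gb of RAM.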

    After it kills the process, the server is still hung up on something that only stopping the container will fix. I check logs in appdata, the most relevant one at 
    appdata\rutorrent\log\php\error.txt:

    [17-Nov-2019 07:00:00] NOTICE: fpm is running, pid 11005
    [17-Nov-2019 07:00:00] NOTICE: ready to handle connections
    [17-Nov-2019 19:06:51] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
    [17-Nov-2019 19:15:43] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
    [17-Nov-2019 19:25:04] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
    [17-Nov-2019 19:40:17] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
    [17-Nov-2019 20:02:24] NOTICE: Terminating ...
    [17-Nov-2019 20:02:27] NOTICE: exiting, bye-bye!
    [18-Nov-2019 04:07:56] NOTICE: fpm is running, pid 313
    [18-Nov-2019 04:07:56] NOTICE: ready to handle connections
    [18-Nov-2019 15:19:47] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

    I tried to raise the pm.max_children setting to 20 by editing the www2.conf file but that failed with the following error:

    [22-Nov-2019 02:30:47] ERROR: [pool www] the chdir path '/usr/local/emhttp' does not exist or is not a directory
    [22-Nov-2019 02:30:47] ERROR: failed to post process the configuration
    [22-Nov-2019 02:30:47] ERROR: FPM initialization failed
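
    Taken literally, the first error says the pool's chdir directive points at a path that does not exist where php-fpm is running. PHP-FPM pool files are INI-style, so the relevant settings can be checked before restarting. A minimal sketch, assuming the pool file parses with Python's configparser; the sample text is hypothetical and is not the contents of the real www2.conf, and `check_pool_config` is a hypothetical helper:

```python
import configparser
import os


def check_pool_config(text, path_exists=os.path.isdir):
    """Report pm.max_children and whether chdir exists for each pool section."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True, delimiters=("=",)
    )
    parser.read_string(text)
    report = {}
    for pool in parser.sections():
        if pool == "global":
            continue  # [global] holds daemon-wide settings, not a pool
        section = parser[pool]
        chdir = section.get("chdir")
        report[pool] = {
            "pm.max_children": section.getint("pm.max_children", fallback=None),
            "chdir": chdir,
            # A pool without chdir has nothing to verify.
            "chdir_ok": chdir is None or path_exists(chdir),
        }
    return report


# Hypothetical pool file for illustration only.
sample = """
[www]
; comment lines start with a semicolon
pm = dynamic
pm.max_children = 20
chdir = /usr/local/emhttp
"""
print(check_pool_config(sample))
```

    If `chdir_ok` comes back False inside the container, the edited file refers to a directory the container does not have, which would explain why FPM refused to start after the change.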

    I am a bit out of my depth here. As it stands I have 800 partially downloaded torrents (most 20-99% complete), many of which throw 

     

    hash check on download completion found bad chunks, consider using "safe_sync".

     

    when checked, despite having pieces.sync.always_safe.set=1 at the bottom of my otherwise stock config.

     

    Even with most of the torrents stopped it keeps crashing. Twice now it has also taken the entire unraid webui with it, and I have had to ssh in to reboot which then requires a 20+hr parity check. 

     

    I feel like I am probably missing something simple, but it increasingly seems like an uphill battle and I am officially lost.

     

     
