
3 years no issues, until now .....


blade316


Hey everyone!

 

Hope your 2018 is off to a great start! :)

 

I haven't really had the need to post here before as I've been running my unRAID server for about 3 years with almost no issues at all, so if I miss something in this please let me know.

 

Before I begin, I have attached my diagnostics file: 

 

For the last 3 years I have been using unRAID as my NAS and media server, and it's been pretty flawless. However, in recent months I have noticed a dramatic decrease in performance and speed across all of my unRAID services. I first noticed it when I tried to install the 'PiHole' docker: because PiHole has to operate on port 80, I had to change my unRAID port to something else, so I went with 8008. I never could get PiHole running properly, so I ended up removing it but left the unRAID port at 8008, and looking back on it now, that's when I think my performance problems started. Since the recent upgrade to 6.4.1 the port has returned to port 80.
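
In case anyone else hits the same clash: I believe the cleaner approach would have been to leave the unRAID GUI on port 80 and run Pi-hole in bridge mode with its web UI remapped to another host port, something along these lines (image name, ports and paths from memory, not what I actually ran):

docker run -d --name=pihole \
  -p 53:53/tcp -p 53:53/udp \
  -p 8080:80/tcp \
  -e TZ=Australia/Sydney \
  -v /mnt/user/appdata/pihole:/etc/pihole \
  pihole/pihole
# web UI would then be on http://<server-ip>:8080/admin, leaving port 80 free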

 

One of the issues I am having is incredibly slow response times from UnRAID when:

 

  1. Trying to map a drive in Windows or a volume on Mac, both using SMB. On Mac, when I connect to the server, it will sit there for around 20 seconds before prompting me for credentials to connect.
  2. Trying to browse folders and files on either Windows or Mac: it will just sit there loading the folder or file list for around 15-30 seconds before presenting the folders/files (quick timing test below).
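
To put a number on the delay, it can be timed from a terminal with smbclient (from the samba package); the share and user names here are just examples:

time smbclient -L //unraid -U dan              # list the shares
time smbclient //unraid/Media -U dan -c 'ls'   # list the top level of one share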

 

Another issue, which I believe is due to the above, is really poor performance in my docker containers such as Plex, Emby, CouchPotato, Radarr, Sonarr, NZBGet etc. When browsing my media using Plex or Emby it takes ages to load the movie details and artwork, and sometimes it won't load the details at all and I can only see the movie name. I have tested this on all my Mac and Windows machines using the Plex app and through the browser, along with Emby through the browser and also using the Roku and Chromecast apps. Results vary only slightly; all are very slow.

 

Whilst I was testing these I was running 'htop' in the terminal watching the usage, and I noticed that when Plex or Emby was loading the movie details on whichever client/app I was using, there were quick spikes in the CPU usage anywhere from 25-160%, usually lasting around 2 seconds at a time. I have also noticed that performance gets even worse when NZBGet is downloading, and then worse again when it's extracting the contents of a download.
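
For reference, alongside htop the per-container usage can be seen with the standard docker command, which makes it easier to tell whether it's Plex, Emby or NZBGet causing the spikes:

docker stats --no-stream   # one-off snapshot of CPU / memory / network / block IO per container
docker stats               # live view, Ctrl+C to exit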

 

My first thought was to set up some resource restrictions on each of my docker containers to try and give containers like Plex and Emby priority, but it didn't seem to make a difference. I have also removed and reinstalled my docker containers, and that didn't make a difference either.
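
For clarity, the restrictions I mean were roughly of this form, set via each container's 'Extra Parameters' field in the unRAID template (values are only examples, not a recommendation):

--cpuset-cpus=0-3 --memory=4g                     # e.g. pin Plex/Emby to specific cores
--cpuset-cpus=4-5 --memory=2g --cpu-shares=512    # e.g. lower priority for NZBGet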

 

Also, I should point out that I have had this same setup since the beginning, and at the beginning everything was incredibly fast: I could play 4-5 1080p streams simultaneously, along with NZBGet downloading and extracting, along with copying files to and from machines. Now I'm struggling to get 1-2 1080p streams and that's it; I can hardly even browse the network files.

 

My next thought was that perhaps one or more of my HDDs are starting to fail, however I have checked the SMART details (roughly as shown below) and nothing shows. I then thought perhaps I need to add a cache drive, however:

 

  1. I'm not really copying large files to and from the server
  2. I started with 80% of the content I have now, and it was lightning fast at the beginning, so I don't believe a cache drive would resolve my specific issues.
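
For anyone wanting to double-check the SMART side, the command-line equivalent of what I looked at is roughly this, per drive (device names are examples):

smartctl -a /dev/sdb | egrep 'Reallocated|Pending|Uncorrect|CRC'
smartctl -t short /dev/sdb   # short self-test; results appear later in smartctl -a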

 

Regarding my hardware, details are in the diagnostics file, but my HDD set up is:

 

  1. 8 HDDs - some are 8TB others are 4TB
  2. Some are connected to the SATA ports on my MB, and the rest are connected to an Adaptec RAID card in JBOD mode

 

Outside of the unRAID box, I am running a mix of gigabit ethernet clients, while things like my laptops, Roku and Chromecast are on dual-band wifi using N or AC. I run a Linksys WRT1900ACSv2 router with LEDE installed, and I have factory reset that and set it up again, just in case - it made no difference.
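
To rule the network in or out, a raw throughput test between a wired client and the server should help, roughly like this (the IP is just an example; iperf3 may need installing on unRAID first, e.g. via the NerdPack plugin):

iperf3 -s                        # on the unRAID server
iperf3 -c 192.168.1.10 -t 30     # from a wired client; ~940 Mbit/s is healthy for gigabit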

 

I'm really at a loss as to what it could be. The only things I haven't tried are a complete fresh install of unRAID, or moving from XFS to BTRFS, basically because I don't really believe that would solve the issue, as this same setup was working great for the last 3 years.

 

Any help or advice will be greatly appreciated, have a great day! :)

 

Regards,

 

Dan

 

 

 

unraid-diagnostics-20180218-1346.zip

Link to comment
13 minutes ago, SSD said:

Sorry on phone and not able to review your logs.

 

Are you, by chance, using an SSD for your Dockers? If so, are you running trim? After a long period of use without trimming, SSD performance can dramatically decrease.

 

Hi SSD, thanks for being the first to reply :) 

 

No, I do not have an SSD at all in my unRAID server - never needed one, I guess; everything was great until recently.

Link to comment

Syslog is practically empty, only these lines:

 

Feb 18 13:37:13 unRAID emhttpd: req (12): shareMoverSchedule=40+3+*+*+*&shareMoverLogging=no&changeMover=Apply&csrf_token=****************
Feb 18 13:37:13 unRAID rsyslogd: action 'action 0' resumed (module 'builtin:omfile') [v8.29.0 try http://www.rsyslog.com/e/2359 ]
Feb 18 13:37:13 unRAID rsyslogd: action 'action 0' resumed (module 'builtin:omfile') [v8.29.0 try http://www.rsyslog.com/e/2359 ]
Feb 18 13:37:13 unRAID emhttpd: shcmd (25343): /usr/local/sbin/update_cron
Feb 18 13:37:21 unRAID root: Fix Common Problems Version 2018.02.16
Feb 18 13:37:37 unRAID root: Fix Common Problems: Error: Docker Application CrashPlanPRO is currently set up to run in host mode
Feb 18 13:37:37 unRAID root: Fix Common Problems: Error: Docker Application CrashPlanPRO, Container Port 5800 not found or changed on installed application
Feb 18 13:37:37 unRAID root: Fix Common Problems: Error: Docker Application CrashPlanPRO, Container Port 5900 not found or changed on installed application
Feb 18 13:42:02 unRAID kernel: docker0: port 7(veth987da36) entered blocking state
Feb 18 13:42:02 unRAID kernel: docker0: port 7(veth987da36) entered disabled state
Feb 18 13:42:02 unRAID kernel: device veth987da36 entered promiscuous mode
Feb 18 13:42:02 unRAID kernel: IPv6: ADDRCONF(NETDEV_UP): veth987da36: link is not ready
Feb 18 13:42:02 unRAID kernel: docker0: port 7(veth987da36) entered blocking state
Feb 18 13:42:02 unRAID kernel: docker0: port 7(veth987da36) entered forwarding state
Feb 18 13:42:02 unRAID kernel: docker0: port 7(veth987da36) entered disabled state
Feb 18 13:42:03 unRAID kernel: docker0: port 7(veth987da36) entered disabled state
Feb 18 13:42:03 unRAID kernel: device veth987da36 left promiscuous mode
Feb 18 13:42:03 unRAID kernel: docker0: port 7(veth987da36) entered disabled state
Feb 18 13:42:45 unRAID kernel: docker0: port 7(veth9ace144) entered blocking state
Feb 18 13:42:45 unRAID kernel: docker0: port 7(veth9ace144) entered disabled state
Feb 18 13:42:45 unRAID kernel: device veth9ace144 entered promiscuous mode
Feb 18 13:42:45 unRAID kernel: IPv6: ADDRCONF(NETDEV_UP): veth9ace144: link is not ready
Feb 18 13:42:47 unRAID kernel: docker0: port 7(veth9ace144) entered disabled state
Feb 18 13:42:47 unRAID kernel: device veth9ace144 left promiscuous mode
Feb 18 13:42:47 unRAID kernel: docker0: port 7(veth9ace144) entered disabled state

Try grabbing new diags, or reboot, let it run a few hours, and grab them.
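
If the GUI is being slow, you can also grab them from a console/SSH session; from memory it's just:

diagnostics    # writes the zip to /boot/logs on the flash drive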

Link to comment
10 minutes ago, johnnie.black said:

Try grabbing new diags, or reboot, let it run a few hours, and grab them.

 

OK, I'll restart the server and let it run overnight, then I'll post new diags.

Link to comment
20 hours ago, johnnie.black said:

Try grabbing new diags, or reboot, let it run a few hours, and grab them.

 

@johnnie.black I have rebooted the server and let it run all day today, please find the attached new diags.

 

 

unraid-diagnostics-20180220-1822.zip

Link to comment
2 hours ago, johnnie.black said:

I don't see anything out of the ordinary in the diagnostics.

 

Are you still seeing the same symptoms during the time covered by them?

Any hardware change when you started noticing the issues?

If you start a parity check, does it run at normal speed or is it also slow?

 

@johnnie.black 

Yep, I was still seeing the same issues when I got home.

No hardware changes and no changes at all, apart from what I mentioned about trying to get the PiHole docker running at the time and changing my unRAID port to 8008, which is now back to port 80 after updating to 6.4.

 

@jedimstr @Zonediver @johnnie.black 

 

So I have a larger issue now. I thought it could have been my Adaptec controller perhaps doing something strange, so I checked the forums and saw a number of posts saying that as of unRAID 5 the disk IDs are based on the drives' serial numbers, so it shouldn't matter which SATA ports they are moved to, as long as I restart the server with all the same drives connected.
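
For reference, the serial-based IDs can be seen from the console with something like:

ls -l /dev/disk/by-id/ | grep -v part   # each drive listed by model + serial, regardless of port/controller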

 

I had my two 8TB drives on my SATA ports, with my other 6 drives connected to my Adaptec card. So, as a test, I shut the server down, disconnected 4 of my drives from the Adaptec card, and connected them to the 4 remaining SATA ports on my MB - so now 6 on SATA and 2 on the Adaptec.

 

I restarted and updated my JBOD config to confirm the two remaining disks - all OK ... however, unRAID is now saying I have an invalid array configuration (stale configuration).

 

I've attached what I see in my dashboard, and when I select one of the disks that it says is wrong, I just get the drive name or 'no device'.

I really don't want to lose my data, so if someone could point me in the right direction that would be a great help.

 

 

 

 

ArrayError.png

ArrayError2.png

Link to comment

I can see the entries in the syslog for the wrong disks, however it's basically just telling me what the web UI is saying:

 

Feb 20 22:19:29 unRAID emhttpd: unRAID System Management Utility version 6.4.1
Feb 20 22:19:29 unRAID emhttpd: Copyright (C) 2005-2018, Lime Technology, Inc.
Feb 20 22:19:29 unRAID emhttpd: shcmd (8): modprobe md-mod super=/boot/config/super.dat
Feb 20 22:19:29 unRAID kernel: md: unRAID driver 2.9.3 installed
Feb 20 22:19:29 unRAID emhttpd: Pro key detected, GUID: 1..A FILE: /boot/config/Pro.key
Feb 20 22:19:29 unRAID emhttpd: Device inventory:
Feb 20 22:19:29 unRAID emhttpd: ST4000VN000-1H4168_Z3051JA1 (sdg) 512 7814037168
Feb 20 22:19:29 unRAID emhttpd: ST4000VN000-1H4168_Z3051QXZ (sdh) 512 7814037168
Feb 20 22:19:29 unRAID emhttpd: ST8000AS0002-1NA17Z_Z840EEM3 (sdd) 512 15628053168
Feb 20 22:19:29 unRAID emhttpd: WDC_WD80EFZX-68UW8N0_VLH3727Y (sde) 512 15628053168
Feb 20 22:19:29 unRAID emhttpd: ST2000DM001-1CH164_Z1E8DLBM (sdb) 512 3900682240
Feb 20 22:19:29 unRAID emhttpd: ST4000DM000-1F2168_S300QKX8 (sdf) 512 7814037168
Feb 20 22:19:29 unRAID emhttpd: ST2000DM001-1CH164_Z1E8CNVZ (sdc) 512 3900682240
Feb 20 22:19:29 unRAID emhttpd: ST4000VN000-1H4168_Z3051PA2 (sdi) 512 7814037168
Feb 20 22:19:29 unRAID emhttpd: Verbatim_STORE_N_GO_1208000000008EFA-0:0 (sda) 512 15669248
Feb 20 22:19:29 unRAID kernel: mdcmd (1): import 0 sde 64 7814026532 0 WDC_WD80EFZX-68UW8N0_VLH3727Y
Feb 20 22:19:29 unRAID kernel: md: import disk0: (sde) WDC_WD80EFZX-68UW8N0_VLH3727Y size: 7814026532 
Feb 20 22:19:29 unRAID kernel: mdcmd (2): import 1 sdg 64 3907018532 0 ST4000VN000-1H4168_Z3051JA1
Feb 20 22:19:29 unRAID kernel: md: import disk1: (sdg) ST4000VN000-1H4168_Z3051JA1 size: 3907018532 
Feb 20 22:19:29 unRAID kernel: md: import_slot: 1 wrong
Feb 20 22:19:29 unRAID kernel: mdcmd (3): import 2 sdi 64 3907018532 0 ST4000VN000-1H4168_Z3051PA2
Feb 20 22:19:29 unRAID kernel: md: import disk2: (sdi) ST4000VN000-1H4168_Z3051PA2 size: 3907018532 
Feb 20 22:19:29 unRAID kernel: md: import_slot: 2 wrong
Feb 20 22:19:29 unRAID kernel: mdcmd (4): import 3 sdh 64 3907018532 0 ST4000VN000-1H4168_Z3051QXZ
Feb 20 22:19:29 unRAID kernel: md: import disk3: (sdh) ST4000VN000-1H4168_Z3051QXZ size: 3907018532 
Feb 20 22:19:29 unRAID kernel: md: import_slot: 3 wrong
Feb 20 22:19:29 unRAID kernel: mdcmd (5): import 4 sdf 64 3907018532 0 ST4000DM000-1F2168_S300QKX8
Feb 20 22:19:29 unRAID kernel: md: import disk4: (sdf) ST4000DM000-1F2168_S300QKX8 size: 3907018532 
Feb 20 22:19:29 unRAID kernel: md: import_slot: 4 wrong
Feb 20 22:19:29 unRAID kernel: mdcmd (6): import 5 sdb 64 1950341088 0 ST2000DM001-1CH164_Z1E8DLBM
Feb 20 22:19:29 unRAID kernel: md: import disk5: (sdb) ST2000DM001-1CH164_Z1E8DLBM size: 1950341088 
Feb 20 22:19:29 unRAID kernel: mdcmd (7): import 6 sdc 64 1950341088 0 ST2000DM001-1CH164_Z1E8CNVZ
Feb 20 22:19:29 unRAID kernel: md: import disk6: (sdc) ST2000DM001-1CH164_Z1E8CNVZ size: 1950341088 
Feb 20 22:19:29 unRAID kernel: mdcmd (8): import 7 sdd 64 7814026532 0 ST8000AS0002-1NA17Z_Z840EEM3
Feb 20 22:19:29 unRAID emhttpd: import 30 cache device: no device
Feb 20 22:19:29 unRAID kernel: md: import disk7: (sdd) ST8000AS0002-1NA17Z_Z840EEM3 size: 7814026532 

 

 

Link to comment

The problem is that the Adaptec controller was not using the total disk capacity; the disks it presented were slightly smaller:

 

  => 3907018532 -> current size
 [status] => DISK_WRONG
 [sizeSb] => 3905935308 -> previous size
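
Assuming those values are in 1 KiB blocks (the same units as the md import lines in the syslog), the difference works out to roughly:

3907018532 - 3905935308 = 1083224 KiB ≈ 1.03 GiB that the controller was holding back on each 4TB disk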

 

This is one of the reasons RAID controllers are not recommended (HBAs only): if you do a new config you'll get unmountable disks, because the existing partition will be considered invalid.

Link to comment
1 hour ago, johnnie.black said:

The problem is that the Adaptec controller was not using the total disk capacity; the disks it presented were slightly smaller:

 

  => 3907018532 -> current size
 [status] => DISK_WRONG
 [sizeSb] => 3905935308 -> previous size

 

This is one of the reasons RAID controllers are not recommended (HBAs only): if you do a new config you'll get unmountable disks, because the existing partition will be considered invalid.

 

Yep fair enough, lesson learned I guess.

 

1 hour ago, johnnie.black said:

You'll need to leave the disks as they were or rebuild one by one after changing controller, so unRAID can expand the partition and filesystem.

 

OK, so as it stands right now, with the 4 disks being marked 'wrong', can I rebuild them one at a time from this current state? As my 8TB parity disk is valid, will I lose any data that's on any of those four 4TB disks that are currently marked as wrong?

 

Is there an existing guide to help me rebuild one disk at a time from the state my disks are currently in?

 

Also, I really appreciate you taking the time to help me get this fixed @johnnie.black

Link to comment
4 hours ago, johnnie.black said:

The problem is that the Adaptec controller was not using the total disk capacity; the disks it presented were slightly smaller:

 

  => 3907018532 -> current size
 [status] => DISK_WRONG
 [sizeSb] => 3905935308 -> previous size

 

This is one of the reasons RAID controllers are not recommended (HBAs only): if you do a new config you'll get unmountable disks, because the existing partition will be considered invalid.

 

That's the first time I've heard of that - I have two Adaptecs and never had this problem... (???)

Link to comment
4 minutes ago, johnnie.black said:

It's not because it's an Adaptec, it's because it's an Adaptec RAID controller, not HBA or in HBA mode.

 

And since when is the RAID controller a problem? Since v6 or earlier?

In my case, the BIOS of both Adaptecs is deactivated, so I hope I will never have this strange problem...

Link to comment
3 hours ago, Zonediver said:

 

And since when is the RAID controller a problem? Since v6 or earlier?

In my case, the BIOS of both Adaptecs is deactivated, so I hope I will never have this strange problem...

 

Yeah, it's a strange error. I do remember a long time ago, when I wanted to upgrade my parity drive to an 8TB, I purchased two 8TB drives ... one was connected to onboard SATA and the other to the Adaptec, and it wouldn't let me set the one connected to the Adaptec as the parity drive, as it said it 'wasn't the largest drive' ... looking into it, the Adaptec reported it as a tiny bit smaller than the 8TB connected to onboard SATA. Once I moved both to onboard SATA they were identical in size.

Link to comment

Archived

This topic is now archived and is closed to further replies.
