ZipsServer Posted January 26, 2019 (edited)

Hi all,

My server is running extremely slowly (100% CPU usage) for reasons I cannot understand. It idles at 20% with no dockers active, and even updating a docker will cause CPU usage to spike to 100%. This is extremely uncharacteristic of my system (it can usually handle 5+ concurrent Plex streams).

This started happening when I wasn't monitoring my downloads and accidentally filled up the cache array to 100% capacity (which obviously causes problems). Since then I have run a balance and a btrfs scrub, with multiple restarts. I have also run the docker new permissions tool. This has not solved the problem and I am not sure what else to do. I assume it is filesystem related, but maybe it could be something else?

mastertower-diagnostics-20190126-1206.zip

EDIT: Forgot to mention that "top" does not show any processes that are taking up 20%+ of the CPU. So quite perplexing.

Edited January 26, 2019 by ZipsServer
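One possible reading of the EDIT above, offered as an assumption rather than a diagnosis: a CPU gauge that counts time spent waiting on disk I/O can read very high while "top" shows no single busy process. The minimal Python sketch below estimates the iowait share from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The function names are hypothetical and were not part of the thread.

```python
import time


def parse_cpu_line(line):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Field order: user nice system idle iowait irq softirq steal (guest fields
    # are already included in user/nice, so they are left out of the total).
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as handle:
        before = handle.readline()
    time.sleep(interval)
    with open(path) as handle:
        after = handle.readline()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` on the server while it feels slow would give a rough figure. A high value would point toward storage rather than a runaway process, though it would not identify which device is responsible.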
ZipsServer (Author) Posted January 26, 2019

I am seeing some errors on the SMART report for the cache drives. I have a feeling there could be a bad SATA cable or a poor connection, since I recently put my cache drives in a backplane module. I'm not sure how to diagnose or correct this problem, if it even exists.
BRiT Posted January 26, 2019

I don't see anything unexpected in the process list or syslogs. At the time of the diagnostics capture, it shows you have 2 "rsync" processes running from the "unbalance" plugin, and those are causing the load.

nobody 15606  0.4 0.1 110180 14136 ? Sl 11:04 0:18 /usr/local/emhttp/plugins/unbalance/unbalance -port 6237
nobody 27569 21.4 0.0  20164  3308 ? D  12:03 0:39  \_ /usr/bin/rsync -avPR -X Video/Action Cam /mnt/disk2/
nobody 27570  0.0 0.0  19556  2556 ? S  12:03 0:00      \_ /usr/bin/rsync -avPR -X Video/Action Cam /mnt/disk2/
nobody 27571 15.3 0.0  19996  2384 ? S  12:03 0:28      \_ /usr/bin/rsync -avPR -X Video/Action Cam /mnt/disk2/

As for the SMART reports on your cache drive(s): I'm not sure if the RAW value has any meaning for SSDs, but for most spinners it does for field 187. Do you have a lot of power outages? If not, perhaps your power plane for the cache drives needs to be corrected. The larger ones report a number of "unexpected power loss" events as well.
Model Family:     SandForce Driven SSDs
Device Model:     SanDisk SDSSDA240G
Serial Number:    154836404031
LU WWN Device Id: 5 001b44 f188d033f
Firmware Version: Z22000RL
User Capacity:    240,057,409,536 bytes [240 GB]
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   000    -    24
187 Reported_Uncorrect      -O--CK   100   100   000    -    5872
0x04  0x008  4            5872  ---  Number of Reported Uncorrectable Errors

---

Model Family:     SandForce Driven SSDs
Device Model:     SanDisk SDSSDA240G
Serial Number:    161337404732
LU WWN Device Id: 5 001b44 4a4758b33
Firmware Version: Z22000RL
User Capacity:    240,057,409,536 bytes [240 GB]
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   000    -    16
187 Reported_Uncorrect      -O--CK   100   100   000    -    2
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 1
CR     = Command Register
FEATR  = Features Register
COUNT  = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
LH     = LBA High (was: Cylinder High) Register    ]   LBA
LM     = LBA Mid (was: Cylinder Low) Register      ] Register
LL     = LBA Low (was: Sector Number) Register     ]
DV     = Device (was: Device/Head) Register
DC     = Device Control Register
ER     = Error register
ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] log entry is empty

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]

---

Model Family:     SandForce Driven SSDs
Device Model:     TS64GSSD320
Serial Number:    A2910611949538650037
LU WWN Device Id: 0 0232d0 000000000
Firmware Version: 5.0.2
User Capacity:    64,023,257,088 bytes [64.0 GB]
174 Unexpect_Power_Loss_Ct  ----CK   000   000   000    -    122
0x0009  2            1  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
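The manual scan of the SMART output quoted above can be sketched in Python. This is only an illustration: it assumes attribute lines in the brief layout shown in this post (ID, name, flags, value, worst, threshold, fail, raw), and the helper name `nonzero_watched` is hypothetical. Raw values are vendor-specific, so a non-zero count is a prompt to check cabling and power, not proof of a fault.

```python
import re

# Attribute IDs and names taken from the SMART output quoted in this thread.
WATCHED = {174: "Unexpect_Power_Loss_Ct", 187: "Reported_Uncorrect"}

# ID, name, flags, value, worst, threshold, fail marker, raw value.
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$"
)


def nonzero_watched(lines):
    """Return (id, name, raw) for watched attributes whose raw value is > 0."""
    hits = []
    for line in lines:
        match = ATTR_RE.match(line)
        if match is None:
            continue  # not an attribute row (headers, device statistics, etc.)
        attr_id = int(match.group(1))
        raw = int(match.group(8))
        if attr_id in WATCHED and raw > 0:
            hits.append((attr_id, match.group(2), raw))
    return hits


# Lines copied from the first drive's report above.
sample = [
    "174 Unexpect_Power_Loss_Ct -O--CK 100 100 000 - 24",
    "187 Reported_Uncorrect -O--CK 100 100 000 - 5872",
    "0x04 0x008 4 5872 --- Number of Reported Uncorrectable Errors",
]
for attr_id, name, raw in nonzero_watched(sample):
    print(f"{attr_id} {name}: raw={raw}")
```

Comparing the counts again after reseating the drives or swapping a cable would show whether they are still climbing.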
ZipsServer (Author) Posted January 26, 2019

Thanks BRiT,

I was running unBalance on an array drive, but that was unrelated to the current problem.

I didn't notice the power outage metric. My server runs on a UPS, so I doubt it is an issue with mains voltage. However, could it be that I need a second PSU or a larger PSU? Maybe it causes a brown-out when all the drives spin up?

That still doesn't explain why the system/cache drives are so slow, though.