2nd time - ~60 days server goes wonky


mfwade


All,

 

I need your help. For the second time, after roughly 60 days (59 last time, if memory serves), my array is acting weird. When I log in to view the array, only some of the tabs are shown, and the web interface is slow and close to unresponsive. Shares are no longer visible on the network; however, Docker containers and virtual machines appear to be running. An "Array Started . starting services..." message flashes constantly in the lower left-hand corner of the web interface.

 

Additionally, I am concerned that if I hard power down the server, a parity check will kick off and run for 20-plus hours. A running parity check also makes the server unresponsive (slow GUI, etc.). I posted about this some time ago, and no errors were observed.

Original post: https://forums.unraid.net/topic/80455-unable-to-run-a-parity-check/?tab=comments#comment-747515

 

Current setup:

  • UNRAID Version 6.7.0 2019-05-08
  • HP DL365 G7
  • 192G RAM
  • AMD Opteron™ 6282 SE @ 2600 MHz
  • Qty. (1) H200 flashed to IT Mode – 1 Parity and 2 Cache drives
  • Qty. (1) H200 flashed to IT Mode – 1 Parity and 2 Cache drives
  • Qty. (1) H200E flashed to IT Mode - connected to external EMC SAS shelves
  • Qty. (30) 4TB Enterprise SAS Flash Drives (2 Parity, 28 Data)
  • Qty. (4) 4TB Enterprise SAS Flash Drives (BTRFS Cache Pool)
  • Only modification: /boot/config/go => rmmod acpi_power_meter
  • Plugins:
    • CA Auto Update Applications
    • CA Backup / Restore Appdata
    • Community Applications
    • Dynamix SSD TRIM
    • Fix Common Problems
    • Unassigned Devices
    • unBALANCE

 

I don’t remember ever experiencing these conditions on 6.6.7; however, I didn’t have as many SAS trays and drives attached back then. I am beginning to wonder whether there is an underlying issue with the total number of drives, or with the combination of SAS and SSD. That being said, in the array's current condition there is no way to roll back via the web GUI.

 

Diagnostics attached.

 

Thank you all in advance!

 

 

 

 

Screen Shot 2019-09-19 at 9.06.45 PM.png

Screen Shot 2019-09-19 at 9.07.01 PM.png

Screen Shot 2019-09-19 at 9.21.55 PM.png

Screen Shot 2019-09-19 at 9.33.56 PM.png

unraid-1-diagnostics-20190920-0058.zip

Link to comment

Flash drive problems:

 

Sep 19 19:57:38 unRAID-1 kernel: usb 1-4: USB disconnect, device number 2
Sep 19 19:57:38 unRAID-1 kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Sep 19 19:57:38 unRAID-1 kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 05 bb db 00 00 01 00
Sep 19 19:57:38 unRAID-1 kernel: print_req_error: I/O error, dev sda, sector 375771
Sep 19 19:57:38 unRAID-1 kernel: Buffer I/O error on dev sda1, logical block 373723, lost async page write
Sep 19 19:57:38 unRAID-1 emhttpd: error: put_config_idx, 609: No such file or directory (2): fopen: /boot/config/shares/DUMP.cfg
Sep 19 19:57:38 unRAID-1 kernel: FAT-fs (sda1): Directory bread(block 29592) failed
Sep 19 19:57:38 unRAID-1 kernel: FAT-fs (sda1): Directory bread(block 29593) failed
Sep 19 19:57:38 unRAID-1 kernel: FAT-fs (sda1): Directory bread(block 29594) failed
Sep 19 19:57:38 unRAID-1 kernel: FAT-fs (sda1): Directory bread(block 29595) failed
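
A minimal sketch for spotting the same signatures in a saved syslog, assuming the flash drive shows up as `sda`/`sda1` as it does in the lines above (on another server the device name will likely differ). The function name `count_flash_errors` and the pattern labels are made up for illustration; the patterns themselves are taken from the quoted log lines.

```python
import re

# Patterns derived from the log lines quoted above. The device names
# (sda, sda1) are specific to this server and are an assumption elsewhere.
FLASH_PATTERNS = {
    "usb_disconnect": re.compile(r"kernel: usb \S+: USB disconnect"),
    "io_error": re.compile(r"I/O error.*dev sda"),
    "fat_error": re.compile(r"FAT-fs \(sda1\)"),
    "config_write_failed": re.compile(r"emhttpd: error: .*fopen: /boot/"),
}


def count_flash_errors(lines):
    """Count how many lines match each flash-drive failure signature."""
    counts = {name: 0 for name in FLASH_PATTERNS}
    for line in lines:
        for name, pattern in FLASH_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle on the syslog extracted from the diagnostics zip. A non-zero `usb_disconnect` count followed by `fat_error` hits is the pattern shown above.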


 

Link to comment

Thanks, Johnnie. I saw that after I sent the request for help. Is there an official procedure for replacing the flash drive? I do have a backup of the drive, albeit dated 5/31/2019, so I at least have something. I have more current backups (as of last week) on an unassigned device; however, I can't access it in the server's current state.

 

Regardless, I still have to power it down, so maybe I will try to grab a good copy after a reboot. Parity should be OK as well; however, if needed I will kick off a check and wait patiently.
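
A rough sketch of grabbing a timestamped copy of the flash drive once the server is back up, assuming the flash is mounted at `/boot` (as the `/boot/config/go` path above suggests). The function name `backup_flash` and the default destination are hypothetical; pick a destination that stays reachable even when the server itself is not.

```python
import shutil
import time
from pathlib import Path


def backup_flash(src="/boot", dest_dir="/mnt/user/backups"):
    """Zip the contents of src into dest_dir and return the archive path.

    Both defaults are assumptions: /boot is where the flash is mounted on
    this server, and dest_dir is only a placeholder.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M")
    base = dest / f"flash-backup-{stamp}"
    # make_archive appends ".zip" and returns the full archive path.
    return shutil.make_archive(str(base), "zip", root_dir=src)
```

Copying the resulting zip off the server afterwards avoids the situation described above, where the newest backup sits on a device that can't be reached.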

Link to comment

Well, I tried creating three different USB drives with no luck. I tried with the backup zip file and then without it (using the image from the Unraid site). Still no luck. The USB drive would appear to boot and bring up the Unraid boot screen, count down from 5 to 0, and then start over again. Changing the menu selection (e.g., to the option with the Web GUI) did nothing; it just remained on that screen.

 

I created all of the USB drives from my Mac. Each drive was from a different manufacturer; all were 8 GB and USB 2.0. I tried both the 6.7.0 and 6.7.2 images from the Unraid site. I give up...

 

I reinserted the supposedly faulty drive and the server booted right up, just like last time (when I simply rebooted without any troubleshooting). It seems odd that the last occurrence was also right around 60 days ago. I will keep an eye on it, and if it starts to go wonky again, I will replace the drive with one made on a Windows machine (maybe there is a difference...). Or, if it happens right around the 60-day mark again, I will formally request assistance from the big guy :), as the timing would then be too specific to be a coincidence.
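
One way to keep an eye on the 60-day mark is to read the uptime from `/proc/uptime`, whose first field is seconds since boot. This is only a sketch: the helper names `uptime_days` and `near_mark` and the 5-day window are arbitrary choices for illustration.

```python
def uptime_days(proc_uptime_text):
    """Convert the contents of /proc/uptime to days since boot.

    The first whitespace-separated field is the uptime in seconds.
    """
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 86400


def near_mark(days, mark=60, window=5):
    """True when days falls within `window` days of `mark`."""
    return abs(days - mark) <= window
```

On the server, the input would be the text read from `/proc/uptime`; a `near_mark` result of true is the cue to check the syslog for flash errors before the web interface degrades.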

 

I still can't run a parity check, as the machine becomes completely unusable: it drops pings intermittently, the web interface is unresponsive, shares are not accessible, etc. Maybe one day the big guy @limetech will be able to duplicate this (doubtful) and offer up a solution. Maybe I am the only one running a full enterprise SSD SAS array with external shelves and an unlimited-drives license (28 data + 2 parity). At any rate, I can offer my support for testing if he/they would like....

 

Thanks again to those that answered. As always, a huge thank you for this forum!

Link to comment
