Everything posted by nwootton

  1. Ended up rolling back to 6.12.4 until the new Dell PERC card was installed. The system remained stable with no further issues for the following 2 weeks (29th Feb). Yesterday I installed the new HBA card. Still running 6.12.4 without issues.
  2. Just ordered a Dell PERC H310 for £20. Hoping it will fit my motherboard and can be a stopgap until I can source a couple of LSI cards, probably 9211-8i or later. Will have to see if that solves the issue.
  3. The same cards have run for the past x years without an issue - never once dropping a disk. What would cause them to start now? Spin down was turned on after the third failure, in case the constant running of the disks might be the cause; it was off after the update and during the subsequent failures. What are the currently recommended alternatives?
  4. Going a little nuts here: I'm getting multiple repeating failure scenarios with the same disk number, but no actual errors when the disks are checked.

Background

Recently upgraded to the latest version (6.12.6 at time of writing). As part of this I also installed the Fix Common Problems plugin. It flagged 2 issues: I still had 4 reiserfs-formatted disks in my box (cache & 3 array drives), and some of my dockers were pointing at a non-array disk (unassigned device). Nothing significant, so the upgrade is done.

I do need to get rid of the reiserfs before the capability gets removed from the kernel. After some research I followed this method to move data from the reiserfs disks. Using a brand new 4TB disk, I followed the procedure (rsync in a tmux session, with the web UI open to watch the read & write numbers changing - a rough sketch is at the end of this post) and got the first disk migrated without issue. I then left the array to 'settle' for a couple of days to make sure everything was OK.

Went back to do the same with the next disk. At some point the web UI stopped responding: although I could switch between tabs, nothing updated. In the tmux session the rsync was proceeding, and after it completed the web UI still failed to act on commands. E.g. I'd request a disk spin down and it would indicate it was doing it, but the disk wouldn't change. Stop array, reboot and shutdown all produced the correct dialogues, but the event wouldn't happen. I left the web UI open but unused for a while and it remained unhelpful. Opening it in other browsers and forcing a cache clear all failed to allow control. Eventually, as a last resort, I did a shutdown via SSH. I checked all the connections and then rebooted. The server came up and informed me that I no longer had a license key. After multiple reboots the system now agrees that I do still have a valid license and began to work as expected. This leaves me with 2 remaining reiserfs disks I want to migrate.

Issue

I left the server running for several days and it appeared fine. Then I get a failure on disk4. "Array is emulated" warning, so I check SMART status and there are no errors. Put the array into maintenance mode and run a check on the xfs drive. No issues. Run the fix anyway. In fact nothing I do indicates an issue with the disk. Swap the disk out for another, the parity rebuild takes place (12 hrs) and the new disk is running. Array appears OK. Turn on the Docker containers.

Next morning, disk4 is in an error state and the array is emulated. Run the same SMART and xfs routines and no issues are found. Swap the disk out for a third, the parity rebuild takes place and the array is happy again. Turn on the minimum Dockers to keep the family happy.

Next day, disk4 is again in an error state. No errors in SMART. Check the xfs disk, no errors. Run the xfs fix anyway again just in case. Nothing done. Replace the disk with the original disk. Parity rebuild takes place, the array says it's happy. Leave all Dockers off. I looked at the 'failed' disks on another laptop and still find no errors on them. I've run a parity read-check to make sure everything agrees.

This morning I get another message that disk4 is in an error state. Logs show read & write errors on disk4 around the time the error message about the array state was sent:

```
....
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277744
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277752
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277760
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277768
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277776
Feb 14 21:27:16 Tower kernel: md: disk4 read error, sector=1381277784
...
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277744
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277752
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277760
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277768
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277776
Feb 14 21:27:26 Tower kernel: md: disk4 write error, sector=1381277784
```

Can anyone suggest something that could explain the repeated failure of different disks in the same allocation? Did the migration process do something that is causing a conflict? Is it something in the current version? I've been running unRAID since about version 4 and prior to this all I've had is the odd disk failure - something that has been easy to handle. I've spent more time dealing with issues in the last 2 weeks than I have in the previous blah years. Now completely out of my depth with an array that no longer works.

Update: Hard drives are all 4TB in size. Original disk4 was a WD Red, replaced by a Seagate Barracuda, then by a Seagate IronWolf. tmux installed via NerdTools.

tower-diagnostics-20240215-0819.zip
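For reference, the migration step was essentially an rsync from the old reiserfs disk to the freshly formatted xfs disk, run inside tmux, and the per-disk check was xfs_repair in maintenance mode. A rough sketch only - the disk numbers and mount points below are placeholders, not my exact ones:

```bash
# Run the copy inside tmux so it survives a dropped SSH/web session.
tmux new -s migrate

# Copy the contents of the old reiserfs disk onto the new xfs disk,
# preserving permissions, times and extended attributes.
rsync -avPX /mnt/disk3/ /mnt/disk7/

# With the array started in maintenance mode, check the xfs disk that keeps
# being flagged. Device naming varies by unRAID release (/dev/md4 on older
# versions, /dev/md4p1 on 6.12+); the GUI's filesystem check button wraps the
# same tool. -n is a read-only check; drop it to let xfs_repair actually fix.
xfs_repair -n /dev/md4p1
```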
  5. @yayitazale Just installed the getting-started parrot image classification demo onto my Mac and it runs fine, so it looks like the TPU hasn't failed. https://coral.ai/docs/accelerator/get-started/
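For anyone else wanting to rule out the hardware, these are roughly the steps from the linked guide that I ran on the Mac (package version and example filenames are as the guide had them, so they may have moved since):

```bash
# Install the PyCoral library (the Edge TPU runtime itself has to be
# installed first, per the guide).
python3 -m pip install --extra-index-url https://google-coral.github.io/py-repo/ pycoral~=2.0

# Quick sanity check that the USB accelerator is visible at all.
python3 -c "from pycoral.utils.edgetpu import list_edge_tpus; print(list_edge_tpus())"

# Run the guide's bird-classification example against its bundled parrot image.
git clone https://github.com/google-coral/pycoral.git
cd pycoral
bash examples/install_requirements.sh classify_image.py
python3 examples/classify_image.py \
  --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
  --labels test_data/inat_bird_labels.txt \
  --input test_data/parrot.jpg
```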
  6. @yayitazale Power shouldn't be an issue, it's been running fine up until now. No other devices on the USB except the OS - and that is direct onto the motherboard. Neither USB2 nor USB3 makes any difference to this. I guess it's an issue internal to the docker, rather than unRAID -> docker. Thanks for your help. I'll chase Blake via GitHub/HomeAssistant.
  7. @yayitazale Thanks for that. Changing to privileged made no difference, neither did amending the USB path to use the '/dev/bus/usb' path. The docker still fails with a python error (Fatal Python error: Aborted) and this is generally preceded by this line:

F :1150] HandleQueuedBulkIn transfer in failed. Unknown: USB transfer error 1 [LibUsbDataInCallback]

The system tells me that the TPU has been detected:

detector.coral INFO : Starting detection process: 36
frigate.app INFO : Camera processor started for drive: 39
frigate.app INFO : Capture process started for drive: 40
frigate.edgetpu INFO : Attempting to load TPU as usb
frigate.edgetpu INFO : TPU found

I've also changed the port that the TPU is plugged into and checked that the LED on it is on.
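For anyone comparing settings, the template changes described above boil down to something like the following docker invocation; the image tag and host paths here are illustrative, not my exact unRAID template:

```bash
# Illustrative only: run Frigate privileged with the whole USB bus passed
# through, which is what the 'privileged' toggle plus the /dev/bus/usb
# device mapping amount to.
docker run -d \
  --name frigate \
  --privileged \
  --device /dev/bus/usb:/dev/bus/usb \
  -v /mnt/user/appdata/frigate:/config \
  -v /mnt/user/media/frigate:/media/frigate \
  -p 5000:5000 \
  ghcr.io/blakeblackshear/frigate:stable
```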
  8. So I've had Frigate running on my unRAID for ages without any issues. Home Assistant on a separate box was connected and working as expected. The system is happily using a Coral USB adapter and a single camera. At some point last week, I noticed that the container had stopped and was erroring on restart. I've made sure that the correct USB is selected for the Coral device, that no GPU has snuck back into the settings and that there's space available. I've now completely wiped the image and settings and started again, but I am still getting the same situation. Frigate starts up, runs for a few minutes and then aborts and shuts down with a python error. Can anyone advise? Frigate.log config.yml
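One quick way to check that unRAID itself still sees the accelerator, before blaming the container - a sketch run from the server's console; the USB IDs in the comment are the ones the Coral is commonly reported under:

```bash
# List USB devices and look for the Coral accelerator. It typically shows up
# as "1a6e:089a Global Unichip Corp." before the Edge TPU runtime has
# initialised it, and as "18d1:9302 Google Inc." afterwards.
lsusb | grep -i -E 'global unichip|google'

# Follow the kernel log while re-plugging the device to confirm it enumerates.
dmesg -w | grep -i usb
```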
  9. Just an update before I mark this issue as solved. It looks like the issue is a failing Netgear GS608 switch in the study next to the unRAID server. Actually, to be 100% accurate, it looks like one or two of the ports are starting to fail, but that means it all goes! A long evening of testing every link in the data path indicated that the issue was with a switch in my loft patch panel. Spending most of yesterday swapping out this switch (four times!) still didn't solve the issue. Eventually I went back and tested every stage again, and this time the issue looked like it was the study switch. Swapping the study switch for another Netgear (a GS105 this time) meant everything started to work. I'm planning to do some more testing tonight, but I will mark this as closed for now. Thank you for all your help & suggestions. I might find some time to actually watch my content now rather than just swear at a stuttering stream.
  10. Just a general update: I reflashed the BIOS last night and I've not yet had time to see if that made any difference. I've also used a new mini LAN tester to check the cables and they all pass - no shorts, crosses or faults reported. I'm also planning on sticking a software LAN speed tester on each end of the data path to see what it says (something along the lines of the sketch below), but that will have to wait until the weekend as I can't run up and down stairs all night - it'll wake the kids and then I'll be in even more trouble. Thanks for all the help and suggestions. I'll update once I've had a chance to do some more testing.
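The kind of end-to-end test I have in mind, assuming iperf3 can be run on both the server and a client at the far end of the path - the hostname is a placeholder:

```bash
# On the unRAID server: start an iperf3 server listening for test traffic.
iperf3 -s

# On the client at the far end of the data path: 30-second throughput test
# towards the server (replace 'tower' with the server's IP or hostname).
iperf3 -c tower -t 30

# Same test with -R so the server sends to the client, which is the
# direction streaming playback actually uses.
iperf3 -c tower -t 30 -R
```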
  11. @Russ all the cables are Cat5e or Cat6. I also bought a mini LAN tester and none of them show faults. In fact I actually binned all the older Cat5 cables as part of the checking, just to prevent one accidentally ending up somewhere it shouldn't in the future. The CPU is an i5 backed with 8GB RAM, and it's not shown any sign of straining. Rolling back to v5 on this box had no effect on the issue either.
  12. @RobJ glad to know that I didn't miss anything. Thanks for looking, though.
  13. @Ashman70 - The bits that changed were the unRAID server and the Gb switch at the other end of the cable in the AV rack. I've put the old switch back in the rack and it hasn't made any difference. @Russ - I've tried playing back SD content from each drive through Kodi, but even that failed. In fact the situation appears to be worsening :'( - the Sonos now reports that it can't play back the music from unRAID as 'there isn't enough bandwidth to maintain a buffer'. Most of the day has been spent swapping out switches and checking cable types. I'm leaning towards either the motherboard BIOS doing something to limit throughput (and that is going to be painful to figure out) or physical damage to the cables in the walls. I add that last one even though every switch reports that it's got a gigabit connection (both lights lit, or light colour, etc.) - and swapping to lower-capability cables does change the reported state on the switch, just in case you ask. I think I'm going to try the drive speed test that @Russ suggested, but the next course of action is to acquire some proper network testing kit and see what that says about the entire data path. If that passes with full gigabit capability, the BIOS must be the answer... assuming the diagnostics posted above don't show something I've missed.
  14. @Ashman70 The unRAID box is connected to an eight-port Netgear GB switch. Both my desktop and a Mac laptop suffer the same issue when connected directly to this switch. I've tried Kodi and VLC on both. The drives are split between the on-board SATA and a Supermicro AOC-SASLP-MV8 card. The card was the same one used in my previous build, but I have also tried swapping it out for a different one of the same type.
  15. As requested - here's the diagnostic dump. I can't see anything that indicates an issue.... Any opinions appreciated. tower-diagnostics-20151002-2354.zip
  16. @RobJ Thanks - I'll get them out ASAP - just need to upgrade back to v6 tonight.
  17. I used to run unRAID Plus v5 on a 780G AMD board with 2GB RAM and a single Supermicro AOC-SASLP-MV8 board. Everything ran fine and the setup was perfectly able to serve BR rips across the network without buffering issues. A recent gift of an MSI Z87-G45 motherboard complete with i5 CPU and 8GB RAM, combined with the demise of the current board, saw me migrating the system, including upgrading to v6, across into a new chassis. Since then, we have had nothing but trouble with video buffering issues.

Initial issues with the on-board KillerNIC not working at all were resolved by disabling it in the BIOS and adding a dual-port Intel Gb card. On the unRAID dashboard the card never gave speeds over 100Mbps. The local switch also showed the network speed running at 100Mbps. My initial thought was that I had used the wrong cable between the switch and the server when connecting it, so I went through and ensured that every cable was Cat5e rated or higher at all points. The dashboard and switch still reported 100Mbps and the buffering still happened. Multiple restarts with cables in various states of connection did nothing to change the speeds. Assuming that the dual-port NIC was the cause of the issue, I replaced the card with an Intel 82573L PCI-E x1 card. Both switch and dashboard now happily report a 1Gbps connection, but the buffering still occurs on playback.

I'm running Plex, Maria and SB/CP/SAB as Docker containers, but turning these off makes no difference. CPU, network, IO and memory are all running low - there are no periods of sustained activity. The only other plugin I am running is the Docker container search tool - everything else is default. Buffering occurs on both of our Kodi (15.2) boxes, OpenELEC (Kodi 14), VLC on Win10 & OSX 10.9, as well as on a Roku 3 using the Plex client.

In desperation, last night I rolled back to v5 and the same thing still occurs serving files from that version as well. The remainder of the network hasn't changed, and the end-point devices are still the same hardware that used to play these files without issue. While the Kodi boxes have been updated from 14 to 15.2 at various points, the fact that buffering occurs on VLC as well indicates to me that Kodi isn't the problem. As far as I can tell, it's not the network - the same buffering occurs no matter where I connect a playback device.

unRAID was up to date (v6.1.3). No disks are showing errors on the dashboard, although I did have a 'Warning [TOWER] - offline uncorrectable is 1' message yesterday against my cache drive. I've not got any logs available - I'm posting this from work - but I can get them tonight if required. The family are getting VERY upset that things are unavailable/unwatchable. Does anyone have any suggestions about other things I can try?
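For completeness, these are the quick console checks I can use to confirm what the NIC actually negotiated; the interface name is a guess and may be eth0, eth1 or br0 depending on the network setup:

```bash
# Show the negotiated speed/duplex and auto-negotiation state for the NIC.
ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'

# The Intel driver also logs the link speed when the interface comes up.
dmesg | grep -i -E 'e1000|link is up'
```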
  18. unRAID 5.0.6 Plus license on Lexar Firefly USB stick. MSI P7N SLI Platinum MB. Supermicro AOC-SASLP-MV8 SAS card.

As we are approaching the capacity of our current unRAID server I decided that it was time to increase both physical capacity (bigger case for more drives) and CPU. So I managed to get hold of a HAF Stacker case and a pre-populated MSI P7N SLI Platinum MB. The current working setup is a Gigabyte GA-MA78GPM-DS2H running an AMD AM2+ processor and a single SuperMicro SAS card (AOC-SASLP-MV8) running 6 disks on a Plus license.

The new MSI board boots fine from the USB key when the SAS card isn't installed. As soon as the SAS card is in place, all I get is a black screen with a single blinking "_" cursor. I don't even get an error requesting that I add a suitable boot drive… just a blinking cursor. It makes no difference how many drives are connected, either to the SAS card or the MB. As soon as the SAS card is plugged in, booting fails. Other versions of the same SAS card do the same thing, and the card is working - the correct HDDs show up in the card BIOS. I'm at work, but I think the SAS card is still firmware version .15 - I've never updated it.

I've been through the Lime-Tech forums as well as various other Linux forums trying to solve this issue without any success. I'm hoping that someone here might have other suggestions to help me solve this. I've tried the following - all result in the same blinking cursor:

  • The correct (named) USB device shows up in the BIOS and is set to be the primary boot device.
  • USB device set to be the ONLY boot device - disabled all other drives.
  • Tried telling the system NOT to boot from any other device.
  • F11 boot menu & selecting the USB stick from the options.
  • The USB stick boots fine in an alternative PC. Checksum passes with no errors on a Windows machine. Make-bootable re-done and it still boots without the SAS card, or in another machine.
  • A fresh install of unRAID 5.0.6 on another (known to work) USB key gives the same results.
  • SAS card has the 'INT13' option disabled (it didn't previously on my old Gigabyte board, but it does now). RAID mode/option is set to JBOD in the card BIOS.
  • SAS card tried in both PCI-e x8 slots.
  • The HDDs attached via the SAS card all show up in the MB BIOS - I have also tried booting with them listed (but low priority) and with them disabled via the MB BIOS boot list.
  • Tried limiting the visible/usable devices in the boot list by disabling them via the MB BIOS. Turned off IDE & e-SATA drives - even though nothing is connected. I've also disabled on-board audio and 1394 ports.
  • If I use a HDD that has a working OS on it (OpenELEC), I can boot. That is true whether the HDD is connected via the MB or the SAS card.
  • I've also reset the MB BIOS back to the defaults, just in case I'd done something stupid in the early hours while trying to solve this.

Going quietly insane here; every suggestion I can find on the forum seems to indicate that the issue either lies with the SAS card's 'INT13' boot hijack or that the USB stick itself isn't bootable, but nothing suggested has worked for me. I'm starting to suspect that the MSI board just doesn't want to play nice with the SAS card….. Any other suggestions before I try to offload the MSI board and have to explain to the finance committee why I needed to spend stupid cash on a new motherboard, RAM & CPU….
  19. burtjr I'd tried running both with earlier versions of simpleFeatures and with it completely removed. The issue was caused by a file system error on the actual flash drive that I fixed by running Windows CheckDisk. Regards.
  20. Ok, so a few additional issues started to pop up, the biggest being that I was unable to write to the flash disk when it was removed and put into an Ubuntu desktop. After wasting some time doing the checks against the disk file systems with reiserfsck --check /dev/md* and getting no errors, I realised that the issue was with the FLASH drive, not the disks..... Other posts suggested removing the flash drive and running checkdisk under Windows to fix errors. This I did... and suddenly after a reboot everything now works.
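For anyone finding this later, the per-disk check mentioned above was along these lines, run from the console with the array started in maintenance mode (the md device numbers correspond to your own disk assignments):

```bash
# Read-only reiserfsck pass over each array data disk. Adjust the device
# list to match your own disk slots; --check does not modify anything and
# reiserfsck will prompt for confirmation before it starts.
for dev in /dev/md1 /dev/md2 /dev/md3; do
  echo "== Checking $dev =="
  reiserfsck --check "$dev"
done
```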
  21. I was running 5.0-rc10a on a small HP desktop, as well as running several simpleFeatures (v1.0.5) packages for stats. I was able to access the shares by both AFP (Time Machine) and by SMB (media folders) on various machines. After upgrading to 5.0-rc12a and installing new versions of simpleFeatures (v1.0.11) I can no longer access SMB shares from either my Mac or from my Linux box. I've done the following things without any success in fixing the issue:

  • Cleared the OSX keychain. Rebooted the Mac.
  • Changed permissions on the shares to be 'public'. Rebooted.
  • Removed simpleFeatures 1.0.11, rebooted, installed simpleFeatures 1.0.5, rebooted.
  • Removed simpleFeatures completely & rebooted.
  • Rolled back to 5.0-rc10a & rebooted.
  • Re-installed 5.0-rc12a & rebooted.

In all cases I can still use AFP/Time Machine to back up my Mac, but attempts to access SMB shares all fail, even though I can see 'Tower2-SMB' listed when I browse the network. In OSX I get a 'connection failed' and a pop-up telling me to contact my system administrator (see attached). Under Linux I get an 'Unable to mount location. Failed to retrieve share list from server' pop-up (see attached). In both scenarios I do NOT get asked for credentials. The syslog on the Tower doesn't tell me anything... I occasionally get "Tower2 avahi-daemon[9078]: Invalid response packet from host 192.168.0.149." messages, but nothing I can tie back to my attempts to log in to the shares.

Hardware is an old HP desktop with 3 SATA HDDs (1x 2TB parity, 1x 1.5TB & 1x 500GB for data), simple on-board graphics, 2GB RAM, and a 10/100/1000 network card. Any suggestions or ideas?
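In case it helps with diagnosis, this is the sort of direct SMB probe I can run from the Linux box (and testparm on the Tower itself); the share and user names below are placeholders:

```bash
# From the Linux client: ask the server for its share list without credentials.
smbclient -L //Tower2 -N

# Try an explicit connection to one share (replace 'media' and 'someuser').
smbclient //Tower2/media -U someuser

# On the Tower itself: sanity-check the generated Samba config and confirm
# the smbd daemon is actually running.
testparm -s
ps aux | grep [s]mbd
```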