
USB issue, unraid 6.6.6


sturgismike


I just had a weird issue with my Unraid server. All screens went white (though they still worked), and on checking the logs there were a ton of "can't find directory" errors regarding sda1 (the flash drive boot partition). I attempted a restart, and it failed. No monitor was available at the moment, so I couldn't see the console. After shutting down and moving the drive to a different USB port, everything seems to be working great again. Boot-up was fine, with no errors and no problems that I can see.

 

My suspicion, of course, is that that specific USB port has decided to blow chunks and die. I fear that others may kick the bucket soon, so my question is: would it be reasonable to buy a USB card so that I can stop using the onboard USB ports, on the assumption that it will solve the issue, or should I worry that a failed port might indicate a motherboard that is thinking about failing?

 

Choice 2 (the motherboard going bad) is the one that worries me. My finances are extremely tight, yet I want a decent number of cores to play with. (Currently I'm on an HP Z800 with dual 6-core CPUs, for a total of 24 threads. Not the fastest, but it has worked really well for me.) Could anyone recommend an affordable motherboard/CPU combo that would be comparable or better? By affordable I mean 500 or less including memory (unless I can transfer my Z800 memory over).

 

I will probably take the plunge and just get a PCI (or PCIe, not sure) USB card and cross my fingers, unless you all think it's likely that a failure is imminent.

 

I suspect that these messages are a bad sign..

   Dec 27 16:07:30 Tower kernel: usb 7-2: reset low-speed USB device number 2 using uhci_hcd
   Dec 27 16:08:22 Tower kernel: usb 7-2: reset low-speed USB device number 2 using uhci_hcd

 

Device 2 is where my UPS is hooked up.
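
For anyone wanting to gauge how often resets like these occur, here is a rough sketch that tallies them per USB port. It assumes the log lives at /var/log/syslog and that the messages look like the two lines above; the function name `count_usb_resets` is just something made up for illustration.

```shell
#!/bin/bash
# Tally "reset ... USB device" kernel messages per USB bus-port ID.
# Assumption: the log path defaults to /var/log/syslog; pass another path as $1.
count_usb_resets() {
    local logfile="${1:-/var/log/syslog}"
    # Matching lines look like:
    #   ... kernel: usb 7-2: reset low-speed USB device number 2 using uhci_hcd
    grep -E 'kernel: usb [0-9.-]+: reset .* USB device' "$logfile" \
        | sed -E 's/.*kernel: usb ([0-9.-]+): reset.*/\1/' \
        | sort | uniq -c
}
```

Running `count_usb_resets` with no argument reads the default log; each output line is a count followed by the port ID (for example, the 7-2 seen above).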

Link to comment

No USB 3 ports on this system; it's old. But that does tell me that if I buy an add-on card, I should go with USB 2.

 

I was also running a couple of rPis off USB, as well as having a keyboard and mouse hooked up that never get used, so I have removed everything except the flash drive and the UPS.

 

Stupid question: since I was getting the USB resets for the UPS, I redirected the messages to syslog (changing the setting to /usr/bin/logger).

That works well and keeps me from seeing the errors that I know are there. Is it possible things went haywire because of log overload? After things went wonky I tried a reboot (using Unraid) and couldn't boot, but things worked again after a full shutdown and then a reboot.

 

I've temporarily redirected those messages to /dev/null instead, so log overload shouldn't be an issue. (I'm hoping I did it right: I set it to go to cat > /dev/null rather than /usr/bin/logger.)

 

As long as I'm here, is there a way to make the change stick between boots? I'm not sure how to edit files on the actual flash so that changes persist.

Link to comment

The OS isn't actually on flash like you might be thinking. The OS is in archives on flash, and those are unpacked fresh at each boot into RAM (sort of a clean install), and the OS runs completely in RAM. So there aren't any OS files you can edit on flash. That also means that all of the usual Linux OS folders are in RAM too, so you shouldn't store things there. Only /boot (the flash) and /mnt (the disks and user shares) are actual persistent storage and have any real capacity.

 

To get something to 'stick' between boots, you have to actually reapply it at each boot. The recommended way to do this is with the User Scripts plugin, which will let you run a script. It has options for when the script runs, including at boot time.
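
As a rough sketch of what such a boot-time script could do, the function below copies a saved file from persistent storage over the freshly unpacked copy in RAM. The function name `restore_from_flash` and both paths in the usage line are hypothetical placeholders, not real Unraid files; substitute whatever file actually holds your setting.

```shell
#!/bin/bash
# Re-apply a customised file at boot by copying a saved copy from
# persistent storage (e.g. somewhere under /boot) over the RAM copy.
# The function name and any paths passed to it are illustrative assumptions.
restore_from_flash() {
    local saved="$1"   # copy kept on persistent storage
    local target="$2"  # location in the RAM filesystem
    if [ ! -f "$saved" ]; then
        # Nothing saved yet: leave the stock file alone and report it.
        echo "no saved copy at $saved, leaving $target untouched" >&2
        return 1
    fi
    cp -f "$saved" "$target"
}
```

A call might look like `restore_from_flash /boot/config/custom/my-setting.conf /etc/my-setting.conf`, with both paths being made-up examples.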

 

Of course, ideally, losing communication with the UPS isn't something to be /dev/null'ed. It is something that needs to be fixed. Fixing the connection with the flash is much more critical though.

Link to comment

I had looked up possibilities for the UPS fix. Basically, what I read said that you could blacklist the hid (OK, so I don't remember exactly which driver it was at the moment), forcing things to run at USB 1.1, and that would get things working. Unfortunately, the info I found referred to a module, and from what I understand that module is now part of the kernel, so the instructions I found to blacklist/disable the module no longer apply. I do have some 1.1 ports, but it doesn't seem to matter whether I plug into those or a 2.0 port. Functionally, the UPS mostly works fine (I've tested it several times to make sure). Setting the UPS shutdown trigger to 30 seconds DOES create false triggers. Since the port reset never takes much more than 30 seconds, I have it set to 60 and have had no issues since, other than the incredibly annoying message that I don't know how to get rid of.

 

As for the flash, this is the first time I've had an issue with it. The system has pretty much been a rock since I got it going. The reason I wonder about the way I was redirecting to syslog is that it is a new thing I've been doing, and the system had been up since the 6.6.6 upgrade. Possibly long enough to cause an issue?

 

Honestly, I'd LOVE to figure out how to fix the ups issue, but I just haven't been able to find an answer that I can make work.  

 

I'll see if I can track down the answer I found (the one I can't make work) and post it here on the off chance someone has an idea.

Link to comment

Looks like my UPS is on 0000:00:1d.1, and I can unbind it, but as expected, unbinding it makes the UPS stop showing up at all. Rebinding brings it back, and it starts doing intermittent resets again. So, I'm stumped. As for redirecting the error messages, while they don't show up in the Unraid interface any more, they still show up in syslog, so if errors every 30 seconds or so are a bad thing, I do need to find a solution. Worst case, I guess I can hook the UPS to a different machine and use that to control shutdown. I did just try this using a Raspberry Pi, but ended up with the same resets going on. So now I'm wondering if I have a bad UPS. (I've tried multiple cables, so I'm pretty sure it's not a cabling issue.)

Guess I'll hook it up to my Windows box and see what happens.

 

Thanks for all the input.  I do think I will start saving for a replacement server.   I'd like better performance anyway, so throwing money at a system that may be going flaky is probably not the best idea.   

 

Edit:

Managed to unbind using this command..

root@Tower:/sys/bus/pci/drivers/ehci-pci# echo -n "0000:00:1d.7" > unbind

The USB still resets, but it has gone from one reset every 20 to 30 seconds to around a minute or more between resets.

Still no clue how to actually FIX the issue.   Any suggestions welcome.  

0000:00:1d.7 is the controller in question. The flash drive is on 0000:00:1a.7, so it appears unaffected.
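
For anyone trying to work out which controller a given USB device sits behind, here is a minimal sketch that reads it from sysfs. It assumes the usual Linux layout where /sys/bus/usb/devices/<bus-port> is a symlink into the PCI device tree; the function name `usb_controller_of` and the `SYSFS_USB` override are made up for this example.

```shell
#!/bin/bash
# Print the PCI address of the host controller behind a USB device.
# $1 is a USB bus-port ID as shown in the kernel log, e.g. 7-2.
# SYSFS_USB is a hypothetical override for testing; the default is the
# standard sysfs location.
usb_controller_of() {
    local dev="$1"
    local base="${SYSFS_USB:-/sys/bus/usb/devices}"
    # The symlink typically resolves to something like
    #   /sys/devices/pci0000:00/0000:00:1d.1/usb7/7-2
    # so take the last PCI-style address found in the resolved path.
    readlink -f "$base/$dev" \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | tail -n 1
}
```

`usb_controller_of 7-2` should print an address in the same form as the ones above. Writing that address to the `bind` file in the same driver directory is the counterpart to the `unbind` command shown earlier.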

Link to comment

Archived

This topic is now archived and is closed to further replies.
