Sparkum Posted February 9, 2017
Really hoping someone can tell me what to do. Every time I want to reboot my unRAID server, and this has been the case since day one, it turns into an hour-plus job of rebooting, finding disks unassigned, rebooting again, and so on until they are all back up. For example, I went to upgrade unRAID last night; it took me 90 minutes and the update actually failed, so I had to roll back. I would say I rebooted 15+ times. It's definitely slowly making me hate unRAID when a simple reboot is actually 5+ reboots and typically means typing "shutdown -r now" into a terminal at least once. Please! Let me know what you need from me or want me to do and I will do it! Thanks
JonathanM Posted February 9, 2017
Diagnostics zip file from both unsuccessful and successful boots. Hardware list: MB, HBAs, drives, PSU, etc.
JorgeB Posted February 9, 2017
If you're using 6.3.0, update to 6.3.1, as there's a bug with slot assignments not sticking.
Sparkum Posted February 9, 2017 Author
Currently on 6.2.4. I TRIED to upgrade last night, but I couldn't get the GUI even once, so I downgraded back.
Sparkum Posted February 9, 2017 Author
Attached diagnostics. Sorry, I'm not sure how it works; I assume it just clumps it all together. Essentially every reboot yesterday was a fail until the last one. And when the computer reboots I do see every single hard drive; they are just sitting in unassigned. I can definitely get more specific after work.
Hardware: ASRock mobo, i5, LSI flashed card, 16GB RAM, 500W power supply, 11 hard drives plus another 2 parity and 2 cache drives. Mainly all 2TB, I think 3 are 1.5TB, then 240GB Kingstons for cache, and a 4GB Lexar flash drive.
Attachment: tower-diagnostics-20170209-1139.zip
John_M Posted February 9, 2017
Your SAS card seems to be resetting repeatedly. I'm sure others can give you more advice but I'd check the cables between it and the drives and make sure it's seated in its slot properly.
trurl Posted February 9, 2017
And in general, rebooting is seldom the solution for anything with unRAID. Basically, what you have been describing is a hardware problem that you need to address.
Sparkum Posted February 9, 2017 Author
Really? Even though every single hard drive is still connected and visible? I see them all populated under unassigned, and I can see them when I drop down the hard drive list. They are completely visible during boot, in the BIOS, etc., just placed incorrectly in the GUI.
John_M Posted February 9, 2017
Really! Take a look at your syslog.
Sparkum Posted February 9, 2017 Author
Hmm, the log doesn't even cover the timeframe of the reboots, if I'm not mistaken.
John_M Posted February 9, 2017
With respect, sir, you are the one who posted the log and asked for advice.
Sparkum Posted February 9, 2017 Author
..... Yep, no, got that. I'm asking questions, making comments, initiating two-way conversation. Not sure how it's being taken, but apparently not how it's meant. My last comment was stating that I didn't provide, nor do I apparently have, any logs covering the time frame asked for, if that wasn't clear. My previous comment was me being surprised that trurl said it was a hardware fault, since I could see the hard drives pre-unRAID. So ya, apparently this isn't going anywhere and I shouldn't bother asking for help. .... With respect.
trurl Posted February 9, 2017
Logs do not cover reboots obviously. You already posted a log in your diagnostics. What do you think unRAID is doing differently when you reboot? The correct answer is nothing, since rebooting does not reconfigure anything about the software. That's why I said rebooting is seldom the solution. But rebooting does have an effect on your hardware and sometimes you get lucky and it works. That's why I said it was a hardware problem.
Sparkum Posted February 9, 2017 Author
Ya, I see that about the logs now. Next time it happens I'll start grabbing logs (if they are of any help). Typically it's just this panic trying to get it back up. Here's my question though, and I fully get what you are saying. This has been happening for roughly 4 months now, so I've done a lot of googling since then. I read a post that suggested rebooting with IE. Now when I do this, I have a much greater chance of everything coming back up. Just a dumb coincidence perhaps? Additionally, I've used the ControlR app to reboot the server (just once) and that worked flawlessly, so I seem to get much different results depending on what I use.
trurl Posted February 9, 2017
There have been some reports that some settings in the webUI are not saved correctly by some browsers. The solution has been to use a different browser to save the settings. Once the setting is saved it should work from then on until you change it. Also, the webUI doesn't always work well with adblockers, so if you are using one, whitelist your server.
Sparkum Posted February 9, 2017 Author
Definitely use an adblocker on Firefox (browser of choice). I typically try to remember to reboot with IE though (as I have a much higher success rate with it). Could any of this be due to my USB? I'd definitely say my USB might not be the best. Just a Lexar I had lying around; it worked, so I continued on with it. But if not, then I won't worry about it.
HellDiverUK Posted February 9, 2017
AsRock. Friends don't let friends buy AsRock boards.
Sparkum Posted February 9, 2017 Author
Haha, so far (minus this, if this is mobo related) I'm a fan of it. Maybe I'll start paying more attention to which drives drop out, and see if it's an "always drive 3, 5, 6, 9" kind of thing. I have LSI and mobo connections, and the 8 or so times I rebooted yesterday I saw the LSI card come up on the splash screen every time. So I'll def start pen-and-papering this, and ya, maybe it's the mobo and I just need a second LSI card. That would be the best $100 I've ever spent, because I cringe when I have to do a reboot, and I literally set aside hours to do it. I was up until almost 2 am last night and had to be up for 6:30 because I needed to get it online before work.
John_M Posted February 11, 2017
Did you make any progress?
Sparkum said: "Really hoping someone can tell me what to do, Please! Let me know what you need from me or want me to do and I will do it!"
John_M said: "Your SAS card seems to be resetting repeatedly. I'm sure others can give you more advice but I'd check the cables between it and the drives and make sure it's seated in its slot properly."
syslog:
Feb 9 02:56:58 Tower kernel: mpt2sas_cm0: fault_state(0x265d)!
Feb 9 02:56:58 Tower kernel: mpt2sas_cm0: sending diag reset !!
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
Feb 9 02:57:00 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: log_info(0x30030100): originator(IOP), code(0x03), sub_code(0x0100)
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: log_info(0x30030100): originator(IOP), code(0x03), sub_code(0x0100)
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x03), BiosVersion(07.01.09.00)
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: Protocol=(
Feb 9 02:57:00 Tower kernel: Initiator,Target
Feb 9 02:57:00 Tower kernel: ), Capabilities=(
Feb 9 02:57:00 Tower kernel: Raid,TLR
Feb 9 02:57:00 Tower kernel: ,EEDP,Snapshot Buffer
Feb 9 02:57:00 Tower kernel: ,Diag Trace Buffer,Task Set Full
Feb 9 02:57:00 Tower kernel: ,NCQ<6>)
Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: sending port enable !!
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: port enable: SUCCESS
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for end-devices: start
Feb 9 02:57:07 Tower kernel: scsi target1:0:1: handle(0x0009), sas_addr(0x4433221100000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:1: enclosure logical id(0x5782bcb00a076a00), slot(7)
Feb 9 02:57:07 Tower kernel: scsi target1:0:2: handle(0x000a), sas_addr(0x4433221103000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:2: enclosure logical id(0x5782bcb00a076a00), slot(4)
Feb 9 02:57:07 Tower kernel: scsi target1:0:4: handle(0x000b), sas_addr(0x4433221101000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:4: enclosure logical id(0x5782bcb00a076a00), slot(6)
Feb 9 02:57:07 Tower kernel: handle changed from(0x000c)!!!
Feb 9 02:57:07 Tower kernel: scsi target1:0:5: handle(0x000c), sas_addr(0x4433221104000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:5: enclosure logical id(0x5782bcb00a076a00), slot(3)
Feb 9 02:57:07 Tower kernel: handle changed from(0x000d)!!!
Feb 9 02:57:07 Tower kernel: scsi target1:0:6: handle(0x000d), sas_addr(0x4433221105000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:6: enclosure logical id(0x5782bcb00a076a00), slot(2)
Feb 9 02:57:07 Tower kernel: handle changed from(0x000e)!!!
Feb 9 02:57:07 Tower kernel: scsi target1:0:7: handle(0x000e), sas_addr(0x4433221106000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:7: enclosure logical id(0x5782bcb00a076a00), slot(1)
Feb 9 02:57:07 Tower kernel: handle changed from(0x000f)!!!
Feb 9 02:57:07 Tower kernel: scsi target1:0:0: handle(0x000f), sas_addr(0x4433221107000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:0: enclosure logical id(0x5782bcb00a076a00), slot(0)
Feb 9 02:57:07 Tower kernel: handle changed from(0x0010)!!!
Feb 9 02:57:07 Tower kernel: scsi target1:0:3: handle(0x0010), sas_addr(0x4433221102000000)
Feb 9 02:57:07 Tower kernel: scsi target1:0:3: enclosure logical id(0x5782bcb00a076a00), slot(5)
Feb 9 02:57:07 Tower kernel: handle changed from(0x000b)!!!
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for end-devices: complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for raid volumes: start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for responding raid volumes: complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for expanders: start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: search for expanders: complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: _base_fault_reset_work: hard reset: success
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: removing unresponding devices: start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: removing unresponding devices: end-devices
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: removing unresponding devices: volumes
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: removing unresponding devices: expanders
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: removing unresponding devices: complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: expanders start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: expanders complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: phys disk start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: break from phys disk scan: ioc_status(0x0022), loginfo(0x00000000)
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: phys disk complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: volumes start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: break from volume scan: ioc_status(0x0022), loginfo(0x00000000)
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: volumes complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: end devices start
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: end devices complete
Feb 9 02:57:07 Tower kernel: mpt2sas_cm0: scan devices: complete
Feb 9 02:57:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Feb 9 02:57:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Feb 9 02:57:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Feb 9 02:57:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Feb 9 02:57:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
repeated 398 times in 7 hours. This isn't something that rebooting will fix. Also check backplane, if you have one, and power to drives.
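For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that counts controller faults and successful diag resets. The two regular expressions are modelled only on the mpt2sas lines quoted in this thread; the function name `summarize_resets` is made up for illustration, and other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the mpt2sas lines quoted in this thread (an assumption:
# other driver versions may format these messages differently).
FAULT = re.compile(r"mpt2sas_cm\d+: fault_state\((0x[0-9a-fA-F]+)\)")
RESET_OK = re.compile(r"mpt2sas_cm\d+: diag reset: SUCCESS")


def summarize_resets(lines):
    """Return (faults per fault code, number of successful diag resets)."""
    faults = Counter()
    resets = 0
    for line in lines:
        match = FAULT.search(line)
        if match:
            faults[match.group(1)] += 1
        elif RESET_OK.search(line):
            resets += 1
    return faults, resets


if __name__ == "__main__":
    # Three lines copied from the syslog excerpt in this thread.
    sample = [
        "Feb 9 02:56:58 Tower kernel: mpt2sas_cm0: fault_state(0x265d)!",
        "Feb 9 02:56:58 Tower kernel: mpt2sas_cm0: sending diag reset !!",
        "Feb 9 02:57:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS",
    ]
    faults, resets = summarize_resets(sample)
    print(dict(faults), resets)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize_resets(open(path_to_syslog))`, where `path_to_syslog` is wherever you extracted the syslog from your diagnostics zip. A fault count in the hundreds over a few hours would match what is described here.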
Sparkum Posted February 13, 2017 Author
Hey. So I just reseated the SAS card, even put it into a different slot, and checked all wires/cords. Wrote down the serials of all drives connected to the SAS card as well as those connected to the mobo. Currently in my reboot loop trying to get all drives to come back up, and every time it is mobo drives that aren't coming up. Not once has it been a SAS card drive, so since I have a second slot for another SAS card, that might just be my easiest solution (really dumb though). I dunno, I'm also maybe thinking power? Might be time to go to a bigger size.
EDIT: Spoke too soon, the latest reboot had 2 SAS card drives missing.
John_M Posted February 13, 2017
You won't fix this by rebooting.
Sparkum Posted February 14, 2017 Author
Yes, I understand that. It does however give me a short-term fix (until I reboot again); otherwise I just have a heavy paperweight. I was just stating my findings and my guesses on the problem. I stated that I not only reseated my SAS card but also put it into another slot, that all cords were checked and double-checked, and that my power supply "may" be underpowered. Additionally, both mobo and SAS card drives were dropping off.
itimpi Posted February 14, 2017
I would have thought that the most likely thing to cause the symptoms you are having is a power supply that is under-rated and thus not capable of handling the max current when the system tries to spin up all drives simultaneously, as is normal at power on. Reboots work because at that point in time some of the drives are probably still spinning, so the total current required is less. What power supply do you have, and how many drives are in the system?
SSD Posted February 14, 2017
Reasonable question. I've seen stranger things caused by a flaky or underpowered PSU (reported on the forum, not personally). I once had power issues and had to power on twice in rapid succession to get all drives spinning. Once all drives were spun up, everything was good. This doesn't feel like that, but I can't rule it out.
I wanted to answer the question about the USB. No, this is not related to the USB.
Also, concerning the swipe at ASRock: lots of people here have ASRock MBs (me included) and I don't agree with the sentiment that they are poor quality. But regardless, MBs do go bad and I have had a few go bad over the years. They can definitely cause some funky symptoms. MB failures are unique; sometimes they just die. But when they flake out, these are the kinds of things that can happen. I'd put this relatively high on the list, and subject to being ruled out.
Starting with the obvious, not a single other user is reporting the problems you are reporting. This is something unique to your system and your hardware. It could be an incompatibility, intermittent connectivity (board not fully inserted, cabling loose in the connection chain, etc.), a bad motherboard, bad PSU, bad controller, or bad something else.
I can't tell you what it is, but I can tell you what I would do in your situation. I'd break down the server to its most basic setup. Remove add-on controllers. One drive connected to one motherboard port. On your USB, make a backup of the config folder and then delete the file called super.dat from the real config folder. Boot unRAID. See if it comes up, reboots, and does all the things it is supposed to do. I'd avoid writing to any disks, but would probably assign the disk to the disk1 slot so you could start the "array". Once you've convinced yourself it is working well, power down. If this works fine, repeatedly, it reduces the likelihood of a MB failure. This test is especially important; knowing you are on a solid foundation is critical to the isolation process.
Once you are convinced, add your controller, and connect the one drive there. Repeat. I can't tell you the step-by-step progression, but you want to slowly and methodically add components/drives to the system until you know it is solid, and then move on. I've done this quite a few times with my computers over the years and have had very good luck isolating weird issues. Occasionally, when I was done, everything just worked, which means there was some connectivity issue that I fixed in the process.
It is important that you not be in denial that this is something unique to your system and not some flakiness of unRAID itself. You need to be in that mindset to be an effective investigator and unravel the mystery. Good luck!
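The flash-drive step in the post above (back up the config folder, then delete super.dat) could be scripted roughly as follows. This is only a sketch in Python: the function name and both paths are hypothetical, it assumes you run it on a machine that has Python and can see the flash drive, and it refuses to delete super.dat unless a copy of it landed in the backup first.

```python
import shutil
from pathlib import Path


def backup_config_and_remove_super(flash_root, backup_dir):
    """Copy <flash_root>/config to backup_dir, then delete config/super.dat.

    Returns True if super.dat was found and removed, False if it was absent.
    """
    config = Path(flash_root) / "config"
    backup = Path(backup_dir)
    if not config.is_dir():
        raise FileNotFoundError(f"no config folder at {config}")
    if backup.exists():
        # Never overwrite an older backup by accident.
        raise FileExistsError(f"backup target already exists: {backup}")

    shutil.copytree(config, backup)

    super_dat = config / "super.dat"
    if not super_dat.exists():
        return False
    # Only delete the original once the backup demonstrably holds a copy.
    if not (backup / "super.dat").is_file():
        raise RuntimeError("backup is missing super.dat; nothing was deleted")
    super_dat.unlink()
    return True


# Hypothetical usage; both paths are assumptions and must be adjusted:
# backup_config_and_remove_super("/path/to/flash", "/path/to/config-backup")
```

Restoring the old drive assignments afterwards is then a matter of copying super.dat back from the backup folder into config.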
Sparkum Posted February 14, 2017 Author
Hey. Thanks all. Yes, I am definitely on board with the idea that it's me. (Now, I definitely wasn't when I first got here.) I'm leaning towards power myself. It's a Corsair 450 Gold (if memory serves). It definitely didn't have as many hard drives when I bought it, but there are currently 14 spinning drives and 2 SSDs connected.
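To put rough numbers on the power theory, here is a back-of-the-envelope sketch in Python. Only the 14 spinning drives and the 450 W figure come from this thread. The 2 A per drive on the 12 V rail during spin-up and the 100 W for everything else are assumptions for illustration, so check the drive datasheets and the PSU label for real values. It also treats the whole PSU rating as available to the drives, which is optimistic.

```python
def spinup_watts(n_drives, amps_per_drive=2.0, volts=12.0):
    """Estimated 12 V power draw if all drives spin up at the same time.

    amps_per_drive is an assumed spin-up current, not a measured value.
    """
    return n_drives * amps_per_drive * volts


def headroom_watts(psu_watts, n_drives, other_load_watts=100.0):
    """PSU rating minus assumed non-drive load minus simultaneous spin-up."""
    return psu_watts - other_load_watts - spinup_watts(n_drives)


if __name__ == "__main__":
    drives = 14        # spinning drives reported in this thread
    psu = 450.0        # PSU rating reported in this thread ("if memory serves")
    print("spin-up draw (W):", spinup_watts(drives))
    print("headroom (W):", headroom_watts(psu, drives))
```

Under these assumed figures the simultaneous spin-up alone comes to 336 W, leaving about 14 W of headroom on a 450 W unit, which is consistent with the suspicion that cold boots are harder on the supply than warm reboots.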