Unable to get a successful boot



Hey all. I've been running Unraid for 1-2 years now. Anyway, I've got some issues.

I'll go through the last few things I did before this happened, in case it's helpful 😕

 

1) A 4 TB drive started clicking. I moved all files off it using the unBALANCE plugin, verified everything was moved, removed the drive, did a new config, good to go.

(a week goes by)

2) Decided, hey, I've got some extra NVMe drives lying around, let's increase my already large cache array from 768 GB to 1 TB.

3) 3 hours later, I'm running to the Unraid forums for help...

 

I had some issues getting my board to boot, not related to Unraid at all (it's picky), but I figured those out.

Then it would finally POST with 5 NVMe drives on PCIe x4 adapters and my 12 or so array drives, but it just 'gets stuck' at loading the syslogd service daemon, which is RIGHT before Unraid asks for a console login (which I never use).

 

Did a quick forum search and found it may be my USB flash drive, so I plugged it into Windows, and it mounted without asking for a disk check or reporting errors...

At this point, I'm not sure what to do...

Within the last 12 months I've already replaced a flash drive, so AFAIK I can't get another one until the 12-month mark? (5/18 if memory serves)

Not sure if I was running 6.6.6 or 6.6.7... the last few weeks have been a blur at work.

As of right now, I've got whatever diagnostics were on the flash drive. Logs should be included in those... I hope!

 

 

Please halp!

 

System: dual Xeon E5-2670 v2 on an ASRock EP2C602-4L/D16

64 GB ECC registered RAM, 1600 MHz @ CL13 or something

9 or so 6 TB Toshiba X300s

2x Samsung 970 EVO 250 GB via PCIe x4

3x Samsung 970 EVO 500 GB via PCIe x4

 

Based on the 'date modified' of the files in unraid/logs, I've uploaded the latest diagnostics .zip that I've got.

Pic of the USB drive (dunno if it helps):

http://prntscr.com/mvkg3f

rebnet-diagnostics-20190213-0218.zip

Edited by tbonedude420
Link to comment

So, at this point I've gotten Unraid to boot into GUI mode: no IP address, bond0 fatal errors, and it can't mount some usb lib/modules folder or something. I think this flash drive might be toast.

Going to email Limetech with a link to this thread and see if there's something else I'm missing, and/or if the USB drive may be the problem. At this point I'm okay with redoing all my dockers, and since I'm changing the cache around anyway, that would probably need to happen. I just don't want to lose the array, and I don't think that I will.

-Grabbed the latest diagnostics package too

rebnet-diagnostics-20190310-1248.zip

 

Can I delete super.dat to force a hardware config reset? Or is there a better way?

If/when it shows up on the network, the webUI becomes unresponsive... (which is why I tried GUI mode)

Not sure what to do here 😕

Side note, but kind of related: any good flash drive brands/Amazon links you recommend? I know 24/7 use is asking a lot of a flash drive, so what are other Unraid users using?

 

EDIT:

 

Cannot find device "bond0"

cat: write error: broken pipe

modprobe: FATAL: Module bonding not found in directory /lib/modules/4.18.20-unRAID
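
For anyone who wants to confirm what that modprobe message is claiming, here is a minimal sketch that looks for a module file under a kernel's module directory. It assumes Python is available on whatever system you run it from (for example a live Linux), and `find_module` is a made-up helper name, not an Unraid tool. It only lists files; it does not load anything.

```python
import os
import platform


def find_module(name, modules_root):
    """Return paths of kernel module files named `name` under modules_root.

    Matches e.g. bonding.ko or bonding.ko.xz. Returns an empty list when the
    directory is missing or empty, which is the situation the modprobe
    "not found in directory" message describes.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(modules_root):
        for filename in files:
            if filename.split(".")[0] == name and ".ko" in filename:
                hits.append(os.path.join(dirpath, filename))
    return hits


if __name__ == "__main__":
    # On the server in this thread the directory would be
    # /lib/modules/4.18.20-unRAID; here we use the running kernel's release.
    root = os.path.join("/lib/modules", platform.release())
    print(root, find_module("bonding", root))
```

An empty list for a directory that should be full of modules points at the module tree itself being absent, rather than at the network hardware.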

 

I took some pics on my phone, will edit and upload links.

 

-pic1 http://prntscr.com/mw44es

-pic2 http://prntscr.com/mw44jj

-pic3 http://prntscr.com/mw44n2

 

If I let it sit long enough, it sometimes asks for a login, but the IPv4 address is a self-assigned link-local one (169.254.105.189), not one from my router.

At least once it grabbed my normal IP from the switch/router, I guess, but the webUI was completely unresponsive.
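
A note on that address: 169.254.0.0/16 is the IPv4 link-local range, which a machine usually assigns to itself when it gets no DHCP lease, so it is a sign the network never came up properly. Below is a small sketch using Python's standard `ipaddress` module to tell the cases apart. `classify_ipv4` is a hypothetical helper, and 192.168.1.50 is just a made-up example of a normal LAN address.

```python
import ipaddress


def classify_ipv4(addr):
    """Roughly classify an IPv4 address as link-local, private, or public."""
    ip = ipaddress.ip_address(addr)
    # Check link-local first: Python also counts 169.254.0.0/16 as "private".
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"


print(classify_ipv4("169.254.105.189"))  # the address from the console
print(classify_ipv4("192.168.1.50"))     # hypothetical router-assigned address
```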

Edited by tbonedude420
Diagnostics/typos
Link to comment

Update:

 

Waiting on an email from Limetech. In the meantime I completely tore down the system, repasted the heatsinks, went through 2 cans of air, did some cable management, and found a possible IRQ error related to having 5x PCIe x4 NVMe adapters, so as I mentioned above, I've dropped back to 3 adapters. That should be more than enough for what I'm doing.

I later popped the USB drive into Windows, verified for a second time that the label is indeed UNRAID, and also checked it for errors. It said it found some; I clicked the fix button and it all seems to be okay? At least as far as Windows is concerned.

 

Obviously I still have my doubts. Will report back if anything updates.

Link to comment

3/12 Update:

Got an email from Limetech, got a new flash drive.

First try: formatted it, labeled it UNRAID, copied the files over, tried to boot. Still several errors (the log did not get written though, not sure why), including "unraid label not found" and bond0 missing.
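
Since the error is about the label, one thing that can be checked from a live Linux is whether the stick actually shows up under the label UNRAID. A minimal sketch, assuming a udev-based distro that populates /dev/disk/by-label; `find_labeled_device` is a made-up helper name, and the label match here is case-sensitive.

```python
from pathlib import Path


def find_labeled_device(label="UNRAID", by_label_dir="/dev/disk/by-label"):
    """Return the device node behind a filesystem label, or None if absent."""
    entry = Path(by_label_dir) / label
    if not entry.exists():
        return None
    # by-label entries are symlinks to the real device node (e.g. a /dev/sdX1).
    return str(entry.resolve())


print(find_labeled_device())
```

If this prints None with the stick plugged in, the label is missing or spelled differently, whatever Windows shows.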

 

Copied the Pro key to my desktop, and now I'm using the Unraid USB creation tool to make a fresh new copy.

 

Will edit this post with any updates, and or logs. 

 

Noon update: no changes... Still no boot... still private IP... still bond0 errors... still unraid label missing...

da heck is going on here?!

As a bit of a last resort, I'm going to bring the server back down to the basement, plug everything in like normal, and pray to the computer gods. This time with a fresh Unraid, to see if it helps.

 

No hardware failures that I can tell. Using the same USB port I can load other OSes (live Linux), and I did a quick test install of Windows on a spare SSD. It all seems to work, including the 4 network ports and the 1 onboard port for the remote BIOS thingie (some ASRock Rack server control panel). Windows also sees both CPUs and all 64 GB, Cinebench runs with no problem, temps are fine... I'm really losing my mind here.

Edited by tbonedude420
Updates,typos
Link to comment

NVMe temps normally hit 50-70 under load, and from what I understand they have an internal controller that will throttle performance. Array drives from what I've seen are 30-40, averaging around the middle; for spinners this should be well within range too. 😕

The BIOS sees all the drives. I'm looking for a Windows app that can read XFS or whatever the default Unraid filesystem is; as long as I can get to the data, I'm not really too worried. It's all replaceable, sure, but time is money, ya know? I do think I've got the newest BIOS, but either way that's something I can check, so thanks for that!

https://www.toshiba-storage.com/products/toshiba-internal-hard-drives-x300/?pdf

5-60 °C for X300 spinners

So if those drive letters match my spinners, then it's something I will address, but I've not noticed it before on array drives, only NVMe.

Edit: Checked the SMART logs: one at 58, one at 60, one at 65... I'm pretty sure those 3 are in the same row, and I know of a fan failure for that row, so that's something I can address. Useful, yes, but again, it's Unraid that isn't booting. Hard drive health comes further down the line. And at this point I need to start thinking about parity drives and larger spinners; I'm running out of SATA/SAS ports.

Edited by tbonedude420
update, link
Link to comment

I do not know if there is a Windows application that can read an XFS-formatted drive, but you can simply download one of the many "live" Linux distros, burn it to a CD or USB drive, and boot your Windows computer from it. It will see the XFS partitions, and then you can copy the data somewhere else, even to your NTFS-formatted disks if you have enough space there.

Your BIOS is 1.80; there is a 1.90 which "improves the system performance"????

And regarding the high temperatures, I am talking about your mechanical hard drives:

For example - the 4TB Seagate (sdb) has the following:

 

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    24 Celsius
Power Cycle Min/Max Temperature:     17/25 Celsius
Lifetime    Min/Max Temperature:     16/66 Celsius

 

Seagate's own specs call for (just below in the SMART report):

 

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        59 minutes
Min/Max recommended Temperature:     14/55 Celsius
Min/Max Temperature Limit:           10/60 Celsius

Temperature History Size (Index):    128 (50)

 

The same goes for the other Seagate and three of the Toshibas (which BTW have a 55 °C recommended/max temperature).
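
To make the comparison explicit, here is a rough sketch that pulls the min/max pairs out of the SCT lines quoted above for the Seagate and checks the lifetime maximum against the drive's own recommended and limit values. The parsing helper `parse_min_max` is made up for illustration and assumes the "label: min/max Celsius" layout shown in the report.

```python
import re

# Lines copied from the SCT section of the SMART report quoted above (sdb).
SCT_TEXT = """\
Lifetime    Min/Max Temperature:     16/66 Celsius
Min/Max recommended Temperature:     14/55 Celsius
Min/Max Temperature Limit:           10/60 Celsius
"""


def parse_min_max(text, field):
    """Return (min, max) in Celsius for an SCT field, ignoring extra spaces."""
    label = r"\s+".join(re.escape(word) for word in field.split())
    match = re.search(label + r":\s+(-?\d+)/(-?\d+)\s+Celsius", text)
    if match is None:
        raise ValueError("field not found: " + field)
    return int(match.group(1)), int(match.group(2))


_, lifetime_max = parse_min_max(SCT_TEXT, "Lifetime Min/Max Temperature")
_, recommended_max = parse_min_max(SCT_TEXT, "Min/Max recommended Temperature")
_, limit_max = parse_min_max(SCT_TEXT, "Min/Max Temperature Limit")

print("above recommended max by", lifetime_max - recommended_max, "C")
print("above hard limit by", lifetime_max - limit_max, "C")
```

So at some point in its life this drive ran 11 °C over what Seagate recommends and 6 °C over the stated limit.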

 

For example, the "new" Toshiba disk (sde, from your latest log) has:

Current Temperature:                    28 Celsius
Power Cycle Min/Max Temperature:     15/29 Celsius
Lifetime    Min/Max Temperature:     13/65 Celsius
Under/Over Temperature Limit Count:   0/2348

 

Toshiba sampling/logging is 1 min and you have 2348 counts of over-temperature, which is at least 39 hours of running at very high temperature in the past.
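
The arithmetic behind that 39-hour figure, as a quick sketch. It assumes each over-limit count corresponds to one 1-minute sampling interval, which is an interpretation of the counter rather than something the report states; `over_temp_hours` is a made-up helper name.

```python
def over_temp_hours(count, sampling_minutes=1.0):
    """Convert an over-temperature sample count into hours."""
    return count * sampling_minutes / 60


# Line copied from the SMART report for sde quoted above.
line = "Under/Over Temperature Limit Count:   0/2348"
under, over = (int(part) for part in line.split(":")[1].strip().split("/"))

hours = over_temp_hours(over)
print(f"{over} samples over limit is about {hours:.1f} hours")
```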

 

Perhaps you had a bad fan or a disconnected cable for the fan....

 

Link to comment
@bcbgboy13

Both 4 TB drives are gone. And I know of an issue with at least one of my fans.

I will do a BIOS update after lunch. But a BIOS update is normally for stability and/or compatibility, security, etc.

Unraid itself is not booting. It's throwing errors left and right: the USB won't mount, it complains about a missing label, and bond0 has 'fatal errors'. Booting live Linux and testing W10 (both using that same USB port, to at least rule that out) shows all hardware working normally. Short of memory/CPU, there's not much that should make Unraid not boot. These are issues, I agree, and thank you for pointing them out, but the main problem of the Unraid OS itself not booting is still the priority here. At this point I can disconnect the entire array and the cache drives... UNRAID STILL WON'T BOOT.....

This thread talks of ways to view the array in Windows. I've done this at least once before, but I believe it was for btrfs, not XFS. Either way, tools do exist.

Link to comment

So it's basically a week later. I've received virtually no help for what is seemingly and obviously an Unraid problem. The hardware works with Windows and other Linuxes, and all my drives are seen in the BIOS.

I've tried all 3 USB ports on this board. I added a USB3 PCIe card and tried from that: same results.

I have tried a new flash drive. I removed all array and cache drives, and still it won't boot Unraid properly.

No one has addressed the actual errors in my log file.

A whopping 2 community members have attempted to come in and offer help, but not for the issues I'm actually having...

I will say, something good came out of wasting $129 on a basically niche product: I've learned Linux, to a fair extent. So much so that I've realized Unraid is basically a fancy web front end on a Linux kernel, with ease of use as the number 1 selling point.

If anyone wants to actually attempt to help, my email will alert me, but other than that it's time to say goodbye. I wish this wasn't the case, because Unraid is so simple and beautiful, but I'm being forced out by lack of support. The part that really made me turn away is simply looking at the support forum. There are tens if not hundreds of other users with very similar, if not the same, issues. 6.6.7 has webUI crashes, hard locks, drives disappearing, and surprise surprise, ZERO responses on their threads. I guess I should feel lucky that at least other community members had the heart to respond, but tough luck for everyone else, right?

At least Unraid isn't Apple. XFS is pretty common, and all of my data is back on the network, currently just using a live distro with Plex baked into it. I do lose out on a few things, at least until I figure out how to recreate them, such as a second array for caching to my main one.

But it just goes to show it's NOT hardware related, and it IS an Unraid-only bug at the moment.

Link to comment

You have a disk (TOSHIBA HDWE160, X743KCAJF56D) that is continually failing to respond (I suspect this means it has failed), hanging the system for a short period every time this happens. Removing it might well allow system startup to finish starting services as expected.

The title of the thread is slightly misleading. The system is actually booting fine. What is failing is bringing all the disks online so that the services can start.

Link to comment
@itimpi

So how come the BIOS, Windows, and Linux live disks all see them? Regardless, why would a disk failure cause Unraid to not boot and cause errors that halt the rest of the boot process?

bond0 missing: this is network and has nothing to do with a missing or failed array drive.

cannot mount usb/lib/modules/unraidxxxx

unraid label missing, whatever that even means

private IP address: again, network related, so how does a drive interfere?

Link to comment

JPEG_20190314_000242.jpg

After 5 minutes or so

JPEG_20190314_000906.jpg

bond0 fatal error. Private IP address. Unraid not accessible over the network (which is the way it's intended to be used), so I'm not sure how my title is misleading.

If this were a remote scenario in a commercial space or data center, this would be called 'a non-booting system'.

EDIT: These 2 pics were taken after removing the aforementioned drive (TOSHIBA HDWE160, X743KCAJF56D), and on the new flash drive (as seen by it asking for a Tower login instead of Rebnet).

 

Edited by tbonedude420
Link to comment
@tbonedude420

That particular symptom is most commonly caused by the flash drive not being seen for the second stage of the booting process.

 

A quick way to check is to log in at the console prompt and then use the ‘df’ command. If I am right then you will not see anything mounted as /boot (which is where the flash drive gets mounted). If the flash drive does not get mounted, then all network-related modules and your configuration information fail to load, which gives symptoms like the ones your screenshots show.

Are you using a USB2 flash drive and/or a USB2 port on the Unraid server? What model of flash drive? Just asking as USB2 seems to be much more reliable than USB3 on some systems, particularly during the booting phases, and Unraid gains no performance advantage from USB3 as it runs from RAM.
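
For reference, the same /boot check can be sketched in code by reading the kernel's mount table instead of eyeballing the ‘df’ output. This is only an illustration: it assumes a Linux system with /proc/mounts and a Python interpreter at hand, the helper names are made up, and the sample device names below are hypothetical. On the server console, running ‘df’ by hand is the simpler route.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point appears as a mount target in the given text.

    mounts_text uses the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def boot_is_mounted(mounts_path="/proc/mounts"):
    """Check the live mount table for /boot (where the flash gets mounted)."""
    with open(mounts_path) as handle:
        return is_mounted("/boot", handle.read())


# Hypothetical mount tables: one with the flash mounted, one without.
HEALTHY = "rootfs / rootfs rw 0 0\n/dev/sda1 /boot vfat rw 0 0\n"
BROKEN = "rootfs / rootfs rw 0 0\ntmpfs /run tmpfs rw 0 0\n"

print(is_mounted("/boot", HEALTHY))
print(is_mounted("/boot", BROKEN))
```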

 

I had originally assumed that since you had successfully obtained diagnostics files you were successfully booting (at least that was what the diagnostics files showed).

Link to comment

The first flash drive is an 8 GB Lexar, USB2. The new flash drive is a 16 GB SanDisk Cruzer, also USB2.

The server has 3 USB2 ports: 2 on the back, one inside the case directly on the mobo.

I've tried all 3 ports. I've tried both the old and the brand-new, out-of-the-package flash drives.

I added a 2-port USB3 card via PCIe. Tried both those ports with both drives.

Moreover, the system was running for months on end without errors... The brand-new flash drive, with a brand-new Unraid from the Unraid creation tool, gives the exact same error.

I've attempted to rule out ports in various ways as described above, as well as by loading live Linux and installing Windows 10.

 

At work for the next few hours, but I will gladly take pics of both drives, including any other info that may be helpful. 

Link to comment

Have you tried the test of running the ‘df’ command to see if you have the /boot mount point? If not, then the issue is almost certainly USB or flash drive related. However, it is not at all clear why, as you seem to have taken all the regular steps for resolving this type of issue.

Have you by any chance tried creating the flash drive using a different machine? It is always possible the problem is at the creation end, and that is something it may not have occurred to you to try!

Link to comment

I can appreciate the response, Johnnie.Black, but it seems a bit of a cop-out. The system has been working solidly for the last 10 months, without issue, on the same flash drive... and all of a sudden one day it stops working?

A brand new flash drive, from a different company, same results?

@itimpi why would another system make a difference? This is the same system (my gaming rig) I used to make the drive the first time. Let alone the different brands of flash drive.

Also, as I stated in previous posts, I've tried both manual and tool creation for the USBs...

I can rule out hardware issues: as I've said, I installed W10 on a spare SSD with the array and cache removed. I also booted live Linux Mint. All 4 network ports and the BMC port work on both. I even used the same (onboard) USB port to install/run both...

At this point, the only relevant factors I can come up with revolve around the 6.6.7 update.

Moreover, after the update I've had several of the same issues hundreds of others are having: webUI frozen and unresponsive.

 

SanDisk (brand new, from Best Buy), Cruzer model?

http://prntscr.com/mxzvii

Lexar (older one, possibly bad), V10 model?

http://prntscr.com/mxzwaj

 

At this point, without Unraid booting, the flash drive isn't generating diagnostics files. The last files I uploaded are a week old now. If anything is happening, I don't know. On average, from power-off, it used to take about 5 minutes to boot; 4 of those minutes were array startup, BIOS, and ECC memory, let alone it being a dual-CPU board.

Now, if I let it sit for upwards of 20 minutes, Unraid itself eventually boots, I guess, but not over the web, since bond0 is failing. If there's a list of commands or something I can run from the console/GUI, please let me know.

 

(the above pics are from google, waiting for my phone to charge... used it all day at work)

 

EDIT: Pics

20190314_171019.jpg

20190314_171009.jpg

The third one is a random Verbatim drive. I use it all the time for live installs and such.

Also, I said 8 GB Lexar; turns out it's 16 GB. Other than that, those are the actual drives.

Specifically, this is the SanDisk drive that I purchased from Best Buy, exact link:

https://www.bestbuy.com/site/sandisk-cruzer-16gb-usb-2-0-flash-drive-black/9226875.p?skuId=9226875

 

Edited by tbonedude420
updated, pics, links
Link to comment
13 hours ago, tbonedude420 said:

I can appreciate the response, Johnnie.Black, but it seems a bit of a cop-out. The system has been working solidly for the last 10 months, without issue, on the same flash drive... and all of a sudden one day it stops working?

Everything that malfunctions was working well before that. Either way, no one can help with this but yourself. You can try booting the Unraid flash drive in a different system. Like I mentioned, it can be the flash drive, it can be the server, or it can be a combination of both, but the problem is that the flash drive isn't booting correctly.

Link to comment
