Unraid won't boot.

zyrmpg · July 27, 2019

Hi guys,

Sad day for me today. Here's what happened in as much detail as I can remember:

I was moving files around when Docker crashed. After some amateur fiddling, I ended up trying to do a safe restart via the webUI. It looked like the array did stop, but somewhere after that the restart got caught on something and the UI got stuck. SSH access was available, but almost any command would just hang. Here's what I got before I gave up and hard rebooted:

Via SSH Before Hard Reboot:

- htop: shfs ~50% cpu usage

- iotop: a few things were high io%. all from /usr/local/

- diagnostics: I don't know what happened to the zip

Current Status:

- server boots to the Unraid splash screen on the attached monitor (I set default GUI boot a while back)

- I can log in with physically attached keyboard

- after login I get a black screen

- ping fails

I do have the boot drive. I found a diagnostics zip from yesterday, though I didn't run it. There are some log files; I don't know which might be useful.

Can someone help me out?

unraid-diagnostics-20190726-1244.zip

Edited July 27, 2019 by zyrmpg

Frank1940 · July 28, 2019

Found this at the very end of your syslog:

Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734816
Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 Sense Key : 0x3 [current] 
Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 ASC=0x11 ASCQ=0x0 
Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 1a 78 a8 00 00 01 00
Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734824

Device sda is your Unraid boot (flash) drive. Shut the server down, pull the flash drive, plug it into your PC and run chkdsk on it.
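For anyone scanning a long syslog for the same symptom, here is a minimal Python sketch that collects the failing sectors per device. It assumes the kernel messages use exactly the "critical medium error, dev X, sector N" wording seen in the excerpt above; the function name `medium_errors` is made up for illustration.

```python
import re

# Matches the "critical medium error" wording seen in the syslog excerpt above.
MEDIUM_ERROR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")

def medium_errors(syslog_text):
    """Return {device: sorted list of failing sector numbers}."""
    found = {}
    for line in syslog_text.splitlines():
        match = MEDIUM_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            found.setdefault(device, set()).add(sector)
    return {device: sorted(sectors) for device, sectors in found.items()}

# Three lines copied from the excerpt above (trailing spaces trimmed).
sample = """\
Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734816
Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 Sense Key : 0x3 [current]
Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734824
"""
print(medium_errors(sample))
```

Against a real diagnostics zip you would pass in the contents of the syslog file instead of `sample`.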

zyrmpg · July 28, 2019

Thanks for taking the time, Frank.

So I've only got Ubuntu at the moment so I tried a couple other commands:

sudo fsck /dev/sda

fsck from util-linux 2.33.1
e2fsck 1.44.6 (5-Mar-2019)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>

Found a dos partition table in /dev/sda

sudo fsck.fat /dev/sda
fsck.fat 4.1 (2017-01-24)
Logical sector size (1766 bytes) is not a multiple of the physical sector size.

sudo dosfsck -w -r -l -v -t /dev/sdc1
fsck.fat 4.1 (2017-01-24)
open: No such file or directory

sudo dosfsck -w -r -l -v -t /dev/sda1
fsck.fat 4.1 (2017-01-24)
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "MSWIN4.1"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
      4096 bytes per cluster
        44 reserved sectors
First FAT starts at byte 22528 (sector 44)
         2 FATs, 32 bit entries
   7808000 bytes per FAT (= 15250 sectors)
Root directory start at cluster 281 (arbitrary size)
Data area starts at byte 15638528 (sector 30544)
   1949974 data clusters (7987093504 bytes)
63 sectors/track, 255 heads
      2048 hidden sectors
15630336 sectors total
Checking file /
Checking file /UNRAID
Checking file /EFI-
Checking file /System Volume Information (SYSTEM~1)
Checking file /bzfirmware (BZFIRM~1)

.

Checking file /preclear_reports/preclear_report_3153474D5034345A_2018.04.27_16.43.42.txt (PRECLE~8.TXT)
Checking file /preclear_reports/preclear_report_1SG6T0EZ_2018.05.09_19.45.17.txt (PRECLE~9.TXT)
Checking file /preclear_reports/preclear_report_37534A4730323757_2019.03.23_21.11.13.txt (PRC2E9~2.TXT)
Checking for bad clusters.
Cluster 212756 is unreadable.
Cluster 212757 is unreadable.
Cluster 212758 is unreadable.
Cluster 212759 is unreadable.
Cluster 212782 is unreadable.
Cluster 212783 is unreadable.
Checking for unused clusters.
Checking free cluster summary.
Free cluster summary wrong (1756693 vs. really 1756687)
1) Correct
2) Don't correct
?

That last bit took a while. Safe to correct?
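As a rough cross-check, the cluster numbers from fsck.fat can be converted to whole-device sector numbers and compared with the sectors in the syslog. This is only a sketch: the geometry is copied from the report above, and it assumes the "2048 hidden sectors" value is the offset of sda1 from the start of sda. The function names are made up for illustration.

```python
# Geometry copied from the fsck.fat report above.
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 4096 // BYTES_PER_SECTOR  # 8 sectors per cluster
DATA_START_SECTOR = 30544   # "Data area starts at byte 15638528 (sector 30544)"
PARTITION_START = 2048      # ASSUMPTION: "hidden sectors" = offset of sda1 on sda

def cluster_to_device_sector(cluster):
    """First whole-device sector of a FAT data cluster (data clusters start at 2)."""
    return PARTITION_START + DATA_START_SECTOR + (cluster - 2) * SECTORS_PER_CLUSTER

def device_sector_to_cluster(sector):
    """FAT data cluster that contains the given whole-device sector."""
    return (sector - PARTITION_START - DATA_START_SECTOR) // SECTORS_PER_CLUSTER + 2

# Clusters that fsck.fat reported as unreadable.
for cluster in (212756, 212759, 212782, 212783):
    print("cluster", cluster, "-> sector", cluster_to_device_sector(cluster))

# Sectors from the "critical medium error" syslog lines.
for sector in (1734816, 1734824):
    print("sector", sector, "-> cluster", device_sector_to_cluster(sector))
```

Under those assumptions the two syslog sectors fall in clusters 212780 and 212781, right next to the 212782 and 212783 that fsck.fat flagged, which would suggest both tools are looking at the same damaged area of the flash drive.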

Edited July 28, 2019 by zyrmpg

Frank1940 · July 28, 2019

One always hopes that a disk checking operation finds a simple problem that it can fix easily. When it finds unreadable sectors, one had best hope that they are in a file that is not really important or is easily replaceable. The most important directory on the flash drive is the /config directory. That is where all the user settings, plugin install data and Docker management settings are stored. Virtually everything else is simply the stock files from the distribution zip file, or informational files, like those preclear reports.

Do you have a backup of the flash drive? That could be your best step forward at this point.

50 minutes ago, zyrmpg said:

That last bit took a while. Safe to correct?

Great question! I don't have an answer. (Windows would just fix it and let the chips fall where they may.) Linux always assumes that you are smart enough to know the answer! 😈 Google might be your friend...

zyrmpg · July 28, 2019

hm ok. I do have some old backups. Could I do a straight swap if there are differences?

I'm most concerned with getting my docker containers back up with the data on the cache and array. Are those dependent on my boot drive? I don't fully understand it but I've heard Unraid operates somewhat independently from the flash drive.

Settings - I think I can put back together. Plugins - I don't think my containers were dependent on any in particular. Docker management settings? Can I set them manually from a fresh install on a fresh USB drive without messing with what I had before the crash? I assume my docker images are intact and I can still access them... somehow?

Thanks for answering my barrage of questions. Almost 5 years use of Unraid and recovering from something like this is new territory for me.

Frank1940 · July 28, 2019

2 hours ago, zyrmpg said:

hm ok. I do have some old backups. Could I do a straight swap if there are differences?

Yes, if you have not installed or replaced any drives since you made the backup. To be extra cautious, I would edit the /config/docker.cfg and /config/disk.cfg files so that neither Docker nor the array starts automatically. (It should be obvious to the casual observer that changing a couple of strings from "yes" to "no" would accomplish that!) That will allow you to do some checking after booting with an old backup, before the array or Docker starts. By the way, do you have the Community Apps Backup plugin installed? If you do, you should have a very recent backup of your flash drive on your array. (Not much good if you can't boot the server...) However, if you get the server up and the array working, you could easily use the contents of that backup to restore the config directory to a version which should be less than a couple of weeks old.
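The "yes" to "no" edit mentioned above can also be scripted. The following is a minimal sketch, not a definitive procedure: it assumes the relevant keys are `startArray` in disk.cfg and `DOCKER_ENABLED` in docker.cfg, and those key names are assumptions to verify against your own files before changing anything. The helper name `set_cfg_value` and the sample file contents are made up for illustration.

```python
import re

def set_cfg_value(cfg_text, key, value):
    """Return cfg_text with the line key="..." rewritten to key="value"."""
    pattern = re.compile(r'^(' + re.escape(key) + r')="[^"]*"', re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}="{value}"', cfg_text)
    if count == 0:
        # Refuse to guess: the key name is an assumption and may differ.
        raise KeyError(f"{key} not found; check the key name in your file")
    return new_text

# Illustrative contents only; real files hold many more settings.
# ASSUMPTION: startArray and DOCKER_ENABLED are the autostart keys.
disk_cfg = 'startArray="yes"\notherSetting="1"\n'
docker_cfg = 'DOCKER_ENABLED="yes"\n'

print(set_cfg_value(disk_cfg, "startArray", "no"))
print(set_cfg_value(docker_cfg, "DOCKER_ENABLED", "no"))
```

Working on a copy of the flash drive contents, and keeping the unmodified files alongside, leaves an easy way back if the key names turn out to be different.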

Dockers are something I have never got my fingers into to really play around with. (I only have one Docker installed.) However, I believe that the configuration settings are actually stored on the flash drive in /config/plugins/dockerMan/templates-user. You can read about rebuilding the image file here:

https://forums.unraid.net/topic/36647-official-guide-restoring-your-docker-applications-in-a-new-image-file/

However, I believe you can actually update most Dockers without having to change the configuration files and/or settings, so they should work using the old configuration files in that backup. If you are really concerned, follow the procedure in that thread, but rather than deleting the image file, just rename it.
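The "rename rather than delete" idea can be sketched in a few lines of Python. This is only an illustration: the helper name `set_aside` is made up, the demo runs against a throwaway temporary file, and the real location of the Docker image file on your server is something you would need to look up yourself.

```python
import tempfile
from datetime import datetime
from pathlib import Path

def set_aside(image_path):
    """Rename image_path to <name>.bak-<timestamp> and return the new path."""
    image = Path(image_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = image.with_name(f"{image.name}.bak-{stamp}")
    if backup.exists():
        # Never overwrite an earlier backup.
        raise FileExistsError(backup)
    image.rename(backup)
    return backup

# Demo on a temporary stand-in file, not a real Docker image.
with tempfile.TemporaryDirectory() as tmp:
    fake_image = Path(tmp) / "docker.img"
    fake_image.write_bytes(b"placeholder")
    kept = set_aside(fake_image)
    print("original still present:", fake_image.exists())
    print("kept as:", kept.name)
```

The old image stays on disk under the new name, so it can be renamed back if rebuilding from the templates does not work out.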

Edited July 28, 2019 by Frank1940

Frank1940 · July 28, 2019

One more quick thought. I would be looking at getting a new flash drive. LimeTech has made it very easy to transfer your license to a new flash drive. As you are finding out, a flaky boot drive is no fun to deal with. The "How-to" instructions are below:

https://wiki.unraid.net/UnRAID_6/Changing_The_Flash_Device

and

https://unraid.net/download

zyrmpg · July 28, 2019

Wow ok. Lots of new info to look into; I appreciate the knowledge download. Might be a few days of research before I get back here.

I'm going to leave some notes for myself or anyone in my situation. Do correct me if I've misunderstood anything.

- /config/ is most important. If that's corrupt, find a backup copy

- if no drives have been replaced or added since an older backup was made, it can be used as a straight swap.

- if, at any point, the array is bootable and accessible, look for those CA backups. Those would be most up to date. (Thanks for reminding me. Forgot about those)

- before attempting to boot a backup, edit /config/disk.cfg and /config/docker.cfg so the array and docker don't autostart (simple "yes" -> "no"). Check for settings that have become outdated since the backup was made.

- should be able to rebuild docker image off /config/plugins/dockerMan/templates-user. If those are corrupted, try the user templates from a backup.

*Rename the image instead of deleting it, to keep it as a backup.

Thanks for the tip. I'll definitely be replacing the flash drive. This little guy has been getting old. Shoulda laid him to rest earlier. Might try a scheduled replacement after this.

Edited July 28, 2019 by zyrmpg

zyrmpg · July 29, 2019

So I was looking through the flash drive and I noticed something.

DISK_ASSIGNMENTS.txt

Disk Assignments as of Sat, 21 Apr 2018 23:21:03 -0700
Disk: parity  Device:   Status: DISK_NP_DSBL
Disk: disk1  Device: STXXX  Status: DISK_OK
Disk: disk2  Device: WDCXXX  Status: DISK_OK
Disk: disk3  Device: STXXX  Status: DISK_OK
Disk: disk4  Device:   Status: DISK_NP
Disk: disk5  Device:   Status: DISK_NP
Disk: parity2  Device:   Status: DISK_NP_DSBL
Disk: cache  Device: TOSXXX  Status: DISK_OK
Disk: cache2  Device:   Status: DISK_NP
Disk: flash  Device: Cruzer_Fit  Status: DISK_OK

The "Disk" labels are correct, but the "Device" and "Status" don't make sense. The disk.cfg has some similar oddities. These were the only lines with "cacheId" in them.

/config/disk.cfg

cacheId="TOSXXX"
...
cacheId.1="SamXXX"
cacheId.2=""
cacheId.3=""

Everything has been green since April as dated in the txt so I'm kinda confused. Is this going to be an issue when I eventually start the array?

Edited July 29, 2019 by zyrmpg

Hoopster · July 29, 2019

1 hour ago, zyrmpg said:

So I was looking through the flash drive and I noticed something.

DISK_ASSIGNMENTS.txt

That DISK_ASSIGNMENTS.txt file is well over a year old. It shows you had no parity disks, three data disks and a single cache drive. The other disks were not present/disabled. Is that what your system looked like in April 2018? The devices and disk positions may have changed since then.

DISK_ASSIGNMENTS.txt gets created when Community Applications backs up the flash drive. Perhaps CA is no longer configured to back up the flash drive, so that file has not been updated in a while?

The fact that it appears to contain outdated info will have no effect on your current array. In fact, many systems don't even have that file.

Edited July 29, 2019 by Hoopster

zyrmpg · July 29, 2019

Oh ok. Didn't notice the 2018. Yeah, I guess it is outdated. That configuration sounds about right. Can't imagine why I would turn backups off. That is unfortunate. Thanks for the info. I assume the current disk assignments are stored somewhere else then. No way I can put it together from memory.

itimpi · July 29, 2019

The assignments are stored in the config/super.dat file on the flash drive. Unfortunately this is a binary file that is not human readable.

Based on the syslog in your diagnostics, it looks like your disk assignments were probably as follows (disk0 is parity1):

Jul 13 18:18:14 Unraid kernel: mdcmd (1): import 0 sdi 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_1SG6T0EZ
Jul 13 18:18:14 Unraid kernel: md: import disk0: (sdi) WDC_WD80EMAZ-00WJTA0_1SG6T0EZ size: 7814026532 
Jul 13 18:18:14 Unraid kernel: mdcmd (2): import 1 sdh 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7SH0NN3C
Jul 13 18:18:14 Unraid kernel: md: import disk1: (sdh) WDC_WD80EMAZ-00WJTA0_7SH0NN3C size: 7814026532 
Jul 13 18:18:14 Unraid kernel: mdcmd (3): import 2 sdg 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_1SGMP44Z
Jul 13 18:18:14 Unraid kernel: md: import disk2: (sdg) WDC_WD80EMAZ-00WJTA0_1SGMP44Z size: 7814026532 
Jul 13 18:18:14 Unraid kernel: mdcmd (4): import 3 sde 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_2YHZJ08D
Jul 13 18:18:14 Unraid kernel: md: import disk3: (sde) WDC_WD80EMAZ-00WJTA0_2YHZJ08D size: 7814026532 
Jul 13 18:18:14 Unraid kernel: mdcmd (5): import 4 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7SJG027W
Jul 13 18:18:14 Unraid kernel: md: import disk4: (sdf) WDC_WD80EMAZ-00WJTA0_7SJG027W size: 7814026532
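If the syslog is long, the same information can be pulled out with a short script. This is a minimal sketch that assumes the "md: import diskN: (sdX) ID size: N" wording shown in the lines above; the function name `parse_assignments` is made up for illustration. If a slot appears more than once (for example across several boots), the last occurrence wins.

```python
import re

# Matches the "md: import diskN" lines shown in the excerpt above.
IMPORT_LINE = re.compile(r"md: import disk(\d+): \((\w+)\) (\S+) size: (\d+)")

def parse_assignments(syslog_text):
    """Return {slot: {"device", "id", "size"}}; later lines overwrite earlier ones."""
    slots = {}
    for line in syslog_text.splitlines():
        match = IMPORT_LINE.search(line)
        if match:
            slots[int(match.group(1))] = {
                "device": match.group(2),
                "id": match.group(3),
                "size": int(match.group(4)),
            }
    return slots

# Two lines copied from the excerpt above.
sample = """\
Jul 13 18:18:14 Unraid kernel: md: import disk0: (sdi) WDC_WD80EMAZ-00WJTA0_1SG6T0EZ size: 7814026532
Jul 13 18:18:14 Unraid kernel: md: import disk1: (sdh) WDC_WD80EMAZ-00WJTA0_7SH0NN3C size: 7814026532
"""
for slot, info in sorted(parse_assignments(sample).items()):
    label = "parity" if slot == 0 else f"disk{slot}"
    print(label, info["id"], f"({info['device']})")
```

The sdX letters can change between boots, so the model and serial string is the part worth writing down.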

If you think this may be wrong then the following may help.

What you do not want to do is accidentally assign a data drive as a parity drive, as doing so means its contents get lost when you start the array. The procedure typically recommended when the user does not know which drive is their parity drive is:

1. Assign all drives as data drives. If necessary, first do a Tools -> New Config to reset the array.
2. Start the array. One drive should show as unmountable, and this would be the parity drive, as that drive has no file system, so make a note of its serial number. If more than one drive shows as unmountable then stop and ask for advice on how to proceed. It will still be possible to identify the parity drive, but the steps are not as simple and require some command line activity.
3. The above does not identify which slot each data drive was in. If that matters to you, examining the contents of the drives (they are mounted at this point) may help; if so, take note of the serial numbers of those drives.
4. Stop the array and use Tools -> New Config to reset the array, selecting the option to keep current assignments.
5. Go back to the Main tab and correct the assignments based on the information from the previous steps.
6. Start the array now that the assignments are correct. If you only had single parity then you may get away with ticking the "Parity is valid" box before starting the array, as the single parity calculation is not affected by the slot a data drive is in. You may want to start the array in Maintenance mode to facilitate running file system checks on each data drive, although this is optional.
7. If you did not tick the "Parity is valid" box then Unraid will automatically start building parity based on the current data drive assignments. If you did tick that box then you should start a parity check yourself to confirm it is OK. A small number of errors is not unexpected at this point, as you did not do a tidy closedown.
8. Make a backup of your flash drive contents by clicking on the 'Flash' drive in the Main tab. It is a good idea to do this any time you make a significant configuration change, to help with recovering from the sort of scenario you have encountered.

Edited July 29, 2019 by itimpi
