zyrmpg, posted July 27, 2019 (edited July 27, 2019)

Hi guys,

Sad day for me today. Here's what happened, in as much detail as I can remember:

I was moving files around when Docker crashed. After some amateur fiddling, I tried to do a safe restart via the web UI. The array appeared to stop, but at some point after that the restart got caught on something and the UI got stuck. SSH access was still available, but almost any command would just hang. Here's what I got before I gave up and hard rebooted.

Via SSH before the hard reboot:
- htop: shfs at ~50% CPU usage
- iotop: a few things showed high IO%, all from /usr/local/
- diagnostics: I don't know what happened to the zip

Current status:
- server boots to the Unraid splash screen on the attached monitor (I set GUI boot as the default a while back)
- I can log in with a physically attached keyboard
- after login I get a black screen
- ping fails

I do have the boot drive. I found a diagnostics zip from yesterday, though I didn't run it. There are some log files; I don't know which might be useful. Can someone help me out?

Attachment: unraid-diagnostics-20190726-1244.zip

Frank1940, posted July 28, 2019

Found this at the very end of your syslog:

    Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734816
    Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 Sense Key : 0x3 [current]
    Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 ASC=0x11 ASCQ=0x0
    Jul 26 12:44:21 Unraid kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 1a 78 a8 00 00 01 00
    Jul 26 12:44:21 Unraid kernel: print_req_error: critical medium error, dev sda, sector 1734824

Device sda is your Unraid boot (flash) drive. Shut the server down, pull the flash drive, plug it in your PC and run chkdsk on it.

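For anyone reading those kernel lines later: in the SCSI spec, sense key 0x3 is MEDIUM ERROR and ASC 0x11 / ASCQ 0x00 is "unrecovered read error", and opcode 0x28 is READ(10). The sketch below decodes the CDB bytes from the log to show which block the failed read was asking for. The function name `decode_read10` is made up for illustration; only the byte layout comes from the READ(10) command format.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a SCSI READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# CDB copied from the syslog excerpt above.
lba, count = decode_read10("28 00 00 1a 78 a8 00 00 01 00")
print(lba, count)  # -> 1734824 1
```

The decoded address, 1734824, is the same sector number the second print_req_error line reports, so the log is describing a single-block read that the flash drive could not complete.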
zyrmpg (Author), posted July 28, 2019 (edited July 28, 2019)

Thanks for taking the time, Frank. I've only got Ubuntu at the moment, so I tried a few other commands:

    sudo fsck /dev/sda
    fsck from util-linux 2.33.1
    e2fsck 1.44.6 (5-Mar-2019)
    ext2fs_open2: Bad magic number in super-block
    fsck.ext2: Superblock invalid, trying backup blocks...
    fsck.ext2: Bad magic number in super-block while trying to open /dev/sda

    The superblock could not be read or does not describe a valid ext2/ext3/ext4
    filesystem. If the device is valid and it really contains an ext2/ext3/ext4
    filesystem (and not swap or ufs or something else), then the superblock
    is corrupt, and you might try running e2fsck with an alternate superblock:
        e2fsck -b 8193 <device>
     or
        e2fsck -b 32768 <device>

    Found a dos partition table in /dev/sda

    sudo fsck.fat /dev/sda
    fsck.fat 4.1 (2017-01-24)
    Logical sector size (1766 bytes) is not a multiple of the physical sector size.

    sudo dosfsck -w -r -l -v -t /dev/sdc1
    fsck.fat 4.1 (2017-01-24)
    open: No such file or directory

    sudo dosfsck -w -r -l -v -t /dev/sda1
    fsck.fat 4.1 (2017-01-24)
    Checking we can access the last sector of the filesystem
    Boot sector contents:
    System ID "MSWIN4.1"
    Media byte 0xf8 (hard disk)
    512 bytes per logical sector
    4096 bytes per cluster
    44 reserved sectors
    First FAT starts at byte 22528 (sector 44)
    2 FATs, 32 bit entries
    7808000 bytes per FAT (= 15250 sectors)
    Root directory start at cluster 281 (arbitrary size)
    Data area starts at byte 15638528 (sector 30544)
    1949974 data clusters (7987093504 bytes)
    63 sectors/track, 255 heads
    2048 hidden sectors
    15630336 sectors total
    Checking file /
    Checking file /UNRAID
    Checking file /EFI-
    Checking file /System Volume Information (SYSTEM~1)
    Checking file /bzfirmware (BZFIRM~1)
    . . .
    Checking file /preclear_reports/preclear_report_3153474D5034345A_2018.04.27_16.43.42.txt (PRECLE~8.TXT)
    Checking file /preclear_reports/preclear_report_1SG6T0EZ_2018.05.09_19.45.17.txt (PRECLE~9.TXT)
    Checking file /preclear_reports/preclear_report_37534A4730323757_2019.03.23_21.11.13.txt (PRC2E9~2.TXT)
    Checking for bad clusters.
    Cluster 212756 is unreadable.
    Cluster 212757 is unreadable.
    Cluster 212758 is unreadable.
    Cluster 212759 is unreadable.
    Cluster 212782 is unreadable.
    Cluster 212783 is unreadable.
    Checking for unused clusters.
    Checking free cluster summary.
    Free cluster summary wrong (1756693 vs. really 1756687)
    1) Correct
    2) Don't correct
    ?

That last bit (the bad cluster check) took a while. Safe to correct?

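The boot sector figures that fsck.fat printed are enough to relate its cluster numbers to the sector numbers in the kernel log. The following is a rough sketch of that arithmetic, not anything fsck.fat itself exposes. It assumes the "2048 hidden sectors" value is the offset of the FAT partition from the start of the device, that the kernel's sector numbers are 512-byte units counted from the start of the device, and that data clusters are numbered from 2 as in FAT32. The helper names are hypothetical.

```python
SECTOR_SIZE = 512                          # "512 bytes per logical sector"
SECTORS_PER_CLUSTER = 4096 // SECTOR_SIZE  # "4096 bytes per cluster" -> 8
DATA_START = 30544                         # "Data area starts at byte 15638528 (sector 30544)"
PARTITION_START = 2048                     # "2048 hidden sectors" (assumed partition offset)


def cluster_to_device_sector(cluster):
    """First whole-device sector of a FAT32 data cluster (clusters start at 2)."""
    return PARTITION_START + DATA_START + (cluster - 2) * SECTORS_PER_CLUSTER


def device_sector_to_cluster(sector):
    """Data cluster that contains a given whole-device sector."""
    rel = sector - PARTITION_START - DATA_START
    if rel < 0:
        raise ValueError("sector lies before the data area")
    return rel // SECTORS_PER_CLUSTER + 2


# Sectors from the two print_req_error lines in the syslog excerpt.
for sector in (1734816, 1734824):
    print(sector, "->", device_sector_to_cluster(sector))
# 1734816 -> 212780
# 1734824 -> 212781
```

Under those assumptions the two kernel errors fall in clusters 212780 and 212781, directly next to clusters 212782 and 212783 that the bad cluster scan reported, which suggests one damaged region on the flash drive.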
Frank1940, posted July 28, 2019

One always hopes that a disk-checking operation finds a simple problem that it can fix easily. When it finds unreadable sectors, one had best hope that they are in a file that is not really important or is easily replaceable. The most important directory on the flash drive is the /config directory. That is where all the user settings, plugin install data and Docker management settings are stored. Virtually everything else is simply the stock files from the distribution zip file, or informational files, like those preclear reports.

Do you have a backup of the flash drive? That could be your best step forward at this point.

50 minutes ago, zyrmpg said:
    That last bit took a while. Safe to correct?

Great question! I don't have an answer. (Windows would just fix it and let the chips fall where they may.) Linux always assumes that you are smart enough to know the answer! 😈 Google might be your friend...

zyrmpg (Author), posted July 28, 2019

Hm, OK. I do have some old backups. Could I do a straight swap if there are differences?

I'm most concerned with getting my Docker containers back up with the data on the cache and array. Are those dependent on my boot drive? I don't fully understand it, but I've heard Unraid operates somewhat independently from the flash drive.

- Settings: I think I can put these back together.
- Plugins: I don't think my containers were dependent on any in particular.
- Docker management settings: can I set them manually from a fresh install on a fresh USB drive without messing with what I had before the crash? I assume my Docker images are intact and I can still access them... somehow?

Thanks for answering my barrage of questions. After almost 5 years of using Unraid, recovering from something like this is new territory for me.

Frank1940, posted July 28, 2019 (edited July 28, 2019)

2 hours ago, zyrmpg said:
    hm ok. I do have some old backups. Could I do a straight swap if there are differences?

Yes, if you have not installed or replaced any drives since you made the backup. To be extra cautious, I would edit the /config/docker.cfg and /config/disk.cfg files so that neither Docker nor the array starts automatically. (It should be obvious to the casual observer that changing a couple of strings from "yes" to "no" would accomplish that!) That will allow you to do some checking first when booting with an old backup.

By the way, do you have the Community Apps Backup plugin installed? If you do, you should have a very recent backup of your flash drive on your array. (Not much good if you can't boot the server...) However, if you get the server up and the array working, you could easily use the contents of that backup to restore the config directory to a version that should be less than a couple of weeks old.

Dockers are something I have never got my fingers into to really play around with. (I only have one Docker installed.) However, I believe that the configuration settings are actually stored on the flash drive in /config/plugins/dockerMan/templates-user. You can read about rebuilding the image file here:

https://forums.unraid.net/topic/36647-official-guide-restoring-your-docker-applications-in-a-new-image-file/

I also believe you can update most Dockers without having to change the configuration files and/or settings, so they should work using the old configuration files in that backup. If you are really concerned, follow the procedure in that thread, but rather than deleting the image file, just rename it.

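The "yes" to "no" edit can be done in any text editor, but here is a small sketch of the same change in script form, for working on a copy of the flash backup from another machine. Note the assumptions: the thread does not name the settings, so the key names `startArray` (in disk.cfg) and `DOCKER_ENABLED` (in docker.cfg) are guesses that should be checked against the actual files, the sample text is invented, and `set_cfg_value` is a hypothetical helper.

```python
import re


def set_cfg_value(text, key, value):
    """Return cfg text with key="..." changed to key="value"; raise if key is absent."""
    pattern = re.compile(r'^(' + re.escape(key) + r')="[^"]*"', re.MULTILINE)
    new_text, hits = pattern.subn(lambda m: f'{m.group(1)}="{value}"', text)
    if hits == 0:
        raise KeyError(f"{key} not found; check the real key name in the file")
    return new_text


# Invented sample content; a real disk.cfg has many more lines.
sample_disk_cfg = 'startArray="yes"\nspindownDelay="0"\n'
print(set_cfg_value(sample_disk_cfg, "startArray", "no"))
```

Raising an error when the key is missing is deliberate: silently writing back an unchanged file would leave autostart enabled while giving the impression it had been turned off.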
Frank1940, posted July 28, 2019

One more quick thought. I would be looking at getting a new flash drive. LimeTech has made it very easy to transfer your license to a new flash drive. As you are finding out, a flaky boot drive is no fun to deal with. The how-to instructions are here:

https://wiki.unraid.net/UnRAID_6/Changing_The_Flash_Device

and

https://unraid.net/download

zyrmpg (Author), posted July 28, 2019 (edited July 28, 2019)

Wow, OK. Lots of new info to look into; I appreciate the knowledge download. It might take a few days of research before I get back here.

I'm going to leave some notes for myself or anyone in my situation. Do correct me if I've misunderstood anything.

- /config/ is the most important directory. If it's corrupt, find a backup copy.
- If no drives have been replaced or added since an older backup was made, that backup can be used as a straight swap.
- If, at any point, the array is bootable and accessible, look for the CA backups. Those would be the most up to date. (Thanks for reminding me. Forgot about those.)
- Before attempting to boot a backup, edit /config/disk.cfg and /config/docker.cfg so the array and Docker don't autostart (a simple "yes" -> "no"). Check for settings that have become outdated since the backup was made.
- It should be possible to rebuild the Docker image from /config/plugins/dockerMan/templates-user. If those are corrupted, try the user templates from a backup. Rename the image instead of deleting it, so it serves as a backup.

Thanks for the tip. I'll definitely be replacing the flash drive. This little guy has been getting old. Shoulda laid him to rest earlier. Might try a scheduled replacement after this.

zyrmpg (Author), posted July 29, 2019 (edited July 29, 2019)

So I was looking through the flash drive and I noticed something.

DISK_ASSIGNMENTS.txt:

    Disk Assignments as of Sat, 21 Apr 2018 23:21:03 -0700
    Disk: parity   Device:             Status: DISK_NP_DSBL
    Disk: disk1    Device: STXXX       Status: DISK_OK
    Disk: disk2    Device: WDCXXX      Status: DISK_OK
    Disk: disk3    Device: STXXX       Status: DISK_OK
    Disk: disk4    Device:             Status: DISK_NP
    Disk: disk5    Device:             Status: DISK_NP
    Disk: parity2  Device:             Status: DISK_NP_DSBL
    Disk: cache    Device: TOSXXX      Status: DISK_OK
    Disk: cache2   Device:             Status: DISK_NP
    Disk: flash    Device: Cruzer_Fit  Status: DISK_OK

The "Disk" labels are correct, but the "Device" and "Status" don't make sense.

The disk.cfg has some similar oddities. These were the only lines with "cacheId" in them.

/config/disk.cfg:

    cacheId="TOSXXX"
    ...
    cacheId.1="SamXXX"
    cacheId.2=""
    cacheId.3=""

Everything has been green since April as dated in the txt so I'm kinda confused. Is this going to be an issue when I eventually start the array?

Hoopster, posted July 29, 2019 (edited July 29, 2019)

1 hour ago, zyrmpg said:
    So I was looking through the flash drive and I noticed something. DISK_ASSIGNMENTS.txt

That DISK_ASSIGNMENTS.txt file is well over a year old. It shows you had no parity disks, three data disks and a single cache drive. The other disks were not present/disabled. Is that what your system looked like in April 2018? The devices and disk positions may have changed since then.

DISK_ASSIGNMENTS.txt gets created when Community Applications backs up the flash drive. Perhaps CA is no longer configured to back up the flash drive, so that file has not been updated in a while?

The fact that it appears to contain outdated info will have no effect on your current array. In fact, many systems don't even have that file.

zyrmpg (Author), posted July 29, 2019

Oh, OK. Didn't notice the 2018. Yeah, I guess it is outdated. That configuration sounds about right. Can't imagine why I would turn backups off. That is unfortunate.

Thanks for the info. I assume the current disk assignments are stored somewhere else, then. No way I can put them together from memory.

itimpi, posted July 29, 2019 (edited July 29, 2019)

The assignments are stored in the config/super.dat file on the flash drive. Unfortunately this is a binary file that is not human readable.

Based on the syslog in your diagnostics, it looks like your disk assignments were probably as follows (disk0 is parity1):

    Jul 13 18:18:14 Unraid kernel: mdcmd (1): import 0 sdi 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_1SG6T0EZ
    Jul 13 18:18:14 Unraid kernel: md: import disk0: (sdi) WDC_WD80EMAZ-00WJTA0_1SG6T0EZ size: 7814026532
    Jul 13 18:18:14 Unraid kernel: mdcmd (2): import 1 sdh 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7SH0NN3C
    Jul 13 18:18:14 Unraid kernel: md: import disk1: (sdh) WDC_WD80EMAZ-00WJTA0_7SH0NN3C size: 7814026532
    Jul 13 18:18:14 Unraid kernel: mdcmd (3): import 2 sdg 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_1SGMP44Z
    Jul 13 18:18:14 Unraid kernel: md: import disk2: (sdg) WDC_WD80EMAZ-00WJTA0_1SGMP44Z size: 7814026532
    Jul 13 18:18:14 Unraid kernel: mdcmd (4): import 3 sde 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_2YHZJ08D
    Jul 13 18:18:14 Unraid kernel: md: import disk3: (sde) WDC_WD80EMAZ-00WJTA0_2YHZJ08D size: 7814026532
    Jul 13 18:18:14 Unraid kernel: mdcmd (5): import 4 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7SJG027W
    Jul 13 18:18:14 Unraid kernel: md: import disk4: (sdf) WDC_WD80EMAZ-00WJTA0_7SJG027W size: 7814026532

If you think this may be wrong, then the following may help. What you do not want to do is accidentally assign a data drive as a parity drive, as doing so means its contents get lost when you start the array. The way to proceed that is typically recommended when the user does not know which drive is their parity drive is:

- Assign all drives as data drives. If necessary, first do a Tools -> New Config to reset the array.
- Start the array. One drive should show as unmountable, and this would be the parity drive, as that drive has no file system, so make a note of its serial number. If more than one drive shows as unmountable, then stop and ask for advice on how to proceed. It will still be possible to identify the parity drive, but the steps are not as simple and require some command line activity.
- The above does not identify what slot each data drive was in, but if it matters to you then, since they mounted, examining the contents may help. If it matters, take note of the serial numbers of such drives.
- Stop the array and use Tools -> New Config to reset the array, selecting the option to keep current assignments.
- Go back to the Main tab and correct the assignments based on the new information from the previous steps.
- Start the array now that the assignments are correct. If you only had single parity, then you may get away with ticking the "Parity is valid" box before starting the array, as the single parity calculation is not affected by the slot a data drive is in. You may want to start the array in Maintenance mode to facilitate running file system checks on each data drive, although this is optional.
- If you did not tick the "Parity is valid" box, then Unraid will automatically start building parity based on the current data drive assignments. If you did tick that box, then you should start a parity check yourself to confirm it is OK. A small number of errors is not unexpected at this point, as you did not do a tidy closedown.
- Make a backup of your flash drive contents by clicking on the 'Flash' drive in the Main tab. It is a good idea to do this any time you make a significant configuration change, to help with recovering from the sort of scenario you have encountered.

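Pulling the slot assignments out of a long syslog by eye is error-prone, so here is a minimal sketch that extracts them from the "md: import diskN" lines quoted in the post above. The regular expression is written against the line format shown in that excerpt only, and the function name `parse_assignments` is hypothetical; other Unraid versions may log these lines differently.

```python
import re

# Matches e.g. "md: import disk0: (sdi) WDC_WD80EMAZ-00WJTA0_1SG6T0EZ size: 7814026532"
IMPORT_RE = re.compile(r"md: import disk(\d+): \((\w+)\) (\S+) size: (\d+)")


def parse_assignments(syslog_text):
    """Map slot number -> (device, model_serial); later log entries overwrite earlier ones."""
    slots = {}
    for match in IMPORT_RE.finditer(syslog_text):
        slot, device, ident, _size = match.groups()
        slots[int(slot)] = (device, ident)
    return slots


# Two lines copied from the syslog excerpt in the post above.
sample = """\
Jul 13 18:18:14 Unraid kernel: md: import disk0: (sdi) WDC_WD80EMAZ-00WJTA0_1SG6T0EZ size: 7814026532
Jul 13 18:18:14 Unraid kernel: md: import disk1: (sdh) WDC_WD80EMAZ-00WJTA0_7SH0NN3C size: 7814026532
"""

for slot, (device, ident) in sorted(parse_assignments(sample).items()):
    label = "parity" if slot == 0 else f"disk{slot}"  # disk0 is parity1 per the post
    print(label, device, ident)
```

The device letters (sdi, sdh, ...) can change between boots, so the model and serial string is the part worth writing down before reassigning drives.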