June 14, 2009

I thought I would write this note to share with other unRAID users, so that anyone who runs into the same situation I did won't panic and will know how to recover from it. My unRAID version is 4.4.2.

This morning I re-flashed my unRAID motherboard BIOS (you can find my configuration in my signature) with the latest release from the vendor. After that completed successfully I rebooted, and when unRAID finally came up it told me that 4 disks were missing from the configuration (I have 12 disks including parity), and one of them was the parity disk. OMG :-)

After several reboots, no luck: unRAID still reported 4 missing disks. At that point I decided to check all the physical disks unRAID had discovered and listed on the "Devices" page. It turns out unRAID DOES find those 4 missing disks, but it has trouble associating them with my original configuration. The only difference I could find is that those disks are no longer in their original "locations"; for example, my parity disk was originally /dev/sdf and is now /dev/sdk.

Since I had kept a separate copy of my unRAID configuration, I decided to assign the missing disks back manually, one at a time, by disk ID. After I finished those re-assignments, unRAID came up successfully and reported that parity is valid.

Later, when I examined the syslog, I found that the unRAID device inventory had found all the disks, but there were a couple of errors when unRAID tried to restart the MD driver.
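The manual re-assignment by disk ID described above can be sketched in Python. This is a hypothetical helper, not part of unRAID: it assumes you kept a record of each slot's old location and disk ID (for example, transcribed from a printout of the Devices page), and it parses emhttp "Device inventory" lines in the format shown in the syslog excerpt that follows. It only tells you which device to pick for each slot; the assignments themselves are still made on the Devices page.

```python
import re
from dataclasses import dataclass

# Matches emhttp inventory lines such as
# "... emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 (sdk) ata-ST31500343AS_9VS05D9X"
INVENTORY_LINE = re.compile(r"emhttp: (\S+) \((\w+)\) (\S+)\s*$")


@dataclass(frozen=True)
class Found:
    path: str  # location (by-path style name)
    dev: str   # kernel device name, e.g. "sdk"


def parse_inventory(lines):
    """Map disk id -> Found for every inventory line; other lines are ignored."""
    inventory = {}
    for line in lines:
        m = INVENTORY_LINE.search(line)
        if m:
            path, dev, disk_id = m.groups()
            inventory[disk_id] = Found(path, dev)
    return inventory


def reassign_by_id(saved, inventory):
    """saved: slot -> (old_path, disk_id), from a copy kept before the change.

    Returns (slot -> dev, [(slot, old_path, new_path) for moved disks], [lost slots]).
    """
    assignments, moved, lost = {}, [], []
    for slot, (old_path, disk_id) in sorted(saved.items()):
        found = inventory.get(disk_id)
        if found is None:
            lost.append(slot)  # disk id not seen at all: a genuinely missing disk
            continue
        assignments[slot] = found.dev
        if found.path != old_path:
            moved.append((slot, old_path, found.path))  # same disk, new location
    return assignments, moved, lost
```

With the parity disk from this post (old location pci-0000:00:1f.2-scsi-5:0:0:0, now at pci-0000:00:1f.5-scsi-1:0:0:0 as sdk), `reassign_by_id` would report slot 0 as moved and assign it to sdk, while a disk that kept its location is simply re-assigned in place.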
The location "pci-0000:00:1f.2-scsi-5:0:0:0" is where my parity disk was when it was /dev/sdf; that disk is now at "pci-0000:00:1f.5-scsi-1:0:0:0" (sdk).

Jun 14 11:33:56 Tower emhttp: Device inventory:
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 (sdf) ata-ST31000340AS_9QJ1TFNV
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:1:0 (sdg) ata-ST31000340AS_6QJ051L6
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 (sdh) ata-ST31500341AS_9VS150BR
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:1:0 (sdi) ata-ST31000340AS_6QJ00XB9
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 (sdj) ata-WDC_WD5000AAKS-00YGA0_WD-WCAS82576893
Jun 14 11:33:56 Tower emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 (sdk) ata-ST31500343AS_9VS05D9X
Jun 14 11:33:56 Tower emhttp: pci-0000:01:00.0-scsi-0:0:0:0 (sda) ata-ST31500341AS_9VS14XR9
Jun 14 11:33:56 Tower emhttp: pci-0000:01:00.0-scsi-1:0:0:0 (sdb) ata-ST3500320AS_5QM142BB
Jun 14 11:33:56 Tower emhttp: pci-0000:01:00.1-ide-0:0 (hda) ata-WDC_WD5000AAJB-00UHA0_WD-WCAPW1067933
Jun 14 11:33:56 Tower emhttp: pci-0000:01:00.1-ide-0:1 (hdb) ata-MAXTOR_STM3500630A_5QG05GHX
Jun 14 11:33:56 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 (sdd) ata-ST31000340AS_9QJ227XS
Jun 14 11:33:56 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 (sde) ata-ST31000333AS_6TE0EPNV
Jun 14 11:33:56 Tower emhttp: restart_md_driver: stat pci-0000:00:1f.2-scsi-5:0:0:0: No such file or directory
Jun 14 11:33:56 Tower emhttp: restart_md_driver: stat pci-0000:00:1f.2-scsi-2:0:0:0: No such file or directory
Jun 14 11:33:56 Tower emhttp: restart_md_driver: stat pci-0000:00:1f.2-scsi-3:0:0:0: No such file or directory
Jun 14 11:33:56 Tower emhttp: restart_md_driver: stat pci-0000:00:1f.2-scsi-4:0:0:0: No such file or directory
Jun 14 11:33:56 Tower emhttp: shcmd (1): rmmod md-mod >>/var/log/go 2>&1
Jun 14 11:33:56 Tower emhttp: shcmd: shcmd (1): exit status: 1
Jun 14 11:33:56 Tower emhttp: shcmd (2): modprobe md-mod super=/boot/config/super.dat slots=0,0,8,112,0,0,8,64,8,48,0,0,0,0,8,80,8,16,8,0,3,0,3,64,0,0,0,0,0,0,0,0 >>/var/log/go 2>&1
Jun 14 11:33:56 Tower kernel: md: unRAID driver 0.95.0 installed
Jun 14 11:33:56 Tower kernel: md: xor using function: pIII_sse (9448.800 MB/sec)
Jun 14 11:33:56 Tower kernel: md: disk0 missing
Jun 14 11:33:56 Tower kernel: md: import disk1: [8,112] (sdh) ST31500341AS 9VS150BR offset: 63 size: 1465138552
Jun 14 11:33:56 Tower kernel: md: disk2 missing
Jun 14 11:33:56 Tower kernel: md: import disk3: [8,64] (sde) ST31000333AS 6TE0EPNV offset: 63 size: 976762552
Jun 14 11:33:56 Tower kernel: md: import disk4: [8,48] (sdd) ST31000340AS 9QJ227XS offset: 63 size: 976762552
Jun 14 11:33:56 Tower kernel: md: disk5 missing
Jun 14 11:33:56 Tower kernel: md: disk6 missing
Jun 14 11:33:56 Tower kernel: md: import disk7: [8,80] (sdf) ST31000340AS 9QJ1TFNV offset: 63 size: 976762552
Jun 14 11:33:56 Tower kernel: md: import disk8: [8,16] (sdb) ST3500320AS 5QM142BB offset: 63 size: 488386552
Jun 14 11:33:56 Tower kernel: md: import disk9: [8,0] (sda) ST31500341AS 9VS14XR9 offset: 63 size: 1465138552
Jun 14 11:33:56 Tower kernel: md: import disk10: [3,0] (hda) WDC WD5000AAJB-00UHA0 WD-WCAPW1067933 offset: 63 size: 488386552
Jun 14 11:33:56 Tower kernel: md: import disk11: [3,64] (hdb) MAXTOR STM3500630A 5QG05GHX offset: 63 size: 488386552

Once unRAID was up and running, I decided to stop the array and reboot one more time for a clean start. However, after I pressed the "Stop" button on unRAID's management web page, it reported all disks except parity as "unformatted". After several attempts to stop the array, the system finally reported that all disks were stopped and NONE of them was "unformatted". I think this might be due to the known issue (the 3rd one) in the following post:
http://lime-technology.com/forum/index.php?topic=2092.msg15290#msg15290

So, to summarize my points:
(a) If you make any change to your unRAID server, keep a copy of your configuration. Print the contents of the "Devices" page of your unRAID management web page.
(b) We need better verification when associating disks with the configuration in unRAID.
(c) If unRAID reports some disks as "unformatted", don't panic unless you are sure those disks are newly added and were never used before; otherwise, just wait a little longer and give it several tries.
(d) We need a better way to distinguish a truly "unformatted" disk from a falsely reported one. Otherwise this "Unformatted" state is very misleading and scary for many general users who may not have in-depth knowledge of what is going on.
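A note on reading the syslog: the slots= argument passed to modprobe md-mod appears to be a list of major,minor device-number pairs, one pair per array slot, with 0,0 for a slot whose configured location was not found. That is an interpretation drawn from this log (each nonzero pair lines up with the kernel's "import diskN: [major,minor]" line), not documented unRAID behavior. The sketch below decodes the value using standard Linux device numbers (major 8 for sd disks, major 3 for hda/hdb); `decode_slots` is a hypothetical helper name.

```python
def dev_name(major, minor):
    """Best-effort kernel name for a whole-disk device number (Linux conventions):
    major 8 = SCSI/SATA disks sda..sdp, 16 minors per disk;
    major 3 = first IDE channel, hda at minor 0 and hdb at minor 64.
    Anything else is shown as the raw [major,minor] pair."""
    if major == 8 and minor % 16 == 0:
        return "sd" + chr(ord("a") + minor // 16)
    if major == 3 and minor in (0, 64):
        return "hda" if minor == 0 else "hdb"
    return f"[{major},{minor}]"


def decode_slots(param):
    """Split a slots=... value into slot -> device name, None where the pair is 0,0."""
    nums = [int(n) for n in param.split(",")]
    if len(nums) % 2:
        raise ValueError("slots= should contain major,minor pairs")
    pairs = zip(nums[0::2], nums[1::2])
    return {slot: None if (major, minor) == (0, 0) else dev_name(major, minor)
            for slot, (major, minor) in enumerate(pairs)}


if __name__ == "__main__":
    slots = "0,0,8,112,0,0,8,64,8,48,0,0,0,0,8,80,8,16,8,0,3,0,3,64,0,0,0,0,0,0,0,0"
    for slot, dev in decode_slots(slots).items():
        print(f"disk{slot}: {dev or 'missing'}")
```

For the value in this log, slots 0, 2, 5 and 6 decode as 0,0, matching the four "diskN missing" lines, and the others decode to sdh, sde, sdd, sdf, sdb, sda, hda and hdb exactly as imported. Slots 12 to 15 are also 0,0, but the kernel does not report them as missing, presumably because they were never configured on this 12-disk array.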
June 15, 2009

Your description and advice are quite complete and really need no further comment, but I thought I would add 2 comments anyway, mostly for the benefit of later readers.

There has been considerable discussion of the "Unformatted" issue (see "Why is a disk showing as Unformatted?"), and we all agree with your conclusions in (d).

I think what you experienced after the BIOS upgrade is the same as changing to a new motherboard. The usual instructions for a major hardware change are:
* Note the device assignments
* Change the hardware
* After booting the new system, check and correct the device assignments

The BIOS upgrade probably changed the ACPI and other tables, which can change the timing and order of device discovery, which in turn can change the device ID assignments. So it essentially looks and acts somewhat like a different motherboard: not very different, but just enough to give you all of that trouble. Nice work figuring it out on your own. A Linux kernel upgrade can sometimes cause the same device changes and trouble. It is important for all users to keep a copy of their device assignments on hand, as a printout or screen capture of the Devices tab.

Regarding your point "(b) We need to have better verification in associating disks to configuration in unRaid":

A while back, I requested an improvement related to this: essentially, that Tom drop the slot IDs (e.g. pci-0000:00:1f.2-scsi-0:0:1:0) from the disk.cfg file and just loop through the found drives, matching drive serial numbers to disk numbers. That would avoid situations like yours, as well as the ones that arise when users change motherboards or when an occasional Linux kernel change moves the device IDs around. It would take us a long step toward complete hardware independence: you could carry just your drives and flash drive across the country, plug them into any compatible system, and the entire array would come up immediately with valid parity.
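The serial-number matching proposed above might look roughly like this. It is a hypothetical sketch of the idea, not how unRAID's disk.cfg or emhttp actually work: the configuration is assumed to store only slot -> serial, and each found drive is identified by a udev-style ata-<model>_<serial> ID like those in the syslog. Taking the serial as the text after the last underscore is also an assumption; it holds for the IDs in this thread but not necessarily for every drive.

```python
def serial_from_id(disk_id):
    """'ata-ST31500343AS_9VS05D9X' -> '9VS05D9X'.

    Assumes the serial is whatever follows the last underscore in the ID.
    """
    return disk_id.rsplit("_", 1)[-1]


def assemble_by_serial(config, found_ids):
    """config: slot -> serial (no location stored at all).
    found_ids: kernel dev -> disk id, e.g. {"sdk": "ata-ST31500343AS_9VS05D9X"}.

    Returns (slot -> dev, missing slots, unassigned devs), so the array can be
    assembled no matter which controller port each drive ended up on.
    """
    by_serial = {}
    for dev, disk_id in found_ids.items():
        serial = serial_from_id(disk_id)
        if serial in by_serial:
            # Refuse to guess if two drives report the same serial.
            raise ValueError(f"duplicate serial {serial}: {by_serial[serial]} and {dev}")
        by_serial[serial] = dev

    assignment, missing = {}, []
    for slot, serial in sorted(config.items()):
        dev = by_serial.pop(serial, None)
        if dev is None:
            missing.append(slot)
        else:
            assignment[slot] = dev

    # Whatever is left over was found but is not part of the configured array.
    return assignment, missing, sorted(by_serial.values())
```

Because no location is stored, a BIOS update, motherboard swap or kernel change that renumbers the ports would not affect the result; only a drive that is actually absent shows up in the missing list.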
June 15, 2009

Quote: after I press "stop" button in unRaid's management web page, all disks except parity are "unformatted". After several tries in stopping this unRaid, system finally report all disks are stopped and NONE of them is "unformatted".

To help others...
Were you logged in, with your current directory on one of your disks, keeping it busy?
or
Did you have an add-on process of some kind running that was accessing a disk when you attempted to stop the array? Perhaps the cache_dirs script?
or
Was your system completely stock, with no script or added function running, and you were not logged in via telnet?

I'm hoping it was something simple that was accessing a disk and keeping it busy... Please let us know.

PS. Your approach was exactly right: log off, or change directory, or stop the process accessing the disks, then press Stop once more.

Joe L.
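Joe's checklist comes down to finding whatever is keeping a disk busy. On a typical Linux system, `fuser -m <mount point>` or `lsof` answers that if they are installed; the following is a minimal Python sketch of the same idea, assuming a Linux /proc filesystem and that the array disks are mounted under paths such as /mnt/disk1 (the function name `busy_pids` is hypothetical). It lists processes whose current directory, root, or open files lie on the given mount point.

```python
import os


def busy_pids(mount_point):
    """Return sorted PIDs whose cwd, root, or open files lie under mount_point.

    Linux-only sketch: reads the /proc/<pid>/cwd, /proc/<pid>/root and
    /proc/<pid>/fd/* symlinks. Processes that exit mid-scan, or that we are
    not allowed to inspect (when not running as root), are skipped.
    """
    mount_point = os.path.realpath(mount_point)
    prefix = mount_point.rstrip("/") + "/"

    def under(path):
        return path == mount_point or path.startswith(prefix)

    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        base = os.path.join("/proc", entry)
        links = [os.path.join(base, "cwd"), os.path.join(base, "root")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                if under(os.readlink(link)):
                    pids.add(int(entry))
                    break
            except OSError:
                continue
    return sorted(pids)
```

For example, `busy_pids("/mnt/disk1")` before pressing Stop would show a telnet shell sitting in that disk, or a scan in progress. A periodic scanner like cache_dirs only holds the disk while it is actually scanning, so it may appear in one check and not the next.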
June 15, 2009 (Author)

Quote from Joe L.: Did you have an add-on process of some kind running that was accessing a disk at the time you attempted to stop the array? Perhaps the cache_dirs script?

I did log in, but I only changed directory to /dev to try to figure out what was missing. My system has some add-on scripts, and I believe it was cache_dirs that contributed to this issue, since it is the one that is busy once the system is up.
June 15, 2009

Quote: My system has some add-on scripts and i believe it should be the cache_dirs that contribute to this issue since it is the one busy once system is up

Thanks... I'll add a warning to the cache_dirs thread, so that users are not caught by surprise. It also explains why you were able to press "Stop" a few times and eventually catch the cache_dirs script between its scans of the file systems, letting the array shut down. (While it is scanning, the file systems are "busy".)

Joe L.