Unable to start array - previously working system - 4.6 - unRAID Server 4.5

December 30, 2010

Up until today I have had no issues with the system: I would power up the box and the array would start and come online.

Today, for some reason, the array shows as "Stopped. Configuration valid." When I click Start, the screen changes to "Starting..." and the drive free-space information switches to "Mounting". When I refresh, the screen shows all 4 drives as spun up and the status is back to "Stopped. Configuration valid.", regardless of how long I let the system sit. I am currently running a 6-drive license.

I am running three 1 TB data drives along with a parity drive. I double-checked that all the devices were assigned correctly, and nothing has changed since I put the third data disk in about a month ago.

I have attached a copy of the syslog file after trying to start the array.

Any help would be greatly appreciated.

syslog_122910.zip

January 1, 2011

Since no one else has replied yet, I'll offer my feeble newb attempt to help. First off, I'm still on 4.53. I've seen similar behavior: from "unraid main" I click on stop array and it shows "unmounting" till the end of time, refresh all you want. If I then click on "unmenu main", the array immediately shows as stopped, and clicking on "unraid main" again shows it as stopped too. It's like refresh is not refreshing. Of course, this behavior is intermittent.

Not much help for your problem but maybe this will help you with your refresh issue.

Told you my attempt would be feeble.

January 1, 2011

It looks like one of the drives may be corrupted. The likely candidate is sdb, the Western Digital WDC_WD10EARS-00MVWB0_WD-WMAZA1113749. The system cannot find the first partition on that drive at all.

Dec 30 01:31:42 Tower emhttp: shcmd (1): udevadm settle

Dec 30 01:31:42 Tower emhttp: Device inventory:

Dec 30 01:31:42 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) WDC_WD10EADS-00L5B1_WD-WCAU49656278

Dec 30 01:31:42 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:1:0 host1 (sdb) WDC_WD10EARS-00MVWB0_WD-WMAZA1113749

Dec 30 01:31:42 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdc) WDC_WD10EADS-00L5B1_WD-WCAU45672327

Dec 30 01:31:42 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 host3 (sdd) WDC_WD1001FALS-00J7B0_WD-WMATV1220945

Dec 30 01:31:42 Tower emhttp: get_fstype: open /dev/sdb1: No such file or directory

... snip ... snip ... snip ...

Dec 30 01:31:42 Tower emhttp: shcmd (7): /usr/local/sbin/set_ncq sdb 1 >/dev/null

Dec 30 01:31:42 Tower emhttp: mdcmd: write: No such device or address

Dec 30 01:31:42 Tower kernel: mdcmd (12): start STOPPED

Dec 30 01:31:42 Tower kernel: md: do_run: lock_rdev error: -6

... snip ... snip ... snip ...

Dec 30 01:36:49 Tower emhttp: shcmd (19): /usr/local/sbin/set_ncq sdb 1 >/dev/null

Dec 30 01:36:49 Tower emhttp: mdcmd: write: No such device or address

Dec 30 01:36:49 Tower kernel: mdcmd (13): start STOPPED

Dec 30 01:36:49 Tower kernel: md: do_run: lock_rdev error: -6
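For anyone wanting to pull the same clues out of their own syslog, here is a rough sketch. The function names are made up for illustration, and the patterns are based only on the message formats in the excerpt above, so other unRAID versions may word things differently.

```shell
#!/bin/sh
# Hypothetical helpers, not part of unRAID. They only read a saved syslog file.

# List every device node that emhttp failed to open while probing
# filesystems (the "get_fstype: open ...: No such file or directory" lines).
# $1 = path to a syslog file
missing_partitions() {
    sed -n 's|.*get_fstype: open \(/dev/[a-z0-9]*\): No such file or directory.*|\1|p' "$1" | sort -u
}

# Count how many times the md driver refused to start the array
# (the "lock_rdev error" lines).
# $1 = path to a syslog file
count_start_failures() {
    grep -c 'md: do_run: lock_rdev error' "$1"
}
```

Run against the excerpt above, `missing_partitions` would be expected to name /dev/sdb1, and `count_start_failures` counts one failure per start attempt.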

January 1, 2011


If the drive is readable/writable, but the MBR is corrupted, it will appear as if the partition does not exist, and no /dev/sdb1 will exist.
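A quick way to confirm whether the kernel registered the partition at all is to look for it in /proc/partitions. This is only a sketch: `partition_known` is a made-up helper name, and it assumes the usual four-column layout of that file (major, minor, #blocks, name).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the kernel lists the
# given partition name, fail otherwise.
# $1 = partition name without the /dev/ prefix, e.g. sdb1
# $2 = table to read (optional, defaults to /proc/partitions)
partition_known() {
    awk -v name="$1" '$4 == name { found = 1 } END { exit !found }' "${2:-/proc/partitions}"
}
```

For example, `partition_known sdb1 || echo "kernel did not register sdb1"`. If sdb itself is listed but sdb1 is not, the disk is responding but its partition table was not read, which fits the corrupted-MBR theory.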

Soooo...

First let's see if the drive can be read at all.

Please post the output of the following command:

dd if=/dev/sdb count=1 | od -x -A d

The output should look a little like this (but with different values):

root@Tower:# dd if=/dev/sdb count=1 | od -x -A d

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

0000448 0000 0000 0000 003f 0000 6341 1d38 0000

0000464 0000 0000 0000 0000 0000 0000 0000 0000

*

0000496 0000 0000 0000 0000 0000 0000 0000 aa55

0000512

1+0 records in

1+0 records out

512 bytes (512 B) copied, 0.0190986 s, 26.8 kB/s
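A note on reading that dump: `od -x` prints 16-bit words in little-endian order, so the boot signature bytes 55 aa at offset 510 show up as "aa55", and the "003f" at offset 454 is the low word of the first partition's starting sector (63 in this sample). The sketch below pulls out just those two fields byte by byte. The function names are made up for illustration, and it only reads from the device or image file you give it.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the first sector of a disk or image.

# Print the two-byte boot signature at offset 510 as hex.
# A valid MBR gives "55aa".
# $1 = device or image file
mbr_signature() {
    dd if="$1" bs=512 count=1 2>/dev/null | od -A n -t x1 -j 510 -N 2 | tr -d ' \n'
}

# Print the starting sector (LBA) of the first partition entry, stored
# as a little-endian 32-bit value at byte offset 454.
# Assumes the input holds at least one full 512-byte sector.
# $1 = device or image file
mbr_first_start() {
    # od prints the four bytes as decimal numbers; load them into $1..$4
    set -- $(dd if="$1" bs=512 count=1 2>/dev/null | od -A n -t u1 -j 454 -N 4)
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

For example, `mbr_signature /dev/sdb` followed by `mbr_first_start /dev/sdb`. A signature other than 55aa would suggest the partition table itself is damaged.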

Also post the output of

fdisk -l /dev/sdb

and

hdparm -N /dev/sdb

Assuming the drive responds, we can rewrite the MBR with a utility I wrote for another user long ago. It will not change the data on the drive; it will just rebuild the partition table as unRAID would have originally created it. If it works, you'll be able to reboot and the partition will be recognized.

Joe L.

January 2, 2011

Author

Well color me confused.

I started up the unit to run the commands listed above and the array was online. The box had been sitting idle since the original posting. I know for a fact I had tried multiple times for a proper boot over a couple of days prior to making my original post.

I am currently copying everything off of sdb1 on the off chance it is an issue with the drive. Right at this moment I am not exactly sure how to proceed.

January 2, 2011

It could be explained by a loose connection to the drive. Vibration or heat could be causing the disk to drop out intermittently.

Once you get the files off of sdb1, stop the array, power down, and check that everything is secure. Re-seat the connectors on both the data and power cables.

Joe L.
