[I'm guilty] Lost a disk after adding a SATA drive

January 4, 2010

Hello,

One of my disks showed up as "unformatted" after I rebooted my unRAID 4.5 system. I had wanted to benchmark an external 2.5" HDD over a SATA connection on the following configuration:

Motherboard: Intel D945GCLF2 (Atom330 @ 1.6GHz)
Memory: 2GB
Hard drive controller #1 (on-board) : Intel 82801GB, ICH7 SATA controller
Hard drive controller #2 (PCI) : Promise SATA 300 TX4
disk1 (connected to ICH7) : Samsung EcoGreen HD154UI, 1.5TB, 32MB cache
disk2 (connected to Promise SATA300) : Samsung EcoGreen HD154UI, 1.5TB, 32MB cache
disk3 (connected to Promise SATA300) : Samsung EcoGreen HD154UI, 1.5TB, 32MB cache
disk4 (parity drive, connected to ICH7) : Samsung EcoGreen HD154UI, 1.5TB, 32MB cache
No cache drive!
NIC (on-board) : Gigabit Realtek 8111C. MTU can be set to a maximum of 7200 on the D945GCLF2.
Case: Chenbro ES34069 Mini-ITX

Here is the disk configuration I wrote down before performing the tasks below.

- Promise card - PORT3 - sda - pci-0000:04:00.0-scsi-2:0:0:0 - Disk2  - SAMSUNG_HD154UI_S1Y6J1KS801960
- Intel board  - SATA0 - sdc - pci-0000:00:1f.2-scsi-0:0:0:0 - Parity - SAMSUNG_HD154UI_S1Y6J1MS706123
- Promise card - PORT4 - sdb - pci-0000:04:00.0-scsi-3:0:0:0 - Disk3 - SAMSUNG_HD154UI_S1Y6J1KS802247
- Intel board  - SATA1 - sdd - pci-0000:00:1f.2-scsi-1:0:0:0 - Disk1  - SAMSUNG_HD154UI_S1Y6J1KS802253 
- Promise card - PORT2 
- Promise card - PORT1 - this is where I plugged in the new drive.
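A table like this one goes stale as soon as device names change, so it may be worth regenerating it from the persistent `/dev/disk/by-id` links instead of writing it down by hand. The Python below is a minimal sketch under that assumption: it maps each whole-disk link (names such as `ata-SAMSUNG_HD154UI_S1Y6J1KS801960`; the exact prefix depends on the driver) to its current `sdX` name, and compares two such snapshots. `snapshot` and `reassigned` are hypothetical helper names, not unRAID tools.

```python
import os


def snapshot(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk by-id link name to its current kernel name (e.g. "sda")."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:  # skip partition links such as "...-part1"
            continue
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping[name] = os.path.basename(target)
    return mapping


def reassigned(before, after):
    """Return (link name, old device, new device) for every disk whose name changed."""
    return [
        (link, before[link], after[link])
        for link in sorted(before)
        if link in after and before[link] != after[link]
    ]
```

Take one snapshot before shutting down, save it (for example with `json.dump`), take another after the reboot and pass both to `reassigned`. In this thread it would have flagged Disk2 moving from sda to sdg.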

What I did was shut the system down properly and then connect the external drive to the spare SATA port (note that the external drive was powered by a dummy USB cable carrying no data). When the system rebooted I couldn't hear the drive spinning (on reflection, I think it did spin up and then went to sleep), so I disconnected and reconnected the SATA cable (first mistake?). Before mounting the device I looked at the web panel to check which device each array disk was assigned to. I didn't notice at the time that the assignments differed from the table above (for example, Disk2 was on sdg rather than sda). In /proc/partitions it looked as though my new external drive was sda, so I mounted it using "mount -t ntfs-3g" and it mounted fine. I then started copying test files between the freshly mounted drive and /mnt/disk2. Everything worked fine.

I then decided to shut the system down. I started by manually unmounting the external NTFS drive and then hit the Stop button on the web interface, but it got stuck on "Stopping array". I went to the console and saw a kernel oops. The last part of the dmesg output is below; the full log can be found here: http://natzo.com/files/dmesg-alphazo-2010-01.txt

sda: detected capacity change from 0 to 640135028736
sda: detected capacity change from 0 to 640135028736
nfsd: last server has exited, flushing export cache
mdcmd (150): spinup 0
mdcmd (151): spinup 1
mdcmd (152): spinup 2
mdcmd (153): spinup 3
BUG: unable to handle kernel NULL pointer dereference at 00000010
IP: [<c10be618>] open_xa_dir+0x26/0x141
*pdpt = 00000000233c8001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:00.0/host3/target3:0:0/3:0:0:0/block/sde/stat
Modules linked in: md_mod xor ata_piix piix sata_promise r8169

Pid: 6026, comm: umount Not tainted (2.6.31.6-unRAID #6)         
EIP: 0060:[<c10be618>] EFLAGS: 00210296 CPU: 1
EIP is at open_xa_dir+0x26/0x141
EAX: f6e25a80 EBX: f76f9a00 ECX: 00000000 EDX: 00000002
ESI: ffffffc3 EDI: 00000000 EBP: f4d13dfc ESP: f4d13dd4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process umount (pid: 6026, ti=f4d12000 task=f6e58000 task.ti=f4d12000)
Stack:
00000002 f7292fd4 00000000 00000000 f4d13e30 00000008 f4d13e00 f7292fd4
<0> f7292fd4 f4d13e44 f4d13e7c c10be94b 00000000 c10befd1 f7292fd4 f7077220
<0> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c10be94b>] ? reiserfs_for_each_xattr+0x51/0x1ca
[<c10befd1>] ? delete_one_xattr+0x0/0x86
[<c10beb17>] ? reiserfs_delete_xattrs+0x13/0x40
[<c10a9ddb>] ? reiserfs_delete_inode+0x0/0xae
[<c10a9e10>] ? reiserfs_delete_inode+0x35/0xae
[<c10633b8>] ? __slab_free+0x59/0x208
[<c1073ca7>] ? __d_free+0x3a/0x3d
[<c10a9ddb>] ? reiserfs_delete_inode+0x0/0xae
[<c10760f7>] ? generic_delete_inode+0x75/0xdb
[<c107616e>] ? generic_drop_inode+0x11/0x171
[<c1075ba2>] ? iput+0x4b/0x4e
[<c107460d>] ? shrink_dcache_for_umount_subtree+0x17a/0x1be
[<c1074e96>] ? shrink_dcache_for_umount+0x2d/0x3a
[<c10692d4>] ? generic_shutdown_super+0x15/0xc0
[<c106939c>] ? kill_block_super+0x1d/0x31
[<c10af027>] ? reiserfs_kill_sb+0x7d/0x80
[<c1069604>] ? deactivate_super+0x35/0x47
[<c1078cc9>] ? mntput_no_expire+0x5a/0x74
[<c10790f7>] ? sys_umount+0x24c/0x272
[<c107912a>] ? sys_oldumount+0xd/0xf
[<c1002935>] ? syscall_call+0x7/0xb
Code: c1 5d 89 c8 c3 55 89 e5 57 56 be c3 ff ff ff 53 83 ec 1c 89 45 dc 89 55 d8 8b 98 a4 00 00 00 8b 83 94 01 00 00 8b b8 80 00 00 00 <8b> 47 10 85 c0 0f 84 06 01 00 00 83 c0 78 e8 13 f0 1c 00 8b 83 
EIP: [<c10be618>] open_xa_dir+0x26/0x141 SS:ESP 0068:f4d13dd4
CR2: 0000000000000010
---[ end trace 8ff8428156e5ea21 ]---
mdcmd (154): stop 
md: 2 devices still in use.
mdcmd (166): spinup 0
mdcmd (167): spinup 1
mdcmd (168): spinup 2
mdcmd (169): spinup 3

After trying several times to use the powerdown script, and even the shutdown command, I decided to power cycle the system.

When it booted back up, unRAID showed my Disk2 on sda, marked as unformatted. I'm now in the middle of a parity check (I'm glad I have a parity disk).

I would like to understand what happened, first so I can avoid this situation in the future, and second so I know how to safely add drives to the system, either as part of the array or as a data source, if possible.

Thank you

Alphazo

January 4, 2010

unRAID does not support hot-plugging. You cannot just plug and unplug SATA devices, even ones that are not part of the array, because the motherboard will often re-assign devices. That is probably the initial cause of your corruption.

It appears that the disk showing as "unformatted" has a corrupt reiserfs file system. Really, it just cannot be mounted, and unRAID's "default" error for a disk it cannot mount is that it is unformatted.
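Because the sdX names can move, a safer habit before mounting a non-array drive by hand is to identify it by serial number rather than by device name. A minimal Python sketch, assuming the `/dev/disk/by-id` links embed the drive serial (as the names in the original poster's table suggest); the serial list is copied from that table, and `links_for` and `check_not_array_disk` are hypothetical helpers:

```python
import os

# Serial numbers of the array members, copied from the disk table earlier in the thread.
ARRAY_SERIALS = (
    "S1Y6J1KS802253",  # disk1
    "S1Y6J1KS801960",  # disk2
    "S1Y6J1KS802247",  # disk3
    "S1Y6J1MS706123",  # parity
)


def links_for(device, by_id_dir="/dev/disk/by-id"):
    """Return the by-id link names that currently resolve to `device`."""
    wanted = os.path.realpath(device)
    return [
        name
        for name in sorted(os.listdir(by_id_dir))
        if os.path.realpath(os.path.join(by_id_dir, name)) == wanted
    ]


def check_not_array_disk(link_names, serials=ARRAY_SERIALS):
    """Raise RuntimeError unless the links prove the device is not an array disk."""
    if not link_names:
        raise RuntimeError("no /dev/disk/by-id link found; cannot identify the disk")
    for name in link_names:
        for serial in serials:
            if serial in name:
                raise RuntimeError(f"{name} is an array disk ({serial}); not mounting it")
```

For example, call `check_not_array_disk(links_for("/dev/sda1"))` and only run `mount -t ntfs-3g` if it returns without raising. It deliberately refuses a device with no by-id link at all, since then nothing shows which physical disk it is.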

You can probably follow the instructions in the wiki to fix the corruption by running reiserfsck on the /dev/mdX device.

January 4, 2010

Author

Hi Joe,

I knew from the beginning that I shouldn't play around with those external SATA drives. Shame on me... I really deserve my status of "unRaid newbie"!

Regarding your comment, what would be the best practice: let the parity check finish and then run reiserfsck, interrupt the parity check to run reiserfsck, or not run reiserfsck at all if the parity check succeeds?

Cheers

Alphazo

January 4, 2010

The parity check is running because you shut down without stopping the array. If the drive is unmountable, all the check does is make sure the parity drive has the same file-system corruption in its calculations as the corrupted data drive.

I'd stop the parity check. It will not help you, and you've probably already lost any chance of recovering by rebuilding from parity.
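Why a rebuild from parity would not help here can be shown with a toy model. Assuming the usual single-parity scheme, where the parity disk stores the XOR of all data disks at each position, a parity sync simply recomputes that XOR from whatever the data disks contain now, corruption included. In this Python sketch byte strings stand in for disks and the contents are made up:

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR of equally sized byte strings, one per disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild(parity, surviving_disks):
    """Reconstruct the one missing disk from parity plus all the other disks."""
    return xor_parity(list(surviving_disks) + [parity])


disk1, disk3 = b"\x10\x20", b"\x01\x02"
good_disk2 = b"\x0f\xf0"
corrupt_disk2 = b"\x00\x00"  # made-up stand-in for the damaged disk2

# Parity synced while disk2 was already corrupt encodes the corruption...
parity = xor_parity([disk1, corrupt_disk2, disk3])
# ...so rebuilding disk2 from it returns the corrupt bytes, not good_disk2.
print(rebuild(parity, [disk1, disk3]))  # b'\x00\x00'
```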

The drive that could not be mounted needs to have a file-system check done on it; then follow whatever reiserfsck tells you to do next (probably run it again with --fix-fixable). Instructions are in the wiki: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

Since the drive is not mounted, you don't need to stop Samba or unmount it: Samba is not accessing it.

After fixing the drive with reiserfsck, you should (hopefully) be able to mount it and get to the files on it. Follow the instructions that reiserfsck reports back.

The command you will need initially (for disk2) is:

reiserfsck /dev/md2

Once the corruption is fixed, you'll need to do a full parity check to verify that parity agrees with the fixed disk.
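What a full parity check verifies can be sketched in a few lines: recompute the XOR of the data disks at every position and compare it with what the parity disk holds. This is toy Python under the same single-XOR-parity assumption, with byte strings standing in for disks; it is not unRAID code. As I understand it, this is also why the wiki has you run reiserfsck on /dev/mdX rather than on the raw /dev/sdX device: writes through the md device update parity as they go, while writes to the raw disk would leave mismatches like the second example.

```python
def parity_mismatches(data_disks, parity):
    """Offsets where the stored parity differs from the XOR of the data disks."""
    mismatches = []
    for offset, column in enumerate(zip(*data_disks)):
        expected = 0
        for byte in column:
            expected ^= byte
        if parity[offset] != expected:
            mismatches.append(offset)
    return mismatches


disks = [b"\x10\x20", b"\x0f\xf0"]
parity = b"\x1f\xd0"  # 0x10 ^ 0x0f, 0x20 ^ 0xf0

print(parity_mismatches(disks, parity))  # []
# A byte changed on the raw disk without updating parity shows up as a mismatch:
print(parity_mismatches([b"\x10\x20", b"\x0f\x00"], parity))  # [1]
```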

January 5, 2010

P.S. ONLY do what reiserfsck tells you. Do NOT use a repair option before it asks for one, and DON'T try it on your parity drive. (grin)

The wiki page should have good instructions. The P.S. above is just so you don't have to learn the hard way what the rest of us did.
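The P.S. boils down to a rule a script can enforce: never pass a repair option that the preceding check run did not itself suggest. A minimal Python sketch that only looks for the option names in the captured output of a check you ran yourself (e.g. `reiserfsck --check /dev/md2`); it does not try to parse reiserfsck's exact wording, which may vary between versions, and the function names are hypothetical:

```python
# Repair modes reiserfsck can be asked to run, least to most invasive.
REPAIR_OPTIONS = ("--fix-fixable", "--rebuild-sb", "--rebuild-tree")


def suggested_options(check_output):
    """Repair options that appear anywhere in the captured check output."""
    return [opt for opt in REPAIR_OPTIONS if opt in check_output]


def require_suggested(option, check_output):
    """Raise ValueError unless reiserfsck itself suggested `option`."""
    if option not in suggested_options(check_output):
        raise ValueError(f"{option} was not suggested by reiserfsck; not running it")
```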

January 5, 2010

Author

As instructed, I interrupted the parity check, ran reiserfsck (no errors or corruptions were reported) and then started a full parity check, which also went fine. I am now back on track. Thank you, everyone, for your helpful comments.

To close the discussion: would you recommend an external eSATA hard drive for backup purposes (not part of the array), assuming it is NOT hot-plugged?

Alphazo

January 5, 2010

Did your disk2 "mount" properly after you checked it using reiserfsck?

January 5, 2010

Author

Interesting remark. It appears to be mounted (I can see my files); however, when I run mount I get:

/dev/md1 on /mnt/disk1 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
/dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
/dev/md2 on /mnt/disk2 type reiserfs (rw)

What happened to the options for /dev/md2? How can I restore them?

Thanks

Alphazo

January 5, 2010

You just did not specify them in your mount command.

Normally those options are given in the initial mount command (and they will be back once you stop and restart the array).

They are not critical for your recovery from your failure.

You can re-enable them with:

mount -o remount,noatime,nodiratime,noacl,nouser_xattr /mnt/disk2
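To spot any disk mounted without the usual options, the `mount` output can be compared with the option set the other array disks use (taken from the disk1/disk3 lines above). A minimal Python sketch; it only builds the remount command string shown above and does not run it, and the function names are hypothetical:

```python
# Options the healthy disks (disk1, disk3) were mounted with, in the thread's order.
EXPECTED_OPTIONS = ("noatime", "nodiratime", "noacl", "nouser_xattr")


def missing_options(mount_output, expected=EXPECTED_OPTIONS):
    """Map each /mnt/diskN mount point to the expected options it lacks."""
    result = {}
    for line in mount_output.splitlines():
        # Expected line shape: "<device> on <mount point> type <fs> (<opt>,<opt>,...)"
        parts = line.split()
        if len(parts) < 6 or parts[1] != "on" or not parts[2].startswith("/mnt/disk"):
            continue
        present = set(parts[5].strip("()").split(","))
        lacking = [opt for opt in expected if opt not in present]
        if lacking:
            result[parts[2]] = lacking
    return result


def remount_command(mount_point, options):
    """Build a 'mount -o remount,...' command string for one mount point."""
    return f"mount -o remount,{','.join(options)} {mount_point}"
```

Fed the three lines quoted above, `missing_options` reports only `/mnt/disk2`, and `remount_command` reproduces the command Joe gave.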

Joe L.

January 6, 2010

Author

Thank you for the tip. Everything is back to how it was. If done properly (i.e. without hot-plugging), is using an external SATA drive for backup purposes still too risky?

January 6, 2010

No, I don't think there is any problem. Obviously, you'll need to mount the extra drive to copy files to it (and it will need to have a file system of some type on it).
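One easy mistake with a backup drive is copying to its mount-point directory while nothing is mounted there, so the data lands on whatever filesystem holds that directory instead (on unRAID that is, as far as I know, the RAM-based root filesystem). A minimal Python sketch that refuses to copy unless the destination really is a mount point; `backup_tree` and any paths you pass it (such as a hypothetical `/mnt/backup`) are placeholders, not unRAID conventions:

```python
import os
import shutil


def backup_tree(source, dest_mount, name):
    """Copy directory `source` to dest_mount/name, but only onto a mounted filesystem."""
    if not os.path.ismount(dest_mount):
        raise RuntimeError(f"{dest_mount} is not a mount point; is the backup drive mounted?")
    target = os.path.join(dest_mount, name)
    # copytree raises FileExistsError if `target` already exists, so nothing is overwritten.
    shutil.copytree(source, target)
    return target
```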

Joe L.
