Lost array config on reboot 5.0b4



I did a fresh install of 5.0b4 on my server, which uses an Adaptec 1430 and the onboard Intel controller. The install went swimmingly. While I had the stick out, I copied preclear and unmenu onto it, but I haven't run the install script for unmenu yet. I am still in the process of loading this array, so I haven't assigned a parity disk yet. I saved a couple of files to one of the disk shares over SMB, and then I shut the server down. SNAFU time... On reboot unRAID could not find my disks, so the array couldn't start properly. I pulled a syslog (attached, filename ending in "a") and then reassigned the disks to the array as indicated on the main page. Well, it's up and running again for now, and I am not that dependent on this server anyhow.

 

To shut the server down, I stopped the array and then clicked Shutdown on the Main tab of the GUI. Did I miss anything important?

 

While looking for the syslog, I found that the Log button in the upper right is not working. For me it only opens a new, empty window pointing to "http://192.168.0.10/update.htm?cmd=/usr/bin/tail%20-f%20/var/log/syslog&forkCmd=Start" (running FF 3.6 on OS X). The syslog works fine from the Utils tab.

 

For good measure, I have also attached the syslog from when I rebuilt the array (filename ending in "b").

 

Ideas, anyone?

syslog110210a.txt.zip

syslog110210b.txt.zip

Link to comment

You're having exactly the same problem I am, which I think now confirms an actual bug.

 

I'm using disk4 as an example: Hitachi_HDS722020ALA330_JK11C1YAJLXWRV

 

disk.cfg:

disk4=pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:0:0/0:0:0:0

 

syslog110211b.txt:

Feb 11 22:40:59 Tower emhttp: Hitachi_HDS722020ALA330_JK11C1YAJLXWRV (sdc) 1953514552 => pci0000:00/0000:00:01.0/0000:01:00.0/host4/target4:0:0/4:0:0:0

 

syslog110212a.txt:

Feb 12 00:58:21 Tower emhttp: Hitachi_HDS722020ALA330_JK11C1YAJLXWRV (sdc) 1953514552 => pci0000:00/0000:00:01.0/0000:01:00.0/host1/target1:0:0/1:0:0:0

 

Notice how this disk is getting detected differently every time you reboot?

 

I'm thinking that relying on the PCI device mapping in Linux to decide whether a drive is present is not the best approach.
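To make the difference easier to see, here is a minimal Python sketch that splits the three paths quoted above into the PCI part and the SCSI host number. The function names (`split_path`, `compare`) are my own for illustration and are not part of unRAID; the path strings are copied from disk.cfg and the two syslogs.

```python
import re

# Paths copied from disk.cfg and the two syslog excerpts quoted above.
SAVED = "pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:0:0/0:0:0:0"
BOOT_1 = "pci0000:00/0000:00:01.0/0000:01:00.0/host4/target4:0:0/4:0:0:0"
BOOT_2 = "pci0000:00/0000:00:01.0/0000:01:00.0/host1/target1:0:0/1:0:0:0"


def split_path(path):
    """Split a path into (PCI part, SCSI host number)."""
    match = re.fullmatch(r"(?P<pci>.+)/host(?P<host>\d+)/target.*", path)
    if match is None:
        raise ValueError(f"unrecognized path: {path}")
    return match.group("pci"), int(match.group("host"))


def compare(saved, seen):
    """Report which parts of the saved path still match the detected one."""
    saved_pci, saved_host = split_path(saved)
    seen_pci, seen_host = split_path(seen)
    return {
        "same_controller": saved_pci == seen_pci,
        "same_host": saved_host == seen_host,
        "hosts": (saved_host, seen_host),
    }


for label, path in (("first boot", BOOT_1), ("second boot", BOOT_2)):
    print(label, compare(SAVED, path))
```

For both boots the controller part is identical and only the host number differs from the saved value, which is consistent with the disk being on the same port but enumerated in a different order.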

 

 

Link to comment

I haven't had issues with my Slackware Current distro based unRAID system running 5.0b4. My hardware always gets detected the same way and in the same PCI device slots. Yet on a pure unRAID 5.0b4 it seems like the devices tend to flop around more. The reason I mention the Slackware Current distro is because it uses identical or very similar udev and libata packages. Though there could be something with my Slack system being persistent on hdd that alleviates the device flopping.

Link to comment

I could be mistaken, but previous versions of unRAID used /dev/disk/by-path. I wonder if it would be better if Tom switched to /dev/disk/by-id? Those entries contain the actual names of the disks, which I believe are unique. Drives could then flop around as much as they want and still be detected by unRAID.

Link to comment

The issue is dependent on h/w, drivers, and udev.  It stems from the unRAID concept of "slots".  A slot is a particular location into which you can install a hard drive.  The best way to visualize this is a server that consists of 3-in-2, 4-in-3, or 5-in-3 drive cages where you can plug a drive into a backplane.  The backplane connector is attached to a cable (SATA or IDE), which is attached to a specific "port" (a connector on the motherboard or disk controller).  Slots are labeled "Parity", "Disk1", "Disk2", etc.

 

What's important about slots is that they provide a physical link between "array devices" and physical hard drives.  For example, if an error message says "disk2 has an error", you know which physical hard drive this is, i.e., you can point to it, or remove it from its slot.

 

Having slots also provides a degree of safety when performing array maintenance operations.  For example, suppose disk6 gets disabled.  You can power down, remove disk from slot 6, install new disk, power up, and system will recognize automatically that new disk has been installed into a previously-disabled disk's slot.  In similar manner, system can automatically detect other array config changes.
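As a rough illustration of the slot idea (this is only a sketch, not unRAID's actual logic), the comparison between the saved configuration and what is found at boot could look like the Python below. The function name `classify_slots`, the slot labels, and the serial strings are all hypothetical.

```python
def classify_slots(saved, detected):
    """Compare saved slot assignments with the drives detected at boot.

    Both arguments map a slot label (e.g. "disk6") to a drive serial,
    or to None when nothing is present in that slot.
    """
    report = {}
    for slot, old_serial in saved.items():
        new_serial = detected.get(slot)
        if new_serial is None:
            report[slot] = "missing"
        elif new_serial == old_serial:
            report[slot] = "unchanged"
        else:
            # A different drive sits in a known slot: treat as a replacement.
            report[slot] = "replaced"
    for slot in detected:
        if slot not in saved:
            report[slot] = "new"
    return report


# Hypothetical example: the drive in disk6 was swapped while powered down.
saved = {"parity": "SERIAL_P", "disk1": "SERIAL_A", "disk6": "SERIAL_F"}
detected = {"parity": "SERIAL_P", "disk1": "SERIAL_A", "disk6": "SERIAL_NEW"}
print(classify_slots(saved, detected))
```

The sketch only works if the slot label for a given physical port is the same on every boot. If the labels shift between boots, every drive looks "replaced" or "missing", which matches the symptoms reported in this thread.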

 

Anyway, it has become the "linux way" to completely dis-associate all things physical from the OS.  In old days, "hda" referred to master on first IDE port, "hdb" to the slave on first IDE port.  Well now, there are no more "hd"'s (generally), and a particular drive can be any identifier, and can change from boot to boot, all dependent on hard drive discovery order.

 

Personally, I think Linux designers have gone overboard with their virtualization zeal. It is sometimes desirable to relate a particular hard drive to its identifier.  The 'udev' subsystem (which manages device discovery) provides a mechanism for this called "/dev/disk/by-path".  There have been a few problems with this over time.  First, the udev developers have from time to time changed the format of the entries in "/dev/disk/by-path".  Second, in the latest upgrade, they decided to leave SAS controllers out of "by-path" entirely, for whatever reason.

 

A more troubling problem however, is that ultimately it's up to the device drivers to register devices via udev.  It's up to this code to provide consistent mechanism so that 'by-path' works.  But it appears that this can no longer be relied upon.

 

It is not an option for unRAID to just "freeze" which kernel/udev version will be supported.  It is also not an option to request linux kernel developers to please do the right thing so unRAID works.  It is also not an option for me to go into device drivers and install fixes, or into udev and make custom "by-path" entries.  I think the only choice is to abandon the "slot" concept entirely.

 

So... 5.0-beta5 will include some changes that will make device discovery reliable, but at the expense of a bit more user intervention.  For example, when you replace a disabled disk, you will have to "assign" it via the webGui... stuff like that.  Actually, this might not be so bad since it will make other features easier to implement, such as hot standby.

 

I'm working on these changes now.

 

BTW, "enterprise-class" disk arrays use the concept of "slots", but they don't rely on underlying OS to identify them (for the most part).  Instead, they typically employ enclosure management h/w and s/w: specific LED's or display panels that operator can use to identify hard devices.  This stuff is obviously overkill for a typical unRAID server.  So instead, I will implement utilities to do things like, "blink disk3 activity LED", or "blink all activity LED's except disk5", etc.

Link to comment

Thanks for the clarification Tom..

 

What about /dev/disk/by-id?

 

I've been keeping an eye out on this and mine is consistent throughout reboots.

 

for example:

 

ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-ST31500341AS_9VS0BGPA -> ../../sdf
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-ST31500341AS_9VS0BGPA-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-ST31500341AS_9VS0BM7F -> ../../sde
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-ST31500341AS_9VS0BM7F-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-ST31500341AS_9VS0MCCM -> ../../sdi
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-ST31500341AS_9VS0MCCM-part1 -> ../../sdi1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-ST31500341AS_9VS0NMBJ -> ../../sdg
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-ST31500341AS_9VS0NMBJ-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-ST3500630AS_9QG3Y3DJ -> ../../sdc
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-ST3500630AS_9QG3Y3DJ-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-WDC_WD15EARS-00Z5B1_WD-WMAVU1800030 -> ../../sdb
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-WDC_WD15EARS-00Z5B1_WD-WMAVU1800030-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-WDC_WD2001FASS-00W2B0_WD-WMAY00153458 -> ../../sdh
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-WDC_WD2001FASS-00W2B0_WD-WMAY00153458-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 ata-WDC_WD5000AAKS-00YGA0_WD-WCAS82838546 -> ../../sdd
lrwxrwxrwx 1 root root 10 Feb 10 23:45 ata-WDC_WD5000AAKS-00YGA0_WD-WCAS82838546-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0BGPA -> ../../sdf
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0BGPA-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0BM7F -> ../../sde
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0BM7F-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0MCCM -> ../../sdi
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0MCCM-part1 -> ../../sdi1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0NMBJ -> ../../sdg
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_ST31500341AS_9VS0NMBJ-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_ST3500630AS_9QG3Y3DJ -> ../../sdc
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_ST3500630AS_9QG3Y3DJ-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_WDC_WD15EARS-00_WD-WMAVU1800030 -> ../../sdb
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_WDC_WD15EARS-00_WD-WMAVU1800030-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_WDC_WD2001FASS-_WD-WMAY00153458 -> ../../sdh
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_WDC_WD2001FASS-_WD-WMAY00153458-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 scsi-SATA_WDC_WD5000AAKS-_WD-WCAS82838546 -> ../../sdd
lrwxrwxrwx 1 root root 10 Feb 10 23:45 scsi-SATA_WDC_WD5000AAKS-_WD-WCAS82838546-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 usb-SanDisk_Cruzer_Gator_1738121D52C38F73-0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 10 23:45 usb-SanDisk_Cruzer_Gator_1738121D52C38F73-0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x5000c500105b9a52 -> ../../sde
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x5000c500105b9a52-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x5000c500105bc00f -> ../../sdf
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x5000c500105bc00f-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x5000c500108fc8e7 -> ../../sdi
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x5000c500108fc8e7-part1 -> ../../sdi1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x5000c50010b4b2c0 -> ../../sdg
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x5000c50010b4b2c0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x50014ee0574f4a79 -> ../../sdb
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x50014ee0574f4a79-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x50014ee057a49a08 -> ../../sdh
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x50014ee057a49a08-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 Feb 10 23:45 wwn-0x50014ee2ab6ccde1 -> ../../sdd
lrwxrwxrwx 1 root root 10 Feb 10 23:45 wwn-0x50014ee2ab6ccde1-part1 -> ../../sdd1

 

What would be the potential issues referring to, let's say, /dev/disk/by-id/scsi-SATA_ST3500630AS_9QG3Y3DJ, assigning that to the slot?

 

The drive name/model number should never change...

 

Maybe I'm missing something.
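For anyone who wants to check this on their own box, here is a small Python sketch that reads the by-id symlinks and reports which kernel device each one currently points to. The function name `map_by_id` is made up for this example; the directory is a parameter so the function can be pointed at a test directory.

```python
import os
import re


def map_by_id(directory="/dev/disk/by-id"):
    """Return {by-id name: kernel device name}, skipping partition links."""
    mapping = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        # Skip anything that is not a symlink, and skip "-partN" entries.
        if not os.path.islink(full) or re.search(r"-part\d+$", name):
            continue
        mapping[name] = os.path.basename(os.readlink(full))
    return mapping


if os.path.isdir("/dev/disk/by-id"):
    for disk_id, device in map_by_id().items():
        print(f"{disk_id} -> {device}")
```

Running it after each reboot and comparing the output shows whether the by-id names stay the same while the sdX letters move around.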

Link to comment

What about this..

 

/dev/disk/by-id/scsi-SATA_ST3500630AS_9QG3Y3DJ points to sdc

/dev/disk/by-path shows that sdc is pci-0000:03:00.0-scsi-2:0:0:0

 

What about "going further down the rabbit hole"?

 

Assuming (maybe incorrectly?) that the /dev/disk/by-id/### will never change, but will always point to the actual device (ie: sdc) that was detected on boot up.

 

Next time you reboot, /dev/disk/by-id/### may point to sde instead of sdc, but following sde to by-path will yield its exact location.

 

Hack job? :)
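A minimal Python sketch of that two-step lookup, assuming both directories exist and contain symlinks like the ones listed earlier. The function names `read_links` and `locate` are hypothetical, and the result is only as stable as the by-path names themselves.

```python
import os


def read_links(directory):
    """Return {link name: kernel device name} for symlinks in a directory."""
    if not os.path.isdir(directory):
        return {}  # e.g. no by-path entries at all on this system
    links = {}
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if os.path.islink(full):
            links[name] = os.path.basename(os.readlink(full))
    return links


def locate(disk_id, by_id_dir="/dev/disk/by-id", by_path_dir="/dev/disk/by-path"):
    """Follow by-id -> sdX -> by-path; returns (device, path name)."""
    device = read_links(by_id_dir).get(disk_id)
    if device is None:
        return None, None
    for path_name, target in sorted(read_links(by_path_dir).items()):
        if target == device:
            return device, path_name
    return device, None  # drive found, but no by-path entry for it
```

For example, `locate("scsi-SATA_ST3500630AS_9QG3Y3DJ")` on the system above would be expected to return sdc together with its by-path name, as long as a by-path entry exists for that controller.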

 

 

 

Link to comment

As Limetech indicated /dev/disk/by-path/ entries do NOT exist for SAS controllers so it is not a solution for this customer.

 

I have to agree, the devs went a bit overboard on their virtualization. If they're going to switch over as they did in udev, they could at least be sure to implement all the features that were used previously.  :(

Link to comment

I guess a simple table could map each WWN to the drive's serial number and brand.

 

In unmenu, when I run hdparm on a drive, I get all this info:

 

 

/dev/sdn:

ATA device, with non-removable media
Model Number:       WDC WD7500AACS-00ZJB0
Serial Number:      WD-WCASM0013559
Firmware Revision:  01.01B01
Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors: 1465149168
Logical/Physical Sector size:           512 bytes
device size with M = 1024*1024:      715404 MBytes
device size with M = 1000*1000:      750156 MBytes (750 GB)
cache/buffer size  = 16384 KBytes
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 0
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
    Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
    Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
  * SMART feature set
    Security Mode feature set
  * Power Management feature set
  * Write cache
  * Look-ahead
  * Host Protected Area feature set
  * WRITE_BUFFER command
  * READ_BUFFER command
  * NOP cmd
  * DOWNLOAD_MICROCODE
    Power-Up In Standby feature set
  * SET_FEATURES required to spinup after power up
    SET_MAX security extension
    Automatic Acoustic Management feature set
  * 48-bit Address feature set
  * Device Configuration Overlay feature set
  * Mandatory FLUSH_CACHE
  * FLUSH_CACHE_EXT
  * SMART error logging
  * SMART self-test
  * General Purpose Logging feature set
  * 64-bit World wide name
  * {READ,WRITE}_DMA_EXT_GPL commands
  * Segmented DOWNLOAD_MICROCODE
  * Gen1 signaling speed (1.5Gb/s)
  * Gen2 signaling speed (3.0Gb/s)
  * Native Command Queueing (NCQ)
  * Host-initiated interface power management
  * Phy event counters
  * DMA Setup Auto-Activate optimization
  * Software settings preservation
  * SMART Command Transport (SCT) feature set
  * SCT Long Sector Access (AC1)
  * SCT LBA Segment Access (AC2)
  * SCT Error Recovery Control (AC3)
  * SCT Features Control (AC4)
  * SCT Data Tables (AC5)
    unknown 206[12] (vendor specific)
    unknown 206[13] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
208min for SECURITY ERASE UNIT. 208min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2ab848faa
NAA : 5
IEEE OUI : 0014ee
Unique ID : 2ab848faa
Checksum: correct

 

 

I guess Tom could also get the info he collects now this way?

 

EDIT: I just checked my Samsung and Hitachi drives; they all give me the same info and all have WWN numbers.
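Something like the following Python sketch could build one row of such a table from the hdparm text. It assumes output in the format shown above; the sample string is trimmed from that output, and the names `parse_identity` and `split_wwn` are my own. The WWN split follows the NAA 5 layout that hdparm itself prints (1 hex digit NAA, 6 hex digits OUI, 9 hex digits unique ID).

```python
import re

# A few lines trimmed from the hdparm output above.
SAMPLE = """\
Model Number:       WDC WD7500AACS-00ZJB0
Serial Number:      WD-WCASM0013559
Firmware Revision:  01.01B01
Logical Unit WWN Device Identifier: 50014ee2ab848faa
"""

FIELDS = {
    "model": r"Model Number:\s*(.+)",
    "serial": r"Serial Number:\s*(.+)",
    "wwn": r"WWN Device Identifier:\s*([0-9a-fA-F]{16})",
}


def parse_identity(text):
    """Extract model, serial, and WWN; missing fields come back as None."""
    result = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        result[key] = match.group(1).strip() if match else None
    return result


def split_wwn(wwn):
    """Split a 16-hex-digit NAA 5 WWN into NAA, IEEE OUI, and unique ID."""
    if len(wwn) != 16 or wwn[0] != "5":
        raise ValueError(f"not an NAA 5 WWN: {wwn}")
    return {"naa": wwn[0], "oui": wwn[1:7], "unique_id": wwn[7:]}


row = parse_identity(SAMPLE)
print(row)
print(split_wwn(row["wwn"]))
```

Collecting one row per drive gives the WWN to serial number mapping, although it still says nothing about which physical slot the drive is plugged into.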

Link to comment

What about this..

 

/dev/disk/by-id/scsi-SATA_ST3500630AS_9QG3Y3DJ points to sdc

/dev/disk/by-path shows that sdc is pci-0000:03:00.0-scsi-2:0:0:0

 

What about "going further down the rabbit hole"?

 

Assuming (maybe incorrectly?) that the /dev/disk/by-id/### will never change, but will always point to the actual device (ie: sdc) that was detected on boot up.

 

Next time you reboot, /dev/disk/by-id/### may point to sde instead of sdc, but following sde to by-path will yield its exact location.

 

Hack job? :)

 

LOL you have described how it works now.  The problem is the 'by-path' changes.

Link to comment

I am not an expert, and I have no clue whether you can get this info in unRAID,

but why not use the WWN numbers?

They should be unique.

In professional arrays and the FC world, everything is handled by WWN numbers.

I see that all my WD disks have a WWN number,

and I know for a fact that SAS drives also have WWN numbers.

 

 

 

Problem isn't in identifying the device, it's identifying the slot the device is plugged into.

Link to comment

My setup started doing this today as well.

 

I moved some disks around during a case cleanup, but I got a couple of cables wrong.  I swapped the assignments and started the array, then decided to just go ahead and fix the cables.  After I shut down and swapped the cables back, the devices no longer auto-associate.

Once I assign the disks, the array is valid and can start.

 

Link to comment

Yup, same here.

 

I'm glad Lime-Tech knows about it. I thought something had happened to my flash drive that I just registered. >.<

 

I just bought unRAID, and it had been working great for at least a week with the 3 drives I started with. As soon as I added the 4th and rebooted, it started doing this, which I thought was interesting... Kinda scary since all my stuff is on it. That's what I get for going with a beta, I guess.  :P

 

This is slightly comforting...

Link to comment
