myndphunkie

Posts posted by myndphunkie

  1. Yep, PEBKAC it is then :-)

     

     

    I think you're right about the data rebuild process. Basically, I had moved files (which would have gone to disk3, as it had the most free space), found an error, pressed restore (even though it said disk contents are not affected), and did a parity sync.

     

    Lesson learned.

     

     

    I'll check the cables out on the weekend.

     

     

    As for the error message(s), dmesg said it was unable to identify the interface. This is the same message in the Wiki:

     

    "ata7: hard resetting link

    ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

    ata7.00: qc timeout (cmd 0xec)

    ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)

    ata7.00: revalidation failed (errno=-5)

    ata7: failed to recover some devices, retrying in 5 secs"
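For anyone wanting to pull these lines out of a long dmesg dump, here is a rough Python sketch. It groups libata messages by port and keeps only the ones that look like trouble. The `TROUBLE` list and the function name are my own assumptions, not anything from the Wiki, and the list is certainly not exhaustive.

```python
import re
from collections import defaultdict

# Matches libata kernel lines such as "ata7.00: failed to IDENTIFY (...)".
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.+)")

# Substrings treated as trouble signs here; this list is an assumption, not exhaustive.
TROUBLE = ("hard resetting link", "qc timeout", "failed to IDENTIFY",
           "revalidation failed", "failed to recover")


def trouble_by_port(log_text):
    """Return {port: [messages]} for lines that contain a trouble marker."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = ATA_LINE.search(line)
        if m and any(t in m.group(2) for t in TROUBLE):
            found[m.group(1)].append(m.group(2).strip())
    return dict(found)


SAMPLE = """\
ata7: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: qc timeout (cmd 0xec)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
"""

if __name__ == "__main__":
    for port, msgs in trouble_by_port(SAMPLE).items():
        print(port, len(msgs), "suspicious messages")
```

A port that keeps showing up here is the one whose cabling I would look at first, though the messages alone don't say whether it is the cable, the power, or the drive.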

     

     

  2. Hi Guys,

     

    From the Wiki (somewhere, I can't find it now), I think I have a faulty power cable or SATA cable but would like some confirmation before wriggling / replacing things.

     

    I am currently running unraid 4.4.2 on a full slackware distribution. I had no real issues until recently, when my newest drive started showing errors. Unraid has marked the drive with a red circle.

     

    Now, I've run short and long S.M.A.R.T. tests several times, and there are 0 issues. So, I pressed the restore button, did a parity sync and all was fine for a few days.

     

    It was after this I noticed that some of my files may have disappeared and I didn't think it was PEBKAC.

     

    A few days later, the same issue again. So, I went in the same circle again - and lost some data - again.

     

     

    To make it easier, I'll post some stats:

     

    Drive:

    1TB - ata-ST31000528AS_6VP1PBAY (Disk 3)

     

    Smart Report:

     

    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

    Home page is http://smartmontools.sourceforge.net/

     

    === START OF INFORMATION SECTION ===

    Device Model:    ST31000528AS

    Serial Number:    6VP1PBAY

    Firmware Version: CC37

    User Capacity:    1,000,204,886,016 bytes

    Device is:        Not in smartctl database [for details use: -P showall]

    ATA Version is:  8

    ATA Standard is:  ATA-8-ACS revision 4

    Local Time is:    Fri Feb 19 04:17:18 2010 CST

    SMART support is: Available - device has SMART capability.

    SMART support is: Enabled

     

    === START OF READ SMART DATA SECTION ===

    SMART overall-health self-assessment test result: PASSED

     

    General SMART Values:

    Offline data collection status:  (0x82) Offline data collection activity

                                            was completed without error.

                                            Auto Offline Data Collection: Enabled.

    Self-test execution status:      ( 245) Self-test routine in progress...

                                            50% of test remaining.

    Total time to complete Offline

    data collection:                 ( 600) seconds.

    Offline data collection

    capabilities:                    (0x7b) SMART execute Offline immediate.

                                            Auto Offline data collection on/off support.

                                            Suspend Offline collection upon new

                                            command.

                                            Offline surface scan supported.

                                            Self-test supported.

                                            Conveyance Self-test supported.

                                            Selective Self-test supported.

    SMART capabilities:            (0x0003) Saves SMART data before entering

                                            power-saving mode.

                                            Supports SMART auto save timer.

    Error logging capability:        (0x01) Error logging supported.

                                            General Purpose Logging supported.

    Short self-test routine

    recommended polling time:        (  1) minutes.

    Extended self-test routine

    recommended polling time:        ( 180) minutes.

    Conveyance self-test routine

    recommended polling time:        (  2) minutes.

    SCT capabilities:              (0x103f) SCT Status supported.

                                            SCT Feature Control supported.

                                            SCT Data Table supported.

     

    SMART Attributes Data Structure revision number: 10

    Vendor Specific SMART Attributes with Thresholds:

    ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

      1 Raw_Read_Error_Rate    0x000f  119  099  006    Pre-fail  Always      -      226697282

      3 Spin_Up_Time            0x0003  097  095  000    Pre-fail  Always      -      0

      4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      361

      5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0

      7 Seek_Error_Rate        0x000f  066  060  030    Pre-fail  Always      -      4873256

      9 Power_On_Hours          0x0032  097  097  000    Old_age  Always      -      3028

    10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0

    12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      164

    183 Unknown_Attribute      0x0032  099  099  000    Old_age  Always      -      1

    184 Unknown_Attribute      0x0032  100  100  099    Old_age  Always      -      0

    187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

    188 Unknown_Attribute      0x0032  100  099  000    Old_age  Always      -      100

    189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0

    190 Airflow_Temperature_Cel 0x0022  071  059  045    Old_age  Always      -      29 (Lifetime Min/Max 27/29)

    194 Temperature_Celsius    0x0022  029  041  000    Old_age  Always      -      29 (0 19 0 0)

    195 Hardware_ECC_Recovered  0x001a  037  023  000    Old_age  Always      -      226697282

    197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

    198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

    199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

    240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      144976621079867

    241 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      3589836420

    242 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      630453533

     

    SMART Error Log Version: 1

    No Errors Logged

     

    SMART Self-test log structure revision number 1

    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

    # 1  Short offline      Self-test routine in progress 50%      3028        -

    # 2  Short offline      Completed without error      00%      2801        -

    # 3  Extended offline    Completed without error      00%      2711        -

    # 4  Short offline      Completed without error      00%      2699        -

     

    SMART Selective self-test log data structure revision number 1

    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

        1        0        0  Not_testing

        2        0        0  Not_testing

        3        0        0  Not_testing

        4        0        0  Not_testing

        5        0        0  Not_testing

    Selective self-test flags (0x0):

      After scanning selected spans, do NOT read-scan remainder of disk.

    If Selective self-test is pending on power-up, resume after 0 minute delay.

     

     

    (This seems to be a perfect drive?)
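To check a report like this without eyeballing every row, a small Python sketch along these lines may help. It parses the attribute table from `smartctl -a` text and flags rows whose normalized value has reached the threshold, plus a handful of IDs (5, 197, 198, 199) with a non-zero raw value. The choice of watched IDs and the function names are my assumptions, not a smartmontools rule.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# IDs watched for media and cabling trouble; this selection is an assumption.
WATCHED = {5, 197, 198, 199}


def parse_attributes(report):
    """Return {id: (name, value, worst, thresh, raw)} from `smartctl -a` text."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)),
                                      int(m.group(5)), int(m.group(9)))
    return attrs


def warnings(attrs):
    """List human-readable warnings for threshold hits and watched raw counters."""
    out = []
    for attr_id, (name, value, worst, thresh, raw) in sorted(attrs.items()):
        if thresh and value <= thresh:
            out.append(f"{name}: normalized value {value} at or below threshold {thresh}")
        elif attr_id in WATCHED and raw > 0:
            out.append(f"{name}: raw value {raw}")
    return out


SAMPLE = """\
  1 Raw_Read_Error_Rate    0x000f  119  099  006    Pre-fail  Always      -      226697282
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
"""

if __name__ == "__main__":
    print(warnings(parse_attributes(SAMPLE)) or "nothing flagged")
```

An empty result only means none of these particular counters moved; it says nothing about cables or power, which SMART cannot see directly.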

     

     

     

    /var/log/messages:

     

    Feb 14 07:49:06 TANK kernel: ata9: hard resetting link

    Feb 14 07:49:08 TANK kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

     

    Feb 15 04:47:41 TANK kernel:  sdk:md: disk3 read error

    Feb 15 04:47:42 TANK kernel: pe read error: 1205131344/3, count: 1

    Feb 15 04:47:43 TANK kernel: pe read error: 1205139208/3, count: 1

    Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139216/3, count: 1

    Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139248/3, count: 1

    Feb 15 04:47:43 TANK kernel: <pe read error: 1205139256/3, count: 1

    Feb 15 04:47:43 TANK kernel: pe read error: 1205139264/3, count: 1

     

    Feb 17 22:27:02 TANK kernel: scsi 9:0:0:0: Direct-Access    ATA      ST31000528AS    CC37 PQ: 0 ANSI: 5

    Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)

    Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] Write Protect is off

    Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
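To get a feel for how many read errors each array slot collected, the lines above can be tallied with a short Python sketch. It assumes the number after the slash in `read error: 1205131344/3` is the array slot (disk3 here) and the number before it is the sector; that reading matches the `disk3 read error` line but is my interpretation, not documented behaviour. The regex deliberately anchors on the tail of the line so the garbled `<4pe` prefixes don't matter.

```python
import re
from collections import Counter

# Matches the tail of lines such as "... read error: 1205131344/3, count: 1".
# Assumption: the number after the slash is the array slot (disk3 here).
READ_ERR = re.compile(r"read error: (\d+)/(\d+), count: (\d+)")


def read_errors_per_disk(log_text):
    """Return (Counter of errors per slot, {slot: (lowest, highest sector)})."""
    counts = Counter()
    spans = {}
    for line in log_text.splitlines():
        m = READ_ERR.search(line)
        if not m:
            continue
        sector, slot = int(m.group(1)), int(m.group(2))
        counts[slot] += int(m.group(3))
        low, high = spans.get(slot, (sector, sector))
        spans[slot] = (min(low, sector), max(high, sector))
    return counts, spans


SAMPLE = """\
Feb 15 04:47:41 TANK kernel:  sdk:md: disk3 read error
Feb 15 04:47:42 TANK kernel: pe read error: 1205131344/3, count: 1
Feb 15 04:47:43 TANK kernel: pe read error: 1205139208/3, count: 1
Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139216/3, count: 1
Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139248/3, count: 1
Feb 15 04:47:43 TANK kernel: <pe read error: 1205139256/3, count: 1
Feb 15 04:47:43 TANK kernel: pe read error: 1205139264/3, count: 1
"""

if __name__ == "__main__":
    counts, spans = read_errors_per_disk(SAMPLE)
    for slot in sorted(counts):
        print(f"disk{slot}: {counts[slot]} read errors, sectors {spans[slot][0]}..{spans[slot][1]}")
```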

     

     

     

     

     

  3. Weird... I had an issue that was/is similar..

     

    I'm running on a full slackware install with parity + cache. I got the disk full message even though there was plenty of free space.

     

    The shortish story: I was watching files disappear right after unzipping them. I had no idea where they went until my root partition was full (100%!). All the missing files ended up in the black hole of /mnt/user0/<folders>. Each of the shares that had the issue was set to high-water with no split level.

     

    I've now switched these back to 'most-free' and will see how it goes.

     

     

    I think I just found another cause behind this... When I pressed 'stop' to stop the array, the cache mover was still moving files to disk8. As it couldn't unmount /mnt/user0, I'm guessing it won't re-mount it upon reboot.

     

    This would cause my /mnt/user0 to fill up quickly - in this case, it's an actual drive and not my flash drive.
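If /mnt/user0 is just a plain directory when it should be a mount, anything copied there lands on the root partition. A guard like the following Python sketch, run before any script writes to the share, would catch that. The helper names are hypothetical; `os.path.ismount` is the standard-library call doing the real work.

```python
import os


def is_mounted(path):
    """True if `path` is currently a mount point rather than a plain directory."""
    return os.path.ismount(path)


def safe_target(path):
    """Return `path` if it is a mount point, else raise so nothing lands on the root disk."""
    if not is_mounted(path):
        raise RuntimeError(f"{path} is not mounted; refusing to write there")
    return path


if __name__ == "__main__":
    for candidate in ("/", "/mnt/user0"):
        print(candidate, "mounted" if is_mounted(candidate) else "NOT mounted")
```

This only detects the symptom; it does nothing about why the share failed to mount in the first place.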

  4. You didn't check the cables and still don't know why the drive was taken out of service?

     

    I believe the UDMA errors could be cables. I have one SATA drive that shows a high number after I had a cable problem.

     

    Peter

     

     

    Hi Peter,

     

    No, I didn't check the cables. The UDMA errors were from a long time ago (see http://lime-technology.com/forum/index.php?topic=3021.0 where the count is exactly the same). I believe the drive was taken out of service because of the multiple power failures and my laziness in checking the management page afterwards as this was around the same time as the problem began.

     

    According to the syslogs, it actually started directly after a reboot - not in the middle of a 'powered up session' (if that makes sense).

     

     

    Cheers

     

    Edit: The 'long' S.M.A.R.T. test passed without any errors.

     

     

  5. Absolutely fantastic response, thanks bjp999.

     

    I decided to rebuild onto the original drive and your theory does make sense. I'm confident that the errors are from a long time ago and not from recently. As I now recall from the logs, this issue started around the same day we had many power failures - each time while a parity check was in progress. It has taught me to check the unraid page more often though - if it wasn't for the weird network issues, I probably wouldn't have noticed for a few more days (which is scary if I had had another failure!)

     

    I actually do a parity check at least 2-3 times per month, so I'm confident there aren't any *real* issues.

     

    The rebuild has now finished, and these are the results:

     

    Last checked on 9/7/2009 7:52:08 PM, finding 0 errors.)

     

    I went out and bought a 1TB drive to replace it anyway, so I'm going to add that to the array as I was running out of free space!

     

    I'll run a long test now.

     

     

  6. I feel a little bit silly right about now.

     

    It just occurred to me that I had never powered off the server, I had only rebooted it.

     

    I powered down cleanly, and powered back up, and now unraid says:

     

    "Stopped. Disabled disk replaced." (which I haven't yet)

     

    The drive is now visible, but I can see this:

     

    root@TANK:~#  smartctl  -a  -d  ata  /dev/hdb

    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

    Home page is http://smartmontools.sourceforge.net/

     

    === START OF INFORMATION SECTION ===

    Model Family:    Seagate Barracuda 7200.10 family

    Device Model:    ST3500630A

    Serial Number:    9QG1TV5X

    Firmware Version: 3.AAE

    User Capacity:    500,107,862,016 bytes

    Device is:        In smartctl database [for details use: -P show]

    ATA Version is:  7

    ATA Standard is:  Exact ATA specification draft version not indicated

    Local Time is:    Mon Sep  7 14:29:48 2009 CST

    SMART support is: Available - device has SMART capability.

    SMART support is: Enabled

     

    === START OF READ SMART DATA SECTION ===

    SMART overall-health self-assessment test result: PASSED

     

    General SMART Values:

    Offline data collection status:  (0x82) Offline data collection activity

                                            was completed without error.

                                            Auto Offline Data Collection: Enabled.

    Self-test execution status:      (  0) The previous self-test routine completed

                                            without error or no self-test has ever

                                            been run.

    Total time to complete Offline

    data collection:                ( 430) seconds.

    Offline data collection

    capabilities:                    (0x5b) SMART execute Offline immediate.

                                            Auto Offline data collection on/off support.

                                            Suspend Offline collection upon new

                                            command.

                                            Offline surface scan supported.

                                            Self-test supported.

                                            No Conveyance Self-test supported.

                                            Selective Self-test supported.

    SMART capabilities:            (0x0003) Saves SMART data before entering

                                            power-saving mode.

                                            Supports SMART auto save timer.

    Error logging capability:        (0x01) Error logging supported.

                                            General Purpose Logging supported.

    Short self-test routine

    recommended polling time:        (  1) minutes.

    Extended self-test routine

    recommended polling time:        ( 163) minutes.

     

    SMART Attributes Data Structure revision number: 10

    Vendor Specific SMART Attributes with Thresholds:

    ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

      1 Raw_Read_Error_Rate    0x000f  115  082  006    Pre-fail  Always      -      89052350

      3 Spin_Up_Time            0x0003  092  092  000    Pre-fail  Always      -      0

      4 Start_Stop_Count        0x0032  098  098  020    Old_age  Always      -      2743

      5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      6

      7 Seek_Error_Rate        0x000f  086  060  030    Pre-fail  Always      -      465126290

      9 Power_On_Hours          0x0032  084  084  000    Old_age  Always      -      14645

    10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0

    12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      233

    187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

    189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0

    190 Airflow_Temperature_Cel 0x0022  061  048  045    Old_age  Always      -      39 (Lifetime Min/Max 39/39)

    194 Temperature_Celsius    0x0022  039  052  000    Old_age  Always      -      39 (0 17 0 0)

    195 Hardware_ECC_Recovered  0x001a  090  052  000    Old_age  Always      -      80496754

    197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

    198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

    199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      102

    200 Multi_Zone_Error_Rate  0x0000  100  253  000    Old_age  Offline      -      0

    202 TA_Increase_Count      0x0032  100  253  000    Old_age  Always      -      0

     

    SMART Error Log Version: 1

    ATA Error Count: 101 (device log contains only the most recent five errors)

            CR = Command Register [HEX]

            FR = Features Register [HEX]

            SC = Sector Count Register [HEX]

            SN = Sector Number Register [HEX]

            CL = Cylinder Low Register [HEX]

            CH = Cylinder High Register [HEX]

            DH = Device/Head Register [HEX]

            DC = Device Command Register [HEX]

            ER = Error register [HEX]

            ST = Status register [HEX]

    Powered_Up_Time is measured from power on, and printed as

    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

     

    Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

      When the command that caused the error occurred, the device was active or idle.

     

      After command completion occurred, registers were:

      ER ST SC SN CL CH DH

      -- -- -- -- -- -- --

      84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

     

      Commands leading to the command that caused the error were:

      CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

      -- -- -- -- -- -- -- --  ----------------  --------------------

      25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT

      10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]

      25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

     

    Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

      When the command that caused the error occurred, the device was active or idle.

     

      After command completion occurred, registers were:

      ER ST SC SN CL CH DH

      -- -- -- -- -- -- --

      84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

     

      Commands leading to the command that caused the error were:

      CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

      -- -- -- -- -- -- -- --  ----------------  --------------------

      25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

      10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]

      25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT

      c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

     

    Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

      When the command that caused the error occurred, the device was active or idle.

     

      After command completion occurred, registers were:

      ER ST SC SN CL CH DH

      -- -- -- -- -- -- --

      84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

     

      Commands leading to the command that caused the error were:

      CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

      -- -- -- -- -- -- -- --  ----------------  --------------------

      25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT

      c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE

      00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]

      ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

     

    Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

      When the command that caused the error occurred, the device was active or idle.

     

      After command completion occurred, registers were:

      ER ST SC SN CL CH DH

      -- -- -- -- -- -- --

      84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

     

      Commands leading to the command that caused the error were:

      CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

      -- -- -- -- -- -- -- --  ----------------  --------------------

      25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

      c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE

      00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]

      ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]

      25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

     

    Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

      When the command that caused the error occurred, the device was active or idle.

     

      After command completion occurred, registers were:

      ER ST SC SN CL CH DH

      -- -- -- -- -- -- --

      84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

     

      Commands leading to the command that caused the error were:

      CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

      -- -- -- -- -- -- -- --  ----------------  --------------------

      25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT

      10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]

      25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT

      25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

     

    SMART Self-test log structure revision number 1

    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

    # 1  Extended offline    Completed without error      00%    12818        -

    # 2  Short offline      Completed without error      00%    10432        -

     

    SMART Selective self-test log data structure revision number 1

    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

        1        0        0  Not_testing

        2        0        0  Not_testing

        3        0        0  Not_testing

        4        0        0  Not_testing

        5        0        0  Not_testing

    Selective self-test flags (0x0):

      After scanning selected spans, do NOT read-scan remainder of disk.

    If Selective self-test is pending on power-up, resume after 0 minute delay.

     

    Is it worth replacing it?
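One thing that helps with that question is knowing how old the logged errors are. The report gives the current Power_On_Hours and the power-on hour at which each ATA error occurred, so the gap can be worked out. Below is a rough Python sketch; the function name is made up, and it assumes the error log's hour counter has not wrapped.

```python
import re

# Row 9 of the attribute table; the last number on the line is the raw hour count.
POWER_ON = re.compile(r"^[ \t]*9[ \t]+Power_On_Hours[ \t].*[ \t](\d+)[ \t]*$", re.MULTILINE)
ERROR_AT = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")


def hours_since_last_error(report):
    """Return power-on hours elapsed since the newest logged ATA error, or None."""
    now = POWER_ON.search(report)
    errors = [int(hours) for _, hours in ERROR_AT.findall(report)]
    if not now or not errors:
        return None
    return int(now.group(1)) - max(errors)


SAMPLE = """\
  9 Power_On_Hours          0x0032  084  084  000    Old_age  Always      -      14645
Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
"""

if __name__ == "__main__":
    print(hours_since_last_error(SAMPLE))  # 14645 - 10525 = 4120
```

For this drive that is 4120 powered-on hours since the last ICRC error, so the CRC errors in the log predate the current trouble by a wide margin. The 6 reallocated sectors are a separate matter that this sketch does not look at.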

  7. Fixing this type of thing quickly is a priority!

     

    I'm heading out the door now to buy a new drive :-)

     

    It is unfortunate if you rebooted the server without taking a screenshot and capturing a full syslog.  Every time you reboot the syslog is completely refreshed and hints of what caused events like this are lost.  All we know at this point is that unRAID removed the disk from the array.  The best way to know if the disk is good or bad is to look at its smart report.  (For more info. go to the troubleshooting link in my sig and read about smartctl).

     

    I'm running a full slackware install, all the syslog files are rotated so I still have access to them :-)

     

    ls -l /dev/disk/by-id

     

    Unfortunately, the drive doesn't show up here :-(

     

     

    Opening up the magic syslog shows this:

     

    Aug 31 11:30:24 TANK kernel: hdb: dma_timer_expiry: dma status == 0x61

    Aug 31 11:30:34 TANK kernel: hdb: DMA timeout error

    Aug 31 11:30:34 TANK kernel: hdb: dma timeout error: status=0xd0 { Busy }

    Aug 31 11:30:34 TANK kernel: ide: failed opcode was: unknown

    Aug 31 11:31:04 TANK kernel: ide0: reset: master: passed; slave: failed

    Aug 31 11:31:05 TANK kernel: hdb: status error: status=0x00 { }

    Aug 31 11:31:35 TANK kernel: end_request: I/O error, dev hdb, sector 401820735

    Aug 31 11:31:35 TANK kernel: md: disk3 read error

    Aug 31 11:31:35 TANK kernel: handle_stripe read error: 401820672/3, count: 1

    Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820743

    Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820759

    Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820767

    ^^ repeated hundreds of times
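With hundreds of these lines, it is easier to see whether the failures sit in one area of the disk by collapsing the sectors into ranges. A minimal Python sketch, assuming the `end_request: I/O error, dev X, sector N` format shown above; the `max_gap` of 64 sectors is an arbitrary choice of mine.

```python
import re

IO_ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_ranges(log_text, max_gap=64):
    """Group failing sectors per device into (first, last) ranges.

    Sectors no more than `max_gap` apart are merged; 64 is an arbitrary choice.
    """
    by_dev = {}
    for dev, sector in IO_ERR.findall(log_text):
        by_dev.setdefault(dev, set()).add(int(sector))
    ranges = {}
    for dev, sectors in by_dev.items():
        merged = []
        for s in sorted(sectors):
            if merged and s - merged[-1][1] <= max_gap:
                merged[-1][1] = s      # extend the current range
            else:
                merged.append([s, s])  # start a new range
        ranges[dev] = [tuple(r) for r in merged]
    return ranges


SAMPLE = """\
Aug 31 11:31:35 TANK kernel: end_request: I/O error, dev hdb, sector 401820735
Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820743
Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820759
Aug 31 11:31:36 TANK kernel: end_request: I/O error, dev hdb, sector 401820767
"""

if __name__ == "__main__":
    for dev, spans in failing_ranges(SAMPLE).items():
        print(dev, spans)
```

One tight range right after a DMA timeout would fit a drive that stopped answering mid-read; many scattered ranges would point more towards the surface. Either reading is a guess from the log alone.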

     

     

  8. Hi All,

     

    I'm running Unraid 4.4.2 on a full slackware install (maybe relevant, maybe not). Anyway, I noticed my HDD light on constantly today and when I went to use the unraid server it seemed a little "off color".

     

    Long story short, there were 1000+ errors on 'drive 3' (500gig IDE drive).

     

    After a reboot, unraid is still working, I can still get to the disk share via \\server\disk3 and I can also see it mounted via 'df -h'. I am concerned that I will lose data on this drive and I want to replace it with a SATA 1TB drive.

     

    Would I need to follow this process:

     

    1) 'Stop' the array

    2) 'Unassign' disk 3

    3) Shutdown the server

    4) Replace IDE disk with SATA disk (bigger capacity)

    5) Startup server

    6) Assign new SATA disk to disk 3

    7) Do a parity check / rebuild (??)

     

    Also, is it possible to assign my current parity drive as disk 3 and not lose any data?

     

    Syslog snippet is below:

    Sep  6 22:30:28 TOWER kernel: md: disk3 removed

     

     

     

     

  9. Weird... I had an issue that was/is similar..

     

    I'm running on a full slackware install with parity + cache. I got the disk full message even though there was plenty of free space.

     

    The shortish story: I was watching files disappear right after unzipping them. I had no idea where they went until my root partition was full (100%!). All the missing files ended up in the black hole of /mnt/user0/<folders>. Each of the shares that had the issue was set to high-water with no split level.

     

    I've now switched these back to 'most-free' and will see how it goes.

  10.  

    from the slackware book http://www.slackware.com/config/init.php:

    The first program to run under Slackware besides the Linux kernel is init. This program reads /etc/inittab file to see how to run the system. It runs the /etc/rc.d/rc.S script to prepare the system before going into your desired runlevel. The rc.S file enables your virtual memory, mounts your filesystems, cleans up certain log directories, initializes Plug and Play devices, loads kernel modules, configures PCMCIA devices, sets up serial ports, and runs System V init scripts (if found).

     

    So I believe rc.S runs no matter what and rc.K is for single user mode and rc.M is multi-user. I looked in rc.M and there is no mention of fuse, and as I said before I disabled the rc.fuse mention in rc.S. dmesg still shows fuse loading, which I believe is the kernel.
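To double-check that no init script still mentions fuse, the scripts can be searched rather than read one by one. Here is a small Python sketch that lists uncommented lines containing a given word in every file of a directory. The function name is hypothetical, and /etc/rc.d is simply where Slackware keeps these scripts.

```python
from pathlib import Path


def find_references(directory, needle):
    """Return (file name, line number, line) for uncommented lines containing `needle`."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            if needle in stripped and not stripped.startswith("#"):
                hits.append((path.name, number, stripped))
    return hits


if __name__ == "__main__":
    rc_dir = Path("/etc/rc.d")  # Slackware's init script directory
    if rc_dir.is_dir():
        for name, number, line in find_references(rc_dir, "fuse"):
            print(f"{name}:{number}: {line}")
```

An empty result would support the idea that the module is being loaded by something other than the rc scripts.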

     

    I run my server headless so I want to capture what streams by on the screen during boot. Is this stored in a file somewhere or do I just have to connect a monitor and watch it?

     

     

    Well, that's a good start that fuse is still loading before your go script or rc.local (which is what I was trying to get at before with the module thing). As for the boot screen, I thought dmesg showed this but I guess not.

     

    I may/may not have mentioned it, but I copied all the fuse stuff and md stuff from the unraid kernel to my kernel source before compiling. This may have made a difference too.

  11. What I found in /lib/modules/2.6.27.7/kernel/fs/fuse/ was fuse.ko. Is your intention to copy this to /lib/modules/2.6.27.7-unRAID/kernel/fs/fuse/? This is what I did. Then I renamed rc.fuse and removed the reference to fuse in rc.S.

     

     

    Just noticed... I believe rc.S is for single-user mode; normally, Linux boots into multi-user mode, which is rc.M. Because you have renamed rc.fuse, no script should be able to load it. I believe mine loads from the kernel itself.

     

    I'm happy to share whatever info you need from my system :-)

     

  12. Attempted what you recommended:

     

    1) mkdir /lib/modules/2.6.27.7-unRAID/kernel/fs/fuse

    2) cp /lib/modules/2.6.27.7/kernel/fs/fuse/* /2.6.27.7-unRAID

    3) mv /etc/rc.d/rc.fuse /etc/rc.d/rc.fuse.disabled

     

    Stop array / Reboot via emhttp

     

    Make sure you don't have any scripts that load /etc/rc.d/rc.fuse :-)

     

    What I found in /lib/modules/2.6.27.7/kernel/fs/fuse/ was fuse.ko. Is your intention to copy this to /lib/modules/2.6.27.7-unRAID/kernel/fs/fuse/? This is what I did. Then I renamed rc.fuse and removed the reference to fuse in rc.S.

     

    Tried a reboot and no-dice, then tried the start-stop method of getting it to work, which worked previously, and now it doesn't. Did I copy fuse.ko to the right place?

     

    Thanks,

     

    Phil/TW

     

    Hi Phil,

     

    Yes, that is the file I copied across to the same folder and it worked for me after that. I noticed that on bootup it was saying 'the fuse filesystem has already been loaded'. I had also compiled fuse support into my kernel as a module - perhaps that's why it's working for me?

     

    Hopefully this gives you something to work on... it does work perfectly for me now, however, it took me around 12 hours (straight) of messing around with it. I do remember copying /md* to the unraid kernel and (possibly) all the fuse stuff too before compiling... perhaps that's what I did.

     

     

    Someone with some more experience may be able to help :-)

  13. I've narrowed it down to this now (and I'm forgetting about NFS for a while):

     

    Using the details in the last part of this page: http://www.thetechguide.com/howto/unraid-on-hard-drive.html, in order to get user shares working I must (after a reboot):

     

    1) Stop the array

    2) Disable User shares

    3) Start the array

    4) Stop the array

    5) Enable user shares

    6) Start the array

     

    Surely I'm missing something?

     

     

    edit (again):

     

    I've FINALLY got it working on reboot...

     

     

    For those who are interested (and this may / may not work for you):

     

    1) mkdir /lib/modules/2.6.27.7-unRAID/kernel/fs/fuse

    2) cp /lib/modules/2.6.27.7/kernel/fs/fuse/* /lib/modules/2.6.27.7-unRAID/kernel/fs/fuse/

    3) mv /etc/rc.d/rc.fuse /etc/rc.d/rc.fuse.disabled

     

    Stop array / Reboot via emhttp

     

    Make sure you don't have any scripts that load /etc/rc.d/rc.fuse :-)

     

    This is all assuming that you have compiled a kernel with Fuse support as a module and are running slackware 12.2 and unraid 4.4.2. My .config is below and is set up for an ABIT AB9 PRO.
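For anyone who wants to try the three steps on a scratch copy first, here they are as a rough Python sketch. It assumes the copy is meant to land in the kernel/fs/fuse directory created by the mkdir in step 1. The `root` parameter and the function name are my own additions so that it can be pointed at a test directory instead of the live system.

```python
import shutil
from pathlib import Path


def install_fuse_module(root="/", stock="2.6.27.7", unraid="2.6.27.7-unRAID"):
    """Copy the stock kernel's fuse module files into the unRAID module tree
    and rename rc.fuse so that init scripts can no longer start it."""
    root = Path(root)
    src = root / "lib/modules" / stock / "kernel/fs/fuse"
    dst = root / "lib/modules" / unraid / "kernel/fs/fuse"

    dst.mkdir(parents=True, exist_ok=True)      # step 1: mkdir

    copied = []
    for item in sorted(src.iterdir()):          # step 2: cp fuse/*
        if item.is_file():
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)

    rc_fuse = root / "etc/rc.d/rc.fuse"
    if rc_fuse.exists():                        # step 3: mv rc.fuse rc.fuse.disabled
        rc_fuse.rename(rc_fuse.with_name("rc.fuse.disabled"))
    return copied
```

Calling `install_fuse_module("/tmp/scratch")` against a copied tree shows what would change without touching the real /lib/modules. The array stop and reboot still have to be done by hand.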

     

     

    Cheers!

  14. OK, I have been working on this for the last 8.5 hours and now I'm beat.

     

    Things I have achieved:

     

    1) Installed Slackware 12.2 onto my primary IDE drive with 3 partitions: /dev/hda1 = cache partition, /dev/hda2 = root partition, /dev/hda3 = swap partition

    2) compiled a custom kernel with/for unRaid + virtualisation and rebooted into the new kernel successfully

    3) Got emhttp running, assigned my drives, all is well

    4) \\tower\diskX is working - that's good

    5) \\tower\<user share> is visible, but not working.

     

    I used the default .config file from unraid, and added IDE/SATA support, SysV support (for apache) and virtualisation support (for vmware).

     

    I can't find any syslogs at all that show any issues, and I can't find any logs showing that smbd / nmbd are complaining either. I know a couple of people have had this issue before but am unsure if anyone has successfully got user-shares working with unraid on a HDD.

     

    I've also copied these 2 files plus all the rc.<server> files to their respective directory:

     

    /unraid/etc/rc.d/rc.fuse

    /unraid/bin/fusermount

     

     

    So in brief... everything works except user-shares. Has anyone got any idea on how to get this working? The kids are killing me!

     

     

     

    Edit: I've been messing around with the kernel again... I had also tried this with CIFS but have now gone back to Samba instead of CIFS, as I read somewhere that this is what Tom is using in unRaid? Anyway, I enabled Samba, and am now stuck on the NFS part (a small issue in the scheme of things): FATAL: Error inserting nfsd (/lib/modules/2.6.27.7-unRAID/kernel/fs/nfsd/nfsd.ko): Device or resource busy

     

     

    Should NFS server be a module or * ?
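In menuconfig terms, `*` ends up as `=y` (built in) and `M` as `=m` (module) in the .config file, so the current choice can be read straight from that file. A minimal Python sketch follows; the sample values are illustrative only and not taken from my actual .config. `CONFIG_NFSD` and `CONFIG_FUSE_FS` are the kernel's symbols for the NFS server and FUSE.

```python
def config_state(config_text, symbol):
    """Return 'built-in', 'module', or 'not set' for a kernel .config symbol."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"{symbol}=y":
            return "built-in"
        if line == f"{symbol}=m":
            return "module"
        if line == f"# {symbol} is not set":
            return "not set"
    return "not set"  # symbol never mentioned in the file


# Illustrative values only, not a real configuration.
SAMPLE = """\
CONFIG_FUSE_FS=m
CONFIG_NFSD=y
# CONFIG_CIFS is not set
"""

if __name__ == "__main__":
    for symbol in ("CONFIG_FUSE_FS", "CONFIG_NFSD", "CONFIG_CIFS"):
        print(symbol, "->", config_state(SAMPLE, symbol))
```

This only reports what the config says; whether the busy error comes from that choice is something I have not confirmed.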

  15. Hi All,

     

    I'm currently running unRAID v4.4.2 (due to the instructions only being available for this version) with VMWare on top. Everything is working fine except the vmware guest is really slow due to disk writes which I read would happen anyway if you run it on a drive with parity. I have thought about this quite often and I'd like to run the following:

     

    1) Full slackware 12.2 distro installed onto a primary IDE drive (160gb)

    1a) partitioned into: 40gb [Cache], 110gb [OS], 5gb [swap]

    2) Install VMWare onto the OS partition

    3) Still boot into the unRAID kernel with my licence file (I'd assume I boot from the USB stick?)

     

    Is all this possible? I have set up unraid in vmware using slackware 12.2 + unraid 4.4.2, then packaged the files I needed to run vmware on my live server - so it shouldn't be a huge difference I'd imagine.

     

    My questions really relate to point 1a - Would this still work?

     

    Before you ask, I'd like to run other apps etc in the future so yes, a full distro would suit me better.

     

     

     

    Thanks!