
dgirard

Members · 33 posts

Posts posted by dgirard

New issue in Safari with the latest update.

     

When I go to open the webUI in Safari, I get a big red box error: "undefined is not an object (evaluating 'i.generateKey')"

     

Looks to be related to generating a security key.  I found some notes referencing that, for Safari, there are changes to some library that need to be made.  It's OK for now for me because it works just fine in Firefox, and this is just for webUI access to the client.

     

It appears that the change needs to be made in the code, not on the Safari side?

     

    Anyway, wanted to make it known.

I'm seeing this now on a brand new (to unraid) system running 6.8.3.  It's a trial key with only 2 drives and no additional setup--no VMs, no Docker setup, not even any plugins installed or any data/shares loaded.

     

So this is likely more basic than suggested above.  Note that I don't have this issue on my primary production system, so maybe it's CPU-generation based or some other hardware interaction.

     

FWIW, the old system is 2x AMD Opteron 2431 on a Supermicro H8DM8-2; the new system is 1x AMD Opteron 6274 on a Supermicro H8DG6.

     

     

  3. Hello!

I'm having a problem similar to interwebtech's.

     

    The web interface never gets past "scanning hard drives"

     

When I look at the Docker log (icon on the right in Unraid), I see several Java errors; here's the first one:

     

    lucee.runtime.exp.ApplicationException: Error invoking external process
    
    at lucee.runtime.tag.Execute.doEndTag(Execute.java:258)
    at scancontrollers_cfm$cf.call_000046(/ScanControllers.cfm:456)
    at scancontrollers_cfm$cf.call(/ScanControllers.cfm:455)
    at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
    at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
    at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
    at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
    at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
    at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
    at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
    at lucee.runtime.engine.Request.exe(Request.java:44)
    at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
    at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
    at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
    at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
    at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

    There are several more, but I suspect they're all related to this one?

     

    I swapped out ScanControllers.cfm for the CreateDebugInfo.cfm as was previously suggested and am emailing the output.   Any ideas?

     

Running Unraid 6.6.7 and the latest DiskSpeed container.
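In case a fuller log helps, this is how I'm capturing the whole container log from the command line so it can be attached (the container name "DiskSpeed" is just what mine happens to be called--check docker ps for yours):

docker ps --format '{{.Names}}'                   # confirm the container name
docker logs DiskSpeed > /boot/diskspeed.log 2>&1  # save stdout+stderr to the flash drive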

     

     

Just add hard drives and an unraid license/flash drive.

     

Asking $250.  I can meet up anywhere in the Metro Detroit area (within 50 miles of ZIP 48111), or anywhere along the I-75 corridor between Detroit and Cincinnati (I make that drive at least once a month for work).

     

    This is a used setup.

     

If I recall correctly, one of the drive bays didn't work.  I think it was a backplane or cabling problem.  It's obvious in that the bay simply doesn't work...so consider this a 19-drive system, or take some time and try to fix what's wrong (probably something simple, but I really don't know).

     

    Also included in this setup:

    ABit AB9 PTO motherboard

    2 GB DDR2-800 RAM

PC Power & Cooling Silencer 750 EPS12V Quad power supply (was enough to power 18+ drives in this system)

    Intel CPU (honestly, I can't remember what CPU is in this guy--I can probably fire it up and see if you'd like--but it works)

1x Supermicro AOC-SASLP-MV8 SAS/SATA card with breakout cables for 8 SATA drives

Total of 19 drive bays connected to SATA controllers (between the 8 on the Supermicro card and 11 on the motherboard).

All 7 fans appear to be working, but are used--easy to replace if there's any issue there.

     

Also note, I seem to be missing ONE of the hot-plug drive trays (I guess to go with the one slot that doesn't work)...I have it here somewhere, and if I find it before delivery I'll add it (and I'll update this listing if I find it).

     

I know the motherboard, RAM, and CPU are several generations old at this point, but this system handles unraid just fine!  Probably not the best for lots of Dockers or KVMs due to the limited RAM and CPU cores (that's why I upgraded).

     

    Let me know if you have any questions.  This system is pretty much ready to go!

     

    David

     

     

     

  5. Hello all.

     

I had previously set up my VMs via the GUI and had edited the XML to set custom port numbers for VNC (the objective is to have consistent port numbers for specific VMs instead of having them assigned in the order that the VMs are started).  I also need to have a password on some of the VMs.  This was working fine until the last unraid update...

     

Now it seems that I can set a password in the GUI, but I still need to edit the XML to add the custom port number for VNC.  Unfortunately, as soon as I edit the XML for VNC, it "forgets" the password...so I go back and set the password in the GUI and it "forgets" the custom port number...so I can't figure out how to get both to stick...it just worked with previous versions...
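For reference, the hand edit I keep making looks roughly like this (the VM name, port, and password are just examples):

virsh edit Win10
# then, in the <graphics> element, set the port and password together, e.g.:
#   <graphics type='vnc' port='5901' autoport='no' passwd='mypassword'/>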

     

    Any suggestions?  What am I missing here?

     

    Thanks

  6. May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

     

    It's the BadCRC and ICRC error flags that specifically indicate corrupted packets, usually from a bad SATA cable.  Since you have repeated ICRC error flags, which cause the pauses and resets, and cause the SATA link speed to be slowed down to hopefully improve communications integrity, I suspect you also have an increased UDMA_CRC_Error_Count on the SMART report for that drive.  I know you said you replaced the SATA cable, but it doesn't look like a good cable from here.  There's still a small chance that it may be a bad power situation instead.

    Rob: 

My UDMA_CRC_Error_Count is 2, so it does not seem to be CRC errors from the drive's perspective.

    I'm still happy to try another SATA cable if that still makes sense now that the errors seem to have stopped (I'm going to monitor for a couple days before confirming) with the NCQ setting change.

Also, it's possible that rjstott's problem *is* a cable or power

    Yes, I was referring to the ICRC in rjstott's syslog extract.

     

    when I look back at my log, I see a different message before the failed command...I see:

    May  9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out
    May  9 23:14:43 Tower kernel:         res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)
    May  9 23:14:43 Tower kernel: ata16.00: status: { DRDY }
    May  9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED
    

I do not see the CRC error message you pointed out...I don't know what DRDY means, but could it mean that we're overflowing the buffers sending data to the drive?  (That would explain the out-of-IOMMU-space errors I reported earlier as well, I would assume.)

    DRDY just means 'Drive ReaDY', a good flag.  The important part of your exception is the 'internal error' message unfortunately.  When a programmer maps something odd or unexpected to 'internal error', it usually means they either don't expect it to happen, or don't want to deal with it, or don't know how to deal with it.  There's no further information available, so you are kind of stuck!  Any firmware updates available, for that disk controller?

     

Thanks, RobJ:

    It makes more sense to me now.

     

I'll look into controller firmware...I'm using the SATA controller on the motherboard (it's a Supermicro) for this drive.  The drive itself has the latest.

     

I checked the NCQ settings on several of my drives and they're all set to 31...so it looks like the all-off setting isn't working (at least in my setup).  So it's either a bug or something strange with my setup...I'll look closer tomorrow.
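For what it's worth, this is the quick loop I used to check them all at once (assuming every drive shows up as /dev/sd*):

for q in /sys/block/sd*/device/queue_depth; do
    echo "$q: $(cat $q)"
done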

     

I'll also keep watching to see if the error messages occur now that the SSD is set to NCQ=1.  I suspect this is the bug that's alluded to in the bugzilla report (doesn't look like it's been fixed by the Linux kernel guys yet).

     

    Thanks again!

    David

     

     

  7. Rob: 

My UDMA_CRC_Error_Count is 2, so it does not seem to be CRC errors from the drive's perspective.

    I'm still happy to try another SATA cable if that still makes sense now that the errors seem to have stopped (I'm going to monitor for a couple days before confirming) with the NCQ setting change.

     

Also, it's possible that rjstott's problem *is* a cable or power, since when I look back at my log, I see a different message before the failed command...I see:

    May  9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out
    May  9 23:14:43 Tower kernel:         res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)
    May  9 23:14:43 Tower kernel: ata16.00: status: { DRDY }
    May  9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED
    

I do not see the CRC error message you pointed out...I don't know what DRDY means, but could it mean that we're overflowing the buffers sending data to the drive?  (That would explain the out-of-IOMMU-space errors I reported earlier as well, I would assume.)

     

     

     

    May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

     

    It's the BadCRC and ICRC error flags that specifically indicate corrupted packets, usually from a bad SATA cable.  Since you have repeated ICRC error flags, which cause the pauses and resets, and cause the SATA link speed to be slowed down to hopefully improve communications integrity, I suspect you also have an increased UDMA_CRC_Error_Count on the SMART report for that drive.  I know you said you replaced the SATA cable, but it doesn't look like a good cable from here.  There's still a small chance that it may be a bad power situation instead.

  8. One more interesting observation:

     

    I have "force NCQ disabled=yes" on the disk configuration screen.  Yet it appears (maybe I'm looking the wrong way?) that NCQ is still enabled for all my drives, including this cache drive that's having the problems.

     

    If I

    cat /sys/block/sdc/device/queue_depth

    it reports a value of 31, which indicates NCQ is in play if I understand this correctly (I believe it should report 0 or 1 if NCQ is disabled?)

     

    now, if I change the queue_depth to 1 with

    echo 1 >/sys/block/sdc/device/queue_depth

it appears that my errors with this SSD no longer occur (based on a quick test: set to 1, copy a large file to the SSD, no errors; set back to 31, recopy the same file, errors occur).
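For anyone who wants to repeat the quick test, it was roughly this (sdc is my cache SSD; the file paths are just examples--substitute your own):

cat /sys/block/sdc/device/queue_depth          # 31 = NCQ in use
echo 1 > /sys/block/sdc/device/queue_depth     # drop to 1 (NCQ effectively off)
cp /mnt/disk1/testfile /mnt/cache/             # no errors in the syslog
echo 31 > /sys/block/sdc/device/queue_depth    # back to 31
cp /mnt/disk1/testfile2 /mnt/cache/            # errors come back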

     

    Am I understanding this right?

     

     

    I thought about looking at this because of this: https://bugzilla.kernel.org/show_bug.cgi?id=89261

and this document, which explains how to dynamically change NCQ settings: https://exemen.wordpress.com/2011/05/16/enabling-disabling-and-checking-ncq/

     

     

  9. @rjstott

     

    You might want to have a look at:

    http://www.techspot.com/article/997-samsung-ssd-read-performance-degradation/

     

    Maybe not relevant to the problems you are having, and maybe you won't be affected (cache disk usage - transient),

    but if you are holding VM / Container files or any other static files on the cache, you would be affected.

Interesting article.  Sounds like I need to pull out the Samsung SSD or face performance problems at some point.  I do not think this is the cause of our current problem (rjstott's and mine), as I reformatted my SSD and re-copied all the data to it and the errors continued immediately.

     

I'm also not certain it's the SATA cable or power (I'm not ruling it out, however)...I did replace the SATA cable and even changed the SATA port that it was connected to.  Power seems stable (it's in a Supermicro 24-drive server with dual power supplies) and I have no other power problems.  In addition, it seems this problem started with beta 15.  It IS making the entire server pause while it tries to reset the SATA port.

     

If this problem is just limited to Samsung SSDs as cache drives, then I'll just pull mine, but since the problem seems to have started with a release, is it possible this is a driver or kernel problem?

     

     

  10. binhex:

     

Looks like a similar problem exists with delugevpn...

     

    2015-05-06 06:04:34,620 DEBG 'setip' stderr output:
    /home/nobody/setip.sh: line 4: netstat: command not found
    

    and

    2015-05-06 06:04:34,730 DEBG 'setip' stderr output:
    /home/nobody/setip.sh: line 4: netstat: command not found
    
    2015-05-06 06:04:34,733 DEBG 'webui' stderr output:
    /home/nobody/webui.sh: line 4: netstat: command not found
    
    2015-05-06 06:04:34,734 DEBG 'setport' stderr output:
    /home/nobody/setport.sh: line 4: netstat: command not found
    

     

Are we doing something wrong?  Or did they change the upstream OS distro on you, perhaps?

     

    I connected to the container while running and couldn't find a netstat command anywhere...
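If it helps anyone in the meantime, installing net-tools (the package that provides netstat) inside the running container should get the scripts going again--assuming the image is still Arch-based; "delugevpn" is just what my container happens to be named:

docker exec -it delugevpn sh -c 'pacman -Sy --noconfirm net-tools && which netstat'

Obviously that only lasts until the container is recreated, so a fix in the image itself would still be needed.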

     

    Thanks!

    David

     

  11. post a smart report for your ssd, it might be end of life?

     

OK, here's the SMART report.  Looks OK to me, but maybe I'm missing something?

     

    root@Tower:~# smartctl -a /dev/sdc

    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.4-unRAID] (local build)

    Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

     

    === START OF INFORMATION SECTION ===

    Device Model:    Samsung SSD 840 EVO 500GB

    Serial Number:    S1DHNSADA10381K

    LU WWN Device Id: 5 002538 8a008a13e

    Firmware Version: EXT0BB0Q

    User Capacity:    500,107,862,016 bytes [500 GB]

    Sector Size:      512 bytes logical/physical

    Rotation Rate:    Solid State Device

    Device is:        Not in smartctl database [for details use: -P showall]

    ATA Version is:  ACS-2, ATA8-ACS T13/1699-D revision 4c

    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)

    Local Time is:    Tue May  5 10:25:55 2015 EDT

    SMART support is: Available - device has SMART capability.

    SMART support is: Enabled

     

    === START OF READ SMART DATA SECTION ===

    SMART overall-health self-assessment test result: PASSED

     

    General SMART Values:

    Offline data collection status:  (0x00) Offline data collection activity

    was never started.

    Auto Offline Data Collection: Disabled.

    Self-test execution status:      (  0) The previous self-test routine completed

    without error or no self-test has ever

    been run.

    Total time to complete Offline

    data collection: ( 6600) seconds.

    Offline data collection

    capabilities: (0x53) SMART execute Offline immediate.

    Auto Offline data collection on/off support.

    Suspend Offline collection upon new

    command.

    No Offline surface scan supported.

    Self-test supported.

    No Conveyance Self-test supported.

    Selective Self-test supported.

    SMART capabilities:            (0x0003) Saves SMART data before entering

    power-saving mode.

    Supports SMART auto save timer.

    Error logging capability:        (0x01) Error logging supported.

    General Purpose Logging supported.

    Short self-test routine

    recommended polling time: (  2) minutes.

    Extended self-test routine

    recommended polling time: ( 110) minutes.

    SCT capabilities:       (0x003d) SCT Status supported.

    SCT Error Recovery Control supported.

    SCT Feature Control supported.

    SCT Data Table supported.

     

    SMART Attributes Data Structure revision number: 1

    Vendor Specific SMART Attributes with Thresholds:

    ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

      5 Reallocated_Sector_Ct  0x0033  100  100  010    Pre-fail  Always      -      0

      9 Power_On_Hours          0x0032  097  097  000    Old_age  Always      -      12830

    12 Power_Cycle_Count      0x0032  099  099  000    Old_age  Always      -      27

    177 Wear_Leveling_Count    0x0013  084  084  000    Pre-fail  Always      -      190

    179 Used_Rsvd_Blk_Cnt_Tot  0x0013  100  100  010    Pre-fail  Always      -      0

    181 Program_Fail_Cnt_Total  0x0032  100  100  010    Old_age  Always      -      0

    182 Erase_Fail_Count_Total  0x0032  100  100  010    Old_age  Always      -      0

    183 Runtime_Bad_Block      0x0013  100  100  010    Pre-fail  Always      -      0

    187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

    190 Airflow_Temperature_Cel 0x0032  073  060  000    Old_age  Always      -      27

    195 Hardware_ECC_Recovered  0x001a  200  200  000    Old_age  Always      -      0

    199 UDMA_CRC_Error_Count    0x003e  099  099  000    Old_age  Always      -      2

    235 Unknown_Attribute      0x0012  099  099  000    Old_age  Always      -      24

    241 Total_LBAs_Written      0x0032  099  099  000    Old_age  Always      -      29197432370

     

    SMART Error Log Version: 1

    No Errors Logged

     

    SMART Self-test log structure revision number 1

    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

     

     

    SMART Selective self-test log data structure revision number 1

    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

        1        0        0  Not_testing

        2        0        0  Not_testing

        3        0        0  Not_testing

        4        0        0  Not_testing

        5        0        0  Not_testing

    Selective self-test flags (0x0):

      After scanning selected spans, do NOT read-scan remainder of disk.

    If Selective self-test is pending on power-up, resume after 0 minute delay.
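Since the report shows no self-tests have ever been logged, I'll kick off a short one and check the result afterwards (sdc is the same SSD as above):

smartctl -t short /dev/sdc       # start a short self-test (report says ~2 minutes)
smartctl -l selftest /dev/sdc    # view the self-test log once it finishes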

     

     

  12. I'll start by apologizing for changing the subject...I didn't realize I was changing the entire thread.  Other boards create a sub-subject within a thread if the subject is edited.  Strange that I can even edit it.  Thanks Mods for fixing it...I meant no harm...

     

I was hoping I'd distilled the problem down to these errors.  This is even more confusing now...if IOMMU is an Intel VT-d error, then I must have some real problems, since I have AMD CPUs and wasn't running any virtualization at the time these occurred.  I also only see these errors when writing to my SSD (one out of 19 drives).

     

Perhaps the IOMMU errors are just a side effect; the real problem seems to be the "exception Emask" and "failed command" messages.  There are hundreds of these.

     

I'll start at the beginning, since perhaps the original snippet isn't enough to distill the problem.

     

    Here's the history:

The night after I upgraded to beta 15 (from beta 14), the mover ran and tried to move all the files that were on cache onto a drive that was full...it generated a ton of the messages I quoted above, along with "out of space" errors...and blew away the files it wasn't able to move (I lost data).  It wasn't anything critical I'd lost, and it appeared that the problem was likely with the SSD or my share configuration, so I updated those as I indicated (reformatted, flashed firmware, added full drives to the "excluded" disks in the share configuration, etc.), but the errors writing to the SSD didn't go away.

     

    Here are some extracts from that first nights log:

     

These errors occurred while I was copying one of my disk image files (to and from my cache drive) so I could test KVM vs. Xen:

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

    Apr 30 22:17:52 Tower kernel: ata30: EH in SWNCQ mode,QC:qc_active 0xFFFF0 sactive 0xFFFF0

    Apr 30 22:17:52 Tower kernel: ata30: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x3

    Apr 30 22:17:52 Tower kernel:  dhfis 0x0 dmafis 0x0 sdbfis 0x0

    Apr 30 22:17:52 Tower kernel: ata30: ATA_REG 0x40 ERR_REG 0x0

    Apr 30 22:17:52 Tower kernel: ata30: tag : dhfis dmafis sdbfis sactive

    Apr 30 22:17:52 Tower kernel: ata30.00: exception Emask 0x0 SAct 0xffff0 SErr 0x0 action 0x6

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:20:a0:58:6f/24:00:28:00:00/40 tag 4 ncq 4718592 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:28:a0:7c:6f/20:00:28:00:00/40 tag 5 ncq 4194304 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:30:a0:9c:6f/24:00:28:00:00/40 tag 6 ncq 4718592 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:38:a0:c0:6f/28:00:28:00:00/40 tag 7 ncq 5242880 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:40:a0:e8:6f/20:00:28:00:00/40 tag 8 ncq 4194304 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:48:a0:08:70/20:00:28:00:00/40 tag 9 ncq 4194304 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:50:a0:28:70/28:00:28:00:00/40 tag 10 ncq 5242880 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:58:a0:50:70/34:00:28:00:00/40 tag 11 ncq 6815744 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:60:a0:84:70/28:00:28:00:00/40 tag 12 ncq 5242880 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:68:a0:ac:70/24:00:28:00:00/40 tag 13 ncq 4718592 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:70:a0:d0:70/28:00:28:00:00/40 tag 14 ncq 5242880 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:78:a0:f8:70/24:00:28:00:00/40 tag 15 ncq 4718592 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:80:a0:1c:71/0c:00:28:00:00/40 tag 16 ncq 1572864 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:88:a0:28:71/20:00:28:00:00/40 tag 17 ncq 4194304 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:90:a0:48:71/1c:00:28:00:00/40 tag 18 ncq 3670016 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

    Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:98:a0:64:71/20:00:28:00:00/40 tag 19 ncq 4194304 out

    Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

    Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

    Apr 30 22:17:52 Tower kernel: ata30: hard resetting link

    Apr 30 22:17:52 Tower kernel: ata30: nv: skipping hardreset on occupied port

    Apr 30 22:17:52 Tower kernel: ata30: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Apr 30 22:17:52 Tower kernel: ata30.00: configured for UDMA/133

    Apr 30 22:17:52 Tower kernel: ata30: EH complete

This exact sequence appears approx. 197 times in a row (sometimes with a different number of WRITE FPDMA QUEUED messages) from timestamp 22:17:31 to 23:51:11 (the time I was copying a 10 GB file).  That also seems like a long time to copy 10 GB to and from an SSD...
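For anyone trying to gauge how often this is happening on their own system, counting the reset cycles in the syslog gets you close (each sequence ends with one "hard resetting link" line; /var/log/syslog is where unraid keeps it):

grep -c 'hard resetting link' /var/log/syslog
grep -c 'failed command: WRITE FPDMA QUEUED' /var/log/syslog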

     

    I didn't notice these messages until the next day, when I went looking to find out why a bunch of files were missing that should have been moved over from the cache drive.  Here's part of the log from the Mover script:

     

    May  1 03:40:01 Tower logger: moving "Movies"

    May  1 03:40:01 Tower logger: ./Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo

    May  1 03:40:01 Tower logger: .d..t...... ./

    May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002 (28) No space left on device

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): No space left on device (28)

    May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo": Exec format error (8)

    May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

    May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/

    May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

    May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo': Exec format error

    May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo

    May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001 (28) No space left on device

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): No space left on device (28)

    May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo": Exec format error (8)

    May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

    May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/

    May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

    May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo': Exec format error

    May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo

    May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992 (28) No space left on device

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): Exec format error (8)

    May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): No space left on device (28)

    May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo": Exec format error (8)

    May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

    May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/

    May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

    May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo': Exec format error

    May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla vs Gigan 1972/imdbinfo.nfo

There is one set of these error messages for each movie that mover tried to move.  Here's the really strange thing: each folder had a movie file (a file with extension MKV, if it makes any difference) and an NFO file.  I don't see a reference to any of the MKV files in the log; however, they were completely deleted from the cache drive--not moved.  The NFO files were "moved", but after the move (and during, according to the log), if you try to access them you get the "Exec format error".

     

I had attributed the exec format error to corruption in the ReiserFS file system--rather than try to repair it, I reformatted the drive as XFS (I've been systematically doing that anyway) and restored the files that were there.

     

The other strange thing with this is that these files had not been moved from cache previously (I don't have the older log files, so I don't know what/if there were previous errors).  They'd all been there at least a week or two (perhaps since beta 14?).  Also, disk9, where mover was trying to move them to, was *almost* full (just a few KB free) and was NOT part of the share configuration (not included or excluded), and there were included drives that had plenty of free space.  I've updated all my share configs to explicitly indicate which drives to include and exclude, so perhaps that was my mistake in configuration.

     

Finally, the mover didn't nuke all my files; some of them failed like these (a different share):

    May  1 04:05:41 Tower logger: .d..t...... Usenet/download/move/

    May  1 04:05:41 Tower logger: >f+++++++++ Usenet/download/move/1973 interesting file.720p.ac3.CG.avi

    May  1 04:05:41 Tower shfs/user0: shfs_write: write: (28) No space left on device

    May  1 04:05:41 Tower shfs/user0: shfs_write: write: (28) No space left on device

    May  1 04:06:01 Tower logger: rsync: write failed on "/mnt/user0/Usenet/download/move/1973 interesting file.720p.ac3.CG.avi": No space left on device (28)

    May  1 04:06:01 Tower logger: rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]

     

I had originally thought these mover errors stemmed from the IOMMU errors earlier...perhaps I was wrong and it was simply configuration and ReiserFS corruption.  If that's the case, then I think I'm fixed regarding lost files...but the IOMMU errors seem troubling, since I don't have an Intel CPU, and they do seem to be impacting performance (all those ATA bus resets can't be good for performance).

     

    Lastly, here's a snip from the start of the log where the ata30 device starts up:

    Apr 30 22:13:20 Tower kernel: ata30: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Apr 30 22:13:20 Tower kernel: ata30.00: ATA-9: Samsung SSD 840 EVO 500GB, S1DHNSADA10381K, EXT0BB0Q, max UDMA/133

    Apr 30 22:13:20 Tower kernel: ata30.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)

    Apr 30 22:13:20 Tower kernel: ata30.00: configured for UDMA/133

     

     

Is there anything else I should be looking at?  Or are the remaining IOMMU messages and failed command messages just informational?

     

    Thanks!

     

  13. Hello:

After upgrading to beta 16 (possibly before, but I didn't notice these errors--I think they're new with beta 16)...

    I'm getting a bunch of errors talking to my cache drive (it's an SSD).

     

    The errors in my log look like this:

    May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

    May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

    May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

    May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

    May  4 13:56:31 Tower kernel: ata16: EH in SWNCQ mode,QC:qc_active 0x3C00 sactive 0x3C00

    May  4 13:56:31 Tower kernel: ata16: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x9

    May  4 13:56:31 Tower kernel:  dhfis 0x0 dmafis 0x0 sdbfis 0x0

    May  4 13:56:31 Tower kernel: ata16: ATA_REG 0x40 ERR_REG 0x0

    May  4 13:56:31 Tower kernel: ata16: tag : dhfis dmafis sdbfis sactive

    May  4 13:56:31 Tower kernel: ata16.00: exception Emask 0x0 SAct 0x3c00 SErr 0x0 action 0x6

    May  4 13:56:31 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

    May  4 13:56:31 Tower kernel: ata16.00: cmd 61/00:50:68:79:05/20:00:00:00:00/40 tag 10 ncq 4194304 out

    May  4 13:56:31 Tower kernel:        res 40/00:70:58:d3:02/00:00:00:00:00/40 Emask 0x40 (internal error)

    May  4 13:56:31 Tower kernel: ata16.00: status: { DRDY }

    May  4 13:56:31 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

    May  4 13:56:31 Tower kernel: ata16.00: cmd 61/00:58:68:99:05/1c:00:00:00:00/40 tag 11 ncq 3670016 out

     

At first I thought it might be the firmware on the SSD, so I upgraded that, and while I was in there I replaced the SATA cable to the SSD.

     

I also saw some strange storage behavior, with error messages on some folders when viewed via share0 indicating "wrong exec format"...I suspect those are a side effect of this problem.  Since this was my cache drive and it was formatted with btrfs, I moved everything off, reformatted with XFS, and put the files back...I saw a TON of these errors when putting the files back...so it wasn't related to the file system.

     

I have NCQ turned off in the unraid settings, and smartctl shows the cache drive is clean...

     

Where should I look next?  Or is this a bug in the sata_nv driver again?
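In case it helps narrow things down, one way to confirm which controller and driver that PCI address belongs to (00:05.2 comes straight from the log lines above):

lspci -s 00:05.2       # identify the device at that address
lspci -k -s 00:05.2    # -k also shows the kernel driver in use (should be sata_nv)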

     

     

  14. OpenVPN!

     

OK, I can confirm that OpenVPN works out of the box with this Arch OS image.  (OK, not quite out of the box, but without anything special other than pacman and configuration...)

     

    I don't think there's a need to add it to the unraid repository, since the one in the Arch repositories works just fine.

     

All I had to do was:

     

    pacman -S openvpn

     

    and it installed the package and dropped the sample configuration files.

     

I'm using it as a client to connect to "Private Internet Access" (that's the company that provides my anonymous internet access service), and I just followed their guide for Linux OpenVPN setup and it worked great!
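Roughly what that boils down to (the file names and paths here are just examples--use whatever PIA's guide gives you):

cp US_East.ovpn /etc/openvpn/pia.conf     # drop the provider config where OpenVPN expects it
openvpn --config /etc/openvpn/pia.conf    # run it in the foreground first to confirm it connects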

     

The next step is to try to set up an inbound private VPN so I can get access to my systems from outside via VPN...it appears everything is already installed to make that happen as well; it just needs configuration.  I'm happy to document that process once I get it going, if anyone is interested.

     

    David

Is anyone else having trouble with NFS mounts from the Arch VM to unraid?...I'm having a problem where one folder on a share (it seems to be my sabtemp folder) becomes inaccessible...it shows ownership and permissions as "? ? ? ? ? ? ? ? ? ? ? ?".

     

It seems to resolve itself after some time (hours?)...but in the meantime sabnzbd returns all kinds of errors and basically either loses the download or is unable to run the Sick Beard post-process script...leaving me to clean up...

     

It's happening on a regular basis now...daily...Not sure if this is the stale NFS file handle issue that's been seen in the past due to FUSE inode cleanup...I haven't tried remounting to see if it clears...

     

I tried setting up the mount as SMB, but ran into permissions issues that just seemed silly to have to work through (why use SMB to mount Linux to Linux?)...

     

    I'm just doing a basic mount with no options...so perhaps there's something I need to add to help prevent this...?
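For reference, the mount I'm doing now is literally just the bare-bones version below; the second line is the kind of thing I'm thinking of trying (forcing NFSv3 and shortening the attribute cache)--the hostname and share name are mine, and the options are guesses, not a known fix:

mount -t nfs tower:/mnt/user/Usenet /mnt/usenet
mount -t nfs -o vers=3,actimeo=3,hard,intr tower:/mnt/user/Usenet /mnt/usenet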

     

    Any suggestions?

Regarding OpenVPN...it was actually quite easy to get set up...I don't remember all the steps, but I'll have a look through my logs and see if I still have the details.

     

Want to add OpenVPN, as this was requested previously; which packages do you need for this to work? You tell me that, I can add them.

     

I'm going to do a fresh install just to make sure I've got everything...there was more in the configuration than in the install.  I should have some notes early in the next couple of days.

     

    David

  17. Hey all!

     

    Thanks to everyone for getting this going...I was just embarking on trying to set this up on my own when you guys (Tom included!)  jumped in and got the ball rolling.

     

Regarding OpenVPN...it was actually quite easy to get set up...I don't remember all the steps, but I'll have a look through my logs and see if I still have the details.  I'm using it to make an outbound VPN connection (using privateinternetaccess.com).  I'm still testing, but it seems solid...it was actually quicker to get working than some of the other packages--probably because I was working on it from scratch and knew where it was putting everything (vs. the packages, where I had to dig to find out).

     

I have one suggestion for the sabnzbd package...can we change the default in the conf file to allow access from remote servers (i.e., host = 0.0.0.0 vs. host = localhost)?  Since this is destined for a headless server, it was a bit frustrating to install the package, see it running, but not know why the web page wouldn't open to configure it...
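Until the default changes, the edit amounts to one line in the generated sabnzbd.ini (the path below is just an example--use wherever the package puts its config):

sed -i 's/^host = localhost/host = 0.0.0.0/' /opt/sabnzbd/sabnzbd.ini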

     

    anyway, thanks again!

     

     

     

Hi there, how difficult would an OpenVPN server and SQL be to install?

     

     

    Thornwood

  18. @terrastrife:

Yeah, I've moved the drive...I can't believe it's a fried controller, since it works with one version (4.5.2) but not another (4.5.3)...I see that there's a new release...perhaps I'll try it again on that controller to see how it behaves...otherwise, the drives are running fine on the Supermicro controller I put in.

     

To everyone else who replied to my thread, thanks very much...still no actual solution, but the work-around is to not use that controller.

     

    I'll follow up after I try the latest release if anyone is curious...

     

     

  19. Kaygee:

     

    The drive always reports the same way every reboot...

     

It doesn't get assigned a device identifier, so I'm not sure I can use the hdparm command.  When I use hdparm under 4.5 it reports correctly, but under 4.5.3 a different drive is assigned its drive letter, and it doesn't get one...
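One thing I can try from the console is looking the drive up by its serial number instead of a drive letter (the serial comes from the 4.5 listing), e.g.:

ls -l /dev/disk/by-id/ | grep STF604MR2MDJ6P   # if a node exists, the symlink shows which sdX it maps to
hdparm -i /dev/sdX                             # then run hdparm against whatever device that is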

     

The drive is connected to a SiI (Silicon Image) controller (a two-port PCIe controller).  This controller also has another Hitachi drive connected to it, which is working correctly.

     

     

     

     

    d = 100 = 1100100

    i = 105 =  1101001

     

    Q = 81 = 1010001

    D = 68 = 1000100

     

    B = 66 = 1000010

    S = 83 = 1010011

     

Not an obvious memory corruption issue.  I'm assuming it is repeatable from your description?  Does it always identify the drive as Hdtachi HQT721010^BLA360?

     

From the console, can you try hdparm -i /dev/sd?, where ? = the drive letter for drive 7 (the unRAID Devices page in 4.5 will show this info)?  It may well be a Slackware rather than an unRAID issue.

     

Also, what controller is the drive attached to?

     

The only interesting thing about this is that it's the first drive detected that seems to have the name-corruption problem...

     

It appears to be on a controller using the sil24 driver...so this isn't a SAS issue (I think?)...

     

From what I can tell from the syslog, there's another drive on this controller as well; it's also a Hitachi, and it reports just fine...

     

I'm at a loss...could this be a cabling issue?...Strange that it doesn't manifest itself with 4.5, only 4.5.3 (I haven't tried anything in between [should I?]).

     

    I've no idea what to try next...I don't want to go ripping things apart without having any idea what the best systematic approach would be from here...

     

    Should I:

    Try swapping ports with another drive?

    Try resetting my configuration and rebuilding parity using the "bad" drive name that's registering under 4.5.3 (and risk the name fixing itself later?)

    Wait for 4.5.4?

     

     

I've got the Supermicro board I'd like to install, and 2 brand-new drives I'm ready to deploy...so waiting is probably my last choice...but all the others seem to carry some risk...

     

    Any suggestions/ideas/comments?

     

David

  21. Sorry for the crosspost (I posted this in the announcements thread, but no response there yet...)

     

     

I have a disk showing as missing after the 4.5.3 upgrade (from 4.5)...

     

Smells a lot like the old missing-disk issue, but it doesn't seem to show up the same way...

     

    My syslog shows it's detecting the drive with a slightly different name, and the menu screen shows the disk as "Missing".

    Nowhere does the different name show up in the user interface (not under devices either)...the drive shows with a red bullet.

     

The drive is a Hitachi, and I have 2 other Hitachis in my setup that are detected just fine...It almost looks like the newly detected name is corrupted somehow.

     

Downgrading to 4.5 shows everything is all right.  I'm trying to upgrade to 4.5.3 in order to support the new Supermicro board I'm about to install.

     

     

    Here's what my main screen shows:

     

    Version 4.5:

     

          Model / Serial No.    Temperature    Size    Free    Reads    Writes    Errors

    parity    ata-Hitachi_HDS721010KLA330_GTJ100PAG0YG3C    42°C    976,762,552    -    105    122    0

    disk1    ata-SAMSUNG_HD103UJ_S13PJDWS323320    34°C    976,762,552    36,868,312    54    6    0

    disk2    ata-SAMSUNG_HD103UJ_S13PJ1BQ702829    33°C    976,762,552    15,414,156    55    6    0

    disk3    ata-SAMSUNG_HD103UJ_S13PJDWS323304    30°C    976,762,552    38,589,180    52    6    0

    disk4    ata-SAMSUNG_HD103UJ_S13PJ1BQ735050    34°C    976,762,552    140,416,300    52    6    0

    disk5    ata-Hitachi_HDS721010KLA330_GTG000PAG17Z4C    39°C    976,762,552    85,548,068    53    6    0

    disk6    ata-Hitachi_HDS721010KLA330_GTJ100PAG0HBSC    40°C    976,762,552    636,785,268    118    78    0

    disk7    ata-Hitachi_HDT721010SLA360_STF604MR2MDJ6P    35°C    976,762,552    422,973,780    48    5    0

    disk8    ata-Hitachi_HDT721010SLA360_STF604MR3GM8SP    34°C    976,762,552    937,514,212    52    7    0

    disk9    Not installed

    disk10    Not installed

     

     

     

     

    -----------------------------------------

     

    Post 4.5.3 upgrade

     

          Model / Serial No.    Temperature    Size    Free    Reads    Writes    Errors

parity    Hitachi_HDS72101_GTJ100PAG0YG3C    42°C    976,762,552    -    -    -    -

disk1    SAMSUNG_HD103UJ_S13PJDWS323320    34°C    976,762,552    -    -    -    -

disk2    SAMSUNG_HD103UJ_S13PJ1BQ702829    33°C    976,762,552    -    -    -    -

disk3    SAMSUNG_HD103UJ_S13PJDWS323304    30°C    976,762,552    -    -    -    -

disk4    SAMSUNG_HD103UJ_S13PJ1BQ735050    34°C    976,762,552    -    -    -    -

disk5    Hitachi_HDS72101_GTG000PAG17Z4C    38°C    976,762,552    -    -    -    -

disk6    Hitachi_HDS72101_GTJ100PAG0HBSC    40°C    976,762,552    -    -    -    -

disk7    Missing  Hitachi_HDT721010SLA360_STF604MR2MDJ6P    -    976,762,552    -    -    -    -

disk8    Hitachi_HDT72101_STF604MR3GM8SP    34°C    976,762,552    -    -    -    -

disk9    Not installed

disk10    Not installed

     

     

    I'm also attaching my syslogs...

    ...I think the entry of note in the 4.5.3 log is as follows:

    Apr 26 13:21:18 Tower kernel: ata1.00: model number mismatch 'Hdtachi HQT721010^BLA360          0' != 'Hitachi HDT721010SLA360'

     

    Not sure why this drive is being detected as a Hdtachi instead of Hitachi under 4.5.3

     

     

     

     

     

     

    Any ideas?

     

    David

    syslog.45.zip

    syslog.453.zip
