
oh-tomo


Posts posted by oh-tomo

  1. Found the box. The controller is an Orico PVU3-4P USB 3.0 PCI-E Express Card, which has been part of the VM since April 2019. I paid $20 for it. Is 30 months just the lifespan of a USB card, meaning I should get another? I shut down and checked that the Molex power connector on the card was seated and that the other end of the power cable was secure. I've started a file move within Windows from the unRAID storage to an external USB dock connected to the Orico. I've done this many times over the past 30 months, but only today has the move been interrupted by this Windows Code 43 error. At first I thought it was due to a Windows update that occurred yesterday, so I tried rolling back to the October 31 system restore point. Or maybe something got jostled in the PC case when I swapped out one of the data drives yesterday. Maybe pulling and reseating the Molex power connector on the Orico will help.

  2. So I upgraded the CPU to an Intel Core i7-9700K, and it has helped the Win10 VM significantly. It took quite a few reboots, plus unplugging the USB and HDMI cables, to get the VM past the TianoCore screen, but it finally booted into Windows.

     

    But since upgrading the CPU there have been five "Cache disk is hot" notifications (53, 51, 53, 54, and 50 °C; one yesterday and four so far today), so I raised the warning temperature for that drive to 70 °C.

     

    How should I set the logical CPUs for a macOS VM? I've allocated all 8 cores to Windows 10. When I start the macOS Catalina VM while the Win10 VM is running, audio playback in Win10 becomes garbled. Is there a particular split of logical CPUs between the macOS VM and the Win10 VM that won't cause warbly audio in Windows?

      

    [Two screenshots were attached to the original post.]

  3. Changing DNS to 8.8.8.8, 8.8.4.4, and 1.1.1.1 didn't fix it for me. Trying an update from 6.8.3 to 6.9.2...

     

    Update: after updating unRAID to 6.9.2, I checked Docker and still saw "not available". I then checked "/usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php" and found it already had the "@i" edit mentioned in the other thread. When I checked Docker again, "not available" was gone, replaced by a mixture of "apply update" and "up-to-date".

     

    Hope everything else still works in 6.9.2...

  4. I noticed that the VIA Technologies VL805 USB 3.0 Host Controller assigned to my Win10 VM wasn't showing up in Win10 (in Device Manager it was struck through), so I rebooted the VM. The reboot got stuck on the TianoCore screen. I tried powering off everything connected to the USB controller, then force-stopping the VM and rebooting again. Next I'll try pulling all the USB cables. I don't recall having this issue before in the two years I've been running this VM.

  5. Well, this goes back to my original question about my old disk1 passing the SeaTools tests.

     

    From Seagate's "Warranty Claim Validation Process":
     

    Quote

    Please note that evidence of the following will result in rejected warranty claims:

    ...

    No Trouble Found (NTF). Before returning your product, you may use the Seagate SeaTools diagnostic tool to determine the condition of your product and whether it is eligible to be returned under warranty.

     

    Will Seagate hold me to SeaTools' inability to detect the issues my unRAID server is experiencing?

  6. 21 hours ago, trurl said:

    Yes since they were increasing. Not sure where people like to draw the line but 3 digits is too many for me.

     

    What action should I have taken at 3 digits?

     

    21 hours ago, trurl said:

    Something like that.

     

    The SMART extended self-test of disk1 has reached 40%.

     

    I'm doing a Windows long format of old disk1 before submitting it to Seagate for an RMA.

     

  7. 26 minutes ago, trurl said:

    But are you getting those SMART warnings in email?

     

    You can control how different Notifications are given to you, SMART is important enough that you need to know about them even if you don't happen to open up the webUI. I have nothing notifying me in the Browser, and I get emails for Array Status, Notices, Warnings, and Alerts.

     

    When you do get a SMART warning, you need to make the warning go away. How you do this depends. If it is serious enough you replace the disk. If it is not serious enough, you Acknowledge it by clicking on it in the Dashboard. Once acknowledged, it will not warn you again unless it changes.

     

    No point in letting a Warning just sit there and not do anything at all about it. If you Acknowledge it, and it comes back, then you know it has gotten worse.

     

    CRC Errors are connection issues and not really a disk problem, but the disk counts these and keeps the count in its firmware. Basically it means the data it received was inconsistent. Not all connection problems will show up there because if it isn't getting any data it can't check the consistency.

     

    A small number of Reallocated is usually fine since disks are designed to have some spare sectors for that purpose.

     

    Those warnings are OK to acknowledge but of course we still want to see the results of the extended test on disk1.

     

     

    I got these SMART health emails on Nov 15:

     

    Quote

    Event: Unraid Disk 1 SMART health [5]
    Subject: Warning [SERVER2018] - reallocated sector ct is 40
    Description: ST12000VN0007-2GS116_*GDM (sdf)
    Importance: warning

     

    Quote

    Event: Unraid Disk 1 SMART health [5]
    Subject: Warning [SERVER2018] - reallocated sector ct is 96
    Description: ST12000VN0007-2GS116_*GDM (sdf)
    Importance: warning

     

    Quote

    Event: Unraid Disk 1 SMART health [5]
    Subject: Warning [SERVER2018] - reallocated sector ct is 176
    Description: ST12000VN0007-2GS116_*GDM (sdf)
    Importance: warning

     

    And on Dec 15 this email:

     

    Quote

    Event: Unraid Disk 1 SMART health [5]
    Subject: Warning [SERVER2018] - reallocated sector ct is 184
    Description: ST12000VN0007-2GS116_*GDM (sdf)
    Importance: warning

     

    Should I have acted on them?

     

    There weren't any read error alert emails until March 23.

     

    I'll start an extended test on the new disk1 tomorrow. Does it take as long as a rebuild?

     

  8.  

    5 hours ago, trurl said:

    Looks good.

     

    I notice that disk has a few reallocated but should be OK as long as they don't start increasing.

     

    Does it (or any other) show SMART warnings on the Dashboard page? Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?

     

    You might want to run an Extended SMART test on your disks occasionally.

     

     

    Dashboard shows SMART errors on:

     

    disk1 - Reallocated_Sector_Ct - 8

    disk3 - UDMA_CRC_Error_Count - 64

    disk5 - UDMA_CRC_Error_Count - 25

     

    I have email notifications set up. Past SMART health warnings have all concerned two 12TB drives: *GDM (removed as disk1, formerly parity1; warnings on 11/15 and 12/15) and *LX1 (the new disk1, formerly parity2; all warnings on 03/23).

     

  9. 7 hours ago, trurl said:

    12TB disk (sdg) serial ending 0TGDM I assume was disk1.

     

    SMART for that disk looks OK and since emulated disk1 is mounted should be good to rebuild to that disk.

     

    Not sure what this end of syslog stuff is, seems related to nvidia and VMs:

    
    
    
    
    Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
    Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0:   device [8086:a33c] error status/mask=00000001/00002000
    Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0:    [ 0] RxErr                  (First)
    Mar 25 21:21:59 Server2018 avahi-daemon[4413]: Joining mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe00:2981.
    Mar 25 21:21:59 Server2018 avahi-daemon[4413]: New relevant interface vnet0.IPv6 for mDNS.
    Mar 25 21:21:59 Server2018 avahi-daemon[4413]: Registering new address record for fe80::fc54:ff:fe00:2981 on vnet0.*.
    Mar 25 21:23:08 Server2018 kernel: kvm [9357]: vcpu2, guest rIP: 0xfffff802693f6192 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop
    ### [PREVIOUS LINE REPEATED 9 TIMES] ###
    Mar 25 21:23:13 Server2018 kernel: kvm_set_msr_common: 12094 callbacks suppressed
    

     

    Is it causing any apparent problems?

    I haven't noticed any VM issues related to Nvidia. The rebuild of disk1 has started.

  10. 10 minutes ago, trurl said:

    Now that you have removed the disk it is a little more complicated how you should proceed. You should have asked for advice before doing anything.

     

    Ideally we would have gotten the diagnostics before you did anything and before rebooting your server. Then we would be better able to see why a write to the disk failed. As mentioned, bad connections are much more common than bad disks.

     

    Is your server currently running without the disk?

     

    Yes, the server is running without the disk.

  11. Quote

     

    Event: Unraid array errors

    Subject: Warning [SERVER2018] - array has errors

    Description: Array has 1 disk with read errors

    Importance: warning

     

     

    Disk 1 - ST12000VN0007-2GS116_********* (sdf) (errors 16)

     

     

    unRAID disabled the disk (red X).

     

    I removed the disk; its contents are currently emulated. I placed the disk in a USB HDD dock connected to a Windows 10 VM and ran some SeaTools tests:

     

    Quote

     

    --------------- SeaTools for Windows v1.4.0.7 ---------------
    2021-03-23 11:45:23 PM
    Model Number: ST12000V
    Serial Number: ********
    Firmware Revision: 0302
    Short DST - Started 2021-03-23 11:45:23 PM
    Short DST - Pass 2021-03-23 11:46:25 PM
    Short Generic - Started 2021-03-23 11:46:40 PM
    Short Generic - Pass 2021-03-23 11:47:34 PM
    Long Generic - Started 2021-03-23 11:47:56 PM
    Long Generic - Pass 2021-03-24 11:41:03 PM
    Identify - Started 2021-03-25 11:29:05 AM

     

     

    After 24 hours of running a Long Generic SeaTools test, the HDD passed. This is troubling. The HDD is under warranty until August, but unRAID considers it error-prone while SeaTools finds no errors. I can't put the disk back in the array because unRAID has disabled it, and I can't send it to Seagate for an RMA because their software doesn't show any problems.

     

    How should I proceed?   

     
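trurl's rule of thumb in the replies above is: a few reallocated sectors are fine as long as they don't start increasing, and three digits is too many. That rule can be checked mechanically against the Reallocated_Sector_Ct values in the disk1 (*GDM) warning emails from Nov 15 and Dec 15 (40, 96, 176, 184). The following is a minimal Python sketch under stated assumptions. It is not an Unraid feature. The `review` helper and the threshold of 100 are hypothetical, chosen only to mirror that advice. It flags every reading that rose since the previous one or reached three digits.

```python
# Reallocated_Sector_Ct values from the Unraid SMART warning emails
# for disk1 (*GDM), in the order they were received (Nov 15 x3, Dec 15).
readings = [40, 96, 176, 184]

# Hypothetical threshold mirroring "3 digits is too many"; this is a
# personal rule of thumb, not a Seagate or Unraid specification.
THRESHOLD = 100


def review(counts, threshold=THRESHOLD):
    """Return (index, count, note) for each reading that deserves attention.

    A reading is flagged if it is higher than the previous reading
    (the count is growing) or if it is at or above the threshold.
    """
    flagged = []
    for i, count in enumerate(counts):
        notes = []
        if i > 0 and count > counts[i - 1]:
            notes.append(f"increased by {count - counts[i - 1]}")
        if count >= threshold:
            notes.append(f"at or above {threshold}")
        if notes:
            flagged.append((i, count, "; ".join(notes)))
    return flagged


for i, count, note in review(readings):
    print(f"reading {i}: {count} ({note})")
```

Against those four readings it flags 96, 176, and 184, which is every warning after the first. A count that stays put, such as the 8 reallocated sectors reported for the new disk1, produces no flags. That matches the advice to acknowledge a stable warning and to act if it comes back higher.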
