oh-tomo

November 8, 2021

Found the box. The controller is an Orico PVU3-4P USB 3.0 PCI-E Express Card, which has been part of the VM since April 2019. I paid $20 for it. Is 30 months just the lifespan of a USB card, and should I get another? I shut down and checked that the Molex power connector on the card was seated and that the power cable was secure at the other end. I've started a file move within Windows from the unRAID storage to an external USB dock connected to the Orico. I've done this many times over the past 30 months, but only today has the move been interrupted by this Windows Code 43 error. At first I thought it was due to a Windows update that occurred yesterday, so I tried rolling back to the October 31 system restore point. Or maybe something got jostled in the PC case when I swapped out one of the data drives yesterday. Maybe pulling and reseating the Molex power connector on the Orico will help.

November 8, 2021

My USB controller stops working about 40 minutes after booting the Win10 VM. Is this an unRAID VM thing or a hardware failure on the USB controller? How do I diagnose the issue?

September 12, 2021

On 10/20/2019 at 8:52 PM, david279 said:

Stub the card from the system

What does that mean? I can stub a toe. Or a cigarette. But stub a card?

July 27, 2021

So I upgraded the CPU to an Intel Core i7-9700K and it has helped the Win10 VM significantly. It took quite a few reboots and unplugging of USB and HDMI to get the VM past the Tianocore screen, but it finally got into Windows.

But since upgrading the CPU there have been five "Cache disk is hot" notifications (53, 51, 53, 54, and 50 C; one yesterday and four so far today). So I raised the warning temperature for that drive to 70 C.

How should I set the Logical CPUs for a MacOS VM? I've allocated all 8 cores to Windows 10. When I try to start the MacOS Catalina VM while the Win10 VM is running, audio playback becomes garbled in Win10. Is there a specific balance of Logical CPUs between the MacOS VM and the Win10 VM that won't result in warbly Windows audio?
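To make the question concrete, here is a minimal Python sketch of what "a balance of Logical CPUs" could look like: two VMs whose CPU lists don't overlap. The function names and the 6/2 split are hypothetical, chosen only for illustration, and whether a non-overlapping split actually cures the garbled audio is untested here.

```python
def shared_cpus(vm_a, vm_b):
    """Return the logical CPU ids assigned to both VMs."""
    return sorted(set(vm_a) & set(vm_b))


def split_logical_cpus(total, second_vm_count):
    """Split logical CPUs 0..total-1 into two disjoint lists.

    The second VM gets the last `second_vm_count` CPUs and the first VM
    gets the rest. The split size is an assumption, not a recommendation.
    """
    if not 0 < second_vm_count < total:
        raise ValueError("second_vm_count must be between 1 and total - 1")
    cpus = list(range(total))
    return cpus[:-second_vm_count], cpus[-second_vm_count:]


# Situation described in the post: all 8 logical CPUs given to Win10.
current_win10 = list(range(8))
current_macos = [6, 7]  # hypothetical assignment for the second VM
print(shared_cpus(current_win10, current_macos))  # [6, 7]

# One possible non-overlapping layout (hypothetical 6/2 split).
win10, macos = split_logical_cpus(8, 2)
print(win10, macos)  # [0, 1, 2, 3, 4, 5] [6, 7]
```

With every logical CPU given to Win10, any CPU handed to a second VM is necessarily shared, which is the overlap the first call reports.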

July 25, 2021

Ran a PC test. Result: "PC Performing below expectations (23rd percentile)":

https://www.userbenchmark.com/UserRun/44921725

Would upgrading the unRAID's i3-8100 CPU help the VM significantly? Chrome tabs are no longer crashing but apps do become temporarily non-responsive.

July 2, 2021

DNS change to 8.8.8.8, 8.8.4.4 and 1.1.1.1 didn't do it for me. Trying an update to 6.9.2 from 6.8.3....

Update: after updating unRAID to 6.9.2, I checked Docker and still found "not available". I then checked "/usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php" and saw it already had the "@i" edit mentioned in the other thread. When I checked Docker again, "not available" was gone, replaced with a mixture of "apply update" and "up-to-date".

Hope everything else still works in 6.9.2...

June 2, 2021

Rebooted unRAID and VM was able to boot again. Weird.

Looks like VIA USB controller is back too.

June 2, 2021

I noticed that the assigned device VIA Technologies VL805 USB 3.0 Host Controller in my Win10 VM wasn't showing up in Win10 (in device manager it had a stroke through it), so I rebooted the VM. Rebooting got stuck on the TianoCore screen. I tried powering off everything connected to the USB controller and rebooting again with a force stop first. I'll next try pulling all the USB cables. Don't recall having this issue before in the two years I've been running this VM.

May 18, 2021

How do I edit my.cnf file?

March 31, 2021

Well this goes back to my original question about my old disk1 passing the SeaTools tests.

From Seagate's "Warranty Claim Validation Process":

Quote

Please note that evidence of the following will result in rejected warranty claims:

...

No Trouble Found (NTF). Before returning your product, you may use the Seagate SeaTools
diagnostic tool to determine the condition of your product and whether it is eligible to be
returned under warranty.

Will Seagate hold me to SeaTools' inability to see the issues that my unRAID is experiencing?

March 31, 2021

Quote

Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        16136            -
# 2  Short offline     Completed without error  00%        16119            -

Test done! RMA label printed.
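For anyone who wants to check a self-test log like the one quoted above without reading it by eye, here is a small Python sketch. It parses only the two rows shown in this post; the regular expression and the function name `parse_self_test_log` are my own assumptions about the row layout, not part of any SMART tool.

```python
import re

# The two result rows quoted in the post above.
SELF_TEST_LOG = """\
# 1 Extended offline Completed without error 00% 16136 -
# 2 Short offline Completed without error 00% 16119 -
"""

# Assumed row layout: number, test name, status, remaining %, hours, first-error LBA.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>Extended offline|Short offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_self_test_log(text):
    """Return one dict per self-test row that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["num"] = int(row["num"])
            row["remaining"] = int(row["remaining"])
            row["hours"] = int(row["hours"])
            rows.append(row)
    return rows


rows = parse_self_test_log(SELF_TEST_LOG)
# A row counts as clean if it completed without error and lists no error LBA.
all_clean = all(
    r["status"] == "Completed without error" and r["lba"] == "-" for r in rows
)
print(len(rows), all_clean)  # 2 True
```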

March 30, 2021

21 hours ago, trurl said:

Yes since they were increasing. Not sure where people like to draw the line but 3 digits is too many for me.

What action should I have taken at 3 digits?

21 hours ago, trurl said:

Something like that.

SMART extended self-test of disk1 has reached 40%

I'm doing a Windows long format of old disk 1 before submitting it for Seagate RMA.

March 29, 2021

26 minutes ago, trurl said:

But are you getting those SMART warnings in email?

You can control how different Notifications are given to you. SMART is important enough that you need to know about these warnings even if you don't happen to open up the webUI. I have nothing notifying me in the Browser, and I get emails for Array Status, Notices, Warnings, and Alerts.

When you do get a SMART warning, you need to make the warning go away. How you do this depends. If it is serious enough you replace the disk. If it is not serious enough, you Acknowledge it by clicking on it in the Dashboard. Once acknowledged, it will not warn you again unless it changes.

No point in letting a Warning just sit there and not do anything at all about it. If you Acknowledge it, and it comes back, then you know it has gotten worse.

CRC Errors are connection issues and not really a disk problem, but the disk counts these and keeps the count in its firmware. Basically it means the data it received was inconsistent. Not all connection problems will show up there because if it isn't getting any data it can't check the consistency.

A small number of Reallocated is usually fine since disks are designed to have some spare sectors for that purpose.

Those warnings are OK to acknowledge but of course we still want to see the results of the extended test on disk1.

I got these SMART health emails on Nov 15:

Quote

Event: Unraid Disk 1 SMART health [5]
Subject: Warning [SERVER2018] - reallocated sector ct is 40
Description: ST12000VN0007-2GS116_*GDM (sdf)
Importance: warning

Quote

Event: Unraid Disk 1 SMART health [5]
Subject: Warning [SERVER2018] - reallocated sector ct is 96
Description: ST12000VN0007-2GS116_*GDM (sdf)
Importance: warning

Quote

Event: Unraid Disk 1 SMART health [5]
Subject: Warning [SERVER2018] - reallocated sector ct is 176
Description: ST12000VN0007-2GS116_*GDM (sdf)
Importance: warning

And on Dec 15 this email:

Quote

Event: Unraid Disk 1 SMART health [5]
Subject: Warning [SERVER2018] - reallocated sector ct is 184
Description: ST12000VN0007-2GS116_*GDM (sdf)
Importance: warning

Should I have acted on them?
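As a rough illustration of the rules of thumb given in this thread (a few reallocated sectors are fine as long as the count isn't increasing, and three digits is too many), here is a Python sketch run against the four counts from the emails above. The function names are hypothetical and the three-digit cutoff is one poster's personal line, not a Seagate or unRAID threshold.

```python
# Reallocated sector counts from the four warning emails quoted above,
# in the order they are listed.
reallocated_counts = [40, 96, 176, 184]


def is_increasing(values):
    """True if every reading is higher than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def first_three_digit_reading(values):
    """Return the first reading of 100 or more, or None if there is none."""
    for value in values:
        if value >= 100:
            return value
    return None


print(is_increasing(reallocated_counts))              # True
print(first_three_digit_reading(reallocated_counts))  # 176
```

By both rules of thumb the drive was already past the line with the third Nov 15 email.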

There weren't any read error alert emails until March 23.

I'll start an extended test on new disk1 tomorrow. Does it take as long as a rebuild?

March 29, 2021

5 hours ago, trurl said:

Looks good.

I notice that disk has a few reallocated but should be OK as long as they don't start increasing.

Does it (or any other) show SMART warnings on the Dashboard page? Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

You might want to run an Extended SMART test on your disks occasionally.

Dashboard shows SMART errors on:

disk1 - Reallocated_Sector_Ct - 8

disk3 - UDMA_CRC_Error_Count - 64

disk5 - UDMA_CRC_Error_Count - 25

I am set up with email notifications. Past SMART health warnings have been about the *GDM (removed as disk1, formerly parity1 -- on 11/15 and 12/15) and *LX1 (new disk1, former parity2 -- all on 03/23) 12TB drives.
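The three Dashboard values above fall into two groups according to the explanation given elsewhere in this thread: CRC errors point at the connection, while reallocated sectors are counted against the disk itself. A minimal Python sketch of that grouping follows; the `classify` helper and its category labels are my own, not unRAID terminology.

```python
# SMART values shown on the Dashboard, as listed in the post above.
dashboard = {
    "disk1": ("Reallocated_Sector_Ct", 8),
    "disk3": ("UDMA_CRC_Error_Count", 64),
    "disk5": ("UDMA_CRC_Error_Count", 25),
}

# Grouping taken from the explanation in this thread; labels are hypothetical.
CONNECTION_ATTRIBUTES = {"UDMA_CRC_Error_Count"}
DISK_ATTRIBUTES = {"Reallocated_Sector_Ct"}


def classify(attribute):
    """Label a SMART attribute as a connection issue, a disk issue, or unknown."""
    if attribute in CONNECTION_ATTRIBUTES:
        return "connection"
    if attribute in DISK_ATTRIBUTES:
        return "disk"
    return "unknown"


for disk, (attribute, value) in dashboard.items():
    print(disk, attribute, value, classify(attribute))
```

Only disk1 reports a disk-side attribute; disk3 and disk5 are reporting cabling or connection trouble.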

March 29, 2021

Rebuild of new disk1 finished.

server2018-diagnostics-20210329-0818.zip

March 28, 2021

Rebuild of new disk1 (old parity2) started.

Should I wait until rebuild is complete before wiping and RMA-ing old disk1?

server2018-diagnostics-20210328-0925.zip

March 28, 2021

13 minutes ago, trurl said:

also pending sectors plus reallocated increased.

Why are you rebuilding parity2?

Replace disk1.

parity2 is rebuilding because that HDD was upgraded to a larger HDD. Rebuild is done.

What do I do with old disk1 after I replace it? Wipe and send to Seagate? "WARRANTY Valid till 09/Aug/2021"

server2018-diagnostics-20210328-0832.zip

March 28, 2021

Smart health warning for Disk 1.

Unraid Disk 1 SMART health [198]: 27-03-2021 23:31

Warning [SERVER2018] - offline uncorrectable is 24
ST12000VN0007-2GS116_*0TGDM (sdi)

server2018-diagnostics-20210328-0016.zip

March 27, 2021

16 hours ago, oh-tomo said:

I haven’t noticed any VM issues with Nvidia. Rebuild of disk1 has started.

Rebuild is done!

server2018-diagnostics-20210326-2333.zip

March 26, 2021

7 hours ago, trurl said:

12TB disk (sdg) serial ending 0TGDM I assume was disk1.

SMART for that disk looks OK and since emulated disk1 is mounted should be good to rebuild to that disk.

Not sure what this end of syslog stuff is, seems related to nvidia and VMs:

Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0:   device [8086:a33c] error status/mask=00000001/00002000
Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0:    [ 0] RxErr                  (First)
Mar 25 21:21:59 Server2018 avahi-daemon[4413]: Joining mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe00:2981.
Mar 25 21:21:59 Server2018 avahi-daemon[4413]: New relevant interface vnet0.IPv6 for mDNS.
Mar 25 21:21:59 Server2018 avahi-daemon[4413]: Registering new address record for fe80::fc54:ff:fe00:2981 on vnet0.*.
Mar 25 21:23:08 Server2018 kernel: kvm [9357]: vcpu2, guest rIP: 0xfffff802693f6192 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop
### [PREVIOUS LINE REPEATED 9 TIMES] ###
Mar 25 21:23:13 Server2018 kernel: kvm_set_msr_common: 12094 callbacks suppressed

Is it causing any apparent problems?

I haven’t noticed any VM issues with Nvidia. Rebuild of disk1 has started.
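To see how often the PCIe messages in the syslog excerpt above occur and which device reports them, a short Python sketch like the following could be run over a saved syslog. It uses three lines copied from the excerpt; the regular expression and the `summarize_aer` name are assumptions of mine based only on the format of those lines.

```python
import re
from collections import Counter

# Three lines copied from the syslog excerpt quoted above.
SYSLOG = """\
Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
Mar 25 21:21:59 Server2018 kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 25 21:23:08 Server2018 kernel: kvm [9357]: vcpu2, guest rIP: 0xfffff802693f6192 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop
"""

# Pattern inferred from the "PCIe Bus Error" line in the excerpt.
AER_LINE = re.compile(
    r"kernel: pcieport (?P<device>[0-9a-f:.]+): PCIe Bus Error: "
    r"severity=(?P<severity>\w+), type=(?P<type>[^,]+),"
)


def summarize_aer(text):
    """Count PCIe bus error lines by (device, severity, type)."""
    counts = Counter()
    for line in text.splitlines():
        match = AER_LINE.search(line)
        if match:
            key = (match["device"], match["severity"], match["type"])
            counts[key] += 1
    return counts


for (device, severity, error_type), count in summarize_aer(SYSLOG).items():
    print(device, severity, error_type, count)
```

In the excerpt the reporting device is pcieport 0000:00:1c.0 and the severity is Corrected; the kvm line is a separate message and is not counted.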

March 26, 2021

6 hours ago, trurl said:

Stop the array and set disk1 to Not Assigned (or however it's worded) then start the array again.

Then shutdown, install the disk into your server again, boot up, but don't reassign the disk. Just start up normally and post new diagnostics.

Here are the new diagnostics.

server2018-diagnostics-20210325-2125.zip

March 25, 2021

2 minutes ago, trurl said:

Emulated disk1 is mounted so that's good.

Is the disk currently plugged in to your server?

It's plugged into a USB HDD dock and the USB is connected to a Win10 VM.

March 25, 2021

9 minutes ago, trurl said:

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

Attached!

server2018-diagnostics-20210325-1309.zip

March 25, 2021

10 minutes ago, trurl said:

Now that you have removed the disk it is a little more complicated how you should proceed. You should have asked for advice before doing anything.

Ideally we would have gotten the diagnostics before you did anything and before rebooting your server. Then we would be better able to see why a write to the disk failed. As mentioned, bad connections are much more common than bad disks.

Is your server currently running without the disk?

Yes the server is running without the disk.

March 25, 2021

Quote

Event: Unraid array errors

Subject: Warning [SERVER2018] - array has errors

Description: Array has 1 disk with read errors

Importance: warning

Disk 1 - ST12000VN0007-2GS116_********* (sdf) (errors 16)

unRAID disabled the disk (red x)

I removed the disk. Contents are currently emulated. I placed the disk in an HDD USB dock connected to a Windows 10 VM and ran some SeaTools tests:

Quote

--------------- SeaTools for Windows v1.4.0.7 ---------------

2021-03-23 11:45:23 PM

Model Number: ST12000V

Serial Number: ********

Firmware Revision: 0302

Short DST - Started 2021-03-23 11:45:23 PM

Short DST - Pass 2021-03-23 11:46:25 PM

Short Generic - Started 2021-03-23 11:46:40 PM

Short Generic - Pass 2021-03-23 11:47:34 PM

Long Generic - Started 2021-03-23 11:47:56 PM

Long Generic - Pass 2021-03-24 11:41:03 PM

Identify - Started 2021-03-25 11:29:05 AM

The Long Generic SeaTools test ran for about 24 hours and the HDD passed. This is troubling. The HDD is under warranty until August. But unRAID thinks the HDD is error-prone while the SeaTools tests aren't finding errors. I can't put the disk back in the array because unRAID has disabled it, and I can't send the disk to Seagate for RMA because their software doesn't show any problems.

How should I proceed?
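For reference, the exact run time of the Long Generic test can be worked out from the two timestamps in the SeaTools log above. This is a small Python sketch; the format string is my reading of how SeaTools prints its timestamps.

```python
from datetime import datetime

# Timestamp layout as it appears in the SeaTools log above (assumed format).
FMT = "%Y-%m-%d %I:%M:%S %p"

started = datetime.strptime("2021-03-23 11:47:56 PM", FMT)
passed = datetime.strptime("2021-03-24 11:41:03 PM", FMT)

elapsed = passed - started
print(elapsed)  # 23:53:07
```

So the test ran just under 24 hours.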


Posts posted by oh-tomo

Windows has stopped USB controller

Struggling with GPU Passthrough - stuck at Tianocore

Win10 VM performance upgrade

Docker Version "not available"

Restarted Win10 VM stuck after USB controller issue

[Support] Linuxserver.io - MariaDB

disabled disk passes SeaTools tests