JorgeB

Moderators
  • Posts

    67,459
  • Joined

  • Last visited

  • Days Won

    706

Everything posted by JorgeB

  1. Yes, this would be good. At first I was going to suggest never touching /mnt/cache, but limiting it to /mnt/disks would be much better, especially with the upcoming multiple pools that will be mounted at /mnt/whatever_you_name_the_pool
  2. Appdata needs to be set to cache=yes, then run the mover. There are also some files from the isos share still on cache. Note that open/duplicate files won't be moved.
  3. It has come up before, IIRC those devices are in fact 4 x 60GB SSDs in raid0 done by the Windows driver, and no such driver exists for Linux.
  4. Not necessarily, it means there's corruption, and memory (when not ECC) is usually the most likely cause.
  5. btrfs checksum errors are a sign of data corruption, good idea to run memtest once you get the server back up.
  6. That would be my guess also, what's the use case of having the server started with the array stopped?
  7. Something strange happened here. You had a two-device cache pool and removed one of the devices, then after array start a pool balance began to convert it to a single device. The UD plugin detected the now unassigned former pool member, but from what I can see it did nothing to it:
      May 29 09:50:44 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
      May 29 09:50:44 Tower unassigned.devices: Disk with serial 'OCZ-SOLID3_OCZ-5QFK65330EYP720P', mountpoint 'OCZ-SOLID3_OCZ-5QFK65330EYP720P' is not set to auto mount and will not be mounted.
      All normal up to here. The pool finished converting to a single device without any error and the now unassigned device was removed from the pool:
      May 29 10:03:09 Tower kernel: BTRFS info (device sdb1): relocating block group 22020096 flags metadata
      May 29 10:03:12 Tower kernel: BTRFS info (device sdb1): found 14766 extents
      May 29 10:03:12 Tower kernel: BTRFS info (device sdb1): relocating block group 1048576 flags system
      May 29 10:03:12 Tower kernel: BTRFS info (device sdb1): found 1 extents
      May 29 10:03:12 Tower kernel: BTRFS info (device sdb1): device deleted: /dev/sdj1
      May 29 10:03:12 Tower rc.diskinfo[7724]: SIGHUP received, forcing refresh of disks info.
      The strange part is after that, going to ping @dlandon to see if he can see why this happened. I don't find any reference to UD mounting the former cache member, but then it tries to unmount it, and because it believes it's still part of the pool it force unmounted /mnt/cache:
      May 29 11:53:10 Tower unassigned.devices: Unmounting disk 'OCZ-SOLID3_OCZ-5QFK65330EYP720P'...
      May 29 11:53:10 Tower unassigned.devices: Unmounting '/dev/sdj1'...
      May 29 11:53:10 Tower unassigned.devices: Unmount cmd: /sbin/umount '/dev/sdj1' 2>&1
      May 29 11:53:10 Tower unassigned.devices: Unmount of '/dev/sdj1' failed. Error message: umount: /mnt/cache: target is busy.
      May 29 11:53:11 Tower unassigned.devices: Since there aren't any open files, will force unmount.
      May 29 11:53:11 Tower unassigned.devices: Unmounting '/dev/sdj1'...
      May 29 11:53:11 Tower unassigned.devices: Unmount cmd: /sbin/umount -fl '/dev/sdj1' 2>&1
      May 29 11:53:11 Tower unassigned.devices: Successfully unmounted '/dev/sdj1'
      @chris BCache should be fine after rebooting, if not please post new diags.
  8. Filesystem should be fine now. Those files are the result of fixing the corruption; there might be something important there, there might not, and it's not always easy to see what they are.
  9. You just need to edit the XML and change the ID of the audio controller. Change from:
      <address domain='0x0000' bus='0x0d' slot='0x00' function='0x3'/>
      to:
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x3'/>
  10. Without the HBA this audio device is being passed through:
      0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
      Subsystem: ASUSTeK Computer Inc. Device [1043:8733]
      Kernel driver in use: vfio-pci
      With the HBA the IDs change and instead you're passing through this device:
      0d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
      Subsystem: ASUSTeK Computer Inc. Device [1043:8747]
      Kernel driver in use: vfio-pci
  11. Check the VM hardware passthrough. You're likely passing through a USB controller, and because of the new HBA its ID is changing, so you end up passing through the wrong USB controller, the one where the flash drive is.
  12. Not really, though it's not always easy to tell from the xfs_repair output. You need to run it again without -n or nothing will be done; after it's done, check the lost+found folder for any lost files.
  13. Flash drive problems:
      May 30 10:50:01 Media kernel: FAT-fs (sda1): Directory bread(block 29352) failed
      May 30 10:50:01 Media kernel: FAT-fs (sda1): Directory bread(block 29353) failed
      May 30 10:50:01 Media kernel: FAT-fs (sda1): Directory bread(block 29354) failed
      May 30 10:50:01 Media kernel: FAT-fs (sda1): Directory bread(block 29355) failed
      Run chkdsk to see if it helps.
  14. You need to run a filesystem check on disk1.
  15. The HBA is detected but there's no driver loaded, which I kind of expected since that's a very new model. I know the 9400-8i is supported, but I don't remember anyone else using a 9500-8i. You could try v6.9-beta1, which includes a much newer kernel.
  16. Because Unraid used parity plus the other disks to reconstruct those sectors, but those errors would be a problem if it was a disk rebuild instead.
  17. Yep, that confirms it really is a disk issue, and if you look you'll notice this attribute is climbing, though slowly:
      ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
        1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    13
  18. You're getting errors on multiple disks that I've never seen before:
      May 30 05:37:21 Unraid-G8 kernel: sd 1:0:5:0: [sdg] Unaligned partial completion (resid=148208, sector_sz=512)
      Not sure what it means. If the SAS disks share a miniSAS cable or power splitter/source, try changing that first, but it could be a more specific issue, or even a compatibility problem.
  19. There are some SMART issues:
      ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
        1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    11
      This should be zero on a healthy WD drive. A non-zero value isn't definite proof the disk is failing, but it's never a good sign, especially if it keeps climbing.
      Error 1 [0] occurred at disk power-on lifetime: 58318 hours (2429 days + 22 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 12 78 3b e0 40 00  Error: UNC at LBA = 0x12783be0 = 309869536
      This error (UNC @ LBA) usually also means a disk problem, a bad/failing sector, and looking at the power-on hours you can see the error is recent. Again, it's not 100% conclusive, since I've seen similar errors logged like that when it wasn't a disk problem, but it usually is, and if the SMART test fails that will confirm it.
  20. Since it's not a space issue, the only other thing I can remember causing that is Windows going to sleep. Check the power options and make sure all power saving options are disabled.
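
On the XML change from post 9 (bus 0x0d to 0x0e for the device at slot 0x00, function 0x3): if you'd rather script it than edit by hand, something like this Python sketch should do it. The hostdev fragment below is a hypothetical, cut-down stand-in for a real VM definition, and the function name retarget_bus is made up for illustration. It also assumes the host-side address sits under a <source> element, so guest-side addresses are left alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment; a real VM XML has many more elements.
VM_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x3'/>
  </source>
</hostdev>
"""


def retarget_bus(xml_text, old_bus, new_bus, slot, function):
    """Change the bus of host-side PCI addresses matching old_bus/slot/function.

    Returns the updated XML as a string plus the number of addresses changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    # Only look at <address> elements directly under <source> (assumed layout).
    for source in root.iter("source"):
        addr = source.find("address")
        if addr is None:
            continue
        current = (addr.get("bus"), addr.get("slot"), addr.get("function"))
        if current == (old_bus, slot, function):
            addr.set("bus", new_bus)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


new_xml, changed = retarget_bus(VM_XML, "0x0d", "0x0e", "0x00", "0x3")
print(f"addresses changed: {changed}")
print(new_xml)
```

Check the result against the device list before starting the VM, since the whole problem in posts 10 and 11 was that the same address pointed at a different device once the HBA was installed.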
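
For the SMART output in posts 17 and 19, this is a small Python sketch of how the numbers can be read. It assumes both attribute rows come from the same disk (the posts suggest that but don't state it) and that the raw value is a plain integer, which is not true for every attribute. The function name parse_smart_row is made up for illustration.

```python
def parse_smart_row(row):
    """Split one row of the table with columns
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE.

    Assumes RAW_VALUE starts with a plain integer.
    """
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": fields[2],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "fail": fields[6],
        "raw": int(fields[7]),
    }


# Rows as quoted in posts 19 (earlier) and 17 (later).
older = parse_smart_row("1 Raw_Read_Error_Rate POSR-K 200 200 051 - 11")
newer = parse_smart_row("1 Raw_Read_Error_Rate POSR-K 200 200 051 - 13")

# The normalized value (200) is still above the threshold (51), so the
# attribute is not flagged as failed; the raw count is what is climbing.
raw_increase = newer["raw"] - older["raw"]
print(f"{newer['name']} raw value went up by {raw_increase}")

# The error log gives the failing sector in hex; convert it to decimal.
lba = int("12783be0", 16)
print(f"UNC at LBA {lba}")  # 309869536, matching the log

# Power-on hours from the error entry, split into days and hours.
days, hours = divmod(58318, 24)
print(f"{days} days + {hours} hours")  # 2429 days + 22 hours, matching the log
```

Comparing the power-on hours in the error entry with the disk's current power-on hours is how you can tell whether the error is recent.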
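
To illustrate post 16 (parity plus the other disks being used to reconstruct unreadable sectors), here is a toy Python sketch. It assumes single parity that behaves like a plain XOR across the data disks and uses made-up 4-byte "sectors"; it's only meant to show the idea, not how Unraid implements it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up sector contents for three data disks.
data = [
    b"\x01\x02\x03\x04",
    b"\x10\x20\x30\x40",
    b"\xaa\xbb\xcc\xdd",
]

# Parity is the XOR of all data disks.
parity = xor_blocks(data)

# Pretend the sector on the second disk can't be read: XOR the parity with
# the sectors from the remaining disks to get the missing data back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

This also shows why the same read errors would be a problem during a rebuild: with one disk already missing, every other disk has to be readable for that sector, so a second unreadable sector leaves nothing to reconstruct from.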