Johann

March 2

Here are the problems right now. Disk 2 is disabled. Disk 3 is rebuilt, but I'm not sure the data is correct, since nothing was being written to the disk for a while even though the other disks were being read. Parity 2 has dropped itself from the array and is currently "unassigned".

March 2

I have a precleared drive sitting in unassigned drives. I'm not sure if this is related, but when I run fdisk -l on the precleared drives I see this error: "Partition 1 does not start on physical sector boundary". I saw this before on the drives I precleared.
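
For reference, fdisk prints that warning when a partition's starting byte offset is not a multiple of the disk's physical sector size. A minimal sketch of the check, assuming 512-byte logical and 4096-byte physical sectors (an assumption, not values read from these drives); the start sectors are hypothetical:

```python
def is_aligned(start_sector: int, logical_size: int = 512, physical_size: int = 4096) -> bool:
    """Return True if the partition's starting byte offset is a multiple
    of the physical sector size."""
    return (start_sector * logical_size) % physical_size == 0


# Hypothetical start sectors for illustration; not taken from the drives in this thread.
for start in (63, 64, 2048):
    print(start, "aligned" if is_aligned(start) else "NOT aligned")
```

The real sector sizes and the partition start are shown in the fdisk -l output itself, so those are the numbers to plug in.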

March 2

I found this similar situation but didn't really identify the fix here:

Looks like the rebuild thinks it is continuing and is reading from the drives, but it's not writing to the rebuilding drive. I'll let it run its course. I think the next step, before doing anything else, would be to wait for it to finish "rebuilding" and then run xfs_repair -v /dev/mdX on each disk in maintenance mode (from https://docs.unraid.net/unraid-os/manual/storage-management/#xfs-and-reiserfs) to verify that all the filesystems are working. Once I've confirmed all the filesystems are good to go, I think I would keep disk 2 disabled, since the data should be intact on that drive, but I'm not sure why it had read errors. But is the next step then to rebuild disk 3 on top of itself? The disk it is replacing is still in unassigned drives with its data intact and untouched.

What would be the best course of action here? I'm going to wait for a response before I break everything...
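
A rough sketch of the per-disk check described above. It only builds and prints the commands, it does not run them. The -n flag is xfs_repair's no-modify mode, so a first pass reports problems without changing anything. The /dev/mdX naming follows the post and may differ between Unraid releases, and the disk numbers are hypothetical:

```python
import shlex


def build_check_commands(disk_numbers, no_modify=True):
    """Build one xfs_repair command line per array disk number.

    Assumes the /dev/mdX device naming used in the post; check the
    actual device names on the server before running anything.
    """
    commands = []
    for n in disk_numbers:
        args = ["xfs_repair"]
        if no_modify:
            args.append("-n")  # no-modify mode: report only, write nothing
        args += ["-v", f"/dev/md{n}"]
        commands.append(shlex.join(args))
    return commands


# Hypothetical disk numbers for illustration.
for cmd in build_check_commands([1, 2, 3]):
    print(cmd)
```

Printing the commands first makes it easy to review them one by one while the array is in maintenance mode.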

March 2

Under attributes for parity 2, it says "Smartctl open device /dev/sdi failed".

All user shares on the Array do say "SOME OR ALL FILES UNPROTECTED"

March 2

Biggest question: since one drive is rebuilding, one is disabled, and one parity drive has a few million read errors, is my data fine at the moment?

March 2

Wow, I'm scared at the moment. I'm trying to replace my remaining 12TB with 18TB. I unassigned a 12TB and assigned a precleared 18TB, all fine, no issues, and the rebuild started. 4TB in, I got a notification that disk 2 has read errors and is disabled, with 2048 errors. OK, I'll let it rebuild and replace that drive afterwards, maybe it's dying. I woke up today and parity 2 has 563,197,701 errors, increasing by the second?! What is going on? I attached my diagnostics. I've never had issues with Unraid in the years I've been using it until the last few weeks! The current rebuild has 5 hrs left. Looks like the syslog is full because of "Mar 2 09:37:00 Toblerone kernel: md: disk29 read error, sector=30864609368". I just increased the log size to get more info. The temp on parity 2 shows a star, I don't know why, but it's still marked green, "normal operation"?
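
To see which disks the flood of syslog lines is coming from, a small sketch like the one below can count read errors per md disk number. The pattern is based on the one log line quoted above; the second and third sample lines are made up for illustration:

```python
import re
from collections import Counter

# Matches kernel lines such as "md: disk29 read error, sector=30864609368".
READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def count_read_errors(lines):
    """Return a Counter mapping md disk number to read-error line count."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "Mar 2 09:37:00 Toblerone kernel: md: disk29 read error, sector=30864609368",
    "Mar 2 09:37:00 Toblerone kernel: md: disk29 read error, sector=30864609376",  # hypothetical line
    "Mar 2 09:37:01 Toblerone kernel: some unrelated message",  # hypothetical line
]
print(count_read_errors(sample))
```

On a real system the lines would come from the syslog file in the diagnostics zip instead of the sample list.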

toblerone-diagnostics-20240302-0932.zip

February 27

Hmm, I will have to try that. I was looking at the diagnostics I've sent, and it seems like I have always had this issue but never really noticed it until trurl pointed it out. For some reason I never saw it in the logs until recently.

February 25

I have tried switching PCIe slots and manually setting the link speed to gen2. Still no go. However, I did notice that the drives aren't showing up in the BIOS like they used to. I remember them showing up on the main BIOS page before.

February 25

Here are updated diagnostics:

toblerone-diagnostics-20240225-1809.zip

February 25

I thought the backplane of my 4U case might not be getting enough power, so I took out the drives I was preclearing and went back to how it was before. The "Power-on or device reset occurred" error still persists. I recently switched HBAs from the 2008 to the 9500, so I switched back to the old HBA and its cables, and it's still persisting. I'm really lost and don't know what's wrong.

February 25

Looks like the drops do affect Plex. I changed it to eth0 with a custom IP the other day, and every 10 min it caused the stream to cut and rebuffer, even though the playback bar indicated it had already buffered about the next 10 minutes. I switched it back to host, and it's no longer an issue.

February 24

I don't think it's overheating; it gets pretty good airflow from the fans in the 4U case. I'll definitely have to try reseating it, but I have to finish rebuilding an existing drive first, only a few more hrs. Just looking to learn: where can I see that it's having problems communicating with some disks?

February 24

I was running Docker Safe New Perms and discovered that disk 11 (sdi) was disabled with read errors. I attached my diagnostics below and wanted to see whether this should be of concern and what the best way forward is.

Also should I be worried about the "x86/split lock detection" errors?

Thanks!

toblerone-diagnostics-20240224-0154.zip

February 17

I made a duplicate post by accident because I thought I had deleted this one, but I saw your link to your other post. Adding onto the previous question: I see you are saying that having macvtap on (is this bridging off?) and macvlan on doesn't have this issue?

February 17

6 hours ago, MAM59 said:

Turn on FLOW CONTROL on BOTH sides!

Your Unraid is set to "Receive only", your Switch to "none". This will very likely end up in lost packets, timeouts and retransmissions.

Thanks for the response! I just turned it on. I saw this post, where he says flow control being on causes issues?

February 17

6 hours ago, Mainfrezzer said:

The drops are caused by the macvtap interface.

I can recreate them on different machines at will. Big issue that you cannot disable the macvtap if you dont need it. The versions where just eth0 is present work without an issue.

https://forums.unraid.net/bug-reports/stable-releases/6126-macvtap-causes-consistent-package-loss-r2836/
If you wanna contribute to it^^

Edit: For the funsies, here is the result from my main pc, as before, completely different hardware, completely different cables.

Heres a comparison with an older Unraid version

Wow, thanks for the videos! This is definitely it. I have bridging disabled because I use macvlan, following https://docs.unraid.net/unraid-os/release-notes/6.12.4/. I like macvlan because it assigns a different MAC address. Is disabling bridging still the recommended way to use macvlan? I see the same behavior in the RX drops as you showed in the videos.

February 17

6 hours ago, MAM59 said:

usually drops at 10G mean: BAD CABLE!!!

Make sure you have got a real 10G cable. Most of them sadly are "raw cables", which means the cable is fine, but the plugs are not capable of 10G.

As a last resort the speed will drop to 5 or even 2.5G to compensate for the transmission errors.

BTW: it has NOTHING to do with power efficiency. The link speed is the same for 2.5, 5 and 10G. Just the Usage/Pause times are different. So it is not wrong or uncommon that a switch reports a 10G link while the real used speed is lower. Make sure that Flow Control is turned on and working, else you will notice a lot of retransmissions and slowdowns.

BTW2: if possible, avoid twisted pair 10G completely and go safe with fiber or direct connect SFP+

BTW3: you HAVE a 10G connection, read your list correctly! the 10000 comes before 2500 and 5000. And this is just an offered list, the picked speed is 10000 as you can see below.

Ah yes, I didn't read the list correctly! I'll try a new cable and see if it fixes it. I did have an X540-T2 which worked well, but I did an upgrade and the new motherboard has 10Gb onboard, and I wanted to simplify and have more PCIe slots available. I agree with you and I also prefer SFP+, much simpler and cheaper! Thank you!
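
Since the confusion came from the advertised list not being in order of speed, here is a small sketch that pulls the speeds out of ethtool's "Advertised link modes" section and takes the maximum rather than the last entry. The sample text is an illustrative mock-up in ethtool's usual layout, not the actual output from this server:

```python
import re

# Illustrative mock-up of ethtool output; not captured from the poster's machine.
SAMPLE = """\
Settings for eth0:
	Advertised link modes:  100baseT/Full
	                        1000baseT/Full
	                        10000baseT/Full
	                        2500baseT/Full
	                        5000baseT/Full
	Advertised pause frame use: Symmetric Receive-only
	Speed: 10000Mb/s
"""


def advertised_speeds(ethtool_output: str):
    """Collect Mb/s values from tokens like '10000baseT/Full' in the
    'Advertised link modes' section, in the order they are listed."""
    speeds = []
    in_section = False
    for line in ethtool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Advertised link modes:"):
            in_section = True
            stripped = stripped.split(":", 1)[1]
        elif in_section and ":" in stripped:
            in_section = False  # a new "key: value" field ends the section
        if in_section:
            speeds += [int(s) for s in re.findall(r"(\d+)base", stripped)]
    return speeds


def negotiated_speed(ethtool_output: str):
    """Return the negotiated speed in Mb/s from the 'Speed:' line, or None."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", ethtool_output)
    return int(match.group(1)) if match else None


speeds = advertised_speeds(SAMPLE)
print("advertised:", speeds, "max:", max(speeds))
print("negotiated:", negotiated_speed(SAMPLE))
```

Comparing the maximum advertised speed with the negotiated speed avoids reading the last list entry as the top speed.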

February 17

7 hours ago, SimonF said:

Eth tool is showing 10Gb

Not sure why you are getting drops. There are comments about turning off power efficiency settings for windows, not found an option on linux yet.

Ah yes, thank you. Right after I posted I noticed that. I deleted the post because I just didn't see it at first for some reason; I assumed it listed them in order of speed. Thank you for pointing that out.

February 17

In addition, would the following be of concern? I get these often now:

x86/split lock detection: #AC: qemu-system-x86/9449 took a split_lock trap at address: 0x7ff0508c

February 17

It seems like I sometimes get intermittent drops when accessing the webUI, and the receive drop counter just keeps going up. Within a few days it went to 500k drops. Any help is appreciated. I just upgraded my server (new MB, CPU, RAM, HBA), so hopefully there's nothing seriously wrong here.

The drops shown in the image happened within 20 min.
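
To put a number on how fast the counter climbs, a sketch like this turns two readings of the receive drop counter into a rate. On Linux the counter is exposed at /sys/class/net/<iface>/statistics/rx_dropped; the interface name eth0 and the sample readings below are assumptions for illustration:

```python
from pathlib import Path


def read_rx_dropped(iface: str) -> int:
    """Read the kernel's receive-drop counter for an interface (Linux sysfs)."""
    return int(Path(f"/sys/class/net/{iface}/statistics/rx_dropped").read_text())


def drops_per_minute(first: int, second: int, elapsed_seconds: float) -> float:
    """Convert two counter readings taken elapsed_seconds apart into drops per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (second - first) * 60.0 / elapsed_seconds


# Hypothetical readings taken 120 seconds apart. On the server they would
# come from two calls to read_rx_dropped("eth0") with a sleep in between.
print(drops_per_minute(1000, 1600, 120))
```

Sampling before and after a change such as a new cable or different bridging settings gives a like-for-like comparison.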

toblerone-diagnostics-20240217-0124.zip

February 17

This is the specific controller:

Ethernet controller: Aquantia Corp. AQC113CS NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 03)

I also tried a 10Gb-only SFP+ adapter and a multi-gig SFP+ adapter, and I get the same behavior.

February 17

I just upgraded to the Asus Z790 ProArt Creator, which has a 10Gb NIC built in. I have it connected via RJ45 at the board into an SFP+ to RJ45 adapter. On the Ubiquiti switch it negotiates at 10Gb, and the webUI main page says 10000. Using ethtool, however, I see that the Ubiquiti switch is asking for 10000, but the Asus motherboard only advertises up to 5000. I tried a different 10Gb switch which has RJ45 ports and ran ethtool again, and the same thing shows up, so I don't think it's a fault of the switches or cables? I included the diagnostics and the screenshot below. Any help is greatly appreciated! I get occasional dropouts when accessing the webUI and also over 500k receive drops.

toblerone-diagnostics-20240217-0124.zip

December 13, 2023

Ah got it thank you!

December 13, 2023

Thank you for getting back to me. Is the cause of my issue that I had a browser window open, and this caused the system to hang?

December 13, 2023

Today, many of my services were down, with only some being up. I wasn't able to access the web interface, and when logging in over KVM I couldn't log in to the shell; it gave a login timeout error after X seconds. I was able to ping the server IP, but I wasn't able to SSH in either. On my PiKVM I did a shutdown, which I could see was initiated on the KVM output, but after 300 sec it said unclean shutdown. I have attached the diagnostics if someone would be able to help me out. I suspect it might be from running out of resources, but I'm not sure. Thank you in advance!

I hope the unclean shutdown diagnostics are anonymous...

toblerone-diagnostics-20231213-0951.zip

Posts posted by Johann

Rebuild Disabling Disk and Read Errors!

Disabled Disk following newperms

10Gb only showing as 5000mbps using ethtool

Thousands of RX drops.

System Hang, and Unclean Shutdown on Shutdown
