
Posts posted by danktankk

  1. Thank you for the reply. They weren't being removed or anything; this just happened while the drives were in the chassis. They would be in both the array and unassigned devices at the same time as well.

     

    The one odd thing I can't shake is why it is only these 4 drives.

     

    I am going to try to rebuild them from parity one at a time to see if that helps.

     

    I have found that putting these drives in an external eSATA enclosure allows them to work without these strange errors.

  2. I guess the backplane could be the problem, but why don't any of the other drives have any issues at all? I have eight 14TB drives, one 12TB drive, and four 2TB drives running on this same backplane and haven't had a single error.

     

    I don't know if this is important or not, but this is the error from the drive when it fails:

    Dec 10 01:28:19 baconator kernel: sd 11:0:9:0: [sdk] Synchronizing SCSI cache
    Dec 10 01:28:30 baconator kernel: sd 11:0:9:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

    Here is the array:

    [screenshot of the array]

    And here are the 6TB drives also somehow simultaneously appearing in the unassigned devices area. It takes 10-30 minutes for them to show up in unassigned, possibly after the error I posted above appears:

    [screenshot of the unassigned devices list]

     

    I've also moved these 6TB drives to numerous other locations in the chassis, and they all still produce the same errors.

     

    EDIT: I am trying a parity check to see if that may have something to do with that sync error. It's been running for 50 minutes and the drives are all still in the array, so maybe that's good news...
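
    In case it helps, here is roughly how I am watching for that cache-sync failure live; a minimal sketch, assuming Unraid's usual /var/log/syslog location:

    # Follow the syslog and flag failed cache syncs the moment they appear
    tail -f /var/log/syslog | grep --line-buffered 'Synchronize Cache.*failed'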

  3. I am seeing this on the drives that are not "sticking" to the array:

    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] 4096-byte physical blocks
    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] Write Protect is off
    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] Mode Sense: 7f 00 10 08
    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
    Dec 10 00:58:35 baconator kernel: sdk: sdk1
    Dec 10 00:58:35 baconator kernel: sd 11:0:9:0: [sdk] Attached SCSI disk
    Dec 10 00:59:11 baconator emhttpd: ST6000VN0033-2EE110_ZAD55NRY (sdk) 512 11721045168
    Dec 10 00:59:11 baconator kernel: mdcmd (12): import 11 sdk 64 5860522532 0 ST6000VN0033-2EE110_ZAD55NRY
    Dec 10 00:59:11 baconator kernel: md: import disk11: (sdk) ST6000VN0033-2EE110_ZAD55NRY size: 5860522532
    Dec 10 01:28:17 baconator kernel: sd 11:0:9:0: [sdk] tag#0 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
    Dec 10 01:28:19 baconator kernel: sd 11:0:9:0: [sdk] Synchronizing SCSI cache
    Dec 10 01:28:30 baconator kernel: sd 11:0:9:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

    This is one that isn't having any issues:

    
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] 27344764928 512-byte logical blocks: (14.0 TB/12.7 TiB)
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] 4096-byte physical blocks
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] Write Protect is off
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] Mode Sense: 7f 00 10 08
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
    Dec 10 00:58:35 baconator kernel: sdh: sdh1
    Dec 10 00:58:35 baconator kernel: sd 11:0:6:0: [sdh] Attached SCSI disk
    Dec 10 00:59:11 baconator emhttpd: WDC_WD140EDFZ-11A0VA0_9RJUWYGC (sdh) 512 27344764928
    Dec 10 00:59:11 baconator kernel: mdcmd (6): import 5 sdh 64 13672382412 0 WDC_WD140EDFZ-11A0VA0_9RJUWYGC
    Dec 10 00:59:11 baconator kernel: md: import disk5: (sdh) WDC_WD140EDFZ-11A0VA0_9RJUWYGC size: 13672382412

    Is there some kind of byte mismatch here?

    Dec 10 01:28:19 baconator kernel: sd 11:0:9:0: [sdk] Synchronizing SCSI cache
    Dec 10 01:28:30 baconator kernel: sd 11:0:9:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

    I have no idea how to correct something like this.
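
    From what I can tell, the hostbyte in that failure line is the kernel's SCSI host status, and 0x01 corresponds to DID_NO_CONNECT in the kernel's include/scsi/scsi.h, i.e. the HBA lost contact with the device. A rough decode sketch (the helper name is mine; the values are from that header):

    # Map the hostbyte from a "Result:" log line to the kernel's
    # SCSI host-status name (values from include/scsi/scsi.h)
    decode_hostbyte() {
      case "$1" in
        0x00) echo "DID_OK: no error" ;;
        0x01) echo "DID_NO_CONNECT: could not connect to the device" ;;
        0x02) echo "DID_BUS_BUSY: bus stayed busy" ;;
        0x03) echo "DID_TIME_OUT: command timed out" ;;
        *)    echo "other host status: $1" ;;
      esac
    }
    decode_hostbyte 0x01   # prints the DID_NO_CONNECT line

    If that reading is right, it points at the drives dropping off the link rather than any byte mismatch in the data.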

     

     

     

  4. I have had the UnRAID Nvidia build the entire time. That most definitely isn't the issue, and you can look that error up. It's from a plugin that I just uninstalled, called [PLUGIN] GPU Statistics. And perhaps Nvidia UnRAID itself?

     

    It's annoying, but not the reason for drives dropping.

     

  5. I have a bit of an oddball situation. Today, I noticed that all 4 of the 6TB drives I have in my array are giving read errors. They all have the same number of errors, and it doesn't matter where in the chassis I move them. If I reboot UnRAID, the drives are fine for a time and then they start with these errors again... always these 4 drives and always the same number of errors for each of them. The hardware that attaches these drives to my HBA is a BPN-SAS-846EL1 backplane. Any ideas or comments would be appreciated. I am kind of astonished. lol

     

    EDIT:

    When the 4 drives get these errors, they are then somehow moved to unassigned devices as well, even though they are still sitting in the array. Very strange.

  6. Can anyone please explain how UnRAID calculates the total HDD space in an array?

    When using

    df -h

    It produces a nice round number, just not the same one reported by UnRAID. In addition, I use a Grafana dashboard that has a total hard drive space panel, and it reports a total in agreement with UnRAID's; it is pulled from /mnt/user0.

     

    I am just curious how the hard drive capacity is reported in UnRAID when almost every other utility knocks off just over 7% of the total hard drive space.
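
    My working theory is that it is just decimal versus binary units: a utility reporting GiB shows about 6.9% less than one reporting GB (1 - 10^9/2^30), and TiB versus TB is about 9.1% less. A quick sanity check with bc, using the byte count from the 6TB drive in my earlier syslog snippet:

    # Same byte count, decimal vs binary units
    # (11721045168 512-byte blocks = 6001175126016 bytes)
    bytes=6001175126016
    echo "decimal TB: $(echo "scale=2; $bytes/1000^4" | bc)"   # 6.00
    echo "binary TiB: $(echo "scale=2; $bytes/1024^4" | bc)"   # 5.45 (bc truncates)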

  7. When I used a Quadro P2000, I was able to get the power draw readings from nvidia-smi, which would pass that information along to wherever I wanted to view it, in this case Grafana. When I try to do the same with a Quadro P1000, I get N/A for this field. That seems odd, but it may be that the card doesn't support this field? I wasn't able to definitively find an answer for this, and I was wondering if anyone could confirm it. A friend mentioned it may be the driver itself? Thanks for any help!

     

    I am currently using Unraid Nvidia 6.8.3, and the driver version is 440.59.

     

    [screenshot of the nvidia-smi output]
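
    For reference, this is the kind of query the reading comes from; on the P1000 the power field just comes back as [N/A] for me:

    # Ask only for the power-related fields; cards that do not expose the
    # sensor report [N/A] instead of a wattage
    nvidia-smi --query-gpu=name,power.draw,power.limit --format=csv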

  8. Quote

    And also no reason to remove unless you intend to replace it with a different disk. Still trying to decide whether or not to believe in parity. I filtered out a lot of that.

    I never meant physically remove it, just remove it from the array.

     

    Quote

    So the scheduled parity check was configured to write corrections to parity. Did you let it complete?

    No.  I paused it and it lost its progress.

     

    Quote

    Apparently you rebooted after that scheduled parity check

     

    I rebooted on Sunday, yes, but the parity check was not running then, as far as I know. There was a parity check that started after an unclean shutdown. Some of the Docker containers weren't behaving correctly and I was forced to do an unclean reboot.

     

    I have two 14TB shucked drives coming in tomorrow, supposedly; one-day shipping is iffy at best. I'll just wait until then to rebuild from parity.

     

    Quote

    From the diagnostics emulated disk3 is mounted, so that would seem to indicate that rebuilding disk3 from parity should be successful.

     

    It will take almost 2 days to rebuild from parity, but I guess I should just go ahead and do it. The original question was me wondering if there is a way to reset something in UnRAID so it will just "see" the drive that experienced those errors and "try again" without rebuilding. I guess that is not the case, though.

  9. @trurl I know the spam to which you are referring. I don't know of a way to get rid of it; it is related to Nvidia UnRAID, but I forget exactly how. Here is the parity check schedule and corrections setting. Thank you for your time on this!

     

    [screenshot of the parity check schedule and corrections setting]

     

    @jonathanm

     

    The reason I would remove the drive is that UnRAID has effectively shut it down due to the I/O errors it was experiencing, my guess being from a faulty cable. The format, as was just explained to me, is unnecessary, and I would not do that now. I hope this helps.

     

     

  10. Here is the situation I currently have:

     

    I am pretty sure I have a bad set of breakout cables coming out of one of my LSI SAS 9201-16i ports, and the drive has been taken from the array.

     

    [screenshot of the array status]

     

    This also happened during a scheduled parity check.  

     

    The only way I know to fix this is to remove the faulty drive from the array, format it, re-add it, and let it do a parity rebuild. I don't know if that will work this time, due to it having been in the middle of a parity check when the drive started getting I/O errors and was subsequently removed. I plan on replacing the faulty cable, as I have another one here. I was just wondering: is there a way, other than risking 4TB of data, to restore this drive to operation, since the cause is/was a bad cable?
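
    Once the new cable is in, I figure I can at least sanity-check the disk itself before committing to a rebuild; a minimal check, assuming the drive re-enumerates as /dev/sdf (the device name is just a placeholder):

    # Overall SMART health verdict plus the drive's own logged errors
    smartctl -H -l error /dev/sdf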

     

    I would hate to do a parity rebuild on this if I didn't have to in this case.

     

    Thank you for any help you can offer!  :)


  11. I'm super excited to see 1.4, then immediately 1.5! lol I just want the new Plex/Varken goodies to integrate with this dash.

     

    That, and I would like to be able to easily get fan speeds and one CPU (Threadripper) to display properly.

     

    I've also noticed that there really isn't an easy way to get Fahrenheit scales to work... I can live with Celsius, though...

     

  12. I can't find the post where you show how to update the JSON for a single panel. It's late and I'm getting cross-eyed. I would like to get this right tonight, but it can wait, obviously...

     

    The way the post argument is done for me is valid (minus lm_sensors; that didn't work out well). It does not have to have apk in it, as my :latest repo does not like that.

  13. I have the post arguments in place; however, they required some tweaking (ty @HalienElf), since telegraf would not start at all otherwise:

     

    I use the telegraf:latest repo, and this is probably why. See below for a different way to implement it.

     

    bash -c 'apt update && apt install -y smartmontools lm-sensors && telegraf'

     

    lm_sensors is the latest effort to get fans and CPU temps working; it is not integral to making S.M.A.R.T. work, and it doesn't appear to be working right now anyway (note the Debian package is named lm-sensors, not lm_sensors).
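
    To confirm the post arguments actually took effect, I can check for the binary inside the running container; a quick sketch, assuming the container is named telegraf:

    # Verify smartmontools landed inside the container
    docker exec telegraf smartctl --version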

     

    BTW, drive life now makes much more sense and I do like that stat!  lol

     

     
