btrfs errors on device loop2, cache drive goes read-only

DuzAwe · March 11, 2021

Syslog attached. Details of the build are in my signature. The motherboard's BMC chip died and has been replaced. Other than that and the upgrade to 6.9, nothing out of the norm.

It also has a UPS, if that makes any difference.

thelibrary-syslog-20210311-0928.zip

JorgeB · March 11, 2021

Assuming loop2 is the docker image (if you're not sure, post the complete diagnostics), delete and recreate it.
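If you want to confirm which file a loop device is backed by before deleting anything, Linux exposes an attached loop device's backing file under sysfs at `/sys/block/<device>/loop/backing_file`. The sketch below reads that file. The helper names are hypothetical, and treating a backing file named `docker.img` as "the docker image" is an assumption based on Unraid's default vDisk name; adjust it if yours differs.

```python
from pathlib import Path
from typing import Optional


def loop_backing_file(device: str, sys_root: str = "/sys") -> Optional[str]:
    """Return the path of the file backing a loop device, or None if not attached.

    Hypothetical helper: reads /sys/block/<device>/loop/backing_file, which the
    Linux kernel provides for attached loop devices.
    """
    path = Path(sys_root) / "block" / device / "loop" / "backing_file"
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        # The device does not exist or has no backing file attached.
        return None


def is_docker_image(device: str, sys_root: str = "/sys") -> bool:
    # Assumption: the Docker vDisk file is named docker.img (Unraid's default).
    backing = loop_backing_file(device, sys_root)
    return backing is not None and Path(backing).name == "docker.img"


if __name__ == "__main__":
    print("loop2 ->", loop_backing_file("loop2"))
```

Passing `sys_root` as a parameter is only there so the lookup can be pointed at a test directory instead of the live `/sys`.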

DuzAwe · March 11, 2021

Docker image deleted (I had to reboot; that wipes all the logs and whatnot, right?) and rebuilt. I guess now we wait.

DuzAwe · March 13, 2021

OK, here we go, it happened again. I reformatted the cache disks after my post here.

thelibrary-diagnostics-20210313-0116.zip

JorgeB · March 13, 2021

If it keeps getting corrupt start by running memtest.

DuzAwe · March 13, 2021

To rule out RAM issues? How long should I leave it running?

JorgeB · March 13, 2021

Ideally 24H, but if there's a considerable problem it should be found quickly.

DuzAwe · March 13, 2021

Anything wrong with taking two sticks at a time to another machine to test?

JorgeB · March 13, 2021

You can do that, but if you don't find the problem you should still run it in the server as is.

itimpi · March 13, 2021

DuzAwe said:

Anything wrong with taking two sticks at a time to another machine to test?

That will check that the RAM sticks themselves are not faulty. However, sometimes RAM needs to be tested "in situ" in case there is some sort of bus loading issue affecting it. Also, different motherboard/CPU combinations may support different maximum RAM clock rates, regardless of what the RAM itself specifies as its maximum clock rate.

DuzAwe · March 13, 2021

Running in place now

DuzAwe · March 13, 2021

I've been thinking. Since all this started I haven't been able to get the SMART tests to run on the cache drives. Is that more indicative of the drives failing than of the RAM being the issue?

Memtest is two passes down with no errors.

DuzAwe · March 13, 2021

Nine-ish hours of testing: three complete passes with no errors so far.

JorgeB · March 14, 2021

DuzAwe said:

Since all this started I haven't been able to get the smart tests to run on the cache drives.

You can't run SMART tests on NVMe devices.

Let it run for 24H, but if it hasn't found any issues so far it likely won't. Also note that memtest sometimes can't find issues even when there are some, but the problem can also come from somewhere else.

DuzAwe · March 14, 2021

3.5 hours left on the test and it's all clear. What should I gear up for next?

Edited March 14, 2021 by DuzAwe

JorgeB · March 14, 2021

Remove two DIMMs and recreate the docker image. If it corrupts again, try with just the other two; if there are still issues, it's unlikely to be RAM related.

DuzAwe · March 14, 2021

24 hours, no fails, 6 full passes. Running the DIMM tests now.

dimitriz · March 14, 2021

JFYI, I started dealing with this same issue myself last night, also after upgrading to 6.9.1 a day earlier.

DuzAwe · March 14, 2021

It's worth noting, in that case, that this issue started with 6.9-RC2. My setup has not changed since it was built in Jan 2020 (better times). It ran stable up to the Xmas period, with a few lockups that I'm starting to believe are related to logs filling up the RAM, but I'm not sure.

Anyway, I have caught the device loop2 issue and hope to help get to the bottom of it.

Edited March 14, 2021 by DuzAwe

DuzAwe · March 14, 2021

So a new vdisk was created and it's immediately corrupt.

thelibrary-diagnostics-20210314-1413.zip

DuzAwe · March 14, 2021

Got a lovely 503 from the web GUI on my reboot.

thelibrary-diagnostics-20210314-1427.zip

DuzAwe · March 14, 2021

Not to jinx it, but... the next set of RAM DIMMs seems to have everything stable... logs attached just because. I have ordered ECC replacement sticks to hopefully rule this issue out permanently. Hopefully this is the end of it.

Thanks for all the help. I'll report back should anything happen once I start using the system properly again.

thelibrary-diagnostics-20210314-1533.zip
