Cache drive keeps becoming unmountable.

bamhm182 · May 27, 2017

I have had an issue twice in the past day where everything is working fine, then I notice a problem with a docker or VM. I take a look at the Main tab and my cache drive shows nothing but a * under temperature. I shut down the array, which takes forever, then I try to restart it and the cache drive says unmountable. Additionally, Fix Common Problems reported "Call Traces found on your server" the last time this happened, but not this time. If I start the array in maintenance mode and run a File System Check on the cache drive, it outputs around 25,000 lines like the following:

Quote

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
bad magic number
bad on-disk superblock 3 - bad magic number
primary/secondary superblock 3 conflict - AG superblock geometry info conflicts with filesystem geometry
would zero unused portion of secondary superblock (AG #3)
would reset bad sb for ag 3
bad uncorrected agheader 3, skipping ag...
sb_icount 65344, counted 41728
sb_ifree 137, counted 3705
sb_fdblocks 118821857, counted 89311086
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
entry "PLEX MEDIA SERVER" in shortform directory 268435554 references non-existent inode 805306466
would have junked entry "PLEX MEDIA SERVER" in directory inode 268435554
entry "PLEX DLNA SERVER" in shortform directory 268435554 references non-existent inode 805396887
- agno = 3
would have junked entry "PLEX DLNA SERVER" in directory inode 268435554
entry "appdata" in shortform directory 99 references non-existent inode 805499114
would have junked entry "appdata" in directory inode 99

...Similar until line 17,084....

- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected dir inode 100, would move to lost+found
disconnected dir inode 107, would move to lost+found
disconnected dir inode 132749, would move to lost+found

...Similar until line 21,174...

Phase 7 - verify link counts...
would have reset inode 99 nlinks from 15 to 12
would have reset inode 100 nlinks from 10 to 9
would have reset inode 132858 nlinks from 11 to 8

...Similar until line 24,585...

No modify flag set, skipping filesystem flush and exiting.
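With roughly 25,000 lines of check output, it can help to tally the messages by type rather than read them one by one. The following is a minimal sketch, assuming the output has been saved as plain text. The category names and regular expressions are illustrative assumptions based only on the message shapes quoted above; they are not an official list of xfs_repair messages.

```python
import re
from collections import Counter

# Illustrative patterns only, derived from the messages quoted above.
PATTERNS = {
    "bad_superblock": re.compile(r"bad on-disk superblock (\d+)"),
    "missing_inode": re.compile(r"references non-existent inode (\d+)"),
    "would_junk": re.compile(r"would have junked entry"),
    "disconnected": re.compile(r"disconnected (?:dir )?inode (\d+)"),
    "nlinks_reset": re.compile(r"would have reset inode (\d+) nlinks"),
}

# Superblock counter mismatches, e.g. "sb_icount 65344, counted 41728".
COUNTER_LINE = re.compile(r"^(sb_\w+) (\d+), counted (\d+)$")


def summarize(lines):
    """Return (tally of message categories, superblock counter mismatches)."""
    tally = Counter()
    counters = {}
    for raw in lines:
        line = raw.strip()
        match = COUNTER_LINE.match(line)
        if match:
            # Store (value recorded in superblock, value actually counted).
            counters[match.group(1)] = (int(match.group(2)), int(match.group(3)))
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                tally[name] += 1
                break  # count each line under one category only
    return tally, counters
```

Calling `summarize(open("check.txt"))` (the filename is hypothetical) returns the per-category counts and a dictionary of the `sb_*` counters that disagreed, which gives a quicker sense of how widespread the damage is.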

At this point, if I use -v instead of -n in the File System Check, it tells me that it can't proceed. I should have written down the message, but it basically tells me to try remounting the drive, or that I can force it to fix the problems with -L. When I try remounting, it still doesn't mount, so I run the check with -L and that fixes the problem.
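The sequence described above (a read-only check, then a repair attempt, then forced log zeroing as a last resort) can be sketched as a small command builder. This is only an illustration: the mode names and the device path are hypothetical, and the function builds the command without running it. The flags themselves are real xfs_repair options: `-n` reports problems without modifying anything, `-v` is verbose, and `-L` zeroes the log, which can discard recent metadata changes.

```python
import shlex

# Flags per step, in the order they were tried in the post above.
# Mode names are hypothetical labels for this sketch.
MODES = {
    "check": ["-n"],     # no-modify: report problems only
    "repair": ["-v"],    # actual repair, with verbose output
    "zero_log": ["-L"],  # force log zeroing; last resort, may lose data
}


def xfs_repair_command(device, mode):
    """Build (but do not run) the xfs_repair argument list for one step."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["xfs_repair", *MODES[mode], device]


if __name__ == "__main__":
    # "/dev/sdX1" is a placeholder, not a real device name.
    for step in ("check", "repair", "zero_log"):
        print(shlex.join(xfs_repair_command("/dev/sdX1", step)))
```

Keeping `-L` as a separate, explicit step reflects that it should only be used after a normal mount or repair has failed, since zeroing the log trades possible data loss for a mountable filesystem.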

Since this started happening, I've taken the long overdue steps to configure CA AutoBackup and a VM Backup solution, but I would still like to see if there's anyone here that can help me diagnose the root cause in case it decides to come back again... I have also attached the zips generated with a diagnostics command. Thank you for your time!

Specs of my R710 in case they're needed:

CPU: 2x 6-Core Xeon Processors

RAM: 72 GB

HDD: 2x 3TB WD Reds (Both Data, no Parity)

SSD: 1x 512GB Intel 600p attached via Ablecon M.2 PCI-e Card (Cache)

r710-diagnostics-20170527-1302.zip

r710-diagnostics-20170526-2259.zip

JorgeB · May 27, 2017

No apparent cause for the corruption in the logs. Is this a new config or a new cache device? If not, back up your cache and reformat instead of repairing, then restore the data and see if it holds up.

bamhm182 · May 27, 2017

Thanks for the reply. I'll give it a reformat when I get a moment. Amazon says the package was delivered on 28APR2017, so it has been working fine for the past month or so. I put together unRAID with the two HDDs around 2 months ago, then added the cache drive 1 month ago, and now all of a sudden I'm having issues. The only thing I've changed is that over the past week or two, I've spun up two VMs: one to host websites on an Apache server, and another that has Multicraft installed on it. Neither of these seems to me like it would be incredibly problematic, so I'm not sure what is going on. They both run strictly on the cache drive, as do the 8 or so docker containers. I have all the docker containers and VMs running 24/7.

bamhm182 · May 29, 2017

Reformatted today, and while I was putting my data back on the cache drive, it did it again. I had just reformatted as XFS. General consensus says to avoid btrfs, but at this point, I'm willing to try it...

JorgeB · May 29, 2017

You can try, but that's not normal. There may be an underlying issue, something hardware-related.
