manolodf (Author), posted June 16, 2019: I do have a single cache drive, an SSD. After years of operating without issue, my Dockers stopped, I believe because of the BTRFS kernel issue with balance space. After that, I formatted the drive as XFS and restored the cache to how it was. It worked perfectly for 2 days, and then all of a sudden it just croaked: Dockers stopped, and after a reboot the cache showed as unmountable and the filesystem check could not do anything. That is why I was asked to run a memtest. So that is where I am. Nothing points at that cache drive being bad just yet, but somehow after being XFS for 2 days it just kind of quit, and I seem to have lost my whole appdata share. So if anything, I think I would only be doing this to the SSD cache drive, which is unmountable right now anyway, so it seems the data on that drive is already lost. Or were you referring to some other drives in the array?
Abzstrak, posted June 16, 2019: Disk 5 was in question, so at the least run fsck on it... You're running a single cache with no redundancy and no backups? So, yeah, it's not a matter of IF you will lose data, just WHEN... and that when might be right now. You can run badblocks on an SSD, but honestly that's not a perfect test because of the way SSDs work... you should run fsck on it too. I'd probably go grab whatever tool the manufacturer of the drive provides to test it; the only annoying thing is that it might be some stupid Windows utility... My gut feeling is your SSD is dying. Also, you didn't answer when you last trimmed the SSD. PS: there is a plugin that will schedule a backup of your appdata to your array; you should at least use that in the future.
manolodf (Author), posted June 16, 2019: I do have the plugin that backs up appdata to the array; I just hope to god it's intact. That's my only backup of my cache drive, but I set it up for exactly this reason, so I am just praying that it works. So you think the SSD may be the one crapping out? Of all the drives, I expected that one to go last. I don't think I did any SSD trimming. Is that something I should be doing regularly? With the array on, when I click the SMART test nothing happens. Attached: tower-smart-20190616-0134.zip
manolodf (Author), posted June 16, 2019: I checked and I did have the Dynamix SSD Trim plugin installed, though what I cannot find is whether I had the cron job or whether there was a need to activate it. Edit: if I understand correctly, I clicked the scheduler and it was running weekly at 4am; I have changed that to daily. (Edited June 16, 2019 by manolodf)
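For readers wondering what a weekly 4am trim amounts to, here is a minimal sketch that builds an equivalent crontab line. It is not the plugin's actual job: the helper name `weekly_trim_cron`, the Sunday run day, and the `/mnt/cache` mount point are assumptions made for illustration. `fstrim -v` itself is the standard util-linux command for trimming a mounted filesystem.

```shell
# Build a weekly crontab line that trims one mounted filesystem.
# Hypothetical helper, not part of the Dynamix plugin.
# $1 = hour (0-23), $2 = day of week (0 = Sunday), $3 = mount point
weekly_trim_cron() {
  printf '0 %s * * %s fstrim -v %s\n' "$1" "$2" "$3"
}

# Weekly at 04:00; the day and the mount point are assumed values.
weekly_trim_cron 4 0 /mnt/cache
```

The five leading fields are minute, hour, day of month, month, and day of week, so replacing the last of them with `*` would turn the same line into a daily job.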
JorgeB, posted June 16, 2019: RAM should be fine. Use the array normally, and grab and post diagnostics if there are any further issues.
manolodf (Author), posted June 16, 2019: OK, will do that. Should I worry that nothing happens when I click the SMART test buttons on the cache SSD?
JorgeB, posted June 16, 2019:
4 minutes ago, manolodf said: "Should I worry that when I click the smart test buttons on the cache ssd that nothing happens?"
No, that's normal. NVMe devices don't support SMART tests, since they're useless for flash devices.
Abzstrak, posted June 16, 2019:
6 hours ago, johnnie.black said: "No, that's normal, NVMe devices don't support SMART tests, since it's useless for flash devices."

That's not true. Some SSDs do and some don't. He has an Intel 600p, which supports some of the tests. Obviously things like the amount of time it's been spun up are silly on an SSD, but keeping track of lifetime stats like data written/read is important and useful.

So going by your pics in previous posts, you are 2/3 through the lifetime written max the SSD is rated for, but over at Tom's Hardware they had the 600p crap out around 105TB (you are at 94.5TB in the pic)... so it might be that it's getting too beat up.

References:
Intel Ark - https://ark.intel.com/content/www/us/en/ark/products/94921/intel-ssd-600p-series-256gb-m-2-80mm-pcie-3-0-x4-3d1-tlc.html
Tom's Hardware - https://www.tomshardware.com/reviews/intel-ssd-600p-nvme-endurance-testing,4826.html

Good that you were trimming occasionally; it helps a bit with wear leveling and maintaining performance. Weekly or monthly is fine with your usage, though. Daily just adds extra stress to the drive, so I'd probably put it back to weekly. Another thing that is hard on NVMe is temperature. A lot of people put a heatsink on them, since they are only a few dollars and can make things last longer.

On the plus side, it has a 5 year warranty and it hasn't been sold that long, so worst case you gotta RMA the drive.

If I were in your shoes (I might be some day, since I have two 660p's in mine as cache), I'd probably try to run the Intel toolbox on the drive, since it has Intel's diagnostics. I've never run it on an NVMe, so I'm unsure of the results, but it seems prudent. The only annoying thing is they only release it for Windows... I have no Windows machines, and you might not either... which makes it irritating, but not impossible.
https://www.intel.com/content/www/us/en/support/articles/000005800/memory-and-storage.html
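To put the two figures from this post side by side, here is a small sketch of the arithmetic. The helper names `pct_used` and `tb_remaining` are made up for illustration; the only inputs are the 94.5TB written and the roughly 105TB at which the Tom's Hardware drive reportedly failed. That failure point is one test sample, not a rated limit.

```shell
# Percentage of a reference write total that has already been used.
# $1 = terabytes written so far, $2 = reference total in terabytes
pct_used() {
  awk -v written="$1" -v total="$2" 'BEGIN { printf "%.1f\n", written / total * 100 }'
}

# Terabytes left before reaching the reference total.
tb_remaining() {
  awk -v written="$1" -v total="$2" 'BEGIN { printf "%.1f\n", total - written }'
}

pct_used 94.5 105      # figures quoted in the thread
tb_remaining 94.5 105
```

With those inputs the drive is at 90.0% of the point where the review sample died, with 10.5TB of writes to go before matching it.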
manolodf (Author), posted June 16, 2019: lol yes, I'm in the same Windows-less boat! So I guess I just tread lightly, and if it craps out, get an RMA since it is not that old.
manolodf (Author), posted June 16, 2019: So I restored my appdata and reinstalled from previous apps, but encountered the following issues.
At first I got this error when installing, for some of the apps: [screenshot]
Then whenever I tried to start those apps, it gave a server error: [screenshot]
Then after I stopped the apps that had started successfully and tried to start them again, it gave a 403 error: [screenshot]
When I try to manually update one, this is what I get: [screenshot]
manolodf (Author), posted June 16, 2019: OK, so now it's a real issue. After that I rebooted, and when it came back up the cache drive came up as Unmountable again. Unfortunately I did not get diagnostics before the reboot, but I am sure I can recreate it. Just let me know at what point we need diagnostics.
Abzstrak, posted June 16, 2019: Best bet is probably to go cacheless for now. Yeah, read that Tom's Hardware link I posted: when the drive crapped out on them, it entered a read-only state... Sure sounds similar to your issue. I would find out what is needed for an RMA; I think it just bit the dust. Also, just an FYI, some people make RAM drives for things like the Plex transcode folder when their appdata is on an SSD array, just to help reduce writes. Might be worth considering in the future.
manolodf (Author), posted June 16, 2019: Do you think it has anything to do with XFS? Like, would formatting it BTRFS help the cache drive at all? After formatting it BTRFS, at least it survives a reboot without showing up as unmountable... I guess I have to order another one in the meantime and figure out the RMA. I do have the transcode mapped to RAM by using /tmp on Plex; is that what you are referring to? (Edited June 16, 2019 by manolodf)
Squid, posted June 16, 2019:
9 minutes ago, manolodf said: "Do you think it has anything to do with XFS, like would formatting it BTRFS help the cache drive at all?"
If anything, with only a single cache device, you're far better off having it formatted as XFS.
manolodf (Author), posted June 16, 2019:
11 minutes ago, Squid said: "If anything, with only a single cache device, you're far better off having it formatted as xfs"
That is actually why I formatted it as XFS after I had the issues on my cache drive. But with it formatted as XFS, every time I reboot it comes back up as "Unmountable: No File System" and I lose all my appdata. Actually, it does that on a reboot once Dockers have been installed. After restoring appdata it was fine; it was only after the Dockers were installed and run that it crapped out. I just randomly tried formatting it BTRFS, because well, why not! And all of a sudden it survives a reboot without saying Unmountable! But I have not tried it with Dockers just yet. (Edited June 16, 2019 by manolodf. Reason: accuracy of the reboot results)
manolodf (Author), posted June 16, 2019: OK, so I attempted XFS again, and after restoring my appdata I tried to do an XFS check and got this error: [screenshot]
Diagnostics attached: tower-diagnostics-20190616-1850.zip
Now I can't see the used/free space, etc.
Squid, posted June 16, 2019: Your BIOS is a pre-release version. There have been many, many updates to it. Not saying this is the problem, but generally I would never run a pre-release BIOS.
manolodf (Author), posted June 16, 2019: Oh wow, I had no clue. That must have been the one that came with the mobo at the time I got it. I will get on updating that ASAP. Does Unraid have any specific steps for updating the motherboard BIOS?
itimpi, posted June 16, 2019:
13 minutes ago, manolodf said: "Does Unraid have any specific steps for updating MB Bios?"
No. The process for updating a motherboard BIOS is determined by the motherboard manufacturer.
manolodf (Author), posted June 16, 2019: So I was going to update the BIOS and ran into this: [screenshot]
Could this be a reason for some of my XFS problems on checks?
Squid, posted June 16, 2019:
1 hour ago, manolodf said: "Oh wow, I had no clue, that must have been the one that came with the Mobo at the time I got it."
When buying a new mobo, after verifying its basic functionality, the first thing I do is update its BIOS to whatever is current. It's criminal how often motherboards are sold with extremely outdated BIOSes.
55 minutes ago, manolodf said: "could this be a reason for some of my XFS problems on checks?"
Doesn't look good. Not sure where you're updating xfsprogs, but if there's a file within /boot/extra for it, you can delete it.
manolodf (Author), posted June 16, 2019: Yes, someone asked me to download a file for xfsprogs 5.0 and put it in /boot/extra, so I did that. I noticed there was a file whose name started with ._ so it's likely the Apple extension or something. I deleted that extra file, but I left the other one in /boot/extra.
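Files whose names start with `._` are typically metadata sidecars that macOS writes when copying to non-Apple filesystems. As a rough sketch, a helper like the one below lists them in a directory so they can be reviewed before deleting anything. The function name `list_dot_underscore` is made up for this example, and it only prints matches; it does not remove them.

```shell
# List regular files in a directory whose names start with "._".
# Hypothetical helper for illustration; it prints paths and deletes nothing.
# $1 = directory to inspect (not searched recursively)
list_dot_underscore() {
  find "$1" -maxdepth 1 -type f -name '._*'
}

# Usage on the folder mentioned in the thread (run on the server itself):
#   list_dot_underscore /boot/extra
```

Keeping the listing and the deletion as separate steps makes it harder to remove the real package by accident.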
JorgeB, posted June 17, 2019:
17 hours ago, Abzstrak said: "that's not true. Some SSD's do and some don't."
I said NVMe devices don't support SMART tests, and AFAIK no NVMe device supports short or long SMART tests. If that's not correct, please post a SMART output showing otherwise. All SATA SSDs still support SMART tests.
JorgeB, posted June 17, 2019:
12 hours ago, manolodf said: "could this be a reason for some of my XFS problems on checks?"
No, that's just updating xfsprogs to the latest version. Since it didn't resolve the earlier issue, though, it should be deleted, because Unraid will sooner or later ship a more recent version.
Abzstrak, posted June 17, 2019:
5 hours ago, johnnie.black said: "I said NVMe devices don't support SMART tests, and AFAIK no NVMe device supports short or long SMART tests, if that's not correct please post a SMART output showing otherwise, all SATA SSD still support SMART tests."

Sorry, I thought you said SSD, not NVMe. Reading too fast.

However, smartctl can report on an NVMe device; info and example: https://www.smartmontools.org/wiki/NVMe_Support

Most (but not all) NVMe drives don't support the SATA/SAS-type commands for SMART, but it really would be nice if Unraid would just run smartctl -x on an NVMe and give us that.

Example on my box:

# smartctl -x /dev/nvme1n1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.41-Unraid] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKNW512G8
Serial Number:                      BTNH90920C3Y512A
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Jun 17 08:36:10 2019 CDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    620,079 [317 GB]
Data Units Written:                 1,358,920 [695 GB]
Host Read Commands:                 3,438,919
Host Write Commands:                6,865,860
Controller Busy Time:               120
Power Cycles:                       6
Power On Hours:                     182
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
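For anyone reading the "Data Units" lines in that output: NVMe reports them in units of 1000 × 512 bytes, which is how smartctl arrives at the bracketed sizes. The sketch below redoes that conversion; the function name `data_units_to_gb` is made up for this example, and it truncates to whole decimal gigabytes.

```shell
# Convert an NVMe "Data Units" counter to whole decimal gigabytes.
# One data unit = 1000 * 512 bytes = 512,000 bytes.
# $1 = counter as printed by smartctl, thousands separators allowed
data_units_to_gb() {
  units=$(printf '%s' "$1" | tr -d ',')
  echo $(( units * 512000 / 1000000000 ))
}

data_units_to_gb "1,358,920"   # Data Units Written from the post above
data_units_to_gb "620,079"     # Data Units Read from the post above
```

The two calls print 695 and 317, matching the values smartctl shows in brackets. The same conversion turns the counter on any NVMe drive into a lifetime-writes figure that can be compared against an endurance rating.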