thirtythreemangos Posted May 6 (edited) Hi all, So I've built a 7 drive NAS on unRAID and have been doing the very long, very slow process of populating it with my data from my external drives, shucking them, adding them to the array, then continuing the process one drive at a time. Everything had been going fine, until now. Out of the blue, one of the disks in the array suddenly started having a ton of UDMA CRC errors. I followed instructions on the forum, downloaded the diagnostics, and ran an extended SMART check (which found no errors). The disk itself could unmount and mount just fine, and was displaying the expected contents in the file browser, so after getting the SMART results I proceeded with the process to rebuild the drive onto itself. Very soon after the sync process began I again got flooded with error messages, and the sync process stopped after 14 minutes. The array is currently sitting in maintenance mode and I just don't want to touch anything else without getting input. As I type this I'm not seeing anywhere to upload the diagnostics and SMART results files. Diagnostics are attached now. Please help? miraid-diagnostics-20240505-0716.zip miraid-smart-20240506-1021.zip Edited May 6 by thirtythreemangos: Added diagnostics files
itimpi Posted May 6 CRC errors typically indicate a connection issue. Did you do anything to confirm that the SATA and power cabling to the drive are fine? The syslog in the posted diagnostics is full of resets on the drive that look like this sort of issue, which is also consistent with getting lots of CRC errors.
thirtythreemangos Posted May 6 Author @itimpi You mean like unplug and check the pins or something?
itimpi Posted May 6
4 minutes ago, thirtythreemangos said: @itimpi You mean like unplug and check the pins or something?
No - I mean did you make sure the cables were well seated in case they worked slightly loose, or alternatively try different cables. Cabling seems to be the commonest cause of errors.
thirtythreemangos Posted May 6 Author @itimpi I can try swapping in a new data cable. What's the best next step right now? Am I OK to stop the array and shut it all down so I can get to the drives? Do I need to run any diagnostic tests before anything? Or back up anything? I'm operating under the assumption that despite a repeat "drive disabled" thing, the data is still intact in parity and I won't lose anything just shutting down and unplugging stuff?
trurl Posted May 6 The emulated drive is mounted and has plenty of data, so a rebuild should recover that if you take care of your connections. But you really should consider making some more free space on disks 1, 2, 3, or at least quit writing to them, since they are too full. If you ever need to repair a filesystem, the repair may need some free space to do its work in.
19 minutes ago, thirtythreemangos said: backup anything?
You must always have another copy of anything important and irreplaceable. Parity is not a substitute for backups.
19 minutes ago, thirtythreemangos said: the data is still intact in parity
The data is intact in the parity array since it is being emulated by all the other disks. Parity itself contains none of your data. Parity is just an extra disk that contains parity bits. Those parity bits allow the bits of a missing disk to be calculated from all the other disks.
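To illustrate trurl's point that parity holds no data of its own, here is a minimal Python sketch of single-parity reconstruction using byte-wise XOR. The disk contents and the helper names `xor_blocks` and `rebuild_missing` are made up for illustration; this is a toy model of the idea, not Unraid's actual implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of several equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_disks, parity):
    """Recompute a missing disk from every surviving data disk plus parity."""
    return xor_blocks(surviving_disks + [parity])


# Three hypothetical data disks, four bytes each (illustrative values only).
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity disk stores only the XOR of the data disks, not a copy of any file.
parity = xor_blocks(data_disks)

# Pretend the second disk is disabled: it is "emulated" from the others.
rebuilt = rebuild_missing([data_disks[0], data_disks[2]], parity)
print(rebuilt == data_disks[1])
```

The sketch also shows why parity is not a backup: reconstruction needs every other disk to be readable, and a file deleted or corrupted on a data disk is simply reflected in the recalculated parity.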
thirtythreemangos Posted May 6 Author @trurl I mean in terms of unRAID-specific stuff. I'm still unfamiliar with the OS and am learning as I go. I know RAID is not backup, and I don't have anything irreplaceable in the array, just digital media so far. I know there's some sort of backup option for the unRAID thumb drive, but I haven't gotten into that yet. But at least for now it seems like I can stop the array and shut off the machine and such without any increased worry. I'll check and replace cables. As far as making free space, I know there is a setting somewhere to change the level of reserve space or something? I know I set them up as "fill up" instead of "high water" or whatever those modes are called. What's the best way of shifting some files from one physical drive to another? Just move them in Krusader? Will that affect hardlinking or anything? I assume Plex will be able to find them just fine with a rescan, but I've learned that what I know about computers hasn't really translated a whole lot to this Linux/unRAID world, so I appreciate all the tips.
trurl Posted May 6
9 minutes ago, thirtythreemangos said: backup option for the unRAID thumb drive
You should always have a current backup of the boot flash. You can download a zipped backup of flash at Main - Boot Device - Flash - Flash Backup. The Unraid Connect plugin will also keep the current flash backup on the Unraid cloud.
8 minutes ago, thirtythreemangos said: I know there is a setting somewhere to change the level of reserve space or something?
There is a Minimum Free setting for each of your User Shares. You should set each to larger than the largest file you expect to write to the share. If a disk has less than Minimum, another will be chosen. Pools (such as cache) also have a Minimum Free setting.
15 minutes ago, thirtythreemangos said: best way of shifting some files from one physical drive to another?
Depending on how you have it configured, Krusader may not allow you to work directly with the disks. The Dynamix File Manager plugin is safer and easier and will let you work with the disks.
17 minutes ago, thirtythreemangos said: Plex will be able to find them just fine with a rescan
Assuming you have mapped Plex libraries to user shares and not disks, it won't notice the difference and won't require a rescan.
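The Minimum Free rule trurl describes can be sketched in a few lines of Python. This is a simplified model that assumes the "fill up" allocation method mentioned earlier in the thread and ignores split level, included/excluded disks, and the other allocation methods. The function name `pick_disk_fill_up` and the free-space figures are hypothetical.

```python
GB = 1000 ** 3


def pick_disk_fill_up(free_bytes_by_disk, minimum_free):
    """Return the lowest-numbered disk whose free space is at least
    minimum_free, or None if no disk qualifies (simplified model)."""
    for disk in sorted(free_bytes_by_disk):
        if free_bytes_by_disk[disk] >= minimum_free:
            return disk
    return None


# Hypothetical free space per array disk, keyed by disk number.
free_space = {1: 5 * GB, 2: 30 * GB, 3: 200 * GB, 4: 900 * GB}

# With Minimum Free set to 50 GB, disks 1 and 2 are skipped as too full.
chosen = pick_disk_fill_up(free_space, minimum_free=50 * GB)
print(chosen)
```

In this model a disk is never picked once its free space drops below the threshold, which is why setting the threshold larger than the largest expected file keeps a new file from landing on a disk that cannot hold it.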
thirtythreemangos Posted May 6 Author @itimpi
Quote: I mean did you make sure the cables were well seated in case they worked slightly loose
Just shut everything down and took the drive out. Everything looked to be seated nice and tight, and it's a brand new set of SATA cables and PSU, so I really don't have any idea why anything would have happened. The only potential ideas I have are that maybe when I Kapton taped the 3.3v pin on the drive I either didn't cover it enough or covered a slight hair of an adjacent pin? Or maybe the way the cable is bent to fit everything in the case caused a short or something? It's a Fractal Design Node 804, so there's decent space and I'm not doing anything that's not intended to be done, so I don't know.
thirtythreemangos Posted May 6 Author @trurl Thank you! I really appreciate the input.
trurl Posted May 6 All cable connections must sit squarely on the disk connector, with no tension on the cable that might cause it to move. Also, don't bundle data cables. Do you have any splitters on the disk power cables?
thirtythreemangos Posted May 6 Author @trurl
Quote: All cable connections must sit squarely on the disk connector, with no tension on the cable that might cause it to move. Also, don't bundle data cables.
We're good there, no significant tension, I've just had to carefully route the cables. I specifically bought the 2' cables to avoid that in routing from one side of the machine, where the drives nest, to the other side, where the motherboard ports are.
Quote: Do you have any splitters on the disk power cables?
No splitters. Although if I DO end up filling the case completely with all 8 drives that it accommodates, then I'll have to figure out something. I went a tier up on the PSU I picked in part because this one includes 8 SATA power connectors as opposed to 6 on the first PSU I was looking at, and I had to use up one of the SATA power spots for the case's external fan speed switch (which passes through power to 3 of the fans in the case).
The more I think about it, the more I wonder if I just did an inadequate job of Kapton taping the 3.3v pin on the drive. My understanding is that if that pin receives power, it shuts the drive down or prevents it from turning on or spinning up, so you have to tape the pin to prevent the connection in order to permit internal use (it's a shucked external WD drive). The cables and connections all looked good when I opened it up, so I think the most likely thing is that the pin just wasn't perfectly covered and it was getting little connections every so often, and the issues resulted from the drive trying to shut itself off (or whatever actually occurs electromechanically when that pin receives power).
thirtythreemangos Posted May 6 Author @trurl @itimpi Well, I just re-did the Kapton taping of the 3.3v pin. Reinstalled the drive. Rebooted. The drive shows in unassigned devices, but this time around I'm getting the "Device '/dev/sdg' failed to mount. Check the syslog for details" error message. It mounted just fine before I tried to rebuild it, but now, after the round of error messages/failures that brought me here to the forum, it won't mount. What should I do?
bmartino1 Posted May 6 I have experienced this with an HBA and the multi-drive cable on one of my drives. I had to replace the SATA to mini-SAS cable, and it did help. Also, if you are using an HBA split cable, it is best to follow the numbered order, meaning that SATA 1-4 are connected in sequence. If you have SATA 3 disconnected but 4 connected, I have seen this create the CRC error, as it is a cable issue. UDMA CRC errors can also be an early sign of spinning hard disk failure. It is recommended to replace the drive sooner rather than later. https://www.minitool.com/partition-disk/udma-crc-error-count.html
trurl Posted May 6
2 hours ago, thirtythreemangos said: What should I do?
Post new diagnostics.
Solution thirtythreemangos Posted May 9 Author @trurl So I got it figured out, more or less. I removed that drive, reapplied the Kapton tape to the 3.3v pin (making extra effort to be thorough and complete and have it stick well), reinstalled the drive, booted into safe mode, ran another SMART check on the drive (which again produced no error messages), started the array in maintenance mode and added the drive to the array, performed an XFS filesystem check first with -nv and then again without, and then again rebuilt the drive onto itself. I can only guess that the original Kapton tape was the true culprit, as the initial attempt to rebuild the drive produced a slew of UDMA CRC errors and an array failure, but booting in safe mode, starting the array in maintenance mode, running the checks, and then rebuilding in that mode seemed to fix it. When rebuilding the drive, I DID get maybe another 10 UDMA CRC errors, but nothing like the 4744 of them that caused this whole debacle. But thank you everyone for the input. Hopefully anyone finding this post may consider the Kapton tape as a potential culprit for similar issues. I'll eventually phase out those shucked drives that require the 3.3v pin fix (or perhaps there is an actually reliable adapter out there for this specific type of use?), but the budget is not there presently.
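For anyone wanting to track the error count the way the solution post does (4744 before the fix, roughly 10 more during the rebuild), here is a small Python sketch that pulls the raw value of SMART attribute 199 out of attribute-table text such as `smartctl -A` prints. The two sample lines are made up to mirror the numbers in this thread, and `udma_crc_count` is a hypothetical helper. The raw value is a running total, so it is the increase between two readings that matters rather than the total itself.

```python
def udma_crc_count(smart_attributes_text):
    """Return the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
    from an attribute table, or None if the attribute is not listed."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Hypothetical readings, shaped like rows of the SMART attribute table.
before = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 4744"
after = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 4754"

new_errors = udma_crc_count(after) - udma_crc_count(before)
print(new_errors)
```

A count that keeps climbing after cables and connectors have been reseated suggests the connection problem is still present.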
itimpi Posted May 9 I think using a Molex->SATA power splitter is an alternative to using the Kapton tape.