May 16, 2016 Last Friday (5/13/2016) I had a drive redball. The SMART report showed several pending sectors. I attempted to restart my array with the drive unassigned and run off parity. It only worked for a few minutes before I had thousands of read errors across several disks. When I got home I removed the two drives that I had connected that were not part of the array. After that I was able to run the array without an assigned disk in slot 6, using parity, for the whole weekend. No errors on any disks, including the emulated disk. During this time I ran a preclear on the redballed drive and SMART found several more pending sectors and a few offline unrecoverable sectors. My theory was that my PSU may be to blame, since I went from 9 HDDs + 2 SSDs to 7 HDDs + 2 SSDs and had no problems. The PSU in question is a Corsair CX430: http://www.newegg.com/Product/Product.aspx?Item=N82E16817139026 OK, so that drive is dead, I thought. I powered down my server, removed the failed drive and reinserted my spare 4TB WD Green drive. I also checked all SATA and power connections for all drives. I began a data rebuild and the speed was extremely slow, eventually dropping below 1MB/s. I stopped the rebuild and received a message that the rebuild had "finished" with tens of thousands of errors. So at this point I'm left wondering if I have a drive problem, a PSU problem or both, and where I should go from here. I have attached my diagnostics. Any advice would be appreciated. nalbonefs1-diagnostics-20160516-0825.zip
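For readers following along, the two SMART counters mentioned in this post can be pulled out of a `smartctl -A` attribute table with a few lines of Python. This is only a sketch: the sample rows below are hypothetical and are not taken from the attached diagnostics, and the function name `watched_raw_values` is made up for illustration.

```python
# Attribute names as printed in the `smartctl -A` table; these are the two
# counters the post refers to (pending and offline-uncorrectable sectors).
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")


def watched_raw_values(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


# Hypothetical sample rows for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""

print(watched_raw_values(SAMPLE))
```

A non-zero raw value that keeps growing between reports, as described above, is the usual reason to stop trusting a drive.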
May 16, 2016 Author More weirdness. Now I'm getting read errors from other array disks, one of my cache disks is missing, and the other is "unmountable". HELP!
May 16, 2016 Community Expert I can see that two people have already downloaded your diagnostics file, but there are a few things you can do and questions that you can answer while they are looking for issues. First, list all of your hardware: CPU, RAM and size, SATA cards, and anything else in the case. (That will give us a clue as to whether your PSU is undersized.) Second, have you looked in the case while the server is running to verify that all of the case and CPU cooler fans are running? Is there a lot of dust and dirt in that case? Are the CPU cooler fins clogged with dust? Third, did you really try to rebuild on a disk that had several pending sectors and offline sectors?
May 16, 2016 Author Stepped out of my office. Will reply when I return, but no, I did not rebuild with the bad drive. I had a precleared spare.
May 16, 2016 Author
CPU = AMD A4-3400 Dual-Core 2.7 GHz
RAM = 2x4GB DDR3 1333MHz
Mobo = MSI A55M-P33
HBA = IBM M1015 flashed to IT mode

Drives in use:
5x WD Green 4TB
1x HGST Green 4TB
1x Seagate 8TB shingled drive
2x Kingston 250GB SSD cache pool

Spare drives that were plugged in but have since been removed:
1x WD Green 4TB
1x Seagate 8TB shingled drive

At this point I removed the drive that failed and replaced it with a spare WD 4TB. I tried to rebuild and it went extremely slowly. Eventually it failed. I have since rebooted, and now only one of my 2 SSDs is showing up and unRAID reports it is unmountable. I currently have the array stopped. The case and fans are relatively clean. There is nothing clogging them and they are all spinning, or they were when I checked this morning before leaving for work.
May 16, 2016 Without reviewing anything, I think you may have a couple of problems. One is the PSU: that Corsair CX series looks very close to undersized for your drives, but worse, it is not considered a very good series of PSU, meaning it may not provide what it claims, especially under load. I would strongly consider acquiring a better quality one, with a little more power too. There are a number of very good PSU threads in the Hardware sections. But as far as I know, poor power can't create bad sectors, and you have those too (and you are dealing with them now). I just wish you had a better PSU to do it with, because some of the newest issues (not the bad sectors) may just be collateral damage from weak power. Preclears, drive rebuilds and parity builds/checks drive a system pretty hard.
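To make the "close to undersized" point concrete, here is a rough 12V budget sketch in Python. Every number in it is an assumption for illustration: the per-drive spin-up current, the allowance for the rest of the system, and the rail rating are hypothetical placeholders, so read the real figures from the drive datasheets and the PSU label.

```python
# All defaults are hypothetical placeholders, not measured or datasheet values.
def spinup_12v_amps(hdd_count, amps_per_hdd=2.0, other_12v_amps=8.0):
    """Worst-case 12V draw if every HDD spins up at the same moment."""
    return hdd_count * amps_per_hdd + other_12v_amps


def headroom_amps(rail_amps, hdd_count, **kwargs):
    """Remaining 12V capacity; a negative result means the rail is overcommitted."""
    return rail_amps - spinup_12v_amps(hdd_count, **kwargs)


RAIL_12V_AMPS = 30.0  # hypothetical rating; use the figure printed on the PSU

for hdds in (9, 7):  # the two drive counts mentioned in the first post
    print(hdds, "HDDs:", spinup_12v_amps(hdds), "A peak,",
          headroom_amps(RAIL_12V_AMPS, hdds), "A headroom")
```

With these placeholder numbers, dropping from 9 to 7 HDDs frees 4 A on the 12V rail, which is the kind of margin that can matter on a PSU that does not deliver its rated output under load.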
May 16, 2016 Author I checked out the PSU threads and ordered a new one this morning. Should be here tonight. Hooray for Amazon Prime same day delivery! Have you reviewed the diagnostics? I thought I only had bad sectors on the one drive. Am I mistaken?
May 17, 2016 Author I installed a new PSU tonight. All my drives were detected and the cache pool was mounted correctly. I've started the rebuild and so far so good. Looks like it could take over 1 day, but at least it isn't slowing to a crawl (yet).
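As a sanity check on those durations, rebuild time is just disk size divided by sustained speed. The sketch below assumes the rebuild has to write the full 4 TB of the replacement drive; the 40 MB/s figure is a hypothetical speed chosen only to illustrate an "over 1 day" estimate, not a value from the diagnostics.

```python
TB = 10**12  # decimal terabyte, as drive vendors count it
MB = 10**6


def rebuild_hours(disk_bytes, bytes_per_second):
    """Hours needed to write the whole disk at a constant speed."""
    return disk_bytes / bytes_per_second / 3600


# At the ~1 MB/s seen during the first rebuild attempt:
print(round(rebuild_hours(4 * TB, 1 * MB) / 24, 1), "days")   # 46.3 days
# At a hypothetical sustained 40 MB/s:
print(round(rebuild_hours(4 * TB, 40 * MB), 1), "hours")      # 27.8 hours
```

The gap between the two results shows why a rebuild crawling at 1 MB/s is a sign of a fault rather than something to wait out.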
May 17, 2016 Author Once again, the rebuild has slowed to a crawl. At this point it looks like I can't actually open any existing files from the array. It does appear that I can write, however: I can create a new file, open it, edit it, save it, close it, reopen it, delete it, etc. Very strange. I have uploaded diagnostics. What should I try next? nalbonefs1-diagnostics-20160517-0624.zip
May 17, 2016 Community Expert The syslog is reporting that you have corruption on disk6 that will need to be repaired using xfs_repair. It is also full of error messages referencing /dev/loop0, which is where the docker image is mounted. This may mean you have a corrupt image that will need to be deleted and then recreated.
May 17, 2016 Author Disk 6 is the disk that is being emulated/rebuilt. Can I repair it as an emulated disk? Rebuilding the docker image is no problem though.
May 17, 2016 Community Expert Yes, xfs_repair can be run against an emulated disk with no problems.
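A minimal sketch of the usual two-step sequence, written as a small Python helper that only builds the command lines and does not run them. The `/dev/md<slot>` device naming is an assumption about how this unRAID release exposes array slots (going through the md device is what lets the repair work on an emulated disk and keeps parity in sync), and the helper name `xfs_repair_commands` is hypothetical. The array would need to be started in maintenance mode first so the filesystem is not mounted.

```python
import shlex


def xfs_repair_commands(slot):
    """Build the check and repair command lines for one array slot."""
    device = f"/dev/md{slot}"  # assumption: md device naming for array slots
    return [
        ["xfs_repair", "-n", device],  # -n: no-modify mode, report problems only
        ["xfs_repair", "-v", device],  # -v: verbose, performs the actual repair
    ]


for cmd in xfs_repair_commands(6):
    print(shlex.join(cmd))
```

Running the `-n` pass first shows what xfs_repair would change before anything is written to the disk.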
May 17, 2016 Author So should I stop the rebuild and repair? Is there a post with repair instructions somewhere on the forum?
May 17, 2016 Author So after a while the rebuild picked up the pace and should be finished in a few hours. I would like to let it finish, but after the fact how do I go about checking/repairing the filesystem? Is there a sticky with details somewhere?
May 17, 2016 Community Expert https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
May 17, 2016 Author Thanks!
May 19, 2016 Author Update on this. The drive rebuild finished early yesterday morning. I put the array in maintenance mode and ran xfs_repair on the replacement disk. It took less than 5 minutes and only found <100MB of data to put in lost+found, which has made my life much easier. As far as my cache pool and docker image go, I ran btrfs scrub on both and have had no further issues. I am currently running a parity check on the array to be safe. Thank you to everyone who provided advice.