October 18, 2016 So it has been a month or two since I last logged into my webGUI, and I have this big red X next to my Disk 3. I ran the diagnostics from the Tools menu and will attach the results in case anyone can take a look. I also started an extended SMART self-test, but that may take a while; it's been running for 20 minutes and has not moved beyond 10%. I have not tried to do anything yet, as I am scared to lose my data by doing the wrong thing here. Would appreciate some help guys! Thanks in advance! unraid-diagnostics-20161017-2322.zip
October 18, 2016 Community Expert A few observations:
- The syslog starts from October 2nd and the disk was already disabled by then; you should enable notifications so you're not surprised like this again.
- There's a constant "lost communication with UPS" error; check your UPS settings.
- Since the syslog doesn't show what happened and SMART for the failed disk looks OK, let the extended test finish (it will take several hours). If it passes, try to rebuild to the same disk.
October 18, 2016 Author The extended SMART self-test completed without error. How do I go about rebuilding the array using the same disk?
October 18, 2016 Community Expert https://lime-technology.com/wiki/index.php/Troubleshooting#Re-enable_the_drive
October 18, 2016 Author Well, it looks like it rebuilt and everything is running fine now. Looking at what you said, I think I had a power outage on Oct 2. My UPS was plugged into the server, but the USB data cable was not connected, so the server didn't shut down properly when the battery ran out. I think that is the culprit here. Is there a way to save logs after a reboot? Thanks again for all your help saving my data!
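On the question of keeping logs across a reboot: the syslog in this thread only went back to October 2 because the live log does not survive a restart. A minimal sketch of one workaround is to copy the log to the flash drive before shutting down. It assumes the live log is at `/var/log/syslog` and that `/boot/logs` is a writable folder on the flash drive; the function name `save_syslog` is made up for illustration and is not part of unRAID.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_syslog(src="/var/log/syslog", dest_dir="/boot/logs"):
    """Copy the current syslog to persistent storage and return the new path.

    Both default paths are assumptions about the system layout;
    pass different ones if your setup differs.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = dest / f"syslog-{stamp}.txt"
    shutil.copy2(src, target)  # copy2 also preserves the file timestamps
    return target


if __name__ == "__main__":
    print(f"Saved syslog to {save_syslog()}")
```

This only helps if it runs before the server goes down, so it would not have captured an unclean shutdown like the one described above unless it was also run periodically.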
November 29, 2016 Author Well, the device is offline again after rebooting for an unRAID update. The short SMART test says Passed. Is this a good indication that I have a bad drive? Why isn't the SMART report finding anything? unraid-diagnostics-20161129-0804.zip
November 29, 2016 Community Expert You're having issues with the SASLP controller. Check that it's well seated, check the cables, etc. You can also try a different PCIe slot if one is available. If the issues continue, consider replacing it.
November 30, 2016 I'm having the same issue. I have a 4-drive array, and the newest drive had the red X and the emulated message. I tested it 3 times with the short test and once with the long test, and it finished without errors all four times. When I try to re-enable/rebuild, I instantly get errors again. All the drives are on the same controller and only Disk 3 is having this issue.
November 30, 2016 Replied in your other thread.
December 1, 2016 Author johnnie.black said: "You're having issues with the SASLP controller, check that it's well seated, cables, etc, you can also try a different PCIe slot if available. If issues continue consider replacing it." OK, ironically I accidentally bought 2 of the same controller when I installed it. So I just swapped out the controllers for now and left the cables, as they are buried in my server. I do have spare cables if I am still having issues after this. I do not have an extra PCI slot though, well, kind of. I have 2, and the other is filled with a video card. While I could swap the controller to that PCI slot, the video card, if installed, has to be in that slot. But I don't think a PCI slot can go bad?
December 1, 2016 "But I don't think a PCI slot can go bad?" Oh, I think they can! Being both electrical and mechanical, they have two ways of failing. Reseating the card is the first thing to try, and if it doesn't sit square in the slot then don't be afraid of bending the mounting bracket a little.
April 11, 2017 Author OK, so I am still having issues. The same drive failed several times, at roughly 4-week intervals. Each time it looked the same as before, and I was able to rebuild the array. I was excited for dual parity and quickly upgraded to that. Since I did, a whole new set of issues has popped up. Here is what my last few parity checks look like:
On Feb 28, I was streaming Plex during the weekly parity check. At some point Plex threw an error saying it could not read the data anymore. I checked and found that drives 3 and 4 had both failed this time, so I did a system reboot. After the reboot, drives 3 and 4 were being emulated, but my cache 1 drive was also missing. Not failed: it simply did not come up and the server didn't even see it. I did another reboot to see if it would come back, and it did not. I ended up switching it to another cable/port on my RAID controller, and it came back up and mounted with cache 2 to restore my appdata. I then ran extended SMART tests on drives 3 and 4 and they passed with no errors, so I rebuilt the array and everything checked out. I have the syslogs and will attach them at the bottom: one from before the reboot and one from after the failures.
Now on Apr 10, literally the exact same thing happened again, right down to me streaming Plex. I went through the motions again, checked the SMART reports, and swapped the cache 1 cable on the controller. I don't have the syslogs for this because Powerdown wasn't updated for 6.3 and I didn't realize this until today.
I am getting paranoid that at some point I will hit a scenario where I can't rebuild my array, which is now 23TB, and will lose data. It cannot be a coincidence that I had 2 drive failures shortly after updating to dual parity, and that they occur during the parity check. And the cache drive dropping out on top of that blows my mind.
I think my next step needs to be identifying all the HDDs in my case and seeing which ones are on my RAID controller and which ones are on the onboard SATA ports. I do know that both my cache drives are on the RAID controller. In my previous post I mentioned that I did swap out the controller card, and I thought the issue was resolved. The odds of both controllers going bad, each starting with drive 3 failing, have me thinking it can't be the card. I would really appreciate it if someone could take a peek at my syslog from Feb 28 and point me in the right direction towards getting this resolved! Thanks in advance! syslog-20170228-041347.txt syslog-20170301-215946.txt
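For the step of working out which drives hang off which controller, a rough sketch is shown here. It assumes a Linux system where each disk appears under `/sys/block` as `sdX` and where the resolved sysfs path contains the PCI address of the controller it is attached to. The helper names `controller_of` and `disks_by_controller` are made up for illustration.

```python
import os
import re
from collections import defaultdict

# A PCI address looks like 0000:02:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the PCI address closest to the disk in its sysfs path, or None."""
    matches = PCI_ADDR.findall(sysfs_path)
    # Earlier matches are bridges; the last one is the controller itself.
    return matches[-1] if matches else None


def disks_by_controller(sys_block="/sys/block"):
    """Group sdX device names by the PCI address of their controller."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop devices, md devices, etc.
        real = os.path.realpath(os.path.join(sys_block, name))
        groups[controller_of(real)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for addr, disks in disks_by_controller().items():
        print(addr, " ".join(disks))
```

Each printed address can then be matched against the output of `lspci` to see whether it belongs to the add-in card or the onboard SATA ports.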
April 12, 2017 Community Expert On 29/11/2016 at 1:52 PM, johnnie.black said: "You're having issues with the SASLP controller, check that it's well seated, cables, etc, you can also try a different PCIe slot if available. If issues continue consider replacing it." Adding to the above, you can also try looking for a board BIOS update, or disabling VT-d if you don't need it. If nothing helps, your best bet is replacing the SAS2LP with an LSI controller. Edited April 12, 2017 by johnnie.black