Yarok Posted September 2, 2015
Hi everyone,
Yesterday evening, while cleaning up some files on my server, I absent-mindedly logged on to see how much free space was still available, only to find that one of the HDDs (disk1) had "dark-red-balled" on me and now shows as "Not installed". Please bear in mind that I very rarely log on to the webGui, so I don't really know how long this has been the case.
Anyway, this setup has been running almost care- and maintenance-free for a bit shy of three years, so I have a few questions in the general "what now?" direction. I hope you'll be able to help me out.
1) Let's get the dumbest question out of the way first: how do I physically identify the broken drive in my setup? This is silly, but I did not take care to label my drives so as to know which is which (not clever).
2) Are there things I can do (other than checking for an unplugged power/data cable, which I did) or checks I might run to detect further issues on this drive?*
3) I guess I can already look for a replacement drive. The defective drive was 2 TB, so I think I have two options: either I purchase a new 2 TB drive, swap it in and rebuild disk1, or I purchase a bigger drive and upgrade my parity while I'm at it?
4) If I decide to upgrade my parity, what is the correct sequence to do so in order to avoid data loss?
I think this would also be the right time to upgrade from 5.0 to 6.1. Not being an expert tinkerer, I'm looking at a fair amount of head scratching. I've attached a screencap in case it helps.
Thanks in advance for your support!
Patrick
*[SOLVED] It so happens that I simply forgot to perform one of the easier troubleshooting tests, which is to connect a known-good drive to the "invalid" SATA port. Verdict: the SATA port was fine and it was the HDD that was faulty. A simple swap with a new HDD of the same capacity was all it took to fix the problem. Thanks to all who took the time to help and contribute!
RobJ Posted September 2, 2015
As always, the very first step is to capture the syslog BEFORE you reboot, so we can see what actually happened. Please see "Capturing your syslog", and attach a syslog, zipped if necessary. Then we need to see the SMART report for that drive; see "Obtaining a SMART report".
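The syslog lives in RAM and is lost on reboot, which is why it has to be copied somewhere persistent first. A minimal sketch of that step, assuming the log is at /var/log/syslog and the flash drive is mounted at /boot (the usual unRAID layout); the function name `save_syslog` is made up for illustration:

```shell
# Copy the in-memory syslog to a location that survives a reboot.
# Assumptions: default source /var/log/syslog, default destination /boot
# (the unRAID flash drive). Both can be overridden via arguments.
save_syslog() {
    src="${1:-/var/log/syslog}"
    dest_dir="${2:-/boot}"
    # Timestamp the copy so earlier captures are not overwritten.
    dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    cp "$src" "$dest" && echo "$dest"
}

# Typical use from a console or telnet session, before rebooting:
#   save_syslog
```

The function prints the path of the copy, which can then be zipped and attached to a post.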
Yarok Posted September 2, 2015
Hi Rob, thanks. I thought about the syslog, but we had a power outage very recently which forced the reboot. I did not think any further and rebooted manually afterwards, though. I've attached the syslog file.
For the SMART report, I think I made a mistake in my command line somewhere, since I get "/dev/sda: Unknown USB bridge [0x0951:0x1643 (0x100)]", and I'm not sure what I should do next.
Attachment: syslog.txt
trurl Posted September 2, 2015
Looks like sda is your flash drive. We need a SMART report from the drive that failed, if you can get it. And since you don't often check on things, it would be good to get a SMART report for all the other drives as well, since they will be needed for rebuilding.
Also, your cache drive looks like it is using IDE mode. Is it an IDE drive?
unRAID should be emulating the missing drive. Can you see its contents?
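Collecting a report for every drive can be done in one pass. A rough sketch, assuming `smartctl` (from smartmontools) is available on the server and that the array drives are sdb through sdf as in this thread; the function name `smart_report_all` and the output file names are illustrative only:

```shell
# Write one SMART report per device into an output directory.
# Assumption: smartctl is installed; device names are passed by the caller.
smart_report_all() {
    out_dir="$1"
    shift
    for dev in "$@"; do
        name=$(basename "$dev")
        # smartctl returns a non-zero status for many non-fatal conditions,
        # so keep going and capture whatever it printed, errors included.
        smartctl -a "$dev" > "$out_dir/smart-$name.txt" 2>&1 || true
    done
}

# Typical use, skipping sda because it is the USB flash drive here:
#   smart_report_all /boot /dev/sd[b-f]
```

A USB flash drive such as sda usually produces the "Unknown USB bridge" message seen above, so leaving it out of the device list avoids a confusing report.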
Yarok Posted September 2, 2015
I've attached a SMART report for every drive from sda to sdf. That makes six drives, though I should see seven of them (one parity plus six).
I can actually see the contents of the missing Disk1 drive if I access the server via Windows Explorer and type only \\servername.
The "not installed" drive remains so even if I stop the array.
What can I do next?
Attachment: Archive.zip
trurl Posted September 2, 2015
I would wait until everything is back to normal before considering an upgrade to v6.
Read this wiki page: "Replace a failed disk", including the swap-disable procedure.
Yarok Posted September 2, 2015
Okay, thanks. So my drive is definitely dead then. What I find confusing is that the webGui shows the drive as "not installed" and not as "disabled".
I'm not sure about the procedure to find the dead drive, though: should I unplug every drive and reconnect them one by one? Or should I stop the array, power down, remove one drive, power up and start the array, then see if the drive I removed is the defective one and, if not, do it all over again?
RobJ Posted September 2, 2015
Your syslog shows the onboard SATA controller supports 6 SATA ports, but one is unusable, so you only have 5 SATA ports, all in use (sdb through sdf). It also shows an ITE821 IDE controller, supporting both ide0 and ide1, which would allow 4 IDE drives, at hda through hdd. The syslog only shows hdb connected (your Cache drive), and it had some issues during initialization, which is a little suspicious. So the only way the missing drive could be connected is as an IDE drive, either on the same cable as the Cache drive (at hda) or on another IDE cable as hdc or hdd. Do you have a record of what it used to be, and does this accord with your understanding (Disk 1 was an IDE drive)?
Currently, Disk 1 is being emulated, so its data is not lost. But it will be hard to rebuild without connecting an IDE drive of the right size!
Yarok Posted September 2, 2015
Okay, so what I can tell you is that I only have one IDE drive, and that is my 250 GB cache drive. All six other drives are SATA: one is the 2 TB parity, and the other five are the usable data drives. Disk 1 has always been a SATA drive.
My setup is still the same as the one I posted in 2013, except that it now has two extra SATA drives. As you can see in the pic, only the bottom one is an IDE drive.
trurl Posted September 2, 2015
If the drive is so dead it isn't even spinning, you might try the screwdriver "stethoscope". Spin up all drives, then put the handle of a screwdriver in your ear and the tip on each drive, and see if you can tell whether one drive isn't spinning.
itimpi Posted September 2, 2015
Yarok wrote: "Okay thanks. So my drive is definitely dead then. What I find confusing is that the webGui shows the drive as 'not installed' and not as 'disabled'."
The drive would show up as disabled if it was seen at the physical level but was no longer being used by unRAID because of an earlier write error. The fact that it is showing as "not installed" suggests it is not being seen at the physical level. Is it seen at the BIOS level?
Yarok wrote: "Not sure about the procedure to find the dead drive though: should I unplug every drive and reconnect them one by one? Or should I stop the array, power down, remove one drive, power up and start the array, then see if the drive I removed is the defective one and if not, do it all over again?"
The first thing to do is to make sure you know the serial numbers of all the drives that are shown in the web GUI. That would at least mean that, in the worst case, you can identify them by removing them and looking at the serial number on the labels.
In the short term, if you do not want to remove the drives for examination, you could power down, disconnect the cable to one drive, and power back up to see which drive has just disappeared. Note the physical position against your table of drive serial numbers created earlier, and then repeat until you know exactly which drive is in which position. You should be able to identify the 'defective' drive by elimination.
Note that it is always possible that you have simply disturbed the cabling to the drive marked as not installed and that this is causing it to not be seen. It is always worth checking whether it is being seen at the BIOS level: if the BIOS does not see it, unRAID will not see it.
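The table of serial numbers mentioned above can also be built from the command line. A minimal sketch, assuming the system populates /dev/disk/by-id with `ata-<model>_<serial>` symlinks (common on Linux, but not verified for this particular unRAID 5.0 install); the function name `drive_serial_table` is hypothetical:

```shell
# Print "device<TAB>model_serial" for each whole disk listed under by-id.
# Assumption: entries look like ata-<model>_<serial> and point at sdX/hdX.
drive_serial_table() {
    by_id="${1:-/dev/disk/by-id}"
    for link in "$by_id"/ata-*; do
        [ -e "$link" ] || continue
        # Skip partition entries such as ...-part1; only whole disks matter.
        case "$link" in *-part[0-9]*) continue ;; esac
        dev=$(basename "$(readlink -f "$link")")
        id=$(basename "$link" | sed 's/^ata-//')
        printf '%s\t%s\n' "$dev" "$id"
    done
}

# Typical use:
#   drive_serial_table
```

Each line pairs a device name with the model and serial string printed on the drive label, so a drive that disappears after a cable is pulled can be matched to its physical position.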
RobJ Posted September 3, 2015
RobJ wrote: "Your syslog shows the onboard SATA controller supports 6 SATA ports, but one is unusable, so you only have 5 SATA ports, all in use (sdb through sdf)."
Yarok wrote: "All six other drives are SATAs, one being the 2 TB parity, then the five usable drives themselves. Disk 1 has always been a SATA drive."
OK, then there's a problem with the third motherboard SATA port, as currently it's a dead port. The syslog is showing it as unusable, not as an empty port. It's marked as a 'DUMMY', which is usually the designation for a SATA port with no actual connector hooked up to it. I wonder if there was a bad electrical spike associated with that power outage you mentioned, one that damaged the motherboard or messed up the BIOS configuration.
Go into your BIOS settings, make notes of any changes you have in there, reset them all to defaults, and save. Then go back in and put your changes back. We're trying to get the system to reset, to see if the onboard SATA controller can again see all 6 ports.
One good thing: Disk 1 should be unaffected. Its data is good, just inaccessible until we get a good SATA port to connect it to.
If we can't get the third SATA port working, you are going to have to add a SATA controller card. I can recommend the Asmedia ASM1062 cards: inexpensive ($11 to $15), PCIe x1, 2 ports, fast. But you will probably want whatever you can find that's quick to obtain, within your price range.
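The dead port can be spotted in the syslog directly. A small sketch, assuming the kernel's libata driver reports such ports with lines of the form `ataN: DUMMY` and that the log is at /var/log/syslog; the function name `dummy_ports` is made up for illustration:

```shell
# List SATA ports that the kernel has flagged as DUMMY in a log file.
# Assumption: the relevant lines contain text like "ata3: DUMMY".
dummy_ports() {
    log="${1:-/var/log/syslog}"
    grep -oE 'ata[0-9]+: DUMMY' "$log" | cut -d: -f1 | sort -u
}

# Typical use:
#   dummy_ports            # reads the live syslog
#   dummy_ports syslog.txt # reads a saved copy
```

An empty result would mean no port is flagged this way, in which case a missing drive is more likely a cabling or drive problem than a disabled port.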
Yarok Posted September 3, 2015
Thanks again, you're all being very helpful. I'll have to check out the BIOS tonight, so I'll get back to you on that question.
In the event that there actually is a damaged SATA port on my motherboard, and since my IDE cache is plugged into the IDE/PCIe controller, I think my next moves should be:
- to purchase a SATA controller card (your Asmedia looks sweet but it will be hard to find in Belgium - I'll try to get a SATA III card anyway, which should set me back around 40 EUR);
- to purchase or recycle another SATA cache drive. What should I look for in a new cache drive? Speed? Cache size?
Is this all correct?
Cheers,
Patrick
CHBMB Posted September 3, 2015
trurl wrote: "If the drive is so dead it isn't even spinning, you might try the screwdriver 'stethoscope'. Spin up all drives, then put the handle of a screwdriver in your ear and the tip on each drive and see if you can tell if one drive isn't spinning."
That's a neat trick, going to remember that. Gotta make sure I get the ends the right way around though.
Yarok Posted September 3, 2015
Okay, so I've managed to check the BIOS this evening. As Rob already confirmed, port SATA3 also shows as "Not present" there, as you can see from the screenshot I took of the BIOS itself.
I guess this leaves me with the PCIe/SATA controller option. I found the info I needed regarding a new cache drive on the wiki, so I'll be looking for a small and speedy drive with a decent cache memory.
With this option in mind, what is the correct sequence of operations? I think I should proceed as follows:
- unplug the cache drive and replace the IDE controller with the SATA controller;
- disconnect disk1 from SATA3 and plug it into the 1st port of the controller card;
- plug a new cache drive into the 2nd port of the controller card.
Is that it?
Cheers
RobJ Posted September 4, 2015
I get things mixed up sometimes between different users' issues, and I can't remember why the Cache drive has to be dealt with now. I would forget about the Cache drive for now and just deal with Disk 3; later you can decide what you want to do with the Cache drive. One issue at a time always seems safest. But I apologize if I've forgotten something.
Yarok Posted September 4, 2015
No problem Rob, you've already helped me a great deal as it is.
The thing is, my IDE cache drive is currently plugged into a controller card (because my mobo only has 6 SATA ports, minus one now). If I deal with Disk 3 by plugging it into a SATA controller, I lose my cache completely, since I no longer have anywhere to plug it in. Hence my questions about a new cache drive.
Yarok Posted September 5, 2015
Hi guys,
Yesterday I got the PCIe/SATA controller card I could find (this one), installed it this morning and plugged Disk2 in, and it does not seem to be detected at all.
Did I purchase the wrong type of controller? Do I have to install a particular driver and, if so, how?
Thanks for your help!
itimpi Posted September 5, 2015
There have been problems with disk controllers using some Marvell chipsets. I'm not sure if that is one of those.
As a check, can the drive be seen at the BIOS level? If not, then there will be no chance of unRAID seeing it. It is always possible that there is a cabling issue, or that the drive itself is faulty.
RobJ Posted September 5, 2015
Yarok wrote: "Yesterday I got the PCIe/SATA controller card I could find (this one), installed it this morning, plugged Disk2 in and it does not seem to be detected at all."
What I would like to see is the output of the lspci command, but I'm not sure your unRAID version has it.
Yarok wrote: "Did I purchase the wrong type of controller? Do I have to install a particular driver and if so, how?"
I couldn't find definitive info on the chipset it has, but it's probably a Marvell 9128, which is on the Marvell bug list. It will work fine in v5, so it's at least a short-term solution for you, but it is likely not to work in v6 unless you turn IOMMU off, which may limit your ability to use the v6 virtualization features.
When you say the drive wasn't detected, are you saying it did not appear in the drop-down list?
There's no problem with the driver; it should already be included. Upgrading to v5.0.6 *might* be better. At some point, it would be even better to upgrade to v6.
Yarok Posted September 6, 2015
Unless I'm not reading my BIOS correctly (although there is not much to look at), I can't see the drive connected, and no, it does not appear in the drop-down list either. It remains as "not installed", and when the array is stopped, the drop-down list says "Not assigned" without any other option.
I gave the lspci command a try, but when I typed it I got this:
    Linux 3.9.6p-unRAID.
    root@DEIMOS:~# lspci -v
    -bash: lspci: command not found
Did I do this correctly? Should I just upgrade to v6, then?
[edit] I found an ASM1061 on eBay. Can I get this one instead of the ASM1062, which I'm having a hard time finding?
RobJ Posted September 7, 2015
The ASM1061 is what I have, and it works great. I'm starting to see it referred to as a 1062, so I thought my memory was bad. It appears to be one card (same chipset) called by both numbers, so either I'm confused or something is! When I have time, I'll try to un-confuse myself!
Apparently, lspci wasn't included with unRAID until v6.
I don't know why the Marvell card did not work for you, and I can't think of any other reasons. Possibly a different PCI slot would work?
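Without lspci, the kernel's sysfs tree still exposes the raw PCI IDs. A rough sketch, assuming /sys/bus/pci/devices is available on this kernel and that each device directory holds `class`, `vendor` and `device` files; the function name `list_storage_controllers` is hypothetical. PCI class codes starting with 0x01 are mass-storage controllers, and to my knowledge vendor 0x1b4b is Marvell and 0x1b21 is ASMedia:

```shell
# List PCI mass-storage controllers (class 0x01xxxx) with their raw IDs.
# Assumption: sysfs layout with class/vendor/device files per PCI device.
list_storage_controllers() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case "$class" in
            0x01*)
                printf '%s class=%s vendor=%s device=%s\n' \
                    "$(basename "$dev")" "$class" \
                    "$(cat "$dev/vendor")" "$(cat "$dev/device")"
                ;;
        esac
    done
}

# Typical use:
#   list_storage_controllers
```

If the add-in card does not show up here at all, the kernel has not enumerated it, which points at the slot, the card or the BIOS rather than at a driver.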
Yarok Posted September 7, 2015
Hey Rob,
A different PCI slot is not an option for me; there is only one PCIe slot on my mobo. I'll get the ASM1061: it's cheaper and it rules out the Marvell bug altogether.
Can I upgrade to v6 with this ongoing issue?
RobJ Posted September 7, 2015
Yarok wrote: "Can I upgrade to v6 with this ongoing issue?"
Well, you could, but it's never a good idea to try dealing with more than one problem at a time, and the v6 upgrade is BIG. Lots of decisions and a learning curve. It might be better to get the system working first, with all drives accessible.
You don't need lspci. It's just a tool that makes it easy to see model info on PCI devices, and it would have been helpful with the Marvell card, which didn't work anyway!
Yarok Posted September 7, 2015
OK, just as I thought. Well, I'll leave this thread pending until I receive the Asmedia card. I'm counting on a couple of weeks, just so the package has time to travel halfway around the globe.
Cheers!
This topic is now archived and is closed to further replies.