May 14, 2017
So here's the gist of my build right now - https://pcpartpicker.com/list/vcbmKZ
I have the Pro license, so I have plenty of extra slots. Right now I have one 8TB parity drive, one 8TB data drive, and eight 4TB data drives (all Seagate 3.5-inch SATA drives). Whenever I try to add another drive, it doesn't show up in unRAID when I bring down the array and try to assign it. I haven't gone through the BIOS yet to see if it's detected there (since it can be difficult to connect a monitor to it), but I was curious if anyone knew what was going on here.
Is the chip limiting me from adding another drive because of PCIe lanes? (I read that some drives, like NVMe drives, take up these lanes, so I thought maybe that happened with SATA drives too.) Or could there be an issue with the power supply I'm using? I've had to power some drives on Molex adapters, so maybe the power supply just can't handle any more? I don't think it's the controller card, as I have tried moving around the connected SATA cables.
I would love to hear your thoughts. Any insight would be greatly appreciated.
May 14, 2017 Author
1 minute ago, johnnie.black said: Diagnostics?
I'm in the middle of a data rebuild, but here's the log file: tower-diagnostics-20170514-1104.zip
May 14, 2017
I'm missing something. You have 10 array HDDs, and the server booted with 11 HDDs connected. Then disk8 dropped offline (or was pulled from the server) and you used the spare disk to begin a rebuild. So all 12 SATA ports were in use and detected by unRAID (11 HDDs plus the cache SSD), and I don't see what the problem is.
May 14, 2017 Author
This issue with not being able to add another disk to the array has been going on for months, but I have just ignored it.
So, earlier today I installed a Rosewill 3 x 5.25-Inch to 4 x 3.5-Inch Hot-swap SATAIII/SAS Hard Disk Drive Cage - Black (RSV-SATA-Cage-34) in my tower. I put the same disks I had in there and booted it up. It said disk 8 was bad (which it shouldn't have been), so I started looking for answers. I saw an unRAID wiki page that said to check the cables and then power it back on. unRAID said the disk was still bad, so I powered it down and moved disk 8 to another slot of the drive cage I had just installed. I booted it back up and it still said disk 8 was bad.
So I figured, since I have an extra hard drive, I would put that in and just use it (in case the original drive did go bad during the transition). I put the new drive in (alongside the original disk 8), unassigned the original disk 8, and assigned the new drive, but unRAID said the new disk was bad. I brought the array down, swapped back to the original disk 8, and was going to leave the new disk unassigned.
But it's weird, because now unRAID is saying the original disk 8 is okay and the data is being rebuilt from the parity drive. However, the newly installed drive doesn't show up anymore. It's not there as an unassigned disk, and with the array down I can't add it to a new slot.
So I'm back to how things were, with the original disk 8 working in the newly installed drive cage, but unRAID not seeing that I have another drive connected. There's no unassigned data drive on the Main tab, so when I bring down the array and click on the next unassigned data drive slot, there isn't an available drive to add.
But again, this issue of unRAID not seeing this extra disk has been happening for months. I just ignored it and decided to start upgrading to 8TB drives to increase capacity.
I'm just curious what would be causing my computer/unRAID to not recognize that last drive. unRAID should be seeing 12 drives: one cache SSD, one parity drive, and 10 data drives. This is what I have connected to my computer, using all 8 SATA ports on my motherboard plus the 4 from the controller card.
Edited May 14, 2017 by Endda
May 14, 2017
unRAID was seeing the 12 devices, but I was missing something: I skimmed the log too fast and didn't notice what you actually did.
So, the array booted with 12 devices total:
    May 14 10:05:20 Tower emhttp: ST8000AS0002-1NA17Z_Z840NQR1 (sdm) 7814026532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z303XZQK (sdj) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_S300Y66M (sdk) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z304MPSG (sdg) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z3078KN2 (sdh) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_S301P7TN (sdd) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_S301P7GV (sde) 3907018532
    May 14 10:05:20 Tower emhttp: Samsung_SSD_850_EVO_250GB_S21NNXAGA97348R (sdb) 244198552
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z304LNBD (sdf) 3907018532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z300M6EX (sdc) 3907018532
    May 14 10:05:20 Tower emhttp: ST8000AS0002-1NA17Z_Z840NQT2 (sdl) 7814026532
    May 14 10:05:20 Tower emhttp: ST4000DM000-1F2168_Z30307XE (sdi) 3907018532
You then replaced disk8 with the spare disk, and the new disk8 was disabled immediately after array start:
    May 14 10:06:41 Tower kernel: ata8: hard resetting link
    May 14 10:06:41 Tower kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    May 14 10:06:41 Tower kernel: ata8.00: both IDENTIFYs aborted, assuming NODEV
    May 14 10:06:41 Tower kernel: ata8.00: revalidation failed (errno=-2)
    May 14 10:06:41 Tower kernel: ata8.00: disabled
You then replaced disk8 again with the old disk, and that one is rebuilding at the moment.
So at boot time all disks were detected; now one isn't, because the spare disk was disabled. You'll need to reboot (or power cycle) to get it back.
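For anyone who wants to run the same check on their own diagnostics, here is a minimal Python sketch of the log reading described above. It assumes the syslog lines look exactly like the ones quoted in this post; the function names `detected_devices` and `disabled_links` and the regular expressions are hypothetical helpers made up for illustration, not part of unRAID.

```python
import re

# Sample lines copied from the diagnostics quoted in this thread.
SAMPLE_LOG = """\
May 14 10:05:20 Tower emhttp: ST8000AS0002-1NA17Z_Z840NQR1 (sdm) 7814026532
May 14 10:05:20 Tower emhttp: Samsung_SSD_850_EVO_250GB_S21NNXAGA97348R (sdb) 244198552
May 14 10:06:41 Tower kernel: ata8: hard resetting link
May 14 10:06:41 Tower kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 14 10:06:41 Tower kernel: ata8.00: both IDENTIFYs aborted, assuming NODEV
May 14 10:06:41 Tower kernel: ata8.00: revalidation failed (errno=-2)
May 14 10:06:41 Tower kernel: ata8.00: disabled
"""

# Assumed line shapes, inferred only from the excerpts above.
DEVICE_RE = re.compile(r"emhttp: (\S+) \((sd[a-z]+)\) (\d+)$")
DISABLED_RE = re.compile(r"kernel: (ata\d+)\.\d+: disabled$")


def detected_devices(log_text):
    """Map each device node (e.g. 'sdm') to the model/serial string logged by emhttp."""
    devices = {}
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line.strip())
        if match:
            devices[match.group(2)] = match.group(1)
    return devices


def disabled_links(log_text):
    """Return the ATA links (e.g. 'ata8') that the kernel reported as disabled."""
    links = []
    for line in log_text.splitlines():
        match = DISABLED_RE.search(line.strip())
        if match and match.group(1) not in links:
            links.append(match.group(1))
    return links


if __name__ == "__main__":
    print(len(detected_devices(SAMPLE_LOG)), "devices listed at boot")
    print("disabled links:", disabled_links(SAMPLE_LOG))
```

Run against a full syslog, the device count shows what was detected at boot, and any link in the second list points at a port whose disk dropped after that.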
May 14, 2017 Author
Thank you for looking into this for me. It will take about 8 hours to do the data rebuild. Once that is done, my course of action will be to reboot the machine and then hopefully assign that disk to the array. If it doesn't show up again, should I grab a new diagnostic log and attach it to a reply to this thread?
Thank you again for looking into the issue.
May 14, 2017
Yes. Also grab the diags before and after trying to assign it, because if it drops offline again there will be no SMART report for that disk, like in the current diags.
Edited May 14, 2017 by johnnie.black
May 15, 2017 Author
19 hours ago, johnnie.black said: Yes, also grab the diags before and after trying to assign it, as if it drops offline again there will be no SMART report for that disk like in the current diags.
So after the rebuild, I did a full restart of the computer. When it booted back up, again, there wasn't an unassigned disk for me to put into slot 10. This is how things have been for months, and I just can't understand why it won't recognize that disk. This was even a disk that I used to have in the machine before I upgraded to 8TB drives.
Here's the diagnostic file, taken right after a clean boot. Since the disk wasn't there for me to try and assign, I didn't know when or if I should grab a second one.
tower-diagnostics-20170515-0748.zip
May 15, 2017
There's a problem with the disk ST4000DM000-1F2168_Z30307XE or the port where it's connected. This is the same disk that got immediately disabled earlier, and now it's not being correctly detected by Linux, so it's not unRAID, it's hardware related.
Quote
    May 15 07:48:24 Tower kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    May 15 07:48:24 Tower kernel: ata8.00: both IDENTIFYs aborted, assuming NODEV
You can try swapping places with another disk to see if the problem follows the disk or stays with the port, but note that if it stays with the port, the array disk you connect there may be disabled (or not found).
Edited May 15, 2017 by johnnie.black
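The swap test suggested above comes down to a small truth table. Here is a hedged Python sketch of how one might read the result; the function name `interpret_swap` and the wording of the outcomes are made up for illustration and are not an unRAID feature.

```python
def interpret_swap(suspect_disk_fails_in_new_port, good_disk_fails_in_old_port):
    """Interpret a swap between the suspect disk and a known-good disk.

    suspect_disk_fails_in_new_port: the suspect disk is still dropped or not
        detected after being moved to the other disk's port.
    good_disk_fails_in_old_port: the known-good disk is now dropped or not
        detected on the port the suspect disk used to occupy.
    """
    if suspect_disk_fails_in_new_port and not good_disk_fails_in_old_port:
        return "problem follows the disk"
    if good_disk_fails_in_old_port and not suspect_disk_fails_in_new_port:
        return "problem stays with the port (controller, cable, or slot)"
    if suspect_disk_fails_in_new_port and good_disk_fails_in_old_port:
        return "inconclusive: both fail, so more than one thing may be wrong"
    return "inconclusive: nothing fails now, so the fault may be intermittent"


if __name__ == "__main__":
    # Example: the suspect disk works elsewhere, but the good disk drops on ata8's port.
    print(interpret_swap(False, True))
```

Only the first two outcomes actually narrow things down; the other two mean the test should be repeated or another component swapped.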
May 15, 2017 Author
So I'm guessing the connector on the controller card is bad? I could have sworn I tried swapping the cables connected to those SATA ports when I first came across the issue, but I may not have. It's certainly not the disk. I was using it not even a week ago as either a data or parity disk.
Would this not be an issue with PCIe lanes or anything for the Intel chip I'm using?
I guess my next step will be to buy a new controller card and connect that drive to a different port on the card.
Thanks again for the help. I figured it was a hardware issue, but didn't know what the issue was. I really appreciate it.
May 15, 2017
Just now, Endda said: Would this not be an issue with PCIe lanes or anything for the Intel chip I'm using?
No, it's a hardware problem: disk, controller or cables.
May 15, 2017 Author
See, it's so weird. I just opened up the computer to install a new hard drive cage at the bottom. I didn't mess with any connectors on the drives, but I did have to unplug the power cables connecting to the power supply. I installed the HDD cage, plugged the power supply cables back in, and started the computer back up. Now the new hard drive is showing up as unassigned. I assigned it to a new slot and unRAID began to clear it, but then it started producing errors on the Main tab.
Here are diagnostic logs from before I shut down the array (to assign the disk) and after I saw it having errors.
It shouldn't be the cables, since I had this issue before and this time used cables from the first hard drive cage I had installed. It shouldn't be the disk, since again, I didn't have any issues with it when I was using it a week ago. It is possible, but the likelihood is slim. So I'm thinking it has to be that controller port, unless you can find something else in these logs. I'll be buying a new controller card today, so it'll be easy to check once that is delivered.
tower-diagnostics-20170515-1212-BEFORE.zip
tower-diagnostics-20170515-1218-AFTEr.zip
Took the array down to unassign it and now the drive disappeared again, lol
Edited May 15, 2017 by Endda
May 15, 2017
I can only tell you that the same disk got disabled again:
    May 15 12:14:58 Tower kernel: ata8.00: both IDENTIFYs aborted, assuming NODEV
    May 15 12:14:58 Tower kernel: ata8.00: revalidation failed (errno=-2)
    May 15 12:14:58 Tower kernel: ata8.00: disabled
It's like the disk is being disconnected, but I can't tell you what's causing it. You need to start replacing one thing at a time to rule them out: it's either the disk, the controller, or the cables/backplane.
May 15, 2017 Author
1 minute ago, johnnie.black said: I can only tell you that the same disk got disabled again:
    May 15 12:14:58 Tower kernel: ata8.00: both IDENTIFYs aborted, assuming NODEV
    May 15 12:14:58 Tower kernel: ata8.00: revalidation failed (errno=-2)
    May 15 12:14:58 Tower kernel: ata8.00: disabled
It's like the disk is being disconnected, but I can't tell you what's causing it. You need to start replacing one thing at a time to rule them out: it's either the disk, the controller, or the cables/backplane.
That's cool. Thanks again (again!). A new controller card will be here on Wednesday.