tential Posted January 19, 2018 Author
24 minutes ago, trurl said: Go to Settings - Notifications. Turn on Help. Configure email. The default settings I think should be good enough to send you an email when you have disk issues.
Got that set up and got the card/cables ordered! Now I wait till this drive rebuilds, I guess.
tential Posted January 20, 2018 Author (edited)
Done rebuilding, 0 errors. Now my write speeds are extremely slow. I'm guessing this is partly because I have parity for the first time and partly because of a suboptimal HDD setup through my mobo? I'm getting write/transfer speeds of 11 MB/s no matter the PC. Is that too slow, or to be expected?
Edit: Replugged the cable and got full gigabit speeds after running ethtool. Not sure why that happened; weird.
Edit2: Speeds are now at ~42 MB/s.
Edited January 20, 2018 by tential
Current transfer speeds
JorgeB Posted January 20, 2018
5 hours ago, tential said: Edit2: Speeds are now at ~42
That's about right for normal write mode. You might get a bit more with turbo write enabled, but turbo write is also slowed down by controller bottlenecks, the same as parity checks and disk rebuilds.
tential Posted January 21, 2018 Author (edited)
OK, well, I'm all set up! Now I'm just struggling with Docker mappings. It's all set up right... but it's not! I'll create a new thread for that issue, I guess. Still waiting on the card delivery, of course. I'm sure that will come with its own host of problems. This has not been easy at any point!
Edited January 21, 2018 by tential
tential Posted February 6, 2018 Author (edited)
OK, so I had everything working. I ended up just reusing the same 5TB drive as disk 10; I was too tired to get a new one out. I've been running for the last 15 days or so, but the new controller card came in and I was close to out of space anyway, so I set that up. Everything was recognized and running fine. tower-diagnostics-20180206-1210.zip
I then went to sleep, woke up, and again Disk 10 was not recognized/unavailable. I'm guessing that Disk 10 isn't actually dead. At this point it's 100 GB away from being full, so I'd like to keep it if it's working and not throw away ~$100 (even though I have lots of extra drives, I am of course building a second server now that I'm out of drive bays).
That's my diagnostic from when I first noticed something was wrong. I thought I had downloaded a second diagnostic, but I guess not. I also now have an error on Drive 9? Ugh, racking my brain here. I was hoping I could add this card smoothly, and I almost did!!!!
My guess is I just need to AGAIN unplug and replug everything. The wiring in that case is all hell. Anyway, I doubt this is a real drive failure; more likely the user (me) did something when I installed the new card.
tower-diagnostics-20180206-1303.zip (current diagnostic)
Edit: Maybe I should just pick up my Xeon server upgrade now and get this over with before I knock something else loose/screw something else up.
Edited February 6, 2018 by tential
tential Posted February 6, 2018 Author
Reseated everything and booted up: tower-diagnostics-20180206-1409.zip
This has to be an issue with me, right? A different drive is now having issues too? Ugh. I just need to take this all apart and redo the whole thing before I make this worse, don't I?
JorgeB Posted February 6, 2018
You're having problems with several disks at the same time, disk9 and disk10, and both are on the same Marvell controller, so maybe try reseating it. Parity is on the LSI controller and SMART looks fine. Maybe power issues?
tential Posted February 6, 2018 Author (edited)
43 minutes ago, johnnie.black said: Your having problems with various disks at the same time, disk9 and 10, these are both on the same Marvell controller, so maybe try reseating it, parity is on the LSI controller and SMART looks fine, maybe power issues?
Thanks a lot. Disk 9 JUST started having problems, so I think your point about them both being on the Marvell controller helps: that card sits right next to the LSI controller, and when I was setting it up, its cables were hard to avoid touching while working on the LSI controller. I recabled again and it looks a lot better in there. I'm starting it up again now.
Could it be power issues? I only have a 500 W Corsair CX. But at the same time, I've been running this for the last 15 days; I just didn't have SATA cables on the 3 new HDDs. Would that matter at all? A modular PSU upgrade sounds nice. The issues only started when I installed the LSI controller and started clearing the 3 new drives. I MUST have knocked something loose on drives 9/10 while installing the LSI controller; that sounds reasonable, right?
Should I run a diagnostic before starting the array? What steps should I be taking here? I feel like I've done more harm than good by restarting constantly, since my head hasn't been clear while working on this.
I started up, but my rebuild speed is incredibly slow at 2 MB/s. tower-diagnostics-20180206-1548.zip is the new one now.
Edit: Speed is up to 120 MB/s, yay? So I guess I've got another 20+ hours ahead of me: 12 hours for this rebuild, another 8 for adding the new drives. Hopefully everything checks out! What should I do about re-enabling my first parity drive? I've got a pretty good idea exactly which cables were loose now that you've mentioned what was connected to what. If I'm able to get everything working, this thread seriously should just be called "What happens when you're terrible at cabling power/SATA cables..."
Edited February 7, 2018 by tential
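The "12 hours for this rebuild" estimate is consistent with simple arithmetic: disk size divided by average speed. Below is a minimal Python sketch, assuming decimal units (1 TB = 10**12 bytes, 1 MB/s = 10**6 bytes per second), a constant average speed, and that the pass covers roughly the 5TB of the disk being rebuilt. Real rebuilds slow down toward the inner tracks, so treat the result as a ballpark only. The function name is hypothetical.

```python
def rebuild_eta_hours(disk_size_tb: float, speed_mb_s: float) -> float:
    """Rough duration of one rebuild/sync pass at a constant average speed.

    Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a
    speed that never changes, which real rebuilds do not do.
    """
    if speed_mb_s <= 0:
        raise ValueError("speed must be positive")
    total_bytes = disk_size_tb * 10**12
    seconds = total_bytes / (speed_mb_s * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    # 5 TB disk rebuilt at 120 MB/s
    print(f"{rebuild_eta_hours(5, 120):.1f} h")
    # 8 TB parity sync crawling at 0.5 MB/s (500 KB/s)
    print(f"{rebuild_eta_hours(8, 0.5) / 24:.0f} days")
```

At 120 MB/s a 5TB disk works out to about 11.6 hours. At 500 KB/s an 8TB parity sync would take roughly 185 days, the same order of magnitude as the 212-day estimate mentioned later in the thread.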
JorgeB Posted February 7, 2018
There are still ATA errors on disks 9 and 10. If all the cabling was checked, it may be a problem with the controller. A CX500 with 15 disks is pushing it a little; I would use 550-600 W, but these ATA errors are almost certainly unrelated to the PSU.
tential Posted February 7, 2018 Author
4 hours ago, johnnie.black said: Still ATA errors on disks 9 and 10, if everything was checked maybe a problem with the controller, a CX500 with 15 disks is pushing it a little, I would use 550/600W, but these ATA errors are slmost certainly unrelated to the PSU
Uh oh, that's worrisome. I just don't see why Disk 9 would have issues; I haven't even been using it. I literally only got errors after I touched something. Disk 9 is still showing as available and working in my array. Disk 10 is showing even more errors. Could I have damaged these two drives at this point?
tower-diagnostics-20180207-0445.zip Latest diagnostic.
JorgeB Posted February 7, 2018
These ATA errors are likely controller-related and are the reason you saw the slow rebuild at the start. It recovered, but these errors are not normal and can result in the disk being dropped. In the last diags there are also similar errors on ATA7, another disk on the same Marvell controller:
Feb 6 20:48:07 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 6 20:48:07 Tower kernel: ata7.00: failed command: IDENTIFY DEVICE
Feb 6 20:48:07 Tower kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 18 pio 512 in
Feb 6 20:48:07 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 6 20:48:07 Tower kernel: ata7.00: status: { DRDY }
Feb 6 20:48:07 Tower kernel: ata7: hard resetting link
Feb 6 20:48:07 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 6 20:48:07 Tower kernel: ata7.00: NCQ Send/Recv Log not supported
Feb 6 20:48:07 Tower kernel: ata7.00: NCQ Send/Recv Log not supported
Feb 6 20:48:07 Tower kernel: ata7.00: configured for UDMA/133
Feb 6 20:48:07 Tower kernel: ata7: EH complete
Then more errors on ATA9:
Feb 6 20:54:03 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 6 20:54:03 Tower kernel: ata9.00: irq_stat 0x40000001
Feb 6 20:54:03 Tower kernel: ata9.00: failed command: READ DMA EXT
Feb 6 20:54:03 Tower kernel: ata9.00: cmd 25/00:40:98:39:cf/00:05:ed:00:00/e0 tag 10 dma 688128 in
Feb 6 20:54:03 Tower kernel: res 53/40:00:a8:3e:cf/00:00:ed:00:00/00 Emask 0x8 (media error)
Feb 6 20:54:03 Tower kernel: ata9.00: status: { DRDY SENSE ERR }
Feb 6 20:54:03 Tower kernel: ata9.00: error: { UNC }
Feb 6 20:54:03 Tower kernel: ata9.00: NCQ Send/Recv Log not supported
Feb 6 20:54:03 Tower kernel: ata9.00: NCQ Send/Recv Log not supported
Feb 6 20:54:03 Tower kernel: ata9.00: configured for UDMA/33
Feb 6 20:54:03 Tower kernel: sd 10:0:0:0: [sdi] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 6 20:54:03 Tower kernel: sd 10:0:0:0: [sdi] tag#10 Sense Key : 0x3 [current]
Feb 6 20:54:03 Tower kernel: sd 10:0:0:0: [sdi] tag#10 ASC=0x11 ASCQ=0x0
Feb 6 20:54:03 Tower kernel: sd 10:0:0:0: [sdi] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 ed cf 39 98 00 00 05 40 00 00
Feb 6 20:54:03 Tower kernel: print_req_error: I/O error, dev sdi, sector 3989780888
You need to get rid of that controller; all these errors resulted in this:
Feb 6 20:54:03 Tower kernel: md: disk9 read error, sector=3989780824
Feb 6 20:54:03 Tower kernel: md: recovery thread: multiple disk errors, sector=3989780824
Feb 6 20:54:03 Tower kernel: md: disk9 read error, sector=3989780832
Feb 6 20:54:03 Tower kernel: md: recovery thread: multiple disk errors, sector=3989780832
Feb 6 20:54:03 Tower kernel: md: disk9 read error, sector=3989780840
"md: recovery thread: multiple disk errors" is unRAID-speak for "there are errors on more disks than the current redundancy can correct; the rebuild/sync will continue, but there will be some (or a lot of) corruption."
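When reading a syslog like the one above, it can help to count libata error messages per ATA port to see which ports, and therefore which controller, are involved. Below is a minimal Python sketch, assuming kernel lines shaped like the ones quoted here. The keyword list and the function name are hypothetical choices, not something unRAID provides, and mapping an ataN port to a physical controller still has to be done separately.

```python
import re
from collections import Counter

# Matches the libata prefix in kernel lines such as
# "kernel: ata9.00: failed command: READ DMA EXT" or "kernel: ata9: reset failed, giving up".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Hypothetical choice of message fragments treated as "trouble"; not exhaustive.
TROUBLE = (
    "exception",
    "failed command",
    "error",
    "reset failed",
    "hard resetting link",
    "limiting SATA link speed",
)


def count_ata_trouble(lines):
    """Count suspicious libata messages per ATA port (ata7, ata9, ...)."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        # Count each matching line once, even if several keywords appear in it.
        if match and any(word in match.group(2) for word in TROUBLE):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb 6 20:48:07 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "Feb 6 20:48:07 Tower kernel: ata7: hard resetting link",
        "Feb 6 20:48:07 Tower kernel: ata7.00: configured for UDMA/133",
        "Feb 6 20:54:03 Tower kernel: ata9.00: failed command: READ DMA EXT",
        "Feb 6 20:54:03 Tower kernel: ata9.00: error: { UNC }",
        "Feb 7 06:24:00 Tower kernel: ata9: reset failed, giving up",
    ]
    for port, n in sorted(count_ata_trouble(sample).items()):
        print(port, n)
```

A raw count only says where errors cluster. Telling a bad controller from a failing disk still needs the error type (a timeout versus a media error with UNC) and the SMART report.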
tential Posted February 7, 2018 Author (edited)
I have two of those controllers. Can I swap the controller and also move it to a different PCI Express slot (in case that's the issue)? I think it's in a PCI Express x1 slot now, but I still have x16 GPU slots available.
Also, what should I do about the first parity drive? I'm guessing that was simply a cabling issue; how do I re-enable it?
Edited February 7, 2018 by tential
JorgeB Posted February 7, 2018
36 minutes ago, tential said: Can I swap that controller, and then move PCI Expres slots as well (in case that's the issue)?
You can try; cancel the current rebuild so it will start over.
37 minutes ago, tential said: Also what should I do about the first parity drive? I'm guessing that was simply a cabling issue for sure, how do I re-enable that?
Same as re-enabling a data drive; the difference is that it will be resynced instead of rebuilt. http://lime-technology.com/wiki/Troubleshooting#Re-enable_the_drive
tential Posted February 7, 2018 Author (edited)
3 minutes ago, johnnie.black said: You can try, cancel current rebuild so it will start over. Same as re-enabling a data drive, difference it it will be resynced instead of rebuilt. http://lime-technology.com/wiki/Troubleshooting#Re-enable_the_drive
The rebuild is already complete. So my first step, then: swap the Marvell controller and move it to a new PCI Express slot. Start the rebuild (resync?) again, but on the parity drive first so I have both parity drives working. Then on Drive 10?
Edited February 7, 2018 by tential
JorgeB Posted February 7, 2018
You can sync parity1 and rebuild disk10 at the same time, though you might want to sync just parity first after swapping the controller, to see if the issues continue.
tential Posted February 7, 2018 Author
tower-diagnostics-20180207-0625.zip That's my current diagnostic. I'm still having issues getting that same drive to show up.
JorgeB Posted February 7, 2018
The disk on ATA9 is not being detected, possibly a cable issue, or maybe power:
Feb 7 06:24:00 Tower kernel: ata9: softreset failed (1st FIS failed)
Feb 7 06:24:00 Tower kernel: ata9: softreset failed (1st FIS failed)
Feb 7 06:24:00 Tower kernel: ata9: limiting SATA link speed to 3.0 Gbps
Feb 7 06:24:00 Tower kernel: ata9: softreset failed (device not ready)
Feb 7 06:24:00 Tower kernel: ata9: reset failed, giving up
tential Posted February 7, 2018 Author
OK, recabled, and everything is showing up. I tried to sync the 8TB parity drive, but it's going super slow, currently at 500 KB/s. Maybe it will speed up? I left Disk 10 as emulated. The diagnostic is super slow and still hasn't finished. It says 212 days to finish the parity sync at this rate, and the unassigned drives section takes a second to populate when I go to the Main page. Should I cancel and do another diagnostic?
JorgeB Posted February 7, 2018
3 minutes ago, tential said: Should I cancel and do another diagnostic?
Probably best.
tential Posted February 7, 2018 Author
5 minutes ago, johnnie.black said: Probably best
tower-diagnostics-20180207-0752.zip OK, here is where I'm at now.
JorgeB Posted February 7, 2018
The errors on ATA9/10 persist, constantly; you'll need to get a different controller.
tential Posted February 7, 2018 Author
3 minutes ago, johnnie.black said: The errors on ATA9/10 persist, constantly, you'll need to get a different controller.
I've been using this controller the whole thread, and now two different drives on it are showing errors? What about the third drive connected to it? This is a 4-port controller, with the ATA9/10 drives showing errors, but there is a third drive connected, right? I still have the 2-port controller card; I can move those drives over to that card specifically.
Previously, I believe I had the SSDs wired to the controller card. I moved those to the mobo, and some drives from the mobo to the controller card. I'm just confused as to why I'm having issues with the controller card now after so much uptime. I'm just confused in general at this point.
JorgeB Posted February 7, 2018
It's strange. I was going to say try different disk models, but the other disk is the same model and has had no issues so far. So, checking the SMART reports again, disk9 is failing:
197 Current_Pending_Sector -O--C- 100 100 000 - 8
198 Offline_Uncorrectable ----C- 100 100 000 - 8
Disk10 dropped offline; maybe it's failing too. Power cycle the server and get new diags.
tential Posted February 7, 2018 Author
tower-diagnostics-20180207-0840.zip That's the latest.
JorgeB Posted February 7, 2018
Yep, it's failing too:
197 Current_Pending_Sector -O--C- 079 079 000 - 7112
198 Offline_Uncorrectable ----C- 079 079 000 - 7112
Kind of my bad for not checking that earlier, but two disks with issues at the same time on the same controller made me suspect the controller, especially because some Marvell controllers tend to act up in unRAID.
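The two SMART attributes quoted in this thread can also be pulled out of a report programmatically. Below is a minimal Python sketch, assuming attribute rows in the compact layout shown above (ID, name, flags, value, worst, threshold, fail, raw value), which appears to be how smartctl's extended report prints them. Treating any non-zero raw count for 197/198 as worth a closer look is a common rule of thumb, not an unRAID rule, and the function names are hypothetical.

```python
# Attribute IDs to watch: 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable
WATCH = {197, 198}


def parse_smart_row(line):
    """Split one attribute row: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE.

    Returns (attr_id, name, raw_count), or None if the line is not an
    attribute row. Only the first token of RAW_VALUE is used, since some
    drives append extra text after the number.
    """
    fields = line.split()
    if len(fields) < 8 or not fields[0].isdigit():
        return None
    raw_first = fields[7]
    raw_count = int(raw_first) if raw_first.isdigit() else None
    return int(fields[0]), fields[1], raw_count


def sector_warnings(lines):
    """Return {attribute_name: raw_count} for watched attributes with a non-zero count."""
    warnings = {}
    for line in lines:
        row = parse_smart_row(line)
        if row is None:
            continue
        attr_id, name, raw_count = row
        # raw_count is None (unparseable) or 0 for healthy rows; both are skipped.
        if attr_id in WATCH and raw_count:
            warnings[name] = raw_count
    return warnings


if __name__ == "__main__":
    disk10 = [
        "197 Current_Pending_Sector -O--C- 079 079 000 - 7112",
        "198 Offline_Uncorrectable ----C- 079 079 000 - 7112",
    ]
    print(sector_warnings(disk10))
```

Checking these two counters on every disk involved, before blaming the controller, would have flagged disk9 (8 sectors) and disk10 (7112 sectors) directly.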