trurl Posted January 31, 2019 25 minutes ago, dogfluffy said: "At this point, if I disconnect the 4 drives that are connected to the other onboard controller and connect these last 3 drives, I could then pull the SMART logs, and it wouldn't hurt anything or cause any more errors since the bad controller isn't connected to anything? Sound good?" Yes, that should be fine.
trurl Posted January 31, 2019 I see you edited your post as I was replying. I don't know. Maybe something with the power connection?
dogfluffy Posted January 31, 2019 (Author) Gah, lag. Yes, in every sense, I think this is it. I had a Molex power connector feeding a string of SATA power connectors, and it looks like it might have wiggled loose and been causing problems. I ALSO seem to have a bad drive, although that could be the result of this loose connection. I pulled disk13 and connected it via a USB adapter with its own power, and it says SMART read failed and that SMART is disabled except over an ATA connection. When I had it connected via ATA, it would report that SMART was disabled. I tried to enable it from the console but was unsuccessful. I can connect it back up to a different power rail and reboot inside the case again. I think this was the drive that was clicking, though. I was able to get SMART reports off disk8 and disk10 and attached them. Disk13 is still connected via USB right now. Of course, I had ordered the Supermicro controller card just 5 minutes earlier. I should try to find direct SATA power cables for my Seasonic modular power supply so I can toss these Molex-to-SATA power wyes. Attachment: smartreportpartial.zip
dogfluffy Posted January 31, 2019 (Author) And yeah, I connected disk13 to the Supermicro controller and a native SATA power connection, and it shows DISK_DISABLE_NP in unMENU and isn't even available in the disk management pulldown to try to run a SMART report on. I had also connected it to another wye by accident, then rebooted with the intended native SATA connection, and now it just says DSK_DSBL in unMENU and "not installed" on the normal Unraid screen. It still won't run smartctl, which returns failed. So I suppose I should order another disk? I had been thinking about buying a 6 or 8 TB drive to be my next parity drive down the road, if I was going to upgrade someday. It's looking like someday is here.
Statistics for /dev/sdd ST3000DM001-1CH166_W1F2JKX7
smartctl -a -d ata /dev/sdd
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
trurl Posted January 31, 2019 SMART for those other 2 disks looks OK. I thought this Disk13 was a new disk. Of course, new disks can be bad. I'm not sure what you have in mind with buying a larger disk. Your current parity is only 3TB, so none of your array disks can be larger than that. You could replace parity with a larger disk and rebuild parity, but then you wouldn't be able to rebuild Disk13. But you may not be able to rebuild Disk13 anyway, and even if you did, it might not give any better results than just trying to use the original Disk13. I am going to start another post, so hold on.
trurl Posted January 31, 2019 If those other 2 disks are showing up now, you should be able to start the array if only Disk13 is missing. If you can start it, post a new screenshot.
dogfluffy Posted January 31, 2019 (Author) https://www.rosewill.com/product/rosewill-rsv-l4500-4u-rackmount-server-case-or-chassis-15-internal-bays-8-cooling-fans-included/ Oh no, this thing is all ripped apart right now. I disconnected the drives physically attached to one of the onboard controllers, and also a string of SATA power on a Molex splitter. It's a mess, but I was just trying to get a SMART report off disk13. I thought it was new, but it is apparently trashed regardless. I have 2 other 3TB drives, but they have files on them I may need; I don't know yet. It's sounding like this may be trashed... but I could order another 3TB, and I already ordered the Supermicro controller. It might be best to just let this sit and try to isolate what is causing the bad mojo. Those 3 disks were connected to the same controller with the 4th cable unused, and I also had a whole series of wyes and power adapters along the fan rail that I couldn't see very well. I can't tell how many, because I think I needed a couple just as extensions for a longer run, and it was all tucked away out of sight.
trurl Posted January 31, 2019 Well, the good news is that most of your disks look fine. And since each disk in Unraid is independent, most of your files should be fine as well. We can continue after you get the new controller and your cabling fixed.
dogfluffy Posted January 31, 2019 (Author) Excellent. Yes, I have a project now. I ordered a fresh drive, and I'll label these with their Unraid serial numbers and re-inspect the power connections before I put it all back together. If I bring it all online with the new Supermicro controller and the new drive ready to become disk13 during a rebuild, is that the best plan going forward, hopefully next week?
trurl Posted January 31, 2019 39 minutes ago, dogfluffy said: "If I bring it all online with the new Supermicro controller and the new drive ready to become disk13 during a rebuild is that the optimal plan going forward next week hopefully?" Yes, I think that will be fine. If you have everything hooked up, the array will probably start, but with nothing assigned as disk13. Then you would have to stop the array and assign the new disk13; starting the array will then begin the rebuild. Whether or not it works like this, let us know and we can figure out where to go from there.
dogfluffy Posted January 31, 2019 (Author) Thanks again for all your help. It looks like the shipping ETA is Feb 6, and then I should be set to bring the array online and rebuild it, again. Hopefully there will be no gremlins this time and I'll be able to see what the file situation looks like.
dogfluffy Posted February 5, 2019 (Author) Thanks to advances in modern logistics, I'm already back up and running! I still have some hardware housekeeping to do to clean all this up, but the new drive and controller are installed and running. I assigned the new disk13 and rebuilt last night with 0 errors. So I'm back to the unformatted Disk10 and an essentially blank Disk11. I'm not sure how best to proceed, but I can start viewing files and trying to piece together what data is missing and what data I have on previous Unraid drives. There was one console message about a ReiserFS error on device md10 (disk10), probably due to my scan of that disk, I gather, but no messages since and no bad noises. Also, I forgot about a lost+found on disk9. This is probably the useful bit of info you need?
Feb 4 16:34:20 Calculon logger: mount: /dev/md10: can't read superblock
Feb 4 16:34:20 Calculon emhttp: _shcmd: shcmd (42): exit status: 32 (Other emhttp)
Feb 4 16:34:20 Calculon emhttp: disk10 mount error: 32 (Errors)
Feb 4 16:34:20 Calculon emhttp: shcmd (43): rmdir /mnt/disk10 (Other emhttp)
Feb 4 16:34:20 Calculon emhttp: shcmd (44): mkdir /mnt/disk11 (Routine)
Feb 4 16:34:20 Calculon emhttp: shcmd (45): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md11 /mnt/disk11 |& logger (Other emhttp)
Feb 4 16:34:20 Calculon kernel: REISERFS error (device md10): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD] (Errors)
Feb 4 16:34:20 Calculon kernel: REISERFS (device md10): Remounting filesystem read-only (Drive related)
Feb 4 16:34:20 Calculon kernel: REISERFS (device md10): Using r5 hash to sort names (Routine)
Do you need any logs or more info from me?
trurl Posted February 5, 2019 I guess the next step is to repair the filesystem on Disk10. Please capture the output from trying the repair so you can post it. https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_ReiserFS_using_unRAID_v5_or_later
JorgeB Posted February 5, 2019 3 minutes ago, dogfluffy said: "got failed to open the device '/dev/md10': no such file or directory." That would suggest the array isn't started. And don't forget to start it in maintenance mode.
dogfluffy Posted February 5, 2019 (Author) Hah, yeah, I swear I hit the button! Anyway, I ran it and got: Bad root block 824657253. (--rebuild-tree did not complete) Aborted (core dumped)
JorgeB Posted February 5, 2019 Run with --rebuild-tree again.
dogfluffy Posted February 5, 2019 (Author) It's working! It's counting down from a very large number. I'm going to grab some lunch. Thanks again for the help. I assume this is processing the lost+found folder, or does it run off parity? I'm not sure what handles lost+found.
itimpi Posted February 5, 2019 8 minutes ago, dogfluffy said: "It's working! Counting down from a very large number. I'm going to grab some lunch. Thanks again for the help. I assume this is processing the lost+found folder or does it run off parity? I'm not sure what handles lost+found?" This is using parity plus the other data disks to work out what each sector on the emulated disk should contain. It then reads every sector on the emulated disk, trying to find file structures and reconstruct the folder/file name information. Only at the end of this process does it decide whether it has files that appear to have no related directory information, and it is these that go into the lost+found folder.
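For readers wondering how "parity plus the other data disks" can stand in for a missing disk: when a disk is disabled and emulated, or when a replacement such as the new disk13 is rebuilt, each byte of the missing disk is recomputed from the same byte offset on parity and on every surviving data disk. The toy model below is a minimal Python sketch assuming simple single XOR parity; the function names are hypothetical, and a real array works on whole sectors of physical devices rather than a few bytes.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def compute_parity(data_disks: list[bytes]) -> bytes:
    """Hypothetical helper: parity at offset i is the XOR of every data disk's byte at offset i."""
    return xor_blocks(data_disks)


def emulate_missing(parity: bytes, surviving_disks: list[bytes]) -> bytes:
    """Hypothetical helper: recover a missing disk by XORing parity with all surviving disks."""
    return xor_blocks([parity, *surviving_disks])


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)
    # Pretend disk index 1 is missing and reconstruct it from parity + the others.
    rebuilt = emulate_missing(parity, [disks[0], disks[2]])
    print(parity.hex(), rebuilt == disks[1])
```

Because XOR is its own inverse, the same computation serves both emulation (computing sectors on the fly when they are read) and a rebuild (computing every sector once and writing it to the replacement disk). Note that parity holds no files by itself; it only makes sense combined with the other disks.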
dogfluffy Posted February 5, 2019 (Author) Well, that was my second guess. Thanks for the explanation! I *think* this might explain why disk13 and (I believe) disk3 had very similar data? It could also just be from my mistaken disk shuffling without proper labels.
JorgeB Posted February 5, 2019 dogfluffy said: "I assume this is processing the lost+found folder or does it run off parity? I'm not sure what handles lost+found?" Parity won't be used for a filesystem check unless you are running it on an emulated disk. A lost+found folder may be created by reiserfsck if it finds files, complete or partial, for which it doesn't know the original folder.
trurl Posted February 5, 2019 Disk10 isn't disabled, so it isn't emulated. Parity is not involved, except that any changes made by the repair are writes to the data disk that will update parity, as I already explained here: On 1/30/2019 at 3:29 PM, trurl said: "Repairing a filesystem is also a write operation. It writes corrections to the filesystem metadata. But if you are working at the command line, there are 2 different ways to refer to the filesystem. You can repair the partition on the sd device, which is what you did way back in the first post. Doing it this way leaves parity out of it, so parity becomes invalid when you take this approach. Or you can repair the md device, which includes parity when writing those corrections, so parity is maintained." Since you are repairing the md device, parity is updated when the repairs are written. But parity itself has none of your data, as mentioned before, and since you aren't working with the emulated disk, parity and the other disks aren't even read. Emulation (if it were involved here, which it isn't) could not possibly be the reason for this: 42 minutes ago, dogfluffy said: "why disk13 and disk3 (I believe) had very similar data"
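The difference trurl describes between repairing the md device (such as /dev/md10) and repairing the partition on the sd device can be pictured with a toy model. This is a hypothetical Python sketch assuming single XOR parity; `ToyArray`, `write_md` and `write_sd` are illustrative names, not Unraid code, and the real md driver may update parity by a different method than the read-modify-write shown here.

```python
class ToyArray:
    """Hypothetical model of data disks protected by one XOR parity disk."""

    def __init__(self, data_disks: list[bytes]) -> None:
        self.disks = [bytearray(d) for d in data_disks]
        self.parity = self._xor_all()

    def _xor_all(self) -> bytearray:
        # Recompute parity from scratch: XOR of all data disks at each offset.
        out = bytearray(len(self.disks[0]))
        for disk in self.disks:
            for i, b in enumerate(disk):
                out[i] ^= b
        return out

    def write_md(self, disk: int, offset: int, value: int) -> None:
        """Like writing through the md device: data and parity change together."""
        old = self.disks[disk][offset]
        self.disks[disk][offset] = value
        # Remove the old byte's contribution to parity and add the new one.
        self.parity[offset] ^= old ^ value

    def write_sd(self, disk: int, offset: int, value: int) -> None:
        """Like writing to the raw sd partition: parity is bypassed."""
        self.disks[disk][offset] = value

    def parity_valid(self) -> bool:
        return self._xor_all() == self.parity


if __name__ == "__main__":
    arr = ToyArray([b"\x01\x02", b"\x10\x20"])
    arr.write_md(0, 1, 0x7F)
    print(arr.parity_valid())  # True: parity was updated with the write
    arr.write_sd(1, 0, 0x00)
    print(arr.parity_valid())  # False: parity no longer matches the data
```

In this model a filesystem repair is just a series of writes, so running it against the md path keeps parity in sync, while running it against the sd partition leaves parity stale until a parity sync is done. Neither path reads parity to decide what to write, which is why the repair on Disk10 could not have pulled data from other disks.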
dogfluffy Posted February 6, 2019 (Author) OK, it finished. There's a long console message, but: Objects w/o names 2692, Empty dirs 25, Dirs linked /lost+found: 235, Dirs w/o: 25, Files linked to lost+found 2457, Pass 4 finished, done 576403, 603/sec, Deleted unreachable 53670, flushing, syncing finished. Back to the prompt.
trurl Posted February 6, 2019 Sounds like you wound up with a lot of stuff in lost+found. Trying to make any sense of what is there may be impractical. Have you looked at it?
dogfluffy Posted February 6, 2019 (Author) Not yet, but I watched the console scroll on and off. Looks promising! I didn't want to screw it up now by mounting something wrong, although I think we're basically finished now? Before, I thought parity worked backwards, but now I understand it. I also have the 2 disks I pulled out along the way, so once I restart the array I guess I can copy their contents back over? I also had a weird, counterintuitive disk assignment for my data share, where I always assigned new disks to the share as they became available, thinking it would always be large enough. But it seems to have scattered some data across the nearly blank disks too. I'm still piecing it back together, but it looks good, and again there are no errors on the Unraid main screen. I suppose I may have had a flaky controller, loose power, a bad drive, or all 3 at once going on. Thanks for all your help with this, and for teaching me how it actually works. It's parity magic.