wheel
Everything posted by wheel
-
I think that's the part that's killing me: I've been through dozens of drives without a single issue during preclear, and then, out of the blue, my last new drive died mid-preclear (cycle 1 of 2). Had that not happened, I'd probably have just slapped the new one in over the 1TB... Now, especially with a not-so-cheap 4TB in play, I feel like I really need to preclear. I guess I'm going to bite the bullet and try the 'new config' option after I run a normal parity check tomorrow, and try to do it on a day when something will keep me busy enough that I'm not checking on the parity sync every few minutes. Thank you again for all of the assistance!
-
Ahh, that makes sense: by running my preclear before I try the 'new config,' I'm minimizing the chances of a dead disk (or, even worse, two or more) during the new parity sync, which will probably take 12-16 hours with the existing 4TB parity disk and all the 3-4TB drives in this tower. Do you recommend any additional preclear flags or unmenu tricks that could help make my unprotected 'new config' period as safe as it can possibly be?
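A quick way to sanity-check a "12-16 hours for a 4TB parity disk" estimate is to work out the average throughput it implies. The sketch below assumes a parity sync reads the full size of the parity disk at a constant average rate; real rates vary across the platter and are limited by the slowest drive in the array. The function names are illustrative only.

```python
# Rough sanity check on a "12-16hrs" estimate for syncing a 4TB parity disk.
# Assumption: a parity sync reads the whole parity-disk size at some average
# rate; real rates vary across the platter and with the slowest drive.

TB = 10**12  # drive capacities are quoted in decimal terabytes


def implied_rate_mb_s(parity_bytes: int, hours: float) -> float:
    """Average throughput (decimal MB/s) needed to finish within `hours`."""
    return parity_bytes / (hours * 3600) / 10**6


def sync_hours(parity_bytes: int, rate_mb_s: float) -> float:
    """Estimated sync duration in hours at a constant average rate."""
    return parity_bytes / (rate_mb_s * 10**6) / 3600


if __name__ == "__main__":
    for h in (12, 16):
        print(f"4TB in {h}h needs ~{implied_rate_mb_s(4 * TB, h):.1f} MB/s")
    # prints ~92.6 MB/s for 12h and ~69.4 MB/s for 16h
```

In other words, the estimate assumes the array sustains roughly 70-93 MB/s on average for the whole sync; if the slowest disk can't keep that up, expect it to run longer.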
-
No spare PCs, unfortunately (just a couple of laptops, none with eSATA), so I'm probably going to go with itimpi's hybrid suggestion, which seems to maintain array protection the whole time: one parity check now, then set up the 'new config' with 18 drives instead of 19, then start the preclear while the 'new config' is calculating new parity. In this case, does it make more sense to save my final parity check for AFTER the new 19th drive has been precleared and added to the 18-drive 'new config'? Or should I still run another parity check immediately after the 'new config' parity sync, before adding the new 19th drive? Thank you both for all of your help!
-
Quick question: it's been about a month since I last ran a parity check (which came back fine), and I'm about to replace an old drive in my array with a larger new one (no slots left; it was only 1TB, and I've moved all its content to other drives in the array). Should I run my monthly parity check BEFORE or AFTER I remove the 1TB from the array (bringing it down to 18 drives so I can safely preclear the new drive while the array is running)? Or (and I'm pretty sure the answer is "no," but I figured I'd ask) am I better off stopping the array for a few days of preclear and then "reconstructing" the empty drive as a way to give all of my drives their monthly workout? Thanks in advance for any guidance you can provide!
-
[SOLVED] sync errors at very start of Monthly parity check
wheel replied to hwilker's topic in General Support (V5 and Older)
I could swear that I did, but there's definitely a chance that I didn't (I just checked the drive purchase date, and it looks like my last parity check was after that). Are you thinking it could be that rebuild bug related to writes? I'm pretty positive nothing was written to the array during that rebuild (I'm the only one using the system)... Thanks, dgaschk!
[SOLVED] sync errors at very start of Monthly parity check
wheel replied to hwilker's topic in General Support (V5 and Older)
I've just run into this identical situation on 4.7 (right down to a disk failure about a month ago and the number of sync errors reported by the non-correcting parity check), but my complete /var/log/syslog reads only as follows:

May 15 04:40:01 Tower syslogd 1.4.1: restart.
May 15 08:06:54 Tower kernel: md: sync done. time=33742sec rate=57895K/sec (unRAID engine)
May 15 08:06:54 Tower kernel: md: recovery thread sync completion status: 0 (unRAID engine)
May 15 08:26:36 Tower kernel: NTFS driver 2.1.29 [Flags: R/O MODULE]. (System)

GENERALLY, it sounds like this is nothing to worry about, but I've seen a couple of similar threads about these "housekeeping" few-error situations where memory tests and reiserfs checks are recommended. Given how short the syslog is, is there an easy way to decide whether I should simply go with the "correct parity" option and write these off as "housekeeping," non-data-related errors, OR do some more detailed checking? (Or is there a way to dig further back into the syslog history, and is that even necessary at this point?)
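The "sync done" line is the most informative entry in that short syslog: it records how long the check ran and its average rate. A small parser like the one below (hypothetical; the regex is fitted to this one sample line) turns it into hours and an approximate amount of data read, which can then be compared against the parity disk's size.

```python
import re
from typing import Optional, Tuple

# Written against the single "sync done" line quoted above; the exact wording
# may differ between unRAID versions.
SYNC_DONE = re.compile(r"md: sync done\. time=(\d+)sec rate=(\d+)K/sec")


def summarize_sync(line: str) -> Optional[Tuple[float, float]]:
    """Return (hours, approx. TB read) for an md 'sync done' line, else None."""
    m = SYNC_DONE.search(line)
    if m is None:
        return None
    seconds, rate_k = int(m.group(1)), int(m.group(2))
    hours = seconds / 3600
    # Assumption: "K" is 1024 bytes; the result is in decimal terabytes.
    tb_read = seconds * rate_k * 1024 / 10**12
    return hours, tb_read


if __name__ == "__main__":
    line = ("May 15 08:06:54 Tower kernel: md: sync done. "
            "time=33742sec rate=57895K/sec (unRAID engine)")
    hours, tb = summarize_sync(line)
    print(f"{hours:.1f} h, ~{tb:.2f} TB")  # 9.4 h, ~2.00 TB
```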
Thanks for the quick reply, prostuff1. I'll definitely stick with it and keep everyone posted. Does this mean the approach of taking the old conf and replacing the old version file name with the new version file name should also work for pulling down the most up-to-date version of SB manually? Or am I missing a step somewhere along the way?
-
Bad news: I took the "let SB update itself" plunge (Sickbeard just made a huge update on 12/31/12), and Sickbeard failed to finish the process on its own. When I managed a clean shutdown, Sickbeard wouldn't install as usual when prompted. I moved all Sickbeard folders into backup folders and started with a fresh install. First, I modified drcorso's conf file (since I hadn't seen any negative comments about it) to pull down the most recent master gz from github. That installed, but complained that it couldn't reach github to check the version number (and it clearly wasn't the newest version, since it still listed the old indexers). So I started from scratch again and tried drcorso's file as written (same github communication issue), and then started from scratch once more with prostuff1's January conf from the first page of this thread. Even with what I believe is a completely fresh install, I'm getting the github communication errors, and I have a long history of keeping up with prostuff1's conf files without ever seeing this error (or any other real trouble; thanks for all of your hard work on this, prostuff1!). Is there a chance that something changed on Sickbeard's end with this 12/31/12 update? Any advice or guidance would be greatly appreciated!
-
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
Forced restart done, syslog attached, and everything seems good... but what the hell could have caused this? Definitely out of left field... Hope the log helps development! syslog8-9-12.txt
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
So, I managed a bit of research on PuTTY at lunch, and it seems like my tower's missing IP address might be a deal-breaker for this approach (it's not showing up on the network at all). Does anyone have any other ideas before I go for the forced restart in a few hours when I get home?
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
PeterB: that's exactly what I've been doing (plus a reboot for drives on the MARVELL cards). Thanks for the clarification!
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
I probably used the wrong language (or I've been doing the wrong thing in the past): I've "hot swapped" by placing a new drive in an empty hot-swap slot and restarting the system (at which point it's picked up by the MARVELL cards and reported to unRAID). This time, I went for the reboot without actually placing the drive in the slot (even though that was my original plan; rough morning), so even if I've been doing this wrong, the "hot swap" itself shouldn't be a factor in this error... Hope this clears things up!
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
That sounds like a plan. I'll research the steps during lunch and implement them as soon as I get home. Thanks for the suggestion!
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
That was actually one of the first things I tried; unfortunately, the monitor was receiving no signal. It's all pretty weird...
unRAID Server Release 5.0-rc6-r8168-test Available
wheel replied to limetech's topic in Announcements
I almost hate to post this, as there's a chance it isn't related to 5.0-rc6-r8168 (and it'll be hard to know for sure until I get home tonight and force a reboot, assuming that's the recommendation here), but just in case it's a bug: I've been up and running without issue for well over a week now, transferring files and checking parity, since the last weird webgui-related crash (see earlier in this thread). I decided to restart this morning to hot swap a new drive, but never got that far: after I stopped the array and selected restart from the main webgui, everything hung. I can't access the webgui anymore (hell, the entire tower is missing from my network list now), and my unRAID USB stick is flashing nonstop at regular intervals (one flash, nothing, one flash, nothing, one flash, nothing). With that sort of activity on the flash drive, I'm loath to force a reboot at this point, though it seems like the only way to get a syslog. If anyone has ideas on how to handle this without a forced reboot, please let me know; otherwise, I'll have a full syslog up tonight.
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
Hey, everyone, sorry for disappearing; I just made it through a major disruption in my personal life... Although it was constantly at the back of my mind, I hadn't been able to sit down at either of my unRAID boxes until last night, and luckily, it looks like everything worked out fine with disk16. I ran reiserfsck --check on every drive and they came back solid; the parity check came back solid; I think I'm set to go. I'll post the last 5000 lines of the syslog when I get home tonight if anyone is curious about this situation, but I'm leaning towards an "if it ain't broke" attitude at this point. Thanks again for all of the help!
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
OK, I took each of those steps. Everything SEEMS fine, but I just noticed these errors in the syslog immediately after formatting the precleared d16:

Aug 4 18:20:33 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md16,
Aug 4 18:20:33 Tower logger: missing codepage or helper program, or other error (Errors)
Aug 4 18:20:33 Tower logger: In some cases useful info is found in syslog - try
Aug 4 18:20:33 Tower logger: dmesg | tail or so
Aug 4 18:20:33 Tower logger:
Aug 4 18:20:33 Tower emhttp: _shcmd: shcmd (298): exit status: 32 (Other emhttp)
Aug 4 18:20:33 Tower emhttp: disk16 mount error: 32 (Errors)
Aug 4 18:20:33 Tower emhttp: shcmd (299): rmdir /mnt/disk16 (Other emhttp)

Does anyone have any idea what caused these (and whether they're a real issue)?
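The "exit status: 32" in the emhttp lines is presumably the return code of the mount command emhttp ran for disk16. The mount(8) man page documents its exit status as bit flags that can be ORed together; the small decoder below (the function name is hypothetical) just makes that reading explicit. Note that 32 on its own only means "mount failure"; the preceding "wrong fs type, bad option, bad superblock" lines carry the actual detail.

```python
# Exit status bits as documented in the mount(8) man page; the values can be
# ORed together, so a status is decoded by testing each bit.
MOUNT_EXIT_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status: int) -> list:
    """Translate a mount exit status into the documented meanings."""
    if status == 0:
        return ["success"]
    return [text for bit, text in MOUNT_EXIT_BITS.items() if status & bit]


if __name__ == "__main__":
    print(decode_mount_status(32))  # ['mount failure']
```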
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
Sounds good. Should I delete all of the files on d16 first (since I don't know which ones are corrupt) before I set the new parity with initconfig?
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
I've been looking all over for situations like mine (reiserfsck --check gives a segfault error), but it seems like most segfault problems are associated with --rebuild-tree. So, I've taken a hard look at the data on d16 and determined it won't be the end of the world if I have to replace it all manually (maybe 400GB or so). Most of it seems to work when I load it over SMB, though I haven't tested every single file; I'm planning to move the "corrupt" files to a known-good disk and play with them one by one until I find the troublesome ones. The problem is, I don't want to do anything to the file system that would affect the integrity of the rest of my drives. Does anyone have advice on how to reach a "clean slate" on an irreparably bad disk (as d16 seems to be) without affecting parity or unRAID in general?
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
Thanks, Joe, that clarifies the situation considerably! I ran reiserfsck --check on each drive last night and fixed the errors on md2 and md14 via --fix-fixable and --rebuild-tree, respectively (though d14's "5 corruptions" ended up with 4203 in lost+found and 27458 deleted!), but ran into a roadblock with md16. md16 goes into the bitmap comparison stage ("phase one" of the reiserfsck, as far as I can tell) and immediately starts throwing back errors like:

"the problem in the internal node occurred (439552052), whole subtree is skipped"

I count 12 of these (along with "Zero bit found in on-disk bitmap after the last valid bit") before reiserfsck ends with "Segmentation fault" and drops me back to the command line. Does anyone have suggestions on how to handle this particular drive? Or is it unfixable by reiserfsck (or any other method) at this point?
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
OK, since no one recommended I run --rebuild-tree, and all of the files I'd moved Thursday night appear to be where they're supposed to be, I used unmenu to run a non-correcting parity check... and it came back with 0 errors. I figured I was good, but JUST TO BE SAFE I went back and ran "reiserfsck --check" on disk14 (one of the disks holding Thursday-move files)... and got the EXACT same five errors (and --rebuild-tree suggestion) I saw when I ran it last week. If all the files look like they're where they're supposed to be and the parity check is coming back clean, but "reiserfsck --check" is still reporting errors on multiple disks, is this one of those rare "trust my array" moments, or is something really wrong here?
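A clean parity check and a failing reiserfsck are not contradictory: the parity check only verifies that the parity disk matches whatever bytes are on the data disks, not that those bytes form a healthy filesystem. The toy model below assumes unRAID's single parity is a plain XOR across the data disks and that writes through the array update parity by read-modify-write; the tiny "disks" and function names are purely illustrative.

```python
from functools import reduce

# Toy model of single (XOR) parity: each parity byte is the XOR of the bytes
# at the same offset on every data disk.


def compute_parity(disks):
    return bytearray(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))


def write_through_array(disks, parity, disk_idx, offset, new_byte):
    """Read-modify-write: data and parity are updated together."""
    old = disks[disk_idx][offset]
    disks[disk_idx][offset] = new_byte
    parity[offset] ^= old ^ new_byte


def parity_check(disks, parity):
    """Count offsets where stored parity disagrees with the data."""
    return sum(1 for p, q in zip(compute_parity(disks), parity) if p != q)


if __name__ == "__main__":
    disks = [bytearray(b"\x10\x20\x30"), bytearray(b"\x01\x02\x03")]
    parity = compute_parity(disks)

    # A garbage byte (standing in for a damaged filesystem structure) written
    # through the array keeps parity in step, so the check stays clean.
    write_through_array(disks, parity, 0, 1, 0xFF)
    print(parity_check(disks, parity))  # 0

    # Only a change that bypasses the parity update shows up as a sync error.
    disks[1][2] ^= 0x01
    print(parity_check(disks, parity))  # 1
```

So if damaged structures were written through the array, a 0-error parity check and a failing reiserfsck --check can both be true; the model says nothing about where the damage came from.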
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
Always with a powerdown from the webgui... I think I've figured out the cause of the disk errors based on how they're spread (they're all on disks I'd moved files onto Thursday night, right after the parity check came back good, having corrected 6 errors). I ran another parity check overnight (after receiving those errors), and now the webgui tells me parity's been checked with 0 errors found. Reiserfs, not so much...
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
OK, disk 14 (the first rebuilt disk) returned some errors:

Reiserfs journal '/dev/md14' in blocks [18..8211]: 0 transactions replayed
Zero bit found in on-disk bitmap after the last valid bit.
Checking internal tree.. \/ 1 (of 21|/ 34 (of 86...

[this led to] "The level of the node (0) is not correct, (1) expected
the problem in the internal node occurred (462061569), whole subtree is skipped" x5, basically, then:

"Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
5 found corruptions can be fixed only when running with --rebuild-tree"

Same thing with the second rebuilt disk (disk16), which came back with 12 of those errors, but this time it just dumped me back to the command line after the 12th error (no "Comparing bitmaps.." message and no recommendation of --rebuild-tree, unlike disk14). I'm coming back here with this based on the recommendation in the instructions: is this a recommended time to run --rebuild-tree? I've parity checked the tower to the point where it comes back with 0 errors, but these 5 corruptions give me pause...

[EDIT] Hitting the other disks now: no problems on 1-3, but #4 came back with two errors (no "Level of the node is not correct" messages, just the "Comparing bitmaps / differs" message), and this one tells me to use the "--fix-fixable" option.
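Three different outcomes show up across these disks: disk14 ends with a --rebuild-tree recommendation, disk #4 with --fix-fixable, and disk16 stops after 12 "whole subtree is skipped" errors with no verdict at all. A hypothetical helper like the one below, matching only on the messages quoted in this thread, can sort saved output from each disk into those buckets. It only reads text; it does not decide whether running --rebuild-tree is safe.

```python
import re

# Message text copied from the reiserfsck output quoted in this thread; other
# reiserfsprogs versions may word things differently. "--fix-fixable" is a
# plain substring match because that recommendation wasn't quoted verbatim.
SKIPPED = "whole subtree is skipped"
REBUILD = re.compile(
    r"(\d+) found corruptions can be fixed only when running with --rebuild-tree")


def triage(output: str) -> dict:
    """Summarize the captured text of one `reiserfsck --check` run."""
    m = REBUILD.search(output)
    if m:
        advice = "--rebuild-tree"
    elif "--fix-fixable" in output:
        advice = "--fix-fixable"
    elif "Segmentation fault" in output:
        # Printed by the shell, so only present if the terminal text was pasted.
        advice = "check crashed"
    else:
        advice = "no verdict found"
    return {
        "skipped_subtrees": output.count(SKIPPED),
        "corruptions": int(m.group(1)) if m else None,
        "advice": advice,
    }
```

For disk14's output above, this would report 5 skipped subtrees, 5 corruptions, and "--rebuild-tree".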
Parity last checked on 7/25/12, finding 66,423,368 errors
wheel replied to wheel's topic in General Support (V5 and Older)
The second parity check reported 6 errors fixed, but 2 hours later I'm seeing this:

Jul 26 21:46:29 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 0 does not match to the expected one 3
Jul 26 21:46:29 Tower kernel: REISERFS error (device md14): vs-5150 search_by_key: invalid format found in block 432978720. Fsck?
Jul 26 21:46:29 Tower kernel: REISERFS error (device md14): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [3719 3721 0x0 SD]
Jul 26 21:46:29 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 0 does not match to the expected one 3
Jul 26 21:46:29 Tower kernel: REISERFS error (device md14): vs-5150 search_by_key: invalid format found in block 432978720. Fsck?
Jul 26 21:46:29 Tower kernel: REISERFS error (device md14): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [3719 3721 0x0 SD]
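The six kernel lines above are two repeats of the same three messages, all pointing at one block on md14. In a longer syslog that kind of repetition is easy to miss, so a small filter that groups REISERFS errors by device, error code, and block number can help. This is a hypothetical sketch; the regexes are fitted only to the lines quoted here.

```python
import re
from collections import defaultdict

# Patterns written against the kernel lines quoted above, e.g.
# "REISERFS error (device md14): vs-5150 search_by_key: invalid format found in block 432978720. Fsck?"
ERR = re.compile(r"REISERFS error \(device (\w+)\): (vs-\d+)")
BLOCK = re.compile(r"in block (\d+)")


def summarize(log: str) -> dict:
    """Map device -> sorted distinct error codes and block numbers."""
    codes = defaultdict(set)
    blocks = defaultdict(set)
    for line in log.splitlines():
        m = ERR.search(line)
        if not m:
            continue  # warnings and unrelated lines are ignored
        dev, code = m.groups()
        codes[dev].add(code)
        b = BLOCK.search(line)
        if b:
            blocks[dev].add(int(b.group(1)))
    return {dev: {"codes": sorted(codes[dev]), "blocks": sorted(blocks[dev])}
            for dev in codes}
```

Run over the six lines above, this collapses them to a single device (md14), two error codes, and one block (432978720).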