October 24, 2013 Hello internet folks who are smarter than I- I've had my 24-drive unRAID server for about 6 months now, built off of the suggested specs from one of bjp999's old threads. Recently I experienced my first red ball, which I swapped out the drive on, only to rebuild and have a second red ball on a totally different row/drive the next day when I came in. Odd, but maybe those drives just couldn't live without one another. We're now in Day 3. During this second rebuild I decided I might as well take the opportunity to upgrade to 5.0- because when things aren't working, it's best to throw more variables into the equation, right? 5.0 seems to load fine, although it did need a parity check after the 2nd rebuild. It's now 3% of the way through the parity check, but one of my main user share folders is now acting like it's totally empty... when that user share is one of the biggest in terms of size on the server. All of my other shares seem to eventually load fine (after a slight delay) with all of their contents intact, but not this specific share. All the hard drive size/disk space stats in the Main tab of the GUI are the same (meaning hopefully all the data is still there), but I can no longer access these files through either SMB or AFP connections once I click on that share. Is there any hope? Or will this share magically return after the parity check finishes? System Log attached for reference. Many thanks for any insight! syslog.txt
October 24, 2013 Does unRAID show any errors? Post SMART reports for disks 8 and 11. Disks 8 and 11 have file system errors that can only be corrected with reiserfsck. See Check Disk Filesystems in my sig. And after the file systems are corrected see here: http://lime-technology.com/forum/index.php?topic=28484.0
October 24, 2013 Author First, thanks for the kind reply. There's a lot of info within it that I'm still trying to read up on/digest- I'm a very novice networking guy. Is there any way to easily run these SMART reports from a Mac connected to the server through a Terminal command? The wiki on SMART reports makes it look easiest to run these directly from the console, but I'd love to be able to run this from my current machine. Please keep in mind my knowledge of terminal commands is limited to understanding what "cd" and "ls" do. Thanks again for your patience and help-
October 24, 2013 Author Also, is it worth stopping the current parity check to run the SMART report, or should I wait for parity to finish on the new drives before attempting to run the SMART report?
October 26, 2013 Author Hello again dgaschk- I've successfully figured out how to use Telnet and am in the process of trying to run these SMART reports you requested. When I run the commands in this wiki: http://lime-technology.com/wiki/index.php/Troubleshooting#Obtaining_a_SMART_report I get the following message from terminal: /dev/sda: Unknown USB bridge [0x054c:0x02a5 (0x100)] Smartctl: please specify device type with the -d option If I'm trying to run these reports on specific disks, what is the correct command to use in Telnet?
October 26, 2013 Author Got the same error, see below. I'm logged into the server as the root user.
root@Mainstream:~# smartctl /dev/sda
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
/dev/sda: Unknown USB bridge [0x054c:0x02a5 (0x100)]
Smartctl: please specify device type with the -d option.
Use smartctl -h to get a usage summary
root@Mainstream:~#
October 27, 2013 /dev/sda is the flash drive and the flash drive does not have SMART. Disk 8 is /dev/sdn and disk 11 is /dev/sdq. These values are shown on unRAID main. sda is used as an example in the wiki. The wiki also states: "Look at the Devices page for the device identifier (within the parentheses) for each disk, and substitute that for 'sda' on the command line." UnRAID main is the "Devices page." Please read the instructions completely and follow them carefully.
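For anyone following along, the substitution described above can be sketched as a small shell helper. This is only an illustration, assuming the identifiers sdn and sdq named in this thread; `smart_report_cmd` is a hypothetical name, and it prints the command instead of running it so it can be reviewed first. If smartctl still asks for a device type, the -d option mentioned in its error message is the place to look.

```shell
# Hypothetical helper: build (but do not run) the smartctl command for a
# device identifier taken from the unRAID Main page, e.g. sdn or sdq.
smart_report_cmd() {
    dev="$1"
    case "$dev" in
        sd[a-z]|sd[a-z][a-z])
            # -a asks smartctl for the full SMART report of the device.
            printf 'smartctl -a /dev/%s\n' "$dev"
            ;;
        *)
            printf 'unexpected device identifier: %s\n' "$dev" >&2
            return 1
            ;;
    esac
}

smart_report_cmd sdn   # prints: smartctl -a /dev/sdn
smart_report_cmd sdq   # prints: smartctl -a /dev/sdq
```

Redirecting the real command's output to a file (for example, appending `> disk8.txt`) gives something that can be attached to a post.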
October 27, 2013 Author My apologies. I didn't realize what a device identifier was. An honest thank you for your help so far. Attached are the SMART reports originally requested for the two drives that threw the first errors. Would you suggest I continue with the reiserfsck check? disk8.txt disk11.txt
October 28, 2013 Author Those SMART reports in the previous post were before I had run any self-tests. The reports attached here include results from a short test on each disk. Are there any red flags here, or am I safe to continue with your earlier advice with reiserfsck? disk8_v2.txt disk11_v2.txt
October 28, 2013 Reports look ok. Disk 11 may have a loose power connection. Firmly reseat the connection. Apply pressure to all of the other connectors and on the other disks to make sure that none are loosened by this intrusion. Replace any power splitters feeding disk 11. Then proceed with reiserfsck.
October 28, 2013 Author I checked the power connections and completed the reiserfsck on each trouble disk, reports attached. Looks like it wants to run the --rebuild-tree command to attempt repair of the file system. Since this option is written in red in the wiki, it appears to be a fairly severe corruption of the file tree on these disks. Is my data in real jeopardy here? The original disks 8 & 11 that failed have since been replaced within the array, and I wonder if the rebuild/parity check just copied these file system errors onto the new disks that replaced them. A friend suggested I create a new user share and go disk by disk in my array copying data from the old share folder into the new. Would this be any safer/easier than the --rebuild-tree option? reiserfsck-disk8.txt reiserfsck-disk11.txt
October 30, 2013 Author Results from both --rebuild-tree commands on disks 8 & 11 are below. They were too large to upload as attachments: http://thejonshow.com/downloads/mymusic/reiserfsck-rebuildtree-disk8.txt http://thejonshow.com/downloads/mymusic/reiserfsck-rebuildtree-disk11.txt How bad is the diagnosis? Safe to proceed with the Extended Attributes Fix thread commands? Many thanks.
October 30, 2013 Yes, run the Extended Attributes Fix. If a lot of files are missing you can use the scan-whole-partition option. There are several Windows reiserfs recovery tools available as well.
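As a rough summary of the reiserfsck steps discussed in this thread, here is a hedged sketch that only prints the commands for a given disk number. `reiserfsck_cmds` is a hypothetical name, and the `/dev/mdN` device path is an assumption about how unRAID exposes array disks; the Check Disk Filesystems wiki page is the authority on the exact device and on what state the array must be in. Nothing here runs a repair.

```shell
# Hypothetical helper: list (but do not run) the reiserfsck commands
# mentioned in this thread for one array disk.
reiserfsck_cmds() {
    n="$1"
    case "$n" in
        ''|*[!0-9]*)
            echo "disk number must be numeric" >&2
            return 1
            ;;
    esac
    dev="/dev/md$n"   # assumption: unRAID array disk N appears as /dev/mdN

    # 1. Read-only check; reports whether a repair is needed.
    echo "reiserfsck --check $dev"
    # 2. Only if the check asks for it: rebuild the file system tree.
    echo "reiserfsck --rebuild-tree $dev"
    # 3. If many files are still missing: rebuild while scanning the
    #    whole partition, as suggested above.
    echo "reiserfsck --rebuild-tree --scan-whole-partition $dev"
}

reiserfsck_cmds 8
```

Printing the commands first keeps the destructive step (--rebuild-tree) a deliberate, separate action.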
October 31, 2013 Author I got my User Shares to show their content again, all thanks to dgaschk & this forum!! It turns out I did lose quite a bit of data (luckily most of which is backed up off the server). I'll look into this "scan-whole-partition" command you mention, as well as any reiserfs recovery tools that may work from my mostly Mac-based network. While trying to dig through the new lost+found share that the --rebuild-tree command generated, I get the following error across the top of my Shares page... any ideas why?
Warning: filetype(): Lstat failed for A in /usr/local/emhttp/plugins/indexer/Browse.php on line 24
Warning: filectime(): stat failed for /usr/local/emhttp/mnt/user/lost+found/33159_33197 in /usr/local/emhttp/plugins/indexer/Browse.php on line 31
Warning: filetype(): Lstat failed for A in /usr/local/emhttp/plugins/indexer/Browse.php on line 24
Warning: filectime(): stat failed for /usr/local/emhttp/mnt/user/lost+found/33370_33411 in /usr/local/emhttp/plugins/indexer/Browse.php on line 31
Warning: filetype(): Lstat failed for A in /usr/local/emhttp/plugins/indexer/Browse.php on line 24
Warning: filectime(): stat failed for /usr/local/emhttp/mnt/user/lost+found/33869_33916 in /usr/local/emhttp/plugins/indexer/Browse.php on line 31
Warning: filetype(): Lstat failed for A in /usr/local/emhttp/plugins/indexer/Browse.php on line 24
Warning: filectime(): stat failed for /usr/local/emhttp/mnt/user/lost+found/33914_34156 in /usr/local/emhttp/plugins/indexer/Browse.php on line 31
Some additional questions I have if you have the time/patience to educate a new kid:
- Do you have any way of knowing how much of this issue was a corrupted filesystem versus the AFP share name bug?
- Is there a way of sharing the lost+found share that the --rebuild-tree command generated without sharing all the individual drives it got created on?
- Is there a way to quickly gauge the current health of my server now (latest syslog attached)? The main unRAID page shows full parity, but as this experience has taught me, that does not mean the server is completely sound. What are some giveaways to look for in the server log or telnet?
- What did I do wrong to cause this kind of serious filesystem corruption (if anything)? I realize the spaces in user shares might have exploited a pre-existing bug, but would love to understand possible reasons this happened so I can avoid this downtime in the future.
- I'm guessing file system corruptions get cloned if a disk happens to red ball and you rebuild the data on a brand new drive. Is that correct?
Again, many thanks. Our little production company is fully up and running now. -Jon syslog_30.10.13.txt
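On the question of spotting trouble in the server log, one possible starting point (not an answer from this thread) is a case-insensitive search of the syslog for file system and disk I/O complaints. The sketch below is an assumption-laden illustration: `scan_syslog` is a hypothetical name, the search patterns are a guess at useful keywords rather than a complete list, and the log path in the usage comment is assumed.

```shell
# Hypothetical helper: show numbered syslog lines that mention ReiserFS
# errors/warnings or disk I/O errors. The patterns are illustrative only.
scan_syslog() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read log file: $log" >&2
        return 2
    fi
    # -i ignore case, -n show line numbers, -E extended regex.
    # grep exits with status 1 when nothing matches.
    grep -inE 'reiserfs.*(error|warning)|i/o error' "$log"
}

# Usage (path is an assumption; adjust to wherever the syslog lives):
#   scan_syslog /var/log/syslog
```

An empty result only means none of these keywords appeared; it does not prove the array is healthy.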
October 31, 2013 "I got my User Shares to show their content again all thanks to dgaschk & this forum!! It turns out I did lose quite a bit of data <SNIP>" Sorry I have no answers for you, but I have to say that you are uncommonly cheerful for someone who just lost "quite a bit of data." SMALL DISCLAIMER: I know Jon personally, and I am the one who suggested unRAID as a network-storage solution for his company. He is the fourth friend I have convinced to go unRAID...and the first of us to lose data. I really hope someone steps up and explains what happened to your build, as it seems like you did everything right but you still lost data, and for whatever reason this post has only garnered the attention of (the amazingly helpful) dgaschk. It seems to me the community would be much more interested in this. Is it because everyone has moved out of the "General Support [unRAID OS 5.0-rc" forum and into "General Support"? If so, seeing as how this is a really important issue, perhaps starting a new topic over there might be in order, with a quick summary of what transpired and what dgaschk helped you with. Also be sure to give a breakdown of your system specs--maybe you'll get more attention if you give some concrete hardware details of your build. Good luck.
This topic is now archived and is closed to further replies.