
[Solved] Cannot access over the network, parity swap, and permissions... Oh my!



So, I attempted to update from 5.06a to 5rc3 (after making a backup). I ran into permission issues, as others have, so I attempted to revert to 5.06a. Now I can no longer access the server via telnet or HTTP. Locally, I can log in, and I generated a syslog file, but I can't find where the issue lies, as it's full of errors and other concerning data.

 

Checking with ifconfig yields the same IP as shown in my router's connection table, so it looks like it's connected, but I can't access it at all. Any help would be appreciated, as I've been working on this damn thing for about 5 hours, and this is our "TV server instead of cable" option, so it's drawing wife and kid aggro. Cheers,

  hlid.

syslog-hlid.txt


Okay, I managed to reboot it using "shutdown now" (which took 2 hours), and after booting I once again have access to the array via the console and the web GUI. Obviously I have some serious trouble with disk 1 and need to replace it. Sadly, I had hoped to replace the parity drive first, since parity has to be the largest drive and I have one new 3TB drive to install but only 2TB drives in the array. I'm getting thousands of errors on disk 1 now, although the array has not yet removed the disk.

 

Does anyone know of any option other than buying another 2TB drive, letting parity rebuild onto it, and then replacing the parity drive with the new 3TB drive?


Okay, I replaced the failed disk with the old parity disk and put the new 3TB disk in as parity. I booted the server, but I cannot start the array to initiate the parity swap. Also, my 5th disk is showing a dark blue ball. Any ideas, folks? Attached is a screenshot.

 

Edit: Oh, and strangely enough, I could only access it by IP address for both telnet and the GUI. The server's name is "Munin", but for some reason it won't connect using the name (Win7/Firefox).

Main.jpg


You did not do as instructed.

 

The "swap-disabled" procedure can ONLY be used if the disk to be replaced is ALREADY disabled, all the other disks are working, and no other changes are being made. Your data disk was not ALREADY disabled.
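The rule above boils down to a simple predicate. This is a hypothetical Python model for illustration only: `Disk`, `swap_disable_allowed`, and the `disabled`/`other_changes` flags are made-up names, not part of unRAID, and "working" is assumed here to mean "not disabled".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    name: str
    disabled: bool  # hypothetical flag: True once unRAID shows the disk as disabled ("red")


def swap_disable_allowed(disks: List[Disk], target: str, other_changes: bool) -> bool:
    """Hypothetical model of the swap-disable precondition described above."""
    # The disk being replaced must ALREADY be disabled.
    target_disabled = any(d.name == target and d.disabled for d in disks)
    # Every other disk must be working (modelled here as "not disabled").
    others_working = all(not d.disabled for d in disks if d.name != target)
    # No other assignment changes may be made at the same time.
    return target_disabled and others_working and not other_changes


# The situation in this thread: disk 1 was failing but not yet disabled.
array = [Disk("parity", False), Disk("disk1", False), Disk("disk5", False)]
print(swap_disable_allowed(array, "disk1", other_changes=False))  # False
```

Once the failing disk has been unplugged and the array started without it, marking `disk1` as disabled makes the same check return True.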

 

So... to attempt to get back to where you were, put all the disks back where they were. Put the old parity disk back in place as parity. Remove the new disk entirely.

 

See if the array will then let you start it, even with the data disk having errors.

If yes... good. If not, post a new syslog and a screenshot if necessary.

 

Then, if the array starts, stop it and power down. Unplug the disk you will be replacing (the failing disk).

Power up and start the array without it. It will show as "disabled". (Actually, you'll need to check the "I'm sure" checkbox to start the array.)

 

Once it shows as disabled, with all the other indicators green and that one failing disk "red", stop the array once more. Then take the NEW disk and assign it as parity. Assign the original parity disk in place of the failed disk.

 

At that point, the array should say something to the effect that it will perform the parity swap-disable process when you next start it. When you do start the array, it will first copy the parity information from the old parity drive to the new one and then proceed with the reconstruction.

 

The unRAID array will probably be offline until after the parity copy is complete. That process can take 8 to 10 hours or more, as the entire 3TB drive must be written. (At 100 MB/s it takes 10 seconds to write 1 GB. There are 3000 GB to write: 3000 * 10 = 30000 seconds, or about 8.3 hours.) Most disks can NOT sustain writes at 100 MB/s, and you will be both reading the original disk AND writing the new one, so expect the effective rate to be somewhere between 50 and 80 MB/s.
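The back-of-the-envelope estimate above extends to any rate. A small Python helper (the name `copy_hours` is illustrative), assuming decimal units and a constant sustained rate, and ignoring the rebuild that follows the copy:

```python
def copy_hours(capacity_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to write capacity_gb at a sustained rate (decimal units: 1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# Best case (100 MB/s) plus the 80 and 50 MB/s effective rates suggested above.
for rate in (100, 80, 50):
    print(f"{rate:>3} MB/s -> {copy_hours(3000, rate):.1f} hours")
```

At the 50 to 80 MB/s effective rates, the copy of a 3TB parity drive alone works out to roughly 10.4 to 16.7 hours.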

 

Only after the parity disk is copied will the array come back online as the data disk gets re-constructed.

 

Joe L.


Thank you very much. I thought I had done it correctly (start the array with the disk, shut down, remove the disk), but apparently I had not successfully started the array with the disk disabled; I had just powered it on. This time I got it right and was given the option for the parity swap. So I'm off to work my 12-hour shift today, and hopefully when I get home the server will be up and running.

 

Any ideas on why I can now only log in via IP address on telnet and the GUI, instead of via the server name? That one still confuses me.

 

I can't stress enough how grateful I am for the help. I generally just use google-fu to figure out what I need to, but I literally had to throw my hands up in the air before posting. From all the threads I've read, this is a great community. Kudos, folks!


Okay, I have good and bad news. The good news is that the parity swap worked and appears successful. (For those wondering whether it works on RC4, there's some hope.) I also seem to be able to log in again via telnet and the GUI using the server name, not just the IP! It didn't work with "Munin/Main" as bookmarked, but simply entering "Munin" auto-redirected to a working "Munin/Main" page. Strange.

However, I still have two issues.

 

1) On any machine but my main computer, I get "access denied" on the folders/files in the shares. I've run the permissions script, but there's no change. My main machine runs into "Read Only" issues as well. Not all files are affected, just most, it seems. I'll work on it, as there are a few things on the forum I haven't tried yet, but I'm way too tired to try right now. (I'm switching to night shifts tomorrow, so I get a long sleep tonight and a few hours alone to tinker tomorrow afternoon.) Hopefully it's easy, as I had not run into this issue before the upgrade.

 

2) Disk 5 shows in the GUI as part of the array, but unformatted. No files/folders are visible, and the format type is listed as "unknown". I think I might have briefly initiated a format earlier, while I was following the wrong procedure and trying to add it to the array (see the post above about the "blue ball"), but I managed to shut the server down quickly. A foolish mistake, I know; I blame fatigue for misreading plain language. If the data is still there, awesome, and there are still some things left to try that I recall reading about on the forum. But I've been up 19 hours on 4 hours of sleep, and I've learned to my chagrin not to attempt complex computer issues dead tired, so I'll look at it again tomorrow. I do have a couple of quick questions, though, if someone knows off the top of their head:

 

    - Since the new parity drive was a copy of the original one, can I trick the array into thinking Disk 5 failed and has been replaced, and then let it recover the data? Or, when disk 1 was restored onto the old parity disk (now the new disk 1), was parity rebuilt using only the available drives? I'm not sure whether the parity swap was a straight swap or a swap/rebuild.

 

    - If the drive can't be recovered, will the array still treat it as precleared and just do a quick format, or would it take N hours, making it better to pull it from the array, preclear it again, and then reintroduce it as a new disk to start storing data on it again? Nearly all of its data was TV/movies, so it's more a matter of time to replace than anything.

 

Cheers,

  hlid.

Disk5_Unformatted.jpg
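For anyone puzzling over the same question, here is a minimal sketch of single-parity reconstruction, assuming plain XOR parity (which is how unRAID's single parity disk works); the byte values are made up. Any one missing disk can be recomputed from parity plus all the surviving disks.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, two bytes each.
data = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55"]
parity = xor_blocks(data)

# Pretend disk index 1 failed: XOR parity with the surviving disks to rebuild it.
rebuilt = xor_blocks([parity, data[0], data[2]])
print(rebuilt == data[1])  # True
```

The catch: parity tracks every write made through the array. If the partial format wrote to disk 5 while the array was running, parity was updated along with it, so a rebuild would reproduce the disk's current contents rather than the data from before the format.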


Just a bit of an update (and again, thanks for the continuing support and input). I ran reiserfsck --check and then --rebuild-sb as recommended. I'm currently attempting to rebuild the tree, although it's likely going to take about 7 hours or so if the rate remains constant. I have my fingers crossed, but honestly, if I have to fully reformat this disk I won't cry; it'll just take time to rebuild. (Nothing vital on it, just media.)

 

As for the permissions, I tried running the script once again, but it seems to stall after a bit and just reset. From the fsck errors on the tower's display, I'm going to assume it's likely a problem with either disk 5 being fubar or an error (possibly fixable) on the new disk 1 (the old parity disk) from the swap. I'll try it again once I've completed the rebuild. Strange, though: this was only an upgrade from 5.06a, not a beta 4, and from what I've read, either that or the fact that I'm now on 5rc4 should have fixed/eliminated the "read only" errors on my admin account and the read errors on the others. (Every file was created on the drive from a user account, not a root account.) Ah well, one issue at a time, eh? Cheers.


My odyssey continues, so another update. :D

 

It looks like the --rebuild-tree was mostly successful. Only about 250GB of 1750GB of data wound up in the lost and found, and anything that isn't media is very easy to identify and has duplicate backups! Unfortunately, due to my continuing permissions issues, I can't do squat to rename/delete/move/copy the data. Having said that, after running a parity update, the array is running far more smoothly than before the rebuild.

 

I then ran the new permissions script again, and once again it crapped out on disk 1. The log shows an out-of-memory error, and the captured syslog was over 0.5GB! Since I can't attach that, I used the log in the GUI and found this (two separate excerpts):

Jun 9 16:59:37 Munin kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 7861 does not match to the expected one 1
Jun 9 16:59:37 Munin kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 398721025. Fsck?
Jun 9 16:59:37 Munin kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 7861 does not match to the expected one 1
Jun 9 16:59:37 Munin kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 398721025. Fsck?
Jun 9 16:59:37 Munin kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 7861 does not match to the expected one 1
Jun 9 16:59:37 Munin kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 398721025. Fsck?
Jun 9 16:59:37 Munin kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 7861 does not match to the expected one 1
Jun 9 16:59:37 Munin kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 398721025. Fsck?
Jun 9 16:59:37 Munin kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 7861 does not match to the expected one 1
Jun 9 16:59:37 Munin kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 398721025. Fsck?

Jun 9 17:00:06 Munin kernel: [ 1549] 0 1549 873 532 2 0 0 awk
Jun 9 17:00:06 Munin kernel: [22375] 0 22375 593 226 3 -17 -1000 udevd
Jun 9 17:00:06 Munin kernel: [22376] 0 22376 593 202 1 -17 -1000 udevd
Jun 9 17:00:06 Munin kernel: [22506] 0 22506 7454 224 2 0 0 shfs
Jun 9 17:00:06 Munin kernel: [22521] 0 22521 2296 480 2 0 0 nmbd
Jun 9 17:00:06 Munin kernel: [22523] 0 22523 3955 931 2 0 0 smbd
Jun 9 17:00:06 Munin kernel: [22529] 0 22529 3958 342 3 0 0 smbd
Jun 9 17:00:06 Munin kernel: [22681] 0 22681 494742 494399 1 0 0 chmod
Jun 9 17:00:06 Munin kernel: Out of memory: Kill process 22681 (chmod) score 479 or sacrifice child
Jun 9 17:00:06 Munin kernel: Killed process 22681 (chmod) total-vm:1978968kB, anon-rss:1977136kB, file-rss:460kB
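The numbers in the last "Killed process" line are easier to judge once converted from kB. A minimal Python sketch; the helper name `parse_killed` is illustrative, and the regex only assumes the line format shown above:

```python
import re
from typing import Dict, Union

LINE = ("Jun 9 17:00:06 Munin kernel: Killed process 22681 (chmod) "
        "total-vm:1978968kB, anon-rss:1977136kB, file-rss:460kB")


def parse_killed(line: str) -> Dict[str, Union[int, str]]:
    """Pull the PID, process name and kB counters out of an OOM 'Killed process' line."""
    m = re.search(r"Killed process (\d+) \((\S+)\) (.*)", line)
    if m is None:
        raise ValueError("not an OOM 'Killed process' line")
    info: Dict[str, Union[int, str]] = {"pid": int(m.group(1)), "name": m.group(2)}
    # Each counter looks like "anon-rss:1977136kB".
    for key, kb in re.findall(r"([\w-]+):(\d+)kB", m.group(3)):
        info[key] = int(kb)
    return info


info = parse_killed(LINE)
gib = info["anon-rss"] / 1024 ** 2  # kB -> GiB
print(f"{info['name']} held {gib:.2f} GiB of anonymous memory")  # chmod held 1.89 GiB of anonymous memory
```

So a single chmod process was holding close to 1.9 GiB when the kernel killed it, roughly half the RAM of a 4GB server.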

 

I'm running 4GB of memory on the server, so it's obviously not normal memory usage causing the issue. The first block of text above scrolled on the server's monitor so fast that it read almost like a single page of static text. So I attempted to exclude disk 1 from the script run by stopping Samba, unmounting disk 1, and starting Samba again:

- samba stop

- umount /dev/md1

- samba start

 

This seemed to partially do the trick: when I ran the script afterward, it actually progressed through disks 2 to 5. However, on disk 5 it hung again, as it had on disk 1, with this output:

Jun 9 17:00:06 Munin kernel: [22376] 0 22376 593 202 1 -17 -1000 udevd
Jun 9 17:00:06 Munin kernel: [22506] 0 22506 7454 224 2 0 0 shfs
Jun 9 17:00:06 Munin kernel: [22521] 0 22521 2296 480 2 0 0 nmbd
Jun 9 17:00:06 Munin kernel: [22523] 0 22523 3955 931 2 0 0 smbd
Jun 9 17:00:06 Munin kernel: [22529] 0 22529 3958 342 3 0 0 smbd
Jun 9 17:00:06 Munin kernel: [22681] 0 22681 494742 494399 1 0 0 chmod
Jun 9 17:00:06 Munin kernel: Out of memory: Kill process 22681 (chmod) score 479 or sacrifice child
Jun 9 17:00:06 Munin kernel: Killed process 22681 (chmod) total-vm:1978968kB, anon-rss:1977136kB, file-rss:460kB

 

So, if I understand things correctly, I have to somehow fix the problems on disk 1 and disk 5 and then re-run the permissions script successfully. That should let me deal with the data in disk 5's lost and found. Then, with the new user share I created (set to read-only) for all the household HTPCs, things might just work the way they did before this mess.

 

So, what I'm not sure about is how to go about fixing those errors. I'd attach a syslog if they weren't so huge, but if specific data from them would help, I can post an excerpt. I'd also like to know if I'm on the right track with the permission issues. Any input is always appreciated. Cheers,

  hlid.
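One way to turn a half-gigabyte syslog into a postable excerpt is to count repeated messages once the timestamp and hostname are stripped, since lines like the REISERFS warnings above repeat thousands of times. A minimal Python sketch, assuming the "Mon D HH:MM:SS host" prefix seen in these logs; `summarize` and the script name are illustrative, not an unRAID tool:

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the "Jun 9 16:59:37 Munin " prefix so identical messages group together.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+")


def summarize(lines: Iterable[str], top: int = 20) -> List[Tuple[str, int]]:
    """Return the `top` most frequent messages with their counts."""
    counts = Counter(PREFIX.sub("", line.rstrip("\n")) for line in lines)
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_syslog.py saved-syslog.txt
    with open(sys.argv[1], errors="replace") as fh:
        for message, n in summarize(fh):
            print(f"{n:>8}  {message}")
```

The output is one line per distinct message with its count, which keeps the useful information (which device, which block) while dropping the repetition.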


Home from work again and managed to try a few things. It looks like I've narrowed down at least the script problem on disk 5: the script seems to fail on the lost and found folder. I'll assume that's because it doesn't know how to handle files with unknown extensions and such. The new user share, plus running the script over most of the array, seems to have done the trick for the permissions issues, so it's not all bad, and the family is once again able to watch TV/movies! I shall reward myself with a fine glass of ale later. If any of my helpful commenters are in the Vancouver area, come by for a pint on me. ;)

 

I figure disk 1's troubles were mostly caused by the original disk failure followed by the panic-relieving parity swap. When I get my warrantied disk back, I'll likely just use it to replace the current one with a parity rebuild, and maybe I'll get lucky. Then I'll format that one and do the same for disk 5. I'm hoping that the errors stem from the format of the disks and not the data. *crosses fingers*

 

As for how to access/rename/move/delete the lost and found files, I'm open to suggestions if anyone has any. Ideally I'd just like to move them to my desktop PC for processing and get them off the array entirely for now, but it won't let me do that, let alone delete them afterward. Time for bed in the meantime. Cheers!

hlid.
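If the recovered files come back without usable names or extensions, their first few bytes usually reveal the format. A rough Python sketch, assuming a handful of well-known file signatures (Matroska, AVI, MP4-family, JPEG, PDF); the function names and the `/mnt/disk5/lost+found` path in the usage comment are illustrative assumptions, and the script only reads files, it never renames or moves them:

```python
import sys
from pathlib import Path
from typing import Iterator, Tuple


def guess_type(header: bytes) -> str:
    """Guess a file type from its leading bytes using a few well-known signatures."""
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "mkv"  # EBML header used by Matroska/WebM
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[4:8] == b"ftyp":
        return "mp4"  # ISO base media family (MP4, M4V, many MOV files)
    if header.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if header.startswith(b"%PDF"):
        return "pdf"
    return "unknown"


def scan(directory: str) -> Iterator[Tuple[str, str]]:
    """Yield (file name, guessed type) for every regular file directly in `directory`."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                yield path.name, guess_type(fh.read(16))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python guess_types.py /mnt/disk5/lost+found
    for name, kind in scan(sys.argv[1]):
        print(f"{kind:8} {name}")
```

Recovered subdirectories are skipped here; swapping `iterdir()` for `rglob("*")` would walk into them as well.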


How long did you let the permissions script run? I'd give it at least 24 hours.

 

Well, before the fixes the script would crash 2 minutes in on disk 1. After unmounting disk 1, the script seemed to get through to disk 5 in about 10 minutes. It's moot now, though: I've fixed my issues!

 

I ran reiserfsck --check on disk 1 (the one that used to be the parity drive before the swap) and then --rebuild-tree as recommended. It took 8 hours and 4 passes, but it came up clear in the end. Then, before bed last night, I ran the script again and left it. The server is now working as it did before the disk crash and my partial format of disk 5, except that I'm now running RC4 and have a 3TB parity drive. The whole fiasco seems to have cost me only about 200GB of data (mostly media) and left 1TB worth in the lost and found, 95% of which is easily ID'd and copied back.

 

I can't thank you enough for your help, folks; this is a wonderful community for experts and noobs alike. I've been using my Unraid system since March last year, and in fact used it to completely replace cable TV in the house. The family loves it, and I just ordered my first Norco box to expand in the future. Cheers from Vancouver!

  hlid.


Archived

This topic is now archived and is closed to further replies.
