Oddwunn

Members
  • Posts

    111

Converted

  • Gender
    Undisclosed

Oddwunn's Achievements

Apprentice (3/14)

0

Reputation

  1. "Syslog is missing a lot of time, can't see the rebuild, disk21 needs a filesystem check, you can try that before using the old disk, but without seeing what happened with the rebuild probably best to use the old disk as there might be data corruption." I am pretty sure that the 4TB disk 21 was trashed (or maybe defective out of the box) during the rebuild, so I want to pull it out of the array. I just want to make sure that the 8 steps I outlined above are EXACTLY correct because if I am wrong, I might inadvertently rebuild the 3 TB disk with the bad parity information. Are steps 6, 7, and 8 precisely correct? That is, when I reassign the parity drive and then start the array, will parity be newly recreated rather than rebuilding disk 21 with the old, bad parity? Thank you for your help!
  2. Oops...sorry. Here it is. tower2-diagnostics-20210219-0733.zip
  3. I have been replacing 3TB disks in my array with larger 4TB disks. The first 2 went in just fine (one at a time, of course), and the array properly rebuilt the new disks with the data from the old disks. When I got to the 3rd replacement (disk 21), the data rebuild was horrendously slow, with the WebGUI reporting that the rebuild would take 28 days to complete. At that point I tried, while the rebuild was in progress, to copy the data from disk 21 to disk 15, which was completely empty. I did this by using PuTTY and running MC to do the disk copy internally in the server. That internal transfer was not working, so I aborted the MC copy to disk 15. When I did that, the data rebuild suddenly, for no apparent reason, picked up speed tremendously and reported that it would be complete in 9 hours (normally this rebuild takes 18 hours on this server). When the rebuild was complete, I discovered that disk 15 was no longer mountable, so I stopped the array, restarted it, and told it to format any unmountable drives. Disk 15 was then formatted and was now mountable, parity began rebuilding itself, and I thought all was well .... until I tried reading disk 21, the disk that had been rebuilt with the weird slow/fast rebuild time. As it turns out, disk 21 was completely trashed, with most of the files gone and a directory that I can't even enter. So now I am trying to save disk 21 by replacing the trashed 4TB drive with the original 3TB drive, which should be fine. Please let me know if I am planning this correctly:
     1. Stop the array.
     2. Unassign the parity drive.
     3. Replace the trashed 4TB drive with the original 3TB drive.
     4. Start the array to check that the files are ok on that drive.
     5. Stop the array again.
     6. Hit the "new config" button and tell it to keep current assignments (I have never tried a "new config", so this is where I want to make sure that I don't screw up).
     7. Reassign the parity drive as per before.
     8. Start the array once again, and parity should be rebuilt from scratch, right?
     Is this the correct way to fix my problem, or is there a more efficient or otherwise better way to get the job done?
  4. Hmmm.....I thought it was Seagate who bought them. Well, I told ya I am a moron. Ah, crap! The finest consumer level drives ever made, and now we won't be able to buy them soon. Edit: I should qualify my statement, as "finest" is too broad a term. I should have said "most dependable" instead, as there have always been better performers in other parameters. No matter...I will still stay as far away as possible from Seagate. Virtually every one of the ~75 to 80 drives that I have bought in the past have failed (most of the time catastrophically) while in use. Seagate is a failure looking for a place to happen. I still have 4 more of them in use, so now that I know that I won't be seeing HGST being sold for much longer, I will swoop up as many as I can before they disappear completely.
  5. Ok, thanks again, Johnnie.Black! I will try that. The data I lost simply represented a lot of work, not a lot of irreplaceable data. I am happy and thankful that parity has saved several disks in the past...I just have to remember NEVER to buy another Seagate drive again, as I have about a 90% failure rate with them, while maintaining a 0% failure rate with Hitachi/HGST (out of about 500 drives that I have bought over the years...weird considering that Seagate now owns HGST). Anyway, many thanks for all of your help!
  6. Ok, as suggested by tee-tee jorge, I performed a check filesystem on the EMULATED disk2, and sure enough, it asked for -L (to clear a log?) and then the check was performed. The result was that the emulated disk no longer comes up as "Unmountable: no file system", though the red "x" is still there because the disk is not physically installed. Looking at the contents of the disk (it was originally a 4TB disk filled with about 3TB of data), I found that I have roughly 2TB of data which seems to be recovered and about 1TB of data in the "lost and found" directory, with all of that data put into separate numerical directories, and I have no way of knowing which data goes where. Unless anyone has any idea of something else I can do, I think I should just accept that I have lost disk2, and I will research the procedure in the wiki as to how to install a new disk2, with no data on it, simply formatted and ready to use with NEW data. My remaining concerns, though, are along these lines: 1. It seems to me to be suspiciously coincidental that all of these problems started only when I updated to 6.8.2. (Read my original post to understand what I mean by "all of these problems".) 2. Maybe I have another hardware issue that is wreaking havoc on my server? 3. Is there any possibility that reverting back to 6.7.2 could clear up any of this mess, or would I be simply adding fuel to the fire? 4. I guess parity protection is useless when the problem disk develops a format problem. It would have been nice to be given a heads-up by UnRAID that the disk should be replaced or reformatted, while still protecting the data on it. As it stands now, once the format goes bad, UnRAID thinks that the bad format needs to be protected in addition to the data, or at least that is what it looks like to me. Right now I am scared to use this server at all. Here are the latest diagnostics, in case anyone wants to see them: tower2-diagnostics-20200222-1140.zip
  7. Great! Thanks tons! I didn't realize I could check the filesystem on an emulated disk. I will do so and report back as soon as the diskcopy I am currently running is finished.
  8. Ok, here it is....:) (Sorry...I meant to dl and attach it earlier.) tower2-diagnostics-20200220-1239.zip
  9. I have a 23 data drive, 1 parity drive server that had been running great for the last 4 or 5 years. A few days ago I upgraded from version 6.7.2 to version 6.8.2. When I rebooted the server to effect the update, I looked at the dashboard and noticed that I had a single SMART error on 21 of the 23 data drives, "UDMA CRC error count 1", and then acknowledged the error in order to clear it (21 times, of course). When using the web interface, I checked my user shares and discovered that one of the shares, "Games" (spanning disks 1 - 5), was empty, though all of the data appeared intact on all of the individual disks. I tried rebooting the server a couple of times, and after the second time I found disk 3 reading as "not installed" and disabled in the array. Being the total dummie-moron that I am, I immediately replaced the drive and then allowed the software to rebuild the replaced drive. All went well and the new drive looked fine, so I thought that the only problem I had left was to try to fix the user share "Games" that was still empty, thinking that maybe the bad disk had somehow affected my user share. I finally gave up trying to fix the share and figured that I would just access the "Games" drives one disk at a time until I could research a solution. I started to write some data to disk2 today and found that I could not rename the directory I had just written....hmmm. I copied a couple more directories to disk2 with no issues and then, for no apparent reason (to me, anyway), the entire contents of disk2 disappeared. I rebooted the server and the next time it came up I found disk2 with a red "X", listed as "Unmountable: no file system". I then ran xfs_repair (since all of my disks are formatted XFS) from the console using the -nv switches. It took only 4 seconds, but the log it generated was absolutely HUGE. I am guessing that every single file on the disk has been trashed (not by the repair utility, since it was in read only mode, but by something else unknown to me), so I chose to shut down the server and remove the trashed disk2. I tried to mount disk2 on a Windows 10 machine using a Linux utility from Paragon Software, but Windows won't mount the drive either. Is there any way to recover the data on disk2, or is it trashed completely? Would reverting back to 6.7.2 while the disk is missing give me the opportunity to replace the disk or otherwise recover the data? Are all of these problems that popped up since upgrading to 6.8.2 just a coincidence, or could the new version be causing problems? (I have another 23 disk server that upgraded with no issues whatsoever, but it does have a slightly newer MB, CPU, and SATA cards.) I have the problem server online and am copying data as fast as I can to another server before more problems pop up. I should be done in about a month. Anyone have any advice as to the easiest way to get out of this dilemma, even if it isn't the cheapest way to get things back to normal?
  10. It has been almost 24 hours since I followed the above advice, and so far everything is working properly. I am not technically educated enough to know why this solution works, but I am just thrilled to have things working properly with both servers. Many thanks to everyone who contributed to this thread. You guys really know your stuff!!
  11. Ok, got it...I will do that next. Yes, that is correct. Ok...got it. Nope...I don't own anything with Apple's name on it. While I have been composing this reply, I made the changes the 2 of you suggested, and now the server has a distinctly different "feel" to it (for lack of a better word). The webUI responds very quickly now, where it once seemed lethargic even when it was working. I am testing it further by writing several large files to it in order to see how it responds over a longer period of time. I will report back here as soon as I have the results (some time tomorrow, as I am writing about 2TB of files.) Once again, thank you for hanging with me with this problem, as I am very hopeful that the problem may be solved. I have no idea where to find that information.
  12. Hi guys, Thanks for the responses, though admittedly a lot of it is over my head. My original post #3 reads: Over the last few days, I will amend that statement to now read "SOMETIMES connecting via the IP address works and sometimes it doesn't." But the main issue here is not in accessing the rogue server via the webUI, as MOST of the time I have no problems connecting. The IMPORTANT, MAIN problem is in writing to the array disks. Writing to the array is a hit or miss proposition at best. I can read from it 100% of the time, but sometimes I can write to it, sometimes I can write to it but it bails out part way, and sometimes I can not write to it at all. Since this is a media server with large MKV video files, I would be panicking big time if I could not read from the array reliably. My workaround solution to writing to the array is to write the files to the cache disk, open a PuTTY window, and then use Midnight Commander to write the files to the final destination array disk, a slow and cumbersome process at best. To answer some of the recent questions:
      1. Yes, I can control the range of IPs available to be handed out by DHCP. Should I limit the DHCP handout range and then assign a static IP outside of the range to the rogue server?
      2. "On the array server having issues go to Settings > Network Settings and check the Network protocol it is running on the active interface (IPv4, IPv6, both?)" I don't see anything even remotely close to that. I see one section labeled "Interface eth 0" that displays the MAC address, several IPs, and 2 settings which allow me to choose between static and automatic IP. There is nowhere there that provides me with values like "IPv4, IPv6, or both", and since there is only a single interface, there is no other interface to look at.
      3. Tstor said, in an earlier post, "Clean up the DHCP setup (static IPs OR reservations, but not both)", yet kizer said "2. Set each machine to a static ip address. 3. Login to router and set those two address's as reserved". Which one is correct?
      4. Opening a command prompt and typing arp -a yields perfect results - right now.
  13. I have set up a DHCP reservation on my router, changed the IP to "automatic" in unRaid and rebooted both units, yet the problem persists. I can not find any other computer in my LAN named "Tower1". I have attached the latest syslog to this message in hopes that there will be more information there that might provide a clue as to what is going on. Like I mentioned, I ordered a new NIC to install in the server, so I will change that card as soon as it arrives on Friday. The syslog seems to indicate some sort of error with the WebUI, yet the WebUI seems to be working perfectly. The problem is still the same - Tower1 simply chooses to refuse the connection at random times, for random amounts of time, on random disks within the array, yet ALWAYS writes to the cache disk without issue. Is there anything else I can try while waiting for the new NIC to arrive? OddwunnSyslog2.txt
  14. When you wrote that, I started thinking and realized that my MB had 2 RJ45 ports included on the back panel. As I remember, they failed, so I bought a PCI card with a single RJ45. When I read your statement above, I wondered if I had disabled the 2 onboard RJ45s in the mobo's BIOS. I just rebooted the server, entered the BIOS, and found that both RJ45 connectors had been disabled, but I have to wonder if they still could somehow be the source of my conflict. Possible? Since I have reservations set up on my router, I can't leave the static IP setup in unRaid? That is, I should switch unRaid back to "automatic"? So, I should switch the server back to "automatic" in 2 places: 1. settings/IP address assignment = automatic 2. settings/DNS server assignment = automatic. Is this correct? BTW, I ordered a new PCI NIC from Amazon - Intel PWLA8391GT PRO/1000 GT PCI Network Adapter. Is that a good choice?
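
For readers following the filesystem checks mentioned in posts 6 and 9 (a read-only xfs_repair run with the -nv switches, and a later run that asked for -L), here is a minimal sketch of how those invocations could be assembled. The helper name `xfs_check_cmd` is hypothetical, and the device path `/dev/mdN` is an assumption about how an array disk is exposed when the array is started in maintenance mode on releases of that era; verify the path on your own system. The helper only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an xfs_repair command for array disk N.
# Assumption: disk N is exposed as /dev/mdN with the array in maintenance mode.
xfs_check_cmd() {
    disk_num="$1"        # array slot number, e.g. 2 for disk2
    mode="${2:-check}"   # check | repair | zero-log
    case "$mode" in
        check)    flags="-nv" ;;    # -n: no-modify (read-only), -v: verbose
        repair)   flags="-v" ;;     # actual repair, verbose
        zero-log) flags="-v -L" ;;  # -L: zero the log; pending log updates are discarded
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "xfs_repair $flags /dev/md$disk_num"
}

# Read-only check as described in post 9, then the -L variant from post 6.
xfs_check_cmd 2 check
xfs_check_cmd 2 zero-log
```

Printing the command first is a deliberate choice: -L throws away whatever is still in the filesystem log, so it is worth confirming the target device before running it.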