March 22, 2014
Well, almost. This morning I could not access my server (\\Tower) from my PC. I used PuTTY to telnet in and ran ls /mnt; only disk4 was there (out of 9 disks). I tried to reach the unRAID menu through the browser, but it just hung. Finally, I tried a shutdown and reboot. After the reboot, the situation was unchanged. I copied the /var/log/syslog file to /boot and then tried to view it with "less"; the telnet session hung. I opened a new telnet window, logged in, and tried to read the log again, and it hung again :-(
I then shut down, removed the flash key, put it in my PC, and copied everything (including the syslog, which is attached). I don't see anything strange in the syslog, but I am no expert at reading them. If someone could have a look and let me know if I missed something, I would appreciate it.
What now? I suppose something could be wrong with the flash key itself: when I mounted it on the PC, it complained and offered to run chkdsk. Perhaps it is corrupted? But I can still read the files. Should I put a fresh copy of bzimage, etc. on it and try again? Really stuck here. Thanks for any help.
Attachment: syslog.3-22-2014.9.zip
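Since "less" hung on the server, one workaround is to scan the copied syslog on the PC instead. A minimal Python sketch, assuming the syslog has been copied off the flash key as a plain text file; the keyword list is a hypothetical starting point, not an official unRAID or kernel error vocabulary:

```python
import re
import sys

# Hypothetical keyword list: generic words, not an official unRAID or
# kernel error vocabulary. Adjust to taste.
SUSPICIOUS = re.compile(r"error|fail|timeout|reset|i/o|corrupt", re.IGNORECASE)


def suspicious_lines(lines):
    """Yield (line_number, text) for each line matching a suspicious keyword."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the copied syslog on the PC; undecodable bytes are replaced, not fatal.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in suspicious_lines(log):
            print(f"{number}: {text}")
```

Saved under a hypothetical name such as scan_syslog.py, running it with the log path as the only argument prints each matching line with its line number. A keyword scan only narrows things down; a clean scan does not prove the log is healthy.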
March 22, 2014
A couple of things I see: the array is not set to auto-start, so /mnt will not be populated until you start the array manually. Also, your log file cuts off just after the eth0 link came up, so there was no network access to unRAID for the duration of the log you posted. Post a log file covering a longer period if things are still 'broke' and we'll see what it might tell us.
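To check how much time a log actually covers before posting it, a small sketch can read the first and last timestamps. It assumes classic syslog lines that start with a 15-character stamp such as "Mar 22 09:12:33" and that lines are in chronological order; because those stamps carry no year, the function supplies one (2014 by default, an assumption) only so the dates parse:

```python
from datetime import datetime


def syslog_span(lines, year=2014):
    """Return (first, last, duration) for classic syslog lines, or None if none parse.

    Lines are assumed to start with a 15-character stamp such as "Mar 22 09:12:33".
    That stamp has no year, so `year` is supplied only so dates like Feb 29 parse.
    Lines without a stamp (e.g. wrapped continuation lines) are skipped.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue
    if not stamps:
        return None
    return stamps[0], stamps[-1], stamps[-1] - stamps[0]
```

A duration of only a minute or two, ending right after the eth0 link comes up, would match what was seen in the log posted above.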
March 22, 2014 (Author)
Thanks, but I don't know how to start the array if the unRAID web menu is not available. When I try to open /Tower or /Tower:8080 (for unMENU), the browser hangs, so I assume that the web server is not running. Is there some way to start it manually via telnet or the console?
Here is some new news: I put the flash back in the server and tried to boot it. It would not even power up! The CPU fan jumps briefly but it will not start running. So I must have a hardware problem (dead motherboard?) that I need to debug first. I think this will mean disassembling the entire machine :-( When I get it to boot up again, I will post again. Thanks again.
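Whether anything is listening on the web port can be checked without a browser. A minimal Python sketch, run from the PC, that tries a plain TCP connection; the host name "tower" and the ports in the usage note (23 for telnet, 80 for the unRAID web GUI, 8080 for unMENU) are assumptions based on this thread's setup:

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and connects; closing it right away is enough.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, looping over (23, 80, 8080) and printing port_open("tower", port) for each. If 23 connects but 80 does not, nothing is accepting connections on the web port; if 80 connects but the browser still hangs, the stall happens after the connection is made.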
March 22, 2014
Hardware for sure. Make sure nothing is caught in the CPU fan blades and preventing it from running. Watch the boot-up on the console... Check your BIOS and make sure that the flash drive is in the boot sequence. Eventually, you'll want to run chkdsk (PC) or Disk Utility (Mac) on your flash drive.
March 23, 2014
Quote: "It would not even power up! The CPU fan jumps briefly but it will not start running."
The first thing I think of when that happens is either the power supply or the motherboard. Try unplugging everything from the motherboard except the power supply and CPU, then see if you get the missing-memory error beep codes. If it still won't fully power on, unplug all the drive power connectors from the PSU and try again. If you get beep codes, add the RAM back and see if it POSTs. Add parts and plugs back one at a time until it either runs or you get to the bad component. Leave your unRAID USB out until you get it POSTing properly with all the drives connected.
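The add-parts-back-one-at-a-time procedure is a linear search for the first part that breaks things. A toy Python sketch of that logic; boots is a stand-in for you pressing the power button and watching for the expected sign of life, and the component names are illustrative only:

```python
def first_bad_component(components, boots):
    """Re-add components one at a time; return the first one after which boots() fails.

    components: parts in the order you plan to reconnect them (illustrative names).
    boots: callable given the list of parts currently connected; returns True if
           the machine gets as far as expected (beep codes with no RAM, POST later).
    Returns "minimal configuration" if it fails with nothing re-added, or None if
    everything works together.
    """
    if not boots([]):
        return "minimal configuration"
    connected = []
    for part in components:
        connected.append(part)
        if not boots(list(connected)):
            return part
    return None
```

Stopping at the first failure is the point of the method: the part just added is the prime suspect, although a marginal power supply can also fail only once enough load is connected.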
March 23, 2014 (Author)
Sounds like a good plan. I doubt the power supply is bad; it's a nice Seasonic 800. The motherboard is a Gigabyte, so probably that. It is only 2 years old! Man, I hate debugging this stuff. The wife is out of town and I was going to binge on movies :-(
March 24, 2014 (Author)
OK. Strange happenings. The server started right up when I powered it on to see if the BIOS settings had somehow changed. I was convinced that the problem was DIMM seating, but I ran the memory test and it passed!
So here is the situation: Linux boots, but none of the disks are mounted. Some commands, like cat and less, hang the telnet session, while others like ls work fine???
Here is what I plan next: get a fresh copy of the bz files, reformat the flash, and then copy the new files over.
Questions:
1. I think all I need to restore my system is:
   bzimage
   bzroot
   memtest
   syslinux.cfg
   syslinux.exe
   config:
      disk.cfg
      go
      ident.cfg
      network.cfg
      Pro2.key
      secrets.tdb
      share.cfg
      super.dat
   If these are undamaged, then the system should boot to the old configuration, correct?
If not, I will come back and ask for more advice. Thanks.
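Before reformatting, it may help to confirm that the backup copy actually contains every file in that list. A minimal Python sketch; the file names are taken from the post above (the poster's own list, not an official Lime Technology manifest), and the directory passed in is wherever the flash contents were copied:

```python
from pathlib import Path

# Poster's list from the thread, not an official manifest.
TOP_LEVEL = ["bzimage", "bzroot", "memtest", "syslinux.cfg", "syslinux.exe"]
CONFIG = ["disk.cfg", "go", "ident.cfg", "network.cfg", "Pro2.key",
          "secrets.tdb", "share.cfg", "super.dat"]


def missing_files(flash_root):
    """Return the relative paths from the lists above that are absent under flash_root."""
    root = Path(flash_root)
    wanted = TOP_LEVEL + [f"config/{name}" for name in CONFIG]
    # is_file() is False for missing entries and for directories with these names.
    return [rel for rel in wanted if not (root / rel).is_file()]
```

An empty result only means the files exist; it says nothing about whether their contents are intact. On a case-sensitive file system the names must also match exactly.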
March 26, 2014 (Author)
OK. I reformatted the flash and updated to 5.05, and it still won't start the web GUI, and most commands in a telnet session will hang the session (note that I can still start a new session; it's just that executing some commands will hang a session). To get a syslog, I have to unmount the flash and mount it on a PC. I have captured a syslog, and I have an older log captured before this behavior started. I used sdiff to get a line-by-line comparison, but I don't see anything that would explain the current behavior. My hope is that someone with more experience will see something that I don't. I opened a new question on this here: http://lime-technology.com/forum/index.php?topic=32577.0. Hopefully someone (garycase?) can help. It's been over a week now and no movies :-( Going into withdrawal.
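sdiff lines up two logs line by line, but the timestamps make nearly every line differ. A sketch that strips the classic syslog timestamp (an assumed "Mar 22 09:12:33 " prefix) before diffing with Python's difflib, so only changes in the message text show up; the "working"/"broken" labels are arbitrary:

```python
import difflib
import re

# Classic syslog prefix such as "Mar 22 09:12:33 " or "Mar  2 09:12:33 " (assumed format).
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def normalize(lines):
    """Strip timestamps and newlines so that only message text is compared."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def log_diff(old_lines, new_lines):
    """Return unified-diff lines between two syslogs, ignoring timestamps."""
    return list(difflib.unified_diff(normalize(old_lines), normalize(new_lines),
                                     fromfile="working", tofile="broken", lineterm=""))
```

Lines starting with "-" appear only in the working log and lines starting with "+" only in the broken one; a message that stops appearing can be as telling as a new error.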
March 27, 2014
Did you move this to a new topic, the one about 'how to read a syslog?' ...I'm thinking it's the same issue.
March 27, 2014 (Author)
Not really moved. I just wanted some help interpreting the syslogs. I did a side-by-side comparison of a previous log (from when the system was working fine) and the current one, where unRAID does not respond with the web GUI and many Linux commands in telnet hang the session. However, the logs don't seem to contain any useful pointers, so I am now thinking about starting over with a completely new, blank unRAID installation. I am told that this will not destroy the data on my current drives and that, assuming I can access the unRAID control GUI, I will be able to reassign the disks to their old places. I am going to post a question on exactly how to do this today.
March 27, 2014
Do this FIRST: take a screenshot of your web GUI (and save it on your PC) showing the EXACT serial numbers of your drives and their assignments. In theory, you can figure it out without it. In practice, you'll want to DOUBLE CHECK that you've assigned the same PARITY drive (and cache drive). Mix them up with a data drive and you'll have a lot of work and trouble unwinding it all.
March 27, 2014 (Author)
Thanks, but the main problem is that the unRAID web GUI does not respond when I try to load it in a browser. I can't access the web GUI at all. So the question is: is there anything useful that can be done to debug this without the unRAID web GUI? The machine boots, loads Linux, and appears to run emhttp, but there is no web GUI and no disks are mounted. A couple of people have looked at the syslog, but I got no suggestions from it on what might be wrong. The consensus seems to be that there is a hardware problem, but the machine POSTs fine, runs the BIOS, and passes the memory test. How do you debug a problem with no symptoms?
This is what leads me to want to start over with known-good unRAID installation files, on the theory that one or more of my config files is corrupted. Since I can read the text configs such as disk.cfg, they appear to be fine. That leaves super.dat and secrets.tdb, which are binary files. secrets.tdb seems to contain information from the Samba system, so corruption of that file does not seem like it would cause these problems. That leaves super.dat. Searching for information on it, I found a couple of cases in which it was corrupted, but there the corruption was reported by the system (maybe it has a checksum?), and I get no such report.
Anyway, I'm just looking for what to do next. I do have an old screenshot, and I think I know which disks are mapped to which HDDs (we really need unambiguous terminology for these things). If I delete super.dat but leave the .cfg files (and I have not moved the drives around), will the system be able to recover my config?
Thanks.
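Since the text configs can be read, a slightly stricter check is to parse them and flag any line that does not fit the expected shape. A minimal sketch, assuming unRAID's .cfg files hold one key="value" pair per line with optional # comments; the startArray key mentioned below is likewise an assumption about where the auto-start setting lives:

```python
import re

# Assumed line shape for unRAID's text .cfg files: key="value", one per line.
LINE = re.compile(r'^\s*([A-Za-z0-9_.]+)\s*=\s*"(.*)"\s*$')


def parse_cfg(text):
    """Return (settings, bad_lines): parsed pairs plus numbers of lines that don't fit."""
    settings, bad = {}, []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments are not errors
        match = LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
        else:
            bad.append(number)
    return settings, bad
```

Reading config/disk.cfg with errors="replace" and passing the text in, a non-empty bad list points at lines worth inspecting, and settings.get("startArray") would show whether auto-start is on (if that key name is right). This cannot check the binary super.dat.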
March 27, 2014
Quote: "If I delete super.dat but leave the .cfg files (and I have not moved the drives around), will the system be able to recover my config?"
My understanding is that all of the .cfg files just save the settings that you configure in the web GUI. super.dat is the config that you are trying to recover. If you delete it, then you will have to reassign your drives. While it is true that disk.cfg has settings for the disks by disk number, I don't think there is anything there that tells it which disk should be which number. I don't really know how unRAID kept track before v5 introduced assignments by serial number. It seems that the sdX names can't be relied on to stay constant for a specific disk. Maybe someone else knows.
As I said on your other thread, you can try assigning all of your drives as data drives, and parity will be the only one that shows up as unformatted. If more than one shows as unformatted, then you have another problem and shouldn't proceed without further advice.
Do you have a backup of your data? Maybe you should consider mounting the drives on another computer and copying the data off. Of course, the other computer will have to be able to read ReiserFS, but supposedly there are ways to do that even on Windows.
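Because sdX names can change between boots, it can help to record which serial number sits behind each device name right now, alongside the screenshot. On a Linux machine, udev exposes /dev/disk/by-id symlinks whose names embed the drive model and serial. A sketch, assuming Python is available on whatever Linux box the drives are attached to (stock unRAID may not ship it):

```python
import os
from pathlib import Path


def serial_map(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk by-id name (model + serial) to its current sdX name."""
    mapping = {}
    for entry in sorted(Path(by_id_dir).iterdir()):
        if "-part" in entry.name:
            continue  # skip partition links such as ...-part1
        if entry.is_symlink():
            # Targets look like "../../sdb"; keep just the device name.
            mapping[entry.name] = os.path.basename(os.readlink(entry))
    return mapping
```

Printing the result and saving it with the screenshot gives a record that does not depend on sdX ordering. Some drives appear more than once (for example under both ata- and wwn- names).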
March 27, 2014
Not sure what to say here. There is another thread running (just closed successfully) where the user had a bad super.dat file, emailed it to Tom privately, and got 'fix-it' directions back. However, that user was trying to upgrade, and his 4.7 was bad.
March 31, 2014 (Author)
OK. I finally resolved this problem! However, I am NOT marking it "Solved" because I don't understand the problem. As an engineer, if you don't understand the problem, you can't really claim to have solved it.
After much fruitless poring over log files and deleting and renaming various parts of the config, I decided to "start over". I reformatted the flash and copied a fresh set of unRAID files from Limetech, with none of my config files (just the default ones). I rebooted and... I still could not access the web GUI. I could telnet to the server, and the same symptoms were in evidence (some Linux commands would hang the session, etc.). Well, I thought, there is obviously a hardware problem somewhere.
Strictly on a hunch, I power cycled two of the four Ethernet switches in the network between my PC and the server and VOILA, I could access the unRAID main page via a browser on my PC! I then took a chance and restored my previous unRAID flash files from backup, and the server then appeared normal (except that the Main page reported an unsafe shutdown and wanted to check parity). I ran the parity check and no errors were found!
So, how is it possible that anything "stuck" in an Ethernet switch would cause emhttp to fail to load and prevent access via the web GUI, and YET allow me to telnet to the box? This is the puzzle now. Can anyone come up with a testable hypothesis? I would really like to understand and document this for future reference.
Thanks to all who replied.
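One way to narrow down "telnet works but the browser hangs" is to classify what a plain HTTP request does: refused, timed out, or answered. A minimal Python sketch using only the standard library; the URL http://tower/ in the usage note is an assumption about the server's name. It does not explain what the switch was doing, but it separates "nothing listening" from "connection made, no reply":

```python
import socket
import urllib.error
import urllib.request


def probe_http(url, timeout=5.0):
    """Classify a GET as 'ok <status>', 'http error <status>', 'timeout', or 'unreachable: ...'."""
    # Empty ProxyHandler: talk to the server directly, ignoring any proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return f"ok {response.status}"
    except urllib.error.HTTPError as err:  # server answered with an error status
        return f"http error {err.code}"
    except (socket.timeout, TimeoutError):  # connected, but no reply in time
        return "timeout"
    except urllib.error.URLError as err:  # refused, unresolvable, or connect timeout
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable: {err.reason}"
```

For example, probe_http("http://tower/") returning "timeout" while telnet still connects would say the request reaches something that never answers, which is a different situation from a refused connection.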
April 1, 2014
Sheesh... I don't think I've ever restarted an Ethernet switch, unless it was to physically move it around. And I suppose an occasional thunderstorm might cause a brief power outage. Your question is gonna keep me up all night.
April 1, 2014
I've seen this type of issue several times over the years. NOT a lot; it's rare enough that it's certainly not high on my list of things to try. But whenever there are unexplained networking issues (strange access symptoms such as you were having; inconsistent speeds, i.e. 100Mb speeds on a Gb network; or other weird networking problems), I always do two things: (a) power cycle (with a minimum of 30 seconds OFF) ALL of the involved networking components (switches, routers, etc.); and (b) change ALL of the cables involved in transfers between the offending PC and the server.
This ALMOST always resolves things. If it doesn't, I try using a different set of switch/router ports, and that will usually catch whatever the first try didn't. The most common causes I've seen are bad cables (NOT "bad" in the sense that they don't work, but bad in the sense that they've clearly got some discontinuity in at least one of the pairs, which will cause slower transfers in one direction, rarely both) and bad switch ports (simply changing the port will typically resolve this).
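To spot the "100Mb speeds on a Gb network" symptom from the server side, Linux reports the negotiated link speed in sysfs. A small sketch, assuming the interface is eth0 (the name seen in this thread's syslog) and that Python is available on the machine being checked:

```python
from pathlib import Path


def link_speed_mbps(interface="eth0", sys_net="/sys/class/net"):
    """Return the negotiated link speed in Mb/s from Linux sysfs, or None if unavailable."""
    try:
        value = int((Path(sys_net) / interface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None  # no such interface, link down, or driver doesn't report speed
    return value if value > 0 else None  # some drivers report -1 when unknown
```

A result of 100 on a port that should run at 1000 is the kind of clue that points at a cable or switch port, as described above.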