September 12, 2007
Last week I started getting frequent lockups. They mostly happen when I'm accessing files. It may also have happened while the server was just sitting idle, but I can't remember any specific instances of that. It started after I did some major reshuffling of files, deleting old stuff, etc. I'm running 4.1 now, but I was still running 3.0 when I did the major file deletions. Any help would be appreciated. Here is my syslog:
A 9QH004DP offset: 63 size: 390711352
Sep 12 16:31:18 Tower kernel: [ 60.992527] md8: import [34,0] (hdg) HDS722525VLAT80 VNR93EC6CKMY0M offset: 63 size: 244198552
Sep 12 16:31:18 Tower kernel: [ 60.992672] md9: import [34,64] (hdh) ST3400633A 5NF1XAGJ offset: 63 size: 390711352
Sep 12 16:31:18 Tower kernel: [ 60.992745] md10: import [56,0] (hdi) ST3500630A 5QG0YBF2 offset: 63 size: 488386552
Sep 12 16:31:18 Tower kernel: [ 60.992875] md11: import [8,32] (sdc) SAMSUNG HD501LJ S0MUJ1KP203453 offset: 63 size: 488386552
Sep 12 16:31:18 Tower kernel: [ 60.992880] md12: import: no device
Sep 12 16:31:18 Tower kernel: [ 60.992882] md13: import: no device
Sep 12 16:31:18 Tower kernel: [ 60.992883] md14: import: no device
Sep 12 16:31:18 Tower kernel: [ 60.992885] md15: import: no device
Sep 12 16:31:18 Tower emhttp[1304]: shcmd (3): killall -w smbd nmbd
Sep 12 16:31:19 Tower emhttp[1304]: Scanning user shares...
Sep 12 16:31:19 Tower emhttp[1304]: shcmd (4): rm -r /mnt/user/* 2>/dev/null
Sep 12 16:31:19 Tower emhttp[1304]: merge_dir opendir: No such file or directory
Sep 12 16:31:19 Tower emhttp[1304]: shcmd (5): /usr/sbin/nmbd -D
Sep 12 16:31:19 Tower emhttp[1304]: shcmd (6): /usr/sbin/smbd -D
Sep 12 16:31:19 Tower emhttp[1304]: driver cmd: start STOPPED
Sep 12 16:31:19 Tower kernel: [ 62.340379] mdcmd (3): start
Sep 12 16:31:19 Tower kernel: [ 62.340491] md0: import [8,16] (sdb) ST3500630AS 5QG0RAWC offset: 63 size: 488386552
Sep 12 16:31:19 Tower kernel: [ 62.340504] md1: import [56,64] (hdj) ST3400633A 5NF1XLRH offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.341219] md2: import [57,0] (hdk) ST3400633A 5NF205WY offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.341346] md3: import [57,64] (hdl) ST3500630A 9QG1KF1E offset: 63 size: 488386552
Sep 12 16:31:19 Tower kernel: [ 62.341459] md4: import [33,0] (hde) ST3500630A 9QG1KGAD offset: 63 size: 488386552
Sep 12 16:31:19 Tower kernel: [ 62.341578] md5: import [33,64] (hdf) ST3400620A 9QH00FVF offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.341666] md6: import [22,0] (hdc) ST3400633A 3PM06223 offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.341746] md7: import [22,64] (hdd) ST3400620A 9QH004DP offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.341824] md8: import [34,0] (hdg) HDS722525VLAT80 VNR93EC6CKMY0M offset: 63 size: 244198552
Sep 12 16:31:19 Tower kernel: [ 62.341992] md9: import [34,64] (hdh) ST3400633A 5NF1XAGJ offset: 63 size: 390711352
Sep 12 16:31:19 Tower kernel: [ 62.342064] md10: import [56,0] (hdi) ST3500630A 5QG0YBF2 offset: 63 size: 488386552
Sep 12 16:31:19 Tower kernel: [ 62.342194] md11: import [8,32] (sdc) SAMSUNG HD501LJ S0MUJ1KP203453 offset: 63 size: 488386552
Sep 12 16:31:19 Tower kernel: [ 62.342198] md12: import: no device
Sep 12 16:31:19 Tower kernel: [ 62.342200] md13: import: no device
Sep 12 16:31:19 Tower kernel: [ 62.342202] md14: import: no device
Sep 12 16:31:19 Tower kernel: [ 62.342203] md15: import: no device
Sep 12 16:31:19 Tower kernel: [ 62.343778] unraid: allocated 25462kB
Sep 12 16:31:19 Tower kernel: [ 62.344035] md1: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344052] md2: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344070] md3: running, size: 488386552 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344085] md4: running, size: 488386552 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344099] md5: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344121] md6: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344135] md7: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344152] md8: running, size: 244198552 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344166] md9: running, size: 390711352 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344186] md10: running, size: 488386552 blocks
Sep 12 16:31:19 Tower kernel: [ 62.344201] md11: running, size: 488386552 blocks
Sep 12 16:31:19 Tower emhttp[1304]: driver cmd: check
Sep 12 16:31:19 Tower kernel: [ 62.582637] mdcmd (5): check
Sep 12 16:31:19 Tower kernel: [ 62.582646] md: recovery thread got woken up ...
Sep 12 16:31:19 Tower kernel: [ 62.582648] md: recovery thread checking parity...
Sep 12 16:31:19 Tower kernel: [ 62.582654] md: writing superblock to /boot/config/super.dat
Sep 12 16:31:19 Tower emhttp[1304]: shcmd (7): udevsettle
Sep 12 16:31:19 Tower kernel: [ 62.589948] md: using 1152k window, over a total of 488386552 blocks.
Sep 12 16:31:20 Tower emhttp[1385]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md1 /mnt/disk1 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1388]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md2 /mnt/disk2 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1391]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md3 /mnt/disk3 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1393]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md4 /mnt/disk4 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1395]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md5 /mnt/disk5 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1398]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md6 /mnt/disk6 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1401]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md7 /mnt/disk7 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1403]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md8 /mnt/disk8 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1408]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md9 /mnt/disk9 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1411]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md10 /mnt/disk10 >/dev/null 2>&1
Sep 12 16:31:20 Tower emhttp[1414]: shcmd (8): mount -t reiserfs -o noatime,nodiratime /dev/md11 /mnt/disk11 >/dev/null 2>&1
Sep 12 16:31:20 Tower kernel: [ 63.472768] ReiserFS: md6: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.472788] ReiserFS: md6: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.486198] ReiserFS: md6: journal params: device md6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.487046] ReiserFS: md6: checking transaction log (md6)
Sep 12 16:31:20 Tower kernel: [ 63.568557] ReiserFS: md6: replayed 2 transactions in 0 seconds
Sep 12 16:31:20 Tower kernel: [ 63.702652] ReiserFS: md1: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.702673] ReiserFS: md1: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.702942] ReiserFS: md2: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.702957] ReiserFS: md2: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.703173] ReiserFS: md11: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.703189] ReiserFS: md11: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.703431] ReiserFS: md10: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.703449] ReiserFS: md10: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.703681] ReiserFS: md9: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.703697] ReiserFS: md9: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.703915] ReiserFS: md8: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.703926] ReiserFS: md8: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.704142] ReiserFS: md7: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.704158] ReiserFS: md7: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.704386] ReiserFS: md5: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.704403] ReiserFS: md5: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.704622] ReiserFS: md4: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.704645] ReiserFS: md4: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.704896] ReiserFS: md3: found reiserfs format "3.6" with standard journal
Sep 12 16:31:20 Tower kernel: [ 63.705079] ReiserFS: md3: using ordered data mode
Sep 12 16:31:20 Tower kernel: [ 63.715334] ReiserFS: md6: Using r5 hash to sort names
Sep 12 16:31:20 Tower emhttp[1398]: remount: /dev/md6
Sep 12 16:31:20 Tower kernel: [ 63.806872] ReiserFS: md1: journal params: device md1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.807737] ReiserFS: md1: checking transaction log (md1)
Sep 12 16:31:20 Tower kernel: [ 63.811718] ReiserFS: md2: journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.812725] ReiserFS: md2: checking transaction log (md2)
Sep 12 16:31:20 Tower kernel: [ 63.812901] ReiserFS: md11: journal params: device md11, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.813781] ReiserFS: md11: checking transaction log (md11)
Sep 12 16:31:20 Tower kernel: [ 63.813978] ReiserFS: md10: journal params: device md10, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.814909] ReiserFS: md10: checking transaction log (md10)
Sep 12 16:31:20 Tower kernel: [ 63.815109] ReiserFS: md9: journal params: device md9, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.816378] ReiserFS: md9: checking transaction log (md9)
Sep 12 16:31:20 Tower kernel: [ 63.816595] ReiserFS: md8: journal params: device md8, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.817540] ReiserFS: md8: checking transaction log (md8)
Sep 12 16:31:20 Tower kernel: [ 63.817748] ReiserFS: md7: journal params: device md7, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.818708] ReiserFS: md7: checking transaction log (md7)
Sep 12 16:31:20 Tower kernel: [ 63.819181] ReiserFS: md5: journal params: device md5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.820196] ReiserFS: md5: checking transaction log (md5)
Sep 12 16:31:20 Tower kernel: [ 63.820428] ReiserFS: md4: journal params: device md4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.821677] ReiserFS: md4: checking transaction log (md4)
Sep 12 16:31:20 Tower kernel: [ 63.821894] ReiserFS: md3: journal params: device md3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 12 16:31:20 Tower kernel: [ 63.822947] ReiserFS: md3: checking transaction log (md3)
Sep 12 16:31:20 Tower kernel: [ 63.903947] ReiserFS: md7: replayed 2 transactions in 0 seconds
Sep 12 16:31:21 Tower kernel: [ 64.056896] ReiserFS: md7: Using r5 hash to sort names
Sep 12 16:31:21 Tower kernel: [ 64.063872] can't shrink filesystem on-line
Sep 12 16:31:21 Tower emhttp[1401]: remount: /dev/md7
Sep 12 16:31:21 Tower kernel: [ 64.300610] can't shrink filesystem on-line
Sep 12 16:31:21 Tower kernel: [ 64.396940] ReiserFS: md1: replayed 2 transactions in 1 seconds
Sep 12 16:31:21 Tower kernel: [ 64.397199] ReiserFS: md10: replayed 2 transactions in 1 seconds
Sep 12 16:31:21 Tower kernel: [ 64.438940] ReiserFS: md5: replayed 2 transactions in 1 seconds
Sep 12 16:31:21 Tower kernel: [ 64.439039] ReiserFS: md2: replayed 2 transactions in 1 seconds
Sep 12 16:31:21 Tower kernel: [ 64.439166] ReiserFS: md8: replayed 2 transactions in 1 seconds
Sep 12 16:31:21 Tower kernel: [ 64.745432] ReiserFS: md8: Using r5 hash to sort names
Sep 12 16:31:21 Tower kernel: [ 64.745564] ReiserFS: md5: Using r5 hash to sort names
Sep 12 16:31:21 Tower kernel: [ 64.767763] ReiserFS: md1: Using r5 hash to sort names
Sep 12 16:31:21 Tower kernel: [ 64.777648] ReiserFS: md10: Using r5 hash to sort names
Sep 12 16:31:21 Tower emhttp[1403]: remount: /dev/md8
Sep 12 16:31:21 Tower emhttp[1395]: remount: /dev/md5
Sep 12 16:31:21 Tower kernel: [ 64.910909] ReiserFS: md2: Using r5 hash to sort names
Sep 12 16:31:22 Tower emhttp[1411]: remount: /dev/md10
Sep 12 16:31:22 Tower kernel: [ 65.026415] ReiserFS: md9: replayed 2 transactions in 2 seconds
Sep 12 16:31:22 Tower emhttp[1385]: remount: /dev/md1
Sep 12 16:31:22 Tower emhttp[1388]: remount: /dev/md2
Sep 12 16:31:22 Tower kernel: [ 65.373765] ReiserFS: md9: Using r5 hash to sort names
Sep 12 16:31:22 Tower emhttp[1408]: remount: /dev/md9
Sep 12 16:31:22 Tower kernel: [ 65.843131] can't shrink filesystem on-line
Sep 12 16:31:22 Tower kernel: [ 65.871459] can't shrink filesystem on-line
Sep 12 16:31:23 Tower kernel: [ 66.231685] can't shrink filesystem on-line
Sep 12 16:31:23 Tower kernel: [ 66.332827] can't shrink filesystem on-line
Sep 12 16:31:23 Tower kernel: [ 66.333092] can't shrink filesystem on-line
Sep 12 16:31:23 Tower kernel: [ 66.417761] can't shrink filesystem on-line
Sep 12 16:32:14 Tower kernel: [ 117.013857] ReiserFS: md11: replayed 375 transactions in 54 seconds
Sep 12 16:32:14 Tower emhttp[1414]: remount: /dev/md11
Sep 12 16:32:14 Tower kernel: [ 117.078777] ReiserFS: md11: Using r5 hash to sort names
Sep 12 16:32:15 Tower kernel: [ 118.061223] can't shrink filesystem on-line
Sep 12 16:33:33 Tower kernel: [ 196.415284] ReiserFS: md4: replayed 373 transactions in 133 seconds
Sep 12 16:33:33 Tower kernel: [ 196.466945] ReiserFS: md4: Using r5 hash to sort names
Sep 12 16:33:33 Tower emhttp[1393]: remount: /dev/md4
Sep 12 16:33:34 Tower kernel: [ 196.977245] can't shrink filesystem on-line
Sep 12 16:34:59 Tower kernel: [ 281.964819] ReiserFS: md3: replayed 356 transactions in 219 seconds
Sep 12 16:34:59 Tower emhttp[1391]: remount: /dev/md3
Sep 12 16:34:59 Tower kernel: [ 282.026533] ReiserFS: md3: Using r5 hash to sort names
Sep 12 16:35:00 Tower kernel: [ 282.607762] can't shrink filesystem on-line
Sep 12 16:35:00 Tower emhttp[1304]: shcmd (8): killall -w smbd nmbd
Sep 12 16:35:02 Tower emhttp[1304]: Scanning user shares...
Sep 12 16:35:02 Tower emhttp[1304]: shcmd (9): rm -r /mnt/user/* 2>/dev/null
Sep 12 16:35:02 Tower emhttp[1304]: oldpath=/mnt/disk2/Movies/Thumbs.db already exists
Sep 12 16:35:02 Tower emhttp[1304]: oldpath=/mnt/disk6/Movies/Beatles, The A Hard Day's Night/Thumbs.db already exists
Sep 12 16:35:02 Tower emhttp[1304]: oldpath=/mnt/disk6/Movies/Beatles, The A Hard Day's Night/folder.jpg already exists
Sep 12 16:35:02 Tower emhttp[1304]: oldpath=/mnt/disk9/Movies/Thumbs.db already exists
Sep 12 16:35:03 Tower emhttp[1304]: oldpath=/mnt/disk10/Movies/Thumbs.db already exists
Sep 12 16:35:03 Tower emhttp[1304]: oldpath=/mnt/disk11/Movies/Flawless/Thumbs.db already exists
Sep 12 16:35:03 Tower emhttp[1304]: oldpath=/mnt/disk11/Movies/Flawless/folder.jpg already exists
Sep 12 16:35:03 Tower emhttp[1304]: merge_dir opendir: No such file or directory
Sep 12 16:35:03 Tower emhttp[1304]: user share: TV shows
Sep 12 16:35:03 Tower emhttp[1304]: user share: My Documents
Sep 12 16:35:03 Tower emhttp[1304]: user share: iTunes
Sep 12 16:35:03 Tower emhttp[1304]: user share: Music
Sep 12 16:35:03 Tower emhttp[1304]: user share: Software to Install
Sep 12 16:35:03 Tower emhttp[1304]: user share: Movies
Sep 12 16:35:03 Tower emhttp[1304]: shcmd (10): /usr/sbin/nmbd -D
Sep 12 16:35:03 Tower emhttp[1304]: shcmd (11): /usr/sbin/smbd -D
Sep 12 16:36:07 Tower in.telnetd[1452]: connect from 192.168.0.179 (192.168.0.179)
Sep 12 16:36:09 Tower login[1453]: ROOT LOGIN on `pts/0' from `192.168.0.179'
root@Tower:~#
September 12, 2007
The part of the syslog you've included above does not show any serious errors, although it does appear to be starting up after a crash. The transaction replays just mean ReiserFS is catching up on unfinished journal work, probably with no data loss. The 3 drives with significant transaction replaying are Disk3, Disk4 and Disk11 (356 to 375 transactions each, versus 2 on every other disk), and each adds a minute to a minute and a half to the startup time. It would help if you would list your hardware and attach a complete syslog. Then you will probably have to do a little problem isolation yourself: which drives seem to be most affected (possibly one or more of the 3 mentioned above?), what actions precede the lockups, whether it happens when reading, writing or both, etc. You should also eliminate the usual suspects, such as CPU heat, system temperatures, insufficient power or a failing power supply, and loose cable connections. There are several threads on running reiserfsck on your drives (similar to Windows ScanDisk).
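To pull those replay figures out of a long syslog instead of reading it by eye, something like the following could work. It is a minimal sketch that assumes the ReiserFS message format shown in the log above ("ReiserFS: mdN: replayed X transactions in Y seconds"); the function name replay_report is hypothetical, not an unRAID command.

```shell
# replay_report (hypothetical helper): print "device transactions seconds"
# for every ReiserFS journal replay found in a syslog file.
# Assumes kernel lines of the form
#   ... ReiserFS: md11: replayed 375 transactions in 54 seconds
replay_report() {
    awk '
        /ReiserFS: md[0-9]+: replayed [0-9]+ transactions in [0-9]+ seconds/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "replayed") {
                    dev = $(i - 1)          # e.g. "md11:"
                    sub(/:$/, "", dev)      # strip the trailing colon
                    print dev, $(i + 1), $(i + 4)
                }
            }
        }
    ' "${1:-/var/log/syslog}"
}
```

Piping the result through sort -k2,2nr puts the largest replays first; on the log above that would list md11, md4 and md3 ahead of the disks that replayed only 2 transactions.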
September 12, 2007 (Author)
Thanks RobJ. I've had lockups with all 3 of those drives. They are my most-accessed drives: Disk3 is music, Disk4 is TV shows, and Disk11 holds the most-used Movies directory. Lockups have occurred with both reading and writing.
You mention a "full syslog", and I'm not sure what you mean by that. In telnet I type cat /var/log/syslog, then I select all, copy and paste. What else should I be doing?
Hardware list:
- Celeron D 3.06 GHz
- Asus P5PE-VM
- Kingwin KF-21 Mobile Racks
- Promise Ultra 133 controllers
- 512 MB of old DDR RAM
- Linksys gigabit NIC
- Antec Neo HE 550W
I'll go research reiserfsck. Thanks.
September 12, 2007
When you say "lockup", can you still access the server via telnet, or do you have to hard reset?
September 12, 2007
------------------------------------------------------------------------------------
To obtain a copy of your current syslog, type the following command at the unRAID console or in a Telnet session:
cp /var/log/syslog /boot
This will make a copy of the system log in the root directory of your flash drive. You can either copy it directly from the flash share of your server, or plug the flash drive into your PC and access the syslog there. Any file manager, such as Windows Explorer, can access the file across the network. For example, if your unRAID server name is Tower, you can access your newly created syslog as \\Tower\flash\syslog. I recommend renaming it with the date, the time and the .txt extension, for example syslog2007-08-28-1630.txt.
------------------------------------------------------------------------------------
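The copy and the recommended rename can be combined into one step. This is a minimal sketch assuming the paths given above (/var/log/syslog, and the flash drive mounted at /boot) and that the console's date command accepts a +FORMAT argument; save_syslog is a hypothetical helper name.

```shell
# save_syslog (hypothetical helper): copy the syslog to the flash drive
# under a dated .txt name such as syslog2007-08-28-1630.txt, then print
# the path of the copy. Both paths can be overridden for testing.
save_syslog() (
    src=${1:-/var/log/syslog}
    dest_dir=${2:-/boot}
    name="syslog$(date +%Y-%m-%d-%H%M).txt"
    cp "$src" "$dest_dir/$name" && echo "$dest_dir/$name"
)
```

Run with no arguments, it leaves the dated copy in the root of the flash share, ready to open as \\Tower\flash\ and attach to a post.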
September 12, 2007 (Author)
I have to hard reset. I can't even use the power button on the front of my server; I have to use the switch on the back of the PSU. Thanks for the clarification on the full syslog. When I get home tonight, I'll try to post it. Maybe I should plug in a monitor to see what is going on when it locks up.
September 13, 2007
That's one of the symptoms I was having with my SATA150TX4 cards. However, since I went to the beta I don't get complete lockups, although my other problems still occur. Try the latest beta.
September 13, 2007 (Author)
Hmmm. Well, I don't have any SATA cards; all but two of my 11 drives are PATA. Also, lockups happen with drive 11, which uses the motherboard SATA controller. Following the directions above, I made a new syslog and attached it, since it was too long to fit into the body of a post. I find it suspicious that I see oldpath messages for "Flawless" and "A Hard Day's Night" every time I reboot.
September 13, 2007
Sorry, my talk about SATA might be a red herring. I have no idea what problem the beta fixed, or whether it is relevant to your problem at all. I only suggest it as a trivial thing to try, to see if it helps.
September 13, 2007 (Author)
It locked up this afternoon without me really accessing it. The only thing I did was copy the syslog so that I could post it. It locked up within 30 minutes of powering up.
September 13, 2007
Good news: there are no hardware errors being logged. Bad news: if it is a hardware problem, it may be difficult to diagnose. Here are some things to try:
- Turn off User Shares and see if the system is still unstable.
- Download the latest 4.2-beta4 and see if the system is still unstable.
September 13, 2007
I'm probably not the best person to respond about the oldpath messages, because I have never used user shares, but here's what I understand. When the user shares are created, the unRAID system creates a special folder that is a union of the same-named folders on the individual disks. If it finds a file with the same name and path on more than one disk, it has to report an error and skip the additional occurrences, since a folder cannot contain multiple files with the same name. For example, the first oldpath error is "oldpath=/mnt/disk2/Movies/Thumbs.db already exists". That means there is also a "/mnt/disk1/Movies/Thumbs.db", and in fact you also have that Thumbs.db file on Disk9 and Disk10. Someone else can better advise you on how to avoid the multiple occurrences.
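To see every file that would trigger one of these oldpath messages, you could list the relative paths that exist as files on more than one disk. This is a minimal sketch assuming the disks are mounted as /mnt/disk1, /mnt/disk2 and so on (as in the syslog) and that no file name contains a newline; find_dupes is a hypothetical helper, not part of unRAID, and it only approximates how user shares are actually merged.

```shell
# find_dupes (hypothetical helper): report relative paths, such as
# "Movies/Thumbs.db", that exist as files on more than one disk,
# followed by the disks that hold them.
# Assumes disks are mounted as <root>/disk1, <root>/disk2, ... and that
# no file name contains a newline.
find_dupes() (
    root=${1:-/mnt}
    for d in "$root"/disk*; do
        [ -d "$d" ] || continue
        disk=${d##*/}                       # e.g. "disk11"
        # list files relative to the disk root, prefixed with the disk name
        (cd "$d" && find . -type f) | sed "s|^\./|$disk |"
    done | awk '
        {
            disk = $1
            sub(/^[^ ]+ /, "")              # $0 is now the relative path
            where[$0] = where[$0] " " disk
            count[$0]++
        }
        END {
            for (p in count)
                if (count[p] > 1) print p ":" where[p]
        }
    ' | sort
)
```

Each output line is a relative path followed by the disks that hold a copy of it; which copy to keep is left to you.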
September 13, 2007 (Author)
Thanks for the info on oldpath; that makes sense. I turned off all shares and it seemed stable, although with all shares turned off I couldn't test reading and writing. Then I turned on disk shares and moved a few gigs to two separate disks at the same time, while also streaming an .avi file to XBMC from a different disk. It seems stable for now, so that means something screwy is going on with User Shares. I'll try to install the 4.2 beta today and report back.
September 14, 2007
Could be running out of memory: 4.2 manages User Share memory more efficiently than previous releases.
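One way to test the out-of-memory theory would be to record free memory at intervals on the flash drive, so the last reading before a lockup is still there after the hard reset. This is a minimal sketch assuming the standard Linux /proc/meminfo MemFree field and the flash drive mounted at /boot; log_memfree and memlog.txt are hypothetical names, and the 60-second interval in the example is arbitrary. MemFree alone understates usable memory, because the kernel also uses spare RAM for caches it can reclaim, so a steady downward trend matters more than any single value.

```shell
# log_memfree (hypothetical helper): append one timestamped MemFree
# reading (kB) from /proc/meminfo to a log file on the flash drive.
log_memfree() (
    out=${1:-/boot/memlog.txt}
    meminfo=${2:-/proc/meminfo}
    free_kb=$(awk '$1 == "MemFree:" { print $2 }' "$meminfo")
    echo "$(date '+%Y-%m-%d %H:%M:%S') MemFree: ${free_kb} kB" >> "$out"
)

# Example: sample once a minute in the background.
# while true; do log_memfree; sleep 60; done &
```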
September 14, 2007 (Author)
Well, the 4.2 beta didn't fix it; it locked up again. So you think it could be running out of memory? My question would be: why would it run just fine for a year with no problems and then suddenly run out of memory? What should my next troubleshooting step be? I never did run reiserfsck; I guess I should look into that.
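Since reiserfsck has come up twice, here is a sketch that only prints (it does not run) the commands for a read-only check of particular disks, for example the three busiest ones. It assumes disk N is mounted at /mnt/diskN from /dev/mdN, as in the syslog, and that Samba has already been stopped so the disk can be unmounted. Checking through /dev/mdN rather than the raw hdX/sdX device is commonly recommended on unRAID so that any later repair keeps parity in sync, but the exact procedure for this release should be confirmed in the reiserfsck threads mentioned earlier. fsck_plan is a hypothetical helper.

```shell
# fsck_plan (hypothetical helper): print, but do not execute, a
# read-only ReiserFS check sequence for each unRAID disk number given.
# Assumes disk N is mounted at /mnt/diskN from /dev/mdN.
fsck_plan() {
    for n in "$@"; do
        echo "umount /mnt/disk$n"
        echo "reiserfsck --check /dev/md$n"
    done
}

# Example: fsck_plan 3 4 11 > /boot/fsck_plan.txt
```

Printing the plan rather than executing it leaves room to review each command before touching a disk.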
September 14, 2007
Try running memtest86 for an hour or two. When unRAID boots you should get the option to run it. You may have a bad DIMM.
-Stefan
September 14, 2007
I'm not going to be the best person to reply, but I have a suspicion, or a gut feeling, that I might have had similar symptoms. Could you run dmesg next time you telnet in and post the results?
Mola
September 14, 2007 (Author)
Well now, I'm stuck. When I telnet in and try to log in as root, I get an error message that says (disconnect bypassed -- root login not allows). I've tried rebooting 3 times and I always get the same message.
September 15, 2007
That's a new thing, but I think you are misinterpreting what it is saying. When you see this, you are almost certainly logged in. Type whoami and you will see.
September 15, 2007 (Author)
I'm logged in (sort of), but I can't use any commands.
EDIT: OK, that's not true. I could use commands, but it locked up within 10 minutes, even with User Shares turned off now. I'm really thinking this is a memory problem, as everyone has suggested. I'm going to try some other RAM today and see how it does.
October 7, 2007 (Author)
OK. I was too busy to work on it for the last couple of weeks. Today I had some time and did two things. I swapped out the RAM for 2 x 1 GB sticks; that's more than enough, and I'm fairly sure the RAM is good. I also upgraded to 4.2.1, thinking that my problems were related to the problems in the other thread; I have certainly gotten that error message a lot. The changes didn't help, and I am still having the same problem: I lose network connectivity to the server after about an hour. I can't telnet in or access it via My Network Places or Windows Explorer. Sometimes I have to hard reboot with the PSU switch, and sometimes the front reset button works. I tried to attach my syslog, but the forum won't allow it: the file is only 53K, yet I'm told it is too large. Weird. Thanks for any help.