April 23, 2012
The trouble began when I added two new Supermicro cages and tried to add eight new disks. I immediately had trouble, so I booted up with only the six disks from before. Great, one disk is disabled. I tried to swap one of the about-to-be-added disks for the newly disabled disk, and it displayed as a 1TB disk, not 500GB. Okey, dokey. I finally swapped in one of the new 1TB disks (*9176), and the rebuild took place today.

Current issues:
You can see in the screenshot that none of the Command Area buttons are there. Bummer.
My one and only share is gone, and has been gone since the first time I tried to boot up with all eight new disks active.
disk5 does not let me in to view its contents. Windows Explorer, Total Commander, Midnight Commander, the unRAID file browser, etc.: none will allow me access to the disk's contents.

Things look stable now, but I'd like to perform a simple reboot to see if the share comes back. Possibly related to these continuing hard shutdowns is the attempt to access the disk, which is filling up the memory. You can see in the attachment where invoking powerdown simply fails.

I'm REALLY frustrated here, and I'm hoping someone can ask the correct questions to get me headed in the right direction to solve this. Also, a better subject line to help someone else?

unRAID version 4.7. Five of the six disks are plugged into a SUPERMICRO AOC-SASLP-MV8, and the sixth is plugged into the motherboard.

Attachments: syslog-2012-04-22.txt, powerdown.fail.txt
April 23, 2012
Attach the entire syslog; zip it if needed. There may be something wrong with the new cages or with hardware related to their install.
April 23, 2012 (Author)
That's all I have for a syslog. After the last boot, I took the screenshot and tried to access disk5 directly; the web interface stopped responding, and I had to do a hard shutdown. The server is off for now. Only one of the new cages has active disks in it at this point, although the bottom one does have power, so its fan runs.
April 23, 2012
You're still using that 500 watt power supply? The additions may have been too much. On my 650 I had trouble on the 13th drive, but I'm running a few 7200s.
April 23, 2012 (Author)
All fourteen disks only tried to spin up once, then I scaled back to just six disks. Everything I read states this 500 watt PSU is plenty for fifteen disks.
April 23, 2012 (Author)
Okay, current config is as follows:
Top cage is an Icy Dock MB455SPF-B; all slots have disks, and they are all plugged into the motherboard.
Middle cage is a Supermicro, and it has a single disk, also plugged into the mobo.

Command Area buttons did not return.
Could not access disk5 directly.
New syslog taken before the web interface crashed.

Other login: root
Linux 2.6.32.9-unRAID.
root@Other:~# top
top - 11:22:32 up 9 min, 1 user, load average: 3.88, 2.28, 0.99
Tasks: 89 total, 4 running, 84 sleeping, 0 stopped, 1 zombie
Cpu(s): 32.2%us, 67.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2041064k total, 765164k used, 1275900k free, 62692k buffers
Swap: 0k total, 0k used, 0k free, 154116k cached

  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 2662 root 20  0 28396 1040  680 S 40.5  0.1 2:53.83 shfs
 2560 root 20  0 13664 3536 2748 R 38.9  0.2 2:15.01 smbd
 1503 root 20  0  1616  536  468 S  8.3  0.0 0:08.80 logger
 3362 root 20  0  783m 465m  668 R  8.3 23.3 0:08.61 ls
 1348 root 20  0  1688  592  504 S  4.0  0.0 0:08.02 syslogd
    1 root 20  0   704  308  264 S  0.0  0.0 0:01.49 init
    2 root 20  0     0    0    0 S  0.0  0.0 0:00.00 kthreadd
    3 root RT  0     0    0    0 S  0.0  0.0 0:00.00 migration/0
    4 root 20  0     0    0    0 S  0.0  0.0 0:00.01 ksoftirqd/0
    5 root 20  0     0    0    0 S  0.0  0.0 0:00.00 events/0
    6 root 20  0     0    0    0 S  0.0  0.0 0:00.00 khelper
   11 root 20  0     0    0    0 S  0.0  0.0 0:00.00 async/mgr
  112 root 20  0     0    0    0 S  0.0  0.0 0:00.00 sync_supers
  114 root 20  0     0    0    0 S  0.0  0.0 0:00.00 bdi-default
  116 root 20  0     0    0    0 S  0.0  0.0 0:00.00 kblockd/0
  117 root 20  0     0    0    0 S  0.0  0.0 0:00.00 kacpid
  118 root 20  0     0    0    0 S  0.0  0.0 0:00.00 kacpi_notify

Attachment: syslog-2012-04-23.txt
April 23, 2012 (Author)
Just trying stuff here that I think could be related to my problem:

root@Other:~# samba stop
root@Other:~# umount /dev/md1
root@Other:~# reiserfsck --check /dev/md1
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Apr 23 13:43:38 2012
###########
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 116405
        Internal nodes 746
        Directories 236
        Other files 3497
        Data block pointers 117298800 (0 of them are zero)
        Safe links 0
###########
reiserfsck finished at Mon Apr 23 13:54:21 2012
###########

root@Other:~# umount /dev/md2
root@Other:~# reiserfsck --check /dev/md2
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md2
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Apr 23 13:55:27 2012
###########
Replaying journal: Done.
Reiserfs journal '/dev/md2' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 52389
        Internal nodes 323
        Directories 29
        Other files 152
        Data block pointers 52996172 (3292732 of them are zero)
        Safe links 0
###########
reiserfsck finished at Mon Apr 23 13:57:45 2012
###########

root@Other:~# umount /dev/md3
root@Other:~# reiserfsck --check /dev/md3
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):^Yes
root@Other:~# Yes
-bash: Yes: command not found
root@Other:~# umount /dev/md3
umount: /dev/md3: not mounted
root@Other:~# samba stop
root@Other:~# umount /dev/md3
umount: /dev/md3: not mounted
root@Other:~# reiserfsck --check /dev/md3
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Apr 23 16:46:56 2012
###########
Replaying journal: Done.
Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 8477
        Internal nodes 59
        Directories 11
        Other files 45
        Data block pointers 8557606 (0 of them are zero)
        Safe links 0
###########
reiserfsck finished at Mon Apr 23 16:49:15 2012
###########

root@Other:~# umount /dev/md4
root@Other:~# reiserfsck --check /dev/md4
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md4
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Apr 23 16:49:28 2012
###########
Replaying journal: Done.
Reiserfs journal '/dev/md4' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 9281
        Internal nodes 62
        Directories 16
        Other files 69
        Data block pointers 9372090 (97686 of them are zero)
        Safe links 0
###########
reiserfsck finished at Mon Apr 23 16:51:36 2012
###########

root@Other:~# umount /dev/md5
umount: /mnt/disk5: device is busy
umount: /mnt/disk5: device is busy
root@Other:~# umount /dev/md5
umount: /mnt/disk5: device is busy
umount: /mnt/disk5: device is busy
root@Other:~#
April 24, 2012
"Would formatting the flash drive and starting from scratch be a bad idea?"
About as likely to help as emptying the ashtray when your car's engine keeps stalling.
Joe L.
April 24, 2012
"Everything I read states this 500 watt PSU is plenty for fifteen disks."
I just clicked on the link in your signature for your power supply. It is described as having a single 20 Amp 12 volt rail. You apparently have 15 disks attached to it.
Figure 2 Amps of capacity needed for each "green" drive, and 3 Amps for each non-green. For now, I'll assume all are "green". The motherboard, CPU, and fans also need a few amperes of 12 Volt supply. For kicks, let's estimate 5 Amps.
15 disks * 2 Amps = 30 Amps
30 Amps + 5 Amps (CPU, MB, fans) = 35 Amps capacity needed.
You've overloaded your poor power supply's capacity of 20 Amps.
I'd replace your power supply before re-formatting your flash drive. I'd use a single-rail supply with a capacity of 40 Amps or more. Anything less will probably result in random weird issues when spinning disks up or down, especially when you are drawing 35 Amps from it, as you apparently are.
Edit: With 6 green disks, you are barely within the capacity of the power supply.
6 disks * 2 Amps = 12 Amps
12 Amps + 5 Amps (CPU, MB, fans) = 17 Amps.
Joe L.
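For anyone who wants to rerun the estimate above with their own disk counts, here is a minimal Python sketch of the same rule of thumb. The per-disk and overhead figures are the ones from the post (2 Amps per "green" disk, 3 Amps per non-green disk, 5 Amps for CPU, motherboard, and fans); the function names are made up for illustration, and none of this replaces checking the real ratings of your drives and supply.

```python
def required_12v_amps(green_disks, other_disks=0, overhead_amps=5.0,
                      amps_per_green=2.0, amps_per_other=3.0):
    """Rough 12 V current estimate, using the rule-of-thumb figures from the post."""
    return (green_disks * amps_per_green
            + other_disks * amps_per_other
            + overhead_amps)


def headroom(rail_amps, needed_amps):
    """Spare 12 V capacity; a negative result means the rail is overloaded."""
    return rail_amps - needed_amps


if __name__ == "__main__":
    # 20 A is the rail rating assumed in the post; substitute your own PSU's value.
    for disks in (15, 6):
        need = required_12v_amps(disks)
        print(f"{disks} green disks: need {need:.0f} A, "
              f"headroom on a 20 A rail: {headroom(20, need):+.0f} A")
```

Passing a different rail rating to `headroom` shows how much margin a particular supply leaves under the same assumptions.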
April 24, 2012
The syslog you posted in your first post in this thread shows the kernel out-of-memory (OOM) killer killing what it thinks are the processes that have been idle the longest. It was doing this in an attempt to free up some RAM it needed for another process.
I do not see evidence of the syslog filling memory, but basically, you ran out of free RAM. Either something you've added on is writing to RAM and using it up, or you have less memory than you think you do. (Your post says you have 2 Gig. It might not be enough for all you are doing.)
If nothing else, a memory test is in order, just to be sure your 2 Gig is working properly.

Apr 22 20:36:47 Other kernel: md: md_do_sync: got signal, exit...
Apr 22 20:36:47 Other kernel: md: recovery thread sync completion status: -4
Apr 22 20:37:37 Other in.telnetd[2667]: connect from 192.168.3.10 (192.168.3.10)
Apr 22 20:37:39 Other login[2668]: ROOT LOGIN on `pts/0' from `192.168.3.10'
Apr 22 20:39:24 Other kernel: mc invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0
Apr 22 20:39:24 Other kernel: Pid: 2682, comm: mc Not tainted 2.6.32.9-unRAID #8
Apr 22 20:39:24 Other kernel: Call Trace:
Apr 22 20:39:24 Other kernel: [<c104ab61>] oom_kill_process+0x59/0x1cd
Apr 22 20:39:24 Other kernel: [<c104afb9>] __out_of_memory+0xef/0x102
Apr 22 20:39:24 Other kernel: [<c104b02a>] out_of_memory+0x5e/0x83
Apr 22 20:39:24 Other kernel: [<c104cfe9>] __alloc_pages_nodemask+0x375/0x42f
Apr 22 20:39:24 Other kernel: [<c1059686>] handle_mm_fault+0x254/0x8f1
Apr 22 20:39:24 Other kernel: [<c101c3eb>] ? __wake_up+0x31/0x3b
Apr 22 20:39:24 Other kernel: [<c106f362>] ? vfs_fstatat+0x2d/0x54
Apr 22 20:39:24 Other kernel: [<c106f3cd>] ? vfs_lstat+0x16/0x18
Apr 22 20:39:24 Other kernel: [<c106f3e3>] ? sys_lstat64+0x14/0x28
Apr 22 20:39:24 Other kernel: [<c10771df>] ? vfs_readdir+0x6c/0x7d
Apr 22 20:39:24 Other kernel: [<c1076fec>] ? filldir64+0x0/0xcd
Apr 22 20:39:24 Other kernel: [<c1017050>] do_page_fault+0x17c/0x1e4
Apr 22 20:39:24 Other kernel: [<c1016ed4>] ? do_page_fault+0x0/0x1e4
Apr 22 20:39:24 Other kernel: [<c12a07ce>] error_code+0x66/0x6c
Apr 22 20:39:24 Other kernel: [<c1016ed4>] ? do_page_fault+0x0/0x1e4
Apr 22 20:39:24 Other kernel: Mem-Info:
Apr 22 20:39:24 Other kernel: DMA per-cpu:
Apr 22 20:39:24 Other kernel: CPU 0: hi: 0, btch: 1 usd: 0
Apr 22 20:39:24 Other kernel: Normal per-cpu:
Apr 22 20:39:24 Other kernel: CPU 0: hi: 186, btch: 31 usd: 63
Apr 22 20:39:24 Other kernel: HighMem per-cpu:
Apr 22 20:39:24 Other kernel: CPU 0: hi: 186, btch: 31 usd: 156
Apr 22 20:39:24 Other kernel: active_anon:441194 inactive_anon:2912 isolated_anon:0
Apr 22 20:39:24 Other kernel: active_file:2567 inactive_file:2427 isolated_file:0
Apr 22 20:39:24 Other kernel: unevictable:33338 dirty:0 writeback:0 unstable:0
Apr 22 20:39:24 Other kernel: free:11958 slab_reclaimable:677 slab_unreclaimable:1861
Apr 22 20:39:24 Other kernel: mapped:1799 shmem:32 pagetables:1010 bounce:0
Apr 22 20:39:24 Other kernel: DMA free:7996kB min:64kB low:80kB high:96kB active_anon:7884kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15792kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 22 20:39:24 Other kernel: lowmem_reserve[]: 0 867 1983 1983
Apr 22 20:39:24 Other kernel: Normal free:39340kB min:3732kB low:4664kB high:5596kB active_anon:775548kB inactive_anon:896kB active_file:52kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:2708kB slab_unreclaimable:7444kB kernel_stack:736kB pagetables:3200kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:62 all_unreclaimable? no
Apr 22 20:39:24 Other kernel: lowmem_reserve[]: 0 0 8927 8927
Apr 22 20:39:24 Other kernel: HighMem free:496kB min:512kB low:1712kB high:2912kB active_anon:981344kB inactive_anon:10752kB active_file:10216kB inactive_file:9708kB unevictable:133352kB isolated(anon):0kB isolated(file):0kB present:1142688kB mlocked:0kB dirty:0kB writeback:0kB mapped:7192kB shmem:128kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:36026 all_unreclaimable? yes
Apr 22 20:39:24 Other kernel: lowmem_reserve[]: 0 0 0 0
Apr 22 20:39:24 Other kernel: DMA: 1*4kB 1*8kB 1*16kB 1*32kB 0*64kB 2*128kB 2*256kB 2*512kB 2*1024kB 0*2048kB 1*4096kB = 7996kB
Apr 22 20:39:24 Other kernel: Normal: 49*4kB 16*8kB 7*16kB 8*32kB 2*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 2*2048kB 8*4096kB = 39348kB
Apr 22 20:39:24 Other kernel: HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
Apr 22 20:39:24 Other kernel: 38369 total pagecache pages
Apr 22 20:39:24 Other kernel: 0 pages in swap cache
Apr 22 20:39:24 Other kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 22 20:39:24 Other kernel: Free swap = 0kB
Apr 22 20:39:24 Other kernel: Total swap = 0kB
Apr 22 20:39:24 Other kernel: 515744 pages RAM
Apr 22 20:39:24 Other kernel: 287922 pages HighMem
Apr 22 20:39:24 Other kernel: 5481 pages reserved
Apr 22 20:39:24 Other kernel: 5609 pages shared
Apr 22 20:39:24 Other kernel: 494724 pages non-shared
Apr 22 20:39:24 Other kernel: Out of memory: kill process 2668 (bash) score 13881 or a child
Apr 22 20:39:24 Other kernel: Killed process 2682 (mc)
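A long syslog can hide several of these events, so a small script may help pull them out. The following is only a sketch: it assumes the message wording seen in the excerpt above ("invoked oom-killer", "Killed process", "pages RAM") and a 4 kB page size, which is my assumption for this 32-bit kernel. The function name and the sample list are made up for illustration.

```python
import re

# Patterns based on the wording in the excerpt above; other kernels may differ.
INVOKED = re.compile(r"kernel: (\S+) invoked oom-killer")
KILLED = re.compile(r"kernel: Killed process (\d+) \(([^)]+)\)")
PAGES_RAM = re.compile(r"kernel: (\d+) pages RAM")

PAGE_KB = 4  # assumption: 4 kB pages


def summarize_oom(lines):
    """Collect who triggered the OOM killer, what was killed, and total RAM in MiB."""
    summary = {"invoked_by": [], "killed": [], "ram_mib": None}
    for line in lines:
        match = INVOKED.search(line)
        if match:
            summary["invoked_by"].append(match.group(1))
            continue
        match = KILLED.search(line)
        if match:
            summary["killed"].append((int(match.group(1)), match.group(2)))
            continue
        match = PAGES_RAM.search(line)
        if match:
            summary["ram_mib"] = int(match.group(1)) * PAGE_KB / 1024
    return summary


# Three lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Apr 22 20:39:24 Other kernel: mc invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0",
    "Apr 22 20:39:24 Other kernel: 515744 pages RAM",
    "Apr 22 20:39:24 Other kernel: Killed process 2682 (mc)",
]

if __name__ == "__main__":
    print(summarize_oom(SAMPLE))
```

To run it against a saved log, read the file into a list of lines and pass that list to `summarize_oom` in place of `SAMPLE`.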
April 24, 2012 (Author)
PSU stats (Corsair CX500 v2) for six disks and this server:
+3.3V@25A, +5V@20A, +12V@34A, [email protected], [email protected]
PSU stats (CORSAIR CMPSU-500CX) for my other fourteen-disk server:
+3.3V@25A, +5V@20A, +12V@34A, [email protected], +5VSB@3A
I don't see how they are any different, but it is early and I'm barely awake... >_>

Previous PSU (Antec SL350) for these six disks:
+3.3V@28A; +5V@35A; [email protected]; +12V@16A; [email protected]; +5VSB@2A
A spare PSU (Corsair TX650) in another machine:
+3.3V@24A, +5V@30A, +12V@52A, [email protected], [email protected]

The six disks attached to this server:
500GB WD Caviar Blue
1TB Samsung HD103UJ
500GB WD Caviar Blue
500GB WD Caviar Blue
1TB WD Caviar Black
1TB WD Caviar Black

The other server:
2x 2TB WD EARX
2x 2TB WD EADS
8x 2TB WD EARS
2x 2TB Samsung F4

I appreciate your input, so please clarify if I'm wrong. I'm going to shut down this server and switch to a different PSU. Any preference for the Antec over the Corsair TX650?
April 24, 2012
I clicked on the link in the post you point to in your signature: http://www.newegg.com/Product/Product.aspx?Item=N82E16817139018
It is for a CORSAIR Builder Series CX500 (CMPSU-500CX) 500W ATX12V v2.3. Its detailed specs show:
+3.3V@25A, +5V@20A, +12V@34A, [email protected], +5VSB@3A
Apparently I was looking at the 20A rating of the 5V supply. In any case, 34A is enough for your 6 disks, but probably insufficient for 15.
April 24, 2012 (Author)
Weird. That's the one Raj recommends for a 15-disk build, and one is currently running a 14-disk build. Just so we're clear, the CX500 is in the "currently working fine" server, and the CX500 v2 is in the "fuxxored all to hell" server. They have the same specs, correct?
April 24, 2012
Yes, the power supplies both spec the same. That doesn't mean the one in the misbehaving server isn't damaged, though.
It seems you have some issue with disk5. Personally, I'd do these troubleshooting steps and see if anything improves:
Pull the AOC-SASLP-MV8 and try without it.
Completely disconnect disk5 and try without it. See if disk5 can be simulated.
Run initconfig on the server and rebuild parity without disk5.
Hopefully, one of the steps will point to the issue.

I do find this odd:
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy7:1-0x0700000000000000:7-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy6:1-0x0600000000000000:6-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy5:1-0x0500000000000000:5-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy4:1-0x0400000000000000:4-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy0:1-0x0000000000000000:0-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy1:1-0x0100000000000000:1-lun0: No such file or directory
Apr 23 11:13:15 Other emhttp: restart_md_driver: stat pci-0000:02:00.0-sas-phy2:1-0x0200000000000000:2-lun0: No such file or directory

Especially odd is that the phy3 (3-lun0) line is missing. Assuming these lines should exist, it's as if the MV8 card has a problem on port 3.
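Spotting the absent port by eye is easy with seven lines but harder in a full syslog. Here is a rough Python sketch that lists which phy numbers never appear in lines of the shape quoted above. It assumes eight ports numbered 0 through 7 on the card and that every port would normally produce one such line; both are assumptions on my part, and the function name is made up for illustration.

```python
import re

PHY = re.compile(r"sas-phy(\d+):")


def missing_phys(lines, port_count=8):
    """Return the phy numbers in 0..port_count-1 that appear in none of the lines."""
    seen = {int(m.group(1)) for line in lines for m in PHY.finditer(line)}
    return sorted(set(range(port_count)) - seen)


# Rebuild the seven quoted lines from their phy numbers, in the order they were logged.
SAMPLE = [
    "Apr 23 11:13:15 Other emhttp: restart_md_driver: stat "
    f"pci-0000:02:00.0-sas-phy{n}:1-0x0{n}00000000000000:{n}-lun0: "
    "No such file or directory"
    for n in (7, 6, 5, 4, 0, 1, 2)
]

if __name__ == "__main__":
    print("phy numbers with no log line:", missing_phys(SAMPLE))
```

A missing number only says that no such line was logged; whether that points to a fault or simply to the one port whose device was found is something the log alone does not settle.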
April 24, 2012 (Author)
Thanks. All disks are currently connected to the motherboard; no MV8 is in use. I think the errors you see are from after I stopped using it.
I pulled disk5 from its cage and am booting up.
Command area is back without disk5 attached.
Running initconfig...
April 24, 2012 (Author)
What about the data that's on disk5? Will it be recoverable at some point?
April 24, 2012
I was hoping you'd try each line one at a time. It would be possible to rebuild disk5 if the data on disk5 could be simulated by starting the array without it. Then you'd just be trying a new drive connected to a new port for the rebuild. If that test had failed, then the disk could not be rebuilt, and that's the point where you abandon it and try to recover the data off the disk later.
April 24, 2012 (Author)
Well, that's a bummer. I searched for "simulate drive", but couldn't find anything that seemed to lay out the steps. Maybe I'm misunderstanding what the term means. Anyhoo, the loss is not the end of the world. I have another machine with an eSATA dock that I can boot to unRAID and try to recover the data.
April 24, 2012 (Author)
Data is now transferring from the disk that previously would display as busy. It's connected to another unRAID machine via an eSATA dock. Transfer rate is ~30,000 kbytes/s across a gigabit LAN into a Windows machine.
April 25, 2012
Good to hear the data is OK. It was likely some type of connection problem in your server. I'd change the SATA cable on that slot and try it again by running the preclear script on a disk.
If a disk fails, then it can be simulated by using the other disks and the parity. You just start the array without the disk connected. There will be a warning along the lines of the array being degraded and not protected against another disk failure.
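To make "simulated by using the other disks and the parity" a little more concrete, here is a toy Python sketch of single-parity reconstruction. It assumes the parity disk holds the byte-wise XOR of all data disks, which is my understanding of how a single-parity array like this works; the function names and the tiny byte strings are invented for illustration and have nothing to do with unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    """Parity is the XOR of every data disk's bytes."""
    return xor_blocks(data_disks)


def simulate_missing(surviving_disks, parity):
    """Rebuild the one missing disk from the surviving disks plus parity."""
    return xor_blocks(list(surviving_disks) + [parity])


if __name__ == "__main__":
    # Three pretend data disks of two bytes each.
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)
    # Pretend the middle disk is disconnected and rebuild its contents.
    rebuilt = simulate_missing([disks[0], disks[2]], parity)
    print(rebuilt == disks[1])
```

This also shows why the degraded-array warning matters: with one disk already missing, every remaining disk plus parity is needed, so a second failure leaves nothing to reconstruct from.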
April 26, 2012 (Author)
I want to thank everyone who helped me through this difficult time...
Although the server is once again stable and I'm adding disks to it, I think I figured out the root of the issue. One of the 3WARE Multi-lane Internal Cable (SFF-8087) sets would seat, but not LOCK, into the Supermicro card. I tried both cables in both positions. Only one will lock, and it locks into both positions. The (likely) defective cable was connected to every disk that I was having trouble with. That cable was attached to the slots shown as SAS1.
The replacement is already ordered from Amazon. It is estimated to deliver tomorrow.
April 28, 2012 (Author)
New cable arrived, and it LOCKS into the card as it should! Adding the rest of the disks right now.
April 28, 2012 (Author)
Things look good for now. I'm going to wait a week, then start moving data to it. This server is going to hold all of the TV shows we plan on watching, then deleting. The ones for long-term archiving will stay on the other server.
Thanks again for all the help. I think I managed to come up with a better and more fitting thread subject.