July 15, 2017

This system has been running quite well for about a year. When powered on from cold, the system boots OK with the RAID offline. At this point I can log into the web interface and see that Disk 5 is marked as faulty. When I click Start to take the array online, a message appears in the bottom left corner indicating "Mounting Disks", and then I lose contact with the array. The web interface doesn't respond (the page won't load if refreshed), but I can still ping and telnet into the array.

In my attempts to troubleshoot this I have powered off the array (without doing a clean shutdown), and I suspect it's doing some tidy-up work now when I ask it to go online. Yesterday I replaced the faulty Disk 5 with a new (unformatted) Seagate 8TB disk and clicked Start to bring the array online. It's been in this state for about 12 hours and the web interface is still not loading. Telnet access is OK.

If it really is doing some tidy-up work before going online then I am happy to leave it alone and let it do this, but is there a way to check what it's really doing from the command line? I can run the top command, which seems to work OK, but I don't know how to interpret the results.

top - 11:07:05 up 12:21,  1 user,  load average: 2.02, 2.01, 2.00
Tasks: 249 total,   1 running, 247 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7348412 total,  6589400 free,   283640 used,   475372 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6376064 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1632 root      20   0    9532   2508   2256 S   0.7  0.0   1:47.77 cpuload
    7 root      20   0       0      0      0 S   0.3  0.0   0:09.42 rcu_preempt
 1001 root      20   0       0      0      0 S   0.3  0.0   0:01.16 usb-storage
20571 root      20   0   16604   3016   2208 R   0.3  0.0   0:01.79 top
    1 root      20   0    4360    656    600 S   0.0  0.0   0:10.35 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.75 ksoftirqd/0
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_sched
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
   10 root      rt   0       0      0      0 S   0.0  0.0   0:04.66 migration/0
   11 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 lru-add-drain
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:04.60 migration/1
   15 root      20   0       0      0      0 S   0.0  0.0   0:00.81 ksoftirqd/1
   17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   18 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs
   19 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
  280 root      20   0       0      0      0 S   0.0  0.0   0:00.00 oom_reaper
  281 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
  283 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kcompactd0
  284 root      25   5       0      0      0 S   0.0  0.0   0:00.00 ksmd
  285 root      39  19       0      0      0 S   0.0  0.0   0:01.29 khugepaged
  286 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 crypto
  287 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd
  288 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  290 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd
  426 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 ata_sff
  444 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 devfreq_wq
  544 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 rpciod
  545 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 xprtiod
  573 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kswapd0
  574 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 vmstat
  669 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 nfsiod
  672 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 cifsiod
  673 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 cifsoplockd
  681 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  690 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 xfsalloc
  691 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 xfs_mru_cache
  722 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kthrotld
  740 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 acpi_thermal_pm
  796 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  799 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  802 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  805 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  808 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  811 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  815 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  818 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  821 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  822 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  823 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  824 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  825 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  826 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  827 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  828 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  829 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  832 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  838 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  839 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  840 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  841 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
  842 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset

Edited July 17, 2017 by MPR Files
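On the question of reading the top output: a load average close to 2.00 while the CPU is about 99% idle may mean that a couple of tasks are stuck in uninterruptible sleep (state D, usually waiting on disk I/O), because Linux counts those tasks in the load average as well as runnable ones. The part of the list shown here only contains tasks in state S or R, so the blocked ones, if any, would be further down. A minimal sketch for listing them, assuming a procps-style ps is available on the server; filter_blocked is a hypothetical helper name, not an unRAID command.

```shell
#!/bin/bash
# Keep the header row plus every process whose state column starts with D
# (uninterruptible sleep). Expects ps-style input with the state in column 2.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Intended use on the server (assumption: ps accepts these -o fields):
#   ps -eo pid,stat,wchan:32,args | filter_blocked
# The WCHAN column hints at which kernel function each blocked task is waiting in.
```

If the same few processes stay in state D across repeated runs, that would fit a mount that is hanging rather than background tidy-up work that is progressing.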
July 15, 2017 Community Expert

Probably filesystem corruption on one of the disks. After starting the array, when it hangs, type diagnostics on the console/SSH and upload the resulting zip.
July 15, 2017 Author

Hi Johnnie, thanks for your help. I've done that and can locate the file through the terminal, but I don't know how to get it back to my Mac to upload it.
July 15, 2017 Author

Tower login: root
Linux 4.9.30-unRAID.
Last login: Sat Jul 15 10:48:24 -0700 2017 on /dev/pts/0 from iMac.
root@Tower:~# diagnostics
Starting diagnostics collection... done.
ZIP file '/boot/logs/tower-diagnostics-20170715-1134.zip' created.
root@Tower:~# cd/boot
-bash: cd/boot: No such file or directory
root@Tower:~# cd /boot
root@Tower:/boot# dir
bzimage*  bzroot-gui*   config/      ldlinux.sys*  logs/              make_bootable_linux*  memtest*
bzroot*   changes.txt*  ldlinux.c32* license.txt*  make_bootable.bat* make_bootable_mac*    syslinux/
root@Tower:/boot# cd /logs
-bash: cd: /logs: No such file or directory
root@Tower:/boot# cd /boot/logs
root@Tower:/boot/logs# dir
tower-diagnostics-20170715-1134.zip*
root@Tower:/boot/logs#
July 15, 2017 Community Expert

5 minutes ago, MPR Files said:
Done that and can locate the file through terminal, but don't know how to get it back to my Mac to upload it.

The flash drive is a share at \\tower\flash, and the file will be in the logs folder. Alternatively, insert the flash drive in your PC/Mac.
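A third option is copying the zip over the network with scp. This is only a sketch: it assumes SSH is enabled on the server and that the Mac can resolve the name tower, neither of which is confirmed in this thread. newest_diagnostics is a hypothetical helper for finding the most recent archive.

```shell
#!/bin/bash
# Print the most recently modified diagnostics zip in a directory
# (defaults to /boot/logs, where the diagnostics command wrote its file).
# Prints nothing if no matching file exists.
newest_diagnostics() {
    local dir="${1:-/boot/logs}"
    ls -t "$dir"/*-diagnostics-*.zip 2>/dev/null | head -n 1
}

# On the server:
#   newest_diagnostics
# Then, from a terminal on the Mac (assumption: SSH is reachable as root@tower):
#   scp root@tower:/boot/logs/tower-diagnostics-20170715-1134.zip ~/Desktop/
```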
July 15, 2017 Author

Remove the flash drive and insert it into the iMac. Why didn't I think of that :-) Diagnostics attached.

tower-diagnostics-20170715-1134.zip
July 15, 2017 Community Expert

Start the array in maintenance mode and run xfs_repair on disk4 (md4):
https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS
July 15, 2017 Author

root@Tower:~# xfs_repair -v /dev/md4 -L
Phase 1 - find and verify superblock...
        - block cache size set to 622928 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 901713 tail block 557528
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0xe8e05f31/0x200
freeblk count 5 != flcount 6 in ag 3
flfirst 118 in agf 2 too large (max = 118)
sb_ifree 144, counted 64
sb_fdblocks 481256666, counted 482015027
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:901695) is ahead of log (1:2).
Format log to cycle 4.

        XFS_REPAIR Summary    Sat Jul 15 12:10:09 2017

Phase           Start           End             Duration
Phase 1:        07/15 12:03:34  07/15 12:03:34
Phase 2:        07/15 12:03:34  07/15 12:05:11  1 minute, 37 seconds
Phase 3:        07/15 12:05:11  07/15 12:06:28  1 minute, 17 seconds
Phase 4:        07/15 12:06:28  07/15 12:06:28
Phase 5:        07/15 12:06:28  07/15 12:06:29  1 second
Phase 6:        07/15 12:06:29  07/15 12:06:42  13 seconds
Phase 7:        07/15 12:06:42  07/15 12:06:42

Total run time: 3 minutes, 8 seconds
done
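For anyone following the same steps: the run above went straight to -L, which discards the journal, as the ALERT line says. A more cautious order is to look first with xfs_repair -n, which only reports and changes nothing. The sketch below wraps that check. check_xfs and the XFS_REPAIR override are hypothetical names added for illustration, and the array still needs to be started in maintenance mode as advised earlier in the thread.

```shell
#!/bin/bash
# Read-only check of an XFS filesystem before deciding on a real repair.
# With -n, xfs_repair modifies nothing and exits non-zero when it finds problems.
# XFS_REPAIR can be overridden so the logic can be tried without a real device.
check_xfs() {
    local dev="$1"
    if "${XFS_REPAIR:-xfs_repair}" -n "$dev"; then
        echo "clean: $dev"
    else
        echo "problems-reported: $dev"
    fi
}

# With the array in maintenance mode:
#   check_xfs /dev/md4
# If problems are reported, run xfs_repair without -n. Reach for -L only when
# xfs_repair refuses to proceed because the log cannot be replayed by mounting.
```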
July 15, 2017 Author

Is it OK to start the array normally now, or should I be running other checks first?
July 15, 2017 Author

Started OK. It is doing a data rebuild for Disk 5 now, which will take a couple of days. The volume mounts OK and I can browse my data. Thank you so much for your help.
July 17, 2017 Author

The Disk 5 rebuild is complete and everything is running fine now. Thank you. If someone could tell me how to edit the thread title, I'll mark it as solved.