cyberstyx

Members · 50 posts
Everything posted by cyberstyx

  1. The rebuild finished successfully with 0 errors. I also did a sample check on the file structure: all files are there and working as they should. I have also attached the diagnostics file in case you want to have a look. I will probably start tackling the libvirt.img corruption issue tomorrow; I have some unraid OS backups if needed, and the configuration of the VMs has not changed in a long time. I will also check the suggestion about the proper usage of SSDs in unraid, as suggested. Thank you all again for your help, especially JorgeB. Having 4+ hardware failures one after the other (one PSU and various SATA cables), and the many errors caused by them, was something only experts could untangle. Christos. tower-diagnostics-20231002-1857.zip
  2. Hello JorgeB, I replaced the SATA cable, did a filesystem check (-n) on the disk, and restarted the rebuild.
  3. I stopped the operation; after 195,362,860 writes, Disk 9 was giving errors. I have attached the diagnostics file. While the rebuild was running, shared folders were not working properly: their configuration was there (in the Shares tab) and I could see the shared folders over the network, but they were empty. When I checked a share folder from the console I got "/bin/ls: reading directory '.': Input/output error". The disk contents under /mnt were there. When I stopped the rebuild, the share folder contents were visible again. I started the array in Maintenance mode so I could do a filesystem check on Disk 9 (with the -n flag). I got this:
     Phase 1 - find and verify superblock...
     superblock read failed, offset 0, size 524288, ag 0, rval -1
     fatal error -- Input/output error
     I've stopped here and am waiting for further instructions. tower-diagnostics-20231001-0005.zip
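For anyone finding this thread later: the read-only check described above can also be run from the console once the array is started in Maintenance mode. A minimal sketch; /dev/md9 as the md device for Disk 9 is an assumption for this system, adjust to the disk in question.

```shell
# Read-only XFS check (-n = no modify) against an unRAID md device.
# /dev/md9 is an assumed device name for Disk 9.
DEV="${DEV:-/dev/md9}"

if [ -b "$DEV" ]; then
    # Report problems only; nothing is written to the disk.
    xfs_repair -n "$DEV"
else
    echo "Device $DEV is not a block device on this machine; skipping check"
fi
```

A superblock read failure with an Input/output error, as seen above, usually points at the cable, controller, or drive rather than at the filesystem itself, which matches the SATA-cable diagnosis in this thread.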
  4. After a week's delay from the shop in sending me the bought PSU, I got the replacement a few days ago and had time today to remove the server PC from its installed location and swap the PSU. I followed the suggested steps again to make a new config, start the array, and then replace the disk; the array is now being rebuilt. Hopefully it will finish tomorrow morning without any new surprises, and I will share my news then.
  5. Thank you for that info. I will read more about this and ask you again when I restore the system.
  6. Since Disk 1 was on the MB controller and Disk 5 was on the PCI controller, it is probably a PSU issue. I will come back to this as soon as I get a new PSU and replace all power cabling. Thanks for your help JorgeB, have a nice Sunday.
  7. Unfortunately the files are inaccessible. The rebuild finished without errors, and the file structure and files are there, but again the video files are unplayable and the text files are all NULL characters. Performing an ls -lsR on /mnt/disk2 sometimes gives results like:
     ? d????????? ? ? ? ? ? HowTo/
     /bin/ls: cannot open directory './User Christos/RaspberryPi3/HowTo': Structure needs cleaning
     With or without a "structure needs cleaning" message, none of the files were rebuilt correctly (I sampled around 15 files randomly in different directories). After the rebuild finished, and after waiting a few minutes to check for any status changes in the GUI, Disk 9 got disabled the moment I tried to open the first file on Disk 2 to check its contents. You know better, but to me it looks like parity can no longer be used to rebuild Disk 2. What concerns me more now is the stability of my system: the constant disabling of different, even newly added, disks, and whether any newly added content in the array will still be there for future access. I don't mind re-downloading some of the content I like from Disk 2, but the system should operate as designed. Even though the system is old (old CPU, old MB, old RAM, old PSU), it operates as it should, and the disks are new and newish. Due to power consumption the system is usually off; I turn it on for home work or media access. It is not on 24/7, so the disks are not on all the time. Let me know what you think: whether you want me to format Disk 2, add some new content to it to see if everything is OK, use the previous procedure to bring Disk 9 back online, and whether there are any checks that could be made on the hardware and OS configuration side to get some good assumptions about the stability of the whole system. But at this point I am not going to invest more cash in buying hardware for testing, as I have other expense priorities. tower-diagnostics-20230917-1130.zip
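As a side note for others hitting "Structure needs cleaning": instead of sampling files by hand, a quick scan for unreadable directories can show how widespread the damage is. A rough sketch; on the array the path would be something like /mnt/disk2, the /tmp default below is only so the snippet runs anywhere.

```shell
# Walk a tree and report directories whose contents cannot be read;
# on a damaged XFS volume these typically fail with
# "Structure needs cleaning" (EUCLEAN).
SCAN_ROOT="${SCAN_ROOT:-/tmp}"   # on this system: /mnt/disk2

find "$SCAN_ROOT" -type d 2>/dev/null | while read -r dir; do
    # ls exits non-zero if the directory itself cannot be read
    if ! ls "$dir" >/dev/null 2>&1; then
        printf 'unreadable: %s\n' "$dir"
    fi
done
echo "scan of $SCAN_ROOT finished"
```

This only finds directories that error on read; files that open fine but contain NULLs, as described above, can only be caught by comparing contents against a backup or checksum list.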
  8. Check finished:
     Phase 1 - find and verify superblock...
     Phase 2 - using internal log
     - zero log...
     - scan filesystem freespace and inode maps...
     agf_freeblks 8620987, counted 8620468 in ag 0
     sb_fdblocks 498026383, counted 498027136
     - found root inode chunk
     Phase 3 - for each AG...
     - scan and clear agi unlinked lists...
     - process known inodes and perform inode discovery...
     - agno = 0
     - agno = 1
     - agno = 2
     - agno = 3
     - agno = 4
     - agno = 5
     - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
     - setting up duplicate extent list...
     - check for inodes claiming duplicate blocks...
     - agno = 0
     - agno = 1
     - agno = 4
     - agno = 2
     - agno = 5
     - agno = 3
     Phase 5 - rebuild AG headers and trees...
     - reset superblock...
     Phase 6 - check inode connectivity...
     - resetting contents of realtime bitmap and summary inodes
     - traversing filesystem ...
     - traversal finished ...
     - moving disconnected inodes to lost+found ...
     Phase 7 - verify and correct link counts...
     Maximum metadata LSN (1:2050985) is ahead of log (1:1850812).
     Format log to cycle 4.
     done
     The drive mounted, all other drives appear OK so far, and the rebuild has started. The speed is slow; it will take ~20 hours to complete (it will drop to 50 MB/s later on):
     Total size: 6 TB
     Elapsed time: 5 minutes
     Current position: 24.7 GB (0.4 %)
     Estimated speed: 90.0 MB/sec
     Estimated finish: 18 hours, 17 minutes
     Will report back as soon as I have news, hopefully tomorrow with a successful completion. Thank you JorgeB.
  9. I stopped, then started the array in Maintenance mode. The file system type for Disk 2 is "auto" and there is no Check button; the other disks have all options available. tower-diagnostics-20230916-1429.zip
  10. You mean by creating a new config etc.? I will start that procedure in 30 minutes, as soon as I am back from my chores.
  11. Unfortunately the rebuild got paused due to Disk 8 getting disabled. Disk 1 was a SATA HDD on the motherboard; Disk 8 is an SSD (for VMs and containers) on an expansion PCI card. The operation paused at 5.5%, probably after 2-3 hours and while I was sleeping. In the morning I saw the Paused status and resumed it; only later did I see in your post that you wanted me to cancel and post new diagnostics. The resume showed the same behavior as in my initial post here: it was "rebuilding" at a speed of 3.9 GB/s and finished in 26 minutes, not the normal 50-100 MB/s over 18+ hours. All contents of Disk 2 are reported to be there, but they are actually empty/NULL files: I opened a few movie files and they cannot be played; I opened a few text files and they are all NULL characters. The rebuild did not complete successfully.
     I proceeded with the same procedure as with Disk 1: shut the PC down, checked the power cable, replaced the SATA cable. When I turned it back on, the array was down and all disks were present; when I started the array, everything mounted but Disk 8 was disabled. During BIOS POST all disks are reported present (6 disks on the M/B, 2 PCI expansion cards each with 2 disks). Disk 8's (SSD) SMART checks complete without errors, but I cannot see the SMART log.
     On the monitor of the physical unraid server I see messages every 5-10 seconds for md2 about a metadata I/O error and metadata corruption, saying that I should unmount and run xfs_repair. There is nothing in the web GUI suggesting something is wrong with Disk 2, only that Disk 8 is disabled. On the Main page there are constant reads and writes without anyone accessing the server, but no errors. So the current situation: the GUI suggests everything is OK apart from Disk 8 being disabled (though still active and healthy), while Disk 2 is not successfully rebuilt and has some sort of corruption that is not shown in the GUI (it shows mounted, active, healthy, with 4 TB of data). tower-diagnostics-20230916-1050.zip
  12. Normally this would have been a basic operation that I have done 20+ times in the past 7 years; the multiple errors put me off, which is why I asked for help rather than trying to troubleshoot further on my own. The help was invaluable. Yes, parity is not a backup; actual work files are backed up on an external SSD and on cloud storage. Thank you again.
  13. Everything looks great now; the disk is being rebuilt, but it will take a long time to complete, as expected from large and slow disks. You can consider this issue solved. Thank you a lot for your help!
  14. Changed the SATA cable and re-seated the power cable; Disk 1 is back online and I can browse its contents. There is a SMART health error from its UDMA CRC error count, which is expected and nothing to worry about, I guess? Should I stop the array, add the installed disk as Disk 2, and start the rebuild? tower-diagnostics-20230915-2113.zip
  15. The main page got refreshed after some time, for Disk 1 it shows "Unmountable: Wrong or no file system"
  16. Done without any errors (except the missing Disk 2) and Disk 1 is enabled. tower-diagnostics-20230915-2040.zip
  17. Yes, please let me know what I can do to fix this. The array is in maintenance mode. I have attached the status of the Main page.
  18. I replaced the failed drive (md2) with the newly bought disk; I didn't add it as a new drive. Yes, I let it finish whatever it was doing, but I think it was doing a parity check, not a rebuild. On the Dashboard GUI, in the top-right Parity area, I see:
     Last check completed on Thu 14 Sep 2023 08:00:17 PM EEST (yesterday)
     Duration: 25 minutes, 54 seconds
     Average speed: 3.9 GB/s
     Finding 1465130633 errors
     And this is the only thing it did; it started automatically after I replaced the disk, I didn't initiate anything. Now on the Main screen I see for both Disk 1 (the failed disk) and Disk 2 (the newly replaced one): "Unmountable: Wrong or no file system". Those two disks are also offered for formatting on the Main screen under Array Operations:
     Unmountable disks present:
     Disk 1 • WDC_WD60EFAX-68SHWN0_... (sdd)
     Disk 2 • WDC_WD60EFAX-68JH4N1_... (sdi)
  19. I had a mechanical disk failure (md2) on an old 4 TB disk. I replaced it with a new 6 TB disk. When I added it, unraid started preparing it (not rebuilding it yet). After 20-30% of that process, another disk (md1) got disabled. I now have 2 disks marked to be formatted, and I can't proceed with rebuilding the new disk until I first sort things out with the disabled disk. I did a "Check -n" from the Disk 1 properties (unraid GUI), left it running last night, and returned from work this afternoon only to find it "dotting" the progress window for 22 hours:
     Phase 1 - find and verify superblock...
     couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
     attempting to find secondary superblock...
     ....found candidate secondary superblock...
     unable to verify superblock, continuing...
     ............................ etc.
     I don't expect anything helpful after so many hours. I am not sure how to proceed with fixing md1 before rebuilding md2, and I have one parity disk. Some of the errors from the relevant logs:
     Sep 14 20:34:32 Tower kernel: XFS (md1): Mounting V5 Filesystem
     Sep 14 20:34:32 Tower kernel: XFS (md1): Internal error !uuid_equal(&mp->m_sb.sb_uuid, &head->h_fs_uuid) at line 253 of file fs/xfs/xfs_log_recover.c. Caller xlog_header_check_mount+0x60/0xb4 [xfs]
     Sep 14 20:34:32 Tower kernel: XFS (md1): Corruption detected. Unmount and run xfs_repair
     Sep 14 20:34:32 Tower kernel: XFS (md1): log has mismatched uuid - can't recover
     Sep 14 20:34:32 Tower kernel: XFS (md1): failed to find log head
     Sep 14 20:34:32 Tower kernel: XFS (md1): log mount/recovery failed: error -117
     Sep 14 20:34:32 Tower kernel: XFS (md1): log mount failed
     Sep 14 20:34:32 Tower root: mount: /mnt/disk1: mount(2) system call failed: Structure needs cleaning.
     Sep 14 20:34:32 Tower root: dmesg(1) may have more information after failed mount system call.
     I would rather ask experts about this, as I'm not a Linux OS user and I'd rather not make a bigger mess.
     Any help will be greatly appreciated. Thank you in advance, Christos. tower-diagnostics-20230915-1850.zip
  20. Solved it today. I did some more searching around, trying to use virtfs etc. to get more data about what was happening. I didn't get very far with that, but while reading about different solutions to this type of problem, I found out that having your free disk space drop below a certain point can cause running VMs to be instantly paused, without any appropriate message whatsoever. It seemed a bit strange, as my VMs have their full disk size (40 GB each) pre-allocated, but the problem is the amount of free space the system needs for this operation. Some people had this issue with those files being on cache disks which, when free space dropped below a certain point, caused the pauses. My problem was a bit more obvious, if you knew where to look of course: the drive hosting the VMs (and, I suspect, other files used for this) had dropped to an amazing 20.5 KB of free space. Which was strange, as I had stopped all writes to this disk with 50+ GB free. That nice docker I have, a Minecraft server for the kids, ended up eating all the available free space with its daily backups. Clearing that space and returning the disk to 50+ GB free allowed my VM to start correctly. I can't tell you how nice it is to hit F5 and have the web GUIs running on that VM actually respond instead of saying something rude to me... I hope this is also helpful to someone else, especially you ashman70. I will change the topic to solved, and I hope it stays that way.
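The takeaway for anyone else: check free space on the drive hosting the vdisks before blaming the VM. A hypothetical pre-flight check along these lines; the path and the 1 GiB threshold are made up for illustration, not taken from this system.

```shell
# Hypothetical pre-flight check: warn if the filesystem holding the VM
# images is low on free space, since KVM guests can be auto-paused when
# the host runs out of room for their backing files.
VMDISK_PATH="${VMDISK_PATH:-/tmp}"   # e.g. the disk or cache share holding the vdisks
MIN_FREE_KB=$((1024 * 1024))         # 1 GiB, an arbitrary safety margin

free_kb=$(df -Pk "$VMDISK_PATH" | awk 'NR==2 {print $4}')
if [ "$free_kb" -lt "$MIN_FREE_KB" ]; then
    echo "WARNING: only ${free_kb} KiB free on ${VMDISK_PATH}; VMs may pause"
else
    echo "OK: ${free_kb} KiB free on ${VMDISK_PATH}"
fi
```

Run periodically (or from cron with an alert), something like this would have caught the backup growth long before the disk hit 20.5 KB free and the VMs started pausing.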
  21. I don't think it's the actual VM, due to its behavior, which seems OS-irrelevant to me. It seems like either a VM configuration option (even though I didn't touch anything, and it first happened after the VM had been on and idle for many days) or something going on with KVM. I hope an expert user will have some time to check the logs or suggest something...
  22. I tried to access my Win Server 2012 VM (through web VNC). When I logged in and the desktop appeared and started loading, this VM got paused as well, and I could not resume it. I tried to stop the array to do a system reboot; the web GUI got stuck in an infinite "Unmounting shares... Trying to unmount shares" loop. I SSHed to the system and tried powerdown, but the command would not complete; I had to do a shutdown -r now. After the restart, Win Server 2012 came up again (it was set to auto-power-up), while Win10 was paused again. Shares, docker, WinServer2012 MySQL connections, etc. are all OK, with no problems in the logs, as initially mentioned. Trying to resume Win10 fails 9 out of 10 tries (the web GUI refreshes but the pause icon is still there). After a few tries the icon refreshes to "started", but refreshing the web GUI with F5 shows it as paused again. During that very fast paused > resumed > paused-again sequence, I can see over the VNC connection that the Win10 VM gets "one step" ahead each time: it shows the Windows logo, then the progress bar/rotation, then the bar/rotation advances a bit... it's like the VM gets auto-paused, resumes for a fraction of a second at some point, and then is instantly paused again. As this seems to be a system issue and not a VM issue, since Win10 is blocked before its OS even loads and since WinServer2012 also behaved similarly at least once, I think there is no point in trying more VMs with different OSes in case the guest OS is the problem. For the time being I am not accessing WinServer2012 through VNC/RDP, as its MySQL DB services, at least, are working fine.
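For reference, the pause/resume dance can also be driven from the console with libvirt's virsh, which sometimes gives a more useful error than the web GUI. A sketch; the domain name "Windows 10" is a guess, `virsh list --all` shows the real names on the system.

```shell
# Query and resume a paused KVM guest via libvirt's virsh CLI.
# The domain name below is an assumption for illustration.
VM_NAME="${VM_NAME:-Windows 10}"

if command -v virsh >/dev/null 2>&1; then
    virsh domstate "$VM_NAME"   # e.g. "paused"
    virsh resume "$VM_NAME"     # prints an error reason if the resume fails
else
    echo "virsh is not installed on this host"
fi
```

When a guest keeps re-pausing itself immediately, `virsh domstate --reason "$VM_NAME"` is also worth trying: the reason field (for example an I/O error) often points at the underlying cause, which in the later "solved" post turned out to be exhausted free space on the hosting drive.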
  23. ashman70, so what did you eventually do? Created a new Win10 VM? A different OS VM?
  24. Here are the diagnostics tower-diagnostics-20170303-0109.zip