Craigb

Everything posted by Craigb

  1. Success! The array started without problem. The faulty disk is being emulated and the data appears to be completely intact. The replacement drive goes in this morning along with a second parity drive. Many thanks for your assistance! Very much appreciated!
  2. Check that. Found it... Need to get my glasses checked!!
  3. Disk image complete, new config step complete, drives are in the correct positions. However, there is no tick box for "parity is already valid". Did I miss something?
  4. Thanks for getting back to me, and for the advice! The server is currently down as I'm doing the byte-level copy to a new drive. That's got about 5 hours to run, but I'll get back on this as soon as it's completed.
  5. And the most recent diagnostics... nas1-diagnostics-20210413-1345.zip
  6. The rebuild completed with errors at 15:55. The first diagnostic file (nas1-diagnostics-20210409-1217) was captured after disk 12 was disabled, prior to the rebuild attempt. The second diagnostic file (nas1-diagnostics-20210409-1458) is a continuation of the first diagnostic's syslog, after disk 12 was unassigned and reassigned, but still prior to the rebuild attempt. The third file (syslog-20210409-161331) is the syslog from immediately after the rebuild; I did not get a diagnostic file at that point. As of this morning, disk 12 is assigned but unmountable, and the parity disk is disabled (red cross). I'm doing a byte-level image of the data disk to preserve whatever data might still be recoverable and, barring any suggestions from the forum, will then attempt to run xfs_repair. All the syslogs from before the failure through to today are available. Thanks!
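     For reference, a minimal sketch of this kind of byte-level image using GNU ddrescue (the /dev/sdX source, /dev/sdY target and the map-file path are placeholders, not the actual devices):
        # first pass: copy everything readable, recording progress in a map file
        ddrescue -f /dev/sdX /dev/sdY /boot/disk12-rescue.map
        # optional second pass: retry the areas that failed, up to 3 times
        ddrescue -f -r3 /dev/sdX /dev/sdY /boot/disk12-rescue.map
     The map file is what lets the copy resume where it left off if it gets interrupted.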
  7. Got it - I think. The attached is the syslog which includes the rebuild and related events. I don't have a diagnostics package for that period. At some point I'm going to do a byte level image of that disk to try to preserve whatever may be recoverable from further damage. Thanks!! syslog-20210409-161331.rar
  8. Thanks!! I have several from after that point. I think this is the one from just after the rebuild. nas1-diagnostics-20210409-1458.zip
  9. Help! I noticed that one of the data disks was showing a red cross. I ran an extended SMART test with no faults detected. After rebuilding the drive, I noticed that the parity drive had become disabled (red cross) at some point during the rebuild process. The original data disk is now not mountable. The file system is XFS with a single parity drive and about 40 TB in total capacity. No other drive is showing any issues. As a precaution, I re-seated all disks, connectors and HBA cards.

     I received the following when I ran xfs_repair from the GUI:

     ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

     This is from the read/parity check history:

     Date                 Duration               Speed      Status  Errors
     2021-04-09 15:55:44  27 min, 4 sec          3.7 GB/s   OK      1465130625
     2021-02-01 06:56:03  18 hr, 3 min, 48 sec   92.3 MB/s  OK      0
     2020-10-27 09:12:39  17 hr, 32 min, 20 sec  95.0 MB/s  OK      0
     2020-09-22 04:44:56  16 hr, 58 min, 34 sec  98.2 MB/s  OK      0

     That's a 6 TB drive which is about 95% full. Other than data that might have been written to the log by the rebuild process, nothing else has been written there for weeks. I'm not sure what my options are if the drive can't be mounted and the log replayed as suggested, but there is a substantial amount of data that I'd really like to keep if possible. I've included the diagnostics from before the rebuild. Thanks! nas1-diagnostics-20210409-1217.zip
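     For context, the order of operations I understand that error message to be asking for, as a minimal sketch (assuming disk 12 appears as /dev/md12 with the array started in maintenance mode; /mnt/test is just a throwaway mount point):
        # dry run: report problems without writing anything to the disk
        xfs_repair -n /dev/md12
        # try a mount so XFS can replay its own log, then unmount cleanly
        mkdir -p /mnt/test
        mount -t xfs /dev/md12 /mnt/test && umount /mnt/test
        # last resort only, if the mount is impossible: zero the log and repair,
        # accepting that the logged metadata changes will be lost
        xfs_repair -L /dev/md12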
  10. Thanks guys! It's 2 GB of RAM. And now that you mention memory being overwritten, what I observed makes sense. Took a deep breath and rebooted successfully. Tried to update the OS and it failed again in exactly the same way. Tried a third time, with a bit less than half the 2 GB of RAM free, and again, the same result. Everything seems fine with the exception of the Dynamix Stats plugin, which I've uninstalled. The server is back in service now. Thanks for the pointers! Craig
  11. Hi All, Got a really strange one today. I can't find any references to this elsewhere on the forum, so here goes. I tried to update from 6.4.0 rc18 to 6.4.0 stable using the button in the Tools area, and got this after the download apparently aborted. I SSH'd in and saw that there was zero free space in that directory, which was consistent all the way up the path until I got to the root, where I then saw something like 93% free space... I'm suspecting a failed or failing flash drive, which wouldn't surprise me since it has hosted unRAID since 2008. I was able to mount it and successfully copy off all the files. I did get a diagnostic dump, but every file in it is 0 bytes in length except for TOP.

      At this point, the main web page was loaded with all sorts of Dynamix-related error strings, which alternated with the usual information every 5 seconds or so, and it rapidly became non-responsive. The GUI hosted locally on the server seemed to work, or at least didn't show the same behaviour as the remote webGUI. Up to this point, unRAID has performed pretty much flawlessly. Any feedback on this would be appreciated. Thanks!! UNRAID syslog.txt tower-diagnostics-20180120-1214.zip
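      For anyone hitting something similar, a quick way to see where the space has gone, as a minimal sketch (assuming the flash drive is mounted at /boot, which is where unRAID keeps it):
         # compare free space on the flash device with the root filesystem
         df -h /boot /
         # see which directories on the flash are using the space
         du -sh /boot/* | sort -h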
  12. Many thanks for the feedback. As suggested, I reset the values back to default which seems to have corrected the problem. The array is rebuilding now. Fingers crossed that it completes successfully in 522 minutes!
  13. Hi, I'm seriously stuck, unable to rebuild a replacement disk. The array is functioning, as I can still access all the files that were on the original, possibly now failed, disk, but the array is unprotected. When I start a rebuild it instantly causes unRAID to crash.

      The problems started when I attempted to increase the md_sync_window size to improve performance. I set this to 876; it had been 512. md_num_stripes was 3840 and md_write_limit was 2304. These two values have not been changed, and based on Tom's discussion elsewhere on the forum, they should work fine with 2 GB of system memory and 13 disks. I'm running unRAID Pro 4.7 on an Asus P5Q Premium with an Intel E8400 3.0 GHz Core 2 Duo CPU, 2 GB RAM, 2 x AOC-SASLP-MV8 controllers, 12 data drives, 1 parity drive, and 1 cache drive. Up to this point the system has been in daily use and rock solid for almost 18 months.

      After saving the change to md_sync_window, the instant I restarted the array, a red ball showed up on Disk 7 (Seagate ST31500341AS, 1.5 TB). I tried moving the disk to different slots and controllers to eliminate potential points of failure, all with the same results. I then checked the disk using SeaTools (Windows), which failed the short test. I ran SeaTools in repair mode, which completed successfully. I then ran a long SMART test, which also passed. I reinstalled the disk and started the server, which came up with Disk 7 missing. I stopped the array and assigned the disk back to Disk 7. When the array was started (to rebuild), unRAID instantly collapsed. The internal web server quit shortly afterwards, but I was still able to telnet in and capture the logs. A truncated syslog is attached. I've also included dmesg, with an error logged between lines 879 and 913. I forced a power-down, removed the disk from the server and restarted it. It booted 'normally' and the array came up, again missing Disk 7. Fortunately all data on all drives, including the missing Disk 7, is readable. A SMART report for all disks, including the suspect disk, showed no errors of any kind. Results attached.

      I then decided to replace the original disk. I externally pre-cleared a new Seagate 2 TB drive (ST2000DM001-9YN) using Joe L's preclear_disk.sh. This disk was installed and the server started. Again, the server came up OK, but missing Disk 7. I stopped the array and assigned the new disk to Disk 7. I started a rebuild (no other option was offered), and again had exactly the same problem as before. I had to force the shutdown. The server boots and the array starts unprotected when no disk is assigned to Disk 7. Any attempt to assign either disk to Disk 7 results in the instant failure and generation of many, many MB of errors in the log when the array is restarted. This behaviour occurs irrespective of physical location or controller. I'm guessing I need to avoid, if at all possible, forcing a parity rebuild, as that would result in the loss of the remaining 'protected' Disk 7 data.

      Finally, I ran reiserfsck against both suspect disks. The original disk shows no errors and apparently lots of data. However, the replacement disk comes back with all sorts of errors with the sub-tree and bad nodes. Results attached. Thanks in advance for any assistance! SMART_results.zip dmesg_after_failed_restart_160712.zip syslog_after_changing_stripe_size_down_shortened_version.zip reiserfsck_results.txt
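      For reference, a minimal sketch of the kind of reiserfsck check described above (/dev/sdX1 is a placeholder for the partition of whichever disk is being checked; --check is the read-only mode):
         # read-only consistency check; reports any tree or node corruption it finds
         reiserfsck --check /dev/sdX1
         # only if --check recommends it, and only on a disk whose contents you can afford to risk:
         # reiserfsck --rebuild-tree /dev/sdX1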