November 10, 2024
Hello,
A short while ago my cache SSD died. I did not try to fix it immediately, as I expected it to be very time consuming. Now that I have some extra time and a new SSD to replace it with, I can't get the GUI to let me make any changes to the cache pool. No matter what I do, it will not let me unselect the defective SSD and select the new one. There are some errors in the syslog about "Read-only file system" related to disks:
Nov 10 21:57:39 SpireTower emhttpd: error: put_disk_settings, 4905: Read-only file system (30): fopen: /boot/config/disk.cfg
I am also not able to update any plugins; I just get a generic error. I tried with the array both started and stopped. I am using Unraid version 6.12.6. Attached are the diagnostics and the SMART report for the broken SSD.
spiretower-diagnostics-20241110-2208.zip spiretower-smart-20241110-1844.zip
November 10, 2024 Community Expert
Apart from your failing cache drive, your Unraid USB stick also looks like it is failing. "/boot" is the USB stick itself. Since it is read-only, it is most likely on its way out. The only thing you can do is make a backup, get another USB stick, and boot from that one.
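One way to confirm the read-only state described above is to look at the mount options for /boot. The following is a minimal sketch, not an official Unraid procedure. The function name `is_readonly_mount` and its optional second argument (an alternate mounts table, mainly useful for testing) are assumptions made for illustration.

```bash
#!/bin/bash
# Return 0 if the given mount point is mounted read-only, 1 otherwise.
# $1: mount point to check (for example /boot)
# $2: mounts table to read; defaults to /proc/mounts (hypothetical parameter)
is_readonly_mount() {
    local mountpoint="$1"
    local table="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$table"
}

# Example:
#   if is_readonly_mount /boot; then echo "/boot is read-only"; fi
```

If the check reports read-only, any write to /boot/config (such as the disk.cfg update in the syslog line) will fail until the flash drive is replaced.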
December 26, 2024 Author
Finally got around to working on this again, with a new USB drive and an attempt to restore from an Unraid Connect backup. On first boot, the server refuses any connection to the webGUI. I started it in GUI mode so I can see the OS on screen, but the web browser does not connect to localhost, and I can't connect from my other computer(s) using either hostname or IP address. The server's IP address is correctly set, and I can ping other devices on my network from the server. SSH refuses to let me use my old password from the backup, but the password works when signing in using GUI mode.
Edited December 26, 2024 by Spirevipp
December 26, 2024 Author
Attached diagnostics. It also hasn't set the name I selected in the USB Creator tool when restoring from backup. There have been some file-not-found errors as well: /var/local/emhttp/var.ini is not found when using the shutdown command in the terminal.
tower-diagnostics-20241226-0410.zip
EDIT: just realized this is from safe mode, will create one from normal boot as well.
Edited December 26, 2024 by Spirevipp
December 26, 2024 Author
Looking at the USB drive, it does not seem like the USB Creator tool has properly set up the drive? It's missing all the plugins from the backup, and none of the .cfg files have been copied over?
December 26, 2024 Community Expert
The flash drive is missing files, including the go file. Create a stock install, then restore the config folder with the files you have; you will likely still need to reconfigure some stuff.
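To see which files a restored flash drive is lacking before and after copying the config folder back, something like the following could help. It is only a sketch. The helper name `list_missing_flash_files` is hypothetical, and the file list in the example (config/go and config/disk.cfg, both mentioned in this thread) is illustrative rather than a complete list of what Unraid needs.

```bash
#!/bin/bash
# Print every expected file (path relative to the flash root) that is missing.
# Returns 0 when nothing is missing, 1 otherwise.
# $1: flash root (for example /boot); remaining arguments: relative paths
list_missing_flash_files() {
    local flash_root="$1" rel missing=0
    shift
    for rel in "$@"; do
        if [ ! -e "$flash_root/$rel" ]; then
            echo "$rel"
            missing=1
        fi
    done
    return "$missing"
}

# Example (illustrative file list, not exhaustive):
#   list_missing_flash_files /boot config/go config/disk.cfg
```

Running it once against the backup and once against the new flash drive gives a quick comparison of what still has to be copied over.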
December 26, 2024 Author
Okay, now it works; the server is up and running on the new flash drive. Now on to fixing the dead cache SSD. Following checking-a-file-system from the docs, I have run a btrfs check of the SSD with the unmountable file system. I do not know what this means, so here is the result:
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: c6c53b8d-4791-4359-b767-671dd4d4a71c
found 362191216640 bytes used, no error found
total csum bytes: 344665484
total tree bytes: 1369554944
total fs tree bytes: 641613824
total extent tree bytes: 268550144
btree space waste bytes: 303860724
file data blocks allocated: 2428063203328
 referenced 345975992320
Attached is the SMART report from the drive. Do you think this might be recoverable, at least temporarily? Not all appdata folders are backed up, so it would be nice to get the data off the drive before it's trashed.
spiretower-smart-20241226-1358.zip
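The line that matters most in the output above is "found 362191216640 bytes used, no error found": the seven check phases completed without reporting a problem in the filesystem structures. As a rough sketch, the helper below (the name `summarize_btrfs_check` is made up for illustration) pulls that line out of saved `btrfs check` output and converts the byte count to GiB. It assumes the summary line has the "found N bytes used, ..." shape shown in this thread.

```bash
#!/bin/bash
# Summarize saved "btrfs check" output read from a file argument or stdin.
# Prints "status=clean|errors used_gib=N.NN", or "status=unknown" when the
# summary line is absent.
summarize_btrfs_check() {
    awk '
        /found [0-9]+ bytes used/ {
            seen = 1
            # The byte count sits between the words "found" and "bytes".
            for (i = 1; i < NF; i++)
                if ($i == "found" && $(i + 2) == "bytes") bytes = $(i + 1)
            if ($0 ~ /no error found/) clean = 1
        }
        END {
            if (!seen) { print "status=unknown"; exit }
            printf "status=%s used_gib=%.2f\n", (clean ? "clean" : "errors"), bytes / 1073741824
        }
    ' "$@"
}

# Example:
#   btrfs check --readonly /dev/sdd1 2>&1 | summarize_btrfs_check
```

A clean result here only covers filesystem metadata; it says nothing about whether the device itself can still be read and written reliably.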
December 26, 2024 Author
Diagnostics from after starting the array (without maintenance mode enabled).
spiretower-diagnostics-20241226-1431.zip
December 26, 2024 Community Expert
The pool appears to have a damaged log, but before attempting to fix that: there are constant ATA errors for the pool device. Replace the cables and post new diags after starting the array again.
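To check whether the ATA errors go away after a cable or port change, it can help to count them per port in the syslog. This is a sketch under assumptions: the function name `count_ata_errors` is invented here, the default log path /var/log/syslog is assumed, and the three message patterns it matches ("exception", "failed command", "hard resetting link") are common Linux kernel libata messages, not a list taken from the diagnostics in this thread.

```bash
#!/bin/bash
# Count ATA error-related kernel messages per port (ata1, ata2, ...).
# $1: log file to scan; defaults to /var/log/syslog (assumed location)
# Output: one "ataN count" line per port, sorted by port name.
count_ata_errors() {
    local logfile="${1:-/var/log/syslog}"
    awk '
        match($0, /kernel: ata[0-9]+/) &&
        $0 ~ /exception|failed command|hard resetting link/ {
            # Skip the 8 characters of "kernel: " to keep only "ataN".
            port = substr($0, RSTART + 8, RLENGTH - 8)
            count[port]++
        }
        END { for (p in count) print p, count[p] }
    ' "$logfile" | sort
}

# Example:
#   count_ata_errors /var/log/syslog
```

If the count for the same port keeps growing after the cable swap, the port, the power cable, or the drive itself become the more likely suspects.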
December 26, 2024 Author
Diagnostics after replacing the SATA cable.
spiretower-diagnostics-20241226-1647.zip
December 26, 2024 Community Expert
Still plenty of ATA errors. Also check the power cable, or swap both cables with a different device, to also rule out the port.
December 26, 2024 Author
Here are diagnostics from after I connected a different drive to the SATA port of the defective drive, formatted it, and mounted it using Unassigned Devices. I also connected the defective drive to another SATA port (on a different SATA controller); it is still throwing errors.
spiretower-diagnostics-20241226-1926.zip
December 26, 2024 Community Expert Solution
That SSD appears to be failing. If there's important data there, you can try cloning it to a different device using ddrescue and then attempt to repair the filesystem, if still needed.
December 26, 2024 Author
Okay, so the ddrescue operation completed after a single pass, and the resulting cloned disk is mountable, and everything seems to be there? Here is the output from the process:
root@SpireTower:~# ddrescue -f /dev/sdh /dev/sdc /boot/ddrescue.log
GNU ddrescue 1.27
Press Ctrl-C to interrupt
ipos: 512110 MB, non-trimmed: 0 B, current rate: 50028 kB/s
opos: 512110 MB, non-scraped: 0 B, average rate: 127 MB/s
non-tried: 0 B, bad-sector: 0 B, error rate: 0 B/s
rescued: 512110 MB, bad areas: 0, run time: 1h 7m 9s
pct rescued: 100.00%, read errors: 0, remaining time: n/a
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
Finished
root@SpireTower:~# cat /boot/ddrescue.log
# Mapfile. Created by GNU ddrescue version 1.27
# Command line: ddrescue -f /dev/sdh /dev/sdc /boot/ddrescue.log
# Start time: 2024-12-26 20:29:10
# Current time: 2024-12-26 21:36:37
# Finished
# current_pos current_status current_pass
0x773C250000 + 1
# pos size status
0x00000000 0x773C256000 +
I also did the next step for btrfs, for finding missing/corrupt files, and it did not do any replacements:
root@SpireTower:~# printf "Unraid " >~/fill.txt
root@SpireTower:~# ddrescue -f --fill=- ~/fill.txt /dev/sdc /boot/ddrescue.log
GNU ddrescue 1.27
Press Ctrl-C to interrupt
Initial status (read from mapfile)
filled size: 0 B, filled areas: 0
remaining size: 0 B, remaining areas: 0
Current status
filled size: 0 B, filled areas: 0, current rate: 0 B/s
remain size: 0 B, remain areas: 0, average rate: 0 B/s
current pos: 0 B, run time: 0s
Finished
root@SpireTower:~# cat /boot/ddrescue.log
# Mapfile. Created by GNU ddrescue version 1.27
# Command line: ddrescue -f --fill=- /root/fill.txt /dev/sdc /boot/ddrescue.log
# Start time: 2024-12-26 21:45:00
# Current time: 2024-12-26 21:45:00
# Finished
# current_pos current_status current_pass
0x00000000 + 1
# pos size status
0x00000000 0x773C256000 +
Am I interpreting this correctly, that it was able to clone the disk successfully, with no missing data? Are there any other tasks I should do before restoring my array and setting up a new cache?
spiretower-diagnostics-20241226-2213.zip
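The mapfile can be read mechanically: each data line is a start position, a size, and a status character, where "+" marks a finished (rescued) block and "-" a bad-sector block. The sketch below totals the bytes per status. The function name `summarize_mapfile` is an assumption for illustration, and it lumps the three "not yet done" statuses (?, *, /) together as pending.

```bash
#!/bin/bash
# Total the block sizes in a GNU ddrescue mapfile by status.
# $1: path to the mapfile
# Prints "rescued=N bad=N pending=N" with N in bytes.
summarize_mapfile() {
    local mapfile="$1" pos size status
    local rescued=0 bad=0 pending=0 first=1
    while read -r pos size status; do
        # Skip blank lines and comment lines.
        case "$pos" in ''|'#'*) continue ;; esac
        # The first non-comment line is the current position/status/pass line.
        if [ "$first" -eq 1 ]; then first=0; continue; fi
        case "$status" in
            '+') rescued=$(( rescued + size )) ;;   # finished block
            '-') bad=$(( bad + size )) ;;           # bad-sector block
            *)   pending=$(( pending + size )) ;;   # non-tried, non-trimmed, non-scraped
        esac
    done < "$mapfile"
    echo "rescued=$rescued bad=$bad pending=$pending"
}

# Example:
#   summarize_mapfile /boot/ddrescue.log
```

For the mapfile posted above, the single data line covers 0x773C256000 bytes (512110190592, matching the 512110 MB in the ddrescue status display) with status "+", so nothing is left as bad or pending.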
December 27, 2024 Community Expert
Since there weren't any read errors, all the data should be OK. The problem with the old SSD is likely that it can't write, only read.
December 27, 2024 Author
One last question: is there a "best" way to transfer the files from the SSD to the array? I thought of just doing a normal copy from a tmux session, but the array would need to be started then? The SSD I used for the clone is the one I'm going to use for the new cache, so I can't set it up until after I have transferred the files.
Edited December 27, 2024 by Spirevipp
December 27, 2024 Community Expert
You can just copy the data; another option is to use the mover: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/#findComment-511923
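For the plain copy option, a minimal sketch might look like the following. The function name `copy_and_verify` and both example paths are hypothetical; the example assumes the cloned SSD is mounted with Unassigned Devices and that the destination is a share on the started array.

```bash
#!/bin/bash
# Copy the contents of one directory tree into another, then compare them.
# $1: source directory, $2: destination directory (created if missing)
# Returns 0 only if the copy succeeded and both trees match.
copy_and_verify() {
    local src="$1" dst="$2"
    mkdir -p "$dst" || return 1
    # "src/." copies the contents of src, including hidden files, into dst;
    # -a preserves ownership, permissions, and timestamps where possible.
    cp -a "$src"/. "$dst"/ || return 1
    # diff -r exits non-zero if any file differs or exists on one side only,
    # so an empty destination gives the most meaningful result.
    diff -r "$src" "$dst" > /dev/null
}

# Example (hypothetical paths):
#   copy_and_verify /mnt/disks/clone/appdata /mnt/user/backup/appdata
```

Running it inside tmux, as suggested in the question, keeps the copy going if the SSH session drops.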