plantsandbinary

Members
  • Posts: 343
Everything posted by plantsandbinary

  1. Hi, I'm making a new thread (hope this is okay) after suffering metadata loss in my other thread here:

What I've been able to do so far: figure out what degree of data was orphaned per drive.

//TOWER/lost+found/
Disk 1 - WDC_WD40EFRX-68N32N0_WD-WCC7K5CCCED9 - 4 TB (sde) - 1772 objects: 1014 directories, 758 files (64.8 GB total)
Disk 2 - WDC_WD40EFRX-68N32N0_WD-WCC7K5CCCED9 - 4 TB (sde) - 6419 objects: 550 directories, 5869 files (17.4 GB total)
Disk 3 - TOSHIBA_DT01ACA300_14SRJ24GS - 3 TB (sdc) - 4163 objects: 3170 directories, 993 files (84.6 GB total)

root@Tower:/mnt/user/lost+found# files *
cryptobin.co/r1k6s5y5 password: unraid

The majority of damaged data seems to be JPG, PNG, HTML and FLAC files, as shown in the cryptobin link above. I found this post: https://techcult.com/how-to-restore-files-from-lostfound/ which contains a script at the bottom, which I'll put here (I had to edit it slightly to fix bad syntax):

#!/bin/bash
fsck -y /dev/sdc1
mkdir /tmp/recover
mount /dev/sdc1 /tmp/recover -o rw
cd /tmp/recover/lost+found
(
  echo 'set -v'
  file * | grep directory 2>/dev/null | perl -pe 's/^(\#[0-9]+)\:.*$/ls -l '"'"'$1'"'"'/'
) | sh > /tmp/listing

I was wondering, does this have some chance of working for me? What exactly does it do? It looks like it appends file extensions to the files using their headers via the 'file' command, or am I wrong about this?

EDIT: Couldn't get the script to work. I don't think it's written properly.
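As far as I can tell, the quoted script never renames anything; it only generates `ls -l` listings for the entries that `file` identifies as directories. Here's a minimal sketch of what I was actually after: sniff each file's magic bytes and append a matching extension. Only a few signatures are handled and the script is my own guess, not from the article, so treat it as a starting point and test on copies first:

```shell
#!/bin/bash
# Sketch: guess a file's type from its leading magic bytes and append a
# matching extension. Unknown types are left untouched.
guess_ext() {
  # First 8 bytes of the file as a hex string, e.g. "ffd8ffe0..."
  local magic
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    ffd8ff*)            echo jpg  ;;  # JPEG SOI marker
    89504e470d0a1a0a*)  echo png  ;;  # PNG signature
    664c6143*)          echo flac ;;  # "fLaC"
    3c21444f*|3c68746d*) echo html ;; # "<!DO..." / "<htm..."
    *)                  echo ""   ;;  # unknown: do nothing
  esac
}

for f in "$@"; do
  [ -f "$f" ] || continue
  ext=$(guess_ext "$f")
  [ -n "$ext" ] && mv -n "$f" "$f.$ext"   # -n: never clobber
done
```

Run as e.g. `./guess-ext.sh /mnt/user/lost+found/*` once it's been verified on a copy of a few files.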
  2. Ok, thank you so much for the help. Big lessons learned today. I'll give this and a few more things a try and see how it goes. Failing that, maybe I'll send the disks to a data recovery company; it's my entire childhood, so money isn't really a factor here. All the data seems to be there, just without its original folder tree structure, metadata and file extensions. And if even that fails, I'll store the disks away, and who knows, maybe 10-30 years from now technology will advance far enough that AI or something will be able to dig through all this and repair it for me. 😂 I've bought 5 new drives to start over, so one way or another I'm done using these ones. I won't risk the data getting more damaged than it already has been.
  3. No, this was it. I've been using UNRAID now for years. My understanding was that I would need two drive failures (and one would have to be parity) to lose anything, so short of a lightning strike or super bad luck with two simultaneous drive failures, I'd be able to recover the data. In this case I lost all the metadata, and now I just have 120k files and 62k folders sitting in lost+found which I have no idea what to do with.

I'm grateful for the help thus far, but everything shows that my hardware is fine, and I only ever experienced problems after updating. It's my own fault for not having another backup, but I was advised to update to solve a security issue I found during login, and this was the result. So naturally I am feeling super bitter about this. I carried a lot of this stuff through CDs from 2003 onward. This is 18 years' worth of stuff smashed, and the damage extends through everything I have.

How could kernel issues cause so many inode issues? Is there any way I can restore the metadata? Surely someone else has been in a similar situation, even to a lesser extent. Is there some utility I can use on these xfs drives? Also, is there a more redundant way I can set this up in the future (using UNRAID of course), beyond just keeping another backup of all of my data (which I will obviously do)? Maybe use zfs or btrfs or something instead?
  4. Don't the diagnostics show anything? No one has commented on any of them.
  5. How could bad RAM destroy the metadata for everything across 4 disks? I'm just trying to make sense of this. I have certified HP ECC memory too. I realise things break at times but this was supposed to be 'the backup'. @itimpi how do I do that? Sorry my head is swimming.
  6. Looks like my docker didn't backup properly either for some reason. So I've lost everything there too... Some of this data loss dates back to 2003 when I was just a kid, and I carried this stuff across like 15 computers and probably more than 30 USB drives.
  7. Ok so... I've run xfs_repair -vL /dev/md1, 2, 3, etc. and rebooted the machine. I'm up and running again and the disks are mounted. Got myself a lost+found share with, wait for it... no extensions or other metadata. So it seems I've lost roughly 80% of all of my data; I guess it's been moved into the lost+found share. Almost all of the stuff I lost is irreplaceable personal files: photos, videos, writings, etc. I've also lost scans of old important documents. This is the worst data loss I've ever suffered.

I don't understand how this happened. All from updating the OS? I had no Docker issues, no problem with my cache drive, did not change any docker containers, and no VMs running. How did this happen? Isn't the whole point of the parity drive that I am supposed to be able to recover my data, unless both the parity AND another drive fail at the same time? Please someone, help me understand...

Is it possible to rebuild everything from lost+found? I just have a bunch of files named 3436, 6457472, 75683784, etc. which I have no idea what they are, but some are huge and some are tiny. @JorgeB
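For anyone searching later: xfs_repair names every orphaned file after its inode number, which is why everything in lost+found is a bare number. One way to start triaging is to list the recovered files largest-first; a small sketch (the default path is my share, override `LF` as needed):

```shell
#!/bin/bash
# xfs_repair drops orphans into lost+found named by inode number.
# List them largest-first so the big, likely-important files surface.
LF="${LF:-/mnt/user/lost+found}"

list_largest() {
  # %s = size in bytes, %p = path; sort numerically, biggest first
  find "$1" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -50
}

list_largest "$LF"
```

From there the big ones can be probed individually with `file` to see what they are.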
  8. Ran a check first. Jeez, what happened?

Last check completed on Thursday, 14 April 2022, 22:26 (today)
Duration: 8 hours, 12 minutes, 13 seconds
Average speed: 135.5 MB/s
Finding 46789 errors

If I've ever had a dirty shutdown or something I've only seen like 4-6 errors max, but nearly 47,000??? Here's the full diagnostics. I cannot get these to mount no matter what, and nothing should have touched them. I just swapped out the cache drive, that's it. tower-diagnostics-20220414-2320.zip
  9. Downgraded and also replaced the drive with a spare. Still getting this error "Unmountable" I haven't changed the filesystem or anything on the single parity and 3 array drives. What am I supposed to do now? @JorgeB
  10. Sorry for all the posts, frantically trying to fix things. Seems I need to remove the intel_iommu=on from the Linux kernel boot parameters. Due to this issue: https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c04565693
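For anyone else hitting the same HPE advisory: on Unraid the kernel boot parameters live on the flash drive in /boot/syslinux/syslinux.cfg (also editable from Main > Flash in the webGUI). Assuming a stock config with the flag added, the stanza looks roughly like this:

```
label Unraid OS
  menu default
  kernel /bzimage
  append intel_iommu=on initrd=/bzroot
```

Deleting `intel_iommu=on` so the last line reads just `append initrd=/bzroot`, then rebooting, is all the change amounts to.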
  11. So how can I fix the filesystem problem? I just zeroed the entire cache drive and reformatted it as xfs, and now I get this:

Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...

It never found the secondary superblock. I've no idea how I could have a filesystem problem, unless this happened after upgrading to 6.10.0-rc4, because that's the only thing I did recently. My system is an HP ProLiant MicroServer Gen8, running the last BIOS they released for it.

My syslog is getting polluted with these errors too. Never seen them before:

Apr 14 11:01:34 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xaf83a already set (to af83a003 not 101a6b803)
Apr 14 11:01:34 Tower kernel: ------------[ cut here ]------------
Apr 14 11:01:34 Tower kernel: WARNING: CPU: 6 PID: 3903 at drivers/iommu/intel/iommu.c:2387 __domain_mapping+0x2e5/0x390
Apr 14 11:01:34 Tower kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net tun vhost vhost_iotlb tap xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding tg3 ipmi_ssif x86_pkg_temp_thermal intel_powerclamp i2c_core coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore acpi_ipmi ahci libahci ipmi_si thermal acpi_power_meter button [last unloaded: tg3]
Apr 14 11:01:34 Tower kernel: CPU: 6 PID: 3903 Comm: unraidd1 Tainted: G W I 5.15.30-Unraid #1
Apr 14 11:01:34 Tower kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
Apr 14 11:01:34 Tower kernel: RIP: 0010:__domain_mapping+0x2e5/0x390
Apr 14 11:01:34 Tower kernel: Code: 2b 48 8b 4c 24 08 48 89 c2 4c 89 e6 48 c7 c7 8f c7 ef 81 e8 48 96 2c 00 8b 05 05 38 c3 00 85 c0 74 08 ff c8 89 05 f9 37 c3 00 <0f> 0b 8b 74 24 38 b8 34 00 00 00 8d 0c f6 83 e9 09 39 c1 0f 4f c8
Apr 14 11:01:34 Tower kernel: RSP: 0018:ffffc9000377f788 EFLAGS: 00010046
Apr 14 11:01:34 Tower kernel: RAX: 0000000000000000 RBX: ffff8881076371d0 RCX: 0000000000000027
Apr 14 11:01:34 Tower kernel: RDX: 0000000000000000 RSI: ffffc9000377f618 RDI: ffff888436f9c550
Apr 14 11:01:34 Tower kernel: RBP: ffff888107420000 R08: ffff888447f65a58 R09: 0000000000000000
Apr 14 11:01:34 Tower kernel: R10: 2933303862366131 R11: 366131303120746f R12: 00000000000af83a
Apr 14 11:01:34 Tower kernel: R13: ffff8881076371d0 R14: 0000000000001000 R15: 00000000af83a000
Apr 14 11:01:34 Tower kernel: FS: 0000000000000000(0000) GS:ffff888436f80000(0000) knlGS:0000000000000000
Apr 14 11:01:34 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 14 11:01:34 Tower kernel: CR2: 00001513000281c0 CR3: 0000000108726006 CR4: 00000000000606e0
Apr 14 11:01:34 Tower kernel: Call Trace:
Apr 14 11:01:34 Tower kernel: <TASK>
Apr 14 11:01:34 Tower kernel: ? mempool_alloc+0x68/0x14f
Apr 14 11:01:34 Tower kernel: intel_iommu_map_pages+0xf3/0x102
Apr 14 11:01:34 Tower kernel: __iommu_map+0x138/0x211
Apr 14 11:01:34 Tower kernel: __iommu_map_sg+0x8c/0x110
Apr 14 11:01:34 Tower kernel: iommu_dma_map_sg+0x245/0x3e3
Apr 14 11:01:34 Tower kernel: __dma_map_sg_attrs+0x63/0x95
Apr 14 11:01:34 Tower kernel: dma_map_sg_attrs+0xa/0x12
Apr 14 11:01:34 Tower kernel: ata_qc_issue+0xec/0x1ab
Apr 14 11:01:34 Tower kernel: __ata_scsi_queuecmd+0x1f2/0x1fd
Apr 14 11:01:34 Tower kernel: ata_scsi_queuecmd+0x41/0x7a
Apr 14 11:01:34 Tower kernel: scsi_queue_rq+0x57d/0x6db
Apr 14 11:01:34 Tower kernel: blk_mq_dispatch_rq_list+0x2a7/0x4da
Apr 14 11:01:34 Tower kernel: __blk_mq_do_dispatch_sched+0x23d/0x281
Apr 14 11:01:34 Tower kernel: ? update_cfs_rq_load_avg+0x138/0x146
Apr 14 11:01:34 Tower kernel: __blk_mq_sched_dispatch_requests+0xd5/0x129
Apr 14 11:01:34 Tower kernel: blk_mq_sched_dispatch_requests+0x2f/0x52
Apr 14 11:01:34 Tower kernel: __blk_mq_run_hw_queue+0x50/0x76
Apr 14 11:01:34 Tower kernel: __blk_mq_delay_run_hw_queue+0x4d/0x108
Apr 14 11:01:34 Tower kernel: blk_mq_sched_insert_requests+0xa2/0xd9
Apr 14 11:01:34 Tower kernel: blk_mq_flush_plug_list+0xfb/0x12c
Apr 14 11:01:34 Tower kernel: blk_mq_submit_bio+0x2e6/0x406
Apr 14 11:01:34 Tower kernel: submit_bio_noacct+0x9d/0x203
Apr 14 11:01:34 Tower kernel: ? raid5_generate_d+0xce/0x105 [md_mod]
Apr 14 11:01:34 Tower kernel: unraidd+0x11a5/0x1237 [md_mod]
Apr 14 11:01:34 Tower kernel: ? md_thread+0x103/0x12a [md_mod]
Apr 14 11:01:34 Tower kernel: ? rmw5_write_data+0x17d/0x17d [md_mod]
Apr 14 11:01:34 Tower kernel: md_thread+0x103/0x12a [md_mod]
Apr 14 11:01:34 Tower kernel: ? init_wait_entry+0x29/0x29
Apr 14 11:01:34 Tower kernel: ? md_seq_show+0x6c8/0x6c8 [md_mod]
Apr 14 11:01:34 Tower kernel: kthread+0xde/0xe3
Apr 14 11:01:34 Tower kernel: ? set_kthread_struct+0x32/0x32
Apr 14 11:01:34 Tower kernel: ret_from_fork+0x22/0x30
Apr 14 11:01:34 Tower kernel: </TASK>
Apr 14 11:01:34 Tower kernel: ---[ end trace 7069ea449eb4fd61 ]---
Apr 14 11:01:34 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xaf83b already set (to af83b003 not 14c18f803)
[... the same WARNING and call trace repeat, with only the vPFN and register values differing, ending with ...]
Apr 14 11:01:34 Tower kernel: ---[ end trace 7069ea449eb4fd62 ]---
Apr 14 11:01:34 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xaf83c already set (to af83c003 not 103e3c803)
  12. That many LBAs works out to about 62 TB (56.4 TiB) of data written, and the drive is 3.5 years old too, so I'd say it's finally kicked the bucket. Samsung rates the 850 EVO at 5 years or 75 TB TBW, and it's more than two-thirds of the way through its rated writes.
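The arithmetic, for anyone checking: SMART attribute 241 (Total_LBAs_Written) counts 512-byte sectors, so:

```shell
#!/bin/bash
# Convert the raw value of SMART attribute 241 into bytes / TB / TiB.
lbas=121124616664                      # raw Total_LBAs_Written from smartctl
bytes=$((lbas * 512))                  # 512-byte logical sectors
echo "$bytes bytes written"            # 62015803731968
echo "~$((bytes / 1000000000000)) TB"  # decimal terabytes: ~62
echo "~$((bytes / 1099511627776)) TiB" # binary terabytes:  ~56
```

So against Samsung's 75 TB TBW rating for the 250 GB 850 EVO, the drive is well past two-thirds used.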
  13. Safe to say it's dead, Jim? Here's the SMART data:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.30-Unraid] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO 250GB
Serial Number: S3R0NF0J888231T
LU WWN Device Id: 5 002538 d422b6b74
Firmware Version: EMT03B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Apr 14 11:30:52 2022 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x53) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 133) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID#  ATTRIBUTE_NAME           FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  5  Reallocated_Sector_Ct    PO--CK  100   100   010    -    0
  9  Power_On_Hours           -O--CK  093   093   000    -    30832
 12  Power_Cycle_Count        -O--CK  099   099   000    -    72
177  Wear_Leveling_Count      PO--C-  082   082   000    -    374
179  Used_Rsvd_Blk_Cnt_Tot    PO--C-  100   100   010    -    0
181  Program_Fail_Cnt_Total   -O--CK  100   100   010    -    0
182  Erase_Fail_Count_Total   -O--CK  100   100   010    -    0
183  Runtime_Bad_Block        PO--C-  100   100   010    -    0
187  Uncorrectable_Error_Cnt  -O--CK  100   100   000    -    0
190  Airflow_Temperature_Cel  -O--CK  070   051   000    -    30
195  ECC_Error_Rate           -O-RC-  200   200   000    -    0
199  CRC_Error_Count          -OSRCK  100   100   000    -    0
235  POR_Recovery_Count       -O--C-  099   099   000    -    7
241  Total_LBAs_Written       -O--CK  099   099   000    -    121124616664
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address   Access  R/W  Size  Description
0x00      GPL,SL  R/O  1     Log Directory
0x01      SL      R/O  1     Summary SMART error log
0x02      SL      R/O  1     Comprehensive SMART error log
0x03      GPL     R/O  1     Ext. Comprehensive SMART error log
0x06      SL      R/O  1     SMART self-test log
0x07      GPL     R/O  1     Extended self-test log
0x09      SL      R/W  1     Selective self-test log
0x10      GPL     R/O  1     NCQ Command Error log
0x11      GPL     R/O  1     SATA Phy Event Counters log
0x13      GPL     R/O  1     SATA NCQ Send and Receive log
0x30      GPL,SL  R/O  9     IDENTIFY DEVICE data log
0x80-0x9f GPL,SL  R/W  16    Host vendor specific log
0xa1      SL      VS   16    Device vendor specific log
0xa5      SL      VS   16    Device vendor specific log
0xce      SL      VS   16    Device vendor specific log
0xe0      GPL,SL  R/W  1     SCT Command/Status
0xe1      GPL,SL  R/W  1     SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
 255        0    65535  Read_scanning was never started
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
Device State: Active (0)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 27/41 Celsius
Lifetime Min/Max Temperature: 19/48 Celsius
Under/Over Temperature Limit Count: 0/0

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 10 minutes
Min/Max recommended Temperature: 0/70 Celsius
Min/Max Temperature Limit: 0/70 Celsius
Temperature History Size (Index): 128 (106)
Index  Estimated Time    Temperature Celsius
 107   2022-04-13 14:20  31  ************
 ...   ..( 7 skipped)    ..  ************
 115   2022-04-13 15:40  31  ************
 116   2022-04-13 15:50  30  ***********
 ...   ..( 8 skipped)    ..  ***********
 125   2022-04-13 17:20  30  ***********
 126   2022-04-13 17:30  29  **********
 127   2022-04-13 17:40  30  ***********
   0   2022-04-13 17:50  29  **********
 ...   ..( 20 skipped)   ..  **********
  21   2022-04-13 21:20  29  **********
  22   2022-04-13 21:30  28  *********
 ...   ..( 3 skipped)    ..  *********
  26   2022-04-13 22:10  28  *********
  27   2022-04-13 22:20  29  **********
  28   2022-04-13 22:30  29  **********
  29   2022-04-13 22:40  29  **********
  30   2022-04-13 22:50  30  ***********
 ...   ..( 2 skipped)    ..  ***********
  33   2022-04-13 23:20  30  ***********
  34   2022-04-13 23:30  31  ************
 ...   ..( 65 skipped)   ..  ************
 100   2022-04-14 10:30  31  ************
 101   2022-04-14 10:40  30  ***********
 102   2022-04-14 10:50  29  **********
 103   2022-04-14 11:00  29  **********
 104   2022-04-14 11:10  34  ***************
 105   2022-04-14 11:20  30  ***********
 106   2022-04-14 11:30  30  ***********

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size  Value  Description
0x0001  2     0      Command failed due to ICRC error
0x0002  2     0      R_ERR response for data FIS
0x0003  2     0      R_ERR response for device-to-host data FIS
0x0004  2     0      R_ERR response for host-to-device data FIS
0x0005  2     0      R_ERR response for non-data FIS
0x0006  2     0      R_ERR response for device-to-host non-data FIS
0x0007  2     0      R_ERR response for host-to-device non-data FIS
0x0008  2     0      Device-to-host non-data FIS retries
0x0009  2     20     Transition from drive PhyRdy to drive PhyNRdy
0x000a  2     20     Device-to-host register FISes sent due to a COMRESET
0x000b  2     0      CRC errors within host-to-device FIS
0x000d  2     0      Non-CRC errors within host-to-device FIS
0x000f  2     0      R_ERR response for host-to-device data FIS, CRC
0x0010  2     0      R_ERR response for host-to-device data FIS, non-CRC
0x0012  2     0      R_ERR response for host-to-device non-data FIS, CRC
0x0013  2     0      R_ERR response for host-to-device non-data FIS, non-CRC
  14. It looks like the cache drive is just dead? I backed up the data with CA Backup and Restore, formatted the drive, booted in maintenance mode, and this is what I get:

Opening filesystem to check...
Checking filesystem on /dev/sdf1
UUID: 7026cabb-40c4-4fd4-9d10-135feb1312ea
[1/7] checking root items
[2/7] checking extents
checksum verify failed on 8355840 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 8355840 wanted 0x00000000 found 0xb6bde3e4
bad tree block 8355840, bytenr mismatch, want=8355840, have=0
checksum verify failed on 5783552 wanted 0xdaed5a47 found 0x77236f47
checksum verify failed on 5783552 wanted 0xdaed5a47 found 0x77236f47
Csum didn't match
checksum verify failed on 7684096 wanted 0x13f0f7b0 found 0x578f2147
checksum verify failed on 7684096 wanted 0x13f0f7b0 found 0x578f2147
bad tree block 7684096, bytenr mismatch, want=7684096, have=2969063443
checksum verify failed on 7700480 wanted 0x13f077b1 found 0xcdece909
checksum verify failed on 7700480 wanted 0x13f077b1 found 0xcdece909
bad tree block 7700480, bytenr mismatch, want=7700480, have=2977452051
checksum verify failed on 7716864 wanted 0x00000000 found 0x22b4c582
checksum verify failed on 7716864 wanted 0x00000000 found 0x22b4c582
bad tree block 7716864, bytenr mismatch, want=7716864, have=0
checksum verify failed on 7733248 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 7733248 wanted 0x00000000 found 0xb6bde3e4
bad tree block 7733248, bytenr mismatch, want=7733248, have=0
--------------------------- there is a billion lines of similar info here, skip to the bottom ----------------------
root 5 inode 75857766 errors 2001, no inode item, link count wrong
unresolved ref dir 260 index 89 namelen 8 name jellyfin filetype 2 errors 4, no inode ref
root 5 inode 75857768 errors 2001, no inode item, link count wrong
unresolved ref dir 260 index 90 namelen 8 name Jellyfin filetype 2 errors 4, no inode ref
ERROR: errors found in fs roots
The following tree block(s) is corrupted in tree 5:
tree block bytenr: 6094848, level: 1, node key: (5494035, 108, 696893440)
found 30542331904 bytes used, error(s) found
total csum bytes: 2828108
total tree bytes: 109182976
total fs tree bytes: 80052224
total extent tree bytes: 23707648
btree space waste bytes: 21342871
file data blocks allocated: 389063712768
referenced 17720246272

I've formatted it as xfs, back to btrfs, etc., and the check gives these errors every time.

Also, for some reason during this my /var/log is giving the following error:

"Either your server has an extremely long uptime, or your syslog could be potentially being spammed with error messages. A reboot of your server will at least temporarily solve this problem, but ideally you should seek assistance in the forums and post your" More Information

My /var/log partition is 568MB in size to account for the statistics plugin etc. I have no idea why it instantly went from 15% to 100% after this. I guess it wrote the above billion lines of SSD errors to file?
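If anyone else fills /var/log like this: find the biggest offender and empty it in place. Truncating with `: >` clears the file without invalidating the open file descriptor syslogd holds, so no reboot is needed. A small sketch (defaults to /var/log; `LOGDIR` is overridable):

```shell
#!/bin/bash
# Show the largest entries under the log directory, biggest first.
LOGDIR="${LOGDIR:-/var/log}"
du -a "$LOGDIR" 2>/dev/null | sort -rn | head -10

# Empty a log file in place; processes with the file open keep working.
truncate_log() {
  : > "$1"
}
```

E.g. `truncate_log /var/log/syslog` once the offender has been identified (and copied off if it's still needed for diagnostics).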
  15. How do I fix this? It seems Unraid doesn't have rights to write to my cache drive, and the docker.img can't be created.

Apr 14 10:54:12 Tower emhttpd: shcmd (310): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker.img' /var/lib/docker 40
Apr 14 10:54:12 Tower root: Creating new image file: '/mnt/user/system/docker/docker.img' size: 40G
Apr 14 10:54:12 Tower root: touch: cannot touch '/mnt/user/system/docker/docker.img': Read-only file system
Apr 14 10:54:12 Tower root: failed to create image file
Apr 14 10:54:12 Tower emhttpd: shcmd (310): exit status: 1
Apr 14 10:54:13 Tower avahi-daemon[31433]: Server startup complete. Host name is Tower.local. Local service cookie is 633978188.
Apr 14 10:54:14 Tower avahi-daemon[31433]: Service "Tower" (/services/ssh.service) successfully established.
Apr 14 10:54:14 Tower avahi-daemon[31433]: Service "Tower" (/services/smb.service) successfully established.
Apr 14 10:54:14 Tower avahi-daemon[31433]: Service "Tower" (/services/sftp-ssh.service) successfully established.
  16. For some reason, some of my shares are also being reported as only partially protected, e.g. the ones on the cache, despite the fact that I have appdata set to "cache only".
  17. So I updated to 6.10.0-rc4 to solve an issue I was having, and Docker seems to have spontaneously died. Logs indicated something wrong with my cache drive. I get this error even after multiple reboots:

Apr 13 11:01:36 Tower kernel: ata5: SATA max UDMA/133 abar m2048@0xfacd0000 port 0xfacd0300 irq 36
Apr 13 11:01:36 Tower kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 13 11:01:36 Tower kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 13 11:01:36 Tower kernel: ata5.00: ATA-9: Samsung SSD 850 EVO 250GB, EMT03B6Q, max UDMA/133
Apr 13 11:01:36 Tower kernel: ata5.00: disabling queued TRIM support
Apr 13 11:01:36 Tower kernel: ata5.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 32), AA
Apr 13 11:01:36 Tower kernel: ata5.00: Features: Trust Dev-Sleep NCQ-sndrcv
Apr 13 11:01:36 Tower kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 13 11:01:36 Tower kernel: ata5.00: disabling queued TRIM support
Apr 13 11:01:36 Tower kernel: ata5.00: configured for UDMA/133
Apr 13 11:01:36 Tower kernel: sd 5:0:0:0: [sdf] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Apr 13 11:01:36 Tower kernel: sd 5:0:0:0: [sdf] Write Protect is off
Apr 13 11:01:36 Tower kernel: sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Apr 13 11:01:36 Tower kernel: sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 13 11:01:36 Tower kernel: sdf: sdf1
Apr 13 11:01:36 Tower kernel: sd 5:0:0:0: [sdf] Attached SCSI disk
Apr 13 11:01:36 Tower kernel: BTRFS: device fsid 7026cabb-40c4-4fd4-9d10-135feb1312ea devid 1 transid 21806631 /dev/sdf1 scanned by udevd (700)
Apr 13 11:01:47 Tower emhttpd: Samsung_SSD_850_EVO_250GB_S3R0NF0J888231T (sdf) 512 488397168
Apr 13 11:01:47 Tower emhttpd: import 30 cache device: (sdf) Samsung_SSD_850_EVO_250GB_S3R0NF0J888231T
Apr 13 11:01:48 Tower emhttpd: read SMART /dev/sdf
Apr 13 11:01:56 Tower emhttpd: shcmd (42): mount -t btrfs -o noatime,space_cache=v2 /dev/sdf1 /mnt/cache
Apr 13 11:01:56 Tower kernel: BTRFS info (device sdf1): flagging fs with big metadata feature
Apr 13 11:01:56 Tower kernel: BTRFS info (device sdf1): using free space tree
Apr 13 11:01:56 Tower kernel: BTRFS info (device sdf1): has skinny extents
Apr 13 11:01:56 Tower kernel: BTRFS info (device sdf1): bdev /dev/sdf1 errs: wr 0, rd 0, flush 0, corrupt 844, gen 0
Apr 13 11:01:56 Tower kernel: BTRFS info (device sdf1): enabling ssd optimizations
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 7700480 have 2977452051
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 7716864 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 7733248 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 7749632 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 7684096 have 2969063443
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8323072 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8339456 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8339456 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8323072 have 0
Apr 13 11:01:59 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8339456 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8355840 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 8355840 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10764288 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:06:58 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 10780672 have 0
Apr 13 11:07:09 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 41320448 have 18446719884523653056
Apr 13 11:07:17 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 701038592 have 0
Apr 13 11:07:29 Tower kernel: BTRFS error (device sdf1): bad tree block start, want 1017528320 have 4096
Apr 13 11:07:29 Tower kernel: BTRFS: error (device sdf1) in __btrfs_free_extent:3069: errno=-5 IO failure
Apr 13 11:07:29 Tower kernel: BTRFS info (device sdf1): forced readonly
Apr 13 11:07:29 Tower kernel: BTRFS: error (device sdf1) in btrfs_run_delayed_refs:2150: errno=-5 IO failure

I'm now getting the error in Fix Common Problems that the cache drive can't be written to, obviously due to the "forced readonly" in the readout above. I've also deleted my docker.img and then run the docker safe permissions script, but I cannot get Docker to start again no matter what. Diagnostics attached. What are the steps to follow in this case? I haven't formatted the cache drive because I am quite sure there are some system files there.

root@Tower:/mnt/cache# ls
CommunityApplicationsAppdataBackup/ appdata/ domains/ isos/ system/
root@Tower:/mnt/cache# cd system
root@Tower:/mnt/cache/system# ls
docker/ libvirt/

This is a pretty big catastrophe for my system. I've lost over 50 docker images and a lot of working services. tower-diagnostics-20220413-1114.zip
  18. It does not happen on the 'next' branch: 6.10.0-rc4 2022-03-19 (vs. 6.10.0-rc3)
  19. Version 6.9.2 2021-04-07 Sorry, I should have made that clear. EDIT: I'll update and see if it still happens.
  20. Could someone please share a way to generate an SSL cert for Tailscale on Unraid? The tailscale cert command doesn't work from inside the container or on my host machine.
  21. Nope. I stopped using it. It's overall too unstable for me.
  22. Steps to reproduce:

Input a wrong password on the main login screen a bunch of times until you get locked out.
It will say that too many attempts happened.
Input the correct password.
It will still tell you too many attempts happened, and nothing will happen.
Refresh the page; you are now logged in and at the Unraid menu.

This has to be a bug. Basically, the login page accepts the password even though it shows the "too many attempts" message, and you get a cookie which stores that you are logged in. Reloading the page then pushes you straight to the Unraid menu and bypasses the login lock you had from all the failed attempts. Even though you put the password in correctly, you should expect to be locked out until the lock expires. Tried this on 3 different browsers. Found it by accident.
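To spell out what I'd expect instead: the lockout check should run before the password is validated, and no session should ever be issued while the lock is active. A tiny sketch of that ordering (all names hypothetical, obviously not Unraid's actual code):

```shell
#!/bin/bash
# Sketch of lockout-first login handling (hypothetical names throughout).
MAX_ATTEMPTS=5
failed=0
locked=false

try_login() {  # $1 = submitted password, $2 = stored password
  if $locked; then
    echo "locked"     # refuse before even looking at the password
    return 1
  fi
  if [ "$1" = "$2" ]; then
    failed=0
    echo "ok"         # only this path may issue a session cookie
    return 0
  fi
  failed=$((failed + 1))
  if [ "$failed" -ge "$MAX_ATTEMPTS" ]; then
    locked=true
  fi
  echo "denied"
  return 1
}
```

The bug described above is the opposite order: credentials are checked (and a session issued) first, and the lock message is only cosmetic.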
  23. OOF. It was the setting under Shares > "share name" > Case-sensitive names: lower. Changed it to "Auto" and it works fine. That explains why filenames in all lowercase were fine, while those with mixed or capitalised case were inaccessible.
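For context on why that setting behaves this way: as I understand it (not official docs), Unraid's per-share "Case-sensitive names" option maps onto Samba's name-handling settings, roughly:

```
[sharename]
   # "Auto" - the usual Samba behaviour: lookups are case-insensitive,
   # and the case you created a name with is preserved
   case sensitive = auto
   preserve case = yes

   # The problematic "lower" mode roughly corresponds to:
   #   case sensitive = yes
   #   preserve case = no
   #   default case = lower
   # i.e. names are stored lowercased and lookups become case-sensitive,
   # which is why mixed-case paths stopped resolving.
```

So "Auto" is the safe choice unless every client genuinely needs case-sensitive semantics.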
  24. It said there were no updates available and I was on the default branch that came with the container. I'm running the linuxserver one now without any problems. Not sure what it was.