caplam

Everything posted by caplam

  1. Yes, by SSH, as it was impossible via the GUI. After the reboot everything is running perfectly well. I received a new HDD for parity 2; it is currently preclearing. godzilla-diagnostics-20201026-0941.zip
  2. I thought I was fine, but I'm definitely a "black cat" (a French expression for someone very unlucky). Last night the smbd process crashed during a VM backup. This morning some Docker containers and some VMs were unresponsive. I had to kill smbd to be able to stop the array and reboot. It took me an hour. I hope this is a one-time bug (I never had this before).
  3. I have had multiple problems over the last few days that forced me to reboot the server several times. Each time the server reboots, "Fix Common Problems" scans the server and throws errors about shares set up with the cache-only setting. I run 6.9 beta30 and I have 2 pools (one for Docker, the other for VMs).
  4. Here are the diags. You only see errors on the former parity 2 disk. godzilla-diagnostics-20201025-1517.zip
  5. For now I don't have access to my server. The rebuild started, but I still had SAS controller errors, so I rebooted and a new rebuild started. It has now finished and the array is OK: disk2 is enabled and I have no more SAS errors. As I seem very unlucky this weekend, the replacement disk is throwing raw read error rate warnings: I get messages pointing to the raw read error rate, and a few minutes or hours later it's back to normal. I am also running a 2-pass preclear on the first replacement disk2.
  6. I continue to see SAS errors in the log:
     Oct 24 19:37:26 godzilla kernel: sas: Enter sas_scsi_recover_host busy: 9 failed: 9
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x00000000b6e063ac
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x00000000b6e063ac
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 00000000b6e063ac, old_request == 00000000faf8b36b
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 00000000b6e063ac , old_request == 00000000faf8b36b
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x00000000b6e063ac is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x00000000b6e063ac is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000001fb3ea36
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000001fb3ea36
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000001fb3ea36, old_request == 00000000f191f1ed
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000001fb3ea36 , old_request == 00000000f191f1ed
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000001fb3ea36 is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000001fb3ea36 is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000029adb82b
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000029adb82b
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000029adb82b, old_request == 0000000057d81343
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000029adb82b , old_request == 0000000057d81343
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000029adb82b is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000029adb82b is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000001f95191f
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000001f95191f
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000001f95191f, old_request == 00000000a10e1ea9
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000001f95191f , old_request == 00000000a10e1ea9
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000001f95191f is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000001f95191f is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000005c2436ca
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000005c2436ca
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000005c2436ca, old_request == 000000009da361b3
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000005c2436ca , old_request == 000000009da361b3
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000005c2436ca is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000005c2436ca is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000023c5b3a7
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000023c5b3a7
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000023c5b3a7, old_request == 0000000047e60ca1
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000023c5b3a7 , old_request == 0000000047e60ca1
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000023c5b3a7 is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000023c5b3a7 is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000006facca72
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000006facca72
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000006facca72, old_request == 0000000024adc4d9
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000006facca72 , old_request == 0000000024adc4d9
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000006facca72 is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000006facca72 is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x00000000520d3197
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x00000000520d3197
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 00000000520d3197, old_request == 00000000d6f7823f
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 00000000520d3197 , old_request == 00000000d6f7823f
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x00000000520d3197 is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x00000000520d3197 is done
     Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000039adefa1
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000039adefa1
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000039adefa1, old_request == 000000001462f4c3
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
     Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000039adefa1 , old_request == 000000001462f4c3
     Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000039adefa1 is done
     Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000039adefa1 is done
     Oct 24 19:37:26 godzilla kernel: sas: ata10: end_device-9:0: cmd error handler
     Oct 24 19:37:26 godzilla kernel: sas: ata10: end_device-9:0: dev error handler
     Oct 24 19:37:26 godzilla kernel: ata10.00: exception Emask 0x0 SAct 0x3fe00 SErr 0x0 action 0x6 frozen
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/98:00:38:22:0b/02:00:00:00:00/40 tag 9 ncq dma 339968 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/98:00:d0:24:0b/01:00:00:00:00/40 tag 10 ncq dma 208896 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: sas: ata14: end_device-9:1: dev error handler
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/38:00:68:26:0b/02:00:00:00:00/40 tag 11 ncq dma 290816 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: sas: ata15: end_device-9:2: dev error handler
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/08:00:f8:87:c1/00:00:2a:00:00/40 tag 12 ncq dma 4096 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/28:00:a0:28:0b/01:00:00:00:00/40 tag 13 ncq dma 151552 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/70:00:c8:29:0b/00:00:00:00:00/40 tag 14 ncq dma 57344 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/20:00:e8:04:62/00:00:5d:01:00/40 tag 15 ncq dma 16384 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/58:00:38:2a:0b/02:00:00:00:00/40 tag 16 ncq dma 307200 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
     Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/50:00:90:2c:0b/00:00:00:00:00/40 tag 17 ncq dma 40960 in
     Oct 24 19:37:26 godzilla kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
     Oct 24 19:37:26 godzilla kernel: ata10: hard resetting link
     Oct 24 19:37:26 godzilla kernel: ata10.00: configured for UDMA/133
     Oct 24 19:37:26 godzilla kernel: ata10: EH complete
     Oct 24 19:37:26 godzilla kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 9 tries: 1
  7. So I put in a third disk. This time it's an IronWolf (precleared with 3 passes) which already has 20k hours. The rebuild has started.
  8. Is it the disk itself or a connection problem? The disk is used, but I had it precleared with 3 passes.
  9. When I try to spin it up I get:
     Unraid Disk 2 SMART health [1]: 24-10-2020 19:16 Warning [GODZILLA] - raw read error rate is 132 WDC_WD40EFRX-68WT0N0_WD-WCC4E1KN5L9R (sdr)
     Unraid Disk 2 SMART health [200]: 24-10-2020 19:16 Warning [GODZILLA] - multi zone error rate is 1 WDC_WD40EFRX-68WT0N0_WD-WCC4E1KN5L9R (sdr)
     Can I rebuild onto another disk? I have others as spares.
  10. I can't for now: the disk is spun down and I can't spin it up.
  11. From the log I see many SAS errors. I have no SAS disks in the array. All array disks are in my main case on SATA ports. On the SAS controller I have disks in an external case; for now they are all unassigned. I also see write errors and, before that, "link is slow to respond" on a SATA port. What does it mean? A bad cable? (That would be bad luck.)
  12. Cool, I didn't know that. 😀 I used to have tape drives, but the last one was an LTO-3.
  13. Here it is. godzilla-diagnostics-20201024-1855.zip
  14. During the rebuild I was able to browse disk2. Now the rebuild is done, but it ended with disk2 disabled again. What can I do now? Edit: there is something strange. I see the disk both as disk2, which is disabled (device sdr), and as an unassigned device (device sdq).
  15. I followed the re-enable procedure: stop array, unassign disk2, start array, stop array, assign disk2, start array. At this point the rebuild starts. Before that I ran xfs repair from within the GUI (so on the emulated disk). The rebuild is on its way. I think I'll start the VMs and Docker containers. Normally it should be good. If not, I guess I'll have to find out how to start with a blank disk2 and restore from backup. But I also have the old disk2. From what I saw, the corrupted data wasn't important (temporary download files).
  16. You can’t use a tape drive like an HDD. You have to use backup software and set up backup jobs. It’s good hardware for backing up to cold media, but it’s absolutely not external storage.
  17. Last year I went down that path. I replaced my NAS and servers with an HP workstation that I upgraded (Z620 with 2x E5-2650 v2 and 128 GB RAM). I added a 3-in-2 SATA cage and a bracket for 2 SATA and 1 NVMe. It can hold six 3.5" HDDs and two M.2 SATA SSDs. For now I don’t use the NVMe slot. I also added an HBA (9207-8e). The tricky part was finding an external JBOD case. It took me one year, but I now have 8 more 3.5" slots. I also put in an Nvidia P400, which can handle 4 transcodes (Plex or Tdarr).
  18. I've just realised that xfs_repair had been run (from the GUI) on /dev/md2, which is the emulated disk, so it's logical that the drive is still disabled. I shouldn't have let the rebuild run. It was useless. So a new rebuild is running; next stage in 7 hours. In the meantime, can I re-enable Docker and the VMs?
  19. I ran xfs_repair on disk2 in maintenance mode. I then restarted the array, but no luck: disk2 is disabled.
  20. No, but I didn't lose data. I had to make several attempts because I forgot to save the VM definition files. You have to rebuild the same VM. In the end I managed to succeed. I had several Debian VMs and one Win8, but no macOS (those are tricky).
  21. From what I remember, I had to rebuild my VMs when I switched from Proxmox to Unraid. Unraid takes raw (img) and qcow2 vdisk files. If you have a vmdk, you have to convert it with qemu-img.
  22. OK, so I guess that when the rebuild is finished I stop the array and restart it in maintenance mode. In the GUI I can run xfs repair. I have to do it for both disk 2 and disk 4.
  23. I think I'll let the rebuild finish. As you said, the disk was already corrupted when the rebuild was started. I stopped it at around 2%. I have a 6 TB disk to order. I have spare ones in 4 TB but not in 6 TB.
  24. In the French version there is a translation error on the Shares page. When there is no "protection" because a disk rebuild is in progress, an orange triangle is shown. When you hover over it, a message appears: in English, "some or all files unprotected"; in French, the current translation is "certains ou tous les fichiers sont cryptés" ("some or all files are encrypted"). For a moment I got quite a scare 😁
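
For the SAS/ATA log in post 6, a small script can tally which device and command keep timing out, rather than reading the dump by eye. This is only a sketch, not an Unraid tool: it assumes the syslog has one entry per line, and the regular expressions and the `summarize` helper are hypothetical names modelled on the lines quoted in that post.

```python
import re
from collections import Counter

# Patterns modelled on the kernel lines quoted in post 6 (assumed format).
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+?)\s*$")
ABORTED_TASK = re.compile(r"sas_scsi_find_task: aborting task (0x[0-9a-f]+)")


def summarize(lines):
    """Count failed ATA commands per (device, command) and collect aborted SAS tasks."""
    failed = Counter()
    aborted = set()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = ABORTED_TASK.search(line)
        if match:
            aborted.add(match.group(1))
    return failed, aborted


# Three sample entries copied from the log in post 6.
sample = [
    "Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x00000000b6e063ac",
    "Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED",
    "Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED",
]
failed, aborted = summarize(sample)
for (device, command), count in failed.items():
    print(f"{device}: {count} x {command}")
print(f"aborted SAS tasks: {len(aborted)}")
```

On a real system you would pass an open file instead of `sample`, for example `summarize(open(path))` with `path` pointing at wherever your syslog lives. If one ata device dominates the counts, that points at a single drive, cable or port rather than the whole controller.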
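
For the vmdk conversion mentioned in post 21, here is a rough sketch of driving qemu-img from Python. It assumes qemu-img is installed and on the PATH; `build_convert_command` and `convert` are hypothetical helper names, and writing the output next to the source with a new extension is just a convenient assumption.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, out_format="qcow2"):
    """Return the qemu-img argument list that converts a vmdk to out_format."""
    src = Path(src)
    # Assumption: write the result next to the source, swapping the extension.
    dst = src.with_suffix("." + out_format)
    return ["qemu-img", "convert", "-f", "vmdk", "-O", out_format, str(src), str(dst)]


def convert(src, out_format="qcow2"):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(build_convert_command(src, out_format), check=True)


# Show the command without running it.
print(" ".join(build_convert_command("debian.vmdk")))
```

With `out_format="raw"` the output gets a `.raw` extension, so you would probably rename it to `.img` afterwards. Keep the original vmdk until the VM boots from the converted vdisk.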