obuwunkunubi

August 19, 2022

7 minutes ago, JorgeB said:

unless a balance is running, pool is OK for now

Since the "Balance" button isn't greyed out and I am not seeing any balance progress I'm assuming that it isn't running.

image.png.62a51e7336b7e51800e40d1a5687e8a6.png

Thanks for the help and suggestions! I'll take a look at the monitoring script.

August 19, 2022

1 hour ago, JorgeB said:

Yes.

Thanks, that seems to have worked! There were no uncorrectable errors and there were no I/O errors in the logs. There are also no errors in the logs when trying to open a file.

image.png.b2a92192d1edba422c308864a9d14f83.png

Do I have to take any further steps regarding my cache pool or can I use it normally now?

It still says "No balance found on '/mnt/cache'", is that an issue or is it supposed to be like that?

image.png.9d0a9dd9a89a46fb5510287c41e84811.png

August 19, 2022

1 hour ago, JorgeB said:

try a different SATA port, also make sure you're using a different power cable

Thanks for the suggestion, I have tried both now. Things such as smart info are still showing for now but my log is still filled with errors of the following two types (in the code box below (sdc is the new drive that has been giving me issues)) whenever I try to access a file that is stored on the cache. The file still opens and plays normally otherwise.

Aug 19 11:23:06 Serverina kernel: BTRFS error (device sdb1): bad tree block start, want 1271081533440 have 0
Aug 19 11:23:06 Serverina kernel: BTRFS error (device sdb1): bad tree block start, want 1271081549824 have 0

Aug 19 11:23:12 Serverina kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 1270182952960 (dev /dev/sdc1 sector 944117472)
Aug 19 11:23:12 Serverina kernel: BTRFS info (device sdb1): read error corrected: ino 0 off 1270182957056 (dev /dev/sdc1 sector 944117480)

So should I try to run another scrub to repair the errors?

I have also noticed that it says "No balance found on '/mnt/cache'". Is that also an issue? Should I be performing another balance?

August 18, 2022

Hello,

so today I decided to add a drive to my cache pool. It was going normally until the drive temperature randomly stopped showing. But the balance seemed to keep on going. The problem is that after it finished the mover wasn't working. So I decided to shut down the system and try another SATA cable. The temperature and smart details started showing again after starting up again but I decided to run a btrfs scrub to check for errors. It was correcting what seemed like every single sector until it started showing I/O errors (for the new disk) in the logs and subsequently uncorrectable sectors, so I stopped the scrub. It seems that I can still access the files on the pool since the first drive seems to be working normally. The new drive has since stopped showing smart info and temperature again. Is there a way to fix these issues? Is it a problem with the drive, since it doesn't seem to be the SATA cable? In that case can I remove the new drive and only keep the original while I wait for a replacement?

Otherwise I have made an appdata backup before doing this and I also backed up some of the more important files before doing this. But I would still like to keep everything if possible (even though they are mostly torrents which I can re-download)

I have attached my diagnostics file below. (Sadly I forgot to save one before shutting the system down to switch the SATA cable, so this is just the new one)

serverina-diagnostics-20220818-2150.zip

August 12, 2022

3 minutes ago, JorgeB said:

The LSI HBA doesn't support trim, use the Intel SATA ports for the SSD instead, but change the mode to AHCI in the BIOS, it's set to IDE.

I recently changed it to the LSI HBA SATA port, so I'm guessing that's why I wasn't seeing any errors regarding this before. Thanks for the info, I'll change it back asap.

5 minutes ago, JonathanM said:

Not optional, mandatory. All changes to the array disks, formatting, file system checks, etc, MUST be done while the array is started so parity can remain in sync with the changes made.

Thanks, I'll keep that in mind next time I'm adding a drive.

August 12, 2022

2 minutes ago, JorgeB said:

Why? Just needed to format the disk.

Sadly I forgot that that was an option while the disk was in the array, even though that's the way I did it last time.

4 minutes ago, JorgeB said:

No, a few sync errors after an unclean shutdown are normal, even expected, last check detected 0 errors so all is good now.

Thanks for the confirmation! I thought that was the case, but I just wanted to make sure.

Do you by chance also have any advice regarding the BTRFS trim warning/error at the end?

Aug 11 23:05:28 Serverina kernel: blk_update_request: critical target error, dev sdb, sector 976773120 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
Aug 11 23:05:28 Serverina kernel: BTRFS warning (device sdb1): failed to trim 1 device(s), last error -121

I noticed it for the first time then, but it seems to be showing up daily (at the time my Dynamix SSD TRIM plugin is set to run). Since the issue seems to be regarding the plugin, I guess that I should ask in the Dynamix Plugins thread.

August 12, 2022

Hello,
so a couple of days ago I tried adding a new drive to my array. So just like the previous times I did so, I ran 2 cycles of preclear and everything seemed fie with the drive.

####################################################################################################
#                              Unraid Server Preclear of disk XXXXXXXX                             #
#                             Cycle 2 of 2, partition start on sector 64.                          #
#                                                                                                  #
#   Step 1 of 5 - Pre-read verification:                   [6:38:01 @ 167 MB/s] SUCCESS            #
#   Step 2 of 5 - Zeroing the disk:                        [6:38:53 @ 167 MB/s] SUCCESS            #
#   Step 3 of 5 - Writing Unraid's Preclear signature:                          SUCCESS            #
#   Step 4 of 5 - Verifying Unraid's Preclear signature:                        SUCCESS            #
#   Step 5 of 5 - Post-Read verification:                  [6:38:04 @ 167 MB/s] SUCCESS            #
#                                                                                                  #
#                                                                                                  #
#                                                                                                  #
####################################################################################################
#       Cycle elapsed time: 19:55:01 | Total elapsed time: 39:50:13                                #
####################################################################################################

####################################################################################################
#   S.M.A.R.T. Status (device type: default)                                                       #
#                                                                                                  #
#   ATTRIBUTE                   INITIAL CYCLE 1 CYCLE 2 STATUS                                     #
#   Reallocated_Sector_Ct       0       0       0       -                                          #
#   Power_On_Hours              0       20      40      Up 40                                      #
#   End-to-End_Error            0       0       0       -                                          #
#   Reported_Uncorrect          0       0       0       -                                          #
#   Airflow_Temperature_Cel     27      32      31      Up 4                                       #
#   Current_Pending_Sector      0       0       0       -                                          #
#   Offline_Uncorrectable       0       0       0       -                                          #
#   UDMA_CRC_Error_Count        0       0       0       -                                          #
#                                                                                                  #
####################################################################################################
#                                                                                                  #
####################################################################################################
--> ATTENTION: Please take a look into the SMART report above for drive health issues.

--> RESULT: Preclear Finished Successfully!.

After that, I spun down my array and added the new disk. After restarting the array I got the "Unmountable: Unsupported partition layout" error shown next to the drive, so I wanted to stop the array but after waiting for 20 minutes it still hadn't stopped, so I tried using the shutdown button in the GUI and the server shut down. After starting back up I don't remember if I tried formatting the drive again and then starting the array or just starting it. But when it started, I noticed that it had detected an unclean shutdown so it started a parity check with the "Unmountable: Unsupported partition layout" drive in the array and it started showing (and correcting sync errors).

Aug 10 14:35:37 Serverina kernel: md: recovery thread: check P Q ...
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=0
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=8
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=16
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=24
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=32
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=40
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=48
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=56
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=64
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=128
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=136
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=144
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=152
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=160
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=168
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=176
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=184

So I panicked and quickly stopped it. After that I used the format unmountable drives in array option which formatted the drive properly this time and it was successfully added into the array.

After seeing the previous errors I started another parity check which resulted in 28 errors being corrected:

Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509272
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509280
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509288
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509296
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509304
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509312
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509320
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509328
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509336

Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018544
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018552
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018560
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018568
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018576
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018584
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018592
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018600
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018608
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018616

Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527816
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527824
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527832
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527840
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527848
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527856
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527864
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527872
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527880

So after reading some other posts regarding parity I ran another parity check the next day, just to make sure there was no issues with the hardware and it found no errors.

image.png.c559aa82e2be11d8fdf6e376e4254be1.png

My main question is, should I be worried about any data loss? Or have any of my files potentially been damaged?

As far as I can see, everything seems to be running fine now, but I would just like to make sure.

I have attached my two diagnostics files (the older one contains the parity checks that found errors and the second one is the one where I ran the second parity check the next day (there was a reboot in between))

Also slightly unrelated:

When looking at the parity errors I found the following regarding my cache drive in the logs

Aug 10 23:05:21 Serverina kernel: blk_update_request: critical target error, dev sdb, sector 976773120 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
Aug 10 23:05:21 Serverina kernel: BTRFS warning (device sdb1): failed to trim 1 device(s), last error -121

Should I be worried about this? I am currently running only a single cache drive, but I was planing on adding a second one of the same capacity which should be arriving next week.

serverina-diagnostics-20220811-0118.zip serverina-diagnostics-20220812-1409.zip

obuwunkunubi

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by obuwunkunubi

Issues with new drive in cache pool

Issues with new drive in cache pool

Issues with new drive in cache pool

Issues with new drive in cache pool

Issues adding new drive to array

Issues adding new drive to array

Issues adding new drive to array