Hello,
so a couple of days ago I tried adding a new drive to my array. So just like the previous times I did so, I ran 2 cycles of preclear and everything seemed fie with the drive.
####################################################################################################
# Unraid Server Preclear of disk XXXXXXXX #
# Cycle 2 of 2, partition start on sector 64. #
# #
# Step 1 of 5 - Pre-read verification: [6:38:01 @ 167 MB/s] SUCCESS #
# Step 2 of 5 - Zeroing the disk: [6:38:53 @ 167 MB/s] SUCCESS #
# Step 3 of 5 - Writing Unraid's Preclear signature: SUCCESS #
# Step 4 of 5 - Verifying Unraid's Preclear signature: SUCCESS #
# Step 5 of 5 - Post-Read verification: [6:38:04 @ 167 MB/s] SUCCESS #
# #
# #
# #
####################################################################################################
# Cycle elapsed time: 19:55:01 | Total elapsed time: 39:50:13 #
####################################################################################################
####################################################################################################
# S.M.A.R.T. Status (device type: default) #
# #
# ATTRIBUTE INITIAL CYCLE 1 CYCLE 2 STATUS #
# Reallocated_Sector_Ct 0 0 0 - #
# Power_On_Hours 0 20 40 Up 40 #
# End-to-End_Error 0 0 0 - #
# Reported_Uncorrect 0 0 0 - #
# Airflow_Temperature_Cel 27 32 31 Up 4 #
# Current_Pending_Sector 0 0 0 - #
# Offline_Uncorrectable 0 0 0 - #
# UDMA_CRC_Error_Count 0 0 0 - #
# #
####################################################################################################
# #
####################################################################################################
--> ATTENTION: Please take a look into the SMART report above for drive health issues.
--> RESULT: Preclear Finished Successfully!.
After that, I spun down my array and added the new disk. After restarting the array I got the "Unmountable: Unsupported partition layout" error shown next to the drive, so I wanted to stop the array but after waiting for 20 minutes it still hadn't stopped, so I tried using the shutdown button in the GUI and the server shut down. After starting back up I don't remember if I tried formatting the drive again and then starting the array or just starting it. But when it started, I noticed that it had detected an unclean shutdown so it started a parity check with the "Unmountable: Unsupported partition layout" drive in the array and it started showing (and correcting sync errors).
Aug 10 14:35:37 Serverina kernel: md: recovery thread: check P Q ...
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=0
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=8
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=16
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=24
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=32
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=40
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=48
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=56
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=64
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=128
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=136
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=144
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=152
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=160
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=168
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=176
Aug 10 14:35:37 Serverina kernel: md: recovery thread: PQ corrected, sector=184
So I panicked and quickly stopped it. After that I used the format unmountable drives in array option which formatted the drive properly this time and it was successfully added into the array.
After seeing the previous errors I started another parity check which resulted in 28 errors being corrected:
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509272
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509280
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509288
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509296
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509304
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509312
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509320
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509328
Aug 10 17:10:05 Serverina kernel: md: recovery thread: PQ corrected, sector=1953509336
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018544
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018552
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018560
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018568
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018576
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018584
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018592
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018600
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018608
Aug 10 19:16:51 Serverina kernel: md: recovery thread: PQ corrected, sector=3907018616
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527816
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527824
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527832
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527840
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527848
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527856
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527864
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527872
Aug 10 21:46:07 Serverina kernel: md: recovery thread: PQ corrected, sector=5860527880
So after reading some other posts regarding parity I ran another parity check the next day, just to make sure there was no issues with the hardware and it found no errors.
My main question is, should I be worried about any data loss? Or have any of my files potentially been damaged?
As far as I can see, everything seems to be running fine now, but I would just like to make sure.
I have attached my two diagnostics files (the older one contains the parity checks that found errors and the second one is the one where I ran the second parity check the next day (there was a reboot in between))
Also slightly unrelated:
When looking at the parity errors I found the following regarding my cache drive in the logs
Aug 10 23:05:21 Serverina kernel: blk_update_request: critical target error, dev sdb, sector 976773120 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
Aug 10 23:05:21 Serverina kernel: BTRFS warning (device sdb1): failed to trim 1 device(s), last error -121
Should I be worried about this? I am currently running only a single cache drive, but I was planing on adding a second one of the same capacity which should be arriving next week.
serverina-diagnostics-20220811-0118.zip
serverina-diagnostics-20220812-1409.zip