Preclear vs Formatting - and Weird CPU spikes



Hello, and thanks in advance for your time.  I was hoping someone could clarify something for me, as I am new to Unraid but trying to get educated.

 

I just started my Unraid server and began by preclearing four drives (two 8 TB and two 4 TB) for two cycles each.  All are brand new, so I like the idea of stress testing them.  My first 4 TB drive finished both cycles after about 24 hours, so I stopped the array (I had one other 500 GB drive that I threw in for kicks) and added the drive.  As expected, it was labeled "Unmountable".  I selected "Format" and clicked the "Yes I want to do this" button.  The "Formatting" process seems to be taking an unreasonably long time, in addition to pulling a TON of CPU capacity.

 

In my case, my CPU reached its critical temp with fans going full speed ahead.  I am running an i9-10850K on a Supermicro X12SCZ-F board.  What am I missing here?  What can be pulling so much CPU power when I have 32 GB of RAM installed?  Happy to provide additional specs as necessary.

 

*UPDATE* So after some time, the drive was finally added to the array.  My observations are as follows:

The only processes running at the time I added the drive to the array were the preclears of two or three other drives (via the Preclear app).  When I formatted the single 4 TB drive, CPU went to 100%.  It then seemed to settle down for the rest of the formatting process.  After some time, CPU usage spiked to 100% again, reaching critical temp levels.  After trying to manage fan speeds better and watching the array, this seemed to be when the formatting was wrapping up, just before the 4 TB drive was successfully added to the array.  After formatting finished, temps came down but are still running somewhat high for the little that is occurring at the server level (presently 75 °C with fans running at 13000 RPM in a 1U chassis), with odd spikes in the threads despite little server activity.
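(Side note on watching temps: if lm-sensors is available from the console, the reading can be pulled out of a `sensors`-style line with grep.  A minimal sketch, run here against a hypothetical sample line since exact sensor labels vary by board -- on a live box you would pipe `sensors` itself into the grep:)

```shell
# Hypothetical sample line in the format lm-sensors prints
line='Package id 0:  +75.0°C  (high = +80.0°C, crit = +100.0°C)'
# Pull the first temperature reading out of the line
echo "$line" | grep -oE '\+[0-9]+\.[0-9]+°C' | head -n 1   # -> +75.0°C
```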

 

What could be causing the CPU spikes?  I never thought I'd see Unraid take 100% of a 10850K when doing "simple" disk tasks -- I don't even have a parity drive in there currently.


Something strange is going on - I would normally expect formatting to take only a few minutes and not put much load on the system.

 

Whether running the preclears in parallel was causing some sort of contention, I have no idea :(

Posted (edited)
On 4/13/2021 at 12:22 PM, John_M said:

Your diagnostics might show something, if you haven't already rebooted. Tools -> Diagnostics. Post the zip file.

 

I have not rebooted.  Attached are the diagnostics.  I was shocked and frantic when I saw the CPU going from 40 degrees to 70, 80, 90, fans at max.  Note that "Disk 6" is the one I added that created the "issue".

 

2 hours ago, itimpi said:

Something strange is going on - I would normally expect formatting to take only a few minutes and not put much load on the system.

 

Whether running the preclears in parallel was causing some sort of contention, I have no idea :(

So, the three other preclears going on:

 

1) 4 TB via an LSI 2308 SAS controller -- I have a storage enclosure (Dell MD1200) attached and this drive is in it.  No RAID.

2) 8 TB Easystore - via USB 3.1 or 3.2 - haven't shucked it in case it's bad

3) 8 TB Easystore - same as above.

 

The process took, I would guess, 30 minutes or so to add the drive.  As of now, the two 8 TB drives are in their final "Post Read" preclear cycle and the other 4 TB is precleared and ready to go.  I am extremely hesitant to stop the array and add the 4 TB to it, given the CPU/temp spikes before.

 

Note I posted my diagnostics below.  Those diagnostics are from about 12 hours after the incident.  FWIW, the two 8 TB drives are going to serve as my parity drives.

 

Thanks for any help you can provide!


One other point to note.

 

The disk I mounted/formatted was the one ending in "ZTT0W09W".  I did preclear it for two cycles (no pre-read, but did post-read).  Once finished, I did not click the red X that says "Stop Preclear", as it was already finished in my opinion.  Not sure if this contributed to the problem or not.


This is your problem:

 

Apr 11 16:37:21 HawkinsUnraid preclear_disk_18RUS19KS[32513]: Zeroing: dd if=/dev/zero of=/dev/sdg bs=2097152 seek=2097152 count=500105764864 conv=notrunc iflag=count_bytes,nocache,fullblock oflag=seek_bytes
Apr 11 16:37:21 HawkinsUnraid preclear_disk_18RUS19KS[32513]: Zeroing: dd pid [25302]
Apr 11 16:37:51 HawkinsUnraid kernel: sd 8:0:3:0: attempting task abort!scmd(0x000000001e09371a), outstanding for 30027 ms & timeout 30000 ms
Apr 11 16:37:51 HawkinsUnraid kernel: sd 8:0:3:0: [sdg] tag#6445 CDB: opcode=0x28 28 00 3a 38 5f 80 00 00 08 00
Apr 11 16:37:51 HawkinsUnraid kernel: scsi target8:0:3: handle(0x000d), sas_address(0x500c04f25cb9a71a), phy(26)
Apr 11 16:37:51 HawkinsUnraid kernel: scsi target8:0:3: enclosure logical id(0x500c04f25cb9a700), slot(4) 
Apr 11 16:37:51 HawkinsUnraid kernel: sd 8:0:3:0: task abort: SUCCESS scmd(0x000000001e09371a)
Apr 11 16:37:51 HawkinsUnraid kernel: sd 8:0:3:0: [sdg] tag#6445 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00 cmd_age=30s
Apr 11 16:37:51 HawkinsUnraid kernel: sd 8:0:3:0: [sdg] tag#6445 CDB: opcode=0x28 28 00 3a 38 5f 80 00 00 08 00
Apr 11 16:37:51 HawkinsUnraid kernel: blk_update_request: I/O error, dev sdg, sector 976772992 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Apr 11 16:37:52 HawkinsUnraid kernel: sd 8:0:3:0: Power-on or device reset occurred

 

There should be enough information there to identify the problem disk.
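(Side note for anyone searching their own logs: on Unraid the live syslog is at /var/log/syslog, and the offending device can be grepped straight out of kernel lines like these.  A minimal sketch, run here against one line copied from the excerpt above:)

```shell
# One kernel line copied from the syslog excerpt above
line='Apr 11 16:37:51 HawkinsUnraid kernel: blk_update_request: I/O error, dev sdg, sector 976772992 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0'
# Extract which device threw the I/O error
echo "$line" | grep -oE 'dev sd[a-z]+'   # -> dev sdg
# On a live system, something like:
#   grep -E 'I/O error|task abort|device reset' /var/log/syslog
```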

 

Posted (edited)
15 minutes ago, John_M said:

This is your problem:

 



 

There should be enough information there to identify the problem disk.

 

John - I think this was a separate issue.  This disk, "18RUS19KS", is a 500 GB disk that I threw in the array for the hell of it.  What happened with this error: while the disk was still preclearing, I added it to the array like a noob.  It caused issues, so I unmounted it and ran two preclear cycles on it (I believe it was two).  No errors came back in those preclears, if I am not mistaken.  I can easily take it off the array to troubleshoot -- I only threw it in because it was a dormant disk I had from three years ago and never used.  Also, this was two days ago, and it was not involved when I mounted/formatted the 4 TB as "Disk 6" last night.  I may be wrong, but I wouldn't expect this disk to cause weird issues with a separate disk, especially if I have no parity?


Ah, OK. Concentrating on the evening in question, the string "mkfs" (as in "format a disk") appears only once on 12 April and concerns Disk 6. The process seems to start at 22:51:58 and finish at 23:01:32. That's getting on for ten minutes, compared with the 30 seconds or so it took to format Disk 7 on the previous day. So what's the difference? Well, the formatting of Disk 6 (a.k.a. /dev/sdf) was actually pretty fast but it was preceded by a call to sgdisk to partition it, here:

 

Apr 12 22:51:58 HawkinsUnraid root: ***************************************************************
Apr 12 22:51:58 HawkinsUnraid root: Found invalid GPT and valid MBR; converting MBR to GPT format
Apr 12 22:51:58 HawkinsUnraid root: in memory. 
Apr 12 22:51:58 HawkinsUnraid root: ***************************************************************
Apr 12 22:51:58 HawkinsUnraid root: 
Apr 12 22:51:58 HawkinsUnraid root: GPT data structures destroyed! You may now partition the disk using fdisk or
Apr 12 22:51:58 HawkinsUnraid root: other utilities.
Apr 12 22:51:58 HawkinsUnraid emhttpd: shcmd (17431): sgdisk -o -a 8 -n 1:32K:0 /dev/sdf
Apr 12 22:51:58 HawkinsUnraid kernel: sdf: sdf1
Apr 12 22:51:58 HawkinsUnraid rc.diskinfo[7494]: SIGHUP received, forcing refresh of disks info.
Apr 12 22:53:58 HawkinsUnraid nginx: 2021/04/13 01:53:58 [error] 27337#27337: *380481 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.9, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.1.119", referrer: "http://192.168.1.119/Main"
Apr 12 22:56:37 HawkinsUnraid nginx: 2021/04/13 01:56:37 [error] 27337#27337: *380767 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.9, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.1.119", referrer: "http://192.168.1.119/Main"
Apr 12 22:58:53 HawkinsUnraid preclear_disk_56474835384B3047[18453]: Zeroing: progress - 40% zeroed @ 179 MB/s
Apr 12 23:01:19 HawkinsUnraid root: Creating new GPT entries in memory.
Apr 12 23:01:19 HawkinsUnraid root: The operation has completed successfully.
Apr 12 23:01:19 HawkinsUnraid emhttpd: shcmd (17432): udevadm settle
Apr 12 23:01:19 HawkinsUnraid kernel: sdf: sdf1
Apr 12 23:01:19 HawkinsUnraid rc.diskinfo[7494]: SIGHUP received, forcing refresh of disks info.
Apr 12 23:01:19 HawkinsUnraid emhttpd: shcmd (17433): /sbin/wipefs -a /dev/md6
Apr 12 23:01:19 HawkinsUnraid emhttpd: shcmd (17434): mkfs.xfs -m crc=1,finobt=1 -f /dev/md6

 

So the question is, does it really take that long to set up the partition table, or are the two nginx "upstream timed out" errors, where each time nothing seems to happen for two and a half minutes, to blame?
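(The gap can be read straight off the timestamps quoted above: sgdisk is logged at 22:51:58 and the "Creating new GPT entries" completion at 23:01:19.  A quick sanity check with GNU date:)

```shell
# Timestamps copied from the syslog excerpt above; both fall on the same
# day, so the times alone are enough for the subtraction.
start=$(date -d '22:51:58' +%s)
end=$(date -d '23:01:19' +%s)
echo $(( end - start ))   # -> 561 seconds, i.e. about nine and a half minutes
```

That accounts for the bulk of the ten minutes, which points at the partitioning step (or whatever it was waiting on) rather than mkfs itself.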

 

1 hour ago, John_M said:

So the question is, does it take that long to set up the partition table or are the two nginx "upstream timed out" errors, where each time nothing seems to happen for two and-a-half minutes, to blame?

 

I'll add that reading that is a foreign language to me :)  My two 8 TB preclears are almost done (about 2-3 hrs).  Once finished, I'll have the additional 4 TB drive and two 8 TB drives to add to the array.  I think I'll add the 4 TB and see how things perform.  Hopefully, this was just a "glitch" that has somehow been overcome.  The diagnostics seem somewhat muddied up by the preclear sequences, as well as the other stuff I've been doing since I just got Unraid and started messing around with it a couple of days ago.  Once the preclears are finished, I'll reboot the system, add the drive to the array, and monitor my CPU performance (both temp and usage).  I'll report back with the updated diagnostics once that all happens.  We'll take this one drive at a time.  This 4 TB will occupy Disk 5, and I think the diagnostics will be easier to follow (for me at least).  Will report back...

 

And if I had to guess, I wonder if the timeout errors came into play when my CPU throttled down due to overheating.  Like I said, I never would have expected adding a drive to max out 10 cores of a 10850K...


So I think I am going to chalk this up to a "fluke".  Something I did originally didn't work, but it's since resolved itself.  After all preclears were complete, I rebooted the server and assigned my 4 TB drive to Disk 5, then proceeded to "Format".  The process took seconds and my CPU didn't even hiccup.  I'll test again when I add my parity drives in the next day or two, but I think this is resolved.  I'll wait to confirm until the parities are mounted with no issues.  Also, a cache drive is en route as well.

