Adding (Rebuilding) Multiple Drives in a Heat Wave - Managing Risk Path?


I have a feeling I already know the answer to my question from years of experience and having searched recent posts, but:

 

I’m running a large server (20 disks, including dual parity) on a MB/CPU combo that rebuilds at “normal” speeds when I add a new drive (replacing an older, smaller one), but takes twice as long on parity checks, due to the dual parity calculations (as I’ve been told here in the past), with no simple upgrade path. It’s a “dumb” box used only for local A/V storage, and the content is replaceable from another backed-up server, albeit in a time-consuming way for many reasons, so this hasn’t been something I’m concerned about 95% of the time.

 

I’ve historically always run a parity check after a rebuild to confirm the new disk was “rebuilt” without errors.

 

I’m planning on adding between 3 and 5 drives (as “rebuilds” of existing, smaller drives) to the system before the end of summer, and with temperatures heading in the direction they are, I would like to do this as soon as possible.
 

Most of the year, parity check temperatures sit a good 10 degrees Celsius below the drives’ safe operating temperature, but by late summer almost all the disks start flirting with 60C during a parity check unless I’m constantly running the air conditioner in the room with the Unraid box (which is already getting expensive).
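(For anyone in the same boat, something like the quick smartctl poller below can flag drives as they close in on that ceiling. This is a minimal sketch: the device names and the 55C threshold are placeholders, and it assumes smartmontools is installed.)

#!/usr/bin/env python3
# Minimal temp-watcher sketch: poll SMART temps and flag drives nearing 60C.
# Placeholder device names; assumes smartmontools (smartctl) is installed.
import subprocess
import time

DRIVES = [f"/dev/sd{c}" for c in "bcdefg"]  # placeholder drive list
LIMIT_C = 55  # warn a few degrees below the 60C ceiling

def drive_temp(dev):
    """Return the Temperature_Celsius raw value for dev, or None if unreadable."""
    out = subprocess.run(
        ["smartctl", "-A", "-n", "standby", dev],  # -n standby: don't wake idle disks
        capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            return int(line.split()[9])  # RAW_VALUE column of the attribute table
    return None

while True:
    for dev in DRIVES:
        temp = drive_temp(dev)
        if temp is not None and temp >= LIMIT_C:
            print(f"WARNING: {dev} is at {temp}C")
    time.sleep(300)  # poll every five minutes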

 

I’m staring down the barrel of limited time before the usual local power outages start (from everyone running air conditioners during heat waves), and I’m curious whether anyone has seen, empirically, one of these paths as materially riskier than the other:

1) Continue running a parity check after each rebuild (an additional few days each time) as I add each of these 3 to 5 drives, with all disks running hotter than usual during each intense parity check, for additional days or weeks in total; or

2) Rebuild all the drives one after the other and parity check the lot of them once at the end, before outdoor temperatures rise further and seasonal power outages start risking me getting caught unawares by UPS issues (I just bought a new UPS a few months ago, but I’m always paranoid about “surprise” power problems and battery failures these days).

 

 

I know the first instinct for a lot of people will be to steer me toward lowering the drive temperatures further by internal means, but I’ve already researched that path: replacing the hot-swap cage fans would take more time and effort than I can spare this summer thanks to work obligations. So I’m just looking to take one of the two paths listed above (knowing neither is ideal) in hopes of “beating the heat clock” as safely as possible.

 

I’d greatly appreciate guidance and advice from anyone who’s been in a similar situation, or who knows a good reason (which I haven’t found online yet) why rebuilding multiple drives back to back with a single parity check at the end isn’t actually that much riskier!

4 weeks later...

I do, but temps were hitting the threshold on one or two drives quickly enough to make the auto-pause feature effectively useless for most of the day. Luckily, a buddy with an air-conditioned home came through in the clutch, and I was able to power through all my replacements and parity checks there. The next step, long term, is definitely a hardware-based cooling improvement!


I have a similar issue. My server is a 3U 16-bay hot-swap rack, with the disks in two groups of 8 (1-8 and 9-16).

 

On the temperature issue: for various reasons I can’t lower the temperature much with air conditioning, so I arrange the disks in the order below (my rack stands vertically, turned 90 degrees):

 

 1   9   5  13
 2  10   6  14
 3  11   7  15
 4  12   8  16

 

When I work on disks 1-8, I pull out 9-16, which gives much better airflow to 1-8; when I work on 9-16, I pull out 1-8 the same way.

 

On the UPS side, the newly replaced batteries served me only two months before one failed (a 48V pack made of four 12V batteries); the battery gave no indication of the problem until a real power interruption happened.

 

Anyway, I’m still working on modifying the UPS to run at 24V and redesigning the whole UPS setup. I’d rather buy two bigger batteries than four smaller ones: they take up almost the same room in the battery cage, and should improve battery reliability.
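For the pack math, a rough sketch with placeholder capacities (not my real specs): batteries in series add voltage but not amp-hours, so the 24V pair only stores more energy if the individual batteries are bigger.

# Back-of-envelope pack comparison; all Ah/load/efficiency figures are placeholders.
def pack_wh(batt_volts, batt_ah, series_count):
    # Series strings add voltage, not Ah; stored energy scales with both.
    return batt_volts * series_count * batt_ah

old_pack = pack_wh(12, 9, 4)   # four 12V 9Ah in series  -> 48V pack, 432 Wh
new_pack = pack_wh(12, 20, 2)  # two 12V 20Ah in series  -> 24V pack, 480 Wh

load_w, inverter_eff = 150, 0.85  # assumed server draw and inverter efficiency
for name, wh in (("48V (12V x4)", old_pack), ("24V (12V x2)", new_pack)):
    # Crude runtime estimate; ignores Peukert effect and depth-of-discharge limits.
    print(f"{name}: {wh} Wh, ~{wh * inverter_eff / load_w:.1f} h at {load_w} W")

And with half as many cells in series, there are half as many single points of failure like the one that bit me.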

 

 

 

