Start with the high-low decomposition. For any low-pass operator B (the box average, here), define the high-pass residual H = I − B(I). Then trivially I = B(I) + H — the image equals its low-pass plus high-pass parts. The sharp filter outputs:
O = I + αH = B(I) + (1 + α) · H = B(I) + (1 + α)(I − B(I))
Distributing: O = (1 + α)I − α · B(I). This is a single convolutional filter with kernel K = (1 + α)δ − αB, where δ is the unit impulse (the identity kernel, 1 at center and 0 elsewhere) and B is the box-average kernel. The kernel always sums to (1 + α) − α · 1 = 1, so the filter is brightness-preserving for any α.
Overshoot math: consider a step edge where the input transitions from value a on one side to b on the other, with b > a. The box average smears the step into a ramp; at the bright edge of the ramp, B(I) is less than I, so H is positive. The output I + αH is higher than I — that's the bright ridge. Just outside the ramp, on the dark side, B(I) exceeds I (because the box reaches across the step), H is negative, and the output dips below I — the dark moat. Overshoot is the mathematical consequence of high-pass boost combined with edge sharpness.