Spatial filtering is one of the most fundamental operations in image processing. Every filter discussed here operates by moving a small window (the kernel) across the image and computing a new pixel value from the neighborhood — a process called spatial convolution for linear filters, or neighborhood processing more generally (Gonzalez & Woods, §3.4).
Convolution & the Kernel Model
In spatial convolution, a kernel w of size m×n is centered on each pixel. The output is the weighted sum g(x,y) = Σ_s Σ_t w(s,t) · f(x+s, y+t), where f is the input image and (s,t) ranges over the kernel's extent. (Strictly, this sliding weighted sum without flipping the kernel is correlation; for the symmetric kernels common in practice, correlation and convolution coincide.) The kernel encodes what the filter does — a uniform kernel averages (box filter), a bell-curve kernel smooths (Gaussian), a derivative kernel detects edges (Sobel, Laplacian). At image boundaries, pixels outside the frame must be handled: common strategies include zero-padding (treat as black), reflection (mirror the edge), and replication (repeat the boundary pixel).
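The kernel model and the three border strategies above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation; the function name `filter2d`, its `mode` argument, and the test image are assumptions for the example.

```python
import numpy as np

def filter2d(image, kernel, mode="reflect"):
    """Slide a kernel over a 2D image and compute the weighted sum
    at each pixel. The kernel is applied without flipping (correlation),
    which equals convolution for symmetric kernels."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # The three border strategies from the text, mapped to np.pad modes:
    # zero-padding -> "constant", reflection -> "reflect", replication -> "edge"
    pad_mode = {"zero": "constant", "reflect": "reflect", "replicate": "edge"}[mode]
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode=pad_mode)
    out = np.zeros(image.shape, dtype=float)
    for s in range(kh):
        for t in range(kw):
            out += kernel[s, t] * padded[s:s + image.shape[0], t:t + image.shape[1]]
    return out

# 3x3 box (averaging) kernel: uniform weights summing to 1
box = np.full((3, 3), 1.0 / 9.0)
img = np.arange(25, dtype=float).reshape(5, 5)
blurred = filter2d(img, box, mode="zero")
```

At an interior pixel the output is simply the mean of the 3×3 neighborhood; only near the borders do the padding strategies produce different results.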
Linear vs. Non-Linear Filters
Linear filters satisfy superposition: filtering the sum of two images equals the sum of filtering each separately. The Gaussian filter uses bell-curve weights controlled by σ (standard deviation) — larger σ produces broader, stronger smoothing. The average (box) filter uses uniform weights — simple but creates frequency-domain ringing artifacts absent in the Gaussian. The median filter (covered in its own entry) ranks neighborhood pixels and selects the middle value — a non-linear operation that is superior for removing salt-and-pepper noise while preserving edges, described by Solomon & Breckon as one of the "workhorse operators" of image processing. Anisotropic diffusion (Perona-Malik) is iterative and non-linear: the diffusion coefficient decreases near strong gradients, allowing smoothing within regions while preserving edges — conceptually, diffusion "stops at boundaries."
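The superposition distinction can be checked numerically with two tiny 1D filters — a linear 3-tap box and a non-linear 3-tap median. The helper names (`box3`, `median3`) and the test signals are illustrative assumptions, chosen so the same experiment also shows the median removing an impulse while preserving a step edge.

```python
import numpy as np

def box3(x):
    """Linear 3-tap box filter on a zero-padded 1D signal."""
    p = np.pad(x.astype(float), 1)
    return (p[:-2] + p[1:-1] + p[2:]) / 3.0

def median3(x):
    """Non-linear 3-tap median filter on a zero-padded 1D signal."""
    p = np.pad(x.astype(float), 1)
    return np.median(np.stack([p[:-2], p[1:-1], p[2:]]), axis=0)

a = np.array([0, 0, 9, 0, 0])   # an impulse ("salt" noise) on a flat signal
b = np.array([1, 1, 1, 5, 5])   # a step edge

# Superposition holds for the linear filter...
lin_sum   = box3(a + b)
lin_parts = box3(a) + box3(b)

# ...but fails for the median filter
med_sum   = median3(a + b)
med_parts = median3(a) + median3(b)
```

As a bonus, `median3(a)` is identically zero (the impulse is removed outright) and `median3(b)` reproduces the step exactly — the edge-preserving behavior the text describes.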
Separability & Computational Cost
A 2D kernel is separable if it can be expressed as the outer product of two 1D vectors. The Gaussian kernel is separable: a k×k 2D convolution decomposes into a k×1 vertical pass followed by a 1×k horizontal pass. This reduces the cost per pixel from k² to 2k multiply-adds — from O(k²N) to O(kN) operations overall, where N is the number of pixels — a significant speedup for large kernels. The box filter is also separable, and constant-time algorithms exist that compute box-filter output independent of kernel size using integral images (summed-area tables).
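Separability is easy to verify empirically: build a 2D Gaussian as the outer product of a sampled 1D Gaussian, then confirm that one dense k×k pass and two 1D passes produce the same image. This is a self-contained sketch; `gaussian_1d`, `correlate2d`, and the chosen σ and image size are assumptions for the demonstration.

```python
import numpy as np

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def correlate2d(image, kernel):
    """Dense 2D sliding-window filter with zero padding (reference version)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for s in range(kh):
        for t in range(kw):
            out += kernel[s, t] * p[s:s + image.shape[0], t:t + image.shape[1]]
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32))

g = gaussian_1d(sigma=1.5, radius=3)   # 7-tap 1D Gaussian
k2d = np.outer(g, g)                   # separable 7x7 kernel: outer product

full = correlate2d(img, k2d)                                       # one 7x7 pass
two_pass = correlate2d(correlate2d(img, g[:, None]), g[None, :])   # 7x1 then 1x7
```

The single 2D pass costs 49 multiply-adds per pixel; the two 1D passes cost 14 — the O(k²N) vs O(kN) gap from the text.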
Frequency Domain Interpretation
Every spatial filter has a frequency-domain equivalent. Smoothing filters are low-pass — they attenuate high-frequency detail (noise, edges) while preserving low-frequency structure. Sharpening filters (Sharp, Unsharp) are high-pass or high-boost — they amplify edges and fine detail. The Laplacian is a second-derivative operator that responds to all rapid intensity changes. The convolution theorem states that spatial convolution is equivalent to multiplication in the frequency domain: filtering in the spatial domain produces the same result as multiplying the Fourier transforms of the image and kernel, then transforming back (Gonzalez & Woods, §4.7). This equivalence is the theoretical foundation for understanding filter behavior.
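The convolution theorem can likewise be demonstrated numerically: circular (wrap-around) convolution of an image with a kernel equals the inverse Fourier transform of the product of their transforms. The kernel choice and image size below are assumptions; the wrap-around border is what makes the spatial and FFT routes agree exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((16, 16))

# A 3x3 box filter, zero-padded to the image size so both FFTs match in shape
kernel = np.zeros((16, 16))
kernel[:3, :3] = 1.0 / 9.0

# Spatial route: circular convolution, built from wrap-around shifts of the image
spatial = np.zeros_like(img)
for s in range(3):
    for t in range(3):
        spatial += kernel[s, t] * np.roll(np.roll(img, s, axis=0), t, axis=1)

# Frequency route: multiply the Fourier transforms, then transform back
freq = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
```

The two results agree to floating-point precision, which is exactly the equivalence the convolution theorem asserts (Gonzalez & Woods, §4.7).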
Every filter works by sliding a small window (kernel) across the image and computing a new value from the neighborhood. Linear filters compute weighted averages — Gaussian uses bell-curve weights for smooth blurring, Average uses uniform weights for simple blurring, Laplace uses derivative weights for edge detection. Non-linear filters like Anisotropic diffusion adapt their behavior based on image content, smoothing flat regions while preserving edges. The Gaussian kernel's separability is key to efficiency: a single 2D pass can be split into two faster 1D passes.