To minimize computation, traditional approaches to calculating the distance transform (DT) on a discrete volume propagate distance values within a local neighborhood. This creates recursive dependencies across the volume, requiring the DT to be calculated for all points in the domain en masse and stored as static values in memory. The ability to calculate the distance transform point-wise, by contrast, offers not only the prospect of efficient memory usage and scalability, but also a high degree of flexibility in accommodating the unique requirements of new application domains. However, among current DT algorithms, the computationally intensive brute-force algorithm is the only one that allows point-wise computation. We demonstrate that, by decomposing it into a map and a reduction pattern on the massively parallel architecture of a modern Graphics Processing Unit (GPU), the brute-force distance transform algorithm achieves the threefold goal of memory efficiency, flexibility, and performance. We discuss a memory-constrained implementation in the CUDA parallel programming model. The flexibility of point-wise computation at runtime is demonstrated by presenting an approximate and an anisotropic variant of the standard distance transform algorithm, and by using these variants to render a CT scan image. Our approach allows the distance transform to be calculated for 1024 query points and up to 16 million feature points in 141.25 milliseconds while allowing direct control over the memory working-set size. These results demonstrate the potential of point-wise computation of the DT at runtime and the need for future algorithms to incorporate this capability.
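
The map and reduction decomposition of the brute-force algorithm can be illustrated with a minimal CPU sketch in Python. This is not the CUDA implementation discussed in the paper. The names `squared_distance`, `pointwise_dt`, `chunk_size`, and `weights` are hypothetical, introduced only for illustration. The sketch assumes a Euclidean metric, uses a fixed-size chunk of feature points as a stand-in for a bounded memory working set, and uses per-axis weights as one possible reading of an anisotropic variant.

```python
import math

def squared_distance(query, feature, weights=(1.0, 1.0, 1.0)):
    """Map step: weighted squared Euclidean distance between two 3D points.

    `weights` is a hypothetical per-axis scaling; all ones gives the
    standard isotropic distance.
    """
    return sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, feature))

def pointwise_dt(query, features, chunk_size=4, weights=(1.0, 1.0, 1.0)):
    """Brute-force distance from one query point to its nearest feature point.

    Feature points are visited in chunks of `chunk_size`, so only one chunk
    of distances exists at a time (an assumed stand-in for a controllable
    working-set size). Returns math.inf if there are no feature points.
    """
    best = math.inf
    for start in range(0, len(features), chunk_size):
        chunk = features[start:start + chunk_size]
        # Map: one independent distance evaluation per feature point.
        distances = [squared_distance(query, f, weights) for f in chunk]
        # Reduce: fold the chunk's distances into the running minimum.
        best = min(best, min(distances))
    return math.sqrt(best)

if __name__ == "__main__":
    feature_points = [(0, 0, 0), (3, 4, 0), (10, 10, 10)]
    print(pointwise_dt((0, 3, 4), feature_points))
```

Because each query point is evaluated independently of every other, the result for any single point can be computed on demand rather than stored for the whole volume. On a GPU, the map step would typically be spread across threads and the minimum computed with a parallel reduction.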