In this paper we describe a simple parallelization of the ZSWEEP algorithm for rendering unstructured volumetric grids on distributed shared-memory machines, and study its performance on three generations of SGI multiprocessors, including the new Origin 3000 series. The main idea of the ZSWEEP algorithm is simple: a plane parallel to the viewing plane sweeps through the data in order of increasing z, and as the sweep plane encounters each vertex, the faces of the cells incident to that vertex are projected. Our parallel extension of the basic algorithm uses an image-based task partitioning scheme. The screen is divided into more tiles than there are processors, and each processor independently performs the sweep on the next available tile until no tiles remain to be rendered. We detail the modifications needed to extend the sequential algorithm efficiently to shared-memory machines. We report on the performance of our implementation and show that the tile-based ZSWEEP is naturally cache friendly and achieves fast rendering times and substantial speedups on all the machines we used for testing. On our Origin 3000, we measure an L2 data cache hit rate of over 99% for the tile-based ZSWEEP on one processor, a parallel efficiency of 83% on 16 processors, and rendering rates of about 300 thousand tetrahedra per second for a 1024 × 1024 image.
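The image-based task partitioning described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `make_tiles`, `render_tile`, and `render_parallel` are hypothetical, the tile size is an arbitrary parameter, and `render_tile` is only a placeholder for the per-tile sweep (it marks pixels instead of projecting cell faces). The sketch shows only the scheduling idea, in which workers repeatedly take the next available tile from a shared queue until none remain.

```python
import queue
import threading


def make_tiles(width, height, tile_size):
    """Split a width x height screen into half-open tiles (x0, y0, x1, y1)."""
    tiles = []
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            x1 = min(x0 + tile_size, width)
            y1 = min(y0 + tile_size, height)
            tiles.append((x0, y0, x1, y1))
    return tiles


def render_tile(tile, image):
    """Hypothetical stand-in for the per-tile sweep: marks each pixel once."""
    x0, y0, x1, y1 = tile
    for y in range(y0, y1):
        row = image[y]
        for x in range(x0, x1):
            row[x] += 1


def render_parallel(width, height, tile_size, num_workers):
    """Render all tiles, assigning them to workers dynamically."""
    image = [[0] * width for _ in range(height)]

    # Shared pool of work: one entry per tile.
    work = queue.Queue()
    for tile in make_tiles(width, height, tile_size):
        work.put(tile)

    def worker():
        # Each worker grabs the next available tile until the pool is empty.
        while True:
            try:
                tile = work.get_nowait()
            except queue.Empty:
                return
            # Tiles are disjoint, so workers never write to the same pixel.
            render_tile(tile, image)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image


if __name__ == "__main__":
    img = render_parallel(width=64, height=48, tile_size=16, num_workers=4)
    print(all(pixel == 1 for row in img for pixel in row))
```

Using more tiles than workers lets a worker that finishes a cheap tile pick up another one immediately, which is the load-balancing effect the partitioning scheme relies on. In CPython the threads here do not run the pixel loops in parallel, so the sketch demonstrates the tile assignment logic rather than any speedup.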