A general strategy is discussed for automatically decomposing a functional program and dynamically distributing it for parallel execution on multiprocessor architectures. The strategy borrows ideas from dataflow and reduction machine research on the one hand, and from conventional compiler technology for sequential machines on the other. One of the troublesome issues in such a system is choosing the right granularity for the parallel tasks. A program-transformation technique based on serial combinators is described. It offers, in some sense, just the right granularity for this style of computing and can be fine-tuned for particular multiprocessor architectures. Simulation results support the validity of the approach.