An n-thread parallel program P is large-grained if, in every parallel step, the computation on each thread is a complex procedure requiring numerous processor instructions. This practically relevant style of program differs from PRAM programs in its large granularity and in the possibility that, within a parallel step, the computations on different threads may vary considerably in size. Let M be an n-processor asynchronous parallel system, with no restriction on the degree of asynchrony and without any specialized synchronization mechanisms. Ensuring correct execution of P on such a machine is a challenging theoretical problem of practical importance. Suppose P requires total work W when executed on a synchronous n-processor parallel system. We present a transformation (compilation) of P into a program C(P) that correctly and efficiently carries out the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(W log* n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods, including methods for storing program and control variables that combine error-correcting codes with phase-dependent hashing into memory. We feel that these storage methods will have significant practical applications to the storage of data on disk arrays (RAID), as well as additional theoretical implications. The present work is directly relevant to parallel data-processing programs and large-scale parallel numerical computations.
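
To give a sense of how mild the log* n factor in the work bound is, the following minimal sketch computes the iterated logarithm, assuming the usual base-2 definition (the number of times log2 must be applied before the value drops to at most 1). The function name `log_star` is ours, chosen for illustration only; it is not part of the construction described above.

```python
import math


def log_star(n: float) -> int:
    """Iterated base-2 logarithm (assumed definition):
    the number of times log2 is applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


# Print log* n for a few processor counts, from tiny to far beyond realistic.
for n in (2, 16, 65536, 2**64):
    print(n, log_star(n))
```

Under this definition log* n does not exceed 5 for any n up to 2^65536, so for every realistic processor count the O(W log* n) expected work is within a small factor of the synchronous work W, up to the constant hidden in the O-notation.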