In this paper, we describe an approach to detecting and tracking certain feature points in the mouth region of a talking-head sequence. These feature points are interconnected in a polygonal mesh, so that their detection and tracking rely on information not only at the points themselves but also in the surrounding elements. The nodes are detected in an initial frame by a feature detection algorithm. They are then tracked in successive frames by deforming the mesh so that, when one mesh is warped to the other, the image patterns over corresponding elements of the two meshes match. This is accomplished by a modified Newton algorithm that iteratively minimizes the error between the two images after mesh-based warping. The numerical calculation involved in the optimization is simplified by using the concepts of master elements and shape functions from the finite element method. The algorithm has been applied to a SIF-resolution sequence that contains fairly rapid mouth movement. Our simulation results show that the algorithm can locate and track the feature points in the mouth region quite accurately.
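
As a rough illustration of the master-element idea mentioned above, the following sketch assumes quadrilateral mesh elements with standard bilinear shape functions; the abstract does not specify the element type, so this choice, the function names, and the node coordinates are all hypothetical. It shows how a single point in the master element maps to corresponding image positions in two frames, which is the correspondence a mesh-based warp relies on.

```python
def shape_functions(xi, eta):
    """Bilinear shape functions of a 4-node master element on [-1, 1] x [-1, 1].

    Assumed node order: (-1,-1), (1,-1), (1,1), (-1,1).
    """
    return [
        0.25 * (1 - xi) * (1 - eta),
        0.25 * (1 + xi) * (1 - eta),
        0.25 * (1 + xi) * (1 + eta),
        0.25 * (1 - xi) * (1 + eta),
    ]


def map_to_image(xi, eta, nodes):
    """Map master-element coordinates (xi, eta) to image coordinates,
    given the four node positions of one quadrilateral element."""
    weights = shape_functions(xi, eta)
    x = sum(w * px for w, (px, _) in zip(weights, nodes))
    y = sum(w * py for w, (_, py) in zip(weights, nodes))
    return x, y


# Hypothetical node positions of the same element in two frames.
nodes_prev = [(10.0, 20.0), (30.0, 20.0), (30.0, 40.0), (10.0, 40.0)]
nodes_curr = [(11.0, 21.0), (32.0, 19.0), (31.0, 42.0), (9.0, 41.0)]

# The same master-element point yields corresponding pixels in both frames.
p_prev = map_to_image(0.0, 0.0, nodes_prev)
p_curr = map_to_image(0.0, 0.0, nodes_curr)
```

Because every element is expressed through the same master element, the matching error and its derivatives with respect to the node positions can be evaluated over one fixed domain, regardless of how each element is deformed in the image.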