Accurately modeling as-built environments and tracking the poses of moving objects are critical for many architecture, engineering, construction, and facility management (AECFM) automation applications. Equally important for broad deployment in unstructured, dynamic, and sometimes featureless AECFM sites are the reliability, operating range, and cost efficiency of such solutions. In this paper, a flexible vision-based technique is developed for accurate, robust, low-cost, and scalable pose estimation and as-built modeling in AECFM applications. The technique combines marker-based pose estimation with structure-from-motion (SfM). In the preparation phase, a sparse set of visual markers is installed in the target environment. During the operation phase, a set of unordered images is captured with a calibrated RGB camera. These images are immediately processed by an SfM system to estimate the markers' poses and generate a sparse point cloud, which robots or other mobile clients can use either to estimate the poses of moving objects or to perform dimensional analysis of the environment. Furthermore, for as-built modeling, the RGB camera is replaced by an RGB-D camera to create both a dense 3D point cloud and a concise planar model of the environment. Experiments demonstrate that the proposed technique achieves sufficient accuracy, with an average absolute error within 5 mm over a 9 m scale.
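The marker-based pose-estimation step described above can be illustrated as a rigid-transform fit between a marker's known corner coordinates in its own frame and their reconstructed positions in the environment frame. The sketch below uses the Kabsch algorithm (a standard least-squares rotation solver assumed here for illustration; the paper's actual solver may differ), with the function name, marker size, and synthetic transform all hypothetical.

```python
import numpy as np

def estimate_rigid_pose(src, dst):
    """Estimate rotation R and translation t such that dst ≈ R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, e.g. a marker's
    corner coordinates in the marker frame and in the environment frame.
    Uses the Kabsch algorithm (SVD of the cross-covariance matrix).
    """
    src_c = src - src.mean(axis=0)           # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Synthetic check: four corners of a 10 cm square marker, rotated 30° about z
# and translated, then recovered from the correspondences alone.
corners = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                    [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]])
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 0.5])
observed = corners @ R_true.T + t_true
R_est, t_est = estimate_rigid_pose(corners, observed)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```

In a full pipeline, the `observed` corners would come from the SfM triangulation of the detected marker corners rather than from a synthetic transform, and the recovered pose would then be reported to the mobile client.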