Typical recommender systems use the root mean squared error (RMSE) between the predicted and actual ratings as the evaluation metric. We argue that RMSE is not an optimal choice for this task, especially when we will only recommend a few (top) items to any user. Instead, we propose using a ranking metric, namely normalized discounted cumulative gain (NDCG), as a better evaluation metric for this task. Borrowing ideas from the learning to rank community for web search, we propose novel models which approximately optimize NDCG for the recommendation task. Our models are essentially variations on matrix factorization models where we also additionally learn the features associated with the users and the items for the ranking task. Experimental results on a number of standard collaborative filtering data sets validate our claims. The results also show the accuracy and efficiency of our models and the benefits of learning features for ranking.