Abstract
Understanding and creating mathematics using natural mathematical language – the mixture of symbolic and natural language used by humans – is a challenging and important problem for driving progress in machine learning. As a step in this direction, we develop NATURALPROOFS, a multi-domain corpus of mathematical statements and their proofs, written in natural mathematical language. NATURALPROOFS unifies broad coverage, deep coverage, and low-resource mathematical sources, allowing for evaluating both in-distribution and zero-shot generalization. Using NATURALPROOFS, we benchmark strong neural methods on mathematical reference retrieval and generation tasks which test a system’s ability to determine key results that appear in a proof. Large-scale sequence models show promise compared to classical information retrieval methods, yet their performance and out-of-domain generalization leave substantial room for improvement. NATURALPROOFS opens many avenues for research on challenging mathematical tasks.
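The reference retrieval task the abstract describes — ranking a corpus of mathematical statements by how likely each is to be cited in a given theorem's proof, and scoring the ranking against the references that actually appear — can be sketched with a toy example. Everything below (the corpus, the reference IDs, the bag-of-words scorer standing in for BM25 or a neural ranker, and the gold labels) is illustrative and not drawn from NATURALPROOFS itself.

```python
from collections import Counter

def overlap_score(query_tokens, ref_tokens):
    """Bag-of-words overlap: a minimal stand-in for BM25 or a neural ranker."""
    q, r = Counter(query_tokens), Counter(ref_tokens)
    return sum(min(q[t], r[t]) for t in q)

def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold references retrieved within the top k results."""
    return len(set(ranked_ids[:k]) & set(gold_ids)) / len(gold_ids)

# Hypothetical corpus: reference ID -> tokenized statement.
references = {
    "thm:pythagoras": "in right triangles the hypotenuse squared equals "
                      "the sum of the squared legs".split(),
    "def:prime": "a prime number has exactly two divisors".split(),
    "thm:euclid": "there are infinitely many prime numbers".split(),
}

# Query theorem whose proof (in this toy setup) cites def:prime and thm:euclid.
query = "every integer greater than one is divisible by some prime number".split()
gold = ["def:prime", "thm:euclid"]

ranked = sorted(references, key=lambda rid: -overlap_score(query, references[rid]))
print(ranked, recall_at_k(ranked, gold, k=2))
```

A real evaluation would swap the overlap scorer for a trained retriever and average recall@k over thousands of theorem–proof pairs, but the ranking-and-scoring skeleton is the same.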
| Original language | English (US) |
| --- | --- |
| Journal | Advances in Neural Information Processing Systems |
| State | Published - 2021 |
| Event | 35th Conference on Neural Information Processing Systems - Track on Datasets and Benchmarks, NeurIPS Datasets and Benchmarks 2021 - Virtual, Online |
| Duration | Dec 6 2021 → Dec 14 2021 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing