Millimeter-wave (mmWave) bands have the potential to enable very high data rates in wireless systems. To overcome the intense path loss and severe shadowing in these bands, it is essential to employ directional beams for data transmission. Furthermore, the mmWave channel is known to comprise only a few spatial clusters, so additional time is needed to align the beams with the channel prior to data transmission. This procedure is known as beam training (BT). While a longer BT leads to more directional beams (equivalently, higher beamforming gains), it leaves less time for data communication. In this paper, this trade-off is investigated for a time-slotted system under practical constraints such as finite beamwidth resolution and discrete modulation and coding schemes. At each BT time slot, the access point (AP) scans a region of uncertainty by transmitting a probing packet and refines the angle-of-arrival (AoA) estimate based on user equipment (UE) feedback. Given a total number of time slots, the objective is to find the optimal allocation of slots between BT and data transmission, together with a feasible beamwidth for AoA estimation at each BT time slot, such that the expected throughput is maximized. It is shown that the problem satisfies the optimal substructure property, which enables a backward dynamic programming approach to find the optimal solution with polynomial computational complexity. Simulation results reveal that, in practical scenarios, the proposed approach outperforms existing techniques such as exhaustive search and bisection search.
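To illustrate the kind of backward recursion described above, the following is a minimal sketch under a simplified toy model, not the formulation developed in the paper. It assumes that the angular domain is divided into `n_sectors` sectors of minimum beamwidth, that the AoA is uniformly distributed over the current uncertainty region, that the UE feedback is a single error-free bit indicating whether the probing beam covered the AoA, and that the beamforming gain is inversely proportional to the beamwidth. The names `MCS_TABLE`, `rate`, and `plan`, as well as all numerical values, are hypothetical.

```python
# Hypothetical discrete MCS table: (minimum linear SNR, rate in bits per slot).
MCS_TABLE = [(1.0, 1.0), (4.0, 2.0), (16.0, 4.0), (64.0, 6.0)]


def rate(u, n_sectors, snr_omni=1.0):
    """Rate per slot when the data beam spans u out of n_sectors sectors.

    Assumption: idealized gain n_sectors / u, so a narrower beam gives higher SNR.
    """
    snr = snr_omni * n_sectors / u
    best = 0.0
    for threshold, bits in MCS_TABLE:
        if snr >= threshold:
            best = bits  # table is sorted, so the last match is the highest rate
    return best


def plan(total_slots, n_sectors):
    """Backward dynamic program over (slot index t, uncertainty width u).

    value[t][u]  : best expected throughput from slot t onward.
    action[t][u] : 0 means "start data transmission now";
                   b > 0 means "probe with a beam of b sectors in slot t".
    """
    value = [[0.0] * (n_sectors + 1) for _ in range(total_slots + 1)]
    action = [[0] * (n_sectors + 1) for _ in range(total_slots + 1)]

    # value[total_slots][u] stays 0: no slots remain for data.
    for t in range(total_slots - 1, -1, -1):
        for u in range(1, n_sectors + 1):
            # Option 1: stop training and use all remaining slots for data.
            best = (total_slots - t) * rate(u, n_sectors)
            best_b = 0
            # Option 2: spend slot t probing b sectors of the uncertainty region.
            for b in range(1, u):
                p_hit = b / u  # uniform AoA assumption
                expected = (p_hit * value[t + 1][b]
                            + (1.0 - p_hit) * value[t + 1][u - b])
                if expected > best:
                    best, best_b = expected, b
            value[t][u] = best
            action[t][u] = best_b
    return value, action


if __name__ == "__main__":
    value, action = plan(total_slots=10, n_sectors=16)
    print("expected throughput:", value[0][16])
    print("first action (0 = transmit, b = probe b sectors):", action[0][16])
```

In this toy model the table has `total_slots * n_sectors` states and at most `n_sectors` candidate beamwidths per state, so the recursion runs in O(T N^2) time, which is consistent with the polynomial complexity claimed above. Tracking only the width of the uncertainty region, rather than its shape, is a further simplification of the sketch.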