Various cosmological observations consist of prolonged integrations over small patches of sky. These include searches for B-modes in the CMB, measurements of the power spectrum of 21-cm fluctuations during the epoch of reionization, and deep-field imaging by telescopes such as HST/JWST, among others. Because these measurements are hindered by spatially varying foreground noise, their sensitivity can be improved considerably by locating the region of sky with the least foreground contamination. The best strategy thus involves a tradeoff between exploration (to find lower-foreground patches) and exploitation (through prolonged integration). But how should this tradeoff be balanced efficiently? The problem is akin to the multi-armed bandit (MAB) problem in probability theory, in which a gambler faces a series of slot machines with unknown winning odds and must devise a strategy that maximizes his/her winnings over a finite number of pulls. While the optimal MAB strategy remains to be determined, a number of machine-learning algorithms have been developed to maximize the winnings. By constructing adaptive survey strategies based on heuristic solutions to the MAB problem, we will demonstrate that ground-based B-mode experiments can substantially tighten the upper bound on the tensor-to-scalar ratio. Applications of MAB strategies to other observations are the focus of current work, and some related issues will be discussed.
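To make the exploration/exploitation tradeoff concrete, the following is a minimal sketch of UCB1, one standard MAB heuristic. It is not necessarily the strategy used in this work. Each "arm" is a sky patch and each "pull" is one block of integration time. The function name `ucb1_survey`, the per-patch probabilities `clean_probs`, and the binary "clean measurement" reward are hypothetical simplifications chosen for illustration; a real survey would use a foreground-dependent sensitivity metric instead.

```python
import math
import random
from typing import List, Sequence


def ucb1_survey(clean_probs: Sequence[float], n_pulls: int, seed: int = 0) -> List[int]:
    """Allocate n_pulls integration blocks among sky patches with UCB1.

    clean_probs[i] is a HYPOTHETICAL probability that one block on patch i
    yields a 'clean' (reward 1) measurement. It is hidden from the algorithm
    and used only to simulate rewards. Returns the blocks spent per patch.
    """
    rng = random.Random(seed)
    k = len(clean_probs)
    counts = [0] * k      # integration blocks spent on each patch
    totals = [0.0] * k    # summed rewards observed on each patch

    for t in range(1, n_pulls + 1):
        if t <= k:
            arm = t - 1   # exploration phase: observe every patch once
        else:
            plays_so_far = t - 1
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as a patch accumulates integration time
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[i]),
            )
        reward = 1.0 if rng.random() < clean_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    # three hypothetical patches; the third is the cleanest
    print(ucb1_survey([0.2, 0.5, 0.8], n_pulls=5000))
```

After the initial pass over all patches, the algorithm concentrates most integration time on the cleanest patch while still revisiting the others at a rate that decays logarithmically, which is the balance between exploration and exploitation described above.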