Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Publikationen: Beitrag in FachzeitschriftArtikelForschung(peer-reviewed)

Externe Organisationseinheiten

  • Université Paris-Saclay

Abstract

This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.

Details

OriginalspracheEnglisch
Seitenumfang19
FachzeitschriftTransactions on machine learning research
Jahrgang2023
AusgabenummerAugust
StatusVeröffentlicht - 7 Juni 2023