Regret Bounds for Satisficing in Multi-Armed Bandit Problems
Publications: Contribution to journal › Article › Research › (peer-reviewed)
Authors
Organisational units
External organisational units
- Université Paris-Saclay
Abstract
This paper considers the objective of *satisficing* in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis both for the realizable case, when such a satisficing arm exists, and for the general case, when it may not. Introducing the notion of *satisficing regret*, our main result shows that in the general case it is possible to obtain constant satisficing regret when a satisficing arm exists (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.
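The satisficing objective described in the abstract can be illustrated with a generic sketch: a UCB-style learner that commits to any arm whose lower confidence bound exceeds the satisfaction level. This is not the paper's algorithm; all names (`satisficing_bandit`, the confidence radius, the commit rule) are illustrative assumptions chosen to make the idea concrete.

```python
import math
import random


def satisficing_bandit(pull, n_arms, s, horizon, seed=0):
    """Illustrative sketch of a satisficing strategy (not the paper's method).

    Plays a UCB-style index until some arm's lower confidence bound
    exceeds the satisfaction level s, then commits to that arm forever.
    `pull(a)` returns a reward in [0, 1] for arm a.
    """
    rng = random.Random(seed)  # reserved for tie-breaking / stochastic pulls
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    committed = None
    for t in range(1, horizon + 1):
        if committed is not None:
            a = committed  # satisficed: keep playing the committed arm
        elif 0 in counts:
            a = counts.index(0)  # play each arm once first
        else:
            # standard UCB1 index as an assumed exploration rule
            a = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = pull(a)
        counts[a] += 1
        sums[a] += r
        if committed is None:
            # commit once the arm is provably above the satisfaction level
            lcb = sums[a] / counts[a] - math.sqrt(
                2 * math.log(horizon) / counts[a]
            )
            if lcb >= s:
                committed = a
    return committed, counts
```

With deterministic arms of value 0.2 and 0.9 and satisfaction level 0.5, the sketch commits to the second arm once its confidence interval clears 0.5, after which no further exploration cost is incurred; this is the intuition behind constant satisficing regret when a satisficing arm exists.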
Details
Original language | English |
---|---|
Number of pages | 19 |
Journal | Transactions on Machine Learning Research |
Volume | 2023 |
Issue number | August |
Status | Published - 7 June 2023 |