Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Thomas Michel; Hossein Hajiabolhassan; Ronald Ortner

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Publikationen: Beitrag in Fachzeitschrift › Artikel › Forschung › (peer-reviewed)

Autoren

Organisationseinheiten

Lehrstuhl für Informationstechnologie (150)

Externe Organisationseinheiten

Université Paris-Saclay

Abstract

This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.

Details

Originalsprache	Englisch
Seitenumfang	19
Fachzeitschrift	Transactions on machine learning research
Jahrgang	2023
Ausgabenummer	August
Status	Veröffentlicht - 7 Juni 2023

Forschungsportal

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Autoren

Organisationseinheiten

Externe Organisationseinheiten

Abstract

Details

Links

360 Link

Forschungsportal

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Autoren

Organisationseinheiten

Externe Organisationseinheiten

Abstract

Details

Links

360 Link

Vom selben Autor

Online Regret Bounds for Satisficing in MDPs

Online Regret Bounds for Satisficing in MDPs

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Adaptive Algorithms for Meta-Induction

FunQG: Molecular Representation Learning Via Quotient Graphs