Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

Abstract:

We study best-of-both-worlds algorithms for K-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate (dK)2polyln(dKT ) / ∆min where ∆min is the minimum suboptimality gap over the d-dimensional context space. In the adversarial regime, we obtain either the first-order O(dK√L∗) bound, or the second- order O(dK√Λ∗) bound, where L∗ is the cumulative loss of the best action and Λ∗ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with Shannon entropy regularizer that does not require the knowledge of the inverse of the covariance matrix, and achieves a polylogarithmic regeret in the stochastic regime while obtaining O(dK √T) regret bounds in the adversarial regime.

Type of Publication:Conference paper

Title of Journal: Aritifical Intelligence and Statistics Conference (AISTATS), October 2024

Authors: Kuroki, Yuko; Rumi, Alberto; Tsuchiya, Taira; Vitale, Fabio; Cesa-Bianchi, Nicolò

On the Minimax Regret for Online Learning with Feedback Graphs

Abstract: In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is O√αT ln K, where K is the number of actions, α is the independence number of the graph, and T is the time horizon. The √ln K factor is known to be necessary when α = 1 (the experts case). On the other hand, when α = K (the bandits case), the minimax rate is known to be Θ√K T , and a lower bound Ω√αT is known to hold for any α. Our improved upper bound OpαT (1 + ln(K/α)) holds for any α and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with q-Tsallis entropy for a carefully chosen value of q ∈ [1/2, 1) that varies with α. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time- varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved ΩpαT (ln K )/(ln α) lower bound for all α > 1, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as α < K.

 Type of Publication:Conference paper

Title of Conference: Neural Information Processing Systems (NEURIPS), 2023

Authors: Eldowa, Khaled; Esposito, Emmanuel; Cesari, Tommaso; Cesa-Bianchi, Nicolò

Sum-max submodular bandits

Abstract: Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-K-bandits, combinatorial bandits, and the bandit versions on M-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove (1-1/e)-regret bounds for bandit feedback in the nonstochastic setting of the order of sqrt(MKT) (ignoring log factors), where T is the time horizon and M is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the O(T^2/3) regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.

Type of Publication: Conference Proceeding

Title of Conference: Neural Information Processing Systems (NEURIPS), December 2024

Authors: Pasteris, Stephen; Rumi, Alberto; Vitale, Fabio; Cesa-Bianchi, Nicolò

Sparsity-agnostic linear bandits with adaptive adversaries.

Abstract: We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds, that depend on the number S of non-zero coefficients in the linear reward function. Previous works focused on the case where S is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when S is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When S is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.

 

Type of Publication: Conference paper

Title of Conference:Neural Information Processing Systems (NEURIPS), 2024

Authors: Tianyuan, Jin; Jang, Kyoungseok; Cesa-Bianchi, Nicolò

The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations

Abstract: We study the problem of regret minimization for a single bidderin a sequence of rst-price auctions where the bidder discoversthe item’s value only if the auction is won. Our main contributionis a complete characterization, up to logarithmic factors, of theminimax regret in terms of the auction’stransparency, which con-trols the amount of information on competing bids disclosed bythe auctioneer at the end of each auction. Our results hold underdierent assumptions (stochastic, adversarial, and their smoothedvariants) on the environment generating the bidder’s valuationsand competing bids. These minimax rates reveal how the interplaybetween transparency and the nature of the environment aectshow fast one can learn to bid optimally in rst-price auctions​.

Type of Publication: Conference Proceeding

Title of Conference: STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 225 – 236, 2024.

Authors: Cesa-Bianchi, Nicolò; Cesari, Tommaso; Colomboni, Roberto; Fusco, Federico; Leonardi, Stefano

Linear Bandits with Memory

Abstract: Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner’s past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size m ≥ 0, and an exponent γ that captures the rotting (γ < 0) or rising (γ > 0) nature of the phenomenon. When both m and γ are known, we propose and analyze a variant of OFUL which minimizes regret against cyclic policies. By choosing the cycle length so as to trade-off approximation and estimation errors, we then prove a bound of order √d (m + 1) 12 +max{γ,0} T 3/4 (ignoring log factors) on the regret against the optimal sequence of actions, where T is the horizon and d is the dimension of the linear action space. Through a bandit model selection approach, our results are then extended to the case where both m and γ are unknown. Finally, we complement our theoretical results with experiments comparing our approach to natural baselines.

Type of Publication: Conference Paper

Title of Conference: Transactions on Machine Learning Research, ISSN: 2835-8856, 2024.

Authors: Clerici, Giulia; Laforgue, Pierre; Cesa-Bianchi, Nicolò Antonio

FactCheckBureau: Build Your Own Fact-Check Analysis Pipeline

Abstract: Fact-checkers are overwhelmed by the volume of claims they need to pay attention to fight misinformation. Even once debunked, a claim may still be spread by people unaware that it is false, or it may be recycled as a source of inspiration by malicious users. Hence, the importance of fact-check (FC) retrieval as a research problem: given a claim and a database of previous checks, find the checks relevant to the claim. Existing solutions addressing this problem rely on the strategy of retrieve and re-rank relevant documents. We have built FactCheckBureau, an end-to-end solution that enables researchers to easily and interactively design and evaluate FC retrieval pipelines. We also present a corpus 1 we have built, which can be used in further research to test fact-check retrieval tools.

Type of Publication: Conference Paper

Title of Conference: 33rd ACM International Conference on Information and Knowledge Management (CIKM) , Boise, Idaho, US, 21-25 October

Authors: Balalau, Oana; Bertaud-Velten, Pablo; El Fraihi, Younes; Gaur, Garima; Goga, Oana; Guimaraes, Samuel; Manolescu, Ioana; Saadi, Brahim

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Abstract: The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves the linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency which are crucial aspects in global scale applications.

Type of Publication: Conference Paper

Title of Conference:European Conference on Computer Vision (ECCV) , Milan, Italy, September 29–October 4, 2024

Authors: Ziheng Chen; Yue Song; Yunmei Liu; Nicu Sebe