Scientific Publications | Page 1

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Abstract: Ultra-fine-grained visual categorization (Ultra-FGVC) aims at distinguishing highly similar sub-categories within fine-grained objects, such as different soybean cultivars. Compared to traditional fine-grained visual categorization, Ultra-FGVC encounters more hurdles due to the small inter-class and large intra-class variation. Given these challenges, relying on human annotation for Ultra-FGVC is impractical. To this end, our work introduces a novel task termed Ultra-Fine-Grained Novel Class Discovery (UFG-NCD), which leverages partially annotated data to identify new categories of unlabeled images for Ultra-FGVC. To tackle this problem, we devise a Region-Aligned Proxy Learning (RAPL) framework, which comprises a Channel-wise Region Alignment (CRA) module and a Semi-Supervised Proxy Learning (SemiPL) strategy. The CRA module is designed to extract and utilize discriminative features from local regions, facilitating knowledge transfer from labeled to unlabeled classes. Furthermore, SemiPL strengthens representation learning and knowledge transfer with proxy-guided supervised learning and proxy-guided contrastive learning. Such techniques leverage class distribution information in the embedding space, improving the mining of subtle differences between labeled and unlabeled ultra-fine-grained classes. Extensive experiments demonstrate that RAPL significantly outperforms baselines across various datasets, indicating its effectiveness in handling the challenges of UFG-NCD.

Code is available at https://github.com/SSDUT-Caiyq/UFG-NCD.

Type of Publication: conference paper

Title of Journal: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024

Authors: Liu, Yu; Cai, Yaqi; Jia, Qi; Qiu, Binglin; Wang, Weimin; Pu, Nan

Riemannian Multinomial Logistics Regression for SPD Neural Networks

Abstract: Deep neural networks for learning Symmetric Positive Definite (SPD) matrices are gaining increasing attention in machine learning. Despite the significant progress, most existing SPD networks use traditional Euclidean classifiers on an approximated space rather than intrinsic classifiers that accurately capture the geometry of SPD manifolds. Inspired by Hyperbolic Neural Networks (HNNs), we propose Riemannian Multinomial Logistics Regression (RMLR) for the classification layers in SPD networks. We introduce a unified framework for building Riemannian classifiers under the metrics pulled back from the Euclidean space, and showcase our framework under the parameterized Log-Euclidean Metric (LEM) and Log-Cholesky Metric (LCM).
Besides, our framework offers a novel intrinsic explanation for the most popular LogEig classifier in existing SPD networks. The effectiveness of our method is demonstrated in three applications: radar recognition, human action recognition, and electroencephalography (EEG) classification.
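
As a rough illustration of the LogEig idea mentioned in this abstract, the sketch below maps an SPD matrix to its matrix logarithm through an eigendecomposition and computes the standard (unparameterized) Log-Euclidean distance between two SPD matrices. It is not taken from the authors' repository: the function names `spd_log` and `log_euclidean_distance` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive definite matrix (hypothetical helper)."""
    w, V = np.linalg.eigh(S)          # S = V diag(w) V^T with w > 0
    return (V * np.log(w)) @ V.T      # V diag(log w) V^T

def log_euclidean_distance(A, B):
    """Frobenius norm of the difference of matrix logarithms."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

# Toy check: log(I) = 0 and log(e * I) = I, so the distance is ||I||_F = sqrt(2).
I2 = np.eye(2)
E2 = np.e * np.eye(2)
d = log_euclidean_distance(I2, E2)
```

A LogEig-style classifier would flatten `spd_log(S)` and feed it to an ordinary linear layer; the abstract's point is that this step can be given an intrinsic Riemannian reading.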

The code is available at https://github.com/GitZH-Chen/SPDMLR.git.

Type of Publication: conference paper

Title of Journal: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024

Authors: Chen, Ziheng; Song, Yue; Liu, Gaowen; Rao Kompella, Ramana; Wu, Xiao-Jun; Sebe, Nicu

OpenBias: Open-set Bias Detection in Generative Models

Abstract: Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set.
OpenBias has three stages. In the first phase, we leverage a Large Language Model (LLM) to propose biases given a set of captions. Secondly, the target generative model produces images using the same set of captions. Lastly, a Vision Question Answering model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL emphasizing new biases, never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.

Type of Publication: conference paper

Title of Journal: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024

Authors: D’Inca, Moreno; Peruzzo, Elia; Mancini, Massimiliano; Xu, Dejia; Goel, Vidit; Xu, Xingqian; Wang, Zhangyang; Shi, Humphrey; Sebe, Nicu

SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

Abstract: Owing to the power of vision-language foundation models, e.g., CLIP, the area of image synthesis has seen recent important advances. Particularly, for style transfer, CLIP enables transferring more general and abstract styles without collecting the style images in advance, as the style can be efficiently described with natural language, and the result is optimized by minimizing the CLIP similarity between the text description and the stylized image. However, directly using CLIP to guide style transfer leads to undesirable artifacts (mainly written words and unrelated visual entities) spread over the image. In this paper, we propose SpectralCLIP, which is based on a spectral representation of the CLIP embedding sequence, where most of the common artifacts occupy specific frequencies. By masking the band including these frequencies, we can condition the generation process to adhere to the target style properties (e.g., color, texture, paint stroke, etc.) while excluding the generation of larger-scale structures corresponding to the artifacts. Experimental results show that SpectralCLIP prevents the generation of artifacts effectively in quantitative and qualitative terms, without impairing the stylisation quality. We also apply SpectralCLIP to text-conditioned image generation and show that it prevents written words in the generated images.

Our code is available at https://github.com/zipengxuc/SpectralCLIP.

Type of Publication: publication

Title of Journal: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

Authors: Xu, Zipeng; Xing, Songlong; Sangineto, Enver; Sebe, Nicu

Multifidelity Gaussian Process Emulation for Atmospheric Radiative Transfer Models

Abstract: Atmospheric radiative transfer models (RTMs) are widely used in satellite data processing to correct for the scattering and absorption effects caused by aerosols and gas molecules in the Earth’s atmosphere. As the complexity of RTMs grows and the requirements for future Earth Observation missions become more demanding, the conventional lookup-table (LUT) interpolation approach faces important challenges. Emulators have been suggested as an alternative to LUT interpolation, but they are still too slow for operational satellite data processing. Our research introduces a solution that harnesses the power of multifidelity methods to improve the accuracy and runtime of Gaussian process (GP) emulators. We investigate the impact of the number of fidelity layers, dimensionality reduction, and training dataset size on the performance of multifidelity GP emulators. We find that an optimal multifidelity emulator can achieve relative errors in surface reflectance below 0.5% and performs atmospheric correction of hyperspectral PRISMA satellite data (one million pixels) in a few minutes. Additionally, we provide a suite of functions and tools for automating the creation and generation of atmospheric RTM emulators.

Type of Publication: publication

Title of Journal: IEEE Transactions on Geoscience and Remote Sensing, 61, 1-10, 2023.

Authors: Vicent Servera, Jorge; Martino, Luca; Verrelst, Jochem; Camps-Valls, Gustau

Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge

Abstract: This paper discusses four facets of the Knowledge Distillation (KD) process for Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures, particularly when executed on edge devices with constrained processing capabilities. First, we conduct a comparative analysis of the KD process between CNNs and ViT architectures, aiming to elucidate the feasibility and efficacy of employing different architectural configurations for the teacher and student, while assessing their performance and efficiency. Second, we explore the impact of varying the size of the student model on accuracy and inference speed, while maintaining a constant KD duration. Third, we examine the effects of employing higher resolution images on the accuracy, memory footprint and computational workload. Last, we examine the performance improvements obtained by fine-tuning the student model after KD to specific downstream tasks. Through empirical evaluations and analyses, this research provides AI practitioners with insights into optimal strategies for maximizing the effectiveness of the KD process on edge devices.

Type of Publication: Conference Proceeding

Title of Conference: Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge, Lyon, France, 27 August 2024 (Session Signal and Data Analytics for Machine Learning, Part 1)

Authors: John Violos; Symeon Papadopoulos; Ioannis Kompatsiaris

Federated Generalized Category Discovery

Abstract: Generalized category discovery (GCD) aims at grouping unlabeled samples from known and unknown classes, given labeled data of known classes. To meet the recent decentralization trend in the community, we introduce a practical yet challenging task, Federated GCD (Fed-GCD), where the training data are distributed among local clients and cannot be shared among clients. Fed-GCD aims to train a generic GCD model by client collaboration under the privacy-protected constraint. The Fed-GCD leads to two challenges: 1) representation degradation caused by training each client model with fewer data than centralized GCD learning, and 2) highly heterogeneous label spaces across different clients. To this end, we propose a novel Associated Gaussian Contrastive Learning (AGCL) framework based on learnable GMMs, which consists of a Client Semantics Association (CSA) and a global-local GMM Contrastive Learning (GCL). On the server, CSA aggregates the heterogeneous categories of local-client GMMs to generate a global GMM containing more comprehensive category knowledge. On each client, GCL builds class-level contrastive learning with both local and global GMMs. The local GCL learns robust representation with limited local data. The global GCL encourages the model to produce more discriminative representation with the comprehensive category relationships that may not exist in local data. We build a benchmark based on six visual datasets to facilitate the study of Fed-GCD. Extensive experiments show that our AGCL outperforms multiple baselines on all datasets. Code is available at https://github.com/TPCD/FedGCD.

Type of Publication: Conference Proceeding

Title of Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Authors: Nan Pu; Wenjing Li; Xingyuan Ji; Yalan Qin; Nicu Sebe; Zhun Zhong

A Characterization Theorem for Equivariant Networks with Point-wise Activations

Abstract: Equivariant neural networks have shown improved performance, expressiveness and sample complexity on symmetrical domains. But for some specific symmetries, representations, and choice of coordinates, the most common point-wise activations, such as ReLU, are not equivariant, hence they cannot be employed in the design of equivariant neural networks. The theorem we present in this paper describes all possible combinations of representations, choice of coordinates and point-wise activations to obtain an equivariant layer, generalizing and strengthening existing characterizations. Notable cases of practical relevance are discussed as corollaries. Indeed, we prove that rotation-equivariant networks can only be invariant, as it happens for any network which is equivariant with respect to connected compact groups. Then, we discuss implications of our findings when applied to important instances of equivariant networks. First, we completely characterize permutation equivariant networks such as Invariant Graph Networks with point-wise nonlinearities and their geometric counterparts, highlighting a plethora of models whose expressive power and performance are still unknown. Second, we show that feature spaces of disentangled steerable convolutional neural networks are trivial representations.
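
A small numerical check can make the equivariance claim concrete. The sketch below is an illustration only, not the paper's theorem: it verifies on one example vector that ReLU commutes with a permutation matrix but not with a 45-degree rotation matrix in standard coordinates. The helper names `relu` and `matvec` are hypothetical.

```python
import math

def relu(v):
    """Point-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def matvec(M, v):
    """Dense matrix-vector product for small lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

P = [[0.0, 1.0], [1.0, 0.0]]                  # permutation (swap coordinates)
theta = math.pi / 4
R = [[math.cos(theta), -math.sin(theta)],     # rotation by 45 degrees
     [math.sin(theta), math.cos(theta)]]

v = [1.0, -2.0]

# Equivariance at v means: activation(g . v) == g . activation(v).
perm_commutes = relu(matvec(P, v)) == matvec(P, relu(v))
rot_commutes = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(relu(matvec(R, v)), matvec(R, relu(v)))
)
```

Here `perm_commutes` evaluates to `True` and `rot_commutes` to `False`, which matches the abstract's observation that point-wise activations are compatible with some representations and coordinate choices but not others.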

Type of Publication: Conference Proceeding

Title of Conference: International Conference on Learning Representations 2024 (ICLR 2024)

Authors: Marco Pacini; Bruno Lepri; Xiaowen Dong; Gabriele Santin

Structural Group Unfairness: Measurement and Mitigation by means of the Effective Resistance

Abstract: Social networks contribute to the distribution of social capital, defined as the relationships, norms of trust and reciprocity within a community or society that facilitate cooperation and collective action. Social capital exists in the relations among individuals, such that better positioned members in a social network benefit from faster access to diverse information and higher influence on information dissemination. A variety of methods have been proposed in the literature to measure social capital at an individual level. However, there is a lack of methods to quantify social capital at a group level, which is particularly important when the groups are defined on the grounds of protected attributes. Furthermore, state-of-the-art approaches fail to model the role of long-range interactions between nodes in the network and their contributions to social capital. To fill this gap, we propose to measure the social capital of a group of nodes by means of their information flow and emphasize the importance of considering the whole network topology. Grounded in spectral graph theory, we introduce three effective resistance-based measures of group social capital, namely group isolation, group diameter and group control. We denote the social capital disparity among different groups in a network as structural group unfairness, and propose to mitigate it by means of a budgeted edge augmentation heuristic that systematically increases the social capital of the most disadvantaged group. In experiments on real networks, we uncover significant levels of structural group unfairness when using gender as the protected attribute, with females being the most disadvantaged group in comparison to males. We also illustrate how our proposed edge augmentation approach is able to not only effectively mitigate the structural group unfairness but also increase the social capital of all groups in the network.
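
The group measures named in this abstract are built on pairwise effective resistance. As a minimal sketch (assuming an undirected, connected graph and NumPy; the function name `effective_resistance` is hypothetical and the paper's group-level definitions are not reproduced here), the pairwise quantity can be computed from the pseudoinverse of the graph Laplacian:

```python
import numpy as np

def effective_resistance(adj):
    """All-pairs effective resistance R_ij = L+_ii + L+_jj - 2 L+_ij."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    Lp = np.linalg.pinv(L)                # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Path graph 0 - 1 - 2: unit resistors in series.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
R = effective_resistance(path)
```

On the path graph, `R[0, 1]` is 1 and `R[0, 2]` is 2 (up to floating-point error), as expected for unit resistors in series. Adding an edge between nodes 0 and 2 would lower these values, which is the intuition behind edge augmentation.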

Type of Publication: Conference Proceeding

Title of Conference: WWW 2024 workshop on Trustworthy Learning on Graphs (TrustLOG)

Authors: Adrian Arnaiz Rodriguez; Georgina Curto; Nuria Oliver

FairShap: A Data Re-weighting Approach for Algorithmic Fairness based on Shapley Values

Abstract: Algorithmic fairness is of utmost societal importance, yet the current trend in large-scale machine learning models requires training with massive datasets that are frequently biased. In this context, pre-processing methods that focus on modeling and correcting bias in the data emerge as valuable approaches. In this paper, we propose FairShap, a novel instance-level data re-weighting method for fair algorithmic decision-making through data valuation by means of Shapley Values. FairShap is model-agnostic and easily interpretable, as it measures the contribution of each training data point to a predefined fairness metric. We empirically validate FairShap on several state-of-the-art datasets of different nature, with a variety of training scenarios and models and show how it yields fairer models with similar levels of accuracy than the baselines. We illustrate FairShap’s interpretability by means of histograms and latent space visualizations. Moreover, we perform a utility-fairness study, and ablation and runtime experiments to illustrate the impact of the size of the reference dataset and FairShap’s computational cost depending on the size of the dataset and the number of features. We believe that FairShap represents a promising direction in interpretable and model-agnostic approaches to algorithmic fairness that yield competitive accuracy even when only biased datasets are available.

Type of Publication: Conference Proceeding

Title of Conference: International Conference on Learning Representations (ICLR 2024) workshop on Data-centric Machine Learning Research (DMLR)

Authors: Adrian Arnaiz Rodriguez; Nuria Oliver

A Lie Group Approach to Riemannian Batch Normalization

Abstract: Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at this https URL.

Type of Publication: Conference Proceeding

Title of Conference: International Conference on Learning Representations (ICLR 2024)

Authors: Ziheng Chen; Yue Song; Yunmei Liu; Nicu Sebe

Putting Context in Context: the Impact of Discussion Structure on Text Classification

Abstract: Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments on a large dataset for stance detection in English, in which we evaluate the contribution of different types of contextual information, i.e. linguistic, structural and temporal, by feeding them as natural language input into a transformer-based model. We also experiment with different amounts of training data and analyse the topology of local discussion networks in a privacy-compliant way. Results show that structural information can be highly beneficial to text classification but only under certain circumstances (e.g. depending on the amount of training data and on discussion chain complexity). Indeed, we show that contextual information on smaller datasets from other classification tasks does not yield significant improvements. Our framework, based on local discussion networks, allows the integration of structural information, while minimising user profiling, thus preserving their privacy.

Type of Publication: Conference paper

Title of Conference: European Chapter of the Association for Computational Linguistics. EACL 2024.

Authors: Nicolò Penzo; Antonio Longa; Bruno Lepri; Sara Tonelli; Marco Guerini

Personalized Algorithmic Recourse with Preference Elicitation

Abstract: Algorithmic Recourse (AR) is the problem of computing a sequence of actions that – once performed by a user – overturns an undesirable machine decision. It is paramount that the sequence of actions does not require too much effort for users to implement. Yet, most approaches to AR assume that actions cost the same for all users, and thus may recommend unfairly expensive recourse plans to certain users. Prompted by this observation, we introduce PEAR, the first human-in-the-loop approach capable of providing personalized algorithmic recourse tailored to the needs of any end-user. PEAR builds on insights from Bayesian Preference Elicitation to iteratively refine an estimate of the costs of actions by asking choice set queries to the target user. The queries themselves are computed by maximizing the Expected Utility of Selection, a principled measure of information gain accounting for uncertainty on both the cost estimate and the user’s responses. PEAR integrates elicitation into a Reinforcement Learning agent coupled with Monte Carlo Tree Search to quickly identify promising recourse plans. Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations.

Type of Publication: Journal article

Title of Journal: Transactions on Machine Learning Research, ISSN: 2835-8856, 2024.

Authors: Giovanni De Toni; Paolo Viappiani; Stefano Teso; Bruno Lepri; Andrea Passerini

Sharp Spectral Rates for Koopman Operator Learning

Abstract: Nonlinear dynamical systems can be handily described by the associated Koopman operator, whose action evolves every observable of the system forward in time. Learning the Koopman operator and its spectral decomposition from data is enabled by a number of algorithms. In this work we present for the first time non-asymptotic learning bounds for the Koopman eigenvalues and eigenfunctions. We focus on time-reversal-invariant stochastic dynamical systems, including the important example of Langevin dynamics. We analyze two popular estimators: Extended Dynamic Mode Decomposition (EDMD) and Reduced Rank Regression (RRR). Our results critically hinge on novel minimax estimation bounds for the operator norm error, that may be of independent interest. Our spectral learning bounds are driven by the simultaneous control of the operator norm error and a novel metric distortion functional of the estimated eigenfunctions. The bounds indicate that both EDMD and RRR have similar variance, but EDMD suffers from a larger bias which might be detrimental to its learning rate. Our results shed new light on the emergence of spurious eigenvalues, an issue which is well known empirically. Numerical experiments illustrate the implications of the bounds in practice.
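
For readers unfamiliar with EDMD, the following is a minimal sketch of the basic least-squares estimator on a noise-free linear toy system, assuming NumPy and an identity feature map. The paper studies stochastic systems and general dictionaries; the function name `edmd` and the toy setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def edmd(X, Y, features):
    """Least-squares Koopman matrix estimate from snapshot pairs.

    Row i of X is a state x_t and row i of Y is its successor x_{t+1}.
    """
    PhiX, PhiY = features(X), features(Y)
    K, *_ = np.linalg.lstsq(PhiX, PhiY, rcond=None)
    return K

# Toy linear dynamics x_{t+1} = A x_t with known eigenvalues 0.9 and 0.5.
A = np.array([[0.9, 0.0],
              [0.0, 0.5]])
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = X @ A.T

K = edmd(X, Y, features=lambda Z: Z)          # identity dictionary
eigs = np.sort(np.real(np.linalg.eigvals(K)))
```

In this idealized setting the estimated eigenvalues match those of `A`. With noisy data or a poorly chosen dictionary, estimated eigenvalues can drift or spurious ones can appear, which is the regime the abstract's bounds address.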

Type of Publication: Conference paper

Title of Journal: Neural Information Processing Systems (NeurIPS)

Authors: Vladimir Kostic; Karim Lounici; Pietro Novelli; Massimiliano Pontil

Robust covariance estimation with missing values and cell-wise contamination

Abstract: Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding any samples containing outliers may result in a dataset that is too small to accurately estimate the covariance matrix. Moreover, the robust procedures designed to address this problem require the invertibility of the covariance operator and thus are not effective on high-dimensional data. In this paper, we propose an unbiased estimator for the covariance in the presence of missing values that does not require any imputation step and still achieves near minimax statistical accuracy with the operator norm. We also advocate for its use in combination with cell-wise outlier detection methods to tackle cell-wise contamination in a high-dimensional and low-rank setting, where state-of-the-art methods may suffer from numerical instability and long computation times. To complement our theoretical findings, we conducted an experimental study which demonstrates the superiority of our approach over the state of the art both in low and high dimension settings.

Type of Publication: Conference paper

Title of Journal: Neural Information Processing Systems (NeurIPS)

Authors: Karim Lounici; Gregoire Pacreau

Improving Fairness using Vision-Language Driven Image Augmentation

Abstract: Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain. Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks), resulting in biases which do not correspond to reality. It is common knowledge that these correlations are present in the data and are then transferred to the models during training. This paper proposes a method to mitigate these correlations to improve fairness. To do so, we learn interpretable and meaningful paths lying in the semantic space of a pre-trained diffusion model (DiffAE) — such paths being supervised by contrastive text dipoles. That is, we learn to edit protected characteristics (age and skin color). These paths are then applied to augment images to improve the fairness of a given dataset. We test the proposed method on CelebA-HQ and UTKFace on several downstream tasks with age and skin color as protected characteristics. As a proxy for fairness, we compute the difference in accuracy with respect to the protected characteristics. Quantitative results show how the augmented images help the model improve the overall accuracy, the aforementioned metric, and the disparity of equal opportunity. Code is available at: this URL.
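
The fairness proxy described in this abstract, the difference in accuracy across a protected characteristic, can be sketched in a few lines. The function names, group labels, and toy predictions below are hypothetical and only illustrate the metric, not the paper's evaluation code.

```python
def group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each protected group."""
    totals, correct = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (t == p)
    return {g: correct[g] / totals[g] for g in totals}

def accuracy_gap(y_true, y_pred, groups):
    """Largest difference in accuracy between any two groups."""
    acc = group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())

# Toy data: four samples per (hypothetical) age group.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["young"] * 4 + ["old"] * 4

acc = group_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(y_true, y_pred, groups)
```

In this toy example the "young" group is classified perfectly and the "old" group at 50% accuracy, so the gap is 0.5. A smaller gap indicates more similar performance across groups.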

Type of Publication: Conference paper

Title of Journal: IEEE/CVF Winter Conference on Applications of Computer Vision

Authors: Moreno D’Incà; Christos Tzelepis; Ioannis Patras; Nicu Sebe

Revisiting Supervision for Continual Representation Learning

Abstract: In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved transferability of those representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. This highlights the importance of the multi-layer perceptron projector in shaping feature transferability across a sequence of tasks in continual learning.

Type of Publication: conference paper

Title of Conference: The 18th European Conference on Computer Vision (ECCV), Milano, 2024

Authors: Marczak, Daniel; Cygert, Sebastian; Trzciński, Tomasz; Twardowski, Bartłomiej

Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery

Abstract: Generalized Continual Category Discovery (GCCD) tackles learning from sequentially arriving, partially labeled datasets while uncovering new categories. Traditional methods depend on feature distillation to prevent forgetting the old knowledge. However, this strategy restricts the model’s ability to adapt and effectively distinguish new categories. To address this, we introduce a novel technique integrating a learnable projector with feature distillation, thus enhancing model adaptability without sacrificing past knowledge. The resulting distribution shift of the previously learned categories is mitigated with the auxiliary category adaptation network. We demonstrate that while each component offers modest benefits individually, their combination – dubbed CAMP (Category Adaptation Meets Projected distillation) – significantly improves the balance between learning new information and retaining old. CAMP exhibits superior performance across several GCCD and Class Incremental Learning scenarios. The code is available on GitHub.

Type of Publication: conference paper

Title of Conference: The 18th European Conference on Computer Vision (ECCV), Milano, 2024

Authors: Rypeść, Grzegorz; Marczak, Daniel; Cygert, Sebastian; Trzciński, Tomasz; Twardowski, Bartłomiej

AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale

Abstract: Active Visual Exploration (AVE) is a task that involves dynamically selecting observations (glimpses), which is critical to facilitate comprehension and navigation within an environment. While modern AVE methods have demonstrated impressive performance, they are constrained to fixed-scale glimpses from rigid grids. In contrast, existing mobile platforms equipped with optical zoom capabilities can capture glimpses of arbitrary positions and scales. To address this gap between software and hardware capabilities, we introduce AdaGlimpse. It uses Soft Actor-Critic, a reinforcement learning algorithm tailored for exploration tasks, to select glimpses of arbitrary position and scale. This approach enables our model to rapidly establish a general awareness of the environment before zooming in for detailed analysis. Experimental results demonstrate that AdaGlimpse surpasses previous methods across various visual tasks while maintaining greater applicability in realistic AVE scenarios.

Type of Publication: publication

Title of Conference: The 18th European Conference on Computer Vision (ECCV), Milano, 2024

Authors: Annapureddy, Ravinithesh; Fornaroli, Alessandro; Gatica-Perez, Daniel

Generative AI Literacy: Twelve Defining Competencies

Abstract: This paper introduces a competency-based model for generative artificial intelligence (AI) literacy covering essential skills and knowledge areas necessary to interact with generative AI. The competencies range from foundational AI literacy to prompt engineering and programming skills, including ethical and legal considerations. These twelve competencies offer a framework for individuals, policymakers, government officials, and educators looking to navigate and take advantage of the potential of generative AI responsibly. Embedding these competencies into educational programs and professional training initiatives can equip individuals to become responsible and informed users and creators of generative AI. The competencies follow a logical progression and serve as a roadmap for individuals seeking to get familiar with generative AI and for researchers and policymakers to develop assessments, educational programs, guidelines, and regulations.

Type of Publication: publication

Authors: Annapureddy, Ravinithesh; Fornaroli, Alessandro; Gatica-Perez, Daniel

Trading Volume Maximization with Online Learning

Abstract: We explore brokerage between traders in an online learning framework. At any round t, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price.

Previous work provided guidance to a broker aiming at enhancing traders’ total earnings by maximizing the gain from trade, defined as the sum of the traders’ net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades.

We model the traders’ valuations as an i.i.d. process with an unknown distribution. If the traders’ valuations are revealed after each interaction (full-feedback), and the traders’ valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors.

If only their willingness to sell or buy at the proposed price is revealed after each interaction (2-bit feedback), we provide an algorithm achieving polylogarithmic regret when the traders’ valuations cdf is Lipschitz and show that this rate is near-optimal.

We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders’ valuations cdf. If we drop the continuous cdf assumption, the regret rate degrades to Θ(√T) in the full-feedback case, where T is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the 2-bit feedback case.
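The trade rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only counts trades for fixed prices. The function names, the uniform valuations, and the inclusive treatment of ties are assumptions made for illustration.

```python
import random


def trade_happens(price, v1, v2):
    """Return True when one trader would sell and the other would buy at `price`.

    Assumption: ties (price equal to a valuation) count as a trade.
    """
    return min(v1, v2) <= price <= max(v1, v2)


def trading_volume(prices, valuations):
    """Total number of trades over all rounds (the quantity the broker maximizes)."""
    return sum(trade_happens(p, v1, v2) for p, (v1, v2) in zip(prices, valuations))


# Hypothetical setup: i.i.d. valuations drawn uniformly from [0, 1].
rng = random.Random(0)
T = 1000
valuations = [(rng.random(), rng.random()) for _ in range(T)]

# Compare two fixed pricing strategies; a learning broker would adapt its price instead.
volume_at_half = trading_volume([0.5] * T, valuations)
volume_at_low = trading_volume([0.1] * T, valuations)
print(volume_at_half, volume_at_low)
```

For uniform valuations, a price near the median separates the two traders more often than a price near the boundary, which is why the first count is typically much larger than the second.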

 

Type of Publication: publication

Authors: Tommaso Cesari; Roberto Colomboni

A Contextual Online Learning Theory of Brokerage

Abstract: We study the role of contextual information in the online learning problem of brokerage between traders. At each round, two traders arrive with secret valuations about an asset they wish to trade. The broker suggests a trading price based on contextual data about the asset. Then, the traders decide to buy or sell depending on whether their valuations are higher or lower than the brokerage price. We assume the market value of traded assets is an unknown linear function of a d-dimensional vector representing the contextual information available to the broker. Additionally, we model traders’ valuations as independent bounded zero-mean perturbations of the asset’s market value, allowing for potentially different unknown distributions across traders and time steps. Consistent with the existing online learning literature, we evaluate the performance of a learning algorithm by its regret with respect to the gain from trade. If the noise distributions admit densities bounded by some constant L, then, for any time horizon T:

  • If the agents’ valuations are revealed after each interaction, we provide an algorithm achieving O(Ld ln T) regret, and show a corresponding matching lower bound of Ω(Ld ln T).
  • If only their willingness to sell or buy at the proposed price is revealed after each interaction, we provide an algorithm achieving O(√(LdT ln T)) regret, and show that this rate is optimal (up to logarithmic factors), via a lower bound of Ω(√(LdT)).

To complete the picture, we show that if the bounded density assumption is lifted, then the problem becomes unlearnable, even with full feedback.
 

Type of Publication: publication

Authors: François Bachoc; Tommaso Cesari; Roberto Colomboni

Fair Online Bilateral Trade

Abstract: In online bilateral trade, a platform posts prices to incoming pairs of buyers and sellers that have private valuations for a certain good. If the price is lower than the buyers’ valuation and higher than the sellers’ valuation, then a trade takes place. Previous work focused on the platform perspective, with the goal of setting prices maximizing the gain from trade (the sum of sellers’ and buyers’ utilities). Gain from trade is, however, potentially unfair to traders, as they may receive highly uneven shares of the total utility. In this work we enforce fairness by rewarding the platform with the fair gain from trade, defined as the minimum between sellers’ and buyers’ utilities. After showing that any no-regret learning algorithm designed to maximize the sum of the utilities may fail badly with fair gain from trade, we present our main contribution: a complete characterization of the regret regimes for fair gain from trade when, after each interaction, the platform only learns whether each trader accepted the current price. Specifically, we prove the following regret bounds: Θ(ln T) in the deterministic setting, Ω(T) in the stochastic setting, and Θ̃(T^(2/3)) in the stochastic setting when sellers’ and buyers’ valuations are independent of each other. We conclude by providing tight regret bounds when, after each interaction, the platform is allowed to observe the true traders’ valuations.
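The difference between gain from trade and fair gain from trade can be made concrete with a small Python sketch. The function names, the example valuations, and the inclusive handling of prices equal to a valuation are assumptions for illustration; the definitions themselves (sum versus minimum of the two utilities) follow the abstract.

```python
def utilities(price, seller_val, buyer_val):
    """Seller and buyer utilities at a posted price; both are zero when no trade occurs.

    Assumption: a price equal to one of the valuations still results in a trade.
    """
    if seller_val <= price <= buyer_val:
        return price - seller_val, buyer_val - price
    return 0.0, 0.0


def gain_from_trade(price, seller_val, buyer_val):
    """Sum of the two utilities; equals buyer_val - seller_val whenever a trade occurs."""
    return sum(utilities(price, seller_val, buyer_val))


def fair_gain_from_trade(price, seller_val, buyer_val):
    """Minimum of the two utilities; largest when the price splits the surplus evenly."""
    return min(utilities(price, seller_val, buyer_val))


# Hypothetical valuations: seller values the good at 0.2, buyer at 0.8.
for price in (0.25, 0.5, 0.75):
    print(price, gain_from_trade(price, 0.2, 0.8), fair_gain_from_trade(price, 0.2, 0.8))
```

Any price between the two valuations yields the same gain from trade, while the fair gain from trade peaks at the midpoint. This is one way to see why an algorithm tuned for the sum of utilities can perform poorly on the fair objective.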

 

Type of Publication: publication

Authors: François Bachoc; Nicolò Cesa-Bianchi; Tommaso Cesari; Roberto Colomboni

A deep cut into Split Federated Self-supervised Learning

Abstract: Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for network division at the initial layers, which decreases the protection of the client data and increases communication overhead. In this paper, we demonstrate that splitting depth is crucial for maintaining privacy and communication efficiency in distributed training. We also show that MocoSFL suffers from a catastrophic quality deterioration for the minimal communication overhead. As a remedy, we introduce Momentum-Aligned contrastive Split Federated Learning (MonAcoSFL), which aligns online and momentum client models during the training procedure. Consequently, we achieve state-of-the-art accuracy while significantly reducing the communication overhead, making MonAcoSFL more practical in real-world scenarios.

 

Type of Publication: publication

Title of Journal: International Conference on Learning Representations (ICLR), Vienna, Austria, 7-11.05.2024

Authors: Marcin Przewięźlikowski; Marcin Osial; Bartosz Zieliński; Marek Śmieja

Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Abstract: Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-experts technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increases the computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one expert, the most suitable one for the considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. Extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

Type of Publication: publication

Title of Journal: International Conference on Learning Representations (ICLR), Vienna, Austria, 7-11.05.2024

Authors: Grzegorz Rypeść; Sebastian Cygert; Valeriya Khan; Tomasz Trzciński; Bartosz Zieliński; Bartłomiej Twardowski

Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Abstract: Continual learning methods are known to suffer from catastrophic forgetting, a phenomenon that is particularly hard to counter for methods that do not store exemplars of previous tasks. Therefore, to reduce potential drift in the feature extractor, existing exemplar-free methods are typically evaluated in settings where the first task is significantly larger than subsequent tasks. Their performance drops drastically in more challenging settings starting with a smaller first task. To address this problem of feature drift estimation for exemplar-free methods, we propose to adversarially perturb the current samples such that their embeddings are close to the old class prototypes in the old model embedding space. We then estimate the drift in the embedding space from the old to the new model using the perturbed images and compensate the prototypes accordingly. We exploit the fact that adversarial samples are transferable from the old to the new feature space in a continual learning setting. The generation of these images is simple and computationally cheap. We demonstrate in our experiments that the proposed approach better tracks the movement of prototypes in embedding space and outperforms existing methods on several standard continual learning benchmarks as well as on fine-grained datasets.
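The prototype compensation step mentioned in this abstract might look roughly like the following Python sketch. It is a simplified illustration, not the authors' implementation: the adversarial perturbation itself is not shown, the embeddings of the perturbed samples under the old and new models are taken as given inputs, and averaging the per-sample drift is an assumption. All function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compensate_prototype(old_prototype, old_embeddings, new_embeddings):
    """Shift an old-class prototype by the mean drift of the perturbed samples.

    old_embeddings[i] and new_embeddings[i] are the embeddings of the same
    perturbed sample under the old and the new model, respectively.
    Assumption: drift is estimated as the plain average of the differences.
    """
    drifts = [
        [new - old for old, new in zip(old_emb, new_emb)]
        for old_emb, new_emb in zip(old_embeddings, new_embeddings)
    ]
    drift = mean_vector(drifts)
    return [p + d for p, d in zip(old_prototype, drift)]


# Hypothetical 2-D example: perturbed samples sit near the old prototype [1.0, 0.0].
old_embs = [[1.0, 0.0], [1.0, 0.2]]
new_embs = [[1.5, 0.5], [1.5, 0.7]]
updated = compensate_prototype([1.0, 0.0], old_embs, new_embs)
print(updated)
```

The sketch relies on the perturbed samples lying close to the old prototype in the old embedding space, so that their movement under the new model is a reasonable proxy for the prototype's own movement.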

Type of Publication: publication

Title of Journal: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 17-21.06.2024

Authors: Octavian Pascu; Adriana Stan; Dan Oneata; Elisabeta Oneata; Horia Cucu

Towards generalisable and calibrated audio deepfake detection with self-supervised representations

Abstract: Generalisation, the ability of a model to perform well on unseen data, is crucial for building reliable deepfake detectors. However, recent studies have shown that the current audio deepfake models fall short of this desideratum. In this work we investigate the potential of pretrained self-supervised representations in building general and calibrated audio deepfake detection models. We show that large frozen representations coupled with a simple logistic regression classifier are extremely effective in achieving strong generalisation capabilities: compared to the RawNet2 model, this approach reduces the equal error rate from 30.9% to 8.8% on a benchmark of eight deepfake datasets, while learning fewer than 2k parameters. Moreover, the proposed method produces considerably more reliable predictions than previous approaches, making it more suitable for realistic use.
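The equal error rate (EER) reported in this abstract is the operating point at which the false acceptance and false rejection rates coincide. A minimal Python sketch of one common way to approximate it from detector scores follows. The function name, the score convention (higher means more likely positive), and the threshold sweep over observed scores are assumptions; this is not the evaluation code used in the paper.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    scores: detector outputs, higher means more likely to be the positive class.
    labels: 1 for the positive class, 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # Positives scored below the threshold are wrongly rejected.
        false_negative_rate = sum(s < threshold for s in positives) / len(positives)
        # Negatives scored at or above the threshold are wrongly accepted.
        false_positive_rate = sum(s >= threshold for s in negatives) / len(negatives)
        gap = abs(false_negative_rate - false_positive_rate)
        if gap < best_gap:
            best_gap = gap
            eer = (false_negative_rate + false_positive_rate) / 2
    return eer


# Hypothetical scores: one positive (0.35) is ranked below one negative (0.4).
print(equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

With finitely many scores the two error rates rarely match exactly, so the sketch reports their average at the threshold where they are closest.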

Type of Publication: Conference paper

Title of Journal: Interspeech 2024

Authors: Octavian Pascu; Adriana Stan; Dan Oneata; Elisabeta Oneata; Horia Cucu