“Surfing the Ocean” is a seminar organized by Antonio Ocello and Andrea Bertazzi to fuel collaboration and share fresh ideas about ongoing projects within the consortium. The sessions take place at PariSante Campus, room 07; further details are available to participants on the Ocean Slack. This page gathers abstracts from past and future presentations.

Future sessions

23/04/24

Data Acquisition via Experimental Design for Decentralized Data Markets
Baihe Huang

Acquiring high-quality training data is essential for current machine learning models. Data markets provide a way to increase the supply of data, particularly in data-scarce domains such as healthcare, by incentivizing potential data sellers to join the market. A major challenge for a data buyer in such a market is selecting the most valuable data points from a data seller. Unlike prior work in data valuation, which assumes centralized data access, we propose a federated approach to the data selection problem that is inspired by linear experimental design. Our proposed data selection method achieves lower prediction error without requiring labeled validation data and can be optimized in a fast and federated procedure. The key insight of our work is that a method that directly estimates the benefit of acquiring data for test set prediction is particularly compatible with a decentralized market setting.
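
To make the experimental-design idea concrete, here is a minimal, centralized sketch of greedy V-optimal selection for a linear model: points are chosen to most reduce the expected prediction variance on a test set, with no test labels needed. The function names and the ridge regularizer are our illustrative choices; the paper's actual procedure runs federated across sellers rather than with pooled access to the data.

```python
import numpy as np

def greedy_v_optimal(X_pool, X_test, k, ridge=1e-3):
    """Greedily pick k pool points that most reduce expected test-set
    prediction variance under a linear model (V-optimal design)."""
    d = X_pool.shape[1]
    A = ridge * np.eye(d)              # regularized information matrix
    chosen = []
    for _ in range(k):
        A_inv = np.linalg.inv(A)
        best_i, best_score = None, np.inf
        for i in range(len(X_pool)):
            if i in chosen:
                continue
            x = X_pool[i]
            Ax = A_inv @ x             # Sherman-Morrison rank-one update
            A_inv_new = A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
            # total predictive variance over the test points
            score = np.einsum('ij,jk,ik->', X_test, A_inv_new, X_test)
            if score < best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        A += np.outer(X_pool[best_i], X_pool[best_i])
    return chosen
```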

Demonstration-Regularized RL
Daniil Tiapkin

Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL’s sample complexity. In particular, we study the demonstration-regularized reinforcement learning framework that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning. Our findings reveal that using N^E expert demonstrations enables the identification of an optimal policy at a sample complexity of order O(Poly(S,A,H)/(ε² N^E)) in finite and O(Poly(d,H)/(ε² N^E)) in linear Markov decision processes, where ε is the target precision, H the horizon, A the number of actions, S the number of states in the finite case, and d the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behavior cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from prior works.
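
In symbols, the framework the abstract describes can be written as a KL-regularized control problem anchored at the behavior-cloned policy; the regularization strength λ below is our notation, not a quantity fixed in the abstract.

```latex
% First fit \pi^{BC} to the N^E expert trajectories by behavior cloning,
% then solve the demonstration-regularized problem (some \lambda > 0):
\pi^\star \in \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\Big[\sum_{h=1}^{H} r_h(s_h, a_h)\Big]
  \;-\; \lambda\, \mathbb{E}_{\pi}\!\Big[\sum_{h=1}^{H}
  \mathrm{KL}\big(\pi_h(\cdot \mid s_h)\,\big\|\,\pi^{BC}_h(\cdot \mid s_h)\big)\Big]
```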

14/05/24

Insufficient Gibbs Sampling
Antoine Luciano

In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.
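
One way to picture the sampler: it targets the joint posterior of the parameters and the latent complete data, constrained to match the observed statistics. Below, S(x) denotes the reported statistics, e.g. the pair (median, MAD); the notation is ours, chosen to follow the abstract.

```latex
% Augmented target: parameters and latent data given the observed
% statistics s_obs, with e.g. S(x) = (\mathrm{med}(x), \mathrm{MAD}(x)):
\pi\big(\theta, x_{1:n} \mid s_{\mathrm{obs}}\big)
  \;\propto\; \pi(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)\,
  \mathbf{1}\{S(x_{1:n}) = s_{\mathrm{obs}}\}
% One Gibbs scan: (i) refresh x_{1:n} given \theta through moves that
% preserve S(x_{1:n}) = s_{\mathrm{obs}}; (ii) draw \theta from the
% complete-data posterior \pi(\theta \mid x_{1:n}).
```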

Operationalizing Counterfactual Metrics: Incentives, Ranking, and Information Asymmetry
Serena Wang

From the social sciences to machine learning, it has been well documented that metrics to be optimized are not always aligned with social welfare. In healthcare, Dranove et al. (2003) showed that publishing surgery mortality metrics actually harmed the welfare of sicker patients by increasing provider selection behavior. We analyze the incentive misalignments that arise from such average treated outcome metrics, and show that the incentives driving treatment decisions would align with maximizing total patient welfare if the metrics (i) accounted for counterfactual untreated outcomes and (ii) considered total welfare instead of averaging over treated patients. Operationalizing this, we show how counterfactual metrics can be modified to behave reasonably in patient-facing ranking systems. Extending to realistic settings when providers observe more about patients than the regulatory agencies do, we bound the decay in performance by the degree of information asymmetry between principal and agent. In doing so, our model connects principal-agent information asymmetry with unobserved heterogeneity in causal inference.
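
The two fixes the abstract lists can be stated with potential outcomes. With Y_i(1), Y_i(0) the treated and untreated outcomes and T_i the provider's treatment decision (our notation), the published metric and the welfare-aligned one contrast as follows.

```latex
% Published average-treated-outcome metric: rewards selecting healthier
% patients, since only treated outcome levels enter the average.
M_{\mathrm{ATO}} \;=\; \frac{1}{|\{i : T_i = 1\}|} \sum_{i : T_i = 1} Y_i(1)
% Counterfactual, total-welfare metric: credits the treatment effect
% Y_i(1) - Y_i(0) and sums over all patients rather than averaging
% over the treated.
W \;=\; \sum_{i} \Big( Y_i(0) + T_i \big( Y_i(1) - Y_i(0) \big) \Big)
```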


Past presentations

19/03/24

Delegating Data Collection through Linear Contracting
Nivasini Ananthakrishnan

We introduce a model of contract design for a principal delegating data collection for machine learning to an agent. We demonstrate the efficacy of linear contracts in dealing with challenges from two forms of uncertainty and information asymmetry between the principal and the agent. The first is the hidden-action (moral hazard) challenge due to the principal’s uncertainty about the quality of the data collected. The second is the hidden-state (adverse selection) challenge due to the principal’s uncertainty about the accuracy level to target. We also demonstrate the efficacy of repeated linear contracting in a multi-round setting where the agent has uncertainty about the learning task and uses payments from each round as feedback to learn more about the task.
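
As a toy illustration of how a linear contract handles the hidden-action problem: the principal pays α per unit of measured accuracy, and the agent privately chooses how much data to collect. The learning-curve and cost shapes below are assumptions made for the example, not the paper's model.

```python
import numpy as np

def accuracy(n):
    """Stylized learning curve: accuracy reached from n collected samples."""
    return 1.0 - 1.0 / np.sqrt(n)

def agent_best_response(alpha, cost_per_sample, n_grid):
    """The agent privately picks the sample size maximizing pay minus cost."""
    utility = alpha * accuracy(n_grid) - cost_per_sample * n_grid
    return n_grid[np.argmax(utility)]

def principal_utility(alpha, value, cost_per_sample, n_grid):
    """Principal's value of delivered accuracy minus the linear payment."""
    q = accuracy(agent_best_response(alpha, cost_per_sample, n_grid))
    return value * q - alpha * q

n_grid = np.arange(1, 5001)
alphas = np.linspace(0.1, 50.0, 200)
best_alpha = max(alphas, key=lambda a: principal_utility(a, 100.0, 1e-3, n_grid))
print(best_alpha, agent_best_response(best_alpha, 1e-3, n_grid))
```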


19/03/24

Unravelling in Collaborative Learning
Aymeric Capitaine

Collaborative learning environments offer a promising avenue for collective intelligence and knowledge sharing. However, the presence of data heterogeneity poses significant challenges to the stability of the learner coalition. In this study, we demonstrate that when data quality is private information, the coalition may undergo a phenomenon known as unravelling, wherein it shrinks until it becomes empty or consists only of the worst agent. The adverse effect of unravelling is sizeable and leads to a significant drop in welfare compared to the full-information case. To address this issue, we propose a novel method inspired by accuracy shaping and probabilistic verification to enforce individual rationality and incentive compatibility. This enables the emergence of the first-best coalition with high probability in spite of information asymmetry, effectively breaking unravelling.
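
A stripped-down dynamic conveys the unravelling mechanism (an illustrative caricature, not the paper's model): each agent knows its own data quality, pooling yields the coalition's average quality, and any agent whose private quality exceeds that average prefers to work alone.

```python
import numpy as np

def unravel(q):
    """Iteratively remove agents whose private quality beats the
    coalition average; returns the surviving coalition indices."""
    coalition = list(range(len(q)))
    while len(coalition) > 1:
        avg = np.mean([q[i] for i in coalition])
        leavers = [i for i in coalition if q[i] > avg]
        if not leavers:
            break
        coalition = [i for i in coalition if i not in leavers]
    return coalition

rng = np.random.default_rng(0)
q = rng.uniform(0.0, 1.0, size=10)
print([round(q[i], 2) for i in unravel(q)])   # shrinks toward the worst agent
```

With distinct qualities this iteration collapses to the single worst agent, mirroring the "empty or consists only of the worst agent" outcome in the abstract.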


27/02/24

Incentivized Learning in Principal-Agent Bandit Games
Antoine Scheid

This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent. The principal and the agent have misaligned objectives and the choice of action is only left to the agent. However, the principal can influence the agent’s decisions by offering incentives which are added to his rewards. The principal aims to iteratively learn an incentive policy to maximize her own total utility. This framework extends usual bandit problems and is motivated by several practical applications, such as healthcare or ecological taxation, where traditionally used mechanism design theories often overlook the learning aspect of the problem. We present nearly optimal (with respect to the horizon T) learning algorithms for the principal’s regret in both multi-armed and linear contextual settings. Finally, we support our theoretical guarantees through numerical experiments.
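
A toy explore-then-commit sketch of the interaction protocol, under assumptions of ours (a fixed probe payment, bounded means, Gaussian noise); the paper's algorithms are more refined, in particular learning the minimal incentive needed rather than paying a fixed transfer.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, probe = 4, 2000, 2.0         # probe > 1 forces any arm: means lie in [0, 1]
theta = rng.uniform(0.0, 1.0, K)   # agent's private mean rewards
mu = rng.uniform(0.0, 1.0, K)      # principal's mean rewards (unknown to her)

def agent_action(incentives):
    """The agent best-responds to its private means plus the incentives."""
    return int(np.argmax(theta + incentives))

est, cnt, total = np.zeros(K), np.zeros(K), 0.0
for t in range(T):
    incentives = np.zeros(K)
    if t < 50 * K:
        incentives[(t // 50) % K] = probe        # exploration: force each arm in turn
    else:
        incentives[int(np.argmax(est))] = probe  # commit to the best estimated arm
    a = agent_action(incentives)
    r = rng.normal(mu[a], 0.1)                   # principal's noisy reward
    cnt[a] += 1.0
    est[a] += (r - est[a]) / cnt[a]              # running mean estimate
    total += r - incentives[a]                   # utility net of the transfer
```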


27/02/24

Private Hierarchical Bayesian Model for Federated Learning
Stanislas Du Che

In federated learning under privacy constraints, agents share information about common parameters while protecting the privacy of their data. From a Bayesian perspective, this setting translates into returning privacy-protected conditional posterior distributions to a central server; it also qualifies as a hierarchical Bayes model, the category of models for which the original Gibbs sampler was devised. Instead of the randomization commonly attached to differential privacy, we argue in this work for substituting observations at random with locally generated substitutes. Convergence and bias bounds remain to be derived to ensure the coherence of the method.
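
A minimal sketch of the substitution idea in a toy normal hierarchical model (the model, swap fraction, and unit variances are our assumptions): each agent swaps a random subset of its observations for locally generated substitutes before releasing its summary, and the server runs the usual Gibbs updates on the protected summaries.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: x_ji ~ N(theta_j, 1), theta_j ~ N(mu, 1), flat prior on mu.
J, n, swap_frac = 5, 100, 0.2
data = [rng.normal(rng.normal(0.0, 1.0), 1.0, n) for _ in range(J)]

mu, theta = 0.0, np.zeros(J)
for sweep in range(1000):
    shared = []
    for j in range(J):
        x = data[j].copy()
        idx = rng.choice(n, int(swap_frac * n), replace=False)
        x[idx] = rng.normal(theta[j], 1.0, idx.size)  # local substitutes
        shared.append(x.mean())                       # protected summary
    # server-side Gibbs updates from the protected summaries
    for j in range(J):
        var = 1.0 / (n + 1.0)                         # conjugate normal update
        theta[j] = rng.normal(var * (n * shared[j] + mu), np.sqrt(var))
    mu = rng.normal(theta.mean(), np.sqrt(1.0 / J))
```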
