I am a Research Scientist at Meta and an Adjunct Professor at the University of Pittsburgh. I work on reinforcement learning, Bayesian optimization, and adaptive experimentation. More broadly, I am interested in sequential decision-making from the perspectives of both operations research and AI. My recent work has been applied to large-scale, adaptive internet experiments at Meta and the advertising systems behind Facebook and Instagram. I received my Ph.D. in Operations Research and Financial Engineering from Princeton University. I am always interested in academic collaborations. Please reach out!
Deployable RL Workshop @ RLC 2024. I am co-organizing the Deployable RL Workshop at the first Reinforcement Learning Conference (RLC) in Amherst, MA on August 9, 2024. We invite papers on the theory and practice of RL aimed at deployment to real-world problems. The submission deadline is May 8, 2024. Please see the call for papers for more details!
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Submitted, 2024.
Brief Description: We study the problem of learning an approximate equilibrium in offline multi-agent reinforcement learning (MARL). We introduce a structural assumption, the interaction rank, and show that utilizing function classes with low interaction rank leads to decentralized, computationally and statistically efficient learning in offline MARL. Our experiments show the potential of critic architectures with low interaction rank when used in TD3-BC.
Mathematical Programming for Adaptive Experiments
Submitted, 2024.
Brief Description: We observe that real experiments are deliberately implemented with a few large batches. By invoking a central limit approximation at each batch, we obtain a tractable Bayesian MDP that can flexibly incorporate a wide range of problem specifications, including batched and delayed feedback, personalization, non-stationarity, multiple objectives, and constraints. We call this the "mathematical programming" view of adaptive experimentation.
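As a rough illustration of the central limit idea, here is a minimal sketch of a conjugate Gaussian update applied to a single arm after one large batch; the function name and the numbers in the usage example are purely illustrative and not taken from the paper.

```python
def batch_posterior_update(mu, var, batch_mean, batch_var, n_batch):
    """Conjugate Gaussian update of one arm's posterior after a single batch.

    The central limit approximation treats the batch's average reward as
    Gaussian with variance batch_var / n_batch, so the experiment's Bayesian
    state reduces to a posterior mean and variance per arm.
    """
    obs_var = batch_var / n_batch                 # variance of the batch average
    new_var = 1.0 / (1.0 / var + 1.0 / obs_var)   # combine precisions
    new_mu = new_var * (mu / var + batch_mean / obs_var)
    return new_mu, new_var

# Example (illustrative numbers): prior N(0, 1), a batch of 10,000 users
# with average reward 0.12 and per-user reward variance 4.0.
mu, var = batch_posterior_update(mu=0.0, var=1.0, batch_mean=0.12,
                                 batch_var=4.0, n_batch=10_000)
```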
AExGym: Benchmarks and Environments for Adaptive Experimentation
Submitted, 2024.
Brief Description: We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. We release an open-source library, AExGym, which is designed with modularity and extensibility in mind.
On the Linear Speedup of Personalized Federated RL with Shared Representations
Submitted, 2024.
Brief Description: Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without sharing their own local trajectories collected during agent-environment interactions. We develop a class of personalized FedRL algorithms that learns (1) a shared feature representation collaboratively among all agents and (2) an agent-specific weight vector personalized to its local environment.
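A minimal sketch of the kind of critic this architecture suggests, assuming a PyTorch-style implementation: a feature map shared across agents plus an agent-specific weight vector. Class and parameter names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PersonalizedQ(nn.Module):
    """Q-value with a shared representation and a per-agent linear head."""

    def __init__(self, state_dim, action_dim, feature_dim, num_agents):
        super().__init__()
        # Shared feature map phi(s, a), learned collaboratively across agents
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, feature_dim),
        )
        # Agent-specific weights, kept local and personalized to each environment
        self.w = nn.Parameter(torch.zeros(num_agents, feature_dim))

    def forward(self, agent_id, state, action):
        features = self.phi(torch.cat([state, action], dim=-1))
        return (features * self.w[agent_id]).sum(dim=-1)
```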
Pearl: A Production-Ready Reinforcement Learning Agent
Journal of Machine Learning Research, 2024.
Brief Description: We introduce Pearl, a new open-source library for reinforcement learning that aims to enable users to easily build versatile RL agents for real-world applications. Pearl is designed with modularity in mind, allowing researchers and practitioners to mix & match components for policy learning, exploration, safety, and history summarization when building practical RL agents.
Faster Approximate Dynamic Programming by Freezing Slow States
Major revision at Management Science, 2023.
Brief Description: We consider fast-slow MDPs, where some components of the state transition "fast" while others evolve more "slowly." This is common when decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. We propose several new algorithms, each based on the idea of periodically "freezing" and then "releasing" slow states, leading to dramatic computational benefits.
Weakly Coupled Deep Q-Networks
Advances in Neural Information Processing Systems, NeurIPS 2023.
Brief Description: We introduce weakly coupled deep Q-networks and weakly coupled Q-learning, reinforcement learning methods designed for weakly coupled MDPs. We employ multiple, simultaneous DQN or Q-learning agents, each running on a separate, easier subproblem; when combined, their values form an upper bound on the action values of the original problem. These dynamic bounds then guide the primary agent toward the optimal policy.
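One illustrative way such a bound could be used during learning, assuming each subproblem agent reports a greedy next-state value: sum them into an upper bound and clip the primary agent's TD target. This is a simplified sketch of the general idea, not the paper's exact construction (which also accounts for the Lagrangian terms from the linking constraints).

```python
import torch

def clipped_td_target(reward, next_q_main, sub_q_values, gamma=0.99):
    """Clip a DQN target using an upper bound assembled from subproblem values.

    sub_q_values: list of tensors, each the greedy next-state value produced by
    one subproblem agent; their sum acts as an upper bound on the full
    problem's value in this simplified sketch.
    """
    upper = torch.stack(sub_q_values, dim=0).sum(dim=0)    # combined upper bound
    target = reward + gamma * next_q_main                  # standard DQN target
    return torch.minimum(target, reward + gamma * upper)   # keep target below the bound
```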
Dynamic Subgoal-Based Exploration via Bayesian Optimization
Transactions on Machine Learning Research, 2023.
Brief Description: We consider problems where an agent faces an unknown task (drawn from a distribution of MDPs) in the future and is given prior opportunities to "practice" on related tasks where the interactions are still expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.
On Noisy Evaluation in Federated Hyperparameter Tuning
Conference on Machine Learning and Systems, MLSys 2023.
Brief Description: We perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods—reducing the performance of state-of-the-art approaches to that of naive baselines.
Dynamic Inventory Repositioning in On-Demand Rental Networks
Management Science 68(11), pp. 7793-8514, 2022.
Brief Description: We consider a product rental network with a fixed number of rental units distributed across multiple locations. We show convexity of the value function and that the optimal policy can be described in terms of a well-specified region over the state space. We leverage these results in an infinite-horizon, cutting-plane-based ADP algorithm and prove its asymptotic optimality, improving upon previous convergence results in the literature.
Interpretable Personalized Experimentation
ACM International Conference on Knowledge Discovery and Data Mining, KDD 2022.
Brief Description: We present a scalable, interpretable personalized experimentation system, implemented and deployed in production at Meta. The system works in the multiple-treatment, multiple-outcome setting typical at Meta to (1) learn explanations for black-box heterogeneous treatment effect (HTE) models and (2) generate interpretable personalized policies.
Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs
Advances in Neural Information Processing Systems, NeurIPS 2021.
Brief Description: Most Bayesian optimization algorithms ignore how evaluation costs, which are often unknown, may change over the optimization domain. An unknown cost function with a budget constraint introduces a new dimension to the exploration-exploitation trade-off, where learning about the cost incurs the cost itself. We propose a new dynamic programming-based acquisition function for this problem setting.
Structured Actor-Critic for Managing Public Health Points-of-Dispensing
Under revision, 2022.
Brief Description: We consider the setting of public health medical inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) amidst the ongoing opioid crisis.
Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees
Advances in Neural Information Processing Systems, NeurIPS 2020.
Brief Description: Bayesian optimization is a sequential decision making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a stochastic dynamic program, which is highly intractable. Instead, we propose a multi-step scenario tree formulation and a one-shot optimization approach that operates by differentiating through the entire decision tree. (* equal contribution).
Lookahead-Bounded Q-Learning
International Conference on Machine Learning, ICML 2020.
Brief Description: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to make better use of collected experience through noisy "lookahead" upper and lower bounds that constrain the Q-iterates. The algorithm operates via a "feedback loop": approximate Q-values are used to estimate bounds, and those bounds are subsequently used to improve the Q-values (and repeat).
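The projection idea is simple to illustrate in the tabular case. In the sketch below, L and U are assumed to be precomputed bound arrays of the same shape as Q; in the actual algorithm they are estimated from the collected experience rather than given.

```python
import numpy as np

def lbql_style_update(Q, L, U, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning step followed by projection onto the [lower, upper] bounds."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    Q[s, a] = np.clip(Q[s, a], L[s, a], U[s, a])  # constrain the Q-iterate
    return Q
```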
BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization
Advances in Neural Information Processing Systems, NeurIPS 2020.
Brief Description: Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, molecular chemistry, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization, along with a new "one-shot" approach to optimizing the Knowledge Gradient acquisition function.
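For context, a minimal BoTorch usage example in the spirit of the one-shot Knowledge Gradient; the toy data is made up, and the exact import paths and defaults may differ across library versions.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import qKnowledgeGradient
from botorch.optim import optimize_acqf

# Toy observations on the unit square
train_X = torch.rand(10, 2, dtype=torch.double)
train_Y = -(train_X - 0.5).pow(2).sum(dim=-1, keepdim=True)

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

# "One-shot" Knowledge Gradient: fantasy solutions are optimized jointly
# with the candidate point inside a single optimization problem.
qkg = qKnowledgeGradient(model, num_fantasies=32)
bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)
candidate, value = optimize_acqf(qkg, bounds=bounds, q=1,
                                 num_restarts=5, raw_samples=64)
```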
Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Operations Research, 68(6), pp. 1678-1697, 2020.
Brief Description: MCTS is a well-known strategy for solving sequential decision problems, particularly in the area of game-play AI. We propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation bounds (Brown et al., 2010) on potential actions in order to make tree expansion decisions. The approach shows promise when used to optimize the behavior of a driver navigating a graph while operating on a ride-sharing platform.
Feedback-Based Tree Search for Reinforcement Learning
International Conference on Machine Learning, ICML 2018.
Brief Description: We describe a technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon MDP. We show that a deep neural network implementation of the technique can create a competitive AI agent for a popular multi-player online battle arena (MOBA) game.
Shape Constraints in Economics and Operations Research
Statistical Science, 33(4), pp. 527-546, 2018.
Brief Description: This paper reviews an illustrative set of research on shape constrained estimation in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics, and sequential decision making applications.
Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
Mathematics of Operations Research, 43(2), pp. 554-579, 2018.
Brief Description: We propose a new Q-learning algorithm and a companion sampling procedure to solve risk-averse Markov decision processes under a class of dynamic quantile-based risk measures. Convergence results are proven and an application to energy storage is shown.
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Operations Research, 63(6), pp. 1489-1511, 2015.
Brief Description: We describe a provably convergent algorithm to exploit the structural property of monotonicity that arises in many applications in operations research, finance, and economics. We show via simulations that near optimal solutions can be obtained using the proposed method when the exact approach is computationally intractable.
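To make the structural idea concrete, here is a simple one-dimensional sketch of a monotone projection step applied after a value update; the paper's algorithm handles more general partial orders, and the names here are illustrative.

```python
import numpy as np

def monotone_projection(V, s, v_new):
    """Set V[s] = v_new and restore monotonicity (V nondecreasing in the index)."""
    V = V.copy()
    V[s] = v_new
    V[:s] = np.minimum(V[:s], v_new)          # states below s cannot exceed V[s]
    V[s + 1:] = np.maximum(V[s + 1:], v_new)  # states above s cannot fall below V[s]
    return V
```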
Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming
INFORMS Journal on Computing, 27(3), pp. 525-543, 2015.
Brief Description: We formulate a mathematical model for bidding in the real-time market with the goal of performing energy arbitrage (i.e., exploiting variations in spot prices to profit) in the presence of storage. We train and test an approximate dynamic programming policy on real spot price data from the NYISO and show its value over heuristic policies used in industry.
Approximate Dynamic Programming, Ph.D. Level
Instructor, Spring 2017, Fall 2018
Course Description: ADP refers to a broad set of computational methods used for finding approximately optimal policies of intractable sequential decision problems (MDPs). We'll begin with an overview of classical methods and transition to a survey of state-of-the-art developments. The lectures will focus on mathematical proofs and underlying theory, while the course project gives students practice with numerical implementations. All lecture notes, based on a number of papers and texts (primarily Bertsekas and Tsitsiklis), are available online.
Decision Models, Undergraduate/Master's Level
Instructor, Fall 2016-Fall 2021 (5 times)
Course Description: Decision making is key to understanding a variety of problems in industry, including inventory control, revenue management, pricing, energy, healthcare, logistics, and finance. In this course, we focus on stochastic decision models (i.e., "decision making under uncertainty") and discuss the fundamental methodology and models in conjunction with applications to real-world problems. Students should have a basic understanding of probability and optimization (linear programming).
Ride-sharing Analytics Game, Undergraduate/Master's Level
Instructor, Fall 2018
Description: This was an event designed for the Decision Models course, where teams of students (1) analyze a data set, (2) design a pricing, manufacturing, advertising, and repositioning strategy to operate a ride-sharing company, and (3) compete in a live competition where decisions are submitted periodically to a simulator designed specifically for the course. If you are an instructor and interested in running this event in your course, please let me know. See the description and data for more information.
Reinforcement Learning, Master's Level
Instructor, Summer 2018
Course Description: This is an introductory course on reinforcement learning (RL). The basics of MDPs necessary for RL will be covered, along with a wide range of methods (e.g., TD learning, Q-learning, policy gradients) for evaluation and control. The focus in this course will be on applications, implementation, intuition, and some theory. All lecture notes, based on the Sutton & Barto RL textbook, are available online.
Uncovering Missed Tackle Opportunities. With Matt Chang, Kat Dai, and Harvey Cheng, we propose the "Missed Tackle Opportunity" metric, which is based on tackle probability prediction. We won the Kaggle NFL Big Data Bowl competition after presenting at the 2024 NFL Combine in Indianapolis. The new metric will be part of the NFL's Next Gen Stats.
What Would I Say? Read the New Yorker, CNN, and Telegraph articles about the project we created at Hack Princeton 2013, which uses Markov chains to simulate a user's social media posts. The site has drawn over 17 million page views from 9 million unique users. Created with Pawel Przytycki, Ugne Klibaite, Vicky Yao, Edward Young, Harvey Cheng, and Alex Furger.
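The underlying idea is easy to sketch: build a word-level Markov chain from a user's past posts and sample from it. The toy version below is not the project's actual implementation.

```python
import random
from collections import defaultdict

def build_chain(posts, order=1):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for post in posts:
        words = post.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=20):
    """Sample a new 'post' by walking the chain from a random starting state."""
    key = random.choice(list(chain.keys()))
    out = list(key)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)
```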
Simulating Fantasy Football Schedules. Use this app to quantify the role of luck in (Yahoo!) Fantasy Football by generating probability distributions of your record over randomized season schedules. Created with Alex Furger, Daniel Munro, and Pawel Przytycki.
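The simulation behind the app amounts to a small Monte Carlo loop; the weekly_scores structure below (team name mapped to a list of weekly point totals) is a hypothetical stand-in for the league data pulled from Yahoo!.

```python
import random
from collections import Counter

def record_distribution(weekly_scores, my_team, n_sims=10_000):
    """Estimate the win-total distribution over uniformly random schedules."""
    opponents = [t for t in weekly_scores if t != my_team]
    n_weeks = len(weekly_scores[my_team])
    wins = Counter()
    for _ in range(n_sims):
        w = sum(
            weekly_scores[my_team][week] > weekly_scores[random.choice(opponents)][week]
            for week in range(n_weeks)
        )
        wins[w] += 1
    return {w: count / n_sims for w, count in sorted(wins.items())}
```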