Will (Wei) Ma

Roderick H. Cushman Associate Professor of Business

Columbia University · Decision Sciences and Operations

Active 2009–2026

h-index: 15
Citations: 950
Papers: 121 · 78 last 5y

Research topics

  • Computer Science
  • Mathematical optimization
  • Mathematics
  • Artificial Intelligence
  • Algorithm
  • Statistics
  • Econometrics
  • Economics
  • Discrete mathematics
  • Microeconomics
  • Combinatorics
  • Operations research

Selected publications

  • From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms

    Management Science · 2026-05-14 · 4 citations

    preprint · Open access

    In this work, we study how the relevance/quality and quantity of past data influence performance by analyzing a contextual Newsvendor problem, in which a decision maker trades off between underage and overage costs under uncertain demand. We consider a setting in which past demands observed under “close-by” contexts come from close-by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors, and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts, as well as granular insights on the learning curve of algorithms. This paper was accepted by Victor Martínez de Albéniz, operations management. Funding: This work was supported by the Deming Center for Operations Innovation and Excellence at Columbia Business School [Doctoral Fellowship (O. Mouchtaki)]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.02068 .

  • Leveraging AI to Evaluate Minimal Residual Disease Endpoint Surrogacy in Multiple Myeloma

    Cancer Research Communications · 2026-05-01

    article · Open access

    Minimal residual disease (MRD) has been endorsed by FDA Oncology Drugs Advisory Committee as an endpoint for accelerated approval in Multiple Myeloma (MM) based on individual patient data collected from randomized trials. However, emerging data from recent trials were not included. A novel AI-assisted framework is proposed, which automates information identification and extraction, providing up-to-date analyses that confirm moderate trial-level and strong individual-patient-level association between MRD-CR and survival endpoints in MM. Specifically, this study utilized an AI-assisted framework that identifies relevant studies and filters critical information to analyze published data via two independent objectives. Firstly, we examined the trial-level association by the coefficients of determination (R^2) and its 95% confidence intervals (CIs) based on published statistics of treatment effects on MRD and various endpoints. Next, we generated synthetic IPD with covariates through AI-curated tools to estimate the individual-level association. The AI tool searched for eligible randomized clinical trials. A total of 20 two-arm comparisons from 19 RCTs were analyzed. Trial-level analysis showed an R² of 0.71 (95% CI 0.52 - 0.89) pooling disease subpopulations. Furthermore, AI techniques were applied to create synthetic individual data, combining information extracted from Kaplan-Meier curves and subgroup analyses from published literatures. Using generated synthetic data, we estimated the individual-level correlation between MRD-CR rates and PFS outcomes with a bivariate copula model and calculated a Global OR of 7.28 (95% CI 5.60-8.95).

  • Summarizing clinical evidence utilizing large language models for cancer treatments: a blinded comparative analysis

    Frontiers in Digital Health · 2025-04-29 · 2 citations

    article · Open access

    Background: Concise synopses of clinical evidence support treatment decision-making but are time-consuming to curate. Large language models (LLMs) offer potential, but they may provide inaccurate information. We objectively assessed the abilities of four commercially available LLMs to generate synopses for six treatment regimens in multiple myeloma and amyloid light chain (AL) amyloidosis. Methods: We compared the performance of four LLMs: Claude 3.5, ChatGPT 4.0, Gemini 1.0, and Llama-3.1. Each LLM was prompted to write synopses for six regimens. Two hematologists independently assessed accuracy, completeness, relevance, clarity, coherence, and hallucinations using Likert scales. Mean scores with 95% confidence intervals (CI) were calculated across all domains and inter-rater reliability was evaluated using Cohen's quadratic weighted kappa. Results: Claude demonstrated the highest performance in all domains, outperforming the other LLMs in accuracy: mean Likert score 3.92 (95% CI 3.54-4.29); ChatGPT 3.25 (2.76-3.74); Gemini 3.17 (2.54-3.80); Llama 1.92 (1.41-2.43); completeness: mean Likert score 4.00 (3.66-4.34); ChatGPT 2.58 (2.02-3.15); Gemini 2.58 (2.02-3.15); Llama 1.67 (1.39-1.95); and extent of hallucinations: mean Likert score 4.00 (4.00-4.00); ChatGPT 2.75 (2.06-3.44); Gemini 3.25 (2.65-3.85); Llama 1.92 (1.26-2.57). Llama performed considerably worse across all the studied domains. ChatGPT and Gemini had intermediate performance. Notably, none of the LLMs registered perfect accuracy, completeness, or relevance. Conclusion: Claude performed at a consistently higher level than the other LLMs, but all tested LLMs required careful editing from a domain expert to become usable. More time will be needed to determine the suitability of LLMs to independently generate clinical synopses.

  • Forward-backward Contention Resolution Schemes for Fair Rationing

    2025-07-02

    article · Open access · 1st author · Corresponding

    We use contention resolution schemes (CRS) to derive algorithms for the fair rationing of a single resource when agents have stochastic demands. We aim to provide ex-ante guarantees on the level of service provided to each agent, who may measure service in different ways (Type-I, II, or III), calling for CRS under different feasibility constraints (rank-1 matroid or knapsack). We are particularly interested in two-order CRS where the agents are equally likely to arrive in a known forward order or its reverse, which is motivated by online rationing at food banks. Indeed, for a mobile pantry driving along cities to ration food, it is equally efficient to drive that route in reverse on half of the days, and we show that doing so significantly improves the service guarantees that are possible, being more "fair" to the cities at the back of the route.

  • Beyond IID: Data-Driven Decision Making in Heterogeneous Environments

    Management Science · 2025-05-07 · 1 citation

    article

    How should one leverage historical data when past observations are not perfectly indicative of the future, for example, because of the presence of unobserved confounders which one cannot “correct” for? Motivated by this question, we study a data-driven decision-making framework in which historical samples are generated from unknown and different distributions assumed to lie in a heterogeneity ball with known radius and centered around the (also) unknown future (out-of-sample) distribution on which the performance of a decision will be evaluated. This work aims to analyze the performance of central data-driven policies and also near-optimal ones in these heterogeneous environments, and it aims to understand key drivers of performance. We establish a first result that allows us to upper bound the asymptotic worst-case regret of a broad class of policies. Leveraging this result, for any integral probability metric, we provide a general analysis of the performance achieved by sample average approximation (SAA) as a function of the radius of the heterogeneity ball. This analysis is centered around the approximation parameter, a notion of complexity we introduce to capture how the interplay between the heterogeneity and the problem structure impacts the performance of SAA. In turn, we illustrate, through several widely studied problems—for example, newsvendor, pricing—how this methodology can be applied and find that the performance of SAA varies considerably depending on the combinations of problem classes and heterogeneity. The failure of SAA for certain instances motivates the design of alternative policies to achieve rate optimality. We derive problem-dependent policies achieving strong guarantees for the illustrative problems described above and provide initial results toward a principled approach for the design and analysis of general rate-optimal algorithms. This paper was accepted by Vivek Farias, data science. 
    Supplemental Material: The online appendix is available at https://doi.org/10.1287/mnsc.2022.03448 .

  • AI for evidence-based treatment recommendation in oncology: a blinded evaluation of large language models and agentic workflows

    Frontiers in Artificial Intelligence · 2025-12-09 · 1 citation

    article · Open access · Senior author · Corresponding

    Background: Evidence-based medicine is crucial for clinical decision-making, yet studies suggest that a significant proportion of treatment decisions do not fully incorporate the latest evidence. Large Language Models (LLMs) show promise in bridging this gap, but their reliability for medical recommendations remains uncertain. Methods: We conducted an evaluation study comparing five LLMs' recommendations across 50 clinical scenarios related to multiple myeloma diagnosis, staging, treatment, and management, using a unified evidence cutoff of June 2024. The evaluation included three general-purpose LLMs (OpenAI o1-preview, Claude 3.5 Sonnet, Gemini 1.5 Pro), one retrieval-augmented generation (RAG) system (Myelo), and one agentic workflow-based system (HopeAI). General-purpose LLMs generated responses based solely on their internal knowledge, while the RAG system enhanced these capabilities by incorporating external knowledge retrieval. The agentic workflow system extended the RAG approach by implementing multi-step reasoning and coordinating with multiple tools and external systems for complex task execution. Three independent hematologist-oncologists evaluated the LLM-generated responses using standardized scoring criteria developed specifically for this study. Performance assessment encompassed five dimensions: accuracy, relevance, comprehensiveness, hallucination rate, and clinical use readiness. Results: HopeAI demonstrated superior performance across accuracy (82.0%), relevance (85.3%), and comprehensiveness (74.0%), compared to OpenAI o1-preview (64.7, 57.3, 36.0%), Claude 3.5 Sonnet (50.0, 51.3, 29.3%), Gemini 1.5 Pro (48.0, 46.0, 30.0%), and Myelo (58.7, 56, 32.7%). Hallucination rates were consistently low across all systems: HopeAI (5.3%), OpenAI o1-preview (3.3%), Claude 3.5 Sonnet (10.0%), Gemini 1.5 Pro (8.0%), and Myelo (5.3%). 
    Clinical use readiness scores were relatively low for all systems: HopeAI (25.3%), OpenAI o1-preview (6.0%), Claude 3.5 Sonnet (2.7%), Gemini 1.5 Pro (4.0%), and Myelo (4.0%). Conclusion: This study demonstrates that while current LLMs show promise in medical decision support, their recommendations require careful clinical supervision to ensure patient safety and optimal care. Further research is needed to improve their clinical use readiness before integration into oncology workflows. These findings provide valuable insights into the capabilities and limitations of LLMs in oncology, guiding future research and development efforts toward integrating AI into clinical workflows.

  • Tightness Without Counterexamples: A New Approach and New Results for Prophet Inequalities

    Mathematics of Operations Research · 2025-04-29

    article

    Prophet inequalities consist of many beautiful statements that establish tight performance ratios between online and offline allocation algorithms. Typically, tightness is established by constructing an algorithmic guarantee and a worst-case instance separately, whose bounds match as a result of some “ingenuity.” In this paper, we instead formulate the construction of the worst-case instance as an optimization problem, which directly finds the tight ratio without needing to construct two bounds separately. Our analysis of this optimization problem involves identifying structure in a new “Type Coverage” dual problem. It can be seen as akin to the celebrated Magician and OCRS (Online Contention Resolution Scheme) problems, except more general, in that it can also provide tight ratios relative to the optimal offline allocation, whereas the earlier problems only establish tight ratios relative to the ex ante relaxation of the offline problem. Through this analysis, our paper provides a unified framework that derives new prophet inequalities and recovers existing ones, with our principal results being twofold. First, we show that the “oblivious” method of setting a static threshold due to Chawla et al., surprisingly, is best-possible among all static threshold algorithms, under any number k of selection slots. We emphasize that this result is derived without needing to explicitly find any counterexample instances. This implies the tightness of the asymptotic convergence rate of [Formula: see text] for static threshold algorithms from Hajiaghayi et al. Turning to the independent and identically distributed setting, our second principal result is to use our framework to characterize the tight guarantee (of adaptive algorithms) under any number k of selection slots and any fixed number of agents n.

  • Dynamic Pricing for Reusable Resources: The Power of Two Prices

    Operations Research · 2025-08-21 · 1 citation

    article

    Two Prices Unlock Big Gains for Reusable Resources. How much sophistication is needed to price reusable resources, like hotel rooms and cloud computing, when usage durations are not memoryless? Surprisingly little. In “Dynamic Pricing for Reusable Resources: The Power of Two Prices,” Balseiro, Ma, and Zhang propose a class of dynamic stock-dependent policies that achieve significant improvements over static pricing by only looking at how many units are busy and ignoring how long they have been busy. Using an “insensitivity” property of loss networks, they show that optimizing within this policy class can be formulated as a tractable convex optimization problem. Better yet, the performance loss of the optimal stock-dependent policy can be achieved by a simple two-price policy: charge a high price when inventory falls below a threshold and a low price otherwise. Extensions to multiple resources and customer classes, together with extensive simulations, confirm that “just a little” dynamicity can go a long way.

  • The Benefits of Delay to Online Decision Making

    Management Science · 2025-08-06 · 2 citations

    article

    Real-time decisions are usually irrevocable in many contexts of online decision making. One common practice is delaying real-time decisions so that the decision maker can gather more information to make better decisions. For example, in online retailing, there is typically a time delay between when an online order is received and when it gets picked and assembled for shipping. However, decisions cannot be delayed forever. In this paper, we study this fundamental trade-off and aim to theoretically characterize the benefits of delaying real-time decisions. We provide a theoretical foundation for a broad family of online decision-making problems by proving that the gap between our proposed online algorithm (called “delayed Bayesian prophet”) and the offline optimal hindsight policy decays exponentially fast in the length of delay. We also conduct extensive numerical experiments on the benefits of delay, using both synthetic data and publicly available real data. Both our theoretical and empirical results demonstrate an important managerial insight: a little delay is all we need. Finally, we extend our analysis and results to the setting where the arrival distribution is independent but nonidentical, the setting where the arrival distribution is unknown, and the setting where decisions are made in batches. This paper was accepted by Jeannette Song, operations management. Funding: The second author was partially funded by a grant from Amazon.com Inc., awarded through collaboration with the Columbia Center of AI Technology. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.00549 .

  • Degeneracy Is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions

    Operations Research · 2025-02-25 · 3 citations

    article

    The network revenue management problem is a fundamental problem in online decision making with resource constraints. Tremendous efforts have been devoted to deriving near-optimal policies for the network revenue management problem with a theoretical guarantee better than the classic $O(\sqrt{T})$ regret. Logarithmic or better regret has been derived, however, with an additional nondegeneracy assumption that requires the underlying fluid approximation to enjoy a unique optimal basis. This work relaxes the nondegeneracy assumption and achieves logarithmic regret for network revenue management problems for general indiscrete reward distribution. To achieve these advances, this work develops several new techniques, including a new method of bounding myopic regret, a semifluid relaxation of the off-line allocation, and an improved bound on the dual convergence, which has the potential to inspire other works. All in all, this work takes a fundamental step toward relaxing the nondegeneracy assumption, which traditionally limits the scope of online algorithms.

Frequent coauthors

  • David Simchi‐Levi

    64 shared
  • Chung‐Piaw Teo

    26 shared
  • Jiashuo Jiang

    10 shared
  • Calum MacRury

    Columbia University

    10 shared
  • Nathaniel Grammel

    University of Maryland, College Park

    10 shared
  • Jinglong Zhao

    Boston University

    9 shared
  • Brian Brubach

    Wellesley College

    7 shared
  • Aravind Srinivasan

    6 shared