Shiqian Ma

Professor of Computational Applied Mathematics and Operations Research · Verified

Rice University · Computing and Mathematical Sciences

Active 2006–2026

h-index: 38

Citations: 5.4k

Papers: 202 · 83 last 5y

Funding: $1.4M · 2 active



About

Shiqian Ma is a Professor of Computational Applied Mathematics and Operations Research at Rice University. His research areas include optimization and machine learning. He holds a Ph.D. from the Department of Industrial Engineering and Operations Research at Columbia University, obtained in 2011. Ma also earned a Master's degree from the Institute of Computational Mathematics and Scientific/Engineering Computing at the Chinese Academy of Sciences in 2006, and a Bachelor's degree from the School of Mathematical Sciences at Peking University in 2003. He is a member of the Ken Kennedy Institute and is involved in teaching operations research, optimization, and machine learning.

Research topics

Artificial Intelligence
Computer Science
Mathematics
Mathematical optimization
Algorithm
Applied mathematics
Pure mathematics
Mathematical analysis
Combinatorics

Selected publications

Demystifying Manifold Constraints in LLM Pre-training
ArXiv.org · 2026-05-06
article · Open access · Senior author
The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints still remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the Msign-Aligned Constrained Riemannian Optimizer (MACRO), a provably convergent, single-loop optimization framework, our study disentangles weight regularization heuristics from interacting mechanisms like RMS normalization and decoupled weight decay. Theoretical analyses and comprehensive empirical evaluations reveal that manifold constraints independently bound forward activation scales and enforce stable rotational equilibrium, thereby subsuming the roles of these heuristic mechanisms. Evaluations on large-scale LLM architectures demonstrate that MACRO achieves highly competitive performance while rigorously preserving the theoretical guarantees of exact Riemannian optimization.
Fast Sparse Nonnegative Matrix Factorization with Manifold Acceleration
2026-04-21
article · Senior author
In this paper, we propose a fast sparse Nonnegative Matrix Factorization algorithm incorporating manifold identification techniques. Within an alternating update framework, it adaptively leverages the algorithm’s inherent manifold identification information to accelerate subproblem solutions, thereby enhancing computational efficiency. Numerical experiments demonstrate that our algorithm shows superior performance compared to existing methods, achieving better solutions with faster convergence rates, particularly under high sparsity requirements. We provide a global convergence guarantee for the algorithm. Regarding the locally linear convergence observed experimentally, under a set of assumptions, we develop a proof strategy for general cases. Furthermore, we furnish a complete proof for the vector case.
AutoBalance: An Automatic Balancing Framework for Training Physics-Informed Neural Networks
ArXiv.org · 2025-10-08
preprint · Open access · Senior author
Physics-Informed Neural Networks (PINNs) provide a powerful and general framework for solving Partial Differential Equations (PDEs) by embedding physical laws into loss functions. However, training PINNs is notoriously difficult due to the need to balance multiple loss terms, such as PDE residuals and boundary conditions, which often have conflicting objectives and vastly different curvatures. Existing methods address this issue by manipulating gradients before optimization (a "pre-combine" strategy). We argue that this approach is fundamentally limited, as forcing a single optimizer to process gradients from spectrally heterogeneous loss landscapes disrupts its internal preconditioning. In this work, we introduce AutoBalance, a novel "post-combine" training paradigm. AutoBalance assigns an independent adaptive optimizer to each loss component and aggregates the resulting preconditioned updates afterwards. Extensive experiments on challenging PDE benchmarks show that AutoBalance consistently outperforms existing frameworks, achieving significant reductions in solution error, as measured by both the MSE and $L^{\infty}$ norms. Moreover, AutoBalance is orthogonal to and complementary with other popular PINN methodologies, amplifying their effectiveness on demanding benchmarks.
First-Order Federated Bilevel Learning
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11
article · Open access
Federated bilevel optimization (FBO) has garnered significant attention lately, driven by its promising applications in meta-learning and hyperparameter optimization. Existing algorithms generally aim to approximate the gradient of the upper-level objective function (hypergradient) in the federated setting. However, because of the nonlinearity of the hypergradient and client drift, they often involve complicated computations. These computations, like multiple optimization sub-loops and second-order derivative evaluations, end up with significant memory consumption and high computational costs. In this paper, we propose a computationally and memory-efficient FBO algorithm named MemFBO. MemFBO features a fully single-loop structure with all involved variables updated simultaneously, and uses only first-order gradient information for all local updates. We show that MemFBO exhibits a linear convergence speedup with milder assumptions in both partial and full client participation scenarios. We further implement MemFBO in a novel FBO application for federated data cleaning. Our experiments, conducted on this application and federated hyper-representation, demonstrate the effectiveness of the proposed algorithm.
Efficient OPF calculations for power system reliability assessment based on state similarity
Applied Energy · 2025-11-24
article · Open access
AdaBB: Adaptive Barzilai-Borwein Method for Convex Optimization
Mathematics of Operations Research · 2025-03-31 · 2 citations
article
In this paper, we propose AdaBB, an adaptive gradient method based on the Barzilai-Borwein stepsize. The algorithm is line-search-free and parameter-free, and it essentially provides a convergent variant of the Barzilai-Borwein method for general convex optimization problems. We analyze the ergodic convergence of the objective function value and the convergence of the iterates for solving general convex optimization problems. Compared with existing works along this line of research, our algorithm gives the best lower bounds on the stepsize and the average of the stepsizes. Furthermore, we present extensions of the proposed algorithm for solving locally strongly convex and composite convex optimization problems where the objective function is the sum of a smooth function and a nonsmooth function. In the case of local strong convexity, we achieve linear convergence. Our numerical results also demonstrate very promising potential of the proposed algorithms on some representative examples. Funding: S. Ma is supported by the National Science Foundation [Grants DMS-2243650, CCF-2308597, CCF-2311275, and ECCS-2326591] and a startup fund from Rice University. J. Yang is supported by the National Natural Science Foundation of China [Grants 12431011 and 12371301] and the Natural Science Foundation for Distinguished Young Scholars of Gansu Province [Grant 22JR5RA223].
On the Convergence of Constrained Gradient Method
ArXiv.org · 2025-11-21
preprint · Open access
The constrained gradient method (CGM) has recently been proposed to solve convex optimization and monotone variational inequality (VI) problems with general functional constraints. While existing literature has established convergence results for CGM, the assumptions employed therein are quite restrictive; in some cases, certain assumptions are mutually inconsistent, leading to gaps in the underlying analysis. This paper aims to derive rigorous and improved convergence guarantees for CGM under weaker and more reasonable assumptions, specifically in the context of strongly convex optimization and strongly monotone VI problems. Preliminary numerical experiments are provided to verify the validity of CGM and demonstrate its efficacy in addressing such problems.
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
ArXiv.org · 2025-10-10
preprint · Open access · Senior author
We study generative modeling on convex domains using flow matching and mirror maps, and identify two fundamental challenges. First, standard log-barrier mirror maps induce heavy-tailed dual distributions, leading to ill-posed dynamics. Second, coupling with Gaussian priors performs poorly when matching heavy-tailed targets. To address these issues, we propose Mirror Flow Matching based on a regularized mirror map that controls dual tail behavior and guarantees finite moments, together with coupling to a Student-$t$ prior that aligns with heavy-tailed targets and stabilizes training. We provide theoretical guarantees, including spatial Lipschitzness and temporal regularity of the velocity field, Wasserstein convergence rates for flow matching with Student-$t$ priors, and primal-space guarantees for constrained generation, under $\varepsilon$-accurate learned velocity fields. Empirically, our method outperforms baselines in synthetic convex-domain simulations and achieves competitive sample quality on real-world constrained generative tasks.
Relaxed Proximal Point Algorithm: Tight Complexity Bounds and Acceleration Without Momentum
INFORMS Journal on Optimization · 2025-12-09
article · Open access
In this paper, we focus on the relaxed proximal point algorithm (RPPA) for solving convex (possibly nonsmooth) optimization problems. We conduct a comprehensive study on three types of relaxation schedules: (i) constant schedule with relaxation parameter [Formula: see text], (ii) a dynamic schedule put forward by Teboulle and Vaisbourd, and (iii) the silver step-size schedule proposed by Altschuler and Parrilo. The latter two schedules were initially investigated for the gradient descent (GD) method and are extended to the RPPA in this paper. For type (i), we establish tight nonergodic [Formula: see text] convergence rate results measured by function value residual and subgradient norm, where N denotes the iteration counter. For type (ii), we establish a convergence rate that is tight and approximately [Formula: see text] times better than the constant schedule of type (i). For type (iii), aside from the original silver step-size schedule proposed previously, we propose two new modified silver step-size schedules, and for all the three silver step-size schedules, [Formula: see text] accelerated convergence rate results with respect to three different performance metrics are established. Furthermore, our research affirms a previous conjecture by Luner and Grimmer on the GD method with the original silver step-size schedule. Funding: B. Wang, J. Yang, and D. Zhou were supported by the National Natural Science Foundation of China [Grants 12431011 and 12371301] and the Key Laboratory of Numerical Simulation of Large Scale Complex Systems of the Ministry of Education of China. S. Ma was supported in part by the National Science Foundation [Grants CCF-2311275 and ECCS-2326591].
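As an illustration of the relaxed proximal point algorithm (RPPA) named in the entry above, here is a minimal sketch of the textbook iteration x_{k+1} = x_k + rho * (prox(x_k) - x_k) with a constant relaxation parameter, applied to f(x) = |x|. It is not the paper's code: the function names `soft_threshold` and `relaxed_ppa`, the test function, the proximal parameter `lam = 1.0`, and the value `rho = 1.5` are all assumptions chosen for illustration, and neither the dynamic schedule nor the silver step-size schedules are implemented.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x| for a scalar x (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def relaxed_ppa(x0, prox, rho=1.5, num_iters=20):
    """Run x_{k+1} = x_k + rho * (prox(x_k) - x_k) with a constant rho.

    rho = 1.5 is an illustrative assumption, not a value taken from the paper.
    Returns the list of iterates, starting with x0.
    """
    x = x0
    history = [x]
    for _ in range(num_iters):
        # Move from x toward (and, for rho > 1, past) the proximal point.
        x = x + rho * (prox(x) - x)
        history.append(x)
    return history


lam = 1.0  # hypothetical proximal parameter
history = relaxed_ppa(5.0, lambda x: soft_threshold(x, lam), rho=1.5, num_iters=20)
print(history[:4], history[-1])
```

With these values the iterates overshoot zero and then contract toward the minimizer x = 0; setting `rho = 1` recovers the unrelaxed proximal point method.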

Recent grants

Collaborative Research: New Methods, Theory and Applications for Nonsmooth Manifold-Based Learning
NSF · $99k · 2022–2024
Collaborative Research: Distributed Bilevel Optimization in Multi-Agent Systems
NSF · $250k · 2023–2027
Collaborative Research: CIF: Small: New Theory and Applications of Non-smooth and Non-Lipschitz Riemannian Optimization
NSF · $252k · 2022–2024
Collaborative Research: CIF: Small: New Theory, Algorithms and Applications for Large-Scale Bilevel Optimization
NSF · $300k · 2023–2027
Collaborative Research: CIF: Small: New Theory and Applications of Non-smooth and Non-Lipschitz Riemannian Optimization
NSF · $317k · 2020–2023

Frequent coauthors

Shuzhong Zhang
34 shared
Donald Goldfarb
22 shared
Shixiang Chen
Chang'an University
15 shared
Lingzhou Xue
15 shared
Tianyi Lin
Columbia University
14 shared
Jiaxiang Li
University of South China
11 shared
Bo Jiang
10 shared
Krishnakumar Balasubramanian
University of California, Davis
9 shared

Labs

Shiqian Ma's Lab · PI
Not provided
