Jia Li
Pennsylvania State University · Social Data Analytics
Active 1988–2024
Research topics
- Artificial Intelligence
- Computer Science
- Operating system
- Geology
- Physical geography
- Civil engineering
- Programming language
- Environmental science
- Meteorology
- Mathematics
- Software engineering
- Climatology
- Engineering
- Geography
- World Wide Web
- Cartography
Selected publications
StarCoder: may the source be with you!
arXiv (Cornell University) · 2023 · 192 citations
- Computer Science
- Artificial Intelligence
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Journal of Hydrology Regional Studies · 2020 · 46 citations
- Environmental science
- Climatology
- Geography
Annual and monthly ET values from seven global remote sensing products (ALEXI, CMRSET, ETMonitor, GLEAM V3.3b, MOD16A2, SEBS V3 and SSEBop) were validated for 172 sub-basins in Thailand. This study describes a generalised validation procedure that uses rainfall (P), streamflow (Q) and storage change data (from the Gravity Recovery and Climate Experiment, TWSC_GRACE) and land use information. For each sub-basin, bulk ET was computed using the water balance framework and compared to estimates by ET products. Inverse water balance computations were applied to infer the storage change estimates from each product (ΔS = P − Q − ET_RS), which were compared to TWSC_GRACE to assess their monthly scale performances. All products performed very well on the annual basis (mean NSE > 0.96) and satisfactorily on the monthly scale (mean NSE > 0.65). Land use classifications from the Land Development Department were used to examine the ability of four candidates (CMRSET, MOD16A2, GLEAM V3.3b and ETMonitor) to provide ET estimates with correspondence to physical land use conditions. By also considering product resolutions and data accessibility, MOD16A2 was consensually shown to be the most promising product to be used for water resources management in Thailand. In addition to local applications, the outcomes emanate the potential for utilisation on the global scale which should be further investigated.
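The water balance checks described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, NSE is assumed to be the standard Nash-Sutcliffe efficiency, and all inputs are assumed to be in the same units (for example mm per month).

```python
def water_balance_et(p, q, ds):
    """Bulk ET per time step from the water balance: ET = P - Q - dS."""
    return [pi - qi - dsi for pi, qi, dsi in zip(p, q, ds)]


def inferred_storage_change(p, q, et_rs):
    """Inverse water balance: dS = P - Q - ET_RS for one remote sensing product."""
    return [pi - qi - ei for pi, qi, ei in zip(p, q, et_rs)]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency (assumed definition of NSE).

    1.0 is a perfect match; 0.0 means the estimate is no better
    than the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

In this sketch, a product's inferred storage change would be passed to `nse` as `simulated`, with the GRACE-derived storage change as `observed`, to score its monthly performance for one sub-basin.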
Recent grants
Statistical Learning for Image Annotation
NSF · $325k · 2015–2019
EAGER-DynamicData: Generative Statistical Modeling for Dynamic and Distributed Data
NSF · $250k · 2015–2018
NSF · $110k · 2004–2009
Parametric and nonparametric regressions on spot volatility
NSF · $256k · 2013–2017
Estimation and Inference Methods for Continuous-Time Models
NSF · $50k · 2012–2013
Frequent coauthors
- Massimo Menenti · 741 shared
- Meng Li · Anhui Normal University · 725 shared
- Xuan Wang · Nanjing Health and Health Commission · 360 shared
- Lin Lin · Ningbo University · 351 shared
- Chaolei Zheng · Aerospace Information Research Institute · 301 shared
- Jing Lu · Aerospace Information Research Institute · 221 shared
- Jie Zhou · 200 shared
- Guangcheng Hu · Aerospace Information Research Institute · 198 shared