Jia Li

Verified

Pennsylvania State University · Social Data Analytics

Active 1988–2024

h-index: 70

Citations: 25.1k

Papers: 1.3k (632 in the last 5 years)

Funding: $1.2M



Research topics

Artificial Intelligence
Computer Science
Operating system
Geology
Physical geography
Civil engineering
Programming language
Environmental science
Meteorology
Mathematics
Software engineering
Climatology
Engineering
Geography
World Wide Web
Cartography

Selected publications

StarCoder: may the source be with you!
arXiv (Cornell University) · 2023 · 192 citations
- Computer Science
- Artificial Intelligence
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Validation of seven global remotely sensed ET products across Thailand using water balance measurements and land use classifications
Journal of Hydrology Regional Studies · 2020 · 46 citations
- Environmental science
- Climatology
- Geography
Annual and monthly ET values from seven global remote sensing products (ALEXI, CMRSET, ETMonitor, GLEAM V3.3b, MOD16A2, SEBS V3 and SSEBop) were validated for 172 sub-basins in Thailand. This study describes a generalised validation procedure that uses rainfall (P), streamflow (Q) and storage change data (from the Gravity Recovery and Climate Experiment - TWSC_GRACE) and land use information. For each sub-basin, bulk ET was computed using the water balance framework and compared to estimates by ET products. Inverse water balance computations were applied to infer the storage change estimates from each product (ΔS = P - Q - ET_RS), which were compared to TWSC_GRACE to assess their monthly scale performances. All products performed very well on the annual basis (mean NSE > 0.96) and satisfactorily on the monthly scale (mean NSE > 0.65). Land use classifications from the Land Development Department were used to examine the ability of four candidates (CMRSET, MOD16A2, GLEAM V3.3b and ETMonitor) to provide ET estimates with correspondence to physical land use conditions. By also considering product resolutions and data accessibility, MOD16A2 was consensually shown to be the most promising product to be used for water resources management in Thailand. In addition to local applications, the outcomes emanate the potential for utilisation on the global scale which should be further investigated.

Recent grants

Statistical Learning for Image Annotation
NSF · $325k · 2015–2019
EAGER-DynamicData: Generative Statistical Modeling for Dynamic and Distributed Data
NSF · $250k · 2015–2018
Modeling the Impact of Releasing Genetically Altered Mosquitoes in Preventing the Transmission of Mosquito-Borne Diseases
NSF · $110k · 2004–2009
Parametric and nonparametric regressions on spot volatility
NSF · $256k · 2013–2017
Estimation and Inference Methods for Continuous-Time Models
NSF · $50k · 2012–2013

Frequent coauthors

Massimo Menenti
741 shared
Meng Li
Anhui Normal University
725 shared
Xuan Wang
Nanjing Health and Health Commission
360 shared
Lin Lin
Ningbo University
351 shared
Chaolei Zheng
Aerospace Information Research Institute
301 shared
Jing Lu
Aerospace Information Research Institute
221 shared
Jie Zhou
200 shared
Guangcheng Hu
Aerospace Information Research Institute
198 shared
