
Paul Liu
Director of International Affairs, COS; Professor · North Carolina State University · Earth Sciences
Active 1971–2026
About
Paul Liu is a Professor in the Marine, Earth and Atmospheric Sciences department at NC State University. He serves as the Director of the AI Hub for Science and the Director of International Affairs for the College of Sciences. His research focuses on AI and Large Language Model (LLM) applications in the geosciences, including fine-tuning LLMs, agent and RAG building, and database administration. Liu has expertise in large dataset processing and modeling, numerical modeling for sediment transport, and paleo-climatic and paleo-environmental implications of late Quaternary sea-level changes. His work also involves studying riverine sediment dispersal, transport, and accumulation in continental margin environments, particularly those associated with large Asian rivers such as the Yellow, Yangtze, Pearl, Red, Mekong, Irrawaddy, and Salween. Additionally, he specializes in land-ocean interactions, postglacial sea-level rise, stratigraphic sequence formation, and nearshore and offshore seafloor digital mapping using high-resolution sonar systems. Liu holds a Ph.D. in Geological Oceanography from the Virginia Institute of Marine Science, College of William & Mary, and has a background in marine Quaternary geology, hydrology, engineering geology, and marine geology from institutions in China and the United States. He is also a faculty fellow/member at several research centers and academies, including the Center for Geospatial Analytics and the Data Science and AI Academy.
Research topics
- Geology
- Geomorphology
- Oceanography
- Paleontology
- Physical geography
- Geography
- Geochemistry
Selected publications
Discover Geoscience · 2026-01-14
Article · Open access · Senior author
This study explores the application of machine learning for facies classification in complex sandstone formations with overlapping petrophysical features. Three tree-based ensemble models, Random Forest, XGBoost and CatBoost, were compared with one non-ensemble supervised model, Support Vector Machine, and one unsupervised model, K-Means. Synthetic data were generated using Latin Hypercube Sampling to expand the feature space, and feature selection was performed using Random Forest importance, Recursive Feature Elimination and Lasso regression. Principal Component Analysis was used to examine feature relationships and reduce dimensional complexity. Model performance was evaluated using classification accuracy, precision-recall analysis, ROC curves and confusion matrices to assess both overall and facies-level prediction reliability. CatBoost achieved the highest cross-validation accuracy at 95.4%, followed by XGBoost at 93.7%, Random Forest at 89.5% and Support Vector Machine at 85.6%. K-Means showed the lowest performance with 49.7% accuracy. The results show that ensemble and supervised models provided more consistent and accurate classifications, especially for distinguishing subtle differences between similar facies. The results also highlight the role of synthetic data in improving model generalization and the value of combining multiple feature selection methods. These findings support the integration of machine learning and data augmentation as a practical and scalable workflow for improving facies prediction and reservoir characterization in geologically variable and data-limited settings.
Natural Resources Research · 2026-03-19
Article · Open access · Senior author
Predicting microporosity and permeability in clastic reservoirs is a challenge in reservoir quality assessment, especially in formations where direct measurements are difficult or expensive. These reservoir properties are fundamental in determining a reservoir’s capacity for fluid storage and transmission, yet conventional methods for evaluating them, such as mercury injection capillary pressure and scanning electron microscopy, are resource-intensive. The aim of this study was to develop a cost-effective machine learning model to predict complex reservoir properties using readily available field data and basic laboratory analyses. A random forest classifier was employed, utilizing key geological parameters such as porosity, grain size distribution, and spectral gamma-ray measurements. An uncertainty analysis was applied to account for natural variability, expanding the dataset and enhancing the model’s robustness. The model achieved a high level of accuracy in predicting microporosity (93%) and permeability levels (88%) using cross-validation and independent holdout evaluation. By using easily obtainable data, this model reduces the reliance on expensive laboratory methods, making it a valuable tool for early-stage exploration, especially in remote or offshore environments. By incorporating uncertainty, the model reflects geological variability more realistically and improves confidence in predictions, particularly when decisions must be made with limited or noisy input data. The integration of machine learning with uncertainty analysis provides a reliable and cost-effective approach for evaluating key reservoir properties in siliciclastic formations. This model offers a practical solution to improve reservoir quality assessments, enabling more informed decision-making in exploration planning and risk reduction, especially where direct measurements are limited.
Generative AI for Science - A Hands-On Guide for Students and Researchers
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-02
Book · Open access · 1st author · Corresponding
About This Book
Generative AI for Science is a comprehensive, hands-on guide for researchers, students, and practitioners who want to apply cutting-edge AI techniques to scientific discovery. This book bridges the gap between AI/ML expertise and domain science, providing practical implementations across chemistry, biology, physics, geoscience, and beyond.
"Generative AI does not replace the scientific method; it enhances it. It expands the space of hypotheses we can explore, sharpens experimental design, and reveals patterns hidden in complexity."
What Makes This Book Different
- Theory Meets Practice: Every concept is paired with ready-to-run code
- Interactive Learning: All examples provided as Google Colab notebooks, no installation required
- Real Scientific Problems: Examples from authentic research across multiple domains
- Accessible Yet Rigorous: Suitable for domain scientists exploring AI and ML experts entering scientific applications
Who Is This For?
- Domain Scientist: AI skills to accelerate your research
- ML Engineer: Scientific applications for your expertise
- Graduate Student: Complete curriculum with hands-on projects
- Industry Practitioner: Production-ready code and best practices
What You Will Learn
By the end of this book, you will:
- Understand key AI architectures: Transformers, Diffusion Models, VAEs, and GNNs
- Represent scientific data types effectively for AI models
- Apply generative models to problems in climate science, drug discovery, genomics, materials science, and more
- Follow best practices around ethics, reproducibility, and deployment
- Stay current with emerging methods and future directions
- Develop the intuition to know when and how to apply AI to scientific research
Forecasting Nile Delta Shoreline Change Until 2050 Using a Shallow Neural Network
2025-08-11
Preprint · Open access · Senior author
The Nile Delta is among the world’s most vulnerable coasts, facing intensified shoreline retreat from sea level rise, subsidence, sediment decline, and human interventions. While past studies emphasized historical trends, this study is the first to apply a data-driven framework using a shallow artificial neural network to predict shoreline evolution over the next 30 years. The model was trained on satellite-derived shoreline data from 1992–2022, incorporating wave energy, sea level, land cover, and subsidence, with feature selection and uncertainty analysis to improve model performance. The simulation was validated using 2022 shoreline data. Forecast changes show high agreement with observed patterns. Projections indicate increasing erosion, with cumulative land losses of 5.3, 10.7, and 18.4 km² by 2030, 2040, and 2050, respectively, particularly near Burullus Lake, where a major geohazard shift could happen. This AI framework offers a practical tool to support long-term coastal management and targeted adaptation strategies for world deltas.
ArXiv.org · 2025-03-21
Preprint · Open access · Senior author
Predicting microporosity and permeability in clastic reservoirs is a challenge in reservoir quality assessment, especially in formations where direct measurements are difficult or expensive. These reservoir properties are fundamental in determining a reservoir's capacity for fluid storage and transmission, yet conventional methods for evaluating them, such as Mercury Injection Capillary Pressure (MICP) and Scanning Electron Microscopy (SEM), are resource-intensive. The aim of this study is to develop a cost-effective machine learning model to predict complex reservoir properties using readily available field data and basic laboratory analyses. A Random Forest classifier was employed, utilizing key geological parameters such as porosity, grain size distribution, and spectral gamma-ray (SGR) measurements. An uncertainty analysis was applied to account for natural variability, expanding the dataset and enhancing the model's robustness. The model achieved a high level of accuracy in predicting microporosity (93%) and permeability levels (88%). By using easily obtainable data, this model reduces the reliance on expensive laboratory methods, making it a valuable tool for early-stage exploration, especially in remote or offshore environments. The integration of machine learning with uncertainty analysis provides a reliable and cost-effective approach for evaluating key reservoir properties in siliciclastic formations. This model offers a practical solution to improve reservoir quality assessments, enabling more informed decision-making and optimizing exploration efforts.
2025-06-10
Preprint · Open access · Senior author
Facies classification plays a critical role in characterizing subsurface heterogeneity and supporting effective reservoir development. Traditional methods, which often rely on core interpretation and manual log analysis, are limited by subjective interpretation and sparse data coverage. This study aims to improve facies prediction by comparing the performance of five machine learning models: Random Forest, XGBoost, Support Vector Machine, CatBoost, and K-Means clustering. The dataset is derived from sandstone formations in Labuan Island, Malaysia, and is enhanced using synthetic data generated through Latin Hypercube Sampling to address data scarcity. Feature selection is performed using three independent techniques to identify the most informative variables, and Principal Component Analysis is used to investigate feature relationships. Model evaluation is based on classification accuracy, precision-recall metrics, receiver operating characteristic curves, and confusion matrices. Among the models tested, CatBoost achieved the highest cross-validation accuracy at 95.4%, followed by XGBoost at 93.7%. Random Forest achieved a test accuracy of 89.5%, while Support Vector Machine performed less reliably with a test accuracy of 85.6%. The K-Means clustering approach yielded an overall accuracy of 49.7% in aligning predicted clusters with true facies labels. The results demonstrate the effectiveness of ensemble methods in facies classification and support the use of augmented data in enhancing model performance. This approach provides a practical framework for applying machine learning in geological settings, with potential benefits for reservoir modeling and development planning.
Research Square · 2025-07-11
Preprint · Open access · Senior author
2025-03-18 · 3 citations
Review · Open access · Senior author
The dynamic nature of coastal zones is characterized by continuous change in shoreline position due to natural and anthropogenic processes. These changes present challenges for coastal management and conservation efforts. Traditionally, shoreline change analysis relied mainly on empirical observations and numerical models, which were limited in dealing with complex, multi-dimensional interactions along coasts. Recent decades have witnessed an integration of machine learning (ML) techniques into coastal studies to predict shoreline changes. This review aims to provide a general overview of the development of shoreline modeling and the evolution of ML applications in the field. The review synthesizes findings from 18 research papers, tracing the development of shoreline prediction methodologies from early empirical models to modern ML-based frameworks. The analysis highlights a shift from deterministic approaches to data-driven models that leverage multiple ML techniques for improved predictions. By comparing different modeling approaches over time, this study evaluates the effectiveness of ML in capturing shoreline dynamics and enhancing predictive capabilities. The review shows that new methods can significantly enhance shoreline modeling, offering improved predictive power and new insights into coastal dynamics. The findings suggest future research directions in the context of climate change and increasing human interventions.
Journal of Geophysical Research Biogeosciences · 2025-09-28
Article
Deep-sea fans represent the largest sediment and organic carbon (OC) accumulation zones on Earth. However, variations of sedimentary OC sequestration in deep-sea fans during the last sea level rise have not been well evaluated. Here, a gravity core (4.24 m) retrieved from the inner flank of the active channel in the Lower Bengal Fan was analyzed for mineralogy, inorganic elements, total OC (TOC) and carbon isotopes (δ¹³C, Δ¹⁴C), and lignin phenols to reconstruct sources and accumulation rates of sediment and OC over the past 15 ka. The results showed significantly higher TOC accumulation rate (TOC_AR, 443 ± 221 mg/cm²/ka), terrestrial OC proportion (53 ± 5%), and burial efficiency (37 ± 8%) during sea-level lowstand (15–10 ka) than during the following sea-level highstand (10–2 ka, 7 ± 2 mg/cm²/ka, 39 ± 6%, 22 ± 4%) due to considerable decline of terrestrial sediment and OC supply when the sea level was high. This was further evidenced by decreasing lignin content (0.46 ± 0.30 vs. 0.02 ± 0.02 mg/100 mg OC) and pre-depositional age (4,607 ± 300 vs. 2,650 ± 933 years). At 2–0 ka, a slight increase in these parameters was most likely due to enhanced anthropogenic interference. The re-evaluated TOC_AR and burial efficiency for global deep-sea fans during the Holocene and the last deglaciation are higher than for deep-sea plains (>1,000 m) and upwelling regions, suggesting deep-sea fans are hotspots of OC sequestration. This study highlights the role of active channels of deep-sea fans in modulating OC biogeochemistry under global climate change.
Recent grants
NSF · $239k · 2017–2022
U.S.-Vietnam Planning Visit: Study of the Mekong River-derived sediment in the South China Sea
NSF · $11k · 2005–2007
Sediment Flux and Fate of the Yangtze River Sediments Delivered to the East China Sea
NSF · $63k · 2004–2007
Frequent coauthors
- Yonggang Jia · Ocean University of China · 55 shared
- Xuefa Shi · Ministry of Natural Resources · 52 shared
- Xiaolei Liu · Ocean University of China · 52 shared
- Jiwei Tian · Ocean University of China · 50 shared
- Zhuangcai Tian · China University of Mining and Technology · 50 shared
- Jiangxin Chen · 50 shared
- Chunsheng Ji · China Geological Survey · 49 shared
- Shaotong Zhang · Ocean University of China · 49 shared
Awards & honors
- NC State Provost Faculty Fellow–Global Leadership (2019)