Yao-Yi Chiang

Associate Professor, Director of Graduate Studies for Data Science

University of Minnesota · Computer Science and Engineering

Active 2004–2026

h-index: 26

Citations: 2.5k

Papers: 187 (90 in the last 5 years)




About

I work on Spatial AI topics. I am interested in developing data-driven methods that can take advantage of domain knowledge to solve complex problems. For example, we built machine learning algorithms that incorporate spatial science techniques for air quality prediction and imagery recognition. I also enjoy building working systems with my students and doing consulting work related to my research.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Meteorology
Risk analysis (engineering)
Mathematical optimization
Data science
Environmental science
Engineering
Geography
Algorithm
Mathematics
Management science

Selected publications

NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities
arXiv (Cornell University) · 2026-05-12
preprint · Open access · Senior author
Geospatial foundation models have primarily focused on raster data such as satellite imagery, where self-supervised learning has been widely studied. Vector geospatial data instead represent the world as discrete geoentities with explicit geometry, semantics, and structured spatial relations, including metric proximity and topological relationships. These relations jointly determine how entities interact within space, yet existing representation learning methods remain fragmented, often restricted to specific geometry types or partial spatial relations, limiting their ability to capture unified spatial context across heterogeneous geoentities. We propose NARA (Neural Anchor-conditioned Relation-Aware representation learning), a self-supervised framework for vector geoentities. NARA learns context-dependent representations by jointly modeling semantics, geometry, and spatial relations within a unified framework and captures relational spatial structure beyond proximity alone, enabling rich contextualized representations across heterogeneous geoentities of points, polylines, and polygons. Evaluation on building function classification, traffic speed prediction, and next point-of-interest recommendation shows consistent improvements over prior methods, highlighting the benefit of unified relational modeling for vector geospatial data.
OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis
Open MIND · 2026-02-03
preprint
Accurate dental diagnosis is essential for oral healthcare, yet many individuals lack access to timely professional evaluation. Existing AI-based methods primarily treat diagnosis as a visual pattern recognition task and do not reflect the structured clinical reasoning used by dental professionals. These approaches also require large amounts of expert-annotated data and often struggle to generalize across diverse real-world imaging conditions. To address these limitations, we present OMNI-Dent, a data-efficient and explainable diagnostic framework that incorporates clinical reasoning principles into a Vision-Language Model (VLM)-based pipeline. The framework operates on multi-view smartphone photographs, embeds diagnostic heuristics from dental experts, and guides a general-purpose VLM to perform tooth-level evaluation without dental-specific fine-tuning of the VLM. By utilizing the VLM's existing visual-linguistic capabilities, OMNI-Dent aims to support diagnostic assessment in settings where curated clinical imaging is unavailable. Designed as an early-stage assistive tool, OMNI-Dent helps users identify potential abnormalities and determine when professional evaluation may be needed, offering a practical option for individuals with limited access to in-person care.
TiCLS: Tightly Coupled Language Text Spotter
Open MIND · 2026-02-03
preprint
Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TiCLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model. TiCLS introduces a linguistic decoder that fuses visual and linguistic features, yet can be initialized by a pretrained language model, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015 and Total-Text demonstrate that TiCLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting.
Line-of-Sight Probability in Macrocells: Framework, Statistical Models, and Parametrization From Massive Real-World Datasets in the USA
IEEE Transactions on Wireless Communications · 2026-01-01
article
Accurate modeling of line-of-sight (LOS) probability is crucial for wireless channel description and coverage planning. The presence of a LOS impacts other channel characteristics such as pathloss, fading depth, delay- and angular spread, etc. Existing models, although useful, are based on very limited datasets. In this paper, we establish a framework to produce high accuracy LOS models from geospatial data in different environments, and apply it to create a LOS model for macrocells, using datasets of the United States (US) on a national scale, using more than 13,000 locations of real-world macrocells. Based on this we create a new, fully parameterized model that better describes macrocell deployments in the US than the 3GPP model. We furthermore demonstrate that for improved accuracy the LOS probability should be modeled on a per cell basis, and the model parameters treated as random variables; we provide a full description and parameterization of this novel approach and by simulations show that it better predicts the inter-cell interference at the cell-edge than an average-based model.
TICLS: Tightly Coupled Language Text Spotter
2026-03-06
article
Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TICLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model. TICLS contains a pretrained linguistic decoder that fuses visual and linguistic features, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015, Total-Text, and CTW1500 demonstrate that TICLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting. The code is available at https://github.com/knowledge-computing/TiCLS.
Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
ArXiv.org · 2025-10-09
preprint · Open access
Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving 88% F-1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.
Key research priorities in methodological approaches for measuring the exposome and studying its role in the development of dementia
Alzheimer's & Dementia · 2025-11-01 · 1 citation
article · Open access
There is growing recognition regarding the importance of the exposome, or the totality of exposures one experiences across the life course, in research on Alzheimer's disease and related dementias. However, the measurement of numerous exposures at once and over time, as well as modeling their effects on dementia risk, presents significant methodological challenges. Through community engagement and consensus-building processes integrating input from multidisciplinary panels of experts, we identified critical priority topics for methods used in studying links between the exposome and dementia risk, along with advances needed to address those priorities. We identified nine priority topics: high-dimensional and multimodal data, measurement error, harmonization across studies, mixtures of exposures, effect heterogeneity, exposure timing, cumulative exposures, reverse causation, and sample composition. This paper describes these priority topics and highlights areas where future research or the dissemination of existing methods could advance the state of existing science. HIGHLIGHTS: Inherent complexities central to the measurement and modeling of the exposome and its relationship to dementia pose methodological challenges. We identified nine priority topics, such as measurement error, mixtures of exposures, and cumulative exposures. Modeling approaches should consider complexity but provide useful simplifications when possible. Investments in the development and dissemination of innovative approaches and methodological guidance are needed.

Frequent coauthors

Craig A. Knoblock
88 shared
Johannes Uhl
39 shared
Stefan Leyk
University of Colorado Boulder
38 shared
Weiwei Duan
37 shared
Cyrus Shahabi
35 shared
Zekun Li
34 shared
Yijun Lin
34 shared
Muhao Chen
University of California, Davis
20 shared

Education

Ph.D., Spatial Sciences
University of Southern California

Awards & honors

$3.2M in Funding to Leverage AI in Predicting Mineral Deposi…
NSF Convergence Accelerator Phase 1 Project
