
Yao-Yi Chiang
Associate Professor, Director of Graduate Studies for Data Science · University of Minnesota · Computer Science and Engineering
Active 2004–2026
About
I work on Spatial AI topics. I am interested in developing data-driven methods that can take advantage of domain knowledge to solve complex problems. For example, we built machine learning algorithms that incorporate spatial science techniques for air quality prediction and imagery recognition. I also enjoy building working systems with my students and doing consulting work related to my research.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Meteorology
- Risk analysis (engineering)
- Mathematical optimization
- Data science
- Environmental science
- Engineering
- Geography
- Algorithm
- Mathematics
- Management science
Selected publications
NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities
arXiv (Cornell University) · 2026-05-12
preprint · Open access · Senior author
Geospatial foundation models have primarily focused on raster data such as satellite imagery, where self-supervised learning has been widely studied. Vector geospatial data instead represent the world as discrete geoentities with explicit geometry, semantics, and structured spatial relations, including metric proximity and topological relationships. These relations jointly determine how entities interact within space, yet existing representation learning methods remain fragmented, often restricted to specific geometry types or partial spatial relations, limiting their ability to capture unified spatial context across heterogeneous geoentities. We propose NARA (Neural Anchor-conditioned Relation-Aware representation learning), a self-supervised framework for vector geoentities. NARA learns context-dependent representations by jointly modeling semantics, geometry, and spatial relations within a unified framework and captures relational spatial structure beyond proximity alone, enabling rich contextualized representations across heterogeneous geoentities of points, polylines, and polygons. Evaluation on building function classification, traffic speed prediction, and next point-of-interest recommendation shows consistent improvements over prior methods, highlighting the benefit of unified relational modeling for vector geospatial data.
OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis
Open MIND · 2026-02-03
preprint
Accurate dental diagnosis is essential for oral healthcare, yet many individuals lack access to timely professional evaluation. Existing AI-based methods primarily treat diagnosis as a visual pattern recognition task and do not reflect the structured clinical reasoning used by dental professionals. These approaches also require large amounts of expert-annotated data and often struggle to generalize across diverse real-world imaging conditions. To address these limitations, we present OMNI-Dent, a data-efficient and explainable diagnostic framework that incorporates clinical reasoning principles into a Vision-Language Model (VLM)-based pipeline. The framework operates on multi-view smartphone photographs, embeds diagnostic heuristics from dental experts, and guides a general-purpose VLM to perform tooth-level evaluation without dental-specific fine-tuning of the VLM. By utilizing the VLM's existing visual-linguistic capabilities, OMNI-Dent aims to support diagnostic assessment in settings where curated clinical imaging is unavailable. Designed as an early-stage assistive tool, OMNI-Dent helps users identify potential abnormalities and determine when professional evaluation may be needed, offering a practical option for individuals with limited access to in-person care.
TiCLS: Tightly Coupled Language Text Spotter
Open MIND · 2026-02-03
preprint
Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TiCLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model. TiCLS introduces a linguistic decoder that fuses visual and linguistic features, yet can be initialized by a pretrained language model, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015 and Total-Text demonstrate that TiCLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting.
TiCLS: Tightly Coupled Language Text Spotter
ArXiv.org · 2026-02-03
article · Open access
OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis
ArXiv.org · 2026-02-03
article · Open access
IEEE Transactions on Wireless Communications · 2026-01-01
article
Accurate modeling of line-of-sight (LOS) probability is crucial for wireless channel description and coverage planning. The presence of a LOS impacts other channel characteristics such as pathloss, fading depth, delay- and angular spread, etc. Existing models, although useful, are based on very limited datasets. In this paper, we establish a framework to produce high accuracy LOS models from geospatial data in different environments, and apply it to create a LOS model for macrocells, using datasets of the United States (US) on a national scale, using more than 13,000 locations of real-world macrocells. Based on this we create a new, fully parameterized model that better describes macrocell deployments in the US than the 3GPP model. We furthermore demonstrate that for improved accuracy the LOS probability should be modeled on a per cell basis, and the model parameters treated as random variables; we provide a full description and parameterization of this novel approach and by simulations show that it better predicts the inter-cell interference at the cell-edge than an average-based model.
TICLS: Tightly Coupled Language Text Spotter
2026-03-06
article
Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TICLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model. TICLS contains a pretrained linguistic decoder that fuses visual and linguistic features, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015, Total-Text, and CTW1500 demonstrate that TICLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting. The code is available at https://github.com/knowledge-computing/TiCLS.
Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
ArXiv.org · 2025-10-09
preprint · Open access
Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving 88% F-1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.
Alzheimer's & Dementia · 2025-11-01 · 1 citation
article · Open access
There is growing recognition regarding the importance of the exposome, or the totality of exposures one experiences across the life course, in research on Alzheimer's disease and related dementias. However, the measurement of numerous exposures at once and over time, as well as modeling their effects on dementia risk, presents significant methodological challenges. Through community engagement and consensus-building processes integrating input from multidisciplinary panels of experts, we identified critical priority topics for methods used in studying links between the exposome and dementia risk, along with advances needed to address those priorities. We identified nine priority topics: high-dimensional and multimodal data, measurement error, harmonization across studies, mixtures of exposures, effect heterogeneity, exposure timing, cumulative exposures, reverse causation, and sample composition. This paper describes these priority topics and highlights areas where future research or the dissemination of existing methods could advance the state of existing science.
HIGHLIGHTS:
- Inherent complexities central to the measurement and modeling of the exposome and its relationship to dementia pose methodological challenges.
- We identified nine priority topics, such as measurement error, mixtures of exposures, and cumulative exposures.
- Modeling approaches should consider complexity but provide useful simplifications when possible.
- Investments in the development and dissemination of innovative approaches and methodological guidance are needed.
Frequent coauthors
- Craig A. Knoblock · 88 shared
- Johannes Uhl · 39 shared
- Stefan Leyk, University of Colorado Boulder · 38 shared
- Weiwei Duan · 37 shared
- Cyrus Shahabi · 35 shared
- Zekun Li · 34 shared
- Yijun Lin · 34 shared
- Muhao Chen, University of California, Davis · 20 shared
Education
Ph.D., Spatial Sciences
University of Southern California
Awards & honors
- $3.2M in Funding to Leverage AI in Predicting Mineral Deposi…
- NSF Convergence Accelerator Phase 1 Project