mungall.dev

Knowledge-Based AI + Biosystems

Chris Mungall

Senior computational scientist at Berkeley Lab building knowledge-based AI for complex biological systems.

Focus Areas

Research

Primary research themes

See representative outputs

These areas are tightly coupled in practice: standards and ontologies enable robust resources, which in turn support mechanism-aware and phenotype-driven AI workflows.

Research

Papers

Selected recent publications and preprints.

View all publications
Chemical classification paper figure 4

Mungall CJ, Malik A, Korn DR, ... Hastings J. J Cheminform (2025).

Chemical classification program synthesis using generative artificial intelligence.

This paper introduces C3PO, a program-synthesis approach to chemical classification in which LLMs iteratively generate and refine executable classifiers against ontology-grounded examples. It emphasizes transparency and error analysis by exposing classifier logic and failure modes directly in code.

It benchmarks C3PO against SMARTS and deep-learning baselines, showing the tradeoff between maximal predictive performance and interpretable, curator-friendly program outputs that can be audited and improved.

Highlighted: chemical classification and explainable classifier programs

DRAGON-AI paper figure

Toro S, Anagnostopoulos AV, Bello SM, ... Mungall CJ. J Biomed Semantics (2024).

Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI).

DRAGON-AI applies retrieval-augmented generation to ontology authoring workflows, combining ontology context and issue-tracker signals to propose definitions and logical axioms. The study evaluates automated term completion across multiple OBO ontologies with expert scoring and reproducible analysis artifacts.

The focus is practical curation acceleration: ontology editors can use generated candidates as draft artifacts, then accept, revise, or reject with provenance-aware review.

Highlighted: ontology completion with retrieval-augmented AI

SPIRES paper figure 3

Caufield JH, Hegde H, Emonet V, ... Mungall CJ. Bioinformatics (2024).

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

SPIRES presents a schema-first extraction pipeline that uses recursive prompting plus ontology grounding to populate structured knowledge bases from unstructured text. It is implemented in OntoGPT and evaluated on nested schema extraction and biomedical relation tasks.

By enforcing schema constraints and grounding to external resources, the method targets higher precision and reproducibility than free-form extraction workflows.

Highlighted: schema-constrained semantic extraction for KB population

LinkML framework components figure

Moxon SAT, Solbrig H, Harris NL, ... Mungall CJ. Gigascience (2025).

LinkML: An Open Data Modeling Framework.

This paper describes LinkML as a schema language and tooling framework for creating machine-readable, semantically aligned data models that can generate downstream artifacts such as JSON Schema, OWL, SHACL, and Python classes.

It emphasizes reusable model design and FAIR-aligned interoperability across heterogeneous scientific domains, including biomedicine and environmental data integration.

Highlighted: reusable semantic data modeling infrastructure

Gene set summarization paper figure 1

Joachimiak MP, Caufield JH, Harris NL, ... Mungall CJ. ArXiv [preprint] (2024).

Gene Set Summarization Using Large Language Models.

This preprint presents TALISMAN, an LLM-based approach for summarizing gene sets by generating both narrative interpretations and ontology-grounded term lists.

The work evaluates model and prompt variants against standard enrichment baselines, showing where LLM summarization is informative and where precision-recall tradeoffs require careful review.

Highlighted: TALISMAN gene-set interpretation with LLMs

  • Biomedical ontologies
  • Phenotype informatics
  • Knowledge graphs
  • AI for translational biology

Beyond these highlights, recurring themes include GO and Monarch resource updates, ontology quality control, and applied agentic AI for biocuration and knowledge extraction.

Open Source + Data

Works

Selected research products and tools.

All repositories
OntoGPT repository preview

Knowledge Extraction

OntoGPT

Schema-constrained extraction tooling for turning biomedical literature into structured knowledge, including SPIRES-style recursive extraction pipelines.

Role: Maintainer and contributor

Monarch Initiative - ontology-grounded LLM extraction

CurateGPT repository preview

AI-Assisted Curation

CurateGPT

LLM-driven curation assistant workflows for ontology editors and knowledge engineers, focused on draft generation plus human review loops.

Role: Maintainer and contributor

Monarch Initiative - curation and review acceleration

Ontology Access Kit logo

Ontology Toolkit

OAK (Ontology Access Kit)

Unified Python and CLI toolkit for ontology search, graph operations, mapping generation, quality control, and adapter-based access across ontology backends.

Role: Core contributor

INCATools - ontology operations abstraction layer
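As an illustrative sketch of the adapter-based access described above, OAK's `runoak` CLI takes an input selector plus a subcommand; the selector and predicate shorthand below follow OAK's documented `sqlite:obo:` pattern, but exact flags should be checked against the current docs (the commands also download a prebuilt ontology database on first use).

```
# Search Uberon for terms matching "limb" via the prebuilt SQLite backend
runoak -i sqlite:obo:uberon search limb

# List ancestors of UBERON:0002101 (limb) over is_a (i) and part_of (p) edges
runoak -i sqlite:obo:uberon ancestors -p i,p UBERON:0002101
```

The same operations are available programmatically through OAK's Python adapter interface, so scripts and pipelines can share one access layer across ontology backends.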

BOOMER-py logo

Probabilistic Reasoning

BOOMER-py

Bayesian ontology merging toolkit for reasoning over uncertain mappings and taxonomic assertions, with consistency-aware search over candidate alignments.

Role: Contributor

Monarch Initiative - probabilistic ontology integration

Whelk project image

Reasoning Engine

Whelk

OWL EL reasoner designed for incremental and concurrent reasoning, with support for OWL RL and selected SWRL features in ontology-intensive applications.

Role: Contributor

INCATools - fast reusable reasoning states for ontology workflows

Additional works include ontology QC pipelines, data harmonization tools, and reusable datasets that support curation and analysis across OBO and Translator-adjacent ecosystems.

Semantic Resources

Ontologies

Selected ontologies I lead or contribute to.

Browse ontology ecosystem
Uberon logo

uberon

Uberon

Multi-species anatomy ontology integrating anatomical concepts across Metazoa, supporting spatial omics tissue harmonization and comparative phenotype analysis.

obophenotype/uberon

LinkML ValueSets repository preview

valuesets

LinkML Common ValueSets

Standardized enumerations and value sets for science and biomedicine, with ontology-linked semantics and LinkML-native generation pathways.

linkml/valuesets

My ontology work emphasizes pattern-driven modeling, review automation, and interoperability governance, including term request triage and coordinated releases across collaborating ontology teams.

Knowledge Infrastructure

KBs

Highlighted knowledge bases and data resources.

Explore Monarch
Monarch Initiative logo

Cross-Species KG

Monarch Initiative

The Monarch platform integrates phenotype, disease, and genotype knowledge into queryable graph infrastructure for diagnosis support and discovery.

Monarch Initiative - monarch-app repo

DisMech repository preview

Disease Mechanisms KB

DisMech

Curated knowledge base of disease pathophysiology with structured literature-backed claims, phenotype links, and mechanism-centric evidence views.

monarch-initiative/dismech

Semantic SQL repository preview

Queryable Ontology Layer

Semantic SQL

Standard SQL and SQLite representations of OWL/RDF ontologies that make large ontology knowledge resources directly queryable in relational workflows.

INCATools/semantic-sql - SQL views and downloadable OBO SQLite builds
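For illustration, a query against one of the downloadable OBO SQLite builds might look like the following; `statements` and `entailed_edge` are part of the Semantic SQL schema, though column details should be verified against the project documentation.

```sql
-- Label of a term, from the triple-oriented statements table
SELECT value FROM statements
WHERE subject = 'UBERON:0002101' AND predicate = 'rdfs:label';

-- All entailed subclass ancestors of the same term
SELECT object FROM entailed_edge
WHERE subject = 'UBERON:0002101' AND predicate = 'rdfs:subClassOf';
```

Because the representation is ordinary SQLite, these queries compose directly with joins, views, and existing relational tooling rather than requiring a dedicated triple store.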

These knowledge resources are designed as interoperable infrastructure, linked by common schemas and standards so data can move across portals, APIs, and computational workflows.

Interoperability

Standards

Core standards for representing change, schemas, and mappings across ontology and knowledge graph workflows.

Standards context
KGCL standard preview

Change Representation

KGCL

Knowledge Graph Change Language (KGCL) is a standard data model and controlled natural language for representing ontology and graph edits as structured change objects.

INCATools/kgcl
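As a sketch of the controlled natural language, KGCL change statements read roughly like the lines below; these are examples of the style using invented term IDs, and the exact grammar is defined in the KGCL specification rather than guaranteed here.

```
rename UBERON:0002101 from 'limb' to 'limb structure'
obsolete UBERON:0000001
create edge UBERON:0002101 rdfs:subClassOf UBERON:0000475
```

Each statement corresponds to a structured change object, so the same edit can be diffed, reviewed, and replayed against an ontology programmatically.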

LinkML standard preview

Schema Standard

LinkML

LinkML provides a schema language and toolchain for building machine-readable, semantically grounded data models that compile to multiple downstream artifacts.

linkml/linkml
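A minimal sketch of the pattern: a small YAML schema (the `Person` class and its slots are invented for illustration) that LinkML generators can compile into the downstream artifacts described above.

```yaml
id: https://example.org/person-schema   # hypothetical schema id
name: person-schema
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    attributes:
      name:
        required: true
      age:
        range: integer
```

Generators such as `gen-json-schema person.yaml`, `gen-owl person.yaml`, or `gen-pydantic person.yaml` then emit JSON Schema, OWL, or Python classes from this single source of truth.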

CHEMROF standard preview

Chemical Data Standard

CHEMROF

Chemical Entity Materials and Reactions Ontological Framework (CHEMROF) defines a LinkML-first schema for chemistry entities, mixtures, and reactions aligned to ontology-driven workflows.

chemkg/chemrof

MIxS standards preview

Standards for Genomics and Metagenomes

MIxS

MIxS (Minimum Information about any (x) Sequence) provides interoperable metadata checklists for genome and metagenome data, improving reusability and cross-study comparison.

Genomic Standards Consortium MIxS specification

SSSOM standard preview

Mapping Standard

SSSOM

Simple Standard for Sharing Ontology Mappings defines a common tabular and semantic model for exchanging and auditing ontology mappings.

mapping-commons/sssom
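Because the tabular serialization is plain TSV with standardized column names, it can be consumed with nothing beyond the standard library; the rows below are invented example mappings (a full SSSOM file would also carry YAML metadata in a comment header) using real SSSOM columns, a SKOS predicate, and a `semapv` justification term.

```python
import csv
import io

# Two invented example mappings in SSSOM's standard TSV columns.
SSSOM_TSV = """\
subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\tmapping_justification
UBERON:0002101\tlimb\tskos:exactMatch\tFMA:24875\tlimb\tsemapv:ManualMappingCuration
UBERON:0002544\tdigit\tskos:broadMatch\tFMA:9666\tfinger\tsemapv:ManualMappingCuration
"""

def read_sssom(text: str) -> list[dict]:
    """Parse an SSSOM TSV payload into a list of mapping dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

mappings = read_sssom(SSSOM_TSV)
exact = [m for m in mappings if m["predicate_id"] == "skos:exactMatch"]
print(len(mappings), len(exact))  # 2 mappings total, 1 exactMatch
```

Keeping mappings in this shared shape is what makes them auditable: predicates, justifications, and provenance columns can be filtered and reviewed with ordinary tabular tooling.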

KGX standard preview

Exchange Standard

KGX

Knowledge Graph Exchange (KGX) provides a canonical model and toolkit for converting and validating biological knowledge graphs across common graph formats.

biolink/kgx

KG Registry standard preview

Registry Standard

kg-registry

A structured registry for knowledge graphs and related products, with standardized metadata for discovery, interoperability, and lifecycle tracking across resources.

Knowledge-Graph-Hub/kg-registry

NCATS Translator graphic

Knowledge Graph Standard

Biolink Model

Biolink Model defines shared biomedical classes, predicates, and association patterns to support interoperable knowledge graphs, including NCATS Translator components.

biolink/biolink-model

I also contribute to Phenopackets, including work on the schema repository, the core paper, and a LinkML schema view.

Other standards work includes GFF3, AgBioData recommendations, MIRACL (paper), and Mapping Commons. Honorable mention: Bioregistry (paper).

Funding

Funded Projects

Selected grants from ORCID and agency records, centered on ontology infrastructure, phenotype interpretation, and AI-ready data ecosystems.

View ORCID record
Exomiser repository preview

Rare Disease Resource

Exomiser

Phenotype-driven rare disease variant prioritization and case reinterpretation platform integrating HPO-based profiles with variant evidence and cross-species knowledge.

Role: Collaborator and ontology integration contributor

Open rare disease interpretation infrastructure in Monarch-aligned workflows

Many of these awards support long-lived community resources and shared infrastructure, with deliverables spanning ontologies, knowledge graphs, standards, and AI-ready data services.

Speaking

Talks

Selected recent talks and keynotes from Zenodo covering agentic AI, ontology workflows, and interoperable bioscience knowledge systems.

Browse talks
Agentic AI and GO part 1 slide preview

Workshop Talk - 2025

Agentic AI and GO, Oct 2025, Part 1

Foundational session on using coding agents in ontology workflows, with hands-on patterns for high-quality curation and schema-aware editing tasks.

Zenodo record - published October 14, 2025

The talk archive spans keynotes, workshops, and technical deep-dives on ontology engineering, interoperable knowledge resources, and reliable AI-assisted curation workflows.

Learning

Tutorials and Training

Hands-on tutorials and workshop materials, including the ICBO 2025 agentic AI tutorial and GO AI workshop series.

Zenodo presentations
Agentic AI and GO tutorial slide preview

GO AI Workshop

Agentic AI and GO, Oct 2025, Part 1

Foundational workshop on integrating coding agents into ontology development and GitHub-centric review workflows for large semantic projects.

Zenodo workshop deck - published October 14, 2025

AI4Curation docs preview

Companion Resources

AI4Curation Documentation

Operational setup guides and practical checklists that complement the tutorial decks with reproducible workflows and tooling instructions.

Training docs for day-to-day agentic curation practice

Training materials continue to expand across conference tutorials, focused GO sessions, and practical docs designed for day-to-day ontology and curation workflows.

Automation

Agentic

Agent skills and tooling for reliable AI-assisted ontology, schema, and knowledge-base curation workflows.

Browse agentic tooling
linkml-reference-validator repository preview

Reference Validation

linkml-reference-validator

Validates whether supporting text in structured records is actually present in cited references, helping enforce evidence-backed curation.

linkml/linkml-reference-validator

linkml-term-validator repository preview

Term Validation

linkml-term-validator

Checks LinkML schemas and datasets that depend on external ontologies and controlled terms, improving consistency for agent-generated outputs.

linkml/linkml-term-validator

ai-blame repository preview

Agent Provenance

ai-blame

Extracts provenance and audit trails from agent execution traces, enabling line-level attribution and post-hoc review for AI-assisted edits.

ai4curation/ai-blame

curation-skills repository preview

Agent Skills

curation-skills

Reusable skill packs for ontology and biocuration tasks, designed to make agent behavior more consistent, transparent, and domain-aware.

ai4curation/curation-skills

noctua-mcp repository preview

Agent API Bridge

noctua-mcp

MCP server wrapping GO-CAM editing capabilities, enabling agentic interaction with Noctua/Barista workflows through a standardized interface.

geneontology/noctua-mcp (active MCP integration)

This section tracks practical agentic infrastructure: validators, skills, provenance tools, and MCP wrappers that make AI-assisted curation reproducible and reviewable.

Commentary

Perspectives

Short perspectives on emerging AI for biology, ontology practice, and research infrastructure across social and long-form channels.

View Bluesky perspectives
AlphaGenome perspective post

Bluesky Perspective

AlphaGenome perspective

Commentary on AlphaGenome and what large foundation models imply for practical biological knowledge workflows, curation, and downstream interpretability.

Bluesky thread and commentary

rBio perspective post

Bluesky Perspective

rBio perspective

Notes on rBio and reasoning-oriented biological AI, emphasizing implications for ontology-aware reinforcement and transparent scientific interpretation.

Bluesky commentary referencing CZI rBio post

Knowledge Graph Insights interview with Chris Mungall

Interview

Knowledge Graph Insights: Chris Mungall

Interview and profile discussion covering ontology engineering, biological knowledge graphs, and how AI methods can be grounded in structured semantics.

Knowledge Graph Insights profile and Q&A

AI training perspective post

Bluesky Perspective

AI training perspective

Perspective thread on practical training pathways for biocurators, ontology developers, and resource leads adopting agentic AI tools.

Bluesky training thread with practical resources

Monkeying around with OWL blog

Blog

Monkeying around with OWL

Long-form posts on ontology engineering, curation practice, standards decisions, and the practical realities of semantic infrastructure work.

WordPress blog by Chris Mungall

These perspectives connect day-to-day practice with broader trends in AI, ontology engineering, and scientific infrastructure, from short threads to paper-length commentary.