Biomedical Data Management Systems / Workshop at VLDB 2026

A community of biomedical informatics and data management researchers and practitioners who collaborate to identify emerging problem areas, develop novel solutions, and help accelerate the pace of innovation in healthcare.


Program

Where: VLDB 2026 in Boston, USA.
When: September 4th (afternoon).

The workshop program will feature a selection of invited and proposed talks, and networking opportunities. Details TBD.

Contributing

As this is the inaugural workshop, our goal is to hear as many voices as possible and to give attendees a thorough overview of the current state of the field. To complement our selection of esteemed speakers, we encourage the community to submit proposals for talks that will (1) highlight key problems and/or potential solutions, and, optionally, (2) outline visions of collaborative projects. Hence, we will accept two types of submissions: Insight Talk Proposals and Project Talk Proposals.

Important dates

Review process

All submissions will receive single-blind peer reviews from a program committee consisting of both biomedical researchers and data management researchers. Each submission will be assigned to at least two reviewers with relevant expertise. The review criteria will focus on the clarity of the problem statement, the relevance to biomedical data management, the potential for cross-disciplinary collaboration, and the feasibility of the proposed project (for Project Talk Proposals). We will also consider the novelty and potential impact of the proposed ideas. The final meta-review and acceptance decision will be made by the workshop chairs.

Authors of accepted submissions will be invited to submit a camera-ready version as well as an associated blog post. The camera-ready version will be included in the workshop proceedings and hosted on OpenReview. The blog post will be hosted on this community website and will serve as an easy-to-digest artifact.

Submission guidelines

Talk proposals should be submitted to OpenReview as a PDF formatted according to the official PVLDB volume 19 formatting guidelines. Submissions that breach the formatting guidelines will not be considered. Below, we provide a few general guidelines, followed by some guidelines specific to different submission types:

Insight Talk Proposal

Project Talk Proposal

Camera Ready

After a submission is accepted, the authors will be given a chance to submit a camera-ready version that integrates reviewer feedback and includes any final touches and supplementary details that the authors want to add (up to 2 additional pages).

Blog Post

Invited speakers and authors of accepted submissions are invited to contribute a blog post to the workshop website highlighting the key points of their talk. This will allow workshop attendees and other interested parties to easily get acquainted with the authors’ work, even if they miss the talk. Some specific guidelines:

Collaborating

One of our key goals is to organize a friendly scientific forum that encourages cross-disciplinary collaborations. Apart from the in-person interactions that will take place during the workshop day, we are planning several online initiatives aimed at sparking productive joint efforts.

Discord Server

We started a Discord server that will serve as the main forum for online interactions. We encourage interested participants to join the server and:

Talk Mergers

Community members who have had their insight talk proposals accepted will be given a chance to join forces with authors of other accepted insight talks and request a merger into a single project talk. After the camera-ready deadline, we will release all submitted proposals, associated blog posts, and author information on the workshop website. The authors will be able to review each other’s submissions and identify complementary groups with whom they can get in touch and come up with an idea for a joint project.

Submitting a merger request: The two author groups who have agreed to merge their talks should send an email to the workshop chairs with a formal request, a statement of motivation, a title and abstract of the new talk, and a brief outline of its structure. The proposed structure can still be based on the individual insight talks, but should connect them into a cohesive, synergistic project vision. The chairs will review the request and respond with a final decision based on the main criteria for project talk proposals (i.e., relevance, collaborative potential, and feasibility).

Accepted merger requests: Authors of accepted merger requests will be given a regular project talk slot. They will also be invited to submit a blog post about their proposed project vision, which will be published on the workshop website.

Vision Paper

Assuming the workshop proves successful at achieving its main goals, the co-chairs will start a working group to produce a vision paper for submission to PVLDB. The paper will outline the conclusions of the workshop and lay out a vision for a future comprehensive biomedical data management system. All speakers and authors of accepted submissions will be invited to join this working group and contribute to the paper as co-authors, leaving a lasting artifact that will hopefully inspire further work in this space.

About

Vision

This interdisciplinary workshop will focus on data management techniques, tools, and systems with direct applications to the unique challenges in biomedical research and healthcare. Our goal is to build a lasting community centered around these topics and spark fruitful collaborations.

Biomedical research and healthcare are increasingly data-driven, yet practitioners regularly struggle with data management challenges.

We are witnessing a proliferation of data collection technologies (e.g., electronic health records, high-throughput sequencing, medical imaging), the growing adoption of computational methods for data analysis, and the recognition that data is crucial for unlocking new scientific insights and improving patient care. Furthermore, the scale of data being collected is growing exponentially, driven by the decreasing costs of data acquisition technologies and the increasing digitization of healthcare systems. However, the people who produce, analyze, and interpret biomedical data (e.g., clinicians, digital health specialists, bioinformaticians) often lack the expertise and tools to effectively manage and analyze these multi-modal datasets at scale. As a result, scientific progress is often limited by overwhelming data management challenges.

Data management researchers are well-equipped to tackle many of these challenges, and are actively seeking new research directions.

Data management researchers have spent decades developing techniques, tools, and systems for managing large-scale data. They possess valuable expertise in areas such as data integration, data quality, data governance, and scalable analytics. At the same time, as points raised at panels at SIGMOD and VLDB 2025 indicate, there is a growing push for data management researchers to actively pursue new, high-impact application domains whose requirements can inspire fundamentally new system designs, benchmarks, and end-to-end deployments. Biomedical data management is one such domain, presenting unique research opportunities that can directly impact and improve human lives.

Bringing these two communities together can unlock the potential for novel solutions and rapidly accelerate scientific progress.

However, this is by no means a trivial task. One noteworthy challenge is that data management researchers often lack exposure to real-world biomedical datasets (due to privacy restrictions), as well as the domain knowledge necessary to understand the specific challenges and requirements of biomedical data management. Conversely, biomedical researchers often lack awareness of the latest advances in data management research and how these techniques can be applied to their specific problems. As a result, there is a gap between the capabilities of existing data management systems and the needs of biomedical researchers and clinicians. Our goal is to bridge this gap by bringing together experts from both communities to identify pressing biomedical data management challenges and explore opportunities for collaboration that can lead to the development of novel data management solutions tailored to the biomedical domain.

Target audience

This workshop brings together two key groups, each one bringing a deep understanding of their own domain of expertise and an interest in collaborating with the other side on projects that have the potential to advance both fields:

Motivating scenario

To provide an illustrative example of the kinds of challenges that this workshop aims to address, consider the scenario of a molecular tumor board (MTB). It is a meeting in which experts from multiple disciplines (e.g., oncologists, molecular biologists, pathologists, surgeons, genetic counselors) discuss a complex patient case and converge on a treatment plan. Decisions must be made quickly, transparently, and with a clear trail of supporting evidence. Such evidence is usually assembled by integrating highly multimodal data: clinical data (diagnoses, medications, adverse events, patient and family history), high-throughput omics data (genome, transcriptome, proteome, genetic variations), histopathological lab results, and tissue imaging, all of which often originate from different institutions.

Given the number of patients, the uniqueness of each case, and the limited availability of the practitioners involved, the board operates under tight time constraints and high stakes. The discussion of a typical patient’s case is measured in minutes, while the preparation phase can take several hours of manual data preparation and analysis.

In practice, tumor board preparation often turns into an ad-hoc integration exercise across siloed systems and heterogeneous formats. This setting exposes a set of recurring data management challenges: data assembly across scales, statistical data quality assessments, integration across multiple external data sources, intuitive data access, and scalability. A well-designed biomedical data management system could substantially reduce these frictions, enabling a secure, patient-centric, multimodal view that is assembled reliably and quickly, transforming tumor board preparation from a tedious, error-prone integration task into a reproducible workflow.

Topics of interest

This workshop will feature presentations and discussions related to the following topics:

Specific goals

Organizers

Bojan Karlaš
Postdoc / Harvard University (affiliated with HMS, MGB, DFCI, and the Broad)
Works on developing interpretable deep learning pipelines for extracting clinically meaningful insights from pathology images. He obtained his PhD at ETH Zurich, working on data management systems for ML with a particular focus on data debugging.

Gerardo Vitagliano
Postdoc / Data Systems Group / MIT CSAIL
Builds interactive and user-friendly data systems, allowing domain experts to analyze large-scale multimodal datasets. His research involves active collaborations with clinicians and biomedical researchers to impact real-world healthcare.

Benjamin M. Gyori
Associate professor / Northeastern University
Works on large-scale data integration and knowledge assembly in biomedicine. His research combines computational systems modeling, ML, NLP, and human–machine interaction to improve our understanding of complex human biology.

Ulf Leser
Full professor / Humboldt-Universität zu Berlin
Developed new tools for management, integration, and analysis of biomedical data. Interested in biomedical data management, text mining, infrastructures for large-scale scientific data analysis, and statistical bioinformatics, with a focus on cancer research.