The Deep Dive

Building an AI System for Rapid Pathogen Screening in Sputum Slides

Written by Gwihwan Moon | Feb 18, 2026 10:28:06 AM

Lessons from an Early Collaboration with Seoul National University Hospital and GC Biopharma

Executive Summary

At the beginning of this year, we initiated a joint research project with Seoul National University Hospital and GC Biopharma to explore whether artificial intelligence could assist in the rapid and scalable interpretation of sputum microscopy slides.

The goal was simple:
detect white blood cells (WBCs) and identify potential pathogenic organisms across an entire slide automatically.

However, once we began operating on real clinical material rather than curated academic datasets, the true difficulty became apparent.
Tiny bacteria, gigapixel images, and poor-quality annotations quickly turned this into a systems engineering challenge rather than a simple modeling task.

This post explains what we attempted, what failed, how we adapted, and why an end-to-end platform such as Deep Block became essential.

Why Sputum Microscopy Matters

In many pneumonia workflows, sputum examination remains a frontline diagnostic method.

Clinicians typically evaluate:

    • the presence and density of white blood cells
    • contamination indicators such as epithelial cells
    • bacterial morphology (cocci, rods, Gram reaction)

These observations help determine:

    • whether infection is likely
    • whether the sample quality is acceptable
    • what empirical antibiotics might be appropriate

The challenge is time and labor.
Manual microscopy requires trained personnel and becomes a bottleneck as case volume increases.

Automation is therefore highly attractive, but far more complex than it first appears.

Imaging Reality vs Research Datasets

For this project, we used the Roche Ventana slide scanner to digitize entire slides.
Each slide image file ranged from 9 GB to 34 GB.

Unlike single field-of-view microscope images commonly used in academic AI papers, whole slide imaging introduces several difficulties:

    • gigapixel scale
    • multi-gigabyte file sizes
    • very small targets occupying only a few pixels
    • debris and artifacts

In other words, classical object detection recipes do not directly transfer.

The problem becomes as much one of hardware orchestration, data annotation, and graphical interface design as one of model architecture.

Dataset Construction

Because this was an early-stage feasibility effort, the dataset was necessarily limited.

We prepared:

    • 4 slides
    • 6 target classes
      • yeast
      • epithelial cells
      • white blood cells
      • Gram-positive cocci
      • Gram-positive rods
      • Gram-negative rods

For each class, approximately 400 instances were annotated.

The effective magnification corresponds to roughly 800×, lower than what a dedicated 1000× digital microscope can provide.
As a result, some microorganisms were only weakly visible.

All annotation, training, and inference workflows were executed inside Deep Block.

Where Things Became Difficult

The most severe challenge appeared with Gram-positive rods.

They were:

    • extremely small
    • hard even for experts to delineate precisely

During labeling, boundaries were often drawn coarsely — sometimes closer to a loose region than a true biological contour.

One example tile made this visible: the annotation boundary in its lower-left corner did not match the contour of the organism it was meant to outline.

This detail has enormous consequences for learning.

Because modern segmentation networks optimize pixel-level agreement, systematic boundary inflation becomes a form of structured noise.

If the model predicts the true boundary, it can still be penalized.
If it predicts the inaccurate training boundary, it is rewarded.

Therefore, better models may actually learn worse shapes.
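
To make the penalty concrete, here is a toy Dice computation on a hypothetical 32×32 tile. The geometry and numbers are illustrative, not taken from our data:

```python
import numpy as np

def dice(pred, label):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, label).sum()
    return 2 * inter / (pred.sum() + label.sum())

# Hypothetical tile: the true rod covers a 4x10 region, but the
# annotation was drawn loosely as an 8x14 region around it.
true_mask = np.zeros((32, 32), dtype=bool)
true_mask[14:18, 10:20] = True            # actual organism
loose_label = np.zeros((32, 32), dtype=bool)
loose_label[12:20, 8:22] = True           # inflated annotation

# A model that predicts the true shape is penalized against the
# noisy label, while one that reproduces the inflation is rewarded.
print(dice(true_mask, loose_label))       # ~0.53
print(dice(loose_label, loose_label))     # 1.0
```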

Why Straightforward Segmentation Was Not Reliable

In controlled datasets, segmentation supervision is powerful.

In our case:

    • masks were frequently oversized
    • bacteria were near the resolution limit
    • annotation quality was poor
    • hardware resources, especially storage, were limited

Under these conditions, pixel-level supervision stopped adding signal and instead propagated annotation uncertainty into the model.

We observed unstable convergence and inconsistent validation behavior, even when architectures were improved.

The bottleneck was data quality, not network capacity.

The Deep Block Workflow

To handle whole-slide AI development, we relied on an integrated pipeline rather than isolated scripts.

1. Slide Ingestion

WSI files were registered and indexed inside the platform.

2. Intelligent Tiling

Gigapixel images were divided into training tiles while preserving coordinate consistency.
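
Deep Block performs this step internally. Purely as an illustration of what coordinate-preserving tiling involves, a minimal sketch using the open-source openslide-python library might look like this (the file path and tile size are hypothetical):

```python
import openslide  # pip install openslide-python

def iter_tiles(path, tile=1024):
    """Yield (origin, image) pairs covering the whole slide."""
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions  # level-0 pixel dimensions
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # read_region takes level-0 coordinates, so (x, y) remains
            # a valid global offset for mapping detections back later.
            region = slide.read_region((x, y), 0, (tile, tile))
            yield (x, y), region.convert("RGB")

for (x, y), image in iter_tiles("example_slide.tif"):
    pass  # feed each tile to annotation or inference
```

A generator keeps memory bounded, which matters when a single slide file is tens of gigabytes.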

3. Annotation

Experts labeled directly on tiles with class management and dataset correction.

4. Training

Experiments were reproducible and traceable across configurations.
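
In spirit, each run is pinned to a declarative record. The fields below are our own illustration, not Deep Block's actual configuration schema:

```python
# Hypothetical experiment record; field names are illustrative.
experiment = {
    "id": "sputum-detect-v3",
    "task": "object_detection",
    "tile_size": 1024,
    "classes": ["yeast", "epithelial_cell", "wbc",
                "gram_pos_cocci", "gram_pos_rods", "gram_neg_rods"],
    "train_slides": ["slide_01", "slide_02", "slide_03"],
    "val_slides": ["slide_04"],
    "seed": 42,
}
```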

5. Validation & Visualization

Predictions could be reviewed instantly, both at the data level (JSON files) and visually on the whole-slide image.

6. Export

Results were structured into machine-readable formats for further clinical analysis.
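
For downstream analysis, tile-local detections must first be shifted back into whole-slide coordinates. A minimal sketch, with illustrative class names and values:

```python
import json

def to_slide_coords(detections, tile_origin):
    """Shift tile-local boxes into whole-slide pixel coordinates."""
    ox, oy = tile_origin
    return [
        {
            "class": d["class"],
            "score": d["score"],
            "bbox": [d["bbox"][0] + ox, d["bbox"][1] + oy,
                     d["bbox"][2] + ox, d["bbox"][3] + oy],
        }
        for d in detections
    ]

# Hypothetical output record for one tile of one slide
results = to_slide_coords(
    [{"class": "wbc", "score": 0.91, "bbox": [40, 52, 88, 101]}],
    tile_origin=(30720, 14336),
)
with open("slide_predictions.json", "w") as f:
    json.dump(results, f, indent=2)
```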

Without this orchestration, iteration speed would have been drastically slower.

Strategic Pivot: From Segmentation to Detection

Given the annotation characteristics, we changed strategy.

Instead of forcing pixel accuracy, we:

    • converted coarse masks into bounding boxes
    • trained object detection networks
    • used predictions to guide re-labeling and refinement
    • reintroduced segmentation later where confidence improved

This approach compresses noisy supervision into a more tolerant representation.

Detection tolerates spatial looseness.
Segmentation does not.
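
A minimal sketch of the conversion step (the helper is our own, not Deep Block's API). A bounding box discards the unreliable contour and keeps only the extent:

```python
import numpy as np

def mask_to_bbox(mask):
    """Convert a binary mask to an (x_min, y_min, x_max, y_max) box.

    A loose mask and a tight mask around the same organism produce
    nearly identical boxes, so the inflation that corrupts pixel
    supervision mostly disappears in this representation.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # empty annotation
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```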

Human-in-the-Loop Improvement

The detector helped surface:

    • missed objects
    • inconsistent regions
    • potential false positives

Experts could then correct them more efficiently than annotating from scratch.

Over time, dataset reliability increased.
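
One simple way to implement this triage is confidence banding; the thresholds below are illustrative, not the values we used:

```python
def review_queue(predictions, low=0.3, high=0.8):
    """Split detections into auto-accepted and expert-review buckets.

    Mid-confidence detections are where experts most efficiently
    find missed objects and false positives, instead of scanning
    the whole slide from scratch.
    """
    accepted = [p for p in predictions if p["score"] >= high]
    review = [p for p in predictions if low <= p["score"] < high]
    return accepted, review
```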

Inference result of Deep Block after training

Throughput Considerations

Beyond accuracy, we also measured:

    • processing time per slide
    • number of candidate regions automatically generated
    • reduction in manual review effort

Even at this early stage, automation significantly reduced the search burden for specialists.
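
A sketch of how such per-slide numbers can be collected (infer_fn stands in for the trained detector; nothing here is Deep Block's actual instrumentation):

```python
import time

def profile_slide(infer_fn, tiles):
    """Return wall-clock seconds and candidate count for one slide."""
    start = time.perf_counter()
    candidates = 0
    for origin, image in tiles:
        candidates += len(infer_fn(image))
    return time.perf_counter() - start, candidates
```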

Key Lessons

1. Resolution limits dominate algorithm choice

When bacteria approach pixel scale, labeling error becomes unavoidable.

2. Data quality outweighs model complexity

More sophisticated architectures did not compensate for coarse annotation.

3. Platforms matter

Managing WSI, annotation, training, and review as a single system is critical.

4. Detection can be a bridge

Bounding boxes allow progress while improving the dataset.

What Comes Next

Future phases will include:

    • larger multi-center datasets
    • improved data annotation
    • structured quality control protocols
    • semi-automated label validation
    • potential regulatory alignment pathways

The present work establishes feasibility and reveals the roadmap.

Closing Thoughts

AI in clinical microscopy is not merely a modeling problem.
It is an integration problem across imaging, human labeling, data engineering, and deployment constraints.

In this study, we were able to significantly reduce the number of false positives.
Although some false negatives remained, high-speed inference across the entire slide still let us reliably determine which pathogens were present, because the abundance of specific organisms stood out clearly.

This project reinforced why Deep Block was designed as a full-stack platform:
to allow teams to move forward even when data is imperfect.

We are continuing to collaborate with medical and industry partners to push this boundary further.