Pretrained change detection models are useful as a baseline. They are not guaranteed to perform well on every dataset, especially when the data distribution, spatial resolution, and operational definition of “change” differ from the assumptions built into the pretrained model.
This case study shows how Deep Block was used to fine-tune a Change Detection AI model and improve inference performance on a specific orthophoto dataset from Yeongdeok-gun, South Korea. The objective was not to claim perfect accuracy from a small amount of labeling. The objective was narrower and more practical: to verify whether a GUI-based fine-tuning workflow inside Deep Block can materially improve performance on a target dataset that differs from the conditions the pretrained model was originally optimized for.
In practice, that is the real operational question. Many teams can run inference with a pretrained model. Fewer teams can adapt that model efficiently when the initial results are not good enough for their own data.
Uploading a pair of temporal images into a Deep Block Change Detection project before running baseline inference.
Deep Block includes a pretrained AI model, so basic land change detection can be performed immediately after data upload. In the video workflow, two images from different points in time are loaded into the project, and the user can simply click the PREDICT button to generate an inference result.
That baseline workflow is useful, but it also exposes the limit of a generic pretrained model.
The pretrained model used here was mainly optimized to detect changes related to building appearance and disappearance. Under those assumptions, it can produce acceptable results for certain categories of structural change. However, in this dataset, the target was broader. The task was not limited to building change. It also included farmland land-use change, forest appearance and loss, road construction, and other non-building changes that matter in rural land monitoring.
As a result, the pretrained model did not detect many of those changes well. Areas involving farmland transformation or forest-related change were not reliably identified, even when some of those changes were visually obvious to a human observer.
This is not unusual. It is also not unique to one model family. The same limitation often appears in recently popular foundation models. A model can be broad and still underperform when the target domain, label definition, and image conditions differ from the assumptions baked into pretraining.
Baseline inference with the pretrained model: some building-related changes are detected, but non-building land-use changes are weak or missing.
This case is more difficult than a simple before-and-after building comparison.
The project used orthophotos from Yeongdeok-gun, South Korea. Earlier imagery in the project included pre-2020 orthophotos at approximately 51 cm GSD, while later imagery included 2025 orthophotos at approximately 25 cm GSD. In other words, the input pair differed not only in time but also in spatial resolution.
That matters. A change detection model is asked to decide whether a region has semantically changed, but the raw visual input has also changed because one image may be sharper, more detailed, and structurally different at the pixel level. A model that is not adapted to that difference can confuse resolution shift with real land change, or fail to generalize across the mismatch.
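Deep Block handles image alignment internally, but the preprocessing concern is easy to illustrate. The sketch below, which is an assumption about how such a pair could be aligned rather than a description of Deep Block's implementation, upsamples a coarse tile to the finer tile's pixel grid with nearest-neighbor resampling (bilinear or bicubic would normally be preferred in production):

```python
import numpy as np

def resample_to(img: np.ndarray, target_h: int, target_w: int) -> np.ndarray:
    """Nearest-neighbor resample of an (H, W[, C]) array to (target_h, target_w)."""
    h, w = img.shape[:2]
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(target_w) * w / target_w).astype(int)
    return img[rows][:, cols]

# A ~51 cm GSD tile has roughly half the pixel density of a ~25 cm GSD tile,
# so the older image must be upsampled by about 2x to align the pair.
old = np.zeros((500, 500), dtype=np.uint8)    # coarse, pre-2020 style tile
new = np.zeros((1000, 1000), dtype=np.uint8)  # fine, 2025 style tile
old_up = resample_to(old, *new.shape[:2])
```

Even after geometric alignment, the upsampled image carries less real detail than the native fine-resolution image, which is exactly the kind of mismatch a model can confuse with semantic change.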
On top of that, this was rural data. Rural orthophoto change detection is often harder than urban building change detection for several reasons:
First, the boundary of “change” is less obvious. A newly built structure is usually easier to define than a gradual land-use transition or altered vegetation pattern.
Second, cultivated land changes appearance by season. Color, texture, and surface pattern can shift substantially even when the actual land-use category has not meaningfully changed.
Third, forest-related changes are often spatially irregular. Their edges are not always clean, and the visual signal can overlap with seasonal or lighting variation.
Fourth, the operational definition of change is domain-dependent. If a customer wants to monitor illegal land conversion, road construction, solar panel installation, forestry damage, or agricultural land-use change, the label policy must reflect that goal. A generic pretrained model does not know that policy unless it is fine-tuned for it.
This is the central reason fine-tuning still matters. The problem is not only model capacity. The problem is domain alignment.
The dataset includes temporal orthophotos with different spatial resolutions, which increases domain shift and makes change detection more difficult.
A useful system for real deployment needs to do more than run pretrained inference. It needs to let users adapt the model to their own data without turning the entire process into a code-heavy ML engineering project.
This is where Deep Block is structurally different from a pure inference tool. The platform provides a GUI-based training interface so users can prepare additional labeled data, define their own change criteria, train on that data, and run prediction again inside the same workflow.
In this case, the next step was to open the Train tab in the Change Detection project, upload additional training data, and start labeling the changed areas.
The key point is that the model is not treated as fixed. If the pretrained result is insufficient, the user can adapt it to the target domain rather than accepting baseline performance as a permanent limit.
Deep Block provides a GUI training interface so domain-specific fine-tuning can be performed without leaving the project workflow.
At this point, the process becomes less about software operation and more about label policy.
For this experiment, the change definition was intentionally expanded beyond building creation and removal. The labeled “changed area” included categories such as farmland land-use change, forest appearance and loss, road construction, and other non-building changes relevant to rural land monitoring.
That decision matters because the model will learn whatever definition is expressed in the labels. If the label policy is vague, the model will inherit that vagueness.
This is especially important in agricultural areas. Cultivated land can look very different depending on season, crop stage, or harvesting condition. Some regions may appear changed at the visual level even when they should not be treated as semantic change for training purposes. If those areas are labeled incorrectly, the model will be pushed toward false positives.
So the operational instruction is simple: do not label every visual difference as change. Define the change standard first. Then label according to that standard consistently.
In the Deep Block workflow, changed areas can be drawn directly using draw mode. Once a polygon is created, the corresponding label mask is displayed in pink. This gives the user immediate feedback and makes the training dataset visually auditable inside the interface.
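Under the hood, a drawn polygon has to become a per-pixel label mask before it can be used for training. The function below is a minimal, assumed illustration of that step using the even-odd ray-casting rule, not Deep Block's actual rasterizer:

```python
import numpy as np

def rasterize_polygon(vertices, h, w):
    """Fill a polygon (list of (x, y) vertices) into an (h, w) boolean mask
    using the even-odd ray-casting rule, evaluated at pixel centers."""
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5          # pixel-center coordinates
    inside = np.zeros((h, w), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        crosses = (y1 <= py) != (y2 <= py)  # edge spans the scan row
        with np.errstate(divide="ignore", invalid="ignore"):
            x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_int)    # toggle on each crossing
    return inside

# A 6x6-pixel square region labeled as "changed" inside a 10x10 tile.
mask = rasterize_polygon([(2, 2), (8, 2), (8, 8), (2, 8)], 10, 10)
```

The resulting boolean mask is the form a training pipeline actually consumes; the pink overlay in the GUI is a visualization of the same data.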
Changed areas are labeled directly in the GUI; the generated label mask is shown in pink for immediate visual verification.
Not every visual difference should be treated as a change. Seasonal and management-driven variation in cultivated fields or vegetation can easily trigger false positives unless the labeling criteria are clearly and explicitly defined.
This was not a large-scale retraining project. It was a practical fine-tuning test designed to answer a limited question: if we prepare a modest amount of additional domain-specific training data, will inference improve on the target dataset?
The answer depends on setup quality, not only on model architecture.
In this case, only around 200 changed-area labels were prepared. That is a small dataset by full supervised learning standards, so the expected outcome should be framed correctly. With that volume of data, one should expect partial domain adaptation, not full coverage of every edge case.
The image resolution used in the workflow was 11712 × 9648 pixels.
The Division factor was set to 24 × 20, which means each patch was roughly 500 × 500 pixels. This matters because smaller patch sizes make it easier for the model to ingest localized changes as training examples. If patches are too large, small but important change regions can become diluted within broader context. If patches are too small, context may be lost. For this experiment, the chosen setting was intended to preserve small, local changes while keeping the patch size operationally manageable.
After configuration, the user simply clicked the Train button to begin training.
The training process took about 30 minutes.
Again, this is an important practical point. The system is not only trainable in theory. It is trainable within a time window that is operationally useful for iterative testing.
For this run, the image was divided into roughly 500 × 500 patches using a 24 × 20 division factor.
The fine-tuning run completed in about 30 minutes, making the workflow practical for iterative domain adaptation.
After training completed, the same image pair was used again for prediction. The same division factor was applied, and inference was run once more through the PREDICT button.
This is where the workflow becomes measurable. The question is no longer theoretical. It becomes a direct before-and-after comparison under the same project structure.
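When reference labels are available for a test region, a before-and-after comparison can be quantified rather than judged visually. The following is a generic sketch of pixel-level precision, recall, and IoU for binary change masks; it is illustrative and not a metric exported by Deep Block itself:

```python
import numpy as np

def change_metrics(pred: np.ndarray, truth: np.ndarray):
    """Pixel-level precision, recall, and IoU for binary change masks."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou

# Toy example: a 4x4 true change region and a prediction offset by one pixel.
truth = np.zeros((10, 10), dtype=bool); truth[2:6, 2:6] = True
pred  = np.zeros((10, 10), dtype=bool); pred[3:7, 3:7] = True
p, r, i = change_metrics(pred, truth)   # overlap is a 3x3 block of 9 pixels
```

Running the same metric on baseline and fine-tuned predictions over the same labeled region turns "the result was better" into a number that can be tracked across fine-tuning iterations.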
The post-training inference result was better.
One visible improvement was in areas where farmland had been converted into solar panel infrastructure. Compared to the baseline pretrained result, the fine-tuned model detected this category of land-use change much more effectively.
Importantly, the pretrained model’s building-change capability did not collapse in the process. In other words, the fine-tuning improved sensitivity to some target-domain changes without severely degrading the model’s pre-existing strength on building-related change.
That is the expected kind of gain from successful domain adaptation on a small but relevant custom dataset. It is not magic. It is not universal. But it is useful.
After fine-tuning, the model detects certain land-use changes, such as farmland conversion to solar panel infrastructure, more reliably than the pretrained baseline.
Side-by-side comparison of baseline and fine-tuned inference on the same region.
The left image shows the inference result from the pretrained model before fine-tuning, and the right image shows the result after fine-tuning with Deep Block.
The correct interpretation is not that the model became perfect after a short fine-tuning run with roughly 200 labels. That would be an overclaim.
The correct interpretation is narrower.
The experiment shows that even a relatively small amount of domain-specific additional data can improve the model’s alignment to the target task. In this case, the model became better at detecting categories of change that were underrepresented or poorly represented in the pretrained baseline, especially non-building land-use changes in rural orthophoto imagery.
At the same time, the result remained incomplete.
There were still false positives in some regions. There were also false negatives, meaning some real changes were missed. Some areas still produced unreliable inference results.
This is normal given the setup. The label set was limited. The target concept of change was broader than the pretrained assumption. The data involved GSD mismatch. The domain included seasonal agricultural appearance variation. Under those conditions, full convergence should not be expected from a small fine-tuning pass.
That does not reduce the value of the experiment. It defines its value more precisely.
This was a successful confirmation that fine-tuning inside Deep Block can move the model in the right direction on a real user dataset. It was not a claim that a small label set is enough to solve all downstream change detection problems.
In many operational environments, the bottleneck is not whether a pretrained model exists. The bottleneck is whether the organization can adapt that model to its own data, label policy, and security constraints.
That is especially relevant for aerial imagery, drone orthophotos, and satellite imagery. These domains are heterogeneous. Resolution varies. Geography varies. Change definitions vary. Customer requirements vary. The same pretrained model will not perform equally across all of them.
A system that only supports prediction leaves the user exposed to that mismatch.
A system that supports both prediction and domain-specific fine-tuning inside a single interface is more useful in production, because it allows iterative improvement rather than forcing the user to accept baseline failure.
This is the actual product implication of this case study.
Deep Block supports aerial photos, drone orthophotos, and satellite imagery. When the pretrained model is sufficient, users can start with prediction immediately. When it is not sufficient, users can prepare additional labels and fine-tune for their own domain through the GUI workflow.
That is a materially different operating model from a static, black-box inference tool.
For some organizations, model performance is only one side of the decision. Data handling is the other.
If the workflow requires training data preparation, labeling, model training, and inference to happen without data export, then deployment architecture matters. This is common in public-sector, security-sensitive, and closed-network environments.
For those cases, the Deep Trek on-premise hardware system can be used to run the same general workflow without sending data outside the local environment. That means the full sequence of data preparation, training, and inference can be performed on-premise.
This matters because the practical value of fine-tuning is much lower if the organization cannot legally or operationally move the data required to perform it.
This case study shows a simple point with practical significance.
A pretrained change detection model can provide a useful starting point, but it will not necessarily align with the actual semantic changes a user wants to detect. That gap becomes larger when the dataset involves rural land-use change, seasonal agricultural variation, forest-related change, or mismatched GSD across temporal imagery.
In this Yeongdeok-gun orthophoto case, the baseline pretrained model was not sufficient for the broader change categories the project required. By preparing additional changed-area labels inside Deep Block and running a relatively small fine-tuning job, the model’s inference quality improved on the target dataset. In particular, certain non-building changes that were weak in the baseline result became more detectable after fine-tuning.
The result was not perfect. False positives and false negatives remained. Some regions were still not handled well. That is consistent with the scale of the fine-tuning dataset and the difficulty of the task.
So the conclusion is not that a small label set solves everything. The conclusion is that Deep Block gives users a direct path from baseline inference failure to measurable domain adaptation inside a GUI workflow.
That is the operational value.
If a user needs to detect more than generic building change, or needs to adapt a model to orthophotos, drone imagery, or satellite imagery under project-specific criteria, fine-tuning is often necessary. Deep Block is designed to make that process available inside the product rather than outside it.
For organizations that need the same workflow under strict data control, the Deep Trek on-premise hardware system can support training data preparation, model training, and inference without data export.
More details are available in the video demonstration and on DEEPBLOCK.net.