The development of automated detection technologies has opened new possibilities for analysing urban spaces quickly, accurately and cost-effectively. Transformer models such as Real-Time DEtection TRansformers (RT-DETR) are moving the field of GeoAI forward, enabling fast, automated building recognition and accurate delineation in satellite and aerial imagery.

There is a growing trend in AI-driven geospatial analytics, where machine learning models optimise spatial workflows by automating data extraction and improving accuracy in high-stakes projects. As technology advances, recognition models promise to improve operational outcomes across various domains, demonstrating AI’s transformative potential in geospatial applications.

In this article, we explain how transformers are revolutionising geospatial analysis with real-time detection, improved accuracy and significant cost savings, as well as their applications in various industries and their important role in our automated building recognition project.

What are Transformers and how do they work?

Transformers are a neural network architecture designed to process sequences, such as text, by understanding the context and relationships within the data. They can handle long-range dependencies, making them highly effective for tasks such as natural language processing (NLP). Transformers work through two key innovations. The first is self-attention, a mechanism that evaluates how different parts of a sequence relate to each other, allowing the model to capture dependencies across the input. The second is parallel processing, which allows the analysis of entire sequences simultaneously rather than step-by-step.
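
To make self-attention more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. As a simplifying assumption, the inputs serve directly as queries, keys and values. Real transformers learn separate projection matrices for each, use multiple heads, and compute all positions at once with matrix multiplications on GPUs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a sequence of vectors.

    Simplification (assumption): queries, keys and values are the inputs
    themselves, with no learned projections and a single head.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    # Each position is handled independently of the others, which is why
    # real implementations can compute every position in parallel.
    for q in x:
        # Compare this query with every key in the sequence, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of all value vectors.
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights
```

Each row of the returned weights shows how strongly one position attends to every other position, regardless of how far apart they are in the sequence.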

By combining contextual understanding with scalability, transformers have become central to advances in modern AI, extending their impact beyond NLP to areas such as vision and geospatial analysis. For example, Vision Transformers (ViTs) leverage the transformer architecture to process image data by breaking it into patches, analysing relationships between patches, and excelling at tasks like image classification, object detection, and segmentation.
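
The patching step can be sketched as follows. This is a simplified illustration that assumes a single-channel image whose height and width are multiples of the patch size; `image_to_patches` is a hypothetical helper, not part of any ViT library.

```python
def image_to_patches(image, patch_size):
    """Split a 2-D (grayscale) image into non-overlapping square patches
    and flatten each patch into a vector, in row-major patch order.

    Assumption: the image is a list of equal-length rows and both of its
    dimensions are divisible by patch_size.
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # Flatten the patch row by row into a single vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches
```

In a full ViT, each flattened patch would then be linearly projected to an embedding, combined with a position embedding, and fed to the transformer as one element of the input sequence.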

What about RT-DETR?

Real-Time Detection Transformer (RT-DETR) is an advanced machine learning model designed for fast and accurate object detection. It uses transformer-based neural networks to process visual data in parallel, making it well suited to real-time applications such as detecting buildings in urban areas. By exploiting attention mechanisms, RT-DETR focuses on relevant image details and efficiently identifies and localises objects, even in dense or cluttered scenes. Because it predicts a set of detections directly, this family of models eliminates the need for Non-Maximum Suppression (NMS), a post-processing step that adds latency to popular alternatives such as YOLO models. RT-DETR is reported to outperform YOLO models of similar size in both speed and accuracy, which makes it a strong fit for applications that require accurate, real-time object recognition.
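
For context, the sketch below shows the greedy Non-Maximum Suppression step that NMS-based detectors typically run after inference to discard duplicate, overlapping boxes. It is a generic textbook version written for illustration, not the code of any particular YOLO release; boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it by more than iou_threshold, and repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Pairwise comparisons: cost grows with the number of candidates.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Because the loop compares boxes pairwise, its cost grows with the number of candidate boxes, and its threshold needs tuning. DETR-style models such as RT-DETR are trained to produce one prediction per object, which is why they can skip this step.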

How are transformers different from other neural network architectures?

Think of a traffic control tower equipped with radar systems that can instantly monitor all airplanes in the sky, regardless of their distance. The control tower doesn’t need to watch planes in the order they take off or land. Instead, it has a complete bird’s-eye view, identifying patterns and connections across the entire airspace at once. This is how transformers work. They process all input data simultaneously and use attention mechanisms to focus on the most relevant parts of the data. For example, they can identify that “plane A,” which took off an hour ago, is relevant to “plane B,” landing now, without needing to process every flight in between.

Now, picture an air traffic controller who has to watch planes take off and land one by one in a strict sequence. To understand what’s happening, they must recall what they saw earlier – building context as they go. For instance, if they saw “plane C” land earlier, they use that memory to decide whether to clear “plane D” for takeoff. This is how recurrent neural networks (RNNs) function. They process data sequentially, step by step, and are well suited to tasks like predicting the next word in a sentence or analysing time series, where past context is essential.
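
The sequential dependency described above can be seen in a minimal, single-unit RNN. The weights `w_x`, `w_h` and `b` are arbitrary illustrative values rather than trained parameters, and `rnn_forward` is a hypothetical helper.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a minimal single-unit RNN over a sequence of scalars.

    Update rule: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    The default weights are illustrative, not trained values.
    """
    h = 0.0  # initial hidden state ("memory" before any input)
    states = []
    for x in inputs:
        # Each step needs the previous hidden state, so steps run in order.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states
```

Because each hidden state needs the previous one, the loop cannot be parallelised across time steps, in contrast to the self-attention computation in a transformer. With inputs `[1.0, 0.0, 0.0]`, the later states stay non-zero: the network still “remembers” the first input.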

To understand how convolutional neural networks (CNNs) work, imagine scanners placed along airport runways that analyse each section of a plane as it passes. These scanners only look at small sections at a time, but together they form a complete picture of the plane’s condition. They’re great for checking localised details, like whether the plane’s landing gear is down or its engines are working properly, just as CNNs excel at visual tasks such as identifying objects in an image.
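
A CNN’s local “scanner” corresponds to a small kernel slid over the image. The sketch below implements a plain “valid” 2-D convolution in pure Python; the 1×2 edge-detecting kernel in the usage example is an illustrative choice, and `conv2d` is a hypothetical helper rather than a library function.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries).

    Each output value depends only on a small kernel-sized window of the
    input: the local receptive field that lets CNNs detect local patterns.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            # Weighted sum over the window whose top-left corner is (r, c).
            out[r][c] = sum(image[r + i][c + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out
```

For example, applying the kernel `[[-1, 1]]` to an image whose rows are `[0, 0, 1, 1]` yields rows of `[0, 1, 0]`: the response is non-zero only where the window straddles the vertical edge.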

RT-DETR applications in geospatial analysis

This technology has a major impact on many industries where geospatial analysis is essential, streamlining processes through its ability to deliver accurate, real-time data. Here are some of the industries where RT-DETR is beneficial, and how its capabilities meet the unique needs of each.

Urban planning & infrastructure development

RT-DETR aids urban planners and developers by quickly and accurately identifying buildings, roads and other urban features. It supports monitoring urban growth, updating city maps and optimising resource allocation. By helping to identify underutilised or high-demand areas, it supports better zoning decisions and infrastructure planning. The technology also assists in recognising non-structural elements such as green spaces, car parks and waterways for comprehensive urban development.

Real estate & property valuation

Real estate companies and government agencies benefit from RT-DETR’s ability to automate the detection and measurement of built-up areas, roads, and landscape features like parks or water bodies. This ensures consistent, accurate data even in densely populated or complex environments, aiding in property assessment and valuation.

Disaster management & emergency response

In emergencies, this advanced technology provides real-time recognition of buildings, roads, and natural barriers, which is critical for assessing vulnerability, planning evacuation routes, and deploying resources. RT-DETR can identify changes in the landscape caused by natural disasters, like flooded areas or collapsed structures, enabling quicker and more efficient response coordination.

Defence & security

RT-DETR supports strategic planning by detecting changes in infrastructure, road networks and other key features across regions. It is valuable for border security operations, surveillance and defence planning, helping to monitor urban growth, identify potential security threats and optimise patrol routes.

Environmental monitoring & protection

RT-DETR plays an essential role in tracking changes in natural landscapes, such as deforestation, urban encroachment on protected areas, and habitat loss. Its ability to recognise features like rivers, vegetation cover, and artificial structures supports enforcement of environmental policies and aids in sustainable development. It also helps monitor the impact of human activity on ecosystems, ensuring informed conservation efforts.

Real-Time DEtection TRansformer in automated building recognition

Accurate building recognition is key to many urban planning and real estate applications, but identifying buildings reliably across large datasets can be challenging. One of our clients required a system that could efficiently recognise buildings and, ideally, generate an outline of each building – a feature rarely found in conventional recognition tools. Meeting this requirement called for an approach that could provide both fast recognition and accurate structural mapping.

To meet the client’s needs, we developed a robust building recognition system using Real-Time DEtection TRansformer technology as the main recognition framework. We adapted the RT-DETR model to the building recognition task to increase accuracy and stability. For building outline generation, we integrated the Segment Anything Model (SAM) to provide accurate, scalable building segmentation.

To improve performance on larger images and detect small objects in them, we used Slicing Aided Hyper Inference (SAHI), which significantly increased the system’s capabilities by slicing images into smaller tiles, enabling the detection of small objects without extensive model retraining.
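
The slicing idea behind SAHI can be illustrated with a short sketch that computes overlapping tile windows for a large image. The function name `slice_boxes` and the default tile size and overlap ratio are illustrative assumptions, not the values used in the client project, and this is not the SAHI library’s own API.

```python
def slice_boxes(image_h, image_w, tile=512, overlap=0.2):
    """Compute (x1, y1, x2, y2) windows that cover an image with
    overlapping square tiles, in the spirit of slicing-aided inference.

    tile and overlap are illustrative defaults (assumptions). The last
    row/column of tiles is shifted back so it ends exactly at the border.
    """
    step = max(1, int(tile * (1 - overlap)))  # stride between tile starts

    def starts(length):
        if length <= tile:
            return [0]  # one tile covers the whole dimension
        s = list(range(0, length - tile, step))
        s.append(length - tile)  # final tile flush with the border
        return s

    return [(x, y, min(x + tile, image_w), min(y + tile, image_h))
            for y in starts(image_h) for x in starts(image_w)]
```

A detector then runs on each window separately, and its predictions are shifted back by the window’s offset before they are merged.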

To sum up, the solution consists of four key elements:

1. Image slicing – The process begins with SAHI, which divides the input image into smaller overlapping tiles using a sliding window technique. This makes large images easier to handle and improves the detection of small objects such as rooftops.

2. Mask generation – Each tile is then processed by FastSAM, which generates precise segmentation masks for potential buildings within each slice.

3. Classification with fine-tuned RT-DETR – The segmentation masks generated by FastSAM are then classified by a fine-tuned Real-Time Detection Transformer (RT-DETR) model, specifically trained on rooftop images, to ensure accurate detection and classification of building structures.

4. Re-assembly and mask aggregation – All processed slices and their corresponding segmentation masks are re-assembled into a coherent image. Overlapping areas are reconciled by aggregating the coloured masks, effectively creating a unified building outline that is both accurate and scalable.
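
Step 4 can be sketched as pasting each tile’s binary mask back at its offset in a full-size canvas. Combining overlaps with a logical OR is one simple aggregation rule and is an assumption here; the project may reconcile overlaps differently. The helper name `merge_tile_masks` is hypothetical, and tiles are assumed to lie fully inside the image.

```python
def merge_tile_masks(image_h, image_w, tile_masks):
    """Paste binary tile masks back into a full-size mask.

    tile_masks is a list of (x_offset, y_offset, mask) tuples, where mask
    is a 2-D list of 0/1 values. Overlapping regions are combined with a
    logical OR (assumption): a pixel is marked as building if any tile
    marked it.
    """
    full = [[0] * image_w for _ in range(image_h)]
    for x0, y0, mask in tile_masks:
        for r, row in enumerate(mask):
            for c, value in enumerate(row):
                if value:
                    # Translate tile coordinates into image coordinates.
                    full[y0 + r][x0 + c] = 1
    return full
```

In the usage below, a building that straddles two overlapping tiles ends up as one continuous region in the merged mask.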

After implementing the solution, the client saw excellent results. The system accurately identifies buildings in real time and generates clear, detailed outlines. In addition, SAHI ensured that even small structures in large images could be identified without sacrificing processing speed. The solution improved operational efficiency and increased the accuracy of the client’s urban analysis data.

Integrating Real-Time DEtection TRansformer with advanced technologies such as SAHI and SAM has enabled us to address one of the biggest challenges in urban data analysis – scalable and accurate building detection. This solution gives our client a competitive advantage by combining speed with exceptional detail in real-time building outline generation.

Piotr Semberecki, Senior AI Data Scientist at Spyrosoft

Key benefits of Real-Time DEtection TRansformers 

High accuracy object detection

RT-DETR uses transformer attention to focus on the most relevant parts of the image, localising objects precisely even when they have complex shapes. This improves accuracy, especially in dense clusters or complex scenes where precise boundaries are essential.

Speed and efficiency

RT-DETR models excel at processing large amounts of data quickly. Using transformer attention mechanisms and parallel processing, they perform fast object detection without significant latency, and they can be paired with segmentation models such as SAM when building outlines are required.

Reduced need for manual data handling

Automating object identification and segmentation saves time and resources, allowing skilled professionals to focus on higher-level analysis and strategic planning.

Robustness

RT-DETR delivers reliable results even in dynamic conditions. For example, when the same area is imaged repeatedly, a false positive in one image can be mitigated by the detections in subsequent images, minimising the impact on overall detection accuracy.

Versatility across sectors

The flexibility of RT-DETR makes it useful across many industries. Each sector can apply the technology to meet its specific needs and improve the quality of decision-making with reliable, up-to-date spatial data.

Transform your geospatial analysis with Spyrosoft

With our experience in advanced geospatial solutions, we can tailor the latest technologies to the unique needs of your industry, helping you make faster, smarter and more informed decisions.

Contact our geospatial experts today to learn how transformers can change your projects and give you a competitive edge with actionable, timely insights!

About the author

Michal Wierzbinski

Lead AI Data Scientist