Cities are facing rising temperatures, with buildings playing a key role in urban heat dynamics. Yet, mapping facade materials at scale remains a challenge. In this study, we leverage zero-shot learning to automate facade material detection, enabling new insights into urban thermal exchanges.
Harnessing Computer Vision for Urban Climate Research
The urban heat island (UHI) effect is one of the most pressing climate challenges facing cities worldwide. To future-proof urban areas against rising temperature extremes, urban scientists must run large-scale simulations to assess urban conditions. These simulations are typically physics-based. They start with simplified representations of street canyons and progressively add complexity by incorporating building surface materials, heights, and refined geometries. Collecting street-specific building data is labor-intensive, and extracting it from street view imagery has been gaining traction within the urban science community. The extraction of urban morphology, vegetation, and building density from street view data has been studied extensively. Facade materials, by contrast, remain underexplored because material detection is difficult to perform at scale. Building surface types vary significantly across cities, and training segmentation networks for each new context is often impractical for urban scientists. Our project explores a workflow that leverages recent advances in foundation models and zero-shot methods to address this challenge. By bypassing the need for extensive labeled datasets, our method enables high-fidelity facade material extraction across diverse urban environments.
Why Do Facade Materials Matter in Urban Climate Studies?
Materials that make up the urban fabric—brick, concrete, glass, plaster, and metal panels—play a critical role in determining local thermal conditions. Their properties, such as albedo (reflectivity), emissivity, and thermal mass, influence how much heat is absorbed, retained, and re-emitted into the urban environment. Cities experiencing increasing heat stress require detailed data on these materials to model the impact of design choices on outdoor thermal comfort, peak cooling demand, and urban heat mitigation strategies. However, the lack of large-scale datasets on facade materials has been a persistent barrier to integrating this knowledge into city-scale studies. Figure 1 illustrates examples of facade material segmentations across different urban contexts.
Figure 1: Example output segmentations of buildings in Amsterdam, Boston, and Dubai
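To make the role of albedo and emissivity concrete, here is a minimal radiation-balance sketch for a single sunlit facade surface. It is a simplification assumed for illustration, not the model used in the study. It ignores convection, conduction into the wall, and heat storage, so thermal mass is not represented. The albedo, emissivity, and irradiance values in the example are hypothetical, not measured material properties.

```python
# Stefan-Boltzmann constant in W m^-2 K^-4.
SIGMA = 5.670374419e-8


def net_radiation(albedo, emissivity, shortwave_in, longwave_in, surface_temp_k):
    """Net radiative flux into an opaque surface in W/m^2 (positive means the surface gains energy).

    Simplified, illustrative balance: convection, conduction and heat storage are ignored.
    """
    absorbed_shortwave = (1.0 - albedo) * shortwave_in  # higher albedo reflects more sunlight
    absorbed_longwave = emissivity * longwave_in         # Kirchhoff's law: absorptivity equals emissivity
    emitted_longwave = emissivity * SIGMA * surface_temp_k ** 4  # grey-body emission
    return absorbed_shortwave + absorbed_longwave - emitted_longwave


# Two hypothetical surfaces under the same assumed conditions (values are illustrative only).
dark = net_radiation(albedo=0.2, emissivity=0.9, shortwave_in=600.0, longwave_in=350.0, surface_temp_k=300.0)
light = net_radiation(albedo=0.6, emissivity=0.9, shortwave_in=600.0, longwave_in=350.0, surface_temp_k=300.0)
print(f"dark: {dark:.1f} W/m^2, light: {light:.1f} W/m^2")
```

With identical emissivity and surface temperature, the two surfaces differ only in reflected sunlight. The lighter one absorbs (0.6 - 0.2) × 600 = 240 W/m² less. Emissivity, in turn, controls how efficiently a warm facade sheds heat as longwave radiation.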
Zero-Shot Learning: Applications in Urban Data Acquisition
To overcome these challenges, we developed a workflow that integrates several state-of-the-art zero-shot models: OpenAI's CLIP, CLIPSeg, the Segment Anything Model (SAM), and Grounding DINO. These models were originally designed for general-purpose image recognition. We adapted them to segment and classify urban facade materials without requiring pre-labeled training data. We evaluated the approach in three diverse cities: Boston, Amsterdam, and Dubai. It detects the predominant material in 68% of cases in our labeled dataset and identifies the top three materials with 85% accuracy.
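One way to read the two accuracy figures is as top-1 and top-3 accuracy. The sketch below assumes that reading: a facade counts as a hit when its labeled predominant material appears among the first k ranked predictions. The study may define its metrics differently. The function name and the toy labels are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   ground_truth: Sequence[str], k: int) -> float:
    """Fraction of facades whose labeled material is among the first k ranked predictions."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("predictions and labels must have the same length")
    if not ground_truth:
        raise ValueError("need at least one labeled facade")
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, ground_truth))
    return hits / len(ground_truth)


# Toy example with invented rankings (most likely material first).
preds = [["brick", "glass", "concrete"],
         ["glass", "metal", "concrete"],
         ["plaster", "brick", "glass"],
         ["concrete", "glass", "plaster"]]
truth = ["brick", "concrete", "glass", "metal"]
print(top_k_accuracy(preds, truth, k=1))  # 0.25
print(top_k_accuracy(preds, truth, k=3))  # 0.75
```

Top-3 accuracy is always at least as high as top-1 accuracy, which is consistent with the gap between the two reported figures.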
Our methodology unfolds in several steps:
Image Sourcing and Preprocessing: We extract high-resolution facade images from street-view panoramas, using segmentation (SAM) to isolate buildings from their surroundings and to split the resulting segments into patches.
Zero-Shot Material Classification: Using CLIP and CLIPSeg, we assign material labels to facade elements by matching image regions against text descriptions, which lets the models infer material types from visual characteristics.
Instance and Panoptic Segmentation: By integrating SAM and Grounding DINO, we refine material detection at a granular level, distinguishing facade elements (such as windows, doors, and balconies) as well as material classes (such as brick, glass, and concrete).
Validation and Benchmarking: We compare our method’s performance against supervised segmentation models, demonstrating its robustness in diverse urban settings. The full workflow is shown in Figure 2.
Figure 2: Workflow overview for image sourcing, transformation, and joint detection-segmentation of facade materials and objects
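The patching and zero-shot classification steps can be sketched in a few lines of NumPy. This is a minimal sketch, not the study's implementation. It assumes a binary building mask (for example from SAM) and precomputed image and text embeddings. Each patch is scored against one text prompt per material using CLIP-style cosine similarity followed by a softmax. The material list, prompt template, temperature of 100, coverage threshold, and the majority-vote aggregation are all assumptions, and every function name here is hypothetical.

```python
import numpy as np

# Candidate material classes and prompt template (both are illustrative assumptions).
MATERIALS = ["brick", "concrete", "glass", "plaster", "metal"]
PROMPTS = [f"a photo of a {m} building facade" for m in MATERIALS]


def tile_mask(mask: np.ndarray, patch: int, min_coverage: float = 0.5) -> list[tuple[int, int]]:
    """Top-left corners of non-overlapping patch x patch tiles that lie mostly inside a building mask."""
    h, w = mask.shape
    corners = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            # Keep a tile only if enough of its pixels belong to the building.
            if mask[r:r + patch, c:c + patch].mean() >= min_coverage:
                corners.append((r, c))
    return corners


def zero_shot_probs(image_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 100.0) -> np.ndarray:
    """CLIP-style zero-shot class probabilities for each patch.

    image_emb: (n_patches, d) patch embeddings; text_emb: (n_classes, d) prompt embeddings.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = temperature * img @ txt.T              # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability before exp
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def rank_materials(probs: np.ndarray, labels: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Facade-level ranking: share of patches whose most likely label is each material."""
    winners = probs.argmax(axis=1)
    shares = np.bincount(winners, minlength=len(labels)) / len(winners)
    order = np.argsort(-shares, kind="stable")[:k]
    return [(labels[i], float(shares[i])) for i in order]


if __name__ == "__main__":
    # Synthetic stand-ins for encoder outputs: one basis vector per material prompt,
    # and six patches that lie close to brick, brick, glass, brick, glass, concrete.
    text_emb = np.eye(len(MATERIALS), 8)
    image_emb = text_emb[[0, 0, 2, 0, 2, 1]] + 0.01
    print(rank_materials(zero_shot_probs(image_emb, text_emb), MATERIALS))
```

In a real pipeline, the embeddings would come from a CLIP image encoder and text encoder. Examples are the `get_image_features` and `get_text_features` methods of Hugging Face `transformers.CLIPModel`, applied to the cropped patches and to `PROMPTS`. Aggregating patch votes into material shares also yields a rough facade composition. This is the kind of input a comfort simulation needs.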
From Segmentation to Urban Thermal Comfort Applications
We applied the material detection pipeline to explore its implications for urban comfort modeling. In a case study, we simulated outdoor thermal comfort in a sample urban canyon using the Universal Thermal Climate Index (UTCI). By altering facade material compositions, we quantified shifts in heat and cold stress hours across the three cities. The results highlighted the notable influence of facade materials:
In Dubai, glazing-covered facades increased heat stress by 220 hours per year, underscoring the thermal impact of heavily glazed skyscrapers in hot climates.
In Amsterdam, material variations had a more moderate impact, reflecting the city’s temperate conditions.
In Boston, glazing increased heat stress, while materials such as brick reduced winter cold stress thanks to their thermal mass.
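The comparison above reduces to counting stress hours in hourly UTCI series for two material scenarios. The sketch below assumes the standard UTCI assessment scale. On that scale, values above +26 °C indicate at least moderate heat stress, and values below +9 °C indicate at least slight cold stress. The thresholds used in the study are not stated here, so they are exposed as parameters. The hourly UTCI values themselves would come from a comfort simulation that is not reproduced. Function names are hypothetical.

```python
import numpy as np

# Assumed thresholds from the standard UTCI assessment scale (degrees Celsius).
HEAT_STRESS_ABOVE_C = 26.0  # above +26: moderate heat stress or stronger
COLD_STRESS_BELOW_C = 9.0   # below +9: slight cold stress or stronger


def stress_hours(utci_hourly, heat_above=HEAT_STRESS_ABOVE_C, cold_below=COLD_STRESS_BELOW_C):
    """Count (heat stress hours, cold stress hours) in one hourly UTCI series."""
    utci = np.asarray(utci_hourly, dtype=float)
    return int((utci > heat_above).sum()), int((utci < cold_below).sum())


def material_scenario_delta(baseline_utci, variant_utci, **thresholds):
    """Change in (heat, cold) stress hours when a facade material variant replaces the baseline."""
    base_heat, base_cold = stress_hours(baseline_utci, **thresholds)
    var_heat, var_cold = stress_hours(variant_utci, **thresholds)
    return var_heat - base_heat, var_cold - base_cold
```

For an annual study, each series would hold 8,760 hourly values for the same canyon location. A positive heat delta paired with a negative cold delta matches the glazing-versus-brick pattern described for Boston. Treating the threshold values themselves as "no stress" (strict inequalities) is a choice made here, not something taken from the study.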
Figure 3 shows example outputs and mapped material distributions for Boston.
Figure 3: Full workflow outputs in Boston, from facade segmentation to thermal comfort assessment
Scaling Up: Towards City-Wide Material Mapping
Our findings point to the potential of zero-shot learning in urban climate research. Automated facade material mapping has relevant applications in urban heat island studies, city-wide construction classification, and climate-responsive urban design. Moreover, our methodology can support regulatory efforts by giving policymakers data-driven insights for optimizing building codes, retrofit strategies, and material selection. Despite its promise, zero-shot detection still struggles with several conditions: rare materials, variations in facade texture, occlusions from urban obstructions, and noisy images. Its performance also varies across geographic contexts, possibly due to biases in the training images of large-scale foundation models. We hope our workflow serves as a foundation that the computer science and urban climatology communities can refine and expand upon, leveraging ever-evolving computer vision tools to address this complex challenge. As cities face escalating climate extremes, data-driven methods for extracting and analyzing material data at scale will be essential for shaping targeted urban interventions.
Tarkhan, N., Klimenka, M., Fang, K. et al. Mapping facade materials utilizing zero-shot segmentation for applications in urban microclimate research. Sci Rep 15, 5492 (2025). https://doi.org/10.1038/s41598-025-86307-1