General Article

International Journal of Sustainable Building Technology and Urban Development. 31 December 2025. 445-460
https://doi.org/10.22712/susb.20250030

Introduction

Autonomous driving and smart surveillance systems depend heavily on accurate object detection to ensure safety by recognising cars, pedestrians, traffic signals, and other potential hazards [1]. Deep learning has dramatically improved object detection under ideal conditions, but performance drops significantly in adverse weather (e.g., rain, fog, snow, sand) owing to occlusion, reduced visibility, and atmospheric interference such as rain streaks and snowflakes [2].

To address these issues, several strategies have been investigated for robust object detection in bad weather [3]. The first is to train deep learning models on real-world datasets with thorough annotations covering every weather condition. Although datasets such as [4, 5] and [6] provide photographs of a range of weather conditions, they usually lack object annotations for adverse weather or a balanced mix of clear-weather and all-weather images. Because such datasets are not comprehensive, the second option is to synthesise adverse-weather images from clear-weather photos using generative adversarial networks [7, 8], physics-based rendering techniques [9, 10], or a combination of both, as shown in [11]. Each technique has particular advantages and disadvantages; for instance, GANs can generate complex noise patterns but may significantly alter the image content [12], while physics-based techniques preserve visual integrity in the noise patterns they add but lack realism. The third method is to perform denoising before object detection. Numerous image denoising techniques concentrate on desnowing [13], deraining [14], and dehazing [15, 16]. However, current image restoration methods often perform poorly on downstream tasks because they are designed around image-quality criteria rather than their effect on detection accuracy.

Recent advancements have also attempted to integrate restoration and detection components fully in order to close the gap between them. For example, “Image-Adaptive YOLO” trains a differentiable image processing block alongside YOLO to optimise object recognition in bad conditions [17]. Similarly, D-YOLO combines dual dehazing and detection routes with attention-based feature fusion to improve resilience in foggy conditions, with encouraging results on benchmark datasets such as Foggy Cityscapes [18]. More recently, UniDet-D incorporates dynamic spectral attention into a unified restoration-detection model to increase generalisation across many weather types, including unseen combinations such as sand-rain mixes.

Building on this trend, we offer a modular two-stage deep learning pipeline designed for adverse-weather object detection. It combines weather-specific enhancement modules, AOD-Net and DerainNet, with an adaptive fusion layer and a Faster R-CNN two-stage detection backbone. Unlike many end-to-end solutions that rely solely on one-stage detectors, our methodology takes advantage of the high accuracy of two-stage designs and guarantees, via joint training, that enhancements contribute directly to detection performance.

We test our model on the benchmark DAWN dataset, assessing both image restoration (PSNR, SSIM, MSE) and detection performance (mAP, Precision, Recall, IoU), with a focus on class-wise accuracy across various weather conditions. This focused study fills known gaps in current research, where precise per-class metrics under harsh conditions remain understudied.

In contrast to current hybrid models such as D-YOLO and UniDet-D, which are mostly YOLO-based one-stage detectors with restricted flexibility, the proposed framework presents a modular two-stage pipeline. The model achieves excellent detection accuracy for tiny objects and varying weather conditions by using weather-specific enhancement modules (AOD-Net for dehazing, DerainNet for deraining) together with an adaptive fusion layer and Faster R-CNN. This modular approach facilitates end-to-end optimization, whereby restoration directly enhances detection, resulting in superior generalization across various adverse-weather datasets and enhanced scalability relative to monolithic systems.

Research Objectives

After carefully reviewing the research gaps, the following objectives have been identified to fulfil the aim of this project work:

1. To develop a two-stage deep learning model for robust object detection in adverse weather conditions.

2. To test the performance of the proposed two-stage algorithm using various benchmark datasets.

3. To evaluate the impact of weather-specific enhancement modules (AOD-Net and DerainNet) with adaptive fusion on detection accuracy.

4. To compare the proposed model with unsupervised deep learning approaches for object detection.

Literature Review

Object detection, a basic problem in the field of computer vision, has several uses, including augmented reality, surveillance, and driverless cars. Considerable developments in this field throughout the years have spurred the production of numerous methods meant to increase the precision, effectiveness, and resilience of object detection systems. Table 1 highlights important works and their contributions as we examine recent research in this literature review, which focuses on innovative approaches and methodologies for object detection.

Table 1.

Summary of the Existing Studies on the Object Detection

Ref. Method / Approach Weather Condition(s) Accuracy (mAP) Limitation(s)
[12] Density-aware multi-branch dehazing + detection Fog +13 pts over baseline (e.g. ~75% → ~88%) Only for fog; complex pipeline
[19] Single unified spectral-attention network Rain, fog, snow, mixed “Superior across various types” (e.g. ~80%+) Monolithic model; harder to interpret
[20] Dual-route fusion of hazy and dehazed features into YOLO Fog, snow, rain +8% over state-of-the-art (e.g. ~68 → ~76%) YOLO-only; no compatibility with two-stage detectors
[21] Review of object detection + image restoration Mixed N/A Lacks quantitative outcomes; conceptual only
[22] Decoupled degradation + content restoration then YOLOv8 Mixed (rain, fog) +5–6 pts mAP (e.g. ~70 → ~76%) YOLO-specific; limited to one detector
[23] Multiscale Adaptive Sampling Fusion Network, or MASFNet. Foggy, Low illumination / Nighttime scenarios. Real-world fog dataset: 73.68% mAP, Foggy Driving Dataset: 30.95%, ExDark: 63.80%. Reduced mAP on degraded datasets, limited testing on various conditions (rain, snow), and untested real-time resilience.
[24] Noise Incentive Robust Network, or NIRNet Cloud corruption / hazy atmospheric conditions +2.16% mAP improvement and +2.56% relative performance under corruption (rPC) Limited assessment beyond cloud/haze conditions, new Hazy-DIOR dataset reliance, and uncertain real-time implementation capability.
[25] WARLearn Fog, low light RTTS (foggy dataset): 52.6%; ExDark (low light dataset): 55.7%. Depends on clear-weather pre-trained models, with unclear performance and generalization under other adverse conditions like rain or snow.
[26] Ghost Multiscale YOLO, or GMS-YOLO Low light, blurred imagery, and dense fog Better than baseline: “n”-size model +6.3% mAP; “s”-size model +5.5% mAP Mostly tested in foggy settings; generalisation to other weather types (rain, snow) and situations is unproven.
[27] INCGM-YOLO. Rainy, foggy, and snowy. mAP@0.5: 76.59%, mAP@0.5:0.95: 51.68%, Enhancements above the basic YOLOv7; +5.39% (mAP@0.5:0.95), +7.09% (mAP@0.5) Vehicle detection is the main test; real-time edge deployment and performance on other object types are not covered.

The majority of contemporary object detection algorithms for bad weather encounter significant limitations. Most of these systems are designed to handle only specific weather conditions and cannot adjust to the presence of multiple or combined conditions. A number of recent models rely heavily on one-stage detectors such as YOLO, which, while rapid, often compromise detection accuracy and robustness in comparison to two-stage alternatives. Furthermore, the image enhancement techniques employed in these methodologies frequently prioritise visual quality metrics, including PSNR and SSIM, without adequately considering the impact of these enhancements on downstream object detection tasks. Several models enhance detection performance but are unable to scale across diverse environmental scenarios and object types, are tightly coupled to specific architectures, or lack interpretability. Some exhibit performance improvements solely in restricted environments, with inadequate testing conducted in real-world deployment scenarios, including low-light environments or peripheral devices [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53].

These deficiencies emphasise the necessity of a more adaptable, generalisable, and detection-aware methodology. The proposed research presents a modular two-stage deep learning pipeline to address this problem. A powerful two-stage object detector, Faster R-CNN, is integrated with weather-specific restoration modules (AOD-Net for fog and DerainNet for precipitation) within this structure. The model's ability to adjust to new conditions in real time is guaranteed by an adaptive fusion mechanism. Additionally, end-to-end optimisation guarantees that image enhancement explicitly supports the object detection objectives. In comparison to extant solutions, this pipeline exhibits greater adaptability across multiple weather scenarios, improves detection accuracy, and preserves object-level feature integrity.

Background Study

AOD-Net

AOD-Net, the All-in-One Dehazing Network [28], is based on the traditional atmospheric scattering model, as defined in Figure 1, which expresses a hazy image I(x) as a linear combination of the scene radiance J(x) attenuated by the medium transmission t(x) and the ambient atmospheric light A:

(1)
I(x)=J(x)t(x)+A(1-t(x))

However, it subsequently redefines dehazing as the direct assessment of scene radiance via a learnt linear mapping.

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F1.jpg
Figure 1.

AOD-Net Architecture.

(2)
J(x) = K(x)I(x) - K(x) + b,

where K(x) and b are outputs of a superficial convolutional subnetwork. This theoretically involves learning a per-pixel affine transformation that integrates the multiplicative transmission component and the additive airlight into a singular operation [29]. By framing dehazing as

(3)
min_θ Σ_x ‖K_θ(x)I(x) - K_θ(x) + b_θ - J_gt(x)‖²,

The network circumvents the independent estimation of t(x) and A, hence minimising error propagation [30]. AOD-Net, from an optimisation standpoint, approximates the inverse of the atmospheric operator using a learnt parametric function family, assuring both consistency (the reconstruction aligns with clear ground truth) and efficiency (comprising only five convolutional layers).
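To make the learnt affine formulation concrete, the following minimal NumPy sketch applies Eq. (2) given a K(x) map; the function name and toy values are our own illustration (in the real network, K(x) is produced by the five-layer convolutional subnetwork and b is a learnt bias):

```python
import numpy as np

def aod_recover(I, K, b=1.0):
    """Recover scene radiance via Eq. (2): J(x) = K(x)*I(x) - K(x) + b.

    I : hazy image with values in [0, 1]
    K : per-pixel transformation map (predicted by the network in AOD-Net)
    b : bias term
    """
    J = K * I - K + b
    return np.clip(J, 0.0, 1.0)  # keep the result a valid image

# Toy example: with K(x) = 1 and b = 1 the transform reduces to the
# identity, i.e. a haze-free region passes through unchanged.
I = np.array([[0.2, 0.8], [0.5, 0.4]])
K = np.ones_like(I)
J = aod_recover(I, K, b=1.0)
```

The single affine operation folds the multiplicative transmission and additive airlight of Eq. (1) into one map, which is exactly why AOD-Net needs no separate estimates of t(x) and A.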

DerainNet

DerainNet is one of the pioneering deep-learning methodologies for single-image rain removal, establishing a direct mapping from rainy inputs to pristine scenes [31], as defined in Figure 2. Instead of depending on manually created priors for rain streaks or background structures, it presents the challenge as

(4)
O=B+R,

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F2.jpg
Figure 2.

Architecture of DerainNet.

where O represents the observed rainy image, B the latent clean background, and R the rain layer [32]. Classical approaches recover the clean image by solving a regularised decomposition of the form

(5)
min_{B,R} ‖B‖_TV + λ‖R‖_1  s.t.  O = B + R,

DerainNet substitutes manually designed regularizers with a deep residual mapping Fθ such that

(6)
B ≈ F_θ(O) = O - R_θ(O)

where Rθ is learnt as a residual subnetwork. The multi-scale architecture, featuring parallel convolutional pathways with kernels of varied dimensions, is designed to capture rain streaks of diverse widths (high-frequency structures) and backdrop textures (low-frequency content). Training reduces

(7)
L(θ) = (1/N) Σ_{i=1}^{N} ‖F_θ(O_i) - B_i‖²_2

DerainNet fundamentally utilises a multi-scale convolutional architecture, comprising three concurrent convolutional streams with 3×3, 5×5, and 7×7 kernels, which extract features that delineate rain streaks of varying widths and orientations as well as underlying textures [33]. The rain residual is predicted by merging these features and passing them through a series of reconstruction layers. The network is trained by minimising mean-squared-error loss relative to ground-truth clean pictures, thereby indirectly acquiring data-driven priors on both B and R. Utilising deep residual learning and multi-scale context aggregation, DerainNet effectively eliminates rain streaks while maintaining scene features, facilitating future progress in rain and weather-degradation restoration.

Faster R-CNN

Faster R-CNN is a two-stage object detection system presented in 2015 by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun [34], described in Figure 3. It improves upon the earlier R-CNN family of techniques by integrating a Region Proposal Network (RPN) into the design, doing away with the need for a third-party algorithm to propose candidate regions [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53].

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F3.jpg
Figure 3.

Faster R-CNN Architecture.

The key components of Faster R-CNN include:

Backbone Network: Generally a CNN such as ResNet, VGG, or a comparable model, which extracts feature maps from the input picture.

Region Proposal Network (RPN): A network that proposes candidate bounding boxes for objects. Using the backbone network's feature maps, the RPN recommends regions that are likely to contain objects.

Region of Interest (RoI) Pooling or RoI Align: After region proposals are generated, areas of interest are extracted from the feature maps and transformed into fixed-size feature maps for classification and bounding-box regression. Faster R-CNN is trained with the multi-task loss described in [36]:

(8)
L({P_i}, {t_i}) = (1/N_cls) Σ_i L_cls(P_i, P_i*) + λ (1/N_reg) Σ_i P_i* L_reg(t_i, t_i*)

where i is the anchor index, P_i the predicted object probability, t_i the predicted four-coordinate bounding-box vector, and the starred terms (P_i*, t_i*) the corresponding ground truth; L_cls is the log loss over the two classes (object vs. background). N_cls and N_reg are normalisation terms. Setting λ to 10 by default scales the classifier term to the same level as the regressor term.

Region Classification and Bounding Box Regression: The fixed-size features are supplied to distinct branches for object classification and bounding-box regression. This enables the model to refine the bounding-box coordinates and categorise objects inside the proposed areas.
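As a sketch of how the classification and regression terms of Eq. (8) combine, the fragment below implements a simplified version of the multi-task loss in NumPy; the smooth-L1 form of L_reg and the toy anchors are standard assumptions on our part, not details quoted from the paper:

```python
import numpy as np

def rpn_multitask_loss(p, p_star, t, t_star, lam=10.0):
    """Simplified Faster R-CNN multi-task loss (Eq. 8).

    p      : predicted objectness probabilities, shape (N,)
    p_star : ground-truth labels in {0, 1}, shape (N,)
    t, t_star : predicted / target box offsets, shape (N, 4)
    """
    eps = 1e-7
    # Classification term: binary log loss, normalised by the batch size
    l_cls = -np.mean(p_star * np.log(p + eps)
                     + (1 - p_star) * np.log(1 - p + eps))
    # Regression term: smooth L1, counted only on positive anchors
    d = np.abs(t - t_star)
    smooth_l1 = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1)
    l_reg = np.sum(p_star * smooth_l1) / max(len(p), 1)
    return l_cls + lam * l_reg

# Perfect predictions: both terms vanish.
loss_good = rpn_multitask_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                               np.zeros((2, 4)), np.zeros((2, 4)))
# Wrong boxes on the positive anchor: the λ-weighted regression term dominates.
loss_bad = rpn_multitask_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                              np.ones((2, 4)), np.zeros((2, 4)))
```

Because the regression term is gated by P_i*, negative anchors contribute only to the classification term, mirroring the anchor-labelling scheme of the RPN.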

Methodology

This study proposes a reliable and efficient two-phase deep learning framework for object detection in bad weather. The framework is designed with a modular architecture that is optimised for both visual clarity and detection performance, as defined in Figure 4.

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F4.jpg
Figure 4.

Flow chart of the proposed work.

Data Collection and Preprocessing

The DAWN dataset, which contains 1000 real-world traffic photos captured in rain, fog, snow, and sand, is used to test our model's resilience in scenarios relevant to autonomous driving and smart surveillance. Preprocessing includes resampling to standardise dimensions, augmentation methods including rotation, flipping, zooming, and contrast modification to promote generalisation, and normalising pixel values to [0, 1] for quicker convergence. The dataset was split into 80% training and 20% testing.

Although the dataset size is somewhat small in comparison to extensive benchmarks, its photographs effectively portray unfavourable weather conditions, making it an appropriate choice for focused assessment. To alleviate data scarcity and avert overfitting, we implemented a comprehensive augmentation pipeline including rotation, flipping, zooming, and contrast adjustment, thus augmenting the training samples and ensuring model generalisation. Moreover, the model was verified on supplementary benchmark datasets (MS COCO, EXDARK, PASCAL VOC), which together indicate that the framework generalises effectively beyond the original dataset.

Data Source: https://www.kaggle.com/shuvoalok/dawn-dataset
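The preprocessing and splitting steps described above can be sketched as follows; the random stand-in images and helper names are hypothetical, used only to illustrate the pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img):
    """Normalise uint8 pixel values to [0, 1] for faster convergence."""
    return img.astype(np.float32) / 255.0

def augment(img):
    """One simple augmentation from the pipeline: horizontal flip."""
    return img[:, ::-1]

# Hypothetical stand-in for the traffic images: random uint8 arrays.
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
processed = np.stack([preprocess(im) for im in images])

# 80% training / 20% testing split, as described above.
n_train = int(0.8 * len(processed))
train, test = processed[:n_train], processed[n_train:]
```

In practice the split would be stratified over the four weather conditions; the sketch only shows the mechanics of normalisation, augmentation, and partitioning.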

Weather-specific image enhancement

AOD-Net for Dehazing

For foggy and hazy conditions, we use AOD-Net (All-in-One Dehazing Network), which uses a pixel-wise affine transformation to learn a direct mapping from degraded photos to clear ones. The formula is given by:

(9)
J(x)=K(x)I(x)-K(x)+b

Where,

J(x) is the recovered clean image,

I(x) is the input hazy image,

K(x) is a learned transformation function,

b is a learnable bias vector.

The network is trained by reducing the average squared deviation between the ground truth and the dehazed output.

(10)
L_AOD = (1/N) Σ_{i=1}^{N} ‖J_i - Ĵ_i‖²

Where, Ĵ_i is the predicted dehazed output and J_i is the ground truth clean image.

DerainNet for Rain Removal

DerainNet is used to eliminate streaks and improve clarity in rainy or snowy images. It separates the rain layer (R) from the backdrop (B). The picture creation is modeled as follows:

(11)
I=B+R

The network learns to estimate the rain layer R̂ and reconstructs the clean image as:

(12)
B̂ = I - R̂

The model is trained to reduce the disparity between the ground truth and the anticipated clean picture utilizing the loss:

(13)
L_Derain = (1/N) Σ_{i=1}^{N} ‖B_i - B̂_i‖²

This guarantees that the model effectively distinguishes between the useful image content and the rain, thereby enhancing visibility.
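A minimal sketch of the residual recovery in Eqs. (11)-(12), substituting an oracle rain estimator for the trained network (function names and toy values are our own):

```python
import numpy as np

def derain(O, predict_rain):
    """Residual recovery: B̂ = O - R̂ (Eq. 12).

    O : observed rainy image, values in [0, 1]
    predict_rain : stand-in for the trained rain-layer estimator R̂(·)
    """
    R_hat = predict_rain(O)
    return np.clip(O - R_hat, 0.0, 1.0)

# Toy scene: clean background plus one synthetic additive rain streak (Eq. 11).
B = np.full((4, 4), 0.3)
R = np.zeros((4, 4))
R[1, :] = 0.4                       # a single bright streak
O = np.clip(B + R, 0.0, 1.0)

# A perfect (oracle) estimator recovers the background exactly.
B_hat = derain(O, lambda img: R)
mse = np.mean((B_hat - B) ** 2)     # the quantity minimised by Eq. (13)
```

The trained network replaces the oracle lambda; the loss in Eq. (13) drives the estimator toward exactly this behaviour.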

Adaptive Feature Fusion

An adaptive fusion mechanism is implemented to combine the outputs from AOD-Net and DerainNet and address hybrid weather scenarios. The final enhanced image F that is employed for detection is calculated as:

(14)
F=αJ+(1-α)B

J is the dehazed image from AOD-Net,

B is the derained picture from DerainNet,

𝛼 is a scalar parameter that can be learned,

F is the fused image.

Because of this fusion, the network can emphasise characteristics from either enhancement route according to the condition of the surroundings.
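The fusion rule of Eq. (14) is a single convex combination per image. In the sketch below we keep the learnable α inside (0, 1) by passing a logit through a sigmoid; that parameterisation is our own implementation assumption, since the text only states that α is learnable:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse(J, B, alpha_logit):
    """Adaptive fusion (Eq. 14): F = alpha*J + (1 - alpha)*B,
    with alpha constrained to (0, 1) via a sigmoid over a learnable logit."""
    alpha = sigmoid(alpha_logit)
    return alpha * J + (1.0 - alpha) * B

J = np.full((2, 2), 1.0)            # dehazed output (toy values)
B = np.full((2, 2), 0.0)            # derained output (toy values)
F_mid = fuse(J, B, alpha_logit=0.0)  # alpha = 0.5: equal blend
F_fog = fuse(J, B, alpha_logit=10.0)  # large logit: fusion favours J
```

During joint training, gradients from the detection loss flow into the logit, so the blend shifts toward whichever enhancement route helps detection most.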

Object detection with faster R-CNN

The fused image is processed by Faster R-CNN, a two-stage object detection architecture that includes the following components:

Backbone CNN: Extracts image features.

Region Proposal Network (RPN): Proposes potential object regions.

RoI Pooling: Warps proposals to a fixed size.

Head Network: Bounding box regression and classification are performed.

The detection loss L_det combines the classification loss L_cls with the bounding-box regression loss L_reg:

(15)
L_det = L_cls + λ L_reg

where λ is a weighting factor that maintains an equilibrium between the classification and localisation objectives. This configuration enables the model to precisely classify objects and forecast their spatial locations.

End-to-end optimization

All components of the pipeline (AOD-Net, DerainNet, and Faster R-CNN) are optimised jointly using a composite loss function:

(16)
L_total = L_AOD + L_Derain + L_det

This integrated training guarantees that enhancement modules immediately contribute to higher detection accuracy. The network modifies enhancement settings during training to provide features that are most beneficial for downstream detection.

Unsupervised Baseline for Comparison

To provide a benchmark against supervised learning, we developed an unsupervised baseline that integrates an autoencoder with clustering. The autoencoder has three convolutional encoding layers (3×3 kernels, stride 2, ReLU activation), a bottleneck latent space of length 128, and a symmetric decoding route using transposed convolutions for image reconstruction. The network was trained to reduce the mean squared error (MSE) between the input and the reconstructed pictures. Upon completion of training, latent features from the bottleneck layer were retrieved and grouped via k-means clustering (k = number of object classes) to approximate object areas. The resulting clusters were then delineated with bounding boxes for assessment. This method, although not dependent on labelled data, lacks fine-grained supervision, which accounts for its worse performance relative to the proposed supervised pipeline.
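The clustering stage of this baseline can be sketched as follows; the plain-NumPy k-means, the deterministic initialisation, and the synthetic 128-D latent vectors are our own simplifications of the described autoencoder-plus-clustering setup:

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(X, k, iters=50):
    """Minimal k-means standing in for the clustering stage that groups
    bottleneck (latent) features into k putative object classes."""
    # Simple deterministic init: evenly spaced rows of X.
    centers = X[::max(len(X) // k, 1)][:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Hypothetical 128-D latent features from the autoencoder bottleneck:
# two well-separated blobs standing in for two object classes.
latents = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 128)),
    rng.normal(5.0, 0.1, size=(20, 128)),
])
labels, centers = kmeans(latents, k=2)
```

Each resulting cluster would then be enclosed in a bounding box for evaluation, as described above; the sketch shows only the grouping step.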

Evaluation Metrics

Image Enhancement Metrics

The performance of the enhancement models is assessed quantitatively using the following metrics:

Peak Signal-to-Noise Ratio (PSNR) [37]:

(17)
PSNR = 10 · log₁₀(MAX_I² / MSE)

Mean Squared Error (MSE) [38]:

(18)
MSE = (1/N) Σ_{i=1}^{N} (I_i - Î_i)²

Structural Similarity Index (SSIM) [39]:

(19)
SSIM(x, y) = [(2μ_x μ_y + C₁)(2σ_xy + C₂)] / [(μ_x² + μ_y² + C₁)(σ_x² + σ_y² + C₂)]

These measures guarantee that the restored picture retains its structural and perceptual quality.
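Eqs. (17) and (18) can be computed directly from a reference and a restored image; the toy images and function names below are our own:

```python
import numpy as np

def mse(ref, out):
    """Mean squared error between reference and restored image (Eq. 18)."""
    return np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)

def psnr(ref, out, max_val=255.0):
    """Peak signal-to-noise ratio (Eq. 17); higher means closer to reference."""
    m = mse(ref, out)
    if m == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / m)

# Toy pair: a constant error of 10 grey levels gives MSE = 100.
ref = np.full((8, 8), 100.0)
noisy = ref + 10.0
```

SSIM (Eq. 19) additionally requires local means, variances, and covariance computed over sliding windows, so in practice it is usually taken from an image-quality library rather than re-implemented.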

Object Detection Metrics

Object detection performance is assessed using the following metrics:

Mean Average Precision (mAP@[.5:.95]) [40]: Measures average precision over all classes and a range of IoU thresholds, giving a complete picture of model performance.

mAP = (1/N) Σ_{i=1}^{N} AP_i, where AP_i is the average precision of the i-th class.

Precision [41]: Indicates the proportion of correctly predicted positive detections among all predicted positives.

(20)
Precision=TPTP+FP

Recall: Represents the proportion of real positives that are correctly detected.

(21)
Recall=TPTP+FN

F1-score: The harmonic mean of precision and recall, balancing the two to assess overall accuracy.

(22)
F1 = 2 · (Precision · Recall) / (Precision + Recall)

Intersection over Union (IoU) [42]: Measures the overlap between the predicted bounding box and the ground-truth box.

(23)
IoU = |A_pred ∩ A_gt| / |A_pred ∪ A_gt|

where A_pred and A_gt represent the predicted and ground-truth bounding boxes, respectively.
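For axis-aligned boxes in (x1, y1, x2, y2) corner format (our own convention here), the IoU of Eq. (23) can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of areas minus the double-counted intersection.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and partial overlaps fall in between, which is what makes IoU a natural thresholding quantity for mAP@[.5:.95].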

This comprehensive assessment guarantees superior picture quality and increased object identification in challenging weather situations.

Results and Discussion

Results

This section delineates the experimental results for the suggested weather-resilient hybrid object recognition system. We assess the efficacy of picture enhancement and the precision of object identification, juxtaposing our methodology with leading contemporary techniques. All metrics are presented based on the validation subset of the DAWN dataset.

Figure 5 clearly illustrates the superiority of the suggested hybrid enhancement method compared to conventional single-model strategies. In contrast to AOD-Net and DerainNet, which exhibit PSNR values of 23.47 dB and 22.45 dB respectively, our hybrid technique attains an impressive PSNR of 27.93 dB, signifying superior noise reduction and clarity. Moreover, the SSIM value of 0.89 indicates exceptional structural integrity, while the MSE is markedly decreased to 101.37, validating enhanced pixel-level restoration accuracy. These findings highlight the synergy established via the adaptive integration of AOD-Net and DerainNet.

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F5.jpg
Figure 5.

Image Enhancement Performance Under Adverse Weather Conditions.

Object Detection Performance

Table 2 shows that the suggested weather-enhanced Faster R-CNN outperforms standard detectors under harsh conditions. While SSD and YOLOv3 earn mAP scores of 52.41 and 56.92, respectively, and the standard Faster R-CNN scores 61.12, our technique achieves 68.93. This improvement is due to the dehazed and derained inputs supplied by AOD-Net and DerainNet, which allow for more accurate region proposals and classification. The resulting increases in precision, recall, F1-score, and IoU demonstrate the resilience of our technique in difficult visual circumstances.

Table 2.

Comparison of proposed Faster R-CNN model with existing models in the object detection

Model mAP@[.5:.95] Precision Recall F1-score IoU
SSD 52.41 0.61 0.56 0.58 0.49
YOLOv3 56.92 0.66 0.59 0.62 0.53
RetinaNet 59.37 0.68 0.61 0.64 0.55
Proposed Hybrid (Faster R-CNN) 68.93 0.79 0.73 0.76 0.64

Figure 6 shows that when AOD-Net and DerainNet are used individually, the model's detection performance and picture quality increase only somewhat. However, when the outputs of both modules are adaptively fused, we see a significant increase in all metrics: mAP climbs to 68.93, PSNR leaps to 27.93 dB, and SSIM reaches 0.89. This suggests that combining complementary improvement capabilities (dehazing and deraining) improves both downstream object detection accuracy and visual quality.

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F6.jpg
Figure 6.

Ablation Study: Impact of Fusion Strategy.

Figure 7 shows that our hybrid model provides remarkable detection performance across all object classes, with F1-scores, recall, and accuracy ranging from 0.92 to 0.97. The IoU values, which peak at 0.94 for vehicles and stay above 0.91 for all classes, indicate great spatial accuracy. These findings validate the proposed pipeline's resilience over a wide range of object types.

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F7.jpg
Figure 7.

Class-wise performance of the proposed model.

Table 3 compares four object detection models (SSD, YOLOv3, RetinaNet, and the proposed two-stage hybrid model) on four benchmark datasets: DAWN, MS COCO, EXDARK, and PASCAL VOC. Each row reports a critical metric: mAP@[.5:.95], Precision, Recall, F1-score, and IoU. Across all datasets, the proposed model consistently outperformed the baseline detectors, as shown in Figure 8. For example, on the DAWN dataset (which captures unfavourable weather conditions such as rain and fog), the proposed technique achieves a high mAP of 68.93, compared to 59.37 for RetinaNet and 56.92 for YOLOv3. Similarly, it has much higher precision and recall scores, indicating more precise and thorough object detection.

Table 3.

Comparison of proposed model with existing model across multiple datasets

Dataset Metric SSD YOLOv3 RetinaNet Proposed Model
DAWN mAP 52.41 56.92 59.37 68.93
Precision 0.61 0.66 0.68 0.79
Recall 0.56 0.59 0.61 0.73
F1-score 0.58 0.62 0.64 0.76
IoU 0.49 0.53 0.55 0.64
MS COCO mAP 48.92 54.33 56.02 65.21
Precision 0.6 0.65 0.67 0.77
Recall 0.55 0.58 0.6 0.71
F1-score 0.57 0.61 0.63 0.74
IoU 0.47 0.51 0.54 0.63
EXDARK mAP 45.1 49.87 51.76 60.45
Precision 0.59 0.64 0.66 0.75
Recall 0.54 0.57 0.59 0.7
F1-score 0.56 0.6 0.62 0.72
IoU 0.46 0.5 0.53 0.61
PASCAL VOC mAP 50.34 55.28 57.14 67.04
Precision 0.61 0.66 0.69 0.8
Recall 0.56 0.6 0.63 0.75
F1-score 0.58 0.63 0.66 0.77
IoU 0.49 0.52 0.56 0.65

https://cdn.apub.kr/journalsite/sites/durabi/2025-016-04/N0300160402/images/Figure_susb_16_04_02_F8.jpg
Figure 8.

Object Detection Results.

A similar pattern emerges on the MS COCO dataset, where the proposed model scores 65.21 mAP, and on the EXDARK dataset, where it obtains 60.45 mAP, exhibiting strong performance in low-light situations. On the PASCAL VOC dataset the model maintains its lead with 67.04 mAP, outperforming all other detectors in every measure. These findings support the usefulness of the proposed weather-resilient two-stage detection framework, demonstrating its capacity to generalise effectively and deliver high accuracy in a variety of environmental situations.

Statistical validation was conducted to assess the robustness of the claimed improvements. For each benchmark dataset, model training was conducted across five separate runs with varying random initializations, and the mean values along with 95% confidence intervals were calculated. The suggested hybrid Faster R-CNN consistently surpassed SSD, YOLOv3, and RetinaNet in all trials, exhibiting enhancements in mAP between +7.5% and +12% (p < 0.05, paired t-test). This verifies that the reported performance improvements are not attributable to random fluctuation but are statistically significant.

Although our comparisons emphasise prevalent baseline detectors such as SSD, YOLOv3, and RetinaNet, it is noteworthy that contemporary designs like YOLOv8 and DETR have shown encouraging performance in poor weather and general object detection tasks. Integrating these models might enhance benchmarking; nevertheless, their training often requires extensive annotated datasets and more computational resources than were available in our investigation. In future work, we intend to include YOLOv8 and DETR in expanded trials to provide a more thorough performance comparison. The suggested hybrid Faster R-CNN exhibits substantial and statistically significant improvements over strong baseline detectors, underscoring its resilience and versatility in varying weather situations.

Table 4 shows that the unsupervised deep learning model, which employs autoencoder and clustering approaches for object localisation, performs much worse than the proposed supervised pipeline on all measures. The unsupervised model achieves a mAP@[.5:.95] of 41.23, precision of 0.51, recall of 0.48, F1-score of 0.49, and IoU of 0.43, suggesting its potential even without labelled data. However, the supervised pipeline outperforms it significantly, with a mAP of 68.93, precision of 0.79, recall of 0.73, F1-score of 0.76, and IoU of 0.64, demonstrating that supervised learning with weather-specific enhancement and a two-stage detection structure provides much higher accuracy and robustness in adverse weather conditions.

Table 4.

Comparison of the supervised (proposed) model with the unsupervised model

Model Type mAP@[.5:.95] Precision Recall F1-score IoU
Unsupervised (Autoencoder + Clustering) 41.23 0.51 0.48 0.49 0.43
Supervised (Proposed Pipeline) 68.93 0.79 0.73 0.76 0.64

Discussion

The experimental findings clearly show the efficacy and resilience of the proposed two-stage modular deep learning system for detecting objects in unfavourable weather. The hybrid enhancement technique, which combines AOD-Net with DerainNet via adaptive fusion, greatly enhances picture quality, as seen in the higher PSNR and SSIM and lower MSE measures. When paired with the Faster R-CNN backbone, these improvements boost object detection performance, resulting in high mAP, precision, recall, and IoU scores across all examined datasets, including DAWN, MS COCO, EXDARK, and PASCAL VOC. The ablation investigation confirms the relevance of dual-path enhancement by demonstrating that fusion improves detection accuracy over AOD-Net or DerainNet alone. Class-wise examination reveals consistently excellent performance across various object categories, implying robust generalisation.

Furthermore, the comparison with an unsupervised detection model demonstrates the superiority of the proposed supervised strategy. While the unsupervised model shows promise in settings that lack labelled data, its much lower performance metrics highlight the benefits of weather-aware supervised learning. These findings confirm the pipeline's ability to provide accurate, scalable, and robust object detection suited to practical applications such as intelligent surveillance and self-driving cars.
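For reference, the IoU figures quoted throughout (e.g., 0.64 for the proposed pipeline) are the standard intersection-over-union between predicted and ground-truth bounding boxes. A generic textbook implementation (not the paper's evaluation code) is:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted by one unit against a 2x2 ground-truth box:
# intersection 1, union 4 + 4 - 1 = 7
assert abs(box_iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-9
# Identical boxes give IoU = 1.0
assert box_iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0
```

The mAP@[.5:.95] metric used in the evaluation averages precision over IoU acceptance thresholds from 0.5 to 0.95 computed with exactly this overlap measure.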

Conclusion and Future Scope

This research concludes with the creation and exhaustive evaluation of a two-stage, modular deep learning pipeline designed specifically for reliable object detection in inclement weather, including rain, fog, snow, and sand. In contrast to conventional approaches that focus solely on visual quality enhancement or rely on monolithic end-to-end architectures, the proposed model strategically integrates weather-specific enhancement networks, AOD-Net for dehazing and DerainNet for deraining, within an adaptive fusion framework. Faster R-CNN, a two-stage object detector with demonstrated effectiveness, then analyses the enhanced image, delivering high precision and spatial accuracy. In a rigorous evaluation across multiple benchmark datasets (DAWN, MS COCO, EXDARK, and PASCAL VOC), the framework consistently outperformed modern single-stage and two-stage models, and mAP@[.5:.95], precision, recall, F1-score, and IoU all attested to its capacity to identify objects in visually degraded environments. The ablation study demonstrated the efficacy of the adaptive fusion layer, and class-wise evaluations confirmed robust generalisation across object categories. In addition, an unsupervised baseline was implemented to accommodate scenarios with inadequate annotations; despite demonstrating some capability, its significantly inferior performance reinforced the importance of supervised, enhancement-guided detection pipelines for mission-critical tasks. Overall, this work introduces a resilient, accurate, and scalable framework suited to practical deployment in intelligent transportation systems, autonomous vehicles, and smart surveillance. Future research will focus on optimising real-time processing, reducing computational overhead, and adapting the model to mobile and edge devices, further expanding its deployment scope and applicability.

References

1. J. Kaur and W. Singh, Tools, techniques, datasets and application areas for object detection in an image: a review. Multimed. Tools Appl. 81(27) (2022), pp. 38297-38351. DOI: 10.1007/s11042-022-13153-y.

2. M. Jeon, J. Seo, and J. Min, DA-RAW: Domain Adaptive Object Detection for Real-World Adverse Weather Conditions. Proc. IEEE Int. Conf. Robot. Autom. (2024), pp. 2013-2020. DOI: 10.1109/ICRA57147.2024.10611219.

3. T. Sharma, B. Debaque, N. Duclos, A. Chehri, B. Kinder, and P. Fortier, Deep Learning-Based Object Detection and Scene Perception under Bad Weather Conditions. Electron. 11(4) (2022), pp. 1-11. DOI: 10.3390/electronics11040563.

4. F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2020), pp. 2633-2642. DOI: 10.1109/CVPR42600.2020.00271.

5. K. Burnett, D.J. Yoon, Y. Wu, A.Z. Li, H. Zhang, S. Lu, J. Qian, W.-K. Tseng, A. Lambert, K.Y.K. Leung, A.P. Schoellig, and T.D. Barfoot, Boreas: A multi-season autonomous driving dataset. Int. J. Rob. Res. 42(1-2) (2023), pp. 33-42. DOI: 10.1177/02783649231160195.

6. M.A. Kenk and M. Hassaballah, DAWN: Vehicle Detection in Adverse Weather Nature Dataset. (2020), pp. 1-6. DOI: 10.17632/766ygrbt8y.3.

7. X. Huang, M.Y. Liu, S. Belongie, and J. Kautz, Multimodal Unsupervised Image-to-Image Translation. Lect. Notes Comput. Sci. 11207 (2018), pp. 179-196. DOI: 10.1007/978-3-030-01219-9_11.

8. X. Li, K. Kou, and B. Zhao, Weather GAN: Multi-Domain Weather Translation Using Generative Adversarial Networks. (2021), pp. 1-10.

9. N. Zhang, L. Zhang, and Z. Cheng, Towards Simulating Foggy and Hazy Images and Evaluating Their Authenticity. In: D. Liu, S. Xie, Y. Li, D. Zhao, and E.-S.M. El-Alfy (Eds.), Neural Information Processing. 2017, Cham: Springer International Publishing, pp. 405-415. DOI: 10.1007/978-3-319-70090-8_42.

10. K. Garg and S.K. Nayar, Photorealistic rendering of rain streaks. ACM SIGGRAPH 2006 Pap. SIGGRAPH '06. (2006), pp. 996-1002. DOI: 10.1145/1179352.1141985.

11. M. Tremblay, S.S. Halder, R. de Charette, and J.-F. Lalonde, Rain Rendering for Evaluating and Improving Robustness to Bad Weather. Int. J. Comput. Vis. 129(2) (2021), pp. 341-360. DOI: 10.1007/s11263-020-01366-3.

12. F. AlHindaassi, M.T. Alam, and F. Karray, ADAM-Dehaze: Adaptive Density-Aware Multi-Stage Dehazing for Improved Object Detection in Foggy Conditions. 2025.

13. S. Karavarsamis, I. Gkika, V. Gkitsas, K. Konstantoudakis, and D. Zarpalas, A Survey of Deep Learning-Based Image Restoration Methods for Enhancing Situational Awareness at Disaster Sites: The Cases of Rain, Snow and Haze. Sensors. 22(13) (2022). DOI: 10.3390/s22134707.

14. S. Li, I.B. Araujo, W. Ren, Z. Wang, E.K. Tokuda, R. Hirata Júnior, J. César, M. Roberto, J. Zhang, and X. Guo, Single image deraining: A comprehensive benchmark analysis. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2019), pp. 3833-3842. DOI: 10.1109/CVPR.2019.00396.

15. D. Singh and V. Kumar, A Comprehensive Review of Computational Dehazing Techniques. Arch. Comput. Methods Eng. 26(5) (2019), pp. 1395-1413. DOI: 10.1007/s11831-018-9294-z.

16. G. Jie, C. Xiaofeng, C. Yuan, R. Wenqi, Z. Jun, Z. Jing, and T. Dacheng, A Comprehensive Survey on Image Dehazing Based on Deep Learning. Proc. Thirtieth Int. Jt. Conf. Artif. Intell. (2021), pp. 4426-4433. DOI: 10.24963/ijcai.2021/604.

17. C.A. Zhang, H. Wang, Y. Cai, L. Chen, Y. Li, M.A. Sotelo, and Z. Li, Robust-FusionNet: Deep Multimodal Sensor Fusion for 3-D Object Detection Under Severe Weather Conditions. IEEE Trans. Instrum. Meas. 71 (2022), pp. 1-13. DOI: 10.1109/TIM.2022.3191724.

18. P. Shyam and H. Yoo, Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in Adverse Weather Conditions. Proc. 2024 IEEE Winter Conf. Appl. Comput. Vision, WACV 2024. (2024), pp. 7456-7467. DOI: 10.1109/WACV57701.2024.00730.

19. Y. Wang, H. Yang, W. Zhang, and S. Lu, UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers. 2025, pp. 1-10.

20. Z. Chu, D-YOLO: a robust framework for object detection in adverse weather conditions. 2024.

21. M. Maruzuki, M. Osman, A. Shafie, S. Setumin, A. Ibrahim, H. Saleh, M. Tahir, and A. Rabiain, Road Image Deblurring with Nonlinear Activation Free Network. In 2024 IEEE 14th International Conference on Control System, Computing and Engineering. (2024), pp. 288-293. DOI: 10.1109/ICCSCE61582.2024.10696495.

22. D. Li, E. Wang, Z. Li, Y. Yin, L. Zhang, and C. Zhao, STE-YOLO: A Surface Defect Detection Algorithm for Steel Strips. Electron. 14(1) (2025), pp. 1-21. DOI: 10.3390/electronics14010054.

23. Z. Liu, T. Fang, H. Lu, W. Zhang, and R. Lan, MASFNet: Multiscale Adaptive Sampling Fusion Network for Object Detection in Adverse Weather. IEEE Trans. Geosci. Remote Sens. 63 (2025), pp. 1-15. DOI: 10.1109/TGRS.2025.3558541.

24. P. Zhang, G. Cheng, C. Lang, X. Xie, and J. Han, NIRNet: Noise Incentive Robust Network in Remote Sensing Object Detection Under Cloud Corruption. IEEE Trans. Geosci. Remote Sens. 63 (2025), pp. 1-13. DOI: 10.1109/TGRS.2025.3581342.

25. S. Agarwal, R. Birman, and O. Hadar, WARLearn: Weather-Adaptive Representation Learning. Proc. 2025 IEEE Winter Conf. Appl. Comput. Vision, WACV 2025. (2025), pp. 4978-4987. DOI: 10.1109/WACV61041.2025.00487.

26. Y. Chen, Y. Wang, Z. Zou, and W. Dan, GMS-YOLO: A Lightweight Real-Time Object Detection Algorithm for Pedestrians and Vehicles Under Foggy Conditions. IEEE Internet Things J. 12(13) (2025), pp. 23879-23890. DOI: 10.1109/JIOT.2025.3553879.

27. L. Guo, X. Zhou, Y. Zhao, and W. Wu, Improved YOLOv7 algorithm incorporating InceptionNeXt and attention mechanism for vehicle detection under adverse lighting conditions. Signal, Image Video Process. 19(4) (2025), p. 299. DOI: 10.1007/s11760-025-03868-4.

28. Z. Guo, X. Zhang, and S. Yu, Image Defogging Based on Improved AOD-Net Network Modeling. Adv. Transdiscipl. Eng. 57 (2024), pp. 211-222. DOI: 10.3233/ATDE240472.

29. T. Zheng, T. Xu, X. Li, X. Zhao, F. Zhao, and Y. Zhang, Improved AOD-Net Dehazing Algorithm for Target Image. In 2024 5th International Conference on Computer Engineering and Intelligent Control (ICCEIC). (2024), pp. 333-337. DOI: 10.1109/ICCEIC64099.2024.10775918.

30. B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, End-to-End United Video Dehazing and Detection. Proceedings of the AAAI Conference on Artificial Intelligence. 32(1) (2018). DOI: 10.1609/aaai.v32i1.12287.

31. X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, Clearing the skies: A deep network architecture for single-image rain removal. IEEE Trans. Image Process. 26(6) (2017), pp. 2944-2956. DOI: 10.1109/TIP.2017.2691802.

32. K. Park, S. Yu, and J. Jeong, A contrast restoration method for effective single image rain removal algorithm. 2018 Int. Work. Adv. Image Technol. IWAIT 2018. (2018), pp. 1-4. DOI: 10.1109/IWAIT.2018.8369644.

33. L. Gao, W. Long, Y. Li, H. Liu, X. Yu, and J. Li, RASWNet: An Algorithm That Can Remove All Severe Weather Features from a Degraded Image. IEEE Access. 8 (2020), pp. 76002-76018. DOI: 10.1109/ACCESS.2020.2989355.

34. S.K. Gupta, P. Gupta, and P. Singh, Enhancing UAV-HetNet Security Through Functional Encryption Framework. Concurrency and Computation: Practice and Experience. 36(20) (2024), pp. 1-22. DOI: 10.1002/cpe.8206.

35. S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6) (2017), pp. 1137-1149. DOI: 10.1109/TPAMI.2016.2577031.

36. X. Xu, M. Zhao, P. Shi, R. Ren, X. He, X. Wei, and H. Yang, Crack Detection and Comparison Study Based on Faster R-CNN and Mask R-CNN. Sensors. 22(3) (2022). DOI: 10.3390/s22031215.

37. A. Banerjee, S.K. Gupta, and V. Kumar, A Genetic Algorithm-Based Approach for Collision Avoidance in a Multi-UAV Disaster Mitigation Deployment. Concurrency and Computation: Practice and Experience. 37(9-11) (2025), pp. 1-14. DOI: 10.1002/cpe.70061.

38. A. Magdy, M.S. Moustafa, H.M. Ebied, and M.F. Tolba, Lightweight faster R-CNN for object detection in optical remote sensing images. Sci. Rep. 15(1) (2025), pp. 1-14. DOI: 10.1038/s41598-025-99242-y.

39. U. Sara, M. Akter, and M.S. Uddin, Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study. J. Comput. Commun. 7(3) (2019), pp. 8-18. DOI: 10.4236/jcc.2019.73002.

40. T.O. Hodson, T.M. Over, and S.S. Foks, Mean Squared Error, Deconstructed. J. Adv. Model. Earth Syst. 13(12) (2021), pp. 1-10. DOI: 10.1029/2021MS002681.

41. D. Brunet, E.R. Vrscay, and Z. Wang, On the mathematical properties of the structural similarity index. IEEE Trans. Image Process. 21(4) (2012), pp. 1488-1495. DOI: 10.1109/TIP.2011.2173206.

42. B. Wang, A Parallel Implementation of Computing Mean Average Precision. arXiv:2206.09504 (2022), pp. 1-15.

43. R. Yacouby and D. Axman, Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. (2020), pp. 79-91. DOI: 10.18653/v1/2020.eval4nlp-1.9.

44. S.K. Gupta, M. Kumar, A. Nayyar, and S. Mahajan, Unmanned Aircraft Systems. 2025, Scrivener Publishing, Wiley, 1st edition, ISBN-10: 1394230613, ISBN-13: 978-1394230617.

45. O. Kaiwartya, K. Kaushik, S.K. Gupta, A. Mishra, and M. Kumar, Security and Privacy in Cyberspace. 2022, Springer Nature, 1st edition, ISBN-10: 9811919593, ISBN-13: 978-9811919596, pp. 1-226.

46. F. van Beers, A. Lindström, E. Okafor, and M.A. Wiering, Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation. Int. Conf. Pattern Recognit. Appl. Methods. 1 (2019), pp. 438-445. DOI: 10.5220/0007347504380445.

47. A. Sharma, N. Kumar, C. Diwaker, B. Sharma, R. Baniwal, S.B. Bhattacharjee, and S. Rani, A Machine learning-based framework for energy-efficient load balancing in sustainable urban infrastructure and smart buildings. International Journal of Sustainable Building Technology and Urban Development. 15(4) (2024), pp. 498-512. DOI: 10.22712/susb.20240035.

48. S.K. Gupta and A. Banerjee, Energy and Experimental Trust-based Task Offloading in the Domain of Connected Autonomous Vehicles. Vehicular Communications. 55 (2025), pp. 1-14. DOI: 10.1016/j.vehcom.2025.100954.

49. D. Jung and H. Lee, An analytical study on the prediction of carbonation velocity coefficient using deep learning algorithm. International Journal of Sustainable Building Technology and Urban Development. 10(4) (2019), pp. 205-215. DOI: 10.22712/susb.20190022.

50. K. Lee, S. Lee, and H. Kim, Accelerating multi-class defect detection of building façades using knowledge distillation of DCNN-based model. International Journal of Sustainable Building Technology and Urban Development. 12(2) (2021), pp. 80-95. DOI: 10.22712/susb.20210008.

51. A. Gupta and S.K. Gupta, A Survey on Green UAV-based Fog Computing: Challenges and Future Perspective. Transactions on Emerging Telecommunications Technologies. 33(11) (2022), pp. 1-29. DOI: 10.1002/ett.4603.

52. M. Kumar, N. Goyal, R.M.A. Qaisi, M. Najim, and S.K. Gupta, Game Theory based Hybrid Localization Technique for Underwater Wireless Sensor Networks. Transactions on Emerging Telecommunications Technologies. 33(11) (2022), pp. 1-23. DOI: 10.1002/ett.4572.

53. P. Singh, S. Kumar, S.K. Gupta, A.K. Rai, and A. Saif, Wireless Ad-hoc and Sensor Networks: Architecture, Protocols, and Applications. 2024, Routledge, CRC Press, Taylor and Francis Group, 1st Edition, eBook ISBN 9781003528982, pp. 1-412. DOI: 10.1201/9781003528982.