BGPT: Paper Review: Enhancement of YOLOv5 for automatic weed detection through backbone optimization

Quick Explanation

One-line appraisal

Swapping backbones into YOLOv5 (DarkNet, MobileNet, ResNet, VGG) shows the expected speed/accuracy tradeoffs for pea crop vs. weed detection: MobileNetv1 gives a slightly higher AP at IoU 0.5 than the stock YOLOv5 backbone (0.893 vs. 0.881) but localizes worse at stricter IoU thresholds, while YOLOv5 (CSPDarkNet53mod) is the most stable low-latency option across IoU thresholds for embedded weeding (results and numbers taken directly from the paper)

Key numeric evidence: MobileNetv1 AP50 0.893, YOLOv5 AP50 0.881, VGG16 AP50 0.946; per-image inference times of roughly 4.7, 5.9, 10, 29, and 32 ms across models (table values)

Source: the peer-reviewed article's evaluation and result tables

Long Explanation

Detailed critical review and analysis

Paper summary (what authors did)

Replaced the YOLOv5 backbone with several common CNN backbones (the original CSPDarkNet53mod, DarkNet53, MobileNetv1/v2, ResNet50, VGG16/19) and evaluated detection performance (AP at IoU thresholds 0.50/0.75/0.85/0.90) and per-image inference speed on a pea crop vs. weeds dataset (7360 train, 1840 validation, 54 test images) using PyTorch on a Tesla T4 GPU
Reported AP50/AP75/AP85/AP90 and inference speed (ms) in Table 3; trained with BCEWithLogits and GIoU losses using SGD (learning rate 0.01) for 50 epochs

Primary source for these statements and all numeric results is the paper itself (values below are taken from its tables and figures)

Key quantitative findings (directly from paper)

Selected AP and inference speed per model (AP at IoU thresholds from paper)

Model	AP50	AP75	AP85	AP90	Inference time (ms)
YOLOv5	0.881	0.874	0.852	0.798	10
MobileNetv1	0.893	0.862	0.752	0.593	10.1
VGG16	0.946	0.935	0.898	0.830	32.5
YOLOv3	0.915	0.912	0.898	0.871	28.4
YOLOv4	0.906	0.890	0.859	0.818	29

These numbers are transcribed from the paper's tables and support the authors' conclusions about tradeoffs (see the paper for exact table provenance)
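
To make "localization stability" concrete, the sketch below computes how much AP each model loses between the loosest (0.50) and strictest (0.90) IoU threshold. It uses only the five rows transcribed in the table above; the drop metric itself is an illustrative choice for this review, not a measure defined in the paper.

```python
# AP values transcribed from the table above, in order: IoU 0.50, 0.75, 0.85, 0.90.
AP = {
    "YOLOv5":      (0.881, 0.874, 0.852, 0.798),
    "MobileNetv1": (0.893, 0.862, 0.752, 0.593),
    "VGG16":       (0.946, 0.935, 0.898, 0.830),
    "YOLOv3":      (0.915, 0.912, 0.898, 0.871),
    "YOLOv4":      (0.906, 0.890, 0.859, 0.818),
}


def ap_drop(values):
    """Absolute AP lost between the loosest and strictest IoU threshold."""
    return round(values[0] - values[-1], 3)


drops = {model: ap_drop(values) for model, values in AP.items()}

# Print models from most stable (smallest drop) to least stable.
for model, drop in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{model:12s} AP50 - AP90 = {drop:.3f}")
```

On these rows MobileNetv1 loses 0.300 AP between the two thresholds while YOLOv5 loses 0.083 at nearly the same latency; the slower YOLOv3 has the smallest drop (0.044).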

Strengths of the study

Direct, systematic backbone replacement within the same neck/head pipeline isolates the backbone's effect on detection, which is good experimental control
Quantitative reporting across multiple IoU thresholds (not only AP50) helps reveal localization stability, not just coarse detection
Practical focus on inference time and GFLOPs relevant to embedded robotics for precision weeding

All of the above points are supported by the authors' methods and tables
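
Why reporting several IoU thresholds matters can be seen with a single box pair. The following is a minimal sketch with hypothetical pixel coordinates (not taken from the paper): a prediction shifted by 10 px counts as a correct detection at IoU 0.50 but as a miss at every stricter threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes in pixels: a 100x100 ground-truth weed box and a
# prediction of the same size shifted by 10 px along both axes.
ground_truth = (0, 0, 100, 100)
prediction = (10, 10, 110, 110)

score = iou(ground_truth, prediction)  # 8100 / 11900, about 0.68
matches = {t: score >= t for t in (0.50, 0.75, 0.85, 0.90)}
print(f"IoU = {score:.3f}", matches)
```

A model whose boxes are systematically a little loose can therefore keep a high AP50 while its AP85 and AP90 fall sharply, which is the pattern the table shows for MobileNetv1.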

Weaknesses, limitations, and reproducibility concerns

Very small independent test set (only 54 test images) — this is a critical limitation for claims about generalization and for robust AP estimates (paper reports dataset split 7360/1840/54)
Dataset provenance not public and no repository or code link provided — prevents independent reproduction and external validation
Single-crop, single-environment dataset (pea fields) increases overfitting risk and limits generality to other crops/lighting/soil conditions
Possible metric instability: with only 54 test images, AP at high IoU thresholds is sensitive to a handful of examples; the authors do report AP across thresholds but provide no confidence intervals or per-class counts
Limited description of augmentation pipelines and whether cross-validation or repeated runs were used to quantify variance

Most important reproducibility gaps

No public dataset access link or DOI for the images
No public training scripts or exact model weight files
No statistical uncertainty (CI or SD) around AP or speed numbers
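
The missing uncertainty estimate could be approximated with a percentile bootstrap over test images. The sketch below is illustrative only: the per-image scores are synthetic draws (not data from the paper), and it bootstraps a simple mean, whereas a faithful version would resample images and recompute AP on each resample. The function name `bootstrap_ci` is hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic per-image scores, NOT results from the paper: the same
# distribution sampled at the paper's test size (54) and at 10x that size.
rng = random.Random(42)
small_test = [rng.betavariate(8, 2) for _ in range(54)]
large_test = [rng.betavariate(8, 2) for _ in range(540)]

lo_s, hi_s = bootstrap_ci(small_test)
lo_l, hi_l = bootstrap_ci(large_test)
print(f"n=54  : 95% CI width = {hi_s - lo_s:.3f}")
print(f"n=540 : 95% CI width = {hi_l - lo_l:.3f}")
```

The interval at n=54 is markedly wider than at n=540, which is why small AP differences between backbones on a 54-image test set should be read with caution.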

External context: similar work and corroboration

Comparable applied papers replacing YOLO backbones for speed/weight tradeoffs or improving small-object detection confirm the field trend: lightweight backbones like MobileNet variants frequently increase speed and sometimes improve coarse AP (AP50) but can lose localization quality at stricter IoU thresholds; multi-scale attention / neck improvements (as in RICE-YOLO) often yield large mAP gains for small objects in UAV imagery.

Practical takeaways and recommendations for next steps

Do not judge backbone-only changes without more diverse test data; expand test set across seasons, lighting, and soil backgrounds to measure true field robustness
Report uncertainty (95% CI or SD) for AP and runtime (run multiple seeds and report mean ± SD); with small test sets AP estimates are noisy
Publish dataset and code (weights, scripts, augmentation settings) to allow reproducibility and community benchmarking
Consider combined approaches: lightweight backbone (MobileNet) plus neck/attention or small-object detection heads (see RICE-YOLO and other recent YOLOv5 variants) to recover localization losses at high IoU while preserving speed
If the use-case is embedded robot actuation for mechanical/thermal weeding, evaluate end-to-end system-level metrics: localization error in cm at the operating height and false positive herbicide/sprayer actuation rates
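
Converting a pixel-level localization error into centimeters on the ground needs the ground sampling distance (GSD) of the camera. The sketch below assumes a downward-pointing pinhole camera over flat ground; every camera parameter is a hypothetical placeholder, since the paper's camera setup is not described in this review.

```python
def ground_sampling_distance_cm(sensor_width_mm, focal_length_mm,
                                height_m, image_width_px):
    """Ground footprint of one pixel in cm (nadir pinhole camera, flat ground)."""
    return (sensor_width_mm * height_m * 100.0) / (focal_length_mm * image_width_px)


def pixel_error_to_cm(error_px, gsd_cm_per_px):
    """Convert a box-center error in pixels into centimeters on the ground."""
    return error_px * gsd_cm_per_px


# Hypothetical camera and mounting values (assumptions, not from the paper).
gsd = ground_sampling_distance_cm(
    sensor_width_mm=6.17,   # assumed small-format sensor
    focal_length_mm=4.5,    # assumed lens
    height_m=0.5,           # assumed camera height above the crop row
    image_width_px=1920,    # assumed image width
)
print(f"GSD = {gsd:.4f} cm/px; a 10 px error is {pixel_error_to_cm(10, gsd):.2f} cm")
```

Reporting error in centimeters at the operating height lets the detector's accuracy be compared directly with the tolerance of the weeding actuator.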

Reanalysis visualization

This combined AP50 vs speed plot visually highlights the speed/accuracy tradeoff reported by the authors: VGG models give highest AP50 but are slow; MobileNet/YOLO-tiny/YOLOv5 occupy lower-latency region with reasonable AP50; MobileNetv1's AP50 advantage over YOLOv5 is small but its localization at higher IoU degrades per tables
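
The tradeoff in the plot can also be checked numerically by asking which models are Pareto-optimal, meaning no other model is both at least as accurate and at least as fast. This sketch uses only the five rows transcribed in the table earlier in this review, so models that appear only in the paper's full table are not included.

```python
# (AP50, inference time in ms) transcribed from the review's table.
MODELS = {
    "YOLOv5":      (0.881, 10.0),
    "MobileNetv1": (0.893, 10.1),
    "VGG16":       (0.946, 32.5),
    "YOLOv3":      (0.915, 28.4),
    "YOLOv4":      (0.906, 29.0),
}


def pareto_front(models):
    """Names of models not dominated on (higher AP50, lower latency)."""
    front = []
    for name, (ap, ms) in models.items():
        dominated = any(
            o_ap >= ap and o_ms <= ms and (o_ap > ap or o_ms < ms)
            for other, (o_ap, o_ms) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


front = pareto_front(MODELS)
print("Pareto-optimal on AP50 vs. latency:", front)
```

Among these rows only YOLOv4 is dominated (YOLOv3 is both faster and more accurate at AP50); the remaining four models each represent a different point on the speed/accuracy curve.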

Conclusions, confidence, and how to falsify

Conclusion: The paper credibly demonstrates the expected engineering tradeoffs: heavyweight models (VGG and ResNet backbones, and the YOLOv3/YOLOv4 baselines) give higher AP and more stable localization but at large compute cost; MobileNetv1 yields a slightly higher AP50 than YOLOv5 on this pea dataset but loses localization precision at stricter IoU thresholds; YOLOv5 (CSPDarkNet53mod) provides the best balance of speed and stability for embedded weed-removal systems on the authors' dataset and conditions

Confidence in that conclusion: moderate (approx 6/10) because the numeric evidence is consistent and plausible but the very small independent test set (n=54) and lack of public dataset/code reduce confidence in generalizability to other fields, crops, and conditions.

What would disprove this conclusion: rigorous external evaluation on an expanded, independently collected multi-environment dataset showing MobileNetv1 or YOLOv5 failing to replicate reported AP50/AP75 patterns or showing different backbone/neck combos (e.g. lightweight backbone plus attention neck) outperforming the reported best combos on real embedded hardware; or demonstration that AP changes are within noise due to the tiny test set (confidence intervals overlapping).

Actionable next experiments (concise)

Publish dataset and training scripts; rerun experiments with 5-fold cross-validation and report mean ± SD APs and per-model inference distributions on target embedded hardware
Combine MobileNetv1 backbone with improved neck/attention (e.g. EMA / BiFPN / small-object heads as in RICE-YOLO) and test whether localization at high IoU recovers while preserving speed
Measure downstream system metrics: localization error in centimeters at planned robot standoff, false-actuation rate, and end-to-end herbicide/actuator reliability
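
The cross-validation step could be organized roughly as follows. This is a sketch under assumptions: it pools all 9254 images from the paper's 7360/1840/54 split into one set, the helper names are hypothetical, and the per-fold AP values are placeholders standing in for real training runs.

```python
import random
import statistics


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, val_indices) for k shuffled, disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.fmean(scores), statistics.stdev(scores)


n_images = 7360 + 1840 + 54  # assumption: pool the paper's three splits
for fold, (train, val) in enumerate(k_fold_indices(n_images, k=5)):
    print(f"fold {fold}: {len(train)} train / {len(val)} val images")

# Placeholder per-fold AP50 values, NOT results; replace with real runs.
placeholder_ap50 = [0.88, 0.87, 0.89, 0.86, 0.88]
mean_ap, sd_ap = summarize(placeholder_ap50)
print(f"AP50 = {mean_ap:.3f} ± {sd_ap:.3f}")
```

If images from the same field or capture session are near-duplicates, folds should be split by field or session rather than by image to avoid leakage between train and validation.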

Tools and resources to reproduce and extend (practical)

Codebase to start from: official YOLOv5 PyTorch repo (use same BCEWithLogits and GIoU loss config) and implement backbone swaps preserving neck/head
Hardware: test on representative embedded device and on Tesla T4 (for parity) and measure ms on target hardware
Datasets: collect multi-field pea images across lighting and canopy densities and increase independent test set to several hundred images
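
For the hardware measurements, a small timing harness that reports a latency distribution rather than a single number is a reasonable starting point. The sketch below is framework-agnostic and uses a dummy workload; `measure_latency_ms` and `dummy_inference` are hypothetical names. With a real PyTorch model on a GPU, the timed callable would also need to wait for the device to finish (for example by calling `torch.cuda.synchronize()`) before returning, otherwise the timings undercount.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, runs=100):
    """Time `fn` repeatedly and return summary statistics in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean": statistics.fmean(samples),
        "sd": statistics.stdev(samples),
        "p50": samples[len(samples) // 2],
        "p95": samples[int(0.95 * (len(samples) - 1))],
        "runs": runs,
    }


def dummy_inference():
    """Stand-in workload; replace with a single-image forward pass."""
    sum(i * i for i in range(10_000))


stats = measure_latency_ms(dummy_inference)
print({k: round(v, 3) for k, v in stats.items()})
```

Reporting the median and 95th percentile alongside the mean matters on embedded devices, where thermal throttling and background load can produce long latency tails.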

Updated: December 01, 2025