CS-1792: Data-Efficient Transformers for Vision: Enhancing Accuracy under Resource Limitations | [4th International Conference of Sciences “Revamped Scientific Outlook of 21st Century, 2025”, November 12, 2025 • 2025]

Author(s):

1. Aiman Sumaya: Dept. of CS, COMSATS University Islamabad, Wah Campus, Pakistan

Abstract:

In recent years, transformer-based architectures have revolutionized computer vision, achieving state-of-the-art results across diverse tasks. Nevertheless, conventional Vision Transformers (ViTs) demand massive datasets and substantial computational resources, which restricts their applicability in resource-constrained or data-scarce scenarios. To overcome this limitation, Data-Efficient Image Transformers (DeiTs) have been introduced, leveraging three key strategies: (i) knowledge distillation, where a compact student transformer benefits from the guidance of a powerful CNN teacher; (ii) optimized training protocols, including regularization and augmentation tailored for small-data regimes; and (iii) efficient architectural modifications that reduce redundancy while preserving representational power. By integrating these mechanisms, DeiTs not only narrow the gap between transformers and CNNs under limited-data conditions but also establish a scalable framework for future vision applications requiring both efficiency and accuracy.
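
As an illustration of strategy (i), the following is a minimal sketch of a teacher-student distillation objective in plain Python. It shows two generic formulations: a soft variant that blends cross-entropy with a temperature-scaled KL term, and a hard variant that treats the teacher's predicted class as a second label. The function names, the weight `alpha`, and the `temperature` value are assumptions chosen for illustration, and the single-head student is a simplification; the abstract does not specify which formulation or hyperparameters were used.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of the softmax of `logits` against a class index."""
    return -math.log(softmax(logits)[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def soft_distillation_loss(student_logits, teacher_logits, label,
                           alpha=0.5, temperature=3.0):
    """Blend the label loss with a KL term toward the softened teacher.

    alpha and temperature are illustrative values, not taken from the source.
    """
    ce = cross_entropy(student_logits, label)
    kl = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


def hard_distillation_loss(student_logits, teacher_logits, label):
    """Average the loss on the true label and on the teacher's argmax."""
    teacher_label = max(range(len(teacher_logits)),
                        key=teacher_logits.__getitem__)
    return (0.5 * cross_entropy(student_logits, label)
            + 0.5 * cross_entropy(student_logits, teacher_label))
```

In this sketch the teacher's logits would come from a frozen, pre-trained CNN evaluated on the same augmented image that the student sees, so only the student's parameters receive gradients.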

Page(s): 102-102

DOI: DOI not available

Published: Journal: 4th International Conference of Sciences “Revamped Scientific Outlook of 21st Century, 2025”, November 12, 2025, Volume: 1, Issue: 1, Year: 2025

Keywords:

Data augmentation, Computer vision, ViT, Knowledge distillation
