Multi-Vehicle Detection & Tracking: Attention Faster R-CNN vs YOLOv8
End-to-end computer vision pipeline for vehicle detection, tracking, and benchmarking. Implemented YOLOv8 with CBAM attention and Faster R-CNN; integrated ByteTrack with trajectory visualization; benchmarked mAP, precision, recall, and inference latency on real-time video.
Generative Image Modeling: GAN, VAE, Diffusion U-Net
Implemented GANs and VAEs from scratch; trained on a Kaggle P100 GPU for landscape image synthesis. Tuned GAN architectures for training stability and visual fidelity; explored latent space representations.
Transformer Models and Vision Transformers
Implemented a seq2seq Transformer for English–French neural machine translation and a Vision Transformer from scratch for Oxford Flowers classification. Built attention mechanisms and positional encoding end-to-end.
Facial Deepfake Detection (Frequency-Aware Deep Learning)
Spectral feature learning via Fourier and discrete cosine transforms (DCT); CNN + ViT hybrid for generative artifact detection; ablation studies on feature representations; robustness evaluation against GAN- and diffusion-based attacks.