Deep Learning for Plankton Image Retrieval
Automating plankton image labeling and retrieval using CNNs and approximate nearest neighbor search to accelerate oceanographic research.
Overview
Developed an automated plankton image retrieval system that accelerates marine ecosystem analysis by 5-10× faster than manual labeling. Using deep learning and approximate nearest neighbor search, the system processes millions of underwater images to help oceanographers identify and catalog plankton species.
Problem Statement
Challenge: Marine biologists spend weeks manually labeling plankton images—one expert taking a month to annotate 30,000 images. This bottleneck limits large-scale ecological studies.
Solution: Automated CNN-based retrieval system that finds visually similar plankton specimens, reducing annotation time from weeks to days.
Key Achievements
Performance Improvements
- Labeling speed: 5-10× faster than manual annotation
- Retrieval accuracy: Improved from 65% → 90% with data augmentation
- Query latency: <4 seconds for 10,000+ image datasets
- Scale: Processing capability for millions of images
Technical Impact
Transformed workflow: 30,000 images labeled in days instead of months
System Architecture
Computer Vision Pipeline
- Feature Extraction: InceptionResNetV2 & ResNet-50 CNNs
- Embedding Generation: High-dimensional visual feature vectors
- Similarity Search: Approximate Nearest Neighbor (ANN) algorithms
- Distance Metrics: Cosine, Euclidean, and Hamming distances
Optimization Techniques
- Binary hashing: Fast similarity computation
- Hyperplane partitioning: Efficient large-scale indexing
- Data augmentation: Rotation, flipping, Gaussian blur for robustness
Full-Stack Implementation
- Backend: Django + PostgreSQL (metadata management)
- Frontend: jQuery web interface (real-time visualization)
- Database: Scalable image metadata storage
Technical Highlights
Deep Learning Approach
- Pre-trained CNNs (ImageNet) fine-tuned on plankton imagery
- Classification layer removed for feature embedding extraction
- Transfer learning accelerated model convergence
Similarity Search
Approximate Nearest Neighbor (ANN) retrieval enables:
- Sub-second search across millions of images
- Multiple distance metrics for flexible comparison
- Binary encoding for memory-efficient storage
Data Augmentation Benefits
Enhanced model robustness through:
- Geometric transformations (rotation, flipping)
- Photometric variations (brightness, contrast)
- Simulated underwater imaging conditions
Research Impact
Oceanographic Applications
- Biodiversity monitoring: Track plankton population changes
- Climate research: Analyze ecosystem responses to environmental shifts
- Real-time assessment: Accelerate environmental impact studies
Workflow Transformation
Automated retrieval enables:
- Large-scale dataset annotation
- Collaborative labeling across research institutions
- Rapid species identification for time-sensitive studies
Collaboration
Developed in partnership with Scripps Institute of Oceanography to support marine ecosystem research and oceanographic data analysis.
Supervision
Conducted under Prof. Nuno Vasconcelos, Department of Electrical and Computer Engineering, UC San Diego (Statistical Visual Computing Lab).