Intelligent Model Recommendation
<1s latency
5x Catalog Expansion
The Challenge
As our experiment catalog grew, standard query methods became a bottleneck. We needed to analyze hundreds of meta-features (dataset characteristics + model architecture stats) to recommend the optimal model for a user’s specific task. Latency was degrading, and we needed a way to isolate client data logically without managing separate physical clusters for every tenant.
The Solution
I engineered a Vector Search System using Milvus as the core engine.
- Fingerprinting Pipeline: Built an automated pipeline to convert arbitrary DataLoaders and Model Graphs into dense vector representations.
- Two-Stage Retrieval:
  - Stage 1 (ANN): a rapid approximate nearest neighbor search to narrow the candidate set.
  - Stage 2 (Re-ranking): a fine-grained statistical scoring layer that selects the top candidate based on historical performance metrics.
- Logical Isolation: Implemented a partition strategy that routes queries to client-specific shards, ensuring data privacy while maintaining a shared infrastructure.
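The fingerprinting idea can be sketched as follows. This is a minimal, pure-Python illustration; the `dataset_fingerprint` helper and the specific meta-feature names (`n_rows`, `class_balance`, etc.) are hypothetical stand-ins for the hundreds of meta-features the real pipeline extracts from DataLoaders and model graphs:

```python
import math

def dataset_fingerprint(stats: dict) -> list[float]:
    """Convert raw dataset statistics into a fixed-order dense vector.

    `stats` holds illustrative meta-features such as row count,
    feature count, class balance, and mean feature skew.
    """
    # A fixed feature order guarantees every dataset maps to the
    # same vector layout, so vectors are directly comparable.
    keys = ["n_rows", "n_features", "class_balance", "mean_skew"]
    raw = [float(stats.get(k, 0.0)) for k in keys]
    # Log-compress count-like features to tame their dynamic range.
    raw[0] = math.log1p(raw[0])
    raw[1] = math.log1p(raw[1])
    # L2-normalize so cosine similarity reduces to a dot product.
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]

fp = dataset_fingerprint(
    {"n_rows": 10_000, "n_features": 32, "class_balance": 0.5, "mean_skew": 0.1}
)
```

The normalization step matters: most ANN indexes assume a single metric, and unit-length vectors let cosine and inner-product search coincide.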
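The two-stage retrieval can be sketched in a few lines. Here a brute-force similarity pass stands in for the Milvus ANN query, and the 50/50 blend of similarity and historical metric is an illustrative scoring choice, not the production formula:

```python
def two_stage_retrieve(query, catalog, history, k=5):
    """Stage 1: coarse similarity search (stand-in for the Milvus ANN call).
    Stage 2: re-rank the survivors using historical performance."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Stage 1 (ANN stand-in): keep only the k most similar fingerprints.
    candidates = sorted(
        catalog, key=lambda item: dot(query, item["vec"]), reverse=True
    )[:k]

    # Stage 2 (re-ranking): blend similarity with each model's
    # historical metric so a slightly-less-similar but stronger
    # model can win the final slot.
    def score(item):
        return 0.5 * dot(query, item["vec"]) + 0.5 * history.get(item["model"], 0.0)

    return max(candidates, key=score)["model"]

catalog = [
    {"model": "resnet", "vec": [0.9, 0.1]},
    {"model": "mlp", "vec": [0.8, 0.2]},
    {"model": "transformer", "vec": [0.1, 0.9]},
]
history = {"resnet": 0.72, "mlp": 0.91, "transformer": 0.65}
best = two_stage_retrieve([1.0, 0.0], catalog, history, k=2)  # → "mlp"
```

Note how the re-ranker overturns the raw ANN winner: `resnet` is the closest fingerprint, but `mlp`'s stronger historical metric carries the final decision.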
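The partition routing amounts to deriving a tenant-scoped partition name and constraining every search to it. Milvus search calls accept a list of partition names to restrict the scan; the naming scheme and helper below are hypothetical, shown as a plain request-builder rather than a live client call:

```python
def partition_for(client_id: str) -> str:
    """Map a tenant to its dedicated partition name (hypothetical scheme)."""
    return f"client_{client_id}"

def build_search_request(client_id: str, query_vec, top_k=10) -> dict:
    """Build a search request confined to the caller's own partition,
    so one tenant's query can never scan another tenant's vectors."""
    return {
        "data": [query_vec],
        "limit": top_k,
        # Milvus-style logical isolation: restrict the search to
        # exactly one client partition on the shared collection.
        "partition_names": [partition_for(client_id)],
    }

req = build_search_request("acme", [0.1, 0.2], top_k=5)
```

Because isolation is enforced at query-build time, all tenants share one collection, one index, and one deployment, which is what avoids running a physical cluster per client.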
The Impact
The system achieved sub-second retrieval latency (< 800ms) while handling a 5x expansion in the experiment catalog. It now serves as the backbone for the automated modeling workflow, allowing users to receive architecture recommendations instantly.