Hybrid search: combining vector similarity with traditional keyword search (BM25) to catch specific technical terms or product IDs that embeddings alone can miss.
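As a rough illustration of combining keyword and vector retrieval, the sketch below scores documents with a small hand-written BM25 and with cosine similarity, then merges the two rankings using reciprocal rank fusion. The fusion method, the toy documents, the hand-made two-dimensional "embeddings", and every function name here are assumptions made for the example; a real system would use an embedding model and a search engine instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; keeps IDs such as "sku-4821" intact.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Minimal Okapi BM25 over a small in-memory corpus.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def to_ranks(scores):
    # Map document index -> 1-based rank (highest score gets rank 1).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1)}

def hybrid_search(query, query_vec, docs, doc_vecs, k=60):
    # Reciprocal rank fusion: sum 1 / (k + rank) over both rankings.
    keyword_ranks = to_ranks(bm25_scores(query, docs))
    vector_ranks = to_ranks([cosine(query_vec, v) for v in doc_vecs])
    fused = {
        i: 1 / (k + keyword_ranks[i]) + 1 / (k + vector_ranks[i])
        for i in range(len(docs))
    }
    return sorted(fused, key=fused.get, reverse=True)

# Toy corpus with hand-made vectors standing in for real embeddings.
docs = [
    "warranty terms for kitchen appliances",
    "replacement part SKU-4821 for the pump",
    "replacement parts for garden pumps",
]
doc_vecs = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]
query, query_vec = "SKU-4821", [0.7, 0.3]

vector_only_top = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))
hybrid_order = hybrid_search(query, query_vec, docs, doc_vecs)
```

In this toy setup the vector ranking alone prefers the generic "garden pumps" document, while the fused ranking puts the document containing the exact product ID first, because the keyword side contributes a strong signal for the literal match.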
Extensive discussions on production themes are available in the arXiv preprint "Practitioners’ Discussions on Building LLM-based Applications for Production".
In traditional software, code either compiles or it doesn't. With LLMs, outputs are probabilistic, and assessing them reliably is the hardest part of production.
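One common way to cope with probabilistic outputs is to sample the same prompt several times and report a pass rate rather than a single pass/fail result. The sketch below assumes a hypothetical `generate` callable standing in for a model call and a hypothetical `check` function that decides whether one output is acceptable; neither corresponds to a specific library.

```python
import random

def pass_rate(generate, check, prompt, n_samples=20):
    # Fraction of sampled outputs that satisfy the check.
    passed = sum(1 for _ in range(n_samples) if check(generate(prompt)))
    return passed / n_samples

def make_fake_model(seed, error_rate):
    # Hypothetical stand-in for an LLM: answers "4", but returns a
    # wrong answer with probability `error_rate`.
    rng = random.Random(seed)

    def generate(prompt):
        return "4" if rng.random() >= error_rate else "5"

    return generate

def is_correct(output):
    return output.strip() == "4"

flaky_model = make_fake_model(seed=0, error_rate=0.2)
rate = pass_rate(flaky_model, is_correct, "What is 2 + 2?", n_samples=50)
print(f"pass rate: {rate:.2f}")
```

A pass rate tracked over a fixed prompt set can then serve as a regression signal when the prompt, model, or retrieval setup changes.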
Retrieval-augmented generation (RAG): grounding models with external data to reduce hallucinations.
Production LLMs can be expensive and slow. Optimization strategies include: