Combining vector similarity with traditional keyword search (BM25) to catch specific technical terms or product IDs.
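As a rough illustration of how keyword and vector results can be merged, the sketch below scores documents with a small BM25 implementation, ranks them separately by cosine similarity, and fuses the two rankings with reciprocal rank fusion. The fusion method, the whitespace tokenizer, the toy embedding vectors, and the names `bm25_scores` and `hybrid_search` are all assumptions made for the example; a production system would normally rely on a search engine and a real embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; keeps IDs such as "xr-500" intact.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_order(scores):
    # Document indices sorted from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, k=60):
    """Fuse keyword and vector rankings with reciprocal rank fusion."""
    keyword_rank = rank_order(bm25_scores(query, docs))
    vector_rank = rank_order([cosine(query_vec, v) for v in doc_vecs])
    fused = Counter()
    for ranking in (keyword_rank, vector_rank):
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)


# Toy corpus with hand-written 3-dimensional "embeddings" (illustrative only).
docs = [
    "reset the router model XR-500 firmware",
    "how to restart your wifi device",
    "billing and invoices overview",
]
doc_vecs = [[0.6, 0.8, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
ranking = hybrid_search("XR-500 firmware", [0.7, 0.7, 0.0], docs, doc_vecs)
print(ranking)
```

Rank fusion is used here because BM25 scores and cosine similarities live on different scales; combining ranks avoids having to normalize either one.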

Extensive discussions of production themes are available in the arXiv preprint "Practitioners’ Discussions on Building LLM-based Applications for Production".

In traditional software, code either compiles or it doesn't. With LLMs, outputs are probabilistic, so evaluating them is the hardest part of production.
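Because the same prompt can produce different outputs from one call to the next, a single pass/fail run says little. One common way to handle this, sketched below, is to sample the prompt several times and report the fraction of outputs that pass a check. The `generate` callable stands in for a real model call, and `make_fake_model`, `pass_rate`, and the exact-match check are hypothetical names chosen for this example.

```python
import random


def pass_rate(generate, prompt, check, n_samples=20):
    """Fraction of sampled outputs that satisfy `check`."""
    passes = sum(1 for _ in range(n_samples) if check(generate(prompt)))
    return passes / n_samples


def make_fake_model(seed=0):
    # Stand-in for an LLM call: returns a wrong answer some of the time.
    rng = random.Random(seed)

    def generate(prompt):
        return rng.choice(["4", "4", "4", "five"])

    return generate


def is_correct(output):
    # Exact-match check; real evaluations often need fuzzier criteria.
    return output.strip() == "4"


rate = pass_rate(make_fake_model(), "What is 2 + 2?", is_correct)
print(f"pass rate over 20 samples: {rate:.2f}")
```

Tracking this rate across prompt or model changes gives a regression signal that a single sample cannot.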

Grounding models with external data to reduce hallucinations.
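A minimal sketch of what grounding can look like in practice: retrieved passages are placed in the prompt and the model is told to answer only from them. The function name `build_grounded_prompt` and the prompt wording are assumptions for illustration, and the passages would come from whatever retrieval step the application uses.

```python
def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to the supplied context."""
    # Number each passage so the answer can cite its source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "What is the warranty period?",
    ["The warranty period is 24 months.", "Returns are accepted within 30 days."],
)
print(prompt)
```

Giving the model an explicit way to decline ("say you do not know") matters as much as the context itself, since otherwise it may still guess when retrieval returns nothing relevant.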

Production LLMs can be expensive and slow. Optimization strategies include: