Building Intelligent Search Systems with LLM-Gym Architecture¶
Published: November 2024 • 6 min read
Introduction¶
Modern knowledge management requires more than traditional keyword search: it needs intelligent systems that understand context, semantics, and user intent. This article explores the architecture of LLM-Gym, a personal project demonstrating how to build sophisticated search and chat systems over curated content.
System Overview¶
LLM-Gym implements a three-layer orchestration pattern for intelligent content processing:
```mermaid
graph TD
    A[GitHub Links] --> B[Data Processing Layer]
    B --> C[Indexing Layer]
    C --> D[App Engine]
    D --> E[Search Interface]
    D --> F[Chat Interface]

    subgraph "Data Processing Layer"
        B1[Webhook Handler] --> B2[Content Scraper]
        B2 --> B3[Data Storage]
    end

    subgraph "Indexing Layer"
        C1[Text Processing] --> C2[Vector Embeddings]
        C2 --> C3[Full-text Indexing]
    end

    subgraph "App Engine"
        D1[Query Router] --> D2[Hybrid Search]
        D2 --> D3[Response Generator]
    end
```
Architecture Patterns¶
Hybrid Search Strategy¶
The system combines two complementary search approaches:
```mermaid
graph LR
    A[User Query] --> B[Query Processing]
    B --> C[Meilisearch BM25]
    B --> D[Qdrant Vector Search]
    C --> E[Result Fusion]
    D --> E
    E --> F[Ranked Results]
```
Benefits of the hybrid approach:

- BM25 (Meilisearch): excellent for exact keyword matching
- Vector search (Qdrant): captures semantic similarity
- Combined results: the best of both worlds
Multi-Database Architecture¶
```python
# Database specialization
databases = {
    "postgres": "Structured data, relationships",
    "qdrant": "Vector embeddings, semantic search",
    "meilisearch": "Full-text search, faceted search",
}
```
Each database serves its optimal use case:
```mermaid
graph TB
    A[Application Layer] --> B[Postgres]
    A --> C[Qdrant]
    A --> D[Meilisearch]
    B --> B1[User Data<br/>Metadata<br/>Relationships]
    C --> C1[Vector Embeddings<br/>Semantic Search<br/>Similarity Queries]
    D --> D1[Full-text Search<br/>Faceted Search<br/>Exact Matching]
```
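As a sketch, wiring up two of the three stores with their official Python clients might look like this (the connection URLs and API key are placeholders; the Prisma client for Postgres is generated from a schema, so it is omitted here):

```python
from qdrant_client import QdrantClient
import meilisearch

# Vector store for embeddings and similarity queries
qdrant = QdrantClient(url="http://localhost:6333")

# Full-text engine for keyword and faceted search
meili = meilisearch.Client("http://localhost:7700", "masterKey")
```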
Core Components¶
Data Processing Layer¶
Handles incoming data with automated workflows:
```python
class DataProcessor:
    def process_github_webhook(self, payload):
        # Extract content from GitHub links
        content = self.scrape_content(payload.url)

        # Store structured data
        self.store_metadata(content)

        # Queue for indexing
        self.queue_for_indexing(content)
```
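To make the entry point concrete, here is a minimal sketch of how the webhook handler might be exposed over HTTP, assuming FastAPI (a framework choice not stated in this article) and a payload carrying a `url` field, which is illustrative:

```python
from types import SimpleNamespace

from fastapi import FastAPI, Request

app = FastAPI()
processor = DataProcessor()

@app.post("/webhooks/github")
async def github_webhook(request: Request):
    # The exact payload shape depends on the GitHub event type;
    # DataProcessor expects an object with a .url attribute.
    data = await request.json()
    processor.process_github_webhook(SimpleNamespace(url=data["url"]))
    return {"status": "queued"}
```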
Indexing Layer¶
Transforms raw content into searchable formats:
```mermaid
sequenceDiagram
    participant C as Content
    participant P as Processor
    participant V as Vector DB
    participant F as Full-text DB

    C->>P: Raw Content
    P->>P: Text Cleaning
    P->>P: Chunk Generation
    P->>V: Store Embeddings
    P->>F: Store Text Index
    V-->>P: Success
    F-->>P: Success
```
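The "Store Embeddings" step might look like the following with qdrant-client; the collection name and sequential integer IDs are illustrative, and `embeddings` is assumed to be the list of vector/metadata dicts produced in the embedding section below:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

qdrant = QdrantClient(url="http://localhost:6333")

def store_embeddings(embeddings, collection="documents"):
    # One point per chunk: the vector plus its chunk text and source URL
    points = [
        PointStruct(id=i, vector=item["vector"], payload=item["metadata"])
        for i, item in enumerate(embeddings)
    ]
    qdrant.upsert(collection_name=collection, points=points)
```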
App Engine¶
Orchestrates search and chat interactions:
```python
class SearchEngine:
    def hybrid_search(self, query, k=10):
        # Get results from both engines
        vector_results = self.qdrant_search(query, k)
        text_results = self.meilisearch_search(query, k)

        # Fusion strategy
        return self.reciprocal_rank_fusion(
            vector_results,
            text_results,
        )
```
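`reciprocal_rank_fusion` is named above but not shown. A minimal sketch, assuming each result is a dict with a stable `"id"` key (the constant 60 is the conventional default from the original RRF paper):

```python
from collections import defaultdict

def reciprocal_rank_fusion(*result_lists, k=60):
    # Each document earns 1 / (k + rank) per list it appears in, so
    # documents ranked well by both engines rise to the top.
    scores = defaultdict(float)
    docs = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc["id"]] += 1.0 / (k + rank)
            docs[doc["id"]] = doc
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked]
```

RRF needs no score normalization, which is why it pairs well with engines whose scores live on different scales (BM25 scores vs. cosine similarities).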
Implementation Highlights¶
Containerized Development¶
```yaml
# docker-compose.yml structure
services:
  app:
    build: .
    depends_on: [postgres, qdrant, meilisearch]
  postgres:
    image: postgres:15
  qdrant:
    image: qdrant/qdrant
  meilisearch:
    image: getmeili/meilisearch
```
Modern Python Stack¶
```python
# Key dependencies
dependencies = [
    "dspy-ai",        # LLM framework
    "instructor",     # Structured outputs
    "prisma",         # Database ORM
    "qdrant-client",  # Vector database
    "meilisearch",    # Search engine
]
```
Semantic Search Implementation¶
Embedding Strategy¶
```python
def create_embeddings(self, content):
    # Chunk content appropriately
    chunks = self.chunk_content(content)

    # Generate embeddings
    embeddings = []
    for chunk in chunks:
        embedding = self.embedding_model.encode(chunk)
        embeddings.append({
            "vector": embedding,
            "metadata": {
                "content": chunk,
                "source": content.url,
            },
        })
    return embeddings
```
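`chunk_content` is referenced but never defined; here is a minimal sketch using fixed-size word windows with overlap (the 512/64 sizes and whitespace splitting are assumptions, not the project's actual strategy):

```python
def chunk_content(text, chunk_size=512, overlap=64):
    # Overlapping windows keep sentences that straddle a boundary
    # visible in two adjacent chunks.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```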
Query Processing¶
```mermaid
graph TD
    A[User Query] --> B[Query Analysis]
    B --> C{Query Type}
    C -->|Factual| D[Direct Search]
    C -->|Conversational| E[Context Building]
    C -->|Exploratory| F[Semantic Search]
    D --> G[Meilisearch]
    E --> H[Vector Search + LLM]
    F --> I[Hybrid Search]
```
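One way to implement the query-type branch in this diagram is structured classification with instructor (already in the dependency list). A sketch, where the model name and the prompt are illustrative:

```python
from enum import Enum

import instructor
from openai import OpenAI
from pydantic import BaseModel

class QueryType(str, Enum):
    factual = "factual"
    conversational = "conversational"
    exploratory = "exploratory"

class QueryClassification(BaseModel):
    query_type: QueryType

client = instructor.from_openai(OpenAI())

def classify_query(query: str) -> QueryType:
    # instructor validates the LLM output against the Pydantic model
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_model=QueryClassification,
        messages=[
            {"role": "user", "content": f"Classify this search query: {query}"}
        ],
    )
    return result.query_type
```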
Chat Interface Integration¶
Context-Aware Responses¶
```python
class ChatHandler:
    def generate_response(self, query, search_results):
        # Build context from search results
        context = self.build_context(search_results)

        # Generate response with LLM
        response = self.llm.generate(
            query=query,
            context=context,
            max_tokens=500,
        )
        return response
```
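`build_context` is referenced above but not shown. A minimal sketch that concatenates top-ranked snippets with their sources until a budget is exhausted (character counting stands in for real token counting here):

```python
def build_context(self, search_results, max_chars=4000):
    # Tag each snippet with its source so the LLM can cite it
    parts = []
    used = 0
    for result in search_results:
        snippet = f"[{result['source']}]\n{result['content']}\n"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n".join(parts)
```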
Conversation Flow¶
```mermaid
graph LR
    A[User Question] --> B[Search Content]
    B --> C[Build Context]
    C --> D[Generate Response]
    D --> E[Return Answer + Sources]

    subgraph "Context Building"
        C1[Relevant Documents] --> C2[Summarization]
        C2 --> C3[Context Window]
    end
```
Scaling Considerations¶
Performance Optimization¶
- Caching Strategy: Cache frequent queries and embeddings (see the sketch after this list)
- Batch Processing: Process multiple documents efficiently
- Async Operations: Non-blocking I/O for web scraping
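As a sketch of the caching idea, an in-process LRU cache over embedding calls; the sentence-transformers model is an assumption (the article doesn't name its embedding library), and a production deployment might use Redis instead:

```python
from functools import lru_cache

from sentence_transformers import SentenceTransformer  # assumed library

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

@lru_cache(maxsize=10_000)
def cached_embedding(chunk: str) -> tuple:
    # Repeated chunks and frequent queries skip the model entirely;
    # the vector becomes a tuple so lru_cache can store it.
    return tuple(embedding_model.encode(chunk))
```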
Resource Management¶
```python
# Configuration for different environments
config = {
    "development": {
        "embedding_batch_size": 10,
        "max_concurrent_requests": 5,
    },
    "production": {
        "embedding_batch_size": 100,
        "max_concurrent_requests": 50,
    },
}
```
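Selecting the active profile can be as simple as reading an environment variable; the `APP_ENV` name is an assumption:

```python
import os

# Fall back to the conservative development limits when APP_ENV is unset
settings = config[os.getenv("APP_ENV", "development")]
```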
This article is part of my ongoing series on AI engineering. Check out the writing section for more articles.