Building Intelligent Search Systems with LLM-Gym Architecture

Published: November 2024 • 6 min read

Introduction

Modern knowledge management requires more than traditional keyword search: it needs intelligent systems that understand context, semantics, and user intent. This article explores the architecture of LLM-Gym, a personal project that demonstrates how to build sophisticated search and chat systems over curated content.

System Overview

LLM-Gym implements a three-layer orchestration pattern for intelligent content processing:

graph TD
    A[GitHub Links] --> B[Data Processing Layer]
    B --> C[Indexing Layer]
    C --> D[App Engine]
    D --> E[Search Interface]
    D --> F[Chat Interface]

    subgraph "Data Processing Layer"
        B1[Webhook Handler] --> B2[Content Scraper]
        B2 --> B3[Data Storage]
    end

    subgraph "Indexing Layer"
        C1[Text Processing] --> C2[Vector Embeddings]
        C2 --> C3[Full-text Indexing]
    end

    subgraph "App Engine"
        D1[Query Router] --> D2[Hybrid Search]
        D2 --> D3[Response Generator]
    end

Architecture Patterns

Hybrid Search Strategy

The system combines two complementary search approaches:

graph LR
    A[User Query] --> B[Query Processing]
    B --> C[Meilisearch BM25]
    B --> D[Qdrant Vector Search]
    C --> E[Result Fusion]
    D --> E
    E --> F[Ranked Results]

Benefits of the hybrid approach:

  - BM25 (Meilisearch): Excellent for exact keyword matching
  - Vector search (Qdrant): Captures semantic similarity
  - Combined results: The best of both worlds

Multi-Database Architecture

# Database specialization
databases = {
    "postgres": "Structured data, relationships",
    "qdrant": "Vector embeddings, semantic search",
    "meilisearch": "Full-text search, faceted search"
}

Each database is used for the workload it handles best:

graph TB
    A[Application Layer] --> B[Postgres]
    A --> C[Qdrant]
    A --> D[Meilisearch]

    B --> B1[User Data<br/>Metadata<br/>Relationships]
    C --> C1[Vector Embeddings<br/>Semantic Search<br/>Similarity Queries]
    D --> D1[Full-text Search<br/>Faceted Search<br/>Exact Matching]
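
Wiring these stores together at the application layer is straightforward. Below is a minimal sketch; the hosts, ports, and keys are illustrative defaults, not the project's actual configuration.

# Minimal sketch of wiring the three stores together; hosts, ports, and
# keys here are assumptions, not LLM-Gym's actual configuration.
import psycopg2
import meilisearch
from qdrant_client import QdrantClient

postgres = psycopg2.connect("postgresql://user:pass@localhost:5432/llm_gym")
qdrant = QdrantClient(host="localhost", port=6333)
meili = meilisearch.Client("http://localhost:7700", "masterKey")

# postgres -> metadata and relationships
# qdrant   -> vector similarity queries
# meili    -> full-text and faceted search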

Core Components

Data Processing Layer

Handles incoming data with automated workflows:

class DataProcessor:
    def process_github_webhook(self, payload):
        # Extract content from GitHub links
        content = self.scrape_content(payload.url)

        # Store structured data
        self.store_metadata(content)

        # Queue for indexing
        self.queue_for_indexing(content)
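
In practice this class would sit behind a small webhook endpoint. A minimal sketch, assuming FastAPI; the route and payload model are illustrative, not the project's actual API.

# Hypothetical FastAPI endpoint that feeds the processor; the route and
# payload model are assumptions for illustration.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
processor = DataProcessor()

class WebhookPayload(BaseModel):
    url: str

@app.post("/webhooks/github")
async def handle_github_webhook(payload: WebhookPayload):
    processor.process_github_webhook(payload)
    return {"status": "queued"}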

Indexing Layer

Transforms raw content into searchable formats:

sequenceDiagram
    participant C as Content
    participant P as Processor
    participant V as Vector DB
    participant F as Full-text DB

    C->>P: Raw Content
    P->>P: Text Cleaning
    P->>P: Chunk Generation
    P->>V: Store Embeddings
    P->>F: Store Text Index
    V-->>P: Success
    F-->>P: Success
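
In code, that sequence might look roughly like the following. The collection and index names are assumptions, and the cleaning and chunking steps are kept deliberately trivial.

# Rough sketch of the indexing step; collection/index names are
# illustrative assumptions.
import uuid
from qdrant_client.models import PointStruct

def index_document(doc_id, raw_text, qdrant, meili, embed_model):
    # Text cleaning (kept trivial here; the real pipeline does more)
    text = " ".join(raw_text.split())

    # Chunk generation: fixed-size windows as a placeholder
    chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

    # Store one embedding per chunk in the vector database
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embed_model.encode(chunk).tolist(),
            payload={"doc_id": doc_id, "content": chunk},
        )
        for chunk in chunks
    ]
    qdrant.upsert(collection_name="documents", points=points)

    # Store the cleaned text in the full-text index
    meili.index("documents").add_documents([{"id": doc_id, "content": text}])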

App Engine

Orchestrates search and chat interactions:

class SearchEngine:
    def hybrid_search(self, query, k=10):
        # Get results from both engines
        vector_results = self.qdrant_search(query, k)
        text_results = self.meilisearch_search(query, k)

        # Fusion strategy
        return self.reciprocal_rank_fusion(
            vector_results, 
            text_results
        )
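
The reciprocal_rank_fusion step isn't shown above; a minimal sketch using the standard RRF formula follows. The k constant of 60 and the result format are assumptions.

# Minimal reciprocal rank fusion sketch; assumes each result list is
# ordered best-first and that items expose a stable "id" field.
def reciprocal_rank_fusion(self, vector_results, text_results, k=60):
    scores = {}
    for results in (vector_results, text_results):
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)

    # Return document ids sorted by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)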

Implementation Highlights

Containerized Development

# docker-compose.yml structure
services:
  app:
    build: .
    depends_on: [postgres, qdrant, meilisearch]

  postgres:
    image: postgres:15

  qdrant:
    image: qdrant/qdrant

  meilisearch:
    image: getmeili/meilisearch

Modern Python Stack

# Key dependencies
dependencies = [
    "dspy-ai",          # LLM framework
    "instructor",       # Structured outputs
    "prisma",          # Database ORM
    "qdrant-client",   # Vector database
    "meilisearch",     # Search engine
]

Semantic Search Implementation

Embedding Strategy

def create_embeddings(self, content):
    # Chunk content appropriately
    chunks = self.chunk_content(content)

    # Generate embeddings
    embeddings = []
    for chunk in chunks:
        embedding = self.embedding_model.encode(chunk)
        embeddings.append({
            "vector": embedding,
            "metadata": {
                "content": chunk,
                "source": content.url
            }
        })

    return embeddings
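
The chunk_content helper can be as simple as a fixed-size window with overlap. A minimal sketch, assuming the content object exposes a text attribute; the sizes are illustrative defaults, not the project's actual values.

# One plausible chunking strategy: fixed-size windows with overlap.
# The content.text attribute and the sizes are assumptions.
def chunk_content(self, content, chunk_size=500, overlap=50):
    text = content.text
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks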

Query Processing

graph TD
    A[User Query] --> B[Query Analysis]
    B --> C{Query Type}
    C -->|Factual| D[Direct Search]
    C -->|Conversational| E[Context Building]
    C -->|Exploratory| F[Semantic Search]

    D --> G[Meilisearch]
    E --> H[Vector Search + LLM]
    F --> I[Hybrid Search]
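
A small router can sit behind this decision tree. The sketch below assumes the SearchEngine methods shown earlier and a chat handler like the one in the next section; the classify step is a placeholder for whatever heuristic or LLM call the real system uses.

# Illustrative query router; classify() is a stand-in for an LLM-based
# or heuristic classifier.
class QueryRouter:
    def __init__(self, search_engine, chat_handler):
        self.search_engine = search_engine
        self.chat_handler = chat_handler

    def route(self, query):
        query_type = self.classify(query)

        if query_type == "factual":
            # Exact keyword lookups go straight to Meilisearch
            return self.search_engine.meilisearch_search(query, k=10)
        if query_type == "conversational":
            # Vector search provides context for an LLM-generated answer
            results = self.search_engine.qdrant_search(query, k=10)
            return self.chat_handler.generate_response(query, results)
        # Exploratory queries get the full hybrid treatment
        return self.search_engine.hybrid_search(query)

    def classify(self, query):
        # Placeholder heuristic; swap in an LLM call or a small classifier
        return "conversational" if query.strip().endswith("?") else "exploratory"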

Chat Interface Integration

Context-Aware Responses

class ChatHandler:
    def generate_response(self, query, search_results):
        # Build context from search results
        context = self.build_context(search_results)

        # Generate response with LLM
        response = self.llm.generate(
            query=query,
            context=context,
            max_tokens=500
        )

        return response
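
The build_context step isn't shown above. One straightforward approach concatenates the top results until a rough budget is reached; the character budget and result fields below are assumptions.

# Sketch of context building: concatenate top results under a rough
# character budget. The budget and result fields are assumptions.
def build_context(self, search_results, max_chars=4000):
    parts = []
    used = 0
    for result in search_results:
        snippet = f"[{result['source']}]\n{result['content']}\n"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n".join(parts)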

Conversation Flow

graph LR
    A[User Question] --> B[Search Content]
    B --> C[Build Context]
    C --> D[Generate Response]
    D --> E[Return Answer + Sources]

    subgraph "Context Building"
        C1[Relevant Documents] --> C2[Summarization]
        C2 --> C3[Context Window]
    end

Scaling Considerations

Performance Optimization

  1. Caching Strategy: Cache frequent queries and embeddings
  2. Batch Processing: Process documents in batches rather than one at a time
  3. Async Operations: Non-blocking I/O for web scraping (caching and async fetching are both sketched below)
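
Two of these ideas sketched together: an LRU cache around query embeddings and non-blocking fetches for scraping. The sentence-transformers model and the aiohttp dependency are illustrative choices, not necessarily what the project uses.

# Sketch of caching + async scraping; the embedding model and aiohttp
# are assumptions for illustration.
import asyncio
from functools import lru_cache

import aiohttp
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

@lru_cache(maxsize=1024)
def cached_query_embedding(query: str):
    # lru_cache needs a hashable value, so convert the vector to a tuple
    return tuple(embedding_model.encode(query).tolist())

async def fetch_all(urls):
    # Non-blocking scraping: fetch all pages concurrently
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with session.get(url) as resp:
                return await resp.text()
        return await asyncio.gather(*(fetch(url) for url in urls))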

Resource Management

# Configuration for different environments
config = {
    "development": {
        "embedding_batch_size": 10,
        "max_concurrent_requests": 5
    },
    "production": {
        "embedding_batch_size": 100,
        "max_concurrent_requests": 50
    }
}
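
Selecting a profile at startup can be as simple as reading an environment variable; the APP_ENV name below is an assumption.

# Hypothetical profile selection at startup; APP_ENV is an assumed name.
import os

env = os.environ.get("APP_ENV", "development")
settings = config[env]
batch_size = settings["embedding_batch_size"]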

This article is part of my ongoing series on AI engineering. Check out the writing section for more articles.