rag
vectors
oss
reference

Build a RAG Search for Your Blog with Open‑Source Tools

8/22/2025

Your blog doesn’t need a heavyweight stack to get great semantic search. In this guide, we’ll build a small, open‑source RAG pipeline that indexes your markdown posts and powers fast, relevant retrieval. You’ll see how to chunk content, pick embeddings, choose a vector store you control, and evaluate quality before you ship.

Goals

  • Accurate results for “what’s that thing I wrote about tokens?”‑style queries.
  • Simple, cheap infra you can run locally or on your favorite VPS.
  • Clear evaluation, so you know when a change helps instead of hoping it does.

Architecture at a glance

  • Ingest: read markdown files, parse titles/tags/front‑matter.
  • Chunk: split into passage‑sized chunks with headers preserved.
  • Embed: generate dense vectors for each chunk.
  • Store: write to a vector DB you control.
  • Retrieve: hybrid search (BM25 + vector) or vector‑only with filters.
  • Rank + Answer: return the best passages; optionally synthesize a summary.

Ingestion and chunking

Keep chunks short enough to match query intent but long enough to be meaningful. A good starting point:

  • 300–600 tokens per chunk
  • Overlap 50–80 tokens to preserve continuity
  • Carry section headers into chunk metadata
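As a rough sketch of the sizes above, the function below splits text into overlapping windows. It counts whitespace-separated words as a stand-in for tokens, which is an assumption; a real pipeline would count with the embedding model's own tokenizer. The names `chunk_words`, `size`, and `overlap` are illustrative.

```python
from typing import List


def chunk_words(text: str, size: int = 400, overlap: int = 60) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Words stand in for tokens here (an assumption)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 1,000-word post yields three chunks, and the last 60 words of each chunk reappear at the start of the next one.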

Practical tips:

  • Strip boilerplate (nav/footers) from pages.
  • Keep slug, title, section, and position in metadata.
  • Store the raw markdown and a cleaned text version.
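One way to keep that metadata attached is to split each post at its headings and emit one record per section. The sketch below assumes ATX-style markdown headings (`#` through `######`) and uses a hypothetical `Chunk` record; it does not skip `#` lines inside fenced code blocks, so a real parser would need to handle those.

```python
import re
from dataclasses import dataclass
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    slug: str      # post identifier, e.g. the URL slug
    title: str     # post title
    section: str   # nearest heading above the passage ("" before the first one)
    position: int  # order of the passage within the post
    text: str      # cleaned passage text


def split_by_section(slug: str, title: str, markdown: str) -> List[Chunk]:
    """Group body lines under the nearest preceding markdown heading."""
    chunks: List[Chunk] = []
    section = ""
    buf: List[str] = []

    def flush() -> None:
        # Join the buffered lines into one passage and record it.
        body = " ".join(line.strip() for line in buf if line.strip())
        if body:
            chunks.append(Chunk(slug, title, section, len(chunks), body))
        buf.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            section = match.group(2).strip()
        else:
            buf.append(line)
    flush()
    return chunks
```

Long sections can then be passed through a fixed-size splitter, with `section` and `position` copied onto every piece.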

Embeddings: small and strong

Start with a compact, high‑quality open model:

  • bge‑small‑en or bge‑base‑en (General retrieval; strong bang‑for‑buck)
  • all‑MiniLM‑L6‑v2 (Tiny, runs almost anywhere)
  • jina‑embeddings‑v2‑base‑en (Good out‑of‑the‑box for English)

If you need on‑device inference, use ONNX or WebAssembly builds (e.g., @xenova/transformers). Otherwise, a small CPU or GPU VM is plenty.

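Normalize vectors to unit length and use cosine distance unless your DB defaults to something else. With unit‑length vectors, cosine similarity and inner product give the same ranking, so what matters is using the same normalization and metric at index time and at query time.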
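To make the normalization point concrete, here is a minimal pure-Python sketch. The helper names are illustrative; in practice your embedding library or a numeric library would do this for you.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Once both vectors are normalized, `dot(l2_normalize(a), l2_normalize(b))` equals `cosine(a, b)`, so a plain dot product is enough at query time.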

Vector database options you control

  • SQLite + VSS (sqlite‑vss): simplest possible setup, great for single‑node blogs.
  • Postgres + pgvector: robust, familiar admin, good filtering and joins.
  • Qdrant (or Milvus): feature‑rich standalone vector DB with HNSW indexes.

For most solo blogs, SQLite with sqlite‑vss or Postgres with pgvector is plenty. Qdrant shines once you want collections, payload filters, and distributed options.
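Before reaching for an extension, a blog-sized index can be stored and scanned with nothing but Python's standard `sqlite3` module. The sketch below is that plain-SQLite baseline, not the sqlite‑vss API: the table layout, the `tag` column, and the function names are assumptions for illustration, and it assumes vectors are already unit length so a dot product acts as cosine similarity.

```python
import sqlite3
import struct
from typing import List, Optional, Sequence, Tuple


def pack(vec: Sequence[float]) -> bytes:
    """Serialize a vector as packed float32 values."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> Tuple[float, ...]:
    return struct.unpack(f"{len(blob) // 4}f", blob)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, slug TEXT, section TEXT, tag TEXT, "
        "text TEXT, embedding BLOB)"
    )
    return conn


def add_chunk(conn: sqlite3.Connection, slug: str, section: str, tag: str,
              text: str, vec: Sequence[float]) -> None:
    conn.execute(
        "INSERT INTO chunks (slug, section, tag, text, embedding) "
        "VALUES (?, ?, ?, ?, ?)",
        (slug, section, tag, text, pack(vec)),
    )


def search(conn: sqlite3.Connection, query_vec: Sequence[float], k: int = 5,
           tag: Optional[str] = None) -> List[Tuple[float, str, str]]:
    """Brute-force scan: score every (optionally tag-filtered) row, keep top k."""
    sql = "SELECT slug, section, embedding FROM chunks"
    params: Tuple[str, ...] = ()
    if tag is not None:
        sql += " WHERE tag = ?"
        params = (tag,)
    scored = []
    for slug, section, blob in conn.execute(sql, params):
        score = sum(q * v for q, v in zip(query_vec, unpack(blob)))
        scored.append((score, slug, section))
    scored.sort(reverse=True)
    return scored[:k]
```

A full scan like this is linear in the number of chunks, which is usually fine for a few thousand passages; an ANN index becomes worthwhile once query latency starts to matter.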

Retrieval strategies

  • Vector‑only: fast and simple; start here.
  • Hybrid (BM25 + vector): combine keyword and semantic for best of both worlds.
  • Filters: use tags or dates to scope retrieval.

A pragmatic hybrid approach:

  1. Run BM25 and keep the top 100 candidates.
  2. Re‑rank those by embedding similarity to the query.
  3. Return top 5–10 with titles, sections, and snippets.
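The three steps above can be sketched in pure Python. The BM25 scorer uses a common Okapi BM25 formulation with typical defaults (`k1=1.5`, `b=0.75`); `embed` is a hypothetical callable that returns a unit-length vector for a string, standing in for whichever embedding model you chose. A real system would read passage vectors from the index instead of re-embedding them per query.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: Sequence[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def hybrid_search(query: str, docs: Sequence[str],
                  embed: Callable[[str], Sequence[float]],
                  candidates: int = 100, top_k: int = 5) -> List[Tuple[float, int]]:
    """Stage 1: shortlist by BM25. Stage 2: re-rank the shortlist by dot
    product of query and passage vectors (assumed unit length)."""
    lexical = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: lexical[i],
                       reverse=True)[:candidates]
    q = embed(query)
    reranked = [(sum(x * y for x, y in zip(q, embed(docs[i]))), i)
                for i in shortlist]
    reranked.sort(reverse=True)
    return reranked[:top_k]  # (similarity, index into docs)
```

Because the first stage is lexical, a passage that shares no words with the query never reaches the re-ranker. If that costs you recall, a common alternative is to run BM25 and vector search independently and merge the two ranked lists, for example with reciprocal rank fusion.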

Evaluating retrieval (don’t skip this)

Create a tiny labeled set—10–50 queries with expected passages. Measure:

  • Recall@k (k=5,10): how often a correct passage is in the top‑k.
  • MRR (mean reciprocal rank): how high the first correct passage appears.
  • NDCG: graded relevance if you label multiple acceptable answers.
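A minimal sketch of these three metrics, assuming each passage has a string id and each query comes with the set of ids you labeled as correct. Recall@k follows the definition above (1 if any correct passage is in the top k); the function names and the shape of `runs` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant passage id appears in the top k, else 0.0."""
    return 1.0 if any(pid in relevant for pid in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none was returned."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """NDCG with graded relevance; `gains` maps passage id to its label."""
    def dcg(values: Sequence[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(pid, 0.0) for pid in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> Dict[str, float]:
    """Average recall@k and MRR over (ranked ids, relevant ids) pairs."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

Printing `evaluate(runs, k=5)` after every change to chunking or model choice is enough to tell whether the change helped.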

Iterate on chunking and model choice one variable at a time, and keep the labeled query set fixed so scores stay comparable from run to run (changing either one means rebuilding the index). When recall@5 is stable at your target, ship.

UX pattern that works

  • Single search box; results show title → section → snippet.
  • Keyboard navigation (j/k) and quick open of the underlying post.
  • Optional answer synthesis only when top‑k confidence is high, for example when the best similarity scores clear a threshold you tuned on your labeled queries.
  • Always show sources; never answer without them.
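One hedged way to implement the confidence gate is a simple score check before calling any summarizer. The thresholds below are placeholders rather than recommendations; they assume cosine-style scores in [0, 1] and should be tuned against your own labeled queries.

```python
from typing import Sequence


def should_synthesize(scores: Sequence[float], min_top: float = 0.75,
                      min_support: int = 2, support_floor: float = 0.6) -> bool:
    """Allow answer synthesis only when the best hit is strong and at least
    `min_support` hits clear a lower bar. All thresholds are hypothetical."""
    if not scores:
        return False
    ranked = sorted(scores, reverse=True)
    supporting = sum(1 for s in ranked if s >= support_floor)
    return ranked[0] >= min_top and supporting >= min_support
```

When the gate returns `False`, the UI falls back to the plain result list with sources.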

Rollout checklist

  • Index job runs locally and in CI (protect against broken parsers).
  • A one‑line command to rebuild the index.
  • Evaluations live next to the indexer; one script prints recall@k.
  • Search UI ships with analytics: queries, clicks, and abandonment.
  • Document how to add a new collection (e.g., notes or docs) later.

Troubleshooting

  • Irrelevant results? Reduce chunk size or try bge‑base‑en; add hybrid.
  • Duplicates? Deduplicate by URL+position and trim overlap.
  • Slow queries? Build the HNSW or IVF index ahead of time and return only the payload fields the UI needs.
  • Wrong passages rank higher? Prepend the section header to the chunk text before embedding, not only to the metadata.
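For the duplicates case, deduplicating by URL and position can be a few lines of post-processing on the result list. This sketch assumes each hit is a `(score, url, position)` tuple, which is an illustrative shape rather than any particular DB's response format.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[float, str, int]  # (score, url, position)


def dedupe_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep the highest-scoring hit for each (url, position) pair."""
    best: Dict[Tuple[str, int], Hit] = {}
    for hit in hits:
        key = (hit[1], hit[2])
        if key not in best or hit[0] > best[key][0]:
            best[key] = hit
    return sorted(best.values(), reverse=True)  # best score first
```

If near-identical neighbors from the same post still crowd the results, keying on the URL alone and keeping one or two passages per post is a stricter variant.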

Start small, measure, and iterate. A simple RAG stack beats a complicated one you don’t understand—and you can always layer in sophistication once you’re confident in the basics.