Your blog doesn’t need a heavyweight stack to get great semantic search. In this guide, we’ll build a small, open‑source RAG pipeline that indexes your markdown posts and powers fast, relevant retrieval. You’ll see how to chunk content, pick embeddings, choose a vector store you control, and evaluate quality before you ship.
Goals
- Accurate results for “what’s that thing I wrote about tokens?”‑style queries.
- Simple, cheap infra you can run locally or on your favorite VPS.
- Clear evaluation, so you know when a change helps instead of hoping it does.
Architecture at a glance
- Ingest: read markdown files, parse titles/tags/front‑matter.
- Chunk: split into passage‑sized chunks with headers preserved.
- Embed: generate dense vectors for each chunk.
- Store: write to a vector DB you control.
- Retrieve: hybrid search (BM25 + vector) or vector‑only with filters.
- Rank + Answer: return the best passages; optionally synthesize a summary.
Ingestion and chunking
Keep chunks short enough to match query intent but long enough to be meaningful. A good starting point:
- 300–600 tokens per chunk
- Overlap 50–80 tokens to preserve continuity
- Carry section headers into chunk metadata
Practical tips:
- Strip boilerplate (nav/footers) from pages.
- Keep slug, title, section, and position in metadata.
- Store the raw markdown and a cleaned text version.
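Here is a minimal sketch of the chunking step described above. It is one way to do it, not the only one. It assumes whitespace-separated words are a good enough stand-in for tokens (swap in your embedding model's tokenizer for exact counts), and the `Chunk` class and the `chunk_post` and `split_sections` helpers are hypothetical names made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    slug: str
    title: str
    section: str   # nearest markdown heading above the chunk
    position: int  # running index of the chunk within the post
    text: str


def split_sections(markdown):
    """Yield (heading, body) pairs, splitting on ATX headings (#, ##, ...)."""
    heading, lines = "", []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            if lines:
                yield heading, "\n".join(lines)
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if lines:
        yield heading, "\n".join(lines)


def chunk_post(slug, title, markdown, size=400, overlap=64):
    """Split a post into overlapping windows of roughly `size` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    step = size - overlap
    for section, body in split_sections(markdown):
        words = body.split()  # assumption: words approximate tokens
        for start in range(0, len(words), step):
            window = words[start:start + size]
            chunks.append(Chunk(slug, title, section, len(chunks), " ".join(window)))
            if start + size >= len(words):
                break  # this window already reached the end of the section
    return chunks
```

Chunks never cross a heading here, so each one maps to exactly one section. A known limitation of the sketch: a line starting with `#` inside a fenced code block is treated as a heading, so you may want to strip or skip code fences first.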
Embeddings: small and strong
Start with a compact, high‑quality open model:
- bge‑small‑en or bge‑base‑en (General retrieval; strong bang‑for‑buck)
- all‑MiniLM‑L6‑v2 (Tiny, runs almost anywhere)
- jina‑embeddings‑v2‑base‑en (Good out‑of‑the‑box for English)
If you need on‑device, use ONNX or WebAssembly builds (e.g., @xenova/transformers). Otherwise, a small GPU/CPU VM is plenty.
L2‑normalize vectors and use cosine similarity unless your DB defaults to a different metric. Whichever you pick, use the same model, normalization, and metric at index time and at query time.
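To make the normalization point concrete, here is a small pure‑Python sketch. The function names are made up for illustration; in practice your embedding library or vector DB may already do this for you, so check before normalizing twice.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product; raises if the dimensions differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return dot(l2_normalize(a), l2_normalize(b))
```

Once every stored vector is unit length, cosine similarity and the dot product give the same ranking, which is why mixing normalized and unnormalized vectors in one index quietly hurts results.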
Vector database options you control
- SQLite + VSS (sqlite‑vss): simplest possible setup, great for single‑node blogs.
- Postgres + pgvector: robust, familiar admin, good filtering and joins.
- Qdrant (or Milvus): feature‑rich standalone vector DB with HNSW indexes.
For most solo blogs, SQLite/pgvector is perfect. Qdrant shines once you want collections, payload filters, and distributed options.
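If you want to see the moving parts before committing to an extension, here is a rough sketch of the storage step using only Python's built‑in `sqlite3` module. To be clear about the assumptions: this does not use sqlite‑vss or pgvector. It stores each vector as a float32 blob and scores every row by brute force, which is only reasonable for a few thousand chunks. The table layout and function names are invented for this example, and vectors are assumed to be L2‑normalized already.

```python
import sqlite3
from array import array


def open_store(path=":memory:"):
    """Open (or create) a SQLite file holding chunks and their vectors."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               id INTEGER PRIMARY KEY,
               slug TEXT NOT NULL,
               section TEXT NOT NULL,
               position INTEGER NOT NULL,
               text TEXT NOT NULL,
               vector BLOB NOT NULL,
               UNIQUE (slug, position)
           )"""
    )
    return db


def add_chunk(db, slug, section, position, text, vector):
    """Insert a chunk, replacing any earlier row with the same slug+position."""
    blob = array("f", vector).tobytes()  # pack as float32
    db.execute(
        "INSERT OR REPLACE INTO chunks (slug, section, position, text, vector) "
        "VALUES (?, ?, ?, ?, ?)",
        (slug, section, position, text, blob),
    )


def search(db, query_vector, k=5):
    """Brute-force scan: dot product against every stored vector."""
    scored = []
    rows = db.execute("SELECT slug, section, text, vector FROM chunks")
    for slug, section, text, blob in rows:
        vec = array("f")
        vec.frombytes(blob)
        score = sum(q * v for q, v in zip(query_vector, vec))
        scored.append((score, slug, section, text))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]
```

The `UNIQUE (slug, position)` constraint makes re‑indexing idempotent: rebuilding a post overwrites its chunks instead of piling up duplicates. When the scan gets slow, that is the signal to move to a real vector index.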
Retrieval strategies
- Vector‑only: fast and simple; start here.
- Hybrid (BM25 + vector): combine keyword and semantic for best of both worlds.
- Filters: use tags or dates to scope retrieval.
A pragmatic hybrid approach:
- Run BM25 for top 100.
- Re‑rank those with the vector model.
- Return top 5–10 with titles, sections, and snippets.
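The three steps above could look roughly like this. It is a sketch under a few assumptions: the BM25 scorer is a compact textbook variant with common default parameters (`k1=1.5`, `b=0.75`), not a tuned implementation; `embed` is a placeholder for whatever function wraps your embedding model; and `doc_vectors` are assumed to be normalized so a dot product equals cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics; no stemming or stopwords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def hybrid_search(query, docs, doc_vectors, embed, candidates=100, top_k=5):
    """BM25 shortlist first, then re-rank the shortlist by vector similarity."""
    lexical = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: lexical[i], reverse=True)
    shortlist = order[:candidates]
    q = embed(query)  # placeholder: your embedding function

    def similarity(i):
        return sum(a * b for a, b in zip(q, doc_vectors[i]))

    return sorted(shortlist, key=similarity, reverse=True)[:top_k]
```

The function returns document indexes, so you can look up titles, sections, and snippets from your own metadata. One trade‑off to keep in mind: on a large corpus, a passage with no keyword overlap never reaches the re‑ranker, so compare this against vector‑only retrieval on your labeled queries.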
Evaluating retrieval (don’t skip this)
Create a tiny labeled set—10–50 queries with expected passages. Measure:
- Recall@k (k=5,10): how often a correct passage is in the top‑k.
- MRR (mean reciprocal rank): how high the first correct passage appears.
- NDCG: graded relevance if you label multiple acceptable answers.
Iterate on chunking and model choice, but change one variable at a time and keep the labeled query set fixed so runs stay comparable. When recall@5 is stable at your target, ship.
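The metrics above fit in a short script. This sketch assumes each labeled query is a pair of (ranked passage ids returned by your search, a dict mapping acceptable passage ids to a relevance grade); that input shape and the function names are choices made for this example. Recall@k follows the definition used here: whether any correct passage shows up in the top k.

```python
import math


def recall_at_k(ranked, relevant, k):
    """1.0 if any relevant id appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """Normalized discounted cumulative gain with graded relevance."""
    dcg = sum(
        gains.get(doc, 0.0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(runs, k=5):
    """Average the three metrics over all labeled queries."""
    if not runs:
        raise ValueError("need at least one labeled query")
    recall = mrr = ndcg = 0.0
    for ranked, gains in runs:
        relevant = {doc for doc, gain in gains.items() if gain > 0}
        recall += recall_at_k(ranked, relevant, k)
        mrr += reciprocal_rank(ranked, relevant)
        ndcg += ndcg_at_k(ranked, gains, k)
    n = len(runs)
    return {f"recall@{k}": recall / n, "mrr": mrr / n, f"ndcg@{k}": ndcg / n}
```

Run `evaluate` before and after each change and compare the numbers side by side. With only 10–50 queries the averages are noisy, so treat small differences as a tie.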
UX pattern that works
- Single search box; results show title → section → snippet.
- Keyboard navigation (j/k) and quick open of the underlying post.
- Optional answer synthesis only when top‑k confidence is high.
- Always show sources; never answer without them.
Rollout checklist
- Index job runs locally and in CI (protect against broken parsers).
- A one‑line command to rebuild the index.
- Evaluations live next to the indexer; one script prints recall@k.
- Search UI ships with analytics: queries, clicks, and abandonment.
- Document how to add a new collection (e.g., notes or docs) later.
Troubleshooting
- Irrelevant results? Reduce chunk size, try bge‑base‑en, or add hybrid retrieval.
- Duplicates? Deduplicate by URL+position and trim overlap.
- Slow queries? Build an HNSW or IVF index ahead of time and return only the payload fields the UI needs.
- Wrong passages rank higher? Carry section headers into the chunk text.
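For the duplicates case, a small post‑processing pass is often enough. This sketch assumes each search hit is a dict with `url`, `position`, and `score` keys; those field names are placeholders for whatever your search layer returns.

```python
def dedupe_results(results):
    """Keep the best-scoring hit per (url, position), best first."""
    seen = set()
    unique = []
    for hit in sorted(results, key=lambda h: h["score"], reverse=True):
        key = (hit["url"], hit["position"])
        if key in seen:
            continue  # a higher-scoring copy of this chunk was already kept
        seen.add(key)
        unique.append(hit)
    return unique
```

If near‑identical snippets still appear from neighboring chunks, that usually points at too much overlap rather than true duplicates.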
Start small, measure, and iterate. A simple RAG stack beats a complicated one you don’t understand—and you can always layer in sophistication once you’re confident in the basics.