A Hand-Built Minimal Vector Database: Using Top-K

．Author: Jollen
．Date: Mon Nov 17 2025 09:00:00 GMT+0900 (Japan Standard Time)

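With the concepts from the previous articles and the foundation from Chapter 4 of "Node.js & LLM 原理與實務" (Node.js & LLM: Principles and Practice), we can now build a "minimal vector database" entirely by hand.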

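The implementation relies on the following:

Top-K search
Cosine similarity as the vector distance measure
Support for adding documents and computing the distance between document vectors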
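
To make the cosine comparison concrete before the full listing, here is a minimal sketch on hand-written two-dimensional vectors. The function name `cosineSimilarity` and the sample vectors are illustrative assumptions; real embeddings from an embedding model have far more dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Illustrative helper only; it is not part of the article's class.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (normA * normB);
}

const query = [1, 0];
const sameDirection = cosineSimilarity(query, [2, 0]); // 1: same direction, length is ignored
const diagonal = cosineSimilarity(query, [1, 1]);      // about 0.707, i.e. 1 / sqrt(2)
const orthogonal = cosineSimilarity(query, [0, 1]);    // 0: the vectors share no direction

console.log(sameDirection, diagonal, orthogonal);
```

A higher score means the two vectors point in more similar directions, which is why a Top-K search over cosine scores sorts in descending order.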

Here is the complete example implementation, vectorStoreV2.js:

// src/services/vectorStoreV2.js
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Vector store with Top-K search, cosine scoring, and multi-field records
export class VectorStoreV2 {
  constructor() {
    this.docs = [];
  }

  async addDocument(id, text) {
    const embedding = await this.createEmbedding(text);
    this.docs.push({ id, text, embedding });
  }

  async createEmbedding(text) {
    const res = await client.embeddings.create({
      model: 'text-embedding-3-small',
      input: text
    });
    return res.data[0].embedding;
  }

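  // Cosine similarity: dot(a, b) / (|a| * |b|); higher means more similar
  cosine(a, b) {
    const dot = a.reduce((s, v, i) => s + v * b[i], 0);
    const norm =
      Math.sqrt(a.reduce((s, v) => s + v * v, 0)) *
      Math.sqrt(b.reduce((s, v) => s + v * v, 0));
    // Guard against all-zero vectors, which would otherwise yield NaN
    return norm === 0 ? 0 : dot / norm;
  }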

  // Top-K: returns the k best matches with their scores, sorted from most to least similar
  topK(queryEmbedding, k = 3) {
    const scored = this.docs.map(d => ({
      id: d.id,
      text: d.text,
      score: this.cosine(d.embedding, queryEmbedding)
    }));

    return scored
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

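This minimal vector database has three essential functions:

addDocument: puts a document (for example, a piece of text) into the database
createEmbedding: uses 'text-embedding-3-small' to create the document's vector
topK: integrates the Top-K algorithm from the earlier articles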

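This example simplifies the implementation of a vector database: addDocument first creates the document's vector and then stores the document. Note that the example does not use a real database system; an in-memory array stands in for one.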
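
Because the store is just an array, the same embed-then-store flow can be tried without an API key. The sketch below is a simplified, synchronous variant under stated assumptions: `TinyStore` and `toyEmbedding` are hypothetical names, and `toyEmbedding` merely counts a few keywords as a stand-in for a real embedding model.

```javascript
// Hypothetical stand-in for an embedding model: counts occurrences of a few keywords.
const KEYWORDS = ['return', 'invoice', 'shipping', 'refund'];
function toyEmbedding(text) {
  const lower = text.toLowerCase();
  return KEYWORDS.map(word => lower.split(word).length - 1);
}

// Hypothetical in-memory store mirroring the add-then-search flow.
class TinyStore {
  constructor(embed) {
    this.embed = embed;
    this.docs = []; // plain array; contents are lost when the process exits
  }

  addDocument(id, text) {
    // Embed first, then store the text together with its vector.
    this.docs.push({ id, text, embedding: this.embed(text) });
  }

  cosine(a, b) {
    const dot = a.reduce((s, v, i) => s + v * b[i], 0);
    const norm =
      Math.sqrt(a.reduce((s, v) => s + v * v, 0)) *
      Math.sqrt(b.reduce((s, v) => s + v * v, 0));
    return norm === 0 ? 0 : dot / norm; // all-zero vectors count as "no similarity"
  }

  topK(queryEmbedding, k = 3) {
    return this.docs
      .map(d => ({ id: d.id, text: d.text, score: this.cosine(d.embedding, queryEmbedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

const tiny = new TinyStore(toyEmbedding);
tiny.addDocument(1, 'Items may be returned within 14 days.');
tiny.addDocument(2, 'Shipping takes about three to five business days.');
tiny.addDocument(3, 'Refund requests must be submitted in the system.');

const hits = tiny.topK(toyEmbedding('How long does shipping take?'), 2);
console.log(hits.map(h => h.id)); // the shipping document (id 2) ranks first
```

Passing the embedding function into the constructor is a design choice made here for testability; swapping in a real model would only change that one argument.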

Usage

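Suppose the company has an internal return policy. We first organize the policy into "knowledge items" as follows:

Items may be returned within 14 days.
Returned items must be intact and accompanied by the invoice.
Shipping takes about three to five business days.
Defective items may be returned or exchanged within seven days.
Refund requests must be submitted through the system.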

Next, add these five knowledge items to the minimal vector database:

import { VectorStoreV2 } from './vectorStoreV2.js';

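// Create the vector database
const store = new VectorStoreV2();
await store.addDocument(1, 'Items may be returned within 14 days.');
await store.addDocument(2, 'Returned items must be intact and accompanied by the invoice.');
await store.addDocument(3, 'Shipping takes about three to five business days.');
await store.addDocument(4, 'Defective items may be returned or exchanged within seven days.');
await store.addDocument(5, 'Refund requests must be submitted through the system.');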

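At this point, the minimal vector knowledge base is built. For very simple AI applications with a clearly defined task, an implementation like this can be sufficient.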

With document insertion in place, adding "context retrieval" and "LLM text generation" later will round out the minimal database.
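
As a rough preview of the context-retrieval step, the sketch below embeds a question, takes the Top-K documents, and joins their text into a context string for a prompt. `retrieveContext` is a hypothetical helper, not part of `VectorStoreV2`; it only assumes a store object that exposes `createEmbedding` and `topK` as in the listing above. The stub store and its scores are made up so that the example runs without calling the OpenAI API.

```javascript
// Hypothetical helper: works with any store exposing createEmbedding() and topK().
async function retrieveContext(store, question, k = 3) {
  const queryEmbedding = await store.createEmbedding(question);
  const hits = store.topK(queryEmbedding, k);
  // Number the passages so a later prompt can refer to them.
  return hits.map((h, i) => `[${i + 1}] ${h.text}`).join('\n');
}

// Stub store for a dry run; a real VectorStoreV2 instance would be passed instead.
const stubStore = {
  async createEmbedding() {
    return [1, 0]; // placeholder vector, not a real embedding
  },
  topK(_queryEmbedding, k) {
    return [
      { id: 1, text: 'Items may be returned within 14 days.', score: 0.9 },
      { id: 4, text: 'Defective items may be returned or exchanged within seven days.', score: 0.8 }
    ].slice(0, k);
  }
};

const contextPromise = retrieveContext(stubStore, 'Can I return an item?', 2);
contextPromise.then(context => console.log(context));
```

The returned string could then be placed in a prompt ahead of the user's question; how that prompt is worded and which model generates the answer are left open here.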


Tags: rag, llm, text-embedding, top-k, vector-store