AI Agent Memory System - Piotr Sobczak

My first week with Claude Code looked like this:

Morning. I explain the project context to the agent: config, API keys, yesterday's decisions. I drag it back up to where we left off.

An hour later the agent loses context. We start over.

Evening. Same thing. Third session, third explanation.

I am not a developer. I am an automation consultant. I was wasting hours repeating myself to a machine.

I quickly realized one thing: either I build this agent a memory, or I become its memory.

From one file to a system

I started with a single Markdown file: CLAUDE.md, a startup instruction file the agent reads at the beginning of every session. Work rules, project context, current status.

It worked. Up to a point.

Then the file grew so large that the agent got as lost in it as it had been without it. More context does not mean better context.

That is when I set up Qdrant, a vector database. The idea was simple: instead of loading EVERYTHING into the agent, let it search semantically for what it needs. A question about the wFirma API? It gets chunks about wFirma. A question about deployment? It gets the deploy procedures.

Sounds trivial. In practice it took dozens of iterations before it actually worked.
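To make "search semantically" concrete, here is a minimal sketch of retrieval by vector similarity: every chunk is stored with an embedding vector, and a query returns the chunks whose vectors have the highest cosine similarity to the query's vector. It assumes the embeddings already exist. The three-dimensional toy vectors are made up and stand in for real BGE-M3 embeddings, and the names `Chunk` and `semantic_search` are hypothetical, not Qdrant's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]  # embedding; real BGE-M3 dense vectors have 1024 dimensions


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vector: list[float], chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Return the top_k chunks whose vectors are closest to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]


# Toy vectors (hypothetical), standing in for real embeddings.
store = [
    Chunk("wFirma API: invoices are exchanged as XML", [0.9, 0.1, 0.0]),
    Chunk("Deploy procedure: scp, then ssh deploy.sh", [0.0, 0.2, 0.9]),
    Chunk("wFirma API: authentication", [0.8, 0.3, 0.1]),
]

query = [0.85, 0.2, 0.05]  # pretend embedding of "wFirma API XML"
for chunk in semantic_search(query, store):
    print(chunk.text)
```

In the real setup Qdrant does this ranking itself over the stored vectors; the sketch only shows what "closest match" means and why a deploy question and a wFirma question pull different chunks.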

At a glance: BGE-M3 embedding model; 12 domains; pipeline v4.0; 100% coverage.

Research pipeline: Perplexity → scraping → chunking → scoring → Qdrant.

Three layers of memory

Today the system has 3 layers, and none of them is accidental:

Layer 1: Local (instant). CLAUDE.md + MEMORY.md, loaded at session start. Work rules, project context, behavioral instructions. Small, precise, always current. The agent reads this BEFORE it does anything.

Layer 2: Qdrant (semantic search). Semantic chunks in a collection, embedded with BGE-M3 (multilingual, self-hosted). The agent calls qdrant-find("wFirma API XML") and gets back the closest matches by vector similarity. Research, session histories, decisions, API docs. Metadata taxonomy: 12 domains, 9 projects, 3 lifecycle phases.

Layer 3: Reference (deep sources). Full technical instructions, agent registry, project specs. Loaded on demand.

Why three layers? Because no single mechanism scales. A huge CLAUDE.md means the agent drowns in context. A vector database alone gives you no deterministic instructions. You need both: fixed rules loaded up front, and retrieval for everything else.
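A minimal sketch of how the three layers could combine into one session context, assuming Layer 1 is always loaded in full, Layer 2 contributes only the top-k retrieved chunks, and Layer 3 documents are added only when explicitly requested. The function `build_context`, the `search` callable, and the demo data are hypothetical; in the real system retrieval goes through qdrant-find().

```python
from typing import Callable


def build_context(
    local_files: dict[str, str],
    query: str,
    search: Callable[[str, int], list[str]],
    reference_docs: dict[str, str],
    requested_refs: tuple[str, ...] = (),
    top_k: int = 3,
) -> str:
    """Assemble one session context from the three memory layers."""
    parts = []
    # Layer 1: deterministic instructions, always loaded in full and first.
    for name, text in local_files.items():
        parts.append(f"## {name}\n{text}")
    # Layer 2: only the top_k most relevant chunks, never the whole database.
    for chunk in search(query, top_k):
        parts.append(f"## retrieved\n{chunk}")
    # Layer 3: deep sources, added only when explicitly requested.
    for ref in requested_refs:
        if ref in reference_docs:
            parts.append(f"## reference: {ref}\n{reference_docs[ref]}")
    return "\n\n".join(parts)


def fake_search(query: str, k: int) -> list[str]:
    """Stand-in for qdrant-find(): returns canned hits for the demo."""
    hits = [f"wFirma chunk {i}" for i in range(1, 5)]
    return hits[:k]


local = {"CLAUDE.md": "Work rules go here.", "MEMORY.md": "Current status goes here."}
refs = {"agent-registry": "Full agent registry goes here."}

ctx = build_context(local, "wFirma API XML", fake_search, refs)
ctx_with_ref = build_context(local, "wFirma API XML", fake_search, refs, requested_refs=("agent-registry",))
```

Keeping Layer 1 outside retrieval is the point of the split: rules that must always apply should never depend on a similarity score.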

Tier 1, LOCAL: instant. CLAUDE.md, work rules, project context. Loaded every session.

Tier 2, SEMANTIC: Qdrant + BGE-M3 (self-hosted). Semantic search via qdrant-find().

Tier 3, REFERENCE: deep sources. Full docs, specs, agent registry. On demand.

Session cycle: context loading → autonomous work → knowledge save.

Chunk anatomy (qdrant-chunk.json):

content: "jak skonfigurowac..." (Polish for "how to configure...")
metadata.domain: "infrastructure"
metadata.project: "telerecepcja"
quality_score: 8
content_hash: "a7f3..."
date: "2026-03-10T21:37:00+01:00"

Quality scoring: rule-based, 1-10 scale, threshold at 5.

Taxonomy: 12 domains, 9 projects, 3 lifecycle phases.

Dedup: SHA256 content_hash, 100% coverage.
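A minimal sketch of building one chunk record with the fields listed above, assuming the content hash is the SHA-256 hex digest of the chunk text and the date is an ISO 8601 timestamp. The helper `make_chunk` is hypothetical; the real system may, for example, normalize text before hashing.

```python
import hashlib
from datetime import datetime, timezone


def make_chunk(content: str, domain: str, project: str, quality_score: int) -> dict:
    """Build a chunk payload shaped like qdrant-chunk.json (hypothetical helper)."""
    return {
        "content": content,
        "metadata": {"domain": domain, "project": project},
        "quality_score": quality_score,
        # Identical text always yields an identical hash, which is what makes dedup work.
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        # Timezone-aware ISO 8601 timestamp (UTC here; the example above uses +01:00).
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


chunk = make_chunk("jak skonfigurowac...", "infrastructure", "telerecepcja", 8)
```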

The agent that closes the door behind itself

The most useful part is something that runs quietly.

At the end of every session, a session-closer fires. It scans the conversation history, extracts key decisions, discoveries, and status changes, and writes them to Qdrant through the Ingest API.

Next session? The agent runs qdrant-find() and has context. Not all of it. Not random. The most relevant matches it can find.

The pipeline: content → SHA256 dedup → quality score (1-10) → taxonomy → save.

The result: the database grows with every session. New, higher-quality chunks naturally come to outweigh the old ones. Not because old data gets demoted, but simply because newer, more relevant context accumulates over time. Quality scoring is rule-based for now, with no ML yet, but it is enough to separate signal from noise.
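The pipeline described above (content → SHA256 dedup → quality score → taxonomy → save) can be sketched with a plain dict standing in for Qdrant and the Ingest API. The scoring rules below are invented placeholders: the article only says scoring is rule-based on a 1-10 scale with a threshold of 5, not what the rules are. `score_chunk`, `ingest`, and the demo taxonomy values are hypothetical.

```python
import hashlib

THRESHOLD = 5  # from the article: chunks scoring below 5 are treated as noise


def score_chunk(text: str) -> int:
    """Hypothetical rule-based score on a 1-10 scale (the real rules are not published)."""
    score = 5
    words = text.split()
    if len(words) < 5:
        score -= 3  # too short to carry a decision or a finding
    if len(words) > 20:
        score += 1  # longer chunks tend to carry more context
    if any(k in text.lower() for k in ("decided", "because", "fixed", "error")):
        score += 2  # mentions a decision, a reason, or a fix
    return max(1, min(10, score))


def ingest(text: str, taxonomy: dict, store: dict) -> str:
    """Dedup, score, tag and save one chunk; return what happened to it."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if content_hash in store:  # dedup first: identical text, identical hash
        return "duplicate"
    score = score_chunk(text)
    if score < THRESHOLD:
        return "rejected"
    store[content_hash] = {"content": text, "quality_score": score, "metadata": dict(taxonomy)}
    return "saved"


store: dict = {}  # stands in for the Qdrant collection
tags = {"domain": "infrastructure", "project": "telerecepcja", "phase": "example"}  # phase value is made up
note = "Decided to use approach B because approach A failed the smoke test."
print(ingest(note, tags, store))         # saved
print(ingest(note, tags, store))         # duplicate
print(ingest("ok thanks", tags, store))  # rejected
```

Hashing before scoring means a repeated chunk is turned away cheaply, before any scoring work is done.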

The session-closer in five steps (~30 seconds total):

1. Scan conversation history: extract decisions, discoveries, status changes.
2. Semantic chunking: split into meaningful, searchable segments.
3. Quality score + SHA256 dedup: rule-based scoring (1-10), content hash for deduplication.
4. Ingest API → Qdrant: persisted with full taxonomy metadata.
5. Continuation prompt: ready for pickup in the next session.
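The semantic chunking step can be approximated, as a simple stand-in for true semantic chunking, by splitting at paragraph boundaries and packing short paragraphs together up to a size limit. This is an assumption for illustration: the article does not say how its chunker decides where one meaning ends and the next begins, and `chunk_text` and `max_chars` are hypothetical.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text at blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep packing
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = para  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```

Paragraph boundaries are a cheap proxy for "one idea per chunk"; an embedding-based splitter would instead cut where consecutive sentences stop being similar.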

The safety net

AI agents have a context window. Windows keep growing, but cost and latency do not shrink with them: every token loaded into the window costs money and time. That is why selective retrieval beats brute-force context loading. And when the window fills up, agents start silently losing information from the beginning of the conversation. Your instructions. Your agreements. Your context. No warning.

I built Context Guard: a hook that monitors how full the window is. At 75% it fires an alert. At 85% it forces a save to Qdrant and suggests starting a new session.

Why does this matter? Because without it, I was losing work. The agent "forgot" decisions made an hour earlier. It did things I had explicitly forbidden at the start of the session. Not because it is stupid, but because that is how LLM architecture works: the model only sees what is currently in its context window.

Context Guard is not a feature. It is a safety net.

At 75%: alert triggered. Context usage warning displayed; suggestion to save important context; recommendation to wrap up the current task; continuation prompt prepared.

At 85%: forced close. Auto-save to Qdrant via the Ingest API; forced session summary generation; new tool calls blocked; prompt for a new session generated.
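The Context Guard threshold logic itself is small. Below is a minimal sketch, assuming the hook can read how many tokens are in use and the total window size; how it obtains those numbers inside Claude Code is not covered here, and `guard_action` and its return values are hypothetical.

```python
ALERT_AT = 0.75  # warn, suggest saving, prepare a continuation prompt
FORCE_AT = 0.85  # auto-save, force a summary, block new tool calls


def guard_action(tokens_used: int, window_size: int) -> str:
    """Map context-window usage to the Context Guard response."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    usage = tokens_used / window_size
    if usage >= FORCE_AT:  # check the stricter threshold first
        return "force_close"
    if usage >= ALERT_AT:
        return "alert"
    return "ok"


for used in (100_000, 150_000, 170_000):
    print(used, guard_action(used, 200_000))
```

Checking the higher threshold first keeps the two levels from overlapping: any usage at or above 85% gets the forced close, never just the alert.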

Quality scoring: rule-based, threshold 5/10, applied to 100% of the database.

Agent fleet

Research: Perplexity search, scraping, auto-ingest to Qdrant.

Session-Closer: scans history, chunks, scores, and saves on /close.

Maintenance: 14 health checks, cleanup, consistency audit.

Deploy: scp + ssh deploy.sh + smoke test, auto-rollback.

Tester: API endpoint validation, response checks.

Security: VPS audit, port scan, permission check.
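The Deploy agent's pattern (deploy, smoke test, roll back on failure) can be sketched independently of scp and ssh by passing the steps in as functions. This is a hypothetical structure, not the author's deploy.sh: `deploy_with_rollback` and its parameters are illustrative names, and in practice `deploy` would wrap the real scp/ssh commands and `smoke_test` would hit the deployed endpoints.

```python
from typing import Callable


def deploy_with_rollback(
    release: str,
    previous: str,
    deploy: Callable[[str], None],
    smoke_test: Callable[[], bool],
) -> str:
    """Deploy a release; if it raises or the smoke test fails, redeploy the previous one."""
    try:
        deploy(release)
        if smoke_test():
            return release  # new release is live and healthy
    except Exception as exc:  # a failed copy or remote script counts as a failed deploy
        print(f"deploy of {release} failed: {exc}")
    deploy(previous)  # auto-rollback to the last known-good release
    return previous


history: list[str] = []
result = deploy_with_rollback("v2", "v1", history.append, smoke_test=lambda: False)
print(result, history)
```

Injecting the steps keeps the rollback decision testable without a server: the demo records deploys in a list and simulates a failing smoke test.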

The uncomfortable truth about autonomy

The dominant narrative is full autonomy. Ship faster, take the human out of the loop. I tried that approach. Here is what I learned.

Full autonomy means full trust in a system that hallucinates, loses context, and does not understand what matters. A system that edits files on production because "it was faster." That saves garbage to the database because nobody told it not to.

Where there is too much autonomy, you lose control.

My system has 3 layers of autonomy, each with a hard boundary:

Automatic. The agent handles session saves, dedup, and scoring. No human needed.

Assisted. The agent suggests audits and cleanup. A human accepts or rejects.

Strategic. A human decides the taxonomy, the priorities, and what gets saved. The agent does not touch this.
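One way to enforce these three autonomy layers as hard boundaries is a policy lookup that every agent action must pass before it runs. A minimal sketch; the action names, the `POLICY` table, and `may_run` are hypothetical, and unknown actions default to the strictest level.

```python
from enum import Enum


class Autonomy(Enum):
    AUTOMATIC = "automatic"  # runs without a human
    ASSISTED = "assisted"    # runs only after a human approves
    STRATEGIC = "strategic"  # never run by the agent; a human does it


# Hypothetical action names mapped to the three layers.
POLICY = {
    "save_session": Autonomy.AUTOMATIC,
    "dedup": Autonomy.AUTOMATIC,
    "score_chunks": Autonomy.AUTOMATIC,
    "run_audit": Autonomy.ASSISTED,
    "cleanup": Autonomy.ASSISTED,
    "change_taxonomy": Autonomy.STRATEGIC,
    "set_priorities": Autonomy.STRATEGIC,
}


def may_run(action: str, human_approved: bool = False) -> bool:
    """Return True if the agent may perform this action right now."""
    level = POLICY.get(action, Autonomy.STRATEGIC)  # unknown actions: strictest level
    if level is Autonomy.AUTOMATIC:
        return True
    if level is Autonomy.ASSISTED:
        return human_approved
    return False  # strategic decisions stay with the human, approval or not
```

Defaulting to the strictest level means a new, unclassified action cannot slip into automatic mode by omission.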

The agent does not know which business decision matters. It does not understand why you rejected approach A in favor of B. It has no strategic context.

It could have that context in the database. But who decides what goes there?

The result

I started with an agent that forgot everything. Today I have an agent that accumulates knowledge with every session.

Not because I gave it full autonomy. Because I did not.

The architecture is open. My list of planned improvements is longer than this article: ML-based quality scoring instead of rules, automatic chunk linking, a decay function for old entries, RAG with re-ranking.
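Of these planned improvements, the decay function is the easiest to picture. One common form, shown here as an assumption rather than the planned design, is exponential decay: a hit's ranking weight halves every `half_life_days`. The 90-day half-life and the name `decayed_score` are made up for illustration.

```python
import math
from datetime import datetime, timezone


def decayed_score(similarity: float, saved_at: datetime, now: datetime, half_life_days: float = 90.0) -> float:
    """Weight a search hit by age: after one half-life its weight is halved."""
    age_days = max(0.0, (now - saved_at).total_seconds() / 86_400)
    return similarity * math.pow(0.5, age_days / half_life_days)


now = datetime(2026, 3, 10, tzinfo=timezone.utc)
fresh = decayed_score(0.80, saved_at=now, now=now)
old = decayed_score(0.90, saved_at=datetime(2025, 12, 10, tzinfo=timezone.utc), now=now)
print(round(fresh, 2), round(old, 2))  # 0.8 0.45
```

With a 90-day half-life, a chunk saved exactly 90 days earlier with similarity 0.90 ranks below a fresh one at 0.80. Decay of this kind changes ranking only; nothing is deleted.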

But the foundation stays: the machine remembers, the human decides. Not the other way around.

* * *

I do not have a CS degree. I cannot write Python from scratch. I went all-in on AI. Full time. Every day.

I designed the architecture. I made every decision. Claude Code wrote the code. The value is in the thinking, not the implementation.

This system (vector search, quality scoring, session-closer, Context Guard) was built by a non-technical person in a terminal with Claude Code. Not by coding, but by thinking, deciding, and not giving up when it broke for the 15th time.