AI Agent Memory System - Piotr Sobczak

My first week with Claude Code looked like this:

Morning. I explain the project context to the agent: config, API keys, yesterday's decisions. I drag it back up to where we left off.

An hour later the agent loses context. We start over.

Evening. Same thing. Third session, third explanation.

I am not a developer. I design logic, architecture, functionality and flow. And yet I was wasting hours repeating myself to a machine.

I quickly realized one thing: either I build this agent a memory, or I become its memory.

From one file to a system

I started with a single Markdown file, CLAUDE.md: a startup instruction that the agent reads at the beginning of every session. Work rules, project context, current status.

It worked. Up to a point.

Then the file grew so large that the agent got lost in it just as badly as it did without it. More context does not mean better context.

That is when I set up Qdrant, a vector database. The idea is simple: instead of loading EVERYTHING into the agent, let it search semantically for what it needs. A question about the wFirma API? It gets chunks about wFirma. A question about deployment? It gets the deployment procedures.

Sounds trivial. In practice it took dozens of iterations before it worked.

Embedding model: BGE-M3 | Domains: 12 | Pipeline: v4.0 | Coverage: 100%

Research Pipeline: Perplexity → Scraping → Chunking → Scoring → Qdrant

Three layers of memory

Today the system has 3 layers, and none of them is accidental:

Layer 1: Local (instant). CLAUDE.md + MEMORY.md, loaded at session start. Work rules, project context, behavioral instructions. Small, precise, always current. The agent reads this BEFORE it does anything.

Layer 2: Qdrant (semantic search). Semantic chunks embedded with BGE-M3 (multilingual, self-hosted). The agent runs qdrant-find("wFirma API XML") and gets the most relevant matches by vector similarity.

In the database: research results, session histories, decisions, API docs. Metadata taxonomy: 12 domains, 9 projects, 3 lifecycle phases.
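To make the idea of similarity search with a metadata filter concrete, here is a minimal sketch. It is not the author's qdrant-find tool and does not call Qdrant: hand-made 3-dimensional vectors stand in for BGE-M3 embeddings, a Python list stands in for the collection, and the `find` function, the sample chunks, and the "integrations" and "process" domain names are all hypothetical.

```python
import math

# Toy chunks: hand-made 3-d vectors stand in for real BGE-M3 embeddings.
# Only the "infrastructure" domain name comes from the article.
CHUNKS = [
    {"content": "wFirma API uses XML payloads", "domain": "integrations", "vector": [0.9, 0.1, 0.0]},
    {"content": "Deploy procedure for the VPS", "domain": "infrastructure", "vector": [0.1, 0.9, 0.1]},
    {"content": "Decision: keep CLAUDE.md small", "domain": "process", "vector": [0.0, 0.2, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find(query_vector, chunks, top_k=2, domain=None):
    """Return the top_k chunks most similar to the query, optionally limited to one domain."""
    candidates = [c for c in chunks if domain is None or c["domain"] == domain]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    return ranked[:top_k]


# A query vector pointing at the "wFirma" direction of the toy space.
hits = find([1.0, 0.0, 0.0], CHUNKS, top_k=1)
print(hits[0]["content"])
```

In a real setup the query text would first be embedded by the same model as the chunks, and the filtering and ranking would happen inside the vector database rather than in application code.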

Layer 3: Reference (deep sources). Full technical instructions, agent registry, project specs. Loaded on demand to avoid bloating the context window with unnecessary material.

Why three layers? Because no single solution scales. A huge CLAUDE.md = the agent drowns in context. A vector DB alone = no deterministic instructions. I needed both.

Tier 1, LOCAL: Instant. CLAUDE.md, work rules, project context. Loaded every session.

Tier 2, SEMANTIC: Qdrant + BGE-M3 (self-hosted). Semantic search via qdrant-find().

Tier 3, REFERENCE: Deep sources. Full docs, specs, agent registry. On demand.

Session Cycle: Context loading → Autonomous work → Knowledge save

Chunk Anatomy

qdrant-chunk.json
  content: "jak skonfigurowac..."
  metadata.domain: "infrastructure"
  metadata.project: "telerecepcja"
  quality_score: 8
  content_hash: "a7f3..."
  date: "2026-03-10T21:37:00+01:00"

Quality Scoring: LLM scoring, 1-10 scale. Threshold at 5.

Taxonomy: 12 domains, 9 projects, 3 lifecycle phases.

Dedup: SHA256 content_hash. 100% coverage.
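The chunk fields above could be modeled roughly as follows. This is a sketch under assumptions: the `Chunk` class and `to_payload` method are hypothetical names, and hashing the raw UTF-8 content without any normalization is a guess, since the article only says the hash is SHA256.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One stored unit of memory, mirroring the fields shown in qdrant-chunk.json."""
    content: str
    domain: str
    project: str
    quality_score: int
    date: str
    content_hash: str = field(init=False)

    def __post_init__(self):
        # Assumption: the hash covers the raw UTF-8 content, with no normalization.
        self.content_hash = hashlib.sha256(self.content.encode("utf-8")).hexdigest()

    def to_payload(self):
        """Nest domain and project under 'metadata', as in the sample chunk."""
        return {
            "content": self.content,
            "metadata": {"domain": self.domain, "project": self.project},
            "quality_score": self.quality_score,
            "content_hash": self.content_hash,
            "date": self.date,
        }


chunk = Chunk(
    content="example content",
    domain="infrastructure",
    project="telerecepcja",
    quality_score=8,
    date="2026-03-10T21:37:00+01:00",
)
print(chunk.to_payload()["metadata"])
```

Because the hash is derived from the content alone, two chunks with identical text always get the same `content_hash`, which is what makes hash-based deduplication possible.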

The agent that closes the door behind itself

The most useful part of the system is the one that runs quietly.

At the end of every session, a session-closer fires. It scans the conversation history, extracts key decisions, discoveries and status changes, and writes them to Qdrant through the Ingest API.

Next session? The agent runs qdrant-find() and has context. Not all of it. Not random. The most relevant matches it can find.

The pipeline: content → SHA256 dedup → quality score (1-10) → taxonomy → save.

Result: the database grows with every session. New chunks naturally outnumber old ones. Not by demotion. By accumulation. An LLM handles quality scoring, which is good enough to separate signal from noise.

1. Scan conversation history: extract decisions, discoveries, status changes.
2. Semantic chunking: split into meaningful, searchable segments.
3. Quality score + SHA256 dedup: LLM scoring (1-10), content hash for deduplication.
4. Ingest API → Qdrant: persisted with full taxonomy metadata.
5. Continuation prompt: ready for pickup in the next session.

~30 seconds total
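The dedup, scoring, taxonomy and save steps can be sketched as one small function. Everything here is illustrative: `ingest` and `score_chunk` are hypothetical names, the word-count scorer is only a placeholder for the LLM scorer, a plain list stands in for the Ingest API and Qdrant, and treating the threshold of 5 as inclusive is an assumption.

```python
import hashlib

QUALITY_THRESHOLD = 5  # value from the article; treating it as inclusive is an assumption


def score_chunk(text):
    """Placeholder for the LLM scorer: word count clamped to the 1-10 scale."""
    return max(1, min(10, len(text.split())))


def ingest(texts, store, seen_hashes, domain, project):
    """Run content -> SHA256 dedup -> quality score -> taxonomy -> save."""
    saved = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate, already processed
        seen_hashes.add(digest)  # remember it so duplicates are not scored again
        score = score_chunk(text)
        if score < QUALITY_THRESHOLD:
            continue  # below the threshold: treated as noise
        record = {
            "content": text,
            "metadata": {"domain": domain, "project": project},
            "quality_score": score,
            "content_hash": digest,
        }
        store.append(record)  # stand-in for the write to the vector database
        saved.append(record)
    return saved


store, seen = [], set()
texts = [
    "Decision: use Qdrant for semantic search over session history",
    "ok",
    "Decision: use Qdrant for semantic search over session history",
]
saved = ingest(texts, store, seen, domain="infrastructure", project="telerecepcja")
print(len(saved))
```

Deduplicating before scoring follows the order given in the pipeline description and avoids spending a scoring call on text that is already stored.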

The safety net

Claude Code got a million tokens of context. Everyone celebrated. Nobody asked how much of it actually works.

I built Context Guard: a hook that monitors how full the context window is and automatically blocks the session once usage reaches the point where coding quality starts to drop. With a 200k context the threshold was 75%. With a million? 35%. Beyond that, the model starts losing decisions made at the beginning of the session.

Why so early? Because 35% of a million is 350,000 tokens. That is more than the entire old context window. And yet the agent loses its instructions. It does things I explicitly forbade at the start of the session. Not because it is stupid. Because that is how LLM architecture works.

Context Guard is not a feature. It is a safety net.

75%: threshold at 200k context = 150,000 tokens of effective work. The entire context is one long chat; the agent stays coherent up to the threshold.

35%: threshold at 1M context = 350,000 tokens of effective work. Bigger window, earlier cutoff. ~350k effective (from my testing).

What Context Guard does when the threshold is crossed: Warning → Auto-save to Qdrant → Tool block → Continuation prompt
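The threshold arithmetic and the escalation order can be written down in a few lines. This is only a sketch of the decision logic, not the actual hook: `token_budget` and `check` are hypothetical names, how the real guard measures token usage is not described in the article, and treating "exactly at the threshold" as crossed is an assumption.

```python
# Thresholds reported in the article, as whole percentages of the window.
THRESHOLD_PERCENT = {200_000: 75, 1_000_000: 35}

# Escalation order from the article.
ACTIONS = ["warning", "auto-save to Qdrant", "tool block", "continuation prompt"]


def token_budget(window_tokens):
    """Tokens of effective work before the guard triggers."""
    try:
        percent = THRESHOLD_PERCENT[window_tokens]
    except KeyError:
        # The article reports only two window sizes; no value is guessed for others.
        raise ValueError(f"no measured threshold for a {window_tokens}-token window")
    return window_tokens * percent // 100


def check(used_tokens, window_tokens):
    """Return the actions to run, in order; an empty list while under the threshold."""
    if used_tokens < token_budget(window_tokens):
        return []
    return list(ACTIONS)


print(token_budget(200_000))    # 150000
print(token_budget(1_000_000))  # 350000
print(check(120_000, 1_000_000))
```

Integer percentages keep the arithmetic exact, so the budgets match the 150,000 and 350,000 figures quoted above.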

More context does not mean better context. Load the entire codebase and hope the model catches the dependencies between the first file and the last? It will not. A million tokens is not a feature. It is a test of whether you understand what you are working with.

Quality Scoring: 5/10 threshold (LLM scoring) separating noise from signal. Coverage: 100% of the base evaluated.

Agent Fleet

Research: Perplexity search, scraping, auto-ingest to Qdrant.
Session-Closer: scan history, chunk, score, save on /close.
Maintenance: 14 health checks, cleanup, consistency audit.
UX/Design: UX audit, visual hierarchy, color consistency.
Tester: API endpoint validation, response checks.
Security: VPS audit, port scan, permission check.

The uncomfortable truth about autonomy

The dominant narrative is full autonomy. Faster, more output, human out of the loop. I tried that approach.

Full autonomy = full trust in a system that hallucinates, loses context, and does not understand what is important. One that edits files on production because "it was faster." One that saves garbage to the database because nobody told it not to.

Wherever the system was too autonomous, I lost control.

My system has 3 layers of autonomy, each with a hard boundary:

Automatic. The agent handles session save, dedup, scoring. No human needed.

Assisted. The agent suggests audits and cleanup. The human accepts or rejects.

Strategic. The human decides on taxonomy, priorities, and what to save. The agent does not touch this.
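One way to make such boundaries hard rather than advisory is to encode them as a lookup that every action must pass. The sketch below is an assumption about how that could look, not the author's implementation: `Tier`, `TASK_TIERS`, `may_run` and the task names are hypothetical, and defaulting unknown tasks to the most restrictive tier is a design choice made here.

```python
from enum import Enum


class Tier(Enum):
    AUTOMATIC = "automatic"   # agent acts alone
    ASSISTED = "assisted"     # agent proposes, human approves
    STRATEGIC = "strategic"   # human only


# Task names are illustrative; the grouping follows the three layers in the article.
TASK_TIERS = {
    "session_save": Tier.AUTOMATIC,
    "dedup": Tier.AUTOMATIC,
    "scoring": Tier.AUTOMATIC,
    "audit": Tier.ASSISTED,
    "cleanup": Tier.ASSISTED,
    "taxonomy": Tier.STRATEGIC,
    "priorities": Tier.STRATEGIC,
}


def may_run(task, human_approved=False):
    """Return True if the agent is allowed to execute the task."""
    # Unknown tasks fall back to the most restrictive tier.
    tier = TASK_TIERS.get(task, Tier.STRATEGIC)
    if tier is Tier.AUTOMATIC:
        return True
    if tier is Tier.ASSISTED:
        return human_approved
    return False  # strategic decisions are never executed by the agent


print(may_run("dedup"))                         # True
print(may_run("cleanup"))                       # False
print(may_run("cleanup", human_approved=True))  # True
print(may_run("taxonomy", human_approved=True)) # False
```

Note that approval does not unlock the strategic tier: in this sketch those decisions stay with the human even if an approval flag is passed.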

The agent does not know which business decision matters. It does not understand why you rejected approach A in favor of B. It has no strategic context.

It can have that context in the database. But who decides what goes there?

The result

I started with an agent that forgot everything. Today I have an agent that accumulates knowledge with every session.

Not because I gave it full autonomy. Because I did not.

The architecture is open. The list of improvements is longer than this article.

But the foundation stays: the machine remembers, the human decides. Not the other way around.

* * *

I designed the architecture. I made every decision. Claude Code wrote the code. The value is in the thinking, not the implementation. And in not giving up when it broke for the 15th time.