
A minimal GPT project with two aligned implementations:
- `microgpt.py`: pure Python, dependency-free reference implementation.
- `microgpt_cuda.cu`: CUDA/C++ implementation for Windows (MSVC + CUDA), optimized for speed while keeping the same model/training logic.

Core model/training recipe (both paths):
- Documents are delimited with a `<BOS>` token.
- Tied weights: `wte` is reused as the LM head.

Files:

- `microgpt.py`: full Python algorithm (train + val + inference).
- `microgpt_cuda.cu`: full CUDA/C++ algorithm (train + val + inference).
- `microgpt_optimized.html`: side-by-side Python/CUDA code converter view.
- `CMakeLists.txt`: CUDA build entry.
- `input.txt`: corpus (auto-downloaded if missing on first run).

Run the Python version:

python microgpt.py
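The tied-weights point in the recipe above (`wte` reused as the LM head) can be sketched in dependency-free Python, in the spirit of `microgpt.py`. The names and numbers here are illustrative, not the project's actual code:

```python
# Tied embeddings: one table maps token -> vector on the input side
# and vector -> logits on the output side, so no separate output
# projection matrix is learned.
wte = [
    [0.1, 0.2],   # token 0 embedding
    [0.3, -0.1],  # token 1 embedding
    [-0.2, 0.4],  # token 2 embedding
]

def embed(token_id):
    # Input side: look up the token's embedding row.
    return wte[token_id]

def lm_head(hidden):
    # Output side: one logit per vocabulary entry, computed as the
    # dot product of the hidden state with the SAME embedding rows.
    return [sum(h * w for h, w in zip(hidden, row)) for row in wte]

h = embed(1)
logits = lm_head(h)   # len(logits) == vocabulary size
```

In practice the hidden state is produced by the transformer blocks rather than read straight from the table, but the tying itself is exactly this reuse of `wte` in both places.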
If input.txt is missing, the script downloads the default names dataset automatically.
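The download-if-missing behavior can be sketched as follows. The URL shown is an assumption for illustration (Karpathy's makemore names dataset); the script's actual source may differ:

```python
import os
import urllib.request

# Assumed URL for illustration only; check microgpt.py for the real one.
NAMES_URL = "https://raw.githubusercontent.com/karpathy/makemore/master/names.txt"

def ensure_corpus(path="input.txt", url=NAMES_URL):
    """Download the corpus only if the file is missing (first-run behavior)."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

On later runs the file already exists, so no network access happens.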
Prerequisites:

- Visual Studio 2022 (MSVC)
- CUDA Toolkit
- CMake
Build:
cmake -S . -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --config Release
Run:
.\build\Release\microgpt_cuda.exe --help
.\build\Release\microgpt_cuda.exe
Smoke test:
.\build\Release\microgpt_cuda.exe --steps 5 --samples 3
CLI flags:

- `--steps <int>`: training steps (default 500)
- `--val-every <int>`: validation interval (default 100)
- `--val-docs <int>`: max validation docs per eval (default 20)
- `--samples <int>`: generated samples after training (default 20)
- `--top-k <int>`: top-k for sampling (default 5)
- `--temperature <float>`: sampling temperature (default 0.6)
- `--seed <int>`: RNG seed (default 42)

Notes:

- `n_layer = 1` (same as the current Python config).
- `kMaxVocab = 256` in `microgpt_cuda.cu`; if your dataset exceeds this, increase it and rebuild.
- The default `CMAKE_CUDA_ARCHITECTURES` is 86; set it to your GPU architecture when needed.

Code converter view:

Open `microgpt_optimized.html` in a browser to switch between the Python and CUDA implementations side by side.
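The `--top-k` and `--temperature` flags describe a standard sampling scheme: keep the k highest-logit tokens, rescale by temperature, then draw from the resulting softmax. A minimal dependency-free sketch of that scheme (illustrative code, not the project's exact implementation):

```python
import math
import random

def sample_top_k(logits, k=5, temperature=0.6, rng=random):
    # Keep only the k highest-logit token ids.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:k]
    # Temperature-scale, then softmax over the kept set
    # (subtracting the max for numerical stability).
    scaled = [logits[i] / temperature for i in kept]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to those probabilities.
    r = rng.random()
    acc = 0.0
    for token_id, p in zip(kept, probs):
        acc += p
        if r <= acc:
            return token_id
    return kept[-1]
```

Lower temperatures sharpen the distribution toward the argmax; `k = 1` is fully greedy regardless of temperature.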
This is useful for checking one-to-one conceptual mapping between the two codebases.
Original microgpt idea and baseline by @karpathy: