
A minimal GPT project with two aligned implementations:
- `microgpt.py`: pure Python, dependency-free reference implementation.
- `microgpt_cuda.cu`: CUDA/C++ implementation for Windows (MSVC + CUDA), optimized for speed while keeping the same model/training logic.

Core model/training recipe (both paths):
- Documents are delimited with a `<BOS>` token.
- Tied weights: `wte` is reused as the LM head.

Files:

- `microgpt.py`: full Python algorithm (train + val + inference).
- `microgpt_cuda.cu`: full CUDA/C++ algorithm (train + val + inference).
- `microgpt_optimized.html`: side-by-side Python/CUDA code converter view.
- `CMakeLists.txt`: CUDA build entry.
- `input.txt`: corpus (auto-downloaded if missing on first run).

Run the Python version:

python microgpt.py
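The tied-weights point in the recipe above (`wte` reused as the LM head) can be sketched in dependency-free Python, in the spirit of `microgpt.py`. The names and numbers here are illustrative, not the project's actual code:

```python
# Tied embeddings: one table maps token -> vector on the input side
# and vector -> logits on the output side, so no separate output
# projection matrix is learned.
wte = [
    [0.1, 0.2],   # token 0 embedding
    [0.3, -0.1],  # token 1 embedding
    [-0.2, 0.4],  # token 2 embedding
]

def embed(token_id):
    # Input side: look up the token's embedding row.
    return wte[token_id]

def lm_head(hidden):
    # Output side: one logit per vocabulary entry, computed as the
    # dot product of the hidden state with the SAME embedding rows.
    return [sum(h * w for h, w in zip(hidden, row)) for row in wte]

h = embed(1)
logits = lm_head(h)   # len(logits) == vocabulary size
```

In practice the hidden state is produced by the transformer blocks rather than read straight from the table, but the tying itself is exactly this reuse of `wte` in both places.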
If input.txt is missing, the script downloads the default names dataset automatically.
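The download-if-missing behavior can be sketched as follows. The URL shown is an assumption for illustration (Karpathy's makemore names dataset); the script's actual source may differ:

```python
import os
import urllib.request

# Assumed URL for illustration only; check microgpt.py for the real one.
NAMES_URL = "https://raw.githubusercontent.com/karpathy/makemore/master/names.txt"

def ensure_corpus(path="input.txt", url=NAMES_URL):
    """Download the corpus only if the file is missing (first-run behavior)."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

On later runs the file already exists, so no network access happens.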
Prerequisites:

- Visual Studio 2022 (MSVC)
- CUDA Toolkit
- CMake
Build:
cmake -S . -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --config Release
Run:
.\build\Release\microgpt_cuda.exe --help
.\build\Release\microgpt_cuda.exe
Smoke test:
.\build\Release\microgpt_cuda.exe --steps 5 --samples 3
CLI flags:

- `--steps <int>`: training steps (default 500)
- `--val-every <int>`: validation interval (default 100)
- `--val-docs <int>`: max validation docs per eval (default 20)
- `--samples <int>`: generated samples after training (default 20)
- `--top-k <int>`: top-k for sampling (default 5)
- `--temperature <float>`: sampling temperature (default 0.6)
- `--seed <int>`: RNG seed (default 42)

Notes:

- `n_layer = 1` (same as the current Python config).
- `kMaxVocab = 256` in `microgpt_cuda.cu`; if your dataset exceeds this, increase it and rebuild.
- The default `CMAKE_CUDA_ARCHITECTURES` is 86; set it to your GPU architecture when needed.

Code converter view:

Open `microgpt_optimized.html` in a browser to switch between the Python and CUDA implementations side by side.
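The `--top-k` and `--temperature` flags describe a standard sampling scheme: keep the k highest-logit tokens, rescale by temperature, then draw from the resulting softmax. A minimal dependency-free sketch of that scheme (illustrative code, not the project's exact implementation):

```python
import math
import random

def sample_top_k(logits, k=5, temperature=0.6, rng=random):
    # Keep only the k highest-logit token ids.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:k]
    # Temperature-scale, then softmax over the kept set
    # (subtracting the max for numerical stability).
    scaled = [logits[i] / temperature for i in kept]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to those probabilities.
    r = rng.random()
    acc = 0.0
    for token_id, p in zip(kept, probs):
        acc += p
        if r <= acc:
            return token_id
    return kept[-1]
```

Lower temperatures sharpen the distribution toward the argmax; `k = 1` is fully greedy regardless of temperature.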
This is useful for checking one-to-one conceptual mapping between the two codebases.
Original microgpt idea and baseline by @karpathy: