Product Evals: Travel Planner, Long Context, and The Weight Of Taste
A product note on evaluating an AI travel planner: itinerary quality, out-of-distribution (OOD) scenarios, long-context consistency, recommendation taste, and user loops.
Context Engineering: Retrieval, Memory, and The Shape Of Evidence
A note on RAG and context engineering: retrieval quality, evidence shape, memory boundaries, and why context is a product surface.
Agentic RL: Reward, Behavior, and The Long Shadow Of Feedback
A field note on agentic RL: reward design, behavior shaping, credit assignment, online and offline evaluation, and the feedback loops behind agents.
Evals As Instruments: Measuring What The Demo Hides
A note on evaluation as an instrument: failure cases, metrics, benchmark design, product loops, and the discipline of measuring agents.
Agent Design: Loops, Tools, and the Shape of Memory
A field note on designing agents as observable loops, with tools, memory, failure recovery, and product boundaries.
Building Njx'Log: The Full Stack of Hugo, PaperMod, and GitHub Pages
A comprehensive guide on how I built this technical blog using Hugo, the PaperMod theme, and automated deployment via GitHub Actions.
Hello World
I recently read an interview with my idol Wenli, in which she mentioned how having her own blog helped her learn about AI. I hope to have a blog of my own where I can keep developing my ideas and share them with others. I want to keep moving forward on the path of AI. If you are new here, start with the search page or the tags list to find topics you care about.