Back to Blog

AI Coding Roadmap

7 min readBy Zeyuan Pu
AITutorial

AI-Integrated Software Engineering Roadmap

10 weeks · 15–20 hours/week · Focused on what you'll actually use


Structure

Each week runs two tracks in parallel:

| Track | Hours | What | |-------|-------|------| | Stanford CS146S | 5–6 hrs | Slides, readings, assignment | | Build | 10–14 hrs | Hands-on projects |

CS146S Materials (free):

  • Course site: https://themodernsoftware.dev
  • Assignments: https://github.com/mihail911/modern-software-dev-assignments

Week 1: How LLMs Work + Prompt Engineering

CS146S Week 1

Study:

  • CS146S Week 1 slides and readings
  • Watch Andrej Karpathy's "Intro to Large Language Models" (~1 hr, YouTube)
  • Anthropic prompt engineering guide: https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview

Concepts to nail down:

  • Tokens, context windows, temperature — what they mean for your engineering decisions
  • Prompt techniques: few-shot examples, chain-of-thought, role prompting, structured output (JSON/XML)
  • Why the same prompt gives different results and how to control it

Build:

  • Complete CS146S Week 1 assignment
  • Prompt journal: Pick 5 real tasks from your day job, try 3+ prompt approaches each, document what works and why. This becomes your personal prompt library.
  • Structured data extractor: Python script that takes unstructured text → Claude API → structured JSON output. Focus on getting consistent, reliable formats.

Week 2: AI Development Tools

CS146S Weeks 2–3

Study:

  • CS146S Week 2–3 slides (agent architecture, AI IDEs)
  • Claude Code docs: https://docs.claude.com/en/docs/claude-code/overview

Build:

  • Complete CS146S Week 2 assignment
  • CLAUDE.md for your repo: Write a comprehensive one — architecture, conventions, common patterns, DB schema. Test it by having Claude Code do a real task and checking if the output matches your team's style.
  • First custom skill: Create a Claude Code skill for a task you repeat often (e.g., scaffold a new endpoint, write a migration with tests).
  • Spend 2 hours trying Cursor or Copilot on a real task — understand how IDE-based AI differs from terminal-based.

Week 3: MCP (Model Context Protocol)

CS146S Week 4

Study:

  • CS146S Week 4 slides
  • MCP spec: https://modelcontextprotocol.io
  • Community servers: https://github.com/modelcontextprotocol/servers

Build:

  • Complete CS146S Week 4 assignment
  • Connect an MCP server: Set up the PostgreSQL MCP server with Claude Code. Connect to a dev database. Practice having Claude Code query your data while helping you write code.
  • Build a custom MCP server: Wrap one of your internal APIs or tools. Even something simple like "get recent log entries" or "check service health." This teaches you the full protocol.

Week 4: LLM API + Tool Use

CS146S Week 5

This is the most important week. Tool use is the foundation of everything that follows.

Study:

  • CS146S Week 5 slides
  • Claude API docs: https://docs.claude.com/en/api/overview
  • Tool use guide: https://docs.claude.com/en/docs/build-with-claude/tool-use/overview

Concepts:

  • Messages API pattern (system → user → assistant)
  • Defining tools, handling tool calls, returning results
  • Streaming responses
  • Token counting and cost awareness

Build:

  • Complete CS146S Week 5 assignment
  • CLI chatbot with tools (the key project): Build a terminal chatbot with 3 tools — e.g., query a database, read local files, call an external API. The LLM decides when to call each tool. Get comfortable with the tool-use loop.
  • Add streaming: Display responses token-by-token. Then build a minimal web page that streams from your backend.

Week 5: Your First Agent

CS146S Week 6

Study:

  • CS146S Week 6 slides (AI testing, debugging)
  • Read: Anthropic's "Building effective agents" — https://www.anthropic.com/engineering/building-effective-agents
  • Understand the ReAct pattern (Reason + Act)

Build:

  • Complete CS146S Week 6 assignment
  • Research agent: Takes a question → searches for info → reads results → decides if it has enough → searches again or writes final answer. Simple loop: prompt → tool call → observe → repeat.
  • Add guardrails: max iterations, cost tracking per run, output validation. These aren't optional — agents without guardrails are dangerous.

Week 6: Tool-Rich Agents + Security

CS146S Week 7

Study:

  • CS146S Week 7 slides (vulnerability detection, code review)
  • OWASP Top 10 for LLM Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
  • Prompt injection — understand the attack surface

Build:

  • Complete CS146S Week 7 assignment
  • Data analysis agent: An agent with 5+ tools — query database, calculate, create charts, summarize, export CSV. Give it complex questions and let it figure out the steps.
  • Red-team your agents: Try to break everything you've built. Adversarial inputs, prompt injections, edge cases. Document failures and build defenses.

Week 7: RAG (Retrieval-Augmented Generation)

CS146S Week 8

Study:

  • CS146S Week 8 slides
  • Embeddings, vector similarity, chunking strategies
  • pgvector docs (since you already use PostgreSQL)

Build:

  • Complete CS146S Week 8 assignment
  • RAG system from scratch — build every component yourself, no frameworks:
    1. Chunk documents into passages
    2. Generate embeddings via API
    3. Store in pgvector (your existing PostgreSQL)
    4. Query by similarity
    5. Feed retrieved context to Claude, generate answer
  • Test with real documents from your domain — logistics SOPs, internal docs, whatever you have.

Week 8: Production Patterns

CS146S Week 9

Study:

  • CS146S Week 9 slides (monitoring, incident response)
  • Model routing: when to use Haiku vs. Sonnet vs. Opus
  • Caching strategies: prompt caching, response caching

Build:

  • Complete CS146S Week 9 assignment
  • Cost + observability layer: Build middleware that logs every LLM call — input, output, tokens, cost, latency. Create a simple report/dashboard.
  • Smart model router: Route requests to different models based on complexity. Simple tasks → cheaper/faster model. Complex tasks → bigger model. Add response caching for repeated queries.

Week 9: Eval + Testing for AI Features

Study:

  • CS146S Week 10 slides (future outlook)
  • Eval-driven development patterns
  • How to test non-deterministic systems

Build:

  • Complete CS146S Week 10 assignment
  • Eval framework: Build a lightweight test harness — define test cases (input + expected output), run them through your agents/features, auto-score where possible, flag edge cases for human review.
  • Regression suite for prompts: When you change a prompt, run the eval suite and compare before/after. This prevents you from fixing one thing and breaking three others.

Week 10: Capstone — Ship Something Real

Full 15–20 hours on one project.

Build and polish a complete AI feature that combines everything:

  • Frontend with streaming responses
  • Backend API with auth and rate limiting
  • Tool use or RAG (or both)
  • Cost tracking and observability
  • Error handling and graceful fallback
  • Eval tests

Pick something you'd actually use — an internal assistant for your logistics team, a document processor, a reporting tool, a customer query handler. Make it real.

Spend the last few hours writing a clean README and documenting your design decisions. This is your portfolio piece.


Resources

Essential:

  • CS146S: https://themodernsoftware.dev
  • Claude API: https://docs.claude.com/en/api/overview
  • Claude Code: https://docs.claude.com/en/docs/claude-code/overview
  • MCP: https://modelcontextprotocol.io
  • Prompt engineering: https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview
  • Building effective agents: https://www.anthropic.com/engineering/building-effective-agents

Video:

  • Andrej Karpathy "Intro to Large Language Models" (YouTube)
  • 3Blue1Brown "But what is a GPT?" (YouTube, optional — good visual intuition)

Follow:

  • Simon Willison's blog: https://simonwillison.net
  • Anthropic blog: https://www.anthropic.com/news
  • Latent Space podcast

Checklist

  • [ ] Clone CS146S repo and complete Week 1 assignment
  • [ ] Watch Karpathy's LLM intro
  • [ ] Write a CLAUDE.md for your main repo
  • [ ] Build a CLI chatbot with tool use (Week 4 — most important project)
  • [ ] Connect an MCP server to Claude Code
  • [ ] Build a custom MCP server
  • [ ] Build your first agent with a ReAct loop
  • [ ] Red-team your own agents
  • [ ] Build a RAG system from scratch with pgvector
  • [ ] Build an eval framework for your prompts
  • [ ] Ship a complete AI feature end-to-end (capstone)

Thanks for reading! If you found this helpful, feel free to share it.