Day 1 – Prompt Engineering #
1. Why Prompt Engineering Matters #
We start with the need to control LLM behavior. Although anyone can write a prompt, crafting high-quality prompts is complex: the model, structure, tone, and context all affect the outcome. Prompt engineering is an iterative process of experimentation and refinement.
→ how do we guide LLMs effectively without retraining them?
2. How LLMs Predict Text #
LLMs are token prediction machines. They predict the next likely token based on previous tokens and training data. Prompt engineering means designing inputs that lead the model toward the desired outputs using this prediction mechanism.
→ how do prompt structure and context impact token prediction?
3. Controlling Output Length #
Setting the maximum number of output tokens affects cost, latency, and completeness. A lower limit does not make the model more concise; it simply stops generation once the limit is reached. Prompts must be adapted accordingly.
→ how do we engineer prompts to work well with shorter output limits?
4. Temperature – Controlling Randomness #
Temperature tunes the creativity vs. determinism of responses. Lower = more deterministic. Higher = more diverse. Temperature = 0 means always selecting the highest-probability token.
→ what randomness level best matches your use case: precision or creativity?
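A minimal sketch of what temperature does to the next-token distribution; the toy vocabulary and logits below are invented for illustration.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick a token index from raw logits after temperature scaling.
    temperature -> 0 approaches greedy decoding; higher values flatten
    the distribution and make rarer tokens more likely."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]   # numerically stable softmax
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy 4-token vocabulary with made-up logits.
vocab = ["the", "a", "cat", "pizza"]
logits = [2.0, 1.5, 0.5, -1.0]
print(vocab[sample_with_temperature(logits, 0)])    # always "the"
print(vocab[sample_with_temperature(logits, 1.5)])  # more varied
```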
5. Top-K vs. Top-P #
- Top-K restricts sampling to the K most likely tokens.
- Top-P (nucleus sampling) samples from the smallest set of tokens whose cumulative probability reaches P.
Together with temperature, they control diversity. Improper values can lead to repetition loops or incoherence.
→ what is the optimal balance between relevance and novelty?
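A rough sketch of both filters over a toy distribution (the probabilities are made up); real decoders apply these cutoffs to the full vocabulary before sampling.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = {i: probs[i] for i in top}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}

# Toy distribution over a 4-token vocabulary (made up for illustration).
probs = [0.50, 0.30, 0.15, 0.05]
print(top_k_filter(probs, 2))    # only the two most likely tokens remain
print(top_p_filter(probs, 0.9))  # tokens kept until cumulative probability >= 0.9
```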
6. Putting Sampling Together #
The sampling settings interact:
- Temp=0 overrides others (most probable token only)
- Top-K=1 behaves similarly (greedy decoding)
- At extremes, sampling settings may cancel out or be ignored
→ how do we experiment with sampling settings to avoid repetition and improve quality?
7. Zero-shot Prompting #
The simplest form: just a task description or question, with no examples. It works when the model's pretraining already covers the task well. Clarity in phrasing is key.
→ how do we design zero-shot prompts that still get structured, accurate answers?
8. One-shot and Few-shot Prompting #
- One-shot: a single worked example is included in the prompt before the actual task.
- Few-shot: multiple examples guide the model's pattern recognition. Ideal for steering structure and increasing precision.
→ how many examples are enough for complex or high-variance tasks?
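A small sketch of how a few-shot prompt is assembled; the sentiment-classification task and the examples are invented for illustration.

```python
# Worked examples that show the model both the task and the expected format.
EXAMPLES = [
    ("The delivery was late and the box was damaged.", "NEGATIVE"),
    ("Exactly what I ordered, and it arrived a day early!", "POSITIVE"),
    ("It works, nothing special.", "NEUTRAL"),
]

def build_few_shot_prompt(review: str) -> str:
    lines = ["Classify the review as POSITIVE, NEUTRAL or NEGATIVE.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt("Battery died after a week."))
```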
9. System Prompting #
Defines the LLM's role and constraints at a high level, such as format, safety rules, or output requirements. It's useful for enforcing style, safety, or structured output such as JSON.
→ how can we use system prompts to enforce safety and structured outputs?
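A sketch of a system prompt enforcing JSON output, assuming a chat-style API that takes role/content messages; `call_llm` is a placeholder, stubbed here so the example runs.

```python
import json

SYSTEM_PROMPT = (
    "You are a support-ticket classifier. "
    "Return only valid JSON with the keys 'category' and 'urgency'. "
    "Never include personal data in the output."
)

def classify_ticket(ticket_text, call_llm):
    """call_llm is a placeholder for whatever client you use; it is assumed
    to take a list of role/content messages and return the model's text."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": ticket_text},
    ]
    raw = call_llm(messages)
    return json.loads(raw)  # fails loudly if the model ignored the format

# Stubbed model so the sketch runs end to end without an API key.
fake_llm = lambda messages: '{"category": "billing", "urgency": "high"}'
print(classify_ticket("I was charged twice this month!", fake_llm))
```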
10. Role Prompting #
Assigns a persona (e.g., travel guide, teacher). This helps shape tone, depth, and relevance of the response. Adding style (humorous, formal) further guides model behavior.
→ how do personas influence LLM outputs in nuanced ways?
11. Contextual Prompting #
Injects situational context into the prompt to make responses more accurate. Effective for dynamic environments (e.g., blogs, customer support).
→ how can contextual prompts adapt to real-time or user-specific tasks?
12. Step-back Prompting #
Starts with a general question to activate background knowledge before solving a specific task. Encourages critical thinking and can reduce bias.
→ how do we leverage LLMs’ latent knowledge more strategically?
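A sketch of the two-step pattern; `call_llm` is a placeholder for any text-in, text-out model client, and the canned responses exist only so the example runs.

```python
def step_back_answer(question, call_llm):
    """Two calls: first a general 'step-back' question to surface background
    knowledge, then the specific task with that background as context.
    call_llm(prompt) -> str is a placeholder for your model client."""
    background = call_llm(
        f"What general principles or concepts are most relevant to this task: {question}"
    )
    return call_llm(
        f"Background:\n{background}\n\nUsing the background above, answer: {question}"
    )

# Canned responses so the sketch runs end to end without an API.
scripted = iter([
    "Key level-design principles: pacing, sightlines, risk vs. reward.",
    "A level built around escalating risk-vs-reward choices in a guarded vault...",
])
print(step_back_answer("Write a setting for one level of a stealth game.",
                       lambda prompt: next(scripted)))
```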
13. Chain of Thought (CoT) #
LLMs often struggle with math and logic unless they break problems into steps. Chain of Thought (CoT) prompting asks the model to reason step by step before answering. This adds interpretability, improves robustness across model versions, and raises answer accuracy.
→ how can we make reasoning visible to both developers and users?
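The simplest CoT trigger is a single added line; the age puzzle below is only an illustration of the kind of problem that is easy to get wrong without step-by-step reasoning (the correct answer is 26).

```python
question = (
    "When I was 3 years old, my partner was 3 times my age. "
    "Now I am 20 years old. How old is my partner?"
)

# Appending an explicit reasoning cue is the simplest form of CoT prompting.
cot_prompt = question + "\nLet's think step by step."
print(cot_prompt)
```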
14. Self-Consistency #
Instead of relying on a single reasoning path, Self-Consistency samples multiple responses (with higher temperature), then picks the majority answer. It increases reliability—especially for ambiguous or hard-to-evaluate tasks.
→ how can we trade off cost for improved reliability in decision-critical tasks?
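A sketch of the sample-and-vote loop; `call_llm` is a placeholder for a client that accepts a temperature, and the answer-extraction rule (a trailing "Answer:" line) is an assumption about how the prompt asks the model to finish.

```python
import random
from collections import Counter

def extract_final_answer(response: str) -> str:
    """Assumes the prompt asks the model to end with an 'Answer: ...' line."""
    for line in reversed(response.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return response.strip().splitlines()[-1]

def self_consistent_answer(prompt, call_llm, n=5, temperature=0.7):
    """Sample several reasoning paths at a higher temperature, then take the
    majority final answer. call_llm(prompt, temperature) -> str is a placeholder."""
    answers = [extract_final_answer(call_llm(prompt, temperature)) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n   # answer plus a rough agreement score

# Stubbed model: answers drawn from a pool in which IMPORTANT dominates.
fake_llm = lambda prompt, temperature: random.choice(
    ["...reasoning...\nAnswer: IMPORTANT"] * 3 + ["...reasoning...\nAnswer: NOT IMPORTANT"] * 2
)
print(self_consistent_answer("Classify this email as IMPORTANT or NOT IMPORTANT: ...", fake_llm))
```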
15. Tree of Thoughts (ToT) #
Generalizes CoT by enabling multiple reasoning paths at once. Like a decision tree, the model explores and evaluates different intermediate steps before selecting the best route. Powerful for complex planning and exploration.
→ what’s the best way to structure exploration and backtracking in LLM reasoning?
16. ReAct (Reason + Act) #
ReAct prompts mix reasoning with external tool calls (e.g., Google Search). It creates a loop: Think → Act → Observe → Rethink. This enables LLMs to handle multi-step problems using real-world data or APIs.
→ how can we design LLMs that interactively use tools and adapt in real time?
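A stripped-down sketch of the loop; the "Action: tool[input]" and "Final Answer:" line formats are assumptions about how the model is prompted to respond, and both the model and the search tool are stubbed so the loop runs.

```python
def react_loop(question, call_llm, tools, max_steps=5):
    """Think -> Act -> Observe loop. call_llm(prompt) -> str is a placeholder;
    the model is assumed to emit either 'Action: tool[input]' or
    'Final Answer: ...'. tools maps a tool name to a Python function."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            action = step.split("Action:", 1)[1].strip()   # e.g. "search[Metallica members]"
            name, arg = action.split("[", 1)
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No final answer within the step budget."

# Stubbed model and tool so the loop runs end to end.
scripted = iter([
    " I should look this up. Action: search[Metallica members]",
    " The search says 4 members. Final Answer: 4",
])
print(react_loop(
    "How many members does Metallica have?",
    lambda prompt: next(scripted),
    {"search": lambda query: "Metallica currently has 4 members."},
))
```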
17. Automatic Prompt Engineering (APE) #
Prompts are hard to write. APE uses LLMs to generate and refine their own prompts. You ask the model to create N variations of a task prompt, then rank and select the best one based on performance metrics (e.g., BLEU, ROUGE).
Final insight: what happens when LLMs become their own prompt engineers—and how can we guide that process safely?
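A sketch of the generate-evaluate-select loop; `call_llm` and `evaluate` are placeholders (a real `evaluate` might run each candidate prompt against a small labeled set or compare outputs with BLEU/ROUGE), and the stubs exist only so the example runs.

```python
def auto_prompt(task_description, call_llm, evaluate, n_variants=5):
    """APE sketch: ask the model for prompt variants, score each, keep the best.
    call_llm(prompt) -> str and evaluate(prompt) -> float are placeholders."""
    meta_prompt = (
        f"Write {n_variants} different prompts that instruct a model to: "
        f"{task_description}. Return one prompt per line."
    )
    variants = [line.strip() for line in call_llm(meta_prompt).splitlines() if line.strip()]
    scored = [(evaluate(variant), variant) for variant in variants]
    return max(scored)[1]

# Stubs so the sketch runs end to end.
fake_llm = lambda prompt: (
    "Order one band t-shirt, size S, Metallica logo.\n"
    "I would like to buy a small Metallica t-shirt."
)
fake_eval = lambda prompt: len(prompt)  # placeholder metric, not meaningful
print(auto_prompt("order a band t-shirt in a given size", fake_llm, fake_eval))
```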
18. Prompts for Writing Code #
LLMs like Gemini can generate well-documented scripts, e.g., a Bash script for renaming files. This reduces developer overhead for common tasks. Prompts should state the goal, target language, and desired behavior clearly.
→ how do we craft prompts that result in reusable, safe, and tested code?
19. Prompts for Explaining Code #
LLMs can reverse-engineer logic from code. Useful in team settings or code reviews. Helps onboard new developers or document legacy scripts.
→ how do we evaluate explanation correctness—especially for critical systems?
20. Prompts for Translating Code #
Language models can convert code between languages (e.g., Bash → Python). This helps modernize or modularize projects while preserving logic.
→ what risks emerge in translation—syntax, dependencies, or behavior drift?
21. Prompts for Debugging and Reviewing Code #
Prompting LLMs to diagnose bugs or suggest improvements enhances development speed. Common mistakes like undefined functions can be spotted easily.
→ how do we ensure debugging prompts scale with complex codebases?
22. Multimodal Prompting #
Combines inputs like text, images, and audio. Enables more flexible, human-like interaction. Useful in complex workflows or accessibility tasks.
→ how do we design for alignment across different input modalities?
23. Best Practice – Provide Examples #
Incorporating examples (one-shot or few-shot) within prompts is the most reliable way to guide output. Examples act as a template and set expectations for style and tone.
Final reflection: prompt engineering is more than writing—it’s designing a user interface for the LLM.
24. Design with Simplicity #
Clear, concise prompts yield better responses. Avoid overloading with context or ambiguous language. Use active verbs and break down complex requests.
→ how do we optimize for both human and machine comprehension in prompt design?
25. Be Specific About Output #
Specify the desired format, length, tone, and structure to reduce ambiguity and improve relevance. Instructions guide better than vague constraints.
→ how can we use instructions to improve precision without overconstraining the model?
26. Prefer Instructions Over Constraints #
Positive instructions are more intuitive than a long list of "don'ts." Use constraints when safety or exact formatting is critical.
→ how do we encourage creativity while staying on target?
27. Use Variables #
Abstract prompts using variables (e.g., {city}) to enhance reusability in apps or RAG systems. Helps modularize and scale prompt templates.
→ how can prompt modularity boost automation and maintainability?
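A tiny sketch of a templated prompt; the travel-guide wording and the extra placeholders are illustrative.

```python
# A reusable template: placeholders are filled in at request time.
PROMPT_TEMPLATE = (
    "You are a travel guide. Tell me a fact about the city: {city}. "
    "Answer in {language} and keep it under {max_words} words."
)

def render_prompt(city, language="English", max_words=60):
    return PROMPT_TEMPLATE.format(city=city, language=language, max_words=max_words)

print(render_prompt("Amsterdam"))
print(render_prompt("Kyoto", language="Japanese", max_words=40))
```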
28. Experiment with Formats and Styles #
Vary phrasing—questions, statements, or instructions. Try structured outputs (e.g., JSON) and track results for consistency and quality.
→ how do output formats affect hallucination, readability, and parse-ability?
29. Work With Schemas #
Use structured input/output formats like JSON Schema to guide the model’s understanding. This enables field-level alignment and supports reasoning over attributes.
→ how do schemas improve accuracy in structured reasoning tasks like RAG or product gen?
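A sketch that embeds a JSON Schema in the prompt and checks the model's reply against it; the product schema and prompt wording are invented, and the validation here is a minimal required-fields check rather than a full schema validator.

```python
import json

# A small JSON Schema describing the fields we expect back (illustrative).
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "category": {"type": "string"},
        "price_eur": {"type": "number"},
    },
    "required": ["name", "category", "price_eur"],
}

def build_prompt(description):
    return (
        "Extract the product described below as JSON matching this schema:\n"
        f"{json.dumps(PRODUCT_SCHEMA, indent=2)}\n\n"
        f"Description: {description}\nReturn only the JSON object."
    )

def parse_output(raw):
    """Minimal required-fields check; a real pipeline might use a full validator."""
    data = json.loads(raw)
    missing = [key for key in PRODUCT_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"Model output is missing fields: {missing}")
    return data

print(build_prompt("A 27-inch 4K monitor for 349 euros."))
print(parse_output('{"name": "4K monitor", "category": "displays", "price_eur": 349}'))
```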
30. Best Practices for Chain-of-Thought (CoT) #
- Always place the final answer after reasoning
- Use temperature=0 for deterministic outputs
- Separate reasoning from the final output for evals
→ how do we reliably extract and score CoT answers in evaluation pipelines?
31. Document Prompt Experiments #
Track prompt versions, models, sampling settings, and outputs using a table or Google Sheet. Log RAG details and system changes to trace variation.
Final principle: prompting is experimental—track what you try, and improve iteratively.
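A minimal sketch of such a log as a CSV file; the column names follow the fields mentioned above, and the sample row is made up.

```python
import csv
import os
from datetime import date

FIELDS = ["name", "version", "model", "temperature", "top_k", "top_p",
          "goal", "prompt", "output", "ok"]

def log_attempt(path, **attempt):
    """Append one prompt experiment as a row in a CSV file."""
    attempt.setdefault("version", date.today().isoformat())
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()   # write the header only once
        writer.writerow(attempt)

log_attempt("prompt_log.csv", name="ticket-classifier", model="gemini-pro",
            temperature=0.2, top_k=40, top_p=0.95,
            goal="classify support tickets into categories",
            prompt="[full prompt text]", output="[model output]", ok="yes")
```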
Summary #
- Prompt types: zero/few-shot, role, system, contextual
- Reasoning: CoT, ToT, ReAct, Self-consistency, Step-back
- Automation: APE
- Code prompting: generate, translate, debug
- Multimodal and schema-guided prompting
- Best practices: evaluation, formatting, variables, and documentation
The journey from text to structured reasoning begins with a well-crafted prompt.