The Prompt Engineering Maturity Model
Most prompt engineering advice stops at "be specific" and "give examples." That's fine for experiments. Production requires more.
Level 1: Structured Prompts
Move beyond freeform text. Structure your prompts:
STRUCTURED_PROMPT = """ # Role You are a customer service agent for {company_name}. # Context Customer: {customer_name} Account Status: {account_status} Previous Interactions: {interaction_summary} # Task Respond to the following customer inquiry. Be helpful, professional, and accurate. # Constraints - Do not discuss competitors - Do not make promises about future features - Escalate billing disputes to human agents # Customer Message {customer_message} # Response Format Provide a response in the following format: - Greeting - Address the main concern - Offer next steps - Closing """
Level 2: Prompt Versioning
Prompts are code. Treat them that way:
class PromptRegistry:
    def __init__(self):
        self.prompts = {}
        self.versions = {}

    def register(self, name: str, prompt: str, version: str):
        self.prompts[name] = prompt
        self.versions[name] = version

    def get(self, name: str) -> str:
        return self.prompts[name]

    def get_version(self, name: str) -> str:
        return self.versions[name]

# Usage: the version lives in the registry, not in the name
registry = PromptRegistry()
registry.register(
    "customer_service",
    STRUCTURED_PROMPT,  # the Level 1 template
    "2.1.0",
)
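Versioning only pays off if you record it at call time. A minimal sketch using the standard logging module; log_completion is a hypothetical helper, not part of any library:

import logging

def log_completion(registry: PromptRegistry, name: str, response: str):
    # Record which prompt version produced which output, so a behavior
    # regression can be traced back to a specific prompt change.
    logging.getLogger("prompts").info(
        "completion generated",
        extra={
            "prompt_name": name,
            "prompt_version": registry.get_version(name),
            "response_chars": len(response),
        },
    )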
Level 3: Dynamic Prompt Assembly
Build prompts from components:
class PromptBuilder:
    def __init__(self):
        self.components = []

    def add_role(self, role: str) -> 'PromptBuilder':
        self.components.append(f"# Role\n{role}")
        return self

    def add_context(self, context: dict) -> 'PromptBuilder':
        ctx_str = "\n".join(f"- {k}: {v}" for k, v in context.items())
        self.components.append(f"# Context\n{ctx_str}")
        return self

    def add_examples(self, examples: list) -> 'PromptBuilder':
        ex_str = "\n\n".join(
            f"Input: {ex['input']}\nOutput: {ex['output']}"
            for ex in examples
        )
        self.components.append(f"# Examples\n{ex_str}")
        return self

    def add_task(self, task: str) -> 'PromptBuilder':
        self.components.append(f"# Task\n{task}")
        return self

    def build(self) -> str:
        return "\n\n".join(self.components)
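Usage then reads as a fluent chain (the values are made up for illustration):

prompt = (
    PromptBuilder()
    .add_role("You are a customer service agent for Acme.")
    .add_context({"Customer": "Jordan Lee", "Account Status": "active"})
    .add_examples([
        {"input": "Where is my order?",
         "output": "Let me look that up for you right away."},
    ])
    .add_task("Respond to the customer's latest message.")
    .build()
)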
Level 4: Prompt Testing
Test prompts like code:
# Minimal supporting types the snippet assumes
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    inputs: dict
    checks: list

@dataclass
class TestResult:
    name: str
    passed: bool
    response: str

@dataclass
class TestResults:
    results: list

class PromptTest:
    def __init__(self, prompt: str, test_cases: list):
        self.prompt = prompt
        self.test_cases = test_cases

    async def run(self, model) -> TestResults:
        results = []
        for case in self.test_cases:
            filled_prompt = self.prompt.format(**case.inputs)
            response = await model.generate(filled_prompt)
            passed = all(check(response) for check in case.checks)
            results.append(TestResult(case.name, passed, response))
        return TestResults(results)

# Define test cases
test_cases = [
    TestCase(
        name="handles_refund_request",
        inputs={"customer_message": "I want a refund"},
        checks=[
            lambda r: "refund" in r.lower(),
            lambda r: "policy" in r.lower(),
            lambda r: len(r) > 50,
        ],
    ),
    TestCase(
        name="escalates_billing_dispute",
        inputs={"customer_message": "Your charges are fraudulent!"},
        checks=[
            lambda r: "human" in r.lower() or "representative" in r.lower(),
        ],
    ),
]
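Running the suite requires an async model client. Here FakeModel is a hypothetical stand-in you would replace with your provider's API:

import asyncio

class FakeModel:
    # Hypothetical stand-in for a real model client.
    async def generate(self, prompt: str) -> str:
        return ("I can help with your refund. Per our policy, a human "
                "representative will review any billing dispute.")

async def main():
    test = PromptTest(
        "Customer message: {customer_message}\nRespond helpfully.",
        test_cases,
    )
    results = await test.run(FakeModel())
    for result in results.results:
        print(result.name, "PASS" if result.passed else "FAIL")

asyncio.run(main())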
Level 5: Prompt Optimization
Automatically improve prompts by hill-climbing against an evaluation set:
class PromptOptimizer:
    # Subclasses supply evaluate() and generate_variations(); see the
    # sketch below for one way to fill them in.
    async def optimize(
        self,
        base_prompt: str,
        eval_dataset: list,
        iterations: int = 10,
    ) -> str:
        current_prompt = base_prompt
        current_score = await self.evaluate(current_prompt, eval_dataset)

        for _ in range(iterations):
            # Generate variations of the current best prompt
            variations = await self.generate_variations(current_prompt)

            # Keep any variation that scores better on the eval set
            for variation in variations:
                score = await self.evaluate(variation, eval_dataset)
                if score > current_score:
                    current_prompt = variation
                    current_score = score

        return current_prompt
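The snippet leaves evaluate and generate_variations undefined. One minimal sketch, assuming an async model client and an eval dataset whose items carry their own inputs and a score function (all names here are assumptions, not a fixed API):

class SimplePromptOptimizer(PromptOptimizer):
    def __init__(self, model, n_variations: int = 3):
        self.model = model
        self.n_variations = n_variations

    async def generate_variations(self, prompt: str) -> list:
        # Ask the model itself to propose rewrites of the prompt.
        rewrites = []
        for _ in range(self.n_variations):
            rewrites.append(await self.model.generate(
                "Rewrite this prompt to be clearer and more specific. "
                f"Return only the rewritten prompt.\n\n{prompt}"
            ))
        return rewrites

    async def evaluate(self, prompt: str, eval_dataset: list) -> float:
        # Average score across the eval set.
        scores = []
        for case in eval_dataset:
            response = await self.model.generate(prompt.format(**case["inputs"]))
            scores.append(case["score"](response))
        return sum(scores) / len(scores)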
Common Patterns
Chain of Thought
Think through this step by step:
1. First, identify the main question
2. List relevant information
3. Consider possible approaches
4. Choose the best approach and explain why
5. Provide your answer
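The pattern is easy to make reusable; with_chain_of_thought below is a hypothetical helper that appends the scaffold to any task prompt:

COT_SUFFIX = """Think through this step by step:
1. First, identify the main question
2. List relevant information
3. Consider possible approaches
4. Choose the best approach and explain why
5. Provide your answer"""

def with_chain_of_thought(prompt: str) -> str:
    # Append the reasoning scaffold to an existing task prompt.
    return f"{prompt}\n\n{COT_SUFFIX}"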
Self-Critique
After generating your response, review it for:
- Factual accuracy
- Completeness
- Clarity
- Potential misunderstandings
If you find issues, revise your response.
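The same instruction can also be split into an explicit second pass. A minimal sketch of a draft-critique-revise loop, assuming the same async model interface used above:

async def generate_with_critique(model, prompt: str) -> str:
    draft = await model.generate(prompt)
    critique = await model.generate(
        "Review the following response for factual accuracy, completeness, "
        f"clarity, and potential misunderstandings:\n\n{draft}"
    )
    # One revision pass; loop if your latency budget allows more.
    return await model.generate(
        f"Original response:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Rewrite the response, fixing every issue raised in the critique."
    )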
Format Enforcement
You MUST respond in the following JSON format:
{
  "answer": "your answer here",
  "confidence": 0.0 to 1.0,
  "sources": ["source1", "source2"]
}
Do not include any text outside the JSON object.
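Enforce the contract on the way back in as well. A minimal parser using only the standard library; models sometimes wrap JSON in prose or code fences, so it strips to the outermost braces first:

import json

def parse_structured_response(raw: str) -> dict:
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    data = json.loads(raw[start:end + 1])
    # Validate the fields the prompt demanded.
    for key in ("answer", "confidence", "sources"):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return data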
Conclusion
Production prompt engineering is software engineering. Version your prompts, test them rigorously, and continuously improve them based on real-world performance.