Getting Started with CrewAI: A Practical Guide

I spent three hours debugging a CrewAI agent that kept hallucinating API endpoints, and it taught me something crucial: CrewAI’s power is also its biggest trap. If you’re coming from LangChain or just starting with multi-agent systems, you’ll quickly realize that orchestrating multiple AI agents isn’t just about chaining prompts—it’s about managing dependencies, context, and failure modes. This guide walks through what I learned building a real-world CrewAI system for automated blog content generation, including the exact code that broke and how I fixed it.

The Pain Point: Why Not Just Use a Single Agent?

You’ve probably tried generating a blog post with a single LLM call. It works—until you need fact-checking, SEO optimization, and formatting. The single agent forgets context, contradicts itself, or generates nonsense. CrewAI solves this by letting you define specialized agents that pass tasks to each other, like a human team. But here’s the catch: CrewAI’s simplicity hides complexity. If you don’t design your agents’ roles and tasks carefully, you’ll get circular dependencies, infinite loops, or agents that refuse to hand off work.

Step 1: Install CrewAI (and the Gotcha)

```
pip install crewai
```

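I assumed this would pull in everything I needed. Wrong. The CrewAI release I was using depended on langchain and openai, but not on their latest versions, and on Python 3.12 I hit a pydantic conflict. Here’s the exact fix I used:

```
pip install crewai langchain==0.1.0 openai==1.6.1 pydantic==2.5.0
```

Without pinning these, I got ImportError: cannot import name 'BaseModel' from 'pydantic' and wasted 30 minutes on it. Treat these pins as specific to that release: CrewAI’s dependency set has changed since (newer versions no longer build on LangChain), so check the requirements of the version you install, ideally in a fresh virtual environment.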
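To confirm the pins actually took effect, you can compare installed versions against the ones you expect using the standard library’s importlib.metadata. A minimal sketch: `check_pins` is a hypothetical helper (not part of CrewAI or pip), and `PINS` simply mirrors the versions from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors the pip command above; edit to match your own pins.
PINS = {"langchain": "0.1.0", "openai": "1.6.1", "pydantic": "2.5.0"}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every pin that doesn't match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package isn't installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    if problems:
        for package, installed in problems.items():
            print(f"{package}: expected {PINS[package]}, found {installed}")
    else:
        print("All pins satisfied.")
```

Run it inside the same virtual environment your crew uses; an empty result means every pin matched.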

Step 2: Define Your Agents (Be Specific or Suffer)

Agents in CrewAI are instances of the Agent class, configured with a role, a goal, and a backstory. All three are required, and the backstory matters more than it looks: it controls the agent’s tone and behavior. Here’s what I started with:

```
from crewai import Agent

researcher = Agent(
    role="Researcher",
    goal="Find recent news about AI in healthcare",
    backstory="You are a meticulous researcher who verifies sources.",
)
```

This works, but it’s too vague, and the agent generates generic responses. After testing, I learned to add constraints:

```
researcher = Agent(
    role="Senior Healthcare AI Researcher",
    goal="Find 3 recent (2024) peer-reviewed papers on AI in oncology",
    backstory="""You have 10 years of experience in medical AI.
    You always cite specific PMIDs or DOI links.
    You never fabricate sources.""",
)
```

Notice the explicit instruction to cite sources and the year constraint. Without these, my agent invented fake papers. CrewAI doesn’t validate facts—it trusts the LLM.
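Since CrewAI won’t check citations for you, a cheap post-hoc check can catch the most obvious fabrication before the research moves on. A minimal sketch, assuming the researcher returns one paper per line: `lines_missing_citation` is a hypothetical helper that flags lines carrying neither a DOI nor a PMID. The regexes are simplified approximations of those identifier formats, the sample text is made up, and a well-formed identifier can still point to a paper that doesn’t exist, so this catches missing citations, not fake ones.

```python
import re

# Simplified patterns: a DOI starts with "10." plus a registrant code and a suffix;
# a PMID is an integer, usually written after a "PMID" label.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
PMID_RE = re.compile(r"\bPMID:?\s*\d{1,8}\b", re.IGNORECASE)


def lines_missing_citation(research_output):
    """Return the non-empty lines that contain neither a DOI nor a PMID."""
    missing = []
    for line in research_output.splitlines():
        line = line.strip()
        if line and not (DOI_RE.search(line) or PMID_RE.search(line)):
            missing.append(line)
    return missing


# Made-up sample output, for illustration only.
sample = """- Deep learning for tumor segmentation (2024), https://doi.org/10.1000/xyz123
- Transformer-based pathology triage (2024), PMID: 12345678
- A paper the model may have invented (2024)"""

print(lines_missing_citation(sample))  # ['- A paper the model may have invented (2024)']
```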

Step 3: Create Tasks That Chain Correctly

Tasks are where most people fail. A task has a description, an expected_output, and an agent. The trick is making each task depend on the outputs it needs. I built a two-agent pipeline, with a writer agent defined the same way as the researcher:

```
from crewai import Task

research_task = Task(
    description="Find 3 recent AI healthcare papers. Output a list with titles and links.",
    expected_output="A bullet list of 3 papers with title, year, and URL",
    agent=researcher
)

writing_task = Task(
    description="""Based on the research output, write a 500-word blog post 
    summarizing the findings. Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer
)
```

Here’s the bug: writing_task doesn’t explicitly reference research_task’s output. Without an explicit link, the writer depends on whatever context CrewAI passes along implicitly, and in my runs that was unreliable. I fixed it by declaring the dependency:

```
writing_task = Task(
    description="""Using the research output from the previous task,
    write a 500-word blog post summarizing the findings.
    Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer,
    context=[research_task],  # Explicit dependency on the research task
)
```

The context parameter takes a list of tasks whose outputs are passed to this task as additional context alongside its description, so there’s no need for a placeholder like {research_output} in the description. Curly-brace placeholders are filled from the inputs you pass to crew.kickoff(inputs=...), not from context. Without context, my writer agent hallucinated its own research.
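To make the idea concrete, here’s a toy model of a sequential pipeline in plain Python. It is not CrewAI’s implementation: `ToyTask`, `build_prompt`, `run_sequential`, and `fake_writer` are hypothetical names, and the toy deliberately has no implicit hand-off at all, so it models the failure I saw. A task that lists `context` gets earlier outputs appended to its prompt; one that doesn’t sees only its own description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task, for illustration only."""

    name: str
    description: str
    run: Callable[[str], str]  # takes the full prompt, returns the task's output
    context: List["ToyTask"] = field(default_factory=list)


def build_prompt(task: ToyTask, outputs: Dict[str, str]) -> str:
    """Append the outputs of the task's context tasks after its own description."""
    parts = [task.description]
    for dep in task.context:
        parts.append(f"Context from {dep.name}:\n{outputs[dep.name]}")
    return "\n\n".join(parts)


def run_sequential(tasks: List[ToyTask]) -> Dict[str, str]:
    """Run tasks in order. There is no implicit hand-off in this toy:
    a task only sees earlier outputs it lists in `context`."""
    outputs: Dict[str, str] = {}
    for task in tasks:
        outputs[task.name] = task.run(build_prompt(task, outputs))
    return outputs


def fake_writer(prompt: str) -> str:
    # Stand-in for the LLM: it can only cite papers that appear in its prompt.
    return "Post citing Paper A" if "Paper A" in prompt else "Post with invented research"


research = ToyTask("research", "Find 3 papers.", run=lambda prompt: "- Paper A\n- Paper B\n- Paper C")
with_context = ToyTask("writing", "Write a post citing the research.", run=fake_writer, context=[research])
without_context = ToyTask("writing_no_ctx", "Write a post citing the research.", run=fake_writer)

results = run_sequential([research, with_context, without_context])
print(results["writing"])         # Post citing Paper A
print(results["writing_no_ctx"])  # Post with invented research
```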

Step 4: Run the Crew (and Handle Failures)

Now you create a Crew and run it:

```
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True  # Essential for debugging
)

result = crew.kickoff()
print(result)
```

When I first ran this, it worked—but took 45 seconds and cost $0.12 in API calls. The verbose output showed the researcher agent making 3 separate API calls to find papers, then the writer agent making 2 more calls to write the post. CrewAI’s default is sequential execution, meaning each task waits for the previous one to finish.

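The flaw: if any task fails (for example, the API returns an error), kickoff() raises and the whole run stops. The version I used had no crew-level retry, so I added a simple wrapper with exponential backoff:

```
import time

def safe_kickoff(crew, max_retries=3):
    """Run crew.kickoff(), retrying with exponential backoff on any exception."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return crew.kickoff()
        except Exception as e:
            last_error = e
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    raise RuntimeError(f"Crew failed after {max_retries} attempts") from last_error
```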
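One refinement worth considering: the wrapper above retries on every exception, including plain bugs that will never succeed on a second try. Here’s a minimal sketch that retries only exception types you list as transient and lets everything else propagate immediately. Which errors count as transient depends on your LLM client, so `ConnectionError` and `TimeoutError` are placeholders, and `FlakyCrew` is a hypothetical stub standing in for a real crew.

```python
import time


def kickoff_with_retry(crew, max_retries=3, transient=(ConnectionError, TimeoutError), base_delay=1.0):
    """Call crew.kickoff(), retrying only on exception types listed in `transient`.

    Any other exception propagates immediately. After the final failed attempt,
    the last transient error is re-raised unchanged.
    """
    for attempt in range(max_retries):
        try:
            return crew.kickoff()
        except transient as e:
            if attempt == max_retries - 1:
                raise  # out of attempts
            delay = base_delay * 2 ** attempt  # exponential backoff
            print(f"Attempt {attempt + 1} failed ({e!r}); retrying in {delay:.2f}s")
            time.sleep(delay)
    raise ValueError("max_retries must be at least 1")


class FlakyCrew:
    """Hypothetical stub: raises a transient error `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def kickoff(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated API outage")
        return "blog post"


flaky = FlakyCrew(failures=2)
print(kickoff_with_retry(flaky, base_delay=0.01))  # blog post
```

The design choice is to fail fast on anything unexpected: a typo in your task setup surfaces on the first attempt instead of after a full backoff cycle.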

Step 5: Real-World Optimization Tips

After a week of testing, here’s what actually improved reliability:

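Limit agent memory: In the version I used, agents kept the entire conversation by default, which blows past context windows on long tasks. Set memory=False on agents that don’t need history (newer releases also expose a memory switch on the Crew itself):
```
researcher = Agent(role="Researcher", goal="Find recent news about AI in healthcare",
                   backstory="You are a meticulous researcher who verifies sources.",
                   memory=False)  # don't carry conversation history
```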
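Use allow_delegation=False: In the version I used, agents could delegate tasks to other agents by default, which created loops. Recent releases default allow_delegation to False, but setting it explicitly keeps the behavior the same across versions. Unless you’re building a complex hierarchy, disable it:
```
researcher = Agent(role="Researcher", goal="Find recent news about AI in healthcare",
                   backstory="You are a meticulous researcher who verifies sources.",
                   allow_delegation=False)  # no hand-offs to other agents
```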
Cache results: CrewAI caches tool results by default (the cache flag on both Crew and Agent), so a repeated tool call with the same input can reuse an earlier result. That’s great for debugging but dangerous in production, where you might serve stale data. Disable caching with:
```
crew = Crew(agents=[...], tasks=[...], cache=False)
```
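To see how a cache ends up serving stale data, here’s the same failure mode in miniature using functools.lru_cache. This is not CrewAI’s cache; `fetch_headline` is a hypothetical tool and `FEED` a made-up data source that changes between calls, while the cached wrapper keeps returning the first answer until the cache is cleared.

```python
from functools import lru_cache

# Hypothetical data source that changes over time (think: a news feed).
FEED = {"ai-healthcare": "Headline from Monday"}


@lru_cache(maxsize=None)
def fetch_headline(topic):
    """Cached 'tool': the result for a given topic is computed once, then reused."""
    return FEED[topic]


print(fetch_headline("ai-healthcare"))  # Headline from Monday

FEED["ai-healthcare"] = "Headline from Tuesday"  # the source updates...
print(fetch_headline("ai-healthcare"))  # ...but the cache still returns: Headline from Monday

fetch_headline.cache_clear()
print(fetch_headline("ai-healthcare"))  # Headline from Tuesday
```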

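Monitor token usage: The version I used didn’t report token counts, so I wrapped the run in LangChain’s OpenAI callback. It only counts calls that go through LangChain’s OpenAI integration; recent CrewAI releases report usage themselves through crew.usage_metrics after kickoff().

```
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    result = crew.kickoff()
    print(f"Total tokens: {cb.total_tokens}, Cost: ${cb.total_cost:.4f}")
```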
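If all you have are token counts, the cost is simple arithmetic: prompt and completion tokens are billed at separate per-1,000-token rates. A minimal sketch with a hypothetical `estimate_cost` helper; the prices and token counts below are made-up placeholders, not real OpenAI rates or numbers from my runs, so substitute your model’s current pricing.

```python
def estimate_cost(prompt_tokens, completion_tokens, prompt_price_per_1k, completion_price_per_1k):
    """Estimated USD cost: each token type is billed at its own per-1,000-token rate."""
    prompt_cost = (prompt_tokens / 1000) * prompt_price_per_1k
    completion_cost = (completion_tokens / 1000) * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder numbers, NOT real rates: $0.01 per 1K prompt tokens, $0.03 per 1K completion tokens.
cost = estimate_cost(prompt_tokens=4000, completion_tokens=1000,
                     prompt_price_per_1k=0.01, completion_price_per_1k=0.03)
print(f"${cost:.2f}")  # $0.07
```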

The Biggest Limitation CrewAI Doesn’t Tell You

After building a 5-agent system for content generation, I hit a wall: the version I used had no built-in recovery when an agent produced bad output. If your researcher agent returns gibberish, the writer agent will still try to use it. The fix I settled on was to build validation into the task description:

```
research_task = Task(
    description="""Find 3 papers. If you cannot find 3, output 'NO_RESULTS' 
    and explain why. Do not fabricate papers.""",
    ...
)
```

Then in the writing task, check for this sentinel value. It’s hacky, but it works.
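The same sentinel check can also run in plain Python, outside the prompt. A minimal sketch, assuming you run the research step on its own and get its result back as text: `research_is_usable` and `MIN_PAPERS` are hypothetical names, and counting lines that contain a link is a crude stand-in for "found 3 papers."

```python
SENTINEL = "NO_RESULTS"
MIN_PAPERS = 3  # hypothetical threshold matching the task description above


def research_is_usable(research_output):
    """Return (ok, reason) for the researcher's raw text output."""
    if SENTINEL in research_output:
        return False, "researcher reported NO_RESULTS"
    linked = [line for line in research_output.splitlines() if "http" in line]
    if len(linked) < MIN_PAPERS:
        return False, f"expected {MIN_PAPERS} linked papers, found {len(linked)}"
    return True, "ok"


print(research_is_usable("NO_RESULTS: no 2024 oncology papers matched"))
print(research_is_usable("- Paper A https://example.org/a\n- Paper B https://example.org/b"))
```

Only when this returns True would you hand the research to the writer; otherwise log the reason and retry or stop, rather than letting the writer improvise.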

Next Step: Build a Real Project

Don’t start with theory. Clone my broken example from github.com/your-repo/crewai-blog-generator and fix the intentional bugs. The README lists three things I deliberately broke:

Missing context parameter on the writer task
No retry logic for API failures
Agents with allow_delegation=True causing infinite loops

Fix these, then extend the system to add a FactChecker agent that validates the writer’s citations. You’ll learn more in 30 minutes of debugging than reading the docs for an hour. And when you inevitably break something, remember: the verbose output is your best friend. Set it to True and watch every decision your agents make.
