<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Sudheer Ranjan]]></title><description><![CDATA[Data Scientist]]></description><link>https://sudheerranjan.com</link><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 04:41:10 GMT</lastBuildDate><atom:link href="https://sudheerranjan.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Exploring AI Agents: An Introduction to CrewAI]]></title><description><![CDATA[Evolution of AI Agents from LLMs
You might have experimented with Large Language Models from OpenAI and Hugging Face. They excel at predicting the next word using the vast data they are trained on. They are great for simple tasks, but what about comp...]]></description><link>https://sudheerranjan.com/exploring-ai-agents-an-introduction-to-crewai</link><guid isPermaLink="true">https://sudheerranjan.com/exploring-ai-agents-an-introduction-to-crewai</guid><category><![CDATA[CrewAI]]></category><category><![CDATA[ai agents]]></category><dc:creator><![CDATA[Sudheer Ranjan]]></dc:creator><pubDate>Fri, 31 May 2024 05:26:42 GMT</pubDate><content:encoded><![CDATA[<h3 id="heading-evolution-of-ai-agents-from-llms">Evolution of AI Agents from LLMs</h3>
<p>You might have experimented with Large Language Models from OpenAI and Hugging Face. They excel at predicting the next word using the vast data they are trained on. They are great for simple tasks, but what about complex tasks? If an answer needs changes, you would likely tweak the prompt (using prompt engineering skills) and add context until the LLM produces the desired output. However, this means we have to drive the interaction ourselves. How do we automate this and make the LLM autonomous? Given that LLMs can put together words that make sense, agents are a way to help an LLM iterate on and improve its own output.</p>
<h3 id="heading-agentic-automation">Agentic Automation</h3>
<p>Automation in traditional software development relies on explicit if-else conditions to produce an output. With AI agents, we instead provide a goal and a rough map of the process. Unlike traditional software, AI handles fuzzy inputs, performs fuzzy transformations, and produces fuzzy outputs, and the results can vary from run to run.</p>
<p>CrewAI is one such open-source framework and platform for orchestrating role-playing, autonomous AI agents. CrewAI's building blocks include Agents, Tasks, and Crew, which we will discuss in detail.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1717132767383/cec55c09-e09f-4118-92da-459e60d968a4.png" alt="CrewAI Mindmap" class="image--center mx-auto" /></p>
<p><em>Image Credits: CrewAI</em></p>
<h3 id="heading-agents">Agents</h3>
<p>While Andrew Ng highlights Reflection, Tool Use, Planning, and Multi-agent collaboration in his article in <a target="_blank" href="https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io">The Batch</a>, CrewAI adds a few more patterns. Think of these patterns as a wrapper around your interactions with an LLM; they help the agent complete complex tasks. Let's go through each of them.</p>
<ol>
<li><p><em>Role Playing:</em> Assigning the agent a role helps the LLM set context, leading to better results.</p>
</li>
<li><p><em>Focus:</em> Though growing context windows let us provide long contexts to the LLM, too much information can cause it to hallucinate. Narrowly focused tasks help the agent perform better.</p>
</li>
<li><p><em>Tools:</em> You might have seen LLMs fail miserably at mathematical calculations. Modern systems overcome this by converting the problem to Python code and executing it. What about other scenarios an LLM can't handle on its own? Here, the agent should be able to access tools such as specific websites, internet search, or files. However, too many tools can make it difficult for the agent (especially one backed by a small model) to choose the right one, so provide only the key tools. With CrewAI, you can decide whether to attach tools at the task level or the agent level.</p>
</li>
<li><p><em>Cooperation:</em> Agents role-play and have conversations in their roles, which helps in getting better results. Taking feedback and delegating tasks also improve their output. Delegation doesn't need to happen every time; agents should decide to delegate based on the problem's complexity. Collaboration among multiple agents can occur sequentially, hierarchically (like a manager delegating tasks), or asynchronously.</p>
</li>
<li><p><em>Guardrails:</em> Unlike traditional software, AI systems have to handle unclear inputs, perform unclear transformations, and produce unclear outputs. Some tasks might end up in a loop. Guardrails help prevent your agent from getting off track and guide them to stay focused.</p>
</li>
<li><p><em>Memory:</em> Agents should remember the past, learn from it, and apply that knowledge to future tasks. Agents with memory can learn from their mistakes. CrewAI offers three types of memory out of the box: short-term memory, long-term memory, and entity memory. With memory, you can achieve more reliable results and reduce randomness.</p>
<ol>
<li><p><em>Short-Term Memory:</em> This memory/context is shared across all agents during the crew's execution, so agents can pass intermediate information to one another even before a task produces its final output.</p>
</li>
<li><p><em>Long-Term Memory:</em> This persists even after the crew's execution, stored in a database, so agents can learn from previous executions in future tasks. This leads to self-improving agents.</p>
</li>
<li><p><em>Entity Memory:</em> This memory lasts for the duration of the crew's execution and keeps track of the subjects being discussed.</p>
</li>
</ol>
</li>
</ol>
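<p>CrewAI implements these memory scopes internally, but the idea can be illustrated with a small framework-free sketch (this is a conceptual model, not CrewAI's actual implementation; the <code>CrewMemory</code> class and its file-based store are hypothetical):</p>

```python
import json
import tempfile
from pathlib import Path

class CrewMemory:
    """Conceptual sketch of the three memory scopes (not CrewAI's real classes)."""

    def __init__(self, store_path: Path):
        self.short_term = []   # shared across agents, cleared after each run
        self.entities = {}     # subjects discussed, cleared after each run
        self._store = store_path
        # Long-term memory is reloaded from persistent storage at startup
        self.long_term = json.loads(store_path.read_text()) if store_path.exists() else []

    def remember(self, note: str, entity: str = ""):
        self.short_term.append(note)
        if entity:
            self.entities.setdefault(entity, []).append(note)

    def end_run(self):
        """Persist lessons to long-term storage; wipe run-scoped memory."""
        self.long_term.extend(self.short_term)
        self._store.write_text(json.dumps(self.long_term))
        self.short_term, self.entities = [], {}

# First run records a lesson and ends
store = Path(tempfile.mkdtemp()) / "memory.json"
run1 = CrewMemory(store)
run1.remember("Prefer bullet points in reports", entity="reports")
run1.end_run()

# A later run starts with the lesson already loaded
run2 = CrewMemory(store)
print(run2.long_term)   # ['Prefer bullet points in reports']
print(run2.short_term)  # []
```

<p>Notice that only the long-term store survives into the second run, which is what lets agents improve across executions.</p>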
<h3 id="heading-multi-agent-systems">Multi Agent Systems</h3>
<ol>
<li><p>Multi-agent systems are groups of agents in which each agent is customized to do one thing and do it well.</p>
</li>
<li><p>Each agent can run on a different LLM.</p>
</li>
<li><p>They have the ability to delegate tasks or give feedback to one another.</p>
</li>
</ol>
<h3 id="heading-basic-componentsclasses-of-crewai">Basic components/classes of CrewAI</h3>
<ol>
<li><p><strong><em>Agent:</em></strong> This class requires the role and goal of the agent, along with a backstory that gives the agent context.</p>
</li>
<li><p><strong><em>Task:</em></strong> This class requires the description of the task, the expected output, and the agent assigned to work on it.</p>
</li>
<li><p><strong><em>Crew:</em></strong> The Crew puts agents and tasks together. By default, a crew runs its tasks sequentially, but it can also be configured for hierarchical or asynchronous execution.</p>
</li>
</ol>
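<p>The real classes come from the <code>crewai</code> package; as a rough sketch of how the three pieces fit together (the names mirror CrewAI's API, but the logic below is purely illustrative and does not call an LLM):</p>

```python
from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str

    def perform(self, task: "Task") -> str:
        # A real agent would call an LLM here, with role, goal, and
        # backstory supplied as context.
        return f"[{self.role}] completed: {task.description}"

@dataclass
class Task:
    description: str
    expected_output: str
    agent: Agent

@dataclass
class Crew:
    agents: list
    tasks: list

    def kickoff(self) -> list:
        # Default behavior: run the tasks one after another (sequential)
        return [t.agent.perform(t) for t in self.tasks]

researcher = Agent("Researcher", "Find facts", "A meticulous analyst")
writer = Agent("Writer", "Summarize findings", "A concise editor")
tasks = [
    Task("Collect sources on AI agents", "A list of sources", researcher),
    Task("Write a 3-line summary", "A short summary", writer),
]
results = Crew(agents=[researcher, writer], tasks=tasks).kickoff()
print(results[0])  # [Researcher] completed: Collect sources on AI agents
```

<p>Swapping the sketch for the real library mainly means letting <code>perform</code> call an LLM with the role, goal, and backstory as context.</p>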
<h3 id="heading-what-makes-a-great-tool-for-an-ai-agent">What makes a great tool for an AI Agent?</h3>
<p>While an LLM on its own doesn't know everything we ask, we can give it tools to complete tasks. When creating a chatbot for specific domain data, we either fine-tune the LLM to that domain or use Retrieval Augmented Generation (RAG) to query the domain data. Think of this as a tool. More tools like this need to be created for various use cases. Below are some qualities you should look for when creating or using tools.</p>
<ol>
<li><p><em>Versatile:</em> A tool connects unclear input to unclear output. It should handle any input and any output that LLMs generate.</p>
</li>
<li><p><em>Fault-tolerant:</em> Exceptions should not stop the execution. Instead, they should prompt the agent to find alternative ways to achieve the goal.</p>
</li>
<li><p><em>Caching Strategy:</em> Tools are used to call internet services, internal/external services, and APIs. Caching the result of the same tool called with the same arguments prevents unnecessary requests, saving both rate-limit budget and execution time. CrewAI offers cross-agent caching, allowing different agents to share the cache layer instead of making the same API call again.</p>
</li>
</ol>
<p>With CrewAI, we can also build our own custom tools.</p>
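<p>As a sketch of those qualities in plain Python (the tool and its data source are hypothetical, and this is not CrewAI's actual caching layer), a tool can memoize results per argument and turn failures into messages the agent can act on:</p>

```python
import functools

@functools.lru_cache(maxsize=128)  # same tool + same arguments -> cached result
def lookup_population(city: str) -> str:
    # Hypothetical data source; a real tool would query an API or database.
    data = {"tokyo": "about 14 million"}
    try:
        return f"Population of {city}: {data[city.lower()]}"
    except KeyError:
        # Fault-tolerant: return a message instead of raising, so the agent
        # can try another tool rather than crash the whole run.
        return f"No data for {city}; try searching the internet instead."

print(lookup_population("Tokyo"))     # Population of Tokyo: about 14 million
print(lookup_population("Tokyo"))     # served from cache, no second "call"
print(lookup_population("Atlantis"))  # graceful fallback message
```

<p>CrewAI's cross-agent cache works at a higher level, but the principle is the same: identical calls should not hit the underlying service twice.</p>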
<h3 id="heading-examples-of-tools">Examples of tools</h3>
<ol>
<li><p>Search the internet</p>
</li>
<li><p>Scrape a website: RAG over a website</p>
</li>
<li><p>Connect to a Database</p>
</li>
<li><p>Call an API</p>
</li>
<li><p>Send Notifications</p>
</li>
</ol>
<h3 id="heading-how-to-decide-the-tasks">How to decide the Tasks</h3>
<p>This section offers a mental framework for solving a complex problem. How we frame the Tasks and Agents is key to solving it.</p>
<ol>
<li><p>What is the goal? What is the process?</p>
</li>
<li><p>What kind of individuals do I need to hire to complete this task? These individuals become the agents.</p>
</li>
<li><p>What processes and tasks do I expect the individuals on my team to perform?</p>
<p> These become tasks for those agents.</p>
</li>
<li><p>A few other parameters, like clear and concise expectations and setting the context, make up a detailed task.</p>
</li>
</ol>
<p>CrewAI represents a significant advancement in AI, moving beyond simple language models to sophisticated AI agents capable of handling complex tasks autonomously. Key features include agentic automation, role-playing for context, focused tasks to reduce hallucination, and access to external tools. Cooperation among agents and the use of guardrails ensure efficiency and accuracy. CrewAI's memory system—short-term, long-term, and entity memory—enables agents to learn from past experiences. Multi-agent systems specialize in specific tasks and collaborate seamlessly. Essential components include agents, tasks, and crews. Effective tools should be versatile, fault-tolerant, and utilize caching strategies. CrewAI's approach helps solve complex problems by defining clear goals, identifying necessary agents, and outlining tasks and expectations.</p>
<p><strong>References:</strong></p>
<p><a target="_blank" href="https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io">Andrew Ng's Article on Agentic Design Patterns</a></p>
<p><a target="_blank" href="https://learn.deeplearning.ai/login?callbackUrl=https%3A%2F%2Flearn.deeplearning.ai%2Fcourses%2Fmulti-ai-agent-systems-with-crewai">Multi AI Agent Systems with CrewAI course on Deeplearning.ai</a></p>
]]></content:encoded></item><item><title><![CDATA[Steps to Evaluate Fairness in Machine Learning Models]]></title><description><![CDATA[Businesses prioritize metrics that impact their bottom line, while data scientists often focus on accuracy. However, bias in a model can lead to allocative harms (unequal distribution of benefits) and representation harms (downplaying certain groups)...]]></description><link>https://sudheerranjan.com/steps-to-evaluate-fairness-in-machine-learning-models</link><guid isPermaLink="true">https://sudheerranjan.com/steps-to-evaluate-fairness-in-machine-learning-models</guid><category><![CDATA[fairlearn]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Sudheer Ranjan]]></dc:creator><pubDate>Wed, 15 May 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://lh7-us.googleusercontent.com/Sai04Mjd5VLuaP5u24xer_U_9-mq8WvZ7yjT8zyMvLjBeT4wj2O6Hs4mZxajB3y1n3HNnxnOfH3u0SEIcw3cPRDiOksCHoxCuMOD9ZeGEnuur9KpAynwcKkEIEagoEVTKO0hz4fPvgVouTnzSOASwzK6Pw=s2048" alt="Meme on Fairness of ML solutions" class="image--center mx-auto" /></p>
<p>Businesses prioritize metrics that impact their bottom line, while data scientists often focus on accuracy. However, bias in a model can lead to allocative harms (unequal distribution of benefits) and representation harms (downplaying certain groups). These, coupled with opaque black-box models, can ultimately erode trust in AI.</p>
<p>Here, data scientists play a critical role in developing responsible AI/ML solutions. Cloud platforms offer various fairness testing tools to help identify and mitigate bias, and it is also possible to incorporate fairness testing directly in Python notebooks.</p>
<p>One powerful open-source tool I use for fairness assessment in my projects is <strong>Fairlearn</strong>, which quantifies bias in models with respect to sensitive features like age, gender, and race.</p>
<p>Below is a code template that will help you assess your model's fairness. It also shows an example of unequal distribution of count and selection rate among different groups.</p>
<pre><code class="lang-python">!pip install --no-cache-dir fairlearn
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score, precision_score
<span class="hljs-keyword">from</span> fairlearn.metrics <span class="hljs-keyword">import</span> (MetricFrame, false_positive_rate,
                              false_negative_rate, selection_rate, count)

metrics = {
    <span class="hljs-string">"accuracy"</span>: accuracy_score,
    <span class="hljs-string">"precision"</span>: precision_score,
    <span class="hljs-string">"false positive rate"</span>: false_positive_rate,
    <span class="hljs-string">"false negative rate"</span>: false_negative_rate,
    <span class="hljs-string">"selection rate"</span>: selection_rate,
    <span class="hljs-string">"count"</span>: count,
}
metric_frame = MetricFrame(
    metrics=metrics, y_true=df[<span class="hljs-string">'y_true'</span>], y_pred=df[<span class="hljs-string">'y_pred'</span>], sensitive_features=df[<span class="hljs-string">'race'</span>]
)
metric_frame.by_group.plot.bar(
    subplots=<span class="hljs-literal">True</span>,
    layout=[<span class="hljs-number">3</span>, <span class="hljs-number">3</span>],
    legend=<span class="hljs-literal">False</span>,
    figsize=[<span class="hljs-number">12</span>, <span class="hljs-number">8</span>],
    title=<span class="hljs-string">"Fairness testing"</span>,
)
</code></pre>
<p>Running the above code on a biased dataset produced the charts below.</p>
<p><img src="https://lh7-us.googleusercontent.com/4XyHA_rgNa4pjfQvVVLcu1DuSkqEfbID6O44BwVvytvC03knkyivFdMqBqNEMGWOLx8kI6XENggxV9_chmnpFGlOL6-L2hm1jTJuFxPi2fC7JECShucqeVUQiQvFFdap5FFkjoyYprJ_0Qq-VO2TZ2CqbQ=s2048" alt /></p>
<p>Notice how some metrics, like selection rate and false positive rate, favor certain groups within the population. You will also see from the count chart that minority groups make up a smaller share of the population. Even with good overall accuracy, bias against minority groups can undermine their rights. We can measure this only when we shift our focus to how our metrics impact people's lives.</p>
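<p>To make the selection rate metric concrete, here is a framework-free sketch of the per-group computation that <code>MetricFrame</code> performs (the data below is a toy example, not the dataset behind the charts):</p>

```python
from collections import defaultdict

def selection_rate_by_group(y_pred, groups):
    """Fraction of positive predictions within each sensitive group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

# Toy example: group "a" is selected 75% of the time, group "b" only 25%.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rate_by_group(y_pred, groups))  # {'a': 0.75, 'b': 0.25}
```

<p>A gap like this between groups is exactly the kind of disparity the bar charts above surface at a glance.</p>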
]]></content:encoded></item><item><title><![CDATA[Quick Guide to Speeding Up Docker Builds for ML Applications]]></title><description><![CDATA[Docker builds are essential for deploying ML applications in a consistent and portable way. However, slow build times can significantly hinder development and deployment workflows, as well as streamlined CI/CD pipelines. Here are some strategies I fo...]]></description><link>https://sudheerranjan.com/quick-guide-to-speeding-up-docker-builds-for-ml-applications</link><guid isPermaLink="true">https://sudheerranjan.com/quick-guide-to-speeding-up-docker-builds-for-ml-applications</guid><category><![CDATA[Docker]]></category><category><![CDATA[mlops]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Sudheer Ranjan]]></dc:creator><pubDate>Wed, 01 May 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Docker builds are essential for deploying ML applications in a consistent and portable way. However, slow build times can significantly hinder development and deployment workflows, as well as streamlined CI/CD pipelines. Here are some strategies I follow to significantly improve Docker build speed.</p>
<ol>
<li><p><strong>Docker Layer Caching:</strong></p>
<p> Docker builds are composed of layers, where each instruction creates a new layer. Docker cleverly caches layers that haven't changed, significantly reducing build time on subsequent runs.</p>
<p> <strong>Order Matters:</strong> In your Dockerfile, place frequently modified instructions (like copying your code) towards the end. This ensures frequently changing layers don't invalidate the cache for earlier layers (like installing libraries using pip and copying the requirements.txt). Install the libraries in earlier layers to leverage the cache effectively. Below is an example of how to implement this idea of docker caching.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716628378127/879cab78-62ad-43e8-accc-8d6519e58951.png" alt="Notice how this order has helped us to skip building of the first 5 layers using cache." class="image--center mx-auto" /></p>
<p> Notice how this order has helped us skip rebuilding the first five layers of the Dockerfile by reusing the cache.</p>
</li>
<li><p><strong>Minimize Package Installation using --no-install-recommends:</strong></p>
<p> By default, package managers like apt-get install both essential and recommended packages. Use the --no-install-recommends flag to install only the strictly necessary packages for your application. This reduces download size and build time.</p>
<pre><code class="lang-dockerfile"> <span class="hljs-keyword">RUN</span><span class="bash"> apt-get update &amp;&amp; apt-get install -y --no-install-recommends libgomp1</span>
 <span class="hljs-comment">#libgomp1 provides support for OpenMP, which is useful in parallel programming</span>
</code></pre>
</li>
<li><p><strong>Ignore unnecessary files for docker build:</strong></p>
<p> The .dockerignore file functions similarly to .gitignore but for Docker builds. Use it to exclude unnecessary files and folders from your build context. This avoids unnecessary copying during the build process, speeding things up. One common file that is not necessary during the training or inference pipeline is <code>ipynb_checkpoints</code>. Here is an example of how you can exclude them.</p>
<pre><code class="lang-plaintext"> **/*.ipynb_checkpoints/
</code></pre>
</li>
<li><p><strong>Choose a slim base image:</strong></p>
<p> Many base images like python:3.10 come with various pre-installed packages. Consider using slimmer alternatives like python:3.10-slim to minimize the image size and build time. These slim versions often contain only Python itself and essential libraries. (python:3.10-alpine is even smaller, but Alpine uses musl libc, so many prebuilt ML wheels may not install cleanly on it.)</p>
<pre><code class="lang-dockerfile"> <span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.10</span>-slim
 <span class="hljs-comment">#instead of FROM python:latest or FROM python:3.10</span>
</code></pre>
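<p>Putting the caching, package, and base-image tips together, a minimal Dockerfile for an ML service might look like this (the file names and entrypoint are illustrative):</p>

```dockerfile
# Slim base image keeps the footprint small
FROM python:3.10-slim

# Rarely-changing system packages go first so their layer stays cached
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy only requirements.txt before the code, so editing application
# code does not invalidate the pip-install layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently-changing application code comes last
COPY . .

CMD ["python", "app.py"]
```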
<p> I hope these strategies—optimizing layer caching, minimizing unnecessary package installations, excluding irrelevant files, and choosing a slim image—will help you save time and enable faster, more frequent ML deployments.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[How to choose the right Python base image for containerizing your ML application]]></title><description><![CDATA[One often-overlooked aspect of deploying Machine Learning solutions with Docker is selecting the right base image. Discovering vulnerabilities in base images post-deployment in security-conscious organizations may result in building the ML models aga...]]></description><link>https://sudheerranjan.com/how-to-choose-the-right-python-base-image-for-containerizing-your-ml-application</link><guid isPermaLink="true">https://sudheerranjan.com/how-to-choose-the-right-python-base-image-for-containerizing-your-ml-application</guid><category><![CDATA[mlops]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Python]]></category><category><![CDATA[AI]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Sudheer Ranjan]]></dc:creator><pubDate>Wed, 24 Apr 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>One often-overlooked aspect of deploying Machine Learning solutions with Docker is selecting the right base image. Discovering vulnerabilities in base images post-deployment in security-conscious organizations may result in building the ML models again. This can introduce additional time investment due to potential changes in library dependencies.</p>
<p>So, here are the two key factors I would consider when choosing a Python base image (an image illustrating them is attached below):</p>
<ol>
<li><p><strong>Security:</strong></p>
<p> <strong>Minimize attack surface:</strong> Start with the most lightweight base image that fulfills your requirements. Images like "python:alpine" and "python:slim" offer a solid foundation without unnecessary bloat. Remember, more libraries mean a larger potential attack surface. Popular "slim" variants include "slim-bullseye" and "slim-bookworm." You can always install missing libraries using apt or apk within the container. I often favor "slim" variants over Alpine Linux. While Alpine offers a smaller footprint, "slim" typically includes some pre-installed, commonly used libraries, reducing the need for additional installation steps.</p>
<p> <strong>Version Control:</strong> Use explicit version tags instead of the generic "latest" tag. The "latest" tag can point to different images over time, causing inconsistencies and potentially introducing vulnerabilities. Choose a base image with a known version that has been scanned for security issues. Consider official base images from trusted sources like Docker Hub.</p>
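<p>For example (the exact tag below is illustrative; pick a specific tag that your registry has scanned):</p>

```dockerfile
# Pin an explicit, scanned version instead of the moving "latest" tag
FROM python:3.10-slim-bookworm
# rather than: FROM python:latest
```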
<p> And you might ask how to find vulnerabilities in your already deployed solution. Usually, enterprises use plugins like PRISMA to help identify vulnerabilities. If your enterprise doesn't offer such tools, here is a Docker command you can run to find the vulnerabilities.</p>
<pre><code class="lang-bash"> docker scout cves &lt;docker-image&gt;
</code></pre>
</li>
<li><p><strong>Size Optimization:</strong></p>
<p> <strong>Minimize footprint:</strong> Smaller base images lead to smaller container sizes, which translates to faster deployment and lower resource consumption. This is particularly important for large-scale deployments.</p>
<p> Let's see how to keep these factors in mind when choosing a base image from <a target="_blank" href="https://hub.docker.com/_/python">Docker Hub</a>. The image below shows the vulnerabilities and sizes of the popular base images available on Docker Hub. Using the 'latest' tag can be risky in production because you don't know what might go wrong as the vulnerabilities are yet to be scanned. 'Slim' variants usually have a reasonable size and fewer vulnerabilities. For example, the 'Slim' variant has 0 critical and 2 high vulnerabilities. In some cases, you can remove some of these vulnerabilities.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716639936708/efcf9199-8e52-436b-b1d2-b6a4b49ed19a.png" alt class="image--center mx-auto" /></p>
<p> By focusing on minimizing the attack surface and optimizing the size of your base image, you can build a more secure and performant solution. Always use explicit version tags and rely on trusted sources for your base images. Regularly scan for vulnerabilities and stay informed about updates to maintain a robust deployment. With these considerations in mind, you can confidently deploy your ML applications with Docker, knowing they are secure and optimized.</p>
</li>
</ol>
]]></content:encoded></item></channel></rss>