- [04-01.2025]: ActionStudio is now open-source! Explore our paper and code for more information.
- [11.2024]: Add the latest examples and tokenizer info on interacting with xLAM models.
- [09.2024]: Join our Discord Community if you have any feedback!
- [09.2024]: Check our xLAM Technical Report Paper.
- [08.2024]: We are excited to announce the release of the full xLAM family, our suite of Large Action Models, from the "tiny giant" to industrial powerhouses. These models have achieved impressive rankings, placing #1 and #6 on the Berkeley Function-Calling Leaderboard. Check our Hugging Face collection.
- [07.2024]: We are excited to announce the release of our two function-calling models: xLAM-1b-fc-r and xLAM-7b-fc-r. These models have achieved impressive rankings, placing #3 and #25 on the Berkeley Function-Calling Leaderboard, outperforming many significantly larger models. Stay tuned for more powerful models coming soon.
- [06.2024]: Check our latest work APIGen, the best open-sourced models for function calling. Our dataset xlam-function-calling-60k was among the Top-3 trending datasets on HuggingFace, standing out in a field of 173,670 datasets as of July 4, 2024. See also the tweet by the Salesforce CEO and the coverage by VentureBeat and 新智元.
- [03.2024]: The xLAM model is released! Try it with the AgentLite benchmark or other benchmarks, where its performance is comparable to GPT-4.
- [02.2024]: Initial release of AgentOhana and the xLAM paper!
This repo is for research purposes only.
Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories.
This repo introduces xLAM, which aggregates agent trajectories from distinct environments spanning a wide array of scenarios. It standardizes and unifies these trajectories into a consistent format, which streamlines the creation of a generic data loader optimized for agent training. Building on this unified data, our training pipeline keeps the different data sources balanced and preserves independent randomness across devices during dataset partitioning and model training.
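The balancing and per-device randomness described above can be pictured with a small sketch. This is not the repository's data loader: the source names, the equal sampling weights, and the `base_seed + rank` seeding scheme are illustrative assumptions.

```python
import random
from typing import Dict, Iterator, List, Tuple


def balanced_stream(
    sources: Dict[str, List[str]],
    rank: int,
    base_seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, trajectory) pairs, picking each source with equal probability.

    Each device (rank) gets its own random generator, so devices draw
    independent sequences instead of replaying the same order.
    """
    # Assumption: the per-device seed is simply base_seed + rank.
    rng = random.Random(base_seed + rank)
    names = sorted(sources)
    while True:
        name = rng.choice(names)  # equal weight per source, regardless of its size
        yield name, rng.choice(sources[name])


# Hypothetical sources of very different sizes.
sources = {
    "webshop": [f"webshop-{i}" for i in range(1000)],
    "hotpotqa": [f"hotpotqa-{i}" for i in range(10)],
}


def take(rank: int, n: int) -> List[Tuple[str, str]]:
    """Collect the first n samples that a given device would see."""
    stream = balanced_stream(sources, rank)
    return [next(stream) for _ in range(n)]


counts = {"webshop": 0, "hotpotqa": 0}
for name, _ in take(rank=0, n=2000):
    counts[name] += 1
print(counts)  # both counts land near 1000 although the sources differ 100x in size
```

Sampling the source first and the trajectory second is what keeps a small source from being drowned out by a large one; a real pipeline would likely expose the per-source weights as configuration.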
Model | # Total Params | Context Length | Release Date | Category | Download Model | Download GGUF files |
---|---|---|---|---|---|---|
xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 | General, Function-calling | 🤗 Link | -- |
The xLAM series is significantly better at many tasks, including general tasks and function calling. At the same parameter count, the models have been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.
For example, xLAM-v0.1-r is version 0.1 of the Large Action Model series, with the "-r" indicating it's tagged for research. This model is compatible with the vLLM and FastChat platforms.
Below is an example of using the older xLAM-v0.1-r model:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-v0.1-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-v0.1-r", device_map="auto")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
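# Move the inputs to wherever device_map="auto" placed the model instead of hardcoding "cuda"
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)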
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note: You may need to tune the temperature setting for different applications. A lower temperature typically helps with tasks that require deterministic outcomes. For tasks that demand adherence to specific formats or function calls, explicitly include formatting instructions in the prompt.
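To see why a lower temperature pushes generation toward deterministic outcomes, the sketch below applies temperature scaling to a small set of logits. It uses plain Python rather than the model itself, and the logit values are made up for illustration.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.2 almost all probability mass sits on the top-scoring token, while at 2.0 the distribution is much flatter. With the Hugging Face `generate` call, sampling is controlled through the `do_sample` and `temperature` arguments.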
There are two main options for serving the xLAM model as an OpenAI-compatible chat completion API (here we use Salesforce/xLAM-8x7b-r and a 4xA100 (40GB) setup as an example):
vLLM offers efficient serving with lower latency. To serve the model with vLLM:
vllm serve Salesforce/xLAM-8x7b-r --host 0.0.0.0 --port 8000 --tensor-parallel-size 4
FastChat provides a more feature-rich serving setup. To serve with FastChat:
- Start the controller:
python3 -m fastchat.serve.controller --host 0.0.0.0
- Start the OpenAI-compatible API server:
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
- Launch the model worker:
python3 -m fastchat.serve.vllm_worker \
--model-names "Salesforce/xLAM-8x7b-r" \
--model-path Salesforce/xLAM-8x7b-r \
--host 0.0.0.0 \
--port 31005 \
--worker-address http://localhost:31005 \
--num-gpus 4 \
--limit-worker-concurrency 64
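Both setups above expose an OpenAI-compatible API on port 8000, so a plain HTTP request is one way to check that the server responds before wiring in a client. The sketch below uses only the Python standard library. The prompt text and the `build_chat_request` helper are illustrative, and the route assumes the standard OpenAI-style `/v1/chat/completions` path.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str,
    model: str,
    messages: List[Dict[str, str]],
    temperature: float = 0.0,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions route."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    base_url="http://localhost:8000/v1/",  # matches the --port 8000 used above
    model="Salesforce/xLAM-8x7b-r",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(request.full_url)

# Uncomment once the server is running:
# with urllib.request.urlopen(request, timeout=60) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])
```

The network call is left commented out so the snippet can be run without a live server; only the request construction is exercised.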
Once the model is served, you can use the following xLAM client to interact with it for function calling or other applications:
from xLAM.client import xLAMChatCompletion, xLAMConfig
# Configure the client
config = xLAMConfig(base_url="http://localhost:8000/v1/", model="Salesforce/xLAM-8x7b-r")
llm = xLAMChatCompletion.from_config(config)
# Example conversation
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather like in New York?"},
{"role": "assistant", "content": "To get the weather information for New York, I'll need to use the get_weather function.", "tool_calls": {"name": "get_weather", "arguments": '{"location": "New York", "unit": "fahrenheit"}'}},
{"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
{"role": "user", "content": "Now, search for the weather in San Francisco."}
]
# Example function definition (optional)
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "The city and state, e.g. San Francisco, New York"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
},
"required": ["location"]
}
},
{
"name": "search",
"description": "Search for information on the internet",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query, e.g. 'latest news on AI'"}
},
"required": ["query"]
}
},
{
"name": "respond",
"description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
"parameters": {
"type": "object",
"properties": {
"message": {"type": "string", "description": "The content of the message to respond to."}
},
"required": ["message"]
}
}
]
response = llm.completion(messages, tools=tools)
print(response)
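Before executing a function call proposed by the model, it can help to check it against the tool definitions. The sketch below is one hedged way to do that. The shape of the `response` object returned by the client is not shown here, so the hypothetical `check_tool_call` helper takes a dict in the same `{"name", "arguments"}` form used in the `tool_calls` field of the example conversation, and it covers only the `required` and `enum` keys rather than full JSON Schema.

```python
import json
from typing import Any, Dict, List


def check_tool_call(call: Dict[str, Any], tools: List[Dict[str, Any]]) -> List[str]:
    """Return the problems found in a tool call; an empty list means it looks valid."""
    by_name = {tool["name"]: tool for tool in tools}
    tool = by_name.get(call.get("name"))
    if tool is None:
        return [f"unknown tool: {call.get('name')!r}"]

    try:
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    except (KeyError, TypeError, json.JSONDecodeError):
        return ["arguments are missing or not valid JSON"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]

    schema = tool["parameters"]
    properties = schema.get("properties", {})
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in properties[name] and value not in properties[name]["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


# A trimmed copy of the get_weather definition from the example above.
weather_tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

good = {"name": "get_weather", "arguments": '{"location": "New York", "unit": "fahrenheit"}'}
bad = {"name": "get_weather", "arguments": '{"unit": "kelvin"}'}

print(check_tool_call(good, [weather_tool]))  # []
print(check_tool_call(bad, [weather_tool]))
# ['missing required argument: location', "invalid value for unit: 'kelvin'"]
```

Returning a list of problems instead of raising on the first one makes it easy to feed all issues back to the model in a single follow-up message.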
ActionStudio Framework
❤️❤️❤️ Please refer to ActionStudio.md for more details.
Install dependencies with:
conda create --name actionstudio python=3.10
# Activate the new environment so the dependencies install into it
conda activate actionstudio
bash requirements.sh
Development Version (Latest):
To use the latest code under active development, install ActionStudio in editable mode from the parent actionstudio directory:
pip install -e .
actionstudio/
├── datasets/ # Open-source unified trajectory datasets
├── examples/ # Usage examples and configurations
│ ├── data_configs/ # YAML configs for data mixtures
│ ├── deepspeed_configs/ # DeepSpeed training configuration files
│ └── trainings/ # Bash scripts for various training methods (**`README.md`**)
├── src/ # Source code
│ ├── data_conversion/ # Converting trajectories into training data (**`README.md`**)
│ └── criticLAM/ # Critic Large Action Model implementation (**`README.md`**)
└── foundation_modeling/ # Core modeling components
    ├── data_handlers/
    ├── train/
    ├── trainers/
    └── utils/
Most top-level folders include a README.md with detailed instructions and explanations.
LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct | BOLAA |
---|---|---|---|---|---|---|
Llama-2-70B-chat | 0.0089 | 0.0102 | 0.4273 | 0.2809 | 0.3966 | 0.4986 |
Vicuna-33B | 0.1527 | 0.2122 | 0.1971 | 0.3766 | 0.4032 | 0.5618 |
Mixtral-8x7B-Instruct-v0.1 | 0.4634 | 0.4592 | 0.5638 | 0.4738 | 0.3339 | 0.5342 |
GPT-3.5-Turbo | 0.4851 | 0.5058 | 0.5047 | 0.4930 | 0.5436 | 0.6354 |
GPT-3.5-Turbo-Instruct | 0.3785 | 0.4195 | 0.4377 | 0.3604 | 0.4851 | 0.5811 |
GPT-4-0613 | 0.5002 | 0.4783 | 0.4616 | 0.7950 | 0.4635 | 0.6129 |
xLAM-v0.1-r | 0.5201 | 0.5268 | 0.6486 | 0.6573 | 0.6611 | 0.6556 |
LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct |
---|---|---|---|---|---|
Mixtral-8x7B-Instruct-v0.1 | 0.3912 | 0.3971 | 0.3714 | 0.3195 | 0.3039 |
GPT-3.5-Turbo | 0.4196 | 0.3937 | 0.3868 | 0.4182 | 0.3960 |
GPT-4-0613 | 0.5801 | 0.5709 | 0.6129 | 0.5778 | 0.5716 |
xLAM-v0.1-r | 0.5492 | 0.4776 | 0.5020 | 0.5583 | 0.5030 |
Please note: All prompts provided by AgentLite are considered "unseen prompts" for xLAM-v0.1-r, meaning the model has not been trained with data related to these prompts.
LLM Name | Act | ReAct | BOLAA |
---|---|---|---|
GPT-3.5-Turbo-16k | 0.6158 | 0.6005 | 0.6652 |
GPT-4-0613 | 0.6989 | 0.6732 | 0.7154 |
xLAM-v0.1-r | 0.6563 | 0.6640 | 0.6854 |
LLM Name | Easy F1 Score | Easy Accuracy | Medium F1 Score | Medium Accuracy | Hard F1 Score | Hard Accuracy |
---|---|---|---|---|---|---|
GPT-3.5-Turbo-16k-0613 | 0.410 | 0.350 | 0.330 | 0.25 | 0.283 | 0.20 |
GPT-4-0613 | 0.611 | 0.47 | 0.610 | 0.480 | 0.527 | 0.38 |
xLAM-v0.1-r | 0.532 | 0.45 | 0.547 | 0.46 | 0.455 | 0.36 |
LLM Name | Unseen Insts & Same Set | Unseen Tools & Seen Cat | Unseen Tools & Unseen Cat |
---|---|---|---|
TooLlama V2 | 0.4385 | 0.4300 | 0.4350 |
GPT-3.5-Turbo-0125 | 0.5000 | 0.5150 | 0.4900 |
GPT-4-0125-preview | 0.5462 | 0.5450 | 0.5050 |
xLAM-v0.1-r | 0.5077 | 0.5650 | 0.5200 |
LLM Name | 1-step | 2-step | 3-step | 4-step | 5-step |
---|---|---|---|---|---|
GPT-4-0613 | - | - | - | - | 69.45 |
Claude-Instant-1 | 12.12 | 32.25 | 39.25 | 44.37 | 45.90 |
xLAM-v0.1-r | 4.10 | 28.50 | 36.01 | 42.66 | 43.96 |
Claude-2 | 26.45 | 35.49 | 36.01 | 39.76 | 39.93 |
Lemur-70b-Chat-v1 | 3.75 | 26.96 | 35.67 | 37.54 | 37.03 |
GPT-3.5-Turbo-0613 | 2.73 | 16.89 | 24.06 | 31.74 | 36.18 |
AgentLM-70b | 6.48 | 17.75 | 24.91 | 28.16 | 28.67 |
CodeLlama-34b | 0.17 | 16.21 | 23.04 | 25.94 | 28.16 |
Llama-2-70b-chat | 4.27 | 14.33 | 15.70 | 16.55 | 17.92 |
LLM Name | Success Rate | Progress Rate |
---|---|---|
xLAM-v0.1-r | 0.533 | 0.766 |
DeepSeek-67B | 0.400 | 0.714 |
GPT-3.5-Turbo-0613 | 0.367 | 0.627 |
GPT-3.5-Turbo-16k | 0.317 | 0.591 |
Lemur-70B | 0.283 | 0.720 |
CodeLlama-13B | 0.250 | 0.525 |
CodeLlama-34B | 0.133 | 0.600 |
Mistral-7B | 0.033 | 0.510 |
Vicuna-13B-16K | 0.033 | 0.343 |
Llama-2-70B | 0.000 | 0.483 |
This code is licensed under Apache 2.0. Models based on the deepseek model additionally require you to follow the use-based restrictions in the linked deepseek license. This is a research-only project.
We want to acknowledge the work that has contributed to our paper and to the agent research community! If you find our work useful, please consider citing:
@article{zhang2024agentohana,
title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
journal={arXiv preprint arXiv:2402.15506},
year={2024}
}
@article{liu2024apigen,
title={APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets},
author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
journal={arXiv preprint arXiv:2406.18518},
year={2024}
}
@article{zhang2024xlamfamilylargeaction,
title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and Liu, Zhiwei and Feng, Yihao and Awalgaonkar, Tulika and Murthy, Rithesh and Hu, Eric and Chen, Zeyuan and Xu, Ran and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
journal={arXiv preprint arXiv:2409.03215},
year={2024}
}