Comprehensive Guide to Integrating Tools and APIs with Language Models

Pranav Patel
LLMs

With the sudden rise of LLMs like GPT-4, we are seeing a massive boost in productivity. Language models have more utility than ever, and they are everywhere. This raises the need to integrate LLMs with external tools and APIs. ChatGPT with plugins solves this to some extent, but not fully.

Issues like integrating ChatGPT with private tools are still not fully solved; you have to build your own integration framework and then use ChatGPT on top of it. In this article we will cover how you can integrate tools and third-party APIs with GPT-4 using function calling, prompting techniques, and other methods.

Why do we need to integrate tools and APIs with LLMs

Language models are going to be everywhere. I like to call them “a unified UI” for a thousand tools. We use hundreds, if not thousands, of applications every month to get everyday things done. Tools like Excel, email systems, CRMs, project management tools, etc., can add a lot of friction to something as simple as replying to an email. If an LLM is tightly integrated with these tools, we can write single-line prompts and the LLM does all the tasks. Here are some reasons why we need to integrate tools with LLMs and how it helps.

Using Private LLMs with Private Data

As mentioned before, it is almost necessary to use LLMs with private data. Industries like healthcare, oil and gas, and government agencies cannot hand their data to OpenAI at all; they need self-hosted solutions built on models like LLaMA and Falcon. These self-hosted deployments don't have access to plugins the way ChatGPT does, so teams have to build their own pipelines and plugins, which is difficult and time-consuming.

However, if done properly, using self-hosted LLMs with private data can solve many issues and boost productivity. For example, in the healthcare industry, integrating LLMs with electronic health records (EHRs) can assist doctors and medical professionals in analyzing patient data and providing more accurate diagnoses. This can save time and improve patient outcomes.

Higher Utility

Integrating tools with LLMs can increase the utility of the tools AND the LLMs. With LLMs in the loop, users can perform tasks more efficiently and accurately, reducing the need for manual labor and for juggling multiple tools. A single command to an LLM and everything gets taken care of. This can lead to cost savings for businesses and increased productivity for individuals.

Better, More Accurate Responses

A simple yet VERY effective reason to integrate tools with LLMs is to increase the quality of the responses. LLMs on their own are simply text-generation machines: very powerful, but lacking proper context. You can ask questions like “How to write a good proposal?” but not “How to write a good proposal to sell my services to Mercity?”, because the LLM has no context for what your services are or who Mercity is. This is where the need for custom context arises.

Answering questions based on private data is actually much simpler. We have written an excellent guide on how to integrate custom private data with GPT-4; you can check it out. We use an embeddings-based retrieval system to extract the relevant chunks of text for answering questions over private data.
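As a very rough sketch of that idea, here is what embeddings-based retrieval can look like with the legacy openai Python SDK; the embedding model name and the sample chunks are illustrative assumptions:

```python
# Sketch: embeddings-based retrieval over private text chunks (legacy v0.x openai SDK).
import numpy as np
import openai

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

# Illustrative private "documents", pre-chunked.
chunks = ["Mercity builds custom AI pipelines.", "Our services include LLM integration."]
chunk_vecs = embed(chunks)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed([query])[0]
    # Cosine similarity between the query and every chunk.
    scores = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("What does Mercity do?"))
```

The retrieved chunks are then prepended to the prompt so the model answers with the right context.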

Single Interface for Multiple Tools

As mentioned before, LLMs can act as a unified UI for multiple tools. This is becoming necessary because the number of tools we use is increasing rapidly, and information and knowledge are spread out across different platforms. LLMs can reduce the friction of these multi-platform tasks. Language models like GPT-4 are smart enough to string function calls together and use multiple tools in a chain: collecting data, planning, and executing the given task.

For example, say you need to extract meeting notes from sales calls, write a proposal, and send it over to the client. An LLM can find the meeting transcripts, extract notes from them, write a proposal, and have it ready for your review, saving around one to two hours. Once the LLM has drafted the proposal, all you need to do is make any necessary changes and send it. You only review the model's output, while the model takes care of the research and compiling of the proposal, which is the time-consuming part.

How to Integrate Tools and APIs with LLMs

At Mercity, we have built our own pipelines for tool integration. At the core is an LLM paired with a tool-use prompt. Note that the LLM here can be any capable instruction-following language model. GPT-4 is the smartest and best model out there, but bigger open models like LLaMA 70B and Falcon 180B can also be used. Models just have to be smart enough to follow the prompts and generate well-structured outputs.

Let’s break this pipeline down step by step.

Large Language Model

A language model like GPT-4 is at the core of it all; it acts as the user interface for the tools and the APIs we want to integrate. The model doesn't necessarily need to be finetuned for chatting, but it is better if it is. In our findings, larger models work better for these applications, simply because they are better instruction followers and much better at maintaining multi-step conversations without losing the thread. Smaller models like LLaMA-13B can be finetuned to a great degree to follow specific tool-use prompts using PEFT techniques. Specifically, prefix tuning and IA3 are very popular for tuning LLMs on smaller datasets, roughly as sketched below.
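As an illustration, here is a minimal sketch of attaching an IA3 adapter with Hugging Face's peft library; the model name and target module names are assumptions you would adjust for your architecture:

```python
# Minimal sketch: wrap a smaller causal LM with an IA3 adapter via Hugging Face peft.
# The model name and module names are assumptions; match them to your architecture.
from peft import IA3Config, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],  # attention and FFN projections
    feedforward_modules=["down_proj"],                 # treated as feedforward by IA3
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of the weights get trained
```

The adapter is then trained on a small dataset of tool-use demonstrations while the base weights stay frozen.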

Once the LLM is selected and validated to follow instructions properly, we can tightly integrate it with a tool database and a tool-use prompt.

API Tool Database

A tool database is simply a collection of all the tools you have or want to use with the language model. It can be a plain list of tools and APIs in text, or a much more sophisticated, dynamically fetched pipeline. Most of the time we use simple text: the name and description of each API, along with how and when to use it, provided to the LLM in the form of a tool library.

When providing an API, we abstract it as a function call and use its arguments to construct a schema for the API call.

Here’s what a demo tool library would look like:

Tool Library:

- web_search(query) - Use this function to find information on the Internet. It's a general search function that can be applied to almost any topic. You pass the query string here. Make sure your queries are precise.

- embedding_database_search(query) - This function is specifically designed for retrieving information. You can use this tool over others to find information about very specific personal topics. In `query` pass the information you want to find about a topic.

- wikipedia_search(query) - Use this function when the information needed can be found on Wikipedia. It serves as a direct conduit to this comprehensive knowledge base. This provides extensive knowledge on a specific topic. Use this function accordingly.
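As a rough sketch, such a library can be rendered from a simple registry so the prompt never drifts from the actual implementations; all names and stub bodies below are illustrative:

```python
# Sketch: keep tools in a registry and render the text tool library from it.
# The tool functions are stubs standing in for real search backends.
from typing import Callable, Dict, Tuple

TOOL_REGISTRY: Dict[str, Tuple[Callable[[str], str], str]] = {
    "web_search": (
        lambda q: f"<web results for {q!r}>",
        "Use this function to find information on the Internet.",
    ),
    "embedding_database_search": (
        lambda q: f"<private documents matching {q!r}>",
        "Use this tool to find information about very specific personal topics.",
    ),
    "wikipedia_search": (
        lambda q: f"<wikipedia summary for {q!r}>",
        "Use this function when the information needed can be found on Wikipedia.",
    ),
}

def render_tool_library() -> str:
    lines = ["Tool Library:", ""]
    for name, (_, description) in TOOL_REGISTRY.items():
        lines.append(f"- {name}(query) - {description}")
    return "\n".join(lines)

print(render_tool_library())
```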

Tool Use Prompt

Once we have the tool library ready, we can put it in the tool-use prompt and explain to the LLM how to call and use the tools. This is a very important part of the pipeline, as it determines exactly how and when the language model will use each tool. There are multiple prompting techniques that can be used here, but at Mercity we like to use our own self-built prompts. We will now take a deeper look at the prompting techniques for tool use.

Prompting for Tool Use and API Calling

As said above, this is perhaps the most important part. LLMs need to be prompted properly on how to use the tools you have provided and when to use them. We need to craft the right prompt for this. There are many ways to do it, but we like to use the most basic technique.

We simply provide the tool library to the model and ask the model to output in a very specific format so that we can parse and use the tool when needed. This is what the prompt looks like when combined with the aforementioned tool library:

You are a masterful tool user, you must assist the user with their queries, but you must also use the provided tools when necessary.

You must reply in a concise and simple way and must always follow the provided rules.

===========================================================

Tool Library:

- web_search(query) - Use this function to find information on the Internet. It's a general search function that can be applied to almost any topic. You pass the query string here. Make sure your queries are precise.

- embedding_database_search(query) - This function is specifically designed for retrieving information. You can use this tool over others to find information about very specific personal topics. In `query` pass the information you want to find about a topic.

- wikipedia_search(query) - Use this function when the information needed can be found on Wikipedia. It serves as a direct conduit to this comprehensive knowledge base. This provides extensive knowledge on a specific topic. Use this function accordingly.



===========================================================

To use these tools you can output func_name(query) in the middle of a generation, or output ONLY the function call.

Example outputs:

- The current president of the United States of America is web_search("Who is the current president of United States")

- wikipedia_search("Joe Biden") = You can output like this when user wants extensive detail on a specific topic or person.



===========================================================

Note that you must always follow the provided rules and output in the given manner. Using a function is not always necessary; use one only when needed.

This is what the outputs look like from this prompt:

You can see that the model was able to properly identify when it needed to call the function. It did not call the function when I asked it about the topics I could write on, but did call the function when it was absolutely needed.

Even more so, it was properly able to identify when it needed to query my personal documents and was able to write an excellent query to use the embedding search tool with:

This method of prompting is extremely easy and works beautifully. In our experiments, we have seen issues arise when you try to combine this method with already very long prompts: it gets much harder to make the model output in the proper format, and the accuracy of tool use drops. These issues are largely seen with GPT-3.5, not GPT-4; GPT-4 is much better at following complicated formats and instructions.
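To make the parsing step concrete, here is a rough sketch of how such outputs can be scanned for tool calls and dispatched; the regex and the stub tools are illustrative assumptions:

```python
# Sketch: detect tool calls of the form func_name("query") in the model output,
# run them against stub tools, and splice the results back into the text.
import re

# Illustrative stand-ins for real tool implementations.
TOOLS = {
    "web_search": lambda q: f"<web results for {q!r}>",
    "embedding_database_search": lambda q: f"<private documents matching {q!r}>",
    "wikipedia_search": lambda q: f"<wikipedia summary for {q!r}>",
}

TOOL_CALL = re.compile(r'(\w+)\("([^"]*)"\)')

def run_tool_calls(model_output: str) -> str:
    """Replace every func_name("query") in the output with that tool's result."""
    def dispatch(match: re.Match) -> str:
        name, query = match.groups()
        return TOOLS[name](query) if name in TOOLS else match.group(0)
    return TOOL_CALL.sub(dispatch, model_output)

print(run_tool_calls('The current president is web_search("current US president")'))
```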

ReAct-Based Prompting

There is another, better prompting method called ReAct, which has been popular lately. ReAct breaks the LLM output down into 3 parts: Thought, Act, and Observation. Here is a breakdown of these parts:

  • Thought: This is the part where the LLM THINKS about what it needs to do. The model analyzes the input and generates a rough plan of what to do.
  • Act: This is the action part. Based on the thought, the LLM now acts. This can be using a tool, calling an API, or interacting with something else; it can also be left blank if no action is needed.
  • Observation: In the end, based on the thought and the output of the action, an observation is made. This observation can be the answer to a question or the start of yet another ReAct chain.

Here is an application of this prompt:

You can see that the LLM was able to correctly identify the need for the tool and call it accordingly. ReAct prompting is better than the basic prompting we showed above because it allows the model to analyze the input before providing an output, and this boosts accuracy. The only downside is a slight increase in token usage, but the gains in accuracy and control over outputs make that worth it.

Here is the SYSTEM prompt we use:



You are a masterful tool user, you must assist the user with their queries, but you must also use the provided tools when necessary.

You must reply in a concise and simple way and must always follow the provided rules.

===========================================================

Tool Library:

- web_search(query) - Use this function to find information on the Internet. It's a general search function that can be applied to almost any topic. You pass the query string here. Make sure your queries are precise.

- embedding_database_search(query) - This function is specifically designed for retrieving information. You can use this tool over others to find information about very specific personal topics. In `query` pass the information you want to find about a topic.

- wikipedia_search(query) - Use this function when the information needed can be found on Wikipedia. It serves as a direct conduit to this comprehensive knowledge base. This provides extensive knowledge on a specific topic. Use this function accordingly.



===========================================================

This is the format you need to output in:

Thought: THINK AND ANALYZE THE INPUT, GOAL AND SITUATION

Act: If you need to call a tool or use a function, you can do it here: func(query). If there is no need to use a tool, leave this empty. The output of the function will be provided here.

Observation: Based on the Thought and the results of the Act, provide a reply. If you are using a tool, there is no need to output this.

Example Acts to use tools:

- The current president of the United States of America is web_search("Who is the current president of United States")

- wikipedia_search("Joe Biden") = You can output like this when the user wants extensive detail on a specific topic or person.



===========================================================

Note that you must always follow the provided rules and output in the given manner. Using a function is not always necessary, use it only when needed.
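Under the hood, a small driver loop runs this prompt: it watches the Act line, executes the tool, feeds the result back, and lets the model continue. Here is a rough sketch under those assumptions; `generate` stands for whatever LLM call you use, and the stub TOOLS dict mirrors the earlier sketch:

```python
# Sketch of a ReAct driver loop: parse the Act line, run the tool, append the
# result to the conversation, and repeat until no tool call is made.
import re

TOOLS = {"web_search": lambda q: f"<web results for {q!r}>"}  # stub tools
TOOL_CALL = re.compile(r'(\w+)\("([^"]*)"\)')

def react_loop(generate, system_prompt: str, user_input: str, max_rounds: int = 5) -> str:
    """`generate` is any callable mapping a full prompt string to model text."""
    history = [user_input]
    output = ""
    for _ in range(max_rounds):
        output = generate(system_prompt + "\n\n" + "\n".join(history))
        act_line = next((l for l in output.splitlines() if l.startswith("Act:")), "")
        match = TOOL_CALL.search(act_line)
        if not match:
            return output  # no tool call: the Observation is the final reply
        name, query = match.groups()
        result = TOOLS.get(name, lambda q: "unknown tool")(query)
        history += [output, f"Tool result: {result}"]
    return output
```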

Function Calling

Function calling is a feature released by OpenAI. It allows you to integrate chat models like GPT-3.5 and GPT-4 directly with the functions you want to call. You provide the schema of your functions or APIs, and the model will call the provided functions when needed.

Function calling is the go-to approach and probably the first step to take if you are looking to integrate an API with GPT-4 or GPT-3.5.

Here is an example of how you pass the schema of your function to the model:
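Below is a minimal sketch using the openai Python SDK (the legacy v0.x ChatCompletion-style API, current when function calling launched); the get_current_weather function is an illustrative assumption, not a real API:

```python
# Sketch: passing a function schema to an OpenAI chat model (legacy v0.x SDK).
# `get_current_weather` is an illustrative example function.
import json
import openai

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name, e.g. Paris"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    functions=functions,
    function_call="auto",  # let the model decide whether to call a function
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    name = message["function_call"]["name"]
    args = json.loads(message["function_call"]["arguments"])
    print(name, args)  # e.g. get_current_weather {'location': 'Paris'}
```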

The model will use the function as needed: when it decides a call is required, the response message carries a function_call field with the function name and JSON-encoded arguments, which you execute yourself and pass back in a follow-up message.

How effective is Function Calling for Tool Use?

In our experience, OpenAI models usually work well with function calling. But as the number of functions grows, and as you add functions that are more custom to your needs, the quality drops quickly and drastically. We have also found that token usage increases greatly, because OpenAI injects the function schemas into the system prompt, and a JSON schema takes up far more tokens than ReAct or simple tool-use prompting.

Many users have also reported that models sometimes hallucinate and output random function names. Here is a good forum post that shows how unreliable function calling can be: Function Calling Very Unreliable.

Training LLMs to Use Tools

Toolformer

Toolformer is a model by Meta trained to decide which API to call and then call it. Meta trained this model specifically for tool use and has shown great results. The model emits a function call and stops generating; the tool-use pipeline then provides the answer, and generation continues.

This approach, even though simple, is quite effective, but it has major issues. For example, it works as long as the responses from the tools are short. If the responses grow in length, the quality of the outputs starts dropping. And most of the time the responses from tools are going to be long and complicated, which can lead to context overflow and the model forgetting what was originally being talked about.

Gorilla LLM

Gorilla LLM is a large language model from UC Berkeley and Microsoft that can generate API calls from natural language queries. Gorilla can understand and use over 1,600 APIs from various domains, such as machine learning, cloud computing, and web development.

Gorilla is trained on three massive machine learning hub datasets: Torch Hub, TensorFlow Hub, and HuggingFace. It also uses a document retriever to adapt to changes in API documentation and provide accurate, up-to-date results. Gorilla outperforms other large language models, such as GPT-4, ChatGPT, and Claude, at writing API calls.

Use Cases of LLMs integrated with APIs and other Tools

Now that we have discussed how to connect APIs and Tools with LLMs, let’s talk about some of the use cases we have for this.

Integrating with Email Service Providers

This is perhaps the most obvious one. Email inboxes have become very messy, with hundreds of emails coming in every day, which makes it incredibly difficult to process information properly and reply in a timely manner. We already have spam filters, but they do not help clean up the mess in our inboxes.

LLMs can be paired with these inboxes to read your emails, provide summaries, prioritize which emails to reply to, and even reply to your emails if allowed to. Even very small, 3-billion-parameter models can be deployed to take care of these tasks.

You can build a private assistant to take care of your emails end to end using an LLM connected to the Gmail API, or via IMAP and SMTP.
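As a rough sketch of the IMAP route, here is how unread mail can be pulled and handed to a model; summarize_with_llm is a hypothetical stub standing in for whichever LLM you deploy:

```python
# Sketch: fetch unread emails over IMAP and hand them to an LLM for triage.
import email
import imaplib

def summarize_with_llm(prompt: str) -> str:
    """Hypothetical helper: call whichever LLM you deploy (stubbed here)."""
    return f"<summary of: {prompt[:60]}...>"

def fetch_unread(host: str, user: str, password: str) -> list[str]:
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        messages = []
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])
            messages.append(f"From: {msg['From']}\nSubject: {msg['Subject']}")
        return messages

for item in fetch_unread("imap.example.com", "me@example.com", "app-password"):
    print(summarize_with_llm(f"Summarize and prioritize this email:\n{item}"))
```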

Integrating with CRM Systems

Customer relationship management tools are extremely messy. Multiple teams use them, from sales to marketing to support. CRMs store customer data, call transcripts, feedback, and a ton of other data, and this data needs to be shared across teams. Hence, maintaining CRMs can be very complicated for everyone.

LLMs can be integrated with CRM APIs to simplify a ton of workflows. For example, they can generate meeting notes, which saves salespeople a ton of time. They can extract valuable insights from support and customer meetings for marketing teams and put everything in a proper, consumable format.

LLMs can compress information spread across CRMs and generate simple reports for pretty much anything, or for any specific customer you have.

Integrating with CMS

Similar to CRMs, content management systems and pipelines have multiple functionalities, from creating and writing content to editing to SEO optimization. Language models can easily be integrated with every part of these pipelines.

LLMs can be used to generate content, edit it, and remove any unnecessary parts. LLM agents can also be deployed to plan and generate content outlines, then go ahead and generate the actual content and publish it.

WordPress APIs are among the best and easiest to integrate with LLMs, as you can access almost every part of the pipeline.
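As a rough sketch, here is how an LLM-drafted post could be pushed through the WordPress REST API; the site URL and credentials are placeholders, while the /wp/v2/posts endpoint is the standard one:

```python
# Sketch: publish an LLM-drafted post as a draft via the WordPress REST API.
# Site URL and credentials are placeholders; /wp-json/wp/v2/posts is standard.
import requests

def publish_draft(site: str, user: str, app_password: str, title: str, body: str) -> int:
    resp = requests.post(
        f"{site}/wp-json/wp/v2/posts",
        auth=(user, app_password),  # WordPress application password
        json={"title": title, "content": body, "status": "draft"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

post_id = publish_draft(
    "https://example.com", "editor", "app-password",
    "LLM-generated outline", "<p>Draft body generated by the model.</p>",
)
print("Created draft post", post_id)
```

Keeping the status as draft leaves a human review step before anything goes live.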

Challenges of Integrating Tools with LLMs

Even though we have outlined many approaches to integrating LLMs with tools, there are still many challenges that make this task difficult. Let’s go over them.

Context Overflow

As seen with Toolformer and OpenAI function calling, context overflow is a big issue. It happens because we need to prompt the model with the tools we want to integrate: tool names, descriptions, examples of how and when to use each one, and more. This can lead to major issues like a reduction in available output length, because the prompt itself is so long, or a significant increase in token usage costs.

Accuracy drops as the number of tools grows

This is pretty evident. As the number of tools integrated with an LLM increases, maintaining accuracy and efficiency becomes difficult. If your tools are very similar to each other, or if you don’t provide good enough examples, the model can get confused and call the wrong function at times, or not call a function at all! This can often be fixed with better prompting.

Latency

Latency is a significant challenge when integrating tools with LLMs: the pipeline has to wait for the tools to process data and return results. High latency can delay decision-making and negatively impact the user experience. This is particularly problematic in real-time applications, where delays of even a few seconds can cause significant issues.

Trust

This is not a huge issue, but if you are using the model to generate code or to act on your behalf, you need to trust the model. If the model, for example, replies incorrectly or deletes the wrong files from your folder, it can cause major problems. This can be mitigated simply by keeping a human in the loop to review the steps taken by the LLM.

Want to integrate your tools with LLMs?

If you want to integrate your own email systems, CRMs, or any other APIs or internal or external tools with private or public LLMs, contact us. We have a ton of experience building applications that require tight integration of tools with models like LLaMA, GPT-4, etc.
