Input and Output

GPTBots Agent supports multiple input and output message types: text, images, audio, video, documents, and files. On the input side, developers can choose which message types users may submit to the Agent, based on business needs, and set the submission method (single-turn or interruptible conversation). On the output side, developers can configure the Agent's output language, message type, TTS voice tone, and whether tool invocation status is shown to users, improving user experience and interaction efficiency.

Message Types

The GPTBots platform defines six message types: text, images, audio, video, documents, and files. Message types serve as the communication protocol between the Agent and users. The message types the Agent accepts as input depend on the configuration of the "Input-Attachment" feature; the message types it can output depend on the capabilities of the LLM the Agent uses.

Message Type	Message Format	Size Limit
Text Message	String; always supported	Limited by the LLM's context length in tokens
Image Message	.jpg, .jpeg, .png, .gif, .webp, etc.	≤ 10 MB
Audio Message	.wav, .mp3, etc.	≤ 25 MB
Video Message	.mp4, etc.	≤ 50 MB
Document Message	.pdf, .txt, .docx, .xls, .csv, .html, .json, .md, etc.	≤ 25 MB
Files Message	.zip by default; the file URL is always placed in the Text Message	≤ 25 MB

Note: The format support for Image, Audio, Video, and Document messages varies depending on the choice of "System Recognition" and "LLM Recognition."
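The limits in the table above can be checked on the client side before upload. The following is a minimal sketch, not platform code: the function name `classify_attachment` and the `MESSAGE_TYPES` mapping are hypothetical, the extension lists only cover the formats named in the table (which ends each list with "etc."), and reading "MB" as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the table's "MB" is read as 2**20 bytes

# Extensions and size limits copied from the table above. The table ends each
# list with "etc.", so this mapping is illustrative, not exhaustive.
MESSAGE_TYPES = {
    "image": ({".jpg", ".jpeg", ".png", ".gif", ".webp"}, 10 * MB),
    "audio": ({".wav", ".mp3"}, 25 * MB),
    "video": ({".mp4"}, 50 * MB),
    "document": (
        {".pdf", ".txt", ".docx", ".xls", ".csv", ".html", ".json", ".md"},
        25 * MB,
    ),
    "file": ({".zip"}, 25 * MB),
}


def classify_attachment(filename: str, size_bytes: int) -> str:
    """Return the message type for a file, or raise ValueError if rejected."""
    ext = Path(filename).suffix.lower()
    for message_type, (extensions, limit) in MESSAGE_TYPES.items():
        if ext in extensions:
            if size_bytes > limit:
                raise ValueError(
                    f"{filename}: exceeds the {limit // MB} MB limit "
                    f"for {message_type} messages"
                )
            return message_type
    raise ValueError(f"{filename}: extension {ext!r} is not in this sketch's table")
```

Which formats are actually accepted still depends on the recognition method chosen, as the note above states, so a check like this can only be a first filter.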

After an attachment is uploaded, the Agent's file recognition process is as follows:

flowchart TD
    A[User Input Message] --> B{Select Supported Message Type}
    B --> |Text| C[Text Processing]
    B --> |Image| D[Image Processing]
    B --> |Audio| E[Audio Processing]
    B --> |Video| F[Video Processing]
    B --> |Document| G[Document Processing]
    B --> |File| H[File Processing]
    C & D & E & F & G & H --> I[Unified Transmission to Agent]
    I --> J{File Recognition Method}
    J --> K[LLM File Recognition]
    J --> L[System File Recognition]

After a user submits a message, the Agent's response process for the different output message types is as follows:

flowchart TD
    A[User Request] --> B[Agent Processing]
    B --> C{Output Message Type}
    C --> D1[Text Message]
    D1 -- Need TTS? --> E{TTS Voice Generation?}
    E -- No --> F1[Directly Output Text]
    E -- Yes --> F2[Invoke TTS Service]
    F2 --> G1[Output Audio Message]
    C --> D2[Audio Message]
    D2 --> G2[Directly Output Audio]
    C --> D3[Image Message]
    D3 --> G3[Directly Output Image]

Input Guide

Voice

GPTBots supports voice input: users can either record with a microphone or upload an audio file. Developers can choose one of three options for voice input:

When "Disable" is selected, the Voice Recording button in the Agent's input box is hidden and users cannot use voice input.
When "Speech-to-Text" is selected, the Voice Recording button is displayed in the Agent's input box and users can record voice input. The ASR model is invoked to convert the recording into a text message.
When "Submit Audio Message" is selected, the Voice Recording button is displayed in the Agent's input box and users can record voice input. The audio file is submitted to the LLM in the Agent for direct recognition and processing.

Note: The "Submit Audio Message" option is available only if the LLM in the Agent can recognize audio files. For FlowAgent, availability is determined by the intersection of the file recognition capabilities of all LLMs it uses.
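The intersection rule for FlowAgent can be illustrated with plain set operations. This is a sketch under stated assumptions: the function name `flow_agent_capabilities`, the model names, and the capability sets are all hypothetical and do not reflect any real model or platform API.

```python
def flow_agent_capabilities(llm_capabilities: dict[str, set[str]]) -> set[str]:
    """Return the message types that every LLM in the flow can recognize."""
    capability_sets = list(llm_capabilities.values())
    if not capability_sets:
        return set()
    return set.intersection(*capability_sets)


# Hypothetical models and capabilities, for illustration only.
capabilities = {
    "model-a": {"image", "audio", "document"},
    "model-b": {"image", "audio"},
    "model-c": {"image"},
}

shared = flow_agent_capabilities(capabilities)
# "model-c" cannot recognize audio, so under the intersection rule
# "Submit Audio Message" would not be available for this flow.
audio_option_available = "audio" in shared
```

A single model without audio support is enough to remove the option for the whole flow, which is the practical consequence of using an intersection rather than a union.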

Attachments

The GPTBots attachment feature lets you select an attachment recognition scheme and choose which message types to support, to fit different business scenarios. Three schemes are available: "Disable," "LLM File Recognition," and "System File Recognition."

When "Disable" is selected:
- The Attachment Upload button in the Agent's input box is hidden and users cannot upload files as attachments.
When "LLM File Recognition" is selected:
- The Attachment Upload button will be displayed in the Agent's input box, allowing users to upload various files via attachments.
- Supported file types: Determined by the file recognition capabilities of the LLM adopted by the Agent. For FlowAgent, it is determined by the intersection of the file recognition capabilities of all LLM models.
- After successfully uploading a file, it will be directly recognized and processed by the LLM in the Agent.
When "System File Recognition" is selected:
- The Agent will recognize and extract the uploaded attachment, convert it into a text message, and submit it as a user query to the LLM in the Agent.
- Supported file types: Currently determined by the file recognition capabilities of the GPTBots platform.
Number of Attachments:
- The maximum number of attachments is 9; the default is 1.
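The three schemes and the attachment limit can be modeled as a small configuration object. The sketch below is illustrative only: `AttachmentScheme`, `AttachmentConfig`, and `plan_upload` are hypothetical names, the returned strings are placeholders rather than platform responses, and the lower bound of 1 attachment is an assumption (the text states only the maximum of 9 and the default of 1).

```python
from dataclasses import dataclass
from enum import Enum

MAX_ATTACHMENTS = 9  # system maximum stated in the text


class AttachmentScheme(Enum):
    DISABLE = "Disable"
    LLM = "LLM File Recognition"
    SYSTEM = "System File Recognition"


@dataclass
class AttachmentConfig:
    scheme: AttachmentScheme
    max_attachments: int = 1  # default stated in the text

    def __post_init__(self) -> None:
        # Assumption: at least 1 attachment must be allowed when configured.
        if not 1 <= self.max_attachments <= MAX_ATTACHMENTS:
            raise ValueError(f"max_attachments must be 1..{MAX_ATTACHMENTS}")


def plan_upload(config: AttachmentConfig, filenames: list[str]) -> str:
    """Describe how an upload would be handled under the given scheme."""
    if config.scheme is AttachmentScheme.DISABLE:
        return "rejected: attachment upload is disabled"
    if len(filenames) > config.max_attachments:
        return "rejected: too many attachments"
    if config.scheme is AttachmentScheme.LLM:
        return "pass files directly to the LLM"
    return "extract text and submit it as the user query"
```

For example, `plan_upload(AttachmentConfig(AttachmentScheme.SYSTEM, 3), ["a.pdf", "b.csv"])` takes the text-extraction path, while the same call with `AttachmentScheme.LLM` hands the files to the model unchanged.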

Message Submission Methods

Single-Turn Mode: Only one message can be submitted at a time; the next message can be submitted only after the AI response is complete.
Interruptible Conversation Mode: In specific scenarios, users can submit several messages in quick succession and the AI gives a single, unified response. This is closer to how people naturally communicate and improves the user experience.

When the interruptible conversation feature is enabled, messages are merged and submitted to the LLM together if all three of the following conditions hold: the AI response has not completed, the messages arrive within 5 seconds, and there are at most 5 messages.
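The three merge conditions can be sketched as a small batching helper. This is not the platform's implementation: `PendingBatch`, `can_merge`, and `submit` are hypothetical names, and measuring the 5-second window from the first message in the batch is an assumption, since the text does not say where the window starts.

```python
from dataclasses import dataclass, field
from typing import Optional

MERGE_WINDOW_SECONDS = 5.0  # "within 5 seconds"
MAX_MERGED_MESSAGES = 5     # "up to 5 messages"


@dataclass
class PendingBatch:
    started_at: float  # arrival time of the first message in the batch
    messages: list[str] = field(default_factory=list)


def can_merge(batch: PendingBatch, now: float, response_completed: bool) -> bool:
    """Check the three conditions for adding one more message to the batch."""
    return (
        not response_completed
        and now - batch.started_at <= MERGE_WINDOW_SECONDS  # assumed window start
        and len(batch.messages) < MAX_MERGED_MESSAGES
    )


def submit(
    batch: Optional[PendingBatch], text: str, now: float, response_completed: bool
) -> PendingBatch:
    """Append to the current batch if mergeable, otherwise start a new one."""
    if batch is not None and can_merge(batch, now, response_completed):
        batch.messages.append(text)
        return batch
    return PendingBatch(started_at=now, messages=[text])
```

Passing timestamps in explicitly, rather than reading a clock inside the helper, keeps the merge rule easy to test with fixed values.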

Output Guide

Output Control

Agent Output Language: Sets the Agent's output language, within the language capabilities of the LLM. Language control is a soft guide and is not guaranteed to take effect in every response.
Tool Invocation Status: Supports hiding/displaying the process status of tool invocation, which can be set based on business needs.
Workflow Invocation Status: Supports hiding/displaying the process status of workflow invocation, which can be set based on business needs.

Voice

Disable: The Agent does not support TTS output.
TTS Voice Generation: Supports customizing the selection of TTS model services and voice tones, converting text messages into sound for playback.

Note: TTS voice generation applies only to Text Messages in the Agent's reply. Other message types do not support TTS voice generation.
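The rule that TTS applies only to text replies can be expressed as a simple routing function that mirrors the output flowchart earlier in this document. The function name `output_plan` and the returned strings are hypothetical placeholders, not platform values.

```python
def output_plan(message_type: str, tts_enabled: bool) -> str:
    """Describe how a reply of the given type is delivered to the user."""
    if message_type == "text":
        if tts_enabled:
            # Text replies are the only type routed through the TTS service.
            return "invoke TTS service, output audio message"
        return "output text directly"
    # Audio, image, and other replies bypass TTS regardless of the setting.
    return f"output {message_type} directly"
```

With TTS enabled, `output_plan("text", True)` goes through the TTS service, while `output_plan("image", True)` is delivered unchanged.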