Voice-Activated ChatGPT Integration

Introduction

This guide shows how to build a voice-activated chat application using JavaScript, the browser’s Web Speech API, and OpenAI’s ChatGPT API. The application transcribes speech from the user’s microphone and sends the resulting text to ChatGPT, so users can interact with the model without typing.

Prerequisites

Before proceeding, ensure you have the following:

Basic knowledge of HTML, CSS, and JavaScript.
An API key from OpenAI to access the ChatGPT model.
A browser that supports speech recognition through the Web Speech API, such as Chrome or Edge.

Step 1: Obtain Your API Key

Sign Up / Log In: Visit the OpenAI website and create an account if you don’t already have one.
Get Your API Key: Navigate to the API section in your OpenAI account and generate your API key. Keep this key secure, as it will be used for authentication.
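
To illustrate how the key is used for authentication, here is a minimal sketch that sends it as a Bearer token in the Authorization header. It assumes the read-only models endpoint is a reasonable way to check that a key is accepted; the helper names buildKeyCheckRequest and isApiKeyAccepted are hypothetical and are not part of the application built below.

```javascript
// Build a request that can be used to check an API key.
// Assumption: the read-only models endpoint is used purely as a probe.
function buildKeyCheckRequest(apiKey) {
    return {
        url: 'https://api.openai.com/v1/models',
        options: {
            method: 'GET',
            headers: { 'Authorization': `Bearer ${apiKey}` }
        }
    };
}

// Hypothetical helper: resolves to true when the API responds with a 2xx status.
async function isApiKeyAccepted(apiKey) {
    const { url, options } = buildKeyCheckRequest(apiKey);
    const response = await fetch(url, options);
    return response.ok;
}
```

Calling isApiKeyAccepted('YOUR_API_KEY') from a browser console or a recent Node.js version should resolve to true for a working key and false for one the API rejects.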

Step 2: Setting Up Your Project

Create a new directory for your project. Within that directory, create two files: index.html and app.js.

Directory Structure

/voice-chatgpt
    ├── index.html
    └── app.js

Step 3: Creating the HTML Structure

In the index.html file, set up a simple HTML structure to allow users to start and stop recording their voice input.

HTML Code (index.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice Activated ChatGPT</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
        }
        #response {
            margin-top: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            min-height: 50px;
        }
        .button {
            padding: 10px 20px;
            margin: 10px;
            cursor: pointer;
        }
    </style>
</head>
<body>

    <h1>Voice-Activated Chat with ChatGPT</h1>
    <button id="start-button" class="button">Start Recording</button>
    <button id="stop-button" class="button" disabled>Stop Recording</button>

    <div id="response"></div>

    <script src="app.js"></script>
</body>
</html>

Step 4: Implementing JavaScript Functionality

In the app.js file, implement the logic to record audio from the microphone, convert it to text using the Web Speech API, and send the text input to the ChatGPT API.

JavaScript Code (app.js)

const apiKey = 'YOUR_API_KEY'; // Replace with your actual OpenAI API key
const startButton = document.getElementById('start-button');
const stopButton = document.getElementById('stop-button');
const responseDiv = document.getElementById('response');

let recognition;

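// Initialize Speech Recognition (Chrome and Edge expose it with the webkit prefix)
const SpeechRecognitionImpl = window.SpeechRecognition || window.webkitSpeechRecognition;
if (SpeechRecognitionImpl) {
    recognition = new SpeechRecognitionImpl();
    recognition.continuous = false; // Stop automatically after the first result
    recognition.interimResults = false; // We want only final results

    recognition.onresult = async (event) => {
        const userMessage = event.results[0][0].transcript;
        // textContent keeps the transcript from being interpreted as HTML
        responseDiv.textContent = "You said: " + userMessage;
        await sendToChatGPT(userMessage);
    };

    recognition.onerror = (event) => {
        console.error('Error occurred in recognition: ' + event.error);
        responseDiv.textContent = "Error occurred while recognizing speech.";
    };

    // Recognition also ends on its own, so reset the buttons whenever it stops
    recognition.onend = () => {
        startButton.disabled = false;
        stopButton.disabled = true;
    };
} else {
    // Without recognition support the Start button would throw, so disable it
    startButton.disabled = true;
    alert('Your browser does not support speech recognition. Please use Chrome or Edge.');
}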

startButton.addEventListener('click', () => {
    recognition.start();
    startButton.disabled = true;
    stopButton.disabled = false;
});

stopButton.addEventListener('click', () => {
    recognition.stop();
    startButton.disabled = false;
    stopButton.disabled = true;
});

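async function sendToChatGPT(userMessage) {
    // Status line that is replaced by the reply or by an error message
    const status = document.createElement('div');
    status.textContent = 'Loading response...';
    responseDiv.appendChild(status);

    try {
        const response = await fetch('https://api.openai.com/v1/chat/completions', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Authorization': `Bearer ${apiKey}`
            },
            body: JSON.stringify({
                model: "gpt-3.5-turbo",
                messages: [{ role: "user", content: userMessage }]
            })
        });

        const data = await response.json();
        if (!response.ok) {
            // Failed requests carry an error object instead of choices
            const reason = (data.error && data.error.message) || `status ${response.status}`;
            throw new Error('Request failed: ' + reason);
        }
        const botReply = data.choices[0].message.content;
        // textContent keeps the reply from being interpreted as HTML
        status.textContent = `ChatGPT says: ${botReply}`;
    } catch (error) {
        console.error('Error:', error);
        status.textContent = 'Error occurred while fetching response.';
    }
}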
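
The result handler above reads only event.results[0][0].transcript, which is enough while recognition.continuous is false. As a rough sketch of how the speech-to-text step could be generalized, the helpers below detect the recognition constructor and join every final result into one string. The names getRecognitionConstructor and extractTranscript are hypothetical, and the global object is passed as a parameter only so the functions can be exercised outside a browser.

```javascript
// Return the recognition constructor, or null when the environment has none.
// In a browser, pass `window` as globalObject.
function getRecognitionConstructor(globalObject) {
    return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// Join the top alternative of every final result into a single string.
// `event.results` is array-like; each result exposes `isFinal` and indexed alternatives.
function extractTranscript(event) {
    const parts = [];
    for (let i = 0; i < event.results.length; i++) {
        const result = event.results[i];
        if (result.isFinal) {
            parts.push(result[0].transcript.trim());
        }
    }
    return parts.join(' ');
}
```

In app.js, the handler could then call extractTranscript(event) in place of the direct index lookup, which would matter mainly if continuous recognition were enabled.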

Step 5: Running Your Application

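Insert Your API Key: Open app.js and replace YOUR_API_KEY with your actual OpenAI API key. Because the key is embedded in client-side code, anyone who can load the page can read it, so use this setup for local testing only.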
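Open index.html: Launch the index.html file in your web browser and allow microphone access when prompted. If the browser refuses microphone access for a page opened directly from disk, serve the project directory from a local web server and open it through http://localhost instead.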
Interact with ChatGPT: Click the “Start Recording” button to begin capturing your voice input, then speak your prompt. Recognition ends automatically when you stop speaking, or you can click “Stop Recording” to end it sooner. The application then sends the transcript to ChatGPT and displays the response.

Conclusion

In this project, we successfully built a voice-activated application that allows users to interact with ChatGPT using their microphone. By combining JavaScript, the Web Speech API, and OpenAI’s ChatGPT, we created an intuitive and accessible user experience. This application serves as a foundational step towards developing more complex AI-driven solutions.

Feel free to expand upon this project by adding features like more robust error handling, persistent conversation history, or enhanced UI design!
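
As one possible starting point for persistent conversation history, the sketch below keeps the message list in any storage object that offers getItem and setItem, such as the browser’s localStorage. The storage key, the 20-message cap, and the function names are assumptions chosen for illustration.

```javascript
const HISTORY_KEY = 'voice-chatgpt-history'; // hypothetical storage key
const MAX_MESSAGES = 20;                     // arbitrary cap to bound request size

// Load saved messages; fall back to an empty history on missing or corrupt data.
function loadHistory(storage) {
    try {
        const saved = JSON.parse(storage.getItem(HISTORY_KEY));
        return Array.isArray(saved) ? saved : [];
    } catch (error) {
        return [];
    }
}

// Append one message, keep only the most recent MAX_MESSAGES, and save the result.
function appendMessage(storage, role, content) {
    const history = loadHistory(storage)
        .concat({ role, content })
        .slice(-MAX_MESSAGES);
    storage.setItem(HISTORY_KEY, JSON.stringify(history));
    return history;
}
```

In the browser, sendToChatGPT could call appendMessage(localStorage, 'user', userMessage), pass the returned array as the messages field of the request body, and then store the reply with appendMessage(localStorage, 'assistant', botReply).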