Speech to Text Converter Using JavaScript

Introduction

This project demonstrates how to use the Web Speech API in JavaScript to capture speech from the user's microphone and convert it into text. A button starts speech recognition, and the recognized words are displayed as text on the webpage. This is useful for dictation, voice commands, or converting voice input to text in real time.


Technologies Used

  1. HTML - To create the structure of the webpage.

  2. JavaScript - To handle the speech recognition functionality using the Web Speech API.

  3. Web Speech API - A browser feature that enables the recognition of spoken words.


Code Breakdown

HTML Structure

<h1>Speech to Text Converter</h1>
<button onclick="startRecognition()">Start Speaking</button>
<div id="output" style="margin-top: 20px; font-size: 20px;">Your speech will appear here...</div>

  • Heading (h1): A simple heading that gives context to the page.

  • Button (button): The button starts the speech recognition process. When clicked, it triggers the JavaScript function startRecognition().

  • Output Div (div): This div is used to display the text converted from the spoken words. Initially, it has placeholder text ("Your speech will appear here...").

JavaScript with Web Speech API

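The JavaScript code is embedded directly in the HTML file inside a <script> element. The script should come after the output div in the page, because the fallback branch looks up that div as soon as the script runs. It handles the core functionality of capturing voice input and converting it to text.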

<script>
  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

  if (SpeechRecognition) {
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US';  // Sets the language for recognition
    recognition.continuous = false;  // Stops listening after user stops talking
    recognition.interimResults = false;  // Displays only the final results

    // Event fired when recognition produces a result
    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript;
      document.getElementById('output').innerText = transcript;
    };

    // Error handling
    recognition.onerror = (event) => {
      document.getElementById('output').innerText = `Error: ${event.error}`;
    };

    // Function to start recognition
    function startRecognition() {
      recognition.start();
    }
  } else {
    // If browser doesn't support SpeechRecognition API
    document.getElementById('output').innerText = "Speech recognition not supported in this browser.";
  }
</script>

Explanation of JavaScript Code:

  1. SpeechRecognition Object:

     const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    

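    This line picks the SpeechRecognition constructor, falling back to the prefixed webkitSpeechRecognition that Chrome exposes. If neither exists, the variable is undefined, which the support check later relies on. The recognition object itself is created afterwards with new SpeechRecognition().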

  2. Configuration:

     recognition.lang = 'en-US';
     recognition.continuous = false;
     recognition.interimResults = false;
    
    • lang: Sets the language for recognition. In this case, it is set to English (en-US). It can be changed to other languages like 'hi-IN' for Hindi.

    • continuous: If true, the recognition continues even after the user stops speaking. Here it is set to false, meaning it stops when the user stops talking.

    • interimResults: If set to true, the speech recognition will return partial results as the user speaks. It's set to false here to wait for final results.
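The three settings above can be grouped into one helper. This is a minimal sketch, not part of the original project: applyRecognitionSettings is a hypothetical name, and the defaults simply mirror the values used in this tutorial.

```javascript
// Hypothetical helper: copies tutorial defaults plus any overrides onto a
// recognition object and returns that same object.
function applyRecognitionSettings(recognition, overrides = {}) {
  const defaults = {
    lang: 'en-US',          // recognition language
    continuous: false,      // stop after the first phrase
    interimResults: false,  // deliver only final results
  };
  return Object.assign(recognition, defaults, overrides);
}
```

In the page, it could be called as applyRecognitionSettings(recognition, { lang: 'hi-IN' }) to switch to Hindi while keeping the other defaults.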

  3. onresult Event:

     recognition.onresult = (event) => {
       const transcript = event.results[0][0].transcript;
       document.getElementById('output').innerText = transcript;
     };
    

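    This event fires when the recognizer returns a result. event.results is a list of results, each holding one or more alternatives. event.results[0][0] is the top alternative of the first result, and its transcript property is the recognized text, which is then displayed in the output div.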
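To make the results[0][0] lookup more concrete, here is a small sketch of a helper that reads the top alternative defensively. readTopAlternative is a hypothetical name introduced for illustration; it relies only on the transcript and confidence properties of an alternative and on the list's length.

```javascript
// Hypothetical helper: returns the top alternative of the first result,
// or null when the event carries no results.
function readTopAlternative(event) {
  const results = event.results;
  if (!results || results.length === 0 || results[0].length === 0) {
    return null;
  }
  const { transcript, confidence } = results[0][0];
  // Trim stray whitespace around the recognized text.
  return { transcript: transcript.trim(), confidence };
}
```

Inside onresult, the returned transcript could be written to the output div, and confidence could be used to flag uncertain results.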

  4. onerror Event:

     recognition.onerror = (event) => {
       document.getElementById('output').innerText = `Error: ${event.error}`;
     };
    

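    If an error occurs, this handler displays it in the output div. event.error is a short error code rather than a full sentence, for example 'not-allowed' when microphone permission is denied, 'audio-capture' when no microphone is available, or 'no-speech' when nothing was heard.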
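Since the error codes are terse, a page may want to translate them into friendlier text. The sketch below assumes a hypothetical describeSpeechError helper. The codes are ones defined for the Web Speech API, but the message wording is made up for this example.

```javascript
// Messages are illustrative; only the keys are Web Speech API error codes.
const SPEECH_ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected. Please try again.'],
  ['audio-capture', 'No microphone is available.'],
  ['not-allowed', 'Microphone access was denied.'],
  ['network', 'A network problem interrupted recognition.'],
]);

// Hypothetical helper: maps an error code to a readable message and
// falls back to showing the raw code for anything unlisted.
function describeSpeechError(code) {
  return SPEECH_ERROR_MESSAGES.get(code) ?? `Error: ${code}`;
}
```

The onerror handler could then display describeSpeechError(event.error) instead of the raw code.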

  5. startRecognition Function:

     function startRecognition() {
       recognition.start();
     }
    

    This function is called when the button is clicked and starts the speech recognition process. Because it is declared inside the if block, it is defined only when the browser supports the API.
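One practical detail: calling start() on a recognizer that is already listening throws an InvalidStateError, which can happen if the button is clicked twice. A possible guard is sketched below. createSafeStarter is a hypothetical name, and the sketch assumes nothing else assigns the onstart and onend handlers.

```javascript
// Hypothetical helper: wraps recognition.start() so that repeated clicks
// while a session is active are ignored.
function createSafeStarter(recognition) {
  let listening = false;
  recognition.onstart = () => { listening = true; };
  recognition.onend = () => { listening = false; };

  return function startIfIdle() {
    if (listening) {
      return false; // a session is already running
    }
    listening = true; // set eagerly, before the start event arrives
    try {
      recognition.start();
      return true;
    } catch (error) {
      listening = false; // start failed, allow another attempt
      return false;
    }
  };
}
```

The button's click handler could call the returned function instead of calling recognition.start() directly.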

  6. Browser Compatibility Handling:

     if (SpeechRecognition) {
       // Recognition setup
     } else {
       document.getElementById('output').innerText = "Speech recognition not supported in this browser.";
     }
    

    The code checks if the browser supports the SpeechRecognition API. If it does, the speech recognition is set up. Otherwise, it displays an error message.
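The support check can also be written as a small function, which makes it easy to reuse. This is only a sketch: getSpeechRecognition is a hypothetical name, and the scope parameter stands in for window so the logic can run outside a browser.

```javascript
// Hypothetical helper: returns the recognition constructor found on the
// given global object, preferring the unprefixed name, or null if absent.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a page, usage would look like:
//   const Recognition = getSpeechRecognition(window);
//   if (Recognition) { const recognition = new Recognition(); }
```

Returning null rather than undefined makes the "not supported" case explicit for callers.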


How to Use This Code

  1. Save the Code: Copy the entire HTML + JavaScript code and save it as an .html file.

  2. Open in Browser: Open the saved HTML file in a browser that supports the Web Speech API, such as Google Chrome.

  3. Allow Microphone Access: When prompted, allow the browser to access your microphone.

  4. Start Speaking: Click the "Start Speaking" button and begin speaking. Your speech will be displayed as text on the page.


Customizations

  1. Change Language: To recognize different languages, you can modify the recognition.lang property (e.g., 'hi-IN' for Hindi).

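  2. Continuous Listening: Set recognition.continuous = true if you want the recognition to continue listening even after the user stops talking. Note that the onresult handler shown earlier reads only event.results[0], so with continuous listening it would keep showing the first phrase; the handler needs to walk the whole results list instead.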

  3. Interim Results: If you want partial text while still speaking, set recognition.interimResults = true.
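When continuous listening or interim results are enabled, event.results can hold several entries, some final and some still changing. The following sketch shows one way to separate them. collectTranscripts is a hypothetical name, and it relies only on the isFinal flag of each result and the transcript of its top alternative.

```javascript
// Hypothetical helper: walks every result in the event and concatenates
// final and interim text separately.
function collectTranscripts(event) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top alternative of this result
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

An onresult handler could show finalText as settled output and render interimText in a lighter style while the user is still speaking.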


Limitations

  • Browser Support: The Web Speech API is not supported in all browsers. Chrome is the most reliable for this feature.

  • Privacy and Permissions: The API works only after the user grants microphone access. Users who are concerned about privacy may decline, so the page should handle that case gracefully.


Conclusion

This simple project demonstrates how to use the Web Speech API in JavaScript to convert speech into text in real time. It is a starting point for building more advanced voice recognition applications such as voice-controlled interfaces, dictation apps, or interactive voice assistants.