Introduction
This project demonstrates how to use the Web Speech API in JavaScript to capture speech from the user's microphone and convert it into text. A button starts the speech recognition process, and the spoken words are displayed as text on the webpage. This is useful for speech-to-text applications, voice commands, or simply converting voice input to text in real time.
Technologies Used
HTML - To create the structure of the webpage.
JavaScript - To handle the speech recognition functionality using the Web Speech API.
Web Speech API - A browser feature that enables the recognition of spoken words.
Code Breakdown
HTML Structure
<h1>Speech to Text Converter</h1>
<button onclick="startRecognition()">Start Speaking</button>
<div id="output" style="margin-top: 20px; font-size: 20px;">Your speech will appear here...</div>
Heading (h1): A simple heading that gives context to the page.
Button (button): Starts the speech recognition process. When clicked, it calls the JavaScript function startRecognition().
Output Div (div): This div displays the text converted from the spoken words. Initially, it holds the placeholder text "Your speech will appear here...".
JavaScript with Web Speech API
The JavaScript code is embedded directly in the HTML file inside a <script> tag. It handles the core functionality of capturing voice input and converting it to text.
<script>
  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (SpeechRecognition) {
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US'; // Sets the language for recognition
    recognition.continuous = false; // Stops listening after user stops talking
    recognition.interimResults = false; // Displays only the final results

    // Event fired when recognition produces a result
    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript;
      document.getElementById('output').innerText = transcript;
    };

    // Error handling
    recognition.onerror = (event) => {
      document.getElementById('output').innerText = `Error: ${event.error}`;
    };

    // Function to start recognition
    function startRecognition() {
      recognition.start();
    }
  } else {
    // If browser doesn't support SpeechRecognition API
    document.getElementById('output').innerText = "Speech recognition not supported in this browser.";
  }
</script>
Explanation of JavaScript Code:
SpeechRecognition Object:
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
This line looks up the SpeechRecognition constructor, falling back to the prefixed webkitSpeechRecognition name that Chrome uses. The recognition object itself is created afterwards with new SpeechRecognition().
Configuration:
recognition.lang = 'en-US';
recognition.continuous = false;
recognition.interimResults = false;
lang: Sets the language for recognition. Here it is set to US English (en-US). It can be changed to other languages, such as 'hi-IN' for Hindi.
continuous: If true, recognition continues even after the user stops speaking. Here it is set to false, so it stops when the user stops talking.
interimResults: If true, the recognizer returns partial results as the user speaks. Here it is set to false, so only final results are returned.
onresult Event:
recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript;
  document.getElementById('output').innerText = transcript;
};
This event fires when the recognizer returns a result. event.results[0][0] is the first alternative of the first result, and its transcript property holds the recognized text, which is displayed in the output div.
onerror Event:
recognition.onerror = (event) => {
  document.getElementById('output').innerText = `Error: ${event.error}`;
};
If an error occurs (for example, the user denies microphone access or no speech is detected), this handler displays the error code from event.error, such as not-allowed or no-speech.
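Because event.error is a short code rather than a sentence, it can help to translate it before showing it to the user. The following is a minimal sketch of that idea. The codes used as keys are defined by the Web Speech API, but the helper name describeSpeechError and the message wording are assumptions made for illustration.

```javascript
// Illustrative messages keyed by Web Speech API error codes.
// The wording is an assumption, not something the API provides.
const SPEECH_ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected. Please try again.'],
  ['audio-capture', 'No microphone was found or it could not be used.'],
  ['not-allowed', 'Microphone access was blocked.'],
  ['network', 'A network error interrupted recognition.'],
  ['aborted', 'Recognition was cancelled.'],
  ['language-not-supported', 'The selected language is not supported.'],
]);

// Returns readable text for a code, falling back to the raw code
// in the same "Error: ..." format the tutorial uses.
function describeSpeechError(code) {
  return SPEECH_ERROR_MESSAGES.get(code) ?? `Error: ${code}`;
}

// Possible use inside the tutorial's handler:
// recognition.onerror = (event) => {
//   document.getElementById('output').innerText = describeSpeechError(event.error);
// };
```

Unknown codes fall through to the original output, so the handler keeps working if the browser reports a code that is not in the table.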
startRecognition Function:
function startRecognition() {
  recognition.start();
}
This function is triggered when the button is clicked, starting the speech recognition process.
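One detail worth noting: calling start() on a recognizer that is already listening throws an InvalidStateError, which can happen if the button is clicked twice. A minimal sketch of a guard is shown below. The helper name createStartHandler is hypothetical, and the sketch assumes the recognizer's end event fires whenever a session finishes.

```javascript
// Wraps a recognition object so start() is only called while idle.
// `createStartHandler` is a hypothetical helper, not part of the API.
function createStartHandler(recognition) {
  let listening = false;

  // The 'end' event signals that the session is over,
  // so the next click may start a new one.
  recognition.onend = () => {
    listening = false;
  };

  return function startRecognition() {
    if (listening) return false; // already running: ignore this click
    recognition.start(); // if this throws, listening stays false
    listening = true;
    return true;
  };
}

// Possible use with the tutorial's inline onclick attribute:
// window.startRecognition = createStartHandler(recognition);
```

Assigning the returned function to window keeps it reachable from the onclick attribute even when the setup code lives inside the if block.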
Browser Compatibility Handling:
if (SpeechRecognition) {
  // Recognition setup
} else {
  document.getElementById('output').innerText = "Speech recognition not supported in this browser.";
}
The code checks whether the browser supports the SpeechRecognition API. If it does, speech recognition is set up. Otherwise, it displays an error message.
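In the full listing, startRecognition() is only defined in the supported branch, so in an unsupported browser the button has no working function to call. One possible way around this is to build the start function in both cases, as in the sketch below. The names getSpeechRecognition, createSafeStart, and showMessage are assumptions made for illustration, and the window-like object is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Looks up the recognition constructor on a window-like object.
// Returns null when neither the standard nor the prefixed name exists.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// Builds a start function that is always safe to call from the button.
// `showMessage` is a hypothetical callback that writes text to the page.
function createSafeStart(globalObject, showMessage) {
  const Recognition = getSpeechRecognition(globalObject);
  if (!Recognition) {
    // Unsupported: clicking the button just repeats the notice.
    return () => showMessage('Speech recognition not supported in this browser.');
  }
  const recognition = new Recognition();
  recognition.lang = 'en-US';
  return () => recognition.start();
}

// Possible use in the page:
// window.startRecognition = createSafeStart(window, (text) => {
//   document.getElementById('output').innerText = text;
// });
```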
How to Use This Code
Save the Code: Copy the entire HTML + JavaScript code and save it as an .html file.
Open in Browser: Open the saved HTML file in a browser that supports the Web Speech API, such as Google Chrome.
Allow Microphone Access: When prompted, allow the browser to access your microphone.
Start Speaking: Click the "Start Speaking" button and begin speaking. Your speech will be displayed as text on the page.
Customizations
Change Language: To recognize a different language, modify the recognition.lang property (e.g., 'hi-IN' for Hindi).
Continuous Listening: Set recognition.continuous = true if you want recognition to keep listening after the user stops talking.
Interim Results: If you want partial text while the user is still speaking, set recognition.interimResults = true.
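With continuous or interimResults enabled, an event can carry several results, so reading only event.results[0][0] would miss later phrases. The sketch below shows one way to walk the whole list and separate final text from interim text. The helper name collectTranscripts is an assumption, and it relies only on the isFinal flag and the transcript property that each result exposes.

```javascript
// Splits the results of a recognition event into final and interim text.
// Works on any object shaped like the event passed to onresult:
// `results` is array-like, and each result has `isFinal` plus alternatives.
function collectTranscripts(event) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // first (best) alternative
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Possible use in the tutorial's handler:
// recognition.onresult = (event) => {
//   const { finalText, interimText } = collectTranscripts(event);
//   document.getElementById('output').innerText = finalText + interimText;
// };
```

Showing the interim text after the final text lets the display update as the user speaks, with provisional words replaced once the recognizer settles on them.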
Limitations
Browser Support: The Web Speech API is not supported in all browsers. Chrome is the most reliable for this feature.
Security: The API requires the user to grant microphone access, which might affect user experience if they are concerned about privacy.
Conclusion
This simple project demonstrates how to use the Web Speech API in JavaScript to convert speech into text in real time. It is a starting point for building more advanced voice recognition applications like voice-controlled interfaces, dictation apps, or interactive voice assistants.