This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

IBM/watson-speech-translator


WARNING: This repository is no longer maintained

This repository will not be updated. The repository will be kept available in read-only mode.

Create a language translator app with voice input and output

In this code pattern, we will create a language translator web app. Built with React components and a Node.js server, the app will capture audio input and stream it to a Watson Speech to Text service. As the input speech is transcribed, it will also be sent to a Watson Language Translator service to be translated into the language you select. Both the transcribed and translated text will be displayed by the app in real time. Each completed phrase will be sent to Watson Text to Speech to be spoken in your choice of locale-specific voices.

The best way to understand the difference between real-time transcription/translation and "completed phrase" vocalization is to try it out. You'll notice that the text is updated as words and phrases are completed and become better understood in context. To avoid backtracking or overlapping audio, only completed phrases are vocalized. These are typically short sentences or utterances where a pause indicates a break.
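
The interim-versus-final distinction can be sketched in a few lines. This sketch assumes the Speech to Text WebSocket result format, in which each message has a `result_index` and a `results` array. Each entry in that array carries `alternatives[0].transcript` and a boolean `final`. The helpers `splitResults` and `handleMessage` and the `display`/`speak` callbacks are hypothetical names, not code from this repository. Interim text is only displayed, and only final phrases are handed on to be spoken.

```javascript
// Hypothetical helper: separates interim (still-changing) text from final phrases.
// Assumes Speech to Text WebSocket messages shaped like:
// { result_index: 0, results: [{ final: false, alternatives: [{ transcript: "hello wor" }] }] }
function splitResults(message) {
  const interim = [];
  const finals = [];
  for (const result of message.results || []) {
    const best = result.alternatives && result.alternatives[0];
    if (!best) continue; // no hypothesis for this result
    const text = best.transcript.trim();
    if (result.final) {
      finals.push(text);
    } else {
      interim.push(text);
    }
  }
  return { interim, finals };
}

// Display everything in this message, but speak only final phrases,
// so the spoken audio never backtracks or overlaps.
// `display` and `speak` are hypothetical callbacks (e.g. speak = translate, then Text to Speech).
function handleMessage(message, display, speak) {
  const { interim, finals } = splitResults(message);
  display([...finals, ...interim].join(" "));
  finals.forEach(speak);
}
```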

For the best live experience, wear headphones to listen to the translation of what your microphone picks up. Alternatively, you can use the toggle buttons to record and transcribe first, without translating. When you are ready, select a language and voice, and then enable translation (and speech).

When you have completed this code pattern, you will understand how to:

  • Stream audio to Speech to Text using a WebSocket
  • Use Language Translator with a REST API
  • Retrieve and play audio from Text to Speech using a REST API
  • Integrate Speech to Text, Language Translator, and Text to Speech in a web app
  • Use React components and a Node.js server
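
To illustrate the first item, here is a minimal sketch of what a client exchanges with the Speech to Text WebSocket interface. It assumes the `/v1/recognize` endpoint with an `access_token` and a `model` query parameter. It also assumes JSON `start` and `stop` control messages that bracket binary `audio/l16` (16-bit little-endian PCM) chunks. The helper names (`recognizeUrl`, `startMessage`, `STOP_MESSAGE`, `floatToL16`) are illustrative and are not this repository's code. The `rate` value is also an assumption: it should match the rate the audio was actually captured at.

```javascript
// Hypothetical helpers for streaming microphone audio to Speech to Text over a WebSocket.

// Turns the service URL gathered during setup (https://...) into the wss:// recognize URL.
function recognizeUrl(serviceUrl, accessToken, model) {
  const base = serviceUrl.replace(/\/+$/, "").replace(/^http/, "ws"); // https -> wss, http -> ws
  const url = new URL(base + "/v1/recognize");
  url.searchParams.set("access_token", accessToken);
  url.searchParams.set("model", model);
  return url.toString();
}

// First message on the socket: declares the audio format and asks for interim results.
function startMessage(sampleRate = 16000) {
  return JSON.stringify({
    action: "start",
    "content-type": `audio/l16;rate=${sampleRate}`,
    interim_results: true,
  });
}

// Sent after the last audio chunk to end the recognition request.
const STOP_MESSAGE = JSON.stringify({ action: "stop" });

// Converts Web Audio float samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToL16(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}
```

Here is how the pieces fit together in a browser. A socket opened with `new WebSocket(recognizeUrl(url, token, model))` would send `startMessage(audioContext.sampleRate)` once it is open. It would then send one `floatToL16(...)` buffer per captured chunk, and finally `STOP_MESSAGE`.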

NOTE: This code pattern includes instructions for running Watson services on IBM Cloud or with the Watson API Kit on IBM Cloud Pak for Data. Click here for more information about IBM Cloud Pak for Data.

[Architecture diagram]

Flow

  1. The user presses the microphone button, and the app captures the input audio.
  2. The audio is streamed to Speech to Text using a WebSocket.
  3. The transcribed text from Speech to Text is displayed and updated.
  4. The transcribed text is sent to Language Translator, and the translated text is displayed and updated.
  5. Completed phrases are sent to Text to Speech, and the resulting audio is played automatically.
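
Steps 4 and 5 are plain REST calls. The sketch below only builds and parses the HTTP requests, and authentication headers are omitted. It assumes the Language Translator `POST /v3/translate` endpoint, which takes a `version` date parameter and a JSON body with a `text` array and a `model_id` such as `en-es`. It also assumes the Text to Speech `POST /v1/synthesize` endpoint, which takes a `voice` parameter and uses an `Accept` header to select the audio format. The function names and the default version date are assumptions for illustration.

```javascript
// Hypothetical request builders for steps 4 and 5 (authentication headers omitted).

// Step 4: translate a transcript with Language Translator.
function translateRequest(serviceUrl, text, source, target, version = "2018-05-01") {
  const url = new URL(serviceUrl.replace(/\/+$/, "") + "/v3/translate");
  url.searchParams.set("version", version);
  return {
    url: url.toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // model_id combines source and target language codes, e.g. "en-es".
    body: JSON.stringify({ text: [text], model_id: `${source}-${target}` }),
  };
}

// Reads the translated text out of a Language Translator JSON response
// such as { translations: [{ translation: "Hola" }], word_count: 1, character_count: 5 }.
function parseTranslation(json) {
  return json.translations.map((t) => t.translation).join(" ");
}

// Step 5: synthesize a completed, translated phrase with Text to Speech.
function synthesizeRequest(serviceUrl, text, voice, accept = "audio/wav") {
  const url = new URL(serviceUrl.replace(/\/+$/, "") + "/v1/synthesize");
  url.searchParams.set("voice", voice);
  return {
    url: url.toString(),
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: accept },
    body: JSON.stringify({ text }),
  };
}
```

With `fetch`, a caller could run `const { url, ...init } = translateRequest(...)`, then `await fetch(url, init)`, and pass the JSON result to `parseTranslation`. The synthesize response is binary audio (for example, read with `res.arrayBuffer()`) that the browser can play. Keeping these calls on the Node.js server keeps the credentials out of the browser.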

Steps

  1. Create the Watson services
  2. Deploy the server
  3. Use the web app

Create the Watson services

Note: You can skip this step if you will be using the Deploy to Cloud Foundry on IBM Cloud button below. That option automatically creates the services and binds them (providing their credentials) to the application.

Provision the following services:

  • Speech to Text
  • Language Translator
  • Text to Speech

The instructions depend on whether you are provisioning services on IBM Cloud Pak for Data or on IBM Cloud.


IBM Cloud Pak for Data

Use the following instructions for each of the three services.

Install and provision service instances

The services are not available by default. An administrator must install them on the IBM Cloud Pak for Data platform, and you must be given access to the service. To determine whether the service is installed, click the Services icon and check whether the service is enabled.

Gather credentials

  1. For production use, create a user to use for authentication. From the main navigation menu (☰), select Administer > Manage users and then + New user.
  2. From the main navigation menu (☰), select My instances.
  3. On the Provisioned instances tab, find your service instance, hover over the last column, and click the ellipsis icon. Choose View details.
  4. Copy the URL to use as the {SERVICE_NAME}_URL when you configure credentials.
  5. Optionally, copy the bearer token for development and testing only. Do not use the bearer token outside of testing and development, because that token does not expire.
  6. From the Menu, select Users and then + Add user to grant your user access to this service instance. This is the user name (and password) you will use when you configure credentials to allow the Node.js server to authenticate.
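
The `{SERVICE_NAME}_URL` naming above suggests one environment variable per credential, each prefixed by the service name. The sketch below checks that those variables are present before the server starts. The suffixes (`_AUTH_TYPE`, `_URL`, `_APIKEY`, `_USERNAME`, `_PASSWORD`, `_BEARER_TOKEN`) are an assumption modeled on the naming used by IBM's Node.js SDK core. Confirm the exact names this repository expects against its sample environment file; the SDK may also need further settings, such as an authorization URL on Cloud Pak for Data. `envPrefix` and `missingCredentials` are hypothetical helpers.

```javascript
// Hypothetical startup check for per-service credentials in environment variables.
// The suffixes below are an assumption modeled on the {SERVICE_NAME}_URL convention.
const REQUIRED_SUFFIXES = {
  iam: ["APIKEY", "URL"], // IBM Cloud: API key and URL
  cp4d: ["USERNAME", "PASSWORD", "URL"], // IBM Cloud Pak for Data user
  bearertoken: ["BEARER_TOKEN", "URL"], // development and testing only
};

// "Speech to Text" -> "SPEECH_TO_TEXT"
function envPrefix(serviceName) {
  return serviceName.trim().toUpperCase().replace(/[^A-Z0-9]+/g, "_");
}

// Returns the names of required variables that are missing or empty.
// The auth type defaults to "iam" here when {PREFIX}_AUTH_TYPE is not set.
function missingCredentials(env, serviceName) {
  const prefix = envPrefix(serviceName);
  const authType = (env[`${prefix}_AUTH_TYPE`] || "iam").toLowerCase();
  const suffixes = REQUIRED_SUFFIXES[authType];
  if (!suffixes) {
    throw new Error(`Unsupported ${prefix}_AUTH_TYPE: ${authType}`);
  }
  return suffixes
    .map((suffix) => `${prefix}_${suffix}`)
    .filter((name) => !env[name]);
}
```

At startup, the server could call `missingCredentials(process.env, name)` for "Speech to Text", "Language Translator", and "Text to Speech", and report any names it returns. This fails fast instead of failing on the first request.
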
IBM Cloud

Create the service instances
  • If you do not have an IBM Cloud account, register for a free trial account here.
  • Click here to create a Speech to Text instance.
  • Click here to create a Language Translator instance.
  • Click here to create a Text to Speech instance.
Gather credentials
  1. From the main navigation menu (☰), select Resource list to find your services under Services.
  2. Click on each service to find the Manage view where you can collect the API Key and URL to use for each service when you configure credentials.
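
The API key is a long-lived secret, so it belongs on the Node.js server. A common pattern, assumed here rather than taken from this repository, is for the server to exchange the key for a short-lived IAM access token. The token could then be passed to the browser for use as the WebSocket `access_token`. This sketch only builds that request for the IBM Cloud IAM token endpoint, which takes form-encoded `grant_type` and `apikey` fields. It then computes when to refresh, using the `expires_in` field (in seconds) of the token response. `iamTokenRequest` and `tokenExpiresAt` are hypothetical names.

```javascript
// Hypothetical builder for an IBM Cloud IAM token request (API key -> access token).
const IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token";

function iamTokenRequest(apikey) {
  const body = new URLSearchParams({
    grant_type: "urn:ibm:params:oauth:grant-type:apikey",
    apikey,
  });
  return {
    url: IAM_TOKEN_URL,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Accept: "application/json",
    },
    body: body.toString(),
  };
}

// The JSON response includes `access_token` and `expires_in` (seconds).
// Returns a refresh time (ms) a little before expiry so a token never lapses mid-stream.
function tokenExpiresAt(json, nowMs, marginSeconds = 60) {
  return nowMs + (json.expires_in - marginSeconds) * 1000;
}
```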

Deploy the server

Click on one of the options below for instructions on deploying the Node.js server.

Options: local, OpenShift, and Cloud Foundry (cf).

Use the web app

NOTE: The app was developed using Chrome on macOS. Browser compatibility issues are still being worked out.

[Demo animation: watson-speech-translator.gif]

  1. Browse to your app URL

    • Use the URL provided at the end of your selected deployment option.
  2. Select a speech recognition model

    • The drop-down will be populated with models supported by your Speech to Text service.
  3. Select an output language and voice

    • The drop-down will only include voices that are supported by your Text to Speech service. The list is also filtered to only show languages that can be translated from the source language using Language Translator.
  4. Use the Speech to Text toggle

    • Use the Speech to Text button (which becomes Stop Listening) to begin recording audio and streaming it to Speech to Text. Press the button again to stop listening/streaming.
  5. Use the Language Translation toggle

    • The Language Translation button (which becomes Stop Translating) is also a toggle. You can leave it enabled to translate while transcribing, or turn it on after you see transcribed text that you'd like translated and spoken.
  6. Disable Text to Speech

    • By default, the app automatically uses Text to Speech to read the translated output. The checkbox allows you to disable Text to Speech.
  7. Change the language and voice

    • If you change the voice while language translation is enabled, any current transcribed text will be re-translated (and spoken if enabled).
  8. Reset the transcribed text

    • The transcribed text will be cleared when you do any of the following:

      • Press Speech to Text to restart listening
      • Refresh the page
      • Change the speech recognition model
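
The voice filtering in step 3 can be sketched as a join between three lists. It assumes the shapes returned by the services' list operations:

  • Speech to Text models, such as `en-US_BroadbandModel`, with a `language` field
  • Language Translator models with `source` and `target` codes
  • Text to Speech voices, such as `es-ES_EnriqueV3Voice`, with a `language` field like `es-ES`

The function names, and the rule of matching on the primary language subtag, are assumptions for illustration rather than the app's exact logic.

```javascript
// Hypothetical filter: keep only voices whose language Language Translator can reach
// from the selected speech recognition model's language.

// "en-US" -> "en", "zh-CN" -> "zh" (primary language subtag, lowercased)
function primaryLanguage(code) {
  return code.split(/[-_]/)[0].toLowerCase();
}

function translatableVoices(sttModel, translatorModels, voices) {
  const source = primaryLanguage(sttModel.language);
  // Target languages that have a translation model from the source language.
  const targets = new Set(
    translatorModels
      .filter((m) => primaryLanguage(m.source) === source)
      .map((m) => primaryLanguage(m.target))
  );
  return voices.filter((v) => targets.has(primaryLanguage(v.language)));
}
```

Re-running the filter whenever the speech recognition model changes keeps the voice drop-down consistent with the selected source language.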

License

This code pattern is licensed under the Apache License, Version 2.0. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.0.

Apache License FAQ