Voice-based solutions are needed in several situations: when there is a legal requirement to play back a previously recorded statement, for recording and playing back telephone conversations, in speech recognition systems, in software for visually impaired people, for training customer support representatives, and so on. Voice-based solutions can be implemented using the J2SE Java Sound API or the Java Media Framework (JMF). The Java Sound API, available in J2SE 1.3 and higher, provides low-level support for audio operations such as playback and capture (recording), mixing, MIDI sequencing, and MIDI synthesis in an extensible, flexible framework. JMF is a much richer, higher-level API that handles all kinds of media through a single set of interfaces. This document explains the Java Sound API and how to use it.
By default, the Java Sound API provides playback and capture support for PCM-encoded audio in the WAVE, AU, AIFF, and AIFC file formats. Playback and capture of other formats such as MP3, Ogg Vorbis, Speex, and GSM 06.10 can also be implemented through the Java Sound API, for example with the plug-ins published by the Tritonus project. Support for vendor-specific formats is provided transparently by an extension framework exposed as the Java Sound Service Provider Interfaces (SPI). The SPI makes it possible to plug in encoders and decoders for vendor formats and transcoders between formats. An implementation of the Java Sound SPI is registered as an extension to the standard Java SDK by making it available on the CLASSPATH of the Java virtual machine. Application code that uses the Java Sound API therefore stays independent of vendor-specific audio implementations.
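As a quick way to see what the installed providers offer, here is a minimal sketch that lists the file types the running JVM can write. The class and method names (SupportedFileTypes, writableExtensions) are made up for illustration, and the result depends on the JRE and on any SPI archives on the classpath.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import java.util.ArrayList;
import java.util.List;

class SupportedFileTypes {
    // Collects the file extensions of all audio file types that the
    // installed file writers (built-in or SPI plug-ins) can produce.
    static List<String> writableExtensions() {
        List<String> extensions = new ArrayList<>();
        for (AudioFileFormat.Type type : AudioSystem.getAudioFileTypes()) {
            extensions.add(type.getExtension());
        }
        return extensions;
    }

    public static void main(String[] args) {
        System.out.println(writableExtensions());
    }
}
```

Running it before and after adding a plug-in archive to the classpath is one way to confirm that the provider was picked up.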
Playback and Capture using Java Sound
In order to play or capture audio using the Java Sound API, at least three things are needed:
- Formatted audio-data – Formatted audio data refers to sound in any of a number of standard formats.
- A Mixer – In the Java Sound API, devices are represented by Mixer objects. A device is often a software interface to a physical input/output device.
- A Line – A line is an element of the digital audio "pipeline"—that is, a path for moving audio into or out of the system.
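To make the Mixer and Line concepts above concrete, the following sketch enumerates the mixers known to the JVM and reports whether each one offers a capture line. MixerList and describeMixers are hypothetical names; on a machine without audio hardware the list may simply be empty.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.TargetDataLine;
import java.util.ArrayList;
import java.util.List;

class MixerList {
    // Returns one description per installed mixer, noting whether the
    // mixer can provide a TargetDataLine (a line used for capture).
    static List<String> describeMixers() {
        Line.Info captureLine = new Line.Info(TargetDataLine.class);
        List<String> descriptions = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            descriptions.add(info.getName()
                    + " (capture: " + mixer.isLineSupported(captureLine) + ")");
        }
        return descriptions;
    }

    public static void main(String[] args) {
        for (String description : describeMixers()) {
            System.out.println(description);
        }
    }
}
```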
An AudioFormat object encapsulates the encoding technique, number of channels, sample rate, bits per sample, frame rate, frame size (in bytes), byte order, and optional properties.
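A small sketch of how these attributes fit together: for linear PCM the frame size is the sample size in bytes times the number of channels, and the frame rate equals the sample rate. The helper name pcm16 and the chosen values (CD-quality stereo) are only illustrative.

```java
import javax.sound.sampled.AudioFormat;

class FormatDemo {
    // Builds a signed, 16-bit, little-endian linear PCM format.
    static AudioFormat pcm16(float sampleRate, int channels) {
        return new AudioFormat(sampleRate, 16, channels, true, false);
    }

    public static void main(String[] args) {
        AudioFormat format = pcm16(44100f, 2);
        System.out.println("encoding    : " + format.getEncoding());
        System.out.println("channels    : " + format.getChannels());
        System.out.println("sample rate : " + format.getSampleRate());
        System.out.println("bits/sample : " + format.getSampleSizeInBits());
        System.out.println("frame rate  : " + format.getFrameRate());
        System.out.println("frame size  : " + format.getFrameSize() + " bytes");
        System.out.println("big endian  : " + format.isBigEndian());
    }
}
```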
A possible configuration of lines for Audio-Output may be represented as below…
A possible configuration of lines for Audio-Input may be represented as below…
The hierarchy of the audio line interfaces is as follows…
Steps involved for recording and playback
Steps involved for recording PCM-encoded audio in a standard file format using the Java Sound API
- Get a target-dataline to read audio-data from a microphone port.
- If the line exists and is not open, open it with the user's permission (forcibly opening a sound-input port is treated as eavesdropping).
- Start the target-dataline.
- Read from target-dataline and write to an audio output stream.
- Stop and close target-dataline.
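The recording steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the class name MicRecorder, the 16 kHz mono format, the 3-second duration, and the output file name recording.wav are all invented for the example, and capture() needs a machine with a working microphone.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class MicRecorder {
    // Assumed capture format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Captures roughly `millis` milliseconds of raw PCM from the default input.
    static byte[] capture(long millis) throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, FORMAT);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        line.open(FORMAT);
        try {
            line.start();
            byte[] buffer = new byte[4096]; // a whole number of 2-byte frames
            long end = System.currentTimeMillis() + millis;
            while (System.currentTimeMillis() < end) {
                int read = line.read(buffer, 0, buffer.length);
                pcm.write(buffer, 0, read);
            }
            line.stop();
        } finally {
            line.close();
        }
        return pcm.toByteArray();
    }

    // Wraps raw PCM bytes in a WAVE container, entirely in memory.
    static byte[] toWav(byte[] pcm, AudioFormat format) {
        long frames = pcm.length / format.getFrameSize();
        AudioInputStream in =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] wav = toWav(capture(3000), FORMAT);
        try (OutputStream file = new FileOutputStream("recording.wav")) {
            file.write(wav);
        }
    }
}
```

Keeping the capture loop separate from the WAVE encoding makes it easier to send the raw bytes somewhere else, for example to a server, instead of a local file.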
Steps involved for playing back PCM-encoded audio in a standard file format using the Java Sound API
- Read the sound-file as an audio input-stream.
- Get a source-dataline to write audio-data to a speaker-port.
- If the line exists and is not open, open it.
- Start the source-dataline.
- Write to source-dataline.
- Drain, stop and close source-dataline.
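A rough sketch of the playback steps, assuming a PCM-encoded file that the default mixer can play directly. WavPlayer and bufferBytes are hypothetical names, the 100 ms buffer is an arbitrary choice, and play() needs a machine with an audio output device.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

class WavPlayer {
    // Size in bytes of a buffer holding about `millis` ms of audio,
    // rounded down to a whole number of frames (at least one frame).
    static int bufferBytes(AudioFormat format, int millis) {
        int frames = Math.max(1, (int) (format.getFrameRate() * millis / 1000f));
        return frames * format.getFrameSize();
    }

    static void play(File file)
            throws UnsupportedAudioFileException, IOException, LineUnavailableException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat format = in.getFormat();
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);
            try {
                line.start();
                byte[] buffer = new byte[bufferBytes(format, 100)];
                int read;
                while ((read = in.read(buffer, 0, buffer.length)) != -1) {
                    line.write(buffer, 0, read);
                }
                line.drain(); // wait until buffered audio has been played
                line.stop();
            } finally {
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        play(new File(args[0])); // path to a WAVE, AU or AIFF file
    }
}
```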
To play back or record using non-standard extensions to the Java Sound API, an additional intermediate step is needed to convert between the vendor encoding and PCM encoding.
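The decoding step might look like the sketch below. It asks AudioSystem to convert an encoded stream to 16-bit signed PCM and relies on whichever installed provider supports that conversion. The names PcmDecoder, pcmTargetFor, and decode are illustrative; the demo in main uses the built-in µ-law codec as a stand-in, since formats such as MP3 would need the corresponding SPI plug-in on the classpath.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;

class PcmDecoder {
    // Derives a 16-bit signed little-endian PCM format with the same
    // sample rate and channel count as the encoded source format.
    static AudioFormat pcmTargetFor(AudioFormat source) {
        int channels = source.getChannels();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                source.getSampleRate(), 16, channels, channels * 2,
                source.getSampleRate(), false);
    }

    // Returns a stream that yields PCM data decoded from `encoded`.
    // Throws IllegalArgumentException if no provider supports the conversion.
    static AudioInputStream decode(AudioInputStream encoded) {
        return AudioSystem.getAudioInputStream(pcmTargetFor(encoded.getFormat()), encoded);
    }

    public static void main(String[] args) {
        // One second of 8 kHz mono mu-law data, held in memory for the demo.
        AudioFormat ulaw = new AudioFormat(
                AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioInputStream encoded = new AudioInputStream(
                new ByteArrayInputStream(new byte[8000]), ulaw, 8000);
        System.out.println(decode(encoded).getFormat());
    }
}
```

The decoded stream can then be written to a source-dataline in the same way as a stream read from a standard PCM file.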
From Java 1.5 onwards, additional metadata can be attached to an audio format or audio file format as a set of key-value (String-Object) pairs. This is optional, and Java Sound service providers may not honor it.
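A minimal sketch of attaching and reading such a property on an AudioFormat. The class and method names are invented for the example; "bitrate" is one of the property keys documented for AudioFormat, and whether a given provider fills it in is not guaranteed.

```java
import javax.sound.sampled.AudioFormat;
import java.util.HashMap;
import java.util.Map;

class FormatProperties {
    // Builds a CD-quality PCM format carrying a "bitrate" property.
    static AudioFormat withBitrate(int bitsPerSecond) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("bitrate", bitsPerSecond);
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                44100f, 16, 2, 4, 44100f, false, properties);
    }

    public static void main(String[] args) {
        AudioFormat format = withBitrate(1411200);
        // getProperty returns null for keys that were never set.
        System.out.println("bitrate = " + format.getProperty("bitrate"));
        System.out.println("quality = " + format.getProperty("quality"));
    }
}
```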
To read from or write to local files, applets have to be granted permission in one of the two ways below:
- Option A: install the permission by adding grant declarations to the ~JAVAHOME/lib/security/java.policy file.
- Option B: install the permission by digitally signing the applet. (The user is expected to accept a security dialog that pops up when the applet runs.)
Option A is not possible when the applet caters to unknown users browsing the internet.
Option B requires buying an RSA code-signing certificate from a security vendor such as Thawte or VeriSign.
Non-standard format service provider implementations also have to be registered with the JRE by copying the SPI archives into ~JAVAHOME/lib/ext.
The recording API can be integrated with a web browser using any client-side computing facility, such as Java applets or Microsoft ActiveX. Client-side code is needed to interact with the sound-input port (microphone port) on the local machine. For security reasons, an applet that records from the microphone must be digitally signed and accepted by the user.
The user interacts through a web browser such as Internet Explorer with a Java Runtime Environment supporting Java 1.3 or higher. The user requests a recording page from the server at a specific URL. The server returns a web page with an embedded recording applet. The user starts recording by clicking the “record” button. The applet then listens to the sound input (microphone) until the user ends the recording by clicking the “stop” button.
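The record/stop interaction can be sketched as a copy loop that runs on a background thread until a flag is cleared. StoppableCopy is a hypothetical helper, not part of the Java Sound API. In a recording applet the source would typically be an AudioInputStream wrapped around the opened target-dataline, and the sink a stream towards a file or the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

class StoppableCopy implements Runnable {
    private final InputStream source;
    private final OutputStream sink;
    private final AtomicBoolean running = new AtomicBoolean(true);

    StoppableCopy(InputStream source, OutputStream sink) {
        this.source = source;
        this.sink = sink;
    }

    // Called from the "stop" button handler; the loop ends after the
    // read that is currently in progress.
    void stop() {
        running.set(false);
    }

    // Called on a background thread from the "record" button handler.
    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int read;
            while (running.get() && (read = source.read(buffer)) != -1) {
                sink.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A typical use would be `new Thread(copy).start()` when the user clicks “record” and `copy.stop()` when the user clicks “stop”, followed by stopping and closing the line.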
The following sequence diagram illustrates the recording process at a very high level. (NOTE: The process of the server archiving the sound stream into a file or database is not depicted here.)
Where do we use Voice Based Solutions?
Voice-based solutions can be used in applications such as:
- Recording the user’s voice and playing it back when the user requests it.
- Recording a person’s legal statement and playing it back when there is a legal requirement.
- Recording and playing back telephone conversations.
- Speech recognition systems.
- Software that aids visually impaired people.
- Voice-based knowledge-imparting software.
The Java Sound API is much more robust than the audio support in earlier Java releases and gives greater control over the audio. Another advantage is the ability to manipulate the individual data streams. Earlier, one needed access to the entire sound clip before a sound could be played. Now the sound can be buffered and read using any kind of producer/consumer scheme, which opens the way to network and streaming audio.
http://kbs.cs.tu-berlin.de/~jutta/toast.html
http://www.tritonus.org/plugins.html




