Commit 2a0c2f3

feat(tts): add method parameters
Add parameter `spellOutMode` to `synthesize`
1 parent cdcc228 commit 2a0c2f3

4 files changed: +191 -209 lines changed

text-to-speech/src/main/java/com/ibm/watson/text_to_speech/v1/TextToSpeech.java

Lines changed: 81 additions & 93 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
/*
2-
* (C) Copyright IBM Corp. 2019, 2022.
2+
* (C) Copyright IBM Corp. 2022.
33
*
44
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
55
* the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@
1212
*/
1313

1414
/*
15-
* IBM OpenAPI SDK Code Generator Version: 3.46.0-a4e29da0-20220224-210428
15+
* IBM OpenAPI SDK Code Generator Version: 3.53.0-9710cac3-20220713-193508
1616
*/
1717

1818
package com.ibm.watson.text_to_speech.v1;
@@ -62,17 +62,11 @@
6262
import com.ibm.watson.text_to_speech.v1.model.Voice;
6363
import com.ibm.watson.text_to_speech.v1.model.Voices;
6464
import com.ibm.watson.text_to_speech.v1.model.Words;
65-
import com.ibm.watson.text_to_speech.v1.websocket.SynthesizeCallback;
66-
import com.ibm.watson.text_to_speech.v1.websocket.TextToSpeechWebSocketListener;
6765
import java.io.InputStream;
6866
import java.util.HashMap;
6967
import java.util.Map;
7068
import java.util.Map.Entry;
71-
import okhttp3.HttpUrl;
7269
import okhttp3.MultipartBody;
73-
import okhttp3.OkHttpClient;
74-
import okhttp3.Request;
75-
import okhttp3.WebSocket;
7670

7771
/**
7872
* The IBM Watson™ Text to Speech service provides APIs that use IBM's speech-synthesis
@@ -98,6 +92,15 @@
9892
* also define speaker models to improve the quality of your custom prompts. The service support
9993
* custom prompts only for US English custom models and voices.
10094
*
95+
* <p>Effective 31 March 2022, all neural voices are deprecated. The deprecated voices remain
96+
* available to existing users until 31 March 2023, when they will be removed from the service and
97+
* the documentation. The neural voices are supported only for IBM Cloud; they are not available for
98+
* IBM Cloud Pak for Data. All enhanced neural voices remain available to all users. For more
99+
* information, see the [31 March 2022 service
100+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
101+
* in the release notes for {{site.data.keyword.texttospeechshort}} for
102+
* {{site.data.keyword.cloud_notm}}.{: deprecated}.
103+
*
101104
* <p>API Version: 1.0.0 See: https://cloud.ibm.com/docs/text-to-speech
102105
*/
103106
public class TextToSpeech extends BaseService {
@@ -158,6 +161,14 @@ public TextToSpeech(String serviceName, Authenticator authenticator) {
158161
* change from call to call; do not rely on an alphabetized or static list of voices. To see
159162
* information about a specific voice, use the [Get a voice](#getvoice).
160163
*
164+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
165+
* remain available to existing users until 31 March 2023, when they will be removed from the
166+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
167+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
168+
* For more information, see the [31 March 2022 service
169+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
170+
* in the release notes.
171+
*
161172
* <p>**See also:** [Listing all available
162173
* voices](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoices).
163174
*
@@ -186,6 +197,14 @@ public ServiceCall<Voices> listVoices(ListVoicesOptions listVoicesOptions) {
186197
* change from call to call; do not rely on an alphabetized or static list of voices. To see
187198
* information about a specific voice, use the [Get a voice](#getvoice).
188199
*
200+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
201+
* remain available to existing users until 31 March 2023, when they will be removed from the
202+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
203+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
204+
* For more information, see the [31 March 2022 service
205+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
206+
* in the release notes.
207+
*
189208
* <p>**See also:** [Listing all available
190209
* voices](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoices).
191210
*
@@ -206,10 +225,13 @@ public ServiceCall<Voices> listVoices() {
206225
* <p>**See also:** [Listing a specific
207226
* voice](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoice).
208227
*
209-
* <p>**Note:** The Arabic, Chinese, Czech, Dutch (Belgian and Netherlands), Australian English,
210-
* Korean, and Swedish languages and voices are supported only for IBM Cloud; they are deprecated
211-
* for IBM Cloud Pak for Data. Also, the `ar-AR_OmarVoice` voice is deprecated; use the
212-
* `ar-MS_OmarVoice` voice instead.
228+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
229+
* remain available to existing users until 31 March 2023, when they will be removed from the
230+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
231+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
232+
* For more information, see the [31 March 2022 service
233+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
234+
* in the release notes.
213235
*
214236
* @param getVoiceOptions the {@link GetVoiceOptions} containing the options for the call
215237
* @return a {@link ServiceCall} with a result of type {@link Voice}
@@ -250,41 +272,45 @@ public ServiceCall<Voice> getVoice(GetVoiceOptions getVoiceOptions) {
250272
* <p>**See also:** [The HTTP
251273
* interface](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-usingHTTP#usingHTTP).
252274
*
253-
* <p>**Note:** The Arabic, Chinese, Czech, Dutch (Belgian and Netherlands), Australian English,
254-
* Korean, and Swedish languages and voices are supported only for IBM Cloud; they are deprecated
255-
* for IBM Cloud Pak for Data. Also, the `ar-AR_OmarVoice` voice is deprecated; use the
256-
* `ar-MS_OmarVoice` voice instead.
275+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
276+
* remain available to existing users until 31 March 2023, when they will be removed from the
277+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
278+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
279+
* For more information, see the [31 March 2022 service
280+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
281+
* in the release notes.
257282
*
258283
* <p>### Audio formats (accept types)
259284
*
260285
* <p>The service can return audio in the following formats (MIME types). * Where indicated, you
261286
* can optionally specify the sampling rate (`rate`) of the audio. You must specify a sampling
262-
* rate for the `audio/l16` and `audio/mulaw` formats. A specified sampling rate must lie in the
263-
* range of 8 kHz to 192 kHz. Some formats restrict the sampling rate to certain values, as noted.
264-
* * For the `audio/l16` format, you can optionally specify the endianness (`endianness`) of the
265-
* audio: `endianness=big-endian` or `endianness=little-endian`.
287+
* rate for the `audio/alaw`, `audio/l16`, and `audio/mulaw` formats. A specified sampling rate
288+
* must lie in the range of 8 kHz to 192 kHz. Some formats restrict the sampling rate to certain
289+
* values, as noted. * For the `audio/l16` format, you can optionally specify the endianness
290+
* (`endianness`) of the audio: `endianness=big-endian` or `endianness=little-endian`.
266291
*
267292
* <p>Use the `Accept` header or the `accept` parameter to specify the requested format of the
268293
* response audio. If you omit an audio format altogether, the service returns the audio in Ogg
269294
* format with the Opus codec (`audio/ogg;codecs=opus`). The service always returns single-channel
270-
* audio. * `audio/basic` - The service returns audio with a sampling rate of 8000 Hz. *
271-
* `audio/flac` - You can optionally specify the `rate` of the audio. The default sampling rate is
272-
* 22,050 Hz. * `audio/l16` - You must specify the `rate` of the audio. You can optionally specify
273-
* the `endianness` of the audio. The default endianness is `little-endian`. * `audio/mp3` - You
274-
* can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz. *
275-
* `audio/mpeg` - You can optionally specify the `rate` of the audio. The default sampling rate is
276-
* 22,050 Hz. * `audio/mulaw` - You must specify the `rate` of the audio. * `audio/ogg` - The
277-
* service returns the audio in the `vorbis` codec. You can optionally specify the `rate` of the
278-
* audio. The default sampling rate is 22,050 Hz. * `audio/ogg;codecs=opus` - You can optionally
279-
* specify the `rate` of the audio. Only the following values are valid sampling rates: `48000`,
280-
* `24000`, `16000`, `12000`, or `8000`. If you specify a value other than one of these, the
281-
* service returns an error. The default sampling rate is 48,000 Hz. * `audio/ogg;codecs=vorbis` -
295+
* audio. * `audio/alaw` - You must specify the `rate` of the audio. * `audio/basic` - The service
296+
* returns audio with a sampling rate of 8000 Hz. * `audio/flac` - You can optionally specify the
297+
* `rate` of the audio. The default sampling rate is 22,050 Hz. * `audio/l16` - You must specify
298+
* the `rate` of the audio. You can optionally specify the `endianness` of the audio. The default
299+
* endianness is `little-endian`. * `audio/mp3` - You can optionally specify the `rate` of the
300+
* audio. The default sampling rate is 22,050 Hz. * `audio/mpeg` - You can optionally specify the
301+
* `rate` of the audio. The default sampling rate is 22,050 Hz. * `audio/mulaw` - You must specify
302+
* the `rate` of the audio. * `audio/ogg` - The service returns the audio in the `vorbis` codec.
282303
* You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz. *
283-
* `audio/wav` - You can optionally specify the `rate` of the audio. The default sampling rate is
284-
* 22,050 Hz. * `audio/webm` - The service returns the audio in the `opus` codec. The service
285-
* returns audio with a sampling rate of 48,000 Hz. * `audio/webm;codecs=opus` - The service
286-
* returns audio with a sampling rate of 48,000 Hz. * `audio/webm;codecs=vorbis` - You can
287-
* optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
304+
* `audio/ogg;codecs=opus` - You can optionally specify the `rate` of the audio. Only the
305+
* following values are valid sampling rates: `48000`, `24000`, `16000`, `12000`, or `8000`. If
306+
* you specify a value other than one of these, the service returns an error. The default sampling
307+
* rate is 48,000 Hz. * `audio/ogg;codecs=vorbis` - You can optionally specify the `rate` of the
308+
* audio. The default sampling rate is 22,050 Hz. * `audio/wav` - You can optionally specify the
309+
* `rate` of the audio. The default sampling rate is 22,050 Hz. * `audio/webm` - The service
310+
* returns the audio in the `opus` codec. The service returns audio with a sampling rate of 48,000
311+
* Hz. * `audio/webm;codecs=opus` - The service returns audio with a sampling rate of 48,000 Hz. *
312+
* `audio/webm;codecs=vorbis` - You can optionally specify the `rate` of the audio. The default
313+
* sampling rate is 22,050 Hz.
288314
*
289315
* <p>For more information about specifying an audio format, including additional details about
290316
* some of the formats, see [Using audio
@@ -319,60 +345,16 @@ public ServiceCall<InputStream> synthesize(SynthesizeOptions synthesizeOptions)
319345
if (synthesizeOptions.customizationId() != null) {
320346
builder.query("customization_id", String.valueOf(synthesizeOptions.customizationId()));
321347
}
348+
if (synthesizeOptions.spellOutMode() != null) {
349+
builder.query("spell_out_mode", String.valueOf(synthesizeOptions.spellOutMode()));
350+
}
322351
final JsonObject contentJson = new JsonObject();
323352
contentJson.addProperty("text", synthesizeOptions.text());
324353
builder.bodyJson(contentJson);
325354
ResponseConverter<InputStream> responseConverter = ResponseConverterUtils.getInputStream();
326355
return createServiceCall(builder.build(), responseConverter);
327356
}
328357

329-
/**
330-
* Synthesize audio.
331-
*
332-
* <p>Synthesizes text to audio that is spoken in the specified voice. The service bases its
333-
* understanding of the language for the input text on the specified voice. Use a voice that
334-
* matches the language of the input text.
335-
*
336-
* <p>The method accepts a maximum of 5 KB of input text in the body of the request, and 8 KB for
337-
* the URL and headers. The 5 KB limit includes any SSML tags that you specify. The service
338-
* returns the synthesized audio stream as an array of bytes.
339-
*
340-
* <p>### Audio formats (accept types)
341-
*
342-
* <p>For more information about specifying an audio format, including additional details about
343-
* some of the formats, see [Audio
344-
* formats](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-audioFormats#audioFormats).
345-
*
346-
* @param synthesizeOptions the {@link SynthesizeOptions} containing the options for the call
347-
* @param callback the {@link SynthesizeCallback} callback
348-
* @return a {@link WebSocket} instance
349-
*/
350-
public WebSocket synthesizeUsingWebSocket(
351-
SynthesizeOptions synthesizeOptions, SynthesizeCallback callback) {
352-
com.ibm.cloud.sdk.core.util.Validator.notNull(
353-
synthesizeOptions, "synthesizeOptions cannot be null");
354-
com.ibm.cloud.sdk.core.util.Validator.notNull(callback, "callback cannot be null");
355-
356-
HttpUrl.Builder urlBuilder = HttpUrl.parse(getServiceUrl() + "/v1/synthesize").newBuilder();
357-
358-
if (synthesizeOptions.voice() != null) {
359-
urlBuilder.addQueryParameter("voice", synthesizeOptions.voice());
360-
}
361-
if (synthesizeOptions.customizationId() != null) {
362-
urlBuilder.addQueryParameter("customization_id", synthesizeOptions.customizationId());
363-
}
364-
365-
String url = urlBuilder.toString().replace("https://", "wss://");
366-
Request.Builder builder = new Request.Builder().url(url);
367-
368-
setAuthentication(builder);
369-
setDefaultHeaders(builder);
370-
371-
OkHttpClient client = configureHttpClient();
372-
return client.newWebSocket(
373-
builder.build(), new TextToSpeechWebSocketListener(synthesizeOptions, callback));
374-
}
375-
376358
/**
377359
* Get pronunciation.
378360
*
@@ -381,14 +363,17 @@ public WebSocket synthesizeUsingWebSocket(
381363
* default translation for the language of that voice or for a specific custom model to see the
382364
* translation for that model.
383365
*
366+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
367+
* remain available to existing users until 31 March 2023, when they will be removed from the
368+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
369+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
370+
* For more information, see the [31 March 2022 service
371+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
372+
* in the release notes.
373+
*
384374
* <p>**See also:** [Querying a word from a
385375
* language](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordsQueryLanguage).
386376
*
387-
* <p>**Note:** The Arabic, Chinese, Czech, Dutch (Belgian and Netherlands), Australian English,
388-
* Korean, and Swedish languages and voices are supported only for IBM Cloud; they are deprecated
389-
* for IBM Cloud Pak for Data. Also, the `ar-AR_OmarVoice` voice is deprecated; use the
390-
* `ar-MS_OmarVoice` voice instead.
391-
*
392377
* @param getPronunciationOptions the {@link GetPronunciationOptions} containing the options for
393378
* the call
394379
* @return a {@link ServiceCall} with a result of type {@link Pronunciation}
@@ -431,10 +416,13 @@ public ServiceCall<Pronunciation> getPronunciation(
431416
* <p>**See also:** [Creating a custom
432417
* model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsCreate).
433418
*
434-
* <p>**Note:** The Arabic, Chinese, Czech, Dutch (Belgian and Netherlands), Australian English,
435-
* Korean, and Swedish languages and voices are supported only for IBM Cloud; they are deprecated
436-
* for IBM Cloud Pak for Data. Also, the `ar-AR` language identifier cannot be used to create a
437-
* custom model; use the `ar-MS` identifier instead.
419+
* <p>**Note:** Effective 31 March 2022, all neural voices are deprecated. The deprecated voices
420+
* remain available to existing users until 31 March 2023, when they will be removed from the
421+
* service and the documentation. The neural voices are supported only for IBM Cloud; they are not
422+
* available for IBM Cloud Pak for Data. All enhanced neural voices remain available to all users.
423+
* For more information, see the [31 March 2022 service
424+
* update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#text-to-speech-31march2022)
425+
* in the release notes.
438426
*
439427
* @param createCustomModelOptions the {@link CreateCustomModelOptions} containing the options for
440428
* the call
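
Review note: the `synthesize` hunk above adds `spell_out_mode` to the request only when `synthesizeOptions.spellOutMode()` is non-null, the same guard already used for `customization_id`. The following is a minimal, self-contained sketch of that pattern, not the SDK's implementation. The class `SynthesizeQuerySketch`, its `buildQuery` method, and the example value `singles` are assumptions made for illustration; check the service documentation for the values that `spell_out_mode` actually accepts.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the SDK's request builder; not part of the Watson SDK.
final class SynthesizeQuerySketch {

  /** Builds a query string for /v1/synthesize, skipping every parameter that is null. */
  static String buildQuery(String voice, String customizationId, String spellOutMode) {
    // LinkedHashMap keeps insertion order and permits null values.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("voice", voice);
    params.put("customization_id", customizationId);
    params.put("spell_out_mode", spellOutMode);

    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> entry : params.entrySet()) {
      // Same null guard as in the diff: unset options never reach the URL.
      if (entry.getValue() != null) {
        query.add(
            entry.getKey() + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
      }
    }
    return query.toString();
  }

  public static void main(String[] args) {
    // "singles" is an assumed example value for spell_out_mode.
    System.out.println(buildQuery("en-US_MichaelV3Voice", null, "singles"));
    System.out.println(buildQuery("en-US_MichaelV3Voice", null, null));
  }
}
```

Because the parameter is omitted when unset, callers that never set `spellOutMode` send the same request as before this commit.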

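Review note: the updated Javadoc for `synthesize` states that a sampling rate is required for `audio/alaw`, `audio/l16`, and `audio/mulaw`, that a specified rate must lie between 8 kHz and 192 kHz, and that `audio/ogg;codecs=opus` accepts only five rates. Those rules could be checked on the client before a request is sent, roughly as sketched below. The class `AcceptTypeSketch` and its `acceptType` method are hypothetical, and the `;rate=` suffix syntax is an assumption modeled on the `audio/ogg;codecs=opus` form quoted in the Javadoc. The sketch does not handle `endianness` and does not restrict the formats that the Javadoc describes with a fixed rate.

```java
import java.util.Set;

// Hypothetical helper for illustration; not part of the Watson SDK.
final class AcceptTypeSketch {

  // Formats for which the Javadoc says a sampling rate must be specified.
  private static final Set<String> RATE_REQUIRED =
      Set.of("audio/alaw", "audio/l16", "audio/mulaw");

  // The only sampling rates the Javadoc lists as valid for Ogg with the Opus codec.
  private static final Set<Integer> OPUS_RATES = Set.of(48000, 24000, 16000, 12000, 8000);

  /** Returns the accept value, or throws if the rate rules from the Javadoc are violated. */
  static String acceptType(String format, Integer rateHz) {
    if (rateHz == null) {
      if (RATE_REQUIRED.contains(format)) {
        throw new IllegalArgumentException(format + " requires a sampling rate");
      }
      return format;
    }
    if (rateHz < 8000 || rateHz > 192000) {
      throw new IllegalArgumentException("rate must be between 8000 and 192000 Hz");
    }
    if (format.equals("audio/ogg;codecs=opus") && !OPUS_RATES.contains(rateHz)) {
      throw new IllegalArgumentException("unsupported Opus sampling rate: " + rateHz);
    }
    // Assumed parameter syntax: MIME type followed by ";rate=<Hz>".
    return format + ";rate=" + rateHz;
  }

  public static void main(String[] args) {
    System.out.println(acceptType("audio/l16", 22050));
    System.out.println(acceptType("audio/wav", null));
  }
}
```

Failing fast on the client only mirrors the documented rules; the service remains the authority on which combinations it accepts.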