Commit a8d2208

docs: add TTS migration guide for 1.1.0-RC1
- Add comprehensive migration guide with find/replace patterns
- Include side-by-side code examples (old vs new API)
- Add provider-agnostic usage examples and best practices
- Document breaking changes in 1.1.0-RC1 upgrade notes
1 parent 2b195b3 commit a8d2208

4 files changed, +630 -11 lines changed

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/audio/speech.adoc

Lines changed: 214 additions & 1 deletion
@@ -87,7 +87,11 @@ byte[] audioBytes = response.getResult().getOutput();
 TextToSpeechResponseMetadata metadata = response.getMetadata();
 ----
 
-== Writing Portable Code
+== Writing Provider-Agnostic Code
+
+One of the key benefits of the shared TTS interfaces is the ability to write code that works with any TTS provider without modification. The actual provider (OpenAI, ElevenLabs, etc.) is determined by your Spring Boot configuration, allowing you to switch providers without changing application code.
+
+=== Basic Service Example
 
 The shared interfaces allow you to write code that works with any TTS provider:
 
@@ -103,6 +107,7 @@ public class NarrationService {
     }
 
     public byte[] narrate(String text) {
+        // Works with any TTS provider
         return textToSpeechModel.call(text);
     }
 
@@ -116,6 +121,214 @@ public class NarrationService {
 
 This service works seamlessly with OpenAI, ElevenLabs, or any other TTS provider, with the actual implementation determined by your Spring Boot configuration.
 
+=== Advanced Example: Multi-Provider Support
+
+You can build applications that support multiple TTS providers simultaneously:
+
+[source,java]
+----
+@Service
+public class MultiProviderNarrationService {
+
+    private final Map<String, TextToSpeechModel> providers;
+
+    public MultiProviderNarrationService(List<TextToSpeechModel> models) {
+        // Spring will inject all available TextToSpeechModel beans
+        this.providers = models.stream()
+            .collect(Collectors.toMap(
+                model -> model.getClass().getSimpleName(),
+                model -> model
+            ));
+    }
+
+    public byte[] narrateWithProvider(String text, String providerName) {
+        TextToSpeechModel model = providers.get(providerName);
+        if (model == null) {
+            throw new IllegalArgumentException("Unknown provider: " + providerName);
+        }
+        return model.call(text);
+    }
+
+    public Set<String> getAvailableProviders() {
+        return providers.keySet();
+    }
+}
+----
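Keying the map by `getClass().getSimpleName()` has two side effects worth noting: the two-argument `Collectors.toMap` throws `IllegalStateException` when two beans share a simple class name, and callers have to know implementation class names. A minimal sketch of a registry keyed by explicit names follows. `SpeechModel` and `ProviderRegistry` are hypothetical stand-ins (not Spring AI types) so that the snippet compiles on its own; in a Spring application, injecting `Map<String, TextToSpeechModel>` yields a map keyed by bean name and may be the simpler route.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for TextToSpeechModel, reduced to the single method used here.
interface SpeechModel {
    byte[] call(String text);
}

// Registry keyed by explicit, caller-chosen names rather than implementation class names.
final class ProviderRegistry {

    private final Map<String, SpeechModel> providers = new LinkedHashMap<>();

    // Fails fast on a duplicate name instead of silently replacing a provider.
    ProviderRegistry register(String name, SpeechModel model) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(model, "model");
        if (providers.putIfAbsent(name, model) != null) {
            throw new IllegalStateException("Duplicate provider name: " + name);
        }
        return this;
    }

    byte[] narrate(String text, String providerName) {
        SpeechModel model = providers.get(providerName);
        if (model == null) {
            throw new IllegalArgumentException("Unknown provider: " + providerName);
        }
        return model.call(text);
    }

    // Immutable snapshot, so callers cannot remove providers through the returned set.
    Set<String> availableProviders() {
        return Set.copyOf(providers.keySet());
    }
}
```

Returning a copy from `availableProviders()` is a deliberate difference from the example above, where `providers.keySet()` exposes a live view of the underlying map.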
+
+=== Streaming Audio Example
+
+The shared interfaces also support streaming for real-time audio generation:
+
+[source,java]
+----
+@Service
+public class StreamingNarrationService {
+
+    private final TextToSpeechModel textToSpeechModel;
+
+    public StreamingNarrationService(TextToSpeechModel textToSpeechModel) {
+        this.textToSpeechModel = textToSpeechModel;
+    }
+
+    public Flux<byte[]> streamNarration(String text) {
+        // TextToSpeechModel extends StreamingTextToSpeechModel
+        return textToSpeechModel.stream(text);
+    }
+
+    public Flux<TextToSpeechResponse> streamWithMetadata(String text, TextToSpeechOptions options) {
+        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, options);
+        return textToSpeechModel.stream(prompt);
+    }
+}
+----
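When a caller needs the complete clip rather than incremental playback, the streamed chunks have to be concatenated in arrival order. The sketch below shows only that step, using plain JDK types. `AudioChunks.join` is a hypothetical helper, and the `List<byte[]>` stands in for chunks already gathered from the stream (for example via Reactor's `Flux.collectList()`).

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper: concatenates audio chunks in the order they were received.
final class AudioChunks {

    private AudioChunks() {
    }

    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            // This overload declares no checked exception on ByteArrayOutputStream.
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Buffering the whole clip gives up the latency benefit of streaming, so this probably fits only callers that need the full audio, such as writing it to a file.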
+
+=== REST Controller Example
+
+Building a REST API with provider-agnostic TTS:
+
+[source,java]
+----
+@RestController
+@RequestMapping("/api/tts")
+public class TextToSpeechController {
+
+    private final TextToSpeechModel textToSpeechModel;
+
+    public TextToSpeechController(TextToSpeechModel textToSpeechModel) {
+        this.textToSpeechModel = textToSpeechModel;
+    }
+
+    @PostMapping(value = "/synthesize", produces = "audio/mpeg")
+    public ResponseEntity<byte[]> synthesize(@RequestBody SynthesisRequest request) {
+        byte[] audio = textToSpeechModel.call(request.text());
+        return ResponseEntity.ok()
+            .contentType(MediaType.parseMediaType("audio/mpeg"))
+            .header("Content-Disposition", "attachment; filename=\"speech.mp3\"")
+            .body(audio);
+    }
+
+    @GetMapping(value = "/stream", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
+    public Flux<byte[]> streamSynthesis(@RequestParam String text) {
+        return textToSpeechModel.stream(text);
+    }
+
+    record SynthesisRequest(String text) {}
+}
+----
+
+=== Configuration-Based Provider Selection
+
+Switch between providers using Spring profiles or properties:
+
+[source,yaml]
+----
+# application-openai.yml
+spring:
+  ai:
+    model:
+      audio:
+        speech: openai
+    openai:
+      api-key: ${OPENAI_API_KEY}
+      audio:
+        speech:
+          options:
+            model: gpt-4o-mini-tts
+            voice: alloy
+
+# application-elevenlabs.yml
+spring:
+  ai:
+    model:
+      audio:
+        speech: elevenlabs
+    elevenlabs:
+      api-key: ${ELEVENLABS_API_KEY}
+      tts:
+        options:
+          model-id: eleven_turbo_v2_5
+          voice-id: your_voice_id
+----
+
+Then activate the desired provider:
+[source,bash]
+----
+# Use OpenAI
+java -jar app.jar --spring.profiles.active=openai
+
+# Use ElevenLabs
+java -jar app.jar --spring.profiles.active=elevenlabs
+----
+
+=== Using Portable Options
+
+For maximum portability, use only the common `TextToSpeechOptions` interface methods:
+
+[source,java]
+----
+@Service
+public class PortableNarrationService {
+
+    private final TextToSpeechModel textToSpeechModel;
+
+    public PortableNarrationService(TextToSpeechModel textToSpeechModel) {
+        this.textToSpeechModel = textToSpeechModel;
+    }
+
+    public byte[] createPortableNarration(String text) {
+        // Use provider's default options for maximum portability
+        TextToSpeechOptions defaultOptions = textToSpeechModel.getDefaultOptions();
+        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, defaultOptions);
+        TextToSpeechResponse response = textToSpeechModel.call(prompt);
+        return response.getResult().getOutput();
+    }
+}
+----
+
+=== Working with Provider-Specific Features
+
+When you need provider-specific features, you can still use them while maintaining a portable codebase:
+
+[source,java]
+----
+@Service
+public class FlexibleNarrationService {
+
+    private final TextToSpeechModel textToSpeechModel;
+
+    public FlexibleNarrationService(TextToSpeechModel textToSpeechModel) {
+        this.textToSpeechModel = textToSpeechModel;
+    }
+
+    public byte[] narrate(String text, TextToSpeechOptions baseOptions) {
+        TextToSpeechOptions options = baseOptions;
+
+        // Apply provider-specific optimizations if available
+        if (textToSpeechModel instanceof OpenAiAudioSpeechModel) {
+            options = OpenAiAudioSpeechOptions.builder()
+                .from(baseOptions)
+                .model("tts-1-hd") // OpenAI-specific: use high-quality model
+                .speed(1.0)
+                .build();
+        } else if (textToSpeechModel instanceof ElevenLabsTextToSpeechModel) {
+            // ElevenLabs-specific options could go here
+        }
+
+        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, options);
+        TextToSpeechResponse response = textToSpeechModel.call(prompt);
+        return response.getResult().getOutput();
+    }
+}
+----
+
+=== Best Practices for Portable Code
+
+1. **Depend on Interfaces**: Always inject `TextToSpeechModel` rather than concrete implementations
+2. **Use Common Options**: Stick to `TextToSpeechOptions` interface methods for maximum portability
+3. **Handle Metadata Gracefully**: Different providers return different metadata; handle it generically
+4. **Test with Multiple Providers**: Ensure your code works with at least two TTS providers
+5. **Document Provider Assumptions**: If you rely on specific provider behavior, document it clearly
+
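Practice 4 can be partly automated by running one check against every configured provider. The following is a minimal sketch under stated assumptions: `ProviderContract` is a hypothetical helper, each provider is represented as a plain `Function<String, byte[]>` (a method reference such as `model::call` could plausibly fill that role), and the only property checked, non-empty audio for a sample sentence, is an assumed baseline rather than a documented contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical check: each provider is represented as a text-to-audio function.
final class ProviderContract {

    private ProviderContract() {
    }

    // Returns the names of providers that threw or produced no audio for the sample text.
    static List<String> failingProviders(Map<String, Function<String, byte[]>> providers, String sampleText) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Function<String, byte[]>> entry : providers.entrySet()) {
            try {
                byte[] audio = entry.getValue().apply(sampleText);
                if (audio == null || audio.length == 0) {
                    failures.add(entry.getKey());
                }
            } catch (RuntimeException e) {
                // A provider that throws is reported the same way as one that returns nothing.
                failures.add(entry.getKey());
            }
        }
        return failures;
    }
}
```

An empty result means every provider passed the baseline; it says nothing about audio quality or format, which still need provider-specific tests.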
 == Provider-Specific Features
 
 While the shared interfaces provide portability, each provider also offers specific features through provider-specific options classes (e.g., `OpenAiAudioSpeechOptions`, `ElevenLabsTextToSpeechOptions`). These classes implement the `TextToSpeechOptions` interface while adding provider-specific capabilities.
