
Commit fe9856d

Merge remote-tracking branch 'upstream/main' into add-cohere-provider
2 parents: 3aeaa5f + 701d5d0

187 files changed: 16,739 additions, 14,211 deletions


.github/workflows/cicd.yml

Lines changed: 6 additions & 1 deletion
@@ -35,12 +35,17 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        ruby-version: ['3.1', '3.2', '3.3', '3.4']
+        ruby-version: ['3.1', '3.2', '3.3', '3.4', 'jruby-head']
         rails-version: ['rails-7.1', 'rails-7.2', 'rails-8.0']
         exclude:
           # Rails 8 requires Ruby 3.2+
           - ruby-version: '3.1'
             rails-version: 'rails-8.0'
+          # JRuby only supports up to 7.1 right now
+          - ruby-version: 'jruby-head'
+            rails-version: 'rails-8.0'
+          - ruby-version: 'jruby-head'
+            rails-version: 'rails-7.2'
 
     steps:
       - uses: actions/checkout@v4

Gemfile

Lines changed: 6 additions & 1 deletion
@@ -24,7 +24,12 @@ group :development do
   gem 'rubocop-rspec'
   gem 'simplecov', '>= 0.21'
   gem 'simplecov-cobertura'
-  gem 'sqlite3'
+
+  # database drivers for MRI and JRuby
+  gem 'activerecord-jdbcsqlite3-adapter', platform: 'jruby'
+  gem 'jdbc-sqlite3', platform: 'jruby'
+  gem 'sqlite3', platform: 'mri'
+
   gem 'vcr'
   gem 'webmock', '~> 3.18'
   gem 'yard', '>= 0.9'

bin/console

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ require 'irb'
 
 RubyLLM.configure do |config|
   config.openai_api_key = ENV.fetch('OPENAI_API_KEY', nil)
+  config.openai_api_base = ENV.fetch('OPENAI_API_BASE', nil)
   config.anthropic_api_key = ENV.fetch('ANTHROPIC_API_KEY', nil)
   config.gemini_api_key = ENV.fetch('GEMINI_API_KEY', nil)
   config.deepseek_api_key = ENV.fetch('DEEPSEEK_API_KEY', nil)

docs/_config.yml

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ nav_external_links:
   - title: GitHub
     url: https://github.com/crmne/ruby_llm
     hide_icon: false
+  - title: Blog
+    url: https://paolino.me
+    hide_icon: false
 
 # Footer content
 footer_content: "Copyright &copy; 2025 <a href='https://paolino.me'>Carmine Paolino</a>. Distributed under an <a href=\"https://github.com/crmne/ruby_llm/tree/main/LICENSE\">MIT license.</a>"

docs/configuration.md

Lines changed: 5 additions & 1 deletion
@@ -107,9 +107,13 @@ Set the corresponding `*_api_key` attribute for each provider you want to enable
 * `deepseek_api_key`
 * `cohere_api_key`
 * `openrouter_api_key`
-* `ollama_api_base`
 * `bedrock_api_key`, `bedrock_secret_key`, `bedrock_region`, `bedrock_session_token` (See AWS documentation for standard credential methods if not set explicitly).
 
+## Ollama API Base (`ollama_api_base`)
+
+When using a local model running via Ollama, set the `ollama_api_base` to the URL of your Ollama server, e.g. `http://localhost:11434/v1`
+
 ## Custom OpenAI API Base (`openai_api_base`)
 {: .d-inline-block }
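
For reference, here is the Ollama setting from the added section as a minimal configuration sketch (the URL assumes Ollama's default local port; adjust it to match your setup):

```ruby
RubyLLM.configure do |config|
  # Point RubyLLM at a locally running Ollama server
  config.ollama_api_base = 'http://localhost:11434/v1'
end
```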

docs/guides/async.md

Lines changed: 281 additions & 0 deletions
@@ -0,0 +1,281 @@
---
layout: default
title: Scale with Async
parent: Guides
nav_order: 10
permalink: /guides/async
---

# Scale with Async
{: .no_toc }

This guide covers using RubyLLM with Ruby's async ecosystem for handling thousands of concurrent AI conversations efficiently.
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* Why LLM applications benefit dramatically from async Ruby
* How RubyLLM automatically works with async
* How to perform concurrent LLM operations
* How to use async-job for background processing
* How to handle rate limits with semaphores

For a deeper dive into Async, Threads, and why Async Ruby is perfect for LLM applications, including benchmarks and architectural comparisons, check out my blog post: [Async Ruby is the Future of AI Apps (And It's Already Here)](https://paolino.me/async-ruby-is-the-future/)

## Why Async for LLMs?

LLM operations are unique - they take 5-60 seconds and spend 99% of that time waiting for tokens to stream back. Using traditional thread-based job queues (Sidekiq, GoodJob, SolidQueue) for LLM operations creates a problem:

```ruby
# With 25 worker threads configured:
class ChatResponseJob < ApplicationJob
  def perform(conversation_id, message)
    # This occupies 1 of your 25 slots for 30-60 seconds...
    response = RubyLLM.chat.ask(message)
    # ...even though the thread is 99% idle
  end
end

# Your 26th user? They're waiting in line.
```

Async solves this by using fibers instead of threads (see the short sketch after this list):
- **Threads**: OS-managed, preemptive, heavy (each needs its own database connection)
- **Fibers**: Userspace, cooperative, lightweight (thousands can share a few connections)
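
A minimal sketch of what "lightweight" means here, using only the `async` gem (no LLM calls; each task just sleeps to stand in for waiting on a response):

```ruby
require 'async'

# A thousand concurrent tasks run as fibers inside a single thread.
# `sleep` yields to the fiber scheduler, so the tasks overlap instead of
# each needing its own OS thread or database connection.
Async do
  1_000.times.map do
    Async { sleep 0.1 }
  end.each(&:wait)
end
```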

## How RubyLLM Works with Async

The beautiful part: RubyLLM automatically becomes non-blocking when used in an async context. No configuration needed.

```ruby
require 'async'
require 'ruby_llm'

# This is all you need for concurrent LLM calls
Async do
  10.times.map do
    Async do
      # RubyLLM automatically becomes non-blocking
      # because Net::HTTP knows how to yield to fibers
      message = RubyLLM.chat.ask "Explain quantum computing"
      puts message.content
    end
  end.map(&:wait)
end
```

This works because RubyLLM uses `Net::HTTP`, which cooperates with Ruby's fiber scheduler.
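
You can observe the same behavior without RubyLLM at all. A rough sketch (the httpbin.org delay endpoint is only an assumed stand-in for any slow HTTP service):

```ruby
require 'async'
require 'net/http'
require 'uri'

Async do
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  # Each request takes ~2 seconds server-side, yet all three overlap because
  # Net::HTTP yields to the fiber scheduler while waiting on the socket.
  3.times.map do
    Async { Net::HTTP.get(URI('https://httpbin.org/delay/2')) }
  end.each(&:wait)

  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts "3 slow requests finished in ~#{elapsed.round(1)}s, not ~6s"
end
```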

## Concurrent Operations

### Multiple Chat Requests

Process multiple questions concurrently:

```ruby
require 'async'
require 'ruby_llm'

def process_questions(questions)
  Async do
    tasks = questions.map do |question|
      Async do
        response = RubyLLM.chat.ask(question)
        { question: question, answer: response.content }
      end
    end

    # Wait for all tasks and return results
    tasks.map(&:wait)
  end.result
end

questions = [
  "What is Ruby?",
  "Explain metaprogramming",
  "What are symbols?"
]

results = process_questions(questions)
results.each do |result|
  puts "Q: #{result[:question]}"
  puts "A: #{result[:answer]}\n\n"
end
```

### Batch Embeddings

Generate embeddings efficiently:

```ruby
def generate_embeddings(texts, batch_size: 100)
  Async do
    embeddings = []

    texts.each_slice(batch_size) do |batch|
      task = Async do
        response = RubyLLM.embed(batch)
        response.vectors
      end
      embeddings.concat(task.wait)
    end

    # Return text-embedding pairs
    texts.zip(embeddings)
  end.result
end

texts = ["Ruby is great", "Python is good", "JavaScript is popular"]
pairs = generate_embeddings(texts)
pairs.each do |text, embedding|
  puts "#{text}: #{embedding[0..5]}..." # Show first 6 dimensions
end
```

### Parallel Analysis

Run multiple analyses concurrently:

```ruby
def analyze_document(content)
  Async do
    summary_task = Async do
      RubyLLM.chat.ask("Summarize in one sentence: #{content}")
    end

    sentiment_task = Async do
      RubyLLM.chat.ask("Is this positive or negative: #{content}")
    end

    {
      summary: summary_task.wait.content,
      sentiment: sentiment_task.wait.content
    }
  end.result
end

result = analyze_document("Ruby is an amazing language with a wonderful community!")
puts "Summary: #{result[:summary]}"
puts "Sentiment: #{result[:sentiment]}"
```

## Background Processing with `Async::Job`

The real power comes from using `Async::Job` for background processing. Your existing Active Job code doesn't need to change!

### Installation

```ruby
# Gemfile
gem 'async-job-adapter-active_job'

# config/application.rb
config.active_job.queue_adapter = :async_job
```

### Your Jobs Work Unchanged

Here's the key insight: you don't need to modify your jobs at all. `Async::Job` runs each job inside an async context automatically:

```ruby
class DocumentAnalyzerJob < ApplicationJob
  def perform(document_id)
    document = Document.find(document_id)

    # This automatically runs in an async context!
    # No need to wrap in Async blocks
    response = RubyLLM.chat.ask("Analyze: #{document.content}")

    document.update!(
      analysis: response.content,
      analyzed_at: Time.current
    )
  end
end
```

### Using Different Adapters for Different Jobs

You might want to use `Async::Job` only for LLM operations while keeping CPU-intensive work on traditional adapters:

```ruby
# Base job for LLM operations
class LLMJob < ApplicationJob
  self.queue_adapter = :async_job
end

# All LLM jobs inherit from this
class ChatResponseJob < LLMJob
  def perform(conversation_id, message)
    # Runs with async-job adapter
    response = RubyLLM.chat.ask(message)
    # ...
  end
end

# CPU-intensive jobs use default adapter (e.g., Sidekiq)
class ImageProcessingJob < ApplicationJob
  def perform(image_id)
    # Runs with your default adapter
    # ...
  end
end
```

## Rate Limiting with Semaphores

When making many concurrent requests, use a semaphore to respect rate limits:

```ruby
require 'async'
require 'async/semaphore'

class RateLimitedProcessor
  def initialize(max_concurrent: 10)
    @semaphore = Async::Semaphore.new(max_concurrent)
  end

  def process_items(items)
    Async do
      items.map do |item|
        Async do
          # Only max_concurrent items processed at once
          @semaphore.acquire do
            response = RubyLLM.chat.ask("Process: #{item}")
            { item: item, result: response.content }
          end
        end
      end.map(&:wait)
    end.result
  end
end

# Usage
processor = RateLimitedProcessor.new(max_concurrent: 5)
items = ["Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6"]
results = processor.process_items(items)
```

The semaphore ensures only 5 requests run concurrently, preventing rate limit errors while still maintaining high throughput.

## Summary

Key takeaways:

- LLM operations are perfect for async (99% waiting for I/O)
- RubyLLM automatically works with async - no configuration needed
- Use async-job for LLM background jobs without changing your job code
- Use semaphores to manage rate limits
- Keep thread-based processors for CPU-intensive work

The combination of RubyLLM and async Ruby gives you the ability to handle thousands of concurrent AI conversations on modest hardware - something that would require massive infrastructure with traditional thread-based approaches.

Ready to dive deeper? Read the full architectural comparison: [Async Ruby is the Future of AI Apps](https://paolino.me/async-ruby-is-the-future/)
