Commit a4f34a8

Merge pull request #2680 from Avs163/achves-feature-apigw-lambda-transcribe-sam-js
New serverless pattern - lambda-transcribe-sam-js
2 parents 7e746d8 + 6b70fa9 commit a4f34a8

5 files changed

+415
-0
lines changed

lambda-transcribe-sam-js/README.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# Amazon S3 to AWS Lambda to Amazon Transcribe using AWS SAM

This pattern facilitates automatic audio transcription by using the Amazon Transcribe service through a serverless event-driven architecture. When audio files are uploaded to S3, they are automatically transcribed using Amazon Transcribe via a Lambda function invoked by S3 events.

This pattern enables speech-to-text transcription use cases by providing a serverless event-based pipeline that can process audio files uploaded to S3. The pattern uses AWS Lambda to coordinate with the Amazon Transcribe service, making it easy to integrate transcription capabilities into your applications without the need to manage infrastructure or manually initiate transcription jobs.

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
    ```
    git clone https://github.com/aws-samples/serverless-patterns
    ```
1. Change directory to the pattern directory:
    ```
    cd serverless-patterns/lambda-transcribe-sam-js
    ```
1. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file:
    ```
    sam deploy --guided
    ```
1. During the prompts:
    * Enter a stack name
    * Enter the desired AWS Region
    * Allow SAM CLI to create IAM roles with the required permissions.
    * Allow TranscribeFunction to operate without authentication.

    After you have run `sam deploy --guided` once and saved the arguments to a configuration file (samconfig.toml), you can run `sam deploy` in future to reuse these defaults.

1. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing.

## How it works

When an audio file is uploaded to the S3 bucket:

1. The S3 event notification invokes the Lambda function
2. The Lambda function starts a transcription job using Amazon Transcribe
3. Amazon Transcribe processes the audio file and generates the transcription
4. The transcription results are stored in the same S3 bucket under the `transcriptions/` prefix
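
As an illustration of the first step, the sketch below shows roughly what the Lambda function receives and how the bucket and object key can be read from it. The helper name `parseS3Record` and the sample values are hypothetical, and the sample event is trimmed to the fields used here; real S3 notifications carry more fields.

```javascript
// Hypothetical helper: read the bucket and decoded object key from one
// S3 event notification record. S3 URL-encodes keys and uses '+' for spaces.
function parseS3Record(record) {
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  return { bucket, key, uri: `s3://${bucket}/${key}` };
}

// Trimmed-down sample event (all values are made up for illustration).
const sampleEvent = {
  Records: [
    {
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: 'my-stack-audio-uploads' },
        object: { key: 'recordings/team+meeting%281%29.mp3' }
      }
    }
  ]
};

console.log(parseS3Record(sampleEvent.Records[0]));
```

Decoding matters because a file uploaded as `team meeting(1).mp3` arrives in the event in its encoded form, and the media URI passed to Amazon Transcribe needs the real key.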

## Testing

To test the deployed pipeline:

1. Upload an audio file to the S3 bucket created by the stack (the bucket is named `{AWS::StackName}-audio-uploads`):
    ```
    aws s3 cp audio.mp3 s3://your-bucket-name/
    ```
2. Note the S3 URI of the uploaded audio file (for example, `s3://your-bucket-name/audio.mp3`)
3. You can list all transcription jobs using:

    ```bash
    aws transcribe list-transcription-jobs
    ```
4. You can check the status of a job with the command below. Once the job is complete, the transcription results are available in the S3 bucket:

    ```bash
    aws transcribe get-transcription-job --transcription-job-name "job-name-from-response"
    ```
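
To know where to look in the bucket once a job completes, the transcript location can be derived from the uploaded key. The sketch below mirrors the `transcriptions/<file name>.json` output key used by the pattern's Lambda function; the helper names are hypothetical, and it assumes a simple file name made of letters, digits, dots, hyphens and underscores.

```javascript
// Hypothetical helper: derive the transcript key from the uploaded audio key,
// following the transcriptions/<file name without extension>.json convention.
function expectedTranscriptKey(audioKey) {
  const fileName = audioKey.split('/').pop();
  const dot = fileName.lastIndexOf('.');
  const baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
  return `transcriptions/${baseName}.json`;
}

// Hypothetical helper: full S3 URI of the transcript in the same bucket.
function expectedTranscriptUri(bucket, audioKey) {
  return `s3://${bucket}/${expectedTranscriptKey(audioKey)}`;
}

console.log(expectedTranscriptUri('your-bucket-name', 'audio.mp3'));
```

The resulting URI can be passed to `aws s3 cp` to download the transcript.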

## Cleanup

1. Empty the S3 bucket, because the bucket must be empty before it can be deleted. This removes audio.mp3 and any transcripts written under `transcriptions/`:
    ```bash
    aws s3 rm "s3://your-bucket-name" --recursive
    ```
2. Delete the stack
    ```bash
    sam delete
    ```

----
Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
{
  "title": "Amazon S3 to AWS Lambda to Amazon Transcribe using AWS SAM",
  "description": "This pattern creates a serverless pipeline that automatically triggers an Amazon Transcribe job when audio files are uploaded to an S3 bucket, using Lambda to orchestrate the transcription process.",
  "language": "Javascript",
  "level": "200",
  "framework": "SAM",
  "introBox": {
    "headline": "How it works",
    "text": [
      "When audio files are uploaded to the S3 bucket, it automatically triggers a Lambda function that starts a transcription job using Amazon Transcribe. The Lambda function extracts information about the uploaded audio file and sends it to the Amazon Transcribe service, which processes the audio and generates a text transcription that is saved back to the specified S3 bucket in the \"transcriptions\" folder."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/lambda-transcribe-sam-js",
      "templateURL": "serverless-patterns/lambda-transcribe-sam-js",
      "projectFolder": "lambda-transcribe-sam-js",
      "templateFile": "template.yaml"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "Amazon S3",
        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html"
      },
      {
        "text": "AWS Lambda",
        "link": "https://docs.aws.amazon.com/lambda/latest/dg/welcome.html"
      },
      {
        "text": "Amazon Transcribe",
        "link": "https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html"
      }
    ]
  },
  "deploy": {
    "text": [
      "sam build",
      "sam deploy --guided"
    ]
  },
  "testing": {
    "text": [
      "See the GitHub repo for detailed testing instructions."
    ]
  },
  "cleanup": {
    "text": [
      "sam delete"
    ]
  },
  "authors": [
    {
      "name": "Achintya Veer Singh",
      "image": "https://avatars.githubusercontent.com/u/55053737?v=4",
      "bio": "Solutions Architect @ AWS",
      "linkedin": "www.linkedin.com/in/achintya-veer-singh-493403193",
      "twitter": "achintya_veer"
    }
  ]
}
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
{
  "title": "Amazon S3 to AWS Lambda to Amazon Transcribe using AWS SAM",
  "description": "This pattern creates a serverless pipeline that automatically invokes an Amazon Transcribe job when audio files are uploaded to an S3 bucket.",
  "language": "Node.js",
  "level": "200",
  "framework": "SAM",
  "introBox": {
    "headline": "How it works",
    "text": [
      "When audio files are uploaded to the S3 bucket, an AWS Lambda function starts a transcription job using Amazon Transcribe. The Lambda function extracts information about the uploaded audio file and sends it to the Amazon Transcribe service, which processes the audio and generates a text transcription that is saved back to the specified S3 bucket."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/lambda-transcribe-sam-js",
      "templateURL": "serverless-patterns/lambda-transcribe-sam-js",
      "projectFolder": "lambda-transcribe-sam-js",
      "templateFile": "template.yaml"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "Amazon S3",
        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html"
      },
      {
        "text": "AWS Lambda",
        "link": "https://docs.aws.amazon.com/lambda/latest/dg/welcome.html"
      },
      {
        "text": "Amazon Transcribe",
        "link": "https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html"
      }
    ]
  },
  "deploy": {
    "text": [
      "sam build",
      "sam deploy --guided"
    ]
  },
  "testing": {
    "text": [
      "See the GitHub repo for detailed testing instructions."
    ]
  },
  "cleanup": {
    "text": [
      "sam delete"
    ]
  },
  "authors": [
    {
      "name": "Achintya Veer Singh",
      "image": "https://avatars.githubusercontent.com/u/55053737?v=4",
      "bio": "Solutions Architect @ AWS",
      "linkedin": "www.linkedin.com/in/achintya-veer-singh-493403193",
      "twitter": "achintya_veer"
    }
  ],
  "patternArch": {
    "icon1": {
      "x": 10,
      "y": 50,
      "service": "s3",
      "label": "Amazon S3"
    },
    "icon2": {
      "x": 40,
      "y": 50,
      "service": "lambda",
      "label": "AWS Lambda"
    },
    "icon3": {
      "x": 65,
      "y": 50,
      "service": "transcribe",
      "label": "Amazon Transcribe"
    },
    "icon4": {
      "x": 90,
      "y": 50,
      "service": "s3",
      "label": "Amazon S3"
    },
    "line1": {
      "from": "icon1",
      "to": "icon2",
      "label": "Object Created"
    },
    "line2": {
      "from": "icon2",
      "to": "icon3"
    },
    "line3": {
      "from": "icon3",
      "to": "icon4"
    }
  }
}
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
import { TranscribeClient, StartTranscriptionJobCommand } from '@aws-sdk/client-transcribe';

// The region is read from the REGION environment variable
const transcribeClient = new TranscribeClient({ region: process.env.REGION });

export const handler = async (event) => {
  try {
    // Get the S3 bucket and key from the event (S3 URL-encodes keys and uses '+' for spaces)
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    console.log(`Processing file: s3://${bucket}/${key}`);

    // Extract file name and extension for the transcription job
    const fileName = key.split('/').pop();
    const fileNameWithoutExt = fileName.substring(0, fileName.lastIndexOf('.')) || fileName;
    const fileExt = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase();

    // Determine media format based on file extension
    let mediaFormat;
    switch (fileExt) {
      case 'mp3':
        mediaFormat = 'mp3';
        break;
      case 'wav':
        mediaFormat = 'wav';
        break;
      case 'flac':
        mediaFormat = 'flac';
        break;
      default:
        throw new Error(`Unsupported file format: ${fileExt}`);
    }

    // Transcribe job names and output keys accept only a limited character set,
    // so replace other characters (such as spaces) and cap the length
    const safeName = fileNameWithoutExt.replace(/[^0-9a-zA-Z._-]/g, '_').slice(0, 180);

    // The timestamp suffix keeps job names unique across repeated uploads
    const transcriptionJobName = `${safeName}-${Date.now()}`;

    const mediaFileUri = `s3://${bucket}/${key}`;

    const startTranscriptionParams = {
      TranscriptionJobName: transcriptionJobName,
      LanguageCode: 'en-US',
      MediaFormat: mediaFormat,
      Media: {
        MediaFileUri: mediaFileUri
      },
      // The transcript is written back to the same bucket under transcriptions/
      OutputBucketName: bucket,
      OutputKey: `transcriptions/${safeName}.json`
    };

    const transcriptionCommand = new StartTranscriptionJobCommand(startTranscriptionParams);
    const transcriptionResponse = await transcribeClient.send(transcriptionCommand);

    console.log(`Started transcription job: ${transcriptionJobName}`);
    console.log(`Transcription job response: ${JSON.stringify(transcriptionResponse)}`);

    // S3 invokes the function asynchronously, so the returned object is not delivered to any caller
    return {
      statusCode: 200,
      body: JSON.stringify({
        message: 'Transcription job started successfully',
        jobName: transcriptionJobName,
        jobStatus: transcriptionResponse.TranscriptionJob.TranscriptionJobStatus
      })
    };
  } catch (error) {
    console.error('Error processing the file:', error);

    return {
      statusCode: 500,
      body: JSON.stringify({
        message: 'Error starting transcription job',
        error: error.message
      })
    };
  }
};
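
The handler above reads only `event.Records[0]`, and it writes transcripts into the same bucket that invokes it. Whether the trigger is filtered by prefix or suffix depends on template.yaml, which is not shown in this view. As a possible hardening, and not part of the commit, the sketch below collects every supported audio object from an event and skips anything under `transcriptions/`. The names `SUPPORTED_EXTENSIONS` and `audioObjectsFrom` are hypothetical.

```javascript
// Extensions the handler maps to a Transcribe MediaFormat.
const SUPPORTED_EXTENSIONS = new Set(['mp3', 'wav', 'flac']);

// Hypothetical helper: list every audio object referenced by an S3 event,
// skipping unsupported files and anything under the transcriptions/ prefix.
function audioObjectsFrom(event) {
  const objects = [];
  for (const record of event.Records ?? []) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    const extension = key.includes('.') ? key.split('.').pop().toLowerCase() : '';
    if (key.startsWith('transcriptions/') || !SUPPORTED_EXTENSIONS.has(extension)) {
      continue; // not an audio upload this pipeline should transcribe
    }
    objects.push({ bucket, key, mediaFormat: extension });
  }
  return objects;
}

// Example event with one audio file, one transcript, and one unrelated file.
const mixedEvent = {
  Records: [
    { s3: { bucket: { name: 'demo-bucket' }, object: { key: 'calls/intro.MP3' } } },
    { s3: { bucket: { name: 'demo-bucket' }, object: { key: 'transcriptions/intro.json' } } },
    { s3: { bucket: { name: 'demo-bucket' }, object: { key: 'notes.txt' } } }
  ]
};

console.log(audioObjectsFrom(mixedEvent));
```

A handler could loop over the returned list and start one transcription job per entry, so that transcript files and unrelated uploads are skipped without raising an error.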
