Commit 45d09be

Authored by raoberman, content, and shy982
add pdfspeak example to community (#277)
* add pdfspeak to community
* adding spdx
* more spdx
* Webapp SPDX licenses

---------

Co-authored-by: content <roberman@x-fx-97-3.sekgr4zwtyaungdfswqvypcvif.xx.internal.cloudapp.net>
Co-authored-by: shy982 <shyam.9201.08@gmail.com>
1 parent 064b21f commit 45d09be

129 files changed, +31947 -0 lines changed

community/pdfspeak/.env

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@

#these should all be the same value
NVIDIA_API_KEY=""
NV_API_KEY=""
OPENAI_API_KEY=""
#set this with your NGC key
NGC_API_KEY=""
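
The comments in this `.env` file say that the three NIM-related keys should share one value and that the NGC key is set separately. A minimal sketch of a check for that, assuming the file only contains simple `KEY="value"` lines; the helper names `parse_env` and `check_keys` are hypothetical and not part of the commit:

```python
from typing import Dict, List

# Key names taken from the .env file above.
SHARED_KEYS = ("NVIDIA_API_KEY", "NV_API_KEY", "OPENAI_API_KEY")
NGC_KEY = "NGC_API_KEY"


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_keys(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the file looks fine."""
    problems: List[str] = []
    shared = [env.get(key, "") for key in SHARED_KEYS]
    if any(not value for value in shared):
        problems.append("one or more of %s is empty" % ", ".join(SHARED_KEYS))
    elif len(set(shared)) != 1:
        problems.append("%s should all hold the same value" % ", ".join(SHARED_KEYS))
    if not env.get(NGC_KEY):
        problems.append("%s is empty" % NGC_KEY)
    return problems
```

To use it, read the file with `open(".env").read()`, pass the text to `parse_env`, and print whatever `check_keys` returns before starting the containers.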
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM alpine:latest

RUN apk add --no-cache bash openssl

COPY generate-certs.sh /generate-certs.sh
RUN chmod +x /generate-certs.sh

CMD ["/generate-certs.sh"]

community/pdfspeak/README.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->


## PDFSpeak: Unlocking Multimodal PDF Intelligence through Speech

PDFSpeak lets you interact with complex PDF documents through speech, vision, and text, using NVIDIA AI technologies.

### Table of Contents
1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Setting up PDFSpeak](#setting-up-pdfspeak)

## Introduction

### What PDFSpeak Is ✔️

PDFSpeak is an end-to-end example that lets you talk to a PDF. You upload the PDF through the web app's familiar chat UI and then ask your questions out loud. The [RIVA ASR pipeline](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html), a GPU-accelerated compute pipeline optimized for performance and accuracy, converts the spoken questions into text prompts. Each prompt, along with the PDF, is then sent to NV-Ingest.

NVIDIA-Ingest is a scalable, performance-oriented microservice for extracting document content and metadata. It parallelizes the splitting of documents into pages, classifies the contents of each page (as tables, charts, images, or text), extracts them into discrete content, and further contextualizes them via optical character recognition (OCR) into a well-defined JSON schema. From there, NVIDIA-Ingest can optionally compute embeddings for the extracted content and optionally store them in the [Milvus](https://milvus.io/) vector database.

The textual response from NV-Ingest then goes to the [RIVA TTS pipeline](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html), a two-stage pipeline in which the first model generates a mel-spectrogram and the second model generates speech from it. The web app then processes this speech response and plays it back to you.
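
The flow described above (speech in, text prompt, answer from the document, speech out) can be pictured as three stages chained together. The sketch below only illustrates that data flow: `answer_spoken_question` and the three callables it takes are hypothetical stand-ins, not the RIVA or NV-Ingest client APIs.

```python
from typing import Callable


def answer_spoken_question(
    audio_in: bytes,
    pdf: bytes,
    transcribe: Callable[[bytes], str],    # hypothetical stand-in for the ASR stage
    answer: Callable[[bytes, str], str],   # hypothetical stand-in for the document stage
    synthesize: Callable[[str], bytes],    # hypothetical stand-in for the TTS stage
) -> bytes:
    """Chain the stages: speech -> text prompt -> text answer -> speech."""
    prompt = transcribe(audio_in)
    text_answer = answer(pdf, prompt)
    return synthesize(text_answer)
```

Passing the stages in as callables keeps the sketch independent of any particular client library; each one could be swapped for a real service call.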

## Prerequisites

### Hardware

| GPU | Family | Memory | Minimum # of GPUs |
| ------ | ------ | ------ | ------ |
| A100 | SXM or PCIe | 80 GB | 4 |

### Software
34+
35+
- Linux operating systems (Ubuntu 22.04 or later recommended)
36+
- [Docker](https://docs.docker.com/engine/install/)
37+
- [Docker Compose](https://docs.docker.com/compose/install/)
38+
- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) (NVIDIA Driver >= `550`, CUDA >= `12.6`)
39+
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
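
The driver and CUDA minimums in the list above can be checked with a small comparison helper. This is a sketch, not part of the project: `parse_version` and `meets_minimum` are hypothetical helpers, and they assume version strings are dot-separated integers such as `550.54.15` or `12.6`.

```python
from typing import Tuple

# Minimums taken from the Software list above.
MIN_DRIVER = "550"
MIN_CUDA = "12.6"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dot-separated version such as '550.54.15' into (550, 54, 15)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

The installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, and the CUDA toolkit version appears in the output of `nvcc --version`; pass those strings to `meets_minimum` together with `MIN_DRIVER` or `MIN_CUDA`. Comparing integer tuples rather than raw strings avoids mistakes such as treating `12.10` as older than `12.6`.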

> NOTE: The frontend-server container, which is built from the `webapp/src/main/ui/web/pdf-speak/server` directory, is essentially a proxy server that streams audio over SSL from the HTTPS frontend to the RIVA ASR container via gRPC. For more information on how this proxy was built, see the published [RIVA Contact](https://github.com/nvidia-riva/sample-apps/tree/main/riva-contact) example project.

## Setting up PDFSpeak

1. Clone `nv-ingest` into the repository and check out commit `d0a3008c`:
   - `git clone https://github.com/NVIDIA/nv-ingest.git`
   - `cd nv-ingest`
   - `git checkout d0a3008c`
   - `cd ../`
2. Make sure you are logged in to NGC via Docker. Follow the directions [here](https://docs.nvidia.com/launchpad/ai/base-command-coe/latest/bc-coe-docker-basics-step-02.html#logging-in-to-ngc-on-a-workstation).
3. Add your `NV_API_KEY`, `NVIDIA_API_KEY`, and `OPENAI_API_KEY` environment variables (all set to the same key) to the `.env` file to access NVIDIA NIM endpoints. Also add your `NGC_API_KEY` to this file.
   > NOTE: To generate an `NGC_API_KEY`, follow [Generate API keys](docs/docs/user-guide/developer-guide/ngc-api-key.md).

   > If you require early access (EA), your `NGC_API_KEY` must be created as a member of `nemo-microservice / ea-participants`, which you may join by applying for early access [here](https://developer.nvidia.com/nemo-microservices-early-access/join). When approved, switch your profile to this org / team; the key you then generate will have access to the resources outlined below.

4. Start the containers:

   `docker compose up`

5. Check that all components of NV-Ingest are up and healthy:

   `curl http://172.17.0.1:7670/v1/health/ready`

6. Check that all containers are up with `docker ps`.

7. To access the UI, go to https://localhost:3002/. Click Advanced, then Proceed (Unsafe). This warning can be safely ignored; it appears when a self-signed certificate is used with cross-platform CORS. JupyterLab with exercises will be available at http://localhost:8888/.
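
The `curl` readiness check above can be scripted so that later steps wait until NV-Ingest is up. A minimal sketch using only the Python standard library; it assumes the endpoint answers with HTTP 200 once the service is ready, which the README does not state, and `is_ready` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL taken from the curl command above.
HEALTH_URL = "http://172.17.0.1:7670/v1/health/ready"


def is_ready(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as not ready.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = is_ready,
    attempts: int = 30,
    delay_seconds: float = 10.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_until_ready()` after `docker compose up -d` polls the endpoint for up to about five minutes with the default settings; the attempt count and delay are arbitrary and can be adjusted.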
