From f9452ee54e5b5a2a12e8b2ef3e75a1e4d5e1bdb1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Anna=20Szyma=C5=84ska?= Date: Thu, 25 Sep 2025 13:58:39 +0200 Subject: [PATCH] Update links, use warnings --- .../01-fundamentals/01-getting-started.md | 2 +- .../01-natural-language-processing/useLLM.md | 2 +- .../useSpeechToText.md | 2 +- .../useTextEmbeddings.md | 6 +- .../02-computer-vision/useClassification.md | 6 +- .../02-computer-vision/useImageEmbeddings.md | 6 +- .../useImageSegmentation.md | 10 ++-- .../02-hooks/02-computer-vision/useOCR.md | 6 +- .../02-computer-vision/useObjectDetection.md | 6 +- .../02-computer-vision/useStyleTransfer.md | 6 +- .../02-computer-vision/useTextToImage.md | 2 +- .../02-computer-vision/useVerticalOCR.md | 6 +- .../useExecutorchModule.md | 2 +- .../ImageSegmentationModule.md | 2 +- docs/docs/04-benchmarks/inference-time.md | 40 ++++++++----- docs/docs/04-benchmarks/memory-usage.md | 12 +++- docs/docs/04-benchmarks/model-size.md | 8 ++- .../benchmarks/inference-time.md | 2 +- .../benchmarks/inference-time.md | 2 +- .../01-fundamentals/01-getting-started.md | 2 +- .../01-natural-language-processing/useLLM.md | 4 +- .../useSpeechToText.md | 2 +- .../useTextEmbeddings.md | 6 +- .../02-computer-vision/useClassification.md | 6 +- .../02-computer-vision/useImageEmbeddings.md | 6 +- .../useImageSegmentation.md | 10 ++-- .../02-hooks/02-computer-vision/useOCR.md | 6 +- .../02-computer-vision/useObjectDetection.md | 6 +- .../02-computer-vision/useStyleTransfer.md | 6 +- .../02-computer-vision/useVerticalOCR.md | 6 +- .../useExecutorchModule.md | 2 +- .../ImageSegmentationModule.md | 2 +- .../04-benchmarks/inference-time.md | 60 +++++++++++-------- .../04-benchmarks/memory-usage.md | 10 ++++ .../version-0.5.x/04-benchmarks/model-size.md | 6 ++ 35 files changed, 160 insertions(+), 108 deletions(-) diff --git a/docs/docs/01-fundamentals/01-getting-started.md b/docs/docs/01-fundamentals/01-getting-started.md index b5d60c35b..18d845862 100644 --- a/docs/docs/01-fundamentals/01-getting-started.md +++ b/docs/docs/01-fundamentals/01-getting-started.md @@ -76,7 +76,7 @@ If you plan on using your models via require() instead of fetching them from a u This allows us to use binaries, such as exported models or tokenizers for LLMs. -:::caution +:::warning When using Expo, please note that you need to use a custom development build of your app, not the standard Expo Go app. This is because we rely on native modules, which Expo Go doesn’t support. ::: diff --git a/docs/docs/02-hooks/01-natural-language-processing/useLLM.md b/docs/docs/02-hooks/01-natural-language-processing/useLLM.md index f88979cc4..424242120 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useLLM.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useLLM.md @@ -195,7 +195,7 @@ Sometimes, you might want to stop the model while it’s generating. To do this, There are also cases when you need to check if tokens are being generated, such as to conditionally render a stop button. We’ve made this easy with the `isGenerating` property. -:::caution +:::warning If you try to dismount the component using this hook while generation is still going on, it will result in crash. You'll need to interrupt the model first and wait until `isGenerating` is set to false. 
::: diff --git a/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md b/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md index 8876bf37e..8b03f8a62 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md @@ -20,7 +20,7 @@ description: "Learn how to use speech-to-text models in your React Native applic Speech to text is a task that allows to transform spoken language to written text. It is commonly used to implement features such as transcription or voice assistants. :::warning -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny.en). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md b/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md index c40d19e94..7720309bf 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md @@ -17,8 +17,8 @@ description: "Learn how to use text embeddings models in your React Native appli Text Embedding is the process of converting text into a numerical representation. This representation can be used for various natural language processing tasks, such as semantic search, text classification, and clustering. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-all-MiniLM-L6-v2). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-embeddings-68d0ed42f8ca0200d0283362). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -141,7 +141,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: diff --git a/docs/docs/02-hooks/02-computer-vision/useClassification.md b/docs/docs/02-hooks/02-computer-vision/useClassification.md index b4d3f34a6..9e772f8c6 100644 --- a/docs/docs/02-hooks/02-computer-vision/useClassification.md +++ b/docs/docs/02-hooks/02-computer-vision/useClassification.md @@ -8,8 +8,8 @@ Image classification is the process of assigning a label to an image that best d Usually, the class with the highest probability is the one that is assigned to an image. However, if there are multiple classes with comparatively high probabilities, this may indicate that the model is not confident in its prediction. ::: -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-efficientnet-v2-s). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/classification-68d0ea49b5c7de8a3cae1e68). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -104,7 +104,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md b/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md index 6dbdc7dcc..aa6d60044 100644 --- a/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md +++ b/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md @@ -18,8 +18,8 @@ description: "Learn how to use image embeddings models in your React Native appl Image Embedding is the process of converting an image into a numerical representation. This representation can be used for tasks, such as classification, clustering and (using contrastive learning like e.g. CLIP model) image search. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-clip-vit-base-patch32). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-embeddings-68d0eda599a9d37caaaf1ad0). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -119,7 +119,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also heavily depends on image size, because resize is expansive operation, especially on low-end devices. 
::: diff --git a/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md b/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md index 7fee70880..6631fc217 100644 --- a/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md +++ b/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md @@ -4,8 +4,8 @@ title: useImageSegmentation Semantic image segmentation, akin to image classification, tries to assign the content of the image to one of the predefined classes. However, in case of segmentation this classification is done on a per-pixel basis, so as the result the model provides an image-sized array of scores for each of the classes. You can then use this information to detect objects on a per-pixel basis. React Native ExecuTorch offers a dedicated hook `useImageSegmentation` for this task. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-segmentation-68d5291bdf4a30bee0220f4f), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -55,7 +55,7 @@ To run the model, you can use the `forward` method. It accepts three arguments: - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at [`DeeplabLabel`](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: @@ -98,7 +98,7 @@ function App() { ### Memory usage -:::warning warning +:::warning Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. ::: @@ -108,7 +108,7 @@ Data presented in the following sections is based on inference with non-resized ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useOCR.md b/docs/docs/02-hooks/02-computer-vision/useOCR.md index 037daebf7..7791109da 100644 --- a/docs/docs/02-hooks/02-computer-vision/useOCR.md +++ b/docs/docs/02-hooks/02-computer-vision/useOCR.md @@ -4,8 +4,8 @@ title: useOCR Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. 
It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -311,7 +311,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | -------------------------------------------- | -------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md b/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md index ac756d6a6..52c6703dd 100644 --- a/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md +++ b/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md @@ -5,8 +5,8 @@ title: useObjectDetection Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving. `useObjectDetection` is a hook that allows you to seamlessly integrate object detection into your React Native applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/object-detection-68d0ea936cd0906843cbba7d). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -143,7 +143,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: diff --git a/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md b/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md index 899a619ca..795c118e8 100644 --- a/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md +++ b/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md @@ -4,8 +4,8 @@ title: useStyleTransfer Style transfer is a technique used in computer graphics and machine learning where the visual style of one image is applied to the content of another. This is achieved using algorithms that manipulate data from both images, typically with the aid of a neural network. The result is a new image that combines the artistic elements of one picture with the structural details of another, effectively merging art with traditional imagery. React Native ExecuTorch offers a dedicated hook `useStyleTransfer`, for this task. However before you start you'll need to obtain ExecuTorch-compatible model binary. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/style-transfer-68d0eab2b0767a20e7efeaf5), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -102,7 +102,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useTextToImage.md b/docs/docs/02-hooks/02-computer-vision/useTextToImage.md index 83e47a3e2..2b9db6ab6 100644 --- a/docs/docs/02-hooks/02-computer-vision/useTextToImage.md +++ b/docs/docs/02-hooks/02-computer-vision/useTextToImage.md @@ -9,7 +9,7 @@ Text-to-image is a process of generating images directly from a description in n :::warning -It is recommended to use models provided by us which are available at our Hugging Face repository, you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-to-image-68d0edf50ae6d20b5f9076cd), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md b/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md index 29a4de452..0ba9e8f98 100644 --- a/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md +++ b/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md @@ -8,8 +8,8 @@ The `useVerticalOCR` hook is currently in an experimental phase. 
We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -326,7 +326,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ---------------------------------------------------- | --------------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md b/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md index 137b19d92..f9b9f21a9 100644 --- a/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md +++ b/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md @@ -4,7 +4,7 @@ title: useExecutorchModule useExecutorchModule provides React Native bindings to the ExecuTorch [Module API](https://pytorch.org/executorch/stable/extension-module.html) directly from JavaScript. -:::caution +:::warning These bindings are primarily intended for custom model integration where no dedicated hook exists. If you are considering using a provided model, first verify whether a dedicated hook is available. Dedicated hooks simplify the implementation process by managing necessary pre and post-processing automatically. Utilizing these can save you effort and reduce complexity, ensuring you do not implement additional handling that is already covered. ::: diff --git a/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md b/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md index 99deae014..89344c70a 100644 --- a/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md +++ b/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md @@ -63,7 +63,7 @@ To run the model, you can use the `forward` method on the module object. It acce - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). 
Look at `DeeplabLabel` enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for the `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: diff --git a/docs/docs/04-benchmarks/inference-time.md b/docs/docs/04-benchmarks/inference-time.md index dd0f1275a..743e0bee9 100644 --- a/docs/docs/04-benchmarks/inference-time.md +++ b/docs/docs/04-benchmarks/inference-time.md @@ -2,7 +2,7 @@ title: Inference Time --- -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: @@ -29,23 +29,23 @@ Times presented in the tables are measured as consecutive runs of the model. Ini ## OCR -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | -| --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: | -| Detector (CRAFT_800) | 2099 | 2227 | ❌ | 2245 | 7108 | -| Recognizer (CRNN_512) | 70 | 252 | ❌ | 54 | 151 | -| Recognizer (CRNN_256) | 39 | 123 | ❌ | 24 | 78 | -| Recognizer (CRNN_128) | 17 | 83 | ❌ | 14 | 39 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| --------------------- | :--------------------------: | :------------------------------: | :-------------------: | :-------------------------------: | :-------------------------------: | +| Detector (CRAFT_800) | 2099 | 2227 | ❌ | 2245 | 7108 | +| Recognizer (CRNN_512) | 70 | 252 | ❌ | 54 | 151 | +| Recognizer (CRNN_256) | 39 | 123 | ❌ | 24 | 78 | +| Recognizer (CRNN_128) | 17 | 83 | ❌ | 14 | 39 | ❌ - Insufficient RAM. ## Vertical OCR -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | -| --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: | -| Detector (CRAFT_1280) | 5457 | 5833 | ❌ | 6296 | 14053 | -| Detector (CRAFT_320) | 1351 | 1460 | ❌ | 1485 | 3101 | -| Recognizer (CRNN_512) | 39 | 123 | ❌ | 24 | 78 | -| Recognizer (CRNN_64) | 10 | 33 | ❌ | 7 | 18 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| --------------------- | :--------------------------: | :------------------------------: | :-------------------: | :-------------------------------: | :-------------------------------: | +| Detector (CRAFT_1280) | 5457 | 5833 | ❌ | 6296 | 14053 | +| Detector (CRAFT_320) | 1351 | 1460 | ❌ | 1485 | 3101 | +| Recognizer (CRNN_512) | 39 | 123 | ❌ | 24 | 78 | +| Recognizer (CRNN_64) | 10 | 33 | ❌ | 7 | 18 | ❌ - Insufficient RAM. @@ -82,7 +82,7 @@ Average time for encoding audio of given length over 10 runs. 
For `Whisper` mode
### Decoding
-Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length.
+Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length.
| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ------------------ | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
@@ -112,7 +112,17 @@ Benchmark times for text embeddings are highly dependent on the sentence length.
Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time.
:::
-## Text to Image
+## Image Segmentation
+
+:::warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] |
+| ----------------- | ---------------------------- | -------------------------------- | --------------------------------- |
+| DEEPLAB_V3_RESNET50 | 1000 | 670 | 700 |
+
+## Text to image
Average time for generating one image of size 256×256 in 10 inference steps.
diff --git a/docs/docs/04-benchmarks/memory-usage.md b/docs/docs/04-benchmarks/memory-usage.md
index e34c8a7ca..0b76b5dd5 100644
--- a/docs/docs/04-benchmarks/memory-usage.md
+++ b/docs/docs/04-benchmarks/memory-usage.md
@@ -69,7 +69,17 @@ title: Memory Usage
| --------------------------- | :--------------------: | :----------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 350 | 340 |
-## Text to Image
+## Image Segmentation
+
+:::warning
+Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions.
+:::
+
+| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| ----------------- | ---------------------- | ------------------ |
+| DEEPLAB_V3_RESNET50 | 930 | 660 |
+
+## Text to image
| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------- | ---------------------- | ------------------ |
diff --git a/docs/docs/04-benchmarks/model-size.md b/docs/docs/04-benchmarks/model-size.md
index 5cf87f6fa..7089daddf 100644
--- a/docs/docs/04-benchmarks/model-size.md
+++ b/docs/docs/04-benchmarks/model-size.md
@@ -83,7 +83,13 @@ title: Model Size
| --------------------------- | :----------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 352 |
-## Text to Image
+## Image Segmentation
+
+| Model | XNNPACK [MB] |
+| ----------------- | ------------ |
+| DEEPLAB_V3_RESNET50 | 168 |
+
+## Text to image
| Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] |
| ----------------- | --------------------------- | ------------------- | -------------------------- |
diff --git a/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md b/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md
index 31dc81174..7ae4c9875 100644
--- a/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md
+++ b/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md
@@ -91,7 +91,7 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
### Decoding
-Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length.
+Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length.
| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
diff --git a/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md b/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md
index da35e7b6e..7af004cdb 100644
--- a/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md
+++ b/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md
@@ -90,7 +90,7 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
### Decoding
-Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length.
+Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length.
| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | | -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | diff --git a/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md b/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md index b5d60c35b..18d845862 100644 --- a/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md +++ b/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md @@ -76,7 +76,7 @@ If you plan on using your models via require() instead of fetching them from a u This allows us to use binaries, such as exported models or tokenizers for LLMs. -:::caution +:::warning When using Expo, please note that you need to use a custom development build of your app, not the standard Expo Go app. This is because we rely on native modules, which Expo Go doesn’t support. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md index f639a6cf6..e49b1a8e4 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md @@ -30,7 +30,7 @@ React Native ExecuTorch supports a variety of LLMs (checkout our [HuggingFace re Lower-end devices might not be able to fit LLMs into memory. We recommend using quantized models to reduce the memory footprint. ::: -:::caution +:::warning Up to version 0.5.3, our architecture was designed to support only one instance of the model runner at a time. As a consequence, only one active component could leverage `useLLM` concurrently. Starting with version 0.5.3, this limitation has been removed ::: @@ -199,7 +199,7 @@ Sometimes, you might want to stop the model while it’s generating. To do this, There are also cases when you need to check if tokens are being generated, such as to conditionally render a stop button. We’ve made this easy with the `isGenerating` property. -:::caution +:::warning If you try to dismount the component using this hook while generation is still going on, it will result in crash. You'll need to interrupt the model first and wait until `isGenerating` is set to false. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md index 3256e2e88..881cd4107 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md @@ -20,7 +20,7 @@ description: "Learn how to use speech-to-text models in your React Native applic Speech to text is a task that allows to transform spoken language to written text. It is commonly used to implement features such as transcription or voice assistants. :::warning -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny.en). 
You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md index c40d19e94..7720309bf 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md @@ -17,8 +17,8 @@ description: "Learn how to use text embeddings models in your React Native appli Text Embedding is the process of converting text into a numerical representation. This representation can be used for various natural language processing tasks, such as semantic search, text classification, and clustering. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-all-MiniLM-L6-v2). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-embeddings-68d0ed42f8ca0200d0283362). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -141,7 +141,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md index b4d3f34a6..9e772f8c6 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md @@ -8,8 +8,8 @@ Image classification is the process of assigning a label to an image that best d Usually, the class with the highest probability is the one that is assigned to an image. However, if there are multiple classes with comparatively high probabilities, this may indicate that the model is not confident in its prediction. ::: -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-efficientnet-v2-s). 
You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/classification-68d0ea49b5c7de8a3cae1e68). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -104,7 +104,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md index 1849a95ce..af4f58563 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md @@ -18,8 +18,8 @@ description: "Learn how to use image embeddings models in your React Native appl Image Embedding is the process of converting an image into a numerical representation. This representation can be used for tasks, such as classification, clustering and (using contrastive learning like e.g. CLIP model) image search. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-clip-vit-base-patch32). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-embeddings-68d0eda599a9d37caaaf1ad0). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -119,7 +119,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also heavily depends on image size, because resize is expansive operation, especially on low-end devices. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md index 7fee70880..6631fc217 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md @@ -4,8 +4,8 @@ title: useImageSegmentation Semantic image segmentation, akin to image classification, tries to assign the content of the image to one of the predefined classes. 
However, in case of segmentation this classification is done on a per-pixel basis, so as the result the model provides an image-sized array of scores for each of the classes. You can then use this information to detect objects on a per-pixel basis. React Native ExecuTorch offers a dedicated hook `useImageSegmentation` for this task. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-segmentation-68d5291bdf4a30bee0220f4f), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -55,7 +55,7 @@ To run the model, you can use the `forward` method. It accepts three arguments: - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at [`DeeplabLabel`](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: @@ -98,7 +98,7 @@ function App() { ### Memory usage -:::warning warning +:::warning Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. ::: @@ -108,7 +108,7 @@ Data presented in the following sections is based on inference with non-resized ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md index a23acd17c..0eeea17b4 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md @@ -4,8 +4,8 @@ title: useOCR Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). 
You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -311,7 +311,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ----------------------------------------------- | ----------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md index ac756d6a6..52c6703dd 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md @@ -5,8 +5,8 @@ title: useObjectDetection Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving. `useObjectDetection` is a hook that allows you to seamlessly integrate object detection into your React Native applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/object-detection-68d0ea936cd0906843cbba7d). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -143,7 +143,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md index 899a619ca..795c118e8 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md @@ -4,8 +4,8 @@ title: useStyleTransfer Style transfer is a technique used in computer graphics and machine learning where the visual style of one image is applied to the content of another. 
This is achieved using algorithms that manipulate data from both images, typically with the aid of a neural network. The result is a new image that combines the artistic elements of one picture with the structural details of another, effectively merging art with traditional imagery. React Native ExecuTorch offers a dedicated hook `useStyleTransfer`, for this task. However before you start you'll need to obtain ExecuTorch-compatible model binary. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/style-transfer-68d0eab2b0767a20e7efeaf5), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -102,7 +102,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md index e15c08fbe..b46ebc3b2 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md @@ -8,8 +8,8 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
::: ## Reference @@ -326,7 +326,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ------------------------------------------------------- | ------------------------------------------------------------ | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md b/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md index 137b19d92..f9b9f21a9 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md @@ -4,7 +4,7 @@ title: useExecutorchModule useExecutorchModule provides React Native bindings to the ExecuTorch [Module API](https://pytorch.org/executorch/stable/extension-module.html) directly from JavaScript. -:::caution +:::warning These bindings are primarily intended for custom model integration where no dedicated hook exists. If you are considering using a provided model, first verify whether a dedicated hook is available. Dedicated hooks simplify the implementation process by managing necessary pre and post-processing automatically. Utilizing these can save you effort and reduce complexity, ensuring you do not implement additional handling that is already covered. ::: diff --git a/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md b/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md index 99deae014..89344c70a 100644 --- a/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md +++ b/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md @@ -63,7 +63,7 @@ To run the model, you can use the `forward` method on the module object. It acce - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at `DeeplabLabel` enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for the `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md index 504c0f6e9..411fdc9d1 100644 --- a/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md +++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md @@ -2,7 +2,7 @@ title: Inference Time --- -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: @@ -29,23 +29,23 @@ Times presented in the tables are measured as consecutive runs of the model. 
Ini ## OCR -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | -| --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: | -| Detector (CRAFT_800) | 2099 | 2227 | ❌ | 2245 | 7108 | -| Recognizer (CRNN_512) | 70 | 252 | ❌ | 54 | 151 | -| Recognizer (CRNN_256) | 39 | 123 | ❌ | 24 | 78 | -| Recognizer (CRNN_128) | 17 | 83 | ❌ | 14 | 39 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| --------------------- | :--------------------------: | :------------------------------: | :-------------------: | :-------------------------------: | :-------------------------------: | +| Detector (CRAFT_800) | 2099 | 2227 | ❌ | 2245 | 7108 | +| Recognizer (CRNN_512) | 70 | 252 | ❌ | 54 | 151 | +| Recognizer (CRNN_256) | 39 | 123 | ❌ | 24 | 78 | +| Recognizer (CRNN_128) | 17 | 83 | ❌ | 14 | 39 | ❌ - Insufficient RAM. ## Vertical OCR -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | -| --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: | -| Detector (CRAFT_1280) | 5457 | 5833 | ❌ | 6296 | 14053 | -| Detector (CRAFT_320) | 1351 | 1460 | ❌ | 1485 | 3101 | -| Recognizer (CRNN_512) | 39 | 123 | ❌ | 24 | 78 | -| Recognizer (CRNN_64) | 10 | 33 | ❌ | 7 | 18 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| --------------------- | :--------------------------: | :------------------------------: | :-------------------: | :-------------------------------: | :-------------------------------: | +| Detector (CRAFT_1280) | 5457 | 5833 | ❌ | 6296 | 14053 | +| Detector (CRAFT_320) | 1351 | 1460 | ❌ | 1485 | 3101 | +| Recognizer (CRNN_512) | 39 | 123 | ❌ | 24 | 78 | +| Recognizer (CRNN_64) | 10 | 33 | ❌ | 7 | 18 | ❌ - Insufficient RAM. @@ -66,27 +66,27 @@ Times presented in the tables are measured as consecutive runs of the model. Ini Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead. 
-| Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] | -| ------------------------- | :-------------------------------------------: | :-------------------------------------------: | :-----------------------------------------: | :------------------------------------------------: | :----------------------------------------: | -| Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s | -| Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s | -| Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s | +| Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] | +| ----------------------- | :-------------------------------------------: | :-------------------------------------------: | :-----------------------------------------: | :------------------------------------------------: | :----------------------------------------: | +| Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s | +| Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s | +| Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s | ### Encoding Average time for encoding audio of given length over 10 runs. For `Whisper` model we only list 30 sec audio chunks since `Whisper` does not accept other lengths (for shorter audio the audio needs to be padded to 30sec with silence). -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | -| -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | -| Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | +| ------------------ | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | +| Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 | ### Decoding -Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length. +Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length. 
-| Model                | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
-| -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
-| Whisper-tiny (30s)   | 128.03                       | 113.65                       | 141.63                     | 89.08                             | 84.49                     |
+| Model              | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ------------------ | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
+| Whisper-tiny (30s) | 128.03                       | 113.65                       | 141.63                     | 89.08                             | 84.49                     |

## Text Embeddings

@@ -111,3 +111,13 @@ Benchmark times for text embeddings are highly dependent on the sentence length.
:::info
Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time.
:::
+
+## Image Segmentation
+
+:::warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model              | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] |
+| ------------------ | :--------------------------: | :------------------------------: | :-------------------------------: |
+| DEEPLABV3_RESNET50 | 1000                         | 670                              | 700                               |
diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md
index 684020e2a..d9ce93f48 100644
--- a/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md
+++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md
@@ -68,3 +68,13 @@ title: Memory Usage
| Model                       | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------- | :--------------------: | :----------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 350                    | 340                |
+
+## Image Segmentation
+
+:::warning
+Data presented in the following table is based on inference with non-resized output. When resize is enabled, expect higher memory usage and longer inference times at higher resolutions.
+:::
+
+| Model              | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| ------------------ | :--------------------: | :----------------: |
+| DEEPLABV3_RESNET50 | 930                    | 660                |
diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md
index 9d20c95d5..a1da630e5 100644
--- a/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md
+++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md
@@ -82,3 +82,9 @@ title: Model Size
| Model                       | XNNPACK [MB] |
| --------------------------- | :----------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 352          |
+
+## Image Segmentation
+
+| Model              | XNNPACK [MB] |
+| ------------------ | :----------: |
+| DEEPLABV3_RESNET50 | 168          |
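Since these new benchmark rows reference DEEPLABV3_RESNET50, a short sketch tying them back to the `classesOfInterest` and `resize` options described in the ImageSegmentationModule section above may help. It is a sketch only, under assumptions: the `load` call, the `DeeplabLabel.PERSON` member, and the exact `forward(imageUri, classesOfInterest, resize)` shape are illustrative and may not match the 0.5.x TypeScript API exactly.

```typescript
import {
  ImageSegmentationModule,
  DeeplabLabel,
} from 'react-native-executorch';

// Sketch under assumptions: method names and the forward() signature below
// are illustrative, not confirmed 0.5.x API.
async function segmentPerson(modelSource: string, imageUri: string) {
  // Load a segmentation model, e.g. the DEEPLABV3_RESNET50 checkpoint from
  // the tables above.
  await ImageSegmentationModule.load(modelSource);

  // Request full per-pixel scores only for the PERSON class and rescale the
  // output back to the original image size. Keep in mind that resize=true
  // makes forward() slower, and the memory table above already puts
  // DEEPLABV3_RESNET50 at roughly 930 MB on Android.
  return ImageSegmentationModule.forward(imageUri, [DeeplabLabel.PERSON], true);
}
```

The design point this illustrates is the one the warnings above make: leaving `resize` off keeps inference time and peak memory closer to the benchmarked numbers, since the model then returns its native 224x224 output.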