docs/Architecture.md (1 addition & 1 deletion)

@@ -6,7 +6,7 @@ The figure below shows the core framework structure of LLamaSharp.
 -**Native APIs**: LLamaSharp calls the exported C APIs to load and run the model. The APIs defined in LLamaSharp specially for calling C APIs are named `Native APIs`. We have made all the native APIs public under namespace `LLama.Native`. However, it's strongly recommended not to use them unless you know what you are doing.
 -**LLamaWeights**: The holder of the model weight.
--**LLamaContext**: A context which directly interact with the native library and provide some basic APIs such as tokenization and embedding. It takes use of `LLamaWeights`.
+-**LLamaContext**: A context which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. It takes use of `LLamaWeights`.
 -**LLamaExecutors**: Executors which define the way to run the LLama model. It provides text-to-text and image-to-text APIs to make it easy to use. Currently we provide four kinds of executors: `InteractiveExecutor`, `InstructExecutor`, `StatelessExecutor` and `BatchedExecutor`.
 -**ChatSession**: A wrapping for `InteractiveExecutor` and `LLamaContext`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize the text process by `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`.
 -**Integrations**: Integrations with other libraries to expand the application of LLamaSharp. For example, if you want to do RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), kernel-memory integration is a good option for you.
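To make the relationship between these layers concrete, here is a minimal C# sketch of how they stack, assuming a recent LLamaSharp version; the model path and parameter values are placeholders.

```csharp
using LLama;
using LLama.Common;

// Placeholder path to any GGUF model file.
var parameters = new ModelParams("path/to/model.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 0
};

// LLamaWeights: the model weights, loaded through the native APIs.
using var weights = LLamaWeights.LoadFromFile(parameters);

// LLamaContext: a native context created from the weights (tokenization, embedding, ...).
using var context = weights.CreateContext(parameters);

// Executor: defines how inference is driven on top of the context.
var executor = new InteractiveExecutor(context);

// ChatSession: wraps the executor and manages interactive history and session state.
var session = new ChatSession(executor);
```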
docs/FAQ.md (3 additions & 3 deletions)

@@ -29,7 +29,7 @@ Generally, there are two possible cases for this problem:
 Please set anti-prompt or max-length when executing the inference.

-Anti-prompt can also be called as "Stop-keyword", which decides when to stop the response generation. Under interactive mode, the maximum tokens count is always not set, which makes the LLM generates responses infinitively. Therefore, setting anti-prompt correctly helps a lot to avoid the strange behaviours. For example, the prompt file `chat-with-bob.txt` has the following content:
+Anti-prompt can also be called as "Stop-keyword", which decides when to stop the response generation. Under interactive mode, the maximum tokens count is always not set, which makes the LLM generate responses infinitively. Therefore, setting anti-prompt correctly helps a lot to avoid the strange behaviours. For example, the prompt file `chat-with-bob.txt` has the following content:

 ```
 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
@@ -43,7 +43,7 @@ User:
 Therefore, the anti-prompt should be set as "User:". If the last line of the prompt is removed, LLM will automatically generate a question (user) and a response (bob) for one time when running the chat session. Therefore, the antiprompt is suggested to be appended to the prompt when starting a chat session.

-What if an extra line is appended? The string "User:" in the prompt will be followed with a char "\n". Thus when running the model, the automatic generation of a pair of question and response may appear because the anti-prompt is "User:" but the last token is "User:\n". As for whether it will appear, it's an undefined behaviour, which depends on the implementation inside the `LLamaExecutor`. Anyway, since it may leads to unexpected behaviors, it's recommended to trim your prompt or carefully keep consistent with your anti-prompt.
+What if an extra line is appended? The string "User:" in the prompt will be followed with a char "\n". Thus when running the model, the automatic generation of a pair of question and response may appear because the anti-prompt is "User:" but the last token is "User:\n". As for whether it will appear, it's an undefined behaviour, which depends on the implementation inside the `LLamaExecutor`. Anyway, since it may lead to unexpected behaviors, it's recommended to trim your prompt or carefully keep consistent with your anti-prompt.
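As a concrete illustration, the sketch below shows how an anti-prompt and a max-length can be set on the inference parameters in C#; it assumes a recent LLamaSharp API, and the values are only examples.

```csharp
using System.Collections.Generic;
using LLama.Common;

// Bound generation both by a stop keyword and by a token budget.
var inferenceParams = new InferenceParams
{
    // Stop as soon as the model starts the next "User:" turn.
    AntiPrompts = new List<string> { "User:" },

    // Hard cap on generated tokens (-1 would mean "no limit").
    MaxTokens = 256
};

// These parameters are then passed to the executor or chat session call,
// e.g. session.ChatAsync(..., inferenceParams) or executor.InferAsync(prompt, inferenceParams).
```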
 In this inequality, `len(response)` refers to the expected tokens for LLM to generate.

-## Choose models weight depending on you task
+## Choose models weight depending on your task

 The differences between modes may lead to much different behaviours under the same task. For example, if you're building a chat bot with non-English, a fine-tuned model specially for the language you want to use will have huge effect on the performance.
docs/QuickStart.md (1 addition & 1 deletion)

@@ -24,7 +24,7 @@ PM> Install-Package LLamaSharp
 ## Model preparation

-There are two popular format of model file of LLM now, which are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:
+There are two popular formats of model file of LLM now, which are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:

 1. Search model name + 'gguf' in [Huggingface](https://huggingface.co), you will find lots of model files that have already been converted to GGUF format. Please take care of the publishing time of them because some old ones could only work with old version of LLamaSharp.
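Once a GGUF file is available, a quick way to verify that it loads and generates text is a one-shot stateless inference. This is a minimal sketch assuming a recent LLamaSharp version; the model file name is a placeholder.

```csharp
using LLama;
using LLama.Common;

// Placeholder: any chat/instruct GGUF file downloaded from Huggingface or converted locally.
var parameters = new ModelParams("models/llama-2-7b-chat.Q4_K_M.gguf");

using var weights = LLamaWeights.LoadFromFile(parameters);

// StatelessExecutor keeps no conversation state: each call is an independent completion.
var executor = new StatelessExecutor(weights, parameters);

await foreach (var text in executor.InferAsync(
    "What is LLamaSharp?",
    new InferenceParams { MaxTokens = 64 }))
{
    Console.Write(text);
}
```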
docs/index.md (2 additions & 2 deletions)

@@ -17,7 +17,7 @@ If you are new to LLM, here're some tips for you to help you to get start with `
 ## Integrations

-There are integarions for the following libraries, which help to expand the application of LLamaSharp. Integrations for semantic-kernel and kernel-memory are developed in LLamaSharp repository, while others are developed in their own repositories.
+There are integrations for the following libraries, which help to expand the application of LLamaSharp. Integrations for semantic-kernel and kernel-memory are developed in LLamaSharp repository, while others are developed in their own repositories.

 -[semantic-kernel](https://github.com/microsoft/semantic-kernel): an SDK that integrates LLM like OpenAI, Azure OpenAI, and Hugging Face.
 -[kernel-memory](https://github.com/microsoft/kernel-memory): a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), synthetic memory, prompt engineering, and custom semantic memory processing.
@@ -32,7 +32,7 @@ There are integarions for the following libraries, which help to expand the appl
 Community effort is always one of the most important things in open-source projects. Any contribution in any way is welcomed here. For example, the following things mean a lot for LLamaSharp:

 1. Open an issue when you find something wrong.
-2. Open an PR if you've fixed something. Even if just correcting a typo, it also makes great sense.
+2. Open a PR if you've fixed something. Even if just correcting a typo, it also makes great sense.
 3. Help to optimize the documentation.
 4. Write an example or blog about how to integrate LLamaSharp with your APPs.
@@ -671,7 +671,7 @@ public static Span<float> llama_get_embeddings(SafeLLamaContextHandle ctx)
 Apply chat template. Inspired by hf apply_chat_template() on python.
 Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model"
-NOTE: This function does not use a jinja parser. It only support a pre-defined list of template. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
+NOTE: This function does not use a jinja parser. It only supports a pre-defined list of template. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
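For context, the sketch below shows how a chat template is typically applied from the managed side. It assumes the `LLamaTemplate` wrapper available in recent LLamaSharp versions; exact member names may differ slightly between releases, and the model path is a placeholder.

```csharp
using System.Text;
using LLama;
using LLama.Common;

// Placeholder model path; the template built into this model will be used.
var parameters = new ModelParams("path/to/model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// LLamaTemplate wraps llama_chat_apply_template; a custom template string
// could be supplied instead of the model's built-in one.
var template = new LLamaTemplate(weights.NativeHandle);

template.Add("system", "You are a helpful assistant.");
template.Add("user", "What is LLamaSharp?");

// Renders the messages with the model's pre-defined template (no jinja parsing involved).
var prompt = Encoding.UTF8.GetString(template.Apply());
```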