Commit f2007b2
New approach to multimodal document ingestion (#2558)
* Prepare change for multimodal, rm old vision approach stuff
* Add LLM-based media describer
* Prepdocs progress
* Fix media description with OpenAI
* More prepdocs improvements for image handling
* Store bbox as list of pixel floats, add storage container just for extracted images
* Getting image citations almost working
* More progress on multimodal approach
* Update more tests
* Fix up more app tests
* Add test for upload_document_image
* Add media describer and embeddings tests
* Fix tests for vision, work on vectorizer
* Add font, rename multimodal doc
* Update links to multimodal
* Fix import
* Doc fixes
* Fix f-string syntax
* Markdown lint issues
* mypy fixes and reasoning fixes
* Rename vision variables, fix mypy
* Mypy fixes
* Fix all mypy issues
* Fixes to sidebar so that it all fits
* Fixes to sidebar so that it all fits
* Integrated vectorization and user upload work
* Progress on user upload support
* changes needed for user upload
* Update tests
* Integrated vectorization progress
* Fix tests
* Use ImageEmbeddings client directly
* Change frontend for vector fields
* Use boolean parameters in the backend as well, for vector fields
* Updated translations
* Change frontend for LLM inputs
* Change from LLM inputs to booleans
* Working on tests
* Blob manager improvements/tests
* Change to a global client that we close in lifespan
* Add latest int vect changes
* Update the tests
* Add as_bytes option
* Mypy fixes
* Mypy fixes
* More mypy fixes
* More mypy fixes
* Address more TODOs
* Fix E2E tests
* Add more tests for blobmanager
* Markdown fix, more coverage
* Fix broken MD link
* Increase coverage
* Increase test coverage
* Add diff-cover step to python test
* Fix diff-cover action
* Fetch origin main for diff-cover
* Increase test coverage
* More tests, Windows check
* Better copilot instructions
* Updated merge
* Revert integrated vectorization changes, using a different strategy
* Remove unused identity in int vect
* Refactor get_sources_content to return DataPoints
* Fix multimodal bicep variable error
* Fix prepdocs to properly close async clients
* better CSS for image URLs and images in Thought Process and Supporting Content
* Revert logging level to WARNING as before
* Update text splitter chunking logic and add full test coverage
* Use single token char
* Fix mypy error
* Add some helper functions and modules to improve code clarity for textsplitter
* Update splitting algorithm with better overlap algorithm, rename SplitPage to Chunk
* markdown issues
* Revise multimodal doc to be clearer
* Rephrase fragment shift to be more grokkable
* Reword duplicate part of textsplitter doc
1 parent 84cf73e commit f2007b2
File tree
178 files changed (+7806, -3728 lines changed)
- .azdo/pipelines
- .github
- chatmodes
- workflows
- app
- backend
- approaches
- prompts
- core
- prepdocslib
- frontend/src
- api
- components
- AnalysisPanel
- Answer
- Example
- GPT4VSettings
- Settings
- SupportingContent
- TokenClaimsDisplay
- VectorSettings
- locales
- da
- en
- es
- fr
- it
- ja
- nl
- ptBR
- tr
- pages
- ask
- chat
- data
- Multimodal_Examples
- docs
- images
- infra
- tests
- snapshots
- test_app
- test_ask_prompt_template_concat
- client0
- client1
- test_ask_prompt_template
- client0
- client1
- test_ask_rtr_hybrid
- client0
- client1
- test_ask_rtr_text_agent_filter/agent_auth_client0
- test_ask_rtr_text_agent/agent_client0
- test_ask_rtr_text_filter_public_documents/auth_public_documents_client0
- test_ask_rtr_text_filter/auth_client0
- test_ask_rtr_text_semanticcaptions
- client0
- client1
- test_ask_rtr_text_semanticranker
- client0
- client1
- test_ask_rtr_text
- client0
- client1
- test_ask_vision
- client0
- client1
- vision_client0
- test_chat_followup
- client0
- client1
- test_chat_hybrid_semantic_captions
- client0
- client1
- test_chat_hybrid_semantic_ranker
- client0
- client1
- test_chat_hybrid
- client0
- client1
- test_chat_prompt_template_concat
- client0
- client1
- test_chat_prompt_template
- client0
- client1
- test_chat_seed
- client0
- client1
- test_chat_session_state_persists
- client0
- client1
- test_chat_stream_followup
- client0
- client1
- test_chat_stream_session_state_persists
- client0
- client1
- test_chat_stream_text_filter/auth_client0
- test_chat_stream_text_reasoning
- reasoning_client0
- reasoning_client1
- test_chat_stream_text
- client0
- client1
- test_chat_stream_vision
- client0
- client1
- vision_client0
- test_chat_text_agent/agent_client0
- test_chat_text_filter_agent/agent_auth_client0
- test_chat_text_filter_public_documents/auth_public_documents_client0
- test_chat_text_filter/auth_client0
- test_chat_text_reasoning
- reasoning_client0
- reasoning_client1
- test_chat_text_semantic_ranker
- client0
- client1
- test_chat_text_semanticcaptions
- client0
- client1
- test_chat_text_semanticranker
- client0
- client1
- test_chat_text
- client0
- client1
- test_chat_vector_semantic_ranker
- client0
- client1
- test_chat_vector
- client0
- client1
- test_chat_vision_user/auth_client0
- test_chat_vision_vectors
- client0
- client1
- vision_client0
- test_chat_vision
- client0
- client1
- test_chat_with_history
- client0
- client1
- test_prepdocslib_textsplitter
- test_pages_with_figures
- pages_with_figures.json
- pages_with_just_text.json
- test_sentencetextsplitter_list_parse_and_split
- test-data