Commit 7a2c913
llava : Add Granite Vision Support (#11794)
* Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use current defaults for feature layer
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use constant for max gridpoints / feat layers, style fixes
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* clarify non-negative feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Remove CLIP_API from func signature
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clarify feature layers are non negative ints and not uint
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix condition for reading feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* pop last llava layer when feature layers are unset
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix unset vision layer 0
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update examples/llava/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Reenable assertion for out of bounds get_rows
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use std vector for gridpoints and feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Calculate max feature layer at load time
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Include base patch for granite vision allocation
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix trailing whitespace
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add max num patches = 10 back for minicpmv
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use unordered set to store feature layers
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use max feature layer for postnorm
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

1 parent 08d5986 commit 7a2c913
File tree: examples/llava
6 files changed, +235 -42 lines changed