 - **Vector store size impact varies by model**: GPT-4.1 series shows minimal latency impact across vector store sizes, while GPT-5 series shows significant increases.
@@ -240,10 +234,6 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
 | Extra Large (105 MB) | 0.636 | 0.528 | 0.528 | 0.528 |

**Key Insights:**

 - **Best Performance**: gpt-5-mini consistently achieves the highest ROC AUC scores across all vector store sizes (0.909-0.939)
-- **Best Latency**: gpt-4.1-nano shows the most consistent and lowest latency across all scales (4,171-4,809ms P50) but shows poor performance
+- **Best Latency**: gpt-4.1-mini (default) provides the lowest median latencies while maintaining strong accuracy
 - **Most Stable**: gpt-4.1-mini (default) maintains relatively stable performance across vector store sizes with good accuracy-latency balance
 - **Scale Sensitivity**: gpt-5 shows the most variability in performance across vector store sizes, with performance dropping significantly at larger scales
 - **Performance vs Scale**: Most models show decreasing performance as vector store size increases, with gpt-5-mini being the most resilient
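
The two metrics compared in the insights above, ROC AUC and P50 (median) latency, can be reproduced from raw per-sample results. The sketch below is only an illustration under stated assumptions: the function names and the sample data are hypothetical and are not taken from the benchmark harness, which may compute these values differently.

```python
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def p50_latency_ms(latencies_ms):
    """P50 latency is the median of the per-request latencies."""
    return median(latencies_ms)


# Hypothetical per-sample results for one model at one vector store size.
labels = [1, 0, 1, 0, 1]              # 1 = claim flagged in ground truth
scores = [0.9, 0.2, 0.7, 0.8, 0.6]    # model confidence per sample
latencies = [4100, 4300, 5200, 4800, 4500]  # milliseconds per request

print(f"ROC AUC: {roc_auc(labels, scores):.3f}")
print(f"P50 latency: {p50_latency_ms(latencies)} ms")
```

The pairwise form is quadratic in the number of samples, which is fine for small evaluation sets; for larger ones a rank-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.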
@@ -270,4 +256,4 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
 - **Signal-to-noise ratio degradation**: Larger vector stores contain more irrelevant documents that may not be relevant to the specific factual claims being validated
 - **Semantic search limitations**: File search retrieves semantically similar documents, but with a large diverse knowledge source, these may not always be factually relevant
 - **Document quality matters more than quantity**: The relevance and accuracy of documents is more important than the total number of documents
-- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe
+- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe