-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Upstream model generated file and line linkification #1803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
a3c7e9e
cd34558
a25735f
490a486
25ca856
e7f3c3e
94547bc
9059f78
278f698
e4869a3
6fd00ab
62d47ce
d9d8f9d
a26acc2
e1735b5
1aca6d7
276c58a
7cc79d6
b45ccc1
7171dd5
e8528b8
fcd35be
7168dcd
4165d52
5784ec6
92f04c8
f1c6079
461a961
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,260 @@ | ||
| /*--------------------------------------------------------------------------------------------- | ||
| * Copyright (c) Microsoft Corporation. All rights reserved. | ||
| * Licensed under the MIT License. See License.txt in the project root for license information. | ||
| *--------------------------------------------------------------------------------------------*/ | ||
|
|
||
| import { IFileSystemService } from '../../../platform/filesystem/common/fileSystemService'; | ||
| import { FileType } from '../../../platform/filesystem/common/fileTypes'; | ||
| import { getWorkspaceFileDisplayPath, IWorkspaceService } from '../../../platform/workspace/common/workspaceService'; | ||
| import { CancellationToken } from '../../../util/vs/base/common/cancellation'; | ||
| import { normalizePath as normalizeUriPath } from '../../../util/vs/base/common/resources'; | ||
| import { Location, Position, Range, Uri } from '../../../vscodeTypes'; | ||
| import { coalesceParts, LinkifiedPart, LinkifiedText, LinkifyLocationAnchor } from './linkifiedText'; | ||
| import { IContributedLinkifier, LinkifierContext } from './linkifyService'; | ||
|
|
||
| // Matches markdown links where the text is a path and optional #L anchor is present | ||
| // Example: [src/file.ts](src/file.ts#L10-12) or [src/file.ts](src/file.ts) | ||
| const modelLinkRe = /\[(?<text>[^\]\n]+)\]\((?<target>[^\s)]+)\)/gu; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make sure this can support the case where the path text is code too as this is fairly common and models have likely been trained on a lot of it: Especially true if we get rid of the old linkifier
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, works. Added unit test to validate this. |
||
|
|
||
| export class ModelFilePathLinkifier implements IContributedLinkifier { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's break up the existing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
| constructor( | ||
| @IFileSystemService private readonly fileSystem: IFileSystemService, | ||
| @IWorkspaceService private readonly workspaceService: IWorkspaceService, | ||
| ) { } | ||
|
|
||
| async linkify(text: string, context: LinkifierContext, token: CancellationToken): Promise<LinkifiedText | undefined> { | ||
| let lastIndex = 0; | ||
| const parts: Array<LinkifiedPart | Promise<LinkifiedPart>> = []; | ||
|
|
||
| for (const match of text.matchAll(modelLinkRe)) { | ||
| const original = match[0]; | ||
| const prefix = text.slice(lastIndex, match.index); | ||
| if (prefix) { | ||
| parts.push(prefix); | ||
| } | ||
| lastIndex = match.index + original.length; | ||
|
|
||
| const parsed = this.parseModelLinkMatch(match); | ||
| if (!parsed) { | ||
| parts.push(original); | ||
| continue; | ||
| } | ||
|
|
||
| const workspaceFolders = this.workspaceService.getWorkspaceFolders(); | ||
| if (!this.canLinkify(parsed, workspaceFolders)) { | ||
| parts.push(original); | ||
| continue; | ||
| } | ||
|
|
||
| // Push promise to resolve in parallel with other matches | ||
| parts.push(this.resolveTarget(parsed.targetPath, workspaceFolders, parsed.preserveDirectorySlash, token).then(resolved => { | ||
| if (!resolved) { | ||
| return original; | ||
| } | ||
|
|
||
| const basePath = getWorkspaceFileDisplayPath(this.workspaceService, resolved); | ||
| const anchorRange = this.parseAnchor(parsed.anchor); | ||
| if (parsed.anchor && !anchorRange) { | ||
| return original; | ||
| } | ||
|
|
||
| if (anchorRange) { | ||
| const { range, startLine, endLine } = anchorRange; | ||
| const displayPath = endLine && startLine !== endLine | ||
| ? `${basePath}#L${startLine}-L${endLine}` | ||
| : `${basePath}#L${startLine}`; | ||
| return new LinkifyLocationAnchor(new Location(resolved, range), displayPath); | ||
| } | ||
|
|
||
| return new LinkifyLocationAnchor(resolved, basePath); | ||
| })); | ||
| } | ||
|
|
||
| const suffix = text.slice(lastIndex); | ||
| if (suffix) { | ||
| parts.push(suffix); | ||
| } | ||
|
|
||
| if (!parts.length) { | ||
| return undefined; | ||
| } | ||
|
|
||
| return { parts: coalesceParts(await Promise.all(parts)) }; | ||
| } | ||
|
|
||
| private parseModelLinkMatch(match: RegExpMatchArray): { readonly text: string; readonly targetPath: string; readonly anchor: string | undefined; readonly preserveDirectorySlash: boolean } | undefined { | ||
| const rawText = match.groups?.['text']; | ||
| const rawTarget = match.groups?.['target']; | ||
| if (!rawText || !rawTarget) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const hashIndex = rawTarget.indexOf('#'); | ||
| const baseTarget = hashIndex === -1 ? rawTarget : rawTarget.slice(0, hashIndex); | ||
| const anchor = hashIndex === -1 ? undefined : rawTarget.slice(hashIndex + 1); | ||
|
|
||
| let decodedBase = baseTarget; | ||
| try { | ||
| decodedBase = decodeURIComponent(baseTarget); | ||
| } catch { | ||
| // noop | ||
| } | ||
|
|
||
| const preserveDirectorySlash = decodedBase.endsWith('/') && decodedBase.length > 1; | ||
| const normalizedTarget = this.normalizeSlashes(decodedBase); | ||
| const normalizedText = this.normalizeLinkText(rawText); | ||
| return { text: normalizedText, targetPath: normalizedTarget, anchor, preserveDirectorySlash }; | ||
| } | ||
|
|
||
| private normalizeSlashes(value: string): string { | ||
| // Collapse one or more backslashes into a single forward slash so mixed separators normalize consistently. | ||
| return value.replace(/\\+/g, '/'); | ||
| } | ||
|
|
||
| private normalizeLinkText(rawText: string): string { | ||
| let text = this.normalizeSlashes(rawText); | ||
| // Remove a leading or trailing backtick that sometimes wraps the visible link label. | ||
| text = text.replace(/^`|`$/g, ''); | ||
|
|
||
| // Look for a trailing #L anchor segment so it can be stripped before we compare names. | ||
| const anchorMatch = /^(.+?)(#L\d+(?:-\d+)?)$/.exec(text); | ||
| return anchorMatch ? anchorMatch[1] : text; | ||
| } | ||
|
|
||
| private canLinkify(parsed: { readonly text: string; readonly targetPath: string; readonly anchor: string | undefined }, workspaceFolders: readonly Uri[]): boolean { | ||
| const { text, targetPath, anchor } = parsed; | ||
| const textMatchesBase = targetPath === text; | ||
| const textIsFilename = !text.includes('/') && targetPath.endsWith(`/${text}`); | ||
| const descriptiveWithAnchor = !!anchor; // Allow any descriptive text when anchor is present | ||
|
|
||
| return Boolean(workspaceFolders.length) && (textMatchesBase || textIsFilename || descriptiveWithAnchor); | ||
| } | ||
|
|
||
| private async resolveTarget(targetPath: string, workspaceFolders: readonly Uri[], preserveDirectorySlash: boolean, token: CancellationToken): Promise<Uri | undefined> { | ||
| if (!workspaceFolders.length) { | ||
| return undefined; | ||
| } | ||
|
|
||
| if (token.isCancellationRequested) { | ||
| return undefined; | ||
| } | ||
|
|
||
| if (this.isAbsolutePath(targetPath)) { | ||
| // Choose URI construction strategy based on workspace folder schemes. | ||
| // For local (file:) workspaces we keep using Uri.file; for remote schemes we attempt | ||
| // to project the absolute path into the remote scheme preserving the folder URI's authority. | ||
| const normalizedAbs = targetPath.replace(/\\/g, '/'); | ||
|
|
||
| for (const folderUri of workspaceFolders) { | ||
| if (token.isCancellationRequested) { | ||
| return undefined; | ||
| } | ||
| if (folderUri.scheme === 'file') { | ||
| const absoluteFileUri = this.tryCreateFileUri(targetPath); | ||
| if (!absoluteFileUri) { | ||
| continue; | ||
| } | ||
| if (this.isEqualOrParentFs(absoluteFileUri, folderUri)) { | ||
| const stat = await this.tryStat(absoluteFileUri, preserveDirectorySlash, token); | ||
| if (stat) { | ||
| return stat; | ||
| } | ||
| } | ||
| continue; | ||
| } | ||
|
|
||
| // Remote / virtual workspace: attempt to map the absolute path into the same scheme. | ||
| // Only consider it if the folder path is a prefix of the absolute path to avoid | ||
| // generating unrelated URIs. | ||
| const folderPath = folderUri.path.replace(/\\/g, '/'); | ||
| const prefix = folderPath.endsWith('/') ? folderPath : folderPath + '/'; | ||
| if (normalizedAbs.startsWith(prefix)) { | ||
| const candidate = folderUri.with({ path: normalizedAbs }); | ||
| const stat = await this.tryStat(candidate, preserveDirectorySlash, token); | ||
| if (stat) { | ||
| return stat; | ||
| } | ||
| } | ||
| } | ||
| return undefined; | ||
| } | ||
|
|
||
| const segments = targetPath.split('/').filter(Boolean); | ||
| for (const folderUri of workspaceFolders) { | ||
| const candidate = Uri.joinPath(folderUri, ...segments); | ||
| const stat = await this.tryStat(candidate, preserveDirectorySlash, token); | ||
| if (stat) { | ||
| return stat; | ||
| } | ||
| } | ||
|
|
||
| return undefined; | ||
| } | ||
|
|
||
| private tryCreateFileUri(path: string): Uri | undefined { | ||
| try { | ||
| return Uri.file(path); | ||
| } catch { | ||
| return undefined; | ||
| } | ||
| } | ||
|
|
||
|
|
||
| private isEqualOrParentFs(target: Uri, folder: Uri): boolean { | ||
|
||
| const targetPath = normalizeUriPath(target).path; | ||
| const folderPath = normalizeUriPath(folder).path; | ||
| return targetPath === folderPath || targetPath.startsWith(folderPath.endsWith('/') ? folderPath : `${folderPath}/`); | ||
| } | ||
|
|
||
| private parseAnchor(anchor: string | undefined): { readonly range: Range; readonly startLine: string; readonly endLine: string | undefined } | undefined { | ||
| // Parse supported anchor formats: L123, L123-456, L123-L456 | ||
| if (!anchor) { | ||
| return undefined; | ||
| } | ||
| const match = /^L(\d+)(?:-L?(\d+))?$/.exec(anchor); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd skip the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
| if (!match) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const startLine = match[1]; | ||
| const endLineRaw = match[2]; | ||
| const normalizedEndLine = endLineRaw === startLine ? undefined : endLineRaw; | ||
| const start = parseInt(startLine, 10) - 1; | ||
| const end = parseInt(normalizedEndLine ?? startLine, 10) - 1; | ||
| if (Number.isNaN(start) || Number.isNaN(end) || start < 0 || end < start) { | ||
| return undefined; | ||
| } | ||
|
|
||
| return { | ||
| range: new Range(new Position(start, 0), new Position(end, 0)), | ||
| startLine, | ||
| endLine: normalizedEndLine, | ||
| }; | ||
| } | ||
|
|
||
| private isAbsolutePath(path: string): boolean { | ||
| // Treat drive-letter prefixes (e.g. C:) or leading slashes as absolute paths. | ||
| return /^[a-z]:/i.test(path) || path.startsWith('/'); | ||
| } | ||
|
|
||
| private async tryStat(uri: Uri, preserveDirectorySlash: boolean, token: CancellationToken): Promise<Uri | undefined> { | ||
| if (token.isCancellationRequested) { | ||
| return undefined; | ||
| } | ||
| try { | ||
| const stat = await this.fileSystem.stat(uri); | ||
| if (stat.type === FileType.Directory) { | ||
| if (preserveDirectorySlash) { | ||
| return uri.path.endsWith('/') ? uri : uri.with({ path: `${uri.path}/` }); | ||
| } | ||
| if (uri.path.endsWith('/') && uri.path !== '/') { | ||
| return uri.with({ path: uri.path.slice(0, -1) }); | ||
| } | ||
| return uri; | ||
| } | ||
| return uri; | ||
| } catch { | ||
| return undefined; | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this class at all? Maybe just delete it and call new
ModelFilePathLinkifierFilePathLinkifierinsteadIt seems like the new linkifier should handle everything that the current one does. If there are any gaps we can fix those instead of keeping the fallback around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now ModelFilePathLinkifier handles structured markdown links with anchors texttext, while FilePathLinkifier handles inline code and plain text paths.
They're registered with clear separation of concerns, where FilePathLinkifier acts as a fallback for informal references.
That said, I'm happy to explore merging them if you think that would be better! Let me know your preference and I can make it work either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope is that the model will handle creating proper links now. The inline code and plain text path logic was added because previous models were no good at generating links consistently
Worth testing removing the old linkifier to and seeing if the model still generates paths in plaintext / inline code. If it does, maybe we need to tweak the instructions a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still interested to see if we can avoid having two linkifier paths and just prefer using the model