Skip to content

Add source_url to MCP get_document_by_url output#3476

Open
rjernst wants to merge 1 commit into
elastic:mainfrom
rjernst:feature/mcp-source-url
Open

Add source_url to MCP get_document_by_url output#3476
rjernst wants to merge 1 commit into
elastic:mainfrom
rjernst:feature/mcp-source-url

Conversation

@rjernst
Copy link
Copy Markdown
Member

@rjernst rjernst commented Jun 5, 2026

The MCP get_document_by_url tool now returns a source_url field containing
the GitHub blob URL for the source file of each documentation page. The URL
is computed at indexing time using the same git context that powers the
existing "edit this page" link in the frontend, and stored as a keyword
field in the Elasticsearch index. API reference docs (sourced from the
temporary OpenAPI exporter) will have a null source_url until the planned
Elastic.ApiExplorer pipeline is in place. Tests cover the mapping path with
a stub gateway and validate the URL format against a live index.

The MCP get_document_by_url tool now returns a source_url field containing
the GitHub blob URL for the source file of each documentation page. The URL
is computed at indexing time using the same git context that powers the
existing "edit this page" link in the frontend, and stored as a keyword
field in the Elasticsearch index. API reference docs (sourced from the
temporary OpenAPI exporter) will have a null source_url until the planned
Elastic.ApiExplorer pipeline is in place. Tests cover the mapping path with
a stub gateway and validate the URL format against a live index.
@rjernst rjernst requested a review from a team as a code owner June 5, 2026 21:08
@rjernst rjernst requested a review from cotti June 5, 2026 21:08
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Label error. Requires exactly 1 of: automation, breaking, bug, changelog:skip, chore, ci, dependencies, documentation, enhancement, feature, fix, redesign. Found:

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Worried about impact? Review this PR in Change Stack to explore blast radius before you approve or request changes.

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a SourceUrl field that carries GitHub blob URLs from markdown export through to API responses. The change extends DocumentationDocument (search schema), DocumentResult (gateway contract), and DocumentResponse (API response) with an optional SourceUrl property. The markdown exporter now computes GitHub URLs from repository metadata and file paths, storing them in the exported document. The API gateway retrieves the field from Elasticsearch and maps it through DocumentResult to the response, where DocumentTools includes it in the final DocumentResponse. Unit tests verify the mapping behavior; integration tests validate the GitHub URL format.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a source_url field to the MCP get_document_by_url tool output.
Description check ✅ Passed The description is directly related to the changeset, explaining the purpose, implementation approach, and test coverage of the source_url feature.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feature/mcp-source-url
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/api/Elastic.Documentation.Mcp.Remote/Responses/McpResponses.cs`:
- Line 113: The SourceUrl property in McpResponses.cs currently serializes to
"sourceUrl" but the MCP contract requires "source_url"; update the SourceUrl
property to explicitly specify the JSON name (e.g., add
[JsonPropertyName("source_url")] above the public string? SourceUrl { get; init;
}) so the serializer emits "source_url" (ensure the System.Text.Json attribute
is imported or use [JsonProperty("source_url")] if the project uses
Newtonsoft.Json).

In `@tests/Mcp.Remote.Tests/DocumentToolsTests.cs`:
- Around line 31-33: The tests currently deserialize JSON into DocumentResponse
(via JsonSerializer.Deserialize with McpJsonContext.Default.DocumentResponse)
and assert response.SourceUrl, which won't detect if the emitted JSON property
name changed; update the tests to also assert the raw JSON string contains the
expected key "source_url" (or that "source_url" is absent when expected null)
before/after deserialization so a field-name contract regression is
caught—locate the assertions around response/DocumentResponse (lines with
JsonSerializer.Deserialize and response!.SourceUrl) and add a string-based
assertion against the raw json variable for the "source_url" property (and
mirror the same change for the other test at the noted lines 50-52).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3820dcc8-d841-4e4b-9dc1-5af3f1b4a013

📥 Commits

Reviewing files that changed from the base of the PR and between b86c5ee and 73a492f.

📒 Files selected for processing (8)
  • src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.Export.cs
  • src/api/Elastic.Documentation.Mcp.Remote/Gateways/DocumentGateway.cs
  • src/api/Elastic.Documentation.Mcp.Remote/Gateways/IDocumentGateway.cs
  • src/api/Elastic.Documentation.Mcp.Remote/Responses/McpResponses.cs
  • src/api/Elastic.Documentation.Mcp.Remote/Tools/DocumentTools.cs
  • src/services/search/Elastic.Documentation.Search.Contract/DocumentationDocument.cs
  • tests-integration/Mcp.Remote.IntegrationTests/DocumentToolsIntegrationTests.cs
  • tests/Mcp.Remote.Tests/DocumentToolsTests.cs

public string[]? AiQuestions { get; init; }
public string[]? AiUseCases { get; init; }
public DateTimeOffset? LastUpdated { get; init; }
public string? SourceUrl { get; init; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SourceUrl currently serializes as sourceUrl, not source_url (contract mismatch).

Line 113 needs an explicit JSON name override if the MCP contract is source_url as stated in the PR objective. With current options, this will emit sourceUrl and can break clients expecting source_url.

Proposed fix
 public sealed record DocumentResponse
 {
@@
+	[JsonPropertyName("source_url")]
 	public string? SourceUrl { get; init; }
@@
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public string? SourceUrl { get; init; }
[JsonPropertyName("source_url")]
public string? SourceUrl { get; init; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/Elastic.Documentation.Mcp.Remote/Responses/McpResponses.cs` at line
113, The SourceUrl property in McpResponses.cs currently serializes to
"sourceUrl" but the MCP contract requires "source_url"; update the SourceUrl
property to explicitly specify the JSON name (e.g., add
[JsonPropertyName("source_url")] above the public string? SourceUrl { get; init;
}) so the serializer emits "source_url" (ensure the System.Text.Json attribute
is imported or use [JsonProperty("source_url")] if the project uses
Newtonsoft.Json).

Comment on lines +31 to +33
var response = JsonSerializer.Deserialize(json, McpJsonContext.Default.DocumentResponse);
response.Should().NotBeNull();
response!.SourceUrl.Should().Be(expectedSourceUrl);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

These tests won’t catch a JSON field-name contract regression.

Because both assertions deserialize into DocumentResponse, they validate value mapping but not the emitted key name. Add a raw JSON assertion for "source_url" (or explicit absence when null) so contract breaks are caught.

Also applies to: 50-52

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/Mcp.Remote.Tests/DocumentToolsTests.cs` around lines 31 - 33, The tests
currently deserialize JSON into DocumentResponse (via JsonSerializer.Deserialize
with McpJsonContext.Default.DocumentResponse) and assert response.SourceUrl,
which won't detect if the emitted JSON property name changed; update the tests
to also assert the raw JSON string contains the expected key "source_url" (or
that "source_url" is absent when expected null) before/after deserialization so
a field-name contract regression is caught—locate the assertions around
response/DocumentResponse (lines with JsonSerializer.Deserialize and
response!.SourceUrl) and add a string-based assertion against the raw json
variable for the "source_url" property (and mirror the same change for the other
test at the noted lines 50-52).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant