Validation of absolute links relative to docs root

Oleh Prypin
2023-11-21 23:19:02 +01:00
parent 9e443d2120
commit a42ab62311
5 changed files with 178 additions and 45 deletions

@@ -382,7 +382,9 @@ Configure the strictness of MkDocs' diagnostic messages when validating links to
This is a tree of configs, and for each one the value can be one of three: `warn`, `info`, `ignore`. Each causes a logging message of the corresponding severity to be produced. The `warn` level is, of course, intended for use with `mkdocs build --strict` (where it becomes an error), which you can employ in continuous testing.
> EXAMPLE: **Defaults of this config as of MkDocs 1.6:**
The config `validation.links.absolute_links` additionally has a special value `relative_to_docs`, for [validation of absolute links](#validation-of-absolute-links).
>? EXAMPLE: **Defaults of this config as of MkDocs 1.6:**
>
> ```yaml
> validation:
@@ -425,24 +427,46 @@ Note how in the above examples we omitted the 'nav' and 'links' keys. Here `abso
Full list of values and examples of log messages that they can hide or make more prominent:
* `validation.nav.omitted_files`
    * "The following pages exist in the docs directory, but are not included in the "nav" configuration: ..."
    * > The following pages exist in the docs directory, but are not included in the "nav" configuration: ...
* `validation.nav.not_found`
    * "A relative path to 'foo/bar.md' is included in the 'nav' configuration, which is not found in the documentation files."
    * "A reference to 'foo/bar.md' is included in the 'nav' configuration, but this file is excluded from the built site."
    * > A relative path to 'foo/bar.md' is included in the 'nav' configuration, which is not found in the documentation files.
    * > A reference to 'foo/bar.md' is included in the 'nav' configuration, but this file is excluded from the built site.
* `validation.nav.absolute_links`
    * "An absolute path to '/foo/bar.html' is included in the 'nav' configuration, which presumably points to an external resource."
    * > An absolute path to '/foo/bar.html' is included in the 'nav' configuration, which presumably points to an external resource.
<!-- -->
* `validation.links.not_found`
    * "Doc file 'example.md' contains a link '../foo/bar.md', but the target is not found among documentation files."
    * "Doc file 'example.md' contains a link to 'foo/bar.md' which is excluded from the built site."
    * > Doc file 'example.md' contains a link '../foo/bar.md', but the target is not found among documentation files.
    * > Doc file 'example.md' contains a link to 'foo/bar.md' which is excluded from the built site.
* `validation.links.anchors`
    * "Doc file 'example.md' contains a link '../foo/bar.md#some-heading', but the doc 'foo/bar.md' does not contain an anchor '#some-heading'."
    * "Doc file 'example.md' contains a link '#some-heading', but there is no such anchor on this page."
    * > Doc file 'example.md' contains a link '../foo/bar.md#some-heading', but the doc 'foo/bar.md' does not contain an anchor '#some-heading'.
    * > Doc file 'example.md' contains a link '#some-heading', but there is no such anchor on this page.
* `validation.links.absolute_links`
    * "Doc file 'example.md' contains an absolute link '/foo/bar.html', it was left as is. Did you mean 'foo/bar.md'?"
    * > Doc file 'example.md' contains an absolute link '/foo/bar.html', it was left as is. Did you mean 'foo/bar.md'?
* `validation.links.unrecognized_links`
    * "Doc file 'example.md' contains an unrecognized relative link '../foo/bar/', it was left as is. Did you mean 'foo/bar.md'?"
    * "Doc file 'example.md' contains an unrecognized relative link 'mail\@example.com', it was left as is. Did you mean 'mailto:mail\@example.com'?"
    * > Doc file 'example.md' contains an unrecognized relative link '../foo/bar/', it was left as is. Did you mean 'foo/bar.md'?
    * > Doc file 'example.md' contains an unrecognized relative link 'mail\@example.com', it was left as is. Did you mean 'mailto:mail\@example.com'?
#### Validation of absolute links
NEW: **New in version 1.6.**
> Historically, within Markdown, MkDocs only recognized **relative** links that lead to another physical `*.md` document (or media file). This is a good convention to follow because then the source pages are also freely browsable without MkDocs, for example on GitHub. Absolute links, by contrast, were left unmodified (so they often did not work as expected) or, more recently, warned against. If you dislike having to always use relative links, you can now opt into absolute links and have them work correctly.
If you set the setting `validation.links.absolute_links` to the new value `relative_to_docs`, all Markdown links starting with `/` will be understood as being relative to the `docs_dir` root. The links will then be validated for correctness according to all the other rules that were already working for relative links in prior versions of MkDocs. For the HTML output, these links will still be turned relative so that the site still works reliably.
So, now any document (e.g. "dir1/foo.md") can link to the document "dir2/bar.md" as `[link](/dir2/bar.md)`, in addition to `[link](../dir2/bar.md)`, which was previously the only correct way.
You have to enable the setting, though. The default is still to just skip the link.
> EXAMPLE: **Settings to recognize absolute links and validate them:**
>
> ```yaml
> validation:
>   links:
>     absolute_links: relative_to_docs
>     anchors: warn
>     unrecognized_links: warn
> ```
## Build directories

@@ -1227,19 +1227,3 @@ class PathSpec(BaseConfigOption[pathspec.gitignore.GitIgnoreSpec]):
            return pathspec.gitignore.GitIgnoreSpec.from_lines(lines=value.splitlines())
        except ValueError as e:
            raise ValidationError(str(e))


class _LogLevel(OptionallyRequired[int]):
    levels: Mapping[str, int] = {
        "warn": logging.WARNING,
        "info": logging.INFO,
        "ignore": logging.DEBUG,
    }

    def run_validation(self, value: object) -> int:
        if not isinstance(value, str):
            raise ValidationError(f'Expected a string, but a {type(value)} was given.')
        try:
            return self.levels[value]
        except KeyError:
            raise ValidationError(f'Expected one of {list(self.levels)}, got {value!r}')

@@ -1,13 +1,35 @@
from __future__ import annotations
from typing import IO, TYPE_CHECKING, Dict
import logging
from typing import IO, Dict, Mapping
from mkdocs.config import base
from mkdocs.config import config_options as c
from mkdocs.structure.pages import Page, _AbsoluteLinksValidationValue
from mkdocs.utils.yaml import get_yaml_loader, yaml_load
if TYPE_CHECKING:
    import mkdocs.structure.pages


class _LogLevel(c.OptionallyRequired[int]):
    levels: Mapping[str, int] = {
        "warn": logging.WARNING,
        "info": logging.INFO,
        "ignore": logging.DEBUG,
    }

    def run_validation(self, value: object) -> int:
        if not isinstance(value, str):
            raise base.ValidationError(f"Expected a string, but a {type(value)} was given.")
        try:
            return self.levels[value]
        except KeyError:
            raise base.ValidationError(f"Expected one of {list(self.levels)}, got {value!r}")


class _AbsoluteLinksValidation(_LogLevel):
    levels: Mapping[str, int] = {
        **_LogLevel.levels,
        "relative_to_docs": _AbsoluteLinksValidationValue.RELATIVE_TO_DOCS,
    }


# NOTE: The order here is important. During validation some config options
@@ -146,37 +168,37 @@ class MkDocsConfig(base.Config):
    class Validation(base.Config):
        class NavValidation(base.Config):
            omitted_files = c._LogLevel(default='info')
            omitted_files = _LogLevel(default='info')
            """Warning level for when a doc file is never mentioned in the navigation.
            For granular configuration, see `not_in_nav`."""
            not_found = c._LogLevel(default='warn')
            not_found = _LogLevel(default='warn')
            """Warning level for when the navigation links to a relative path that isn't an existing page on the site."""
            absolute_links = c._LogLevel(default='info')
            absolute_links = _LogLevel(default='info')
            """Warning level for when the navigation links to an absolute path (starting with `/`)."""
        nav = c.SubConfig(NavValidation)
        class LinksValidation(base.Config):
            not_found = c._LogLevel(default='warn')
            not_found = _LogLevel(default='warn')
            """Warning level for when a Markdown doc links to a relative path that isn't an existing document on the site."""
            absolute_links = c._LogLevel(default='info')
            absolute_links = _AbsoluteLinksValidation(default='info')
            """Warning level for when a Markdown doc links to an absolute path (starting with `/`)."""
            unrecognized_links = c._LogLevel(default='info')
            unrecognized_links = _LogLevel(default='info')
            """Warning level for when a Markdown doc links to a relative path that doesn't look like
            it could be a valid internal link. For example, if the link ends with `/`."""
            anchors = c._LogLevel(default='info')
            anchors = _LogLevel(default='info')
            """Warning level for when a Markdown doc links to an anchor that's not present on the target page."""
        links = c.SubConfig(LinksValidation)
    validation = c.PropagatingSubConfig[Validation]()
    _current_page: mkdocs.structure.pages.Page | None = None
    _current_page: Page | None = None
    """The currently rendered page. Please do not access this and instead
    rely on the `page` argument to event handlers."""
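
Reviewer's sketch: the option classes above boil down to a string-to-integer lookup in which `relative_to_docs` maps to a sentinel rather than a real logging level. The standalone snippet below illustrates that idea without importing MkDocs. The names `AbsoluteLinksValue`, `LOG_LEVELS`, `ABSOLUTE_LINK_LEVELS` and `parse_level` are hypothetical stand-ins and not part of the MkDocs API; only the level names and the `-1` sentinel are taken from this diff.

```python
import enum
import logging
from typing import Mapping


class AbsoluteLinksValue(enum.IntEnum):
    # Sentinel below every real logging level, mirroring the diff's -1.
    RELATIVE_TO_DOCS = -1


# The three ordinary severities accepted by every validation option.
LOG_LEVELS: Mapping[str, int] = {
    "warn": logging.WARNING,
    "info": logging.INFO,
    "ignore": logging.DEBUG,
}

# Only the absolute-links option additionally accepts the sentinel.
ABSOLUTE_LINK_LEVELS: Mapping[str, int] = {
    **LOG_LEVELS,
    "relative_to_docs": AbsoluteLinksValue.RELATIVE_TO_DOCS,
}


def parse_level(value: object, levels: Mapping[str, int]) -> int:
    """Translate a config string into its integer level (hypothetical helper)."""
    if not isinstance(value, str):
        raise ValueError(f"Expected a string, but a {type(value)} was given.")
    try:
        return levels[value]
    except KeyError:
        raise ValueError(f"Expected one of {list(levels)}, got {value!r}") from None
```

Because the sentinel is an `IntEnum` member, callers can tell it apart with an identity check (`is`), which is how `path_to_url` in this commit distinguishes it from an ordinary warning level.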

@@ -1,6 +1,7 @@
from __future__ import annotations
import copy
import enum
import logging
import posixpath
import warnings
@@ -358,7 +359,7 @@ class _RelativePathTreeprocessor(markdown.treeprocessors.Treeprocessor):
    @classmethod
    def _possible_target_uris(
        cls, file: File, path: str, use_directory_urls: bool
        cls, file: File, path: str, use_directory_urls: bool, suggest_absolute: bool = False
    ) -> Iterator[str]:
        """First yields the resolved file uri for the link, then proceeds to yield guesses for possible mistakes."""
        target_uri = cls._target_uri(file.src_uri, path)
@@ -397,14 +398,17 @@ class _RelativePathTreeprocessor(markdown.treeprocessors.Treeprocessor):
    def path_to_url(self, url: str) -> str:
        scheme, netloc, path, query, anchor = urlsplit(url)
        absolute_link = None
        warning_level, warning = 0, ''
        # Ignore URLs unless they are a relative link to a source file.
        if scheme or netloc:  # External link.
            return url
        elif url.startswith(('/', '\\')):  # Absolute link.
            warning_level = self.config.validation.links.absolute_links
            warning = f"Doc file '{self.file.src_uri}' contains an absolute link '{url}', it was left as is."
            absolute_link = self.config.validation.links.absolute_links
            if absolute_link is not _AbsoluteLinksValidationValue.RELATIVE_TO_DOCS:
                warning_level = absolute_link
                warning = f"Doc file '{self.file.src_uri}' contains an absolute link '{url}', it was left as is."
        elif AMP_SUBSTITUTE in url:  # AMP_SUBSTITUTE is used internally by Markdown only for email.
            return url
        elif not path:  # Self-link containing only query or anchor.
@@ -430,7 +434,7 @@ class _RelativePathTreeprocessor(markdown.treeprocessors.Treeprocessor):
        if target_file is None and not warning:
            # Primary lookup path had no match, definitely produce a warning, just choose which one.
            if not posixpath.splitext(path)[-1]:
            if not posixpath.splitext(path)[-1] and absolute_link is None:
                # No '.' in the last part of a path indicates path does not point to a file.
                warning_level = self.config.validation.links.unrecognized_links
                warning = (
@@ -438,7 +442,7 @@ class _RelativePathTreeprocessor(markdown.treeprocessors.Treeprocessor):
                    f"it was left as is."
                )
            else:
                target = f" '{target_uri}'" if target_uri != url else ""
                target = f" '{target_uri}'" if target_uri != url.lstrip('/') else ""
                warning_level = self.config.validation.links.not_found
                warning = (
                    f"Doc file '{self.file.src_uri}' contains a link '{url}', "
@@ -456,6 +460,8 @@ class _RelativePathTreeprocessor(markdown.treeprocessors.Treeprocessor):
                    if self.files.get_file_from_path(path) is not None:
                        if anchor and path == self.file.src_uri:
                            path = ''
                        elif absolute_link is _AbsoluteLinksValidationValue.RELATIVE_TO_DOCS:
                            path = '/' + path
                        else:
                            path = utils.get_relative_url(path, self.file.src_uri)
                        suggest_url = urlunsplit(('', '', path, query, anchor))
@@ -545,3 +551,7 @@ class _ExtractTitleTreeprocessor(markdown.treeprocessors.Treeprocessor):
    def _register(self, md: markdown.Markdown) -> None:
        self.postprocessors = tuple(md.postprocessors)
        md.treeprocessors.register(self, "mkdocs_extract_title", priority=-1)  # After the end.


class _AbsoluteLinksValidationValue(enum.IntEnum):
    RELATIVE_TO_DOCS = -1
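
Reviewer's sketch: the behavioral change in `path_to_url` is that, under `relative_to_docs`, a leading `/` is treated as the `docs_dir` root and the link is then handled like any other internal link. The snippet below illustrates only the source-path arithmetic using the standard library's `posixpath`. The helpers `docs_target` and `as_relative` are hypothetical; the real code goes through `_target_uri` and `utils.get_relative_url`, and additionally maps source files to output URLs, which this sketch does not attempt.

```python
import posixpath
from typing import Optional


def docs_target(src_uri: str, link: str, relative_to_docs: bool) -> Optional[str]:
    """Return the docs-root-relative path a link points at, or None if it is left as is."""
    if link.startswith("/"):
        if not relative_to_docs:
            # Without the opt-in, the absolute link is not interpreted at all.
            return None
        # Treat the leading slash as the docs_dir root.
        return posixpath.normpath(link.lstrip("/"))
    # Relative links resolve against the directory of the linking document.
    return posixpath.normpath(posixpath.join(posixpath.dirname(src_uri), link))


def as_relative(src_uri: str, target: str) -> str:
    """Express a docs-root-relative target relative to the linking document."""
    return posixpath.relpath(target, start=posixpath.dirname(src_uri) or ".")


# Both spellings from the documentation example reach the same source file.
absolute = docs_target("dir1/foo.md", "/dir2/bar.md", relative_to_docs=True)
relative = docs_target("dir1/foo.md", "../dir2/bar.md", relative_to_docs=False)
print(absolute, relative, as_relative("dir1/foo.md", absolute))
```

Note that a bare `/` normalizes to `.`, which matches the target named in the warning expected by `test_absolute_link_with_validation_just_slash`.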

@@ -1003,6 +1003,48 @@ class RelativePathExtensionTests(unittest.TestCase):
            '<a href="./#test">link</a>',
        )

    def test_absolute_self_anchor_link_with_suggestion(self):
        self.assertEqual(
            self.get_rendered_result(
                content='[link](/index#test)',
                files=['index.md'],
                logs="INFO:Doc file 'index.md' contains an absolute link '/index#test', it was left as is. Did you mean '#test'?",
            ),
            '<a href="/index#test">link</a>',
        )

    def test_absolute_self_anchor_link_with_validation_and_suggestion(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[link](/index#test)',
                files=['index.md'],
                logs="WARNING:Doc file 'index.md' contains a link '/index#test', but the target 'index' is not found among documentation files. Did you mean '#test'?",
            ),
            '<a href="/index#test">link</a>',
        )

    def test_absolute_anchor_link_with_validation(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[link](/foo/bar.md#test)',
                files=['index.md', 'foo/bar.md'],
            ),
            '<a href="foo/bar/#test">link</a>',
        )

    def test_absolute_anchor_link_with_validation_and_suggestion(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[link](/foo/bar#test)',
                files=['zoo/index.md', 'foo/bar.md'],
                logs="WARNING:Doc file 'zoo/index.md' contains a link '/foo/bar#test', but the target 'foo/bar' is not found among documentation files. Did you mean '/foo/bar.md#test'?",
            ),
            '<a href="/foo/bar#test">link</a>',
        )

    def test_external_link(self):
        self.assertEqual(
            self.get_rendered_result(
@@ -1038,7 +1080,58 @@ class RelativePathExtensionTests(unittest.TestCase):
            '<a href="/path/to/file">absolute link</a>',
        )

    def test_absolute_link(self):
    def test_absolute_link_with_validation(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[absolute link](/path/to/file.md)',
                files=['index.md', 'path/to/file.md'],
            ),
            '<a href="path/to/file/">absolute link</a>',
        )
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                use_directory_urls=False,
                content='[absolute link](/path/to/file.md)',
                files=['path/index.md', 'path/to/file.md'],
            ),
            '<a href="to/file.html">absolute link</a>',
        )

    def test_absolute_link_with_validation_and_suggestion(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                use_directory_urls=False,
                content='[absolute link](/path/to/file/)',
                files=['path/index.md', 'path/to/file.md'],
                logs="WARNING:Doc file 'path/index.md' contains a link '/path/to/file/', but the target 'path/to/file' is not found among documentation files.",
            ),
            '<a href="/path/to/file/">absolute link</a>',
        )
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[absolute link](/path/to/file)',
                files=['path/index.md', 'path/to/file.md'],
                logs="WARNING:Doc file 'path/index.md' contains a link '/path/to/file', but the target is not found among documentation files. Did you mean '/path/to/file.md'?",
            ),
            '<a href="/path/to/file">absolute link</a>',
        )

    def test_absolute_link_with_validation_just_slash(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='relative_to_docs')),
                content='[absolute link](/)',
                files=['path/to/file.md', 'index.md'],
                logs="WARNING:Doc file 'path/to/file.md' contains a link '/', but the target '.' is not found among documentation files. Did you mean '/index.md'?",
            ),
            '<a href="/">absolute link</a>',
        )

    def test_absolute_link_preserved_and_warned(self):
        self.assertEqual(
            self.get_rendered_result(
                validation=dict(links=dict(absolute_links='warn')),