mdq aims to do for Markdown what jq does for JSON: provide an easy way to zero in on specific parts of a document.
For example, GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug. Instead, you can (for example) ask mdq for all uncompleted tasks:
mdq '- [ ]'
mdq is available under the Apache 2.0 or MIT licenses, at your option. I am open to other permissive licenses, if you have one you prefer.
Any of these will work:
- cargo install --git https://github.com/yshavit/mdq
- Download binaries from the latest release (or any other release, of course).
- You can also grab the binaries from the latest build-release workflow run. You must be logged into GitHub to do that (this is GitHub's limitation, not mine). You'll have to chmod +x them before you can run them.
Security concerns
The release and latest-workflow binaries are built on GitHub's servers, so if you trust my code (and dependencies), and you trust GitHub, you can trust the binaries. See https://github.com/yshavit/mdq/wiki/Release-binaries for information on how to verify them.
Simple example to select sections containing "usage":
cat example.md | mdq '# usage'
Use pipe (|) to chain filters together. For example, to select sections containing "usage", and within those find all unordered list items:
cat example.md | mdq '# usage | -'
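To try the two filters above without an existing file, a throwaway document is enough. The following is a minimal sketch: the sample document's contents are invented for illustration, and the filters run only if mdq is found on the PATH.

```shell
# Hypothetical sample document; headings and list items are made up.
doc="$(mktemp)"
cat > "$doc" <<'EOF'
# Overview

Some introductory text.

# Usage

- install the tool
- run it on a file

## Advanced usage

1. an ordered item
EOF

if command -v mdq >/dev/null 2>&1; then
  # Sections whose title contains "usage".
  mdq '# usage' < "$doc"
  # Within those sections, only the unordered list items.
  mdq '# usage | -' < "$doc"
else
  echo "mdq not found on PATH; skipping" >&2
fi
```

Redirecting the file into mdq is equivalent to the cat pipeline used above.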
The filter syntax is designed to mirror Markdown syntax. You can select...
Element | Syntax |
---|---|
Sections | # title text |
Lists | - unordered list item text |
" | 1. ordered list item text |
" | - [ ] uncompleted task |
" | - [x] completed task |
" | - [?] any task |
Links | [display text](url) |
Images | ![alt text](url) |
Block quotes | > block quote text |
Code blocks | ```language <code block text> |
Raw HTML | </> html_tag |
Plain paragraphs | P: paragraph text |
Tables | :-: header text :-: row text |
(Table selection differs from other selections in that you can select only certain headers and rows, so the resulting element can have a different shape than the original. See the example below, or the wiki for more detail.)
In any of the above, the text may be:
- an unquoted string that starts with a letter; this is case-insensitive
- a "quoted string" (either single or double quotes); this is case-sensitive
- a string (quoted or unquoted) anchored by ^ or $ (for start and end of string, respectively)
- a /regex/
- omitted or *, to mean "any"
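As a rough illustration of these text forms, the sketch below applies one section filter per form to a small sample document. The document contents and the try helper are hypothetical, and the filters are written from the descriptions above rather than from verified output, so treat them as a starting point.

```shell
# Hypothetical sample document with three sections.
doc="$(mktemp)"
printf '%s\n' '# Usage' '' 'How to run it.' '' \
  '# Usage notes' '' 'Extra details.' '' \
  '# License' '' 'MIT or Apache 2.0.' > "$doc"

# Hypothetical helper: print the filter, then run it if mdq is installed.
try() {
  echo "== mdq '$1'"
  if command -v mdq >/dev/null 2>&1; then
    mdq "$1" < "$doc" || true   # tolerate a non-zero exit when nothing matches
  fi
}

try '# usage'      # unquoted string: case-insensitive
try '# "Usage"'    # quoted string: case-sensitive
try '# ^usage'     # anchored at the start of the title
try '# notes$'     # anchored at the end of the title
try '# /Lic.+/'    # regex
try '# *'          # any section
```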
See the tutorial for a bit more detail, and the user manual for the full picture.
Many projects have bug report templates that ask the submitter to attest that they've checked existing issues for possible duplicates. In mdq, you can do:
if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues' ; then
...
(The -q option is like grep's: it doesn't output anything to stdout, but exits 0 if any items were found, or non-0 otherwise.)
This will match:
- [x] I have searched for existing issues
... but will fail if the checkbox is unchecked:
- [ ] I have searched for existing issues
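One way to flesh out the check above is sketched here. The issue body, the check_issue function name, and the messages are all hypothetical; in a real workflow ISSUE_TEXT would come from your issue tracker or CI system.

```shell
# Hypothetical issue body, for illustration only.
ISSUE_TEXT='The app crashes on startup.

- [x] I have searched for existing issues
- [ ] I am willing to submit a fix'

# Hypothetical helper: returns 0 only if the attestation box is checked.
check_issue() {
  # -q prints nothing; the exit status says whether the filter matched.
  if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues'; then
    echo "ok: submitter searched for duplicates"
  else
    echo "missing: duplicate-search box is unchecked" >&2
    return 1
  fi
}

# Only run the check when mdq is actually installed.
if command -v mdq >/dev/null 2>&1; then
  check_issue || true
fi
```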
Some organizations use GitHub Actions to update their ticket tracker, if a PR mentions a ticket. You can use mdq to extract the link from Markdown as JSON, and then use jq to get the URL:
TICKET_URL="$(echo "$PR_TEXT" \
  | mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' \
  | jq -r '.items[].link.url')"
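This will match Markdown in which a "Ticket" section contains a link whose URL is https://tickets.example.com/ followed by a ticket ID (uppercase letters, a hyphen, and digits).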
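For a concrete input, here is a minimal sketch with an invented PR body. The section layout and the PROJ-1234 ticket ID are assumptions made for illustration; the pipeline itself is the one shown above, and it runs only when both mdq and jq are installed.

```shell
# Hypothetical PR description; the ticket ID PROJ-1234 is made up.
PR_TEXT='# Summary

Fix the flaky retry logic.

# Ticket

[PROJ-1234](https://tickets.example.com/PROJ-1234)'

if command -v mdq >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
  # Select the link inside the "Ticket" section as JSON, then pull out its URL.
  TICKET_URL="$(echo "$PR_TEXT" \
    | mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' \
    | jq -r '.items[].link.url')"
  echo "ticket: $TICKET_URL"
fi
```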
Let's say you have a table whose columns reference people in an on-call schedule, and whose rows correspond to weeks in YYYY-MM-DD format:
| On-Call | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-08 | x | | | |
| 2024-01-15 | | | x | |
| 2024-01-22 | | x | | |
To find out when Alice is on call:
cat oncall.md | mdq ':-: /On-Call|Alice/:-: *'
| On-Call | Alice |
|:----------:|:-----:|
| 2024-01-08 | x |
| 2024-01-15 | |
| 2024-01-22 | |
Or, to find out who's on call for the week of Jan 15:
cat oncall.md | mdq ':-: * :-: 2024-01-15'
| On-Call | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-15 | | | x | |
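To reproduce the two table queries above end to end, you can write a sample schedule to a file first. This is a sketch under assumptions: the rows for 2024-01-08 and 2024-01-15 follow the outputs shown above, but the 2024-01-22 assignment is a guess made only to complete the table.

```shell
# Sample schedule; the 2024-01-22 row is an assumption, not taken from the outputs above.
oncall="$(mktemp)"
cat > "$oncall" <<'EOF'
| On-Call    | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-08 | x     |     |     |     |
| 2024-01-15 |       |     | x   |     |
| 2024-01-22 |       | x   |     |     |
EOF

if command -v mdq >/dev/null 2>&1; then
  # Keep only the On-Call and Alice columns, with every row.
  mdq ':-: /On-Call|Alice/:-: *' < "$oncall"
  # Keep every column, but only the row for the week of Jan 15.
  mdq ':-: * :-: 2024-01-15' < "$oncall"
else
  echo "mdq not found on PATH; skipping" >&2
fi
```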