like jq but for Markdown: find specific elements in a md doc

mdq: jq for Markdown

What is mdq?

mdq aims to do for Markdown what jq does for JSON: provide an easy way to zero in on specific parts of a document.

For example, GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug. Instead, you can (for example) ask mdq for all uncompleted tasks:

mdq '- [ ]'
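
As a rough sketch of how that selector could gate a PR in CI: the function names and the `PR_BODY` variable below are hypothetical, and the only mdq features assumed are the `- [ ]` selector and the `-q` flag covered in the Examples section (exit 0 when anything matches).

```shell
# Hypothetical helper: succeeds (exit 0) when the Markdown on stdin
# still contains at least one uncompleted task.
pr_has_open_tasks() {
  mdq -q '- [ ]'
}

# Hypothetical CI step. PR_BODY is assumed to hold the PR description.
fail_on_open_tasks() {
  if printf '%s\n' "$PR_BODY" | pr_has_open_tasks; then
    echo "PR still has uncompleted tasks" >&2
    return 1
  fi
  return 0
}
```

A workflow step would then call `fail_on_open_tasks` and let its exit status decide whether the check passes.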

mdq is available under the Apache 2.0 or MIT licenses, at your option. I am open to other permissive licenses, if you have one you prefer.

Installation

Any of these will work:

  1. cargo install --git https://github.com/yshavit/mdq
  2. Download binaries from the latest release (or any other release, of course).
  3. You can also grab the binaries from the latest build-release workflow run. You must be logged into GitHub to do that (this is GitHub's limitation, not mine). You'll have to chmod +x them before you can run them.

Security concerns: The release and latest-workflow binaries are built on GitHub's servers, so if you trust my code (and dependencies), and you trust GitHub, you can trust the binaries. See https://github.com/yshavit/mdq/wiki/Release-binaries for information on how to verify them.
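
For option 3, the `chmod +x` step might look like the sketch below. The helper name, the download path, and the install directory are all placeholders, not anything the project prescribes.

```shell
# Hypothetical helper: mark a downloaded mdq binary as executable and copy
# it into a directory that is on your PATH. Both arguments are placeholders.
install_mdq_binary() {
  src=$1    # wherever you saved or unpacked the download, e.g. ./mdq
  dest=$2   # a directory on your PATH, e.g. "$HOME/.local/bin"
  chmod +x "$src" || return 1
  mkdir -p "$dest" || return 1
  cp "$src" "$dest/mdq"
}
```

For example: `install_mdq_binary ./mdq "$HOME/.local/bin"`.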

Basic Usage

Simple example to select sections containing "usage":

cat example.md | mdq '# usage'

Use pipe (|) to chain filters together. For example, to select sections containing "usage", and within those find all unordered list items:

cat example.md | mdq '# usage | -'

The filter syntax is designed to mirror Markdown syntax. You can select...

| Element          | Syntax                        |
|------------------|-------------------------------|
| Sections         | # title text                  |
| Lists            | - unordered list item text    |
| "                | 1. ordered list item text     |
| "                | - [ ] uncompleted task        |
| "                | - [x] completed task          |
| "                | - [?] any task                |
| Links            | [display text](url)           |
| Images           | ![alt text](url)              |
| Block quotes     | > block quote text            |
| Code blocks      | ```language <code block text> |
| Raw HTML         | </> html_tag                  |
| Plain paragraphs | P: paragraph text             |
| Tables           | :-: header text :-: row text  |

(Table selection differs from other selections in that you can select only certain headers and rows, so the resulting element has a different shape than the original. See the example below, or the wiki, for more detail.)

In any of the above, the text may be:

  • an unquoted string that starts with a letter; this is case-insensitive
  • a "quoted string" (either single or double quotes); this is case-sensitive
  • a string (quoted or unquoted) anchored by ^ or $ (for start and end of string, respectively)
  • a /regex/
  • omitted or *, to mean "any"
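
To make the matcher forms concrete, here is a sketch that applies one selector per form to section titles. The selectors are composed from the rules listed above rather than copied from mdq's documentation, so treat the exact spellings as assumptions; `try_matchers` and its file argument are hypothetical.

```shell
# Hypothetical helper: run one section selector per matcher form against
# the Markdown file given as $1. The selectors, in order, are:
#   unquoted (case-insensitive), quoted (case-sensitive),
#   anchored to the start of the title, regex, and "any".
try_matchers() {
  for selector in \
    '# usage' \
    '# "Usage"' \
    '# ^usage' \
    '# /usage|examples/' \
    '# *'
  do
    echo "== mdq '$selector'"
    # Keep going even if mdq exits non-zero for a selector.
    mdq "$selector" < "$1" || echo "(mdq exited non-zero)"
  done
}
```

Running `try_matchers example.md` prints each selector followed by whatever mdq returns for it, which makes it easy to compare the forms side by side.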

See the tutorial for a bit more detail, and the user manual for the full picture.

Examples

Ensuring that people have searched existing issues before submitting a bug report

Many projects have bug report templates that ask the submitter to attest that they've checked existing issues for possible duplicates. In mdq, you can do:

if echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues' ; then
  ...

(The -q option is like grep's: it doesn't output anything to stdout, but exits 0 if any items were found, or non-0 otherwise.)

This will match:

  - [x] I have searched for existing issues

... but will fail if the checkbox is unchecked:

  - [ ] I have searched for existing issues

Extracting a referenced ticket

Some organizations use GitHub Actions to update their ticket tracker, if a PR mentions a ticket. You can use mdq to extract the link from Markdown as JSON, and then use jq to get the URL:

TICKET_URL="$(echo "$PR_TEXT" |
  mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
  jq -r '.items[].link.url')"

This will match Markdown like:

# Ticket

https://tickets.example.com/PROJ-1234
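
In a GitHub Action you usually also want to handle the case where no ticket is mentioned. A minimal sketch, assuming the same selector and JSON shape as the pipeline above; the function name `extract_ticket_url` and the choice to keep only the first match are illustrative.

```shell
# Hypothetical wrapper around the pipeline above: reads PR Markdown on stdin
# and prints the first matching ticket URL, or returns 1 if there is none.
extract_ticket_url() {
  url=$(mdq --output json '# Ticket | [](^https://tickets.example.com/[A-Z]+-\d+$)' |
    jq -r '.items[].link.url' |
    head -n 1)
  # An empty result means no ticket link was found under a "Ticket" section.
  [ -n "$url" ] || return 1
  printf '%s\n' "$url"
}
```

Usage might look like `TICKET_URL="$(echo "$PR_TEXT" | extract_ticket_url)" || echo "no ticket referenced"`.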

Whittling down a big table

Let's say you have a table whose columns reference people in an on-call schedule, and whose rows correspond to weeks in YYYY-MM-DD format:

On-Call Alice Bob Sam Pat
2024-01-08 x
2024-01-15 x
2024-01-22 x

To find out when Alice is on call:

cat oncall.md | mdq ':-: /On-Call|Alice/:-: *'
|  On-Call   | Alice |
|:----------:|:-----:|
| 2024-01-08 |   x   |
| 2024-01-15 |       |
| 2024-01-22 |       |

Or, to find out who's on call for the week of Jan 15:

cat oncall.md | mdq ':-: * :-: 2024-01-15'
|  On-Call   | Alice | Bob | Sam | Pat |
|:----------:|:-----:|:---:|:---:|----:|
| 2024-01-15 |       |     |  x  |     |
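
If you run these lookups often, the two selectors can be wrapped in small shell functions. This is only a sketch: the function names are hypothetical, and the selectors are the same ones shown above with the person or week substituted in.

```shell
# Hypothetical helper: show the On-Call column plus one person's column.
# $1 is the schedule file, $2 the person's column header (e.g. Alice).
oncall_weeks_for() {
  mdq ":-: /On-Call|$2/:-: *" < "$1"
}

# Hypothetical helper: show every column, but only the row for one week.
# $1 is the schedule file, $2 the week in YYYY-MM-DD form.
oncall_for_week() {
  mdq ":-: * :-: $2" < "$1"
}
```

For example, `oncall_weeks_for oncall.md Alice` and `oncall_for_week oncall.md 2024-01-15` correspond to the two commands above. Because the person's name is spliced into a regex, names containing regex metacharacters would need escaping first.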