mirror of https://github.com/withastro/astro.git synced 2024-12-23 21:53:55 -05:00

Internal documentation

rehype-optimize-static

The rehype-optimize-static plugin optimizes the intermediate hast when processing MDX by collapsing static subtrees of the hast into a static string in the final JSX output. Here's a "before" and "after" result:

Before:

function _createMdxContent() {
  return (
    <>
      <h1>My MDX Content</h1>
      <pre>
        <code class="language-js">
          <span class="token function">console</span>
          <span class="token punctuation">.</span>
          <span class="token function">log</span>
          <span class="token punctuation">(</span>
          <span class="token string">'hello world'</span>
          <span class="token punctuation">)</span>
        </code>
      </pre>
    </>
  );
}

After:

function _createMdxContent() {
  return (
    <>
      <h1>My MDX Content</h1>
      <pre set:html="<code class=...</code>"></pre>
    </>
  );
}

NOTE: If one of the nodes in pre is MDX, the optimization will not be applied to pre, but could be applied to the inner MDX node if its children are static.

This results in fewer JSX nodes, less compiled JS output, and a smaller AST to parse, which in turn speeds up Rollup builds and runtime rendering.
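As a rough illustration of what "collapsing" a static subtree means, the sketch below serializes the children of an element to an HTML string and stores it under set:html. The types and the helpers escapeHtml, serializeStatic, and collapse are hypothetical and defined here for illustration only. Properties are simplified to plain string attributes and void elements are not handled, so a real pipeline would more likely rely on a proper hast serializer.

```typescript
// Minimal hast-like shapes, declared locally for illustration only.
type HastText = { type: "text"; value: string };
type HastElement = {
  type: "element";
  tagName: string;
  properties: Record<string, string>; // simplification: real hast properties are richer
  children: HastChild[];
};
type HastChild = HastText | HastElement;

// Escape characters that are unsafe in HTML text and double-quoted attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a fully static subtree to an HTML string (hypothetical helper).
function serializeStatic(node: HastChild): string {
  if (node.type === "text") return escapeHtml(node.value);
  const props = node.properties;
  const attrs = Object.keys(props)
    .map((key) => ` ${key}="${escapeHtml(props[key])}"`)
    .join("");
  const inner = node.children.map((child) => serializeStatic(child)).join("");
  return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
}

// Collapse an element: its children become one `set:html` string.
function collapse(element: HastElement): void {
  element.properties["set:html"] = element.children
    .map((child) => serializeStatic(child))
    .join("");
  element.children = [];
}

// Usage: a <pre> holding a single static <code> child.
const pre: HastElement = {
  type: "element",
  tagName: "pre",
  properties: {},
  children: [
    {
      type: "element",
      tagName: "code",
      properties: { class: "language-js" },
      children: [{ type: "text", value: "console.log('hello world')" }],
    },
  ],
};
collapse(pre);
```

After collapse(pre) runs, pre has no children left and carries the serialized code element as a single string, which is the shape shown in the "After" output.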

To achieve this, we use an algorithm that detects hast subtrees that are entirely static (containing no JSX) and inlines each one as a set:html property on the root of the subtree.

The next section explains the algorithm, which you can follow alongside the source code. To inspect the hast, you can paste the MDX code into https://mdxjs.com/playground.

How it works

Two variables:

  • allPossibleElements: A set of subtree roots where we can add a new set:html property with its children as value.
  • elementStack: The stack of elements (that could be subtree roots) while traversing the hast (node ancestors).

Flow:

  1. Walk the hast tree.
  2. For each node we enter, if its type is element or mdxJsxFlowElement, record it in allPossibleElements and push it to elementStack.
    • Q: Why do we record mdxJsxFlowElement when it's MDX?
      A: Because we're looking for nodes whose children are static. The node itself doesn't need to be static.
    • Q: Are we sure this is the subtree root node in allPossibleElements?
      A: No, but we'll clear that up later in step 3.
  3. For each node we leave, pop from elementStack. If the node's parent is in allPossibleElements, we also remove the node from allPossibleElements.
    • Q: Why do we check for the node's parent?
      A: Checking for the node's parent allows us to identify a subtree root. When we enter a subtree like C -> D -> E, we leave in reverse: E -> D -> C. When we leave E, we see that its parent D is in allPossibleElements, so we remove E. When we leave D, we see that C is in allPossibleElements, so we remove D. When we leave C, we see that its parent isn't in allPossibleElements, so we keep C, a subtree root.
  4. (Returning to the enter handler from step 2) We also need to handle non-static nodes. When we find one, we remove all the elements in elementStack from allPossibleElements. This happens before the code in step 2.
    • Q: Why?
      A: Because if the node isn't static, that means all its ancestors (elementStack) have non-static children. So, the ancestors couldn't be a subtree root to be optimized anymore.
    • Q: Why before step 2's node enter handling?
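      A: If we find a non-static node, the node itself should still be recorded in allPossibleElements, as its children could be static. Running this check first means the node is not yet in elementStack when its ancestors are removed.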
  5. Walk done. This leaves us with allPossibleElements containing only subtree roots that can be optimized.
  6. For each remaining subtree root, add the set:html property (with its children serialized as the value) to the hast node, and remove its children.
  7. 🎉 The rest of the MDX pipeline will do its thing and generate the desired JSX like above.
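The flow above can be sketched as a small recursive walk. This is a minimal illustration and not the plugin's actual code: TreeNode, its label field, isElementLike, isNonStatic, and findSubtreeRoots are hypothetical names, and the rule "any node whose type starts with mdx is non-static" is an assumption made for this example.

```typescript
// Simplified tree node, declared locally for illustration only.
type TreeNode = {
  type: string; // e.g. "root", "element", "text", "mdxJsxFlowElement"
  label?: string; // hypothetical field, only used to name nodes in the example
  children?: TreeNode[];
};

// Step 2's condition: nodes that could be a subtree root.
function isElementLike(node: TreeNode): boolean {
  return node.type === "element" || node.type === "mdxJsxFlowElement";
}

// Assumption for this sketch: every "mdx*" node type counts as non-static.
function isNonStatic(node: TreeNode): boolean {
  return node.type.startsWith("mdx");
}

function findSubtreeRoots(tree: TreeNode): Set<TreeNode> {
  const allPossibleElements = new Set<TreeNode>();
  const elementStack: TreeNode[] = [];

  function walk(node: TreeNode, parentNode: TreeNode | undefined): void {
    // Step 4 (runs first): a non-static node disqualifies all of its ancestors.
    if (isNonStatic(node)) {
      for (const ancestor of elementStack) allPossibleElements.delete(ancestor);
    }
    // Step 2: record the node as a candidate and push it on the stack.
    if (isElementLike(node)) {
      allPossibleElements.add(node);
      elementStack.push(node);
    }
    for (const child of node.children ?? []) walk(child, node);
    // Step 3: on leave, pop, and drop the node if its parent is still a candidate.
    if (isElementLike(node)) {
      elementStack.pop();
      if (parentNode !== undefined && allPossibleElements.has(parentNode)) {
        allPossibleElements.delete(node);
      }
    }
  }

  walk(tree, undefined);
  return allPossibleElements; // Step 5: only optimizable subtree roots remain.
}

// Usage: a <div> mixing an MDX component with static content.
const text: TreeNode = { type: "text" };
const tree: TreeNode = {
  type: "root",
  children: [
    { type: "element", label: "h1", children: [text] },
    {
      type: "element",
      label: "div",
      children: [
        {
          type: "mdxJsxFlowElement",
          label: "Component",
          children: [{ type: "element", label: "span", children: [text] }],
        },
        { type: "element", label: "p", children: [text] },
      ],
    },
    {
      type: "element",
      label: "pre",
      children: [{ type: "element", label: "code", children: [text] }],
    },
  ],
};
const rootLabels = Array.from(findSubtreeRoots(tree)).map((node) => node.label);
```

In this example rootLabels contains h1, Component, p, and pre. The div is excluded because one of its children is MDX, while the MDX node itself remains a root because its own children are static, matching the NOTE earlier in this document.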

Extra

MDX custom components

Astro's MDX implementation supports specifying export const components in the MDX file to render some HTML elements as Astro components or framework components. rehype-optimize-static also needs to parse this JS to recognize some elements as non-static.
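As a hedged sketch of that idea, the function below reads the object keys from an ESTree program for `export const components = { ... }` and collects them as names to treat as non-static. The Es* types are trimmed-down local stand-ins for the ESTree node shapes, collectComponentNames is a hypothetical helper, and spread or computed properties are ignored here. How the plugin actually obtains and inspects this JS may differ.

```typescript
// Trimmed-down ESTree shapes, declared locally for illustration only.
type EsIdentifier = { type: "Identifier"; name: string };
type EsLiteral = { type: "Literal"; value: unknown };
type EsProperty = { type: "Property"; key: EsIdentifier | EsLiteral };
type EsObjectExpression = { type: "ObjectExpression"; properties: EsProperty[] };
type EsVariableDeclarator = {
  type: "VariableDeclarator";
  id: EsIdentifier;
  init: EsObjectExpression | null;
};
type EsVariableDeclaration = {
  type: "VariableDeclaration";
  declarations: EsVariableDeclarator[];
};
type EsExportNamedDeclaration = {
  type: "ExportNamedDeclaration";
  declaration: EsVariableDeclaration | null;
};
type EsProgram = { type: "Program"; body: EsExportNamedDeclaration[] };

// Collect the keys of `export const components = { ... }` (hypothetical helper).
function collectComponentNames(program: EsProgram): Set<string> {
  const names = new Set<string>();
  for (const statement of program.body) {
    if (statement.declaration === null) continue;
    for (const declarator of statement.declaration.declarations) {
      if (declarator.id.name !== "components" || declarator.init === null) continue;
      for (const property of declarator.init.properties) {
        if (property.key.type === "Identifier") {
          names.add(property.key.name);
        } else if (typeof property.key.value === "string") {
          names.add(property.key.value);
        }
      }
    }
  }
  return names;
}

// Usage: the estree for `export const components = { h1: Heading, "pre": Code }`,
// with property values omitted because only the keys matter here.
const componentsProgram: EsProgram = {
  type: "Program",
  body: [
    {
      type: "ExportNamedDeclaration",
      declaration: {
        type: "VariableDeclaration",
        declarations: [
          {
            type: "VariableDeclarator",
            id: { type: "Identifier", name: "components" },
            init: {
              type: "ObjectExpression",
              properties: [
                { type: "Property", key: { type: "Identifier", name: "h1" } },
                { type: "Property", key: { type: "Literal", value: "pre" } },
              ],
            },
          },
        ],
      },
    },
  ],
};
const customComponentNames = collectComponentNames(componentsProgram);
```

An element whose tagName appears in customComponentNames would then be handled like any other non-static node in step 4 of the flow.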

Further optimizations

In "How it works" step 4:

"we remove all the elements in elementStack from allPossibleElements"

We can further optimize this by also emptying elementStack at that point. This ensures that if the same flow runs again for a deeper node in the tree, we don't repeat the removal for nodes that were already removed from allPossibleElements.

While this breaks the concept of elementStack, it doesn't matter as the elementStack array pop in the "leave" handler (in step 3) would become a no-op.

Example elementStack value during walking phase:

Enter: A
Enter: A, B
Enter: A, B, C
(Non-static node found): <empty>
Enter: D
Enter: D, E
Leave: D
Leave: <empty>
Leave: <empty>
Leave: <empty>
Leave: <empty>
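The trace above can be reproduced with a small simulation. This is a sketch under assumptions: SimNode and traceElementStack are hypothetical names, every node is treated as an element, non-static nodes are marked with a nonStatic flag, and the bookkeeping for allPossibleElements is left out so that only elementStack is shown.

```typescript
// Simplified node for the simulation, declared locally for illustration only.
type SimNode = { label: string; nonStatic?: boolean; children?: SimNode[] };

function traceElementStack(tree: SimNode): string[] {
  const elementStack: SimNode[] = [];
  const log: string[] = [];
  const show = (): string =>
    elementStack.length > 0
      ? elementStack.map((entry) => entry.label).join(", ")
      : "<empty>";

  function walk(node: SimNode): void {
    if (node.nonStatic) {
      // Removal of the ancestors from allPossibleElements is omitted here.
      elementStack.length = 0; // the extra optimization: empty the stack
      log.push(`(Non-static node found): ${show()}`);
    }
    elementStack.push(node);
    log.push(`Enter: ${show()}`);
    for (const child of node.children ?? []) walk(child);
    elementStack.pop(); // a no-op once the stack is already empty
    log.push(`Leave: ${show()}`);
  }

  walk(tree);
  return log;
}

// Usage: A > B > C > D > E, where D is the non-static node.
const simTree: SimNode = {
  label: "A",
  children: [
    {
      label: "B",
      children: [
        {
          label: "C",
          children: [{ label: "D", nonStatic: true, children: [{ label: "E" }] }],
        },
      ],
    },
  ],
};
const traceLines = traceElementStack(simTree);
```

Joining traceLines with newlines gives the same eleven lines as the example, including the trailing pops on an empty array that leave the stack unchanged.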