obestwalter
Obligatory meta article about my website

a normal day in the office (keeping the lab equipment from exploding)

When I was hyping myself up to write the second article on my website after almost 3 years, I naturally looked for ways to busy myself with everything but writing the actual article (while still fooling myself into believing this was a necessary preparation for … just writing the blimmin’ article). But I think the work in my website generation laboratory was worth the fuss this time, as I quite enjoy the new workflow. I can now write code-sprinkled articles in a Jupyter notebook and have them rendered automatically into my Lektor-generated website, using a bit of Python code hooked into Lektor’s plugin system.

It is a never-ending story: every time I want to write an article to publish on my website, I change my blog engine instead. At some point I even created my own static website generator. All this fuss just to avoid actually writing things to … you know … put on a website :). This might be excused by the fact that I really enjoy spending my leisure time tinkering with things aimlessly rather than actually producing something that might be useful, but I finally started seeing through my evil self-sabotage mechanisms and was determined to put a stop to it! So I did the natural thing: I went to my lab [1] and tinkered with the engine.

Prologue

I played with a lot of blog engines over the years - while never really blogging anything. I mostly work in backend development, but there is something that fascinates me about web design and web development. It’s one of these things I guess :).

I am particularly fond of static website generators that provide a workflow similar to developing software. I played with Pelican, Jekyll, Hugo, Nikola, Flask + Frozen-Flask and the lot. As already mentioned: I even wrote my own Sphinx-based generator … while still never really blogging anything.

At the beginning of 2017 I made a deal with a colleague that I would finally write a blog article about the pytest development sprint and a bit about my involvement. It would have been boring, though, to use the site I already had online (its last incarnation was a simple MkDocs-driven thing). It would also have been boring to use one of the engines I already knew. Using something utterly profane like Medium or WordPress was obviously completely out of the question! I mean, I could have just written the article then and been done with it. Who wants that? Right. Not me. So I started looking around for the next thing that could keep me from writing that article and stumbled over Lektor. Now this was something that could keep me busy for a while, as it is not simply a static website generator but rather something you can build a static website generator with: a website generator generator! Long story short: I set it up from scratch with a simple Sass style, wrote a little plugin to integrate that into Lektor’s development server, and finally actually wrote and published that article. Nobody ever made a deal with me again that forced me to write another article, so that was it. I had unlocked the “i-have-a-blog-but-i-never-blog”-achievement once again, only on a higher level. Until very recently.

Because very recently I realized that I had produced a lot of material while trying to teach Python and test automation to all kinds of folks over the last few years. I finally wanted to start sharing some of these materials on my website. As making strange deals seems to work with my contorted psyche, I made a deal with myself to publish at least one article a month for at least a year.

This time I was determined to resist the temptation to start from scratch and resolved to adjust the existing setup to my new needs instead. These needs arose from the fact that nowadays I mostly work in Jupyter notebooks when creating learning materials. I like that, so I want to write articles the same way and have them integrate into my website.

Jupyterizing Lektor

If you don’t know Lektor at all yet, here is the minimum amount of knowledge necessary to follow this article:

To build a page, Lektor takes a folder, a data model defined in an .ini file and a Jinja2 template. The folder for the page needs at least a contents.lr file, which uses a simple Lektor-specific text format. This file contains data that fits the user-defined data model. Usually this is some metadata about the page plus its main content, formatted in markdown.
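To make the contents.lr format a bit more concrete, here is a minimal parser sketch. It assumes a simplified version of the format: fields are separated by lines consisting of `---`, a field is written as `name: value`, and a multi-line value (like the markdown body) starts on the lines after `name:`. The function name `parse_contents_lr` is hypothetical, and it ignores details of the real format such as escaping a literal `---` line, so it is an illustration rather than Lektor’s own parser.

```python
from __future__ import annotations


def parse_contents_lr(text: str) -> dict[str, str]:
    """Parse a simplified contents.lr document into a {field: value} dict.

    Hypothetical helper for illustration: it ignores escaping rules and other
    details of Lektor's real parser.
    """
    # Split into blocks at separator lines; each block holds one field.
    blocks: list[list[str]] = [[]]
    for line in text.splitlines():
        if line.strip() == "---":
            blocks.append([])
        else:
            blocks[-1].append(line)

    fields: dict[str, str] = {}
    for block in blocks:
        lines = list(block)
        # Drop leading blank lines; skip blocks that are empty altogether.
        while lines and not lines[0].strip():
            lines.pop(0)
        if not lines:
            continue
        # 'name: value' keeps the value on the first line; 'name:' starts
        # a multi-line value (such as a markdown body) on the next lines.
        name, _, first = lines[0].partition(":")
        fields[name.strip()] = "\n".join([first, *lines[1:]]).strip()
    return fields


example = """title: Hello Lektor
---
summary: A short summary
---
body:

Some *markdown* content.
"""
print(parse_contents_lr(example))
```

The data model (.ini) then decides which of these fields exist and how Lektor interprets them, for example rendering the body as markdown.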

Lektor is extensible via a plugin system: you create an installable package and implement some methods in a class inheriting from lektor.pluginsystem.Plugin. Lektor calls these methods when it emits events in the different phases of the build process.
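Roughly, an event name like `before-build` maps to a method name like `on_before_build` (the naming the plugin further down uses), and every plugin that defines that method gets called with keyword arguments. The sketch below imitates that dispatch with a hypothetical `PluginController` and a stand-in `Plugin` base class so it runs without Lektor installed; it is a simplified picture, not Lektor’s actual implementation.

```python
from __future__ import annotations

from typing import Any, Iterable


class Plugin:
    """Hypothetical stand-in for lektor.pluginsystem.Plugin."""

    name = "unnamed plugin"


class PluginController:
    """Hypothetical dispatcher: event 'before-build' -> method on_before_build."""

    def __init__(self, plugins: Iterable[Plugin]):
        self.plugins = list(plugins)

    def emit(self, event: str, **kwargs: Any) -> dict[str, Any]:
        method_name = "on_" + event.replace("-", "_")
        results = {}
        for plugin in self.plugins:
            handler = getattr(plugin, method_name, None)
            # Plugins only implement the events they care about.
            if handler is not None:
                results[plugin.name] = handler(**kwargs)
        return results


class LoggingPlugin(Plugin):
    name = "logging plugin"

    def __init__(self):
        self.seen = []

    def on_before_build(self, source, **_):
        self.seen.append(source)
        return f"about to build {source}"


plugin = LoggingPlugin()
controller = PluginController([plugin])
print(controller.emit("before-build", source="/blog/my-article"))
# {'logging plugin': 'about to build /blog/my-article'}
print(controller.emit("setup-env"))  # {} because LoggingPlugin has no on_setup_env
```

The `**_` catch-all in the handler mirrors the plugin code below: accepting extra keyword arguments keeps a handler working if more arguments are passed than it uses.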

Did someone else solve my problem already?

To integrate Jupyter notebooks, there is already a plugin that hasn’t been worked on for a while and is more of a proof of concept. It was a good starting point, though, to see what it does and to decide what I want:

Not really, so I created my own

lektor-jupyter-preprocess is a Lektor plugin that does the following:

How does it work?

99.9%: existing ecosystem, 0.1% wrapping it to integrate it into Lektor.

Generating well-formed markdown

Generating markdown from a notebook comes out of the box via nbconvert - so if you take a notebook that looks like this in the browser:

example jupyter notebook

… and convert it with jupyter-nbconvert --to markdown example-notebook.ipynb, out drops an example-notebook-markdown.md that contains this:

example markdown output
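Under the hood a notebook is just JSON (the nbformat format), which is why this conversion is straightforward. The following sketch is a heavily simplified, hypothetical stand-in for what the nbconvert markdown exporter does, using only the standard library: markdown cells pass through, code cells become fenced blocks, and text outputs are appended as indented blocks. The real exporter is template-based and handles much more (rich outputs, attachments, raw cells), so its output differs in the details.

```python
FENCE = "`" * 3  # avoids a literal triple backtick inside this listing


def _text(value) -> str:
    # nbformat may store multi-line strings as lists of lines.
    return "".join(value) if isinstance(value, list) else value


def notebook_to_markdown(notebook: dict) -> str:
    """Hypothetical, heavily simplified notebook-to-markdown conversion."""
    kernelspec = notebook.get("metadata", {}).get("kernelspec", {})
    language = kernelspec.get("language", "")
    parts = []
    for cell in notebook["cells"]:
        source = _text(cell["source"])
        if cell["cell_type"] == "markdown":
            parts.append(source)
            continue
        if cell["cell_type"] != "code":
            continue  # raw cells are ignored in this sketch
        parts.append(f"{FENCE}{language}\n{source}\n{FENCE}")
        for output in cell.get("outputs", []):
            kind = output["output_type"]
            if kind == "stream":
                text = _text(output["text"])
            elif kind == "execute_result":
                text = _text(output["data"].get("text/plain", ""))
            elif kind == "error":
                text = "\n".join(output["traceback"])
            else:
                continue  # rich outputs (images, HTML, ...) are skipped here
            # Indented lines render as preformatted text in markdown.
            parts.append("\n".join("    " + line for line in text.splitlines()))
    return "\n\n".join(parts) + "\n"


# In practice the dict would come from json.load() on an .ipynb file.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "python3", "language": "python", "display_name": "Python 3"}
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Example\n", "Some text."]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print('hello')\n1 + 1",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hello\n"},
                {"output_type": "execute_result", "execution_count": 1,
                 "metadata": {}, "data": {"text/plain": "2"}},
            ],
        },
    ],
}
print(notebook_to_markdown(notebook))
```

Note how an error output carries its whole traceback as a list of strings, which is exactly the part that needs trimming.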

This is almost what I want, but I want the outputs to be marked up properly, and I don’t want the whole traceback: just the name and message of the error are enough. So there is a bit of massaging to be done. The question is: when should that happen? I could try to massage the generated markdown to my liking, but I’d rather poke a finger into my eye, so this has to happen while I can still work with the notebook as structured data.

Thanks to the friendly Jupyter Development Team, nbconvert is written in a way that makes this not too hard: by inheriting from ExecutePreprocessor you can hook into the execution of individual code cells and massage their contents there. So this is what I came up with:

# %load -s ArticleExecutePreprocessor ../../../packages/lektor-jupyter-preprocess/lektor_jupyter_preprocess.py
class ArticleExecutePreprocessor(ExecutePreprocessor):
    """Apply load magic and massage the markdown output."""

    def preprocess_cell(self, cell, resources, *args, **kwargs):
        if cell.cell_type != "code" or not cell.source.strip():
            return cell, resources

        cell = pre_process(cell)
        cell_config = {
            **config,
            **self.nb.metadata.get(PLUGIN_KEY, {}),
            **cell.metadata.get(PLUGIN_KEY, {}),
        }
        log.debug("final config for cell is:\n%s", cell_config)
        language = self.nb.metadata.kernelspec.language
        if cell_config["metadata.blackify"] and language == "python":
            cell.source = blackify(cell.source)
        if cell_config["metadata.execute"]:
            nodes = self.run_cell(cell, *args, **kwargs)[1]
        else:
            nodes = cell.outputs
        cell = post_process(language, cell, nodes, cell_config)
        return cell, resources

# %load -s pre_process ../../../packages/lektor-jupyter-preprocess/lektor_jupyter_preprocess.py
def pre_process(cell):
    """Apply magics and update cell level config overrides."""
    cell.source = cell.source.strip()
    if not cell.source:
        return cell

    assert isinstance(
        cell.source, str
    ), f"bad source type: {type(cell.source)}"
    lines = cell.source.strip().splitlines()
    load_candidate = lines[0].replace("# ", "")
    # TODO apply other magics (e.g. %%capture)?
    #  also: is there a more "official" way?
    if load_candidate.startswith("%load"):
        try:
            metadata_override = lines[1]
        except IndexError:
            metadata_override = None
        if metadata_override:
            try:
                metadata_override = eval(metadata_override)
            except Exception:
                log.exception(f"[IGNORE] eval of '{metadata_override}' failed")
            if isinstance(metadata_override, dict):
                cell.metadata[PLUGIN_KEY] = {
                    **cell.metadata.get(PLUGIN_KEY, {}),
                    **metadata_override,
                }
        cell.source = apply_load_magic(load_candidate)
    return cell

# %load -s post_process ../../../packages/lektor-jupyter-preprocess/lektor_jupyter_preprocess.py
def post_process(language, cell, nodes, cell_config) -> nbformat.NotebookNode:
    """Construct what should be written to the contents for this cell.

    This simply creates a new raw cell containing everything - not because it's the
    best solution but the easiest for my use case.

    TODO figure out a better way that also accommodates potential HTML/interactive
     output better Modifies cell in-place (#4).
    """
    # use cell_config (not the global config) so cell level overrides apply
    out = [cell_config["cell.source"].format(language=language, cell=cell)]
    for node in nodes:
        assert isinstance(node, nbformat.NotebookNode)
        # https://nbformat.readthedocs.io/en/latest/format_description.html
        if node.output_type == "execute_result":
            out.append(cell_config["node.execute_result"].format(node=node))
        elif node.output_type == "stream":
            # TODO use tags/raises-exception like in pytest (not raising raises error)
            #  if <wherever that tags thing is> is set:
            #      raise DidNotRaise(f"should have raised but didn't:\n{node}")
            out.append(cell_config["node.stream"].format(node=node))
        elif node.output_type == "error":
            if not cell_config["metadata.allow_errors"]:
                raise ErrorsNotAllowed(
                    f"raised but errors not allowed:\n{node}"
                )
            if cell_config["metadata.full_traceback"]:
                # TODO handle ANSI terminal colors stuff
                #  see if how jupyter does it is reusable
                #  to keep colours this would need to be HTML though
                # traceback entries are separate chunks, joined by newlines
                out.append("\n".join(node.traceback))
            else:
                out.append(cell_config["node.exception"].format(node=node))
        else:
            raise UnhandledOutputType(
                f"{node.output_type=} unknown - {cell.source=}"
            )
    return nbformat.NotebookNode(
        {"cell_type": "raw", "metadata": {}, "source": "\n".join(out)}
    )
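The `cell_config` merge in `preprocess_cell` (and the `%load` override in `pre_process`) relies on dict unpacking: when a key appears in several mappings, the one unpacked last wins. A minimal illustration with made-up values; only the keys are taken from the plugin’s config. It uses `ast.literal_eval` to parse the `%load` line, which is a stricter alternative to the `eval` in `pre_process` (it only accepts Python literals), not what the plugin currently does.

```python
import ast

# Hypothetical values; only the keys are taken from the plugin's config dict.
defaults = {
    "metadata.execute": True,
    "metadata.allow_errors": False,
    "metadata.full_traceback": True,
}
notebook_metadata = {"metadata.allow_errors": True}
cell_metadata = {"metadata.execute": False}
# Second line of a %load cell, parsed as a literal instead of with eval.
load_override = ast.literal_eval("{'metadata.full_traceback': False}")

# Later mappings override earlier ones, key by key.
cell_config = {**defaults, **notebook_metadata, **cell_metadata, **load_override}
print(cell_config)
# {'metadata.execute': False, 'metadata.allow_errors': True, 'metadata.full_traceback': False}
```

Because every level is merged into one flat dict, code further down only ever has to look at `cell_config`.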

This is how it is configurable at the moment:

# %load -r 23-50 ../../../packages/lektor-jupyter-preprocess/lektor_jupyter_preprocess.py
PLUGIN_KEY = "JUPYTER_PREPROCESS"
config = {
    "url.source": None,
    "metadata.blackify": True,
    "metadata.execute": True,
    # todo figure out how jupyter does these things and play together with it
    "metadata.allow_errors": False,
    "metadata.full_traceback": True,
    "cell.source": "\n\n```{language}\n{cell.source}\n```",
    # TODO figure out, why node.data[text/plain] is correct (no quotes around key!1?!?)
    "node.execute_result": "```text\n[result]\n{node.data[text/plain]}\n```",
    "node.stream": "```text\n[{node.name}]\n{node.text}\n```",
    "node.exception": "```text\n[{node.ename}]\n{node.evalue}\n```",
}
f"""configuration of the plugin.

This dict should define all existing keys and provide sane defaults (sane for me).

It can be overridden in these ways (sorted by order of precedence - last one wins):

* config values from configs/jupyter-preprocess.ini
* dict at {PLUGIN_KEY} in notebook metadata
* dict at {PLUGIN_KEY} in cell metadata
* dict literal on second line in a cell using the %load magic

See example-project/jupyter-preprocess.ini and tests/code.ipynb for examples
"""

The idea is to hook into the part of the conversion process where the notebook is preprocessed before the actual conversion to markdown.

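In my case the necessary preprocessing means: applying the %load magic (including the optional config override on its second line), blackifying Python code, executing the cell unless configured otherwise, and rendering the source and outputs into a single raw cell.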

The complete plugin code is in lektor_jupyter_preprocess.py.

If I run this through nbconvert now, the generated markdown looks like this:

example markdown improved

That’s more like it, and I can tweak and extend the conversion process whenever I need to.
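The output templates in the config are plain `str.format` strings, which also explains the quote-less key in `node.data[text/plain]`: in a format field, an index in square brackets that is not made of digits is passed to `__getitem__` as a string, so no quotes are needed (quotes would become part of the key). A small sketch using `types.SimpleNamespace` as a hypothetical stand-in for nbformat’s output nodes, which also support attribute access:

```python
from types import SimpleNamespace

FENCE = "`" * 3  # avoids a literal triple backtick inside this listing

templates = {
    # Same shape as the plugin's defaults, rebuilt with the FENCE constant.
    "node.execute_result": FENCE + "text\n[result]\n{node.data[text/plain]}\n" + FENCE,
    "node.stream": FENCE + "text\n[{node.name}]\n{node.text}\n" + FENCE,
    "node.exception": FENCE + "text\n[{node.ename}]\n{node.evalue}\n" + FENCE,
}

# Hypothetical stand-ins for nbformat output nodes.
result = SimpleNamespace(data={"text/plain": "42"})
stream = SimpleNamespace(name="stdout", text="hello")
error = SimpleNamespace(ename="ZeroDivisionError", evalue="division by zero")

# [text/plain] has no quotes: format() passes non-numeric indexes as str keys.
print(templates["node.execute_result"].format(node=result))
print(templates["node.stream"].format(node=stream))
print(templates["node.exception"].format(node=error))
```

Swapping a template in the .ini or in notebook metadata changes the rendering without touching the Python code.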

Lektor plugin

The first incarnation of this used the before-build event, which is emitted indiscriminately before every source is built. This caused an eternal build loop when generating the contents.lr from a notebook, because the freshly written contents.lr triggered yet another build. I prevented this by adding caching to detect whether the notebook had changed since the last build. This worked but was ugly. I had set out to only make this work, so I decided I was finished. The next weekend, though, I couldn’t help but take another look.
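The kind of caching mentioned above can be sketched like this: remember a content hash per notebook and only regenerate contents.lr when the hash has changed, so a rebuild triggered by writing contents.lr finds nothing to do and the loop stops. This is a hypothetical reconstruction of the idea (the helper name and the in-memory cache are assumptions), not the code of the first incarnation.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical in-memory cache: notebook path -> SHA-256 of its last seen content.
_last_seen = {}


def notebook_changed(notebook: Path) -> bool:
    """Return True if the notebook changed since the last call (and remember it)."""
    digest = hashlib.sha256(notebook.read_bytes()).hexdigest()
    if _last_seen.get(notebook) == digest:
        return False
    _last_seen[notebook] = digest
    return True


with tempfile.TemporaryDirectory() as tmp:
    nb = Path(tmp) / "my-article.ipynb"
    nb.write_text('{"cells": []}')
    print(notebook_changed(nb))  # True: first build, regenerate contents.lr
    print(notebook_changed(nb))  # False: unchanged, so the loop stops here
    nb.write_text('{"cells": [], "metadata": {}}')
    print(notebook_changed(nb))  # True: edited notebook, regenerate again
```

It works, but the build now depends on state that lives outside the build itself, which is part of why it felt ugly.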

The current incarnation adds a new build program that slightly modifies the attachment build behaviour for “notebook powered” pages, so that the notebook is preprocessed as part of the normal build process. I still don’t know much about Lektor, so this might merely be less wrong, but more importantly: it works reliably without needing extra caching and is easier to understand in the context of a build. I also like it more because it motivated me to look a bit into how Lektor works, which was very interesting.
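The shadowing works because, as the docstring of `on_setup_env` in the plugin class puts it, `get_build_program` finds the plugin’s program before the inbuilt one. Here is a simplified, hypothetical registry illustrating that lookup order; all classes are stand-ins named after Lektor’s concepts, not Lektor’s actual code.

```python
class Source: ...
class Attachment(Source): ...
class Page(Source): ...


class BuildProgram:
    def __init__(self, source):
        self.source = source


class AttachmentBuildProgram(BuildProgram): ...
class PageBuildProgram(BuildProgram): ...
class NotebookAwareAttachmentBuildProgram(AttachmentBuildProgram): ...


class Environment:
    """Hypothetical stand-in: plugin-registered programs are checked first."""

    builtin_programs = [(Page, PageBuildProgram), (Attachment, AttachmentBuildProgram)]

    def __init__(self):
        self.build_programs = []

    def add_build_program(self, cls, program):
        self.build_programs.append((cls, program))

    def get_build_program(self, source):
        # Plugin programs come first, so they shadow the inbuilt ones.
        for cls, program in [*self.build_programs, *self.builtin_programs]:
            if isinstance(source, cls):
                return program(source)
        raise LookupError(f"no build program for {source!r}")


env = Environment()
print(type(env.get_build_program(Attachment())).__name__)  # AttachmentBuildProgram
env.add_build_program(Attachment, NotebookAwareAttachmentBuildProgram)
print(type(env.get_build_program(Attachment())).__name__)  # NotebookAwareAttachmentBuildProgram
print(type(env.get_build_program(Page())).__name__)  # PageBuildProgram
```

Subclassing the inbuilt program keeps the normal attachment behaviour available, so the override only has to add the notebook step.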

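These events are used in the plugin: setup-env to register the notebook-aware build program and the JUPYTER_PREPROCESS template global, before-build-all to clear _already_built, and before-build to remember which pages are generated from a notebook.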

This is what the plugin class looks like:

# %load -s JupyterPreprocessPlugin ../../../packages/lektor-jupyter-preprocess/lektor_jupyter_preprocess.py
class JupyterPreprocessPlugin(Plugin):
    name = "Jupyter Notebook preprocessor"
    description = "Execute and render a Jupyter notebook. Provide the result in contents.lr."

    def on_setup_env(self, **_):
        """'Replace' attachment build program with an enhanced version.

        `get_build_program` finds this before the inbuilt, effectively shadowing it.

        This makes the preprocessing a part of the build of the page, overwriting
        contents.lr before it is processed, therefore preventing build loops.
        """
        update_global_config(self.get_config().to_dict())
        self.env.jinja_env.globals[PLUGIN_KEY] = {
            "url_source": config["url.source"],
            "paths": set(),
        }
        self.env.add_build_program(
            Attachment, NotebookAwareAttachmentBuildProgram
        )

    def on_before_build_all(self, **_):  # noqa
        _already_built.clear()

    def on_before_build(self, source, **_):
        attachments = getattr(source, "attachments", None)
        if not attachments:
            return

        attachment = attachments.get(f"{Path(source.path).name}.ipynb")
        if attachment:
            self.env.jinja_env.globals[PLUGIN_KEY]["paths"].add(source.path)

If you want to try it in your Lektor project

In the simplest case:

Keep in mind that the complete contents.lr gets rendered from the notebook, so the notebook needs to produce the same structure a normal contents.lr would have. For my use case this is just right at the moment, but it would not be too hard to extend it so that the generated markdown does not clobber the file, e.g. by looking for a special marker in contents.lr that says where to put the generated markdown. [3]
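The marker idea could look roughly like this: instead of overwriting contents.lr, only the text between two marker lines is replaced with the generated markdown. The marker strings and `inject_generated` are hypothetical; the plugin does not do this (yet).

```python
START = "<!-- jupyter-preprocess:start -->"  # hypothetical marker
END = "<!-- jupyter-preprocess:end -->"  # hypothetical marker


def inject_generated(contents: str, generated: str) -> str:
    """Replace the text between START and END, keeping everything else."""
    start = contents.find(START)
    end = contents.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("markers missing or out of order")
    head = contents[: start + len(START)]
    tail = contents[end:]
    return f"{head}\n{generated.strip()}\n{tail}"


contents_lr = f"""title: My article
---
body:

Hand-written intro.

{START}
old generated text
{END}
"""
print(inject_generated(contents_lr, "new generated text"))
```

Since the markers stay in place, running the injection again with the same markdown leaves the file unchanged, which also avoids needless rebuilds.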

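In addition to generating contents.lr from the notebook, the plugin injects the global variable JUPYTER_PREPROCESS into the Jinja environment, which at the moment contains url_source (the configured url.source) and paths (the set of paths of all pages that are generated from a notebook).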

Here is how a footer with a link could look using that data:

<footer>
    <div class="origin-hint">
    {% if this.path in JUPYTER_PREPROCESS.paths %}
        {# url_source defaults to None, so test truthiness, not "is defined" #}
        {% if JUPYTER_PREPROCESS.url_source %}
            <a href="{{ JUPYTER_PREPROCESS.url_source }}{{ this.path }}">
                generated from a Jupyter notebook &mdash; view sources
            </a>
        {% endif %}
    {% endif %}
    </div>
</footer>

See this small example project for how it works in the simplest case.

tox-based development and publishing workflow

To cap it all off: if a project hasn’t got a tox.ini that wraps all important activities of the project into a neat package it doesn’t feel like a real project. So this is what tox -av tells me about the workflow of my Lektor project (in case I don’t write another article for the next three years and I come back to it and have no idea how stuff works 😉):

$ tox -av

default environments:
serve           -> run custom wrapper around the lektor server

additional environments:
clean           -> tidy up to start from a clean slate
build           -> build the website at ../build
serve-build     -> serve ../build at http://localhost:7777
serve-notebooks -> serve jupyter notebooks
deploy          -> build and push master (website build) to github
test            -> run tests for lebut

Epilogue

Now I have a reasonably pleasant workflow to turn my notebooks into website articles. I am pretty confident that I will be able to keep that deal with myself :).

  1. Can anyone tell me which film this gif is from? I found it on Tenor and would like to give proper credit.

  2. This seems to be the usual approach, though, when looking at other blog engines. For Nikola there is a theme with inbuilt Jupyter support. Same for Pelican.

  3. There is already an issue for that. I might actually do it soonish, because having raw cells in the notebook that contain Lektor data is kinda ugly.

meta python lektor jupyter diy [last update: 2019-11-18]