Compare commits

10 Commits

Author SHA1 Message Date
Cassowary 8404f8927d Documentation update! But the docs/index.md is still major WIP. 2023-08-05 12:09:20 -07:00
Cassowary f448a1f1ee Rename to heckweasel. 2023-04-18 17:31:25 -07:00
Cassowary 727b2b9309 General housekeeping update.
- Bump version to 0.6.0
- Add template functions to merge dictionaries for loading JSON data inside them
- Add extension management separate from MIME type
- Make the `tembed` processor which runs a generic jinja template through embedding in the template
2023-03-06 16:22:10 -08:00
Cassowary 357db6eca4 Major additions to support JSON files and provide compile time options
- Add file_json/get_file_json handling.
  This creates a new global template function to treat a file as a
  json file and returns a dict.

- Add some tools for merging dictionaries.

- Add command-line settable variables that get inserted into metadata
  tree so that at runtime options can be set.
2021-12-19 22:02:47 -08:00
Cassowary 4780764a60 Minor changes. Formatting changes. Add some Python version environments for testing. Extended get_file_list to allow a list of globs rather than just a single glob. 2021-06-30 00:39:50 -07:00
Cassowary b8bc24cf6f Reformatted with automated tools and minor fixes. 2021-04-28 23:09:35 -07:00
Cassowary bf0b7a1cb7 Comment out smart CSS from default mapping. Fix minor bug in template_tools 2019-06-03 19:23:34 -07:00
Cassowary 39dde28e35 Updates!
Some documentation expansion.
Add {do} support to jinja systems
2019-05-26 19:39:11 -07:00
Cassowary a0c4381c99 Major development update.
* Updated LICENSE, READMES/METADATA.md and TODO.md
* Added example blog to examples/
* Added preliminary Pygments support for embedding code in pages.
* Add preliminary Wordpress dump importer
* Expansions to template_tools and metadata to support Blog use case.
2019-05-23 17:51:21 -07:00
Cassowary 81532f3462 Minor doc additions. 2019-04-17 19:47:00 -07:00
71 changed files with 841 additions and 448 deletions

View File

@ -1,6 +1,6 @@
MIT License MIT License
Copyright (c) 2018 Cas Rusnov Copyright (c) 2023 Cas Rusnov
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal of this software and associated documentation files (the "Software"), to deal

View File

@ -1 +1 @@
include pixywerk2/defaults/*.yaml include heckweasel/defaults/*.yaml

View File

@ -1,5 +1,5 @@
# Pixywerk # # Heckweasel #
PixyWerk2 is a site compiler engineered like a metadata-based CMS with a template rendering system. Underneath it uses Heckweasel is a site compiler engineered like a metadata-based CMS with a template rendering system. Underneath it uses
Jinja2 templates to provide programmability, and a structured metadata system, along with processors to convert Jinja2 templates to provide programmability, and a structured metadata system, along with processors to convert
user-friendly files such as Markdown and RST into HTML with templates. user-friendly files such as Markdown and RST into HTML with templates.

24
TODO.md
View File

@ -1,8 +1,24 @@
# TODO # # TODO #
* Pygments pretty printing of source code et al. including exposing that to the template API (`pygment_format(get_file_content('whatever.py'))`).
* Smart CSS things (fill in the processors) * Smart CSS things (fill in the processors)
* Project global defines, parameters.
# Maybe # * pre- and post-scripts that will be run from __main__, either some shipped with heckweasel or project-level.
* Library of template modules? ATOM et al. * Library of template modules? ATOM et al.
* Some off the shelf website templates and a template manager.
* Live refreshing server thing which maps a heckweasel tree into a web server's memory and updates on change.
* https://github.com/Python-Markdown/markdown/wiki/Third-Party-Extensions
* add markdown_link_attr_modifier extension
* add figureAltCaption extension
* add qrcode extension
* Add support to define macros or whatever for Jinja, or to include generic stanzas in any output so adding macros won't mean repeatedly including them.
* It'd be good to generate a dependency tree and only recompile things based on changes, like makefile-like behavior.
* Fragments which would be blobs of mechanics like rss feed, thumbnail links, etc. They would be virtual files and other changes to processing
chains and project contents. `python -mheckweasel --fragment=rss,config=foo.meta` etc.
* Run commands as part of processing chains
* Project level processing chain overrides in the .meta or whatever.

View File

@ -1,7 +0,0 @@
{
"site_root":"https://example.com",
"title":"Test Metadata",
"author": "Test User",
"author_email": "test_user@example.com",
"uuid_oid_root": "pixywerk-demo"
}

View File

@ -1,33 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>{{ metadata.title }}</title>
<subtitle>{{ metadata.subtitle }}</subtitle>
<link href="{{ metadata.site_root }}/{{ metadata.file_name }}" rel="self" />
<link href="{{ metadata.site_root }}" />
<id>urn:uuid:{{ metadata.uuid }}</id>
<updated>{{ get_time_iso8601(metadata['build-time']) }}</updated>
{% set posts = get_file_list('blog_posts/*.cont') %}
{% for post in posts %}
{% set post_meta = get_file_metadata(post['file_path']) %}
<entry>
<title>{{ post_meta.title }}</title>
<link href="{{ metadata.site_root }}/{{post_meta.file_path}}" />
<id>urn:uuid:{{ post_meta.uuid }}</id>
<updated>{{ get_time_iso8601(post_meta.stat.mtime) }}</updated>
<summary>{{post_meta.summary }}</summary>
<!-- this would be the snippet, more than summary chunk -->
<!-- <content type="xhtml"> -->
<!-- <div xmlns="http://www.w3.org/1999/xhtml"> -->
<!-- <p>{{ post_meta.summary }}</p> -->
<!-- </div> -->
<!-- </content> -->
<author>
<name>{{ post_meta.author }}</name>
<email>{{ post_meta.author_email }}</email>
</author>
</entry>
{% endfor %}
</feed>

View File

@ -1,5 +0,0 @@
{
"type": "templatable",
"title": "Test RSS Feed",
"subtitle": "Some Subtitle"
}

View File

@ -1,5 +0,0 @@
Some more post
la la la

View File

@ -1,4 +0,0 @@
{
"title":"Another Post(tm)",
"summary":"Yet another post"
}

View File

@ -1 +0,0 @@
Some content.

View File

@ -1,4 +0,0 @@
{
"title":"Test.cont",
"summary":"Some empty test content"
}

View File

@ -1 +0,0 @@
yo fresh

View File

@ -1,5 +0,0 @@
{
"foo":"bar",
"title":"A title",
"summary":"Just a post."
}

View File

@ -1,19 +0,0 @@
<h1>Index of all content</h1>
{% for f in get_file_list('*', sort_order='file_name') %}
<a href="{{ get_file_name(f['file_name']) }}">{{get_file_name(f['file_name'])}}</a>
{% endfor %}
<p>Including foo.cont.meta:
<pre>
{{ get_file_content('foo.cont.meta') }}
</pre>
</p>
<h1>Metadata</h1>
<table class="metadata">
<tr><th>key</th><th>value</th></tr>
{% set metadata = get_file_metadata('foo.cont') %}
{% for k in metadata.keys() %}
<tr><td>{{k}}</td><td>{{metadata[k]}}</td></tr>
{% endfor %}
</table>

View File

View File

@ -1,9 +0,0 @@
# README #
This is a test of the emergency compiled HTML system. This is only a *test*.
[Foo!](foo.html)
{% for i in range(100) %}
* {{ i }}
{% endfor %}

View File

@ -1,3 +0,0 @@
{
"pragma":["no-proc"]
}

View File

@ -1,9 +0,0 @@
# README #
This is a test of the emergency compiled HTML system. This is only a *test*.
[Foo!](foo.html)
{% for i in range(100) %}
* {{ i }}
{% endfor %}

View File

@ -1,3 +0,0 @@
{
"title":"Yo, markdown"
}

View File

@ -1,32 +0,0 @@
<!DOCTYPE html>
<head>
<title>Debug for {{path}}</title>
<style type="text/css">
table { border: 1px solid black; }
div { border: 1px solid black; }
td { border: 1px solid black; }
</style>
</head>
<body>
<p>{{path}}</p>
<h1>Content</h1>
<div class="content">
{{content}}
</div>
<h1>Environment</h1>
<table class="environment">
<tr><th>key</th><th>value</th></tr>
{% for k in environ.keys() %}
<tr><td>{{k}}</td><td>{{environ[k]}}</td></tr>
{% endfor %}
</table>
<h1>Metadata</h1>
<table class="metadata">
<tr><th>key</th><th>value</th></tr>
{% for k in metadata.keys() %}
<tr><td>{{k}}</td><td>{{metadata[k]}}</td></tr>
{% endfor %}
</table>
</body>

View File

@ -1,6 +0,0 @@
<table class="werk-file-list">
<tr class="werk-file-list-head"><th>file</th><th>type</th><th>size</th><th>last change</th></tr>
{% for f in files.keys() %}
<tr class="werk-file-list-item"><td><a href="/{{files[f].relpath}}">{{f}}</a></td><td>{{files[f].type}}</td><td>{{files[f].size}}</td><td>{{files[f].ctime | date}}</td></tr>
{% endfor %}
</table>

View File

@ -1,13 +0,0 @@
<!DOCTYPE html>
<head>
<title>{{metadata.title}}</title>
<style type="text/css">
table { border: 1px solid black; }
div { border: 1px solid black; }
td { border: 1px solid black; }
</style>
</head>
<body>
{{content}}
</body>
</html>

151
docs/index.md Normal file
View File

@ -0,0 +1,151 @@
# HECKWEASEL documentation!
Welcome to the index for HECKWEASEL Documentation. In this directory you'll find a bunch of files, but this is the introduction you need to understand how heckweasel works and how to use it. You wouldn't web a site.
## Introduction: TL;DR
Heckweasel compiles a set of files into a website.
There is your website **template**, separate files that are the **content** of your website (such as a blog post, an image, etc.), and JSON files that are the **metadata** for each content file. These get compiled together into static web pages.
There's a lot more to it, and it is entirely programmable, but basically that's it.
## Introduction: What the hyeck is Heckweasel?!
Heckweasel is a website compiler framework. Primarily it allows the creation of a web site from a collection of flat files kept in a maintainable form, producing the less maintainable formats that web browsers use.
The flat files in a heckweasel project are just a directory of files like any other. There is a default directory structure for projects but that isn't important right now.
Heckweasel projects generally take the form of a collection of one or more templates and a collection of one or more files that are filled into the templates. Pervasively, heckweasel draws a distinction between the contents of a web page and the template it gets put into. You can think of the template, as generally used by heckweasel, as a sort of picture frame into which your content is placed. The content itself may be written in one of several popular formats such as Markdown and HTML. Also of note is that there are two routes from heckweasel input to heckweasel output: one route goes through the template system and the other merely copies the input to the output.
Another important detail about heckweasel is metadata. Every item in the heckweasel project (thus, every file in the heckweasel project directory) has a collection of *metadata* associated with it, such as its file name, creation time, and other objective information, but also any arbitrary information about it such as its title, a short description, thumbnails or whatever. It's also important to note that the **content** of a file counts as metadata, and is stored the same way inside of heckweasel's way of looking at the files. Metadata is stored alongside the file as *filename*.meta, and a directory's metadata lives in a file called .meta inside it. Metadata is also inherited! So setting a template in a directory's metadata will apply to all of the contents of that directory. Metadata is all in a JSON format called JStyleSon, which is JSON except you can have comments in it. All of these metadata are accessible from the templates, which leads to...
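The inheritance described above can be pictured with a short Python sketch. This is not heckweasel's actual implementation: the name `resolve_metadata`, the in-memory `meta_files` mapping and the flat `dict.update` merge are all assumptions made for illustration. It only shows the idea that a directory's `.meta` applies to everything below it and that a file's own `.meta` wins.

```python
import posixpath
from typing import Any, Dict


def resolve_metadata(path: str, meta_files: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge metadata from the project root down to `path` (hypothetical helper).

    `meta_files` maps a metadata file path (".meta", "blog/.meta",
    "blog/hello.md.meta") to its already-parsed contents.
    """
    result: Dict[str, Any] = {}
    parts = path.split("/")[:-1]  # the directories above the file
    directory = ""
    # Visit the root first, then each nested directory in turn.
    for depth in range(len(parts) + 1):
        if depth:
            directory = posixpath.join(directory, parts[depth - 1])
        result.update(meta_files.get(posixpath.join(directory, ".meta"), {}))
    # The file's own metadata is applied last, so it overrides inherited keys.
    result.update(meta_files.get(path + ".meta", {}))
    return result


meta_files = {
    ".meta": {"title": "My Website", "template": "default.jinja"},
    "blog/.meta": {"template": "post.jinja"},
    "blog/hello.md.meta": {"title": "Hello"},
}
print(resolve_metadata("blog/hello.md", meta_files))
print(resolve_metadata("index.md", meta_files))
```

In this sketch `blog/hello.md` picks up its template from `blog/.meta` and its title from its own metadata file, while `index.md` sees only the top-level values.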
The final important detail about heckweasel is that it, at its core, uses a programmable template system called Jinja. Jinja allows a lot, and I mean a *lot*, of flexibility in the way that the output is produced, giving complete programmability. This allows templates (and pages, for that matter) to contain programmable outcomes such as showing a list of all blog entries (each of which would be a separate file), or making a thumbnail gallery from a collection of pictures, or generating an RSS feed from all of the contents of the site. This also allows the website design to be broken into parts such that commonly-used patterns can be merely included in the file rather than being written repeatedly (although normally this function is done with the page templates).
## Glossary
- **template**
    - A Jinja2 file which gets filled in with your content
- **content**
    - The content which gets filled into templates to produce pages
- **metadata**
    - Extra variables or values associated with content, which can be used to modify the way a template works and do other tricks
## Just the very Basic Heckweasel Project
So with all of that said, the most basic possible heckweasel project that is actually functional would be something like a page template, and a content file called index. Heckweasel operates on an input directory and outputs to an output directory. This is admittedly not a normal use case since it doesn't benefit much from the elaborate system underneath, but it gets the idea across.
So you have your project directory `mywebsite`; inside we can have the directories `source` and `publish`, and various files, and well here's a picture:
- __mywebsite__
    - __source__
        - *.meta*
        - __templates__
            - *default.jinja*
        - *index.md*
        - *index.md.meta*
    - __publish__
To explain the various files:
### *.meta*
This file is a JSON file containing project-wide metadata. Usually this would be metadata that applies, by default, to all files. Some things that affect the way Heckweasel processes files would be `template`, which sets the default template to put content into, and `templates`, which sets the directory to look for templates in. By convention we may also want to set the title, author and similar values to fill into the output files. We also put things like the eventual published address for the site (`site_root`) here.
Example .meta file:
```json
{
"site_root": "https://website.me",
"author": "Very Nice Person",
"title": "My Website"
}
```
### *default.jinja*
This is the default template. Heckweasel will look for `templates/default.jinja` unless another templates directory and template are specified. Jinja templates might output any kind of text file you want, but usually we put HTML inside them. Here's an example `default.jinja` that makes a barely functional web page but we'll explain more later:
```jinja2
<!DOCTYPE html>
<html>
<head>
<title>{{ metadata.title }}</title>
</head>
<body>
{{content}}
</body>
</html>
```
The main thing to notice is that this is a very simple HTML file. It does the bare minimum to render in a browser. The next thing to notice is all of the `{}` things. Those are Jinja commands. A `{{}}` containing a name will fill that name from the variables set in the Jinja environment. In Heckweasel the main two things are `content` and `metadata`. `metadata` contains the metadata set via the `.meta` and other sources as discussed above. The new thing here is `content`, which is the *contents* of the page! As discussed above, the contents and template are considered separately, and so the page contents are filled into the template where the `{{content}}` tag is! You can also see that the title of the page is set based on the page's `title` metadata. We'll discuss this more in the next section.
Another interesting thing is, any styling that should be applied to the whole website, to a particular page type, or whatever goes in templates. For example this is where you'd include the site-wide CSS sheet for this site, and it would apply that style to all the pages (we'll discuss this more in a future section).
### *index.md*
This is the contents of the page that will eventually become `index.html` when heckweasel is done with it. Notice it is `.md` which means markdown, a user-friendly markup format - heckweasel will convert this to an HTML fragment and fill in the template's `content` with the result, producing `index.html`. This is how the magic happens! The contents of this file could be something as simple as:
```markdown
# Welcome!
Hello this is my website! Hi!
```
### *index.md.meta*
This contains the metadata specific to `index.md`. It can be left out if there isn't any specific metadata, but it's useful to make even an empty one for future reference. An example use of this is to set a different title for each page.
Example:
```json
{
"title": "Welcome to my Home Page"
}
```
### Rolling it all together
Given the above tree, from the command line in the `mywebsite` directory, compiling this would be as simple as:
```bash
$ python -mheckweasel source publish
```
This would produce, in the `publish` directory, `index.html`, which would have contents like:
```html
<!DOCTYPE html>
<html>
<head>
<title>Welcome to my Home Page</title>
</head>
<body>
<h1>Welcome!</h1>
<p>Hello this is my website! Hi!</p>
</body>
</html>
```
Notice how the result of converting `index.md` into HTML is inserted into the template where `{{content}}` was, and the value of `title` from `index.md.meta` is inserted where `{{ metadata.title }}` was. While `index.md` inherited the top-level `title` metadata from the top `.meta`, its own `index.md.meta` file overrode it. Neat!
The `publish` directory is ready to be served by a small HTTP server, placed in a web content directory, or whatever. We'll discuss that in a future section about hosting your Heckweasel site.
## Getting (very slightly) more advanced with Heckweasel
Now that we see how a project and its parts fit together we can make our little website slightly more interesting.
### Styling your Web Site
As we alluded to above, the templates are where style information generally lives.

View File

@ -15,6 +15,8 @@ On-disk meatdata is stored as a file along side the non-metadata file with the e
All files define the following keys by default: All files define the following keys by default:
relpath
: The relative path to the root of the site, useful for prepending to image `src=` and other resource paths such as CSS files and fonts in order to maintain locally viewable output.
file_name file_name
: The local path of the file : The local path of the file
file_path file_path
@ -60,6 +62,14 @@ author_email
site_root site_root
: The full URL for the root of this web site used for links and whatnot, with ending slash. : The full URL for the root of this web site used for links and whatnot, with ending slash.
Special keys that can be defined; these change the processing in predictable ways:
type
: Treat the file this metadata applies to as a specific type from the type mapping table. Useful values are `passthrough` and `templatable`, with obvious outcomes.
wildcard_metadata
: Define a dictionary of file globs (patterns which match files, such as `*.txt`), with each value being a dictionary of additional metadata to apply to the matched files. This is generally
defined at the top level of the project so that certain file patterns are treated specially without having to give each file its own metadata.
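As a rough illustration of `wildcard_metadata`, here is a minimal Python sketch, assuming the globs are matched against file names in the style of the standard `fnmatch` module. The helper name `apply_wildcard_metadata` and the shallow merge are hypothetical and are not taken from heckweasel's source.

```python
from fnmatch import fnmatch
from typing import Any, Dict


def apply_wildcard_metadata(file_name: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `metadata` with matching wildcard entries merged in (hypothetical helper)."""
    merged = dict(metadata)
    for pattern, extra in metadata.get("wildcard_metadata", {}).items():
        # Every glob that matches contributes its extra metadata.
        if fnmatch(file_name, pattern):
            merged.update(extra)
    return merged


top_level = {
    "title": "My Website",
    "wildcard_metadata": {
        "*.txt": {"type": "passthrough"},
    },
}
print(apply_wildcard_metadata("notes.txt", top_level)["type"])
print("type" in apply_wildcard_metadata("index.md", top_level))
```

Here `notes.txt` is marked `passthrough` by the `*.txt` entry, while `index.md` is left alone.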
## CACHING STRATEGY ## ## CACHING STRATEGY ##

5
docs/patterns.md Normal file
View File

@ -0,0 +1,5 @@
# Patterns for Site Design #
These are some simple patterns for things commonly needed in websites of various kinds.
##

View File

@ -1,11 +1,13 @@
# Project Layout # # Project Layout #
It is recommended that in general your project for PixyWerk2 site be layed out like: It is recommended that in general your project for Heckweasel site be layed out like:
``` ```
project_top/ project_top/
Makefile - Convenient for building your site Makefile - Convenient for building your site
src/ - All "source" pages are contained in here. src/ - All "source" pages are contained in here.
.meta - Top-level default metadata is set here .meta - Top-level default metadata is set here
index.cont - The content part of the index page
index.cont.meta - A metadata json file for the index, specifically.
templates/ - Templates go in here templates/ - Templates go in here
default.jinja2 - Default template that will be used if none are specified default.jinja2 - Default template that will be used if none are specified
publish/ - The path the build process will create, where the post-processed files go. publish/ - The path the build process will create, where the post-processed files go.
@ -19,7 +21,7 @@ site. Something as simple as:
``` ```
build: src/templates/* src/* build: src/templates/* src/*
python -mpixywerk2 src publish python -mheckweasel src publish
``` ```
## src/ ## ## src/ ##
@ -66,4 +68,4 @@ A simple default.jinja2 example:
## publish/ ## ## publish/ ##
This is arbitrary, and will be created by pixywerk at build time, but it will be the root path that should be published to your web server. This is arbitrary, and will be created by heckweasel at build time, but it will be the root path that should be published to your web server.

113
docs/templatefunctions.md Normal file
View File

@ -0,0 +1,113 @@
# Template Functions #
These are functions exposed to the templates which perform various useful actions for the site designer.
## get_file_list ##
Return a list of file names based on a wildcard glob, matched against the root of the project.
Prototype: `get_file_list(file_glob, sort_order, reverse, limit) -> [files]`
Arguments:
* file_glob: A standard file glob, for example `*.txt` matches all files that end in `.txt` in the root of the project. (default: `*`)
* sort_order: One of the strings `file_path`, `file_name`, `ctime`, `mtime`, `size` or `ext` (default: `ctime`)
* reverse: whether the sort is reversed (default: False)
* limit: The number of entries to return from the top of the list, 0 for unlimited (default: `0`)
Returns:
* A list of file names.
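The way the arguments combine can be sketched in plain Python. This is a stand-in, not the real `get_file_list`: the name `list_files` and the in-memory `records` list are assumptions, and the entries are modeled as dictionaries only because templates elsewhere in this changeset index results with `f['file_name']`.

```python
from fnmatch import fnmatch
from typing import Any, Dict, List


def list_files(
    records: List[Dict[str, Any]],
    file_glob: str = "*",
    sort_order: str = "ctime",
    reverse: bool = False,
    limit: int = 0,
) -> List[Dict[str, Any]]:
    """Filter, sort and truncate file records (hypothetical stand-in for get_file_list)."""
    matched = [r for r in records if fnmatch(r["file_name"], file_glob)]
    matched.sort(key=lambda r: r[sort_order], reverse=reverse)
    # A limit of 0 means "unlimited", matching the documented default.
    return matched[:limit] if limit else matched


records = [
    {"file_name": "b.txt", "ctime": 30, "size": 5},
    {"file_name": "a.txt", "ctime": 10, "size": 9},
    {"file_name": "c.md", "ctime": 20, "size": 1},
]
newest_first = list_files(records, "*.txt", sort_order="ctime", reverse=True, limit=1)
print([r["file_name"] for r in newest_first])
```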
## get_file_name ##
Return the filename that will result from processing the specified file based on the processors that it will be passed through.
Prototype: `get_file_name(file) -> outfile`
Arguments:
* file: The name of a file, with path, from root.
Returns:
* outfile: The name of the file, with path, that will result from processing.
## get_file_content ##
Return the rendered content of specified file. Caution: Can result in infinite loops if two templates include each other.
Prototype: `get_file_content(file) -> content`
Arguments:
* file: The name of the input file, with path, from root.
Returns:
* content: the contents that result from passing the specified file through its processors.
## get_raw ##
Return the raw contents of a source file. It is specifically not passed through any processing.
Prototype: `get_raw(file) -> content`
Arguments:
* file: The name of the input file, with path, from root.
Returns:
* content: the raw contents of the input file
## get_file_metadata ##
Return the metadata tree associated with a particular file.
Prototype: `get_file_metadata(file) -> metadata`
Arguments:
* file: the name of an input file, with path, from root
Returns:
* metadata: A dictionary of metadata loaded from the file tree.
## get_time_iso8601 ##
Return the UTC date/time stamp in ISO 8601 format for a given time_t timestamp.
Prototype: `get_time_iso8601(timestamp) -> timestamp`
Arguments:
* timestamp: A time_t integer or float, in seconds since Jan 1 1970.
Returns:
* timestamp: A string in ISO8601 format of the date and timestamp, in the UTC timezone.
## get_date_iso8601 ##
Return the UTC date stamp in ISO 8601 format for a given time_t timestamp.
Prototype: `get_date_iso8601(timestamp) -> timestamp`
Arguments:
* timestamp: A time_t integer or float, in seconds since Jan 1 1970.
Returns:
* timestamp: A string in ISO8601 format of the date stamp, in the UTC timezone.
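For reference, the two timestamp conversions above can be reproduced with the standard library. This is a sketch of the conversion only, assuming the output follows Python's `isoformat()` style; the names `to_time_iso8601` and `to_date_iso8601` are hypothetical and are not heckweasel's own functions.

```python
from datetime import datetime, timezone


def to_time_iso8601(timestamp: float) -> str:
    """Format a time_t value as an ISO 8601 date/time string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def to_date_iso8601(timestamp: float) -> str:
    """Format a time_t value as an ISO 8601 date string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


print(to_time_iso8601(0))      # 1970-01-01T00:00:00+00:00
print(to_date_iso8601(86400))  # 1970-01-02
```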
## pygments_get_css ##
Return a blob of CSS produced from Pygments for a given `style`.
Prototype: `pygments_get_css(style) -> css`
Arguments:
* style (optional): A style identifier for Pygments' `HtmlFormatter`.
Returns:
* css: A string of styles as returned by Pygments' `HtmlFormatter`.
## pygments_markup_contents_html ##
Format a code fragment with Pygments.
Prototype: `pygments_markup_contents_html(input, filetype, style) -> html`
Arguments:
* input: A string containing the code to format (either literal, or imported with get_raw()).
* filetype: A string describing which lexer to use.
* style (optional): A style identifier for Pygments' `HtmlFormatter`.

View File

@ -1,2 +0,0 @@
build: src/templates/* src/* src/images/* src/posts/*
python -mpixywerk2 src publish

View File

@ -1,4 +0,0 @@
# Pixywerk.com Example #
This is an example blog system with the features most blogs would have (posts, tag cloud, atom/rss feeds,
index with images).

View File

@ -1,3 +0,0 @@
body { margin: 10% 10% 0 10% }

Binary file not shown (previous version: 72 KiB).

View File

@ -1,13 +0,0 @@
<html>
<head>
<title></title>
<link rel="stylesheet" type="text/css" href="css/main.css">
</head>
<body>
<p>This is my index!!</p>
for i in posts[:5]:
get metadata, fill in post image/text summary with link
</body>
</html>

View File

@ -1,18 +0,0 @@
<html>
<head>
<title>My first post</title>
<link rel="stylesheet" type="text/css" href="css/main.css">
</head>
<body>
<img src="../images/20190415-0.jpg" class="featured">
<div class="byline">
<p>Author: Cas Rusnov</p>
<p>Published: 2019-04-16T01:42:27.156392+00:00
</p>
</div>
<p>This is an example post!</p>
<p>yo fresh</p>
<p>There are many posts like it but this one is mine.</p>
</body>
</html>

View File

@ -1,9 +0,0 @@
<html>
<head>
<title>{{ metadata.title }}</title>
<link rel="stylesheet" type="text/css" href="css/main.css">
</head>
<body>
{{ content }}
</body>
</html>

View File

@ -1,6 +0,0 @@
{
"author": "Cas Rusnov",
"author_email": "rusnovn@gmail.com",
"uuid-oid-root": "pixywerk.com/",
"site_root": "https://pixywerk.com/"
}

View File

@ -1,3 +0,0 @@
body { margin: 10% 10% 0 10% }

Binary file not shown (previous version: 72 KiB).

View File

@ -1,5 +0,0 @@
<p>This is my index!!</p>
for i in posts[:5]:
get metadata, fill in post image/text summary with link

View File

@ -1,12 +0,0 @@
<img src="{{ metadata.featured }}" class="featured">
<div class="byline">
<p>Author: {{ metadata.author }}</p>
<p>Published: {{ get_time_iso8601(metadata.stat.ctime) }}
{% if metadata.stat.mtime-metadata.stat.ctime > 512 %}
Updated: {{ get_time_iso8601(metadata.stat.mtime) }}
{% endif %}
</p>
</div>
<p>This is an example post!</p>
<p>yo fresh</p>
<p>There are many posts like it but this one is mine.</p>

View File

@ -1,4 +0,0 @@
{
"title":"My first post",
"featured":"../images/20190415-0.jpg"
}

View File

@ -1,9 +0,0 @@
<html>
<head>
<title>{{ metadata.title }}</title>
<link rel="stylesheet" type="text/css" href="css/main.css">
</head>
<body>
{{ content }}
</body>
</html>

1
heckweasel/__init__.py Normal file
View File

@ -0,0 +1 @@
__version__ = '0.7.0'

View File

@ -11,14 +11,24 @@ import os
import shutil import shutil
import sys import sys
import time import time
from typing import Dict, List, cast from typing import Dict, List, cast
from .metadata import MetaTree
from .processchain import ProcessorChains from .processchain import ProcessorChains
from .processors.processors import PassthroughException from .processors.processors import PassthroughException
from .metadata import MetaTree from .pygments import pygments_get_css, pygments_markup_contents_html
from .template_tools import file_list, file_name, file_content, file_metadata, time_iso8601 from .template_tools import (
date_iso8601,
file_content,
file_list,
file_list_hier,
file_json,
file_metadata,
file_name,
file_raw,
time_iso8601,
)
from .utils import deep_merge_dicts
logger = logging.getLogger() logger = logging.getLogger()
@ -27,23 +37,30 @@ def setup_logging(verbose: bool = False) -> None:
pass pass
def get_args(args: List[str]) -> argparse.Namespace: def parse_var(varspec: str) -> List:
parser = argparse.ArgumentParser("Compile a Pixywerk directory into an output directory.") if (not ('=' in varspec)):
return [varspec, True]
return list(varspec.split('=', 1))
parser.add_argument("root", help="The root of the pixywerk directory to process.")
def get_args(args: List[str]) -> argparse.Namespace:
parser = argparse.ArgumentParser("Compile a Heckweasel directory into an output directory.")
parser.add_argument("root", help="The root of the heckweasel directory to process.")
parser.add_argument("output", help="The output directory to export post-compiled files to.") parser.add_argument("output", help="The output directory to export post-compiled files to.")
parser.add_argument( parser.add_argument(
"-c", "--clean", help="Remove the target tree before proceeding (by renaming to .bak).", action="store_true" "-c", "--clean", help="Remove the target tree before proceeding (by renaming to .bak).", action="store_true"
) )
parser.add_argument("-s", "--safe", help="Abort if the target directory already exists.", action="store_true") parser.add_argument("-s", "--safe", help="Abort if the target directory already exists.", action="store_true")
parser.add_argument("-f", "--follow-links", help="Follow symbolic links in the input tree.", action="store_true")
parser.add_argument("-t", "--template", help="The template directory (default: root/templates)", default=None) parser.add_argument("-t", "--template", help="The template directory (default: root/templates)", default=None)
parser.add_argument("-d", "--dry-run", help="Perform a dry-run.", action="store_true") parser.add_argument("-d", "--dry-run", help="Perform a dry-run.", action="store_true")
parser.add_argument("-v", "--verbose", help="Output verbosely.", action="store_true") parser.add_argument("-v", "--verbose", help="Output verbosely.", action="store_true")
parser.add_argument("--processors", help="Specify a path to a processor configuration file.", default=None) parser.add_argument("--processors", help="Specify a path to a processor configuration file.", default=None)
parser.add_argument(
"-D", "--define", help="Add a variable to the metadata.", nargs="+", action="extend", type=parse_var)
result = parser.parse_args(args) result = parser.parse_args(args)
# validate arguments # validate arguments
if not os.path.isdir(result.root): if not os.path.isdir(result.root):
raise FileNotFoundError("can't find root folder {}".format(result.root)) raise FileNotFoundError("can't find root folder {}".format(result.root))
@ -75,25 +92,37 @@ def main() -> int:
"dir-template": "default-dir.jinja2", "dir-template": "default-dir.jinja2",
"filters": {}, "filters": {},
"build-time": time.time(), "build-time": time.time(),
"uuid-oid-root": "pixywerk", "uuid-oid-root": "heckweasel",
"summary": "", "summary": "",
"description": "", "description": "",
"author": "", "author": "",
"author_email": "" "author_email": "",
} }
if args.define:
for var in args.define:
default_metadata[var[0]] = var[1]
meta_tree = MetaTree(args.root, default_metadata) meta_tree = MetaTree(args.root, default_metadata)
file_list_cache = cast(Dict, {}) file_list_cache = cast(Dict, {})
file_cont_cache = cast(Dict, {}) file_cont_cache = cast(Dict, {})
file_name_cache = cast(Dict, {}) file_name_cache = cast(Dict, {})
file_raw_cache = cast(Dict, {})
flist = file_list(args.root, file_list_cache)
default_metadata["globals"] = { default_metadata["globals"] = {
"get_file_list": file_list(args.root, file_list_cache), "get_file_list": flist,
"get_hier": file_list_hier(args.root, flist),
"get_file_name": file_name(args.root, meta_tree, process_chains, file_name_cache), "get_file_name": file_name(args.root, meta_tree, process_chains, file_name_cache),
"get_file_content": file_content(args.root, meta_tree, process_chains, file_cont_cache), "get_file_content": file_content(args.root, meta_tree, process_chains, file_cont_cache),
"get_json": file_json(args.root),
"get_raw": file_raw(args.root, file_raw_cache),
"get_file_metadata": file_metadata(meta_tree), "get_file_metadata": file_metadata(meta_tree),
"get_time_iso8601": time_iso8601("UTC"), "get_time_iso8601": time_iso8601("UTC"),
"get_date_iso8601": date_iso8601("UTC"),
"pygments_get_css": pygments_get_css,
"pygments_markup_contents_html": pygments_markup_contents_html,
"merge_dicts": deep_merge_dicts,
} }
for root, _, files in os.walk(args.root): for root, _, files in os.walk(args.root, followlinks=args.follow_links):
workroot = os.path.relpath(root, args.root) workroot = os.path.relpath(root, args.root)
if workroot == ".": if workroot == ".":
workroot = "" workroot = ""
@ -112,7 +141,7 @@ def main() -> int:
continue continue
metadata = meta_tree.get_metadata(os.path.join(workroot, f)) metadata = meta_tree.get_metadata(os.path.join(workroot, f))
chain = process_chains.get_chain_for_filename(os.path.join(root, f), ctx=metadata) chain = process_chains.get_chain_for_filename(os.path.join(root, f), ctx=metadata)
print("process {} -> {}".format(os.path.join(root, f), os.path.join(target_dir, chain.output_filename))) print("process {} -> {} -> {}".format(os.path.join(root, f), repr(chain), os.path.join(target_dir, chain.output_filename)))
if not args.dry_run: if not args.dry_run:
try: try:
with open(os.path.join(target_dir, chain.output_filename), "w") as outfile: with open(os.path.join(target_dir, chain.output_filename), "w") as outfile:
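The `-D/--define` handling above relies on a `parse_var` helper that does not appear in this part of the diff. What follows is a minimal sketch, assuming each definition is written as `key=value` and becomes a `(key, value)` tuple. The body of `parse_var` and the names `build_parser` and `apply_defines` are hypothetical. Note that `action="extend"` was added to `argparse` in Python 3.8, so this option would not work on the py36/py37 tox environments.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def parse_var(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'key=value' into a (key, value) pair."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError("expected key=value, got {!r}".format(text))
    return key, value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the -D/--define option from the diff."""
    parser = argparse.ArgumentParser("heckweasel")
    parser.add_argument(
        "-D", "--define", help="Add a variable to the metadata.", nargs="+", action="extend", type=parse_var
    )
    return parser


def apply_defines(default_metadata: Dict, defines: Optional[List[Tuple[str, str]]]) -> Dict:
    """Copy each (key, value) pair into the metadata, as main() does."""
    for key, value in defines or []:
        default_metadata[key] = value
    return default_metadata
```

Because values are split on the first `=` only, a definition such as `-D url=a=b` keeps `a=b` as the value. All values arrive as strings; any type conversion would have to happen in the templates.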


@ -10,6 +10,13 @@ templatable:
chain: chain:
- jinja2 - jinja2
# Any object that needs jinja and to be embedded in a parent template
tembed:
extension: null
chain:
- jinja2
- jinja2_page_embed
# Markdown, BBCode and RST are first run through the templater, and then # Markdown, BBCode and RST are first run through the templater, and then
# they are processed into HTML, and finally embedded in a page template. # they are processed into HTML, and finally embedded in a page template.
markdown: markdown:
@ -62,24 +69,24 @@ template-html:
- jinja2 - jinja2
- jinja2_page_embed - jinja2_page_embed
# Smart CSS are simply converted to CSS. # # Smart CSS are simply converted to CSS.
sass: # sass:
extension: # extension:
- sass # - sass
- scss # - scss
chain: # chain:
- process_sass # - process_sass
less: # less:
extension: # extension:
- less # - less
chain: # chain:
- process_less # - process_less
stylus: # stylus:
extension: # extension:
- styl # - styl
chain: # chain:
- process_styl # - process_styl
# # Images are processed into thumbnails and sized in addition to being retained as their original # # Images are processed into thumbnails and sized in addition to being retained as their original
# FIXME implement split chain processor, implement processor arguments, # FIXME implement split chain processor, implement processor arguments,


@ -1,11 +1,11 @@
"""Constructs a tree-like object containing the metadata for a given path, and caches said metadata.""" """Constructs a tree-like object containing the metadata for a given path, and caches said metadata."""
import fnmatch
import logging import logging
import mimetypes import mimetypes
import os import os
import uuid import uuid
from typing import Any, Dict, List, Optional, Tuple, Union, cast
from typing import Dict, Optional, Union, List, Tuple, Any, cast
import jstyleson import jstyleson
@ -93,7 +93,7 @@ class MetaTree:
"""Retrieve the metadata for a given path """Retrieve the metadata for a given path
The general procedure is to iterate the tree, at each level The general procedure is to iterate the tree, at each level
m load .meta (JSON formatted dictionary) for that level, and load .meta (JSON formatted dictionary) for that level, and
then finally load the path.meta, and merge these dictionaries then finally load the path.meta, and merge these dictionaries
in descendant order. in descendant order.
@ -108,10 +108,14 @@ m load .meta (JSON formatted dictionary) for that level, and
# iterate path components from root to target path # iterate path components from root to target path
comps = [self._root] + rel_path.split("/") comps = [self._root] + rel_path.split("/")
fullpath = "" fullpath = ""
ospath = os.path.join(self._root, rel_path)
for pth in comps: for pth in comps:
fullpath = os.path.join(fullpath, pth) fullpath = os.path.join(fullpath, pth)
st = os.stat(fullpath) st = os.stat(fullpath)
if os.path.isdir(fullpath):
cachekey = os.path.join(fullpath, ".meta")
else:
cachekey = fullpath + ".meta" cachekey = fullpath + ".meta"
meta = cast(Dict, {}) meta = cast(Dict, {})
try: try:
@ -126,16 +130,20 @@ m load .meta (JSON formatted dictionary) for that level, and
meta = jstyleson.load(open(cachekey, "r")) meta = jstyleson.load(open(cachekey, "r"))
self._cache.put(cachekey, meta, st_meta.st_mtime) self._cache.put(cachekey, meta, st_meta.st_mtime)
if fullpath == ospath and "wildcard_metadata" in metablob:
for wild in metablob["wildcard_metadata"]:
if fnmatch.fnmatch(pth, wild[0]):
metablob.update(wild[1])
metablob.update(meta) metablob.update(meta)
# return final dict # return final dict
metablob["dir"], metablob["file_name"] = os.path.split(rel_path) metablob["dir"], metablob["file_name"] = os.path.split(rel_path)
metablob["file_path"] = rel_path metablob["file_path"] = rel_path
metablob["uuid"] = uuid.uuid3( metablob["relpath"] = os.path.relpath("/", "/" + metablob["dir"])
uuid.NAMESPACE_OID, metablob["uuid-oid-root"] + os.path.join(self._root, rel_path) metablob["uuid"] = uuid.uuid3(uuid.NAMESPACE_OID, metablob["uuid-oid-root"] + ospath)
)
metablob["os-path"], _ = os.path.split(fullpath) metablob["os-path"], _ = os.path.split(fullpath)
metablob["guessed-type"] = guess_mime(os.path.join(self._root, rel_path)) metablob["guessed-type"] = guess_mime(ospath)
if "mime-type" not in metablob: if "mime-type" not in metablob:
metablob["mime-type"] = metablob["guessed-type"] metablob["mime-type"] = metablob["guessed-type"]
metablob["stat"] = {} metablob["stat"] = {}
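The wildcard handling and the new `relpath` value in this hunk can be illustrated in isolation. This is a sketch, assuming `wildcard_metadata` is a list of `[pattern, dict]` pairs, as the loop above implies. The function names and sample data are hypothetical, and `posixpath` is used in place of `os.path` so the result is the same on every platform.

```python
import fnmatch
import posixpath
from typing import Any, Dict


def apply_wildcards(metablob: Dict[str, Any], file_name: str) -> Dict[str, Any]:
    """Return a copy of metablob with every matching wildcard entry merged in."""
    merged = dict(metablob)
    for pattern, values in metablob.get("wildcard_metadata", []):
        if fnmatch.fnmatch(file_name, pattern):
            merged.update(values)
    return merged


def relpath_to_root(directory: str) -> str:
    """Relative path from a file's directory back to the site root."""
    return posixpath.relpath("/", "/" + directory)
```

In the diff, the file's own `.meta` is applied after the wildcard entries, so per-file metadata still overrides anything a wildcard sets. A template can use `relpath` as a prefix for root-relative links, for example `../..` for a file under `posts/2023`.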


@ -3,8 +3,7 @@
import os import os
import os.path import os.path
import random import random
from typing import Any, Dict, Iterable, List, Optional, Type, cast
from typing import List, Iterable, Optional, Any, Dict, Type, cast
import yaml import yaml
@ -91,6 +90,9 @@ class ProcessorChain:
fname = processor.filename(fname, self._ctx) fname = processor.filename(fname, self._ctx)
return fname return fname
def __repr__(self) -> str:
return "[" + ",".join([x.__class__.__name__ for x in self._processors]) + "]"
class ProcessorChains: class ProcessorChains:
"""Load a configuration for processor chains, and provide ability to process the chains given a particular input """Load a configuration for processor chains, and provide ability to process the chains given a particular input


@ -1,6 +1,6 @@
"""Define a Jinja2 Processor which applies programmable templating to the input stream.""" """Define a Jinja2 Processor which applies programmable templating to the input stream."""
from typing import Iterable, Optional, Dict, cast from typing import Dict, Iterable, Optional, cast
from jinja2 import Environment, FileSystemLoader from jinja2 import Environment, FileSystemLoader
@ -22,11 +22,10 @@ class Jinja2(PassThrough):
iterable: The post-processed output stream iterable: The post-processed output stream
""" """
ctx = cast(Dict, ctx) ctx = cast(Dict, ctx)
template_env = Environment(loader=FileSystemLoader(ctx["templates"])) template_env = Environment(loader=FileSystemLoader(ctx["templates"]), extensions=["jinja2.ext.do"])
template_env.globals.update(ctx["globals"]) template_env.globals.update(ctx["globals"])
template_env.filters.update(ctx["filters"]) template_env.filters.update(ctx["filters"])
tmpl = template_env.from_string("".join([x for x in input_file])) tmpl = template_env.from_string("".join([x for x in input_file]))
return tmpl.render(metadata=ctx) return tmpl.render(metadata=ctx)
processor = Jinja2 processor = Jinja2


@ -3,8 +3,7 @@
the target template is rendered).""" the target template is rendered)."""
import os import os
from typing import Dict, Iterable, Optional, cast
from typing import Iterable, Optional, Dict, cast
from jinja2 import Environment, FileSystemLoader from jinja2 import Environment, FileSystemLoader
@ -25,8 +24,7 @@ class Jinja2PageEmbed(Processor):
str: the new name for the file str: the new name for the file
""" """
return os.path.splitext(oldname)[0] + "." + self.extension(oldname, ctx)
return os.path.splitext(oldname)[0] + ".html"
def mime_type(self, oldname: str, ctx: Optional[Dict] = None) -> str: def mime_type(self, oldname: str, ctx: Optional[Dict] = None) -> str:
"""Return the mimetype of the post-processed file. """Return the mimetype of the post-processed file.
@ -39,7 +37,7 @@ class Jinja2PageEmbed(Processor):
str: the new mimetype of the file after processing str: the new mimetype of the file after processing
""" """
return "text/html" return ctx.get("mime", "text/html")
def process(self, input_file: Iterable, ctx: Optional[Dict] = None) -> Iterable: def process(self, input_file: Iterable, ctx: Optional[Dict] = None) -> Iterable:
"""Return an iterable object of the post-processed file. """Return an iterable object of the post-processed file.
@ -52,7 +50,7 @@ class Jinja2PageEmbed(Processor):
iterable: The post-processed output stream iterable: The post-processed output stream
""" """
ctx = cast(Dict, ctx) ctx = cast(Dict, ctx)
template_env = Environment(loader=FileSystemLoader(ctx["templates"])) template_env = Environment(loader=FileSystemLoader(ctx["templates"]), extensions=["jinja2.ext.do"])
template_env.globals.update(ctx["globals"]) template_env.globals.update(ctx["globals"])
template_env.filters.update(ctx["filters"]) template_env.filters.update(ctx["filters"])
tmpl = template_env.get_template(ctx["template"]) tmpl = template_env.get_template(ctx["template"])
@ -70,7 +68,7 @@ class Jinja2PageEmbed(Processor):
str: the new extension of the file after processing str: the new extension of the file after processing
""" """
return "html" return ctx.get("extension", "html")
processor = Jinja2PageEmbed processor = Jinja2PageEmbed
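With this change the embed processor takes its output extension and MIME type from the file's metadata instead of hard-coding HTML, which is what makes the new `tembed` chain usable for non-HTML output. The sketch below restates that lookup as standalone functions. The function names are hypothetical, and the `ctx or {}` guard is an addition: the methods in the diff call `ctx.get(...)` directly, which would raise `AttributeError` if `ctx` were left at its default of `None`.

```python
import os
from typing import Dict, Optional


def embedded_output_name(oldname: str, ctx: Optional[Dict] = None) -> str:
    """Swap the source extension for the one named in the metadata (default: html)."""
    ctx = ctx or {}
    return os.path.splitext(oldname)[0] + "." + ctx.get("extension", "html")


def embedded_mime_type(ctx: Optional[Dict] = None) -> str:
    """Return the MIME type named in the metadata (default: text/html)."""
    ctx = ctx or {}
    return ctx.get("mime", "text/html")
```

A `.meta` file containing `{"extension": "xml", "mime": "application/rss+xml"}` next to a feed template would then yield `feed.xml` rather than `feed.html`.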


@ -1,10 +1,10 @@
"""Passthrough processor which takes input and returns it.""" """Passthrough processor which takes input and returns it."""
import os import os
from typing import Dict, Iterable, Optional, cast
from .processors import Processor, PassthroughException
from ..utils import guess_mime from ..utils import guess_mime
from typing import Iterable, Optional, Dict, cast from .processors import PassthroughException, Processor
class PassThrough(Processor): class PassThrough(Processor):


@ -2,8 +2,7 @@
import io import io
import os import os
from typing import Dict, Iterable, Optional
from typing import Iterable, Optional, Dict
import markdown import markdown


@ -1,6 +1,5 @@
import abc import abc
from typing import Dict, Iterable, Optional
from typing import Iterable, Optional, Dict
class PassthroughException(Exception): class PassthroughException(Exception):
@ -65,3 +64,6 @@ class Processor(abc.ABC): # pragma: no cover
Returns: Returns:
iterable: The post-processed output stream iterable: The post-processed output stream
""" """
    def __repr__(self) -> str:
        return self.__class__.__name__

heckweasel/pygments.py Normal file

@ -0,0 +1,36 @@
"""Map Pygments into the Template API for inclusion in outputs."""
from typing import Optional, cast

import pygments
import pygments.formatters
import pygments.lexers
import pygments.styles
import pygments.util


def pygments_markup_contents_html(input_text: str, file_type: str, style: Optional[str] = None) -> str:
    """Format input string with Pygments and return HTML."""
    if style is None:
        style = "default"
    pyst = pygments.styles.get_style_by_name(style)
    formatter = pygments.formatters.get_formatter_by_name("html", style=pyst)
    # Treat file_type as a filename first, then a lexer name, then a MIME type.
    try:
        lexer = pygments.lexers.get_lexer_for_filename(file_type)
    except pygments.util.ClassNotFound:
        try:
            lexer = pygments.lexers.get_lexer_by_name(file_type)
        except pygments.util.ClassNotFound:
            lexer = pygments.lexers.get_lexer_by_mimetype(file_type)
    return pygments.highlight(input_text, lexer, formatter)


def pygments_get_css(style: Optional[str] = None) -> str:
    """Return the CSS styles associated with a particular style definition."""
    if style is None:
        style = "default"
    pyst = pygments.styles.get_style_by_name(style)
    formatter = pygments.formatters.get_formatter_by_name("html", style=pyst)
    return formatter.get_style_defs()


@ -0,0 +1,145 @@
import copy
import datetime
import glob
import itertools
import os
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union, cast

import jstyleson
import pytz

from .metadata import MetaTree
from .processchain import ProcessorChains
from .utils import deep_merge_dicts


def file_list(root: str, listcache: Dict) -> Callable:
    def get_file_list(
        path_glob: Union[str, List[str], Tuple[str]],
        *,
        sort_order: str = "ctime",
        reverse: bool = False,
        limit: int = 0
    ) -> Iterable:
        stattable = cast(List, [])
        if isinstance(path_glob, str):
            path_glob = [path_glob]
        for pglob in path_glob:
            if pglob not in listcache:
                # Cache each glob's own matches, not the running total for all globs.
                entries = cast(List, [])
                for fil in glob.glob(os.path.join(root, pglob)):
                    if os.path.isdir(fil):
                        continue
                    if fil.endswith(".meta") or fil.endswith("~"):
                        continue
                    st = os.stat(fil)
                    entries.append(
                        {
                            "file_path": os.path.relpath(fil, root),
                            "file_name": os.path.split(fil)[-1],
                            "mtime": st.st_mtime,
                            "ctime": st.st_ctime,
                            "size": st.st_size,
                            "ext": os.path.splitext(fil)[1],
                        }
                    )
                listcache[pglob] = entries
            stattable.extend(listcache[pglob])
        ret = sorted(stattable, key=lambda x: x[sort_order], reverse=reverse)
        if limit > 0:
            return itertools.islice(ret, limit)
        return ret

    return get_file_list


def file_list_hier(root: str, flist: Callable) -> Callable:
    """Return a callable which, given a directory, will walk the directory and return the files within
    it that match the glob passed."""

    def get_file_list_hier(path: str, glob: str, *, sort_order: str = "ctime", reverse: bool = False) -> Iterable:
        output = []
        for pth in os.walk(os.path.join(root, path)):
            output.extend(
                flist(
                    os.path.join(os.path.relpath(os.path.realpath(pth[0]), root), glob),
                    sort_order=sort_order,
                    reverse=reverse,
                )
            )
        return output

    return get_file_list_hier


def file_name(root: str, metatree: MetaTree, processor_chains: ProcessorChains, namecache: Dict) -> Callable:
    def get_file_name(file_name: str) -> Dict:
        if file_name in namecache:
            return namecache[file_name]
        metadata = metatree.get_metadata(file_name)
        chain = processor_chains.get_chain_for_filename(os.path.join(root, file_name), ctx=metadata)
        namecache[file_name] = chain.output_filename
        return namecache[file_name]

    return get_file_name


def file_raw(root: str, contcache: Dict) -> Callable:
    def get_raw(file_name: str) -> str:
        # Populate the cache on a miss so that later lookups can actually hit it.
        if file_name not in contcache:
            with open(os.path.join(root, file_name), "r", encoding="utf-8") as f:
                contcache[file_name] = f.read()
        return contcache[file_name]

    return get_raw


def file_json(root: str) -> Callable:
    def get_json(file_name: str, parent: Optional[Dict] = None) -> Dict:
        outd = cast(Dict, {})
        if parent is not None:
            outd = copy.deepcopy(parent)
        with open(os.path.join(root, file_name), "r", encoding="utf-8") as f:
            return deep_merge_dicts(outd, jstyleson.load(f))

    return get_json


def file_content(root: str, metatree: MetaTree, processor_chains: ProcessorChains, contcache: Dict) -> Callable:
    def get_file_content(file_name: str) -> Iterable:
        if file_name in contcache:
            return contcache[file_name]
        metadata = metatree.get_metadata(file_name)
        chain = processor_chains.get_chain_for_filename(os.path.join(root, file_name), ctx=metadata)
        # Cache the string form so cached and uncached calls return the same type.
        contcache[file_name] = str(chain.output)
        return contcache[file_name]

    return get_file_content


def file_metadata(metatree: MetaTree) -> Callable:
    def get_file_metadata(file_name: str) -> Dict:
        return metatree.get_metadata(file_name)

    return get_file_metadata


def time_iso8601(timezone: str) -> Callable:
    tz = pytz.timezone(timezone)

    def get_time_iso8601(time_t: Union[int, float]) -> str:
        return datetime.datetime.fromtimestamp(time_t, tz).isoformat("T")

    return get_time_iso8601


def date_iso8601(timezone: str) -> Callable:
    tz = pytz.timezone(timezone)

    def get_date_iso8601(time_t: Union[int, float]) -> str:
        return datetime.datetime.fromtimestamp(time_t, tz).strftime("%Y-%m-%d")

    return get_date_iso8601
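Every template function in this file follows the same pattern: a factory binds configuration (root directory, cache, timezone) and returns a closure that is registered under `globals`. The sketch below shows that pattern for the two time helpers. It swaps `pytz` for the standard library's `datetime.timezone.utc`, so it covers only the `"UTC"` case used in `main()`, and the factory names are hypothetical.

```python
import datetime
from typing import Callable, Union

Timestamp = Union[int, float]


def time_iso8601_utc() -> Callable[[Timestamp], str]:
    """Return a closure that formats a Unix timestamp as an ISO 8601 date-time in UTC."""
    tz = datetime.timezone.utc

    def get_time_iso8601(time_t: Timestamp) -> str:
        return datetime.datetime.fromtimestamp(time_t, tz).isoformat("T")

    return get_time_iso8601


def date_iso8601_utc() -> Callable[[Timestamp], str]:
    """Return a closure that formats a Unix timestamp as a YYYY-MM-DD date in UTC."""
    tz = datetime.timezone.utc

    def get_date_iso8601(time_t: Timestamp) -> str:
        return datetime.datetime.fromtimestamp(time_t, tz).strftime("%Y-%m-%d")

    return get_date_iso8601
```

In a template these would typically receive a timestamp from the metadata, such as `build-time` or a file's `mtime` from `get_file_list`.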

heckweasel/utils.py Normal file

@ -0,0 +1,72 @@
import copy
import mimetypes
import os
from typing import Dict, Optional


def merge_dicts(dict_a: Dict, dict_b: Dict) -> Dict:
    """Merge two dictionaries (shallow).

    Arguments:
        dict_a (dict): The dictionary to use as the base.
        dict_b (dict): The dictionary to update the values with.

    Returns:
        dict: A new merged dictionary.

    """
    dict_z = dict_a.copy()
    dict_z.update(dict_b)
    return dict_z


def deep_merge_dicts(dict_a: Dict, dict_b: Dict, _path=None, cpy=False) -> Dict:
    """Merge two dictionaries (deep).

    https://stackoverflow.com/questions/7204805/how-to-merge-dictionaries-of-dictionaries/7205107#7205107

    Arguments:
        dict_a (dict): The dictionary to use as the base. It is modified in place unless cpy is true.
        dict_b (dict): The dictionary to update the values with.
        _path (list): internal use.
        cpy (bool): If true, merge into a deep copy of dict_a and leave the original untouched.

    Returns:
        dict: The merged dictionary (dict_a itself, or its copy when cpy is true).

    """
    if cpy:
        dict_a = copy.deepcopy(dict_a)
    if _path is None:
        _path = []
    for key in dict_b:
        if key in dict_a:
            if isinstance(dict_a[key], dict) and isinstance(dict_b[key], dict):
                deep_merge_dicts(dict_a[key], dict_b[key], _path + [str(key)])
            elif dict_a[key] == dict_b[key]:
                pass  # same leaf value
            else:
                dict_a[key] = copy.deepcopy(dict_b[key])
        else:
            dict_a[key] = dict_b[key]
    return dict_a


def guess_mime(path: str) -> Optional[str]:
    """Guess the mime type for a given path.

    Arguments:
        path (str): the path of the file or directory to inspect

    Returns:
        str: the guessed mime-type

    """
    mtypes = mimetypes.guess_type(path)
    ftype = None
    if os.path.isdir(path):
        ftype = "directory"
    elif os.access(path, os.F_OK) and mtypes[0]:
        ftype = mtypes[0]
    else:
        ftype = "application/octet-stream"
    return ftype
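The difference between the shallow and deep merge matters most for nested metadata such as the JSON loaded by `get_json`. The sketch below uses a compact stand-in, `deep_merge`, which is hypothetical and only mirrors the behaviour of `deep_merge_dicts` for the cases shown. It illustrates that nested keys are combined rather than replaced, and that the base dictionary is mutated unless a copy is requested.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any], cpy: bool = False) -> Dict[str, Any]:
    """Recursively merge override into base; mutate base unless cpy is true."""
    if cpy:
        base = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(base.get(key), dict) and isinstance(value, dict):
            deep_merge(base[key], value)  # descend into nested dictionaries
        else:
            base[key] = copy.deepcopy(value)  # leaf values from override win
    return base


site = {"nav": {"home": "/", "blog": "/blog"}, "title": "Site"}
page = {"nav": {"blog": "/posts"}, "title": "Page"}

merged = deep_merge(site, page, cpy=True)  # site is left untouched
shallow = {**site, **page}  # a shallow merge loses nav["home"]
```

The in-place default is why `get_json` deep-copies `parent` before merging: without the copy, loading a JSON file would silently modify the caller's dictionary.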

importwp.py Normal file

@ -0,0 +1,162 @@
"""Convert a Wordpress XML dump into a (mostly working) heckweasel tree."""
import argparse
import datetime
import json
import os
import sys
from urllib.parse import urlparse
from xml.etree.ElementTree import ElementTree

import requests

FILE_PATTERN = "{postdate}-{postname}.thtml"


def parse_args(args):
    parser = argparse.ArgumentParser("importwp.py")
    parser.add_argument("input", help="The input file.")
    # nargs="?" is needed for the default to take effect on a positional argument.
    parser.add_argument("out_dir", help="Output root directory.", nargs="?", default=".")
    parser.add_argument(
        "--fetch-attachments",
        help="Fetch all attachments referred to in file.",
        action="store_true",
        dest="fetch_attachments",
    )
    parser.add_argument(
        "--attachment-dir", help="Subdirectory to place attachments in.", default="attachments", dest="attachment_dir"
    )
    parser.add_argument("--post-dir", help="Subdirectory to place posts in.", default="posts", dest="post_dir")
    parser.add_argument("--page-dir", help="Subdirectory to place pages in.", default="", dest="page_dir")
    result = parser.parse_args(args)
    result.post_dir = os.path.join(result.out_dir, result.post_dir)
    result.page_dir = os.path.join(result.out_dir, result.page_dir)
    result.attachment_dir = os.path.join(result.out_dir, result.attachment_dir)
    return result


def parse_input(xmlpath):
    tree = ElementTree()
    tree_root = tree.parse(source=xmlpath)
    posts = {}
    attachments = {}
    pages = {}
    for node in tree_root.find("channel"):
        if node.tag == "item":
            post_type = node.find("{http://wordpress.org/export/1.2/}post_type")
            if post_type is not None:
                status = node.find("{http://wordpress.org/export/1.2/}status")
                if status is not None and status.text == "draft":
                    continue
                # Fields shared by posts, attachments and pages.
                item = {
                    "content": node.find("{http://purl.org/rss/1.0/modules/content/}encoded"),
                    "title": node.find("title"),
                    "pubdate": node.find("pubDate"),
                    "description": node.find("description"),
                    "post_name": node.find("{http://wordpress.org/export/1.2/}post_name"),
                    "categories": node.findall("category"),
                    "post_parent": node.find("{http://wordpress.org/export/1.2/}post_parent"),
                }
                post_id = node.find("{http://wordpress.org/export/1.2/}post_id")
                if post_type.text == "post":
                    # found a post!
                    posts[post_id.text] = item
                elif post_type.text == "attachment":
                    # attachment
                    att_url = node.find("{http://wordpress.org/export/1.2/}attachment_url")
                    attachments[post_id.text] = dict(item, att_url=att_url)
                elif post_type.text == "page":
                    pages[post_id.text] = item
    return posts, attachments, pages


def fetch_attachment(attch, outdir):
    url = attch["att_url"].text
    p = urlparse(url)
    filename = os.path.join(outdir, os.path.split(p.path)[-1])
    print("fetching attachment", url, "->", filename)
    r = requests.get(url)
    with open(filename, "wb") as outf:
        outf.write(r.content)


def save_cont(post, outdir):
    dt = datetime.datetime.strptime(post["pubdate"].text, "%a, %d %b %Y %H:%M:%S %z")
    postdate = dt.strftime("%Y-%m-%d-%H%M%S")
    filename = FILE_PATTERN.format(postdate=postdate, postname=post["post_name"].text)
    print(post["title"].text, "->", filename)
    with open(os.path.join(outdir, filename), "w", encoding="utf-8") as outf:
        outf.write(post["content"].text)
    # handle attachments
    tags = []
    category = ""
    for tg in post["categories"]:
        if "domain" in tg.attrib and tg.attrib["domain"] == "category":
            category = tg.text
        else:
            tags.append(tg.text)
    with open(os.path.join(outdir, filename + ".meta"), "w", encoding="utf-8") as outf:
        metadata = {
            "title": post["title"].text,
            "description": post["description"].text,
            "post_time": dt.timestamp(),
            "featured": "",
            "tags": tags,
            "category": category,
        }
        json.dump(metadata, outf)


def main():
    args = parse_args(sys.argv[1:])
    # Create the output directories, tolerating ones that already exist.
    for directory in (args.out_dir, args.page_dir, args.post_dir):
        os.makedirs(directory, exist_ok=True)
    if args.fetch_attachments:
        os.makedirs(args.attachment_dir, exist_ok=True)
    posts, attachments, pages = parse_input(args.input)
    if args.fetch_attachments:
        for post in attachments.values():
            fetch_attachment(post, args.attachment_dir)
    for post in posts.values():
        save_cont(post, args.post_dir)
    for page in pages.values():
        save_cont(page, args.page_dir)
    return 0


if __name__ == "__main__":
    sys.exit(main())
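The importer names each output file after the post's publication date and slug. The sketch below isolates that step so the date handling can be checked without a WordPress dump. The helper names `post_filename` and `post_time` and the sample values are hypothetical; the format strings are the ones used in `save_cont`. Parsing `%a` and `%b` depends on the process locale, so this assumes an English or C locale.

```python
import datetime

FILE_PATTERN = "{postdate}-{postname}.thtml"
WP_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %z"  # RFC 822 style date found in <pubDate>


def post_filename(pubdate: str, post_name: str) -> str:
    """Build the output file name from a pubDate string and the post slug."""
    dt = datetime.datetime.strptime(pubdate, WP_DATE_FORMAT)
    return FILE_PATTERN.format(postdate=dt.strftime("%Y-%m-%d-%H%M%S"), postname=post_name)


def post_time(pubdate: str) -> float:
    """Return the Unix timestamp stored as post_time in the .meta file."""
    return datetime.datetime.strptime(pubdate, WP_DATE_FORMAT).timestamp()
```

The file name uses the wall-clock time in the post's own UTC offset, while `post_time` is an absolute timestamp, so the two can differ by the offset for posts exported with a non-UTC timezone.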


@ -1,80 +0,0 @@
import datetime
import glob
import itertools
import os
import pytz
from typing import Callable, Dict, List, Iterable, Union, cast
from .metadata import MetaTree
from .processchain import ProcessorChains
def file_list(root: str, listcache: Dict) -> Callable:
def get_file_list(path_glob: str, *, sort_order: str = "ctime", reverse: bool = False, limit: int = 0) -> Iterable:
stattable = cast(List, [])
if path_glob in listcache:
stattable = listcache[path_glob]
else:
for fil in glob.glob(os.path.join(root, path_glob)):
if os.path.isdir(fil):
continue
if fil.endswith(".meta") or fil.endswith("~"):
continue
st = os.stat(fil)
stattable.append(
{
"file_path": os.path.relpath(fil, root),
"file_name": os.path.split(fil)[-1],
"mtime": st.st_mtime,
"ctime": st.st_ctime,
"size": st.st_size,
"ext": os.path.splitext(fil)[1],
}
)
listcache[path_glob] = stattable
ret = sorted(stattable, key=lambda x: x[sort_order], reverse=reverse)
if limit > 0:
return itertools.islice(ret, limit)
return ret
return get_file_list
def file_name(root: str, metatree: MetaTree, processor_chains: ProcessorChains, namecache: Dict) -> Callable:
def get_file_name(file_name: str) -> Dict:
if file_name in namecache:
return namecache[file_name]
metadata = metatree.get_metadata(file_name)
chain = processor_chains.get_chain_for_filename(os.path.join(root, file_name), ctx=metadata)
namecache[file_name] = chain.output_filename
return namecache[file_name]
return get_file_name
def file_content(root: str, metatree: MetaTree, processor_chains: ProcessorChains, contcache: Dict) -> Callable:
def get_file_content(file_name: str) -> Iterable:
if file_name in contcache:
return contcache[file_name]
metadata = metatree.get_metadata(file_name)
chain = processor_chains.get_chain_for_filename(os.path.join(root, file_name), ctx=metadata)
contcache[file_name] = chain.output
return chain.output
return get_file_content
def file_metadata(metatree: MetaTree) -> Callable:
def get_file_metadata(file_name: str) -> Dict:
return metatree.get_metadata(file_name)
return get_file_metadata
def time_iso8601(timezone: str) -> Callable:
tz = pytz.timezone(timezone)
def get_time_iso8601(time_t: Union[int, float]) -> str:
return datetime.datetime.fromtimestamp(time_t, tz).isoformat("T")
return get_time_iso8601


@ -1,42 +0,0 @@
import mimetypes
import os
from typing import Dict, Optional
def merge_dicts(dict_a: Dict, dict_b: Dict) -> Dict:
"""Merge two dictionaries.
Arguments:
dict_a (dict): The dictionary to use as the base.
dict_b (dict): The dictionary to update the values with.
Returns:
dict: A new merged dictionary.
"""
dict_z = dict_a.copy()
dict_z.update(dict_b)
return dict_z
def guess_mime(path: str) -> Optional[str]:
"""Guess the mime type for a given path.
Arguments:
root (str): the root path of the file tree
path (str): the sub-path within the file tree
Returns:
str: the guessed mime-type
"""
mtypes = mimetypes.guess_type(path)
ftype = None
if os.path.isdir(path):
ftype = "directory"
elif os.access(path, os.F_OK) and mtypes[0]:
ftype = mtypes[0]
else:
ftype = "application/octet-stream"
return ftype

demo/bar/baz/quux/quuux → pyproject.toml Executable file → Normal file


@ -1,9 +1,11 @@
"""Package configuration.""" """Package configuration."""
from setuptools import find_packages, setup from setuptools import find_packages, setup
LONG_DESCRIPTION = """Pixywerk 2 is a filesystem based static site generator.""" from heckweasel import __version__
INSTALL_REQUIRES = ["yaml-1.3", "markdown", "jstyleson", "jinja2"] LONG_DESCRIPTION = """Heckweasel is a filesystem based static site generator."""
INSTALL_REQUIRES = ["yaml-1.3", "markdown", "jstyleson", "jinja2", "pygments"]
# Extra dependencies # Extra dependencies
EXTRAS_REQUIRE = { EXTRAS_REQUIRE = {
@ -26,7 +28,7 @@ EXTRAS_REQUIRE = {
SETUP_REQUIRES = ["pytest-runner>=2.7.1", "setuptools_scm>=1.15.0"] SETUP_REQUIRES = ["pytest-runner>=2.7.1", "setuptools_scm>=1.15.0"]
setup( setup(
author="Cassowary Rusnov", author="Cassowary Rusnov",
author_email="rusnovn@gmail.com", author_email="alderconestudio@gmail.com",
classifiers=[ classifiers=[
"Development Status :: 1 - Pre-alpha", "Development Status :: 1 - Pre-alpha",
"Environment :: Console", "Environment :: Console",
@ -49,11 +51,12 @@ setup(
keywords=["cms", "website", "compiler"], keywords=["cms", "website", "compiler"],
license="MIT", license="MIT",
long_description=LONG_DESCRIPTION, long_description=LONG_DESCRIPTION,
name="pixywerk2", name="heckweasel",
packages=find_packages(exclude=["*.tests", "*.tests.*"]), packages=find_packages(exclude=["*.tests", "*.tests.*"]),
platforms=["GNU/Linux"], platforms=["GNU/Linux"],
setup_requires=SETUP_REQUIRES, setup_requires=SETUP_REQUIRES,
use_scm_version=True, use_scm_version=True,
url="https://git.antpanethon.com/cas/pixywerk2", url="https://git.aldercone.studio/aldercone/heckweasel",
zip_safe=False, zip_safe=False,
version=__version__,
) )

tox.ini

@ -1,5 +1,5 @@
[tox] [tox]
envlist=py{36,37}-{code-quality, unit} #, py37-sphinx envlist=py{36,37,38,39}-{code-quality, unit} #, py37-sphinx
skipsdist = true skipsdist = true
[testenv] [testenv]
@ -7,16 +7,18 @@ setenv =
LANG = en_US.UTF-8 LANG = en_US.UTF-8
deps = .[tests] deps = .[tests]
commands = commands =
unit: py.test --strict --cov-report=term-missing --cov=pixywerk2 pixywerk2/tests/unit {posargs} unit: py.test --strict --cov-report=term-missing --cov=heckweasel heckweasel/tests/unit {posargs}
code-quality: flake8 pixywerk2 code-quality: flake8 heckweasel
code-quality: black -l 120 --check pixywerk2 code-quality: black -l 120 --check heckweasel
code-quality: - prospector -A code-quality: - prospector -A
code-quality: - mypy --ignore-missing-imports pixywerk2 code-quality: - mypy --ignore-missing-imports heckweasel
# sphinx: python setup.py build_sphinx -b html # sphinx: python setup.py build_sphinx -b html
# sphinx: python setup.py build_sphinx -b man # sphinx: python setup.py build_sphinx -b man
basepython = basepython =
py36: python3.6 py36: python3.6
py37: python3.7 py37: python3.7
py38: python3.8
py39: python3.9
[flake8] [flake8]
max-line-length = 120 max-line-length = 120