Replace custom script with Paperless-ngx CLI
marcelbrueckner committed Sep 17, 2024
1 parent 1f74ed4 commit c40617d
Showing 4 changed files with 13 additions and 101 deletions.
23 changes: 9 additions & 14 deletions docs/post-consumption/content-matching.md
@@ -8,26 +8,28 @@ Paperless-ngx does a great job matching documents with correct correspondents, s
However, there are documents for which the automatic matching doesn't work or a single regular expression match isn't sufficient.
For such cases, further examining the document's content after consumption is necessary.

## Update document details via organize
## Update document details via organize and the Paperless-ngx CLI

[organize](https://github.com/tfeldmann/organize) is an open-source, command-line file management automation tool.
It lets you execute actions based on custom filters, which can be easily defined in YAML.

Probably the most helpful filter in this context is the `filecontent` filter. The document's content can be matched with regular expressions
which allows to dynamically re-use (parts of) the matched content in subsequent actions.
Probably the most helpful filter in this context is the `filecontent` filter. It matches the document's content against regular expressions, so (parts of) the matched content can be reused dynamically in subsequent actions.
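
To illustrate how a named group makes part of the match reusable, here is a minimal Python sketch. The pattern is the one from the rule in `organize.config.yml.tpl`; the helper name `extract_amount` and the sample text are made up for illustration, and the sketch assumes the filter applies the pattern with search-like semantics (a match anywhere in the text).

```python
import re
from typing import Optional

# Pattern copied from the rule in organize.config.yml.tpl.
AMOUNT_PATTERN = re.compile(r"Amount due.*(?P<amount>\d{2}\.\d{2})")


def extract_amount(text: str) -> Optional[str]:
    """Return the value of the named group `amount`, or None if nothing matches.

    Hypothetical helper, not part of organize or the Paperless-ngx CLI.
    """
    match = AMOUNT_PATTERN.search(text)
    return match.group("amount") if match else None


# Made-up document content for illustration.
sample = "Invoice 2024-0917\nAmount due: EUR 42.50\n"
print(extract_amount(sample))  # 42.50
```

Note that the pattern only captures two digits before the decimal point, so for a line such as `Amount due: 142.50` the sketch returns `42.50`. Widen `\d{2}` if your documents can contain larger amounts.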

The following script

1. ensures that a newly-consumed document gets assigned a proper title based on the document's content.
This helps to stick to a consistent naming pattern for documents that you receive regularly, e.g. invoices.
2. extracts a value out of the document content and stores it in a given custom field

The Paperless-ngx CLI can be used to update other fields as well. Check the CLI's help or [GitHub repository](https://github.com/marcelbrueckner/paperless-ngx-cli) for more information.
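
The `pngx edit` call used in the organize rule wraps the title in single quotes, which can break if a matched title itself contains a quote. As a rough sketch (not part of this repository), the same command line could be composed in Python with proper shell quoting. The function name `build_pngx_edit` and the example values are hypothetical; only the `pngx edit <id> --title ... --custom-fields <id>=<value>` shape is taken from the rule.

```python
import shlex


def build_pngx_edit(document_id: int, title: str, field_id: int, value: str) -> str:
    """Compose a shell-safe `pngx edit` command line.

    Hypothetical helper; it mirrors the command shape used in the organize
    rule and handles exactly one custom field.
    """
    parts = [
        "pngx", "edit", str(document_id),
        "--title", title,
        "--custom-fields", f"{field_id}={value}",
    ]
    # shlex.join quotes each argument only where the shell requires it.
    return shlex.join(parts)


# Made-up values for illustration.
print(build_pngx_edit(42, "Invoice 2024-09", 1, "12.34"))
# pngx edit 42 --title 'Invoice 2024-09' --custom-fields 1=12.34
```

Running the composed string still requires `PNGX_TOKEN` and `PNGX_HOST` to be set in the environment, as described in the configuration below.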

### Prerequisites

For this solution to work, you will need to install the following packages:

* [organize-tool](https://pypi.org/project/organize-tool/)
* [poppler](https://poppler.freedesktop.org/)[^1]
* [pypaperless-cli](https://pypi.org/project/pypaperless-cli/)

[^1]: Poppler is required for organize's `filecontent` filter to work, see [https://github.com/tfeldmann/organize/issues/322](https://github.com/tfeldmann/organize/issues/322).

@@ -41,8 +43,7 @@ Sticking to the general idea of our scripts folder layout, we will end up with f
paperless-ngx/
├─ my-post-consumption-scripts/
│ ├─ organize/
│ │ ├─ organize.config.yml.tpl
│ │ └─ pngx-update-document.py
│ │ └─ organize.config.yml.tpl
│ └─ post-consumption-wrapper.sh
# Obviously the below file only exists
# if you're running Paperless-ngx via Docker Compose
@@ -57,9 +58,10 @@ paperless-ngx/

```bash
# Token to access the REST API
PAPERLESS_TOKEN=
PNGX_TOKEN=
# Your Paperless-ngx URL, without trailing slash
PAPERLESS_URL=
# If running your post-consumption script within Docker, it's likely to be http://localhost:8000
PNGX_HOST=
```

=== "organize.config.yml.tpl"
@@ -68,12 +70,6 @@ paperless-ngx/
--8<-- "scripts/post-consumption/content-matching/organize.config.yml.tpl"
```

=== "pngx-update-document.py"

```python
--8<-- "scripts/post-consumption/content-matching/pngx-update-document.py"
```

=== "post-consumption-wrapper.sh"

```bash
@@ -89,4 +85,3 @@ paperless-ngx/
## Notes

Script files can also be found on [GitHub](https://github.com/marcelbrueckner/paperless.sh/tree/main/scripts/post-consumption/content-matching).

Original file line number Diff line number Diff line change
@@ -5,5 +5,7 @@
# Add additional information to consumed documents
# based on hypercomplex ;) rules
# https://github.com/tfeldmann/organize/
# https://github.com/marcelbrueckner/paperless-ngx-cli
apt-get install poppler-utils
pip install organize-tool
pip install --root-user-action=ignore organize-tool
pip install --root-user-action=ignore pypaperless-cli
Original file line number Diff line number Diff line change
@@ -20,5 +20,5 @@ rules:
- filecontent: 'Amount due.*(?P<amount>\d{2}\.\d{2})'
actions:
- echo: "Home Assistant hooray"
- shell: "./pngx-update-document.py --url http://localhost:8000 --document-id {env.DOCUMENT_ID} --title '{filecontent.title}' --custom-field-id 1 --custom-field-value {filecontent.amount}"
- shell: "pngx edit {env.DOCUMENT_ID} --title '{filecontent.title}' --custom-fields 1={filecontent.amount}"
- echo: "{shell.output}"
85 changes: 0 additions & 85 deletions scripts/post-consumption/content-matching/pngx-update-document.py

This file was deleted.
