Development - Paperless-ngx (2024)

This section describes the steps you need to take to start development on Paperless-ngx.

Check out the source from GitHub. The repository is organized in the following way:

  • main always represents the latest release and will only see changes when a new release is made.
  • dev contains the code that will be in the next release.
  • feature-X contains bigger changes that will be in some release, but not necessarily the next one.

When making functional changes to Paperless-ngx, always make your changes on the dev branch.

Apart from that, the folder structure is as follows:

  • docs/ - Documentation.
  • src-ui/ - Code of the front end.
  • src/ - Code of the back end.
  • scripts/ - Various scripts that help with different parts of development.
  • docker/ - Files required to build the docker image.

Contributing to Paperless-ngx

Maybe you've been using Paperless-ngx for a while and want to add a feature or two, or maybe you've come across a bug that you have some ideas on how to solve. The beauty of open source software is that you can see what's wrong and help to get it fixed for everyone!

Before contributing, please review our code of conduct and other important information in the contributing guidelines.

Code formatting with pre-commit hooks

To ensure a consistent style and formatting across the project source, the project utilizes Git pre-commit hooks to perform some formatting and linting before a commit is allowed. That way, everyone uses the same style and some common issues can be caught early on.

Once installed, hooks will run when you commit. If the formatting isn't quite right or a linter catches something, the commit will be rejected. You'll need to look at the output and fix the issue. Some hooks, such as the Python linting and formatting tool ruff, will format failing files, so all you need to do is git add those files again and retry your commit.

General setup

After you have forked and cloned the code from GitHub, you need to perform a first-time setup.

Note

Every command is executed directly from the root folder of the project unless specified otherwise.

  1. Install the prerequisites and pipenv as mentioned in Bare metal route.

  2. Copy paperless.conf.example to paperless.conf and enable debug mode within the file via PAPERLESS_DEBUG=true.

  3. Create consume and media directories:

    $ mkdir -p consume media
  4. Install the Python dependencies:

    $ pipenv install --dev

    Note

    Using a virtual environment is highly recommended. You can spawn one via pipenv shell.

  5. Install pre-commit hooks:

    $ pre-commit install

  6. Apply migrations and create a superuser for your development instance:

    # src/
    $ python3 manage.py migrate
    $ python3 manage.py createsuperuser
  7. You can now either ...

    • install redis or

    • use the included scripts/start_services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or

    • spin up a bare redis container

    $ docker run -d -p 6379:6379 --restart unless-stopped redis:latest
  8. Continue with either back-end or front-end development – or both :-).
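Before moving on, you may want to confirm that steps 3 and 7 took effect. The following is a minimal sketch, not part of Paperless-ngx: the helpers missing_dirs, redis_reachable and setup_problems are hypothetical, and the Redis check assumes the default localhost:6379 used by the docker run command above. It only tests that the port accepts connections, not that Redis itself is healthy.

```python
import socket
from pathlib import Path


def missing_dirs(root: Path) -> list[str]:
    """Return the development directories from step 3 that do not exist under root."""
    return [name for name in ("consume", "media") if not (root / name).is_dir()]


def redis_reachable(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only shows that the port is open; it does not speak the Redis protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host and timeouts all end up here.
        return False


def setup_problems(root: Path) -> list[str]:
    """Collect human-readable problems with the local development setup."""
    problems = [f"missing directory: {name}/" for name in missing_dirs(root)]
    if not redis_reachable():
        problems.append("nothing is listening on localhost:6379 (expected Redis)")
    return problems
```

Calling setup_problems(Path.cwd()) from the project root should return an empty list once the directories exist and a Redis instance is running.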

Back end development

The back end is a Django application. PyCharm as well as Visual Studio Code work well for development, but you can use whatever you want.

Configure the IDE to use the src/ folder as the base source folder. Configure the following launch configurations in your IDE:

  • python3 manage.py runserver
  • python3 manage.py document_consumer
  • celery --app paperless worker -l DEBUG (or any other log level)

To start them all:

# src/
$ python3 manage.py runserver & \
  python3 manage.py document_consumer & \
  celery --app paperless worker -l DEBUG
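If you would rather not rely on shell job control, the same three processes can be started from Python. This is a minimal sketch using only the standard library's subprocess module; run_all and DEV_COMMANDS are hypothetical names, and it assumes the script runs inside the project's virtual environment, so that sys.executable is that environment's interpreter and celery is on the PATH.

```python
import subprocess
import sys

# The three launch configurations from above; they are run from the src/ folder.
DEV_COMMANDS = [
    [sys.executable, "manage.py", "runserver"],
    [sys.executable, "manage.py", "document_consumer"],
    ["celery", "--app", "paperless", "worker", "-l", "DEBUG"],
]


def run_all(commands: list[list[str]], cwd: str = "src") -> list[int]:
    """Start every command, wait for all of them, and return their exit codes.

    Ctrl+C stops waiting and terminates any process that is still running.
    """
    procs = [subprocess.Popen(cmd, cwd=cwd) for cmd in commands]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
    return [proc.returncode for proc in procs]
```

From the project root, run_all(DEV_COMMANDS) starts all three. The design choice here is that one Ctrl+C stops everything, whereas with the & chain the backgrounded jobs can keep running after you interrupt the foreground one.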

You might need the front end to test your back end code. This assumes that you have the Angular CLI installed on your system. Go to the Front end development section for further details. To build the front end once, use this command:

# src-ui/
$ npm install
$ ng build --configuration production

Testing

  • Run pytest in the src/ directory to execute all tests. This also generates an HTML coverage report. When running tests, paperless.conf is loaded as well. However, the tests rely on the default configuration. This is not ideal, but for now, make sure no settings except for DEBUG are overridden when testing.
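Because the tests assume the default configuration, a quick look at paperless.conf before running pytest can save some confusion. This is a minimal sketch, assuming the file uses one KEY=value setting per line with # comments, as paperless.conf.example does; overridden_settings and check_conf are hypothetical helpers, not part of Paperless-ngx.

```python
from pathlib import Path

# The only setting the tests tolerate being changed.
ALLOWED_DURING_TESTS = {"PAPERLESS_DEBUG"}


def overridden_settings(conf_text: str) -> list[str]:
    """Return active setting names other than PAPERLESS_DEBUG, in file order."""
    names = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and commented-out defaults.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name not in ALLOWED_DURING_TESTS:
            names.append(name)
    return names


def check_conf(path: Path = Path("paperless.conf")) -> list[str]:
    """Read the config file if it exists; a missing file means nothing is overridden."""
    return overridden_settings(path.read_text()) if path.is_file() else []
```

If check_conf() returns a non-empty list, comment those lines out before running the test suite.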

Note

The line length rule E501 is generally useful for getting multiple source files next to each other on the screen. However, in some cases it's just not possible to make some lines fit, especially complicated if conditions. Append # noqa: E501 to disable this check for certain lines.
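For example, a condition that reads better on a single line can be kept that way by ending the line with the marker. The function and its parameters below are made up purely for illustration.

```python
def should_create_archive(
    mime_type: str,
    has_text_layer: bool,
    skip_archive: bool,
    force_ocr: bool,
) -> bool:
    # Kept on one line on purpose; the trailing marker disables only E501
    # for this line, so every other check still applies to it.
    if force_ocr or (mime_type == "application/pdf" and not has_text_layer and not skip_archive):  # noqa: E501
        return True
    return False
```

Note that the signature was wrapped instead of suppressed: reserve the marker for lines where wrapping would genuinely hurt readability.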

Front end development

The front end is built using Angular. In order to get started, you need Node.js (version 14.15+) and npm.

Note

The following commands are all performed in the src-ui directory. You will need a running back end (including an active session) to connect to the back end API. To spin it up, refer to the commands under the section above.

  1. Install the Angular CLI. You might need sudo privileges to perform this command:

    $ npm install -g @angular/cli
  2. Make sure that it's on your path.

  3. Install all necessary modules:

    $ npm install
  4. You can launch a development server by running:

    $ ng serve

    This will automatically update whenever you save. However, in-place compilation might fail on syntax errors, in which case you need to restart it.

    By default, the development server is available on http://localhost:4200/ and is configured to access the API at http://localhost:8000/api/, which is the back end's default. If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and X-Frame-Options are in place so that the front end behaves exactly as in production.
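To confirm that the back end the development server talks to is actually up, you can probe the API URL. This is a minimal sketch with the standard library's urllib; api_status is a hypothetical helper. It treats any HTTP response, including 401 or 403 when you have no active session, as a sign that the back end is running, and returns None when nothing answers.

```python
import urllib.error
import urllib.request
from typing import Optional

DEFAULT_API_URL = "http://localhost:8000/api/"


def api_status(url: str = DEFAULT_API_URL, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no server answered."""
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # since the back end runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 401 or 403 still means Django answered; you are just not logged in.
        return err.code
    except OSError:
        # URLError (e.g. connection refused) and timeouts are OSError subclasses.
        return None
```

A result of None usually means runserver is not running or listens on a different port.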

Testing and code style

The front end code (.ts, .html, .scss) uses prettier for code formatting via the Git pre-commit hooks, which run automatically on commit. See above for installation instructions. You can also run this via the CLI with a command such as

$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files

Front end testing uses Jest and Playwright. Unit tests and e2e tests, respectively, can be run non-interactively with:

$ ng test
$ npx playwright test

Playwright also includes a UI which can be run with:

$ npx playwright test --ui

Building the frontend

In order to build the front end and serve it as part of Django, execute:

$ ng build --configuration production

This will build the front end and put it in a location from which the Django server will serve it as static content. This way, you can verify that authentication is working.

Localization

Paperless-ngx is available in many different languages. Since Paperless-ngx consists of both a Django application and an Angular front end, both these parts have to be translated separately.

Front end localization

  • The Angular front end does localization according to the Angular documentation.
  • The source language of the project is "en_US".
  • The source strings end up in the file src-ui/messages.xlf.
  • The translated strings need to be placed in the src-ui/src/locale/ folder.
  • In order to extract added or changed strings from the source files, call ng extract-i18n.

Adding new languages requires adding the translated files in the src-ui/src/locale/ folder and adjusting a couple of files.

  1. Adjust src-ui/angular.json:

    "i18n": {
      "sourceLocale": "en-US",
      "locales": {
        "de": "src/locale/messages.de.xlf",
        "nl-NL": "src/locale/messages.nl_NL.xlf",
        "fr": "src/locale/messages.fr.xlf",
        "en-GB": "src/locale/messages.en_GB.xlf",
        "pt-BR": "src/locale/messages.pt_BR.xlf",
        "language-code": "language-file"
      }
    }
  2. Add the language to the LANGUAGE_OPTIONS array in src-ui/src/app/services/settings.service.ts:

    `dateInputFormat` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy".
  3. Import and register the Angular data for this locale in src-ui/src/app/app.module.ts:

    import { registerLocaleData } from '@angular/common'
    import localeDe from '@angular/common/locales/de'

    registerLocaleData(localeDe)
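The file names in angular.json follow a visible pattern: the locale code with its hyphen replaced by an underscore. If you want to check a new entry before editing the files, that pattern and the dateInputFormat rule can be expressed as a small Python helper. This is a hypothetical sketch, not part of the Paperless-ngx build; locale_file, add_locale and valid_date_input_format are made-up names, and the helpers work on the i18n section shown above rather than locating it inside angular.json.

```python
REQUIRED_DATE_TOKENS = ("dd", "mm", "yyyy")


def locale_file(code: str) -> str:
    """Map an Angular locale code to its translation file, e.g. nl-NL -> messages.nl_NL.xlf."""
    return f"src/locale/messages.{code.replace('-', '_')}.xlf"


def add_locale(i18n: dict, code: str) -> dict:
    """Return a copy of the i18n section with code registered; existing entries are kept."""
    locales = dict(i18n.get("locales", {}))
    locales.setdefault(code, locale_file(code))
    return {**i18n, "locales": locales}


def valid_date_input_format(fmt: str) -> bool:
    """dateInputFormat must contain "dd", "mm" and "yyyy"."""
    return all(token in fmt for token in REQUIRED_DATE_TOKENS)
```

For instance, add_locale applied to the section above with "pt-BR" leaves the existing entry untouched, while a new code gets the matching messages.xx_YY.xlf path.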

Back end localization

A majority of the strings that appear in the back end appear only when the admin is used. However, some of these are still shown on the front end (such as error messages).

  • The Django application does localization according to the Django documentation.
  • The source language of the project is "en_US".
  • Localization files end up in the folder src/locale/.
  • In order to extract strings from the application, call python3 manage.py makemessages -l en_US. This is important after making changes to translatable strings.
  • The message files need to be compiled for them to show up in the application. Call python3 manage.py compilemessages to do this. The generated files don't get committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.

Adding new languages requires adding the translated files in the src/locale/ folder and adjusting the file src/paperless/settings.py to include the new language:

LANGUAGES = [
    ("en-us", _("English (US)")),
    ("en-gb", _("English (GB)")),
    ("de", _("German")),
    ("nl-nl", _("Dutch")),
    ("fr", _("French")),
    ("pt-br", _("Portuguese (Brazil)")),
    # Add language here.
]
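To catch a language that was added to LANGUAGES without its translation files, you can compare the list against src/locale/. This is a hypothetical sketch: it assumes each language lives in a standard Django locale folder of the form locale/LC_MESSAGES/django.po, and its to_locale is a simplified mirror of Django's django.utils.translation.to_locale for codes such as "en-us" or "de" (prefer the real function when Django is importable).

```python
from pathlib import Path


def to_locale(language: str) -> str:
    """Simplified mirror of Django's to_locale: "en-us" -> "en_US", "de" -> "de"."""
    lang, _sep, country = language.lower().partition("-")
    if not country:
        return lang
    # Two-letter regions are upper-cased; longer subtags (e.g. "latn") are title-cased.
    return f"{lang}_{country.upper() if len(country) == 2 else country.title()}"


def missing_translations(languages, locale_dir: Path = Path("src/locale")) -> list[str]:
    """Return locale names from LANGUAGES that have no django.po file under locale_dir."""
    missing = []
    for code, _name in languages:
        locale = to_locale(code)
        if not (locale_dir / locale / "LC_MESSAGES" / "django.po").is_file():
            missing.append(locale)
    return missing
```

Run from the project root, missing_translations(LANGUAGES) lists the locale folders you still need to add or generate with makemessages.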

Building the documentation

The documentation is built using material-mkdocs; see their documentation. If you want to build the documentation locally, this is how you do it:

  1. Have an active pipenv shell (pipenv shell) and install Python dependencies:

    $ pipenv install --dev
  2. Build the documentation:

    $ mkdocs build --config-file mkdocs.yml

    alternatively...

  3. Serve the documentation. This will spin up a copy of the documentation at http://127.0.0.1:8000 that will automatically refresh every time you change something.

    $ mkdocs serve

Building the Docker image

The Docker image is primarily built by the GitHub Actions workflow, but when developing it can be faster to build and tag an image locally.

Building the image works as with any image:

docker build --file Dockerfile --tag paperless:local --progress simple .

Extending Paperless-ngx

Paperless-ngx does not have any fancy plugin systems and will probably never have. However, some parts of the application have been designed to allow easy integration of additional features without any modification to the base code.

Making custom parsers

Paperless-ngx uses parsers to add documents. A parser is responsible for:

  • Retrieving the content from the original
  • Creating a thumbnail
  • Optional: Retrieving a created date from the original
  • Optional: Creating an archived document from the original

Custom parsers can be added to Paperless-ngx to support more file types. In order to do that, you need to write the parser itself and announce its existence to Paperless-ngx.

The parser itself must extend documents.parsers.DocumentParser and must implement the methods parse and get_thumbnail. You can provide your own implementation of get_date if you don't want to rely on Paperless-ngx's default date guessing mechanisms.

import os

from documents.parsers import DocumentParser


class MyCustomParser(DocumentParser):
    def parse(self, document_path, mime_type):
        # This method does not return anything. Rather, you should assign
        # whatever you got from the document to the following fields:

        # The content of the document.
        self.text = "content"

        # Optional: path to a PDF document that you created from the original.
        self.archive_path = os.path.join(self.tempdir, "archived.pdf")

        # Optional: "created" date of the document.
        # get_created_from_metadata is a placeholder for your own date extraction.
        self.date = get_created_from_metadata(document_path)

    def get_thumbnail(self, document_path, mime_type):
        # This should return the path to a thumbnail you created for this
        # document.
        return os.path.join(self.tempdir, "thumb.webp")

If you encounter any issues during parsing, raise a documents.parsers.ParseError.

The self.tempdir directory is a temporary directory that is guaranteed to be empty and removed after consumption finished. You can use that directory to store any intermediate files and also use it to store the thumbnail / archived document.
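The interplay of parse, self.tempdir and ParseError can be tried outside Paperless-ngx with stand-ins. This is a hypothetical sketch: ParseError and DocumentParserStandIn below only imitate documents.parsers.ParseError and documents.parsers.DocumentParser so the code runs on its own, and consume is a made-up harness that mimics the "fresh, empty directory, removed afterwards" guarantee. It illustrates parse only; get_thumbnail is left out because rendering an image would need a third-party imaging library.

```python
import tempfile
from pathlib import Path


class ParseError(Exception):
    """Stand-in for documents.parsers.ParseError."""


class DocumentParserStandIn:
    """Stand-in for documents.parsers.DocumentParser; the real base class does far more."""

    def __init__(self, tempdir: str):
        self.tempdir = tempdir
        self.text = None
        self.archive_path = None
        self.date = None


class PlainTextParser(DocumentParserStandIn):
    def parse(self, document_path, mime_type):
        try:
            self.text = Path(document_path).read_text(encoding="utf-8")
        except UnicodeDecodeError as err:
            # Report unusable input as a ParseError rather than an arbitrary exception.
            raise ParseError(f"{document_path} is not valid UTF-8") from err
        # Intermediate files belong in self.tempdir; this normalized copy is only an example.
        (Path(self.tempdir) / "normalized.txt").write_text(self.text.strip() + "\n", encoding="utf-8")


def consume(parser_cls, document_path, mime_type):
    """Hypothetical harness: a fresh, empty tempdir per document, removed afterwards."""
    with tempfile.TemporaryDirectory() as tempdir:
        parser = parser_cls(tempdir)
        parser.parse(document_path, mime_type)
        return parser.text
```

Because the directory disappears when consume returns, anything you want to keep (such as the archived PDF) must be picked up by the caller before that point, which is the same reason the real parser only reports paths inside self.tempdir.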

After that, you need to announce your parser to Paperless-ngx. You need to connect a handler to the document_consumer_declaration signal. Have a look in the file src/paperless_tesseract/apps.py on how that's done. The handler is a method that returns information about your parser:

def myparser_consumer_declaration(sender, **kwargs):
    return {
        "parser": MyCustomParser,
        "weight": 0,
        "mime_types": {
            "application/pdf": ".pdf",
            "image/jpeg": ".jpg",
        },
    }

  • parser is a reference to a class that extends DocumentParser.
  • weight is used whenever two or more parsers are able to parse a file: The parser with the higher weight wins. This can be used to override the parsers provided by Paperless-ngx.
  • mime_types is a dictionary. The keys are the mime types your parser supports and the value is the default file extension that Paperless-ngx should use when storing files and serving them for download. We could guess that from the file extensions, but some mime types have many extensions associated with them and the Python methods responsible for guessing the extension do not always return the same value.
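The weight rule and the extension lookup described above can be sketched as follows. This is not Paperless-ngx's actual selection code: select_parser and default_extension are hypothetical helpers operating on dicts shaped like the one returned by myparser_consumer_declaration. When two declarations have the same weight, this sketch simply keeps the first one, since the rule above does not say how ties are broken.

```python
from typing import Optional


def select_parser(declarations: list[dict], mime_type: str) -> Optional[dict]:
    """Return the declaration with the highest weight that supports mime_type, or None."""
    candidates = [d for d in declarations if mime_type in d["mime_types"]]
    if not candidates:
        return None
    # max() returns the first of several equally weighted candidates.
    return max(candidates, key=lambda d: d["weight"])


def default_extension(declaration: dict, mime_type: str) -> str:
    """Look up the extension the declaration asks Paperless-ngx to use for mime_type."""
    return declaration["mime_types"][mime_type]
```

The explicit mapping sidesteps the ambiguity mentioned above: the standard library's mimetypes.guess_all_extensions("image/jpeg"), for instance, usually returns several candidates, and their order depends on the platform's mime tables.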

Author: Allyn Kozey
