paperless

Prerequisite

Create database

Create the password with openssl rand -base64 32
- Save this value in secrets.yaml as postgresql.password.paperless
- On the infra server, open a psql session with podman exec -it postgresql psql -U postgres and run the statements below, replacing the quoted placeholder with the saved password

CREATE USER paperless WITH PASSWORD 'postgresql.password.paperless';
CREATE DATABASE paperless_db;
ALTER DATABASE paperless_db OWNER TO paperless;
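
The statements above can also be generated and piped into psql without an interactive session. The following is only a sketch: paperless_sql is a hypothetical helper name, and the usage line assumes the same postgresql container as the step above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints the SQL for a given password.
paperless_sql() {
    # $1 = password; single quotes are doubled so the SQL string literal stays valid
    pw=$(printf '%s' "$1" | sed "s/'/''/g")
    cat <<EOF
CREATE USER paperless WITH PASSWORD '$pw';
CREATE DATABASE paperless_db;
ALTER DATABASE paperless_db OWNER TO paperless;
EOF
}

# Usage (assumption: the container is named postgresql, as in the step above):
#   paperless_sql "$password" | podman exec -i postgresql psql -U postgres
```

Note that the piped form uses -i without -t, because a TTY would interfere with reading the statements from standard input.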

Create oidc secret and hash

Create the secret with openssl rand -base64 32
Access the auth VM and hash the secret:
- podman exec -it authelia sh
- authelia crypto hash generate pbkdf2 --password 'paperless.oidc.secret'
Save the secret and its hash in secrets.yaml as paperless.oidc.secret and paperless.oidc.hash
Use client_secret_post, because Django frequently encodes the secret value incorrectly.

Create session secret value

Create the secret with LC_ALL=C tr -dc 'A-Za-z0-9!#%&()*+,-./:;<=>?@[\]^_{|}~' </dev/urandom | head -c 32
- Save this value in secrets.yaml in paperless.session_secret
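
A minimal sketch of the same command wrapped in a length check, in case the pipeline is ever cut short. The variable name secret is only illustrative.

```shell
#!/bin/sh
# Generate a 32-character session secret from the character set used above.
secret=$(LC_ALL=C tr -dc 'A-Za-z0-9!#%&()*+,-./:;<=>?@[\]^_{|}~' </dev/urandom | head -c 32)

# Sanity check: the value must be exactly 32 characters long.
if [ "${#secret}" -ne 32 ]; then
    echo "unexpected secret length: ${#secret}" >&2
    exit 1
fi

printf '%s\n' "$secret"
```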

Create admin password

Create the secret with openssl rand -base64 32
Save this value in secrets.yaml in paperless.il.password

Add postgresql dump backup list

set_postgresql.yaml

- name: Set connected services list
  ansible.builtin.set_fact:
    connected_services:
      - ...
      - "paperless"

Configuration

Access to paperless

https://paperless.ilnmors.com
- name: il
- E-mail: il@ilnmors.internal
- password: paperless.il.password

OAuth configuration

My Profiles: Connect new social account: Authelia
- Continue
- Login with Authelia

OCR configuration

Configuration: OCR settings
- Output Type: pdfa
- Mode: skip
  - If the archive file has broken OCR text, run the replace command manually
- Skip archive file: never
- Deskew: disable (toggle on, then off again, so the disabled state is explicitly saved)
- Rotate: disable (toggle on, then off again, so the disabled state is explicitly saved)
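
If the instance runs paperless-ngx, the same OCR choices can probably also be expressed as environment variables. This is a sketch, not the configuration used in this repository: where the variables are loaded from (for example a container env file) is an assumption, and values saved in the UI may take precedence over them.

```shell
#!/bin/sh
# paperless-ngx OCR settings mirroring the UI values listed above.
export PAPERLESS_OCR_OUTPUT_TYPE=pdfa          # Output Type: pdfa
export PAPERLESS_OCR_MODE=skip                 # Mode: skip
export PAPERLESS_OCR_SKIP_ARCHIVE_FILE=never   # Skip archive file: never
export PAPERLESS_OCR_DESKEW=false              # Deskew: disabled
export PAPERLESS_OCR_ROTATE_PAGES=false        # Rotate: disabled
```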

The non-standard pdf file

Some PDF files do not follow the standard, for example PDF files issued by Korean courts or government agencies.
Convert such non-standard PDF files before uploading them.
The conversion uses Ghostscript, run from a PowerShell console on Windows.

# 1. The engine
$gsPath = "C:\Program Files\gs\gs10.07.0\bin\gswin64c.exe"

# 2. Output folder where the converted files will be stored
$outputDirName = "converted_pdfs"
$outputDir = Join-Path (Get-Location) $outputDirName
if (!(Test-Path $outputDir)) { New-Item -ItemType Directory -Path $outputDir }

# 3. Find all pdf files
$files = Get-ChildItem -Filter *.pdf

foreach ($file in $files) {
    if ($file.FullName -like "*$outputDirName*") { continue }
    
    $inputPath = $file.FullName
    $outputPath = Join-Path $outputDir $file.Name
    
    Write-Host "convert: $($file.Name)" -ForegroundColor Cyan
    
    $gsArgs = @(
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/default",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        "-dNoOutputFonts", # Convert all text to vector outlines instead of embedding fonts
        "-sOutputFile=$outputPath",
        "$inputPath"
    )
    
    # Run Ghostscript
    & $gsPath @gsArgs
}

Write-Host "`n[Complete] All files are stored in '$outputDirName'." -ForegroundColor Green
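
For a Linux or macOS console, a roughly equivalent sketch of the PowerShell script above is shown here. It assumes Ghostscript is installed and reachable as gs on PATH; convert_pdf is a hypothetical helper name, and the Ghostscript options are copied unchanged from the script above.

```shell
#!/bin/sh
# Assumption: Ghostscript is installed and available as `gs` on PATH.
out_dir="converted_pdfs"

convert_pdf() {
    # $1 = input PDF, $2 = output PDF
    gs -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS=/default \
       -dNOPAUSE -dQUIET -dBATCH \
       -dNoOutputFonts \
       -sOutputFile="$2" \
       "$1"
}

# Convert every PDF in the current directory (non-recursive).
for f in ./*.pdf; do
    [ -e "$f" ] || continue      # the glob matched nothing: no PDFs here
    mkdir -p "$out_dir"
    echo "convert: $(basename "$f")"
    convert_pdf "$f" "$out_dir/$(basename "$f")"
done
```

Because the loop only looks at the current directory, files already written to converted_pdfs are not picked up again.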
