paperless

Prerequisite

Create database

  • Create the password with openssl rand -base64 32
    • Save this value in secrets.yaml in postgresql.password.paperless
    • Access the infra server and create paperless_db with podman exec -it postgresql psql -U postgres
-- Replace the quoted placeholder with the value saved in postgresql.password.paperless
CREATE USER paperless WITH PASSWORD 'postgresql.password.paperless';
CREATE DATABASE paperless_db;
ALTER DATABASE paperless_db OWNER TO paperless;
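
The steps above can be sketched as a small shell script. This is a minimal sketch, assuming the container is named postgresql as in the command above; the names db_password, paperless_sql and create_paperless_db are hypothetical, and the generated value still has to be copied into secrets.yaml by hand.

```shell
#!/bin/sh
# Generate a random password: 32 random bytes, base64-encoded (44 characters).
# Base64 output never contains a single quote, so it is safe inside the SQL literal below.
db_password="$(openssl rand -base64 32)"

# Print the three SQL statements with the given password substituted in.
paperless_sql() {
  printf "CREATE USER paperless WITH PASSWORD '%s';\n" "$1"
  printf 'CREATE DATABASE paperless_db;\n'
  printf 'ALTER DATABASE paperless_db OWNER TO paperless;\n'
}

# Hypothetical helper: pipe the statements into psql inside the container.
# It is only defined here, not called; run it on the infra server.
create_paperless_db() {
  paperless_sql "$1" | podman exec -i postgresql psql -U postgres
}
```

On the infra server, `create_paperless_db "$db_password"` would then run all three statements in one non-interactive call.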

Create OIDC secret and hash

  • Create the secret with openssl rand -base64 32
  • Access the auth VM
    • podman exec -it authelia sh
    • authelia crypto hash generate pbkdf2 --password 'paperless.oidc.secret' (replace the quoted placeholder with the generated secret)
  • Save the secret in secrets.yaml as paperless.oidc.secret and the generated hash as paperless.oidc.hash
  • Use client_secret_post; Django frequently encodes the secret value incorrectly.
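
The secret and hash steps can also be run without opening a shell inside the container. A minimal sketch, assuming the container is named authelia as above; oidc_secret and hash_oidc_secret are hypothetical names.

```shell
#!/bin/sh
# Plaintext client secret; this is the value for paperless.oidc.secret.
oidc_secret="$(openssl rand -base64 32)"

# Hypothetical helper: hash the secret with the authelia binary inside the container.
# It is only defined here, not called; run it on the auth VM.
hash_oidc_secret() {
  podman exec authelia authelia crypto hash generate pbkdf2 --password "$1"
}
```

Running `hash_oidc_secret "$oidc_secret"` on the auth VM prints the digest to store in paperless.oidc.hash.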

Create session secret value

  • Create the secret with LC_ALL=C tr -dc 'A-Za-z0-9!#%&()*+,-./:;<=>?@[\]^_{|}~' </dev/urandom | head -c 32
    • Save this value in secrets.yaml in paperless.session_secret
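
A short sketch of the same command with a length check added. The variable name session_secret is an assumption for illustration. The character set contains no quotes and no dollar sign, which keeps the value easy to paste into YAML or shell.

```shell
#!/bin/sh
# Read random bytes, keep only the allowed characters, stop after 32 of them.
# LC_ALL=C makes tr treat the input as raw bytes rather than multibyte text.
session_secret="$(LC_ALL=C tr -dc 'A-Za-z0-9!#%&()*+,-./:;<=>?@[\]^_{|}~' </dev/urandom | head -c 32)"

# Sanity check: the secret must be exactly 32 characters long.
if [ "${#session_secret}" -ne 32 ]; then
  echo "unexpected secret length: ${#session_secret}" >&2
fi
```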

Create admin password

  • Create the secret with openssl rand -base64 32
  • Save this value in secrets.yaml in paperless.il.password

Add to the PostgreSQL dump backup list

- name: Set connected services list
  ansible.builtin.set_fact:
    connected_services:
      - ...
      - "paperless"

Configuration

Access to paperless

OAuth configuration

  • My Profiles: Connect new social account: Authelia
    • Continue
    • Login with Authelia

OCR configuration

  • Configuration: OCR settings
    • Output Type: pdfa
    • Mode: skip
      • When the archive file has broken OCR text, run the replace command manually
    • Skip archive file: never
    • Deskew: disable (toggle to enable, then toggle once more to activate the explicit disable option)
    • Rotate: disable (toggle to enable, then toggle once more to activate the explicit disable option)
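
The same choices can also be written as environment variables for the paperless container. This is a sketch, assuming the PAPERLESS_OCR_* variable names from the paperless-ngx configuration documentation; this document sets the values in the UI, and whether UI values take precedence over the environment may depend on the paperless-ngx version.

```shell
#!/bin/sh
# OCR settings mirroring the UI choices listed above.
export PAPERLESS_OCR_OUTPUT_TYPE=pdfa        # Output Type: pdfa
export PAPERLESS_OCR_MODE=skip               # Mode: skip
export PAPERLESS_OCR_SKIP_ARCHIVE_FILE=never # Skip archive file: never
export PAPERLESS_OCR_DESKEW=false            # Deskew: disable
export PAPERLESS_OCR_ROTATE_PAGES=false      # Rotate: disable
```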

Non-standard PDF files

  • Some PDF files don't follow the standard, for example Korean court or government PDF files.
  • Before uploading this kind of non-standard PDF file, convert it first.
  • This process uses Ghostscript from a PowerShell console on Windows
# 1. Path to the Ghostscript console executable (adjust to the installed version)
$gsPath = "C:\Program Files\gs\gs10.07.0\bin\gswin64c.exe"

# 2. Folder where the converted files will be stored
$outputDirName = "converted_pdfs"
$outputDir = Join-Path (Get-Location) $outputDirName
if (!(Test-Path $outputDir)) { New-Item -ItemType Directory -Path $outputDir | Out-Null }

# 3. Find all pdf files
$files = Get-ChildItem -Filter *.pdf

foreach ($file in $files) {
    if ($file.FullName -like "*$outputDirName*") { continue }
    
    $inputPath = $file.FullName
    $outputPath = Join-Path $outputDir $file.Name
    
    Write-Host "convert: $($file.Name)" -ForegroundColor Cyan
    
    $gsArgs = @(
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/default",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        "-dNoOutputFonts", # Convert text to vector outlines instead of embedding fonts
        "-sOutputFile=$outputPath",
        "$inputPath"
    )
    
    # Run Ghostscript
    & $gsPath @gsArgs
}

Write-Host "`n[Complete] All files are stored in '$outputDirName'." -ForegroundColor Green
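
For Linux or macOS, the same conversion can be sketched in POSIX shell. This is an assumption-laden equivalent, not part of the original workflow: it assumes Ghostscript is installed and available as gs on PATH, and the function names convert_pdf and convert_all are hypothetical. The Ghostscript options are the ones used in the PowerShell script above.

```shell
#!/bin/sh
# Assumption: Ghostscript is installed and available as `gs` on PATH.
output_dir="converted_pdfs"

# Convert one PDF ($1) into $2 with the same options as the PowerShell script.
convert_pdf() {
  gs -sDEVICE=pdfwrite \
     -dCompatibilityLevel=1.4 \
     -dPDFSETTINGS=/default \
     -dNOPAUSE -dQUIET -dBATCH \
     -dNoOutputFonts \
     -sOutputFile="$2" \
     "$1"
}

# Convert every PDF in the current directory into $output_dir.
convert_all() {
  mkdir -p "$output_dir"
  for input in ./*.pdf; do
    # If no PDF exists, the glob stays unexpanded; skip it.
    [ -e "$input" ] || continue
    echo "convert: ${input#./}"
    convert_pdf "$input" "$output_dir/${input#./}"
  done
  echo "[Complete] All files are stored in '$output_dir'."
}
```

Calling `convert_all` from the folder that holds the PDF files writes the converted copies to converted_pdfs. The leading ./ on each input path keeps a file name that starts with a dash from being read as a Ghostscript option.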