A Raspberry Pi-based automated document scanning solution that uploads scanned documents directly to Nextcloud. This project provides an Ansible playbook to automatically configure a Raspberry Pi as a network-connected scanner with automatic upload capabilities.
- Automatic document scanning with SANE
- OCR processing using Tesseract
- Automatic upload to Nextcloud
- Runs as system services for reliability
- Minimal configuration needed after setup
- Raspberry Pi (any model) with Raspberry Pi OS installed
- Scanner compatible with SANE (check SANE supported devices)
- Nextcloud instance with write access to a specific folder
- Ansible installed on your control machine
- SSH access to the Raspberry Pi
- Clone this repository:
    git clone https://github.com/yourusername/Pi-Scanner.git
    cd Pi-Scanner
- Set up SSH key access to your Raspberry Pi:
    ssh-copy-id pi@<your-pi-ip-address>
- Create your configuration file and fill in your values:
    cp Ansible/config.example.yml Ansible/config.yml
    $EDITOR Ansible/config.yml
  config.yml holds:
  - cloud_url: your Nextcloud server base URL (see "Finding your share token" below)
  - cloud_user: the share token, not a username (see below)
  - cloud_pass: the password set on the share
  - user, datafolder: optional overrides (default to pi and /home/pi/scan-data)
  config.yml is gitignored because it contains credentials in plain text.
Scans are uploaded to a public File Drop share rather than a personal account, so the "user" is actually the share token taken from the share link.
- In Nextcloud, create a folder and share it as a link with "Allow upload and editing" (a File Drop / upload-only share is fine).
- Set a password on the share.
- Copy the share link. It looks like:
    https://yourserver.de/s/HWyoGEkKRwBY2xK
    └──── cloud_url ────┘   └─── token ───┘
- Fill in config.yml accordingly:

  | Field | Value from the example link |
  | --- | --- |
  | cloud_url | https://yourserver.de (before /s/) |
  | cloud_user | HWyoGEkKRwBY2xK (the token after /s/) |
  | cloud_pass | the password you set on the share |

Uploads use the public WebDAV endpoint <cloud_url>/public.php/dav/files/<token>/<filename>.
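As a rough illustration of the endpoint above, the following shell sketch assembles the upload URL and sends one file with curl. The function names build_upload_url and upload_scan, and the CLOUD_URL, CLOUD_USER and CLOUD_PASS variables, are assumptions made for this example. They mirror the config.yml fields but are not taken from the project's uploadd.sh.

```shell
# Sketch only: helper names and variables are hypothetical, not from uploadd.sh.

# Print <cloud_url>/public.php/dav/files/<token>/<filename>.
build_upload_url() {
    local cloud_url="${1%/}"   # drop a single trailing slash, if present
    local token="$2"
    local filename="$3"
    printf '%s/public.php/dav/files/%s/%s\n' "$cloud_url" "$token" "$filename"
}

# Upload one file with HTTP PUT, authenticating as <token>:<share password>.
# Assumes CLOUD_URL, CLOUD_USER and CLOUD_PASS are set in the environment.
upload_scan() {
    local file="$1" url
    url="$(build_upload_url "$CLOUD_URL" "$CLOUD_USER" "$(basename "$file")")"
    # -T sends the file as the request body; --fail turns HTTP errors into a
    # non-zero exit status so the caller can keep the file and retry later.
    curl --fail --silent --show-error \
         --user "$CLOUD_USER:$CLOUD_PASS" \
         -T "$file" "$url"
}
```

Note that the sketch does not URL-encode the file name, so it assumes names without spaces or other reserved characters.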
- Run the Ansible playbook (the trailing comma after the IP address tells Ansible to treat it as an inline host list rather than an inventory file):
    ansible-playbook -i <your-pi-ip-address>, -u pi Ansible/playbook_setup_scanner-Pi.yml
  To use a config file in another location:
    ansible-playbook -i <your-pi-ip-address>, -u pi \
        -e config_file=/path/to/config.yml Ansible/playbook_setup_scanner-Pi.yml
- The Raspberry Pi will be configured with the hostname "ScannerPi"
- Two services will be running:
  - scand: Monitors for new documents and handles scanning
  - uploadd: Handles uploading scanned documents to Nextcloud
- Scanned documents will be:
  - Automatically processed for better quality
  - Converted to searchable PDFs (with OCR)
  - Uploaded to your specified Nextcloud folder
- Check service status:
    sudo systemctl status scand
    sudo systemctl status uploadd
- View logs:
    journalctl -u scand
    journalctl -u uploadd
- Common issues:
  - If scanning fails, ensure your scanner is properly connected and recognized by SANE
  - If uploads fail, verify your Nextcloud credentials and connectivity
  - Check permissions if files aren't being created or uploaded
Canon imageFORMULA scanners (e.g. P-208II) ship with an AutoStart / CaptureOnTouch
feature that, when enabled, makes the scanner boot as a USB Mass Storage device (a
virtual installer CD) instead of a scanner. In that state lsusb shows the device but
scanimage -L finds nothing.
Diagnose by checking the USB product ID:
    lsusb | grep -i canon
- 1083:1660 → AutoStart on, presenting as mass storage; SANE cannot use it
- 1083:165f → AutoStart off, presenting as a scanner; works with the canon_dr backend
If you see the mass-storage ID, turn AutoStart off (it is a setting stored in the scanner's firmware, toggled with Canon's CaptureOnTouch utility on Windows). The scanner then re-enumerates as a normal scanner. This is a per-device firmware setting and cannot be changed from the Pi.
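The check above can be wrapped in a small helper. This is a minimal sketch; classify_canon_usb is a hypothetical name, and the function only recognises the two product IDs listed in this section.

```shell
# Sketch only: classify_canon_usb is a hypothetical helper.
# Reads lsusb-style output on stdin and prints the detected scanner state.
classify_canon_usb() {
    local out
    out="$(cat)"
    case "$out" in
        *1083:1660*) echo "mass-storage" ;;  # AutoStart on, SANE cannot use it
        *1083:165f*) echo "scanner" ;;       # AutoStart off, usable scanner
        *)           echo "not-found" ;;     # neither ID is present
    esac
}
```

On the Pi it could be used as `lsusb | classify_canon_usb`.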
The Nextcloud credentials are stored unencrypted on the Raspberry Pi. This is considered acceptable as:
- The credentials only have access to a specific upload directory
- The Raspberry Pi should be physically secured and on a trusted network
- The credentials cannot be used to access other parts of your Nextcloud instance
The scanning service runs continuously and handles document scanning and processing:
- Uses scanadf for ADF (Automatic Document Feeder) scanning in duplex mode
- Scans at 300 DPI in grayscale
- Performs automatic deskewing (both software and roller-based)
- Processing pipeline:
  - Scans pages to PNG format
  - Compresses to JPG using ImageMagick (85% quality, grayscale)
  - Combines all pages into a single PDF using img2pdf
  - Cleans up temporary PNG/JPG files
- Implements systemd watchdog for service health monitoring
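The post-scan part of the pipeline (compress, combine, clean up) might look roughly like the sketch below. It is not the project's scand.sh: the names run and process_batch and the DRY_RUN switch are assumptions for illustration, and the sketch starts from the point where the scanned PNG pages already exist in a working directory.

```shell
# Sketch only: run, process_batch and DRY_RUN are hypothetical.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = directory holding the scanned PNG pages, $2 = output PDF path.
process_batch() {
    local workdir="$1" out_pdf="$2" png jpg
    set --                              # reuse positional parameters as the JPG list
    for png in "$workdir"/*.png; do
        [ -e "$png" ] || continue       # the glob matched nothing
        jpg="${png%.png}.jpg"
        # ImageMagick: grayscale JPG at 85% quality
        run convert "$png" -colorspace Gray -quality 85 "$jpg"
        set -- "$@" "$jpg"
    done
    [ "$#" -gt 0 ] || return 1          # no pages, nothing to combine
    # Combine all pages into a single PDF, in page order
    run img2pdf --output "$out_pdf" "$@"
    # Remove the temporary PNG/JPG files
    for jpg in "$@"; do
        run rm -f "${jpg%.jpg}.png" "$jpg"
    done
}
```

Running it with DRY_RUN=1 prints the commands without touching any files, which is a convenient way to check the page order before changing the real script.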
A separate service handles the upload process to Nextcloud:
- Monitors the data folder for new PDF files
- Uploads using Nextcloud's WebDAV interface
- Automatically removes successfully uploaded files
- Runs checks every 60 seconds
- Uses systemd watchdog for service health monitoring
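One pass of the upload behaviour listed above could be sketched as follows. upload_pending and my_uploader are hypothetical names; my_uploader stands for whatever command performs the WebDAV upload of a single file and returns a non-zero status on failure.

```shell
# Sketch only: upload_pending and my_uploader are hypothetical names.

# $1 = data folder, $2 = command or function that uploads one file.
upload_pending() {
    local datafolder="$1" uploader="$2" pdf
    for pdf in "$datafolder"/*.pdf; do
        [ -e "$pdf" ] || continue     # the glob matched nothing
        if "$uploader" "$pdf"; then
            rm -f -- "$pdf"           # remove only after a successful upload
        fi                            # failed uploads stay for the next pass
    done
}

# Main loop as a service script might run it (shown as a comment, not executed):
#   while true; do
#       upload_pending /home/pi/scan-data my_uploader
#       systemd-notify WATCHDOG=1   # assumes WatchdogSec= is set in the unit
#       sleep 60
#   done
```

Because systemd-notify runs as a child process of the script, the unit usually also needs NotifyAccess=all for the watchdog ping to be accepted; treat that as an assumption to verify against the actual uploadd.service.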
- Scanned documents: {{ datafolder }} (default: /home/pi/scan-data)
- Service scripts: /home/pi/scand.sh and /home/pi/uploadd.sh
- Service definitions: /etc/systemd/system/scand.service and /etc/systemd/system/uploadd.service
Advanced users can modify the scanning parameters by editing /home/pi/scand.sh:
- Resolution (default: 300 DPI)
- Page height limit (default: 500)
- Image compression quality (default: 85%)
- Scanner-specific options via scanadf
Feel free to open issues or submit pull requests if you have suggestions for improvements.
This project is licensed under the MIT License - see the LICENSE file for details.