This PDF Conversion API is a Flask-based application that allows you to convert PDF files to text and extract images. It provides asynchronous processing using Celery and supports multiple extraction methods. The API accepts PDF files either as direct uploads or via URL.
- PDF to text and image extraction
- File upload and URL download
- Asynchronous processing with Celery
- Image extraction in base64-encoded format
- Support for PyPDF2, PDFMiner, and PyMuPDF extraction methods
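Because extracted images come back base64-encoded, a client has to decode them before it can save or display them. The sketch below assumes the JSON result carries a list of base64 strings, possibly prefixed with a `data:` URI header. The function names `decode_image` and `save_images` and the PNG/JPEG sniffing are illustrative, not part of the API.

```python
import base64
from pathlib import Path

# Magic-byte signatures used to guess a file extension (illustrative; the API
# is not documented to return any particular image format).
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def decode_image(encoded: str) -> tuple[bytes, str]:
    """Decode one base64 image string and guess its file extension."""
    # Tolerate an optional "data:image/...;base64," prefix (an assumption).
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    for signature, ext in _SIGNATURES.items():
        if data.startswith(signature):
            return data, ext
    return data, ".bin"


def save_images(images: list[str], out_dir: Path, prefix: str = "image") -> list[Path]:
    """Write each decoded image to out_dir and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, encoded in enumerate(images):
        data, ext = decode_image(encoded)
        path = out_dir / f"{prefix}_{i}{ext}"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

For example, `save_images(body["result"]["images"], Path("out"))` would write one file per image, where the `images` key is only a guess at the response shape.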
The API exposes the following endpoints:
- /convertpdfurl (GET): Accepts a URL to download the PDF file, extraction method, and image extraction options. Returns a task ID for asynchronous processing.
- /convertpdftask (POST): Accepts a PDF file upload, extraction method, and image extraction options. Returns a task ID for asynchronous processing.
- /task-result/<task_id> (GET): Accepts a task ID and returns the status and result of the task in a JSON response.
- /convertpdf (POST): Accepts a PDF file upload and converts it to text and images synchronously. Returns the results in a JSON response.
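The asynchronous endpoints imply a submit-then-poll workflow: request a conversion, keep the returned task ID, and query /task-result/<task_id> until the task finishes. Below is a minimal standard-library client sketch for the URL-based variant. The base URL, the query parameter names (`url`, `method`, `extract_images`), the method identifiers, the response keys (`task_id`, `status`, `result`), and the assumption that `status` carries Celery state names such as `SUCCESS` and `FAILURE` are all hypothetical and should be checked against the Flask routes.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:5000"  # assumption: host and port of the running API


def build_convert_url(base_url: str, pdf_url: str, method: str = "pymupdf",
                      extract_images: bool = True) -> str:
    """Build a GET URL for /convertpdfurl (parameter names are hypothetical)."""
    query = urllib.parse.urlencode({
        "url": pdf_url,
        "method": method,  # e.g. "pypdf2", "pdfminer", "pymupdf" (assumed values)
        "extract_images": str(extract_images).lower(),
    })
    return f"{base_url}/convertpdfurl?{query}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_result(base_url: str, task_id: str,
                    fetch: Callable[[str], dict] = fetch_json,
                    interval: float = 1.0, max_polls: int = 60) -> dict:
    """Poll /task-result/<task_id> until the task reaches a terminal state."""
    url = f"{base_url}/task-result/{urllib.parse.quote(task_id)}"
    for _ in range(max_polls):
        body = fetch(url)
        # Assumed: status values mirror Celery's task states.
        if body.get("status") in ("SUCCESS", "FAILURE"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Against a running instance, usage would look like `task = fetch_json(build_convert_url(BASE_URL, "https://example.com/sample.pdf"))` followed by `wait_for_result(BASE_URL, task["task_id"])`. Passing `fetch` as a parameter keeps the polling loop testable without a live server.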
The Docker image requires a locally running instance of its supporting stack. Specifically, the service needs a RabbitMQ server, which Celery uses as its message broker.
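Since the service cannot process tasks without RabbitMQ, a quick preflight check before starting the container can save some debugging. This sketch only verifies that something accepts TCP connections at the broker address; it does not authenticate or speak AMQP. The default URL assumes RabbitMQ's standard AMQP port (5672) and its stock `guest` account; whether the project reads its broker URL from configuration, and under what name, is not documented here.

```python
import socket
from urllib.parse import urlparse

# Hypothetical default in Celery's amqp://user:password@host:port/vhost form.
DEFAULT_BROKER_URL = "amqp://guest:guest@localhost:5672//"


def broker_reachable(broker_url: str = DEFAULT_BROKER_URL, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the broker's host and port succeeds."""
    parsed = urlparse(broker_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 5672  # fall back to the standard AMQP port
    try:
        # Only proves that a socket is listening, not that it is RabbitMQ.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `broker_reachable()` before `./run.sh start` gives a fast yes/no answer instead of a worker that silently waits for a broker.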
This project includes a run.sh script to manage the Docker image and container for the PDF Conversion API. The script provides a simple way to build, start, stop, and restart the application, as well as view its logs.
./run.sh build
Builds the Docker image.
./run.sh start {dev|prod}
Starts the Docker container for the application in either development or production mode. If the image is not found, it will be built automatically.
- In development mode, the application's source code is mounted as a volume, allowing you to see changes without rebuilding the image.
- In production mode, the application runs using the built image.
./run.sh stop
Stops the running Docker container for the application.
./run.sh restart
Rebuilds the Docker image, stops the running container (if any), and starts a new container in dev mode.
./run.sh logs
Displays the logs for the running Docker container.
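The contents of run.sh are not reproduced here, but the behavior described above maps naturally onto a handful of docker CLI calls. The following Python sketch returns those calls as argument lists instead of executing them, which makes the dev/prod difference and the restart sequence explicit. The image tag, container name, published port (Flask's default 5000), and the `/app` mount path are hypothetical placeholders, not values taken from the script.

```python
import os

# Hypothetical placeholders; the real run.sh may use different values.
IMAGE = "pdf-conversion-api"
CONTAINER = "pdf-conversion-api"
PORT_MAPPING = "5000:5000"  # assumes the Flask app listens on its default port
APP_DIR = "/app"            # assumed source location inside the image


def start_command(mode: str, src_dir: str = ".") -> list[str]:
    """Build the `docker run` call for dev or prod mode."""
    if mode not in ("dev", "prod"):
        raise ValueError("mode must be 'dev' or 'prod'")
    # --rm removes the container once stopped, so the name can be reused.
    cmd = ["docker", "run", "--rm", "-d", "--name", CONTAINER, "-p", PORT_MAPPING]
    if mode == "dev":
        # Bind-mount the working tree so code edits show up without a rebuild.
        cmd += ["-v", f"{os.path.abspath(src_dir)}:{APP_DIR}"]
    return cmd + [IMAGE]


def docker_commands(action: str, mode: str = "dev", image_exists: bool = True,
                    src_dir: str = ".") -> list[list[str]]:
    """Map a run.sh-style action to the docker commands it would run, in order."""
    build = ["docker", "build", "-t", IMAGE, src_dir]
    # `docker stop` fails if nothing is running; a caller should tolerate that.
    stop = ["docker", "stop", CONTAINER]
    if action == "build":
        return [build]
    if action == "start":
        # Build first when the image is missing (image_exists could come from
        # the exit status of `docker image inspect`).
        return ([] if image_exists else [build]) + [start_command(mode, src_dir)]
    if action == "stop":
        return [stop]
    if action == "restart":
        return [build, stop, start_command("dev", src_dir)]
    if action == "logs":
        return [["docker", "logs", "-f", CONTAINER]]
    raise ValueError(f"unknown action: {action}")
```

A caller could run each list with `subprocess.run(cmd)`, passing `check=False` for the stop step so that a missing container does not abort a restart.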