A simple Python web app that uses Azure OpenAI GPT-4 to translate PDF files while preserving their formatting.
- Python 3.12 (Python 3.13 not yet supported by PyMuPDF)
- Cross-platform compatible (works on Windows, macOS, Linux)
Run the PowerShell script:
    .\install.ps1
- Create a virtual environment with Python 3.12:
    python3.12 -m venv venv
- Activate the virtual environment:
    # On macOS/Linux:
    source venv/bin/activate
    # On Windows:
    venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables. Create a .env file with:
    AZURE_OPENAI_API_KEY=your_api_key
    AZURE_OPENAI_ENDPOINT=your_endpoint
    AZURE_OPENAI_API_VERSION=2024-10-21
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt4
- Run the app:
    python app.py
- Open http://127.0.0.1:5000 in your browser.
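
Before starting the app, it can help to confirm that all four Azure OpenAI settings listed above are actually set. The following is a minimal sketch, not the app's own code: the helper name `load_azure_settings` is hypothetical, and it assumes the values from `.env` have already been loaded into the process environment (for example by a dotenv loader, if the project uses one).

```python
import os

# The four variable names come from the .env example above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)


def load_azure_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Calling `load_azure_settings()` at startup would fail fast with the names of any missing variables, rather than failing later on the first translation request.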
- Select a PDF file by dragging and dropping or clicking the upload area.
- Choose the target language from the dropdown.
- Download the translated PDF.
- Internal use only. This tool is intended for non-production, internal workflows (review, drafts, enablement). Do not use for customer-facing deliverables without manual QA.
- No OCR. Scanned/image-only PDFs are not translated because text cannot be extracted. Use OCR first to convert to selectable text.
- Layout fidelity. The app tries to preserve layout, but some text may be slightly repositioned or appear with background patches. Long lines may be automatically shrunk to fit.
- Images and graphics. Images are preserved; complex vector elements behind text can cause minor artifacts in some slides.
- Fonts and glyphs. The app embeds a Unicode font (DejaVu Sans) for overlays. Rare glyphs may still render differently from the original.
- Performance and size. Very large PDFs or pages with many images take longer to process, and the output file may be slightly larger than the original.
- Privacy & data. Do not upload confidential or regulated data. Content is sent to Azure OpenAI according to your Azure subscription/data policies.
- Copyright. Ensure you have the right to translate the PDF content.
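
The "No OCR" limitation above can be checked before uploading. The sketch below is an illustration under stated assumptions rather than the app's implementation: the function names are hypothetical, and it uses PyMuPDF (already a dependency of this project) to test whether any page yields extractable text. A PDF where no page does is likely scanned or image-only and needs OCR first.

```python
def has_extractable_text(page_texts, min_chars=1):
    """Return True if any page text has at least `min_chars` non-whitespace-trimmed characters.

    `page_texts` is any iterable of strings, one per page. Hypothetical helper.
    """
    return any(len(text.strip()) >= min_chars for text in page_texts)


def pdf_page_texts(path):
    """Extract plain text from each page of the PDF at `path` using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the check above works without it

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def looks_scanned(path):
    """Heuristic: treat a PDF with no extractable text on any page as scanned."""
    return not has_extractable_text(pdf_page_texts(path))
```

For example, `looks_scanned("report.pdf")` (with a file name of your own) returning `True` suggests running OCR before translation. The check is only a heuristic: a PDF that mixes scanned and text pages passes it, but its scanned pages would still be left untranslated.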
Note: This project is provided as-is without warranty. Validate all outputs before sharing externally.