AI Pipeline for WordPress on n8n: Text Normalization, Translation, and TTS
Keywords: WordPress, n8n, LLM text normalization, machine translation, speech synthesis, Google Cloud Run, reproducibility
Abstract
We present a reproducible, self-hosted pipeline that converts WordPress articles into multilingual voiceovers (Italian/English). The system orchestrates LLM-based text cleaning, machine translation, and neural text-to-speech (TTS) via n8n. A lightweight microservice on Google Cloud Run performs long-form TTS and writes audio artifacts (`.wav`) to Google Cloud Storage (GCS). Metadata and processing state are tracked in Google Sheets, enabling deterministic batch runs and idempotent processing. We describe the architecture, interfaces, and evaluation protocol (latency, throughput, and audio integrity checks) to facilitate reuse and extension.
1. Problem Statement
Blogs often publish long‑form content without an accessible audio counterpart. Manual conversion to voice is time‑consuming and error‑prone (HTML artifacts, punctuation issues, malformed text). The goal is to automate the post → clean text → translation → TTS → publish loop with traceability and repeatability, while keeping infrastructure self‑hosted for control and cost predictability.
2. System Overview
The solution is composed of three subsystems:
- Orchestration (n8n)
  - Scheduled ingestion of WordPress posts.
  - Text normalization with an LLM ("Clean" nodes per language).
  - Optional translation to EN.
  - HTTP requests to the TTS microservice.
  - Update of two WordPress pages (IT/EN) with an `<audio>` player block.
  - Status tracking in Google Sheets (work queue).
- TTS Microservice (Cloud Run)
  - FastAPI endpoint `/long-tts` receiving `{text, language, voice, filename}`.
  - Uses Google Cloud Text-to-Speech to synthesize `.wav` files.
  - Streams output to a GCS bucket and returns artifact metadata.
- Artifacts & State
  - Audio: `gs://<bucket>/<ID>.wav` and `gs://<bucket>/<ID>EN.wav`.
  - Index: Google Sheets with columns `ID, Title, Date, Content, Link, Status`.
  - Presentation: WordPress pages IT/EN updated with an audio player snippet.
Workflow JSONs are provided in `/n8n-workflows`. Replace every `INSERT_YOUR_ID_HERE` with your instance-specific IDs before running.
3. Data Flow
Ingest (Part 1): fetch posts from WordPress → strip HTML → append/update rows in Google Sheets.
Process (Part 2): pick the next row with an empty `Status` → clean text (LLM) → translate to EN → call `/long-tts` twice (IT/EN) → update WordPress pages → set `Status=done`.
4. Methods
4.1 Text Normalization (LLM)
- Objective: remove markup and special tokens that degrade TTS (e.g., HTML entities, escaped quotes, code fragments).
- Approach: two dedicated “Clean” steps (IT/EN) to avoid cross‑language artifacts; simple post‑processing (newline collapse).
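The cleaning step can be approximated with a stdlib sketch (in the actual workflow this is an LLM node, so the regexes below are an illustration of the intended effect, not the production implementation):

```python
import html
import re

def normalize_for_tts(raw: str) -> str:
    """Sketch of TTS-oriented normalization: strip markup and tokens
    that degrade synthesis, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)      # drop HTML tags
    text = html.unescape(text)               # &amp; -> &, &quot; -> ", ...
    text = text.replace('\\"', '"')          # unescape escaped quotes
    text = re.sub(r"\n{2,}", "\n", text)     # collapse blank-line runs
    text = re.sub(r"[ \t]{2,}", " ", text)   # collapse space runs
    return text.strip()
```

An LLM handles cases regexes cannot (code fragments mid-sentence, malformed markup), which is why the workflow uses one; the sketch documents the contract the "Clean" nodes are expected to fulfil.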
4.2 Machine Translation
- Objective: generate a high‑fidelity English version for bilingual synthesis.
- Approach: Google Translate node in n8n; downstream clean step to stabilize punctuation/spacing.
4.3 Neural TTS
- Objective: synthesize natural speech from normalized text.
- Approach: Google Cloud TTS (e.g., `it-IT-Neural2-F` and `en-US-Neural2-D`), invoked via a Cloud Run microservice to support long texts, centralized logging, and uniform storage in GCS.
- Output: uncompressed linear-PCM `.wav` files named by post ID (plus an `EN` suffix for English).
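Cloud TTS caps each synthesis request at roughly 5,000 bytes of input, which is why long articles must be chunked and the pieces concatenated. A sentence-boundary chunker might look like this (the 4,500-byte default is an assumed safety margin, not a value from the deployed service):

```python
def chunk_text(text: str, max_bytes: int = 4500) -> list[str]:
    """Split text into chunks under max_bytes, breaking at sentence ends.
    A single sentence longer than max_bytes is kept whole (sketch)."""
    sentences = text.replace("\n", " ").split(". ")
    chunks, current = [], ""
    for s in sentences:
        piece = s if s.endswith(".") else s + "."
        candidate = f"{current} {piece}".strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is synthesized separately and the PCM frames are appended into one `.wav`; byte length (not character count) is what matters, since UTF-8 text in Italian can exceed one byte per character.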
4.4 Publication
- Objective: consistent exposure of artifacts to end users.
- Approach: WordPress REST updates a dedicated IT page and an EN page; an `<audio>` block references GCS URLs. Pages act as an index/catalog of voiceovers distinct from the original posts.
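The player snippet for a page entry can be assembled as below; the `<h4>` title wrapper and the function name are illustrative assumptions, while the `<audio>`/`<source>` structure and the `<ID>`/`<ID>EN` naming follow the scheme described above:

```python
def audio_block(bucket: str, post_id: str, title: str,
                english: bool = False) -> str:
    """Build the <audio> player HTML for one voiceover (sketch)."""
    suffix = "EN" if english else ""
    url = f"https://storage.googleapis.com/{bucket}/{post_id}{suffix}.wav"
    return (
        f"<h4>{title}</h4>\n"
        f'<audio controls><source src="{url}" type="audio/wav"></audio>'
    )
```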
5. Implementation Details
5.1 Orchestration (n8n)
- Triggers: nightly schedule.
- Queueing: Google Sheets serves as a work queue with `Status` acting as a completion flag.
- Idempotency: the pipeline selects a single pending row (oldest by `Date`), ensuring one-by-one processing without duplicates.
- Error Handling: if a step fails, the row remains pending; you can re‑run Part 2 safely.
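The single-row pick policy can be sketched in Python (column names follow the Sheet schema from §2; the function itself is an illustration, not code from the repo):

```python
def pick_next_pending(rows):
    """Return the oldest row whose Status is empty, or None when done.

    Re-running is safe: rows already marked 'done' are never re-selected,
    and only one row is in flight per run.
    """
    pending = [r for r in rows if not r.get("Status")]
    # ISO-formatted Date strings sort chronologically, so min() picks oldest.
    return min(pending, key=lambda r: r["Date"]) if pending else None
```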
5.2 Cloud Run Service
`/GCP Code` contains:
- `main.py`: FastAPI app exposing `/long-tts`.
- `requirements.txt`: Python dependencies.
- `Dockerfile.txt`: image build.
Deploy:

```shell
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/long-tts
gcloud run deploy long-tts \
  --image gcr.io/YOUR_PROJECT_ID/long-tts \
  --region europe-west1 \
  --platform managed \
  --allow-unauthenticated
```
Example request:

```shell
curl -X POST "https://<service>.run.app/long-tts" \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Hello World",
        "language": "en-US",
        "voice": "en-US-Neural2-D",
        "filename": "sample.wav"
      }'
```
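Client-side, the request body can be built and sanity-checked before the HTTP call; a sketch of the contract (field names match the example request above; the validation rules are assumptions about what the service expects):

```python
import json

# Contract of POST /long-tts: all four fields, all non-empty strings.
REQUIRED = {"text": str, "language": str, "voice": str, "filename": str}

def build_tts_payload(text, language, voice, filename) -> str:
    """Build and validate the JSON body for /long-tts (sketch)."""
    body = {"text": text, "language": language,
            "voice": voice, "filename": filename}
    for field, typ in REQUIRED.items():
        if not isinstance(body[field], typ) or not body[field]:
            raise ValueError(f"{field} must be a non-empty {typ.__name__}")
    if not filename.endswith(".wav"):
        raise ValueError("filename must end with .wav")
    return json.dumps(body)
```

Validating before the call keeps malformed requests out of the Cloud Run logs and makes n8n error branches easier to reason about.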
5.3 WordPress Integration
- Two page IDs (IT/EN) are updated by the workflow.
- The audio player uses `<audio><source src="https://storage.googleapis.com/<bucket>/<ID>.wav" type="audio/wav"></audio>`; the EN version references `<ID>EN.wav`.
6. Evaluation Protocol
We propose a lightweight, reproducible protocol. Record results per batch in a CSV or Sheet.
- Latency (ms/char): measure end‑to‑end synthesis time divided by input character count.
- Throughput (posts/hour): number of posts fully processed in a 60‑minute window.
- Audio Integrity: detect silent output or truncated files by checking duration vs. expected duration (given TTS speaking rate).
- Text Cleanliness Score: pre/post token ratio (e.g., removed HTML entities, code spans) as a proxy for normalization quality.
- Publication Success Rate: ratio of successful WordPress page updates vs. attempts.
No human MOS is reported here; if you need perceptual quality, add a small listening test protocol with 5‑point MOS and inter‑rater agreement.
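The audio-integrity check is straightforward to automate with the stdlib `wave` module; the characters-per-second rate below is an assumed calibration constant that should be tuned per voice:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def integrity_ok(path: str, n_chars: int,
                 chars_per_sec: float = 15.0,
                 tolerance: float = 0.5) -> bool:
    """Flag silent or truncated output: measured duration should be at
    least `tolerance` times the duration expected from the text length."""
    expected = n_chars / chars_per_sec
    return wav_duration_seconds(path) >= expected * tolerance
```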
7. Reproducibility
- Deterministic scheduling: nightly cron with a single‑row pick policy.
- Pinned artifacts: TTS voices (`it-IT-Neural2-F`, `en-US-Neural2-D`) fixed for runs.
- Versioning: keep a changelog of node versions and the `main.py` image digest (from Cloud Run).
- Config surface: all instance-specific IDs are explicit in the JSON as `INSERT_YOUR_ID_HERE`; replace them but do not change field names.
8. Limitations
- Reliance on external services (Translate/TTS) may introduce latency spikes or regional availability constraints.
- TTS prosody depends on input punctuation; text normalization reduces but cannot fully remove artifacts.
- The catalog approach (IT/EN pages) centralizes voiceovers rather than embedding a player in each post; this is by design.
9. Ethical & Operational Considerations
- Copyright & consent: ensure you are authorized to transform and redistribute post content as audio.
- Attribution: link back to the original article in the audio block.
- Privacy: avoid synthesizing PII; clean steps should strip emails/API keys/code fragments from the TTS input.
- Accessibility: provide transcripts when feasible for hearing‑impaired users.
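The PII-stripping step mentioned above can be approximated with regexes (the patterns here are illustrative heuristics, not an exhaustive filter):

```python
import re

# Heuristic patterns: email addresses and key-like tokens (e.g. sk-...).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9_-]{8,}\b",
                    re.IGNORECASE)

def strip_pii(text: str) -> str:
    """Redact emails and key-like tokens before TTS (heuristic sketch)."""
    return SECRET.sub("[redacted]", EMAIL.sub("[redacted]", text))
```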
10. How to Use (Quick Start)
- Deploy the Cloud Run service in `/GCP Code`.
- Create the Google Sheet (`ID, Title, Date, Content, Link, Status`).
- Import `/n8n-workflows/WordPress_Automations__Part1.json` and `Part2.json` into n8n.
- Replace all `INSERT_YOUR_ID_HERE` with your WordPress page IDs, Google Sheet IDs, and credential IDs.
- Map credentials (WordPress, Google, OpenAI if you keep LLM cleaning).
- Run Part 1 to populate the Sheet; run Part 2 to synthesize and publish.
- Evaluate using the protocol in §6 and iterate.
11. Future Work
- Add exponential backoff/retry on TTS HTTP calls.
- Support additional languages/voices; auto‑select voice based on locale.
- Optional per‑post embedding (update the article body instead of a catalog page).
- Introduce signed URLs for private buckets.
Artifact Locations
- Workflows: `/n8n-workflows/WordPress_Automations__Part1.json`, `/n8n-workflows/WordPress_Automations__Part2.json`
- Service code: `/GCP Code`
- Diagrams: `/images/WordPress_Automations__Part1.png`, `/images/WordPress_Automations__Part2.png`