Automating Translation of Documents and Text Using Google Translate API

Project Overview

Managing multilingual websites can be a complex and time-consuming process, especially when it comes to translating content that is dynamically loaded from a database and accompanied by associated documents such as PDFs. One of our clients faced this exact challenge: their website content is available in multiple languages, and the PDFs are stored in an AWS S3 bucket.

The traditional process involved manually uploading each document in English, translating it into multiple languages, and uploading the translated versions again. This was not only time-consuming but also cost-intensive, especially as the website content and documents grew.

To streamline this process, we implemented an automated translation workflow that uses the Google Cloud Translation API for both simple text translations and document translations. Here’s how we approached the problem and built a more efficient solution.

Problem Overview

The client’s website pulls content dynamically from the database in multiple languages. When new documents are uploaded in English, they also need to be translated into other languages and uploaded back to the system. Each of these documents is stored in an AWS S3 bucket, making it even more challenging to create a unified translation process that works seamlessly.

Automated Translation Workflow

Simple Text Translation Using Google Cloud Translation API

For translating plain text, we used @google-cloud/translate, the Node.js client library for the Google Cloud Translation API. The API supports more than 100 languages. The process is straightforward:

  • When new content is added via the admin panel, the system automatically translates the text into the required languages using Google Cloud Translation.

  • These translations are then stored in the database, reducing manual effort and ensuring that multilingual content is generated instantly.

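// In current versions of @google-cloud/translate, the basic (v2) client
// is exported under the `v2` namespace.
const { Translate } = require('@google-cloud/translate').v2;

const translationClient = new Translate({
  credentials: {
    private_key: <Service Account Private key>,
    client_email: <Service Account Email>,
  },
});

// Translate `text` into the target language code `lang` (for example 'fr').
const textTranslation = async (text, lang) => {
  const [translation] = await translationClient.translate(text, lang);
  return translation;
};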
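
The helper above translates into one language per call, while the admin panel needs every required language at once. A minimal sketch of that fan-out follows; `translateToAll` and its `translateFn` parameter are hypothetical names, and `translateFn` is assumed to have the same `(text, lang)` signature as `textTranslation`.

```javascript
// Hypothetical helper: translate one string into several languages at once.
// `translateFn` is assumed to behave like textTranslation(text, lang).
const translateToAll = async (text, langs, translateFn) => {
  // Run the per-language calls concurrently; a single failure rejects the batch.
  const pairs = await Promise.all(
    langs.map(async (lang) => [lang, await translateFn(text, lang)])
  );
  // Turn [[lang, translation], ...] into { lang: translation }.
  return Object.fromEntries(pairs);
};
```

Calling `await translateToAll('Hello', ['fr', 'de'], textTranslation)` would resolve to an object keyed by language code, which maps naturally onto one database row or column per language.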

Document Translation Workflow

Translating documents such as PDFs required a more sophisticated solution, as the Google Cloud Translation API doesn’t directly support translating documents stored in AWS S3 buckets. Here's how we automated this process:

Step 1: Upload Base Document to Google Cloud Storage

When a document is uploaded in English, it is first copied to a Google Cloud Storage bucket. Document Translation cannot read from S3; in this workflow it reads its input from, and writes its output to, Cloud Storage URIs.

const { Storage } = require('@google-cloud/storage');

const storage = new Storage({
  projectId: <google project id>,
  credentials: {
    private_key: <Service Account Private key>,
    client_email: <Service Account Email>,
  },
});

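// Upload a local file to the Cloud Storage bucket used for translation.
const googleUpload = async (filePath) => {
  await storage.bucket(<Google Bucket>).upload(filePath, {
    // `destination` is the full object path: folder plus file name.
    destination: <Path of google bucket folder>,
  });
};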
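
The upload destination used here and the `inputUri` passed to the translation step must point at the same object. One way to keep them in sync is to derive both from a single helper. The sketch below assumes a hypothetical `toGcsLocation` function and an `incoming` folder name chosen purely for illustration.

```javascript
// Hypothetical helper: derive the Cloud Storage object path and its gs:// URI
// from a local file path. The default "incoming" folder is an assumption.
const toGcsLocation = (bucketName, filePath, folder = 'incoming') => {
  // Keep only the file name, whichever directory separator the OS uses.
  const fileName = filePath.split(/[\\/]/).pop();
  const destination = `${folder}/${fileName}`;
  return { destination, inputUri: `gs://${bucketName}/${destination}` };
};
```

The returned `destination` can be passed to `upload`, and `inputUri` to the translation request, so the two never drift apart.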

Step 2: Translate the Document

Once the document is in the Cloud Storage bucket, we start the translation using the Document Translation feature of the Cloud Translation API (v3, Cloud Translation Advanced). Note that this is part of Cloud Translation, not Document AI, which is a separate product. The service translates the document while preserving its original layout and formatting, which is crucial for PDF files.

const { TranslationServiceClient } = require('@google-cloud/translate');
const translationClient = new TranslationServiceClient({
  credentials: {
    private_key: <Service Account Private key>,
    client_email: <Service Account Email>,
  }
});


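// Translate one document in Cloud Storage and write the result back to Cloud Storage.
const documentTranslate = async () => {
  const request = {
    // The location is the Translation API location (for example 'global'),
    // not the region of the storage bucket.
    parent: translationClient.locationPath(<Google Project Id>, <Location>),
    documentInputConfig: {
      gcsSource: { inputUri: `gs://<Bucket>/<Path of base file>` },
    },
    documentOutputConfig: {
      // A folder-style prefix; the service names the output file itself.
      gcsDestination: { outputUriPrefix: `gs://<Bucket>/<Path of output folder>/` },
    },
    sourceLanguageCode: <Source Language>,
    targetLanguageCode: <Target Language>,
    // Only native (text-based) PDF pages are translated.
    isTranslateNativePdfOnly: true,
  };

  await translationClient.translateDocument(request);
};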
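
Each request carries a single `targetLanguageCode`, so translating into several languages means one request per language. A minimal sketch of building those requests is shown below; `buildDocumentRequests` and the per-language output folder layout are assumptions for illustration, while the field names mirror the request above.

```javascript
// Hypothetical helper: build one translateDocument request per target language.
// The "<outputBase>/<lang>/" folder layout is an assumption for illustration.
const buildDocumentRequests = ({ parent, inputUri, outputBase, source, targets }) =>
  targets.map((target) => ({
    parent,
    documentInputConfig: { gcsSource: { inputUri } },
    documentOutputConfig: {
      // A separate prefix per language keeps the outputs from colliding.
      gcsDestination: { outputUriPrefix: `${outputBase}/${target}/` },
    },
    sourceLanguageCode: source,
    targetLanguageCode: target,
    isTranslateNativePdfOnly: true,
  }));
```

Each element of the returned array can then be passed to `translationClient.translateDocument(request)` in a loop.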

Step 3: Download the Translated Document

After the document has been translated into a target language, the workflow downloads the translated file from the Google Cloud bucket to local storage so that it can be pushed to S3.

const googleDownload = async () => {
  await storage.bucket(<Google Bucket>).file(<Google Bucket File Path>).download({
    destination: <Local Destination path>
  });
}
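
Because the output object's name is chosen by the translation service rather than by the caller, it can be more robust to list the objects under the output prefix than to hard-code a file path. The sketch below assumes a hypothetical `downloadByPrefix` helper and that `bucket` is a `@google-cloud/storage` Bucket such as `storage.bucket(name)`.

```javascript
// Hypothetical helper: download every object under a prefix to a local folder.
// `bucket` is assumed to be a @google-cloud/storage Bucket instance.
const downloadByPrefix = async (bucket, prefix, localDir) => {
  // getFiles resolves to an array whose first element is the list of File objects.
  const [files] = await bucket.getFiles({ prefix });
  const saved = [];
  for (const file of files) {
    // Skip "folder" placeholder objects, whose names end with a slash.
    if (file.name.endsWith('/')) continue;
    const destination = `${localDir}/${file.name.split('/').pop()}`;
    await file.download({ destination });
    saved.push(destination);
  }
  // Return the local paths that were written.
  return saved;
};
```

The returned list of local paths can be handed directly to the S3 upload step.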

Step 4: Upload Translated Document to AWS S3

Finally, the translated document is uploaded back to the appropriate AWS S3 bucket. This ensures that the website content and associated documents are stored in a unified location, ready to be accessed by the multilingual site.

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  region: process.env.S3_REGION,
  credentials: {
    accessKeyId: <AWS Access Key>,
    secretAccessKey: <AWS Secret Key>,
  },
});

// Upload the translated file to the S3 bucket.
const upload = async () => {
  const params = {
    Bucket: <S3 Bucket Name>,
    Key: <S3 File path>,
    Body: <File Data>,
  };
  const command = new PutObjectCommand(params);
  await s3.send(command);
}
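
With several target languages, each translated file needs its own S3 key. A minimal sketch of building the upload parameters is shown below; `buildS3Params` and the `<name>_<lang>.<ext>` key convention are assumptions for illustration, not the client's actual naming scheme.

```javascript
// Hypothetical helper: build PutObjectCommand params for a translated PDF.
// The "<name>_<lang>.<ext>" key convention is an assumption for illustration.
const buildS3Params = (bucket, baseKey, lang, body) => {
  const dot = baseKey.lastIndexOf('.');
  // Insert the language code before the extension; if the final path segment
  // has no extension, append the code to the end of the key instead.
  const key = dot > baseKey.lastIndexOf('/')
    ? `${baseKey.slice(0, dot)}_${lang}${baseKey.slice(dot)}`
    : `${baseKey}_${lang}`;
  // ContentType lets browsers render the PDF inline instead of downloading it.
  return { Bucket: bucket, Key: key, Body: body, ContentType: 'application/pdf' };
};
```

The result can be passed straight to `new PutObjectCommand(params)` in the upload function above.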

Benefits of the Automated Workflow

  • Time Efficiency: Manual translation and uploading of documents across multiple languages and storage locations is eliminated, saving considerable time.

  • Cost Efficiency: By automating translations, we drastically reduce the need for external translation services and manual effort, leading to cost savings.

  • Consistency: Text and documents go through the same Google Cloud Translation service, so terminology and style stay consistent across the site, which met the client's requirement for high-quality output.

  • Scalability: As the website grows, the system can handle larger volumes of content and documents without additional manual intervention.

  • Seamless Integration: Despite using two different cloud storage solutions (Google Cloud for translation and AWS S3 for storage), the workflow integrates them seamlessly, allowing for smooth document management.

Vivek Thumar

Software Engineer

Vivek is a software engineer with a passion for building innovative solutions. He enjoys working on challenging problems and creating software that makes a positive impact.