How to Set Up a Browser Inside Docker for Puppeteer Using Node 20 Slim Image

Posted on Nov 17, 2023

Web scraping and automated testing are essential parts of modern web development. Puppeteer, a Node.js library, provides a high-level API for controlling Chrome or Chromium. Running it inside a Docker container takes extra work, however: slim base images ship without a browser and without many of the shared libraries a browser needs to start. This blog post walks through a Dockerfile that sets up Google Chrome for Puppeteer on the slim variant of the Node 20 image, followed by a step-by-step breakdown.

# Use the slim variant of the Node 20 image
FROM node:20-slim

WORKDIR /app

# Install dependencies for Puppeteer
# The slim image is Debian-based, so we use apt-get
RUN apt-get update && apt-get install -y \
    wget \
    curl \
    git \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxi6 \
    libxtst6 \
    libnss3 \
    libcups2 \
    libxss1 \
    libxrandr2 \
    libasound2 \
    libpangocairo-1.0-0 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libgtk-3-0 \
    libgbm1 \
    tzdata \
    && apt-get clean && rm -rf /var/lib/apt/lists/* 

# Set timezone if needed
ENV TZ=UTC
RUN ln -fs /usr/share/zoneinfo/$TZ /etc/localtime && dpkg-reconfigure -f noninteractive tzdata

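# Install gnupg and other basic utilities
RUN apt-get update && apt-get install -y wget gnupg2 curl && rm -rf /var/lib/apt/lists/*

# Add Google Chrome's public key to its own keyring (apt-key is deprecated)
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub \
    | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg

# Add Google Chrome to the list of repositories, trusting only that keyring
# (Google only publishes amd64 packages, so build the image for linux/amd64)
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] https://dl.google.com/linux/chrome/deb/ stable main" \
    > /etc/apt/sources.list.d/google.list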

# Update apt and install Chrome along with other dependencies
RUN apt-get update && apt-get install -y \
    git \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxi6 \
    libxtst6 \
    libnss3 \
    libcups2 \
    libxss1 \
    libxrandr2 \
    libasound2 \
    libpangocairo-1.0-0 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libgtk-3-0 \
    libgbm1 \
    tzdata \
    google-chrome-stable \
    && apt-get clean && rm -rf /var/lib/apt/lists/*


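# Environment variables to help Puppeteer (set before npm install so the browser download is skipped)
# Recent Puppeteer versions read PUPPETEER_SKIP_DOWNLOAD; older ones read PUPPETEER_SKIP_CHROMIUM_DOWNLOAD.
# CHROME_BIN is not used by Puppeteer itself but is read by some other tools (e.g. Karma).
ENV PUPPETEER_SKIP_DOWNLOAD=true \
    PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable \
    CHROME_BIN=/usr/bin/google-chrome-stable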

# Copy package.json and package-lock.json (if available)
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application
COPY . .

# Build the application if necessary (uncomment and adapt to your project)
# RUN npm run build

# Start the application (uncomment and adapt to your project).
# Keep comments on their own lines: Docker only treats # as a comment at the
# start of a line, so a trailing comment would break the CMD instruction.
# CMD ["npm", "run", "start:dev"]

Dockerfile Breakdown

  1. Base Image: We start with node:20-slim, the Debian-based "slim" variant of the official Node image. It leaves out most of the packages included in the full image, which keeps the final image small.
FROM node:20-slim
WORKDIR /app
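If you build on an Apple Silicon or other ARM machine, or want the build to keep working after the default Node image moves to a newer Debian release, it may help to pin both the release and the architecture. The sketch below assumes Debian 12 (bookworm), whose package names match the ones installed in this post; node:20-bookworm-slim is the explicitly versioned tag of the slim image. The --platform flag is there because Google's Chrome repository only ships amd64 packages; passing --platform linux/amd64 to docker build instead works as well.

```dockerfile
# Pin the Debian release (bookworm = Debian 12, assumed here) so package names stay valid,
# and force amd64 because Google's Chrome apt repository has no arm64 builds.
FROM --platform=linux/amd64 node:20-bookworm-slim
WORKDIR /app
```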
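  2. Install Dependencies for Puppeteer: Because the slim image is Debian-based, we use apt-get to install the shared libraries (X11, NSS, CUPS, ALSA, GTK, GBM and others) that Chrome needs to start. Without them the browser fails to launch and puppeteer.launch() throws an error. The cleanup at the end removes the apt package lists to keep the layer small.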
RUN apt-get update && apt-get install -y \
    wget \
    curl \
    git \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxi6 \
    libxtst6 \
    libnss3 \
    libcups2 \
    libxss1 \
    libxrandr2 \
    libasound2 \
    libpangocairo-1.0-0 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libgtk-3-0 \
    libgbm1 \
    tzdata \
    && apt-get clean && rm -rf /var/lib/apt/lists/* 
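The slim image also ships with very few fonts, so pages in Chinese, Japanese, Thai or Arabic script can render as empty boxes in screenshots and PDFs. An optional extra step along these lines installs a broader font set; the package names are taken from the font list Puppeteer's troubleshooting guide has used in its Docker example, and are assumed to be available in your Debian release.

```dockerfile
# Optional: fonts for non-Latin scripts (Japanese, Chinese, Thai, Arabic, plus a general fallback)
RUN apt-get update && apt-get install -y --no-install-recommends \
    fonts-ipafont-gothic \
    fonts-wqy-zenhei \
    fonts-thai-tlwg \
    fonts-kacst \
    fonts-freefont-ttf \
    && rm -rf /var/lib/apt/lists/*
```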
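  3. Timezone Configuration: Some applications depend on the container's timezone, for example when they timestamp logs or scraped data. We set TZ to UTC and point /etc/localtime at the matching zone file from the tzdata package.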
ENV TZ=UTC
RUN ln -fs /usr/share/zoneinfo/$TZ /etc/localtime && dpkg-reconfigure -f noninteractive tzdata
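If different deployments need different timezones, a small variation turns TZ into a build argument that defaults to UTC. This is a sketch of an alternative, not part of the original setup.

```dockerfile
# Build-time timezone, defaulting to UTC.
# Usage: docker build --build-arg TZ=Europe/Berlin .
ARG TZ=UTC
ENV TZ=${TZ}
RUN ln -fs /usr/share/zoneinfo/${TZ} /etc/localtime && dpkg-reconfigure -f noninteractive tzdata
```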
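  4. Install Basic Utilities: We install wget and curl for downloading files and gnupg2 for converting Google's signing key into a keyring that apt can use.
RUN apt-get update && apt-get install -y wget gnupg2 curl && rm -rf /var/lib/apt/lists/*
  5. Setting Up Google Chrome: Puppeteer drives Chrome or Chromium, so we install the stable release of Google Chrome from Google's apt repository. The signing key is stored in its own keyring and referenced with signed-by, because apt-key is deprecated in Debian. The repository only provides amd64 packages, so the image must be built for linux/amd64.
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub \
    | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] https://dl.google.com/linux/chrome/deb/ stable main" \
    > /etc/apt/sources.list.d/google.list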
RUN apt-get update && apt-get install -y \
    git \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxi6 \
    libxtst6 \
    libnss3 \
    libcups2 \
    libxss1 \
    libxrandr2 \
    libasound2 \
    libpangocairo-1.0-0 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libgtk-3-0 \
    libgbm1 \
    tzdata \
    google-chrome-stable \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
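Because the google-chrome-stable package declares the libraries it needs as dependencies, apt installs them automatically, so the long package list above largely overlaps with what Chrome pulls in on its own. A leaner variant of this step, assuming the key and repository lines above have already run and you do not need git or the other extra tools, could install only the browser:

```dockerfile
# Leaner variant: apt resolves Chrome's library dependencies from the package metadata
RUN apt-get update \
    && apt-get install -y google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*
```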
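  6. Puppeteer Environment Variables: These variables stop Puppeteer from downloading its own browser during npm install, since Chrome is already installed, and tell it where that Chrome binary lives. They must be set before npm install runs. Recent Puppeteer versions read PUPPETEER_SKIP_DOWNLOAD, while older ones read PUPPETEER_SKIP_CHROMIUM_DOWNLOAD, so both are set. CHROME_BIN is not read by Puppeteer itself; it is kept for other tools, such as the Karma Chrome launcher, that look for it.
ENV PUPPETEER_SKIP_DOWNLOAD=true
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
ENV CHROME_BIN=/usr/bin/google-chrome-stable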
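A one-line check placed after the ENV instructions can catch a wrong PUPPETEER_EXECUTABLE_PATH or a missing shared library at build time instead of at the first puppeteer.launch() call. This is an optional addition; it only verifies that the binary exists and that its directly linked libraries load, not that a full browser session works.

```dockerfile
# Fail the build early if the path Puppeteer will use is wrong,
# or if a shared library the Chrome binary links against is missing.
RUN "$PUPPETEER_EXECUTABLE_PATH" --version
```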
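  7. Node.js Dependencies: We copy package.json and package-lock.json (if present) before the rest of the source and run npm install. Because this layer depends only on those files, Docker can reuse its cache and skip reinstalling dependencies when only application code changes.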
COPY package*.json ./
RUN npm install
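If your project commits package-lock.json, npm ci is a common alternative to npm install in Docker builds: it installs exactly the versions recorded in the lockfile and fails if the lockfile is out of sync with package.json. Unlike the package*.json glob above, this sketch assumes the lockfile exists, since npm ci refuses to run without one.

```dockerfile
# Reproducible install from the lockfile (COPY fails if package-lock.json is missing)
COPY package.json package-lock.json ./
RUN npm ci
```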
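  8. Application Setup: We copy the application source code and run the build step if the project has one; uncomment the RUN line if your package.json defines a build script. Listing node_modules in a .dockerignore file keeps COPY . . from overwriting the dependencies installed in the image with the ones on your host.
COPY . .
# RUN npm run build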
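  9. Starting the Application: Finally, CMD sets the command the container runs on startup. start:dev is the script name used in this example; replace it with your project's own start script.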
CMD ["npm", "run", "start:dev"]
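As written, the container runs as root, and Chrome refuses to start as root unless it is launched with --no-sandbox. The official Node images include an unprivileged user named node, so a common hardening step is to switch to it before CMD, as sketched below. Under Docker's default security profile Chrome's sandbox may still be unavailable even for a non-root user, in which case many setups pass --no-sandbox through the args option of puppeteer.launch().

```dockerfile
# Give the built-in "node" user ownership of the app directory,
# then run everything after this point as that user.
RUN chown -R node:node /app
USER node

# Same start command as above, now executed as "node" instead of root.
CMD ["npm", "run", "start:dev"]
```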