How to Deploy PGvector and PGadmin on Docker and Ditch Pinecone

PostgreSQL is a powerful, open-source object-relational database system known for its robustness, extensibility, and compliance with SQL standards. It has been actively developed for over 35 years and is widely used for managing complex data workloads.

PGVector is an extension for PostgreSQL that enables vector similarity search, allowing you to store, query, and index vectors. This is particularly useful for applications involving high-dimensional data, such as recommendation systems, image search, and natural language processing.

PGAdmin is a web-based graphical user interface (GUI) management tool for PostgreSQL. It simplifies database administration tasks, providing a user-friendly interface for executing SQL queries, managing database objects, and monitoring database performance.

PGVector can store embeddings generated by AI models, such as those from OpenAI. This integration enables efficient similarity search and retrieval-augmented generation (RAG), where stored embeddings are used to find relevant information that augments the AI model's responses.

PGVector transforms PostgreSQL into a powerful vector database, enabling efficient storage and querying of embeddings. This capability is essential for AI applications that rely on vector similarity, such as recommendation systems, NLP, and image search. With enhancements from pgvectorscale, PostgreSQL can now compete with specialized vector databases, offering high performance and scalability at a lower cost.
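
To make this concrete, here is a minimal sketch of what a similarity search looks like from Python with the psycopg2 driver, once the stack below is running. The items table, the 3-dimensional vectors, and the connection values (taken from the sample .env file later in this guide) are purely illustrative:

import psycopg2

# Connection values match the sample .env file used later in this guide.
conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="vector", user="user", password="pass")
conn.autocommit = True

with conn.cursor() as cur:
    # Enable the extension (required once per database).
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

    # A tiny table with a 3-dimensional vector column; real embeddings are
    # much wider, e.g. vector(1536) for OpenAI's text-embedding models.
    cur.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));")
    cur.execute("INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,2,2]'), ('[1,1,2]');")

    # <-> is PGVector's L2 distance operator; the closest rows come first.
    cur.execute("SELECT id, embedding <-> '[1,1,1]' AS distance FROM items ORDER BY distance LIMIT 2;")
    for row in cur.fetchall():
        print(row)

conn.close()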

How to Deploy PGvector and PGadmin on Docker

1. Prerequisites

Before you begin, make sure you have the following prerequisites in place:

  • A server or machine with Docker and Docker Compose installed.
  • Optional: Dockge to manage your compose stacks and a CloudFlare Tunnel for SSL access to PGAdmin.

Dockge and CloudFlare Tunnels are optional; they simply make it easier to deploy the compose file and to attach a subdomain to PGAdmin so it can be accessed over SSL with CloudFlare protection. They are not mandatory, and you can use whatever stack you like.

You can also use Traefik as a reverse proxy for your apps. I have created a full tutorial, which also covers installing Dockge to manage your containers: How to Use Traefik as A Reverse Proxy in Docker

2. Docker Compose File

The following Docker Compose file sets up a PostgreSQL database with the PGVector extension and a PGAdmin instance for database management.

version: "3.8"
services:
  db:
    hostname: pgvector_db
    container_name: pgvector_db_container
    image: ankane/pgvector
    ports:
      - 5432:5432
    restart: unless-stopped
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_HOST_AUTH_METHOD=trust # lets any client connect without a password; remove this line in production
    volumes:
      - ./local_pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 5
  pgadmin:
    image: dpage/pgadmin4
    container_name: pgadmin4_container
    restart: unless-stopped
    ports:
      - 5016:80
    user: "$UID:$GID"
    environment:
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL}
      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_DEFAULT_PASSWORD}
    volumes:
      - ./pgadmin-data:/var/lib/pgadmin

Explanation of the Docker Compose File

  • services: Defines the services to be run.
    • db: This service sets up the PostgreSQL database with the PGVector extension.
      • hostname: Sets the hostname for the container.
      • container_name: Names the container for easier identification.
      • image: Uses the ankane/pgvector image, which includes PostgreSQL with the PGVector extension.
      • ports: Maps port 5432 on the host to port 5432 in the container.
      • restart: Ensures the container restarts unless stopped manually.
      • environment: Sets environment variables for the PostgreSQL database.
      • volumes: Mounts a local directory to persist PostgreSQL data.
      • healthcheck: Configures a health check to ensure the database is ready.
    • pgadmin: This service sets up the PGAdmin web interface.
      • image: Uses the dpage/pgadmin4 image.
      • container_name: Names the container for easier identification.
      • restart: Ensures the container restarts unless stopped manually.
      • ports: Maps port 5016 on the host to port 80 in the container.
      • user: Sets the user ID and group ID for the container.
      • environment: Sets environment variables for PGAdmin.
      • volumes: Mounts a local directory to persist PGAdmin data.

3. Environment Variables File

Create a .env file with the following content to set the necessary environment variables:

POSTGRES_USER=user
POSTGRES_PASSWORD=pass
POSTGRES_DB=vector
PGADMIN_DEFAULT_PASSWORD=pgpass
[email protected]

Here you can set your own usernames and passwords: the POSTGRES_ variables hold the credentials for the database itself, while the PGADMIN_ ones are used to log in to the UI for administering the PGVector database.

4. Deploying PGVector and PGAdmin

To deploy the services, run the following command (with the Docker Compose v2 plugin, the equivalent is docker compose up -d):

docker-compose up -d

This command will start the PostgreSQL database with PGVector and the PGAdmin web interface in detached mode.
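
Before moving on, you can verify that the database accepts connections and that the PGVector extension is available. A quick check from the host, using the sample credentials from the .env file above:

import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="vector", user="user", password="pass")
with conn.cursor() as cur:
    cur.execute("SELECT default_version FROM pg_available_extensions WHERE name = 'vector';")
    print(cur.fetchone())  # a row here means the image ships with PGVector
conn.close()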

5. Accessing the PostgreSQL Database with PGAdmin

  1. Open your web browser and navigate to http://<your-server-ip>:5016.
  2. Log in using the email and password specified in the .env file.
  3. Register a new server in PGAdmin using the following details:
    • Host name/address: pgvector_db
    • Port: 5432
    • Username: The value of POSTGRES_USER from the .env file.
    • Password: The value of POSTGRES_PASSWORD from the .env file.
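
Once connected, you can run queries against the database straight from PGAdmin's Query Tool, starting with CREATE EXTENSION IF NOT EXISTS vector; to enable vector support. The same administration can also be scripted. The sketch below, which assumes the items table from the earlier example and the sample .env credentials, adds an approximate nearest-neighbor index and prints the installed PGVector version; the lists value is just a common starting point:

import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="vector", user="user", password="pass")
conn.autocommit = True
with conn.cursor() as cur:
    # An IVFFlat index speeds up approximate nearest-neighbor search;
    # tune lists for your data size (rows / 1000 is a common rule of thumb).
    cur.execute("CREATE INDEX IF NOT EXISTS items_embedding_idx "
                "ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);")
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector';")
    print(cur.fetchone())
conn.close()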

6. Configure the CloudFlare Tunnel

You need to tell the CloudFlare Tunnel which hostname and port to use: go to Access - Tunnels, choose the tunnel you created, and add a public hostname that links a domain or subdomain to the service and its port.

Cloudflare Tunnel setup

You can also check Setup CloudPanel as Reverse Proxy with Docker and Dockge to use CloudPanel as a reverse proxy to your Docker containers.

Conclusion

Deploying PostgreSQL with PGVector and PGAdmin using Docker simplifies the setup process and provides a powerful environment for managing high-dimensional data and performing vector similarity searches. This setup is ideal for AI-based applications and other use cases requiring efficient data management and analysis.