
Hardware Resourcing


Recommended CPU / RAM / Disk to run EzHR Chatbot.

Running Locally (Docker Compose)

If you are running the app locally using Docker, it is advised to allocate at least 4 vCPU cores and 4 GB of RAM to Docker, with at least 10 GB of disk space available. You can manage these settings in the Resources section of Docker Desktop.
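If you prefer to enforce limits per container rather than at the Docker Desktop level, Compose supports resource constraints. A minimal sketch, assuming a service named `api_server` (the actual service names in your compose file may differ):

```yaml
# Illustrative docker-compose.yml fragment; the service name
# "api_server" is an assumption, not necessarily the project's.
services:
  api_server:
    deploy:
      resources:
        limits:
          cpus: "1.0"     # hard cap: one CPU core
          memory: 1G      # hard cap: 1 GB RAM
        reservations:
          cpus: "0.5"     # soft guarantee: half a core
          memory: 512M
```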

Single Cloud Instance

| Component | Minimum Requirement | Recommended |
|---|---|---|
| CPU | 2 vCPU cores | 4 vCPU cores |
| Memory (RAM) | 4 GB | 8 GB |
| Disk | 10 GB + 2.5× indexed documents | 50–100 GB (extra space) |
| GPU | Not required | Not required |

1. Processor (CPU):

  • Requirement: At least 2 vCPU cores.
  • Recommended: 4 vCPU cores for smoother performance.
  • Reason:
    • Supports concurrent execution of multiple Docker containers.
    • Handles CPU-bound processes such as the API server, Redis cache, Nginx reverse proxy, and embedding model.
  • Supported CPU architectures:
    • 64-bit system:
      • x86_64/amd64
      • AArch64/arm64
    • 32-bit system: Not supported
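To confirm which architecture your host reports, a generic Python sketch (not part of the project) can check the machine type against the supported list:

```python
import platform

# 64-bit architectures the stack supports; 32-bit systems are not supported.
SUPPORTED = {"x86_64", "amd64", "aarch64", "arm64"}

def is_supported_arch(machine: str) -> bool:
    """Return True if the reported machine type is a supported 64-bit arch."""
    return machine.lower() in SUPPORTED

print(platform.machine(), "->", is_supported_arch(platform.machine()))
```

Running `uname -m` in a shell gives the same machine string.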

2. Memory (RAM):

  • Requirement: At least 4 GB of RAM.
  • Recommended: 8 GB of RAM for optimal performance.
  • Reason:
    • Provides sufficient memory for:
      • Running the embedding model and the vector database.
      • Storing cached results in Redis for improved query response times.

3. Storage

  • Requirement: Minimum of 10 GB plus roughly 2.5 times the size of the indexed documents.
  • Recommended: Extra disk space for flexibility, such as 50–100 GB, as storage is generally inexpensive.
  • Reason:
    • Document Storage (MinIO): Holds company documents (e.g., PDFs).
    • Vector Database (Qdrant): Stores document embeddings, which scale with the number and size of indexed documents.
    • Docker images and container logs require additional space.
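The sizing rule above (a 10 GB base plus roughly 2.5× the indexed document size) is easy to sanity-check with a small illustrative helper:

```python
def estimated_disk_gb(indexed_docs_gb: float,
                      base_gb: float = 10.0,
                      multiplier: float = 2.5) -> float:
    """Estimate disk needs: fixed base plus ~2.5x the indexed document size."""
    return base_gb + multiplier * indexed_docs_gb

# 8 GB of source documents -> 10 + 2.5 * 8 = 30 GB
print(estimated_disk_gb(8.0))
```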

Qdrant Consideration

For persistent storage, Qdrant requires block-level access to storage devices with a POSIX-compatible file system. Network systems such as iSCSI that provide block-level access are also acceptable.

Qdrant won't work with network file systems such as NFS, or object storage systems such as S3.

If you offload vectors to a local disk, we recommend you use a solid-state (SSD or NVMe) drive.

4. GPU

  • Requirement: Not Required.

5. Networking

The following ports are used by the services:

  • 1433: For communicating with SQL Server as the database
  • 6379: For communicating with Redis as the cache
  • 6333: For communicating with Qdrant as the index via its HTTP API, including the monitoring, health, and metrics endpoints
  • 6334: For communicating with Qdrant via gRPC
  • 9000: For communicating with the MinIO API as object_storage
  • 9001: For accessing the MinIO web console
  • 5000: For communicating with the backend api_server
  • 3000: For the frontend web interface
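To verify which of these ports are reachable on a host, a small TCP probe works. This is a generic sketch; the host name and timeout are assumptions:

```python
import socket

# (service, port) pairs from the list above.
PORTS = [
    ("SQL Server", 1433),
    ("Redis", 6379),
    ("Qdrant HTTP", 6333),
    ("Qdrant gRPC", 6334),
    ("MinIO", 9000),
    ("MinIO", 9001),
    ("api_server", 5000),
    ("frontend", 3000),
]

def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in PORTS:
        state = "open" if is_open("localhost", port) else "closed"
        print(f"{name:<12} {port:>5} {state}")
```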

Details on Resource Consumption

| Component | CPU | Memory | Notes |
|---|---|---|---|
| api_server | 1 CPU | 1 GB | Handles query processing and LLM integration. |
| background | 1 CPU | 256 MB | For document embedding and background tasks. |
| database (SQL) | 500m CPU | 2 GB | Can share existing SQL resources to save costs. |
| qdrant | 1 CPU | 256 MB | Scales with document volume; higher recommended. |
| nginx + frontend | 250m CPU | 512 MB | For serving the UI and reverse proxy setup. |

Scaling Considerations

For heavier workloads or larger datasets, consider:

  • Expanding CPU cores (e.g., 6–8 cores) to handle higher query throughput.
  • Increasing memory to 16 GB for processing larger datasets or running concurrent queries.
  • Reserving additional disk space to accommodate future growth in indexed documents.

Kubernetes or AWS ECS

For more fine-grained control and efficient scaling, you can deploy each component to its own dedicated resources in a Kubernetes or AWS ECS environment.

NOTE: We do not currently support deployment on Kubernetes. These numbers are for reference only.

| Component | CPU | Memory | Notes |
|---|---|---|---|
| api_server | 1 CPU | 1 GB | Handles query processing and LLM integration. |
| background | 1 CPU | 256 MB | For document embedding and background tasks. |
| database (SQL) | 500m CPU | 2 GB | Can share existing SQL resources to save costs. |
| qdrant | 1 CPU | 256 MB | Scales with document volume; higher recommended. |
| nginx + frontend | 250m CPU | 512 MB | For serving the UI and reverse proxy setup. |

General guidelines:

  1. Component Separation: Deploy each major component (api_server, background, database, qdrant, nginx, frontend) as individual Kubernetes Deployments or StatefulSets for easier scaling and management.

  2. Resource Requests and Limits: Use Kubernetes resource requests and limits to ensure fair CPU and memory allocation:

    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1"
  3. Persistent Storage: Use Persistent Volume Claims (PVCs) for storage-heavy components like:

    • Qdrant: Store vector embeddings persistently.
    • MinIO: For document storage.
    • Example:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: qdrant-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  4. Load Balancer: Use an Ingress Controller (e.g., NGINX) to manage external traffic to the frontend and api_server.

    • Example:

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: app-ingress
      spec:
        rules:
          - host: app.example.com
            http:
              paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: frontend-service
                      port:
                        number: 80
  5. Autoscaling: Use a Horizontal Pod Autoscaler (HPA) to scale critical components based on CPU or memory usage:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api-server-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
  6. Configuration Management: Store sensitive information (API keys, database credentials) in Secrets and other configuration in ConfigMaps:

    apiVersion: v1
    kind: Secret
    metadata:
      name: api-secrets
    type: Opaque
    data:
      OPENAI_API_KEY: <base64_encoded_key>
  7. Monitoring and Logging: Use tools like Prometheus, Grafana, and Fluentd to monitor resource usage and collect logs for debugging.

  8. External Dependencies: If using external LLM services (e.g., OpenAI API), ensure network egress rules and API rate limits are configured properly.

  9. Helm or Kustomize: Use Helm charts or Kustomize to template and manage complex Kubernetes configurations for better reusability and versioning.