Le Kiet
Software Engineer | Data Scientist

Architecture

Just a random guy who loves coding.

Overview

This document provides a high-level overview of the architecture of our Large Language Model (LLM) application, which helps back-office employees get intelligent answers to their queries. The application combines an LLM, document embeddings, semantic search, and caching to deliver fast, accurate answers grounded in company documents. Each component is containerized, which makes the system modular, scalable, and easy to deploy.

Architecture

Architecture Diagram

The system is organized into five layers:

  • User Interface layer
  • API & Backend layer
  • LLM & Embedding layer
  • Data Management layer
  • Document Processing layer

Components

1. User Interface layer

Web Frontend

  • Service: Next.js
  • Function: Provides the primary interface through which employees interact with the assistant. The frontend is responsive and optimized for a seamless user experience.

Zalo Chatbot

  • Service: Zalo API
  • Function: Provides an additional chat interface through Zalo, a popular Vietnamese messaging platform. Chatbot traffic is proxied through Nginx for better performance and security.

2. API & Backend layer

Web Server

  • Service: Nginx
  • Function: Acts as a reverse proxy that routes requests from the frontend to the backend API server. Nginx also adds a security layer and can load-balance traffic across API instances, helping the system handle high request volumes.

API Server

  • Service: FastAPI
  • Function: The core backend server, handling API requests, managing cache checks, interfacing with the LLM, and orchestrating document retrieval. FastAPI’s asynchronous capabilities allow the API to handle multiple requests efficiently, making it suitable for real-time applications.

3. LLM & Embedding layer

Large language model

  • Service: OpenAI API or a self-hosted LLM
  • Function: The primary LLM for answering user queries. The model receives user queries (with context provided from relevant documents) and generates human-like responses.

Embedding model

  • Service: OpenAI API or a self-hosted embedding model
  • Function: Maps text to vector embeddings, which are essential for similarity search. This component lets the system find and rank the documents most relevant to a user’s query. Choosing between an OpenAI model (GPT) and PhoBERT, a pretrained language model for Vietnamese, provides flexibility for multilingual or language-specific embedding needs.

4. Data Management layer

LLM Cache

  • Service: Redis
  • Function: Caches responses from the LLM based on query embeddings, minimizing redundant API calls. This significantly reduces response latency for frequently asked questions and optimizes API usage costs.
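Keying the cache on query embeddings means a reworded question can reuse an earlier answer. The minimal sketch below assumes cosine similarity and a hypothetical threshold of 0.95. A plain Python list stands in for Redis, and the SemanticCache class is illustrative rather than part of the original design.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory stand-in for an LLM cache keyed by query embeddings."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off; tune on real queries
        self.entries: list[tuple[list[float], str]] = []

    def get(self, query_vec: list[float]) -> Optional[str]:
        """Return the answer for the most similar cached query if it clears the threshold."""
        best_score, best_answer = -1.0, None
        for vec, answer in self.entries:  # linear scan, acceptable for a sketch
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_vec: list[float], answer: str) -> None:
        """Store an answer under the embedding of the query that produced it."""
        self.entries.append((query_vec, answer))


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # near-duplicate query: returns the cached answer
print(cache.get([0.0, 1.0]))    # unrelated query: None, so the LLM must be called
```

A linear scan is enough to show the idea. At scale the lookup would more likely use a vector index, and the threshold trades the hit rate against the risk of returning an answer to a subtly different question.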

Vector Database

  • Service: Qdrant
  • Function: Stores vector embeddings generated from documents and supports semantic search. When a query is received, Qdrant enables the system to quickly find and return similar documents based on vector similarity.
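Qdrant answers these queries with approximate nearest-neighbour indexes (HNSW), which keeps search fast at scale. The brute-force sketch below only shows the quantity being ranked: cosine similarity between the query embedding and each stored document embedding. The document IDs and 3-dimensional vectors are hypothetical toy data.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search by cosine similarity, best match first."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical document IDs with toy embeddings.
docs = {
    "leave-policy.pdf": [0.9, 0.1, 0.0],
    "it-faq.pdf": [0.0, 1.0, 0.2],
    "payroll-guide.pdf": [0.7, 0.3, 0.1],
}
for doc_id, score in top_k([1.0, 0.0, 0.0], docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```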

Relational Database

  • Service: PostgreSQL
  • Function: Manages structured data, including user information, permissions, and system metadata. This database is essential for tracking user access and managing query history and other operational data.

5. Document Processing layer

Document Storage

  • Service: MinIO
  • Function: Stores documents (in PDF format) sourced from the company’s website or other repositories. MinIO, an S3-compatible object storage solution, offers scalable and private storage for company documents.

Background Task Queue

  • Service: Celery
  • Function: Orchestrates background tasks, including document fetching, processing, and embedding. This allows the system to handle these tasks asynchronously, ensuring that document indexing does not impact the main query processing flow.
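The original does not say how documents are split before embedding. A common approach, assumed here, is to cut the extracted text into overlapping fixed-size chunks. Each embedding then covers a passage short enough to match a specific question. The function name and the default sizes are hypothetical. In this architecture, such a step could run inside a Celery task after text is extracted from a PDF stored in MinIO.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary present in both neighbouring chunks. Production pipelines often split on sentences or tokens instead of raw characters.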

Workflow

  1. User Query:
    • A user submits a query through the Web Frontend or the Zalo Chatbot.
  2. Request Handling:
    • The query is routed through the Web Server (Nginx) to the API Server (FastAPI).
  3. Cache Lookup:
    • The API Server checks the LLM Cache (Redis) for a stored response to this query.
    • On a cache hit, the cached response is returned immediately, reducing latency and API costs.
  4. Embedding and Document Retrieval:
    • On a cache miss, the Embedding Model embeds the query, and the resulting vector is sent to the Vector Database (Qdrant).
    • Qdrant performs a similarity search and returns the most relevant documents.
  5. LLM Query:
    • The API Server forwards the user’s query, together with the retrieved document context, to the Large Language Model (OpenAI API) for response generation.
    • The response is then cached in Redis for future use.
  6. Background Document Processing:
    • Independently of user requests, Celery periodically fetches new documents from the company’s website, converts them into vector embeddings with the Embedding Model, and stores the embeddings in the Vector Database for future query matching.
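Steps 3 to 5 of the request path can be sketched as a single function. This is a hypothetical outline, not the project's code. The parameters embed, search, and generate are placeholders for the Embedding Model, Qdrant, and LLM clients, and a plain dict stands in for Redis. Because the workflow checks the cache before embedding, the sketch keys the cache on normalized query text, which is an assumption. An embedding-keyed cache would need the query embedded first.

```python
from typing import Callable


def answer_query(
    query: str,
    cache: dict[str, str],
    embed: Callable[[str], list[float]],
    search: Callable[[list[float]], list[str]],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Answer a query: cache lookup, then embed, search, generate, and cache the result."""
    key = " ".join(query.lower().split())  # hypothetical key: lowercased, whitespace collapsed
    if key in cache:                       # Cache Lookup: a hit skips retrieval and the LLM
        return cache[key]
    vector = embed(query)                  # Embedding: turn the query into a vector
    context = search(vector)               # Document Retrieval: similar documents
    answer = generate(query, context)      # LLM Query: generate with the retrieved context
    cache[key] = answer                    # store the answer for future requests
    return answer


# Usage with stub components standing in for the real services.
llm_calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    return [float(len(text))]


def fake_search(vector: list[float]) -> list[str]:
    return ["(excerpt from a policy document)"]


def fake_generate(query: str, context: list[str]) -> str:
    llm_calls.append(query)
    return f"Answer based on: {context[0]}"


cache: dict[str, str] = {}
answer_query("How many leave days do I get?", cache, fake_embed, fake_search, fake_generate)
answer_query("how many  leave days do I get?", cache, fake_embed, fake_search, fake_generate)
print(len(llm_calls))  # 1: the second query normalizes to the same key and hits the cache
```

Passing the services in as functions keeps the orchestration testable without Redis, Qdrant, or an LLM running.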

Acknowledgements

This application builds on the following open-source projects:

  • NextJS for a dynamic and responsive front-end.
  • FastAPI for a high-performance, asynchronous API backend.
  • Redis for caching responses and reducing latency.
  • Qdrant as a vector database for semantic search.
  • MinIO for scalable object storage of documents.
  • Celery for managing background tasks, so that document processing does not degrade the real-time user experience.

This high-level architecture is designed to support a fast, scalable, and cost-effective solution for assisting employees with document-based query responses. It leverages modern microservices architecture principles, providing a robust foundation for further enhancements.

Naming Conventions


A well-defined Git naming convention enhances collaboration, supports automated versioning, and simplifies project maintenance. This document outlines recommended naming conventions for branches, commits, and tags to facilitate a smooth workflow and compatibility with Semantic Release.

Commit Conventions

0. Scopes

Backend Scopes (Python, FastAPI)

  • api: General API-related changes, such as routes or request handling.
    • Example: fix(api): correct token validation issue
  • auth: Authentication and authorization logic.
    • Example: feat(auth): add OAuth2 support
  • llm: LLM-specific logic, including model loading and response generation.
    • Example: perf(llm): optimize LLM response caching
  • database: Database models, migrations, or queries.
    • Example: fix(database): correct migration script for user model
  • schemas: Pydantic or data validation schemas.
    • Example: refactor(schemas): update response schema for LLM results
  • middleware: FastAPI middleware (e.g., CORS, logging).
    • Example: chore(middleware): add logging middleware for request tracing
  • background: Background tasks or scheduled jobs.
    • Example: feat(background): add scheduled job to refresh model
  • config: Configuration files or environment variable updates.
    • Example: chore(config): add env variable for LLM model path
  • testing: Unit and integration tests for backend logic.
    • Example: test(testing): add tests for LLM response accuracy
  • deps: Dependency updates or additions.
    • Example: chore(deps): upgrade FastAPI to latest version

Frontend Scopes (JavaScript, Next.js)

  • ui: General UI/UX changes, layout adjustments, and styling.
    • Example: style(ui): improve layout for model response view
  • auth: Frontend authentication handling, including login and token management.
    • Example: feat(auth): add persistent session handling
  • api: Frontend API calls or data fetching logic.
    • Example: refactor(api): update request headers for secured endpoint
  • llm-display: UI components specifically for displaying LLM responses.
    • Example: feat(llm-display): add loading spinner for LLM responses
  • forms: Form components for user inputs, especially related to LLM queries.
    • Example: fix(forms): correct validation error messages
  • state: Application state management (e.g., Redux, context API).
    • Example: chore(state): add state management for user settings
  • config: Frontend configuration, environment variables, or Next.js settings.
    • Example: chore(config): add environment variable for API base URL
  • deps: Frontend dependencies and package management.
    • Example: chore(deps): update Next.js to latest version
  • i18n: Localization and language support.
    • Example: feat(i18n): add support for Spanish language
  • testing: Unit or end-to-end tests for frontend components.
    • Example: test(testing): add Jest tests for LLM response UI
  • seo: SEO-related adjustments (e.g., meta tags, titles).
    • Example: chore(seo): update meta description for homepage

Full-Stack or Shared Scopes

  • docker: Docker configuration and setup.
    • Example: chore(docker): optimize Dockerfile for smaller image size
  • docs: Documentation changes for both backend and frontend.
    • Example: docs(docs): update README with API usage examples
  • env: Environment variables or configuration shared between backend and frontend.
    • Example: chore(env): add new environment variable for model type
  • build: Build scripts or deployment configurations.
    • Example: chore(build): optimize build for production deployment
  • ci: Continuous Integration setup or updates.
    • Example: chore(ci): add automated test for LLM endpoint responses

1. Branch Naming Conventions

Branch naming conventions help distinguish between different types of work and keep the Git repository organized. Each branch name should reflect its purpose and scope, using a clear and consistent naming pattern.

Primary Branches (Default Branches)

Primary branches represent key stages of the project lifecycle:

  • main: The main branch contains the latest stable production-ready code.

Feature Branches

Feature branches are used to work on new features or enhancements in isolation from the main codebase.

Pattern: feat/{issue-id}/{short-description}

Examples:

  • feat/102/add-user-authentication
  • feat/145/improve-dashboard-ui

Guidelines:

  • Issue ID: Include the issue or task ID if using a tracking tool like Jira or GitHub Issues.
  • Short Description: Use a concise description in kebab-case (lowercase, separated by hyphens) to describe the purpose of the feature.

Hotfix and Bugfix Branches

Hotfix branches address urgent issues in production, while bugfix branches address issues found during development.

Hotfix Branches

Pattern: hotfix/{short-description}

Examples:

  • hotfix/fix-critical-auth-bug
  • hotfix/remove-duplicate-entries

Bugfix Branches

Pattern: bugfix/{issue-id}/{short-description}

Examples:

  • bugfix/203/correct-profile-picture-upload
  • bugfix/207/fix-login-loop-error
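The branch patterns above are regular enough to check automatically, for example in a CI job or a pre-push hook. The minimal sketch below accepts numeric issue IDs as in the examples. It also accepts Jira-style keys such as PROJ-102, which is an assumed format; the source only says "issue or task ID". The is_valid_branch function is hypothetical.

```python
import re

KEBAB = r"[a-z0-9]+(?:-[a-z0-9]+)*"    # kebab-case: lowercase words joined by hyphens
ISSUE = r"(?:\d+|[A-Z][A-Z0-9]*-\d+)"  # 102, or an assumed Jira-style key like PROJ-102

# One alternative per documented pattern; fullmatch anchors the whole name.
BRANCH_RE = re.compile(
    r"main"
    rf"|feat/{ISSUE}/{KEBAB}"
    rf"|bugfix/{ISSUE}/{KEBAB}"
    rf"|hotfix/{KEBAB}"
)


def is_valid_branch(name: str) -> bool:
    """True if the branch name matches one of the documented patterns."""
    return BRANCH_RE.fullmatch(name) is not None


for name in ["feat/102/add-user-authentication", "hotfix/fix-critical-auth-bug", "Feature/Add_Login"]:
    print(name, is_valid_branch(name))
```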

2. Commit Message Conventions

Using semantic and structured commit messages ensures readability, consistency, and compatibility with Semantic Release tools, enabling automated versioning.


Commit Message Structure

A standard commit message follows this format:

<type>(<scope>): <subject>
  • type: Specifies the type of change (e.g., feat, fix) to indicate its impact on the code.
  • scope: Indicates the module or area affected (optional).
  • subject: A brief, imperative description of the change.
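In practice, tools such as commitlint enforce this format. The parser below only illustrates the structure. It is a sketch that accepts the commit types listed below and lowercase, hyphenated scopes like those in the scope lists above. The parse_header name and the Header tuple are hypothetical.

```python
import re
from typing import NamedTuple, Optional

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "perf", "chore", "build")

HEADER_RE = re.compile(
    r"(?P<type>[a-z]+)"                # type, e.g. feat
    r"(?:\((?P<scope>[a-z0-9-]+)\))?"  # optional scope, e.g. (llm-display)
    r": (?P<subject>.+)"               # colon, one space, then the subject
)


class Header(NamedTuple):
    type: str
    scope: Optional[str]
    subject: str


def parse_header(line: str) -> Optional[Header]:
    """Parse '<type>(<scope>): <subject>'; return None if the line does not conform."""
    match = HEADER_RE.fullmatch(line)
    if match is None or match["type"] not in TYPES:
        return None
    return Header(match["type"], match["scope"], match["subject"])


print(parse_header("feat(auth): add OAuth2 support"))  # type, scope, and subject split out
print(parse_header("fixed the login bug"))             # None: no type prefix
```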

Semantic Commit Types

The following commit types align with Semantic Versioning. Semantic Release reads them to decide automatically which part of the version number to increase.

  • feat: A new feature (increases the MINOR version).
  • fix: A bug fix (increases the PATCH version).
  • docs: Documentation changes only.
  • style: Changes in code formatting, not affecting code behavior.
  • refactor: Refactoring code without affecting functionality.
  • test: Adding or modifying tests.
  • perf: Code changes that improve performance.
  • chore: Routine tasks, maintenance, or build changes.
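  • build: Changes affecting the build system or dependencies.
  • A commit whose footer contains BREAKING CHANGE: marks an incompatible change (increases the MAJOR version).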
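A rough sketch of how a release tool could derive the next version from a batch of commits. It assumes Semantic Release's default (Angular preset) rules. Under those rules, feat triggers a minor release, fix and perf trigger a patch, and a BREAKING CHANGE: footer triggers a major release. The real tool uses a full commit parser rather than these regular expressions. The function names are hypothetical.

```python
import re
from typing import Optional

RANK = {"patch": 1, "minor": 2, "major": 3}


def release_type(messages: list[str]) -> Optional[str]:
    """Largest version bump implied by a batch of commit messages, or None for no release."""
    bump: Optional[str] = None
    for message in messages:
        header = message.splitlines()[0] if message else ""
        if "BREAKING CHANGE:" in message:  # simplified: real parsers only look at footers
            candidate = "major"
        elif re.match(r"feat(\([^)]+\))?: ", header):
            candidate = "minor"
        elif re.match(r"(fix|perf)(\([^)]+\))?: ", header):
            candidate = "patch"
        else:
            continue  # docs, style, refactor, test, chore, build: no release on their own
        if bump is None or RANK[candidate] > RANK[bump]:
            bump = candidate
    return bump


def next_version(current: str, bump: str) -> str:
    """Apply a bump to a plain vMAJOR.MINOR.PATCH tag (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"


commits = [
    "docs(readme): update installation instructions",
    "fix(ui): correct layout issue on mobile navbar",
    "feat(auth): add OAuth2 support for social login",
]
bump = release_type(commits)
print(bump, next_version("v1.2.1", bump))  # minor v1.3.0
```

The highest-ranked change in a batch wins. That is why one feat commit among several fixes still produces a minor release.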

Best Practices for Commit Messages

Examples:

feat(auth): add OAuth2 support for social login
fix(ui): correct layout issue on mobile navbar
docs(readme): update installation instructions

Guidelines:

  • Use the imperative mood (e.g., “fix,” not “fixed”).
  • Keep subjects concise (50 characters or fewer).
  • If further explanation is needed, add a body separated from the subject line by a blank line.
  • Avoid WIP (Work In Progress) commits on the main branch.
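Most of these guidelines can be checked mechanically, for example in a commit-msg hook. The sketch below assumes "subject" means the text after "<type>(<scope>): ", as in the structure definition above. It does not try to detect the imperative mood, which needs more than a simple rule. The lint_commit function is hypothetical.

```python
import re


def lint_commit(message: str, max_subject: int = 50) -> list[str]:
    """Return guideline violations for a commit message; an empty list means it passes."""
    problems: list[str] = []
    lines = message.splitlines()
    header = lines[0] if lines else ""
    subject = header.split(": ", 1)[-1]  # text after "<type>(<scope>): "
    if not subject.strip():
        problems.append("missing subject")
    if len(subject) > max_subject:
        problems.append(f"subject is {len(subject)} characters (limit {max_subject})")
    if re.search(r"\bwip\b", header, re.IGNORECASE):  # whole word only, so "wipe" passes
        problems.append("WIP commits should not land on the main branch")
    if len(lines) > 1 and lines[1].strip():
        problems.append("separate the subject from the body with a blank line")
    return problems


print(lint_commit("feat(auth): add OAuth2 support for social login"))  # []
print(lint_commit("chore: WIP refactor\nstill broken"))
```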

3. Tag Naming Conventions

Tags mark specific points in the project’s history, typically used for release versions. Follow Semantic Versioning format for tagging releases.

Release Tags

Semantic Release relies on tags for versioning. Tags should be consistent, indicating the major, minor, and patch levels as vMAJOR.MINOR.PATCH.

Pattern: v<MAJOR>.<MINOR>.<PATCH>

Examples:

v1.0.0
v1.2.1
v2.0.0-alpha

Notes:

  • Pre-release tags: Use pre-release identifiers (e.g., alpha, beta) for early versions (e.g., v1.0.0-alpha).
  • Automated Tagging: When using Semantic Release, tags are automatically generated based on commit messages, ensuring consistent versioning.
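Semantic Release creates the tags, but release scripts sometimes need to find the latest one. Plain string sorting gets this wrong: "v10.0.0" sorts before "v2.0.0". The sketch below parses the v<MAJOR>.<MINOR>.<PATCH> pattern with an optional pre-release suffix. It orders tags by SemVer 2.0.0 precedence, so v2.0.0-alpha ranks below v2.0.0. Build metadata (+...) is deliberately not handled, and the precedence function is hypothetical.

```python
import re

TAG_RE = re.compile(
    r"v(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<pre>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"  # optional pre-release, e.g. -alpha.1
)


def precedence(tag: str) -> tuple:
    """Sort key following SemVer 2.0.0 precedence for tags like v1.2.3 or v2.0.0-alpha."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag}")
    core = (int(m["major"]), int(m["minor"]), int(m["patch"]))
    if m["pre"] is None:
        return core + (1, ())  # a release ranks above all of its pre-releases
    # Numeric identifiers compare as numbers and rank below alphanumeric ones.
    ids = tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in m["pre"].split(".")
    )
    return core + (0, ids)


tags = ["v2.0.0", "v1.2.1", "v2.0.0-alpha", "v1.0.0", "v2.0.0-beta", "v2.0.0-alpha.1"]
print(sorted(tags, key=precedence))
# ['v1.0.0', 'v1.2.1', 'v2.0.0-alpha', 'v2.0.0-alpha.1', 'v2.0.0-beta', 'v2.0.0']
```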

Roadmap


This milestone plan is structured to guide the development of a full-stack LLM application from ideation to full public launch, focusing on core functionality, testing, and iterative feedback.

Summary Table of Milestones

| Milestone | Goal | Key Deliverables |
|---|---|---|
| 1. Discovery and Planning | Define scope and architecture | Requirements doc, architecture diagram |
| 2. Core Infrastructure Setup | Set up essential infrastructure | Storage, databases, task queue |
| 3. Prototype Completion | Develop basic LLM functionality | Basic functional prototype |
| 4. Core Features Developed | Add document processing, caching, search | Feature-complete system |
| 5. Alpha Release | Internal testing and feedback | Stable alpha version |
| 6. Beta Release | Limited user testing and feedback | Beta version, user feedback |
| 7. Performance Optimization | Optimize performance and security | Optimized, secure application |
| 8. Public Launch Prep | Finalize for public release | Launch-ready product, documentation |
| 9. Public Launch | Release to all users | Publicly accessible application |
| 10. Post-Launch Improvements | Minor enhancements and bug fixes | Improved version with fixes |
| 11. Scaling and Maintenance | Ensure scalability and plan future growth | Scalable, maintained application |

Milestone 1: Discovery and Planning

  • Goal: Define project requirements, key objectives, and architecture.
  • Details:
    • Conduct research on user needs, LLM capabilities, and expected system load.
    • Define project scope, including the primary features and user stories.
    • Outline initial system architecture (frontend, backend, cache, databases).
    • Set up project repository, CI/CD pipelines, and preliminary development environment.
  • Deliverables: Requirements document, architecture diagram, project timeline, and initial codebase setup.

Milestone 2: Core Infrastructure Setup

  • Goal: Establish core infrastructure, including storage, databases, and background processing.
  • Details:
    • Set up and configure Document Storage (AWS S3 or MinIO), Vector Database (Qdrant), and Relational Database (PostgreSQL).
    • Configure the background task queue (Celery, or a managed alternative such as AWS Step Functions) for asynchronous tasks like embedding documents.
    • Establish serverless or managed services for storage and databases to reduce costs and improve scalability.
  • Deliverables: Functional infrastructure with integration tests confirming connectivity and data storage.

Milestone 3: Prototype Completion

  • Goal: Develop a functional prototype with basic LLM and embedding model integration.
  • Details:
    • Integrate the LLM API (e.g., OpenAI API) and embedding model (GPT or PhoBERT) for basic query processing.
    • Build a simple frontend UI (NextJS) for users to submit queries and receive responses.
    • Implement a basic API server (FastAPI on AWS Lambda) with reverse proxy (Nginx).
    • Test end-to-end flow from user input to LLM response and back.
  • Deliverables: Basic functional prototype enabling user queries and AI-generated responses.

Milestone 4: Core Features Developed

  • Goal: Implement essential features like document processing, caching, and vector search.
  • Details:
    • Implement background tasks for document embedding and storage in the Vector Database.
    • Add caching (Redis) for frequently accessed queries to improve response times.
    • Build vector-based search functionality for finding similar documents based on embeddings.
    • Refine UI and backend to support these features with scalability in mind.
  • Deliverables: Feature-complete application with document embedding, caching, and search.

Milestone 5: Alpha Release (Internal Testing)

  • Goal: Launch an internal version for testing core functionalities and performance.
  • Details:
    • Deploy to a test environment with restricted access for internal team members.
    • Conduct usability testing for UI, API reliability, and system stability.
    • Run initial performance tests to check if response times and throughput meet expectations.
    • Collect internal feedback to refine UX and resolve major issues.
  • Deliverables: Stable alpha version with feedback from internal testing and an updated backlog.

Milestone 6: Beta Release (Limited User Testing)

  • Goal: Release a beta version for a limited audience to gather real user feedback.
  • Details:
    • Deploy to a production-like environment with limited access for select users.
    • Implement logging and monitoring to capture usage patterns, errors, and bottlenecks.
    • Integrate a feedback system to gather insights on user experience and response accuracy.
  • Deliverables: Beta version with logging and analytics, and documented user feedback.

Milestone 7: Performance and Security Optimization

  • Goal: Optimize the system for speed, efficiency, and security.
  • Details:
    • Optimize caching for frequently accessed responses.
    • Review and optimize vector database queries for faster results.
    • Conduct a security audit focused on data protection, authentication, and permissions.
    • Run load testing to confirm system scalability.
  • Deliverables: Performance-optimized, secure application ready for a wider release, with testing reports.

Milestone 8: Public Launch Preparation

  • Goal: Finalize all details for a successful public release.
  • Details:
    • Complete final testing, including UAT (User Acceptance Testing), to ensure functional readiness.
    • Finalize documentation for users and developers, including API docs and setup guides.
    • Prepare marketing and user onboarding materials, if needed.
    • Establish support channels for post-launch user queries.
  • Deliverables: Launch-ready product, comprehensive documentation, and user onboarding resources.

Milestone 9: Version 1.0 - Public Launch

  • Goal: Release the application to all intended users.
  • Details:
    • Officially launch the application with monitoring for critical issues.
    • Monitor user activity and metrics for the first 24-48 hours to ensure stability.
    • Provide support for major issues with a rapid response plan.
  • Deliverables: Full public release with active user support and monitoring.

Milestone 10: Post-Launch Improvements (Version 1.1)

  • Goal: Address post-launch feedback and add minor improvements.
  • Details:
    • Collect and analyze public feedback to identify improvement areas.
    • Resolve post-launch issues or bugs based on user reports and monitoring.
    • Implement minor feature enhancements or optimizations.
  • Deliverables: Improved version with bug fixes, optimizations, and minor enhancements.

Milestone 11: Scaling and Maintenance (Ongoing)

  • Goal: Ensure the application is scalable and well-maintained.
  • Details:
    • Review performance under varying loads to identify scaling needs.
    • Set up maintenance plans for databases, storage, and serverless resources to optimize costs.
    • Prioritize future feature requests for competitive advantage.
  • Deliverables: Scalable, well-maintained application with a roadmap for future updates.

This plan provides a structured path from initial development to launch and ongoing maintenance, helping ensure that each stage is well-prepared and aligned with the project’s goals.

donut ver two

Andy Sloane
A very cool person
_,x,y,o       ,N;char       b[1840]       ;p(n,c)
{for(;n --;x++) c==10?y +=80,x=
o-1:x>= 0?80>x? c!='~'? b[y+x]=
c:0:0:0 ;}c(q,l ,r,o,v) char*l,
*r;{for (;q>=0; )q=("A" "YLrZ^"
"w^?EX" "novne" "bYV" "dO}LE"
"{yWlw" "Jl_Ja|[ur]zovpu" "" "i]e|y"
"ao_Be" "osmIg}r]]r]m|wkZU}{O}" "xys]]\
x|ya|y" "sm||{uel}|r{yIcsm||ya[{uE" "{qY\
w|gGor" "VrVWioriI}Qac{{BIY[sXjjsVW]aM" "T\
tXjjss" "sV_OUkRUlSiorVXp_qOM>E{BadB"[_/6 ]-
62>>_++ %6&1?r[q]:l[q])-o;return q;}E(a){for (
o= x=a,y=0,_=0;1095>_;)a= " <.,`'/)(\n-" "\\_~"[
c (12,"!%*/')#3" "" "+-6,8","\"(.$" "01245"
" &79",46)+14], p("" "#$%&'()0:439 "[ c(10
, "&(*#,./1345" ,"')" "+%-$02\"! ", 44)+12]
-34,a); }main(k){float A=0,B= 0,i,j,z[1840];
puts("" "\x1b[2J");;; for(;; ){float e=sin
(A), n= sin(B),g=cos( A),m= cos(B);for(k=
0;1840> k;k++)y=-10-k/ 80 ,o=41+(k%80-40
)* 1.3/y+n,N=A-100.0/y,b[k]=".#"[o+N&1], z[k]=0;
E( 80-(int)(9*B)%250);for(j=0;6.28>j;j +=0.07)
for (i=0;6.28>i;i+=0.02){float c=sin( i), d=
cos( j),f=sin(j),h=d+2,D=15/(c*h*e+f *g+5),l
=cos(i) ,t=c*h*g-f*e;x=40+2*D*(l*h* m-t*n
),y=12+ D *(l*h*n+t*m),o=x+80*y,N =8*((f*
e-c*d*g )*m -c*d*e-f*g-l*d*n) ;if(D>z
[o])z[o ]=D,b[ o]=" ." ".,,-+"
"+=#$@" [N>0?N: 0];;;;} printf(
"%c[H", 27);for (k=1;18 *100+41
>k;k++) putchar (k%80?b [k]:10)
;;;;A+= 0.053;; B+=0.03 ;;;;;}}

(as with the first one, compile it with -lm, and it needs ANSI-ish terminal emulation)

have a donut

Andy Sloane

(compile with gcc -o donut donut.c -lm, and it needs ANSI- or VT100-like emulation)

             k;double sin()
,cos();main(){float A=
0,B=0,i,j,z[1760];char b[
1760];printf("\x1b[2J");for(;;
){memset(b,32,1760);memset(z,0,7040)
;for(j=0;6.28>j;j+=0.07)for(i=0;6.28
>i;i+=0.02){float c=sin(i),d=cos(j),e=
sin(A),f=sin(j),g=cos(A),h=d+2,D=1/(c*
h*e+f*g+5),l=cos (i),m=cos(B),n=s\
in(B),t=c*h*g-f* e;int x=40+30*D*
(l*h*m-t*n),y= 12+15*D*(l*h*n
+t*m),o=x+80*y, N=8*((f*e-c*d*g
)*m-c*d*e-f*g-l *d*n);if(22>y&&
y>0&&x>0&&80>x&&D>z[o]){z[o]=D;;;b[o]=
".,-~:;=!*#$@"[N>0?N:0];}}/*#****!!-*/
printf("\x1b[H");for(k=0;1761>k;k++)
putchar(k%80?b[k]:10);A+=0.04;B+=
0.02;}}/*****####*******!!=;:~
~::==!!!**********!!!==::-
.,~~;;;========;;;:~-.
..,--------,*/

(This was my first attempt at obfuscated C, and I feel it's pretty amateurish; see Donut Mark II for a more impressive demo, though this one is simple and elegant in comparison.)