## Tech projects

---

## Summary

1. How I work
   - My principles
   - Development process
2. Projects
   - How to save event data to build a batch processing pipeline
   - Migrate an OS build from manual UI ([Cubic](https://github.com/PJ-Singh-001/Cubic)) to a fully automated build system

---

## My principles

* [KISS](https://en.wikipedia.org/wiki/KISS_principle)
* [Lean Software Development](https://en.wikipedia.org/wiki/Lean_software_development)

---

## Development process

- Create an epic and an epic specification issue
- Collect requirements
- Build the needed proofs of concept
- Refine the epic and user stories
- Integrate user stories into the codebase
- Track analytics

---

## How to save event data to build a batch processing pipeline

---

### Context

XXII creates events for certain actions that the LLM analyzes in each video frame.

---

### Need

We want to compute statistics from event data.

Need: save raw event data to be able to:

- Create a batch data processing pipeline
- Replay any event when needed

---

### Requirements

- Easily scalable
- Low maintenance
- Low cost

---

### Where to save data?

- [Object Storage (S3)](https://en.wikipedia.org/wiki/Object_storage)
- [PostgreSQL](https://www.postgresql.org/)
- [OLAP database](https://en.wikipedia.org/wiki/Online_analytical_processing)

Choice: `Object Storage (S3)`

---

### How to save data inside the Object Storage (S3) bucket?

- Raw ([JSON Lines](https://jsonlines.org/))
- [Apache Parquet](https://parquet.apache.org/)

Choice: `Raw`

---

### How to create a batch data processing pipeline?

- [Kafka](https://kafka.apache.org/)
- Cloud Queue ([AWS SQS](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html), [GCP Pub/Sub](https://cloud.google.com/pubsub/), etc.)
- [Redis](https://redis.io/docs/)
- [RabbitMQ](https://www.rabbitmq.com/)
- [NATS](https://docs.nats.io/)
- [PostgreSQL](https://www.postgresql.org/)

Choice: `PostgreSQL as Queue`

---

### How to use PostgreSQL as a queueing system?

Simple table, delete rows when processed

```sql
CREATE TABLE raw_events_queue (
    id uuid PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL,
    event_payload jsonb NOT NULL
);
```

---

### Index bloat problem

How to prevent it?

- Edit [autovacuum](https://www.postgresql.org/docs/current/runtime-config-vacuum.html) settings
- Use [table partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html) ([pg_partman](https://github.com/pgpartman/pg_partman))
- Use [TRUNCATE](https://www.postgresql.org/docs/current/sql-truncate.html) instead of [DELETE](https://www.postgresql.org/docs/current/sql-delete.html)

---

### Implementation

```mermaid
flowchart LR
    A[Appliance] -->|push events| B[Cloud Pipeline]
    B -->|store events| C[(Postgres Queue)]
    C -->|every X time| D[CronJob]
    D -->|store events| E[(S3 file)]
```
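
The CronJob step in the diagram above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the function names, the object key layout, and the psycopg-style `%(limit)s` placeholder are hypothetical, and only the `raw_events_queue` table and its columns come from the schema shown earlier. The database and S3 calls are left out so the example stays self-contained; the SQL string shows one common way to dequeue a batch in PostgreSQL (`DELETE ... RETURNING` with `FOR UPDATE SKIP LOCKED`).

```python
import json
from datetime import datetime, timezone

# Hypothetical dequeue statement: deletes up to `limit` of the oldest rows and
# returns them in one round trip. SKIP LOCKED lets concurrent workers pick
# different rows instead of blocking each other.
DEQUEUE_SQL = """
DELETE FROM raw_events_queue
WHERE id IN (
    SELECT id FROM raw_events_queue
    ORDER BY created_at
    LIMIT %(limit)s
    FOR UPDATE SKIP LOCKED
)
RETURNING id, created_at, event_payload;
"""


def to_json_lines(rows):
    """Serialize dequeued rows as JSON Lines: one JSON object per line."""
    lines = []
    for event_id, created_at, payload in rows:
        record = {
            "id": str(event_id),
            "created_at": created_at.isoformat(),
            "event_payload": payload,
        }
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + "\n" if lines else ""


def object_key(batch_time, prefix="raw-events"):
    """Build a date-partitioned object key (this layout is an assumption)."""
    return f"{prefix}/{batch_time:%Y/%m/%d}/{batch_time:%H%M%S}.jsonl"


# Two in-memory rows standing in for the result of DEQUEUE_SQL.
rows = [
    ("a1", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc), {"action": "enter"}),
    ("a2", datetime(2024, 1, 2, 3, 4, 6, tzinfo=timezone.utc), {"action": "exit"}),
]
body = to_json_lines(rows)  # content of the file to upload
key = object_key(datetime(2024, 1, 2, 3, 5, 0, tzinfo=timezone.utc))
```

In a real job, one option is to run the delete and the upload inside the same transaction and commit only after the upload succeeds, so a failed upload leaves the rows in the queue.
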

---

### Five months later

- Dozens of millions of events saved
- Our small database never went above 15% usage
- No related problems

---

## Migrate an OS build from manual UI ([Cubic](https://github.com/PJ-Singh-001/Cubic)) to a fully automated build system

---

### Context

Build an OS image using [prebaking](https://docs.aws.amazon.com/whitepapers/latest/overview-deployment-options/prebaking-vs.-bootstrapping-amis.html) for local appliances running on-premises.

---

### Problem

[Cubic](https://github.com/PJ-Singh-001/Cubic):

- Has only a GUI.
- Limited functionality.
- Customized files are not always shared.
- Long build time (~30 minutes).

---

### Requirements

- Functionally equivalent to the current OS.
- Keep the [Ubuntu Subiquity](https://github.com/canonical/subiquity) installer.

---

### Understand how Cubic works

Reverse engineer the official Cubic Debian package to understand how it works.

---

### How to reproduce the OS build?

```mermaid
flowchart LR
    A[ISO] -->|dd extract| B[OS Filesystem]
    B -->|chroot + scripts| C[Customize OS Filesystem]
    C -->|xorriso compress| D[ISO]
```
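
The `chroot + scripts` step in the diagram above could look roughly like the sketch below. It is illustrative only: the root filesystem path, the script names, and the helper functions are hypothetical, and the extraction and repacking steps are left out because their exact commands depend on the ISO layout. The sketch only builds `chroot NEWROOT COMMAND` invocations and runs them in order, stopping at the first failure.

```python
import subprocess
from pathlib import PurePosixPath


def chroot_commands(rootfs, scripts, shell="/bin/sh"):
    """Return one `chroot` argv per script, in the order given.

    Script paths are resolved inside the new root, so they must be absolute.
    """
    commands = []
    for script in scripts:
        if not PurePosixPath(script).is_absolute():
            raise ValueError(f"script path must be absolute inside the chroot: {script}")
        commands.append(["chroot", rootfs, shell, script])
    return commands


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True makes the first failure raise."""
    for argv in commands:
        runner(argv, check=True)


# Dry run: record the commands instead of executing them (no root needed).
executed = []
plan = chroot_commands(
    "/build/rootfs",  # hypothetical path of the extracted OS filesystem
    ["/customize/10-packages.sh", "/customize/20-config.sh"],  # hypothetical scripts
)
run_all(plan, runner=lambda argv, check: executed.append(argv))
```

Running the plan for real with the default `subprocess.run` requires root privileges, since `chroot` itself does.
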

---

### How to build the OS image in CI?

1. Reproduce the Cubic build with a script.
2. Use a tool like [GitLab CI](https://docs.gitlab.com/ci/) to build the OS image and push it to a registry.

---

### How to run the OS build in CI?

Problem: Some scripts need root access.

Solution: Run the build in a virtual machine in CI using [Vagrant](https://www.vagrantup.com/).

---

### Results

More than 1510 builds run in CI over a few years.

---

## Thanks for listening!

I have the source code of both projects if you would like to dig further into them.