Wednesday, May 20, 2026

Qdrant is staying in my setup for good.

Recently, I moved from application-level cosine similarity comparisons to a dedicated vector search engine running at the storage layer. Qdrant significantly improved the performance of semantic search in my pipeline, reducing similarity search time from around 40 seconds to roughly 1 second for the same workload.

What surprised me most was not only the raw speedup, but how much architectural complexity disappeared after the migration.

The previous approach was simple: embeddings were stored as regular data and compared inside the application layer. That was a reasonable starting point.

But once I started indexing Flink, which is a large repository, the architecture crossed a clear boundary.
 
At that point, the application was doing work that belonged to a specialized vector search engine:
- loading large embedding sets into memory
- calculating similarity scores one by one
- sorting candidates in the application process
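
To make the three steps above concrete, here is a minimal sketch of application-level similarity search. It is not the code from the repository; the function names, the in-memory dictionary, and the tiny two-dimensional vectors are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, embeddings, top_k=3):
    """Score every stored vector against the query, then sort in the app process."""
    # Step 1: `embeddings` is the whole set, already loaded into memory.
    # Step 2: similarity is computed one vector at a time.
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in embeddings.items()]
    # Step 3: candidates are sorted in the application process.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: real embeddings have hundreds of dimensions.
embeddings = {
    "a.java": [1.0, 0.0],
    "b.java": [0.0, 1.0],
    "c.java": [1.0, 1.0],
}
results = brute_force_search([1.0, 0.0], embeddings, top_k=2)
```

Every query touches every stored vector, so the cost grows linearly with the number of embeddings. That is the work a vector search engine is designed to take over.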

Moving this responsibility into Qdrant changed the shape of the system.

The application now focuses on orchestration: parsing code, generating embeddings, storing metadata, and asking semantic questions. Qdrant handles vector indexing, similarity search, scoring, and retrieval.

That separation matters.

It made the pipeline faster, but also cleaner:
- fewer memory-heavy operations in the app layer
- clearer ownership between application logic and retrieval infrastructure
- a much better foundation for future RAG-style workflows (MCP Server?)

The lesson is not “always start with the most advanced tool.” Start with the simple architecture that lets you understand the problem. Then, when the system shows you where the boundary is, move the responsibility to the right layer.

In this case, vector similarity search clearly belonged in a vector database.

Qdrant turned out to be the right fit.

Repo: https://github.com/wbrycki/code-genius

Qdrant: https://qdrant.tech/

Monday, May 4, 2026

Do You Really Need Horizontal Scaling?

In times of hype around distributed systems, it’s tempting to scale right away. Using Spark, Flink, or Dask as a processing engine is often straightforward. 

But should we design for clusters from the beginning, or start simpler and evolve the system over time?

To scale effectively, we need to understand the data volume, because horizontal scaling is often overkill.

Scaling is not free. The more distributed a system becomes, the higher its complexity. More components mean more failure points and higher operational cost.

Vertical scaling is underrated in my opinion.

Modern machines have dozens or even hundreds of cores, and can be fitted with correspondingly large amounts of RAM.

The “just add more nodes” approach is often the default assumption. But what does adding more nodes really mean? Network overhead, serialization, harder debugging.

Image: Spark Data Scaling Horizontal and Vertical 

 

Example of ETL batch processing:


Spark runs in two main execution modes: 

  • local mode (driver and execution on a single machine)
  • cluster mode (distributed executors across nodes)

 

Reality Check on Horizontal Scaling

In distributed systems, moving data between partitions introduces a cost known as shuffling.

Shuffling is required to achieve proper data redistribution and balanced workload across partitions. It also occurs in single-node systems (Spark still operates with a logical distributed execution model, even when running locally). However, in cluster environments, shuffle becomes significantly more expensive due to network communication and coordination overhead.
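
As a toy illustration of why a shuffle moves data, the sketch below counts how many records would have to leave their current partition when grouping by key. It is a simplified model under stated assumptions (integer keys, partition chosen as key modulo partition count), not Spark's actual implementation; the function name and the sample data are hypothetical.

```python
def count_shuffled(partitions, num_partitions):
    """Count records whose target partition differs from the one they sit in."""
    moved = 0
    for current_idx, records in enumerate(partitions):
        for key, _value in records:
            # Assumed partitioning rule: all records with the same key
            # must end up in partition `key % num_partitions`.
            target_idx = key % num_partitions
            if target_idx != current_idx:
                moved += 1
    return moved


# Two input partitions as they might look right after reading files:
# keys are scattered, so a group-by-key needs to redistribute them.
partitions = [
    [(0, "a"), (1, "b"), (2, "c")],
    [(1, "d"), (2, "e"), (3, "f")],
]
moved = count_shuffled(partitions, 2)
```

On a single machine each moved record is a local copy or a disk spill. In a cluster, each one may also be a serialized network transfer, which is where the extra cost comes from.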

Technical cost

  • network latency, data shuffling
  • In many cases, Spark performance is dominated more by shuffle behavior and data layout than by raw compute.
  • serialization/deserialization (can also happen on single node, for example local disk spillover)

Operational cost

  • Kubernetes, cluster management
  • monitoring, observability
  • deployments

Cognitive cost

  • debugging distributed systems
  • tracing
  • consistency issues

However, a single large machine is not a universal solution. Now we come to trade-offs.

With vertical scaling we get simplicity, lower latency, and easier debugging.

However, vertical scaling is ultimately constrained by hardware limits.

At a certain point, the data size or workload characteristics simply exceed what a single machine can handle. The question is: are you actually at that point yet?

As Microsoft Research points out:

"that the majority of analytics jobs do not process huge data sets. For example, at least two analytics production clusters (at Microsoft and Yahoo) have median job input sizes under 14 GB"

or

"that the majority of real-world analytic jobs process less than 100 GB of input, but popular infrastructures such as Hadoop/MapReduce were originally designed for petascale processing."

from: Scale-up vs Scale-out for Hadoop: Time to rethink?

This paper is from 2013, and we now deal with much more data, but not always and not everywhere: I still see workloads under 100 GB today.


Now let's move to horizontal scaling: we get scalability and fault tolerance (a new executor can restart a failed task).

For some systems it is the only option, and we have to factor in the added complexity and operational overhead.

Making the decision:

Are we CPU-bound? -> scale vertically -> tune jobs -> scale horizontally only if that does not help.

Tuning jobs: data organization (e.g., partitioning, if you use Iceberg) often has a higher impact than scaling compute.

Are we really at petabyte scale? -> scale horizontally, as a single machine will probably not be effective.

Does the SLA (Service Level Agreement) require low latency or high availability? -> scale horizontally.

Does the operational cost align with business expectations? Maybe a longer-running overnight job is tolerable. What is the cost per job?

I would rather not scale an early-stage system, or one handling up to a couple of TB of data.
I would rather scale for massive datasets, HA requirements, or streaming and real-time workloads.
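
The questions above can be sketched as a small decision helper. This is only an illustration of the reasoning, not a rule to apply blindly: the `Workload` fields, the 2 TB stand-in for "a couple of TB", the 1000 TB stand-in for petabyte scale, and the final "measure first" fallback are all assumptions.

```python
from dataclasses import dataclass

SMALL_DATA_TB = 2.0    # assumed stand-in for "a couple of TB"
PETABYTE_TB = 1000.0   # assumed threshold for "petabyte scale"


@dataclass
class Workload:
    data_tb: float
    needs_high_availability: bool = False
    needs_low_latency: bool = False
    is_streaming: bool = False
    cpu_bound: bool = False


def recommend(w: Workload) -> str:
    """Return a rough scaling recommendation for a workload."""
    # SLA-driven requirements and streaming push towards a cluster.
    if w.needs_high_availability or w.needs_low_latency or w.is_streaming:
        return "scale horizontally"
    # At petabyte scale a single machine is unlikely to be effective.
    if w.data_tb >= PETABYTE_TB:
        return "scale horizontally"
    # CPU-bound: bigger machine first, then tuning, then more nodes.
    if w.cpu_bound:
        return "scale vertically, tune jobs, then scale horizontally if that does not help"
    # Small or early-stage workloads: keep it simple.
    if w.data_tb <= SMALL_DATA_TB:
        return "do not scale yet"
    # Assumed fallback for everything in between.
    return "measure first, then tune jobs"


print(recommend(Workload(data_tb=0.05)))
print(recommend(Workload(data_tb=5000.0)))
```

Cost per job and business expectations are deliberately left out of the sketch, since they rarely reduce to a single threshold.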

Even if we need to think about scaling from the beginning, it is still better to start simple and scale later. But always measure first.
It is useful to understand tools like autoscalers and Kubernetes taints & tolerations (in environments such as EC2 nodes on EKS), but it is equally important to know when not to use them.

As always, it is not only CPU and memory that matter, but also I/O bottlenecks. A larger VM does not always translate into linear performance gains. And scaling is a decision, not a default.


Tuesday, October 29, 2024

Evaluation of role playing in LLM Systems based on ChatGPT


Many of us have already had, or will have, an interaction with some form of Large Language Model (LLM), whether we want it or not.

Direct or indirect. 

It could be a chatbot, a mail that we just got from someone, a recipe, detailed guidance on technical problems related to programming or general life advice.

When we ask a question, ChatGPT appears to us as a "wise machine" which can, in theory, help with everything. Of course, the help (advice, summaries, explanations, guides, suggestions...) comes not in physical form but as text, images, or in some cases video, which does not make it any more or less important.

 

An AI system that generates dialogue, a chatbot. Is that what it is? Well, I think there is more.

 

It is not only a system prepared to give us answers; it was also instructed to behave in one way or another, to provide only high-quality answers.

 

Have you noticed that ChatGPT almost never answers in a rude way?

 

Even if I have a bad day and don't write "please" at the end :) And I have already experienced something different with other, less refined models. That means ChatGPT was surely instructed, by default, to give us nice, quality answers. To make it a better product. To act as an omniscient being that likes to share its knowledge.

But that means it is already playing a role: the one just described. So we can ask ChatGPT to alter the role a little, or even keep the "defaults" AND at the same time be a debate opponent, negotiator, character in a story, investigator, or diplomat.

So now ChatGPT, with all the skills from before, can also have a specific stance or personality.

Examples? How will I use it? Coming right up. 

Inspired by various posts about what I can use ChatGPT for, I came up with an idea. Let’s play roles: ChatGPT will be the interviewer for my current position, and I need to pass the interview. I’ll formulate it like this:

“Act as an interviewer for a Senior Engineer role and ask me questions. Then tell me if I got the job or not.”