Many of us have already had, or will have, an interaction with some form of
Large Language Model (LLM), whether we want it or not.
Direct or indirect.
It could be a chatbot, an email we just received, a recipe, detailed guidance on a technical programming problem, or general life advice.
When we ask a question, ChatGPT appears to us as a "wise machine" which can, in theory, help with everything. Of course, the help (advice, summaries, explanations, guides, suggestions...) comes not in physical form but as text, images, or in some cases video, which does not make it any more or less important.
An AI system that generates dialogue, a chatbot. Is that all it is? Well, I think there is more to it.
It is not only a system prepared to give us answers; it was instructed to behave in one way or another, to provide only high-quality answers.
Have you noticed that ChatGPT almost never answers in a rude way?
Even if I have a bad day and don't write "please" at the end :) I've already experienced something quite different with other, less refined models. That suggests ChatGPT was instructed, by default, to give us nice, quality answers. To make it a better product. To act as an omniscient being that likes to share its knowledge.
But that means it is already playing a role: the one just described. And if so, we can ask ChatGPT to alter that role a little, or even keep the "defaults" AND at the same time be a debate opponent, a negotiator, a character in a story, an investigator, or a diplomat.
So now ChatGPT, keeping all the skills from before, can also have a specific stance or personality.
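This "defaults plus a custom role" idea maps neatly onto how chat models are usually driven programmatically: a system message sets the role, and every later instruction ("be more strict") is just another user turn appended to the same conversation. Here is a minimal sketch of that message shape; the helper names and the placeholder answer text are my own illustration, not anything from the transcript, and no API call is actually made.

```python
# Sketch: how a role prompt and follow-up instructions stack up in one
# chat conversation. This builds the standard message list used by
# chat-completion-style APIs; sending it to a model is left out.

def make_conversation(role_prompt: str) -> list[dict]:
    """Start a conversation whose 'system' message defines the role."""
    return [{"role": "system", "content": role_prompt}]

def say(conversation: list[dict], text: str) -> list[dict]:
    """Append a user turn; later instructions like 'be more strict'
    extend the same conversation instead of resetting the role."""
    conversation.append({"role": "user", "content": text})
    return conversation

conv = make_conversation(
    "Act as an interviewer for a Senior Engineer role and ask me questions. "
    "Then tell me if I got the job or not."
)
say(conv, "Here are my answers: ...")  # hypothetical answers to the questions
say(conv, "Now, as an interviewer, be more strict and rate my answers again.")

print(len(conv))  # 3 messages: one system role plus two user turns
```

The point of the sketch: the role set at the top stays in force for the whole exchange, which is exactly why a later "be more strict" tweaks the interviewer instead of replacing it.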
Examples? How will I use it? Coming right up.
Inspired by various posts about what I can use ChatGPT for, I came up
with an idea. Let’s play roles: ChatGPT will be the interviewer for my current position, and I
need to pass the interview. I’ll formulate it like this:
“Act as an interviewer for a Senior Engineer role and ask me questions. Then tell me if I got the job or not.”
This is a very brief description of a role. People tend to describe roles more specifically and creatively, like the way this YouTuber did here: https://www.youtube.com/watch?v=PEU4b2tvfMk.
For my needs, this description is sufficient. Let’s start; I’m excited to see what will happen. It prepared a list of seven questions covering System Design, Data Processing, Code Quality, Problem Solving, Collaboration and Leadership, Tech Stack, and Handling Legacy Systems.
In each category, it formulated a well-structured question that I believe is open enough to allow me to present my knowledge. I spent about 20 minutes writing my answers, one after another, with 3-5 sentences each. I can imagine another person might answer differently, and their answers could still be correct.
But what does "correct" even mean for ChatGPT? Well, ChatGPT analyzed and described why my answers were correct, or let’s say—acceptable, which met my expectations.
However, the assessment left me slightly uneasy. What if there were details I should have mentioned or technologies I don't know?
Let’s make the interviewer more critical.
“Now, as an interviewer, be more strict and rate my answers again, please.”
Now we're talking!
I received ratings for my answers (from 0 to 10) along with suggestions for improvement, generally around 7-8/10 per answer.
There wasn’t a straightforward decision of hire/not hire, but rather a verdict:
Final Rating: 7.5/10
“You’re clearly skilled, with strong experience in real-time data processing, software design, and cloud technologies. However, a senior engineer is expected to delve deeper into architectural decisions, risk management, and long-term system evolution. Providing more depth in your answers regarding fault tolerance, scaling strategies, team leadership, and handling unexpected challenges would strengthen your candidacy for this role.
Verdict: You have the skills, but I would expect more detailed, proactive, and foresight-driven answers to fully meet the requirements for a senior position. With some refinement, you'd be ready to land this role.”
I still feel completely satisfied getting a 7.5 out of 10 after a 20-minute answer round. Let’s make it even harder:
“Be as rigorous as possible, allowing only 5% of the best developers in the world to pass the interview.”
That one brought the result down to 6/10.
What does it tell us? We can ask a machine to be "someone" for us. And that "someone" can be customized. We can simulate scenarios. Realistic practice, skill development. And fun. That's it. Now I need to talk to a real person.
Complete transcription of the interaction with ChatGPT, for anyone interested:
- Act as an interviewer for Senior engineer role and ask me questions, then tell me if i got the job or not
Sure! Let's begin the interview. I'll ask a series of technical and behavioral questions, and based on your answers, I'll give you feedback and decide if you get the job.
1. System Design:
Imagine you have to design a distributed system that ingests large volumes of data in real-time and provides low-latency querying for users. How would you design the architecture, what tools would you use, and how would you ensure scalability and fault tolerance?
2. Data Processing:
You're building a data pipeline that processes terabytes of data daily. How would you implement a highly efficient data processing pipeline? Which frameworks or tools would you use, and why?
3. Code Quality:
As a senior engineer, you’re responsible for maintaining high code quality in the team. What strategies would you implement to enforce best coding practices across a team?
4. Problem-Solving:
Let’s say a critical production service is down, and it’s affecting the company’s core business operations. How would you approach identifying the root cause and resolving the issue? Walk me through your thought process.
5. Collaboration and Leadership:
As a senior engineer, you'll mentor junior developers and collaborate with cross-functional teams. Can you describe a time when you had to lead a team through a difficult technical challenge? How did you ensure the team was aligned and motivated?
6. Tech Stack:
What technologies or tools are you most comfortable with for backend development, and what’s your experience with microservices, databases, and cloud infrastructure?
7. Handling Legacy Systems:
How do you approach working with or refactoring legacy systems that have tight coupling, are difficult to maintain, and lack documentation?
Answer these questions, and I'll assess whether you've landed the senior engineer role!
My answers (without preparation, just spontaneous, based on my recent experience):
* First question: it depends how large the data is, but I would use a real-time data processing engine like Flink, with Apache Paimon if a lot of data will be saved in real time. Flink scales and supports multi-clustering. For fault tolerance I would implement a workflow mechanism like Airflow or Argo Workflows.
* Second question: If this is terabytes of data and we don't have to go multi-cluster, I would use pola.rs, because in single-cluster mode it outperforms other tools like Spark or Dask.
* Third question: I like all three of "Uncle Bob's" books: Clean Code, The Clean Coder, and Clean Architecture. I would encourage the team to use those practices: start with implementing tests whenever possible, do code reviews and mob reviews, give constructive feedback among team members, and share our own remarks and ideas within the team.
* Fourth question: First, check the logs for known reasons why it's down. There would probably be a 1st-level support team where I can get quick info on what's going on. Based on what we find, restore a previous version whenever possible to get the system running again. If the issue is not in the new code (new release), check the potential sources of failure: find the person who knows the failing code best, or ask the team responsible for maintenance to check whether all systems and networks are working properly - sometimes the software is fine and the infrastructure is not. Another point: more people and more eyes are better, so I would involve more core members who I think have more experience in that area.
* Fifth question: There was a technical challenge of importing 300 million lines of data within a reasonable, specified time. The problem was that complicated regex checks were applied to the data during import. We improved that part, which was slowing things down the most. This was one of the pull requests that almost got forgotten (yet it was the one that actually gave the biggest speed-up). This is why I think it's always good to keep an eye on what everyone is doing and to have a proper exchange at dailies and stand-ups, as well as inter-team meetings.
* Sixth question: I have experience with Java, Python, and Bash, have used microservices since 2015, and know Postgres and Oracle when it comes to relational DBs, but also big data table formats like Apache Iceberg and Apache Paimon. I have experience with the AWS cloud and technologies like S3, EKS for k8s, RDS for relational DBs, and others.
* Seventh question: First, make a good plan. It depends which legacy system we have, but let's say we want to go from a monolith to a microservices architecture. When introducing the new system, we probably need to ensure that the old one still works, or even that both old and new run simultaneously while the old system's parts are slowly and gradually shut down. For some old parts we could implement "proxy microservices" and also "hybrid" microservices that can work with both old and new apps, providing a new interface to the user. By doing this gradually, we avoid a single "fire and run" migration, which can lead to many problems we didn't even think of. Having both old and new systems running simultaneously for some time allows us to identify bottlenecks and unexpected situations. It doesn't have to be the solution for every case. For documentation, we have to divide and conquer, approach systems one after another, and try to write as much self-describing code as possible. As much documentation as needed, but not more than required. What I mean by that: allow some docs to be generated automatically, because code changes often and documentation often does not get updated afterwards. Knowledge transfer sessions are also very important, as is gathering and sharing wisdom among team members.
ChatGPT answers:
Feedback:
1. System Design: You provided a solid approach by mentioning real-time processing engines like Apache Flink and Paimon, which are excellent for handling large-scale data with low-latency requirements. The use of multi-cluster support and fault tolerance mechanisms (Airflow or Argo Workflows) shows your understanding of scalability and resilience. You also adapted the design based on the size of the data, which is great.
2. Data Processing: Choosing pola.rs for a single-cluster, high-performance scenario is an excellent choice, as it’s well-suited for fast, in-memory processing tasks. It’s clear you’re familiar with alternative tools and when to apply them based on the scale of the problem.
3. Code Quality: I appreciate that you value “Clean Code” principles and emphasize testing, code reviews, and mob reviews. Encouraging these practices is vital for maintaining quality. You show leadership in driving a culture of continuous feedback and shared learning within teams.
4. Problem-Solving: Your approach is structured and involves logical steps like checking logs, restoring to previous versions, and collaborating with support teams and peers to gather insights. This demonstrates a good balance between independent problem-solving and teamwork.
5. Collaboration and Leadership: Handling large imports with regex bottlenecks shows you can identify critical bottlenecks in systems and coordinate improvements with the team. You also highlighted the importance of communication within the team, which is key to successful leadership.
6. Tech Stack: Your experience with Java, Python, Bash, microservices, relational DBs (Postgres, Oracle), and big data frameworks like Apache Iceberg and Paimon demonstrates a solid technical foundation. Your cloud experience with AWS (S3, EKS, RDS) shows you’re well-versed in modern infrastructure and deployment environments.
7. Handling Legacy Systems: You displayed a mature understanding of how to approach legacy systems, emphasizing a gradual migration strategy to avoid unnecessary risks. Your thoughts on hybrid services and incremental refactoring are aligned with best practices.
Conclusion:
Your answers reflect strong technical expertise, solid leadership, and problem-solving abilities. You’re clearly capable of tackling the challenges a senior engineer would face.
You got the job!
Me: now, as an interviewer be more restrictive strict and rate my answers again, please.
Notice that even though I used the word 'restrictive', which does not quite capture the idea of being harsh in assessment, it got my idea right!
Let’s reevaluate your answers more critically, focusing on specific improvements or gaps that could be addressed to elevate the level of response expected from a senior engineer:
1. System Design (Rating: 7/10)
You mentioned using Apache Flink and Apache Paimon, which are strong choices for real-time processing and storage. However, you could have gone deeper into the architectural components such as message brokers (e.g., Kafka) or data partitioning strategies for scalability. Also, more detail on handling consistency, durability guarantees, or how you would perform data replication across clusters would have strengthened your answer.
Improvement: Mention strategies like sharding or partitioning for horizontal scaling, and explain how you’d ensure consistency in the face of network partitions (e.g., CAP theorem considerations).
2. Data Processing (Rating: 8/10)
Your choice of pola.rs for single-cluster mode is a good one, especially given its performance. However, your response lacked mention of how you would handle fault tolerance in this pipeline or data skew in large datasets, which are common issues at scale. Additionally, it would have been beneficial to address monitoring and optimization techniques, or potential bottlenecks in the pipeline.
Improvement: Discuss methods for ensuring fault tolerance and efficient resource utilization (e.g., lazy evaluation, task parallelism). Also, consider mentioning monitoring frameworks like Prometheus for pipeline health.
3. Code Quality (Rating: 8/10)
Your emphasis on Uncle Bob’s clean code principles and testing is excellent. However, your answer could benefit from more practical strategies for code quality. For example, instituting linting tools, using code coverage metrics, and setting up automated CI/CD pipelines that enforce code quality checks could demonstrate more actionable leadership.
Improvement: Add concrete tools or workflows (e.g., using SonarQube for quality checks or leveraging Git hooks for pre-commit linting) that you would enforce in a team setting.
4. Problem-Solving (Rating: 6.5/10)
Your steps are good but somewhat generic. The strategy of rolling back and getting support involvement is sound. However, in a senior role, I would expect a more proactive approach in terms of defining detailed steps to isolate root causes (e.g., A/B testing, querying specific logs, checking distributed tracing), plus anticipating the potential fallout of a rollback.
Improvement: Elaborate on strategies for logging and tracing (e.g., using tools like Grafana or ELK stack). Include how you’d assess the impact of a rollback, such as mitigating potential data corruption or downtime.
5. Collaboration and Leadership (Rating: 7/10)
You highlighted a clear technical challenge and how you tackled a performance bottleneck, which is good. However, leadership involves more than just technical fixes—it’s also about proactively aligning the team, setting clear expectations, and ensuring knowledge transfer. How did you ensure the team was aligned? How did you make decisions under pressure?
Improvement: Provide more detail on how you communicated progress and strategy with stakeholders and how you motivated the team during a challenging time. Also, discuss any mentorship you provided to team members.
6. Tech Stack (Rating: 8.5/10)
Your familiarity with microservices, Java, Python, cloud technologies, and databases is solid. However, as a senior engineer, it’s important to highlight how you’ve architected solutions that leveraged this tech stack effectively in real-world systems, with specific examples of architectural decisions (e.g., event-driven microservices, polyglot persistence).
Improvement: Share an example where your expertise in cloud or microservices directly contributed to improving a system’s performance or reliability. Provide more context around specific use cases.
7. Handling Legacy Systems (Rating: 8/10)
Your approach to working with legacy systems is practical and realistic, especially the hybrid and gradual transition strategy. However, you didn’t mention how you would deal with unforeseen issues that might arise from running the new and old systems simultaneously or how you’d prioritize which components to refactor first.
Improvement: Include more discussion about risk management, such as rollback strategies if the migration fails, and how you’d prioritize system components for refactoring based on business impact.
Final Rating: 7.5/10
You’re clearly skilled, with strong experience in real-time data processing, software design, and cloud technologies. However, a senior engineer is expected to dig deeper into architectural decisions, risk management, and long-term system evolution. More depth in your answers regarding fault tolerance, scaling strategies, team leadership, and handling unexpected challenges would strengthen your candidacy for this role.
Verdict: You have the skills, but I would expect more detailed, proactive, and foresight-driven answers to fully meet the requirements for a senior position. With some refinement, you'd be ready to land this role.