Kafka Interview Questions

Questions –

  1. How do you create a topic in Kafka using the Confluent CLI?
    • Command (see the example after this list)
  2. Explain the role of the Schema Registry in Kafka.
  3. How do you register a new schema in the Schema Registry?
  4. What is the importance of key-value messages in Kafka?
  5. Describe a scenario where using a random key for messages is beneficial.
  6. Provide an example where using a constant key for messages is necessary.
  7. Write a simple Kafka producer code that sends JSON messages to a topic.
  8. How do you serialize a custom object before sending it to a Kafka topic?
  9. Describe how you can handle serialization errors in Kafka producers.
  10. Write a Kafka consumer code that reads messages from a topic and deserializes them from JSON.
  11. How do you handle deserialization errors in Kafka consumers?
  12. Explain the process of deserializing messages into custom objects.
  13. What is a consumer group in Kafka, and why is it important?
  14. Describe a scenario where multiple consumer groups are used for a single topic.
  15. How does Kafka ensure load balancing among consumers in a group?
  16. How do you send JSON data to a Kafka topic and ensure it is properly serialized?
  17. Describe the process of consuming JSON data from a Kafka topic and converting it to a usable format.
  18. Explain how you can work with CSV data in Kafka, including serialization and deserialization.
  19. Write a Kafka producer code snippet that sends CSV data to a topic.
  20. Write a Kafka consumer code snippet that reads and processes CSV data from a topic.
  21. Different ways to receive and acknowledge messages in Kafka
  22. What makes Kafka fast?
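
For Q1, as referenced above – an illustrative, hedged example (topic name, partition and replication counts are placeholders): with the Confluent CLI, `confluent kafka topic create orders --partitions 3`; or with the Apache Kafka tooling bundled in Confluent Platform, `kafka-topics --bootstrap-server localhost:9092 --create --topic orders --partitions 3 --replication-factor 1`.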



2. Explain the role of the Schema Registry in Kafka.

The Schema Registry in Kafka plays a crucial role in managing schemas for data that is sent to and from Kafka topics.

Schema Management:

  • Centralized Schema Repository: The Schema Registry acts as a centralized repository for schemas used in Kafka messages. It stores and manages schemas independently from the Kafka brokers.
  • Schema Evolution: It facilitates schema evolution by allowing compatibility checks between different versions of schemas. This ensures that producers and consumers can evolve their schemas without causing disruptions.

Example:

  • Suppose a producer wants to publish messages to a Kafka topic using Avro serialization. Before sending data, it registers the Avro schema with the Schema Registry, which assigns it an ID. When the producer sends a message, it includes the schema ID alongside the serialized data. Consumers retrieve the schema ID from the message, fetch the corresponding schema from the Schema Registry, and deserialize the data accordingly.
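
A minimal producer sketch for the flow above, assuming Confluent's Avro serializer (io.confluent.kafka.serializers.KafkaAvroSerializer) is on the classpath; the broker address, registry URL, topic and schema are all illustrative:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");           // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081");  // assumed Schema Registry address

    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");
    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "alice");

    // On first send the serializer registers the schema (if new), caches the
    // returned schema ID, and embeds that ID in every serialized payload.
    try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("users", "alice", user));
    }
  }
}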

22. What makes Kafka fast?

Zero-copy writes make Kafka fast, but how exactly?

Kafka is a message broker: it accepts messages from the network and writes them to disk, and vice versa. The traditional way of moving data between the network and disk involves `read` and `write` system calls, which require data to be copied back and forth between kernel space and user space.

Kafka leverages the `sendfile` system call, which copies data from one file descriptor to another entirely within the kernel. Kafka uses this to transfer data from log files on disk directly to the network socket (for example, when serving consumers), bypassing the unnecessary copies through user space.

If you are interested, read the man page of the `sendfile` system call. In most cases, when you see something extracting extreme performance, a major chunk of it comes from leveraging the right system call.
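
Kafka itself is written in Java/Scala, where this zero-copy path is exposed as FileChannel.transferTo, which delegates to `sendfile` on Linux. A minimal sketch of the same idea (the file name and destination address are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
  public static void main(String[] args) throws IOException {
    try (FileChannel file = FileChannel.open(Path.of("segment.log"), StandardOpenOption.READ);
         SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
      long position = 0;
      long remaining = file.size();
      // transferTo hands the copy to the kernel: bytes move from the page
      // cache straight to the socket without surfacing in user space.
      while (remaining > 0) {
        long sent = file.transferTo(position, remaining, socket);
        position += sent;
        remaining -= sent;
      }
    }
  }
}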

PS: I used this zero-copy technique while building a Remote Shuffle Service for Apache Spark. It proved pivotal in getting great performance while moving multi-TB data across machines.


Uber use case –

https://www.linkedin.com/pulse/case-study-kafka-async-queuing-consumer-proxy-vivek-bansal-lt1pc/?trackingId=sXBYzdx7T42SFdmitvQVwQ%3D%3D

Spring Boot

Index

  1. Versions
  2. Interview Questions

Versions

Version | Release Date | Major Features | Comment
3.2.3 | February 22, 2024 | Upgraded dependencies (Spring Framework 6.1.4, Spring Data JPA 3.1.3, Spring Security 6.2.2, etc.) – https://www.codejava.net/spring-boot-tutorials |
3.1.3 | September 20, 2023 | Enhanced developer experience, improved reactive support, and updated dependencies – https://spring.io/blog/2022/05/24/preparing-for-spring-boot-3-0 |
3.0.x | May 2020 – December 2022 | Introduced reactive programming, improved build system, and various dependency updates throughout the series (refer to official documentation for details) |
2.x | March 2018 – May 2020 | Introduced Spring Boot actuator, developer tools, and auto-configuration (refer to official documentation for specific features within each version) | 2.7.7 used in project (switch)
1.x | April 2014 – February 2018 | Initial versions focusing on simplifying Spring application development | 1.5.22.RELEASE used in project (consumers)

Spring Boot versions and the corresponding Spring Framework versions they support:

Spring Boot Version | Supported Spring Framework Versions
1.x | 4.x
2.0.x – 2.3.x | 5.x
2.4.x | 5.x, 6.x
3.0.x – 3.2.x | 6.x


Interview Questions

  • Why Spring Boot over Spring?
    1. Convention-over-Configuration:
      • Spring Boot: Spring Boot follows convention-over-configuration principles, reducing the need for explicit configuration. Annotations like @Service are automatically recognized and configured based on conventions.
      • Spring (Traditional): In traditional Spring applications, while you can use annotations, you might need more explicit configuration, especially in XML-based configurations.
    2. Auto-Configuration:
      • Spring Boot: Spring Boot provides auto-configuration, which means that common configurations are automatically applied based on the project’s dependencies. For example, if you have @Service annotated classes, Spring Boot will automatically configure them as Spring beans.
      • Spring (Traditional): In traditional Spring, you might need to configure components more explicitly, specifying details in XML files or Java-based configuration classes.
    3. Reduced Boilerplate Code:
      • Spring Boot: Spring Boot’s defaults and starters significantly reduce boilerplate code. You can focus more on writing business logic and less on configuration.
      • Spring (Traditional): Without the conventions and defaults of Spring Boot, you might find yourself writing more configuration code to set up beans and application context.
    4. Simplified Dependency Management:
      • Spring Boot: The use of starters simplifies dependency management. With the appropriate starter, you get a predefined set of dependencies, including those for services, making it easy to include and manage dependencies.
      • Spring (Traditional): While you can manage dependencies in traditional Spring, Spring Boot provides a more streamlined way to do so with starters.
    5. Out-of-the-Box Features:
      • Spring Boot: Spring Boot provides out-of-the-box features, such as embedded servers, metrics, and health checks. These features are often automatically configured, making it easier to develop production-ready applications.
      • Spring (Traditional): While you can manually configure these features in traditional Spring, Spring Boot simplifies the process and encourages best practices.
    6. Faster Project Bootstrap:
      • Spring Boot: With its starters and defaults, Spring Boot allows for faster project bootstrapping. You can create a fully functional application with minimal setup.
      • Spring (Traditional): Setting up a traditional Spring application might involve more manual configuration and a longer setup time.
  1. Annotations in Spring Boot
    • @SpringBootApplication, which combines (see the sketch below):
      1. @EnableAutoConfiguration
      2. @ComponentScan
      3. @SpringBootConfiguration (a specialised form of @Configuration)
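
A minimal sketch of the composed annotation in use (class and package names are illustrative):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// @SpringBootApplication = @SpringBootConfiguration + @EnableAutoConfiguration + @ComponentScan
@SpringBootApplication
public class DemoApplication {
  public static void main(String[] args) {
    SpringApplication.run(DemoApplication.class, args);
  }
}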

Messaging

Index

  • Differences
  • Versions of Apache Kafka

Key Differences:

ActiveMQ vs IBM MQ / WebSphere MQ vs Kafka

Kafka Consumption Optimisation

  • Kafka parameters & Performance Optimization

The following Kafka parameters can be balanced against one another for performance:

  1. Partition: a partition is a logical unit of storage for messages. Each topic in Kafka can be divided into one or more partitions. Messages are stored in order within each partition, and each message is assigned a unique identifier called an offset.
  2. Number of brokers:
  3. Number of consumer instances, or the number of pods on which these instances are running
  4. Concurrency: (see the listener-concurrency sketch after this list)
  5. Consumer group:
    • Use a consumer group to scale out consumption. This allows you to distribute the load of consuming messages across multiple consumers, which can improve throughput.
  6. Fetch size of batch data:
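
For item 4, a sketch of how listener concurrency is typically set with Spring Kafka (assumes spring-kafka on the classpath; the bean wiring and the value 3 are illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

  @Bean
  public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
      ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // Number of listener threads; effective parallelism is capped by the partition count.
    factory.setConcurrency(3);
    return factory;
  }
}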

Optimal Partition Configuration-

Increase the number of partitions. This allows more consumers to read messages in parallel, which improves throughput. (So should partitions and consumers be in a 1:1 ratio for better performance?)

Note: Kafka-related bottlenecks will not usually occur while pushing data, because in this case the rate depends on how fast the external source generates it. Bottlenecks occur when there is a huge amount of data on a topic and limited consumer capacity (instances, capacity, consumption configuration, etc.).

Use cases:

Case 1: a Kafka consumer is struggling to keep up with the incoming data (suppose a lag of 170 million events). To decrease the lag and improve the performance of your Kafka setup, you can consider the following steps:

  1. Consumer Configuration:
    • Increase the number of consumer instances to match the partition count or even exceed it. Since you have 40 partitions, consider having at least 40 consumer instances. This ensures that each partition is consumed by a separate consumer, maximizing parallelism and throughput.
    • Tune the consumer configuration parameters to optimize performance. Specifically, consider adjusting the fetch.min.bytes, fetch.max.wait.ms, max.poll.records, and max.partition.fetch.bytes settings to balance the trade-off between latency and throughput. Experiment with different values to find the optimal configuration for your use case (see the configuration sketch after this list).
  2. Partition Configuration:
    • Assess the data distribution pattern to ensure an even distribution across partitions. If the data is skewed towards certain partitions, consider implementing a custom partitioner or using a key-based partitioning strategy to distribute the load more evenly.
    • If you anticipate further data growth or increased load, you might consider increasing the number of partitions. However, adding partitions to an existing Kafka topic requires careful planning, as it can have implications for ordering guarantees and consumer offsets.
  3. Cluster Capacity:
    • Evaluate the overall capacity and performance of your Kafka cluster. Ensure that your brokers have sufficient CPU, memory, and disk I/O resources to handle the volume of data and consumer concurrency.
    • Monitor the broker metrics to identify any potential bottlenecks. Consider scaling up your cluster by adding more brokers if necessary.
  4. Monitoring and Alerting:
    • Implement robust monitoring and alerting systems to track lag, throughput, and other relevant Kafka metrics. This enables you to proactively identify issues and take appropriate actions.
  5. Consumer Application Optimization:
    • Review your consumer application code for any potential performance bottlenecks. Ensure that your code is optimized, handles messages efficiently, and avoids any unnecessary delays or blocking operations.
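
As referenced in step 1, a minimal sketch of those fetch/poll knobs with the plain Apache Kafka client (the broker address, group ID and values are illustrative starting points, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumerSketch {
  public static KafkaConsumer<String, String> build() {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "lag-recovery-group");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1_048_576);            // favour throughput: wait for ~1 MB per fetch...
    props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);                // ...but never wait longer than 500 ms
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);                // larger batches per poll()
    props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 2_097_152);  // up to ~2 MB per partition per fetch
    return new KafkaConsumer<>(props);
  }
}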

Spring Kafka

Index

  1. Resources
    • v3.1 features
  2. Producer
  3. Consumer
    • consumer variations -8
    • consumer factory
  4. Todo
  5. Findings/Answers

API Docs:

  1. https://docs.spring.io/spring-kafka/docs/current/api/

For new features added in a specific version of spring-kafka, refer to:

  1. https://docs.spring.io/spring-kafka/docs/ [if the version is not known, take it from the link below → select version > References > HTML]
  2. https://spring.io/projects/spring-kafka#learn

Notes to implement for performance:

https://spring.io/projects/spring-kafka#learn

LinkedIn:

13 ways to learn Kafka:

  1. Tutorial: Official Apache Kafka Quickstart – https://lnkd.in/eVrMwgCw
  2. Documentation: Official Apache Kafka Documentation – https://lnkd.in/eEU2sZvq
  3. Tutorial: Kafka Learning with RedHat – https://lnkd.in/em-wsvDt
  4. Read: Kafka – The Definitive Guide: Real-Time Data and Stream Processing at Scale – https://lnkd.in/ez3aCVsH
  5. Course: Apache Kafka Essential Training: Getting Started – https://lnkd.in/ettejx2w
  6. Read: Kafka in Action – https://lnkd.in/ed7ViYQZ
  7. Course: Apache Kafka Deep Dive – https://lnkd.in/ekaB9mv6
  8. Read: Apache Kafka Quick Start Guide – https://lnkd.in/e-3pSXnu
  9. Course: Learn Apache Kafka for Beginners – https://lnkd.in/ewh6uUyT
  10. Course: Apache Kafka Crash Course for Java and Python Developers – https://lnkd.in/e72AHUY4
  11. Read: Mastering Kafka Streams and ksqlDB: Building real-time data systems by example – https://lnkd.in/eqr_DaY2
  12. Course: Deploying and Running Apache Kafka on Kubernetes – https://lnkd.in/ezQ58usN
  13. Course: Stream Processing Design Patterns with Kafka Streams – https://lnkd.in/egrks3rn

Spring Kafka 3.1 features –

  1. Micrometer observations
  2. Same broker for multiple test cases
  3. Retryable topic changes are permanent.
  4. KafkaTemplate returns CompletableFuture instead of ListenableFuture.
  5. Testing changes
    • Since 3.0.1, the embedded broker sets spring.kafka.bootstrap-servers by default, so the application under test connects to the embedded broker.

References: https://docs.spring.io/spring-kafka/docs/current/reference/html/

Points :

  1. Starting with version 2.5, the broker can be changed at runtime – see the section "Connecting to Kafka".
    • Support for ABSwitchCluster – one cluster active at a time (see the sketch below).
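
A hedged wiring sketch for ABSwitchCluster (assumes Spring Kafka 2.5+; the broker addresses are placeholders, and the exact reconnect/reset step depends on your setup):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.ABSwitchCluster;

@Configuration
public class ClusterSwitchConfig {

  @Bean
  public ABSwitchCluster switcher() {
    // Primary and secondary bootstrap servers; only one cluster is active at a time.
    return new ABSwitchCluster("primaryBroker:9092", "secondaryBroker:9092");
  }

  // Wire the switcher into the consumer/producer factories, e.g.
  //   consumerFactory.setBootstrapServersSupplier(switcher);
  // To fail over, call switcher.secondary() and then restart listener
  // containers / reset the producer factory so clients reconnect.
}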

JUnit, Mocking and Spy

Index

  • JUnit
  • Mockito
  • Testing private methods (reflection, PowerMock)

Two testing frameworks – Mockito and Spring Testing.

 | @Mock | @MockBean
Annotation part of | Mockito framework | Spring testing framework
Usage | @RunWith(MockitoJUnitRunner.class) public class MyServiceTest { @Mock private MyRepository myRepository; … } | @SpringBootTest public class MyServiceIntegrationTest { @Autowired private MyService myService; @MockBean private MyRepository myRepository; … }
Purpose | Unit testing | Integration testing

Spy vs Mock

 | @Spy | @Mock
Functionality | Partially mocked version of a real object | Complete replacement for a real object
Creation | Wraps an existing object | Creates a new object
Control | Partial control (can define specific behaviors for specific methods) | Full control over behavior
Use case | Testing interactions within a real object with some real behavior | Isolating dependencies, unit testing with specific behavior
Access | Can access private methods of the original object | Cannot access private methods of the original object
Example | below code | below code

Mock usage

// Interface we want to test (returns boolean so the call can be stubbed with thenReturn)
public interface EmailService {
  boolean sendEmail(String recipient, String message);
}

// Test class using mock (myService is the assumed object under test that delegates to EmailService)
@Test
public void testSendEmail() {
  // Create a mock object of EmailService
  EmailService mockEmailService = Mockito.mock(EmailService.class);

  // Define behavior for the mock object
  Mockito.when(mockEmailService.sendEmail("user@example.com", "Hello world!")).thenReturn(true);

  // Use the mock object in your test logic
  myService.sendNotification(mockEmailService, "user@example.com", "Hello world!");

  // Verify interactions with the mock object
  Mockito.verify(mockEmailService).sendEmail("user@example.com", "Hello world!");
}

In the above example:

  • We create a mock object of EmailService using Mockito.mock.
  • We define behavior for the sendEmail method using Mockito.when; here it always returns true.
  • We use the mock object in the test and verify its interaction later.

Spy usage

// Real implementation of EmailService
public class RealEmailService implements EmailService {
  @Override
  public boolean sendEmail(String recipient, String message) {
    // Real email-sending logic goes here (omitted for simplicity)
    return true;
  }
}

// Test class using spy
@Test
public void testSendEmailWithSpy() {
  // Create a real object
  EmailService realEmailService = new RealEmailService();

  // Create a spy object that wraps the real object
  EmailService spyEmailService = Mockito.spy(realEmailService);

  // Stub a specific call; doReturn avoids invoking the real method while stubbing
  Mockito.doReturn(true).when(spyEmailService).sendEmail("admin@example.com", "Alert!");

  // Use the spy object in your test logic
  myService.sendNotification(spyEmailService, "user@example.com", "Hello world!"); // real method runs
  myService.sendNotification(spyEmailService, "admin@example.com", "Alert!");      // stubbed call

  // Verify interactions (optional)
  Mockito.verify(spyEmailService, Mockito.times(2)).sendEmail(Mockito.anyString(), Mockito.anyString());
}

In the above example:

  • Create a real RealEmailService object.
  • Create a spy of the RealEmailService using Mockito.spy.
  • Stub the sendEmail call for the "admin@example.com" email (using doReturn, which avoids invoking the real method while stubbing).
  • Use the spy object and verify interactions (optional).

PowerMock dependency (used for testing private methods):

		<dependency>
			<groupId>org.powermock</groupId>
			<artifactId>powermock-module-junit4</artifactId>
			<version>1.7.4</version>
		</dependency>

https://www.learnbestcoding.com/post/21/unit-test-private-methods-and-classes

Reference

https://www.tutorialspoint.com/mockito/mockito_spying.htm

Quartz

Quartz provides a way to schedule the recurring execution of jobs.

Important Points

  • Quartz differentiates between the Job (what/task) and the Trigger (when) – the two are defined as separate statements (see the sketch after this list).
  • Two types of scheduling – simple and cron-based.
  • Quartz properties related to cron and other settings are usually kept in quartz.properties.
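
As noted in the first point, the Job and the Trigger are built as separate statements; a minimal sketch (the names and cron expression are illustrative):

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSketch {

  // The Job: WHAT to run
  public static class HelloJob implements Job {
    @Override
    public void execute(JobExecutionContext context) {
      System.out.println("Hello from Quartz");
    }
  }

  public static void main(String[] args) throws SchedulerException {
    JobDetail job = JobBuilder.newJob(HelloJob.class)
        .withIdentity("helloJob", "group1")
        .build();

    // The Trigger: WHEN to run (cron-based: every 5 minutes)
    Trigger trigger = TriggerBuilder.newTrigger()
        .withIdentity("helloTrigger", "group1")
        .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?"))
        .build();

    Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
    scheduler.scheduleJob(job, trigger);
    scheduler.start();
  }
}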

Cron expressions and their meanings –

  1. http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

Questions

Configuration – how is the thread pool size configured (e.g. org.quartz.threadPool.threadCount in quartz.properties)?

Web-service Interview Questions

  1. Interceptor and its use
  2. How an interceptor works
  3. Controller Advice
  4. How do we create an interceptor?
  5. REST
    1. Richardson REST Maturity Model (4 levels, 0–3, that define the maturity of REST services)
    2. REST API
      1. In a REST API, the recommended term used to refer to multiple resources
      2. Time to first "hello world" REST call
      3. REST API versioning via headers, using Accept and Content-Type
        1. https://blog.allegro.tech/2015/01/Content-headers-or-how-to-version-api.html
    3. HTTP response codes
    4. How is communication done between two REST services?
    5. What is a response body?
    6. Idempotent HTTP requests
    7. RESTful services architecture
    8. HTTP methods
    9. API security
      1. Using OAuth, what scope is required for write access to an API?
      2. Which grant types support refresh tokens?
      3. Property to include in JSON to represent sub-resources in a REST API (_links/_embedded?)
  6. Benefits of GraphQL over REST approaches
  7. Webhooks vs synchronous APIs, and when to use each
  8. How to handle transactional commits over a distributed system (two-phase commit, saga pattern – push or pull)
    • When to use a push vs a pull mechanism

HTTP Verb | CRUD | Entire Collection (e.g. /customers) | Specific Item (e.g. /customers/{id})
POST | Create | 201 (Created), 'Location' header with link to /customers/{id} containing new ID. | 404 (Not Found), 409 (Conflict) if resource already exists.
GET | Read | 200 (OK), list of customers. Use pagination, sorting and filtering to navigate big lists. | 200 (OK), single customer. 404 (Not Found) if ID not found or invalid.
PUT | Update/Replace | 405 (Method Not Allowed), unless you want to update/replace every resource in the entire collection. | 200 (OK) or 204 (No Content). 404 (Not Found) if ID not found or invalid.
PATCH | Update/Modify | 405 (Method Not Allowed), unless you want to modify the collection itself. | 200 (OK) or 204 (No Content). 404 (Not Found) if ID not found or invalid.
DELETE | Delete | 405 (Method Not Allowed), unless you want to delete the whole collection – not often desirable. | 200 (OK). 404 (Not Found) if ID not found or invalid.

Spring Interview Questions

Index

  1. Spring bean scope
  2. Use of @Qualifier and @Primary
  3. Security in Spring
  4. Dependency injection
  5. Rest controller
  6. Difference between DI and IoC
  7. Bean lifecycle
  8. Request attribute / param
  9. Spring JDBC
  10. Qualifier
  11. Exception handler
  12. Difference between @Configuration, @EnableAutoConfiguration & @ComponentScan
  13. Circular dependency (how to resolve)
  14. Proxy and why it is needed? How to create one?


Spring bean scope