Kafka Consumption Optimisation

  • Kafka parameters & Performance Optimization

The following Kafka parameters can be balanced against one another for performance-

  1. Partition : a partition is a logical unit of storage for messages. Each Kafka topic can be divided into one or more partitions. Messages are stored in order within each partition, and each message is assigned a unique identifier called an offset.
  2. Number of brokers : the servers that store partitions and serve produce/fetch requests; more brokers spread partition leadership and disk/network I/O.
  3. Number of consumer instances or no. of pods on which these instances are running : within one consumer group, each partition is consumed by at most one consumer, so instances beyond the partition count sit idle.
  4. Concurrency : the number of consumer threads per instance; total threads across the group should not exceed the partition count.
  5. Consumer group :
    • Use a consumer group to scale out consumption. This distributes the load of consuming messages across multiple consumers, which can improve throughput.
  6. Fetch size of batch data : controlled by settings such as fetch.min.bytes and max.partition.fetch.bytes; larger fetches improve throughput at the cost of latency.

Optimal Partition Configuration-

Increase the number of partitions. This allows more consumers to read messages in parallel, which improves throughput. Should partitions and consumers then be in a 1:1 ratio for the best performance? A 1:1 ratio gives the maximum parallelism within a consumer group; consumers beyond the partition count remain idle.

Note: Kafka-related bottlenecks typically do not occur while producing data, because the producing rate depends on how fast the external source generates it. Bottlenecks occur when there is a large volume of data on a topic and limited consumer capacity (instances, resources, consumption configuration, etc.).

Use cases:

Case 1: The Kafka consumer is struggling to keep up with incoming data (say, a lag of 170 million events). To decrease the lag and improve the performance of your Kafka setup, consider the following steps:

  1. Consumer Configuration:
    • Increase the number of consumer instances up to the partition count. Since you have 40 partitions, consider having 40 consumer instances so that each partition is consumed by a separate consumer, maximizing parallelism and throughput (instances beyond 40 would sit idle).
    • Tune the consumer configuration parameters to optimize performance. Specifically, consider adjusting the fetch.min.bytes, fetch.max.wait.ms, max.poll.records, and max.partition.fetch.bytes settings to balance the trade-off between latency and throughput. Experiment with different values to find the optimal configuration for your use case.
  2. Partition Configuration:
    • Assess the data distribution pattern to ensure an even distribution across partitions. If the data is skewed towards certain partitions, consider implementing a custom partitioner or using a key-based partitioning strategy to distribute the load more evenly.
    • If you anticipate further data growth or increased load, you might consider increasing the number of partitions. However, adding partitions to an existing Kafka topic requires careful planning, as it can have implications for ordering guarantees and consumer offsets.
  3. Cluster Capacity:
    • Evaluate the overall capacity and performance of your Kafka cluster. Ensure that your brokers have sufficient CPU, memory, and disk I/O resources to handle the volume of data and consumer concurrency.
    • Monitor the broker metrics to identify any potential bottlenecks. Consider scaling up your cluster by adding more brokers if necessary.
  4. Monitoring and Alerting:
    • Implement robust monitoring and alerting systems to track lag, throughput, and other relevant Kafka metrics. This enables you to proactively identify issues and take appropriate actions.
  5. Consumer Application Optimization:
    • Review your consumer application code for any potential performance bottlenecks. Ensure that your code is optimized, handles messages efficiently, and avoids any unnecessary delays or blocking operations.
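The consumer settings named in step 1 can be collected into a config sketch. This is a minimal illustration, not a recommendation for every workload: the keys are standard Kafka consumer configs, but the values, the `throughputTunedConfig` helper, and the broker/group names are illustrative assumptions.

```java
import java.util.Properties;

// Sketch of consumer settings commonly tuned for throughput vs. latency.
// Values below are illustrative starting points only.
public class ConsumerTuning {

    public static Properties throughputTunedConfig(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);

        // Wait for at least 1 MB per fetch: batches records and raises
        // throughput, at the cost of latency while the broker accumulates data.
        props.put("fetch.min.bytes", "1048576");

        // ...but never wait longer than 500 ms, bounding the added latency.
        props.put("fetch.max.wait.ms", "500");

        // Hand the application up to 1000 records per poll() call.
        props.put("max.poll.records", "1000");

        // Allow up to 2 MB to be fetched from a single partition at a time.
        props.put("max.partition.fetch.bytes", "2097152");

        return props;
    }

    public static void main(String[] args) {
        Properties props = throughputTunedConfig("broker1:9092", "lag-recovery-group");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Larger fetch sizes and poll batches trade latency for throughput, which is usually the right trade when working down a large lag.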

Spring Kafka

Index

  1. Resources
    • v3.1 features
  2. Producer
  3. Consumer
    • consumer variations -8
    • consumer factory
  4. Todo
  5. Findings/Answers

API Docs:

  1. https://docs.spring.io/spring-kafka/docs/current/api/

For new features added in a specific version of spring-kafka, refer to:

  1. https://docs.spring.io/spring-kafka/docs/ [if the version is not known, pick it from the link below → select version > references > html]
  2. https://spring.io/projects/spring-kafka#learn

Notes to implement for performance:

https://spring.io/projects/spring-kafka#learn

LinkedIn:

13 ways to learn Kafka:

  1. Tutorial: Official Apache Kafka Quickstart – https://lnkd.in/eVrMwgCw
  2. Documentation: Official Apache Kafka Documentation – https://lnkd.in/eEU2sZvq
  3. Tutorial: Kafka Learning with RedHat – https://lnkd.in/em-wsvDt
  4. Read: Kafka – The Definitive Guide: Real-Time Data and Stream Processing at Scale – https://lnkd.in/ez3aCVsH
  5. Course: Apache Kafka Essential Training: Getting Started – https://lnkd.in/ettejx2w
  6. Read: Kafka in Action – https://lnkd.in/ed7ViYQZ
  7. Course: Apache Kafka Deep Dive – https://lnkd.in/ekaB9mv6
  8. Read: Apache Kafka Quick Start Guide – https://lnkd.in/e-3pSXnu
  9. Course: Learn Apache Kafka for Beginners – https://lnkd.in/ewh6uUyT
  10. Course: Apache Kafka Crash Course for Java and Python Developers – https://lnkd.in/e72AHUY4
  11. Read: Mastering Kafka Streams and ksqlDB: Building real-time data systems by example – https://lnkd.in/eqr_DaY2
  12. Course: Deploying and Running Apache Kafka on Kubernetes – https://lnkd.in/ezQ58usN
  13. Course: Stream Processing Design Patterns with Kafka Streams – https://lnkd.in/egrks3rn

Spring Kafka 3.1 features –

  1. Micrometer observations –
  2. Same broker for multiple test cases
  3. Retryable topic changes are permanent.
  4. KafkaTemplate supports CompletableFuture instead of ListenableFuture.
  5. Testing Changes
    • Since 3.0.1, the embedded broker sets spring.kafka.bootstrap-servers by default, so tests connect to the embedded broker without extra configuration.

References: https://docs.spring.io/spring-kafka/docs/current/reference/html/

Points :

  1. Starting with version 2.5, the broker can be changed at runtime – see the section “Connecting to Kafka”
    • Support for ABSwitchCluster – only one cluster is active at a time
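The idea behind ABSwitchCluster is a supplier of bootstrap servers that can be flipped between a primary and a secondary cluster at runtime. The real class is org.springframework.kafka.core.ABSwitchCluster; the standalone sketch below (class name SwitchableCluster is mine) only illustrates the pattern without the spring-kafka dependency.

```java
import java.util.function.Supplier;

// Minimal sketch of the A/B cluster-switch pattern: a Supplier<String> of
// bootstrap servers where exactly one cluster is active at a time.
class SwitchableCluster implements Supplier<String> {

    private final String primary;
    private final String secondary;
    private volatile boolean usePrimary = true;   // only one cluster active

    SwitchableCluster(String primary, String secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void primary()   { this.usePrimary = true; }   // switch back
    void secondary() { this.usePrimary = false; }  // fail over

    @Override
    public String get() {
        return usePrimary ? primary : secondary;
    }
}

public class ABSwitchDemo {
    public static void main(String[] args) {
        SwitchableCluster cluster =
                new SwitchableCluster("primary:9092", "secondary:9092");
        System.out.println(cluster.get());   // primary:9092
        cluster.secondary();                 // fail over at runtime
        System.out.println(cluster.get());   // secondary:9092
    }
}
```

In Spring Kafka, a supplier like this would be registered on the producer/consumer factory (the reference documentation’s “Connecting to Kafka” section shows the exact wiring), and the factory’s caches are reset when the cluster is switched.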

Recursion

Objectives

  1. Significance
  2. Flow of control
  3. Memory usage
  4. Interview FAQs

Recursion is a process by which a function or method calls itself repeatedly. Analogy: anything that can be done with a for loop can also be done with recursion.

Iteration VS Recursion

Criteria | Recursion | Iteration
Definition | A process where a method calls itself repeatedly until a base condition is met. | A process where a piece of code is executed repeatedly, a finite number of times or until a condition is met.
Applicability | Applies to functions/methods. | Applies to loops.
Code size | Usually results in smaller code. | Usually results in larger code.
Memory footprint | Uses more memory, as each recursive call pushes a frame onto the stack. | Comparatively less memory is used.
Maintainability | Harder to debug and maintain. | Easier to debug and maintain.
Typical errors | Stack overflow if the base condition is not specified or never reached. | May loop infinitely, but an infinite loop adds no memory overhead; it simply never terminates.
Time complexity | Often higher, due to function-call overhead. | Relatively lower.





Structure : Any method that implements recursion has two basic parts:

  1. A recursive call – the method calls itself.
  2. A base condition (precondition) that stops the recursion.

Note that a base condition is necessary for any recursive method: if we never break the recursion, it keeps running indefinitely and results in a stack overflow.

Syntax- The general syntax of recursion is as follows:

methodName(T parameters...)
{
    if (baseCondition)    // base condition / precondition
    {
        return result;
    }
    return methodName(T parameters...);    // recursive call
}

Associated Error – Stack Overflow Error In Recursion

We are aware that when any method or function is called, the state of the function is stored on the stack and is retrieved when the function returns. The stack is used for the recursive method as well.

But in the case of recursion, a problem might occur if we do not define the base condition or when the base condition is somehow not reached or executed. If this situation occurs then the stack overflow may arise.
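The failure mode described above can be observed directly. A small sketch (the counter and class name are mine, for illustration): a recursive method with no base condition keeps pushing frames until the JVM stack is exhausted.

```java
// Demonstrates the stack overflow described above: without a base condition,
// every call pushes another frame until the JVM stack runs out.
public class StackOverflowDemo {

    static int depth = 0;

    static void recurseForever() {
        depth++;
        recurseForever();   // no base condition -> unbounded stack growth
    }

    public static void main(String[] args) {
        try {
            recurseForever();
        } catch (StackOverflowError e) {
            System.out.println("Stack overflowed after " + depth + " calls");
        }
    }
}
```

The exact depth reached depends on the JVM’s stack size (adjustable with -Xss), which is why the same recursion may succeed on one configuration and overflow on another.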

Types of recursion

  1. Tail Recursion
  2. Head Recursion
  3. Tree recursion

Tail Recursion

When the call to the recursive method is the last statement executed inside the recursive method, it is called “Tail Recursion”.

In tail recursion, the recursive call statement is usually executed along with the return statement of the method.

methodName(T parameters...)
{
    if (baseCondition)
    {
        return result;
    }
    return methodName(T parameters...);    // tail recursion: the call is the last action
}
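A concrete version of the pseudocode above: a tail-recursive factorial, where the running result is carried in an accumulator so no work is pending when the recursive call returns (the class and parameter names are mine).

```java
// Tail recursion in practice: the recursive call is the last action, with the
// running result carried in an accumulator parameter.
public class TailFactorial {

    static long factorial(int n, long accumulator) {
        if (n <= 1) {                               // base condition
            return accumulator;
        }
        return factorial(n - 1, n * accumulator);   // tail call: nothing pending
    }

    public static void main(String[] args) {
        System.out.println(factorial(5, 1));   // 120
    }
}
```

Worth noting: the JVM does not perform tail-call elimination, so even tail recursion consumes a stack frame per call in Java.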

Head Recursion

Head recursion is any recursive approach that is not tail recursion: the recursive call is made before the method does its own processing. So even general recursion is a head recursion.

methodName(T parameters...)
{
    if (someCondition)
    {
        return methodName(T parameters...);    // recursive call before remaining work
    }
    return result;
}
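A concrete head-recursion example (the class name is mine): the recursive call comes first, and the processing happens after it returns, so the numbers print in ascending order as the stack unwinds.

```java
// Head recursion in practice: recurse first, process after the call returns.
public class HeadRecursionDemo {

    static StringBuilder out = new StringBuilder();

    static void printAscending(int n) {
        if (n == 0) {                  // base condition
            return;
        }
        printAscending(n - 1);         // head: recurse first...
        out.append(n).append(' ');     // ...process on the way back up
    }

    public static void main(String[] args) {
        printAscending(5);
        System.out.println(out.toString().trim());   // 1 2 3 4 5
    }
}
```

Swapping the two statements inside the method would turn this into tail recursion and print the numbers in descending order instead.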

Tree Recursion

Tree recursion has two (or more) recursive calls per invocation – e.g., one for the left sub-tree and one for the right sub-tree.
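The classic illustration of tree recursion is the naive Fibonacci function, where each call branches into two recursive calls, exactly like visiting left and right sub-trees (the class name below is mine).

```java
// Tree recursion in practice: each call branches into two recursive calls,
// forming a call tree rather than a call chain.
public class TreeRecursionDemo {

    static long fib(int n) {
        if (n <= 1) {                    // base condition
            return n;
        }
        return fib(n - 1) + fib(n - 2);  // two recursive calls -> a call tree
    }

    public static void main(String[] args) {
        System.out.println(fib(10));   // 55
    }
}
```

Because the call tree doubles at each level, this runs in exponential time; memoization or iteration brings it down to linear.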

Problem-Solving Using Recursion

The basic idea behind using recursion is to express the bigger problem in terms of smaller problems. Also, we need to add one or more base conditions so that we can come out of recursion.


FAQS

Q #1) How does Recursion work in Java?

Answer: In recursion, the recursive function calls itself repeatedly until a base condition is satisfied. The memory for the called function is pushed on to the stack at the top of the memory for the calling function. For each function call, a separate copy of local variables is made.

Q #2) Why is Recursion used?

Answer: Recursion is used to solve those problems that can be broken down into smaller ones and the entire problem can be expressed in terms of a smaller problem.

Recursion is also used for problems that are too complex to solve cleanly with an iterative approach. Prefer recursion for problems where time complexity is not a primary concern.

Q #3) What are the benefits of Recursion?

Answer:

The benefits of Recursion include:

  1. Recursion reduces the size of the code and avoids repetitive boilerplate.
  2. Recursion lets us solve some problems more easily than the iterative approach.

Q #4) Which one is better – Recursion or Iteration?

Answer: Recursion makes repeated calls until the base condition is reached. Thus there is a memory overhead, as memory for each function call is pushed onto the stack.

Iteration, on the other hand, does not have much memory overhead. Recursion executes more slowly than the iterative approach. Recursion reduces the size of the code, while the iterative approach makes the code larger.

Recursion is better than the iterative approach for problems like the Tower of Hanoi, tree traversals, etc.

Finding the factorial of a number using recursion
(figure: factorial recursive call)
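A minimal sketch of the factorial example: factorial(n) is expressed in terms of the smaller problem factorial(n - 1), with the multiplication performed as the calls unfold back up the stack.

```java
// Plain recursive factorial: the big problem in terms of the smaller one.
public class Factorial {

    static long factorial(int n) {
        if (n <= 1) {                  // base condition
            return 1;
        }
        return n * factorial(n - 1);   // multiplication pending until the call returns
    }

    public static void main(String[] args) {
        // factorial(4) unfolds as 4 * (3 * (2 * factorial(1))) = 24
        System.out.println(factorial(4));
    }
}
```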

Question : how does the recursion unfold when the same method is called on both subtrees? See:

https://leetcode.com/problems/merge-two-binary-trees/editorial/

DSA

Index

  • Theory
  • FAQs

A data structure is a way of organizing (defining, storing & retrieving) the data so that the data can be used efficiently.

You must know a data structure before implementing it in an application, so that you can implement it in a better and more optimized way.

Note: This section is for

  1. Theoretical comparison
  2. Implementation of standard DS structures

(For Problem & Solution refer Competitive Programming section)

Topics :

  1. Introduction (& Brief Comparison)
    • Array vs Array-list
  2. Array & String
    1. Array: Way to declare
    2. Array: ways to copy
    3. Array: Advantages/Applications and disadvantages
  3. Stack[todo]
  4. Queue[todo]
  5. List & Linked-list
    1. Define the Structure and add element
    2. traversal
    3. Find loop
    4. reverse link-list
    5. Mid element
    6. Doubly Linked-list
  6. Tree
    1. Structure
    2. BST -Adding nodes
    3. Traversal
      1. DFS
        • In-order
        • Pre-Order
        • Post-Order
      2. BFS
  7. Map


List : Array Vs Linked List

Parameters of comparison :

  1. Speed of accessibility
    1. read/search
    2. insert/delete – cost tied to shifting other elements
  2. storage space required
Parameter | Array | Linked List
Strategy | Static in nature (risk of shortage or wastage of memory) | Dynamic in nature (grows and shrinks at runtime as needed)
Access & Traversal | Faster, as it is index based | Slower, as traversal node by node is required to reach an element
Insertion & Removal | Requires shifting (if an element is removed from or added in the middle) | Fast insertion and deletion (only pointer updates)
Unit of storage | Contiguous block of indexed slots | Dynamically allocated nodes
Definition | Collection of elements at contiguous memory locations | Collection of nodes
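The linked-list operations listed in the topics above (define the structure, add elements, traverse, reverse) can be sketched in one small class (the class and method names are mine):

```java
// Minimal singly linked list: node structure, append, traversal, reversal.
public class SinglyLinkedList {

    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    Node head;

    void add(int value) {               // append at the tail
        Node node = new Node(value);
        if (head == null) { head = node; return; }
        Node cur = head;
        while (cur.next != null) cur = cur.next;
        cur.next = node;
    }

    String traverse() {                 // visit node by node
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.data).append(' ');
        }
        return sb.toString().trim();
    }

    void reverse() {                    // iterative pointer reversal
        Node prev = null, cur = head;
        while (cur != null) {
            Node next = cur.next;       // save the rest of the list
            cur.next = prev;            // flip the pointer
            prev = cur;
            cur = next;
        }
        head = prev;                    // old tail becomes the new head
    }

    public static void main(String[] args) {
        SinglyLinkedList list = new SinglyLinkedList();
        list.add(1); list.add(2); list.add(3);
        System.out.println(list.traverse());   // 1 2 3
        list.reverse();
        System.out.println(list.traverse());   // 3 2 1
    }
}
```

Note how reversal needs no extra storage beyond three pointers, in contrast to an array where reversal swaps elements in place by index.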

Frequently asked DS questions, solutions, and tips for optimization.

  1. Arrays Questions
    • Equilibrium Index of an array
    • Find row number of a binary matrix having maximum number of 1s
  2. Tree
    • Recursion
      1. Head recursion
      2. Tail recursion
      3. Tree recursion
  3. Utilities
    1. Occurrence of each element
      • Using additional DS i.e Map
      • Without using additional DS
    2. Max Length Sub-array
      • brute-force
      • hash-map
    3. convert character to upper case
    4. Matrix
      • properties of matrix
      • related to rows and columns
    5. Bit-wise Operation
      • Single iteration find non-duplicate(XOR)

MFA questions

K8s Certifications

  • Kubernetes and Cloud Native Associate (KCNA)
    • The Kubernetes and Cloud Native Associate (KCNA) exam demonstrates a user’s foundational knowledge and skills in Kubernetes and the wider cloud native ecosystem. A certified KCNA will confirm conceptual knowledge of the entire cloud native ecosystem, particularly focusing on Kubernetes.
  • Certified Kubernetes Application Developer (CKAD)
    • The Certified Kubernetes Application Developer exam certifies that users can design, build, configure, and expose cloud native applications for Kubernetes. A CKAD can define application resources and use core primitives to build, monitor, and troubleshoot scalable applications and tools in Kubernetes.
  • Certified Kubernetes Administrator (CKA)
    • The Certified Kubernetes Administrator (CKA) program provides assurance that CKAs have the skills, knowledge, and competency to perform the responsibilities of Kubernetes administrators. A certified Kubernetes administrator has demonstrated the ability to do basic installation as well as configuring and managing production-grade Kubernetes clusters.
  • Certified Kubernetes Security Specialist (CKS)
    • The Certified Kubernetes Security Specialist program provides assurance that the holder is comfortable and competent with a broad range of best practices. CKS certification covers skills for securing container-based applications and Kubernetes platforms during build, deployment and runtime. Candidates for CKS must hold a current Certified Kubernetes Administrator (CKA) certification to demonstrate they possess sufficient Kubernetes expertise before sitting for the CKS.

reference – https://kubernetes.io/training/