Utilities

  1. Glimpse
    • Code Template
      1. Two pointers
      2. Sliding window
      3. Linked List
      4. Tree
      5. Graph
      6. Heap
      7. Binary search
      8. Backtracking
      9. Dynamic programming
      10. Dijkstra's Algorithm
    • Quick Notes – Cheatsheet
  2. Utilities
    1. Occurrence count
      • Using an additional DS, i.e. a Map
      • Without using an additional DS
    2. Character
      • Convert character to upper case
    3. Matrix Properties
    4. Probable ArrayIndexOutOfBounds
      • Graceful handling
    5. Taking input bug-free in Java


Code Template


Here are code templates for common patterns for all the data structures and algorithms:

Two pointers: one input, opposite ends

public int fn(int[] arr) {
    int left = 0;
    int right = arr.length - 1;
    int ans = 0;

    while (left < right) {
        // do some logic here with left and right
        if (CONDITION) {
            left++;
        } else {
            right--;
        }
    }

    return ans;
}

Two pointers: two inputs, exhaust both

public int fn(int[] arr1, int[] arr2) {
    int i = 0, j = 0, ans = 0;

    while (i < arr1.length && j < arr2.length) {
        // do some logic here
        if (CONDITION) {
            i++;
        } else {
            j++;
        }
    }

    while (i < arr1.length) {
        // do logic
        i++;
    }

    while (j < arr2.length) {
        // do logic
        j++;
    }

    return ans;
}

Sliding window

public int fn(int[] arr) {
    int left = 0, ans = 0, curr = 0;

    for (int right = 0; right < arr.length; right++) {
        // do logic here to add arr[right] to curr

        while (WINDOW_CONDITION_BROKEN) {
            // remove arr[left] from curr
            left++;
        }

        // update ans
    }

    return ans;
}

Build a prefix sum (left sum)

public int[] fn(int[] arr) {
    int[] prefix = new int[arr.length];
    prefix[0] = arr[0];

    for (int i = 1; i < arr.length; i++) {
        prefix[i] = prefix[i - 1] + arr[i];
    }

    return prefix;
}
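Once the prefix array is built, the sum of any subarray can be read off in O(1). A minimal sketch of that query (class and method names are illustrative):

```java
class PrefixSumDemo {
    // Build prefix[i] = arr[0] + ... + arr[i], same as the template above.
    static int[] buildPrefix(int[] arr) {
        int[] prefix = new int[arr.length];
        prefix[0] = arr[0];
        for (int i = 1; i < arr.length; i++) {
            prefix[i] = prefix[i - 1] + arr[i];
        }
        return prefix;
    }

    // Sum of arr[i..j] inclusive, answered in O(1) from the prefix array.
    static int querySum(int[] prefix, int i, int j) {
        return i == 0 ? prefix[j] : prefix[j] - prefix[i - 1];
    }
}
```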

Efficient string building

public String fn(char[] arr) {
    StringBuilder sb = new StringBuilder();
    for (char c: arr) {
        sb.append(c);
    }

    return sb.toString();
}

Linked list: fast and slow pointer

public int fn(ListNode head) {
    ListNode slow = head;
    ListNode fast = head;
    int ans = 0;

    while (fast != null && fast.next != null) {
        // do logic
        slow = slow.next;
        fast = fast.next.next;
    }

    return ans;
}

Find the number of subarrays that fit an exact criterion

// counts subarrays whose running value (e.g. a prefix sum) hits an exact
// target k, using a hash map of previously seen values of curr
public int fn(int[] arr, int k) {
    Map<Integer, Integer> counts = new HashMap<>();
    counts.put(0, 1);
    int ans = 0, curr = 0;

    for (int num: arr) {
        // do logic to change curr
        ans += counts.getOrDefault(curr - k, 0);
        counts.put(curr, counts.getOrDefault(curr, 0) + 1);
    }

    return ans;
}
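A concrete instantiation of this template is counting subarrays whose sum is exactly k: curr is the running prefix sum, and every earlier prefix equal to curr - k marks a subarray ending here with sum k. The class name below is illustrative:

```java
import java.util.HashMap;
import java.util.Map;

class SubarraySumK {
    // Count subarrays of arr whose sum is exactly k.
    static int countSubarrays(int[] arr, int k) {
        Map<Integer, Integer> counts = new HashMap<>();
        counts.put(0, 1); // the empty prefix has sum 0
        int ans = 0, curr = 0;

        for (int num : arr) {
            curr += num; // the "do logic to change curr" step
            ans += counts.getOrDefault(curr - k, 0);
            counts.put(curr, counts.getOrDefault(curr, 0) + 1);
        }

        return ans;
    }
}
```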

Monotonic increasing stack

//The same logic can be applied to maintain a monotonic queue.
public int fn(int[] arr) {
    Stack<Integer> stack = new Stack<>();
    int ans = 0;

    for (int num: arr) {
        // for monotonic decreasing, just flip the > to <
        while (!stack.empty() && stack.peek() > num) {
            // do logic
            stack.pop();
        }

        stack.push(num);
    }

    return ans;
}

Binary tree: DFS (recursive)

public int dfs(TreeNode root) {
    if (root == null) {
        return 0;
    }

    int ans = 0;
    // do logic
    dfs(root.left);
    dfs(root.right);
    return ans;
}

Binary tree: DFS (iterative)

public int dfs(TreeNode root) {
    if (root == null) {
        return 0;
    }

    Stack<TreeNode> stack = new Stack<>();
    stack.push(root);
    int ans = 0;

    while (!stack.empty()) {
        TreeNode node = stack.pop();
        // do logic
        if (node.left != null) {
            stack.push(node.left);
        }
        if (node.right != null) {
            stack.push(node.right);
        }
    }

    return ans;
}

Binary tree: BFS

public int fn(TreeNode root) {
    if (root == null) {
        return 0;
    }

    Queue<TreeNode> queue = new LinkedList<>();
    queue.add(root);
    int ans = 0;

    while (!queue.isEmpty()) {
        int currentLength = queue.size();
        // do logic for current level

        for (int i = 0; i < currentLength; i++) {
            TreeNode node = queue.remove();
            // do logic
            if (node.left != null) {
                queue.add(node.left);
            }
            if (node.right != null) {
                queue.add(node.right);
            }
        }
    }

    return ans;
}

Graph: DFS (recursive)

For the graph templates, assume the nodes are numbered from 0 to n - 1 and the graph is given as an adjacency list. Depending on the problem, you may need to convert the input into an equivalent adjacency list before using the templates.
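A common conversion is from an edge list to that adjacency list. A minimal sketch, assuming nodes numbered 0 to n - 1 and an undirected edge list (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class GraphBuild {
    // Convert an undirected edge list into an adjacency list for nodes 0..n-1.
    static List<List<Integer>> buildAdjacency(int n, int[][] edges) {
        List<List<Integer>> graph = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            graph.add(new ArrayList<>());
        }
        for (int[] edge : edges) {
            graph.get(edge[0]).add(edge[1]);
            graph.get(edge[1]).add(edge[0]); // drop this line for a directed graph
        }
        return graph;
    }
}
```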

Set<Integer> seen = new HashSet<>();

public int fn(int[][] graph) {
    seen.add(START_NODE);
    return dfs(START_NODE, graph);
}

public int dfs(int node, int[][] graph) {
    int ans = 0;
    // do some logic
    for (int neighbor: graph[node]) {
        if (!seen.contains(neighbor)) {
            seen.add(neighbor);
            ans += dfs(neighbor, graph);
        }
    }

    return ans;
}

Graph: DFS (iterative)

public int fn(int[][] graph) {
    Stack<Integer> stack = new Stack<>();
    Set<Integer> seen = new HashSet<>();
    stack.push(START_NODE);
    seen.add(START_NODE);
    int ans = 0;

    while (!stack.empty()) {
        int node = stack.pop();
        // do some logic
        for (int neighbor: graph[node]) {
            if (!seen.contains(neighbor)) {
                seen.add(neighbor);
                stack.push(neighbor);
            }
        }
    }

    return ans;
}

Graph: BFS

public int fn(int[][] graph) {
    Queue<Integer> queue = new LinkedList<>();
    Set<Integer> seen = new HashSet<>();
    queue.add(START_NODE);
    seen.add(START_NODE);
    int ans = 0;

    while (!queue.isEmpty()) {
        int node = queue.remove();
        // do some logic
        for (int neighbor: graph[node]) {
            if (!seen.contains(neighbor)) {
                seen.add(neighbor);
                queue.add(neighbor);
            }
        }
    }

    return ans;
}

Find top k elements with heap

public int[] fn(int[] arr, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>(CRITERIA);
    for (int num: arr) {
        heap.add(num);
        if (heap.size() > k) {
            heap.remove();
        }
    }

    int[] ans = new int[k];
    for (int i = 0; i < k; i++) {
        ans[i] = heap.remove();
    }

    return ans;
}

Binary search

public int fn(int[] arr, int target) {
    int left = 0;
    int right = arr.length - 1;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        if (arr[mid] == target) {
            // do something
            return mid;
        }
        if (arr[mid] > target) {
            right = mid - 1;
        } else {
            left = mid + 1;
        }
    }

    // left is the insertion point
    return left;
}

Binary search: duplicate elements, left-most insertion point

public int fn(int[] arr, int target) {
    int left = 0;
    int right = arr.length;
    while (left < right) {
        int mid = left + (right - left) / 2;
        if (arr[mid] >= target) {
            right = mid;
        } else {
            left = mid + 1;
        }
    }

    return left;
}

Binary search: duplicate elements, right-most insertion point

public int fn(int[] arr, int target) {
    int left = 0;
    int right = arr.length;
    while (left < right) {
        int mid = left + (right - left) / 2;
        if (arr[mid] > target) {
            right = mid;
        } else {
            left = mid + 1;
        }
    }

    return left;
}

Binary search: for greedy problems

If looking for a minimum:

public int fn(int[] arr) {
    int left = MINIMUM_POSSIBLE_ANSWER;
    int right = MAXIMUM_POSSIBLE_ANSWER;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        if (check(mid)) {
            right = mid - 1;
        } else {
            left = mid + 1;
        }
    }

    return left;
}

public boolean check(int x) {
    // this function is implemented depending on the problem
    return BOOLEAN;
}

If looking for a maximum:

public int fn(int[] arr) {
    int left = MINIMUM_POSSIBLE_ANSWER;
    int right = MAXIMUM_POSSIBLE_ANSWER;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        if (check(mid)) {
            left = mid + 1;
        } else {
            right = mid - 1;
        }
    }

    return right;
}

public boolean check(int x) {
    // this function is implemented depending on the problem
    return BOOLEAN;
}

Backtracking

public int backtrack(STATE curr, OTHER_ARGUMENTS...) {
    if (BASE_CASE) {
        // modify the answer
        return 0;
    }

    int ans = 0;
    for (ITERATE_OVER_INPUT) {
        // modify the current state
        ans += backtrack(curr, OTHER_ARGUMENTS...);
        // undo the modification of the current state
    }

    return ans;
}

Dynamic programming: top-down memoization

Map<STATE, Integer> memo = new HashMap<>();

public int fn(int[] arr) {
    return dp(STATE_FOR_WHOLE_INPUT, arr);
}

public int dp(STATE, int[] arr) {
    if (BASE_CASE) {
        return 0;
    }

    if (memo.containsKey(STATE)) {
        return memo.get(STATE);
    }

    int ans = RECURRENCE_RELATION(STATE);
    memo.put(STATE, ans);
    return ans;
}
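As a concrete instantiation of this template, here is a sketch where STATE is a single integer and the recurrence is Fibonacci (chosen purely as an illustration):

```java
import java.util.HashMap;
import java.util.Map;

class FibMemo {
    // STATE here is just the index n.
    static Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) { // base case
            return n;
        }

        if (memo.containsKey(n)) {
            return memo.get(n);
        }

        // recurrence relation: fib(n) = fib(n - 1) + fib(n - 2)
        long ans = fib(n - 1) + fib(n - 2);
        memo.put(n, ans);
        return ans;
    }
}
```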

Build a trie

// note: using a class is only necessary if you want to store data at each node.
// otherwise, you can implement a trie using only hash maps.
class TrieNode {
    // you can store data at nodes if you wish
    int data;
    Map<Character, TrieNode> children;
    TrieNode() {
        this.children = new HashMap<>();
    }
}

public TrieNode buildTrie(String[] words) {
    TrieNode root = new TrieNode();
    for (String word: words) {
        TrieNode curr = root;
        for (char c: word.toCharArray()) {
            if (!curr.children.containsKey(c)) {
                curr.children.put(c, new TrieNode());
            }

            curr = curr.children.get(c);
        }

        // at this point, you have a full word at curr
        // you can perform more logic here to give curr an attribute if you want
    }

    return root;
}

Dijkstra’s algorithm

// assumes: n nodes, a source node, and an adjacency list where
// graph.get(node) holds (neighbor, weight) pairs
int[] distances = new int[n];
Arrays.fill(distances, Integer.MAX_VALUE);
distances[source] = 0;

Queue<Pair<Integer, Integer>> heap = new PriorityQueue<>(Comparator.comparing(Pair::getKey));
heap.add(new Pair<>(0, source));

while (!heap.isEmpty()) {
    Pair<Integer, Integer> curr = heap.remove();
    int currDist = curr.getKey();
    int node = curr.getValue();

    if (currDist > distances[node]) {
        continue;
    }

    for (Pair<Integer, Integer> edge: graph.get(node)) {
        int nei = edge.getKey();
        int weight = edge.getValue();
        int dist = currDist + weight;

        if (dist < distances[nei]) {
            distances[nei] = dist;
            heap.add(new Pair<>(dist, nei));
        }
    }
}


Quick Notes – Cheat-sheet



This article will be a collection of cheat sheets that you can use as you solve problems and prepare for interviews. You will find:

  • Time complexity (Big O) cheat sheet
  • General DS/A flowchart (when to use each DS/A)
  • Stages of an interview cheat sheet

Time complexity (Big O) cheat sheet

[image: big O chart]

First, let’s talk about the time complexity of common operations, split by data structure/algorithm. Then, we’ll talk about reasonable complexities given input sizes.

Arrays (dynamic array/list)

Given n = arr.length,

  • Add or remove element at the end: O(1) amortized
  • Add or remove element from arbitrary index: O(n)
  • Access or modify element at arbitrary index: O(1)
  • Check if element exists: O(n)
  • Two pointers: O(n · k), where k is the work done at each iteration, includes sliding window
  • Building a prefix sum: O(n)
  • Finding the sum of a subarray given a prefix sum: O(1)

Strings (immutable)

Given n = s.length,

  • Add or remove character: O(n)
  • Access element at arbitrary index: O(1)
  • Concatenation between two strings: O(n + m), where m is the length of the other string
  • Create substring: O(m), where m is the length of the substring
  • Two pointers: O(n · k), where k is the work done at each iteration, includes sliding window
  • Building a string from joining an array, stringbuilder, etc.: O(n)

Linked Lists

Given n as the number of nodes in the linked list,

  • Add or remove element given pointer before add/removal location: O(1)
  • Add or remove element given pointer at add/removal location: O(1) if doubly linked
  • Add or remove element at arbitrary position without pointer: O(n)
  • Access element at arbitrary position without pointer: O(n)
  • Check if element exists: O(n)
  • Reverse between position i and j: O(j − i)
  • Detect a cycle: O(n) using fast-slow pointers or hash map

Hash table/dictionary

Given n = dic.length,

  • Add or remove key-value pair: O(1)
  • Check if key exists: O(1)
  • Check if value exists: O(n)
  • Access or modify value associated with key: O(1)
  • Iterate over all keys, values, or both: O(n)

Note: the O(1) operations are constant relative to n. In reality, the hashing algorithm might be expensive. For example, if your keys are strings, then it will cost O(m), where m is the length of the string. The operations only take constant time relative to the size of the hash map.


Set

Given n = set.length,

  • Add or remove element: O(1)
  • Check if element exists: O(1)

The above note applies here as well.


Stack

Stack operations are dependent on their implementation. A stack is only required to support pop and push. If implemented with a dynamic array:

Given n = stack.length,

  • Push element: O(1)
  • Pop element: O(1)
  • Peek (see element at top of stack): O(1)
  • Access or modify element at arbitrary index: O(1)
  • Check if element exists: O(n)

Queue

Queue operations are dependent on their implementation. A queue is only required to support dequeue and enqueue. If implemented with a doubly linked list:

Given n = queue.length,

  • Enqueue element: O(1)
  • Dequeue element: O(1)
  • Peek (see element at front of queue): O(1)
  • Access or modify element at arbitrary index: O(n)
  • Check if element exists: O(n)

Note: most programming languages implement queues in a more sophisticated manner than a simple doubly linked list. Depending on implementation, accessing elements by index may be faster than O(n), or O(n) but with a significant constant divisor.
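As one illustration of that implementation-dependence: in Java, ArrayDeque is the usual general-purpose queue (and stack), generally faster than LinkedList for both roles, though it has no O(1) access by arbitrary index. A small sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeDemo {
    public static void main(String[] args) {
        // ArrayDeque as a FIFO queue: add at the tail, remove from the head.
        Deque<Integer> queue = new ArrayDeque<>();
        queue.offerLast(1);
        queue.offerLast(2);
        System.out.println(queue.pollFirst()); // FIFO → 1

        // ArrayDeque as a LIFO stack: push and pop at the head.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // LIFO → 2
    }
}
```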


Binary tree problems (DFS/BFS)

Given n as the number of nodes in the tree,

Most algorithms will run in O(n · k) time, where k is the work done at each node, usually O(1). This is just a general rule and not always the case. We are assuming here that BFS is implemented with an efficient queue.


Binary search tree

Given n as the number of nodes in the tree,

  • Add or remove element: O(n) worst case, O(log n) average case
  • Check if element exists: O(n) worst case, O(log n) average case

The average case is when the tree is well balanced – each depth is close to full. The worst case is when the tree is just a straight line.


Heap/Priority Queue

Given n = heap.length and talking about min heaps,

  • Add an element: O(log n)
  • Delete the minimum element: O(log n)
  • Find the minimum element: O(1)
  • Check if element exists: O(n)

Binary search

Binary search runs in O(log n) in the worst case, where n is the size of your initial search space.


Miscellaneous

  • Sorting: O(n · log n), where n is the size of the data being sorted
  • DFS and BFS on a graph: O(n · k + e), where n is the number of nodes, e is the number of edges, if each node is handled in O(1) other than iterating over edges
  • DFS and BFS space complexity: typically O(n), but if it’s in a graph, might be O(n + e) to store the graph
  • Dynamic programming time complexity: O(n · k), where n is the number of states and k is the work done at each state
  • Dynamic programming space complexity: O(n), where n is the number of states

Input sizes vs time complexity

The constraints of a problem can be considered as hints because they indicate an upper bound on what your solution’s time complexity should be. Being able to figure out the expected time complexity of a solution given the input size is a valuable skill to have. In all LeetCode problems and most online assessments (OA), you will be given the problem’s constraints. Unfortunately, you will usually not be explicitly told the constraints of a problem in an interview, but it’s still good for practicing on LeetCode and completing OAs. Still, in an interview, it usually doesn’t hurt to ask about the expected input sizes.


n <= 10

The expected time complexity likely has a factorial or an exponential with a base larger than 2 – O(n^2 · n!) or O(4^n), for example.

You should think about backtracking or any brute-force-esque recursive algorithm. n <= 10 is extremely small and usually any algorithm that correctly finds the answer will be fast enough.


10 < n <= 20

The expected time complexity likely involves O(2^n). Any higher base or a factorial will be too slow (3^20 ≈ 3.5 billion, and 20! is much larger). A 2^n usually implies that given a collection of elements, you are considering all subsets/subsequences – for each element, there are two choices: take it or don’t take it.

Again, this bound is very small, so most algorithms that are correct will probably be fast enough. Consider backtracking and recursion.


20 < n <= 100

At this point, exponentials will be too slow. The upper bound will likely involve O(n^3).

Problems marked as “easy” on LeetCode usually have this bound, which can be deceiving. There may be solutions that run in O(n), but the small bound allows brute force solutions to pass (finding the linear time solution might not be considered as “easy”).

Consider brute force solutions that involve nested loops. If you come up with a brute force solution, try analyzing the algorithm to find what steps are “slow”, and try to improve on those steps using tools like hash maps or heaps.


100 < n <= 1,000

In this range, a quadratic time complexity O(n^2) should be sufficient, as long as the constant factor isn’t too large.

Similar to the previous range, you should consider nested loops. The difference between this range and the previous one is that O(n^2) is usually the expected/optimal time complexity in this range, and it might not be possible to improve.


1,000 < n < 100,000

n <= 10^5 is the most common constraint you will see on LeetCode. In this range, the slowest acceptable common time complexity is O(n · log n), although a linear time approach O(n) is commonly the goal.

In this range, ask yourself if sorting the input or using a heap can be helpful. If not, then aim for an O(n) algorithm. Nested loops that run in O(n^2) are unacceptable – you will probably need to make use of a technique learned in this course to simulate a nested loop’s behavior in O(1) or O(log n):

  • Hash map
  • A two pointers implementation like sliding window
  • Monotonic stack
  • Binary search
  • Heap
  • A combination of any of the above

If you have an O(n) algorithm, the constant factor can be reasonably large (around 40). One common theme for string problems involves looping over the characters of the alphabet at each iteration, resulting in a time complexity of O(26n).


100,000 < n < 1,000,000

n <= 10^6 is a rare constraint, and will likely require a time complexity of O(n). In this range, O(n · log n) is usually safe as long as it has a small constant factor. You will very likely need to incorporate a hash map in some way.


1,000,000 < n

With huge inputs, typically in the range of 10^9 or more, the most common acceptable time complexity will be logarithmic O(log n) or constant O(1). In these problems, you must either significantly reduce your search space at each iteration (usually binary search) or use clever tricks to find information in constant time (like with math or a clever use of hash maps).

Other time complexities are possible, like O(√n), but this is very rare and will usually only be seen in very advanced problems.


Sorting algorithms

All major programming languages have a built-in method for sorting. It is usually correct to assume and say sorting costs O(n · log n), where n is the number of elements being sorted. For completeness, here is a chart that lists many common sorting algorithms and their complexities. The algorithm implemented by a programming language varies; for example, Python uses Timsort, but in C++ the specific algorithm is not mandated and varies.

[image: sorting algorithm complexities]

Definition of a stable sort from Wikipedia: “Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.”
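In Java, Arrays.sort over an object array is guaranteed stable (a merge-sort variant, TimSort), so stability can be observed directly. A small sketch (the Row record is illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;

class StableSortDemo {
    // Each Row remembers its position in the original input.
    record Row(String key, int originalIndex) {}

    public static void main(String[] args) {
        Row[] rows = {
            new Row("b", 0), new Row("a", 1),
            new Row("b", 2), new Row("a", 3),
        };
        // Stable sort: rows with equal keys keep their original relative order,
        // so "a"(1) stays before "a"(3) and "b"(0) before "b"(2).
        Arrays.sort(rows, Comparator.comparing(Row::key));
        System.out.println(Arrays.toString(rows));
    }
}
```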


General DS/A flowchart

Here’s a flowchart that can help you figure out which data structure or algorithm should be used. Note that this flowchart is very general as it would be impossible to cover every single scenario.

Note that this flowchart only covers methods taught in LICC, and as such more advanced algorithms like Dijkstra’s are excluded.

[image: data structures and algorithm flowchart]

Introduction to Java Concurrency

There are a number of ways to achieve asynchronous communication:

  • Spring async provides a means to implement concurrency with an abstraction over thread handling
  • Messaging queues – ActiveMQ, Kafka, RabbitMQ
  • Multi-threading or concurrency
  • RestTemplate (sync) vs WebClient (async)

Writing a correct program is hard, and writing a correct concurrent program is harder.


There are simply more things that can go wrong in a concurrent program than in a sequential program.
So, why use concurrency? Two reasons:

  1. Concurrency SIMPLIFIES development of complex systems by turning COMPLICATED ASYNCHRONOUS CODE into STRAIGHT-LINE code.
  2. It is the easiest way to tap the COMPUTING POWER of MULTI-PROCESSOR systems.

There are several other factors, too, which favor executing multiple programs simultaneously (using multi-threading):

  1. Resource utilization: programs (sequences of statements) sometimes have to wait for external operations like input or output, and WHILE WAITING a thread can do no USEFUL WORK. It is EFFICIENT to LET ANOTHER PROGRAM run in THAT WAIT TIME (on another thread).
  2. Fairness: it is preferable to let programs share the computer via finer-grained time slicing than to let one program run to completion and then start another.
  3. GUI applications: for improving the responsiveness of the user interface.
  4. Server applications: for improving resource utilization and throughput.
Simplified handling of ASYNCHRONOUS events by multi-threading:

A complicated asynchronous workflow can be decomposed into a number of simpler, synchronous workflows, each running in a separate thread and interacting with the others only at specific synchronization points.
Example:
Assertion:
A server application that accepts socket connections from multiple remote clients MAY be easier to design when EACH CONNECTION is ALLOTTED its OWN THREAD and allowed to use SYNCHRONOUS I/O.
Reason:
If the application goes to read data when the data is not available, the read blocks until some data arrives.
In a single-threaded application, a blocked read not only STALLS the corresponding REQUEST but also stalls the PROCESSING of all other requests while the single thread is blocked.
To avoid this problem, a single-threaded application is forced to use NON-BLOCKING I/O, which is far more error-prone and complicated than SYNCHRONOUS I/O.
However, if each request has its own thread, then blocking does not affect the processing of other requests.
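The thread-per-connection design described above can be sketched as a minimal echo server; this is an illustration under assumed names, not a production design:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class ThreadPerConnectionEcho {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            while (true) {
                Socket socket = server.accept();          // blocks until a client connects
                new Thread(() -> handle(socket)).start(); // one thread per connection
            }
        }
    }

    // Plain blocking I/O: a read that blocks stalls only this connection's thread.
    static void handle(Socket socket) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (Exception ignored) {
        } finally {
            try { socket.close(); } catch (Exception e) { }
        }
    }
}
```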

Risks of threads:
  1. Safety hazards: access to shared variables must be properly coordinated so that threads do not interfere with one another. Fortunately, Java provides synchronization mechanisms to coordinate such access.
  2. Liveness failures: a liveness failure occurs when an activity gets into a state such that it is permanently unable to make forward progress. Safety means “nothing bad ever happens”; liveness concerns the complementary goal that “something good eventually happens”. Like most concurrency bugs, bugs that cause liveness failures can be elusive because they depend on the relative timing of events in different threads, and therefore do not always manifest themselves in development or testing.
  3. Performance hazards: in well-designed concurrent applications the use of threads is a net performance gain, but frequent CONTEXT SWITCHES add overhead: a thread is suspended to let another thread run, and in between the scheduler does thread-tracking work (saving and restoring execution context), so CPU time is spent on switching rather than on active thread work.
Threads are EVERYWHERE:

“Frameworks introduce concurrency into applications by calling application components from framework threads. Components invariably access application state, thus requiring that all code paths accessing that state be thread-safe.”

The following facilities cause application code to be called not from a thread managed by the application but from framework threads:

  1. Timer: Timer is a convenient mechanism to run TASKS at a later time, once or repeatedly. The introduction of a Timer can complicate an otherwise sequential program, because TimerTasks are executed in a thread managed by the Timer, not the application. If a TimerTask accesses data that is also accessed by other application threads, then not only must the TimerTask do so in a thread-safe manner, but so must any other classes that access that data. The easiest way to achieve this is to ensure that objects accessed by the TimerTask are themselves thread-safe, thus encapsulating the thread safety within the shared objects.
  2. Servlets and JavaServer Pages (JSPs): the servlet framework handles dispatching requests from remote HTTP clients to the appropriate servlet or JSP (a component of application logic), and on high-volume web sites, multiple clients may require the services of the same servlet at once. In other words, servlets need to be thread-safe.
  3. Remote Method Invocation (RMI): RMI lets you invoke methods on objects running in another JVM. When the RMI code calls your remote object, in what thread does that call happen? You don’t know, but it’s definitely not a thread you created – your object gets called in a thread managed by RMI, and RMI creates many threads. So a remote object must guard against thread safety hazards.
  4. Swing and AWT: when the user performs a UI action, an event handler is called in the event thread to perform whatever operation the user requested. If the handler needs to access application state that is also accessed from other threads (such as a document being edited), then the event handler, along with any other code that accesses that state, must do so in a thread-safe manner.

Best Practices – Src – LinkedIn

  1. Always design with concurrency in mind, considering thread safety and potential race conditions from the start.
  2. Understand the Java memory model and the guarantees it provides for visibility and ordering of shared variables.
  3. Use synchronization mechanisms, such as synchronized blocks and methods, to ensure thread safety when accessing shared data.
  4. Minimize the scope of synchronized blocks to reduce contention and improve performance.
  5. Use the volatile keyword for simple flags or variables that need visibility guarantees without mutual exclusion.
  6. Prefer higher-level concurrency utilities, such as java.util.concurrent classes, over low-level thread manipulation.
  7. Utilize concurrent collections, such as ConcurrentHashMap and ConcurrentLinkedQueue, for efficient and thread-safe data sharing.
  8. Be cautious when sharing mutable objects among threads and consider using immutable objects or thread-local variables instead.
  9. Understand the happens-before relationship to reason about the visibility and ordering of operations in a concurrent program.
  10. Use atomic variables, such as AtomicInteger and AtomicReference, to perform atomic operations without explicit synchronization.
  11. Be aware of the potential pitfalls of double-checked locking and prefer lazy initialization techniques like the Initialization-on-Demand Holder idiom.
  12. Use thread-local variables when you need to maintain thread-specific state without explicit synchronization.
  13. Beware of the risks associated with publishing shared objects prematurely, and ensure proper synchronization or immutability before sharing them.
  14. Employ thread pools and the Executor framework for efficient thread management and task execution.
  15. Tune thread pool configurations, such as the number of threads and the work queue size, based on the characteristics of your application.
  16. Consider the ExecutorCompletionService to process task results as they become available, improving responsiveness and resource utilization.
  17. Be cautious when interrupting threads and handle InterruptedException properly to ensure graceful termination and cleanup.
  18. Avoid unnecessary synchronization by using thread-safe libraries and designing thread-safe code from the ground up.
  19. Profile and measure the performance of your concurrent applications to identify bottlenecks and areas for optimization.
  20. Stay updated with the latest advancements in Java concurrency.
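The lazy-initialization point above can be sketched with the Initialization-on-Demand Holder idiom; the ConfigHolder class below is a hypothetical singleton used purely for illustration:

```java
class ConfigHolder {
    private ConfigHolder() {} // no outside instantiation

    // Initialization-on-Demand Holder idiom: the JVM guarantees the nested
    // class is initialized at most once, on the first call to getInstance(),
    // and class initialization is thread-safe - no explicit locking needed.
    private static class Holder {
        static final ConfigHolder INSTANCE = new ConfigHolder();
    }

    static ConfigHolder getInstance() {
        return Holder.INSTANCE;
    }
}
```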

Micro-services

Index-

  1. Overview
  2. Creating micro-services
  3. Locating micro-services
  4. Micro-service Communication
  5. Data Handling

Unit 1: Overview

Background:

Cloud applications have several advantages (which can be discussed separately). To use those advantages we are moving towards CLOUD-NATIVE applications.
Cloud-native applications are composed of smaller, independent, self-contained pieces that are called micro-services.

The opposite of a CLOUD-NATIVE application is classic distributed computing, which has the 8 fallacies (zero latency, infinite bandwidth, etc.) as described in 1994.

Salient features

  • language agnostic
  • should have different phases – deploy and run
  • HTTP/REST communication is synchronous in nature; asynchronous communication adds loose coupling between services
  • Language-agnostic standards like AMQP (Advanced Message Queuing Protocol) and MQTT (Message Queuing Telemetry Transport) are edging out language-specific mechanisms like JMS (Java Message Service).
  • A saying goes: be liberal in what you accept and conservative in what you send. That is, if your API defines 4 enumeration values that can appear in a request and a 5th one comes up, the service should log the error and must not fail the application.
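The last point can be sketched as tolerant enum parsing; the Status enum and its values below are hypothetical:

```java
import java.util.logging.Logger;

class TolerantParse {
    // Hypothetical enum of the 4 values the API documents today, plus a
    // catch-all for values we don't recognize yet.
    enum Status { NEW, ACTIVE, SUSPENDED, CLOSED, UNKNOWN }

    static final Logger log = Logger.getLogger("api");

    // Be liberal in what you accept: an unrecognized fifth value is logged
    // and mapped to UNKNOWN instead of throwing and failing the request.
    static Status parseStatus(String raw) {
        try {
            return Status.valueOf(raw.trim().toUpperCase());
        } catch (IllegalArgumentException e) {
            log.warning("unrecognized status: " + raw);
            return Status.UNKNOWN;
        }
    }
}
```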

Unit 2: Creating micro-services in Java

DEFINITION:

Each micro-service should handle a single, complete business function.

API DESIGN

There are two strategies: the Top-Down and the Bottom-Up approach.
Mostly used is Top-Down, where we first declare the API and then write the corresponding service/business code.

SOME NAMING CONVENTIONS FOR API DESIGN
  • Use nouns, and plurals instead of verbs
PRACTICES FOR MANAGING CHANGE IN AN API:
  1. maintain different versions, OR
  2. pass the new version in a header

Unit 3 : Locating services

Micro-services are designed to be easily scaled. The scaling is done horizontally, by scaling individual services.
With multiple instances of a micro-service, you need a method to locate the micro-service and to load-balance over the different instances of the service you are calling.

Following topics involved:

  1. Service registry – Consul, ZooKeeper, Eureka etc.
  2. Service Invocation – client-side (sidecar or client library) or server-side (service proxy)
  3. API gateway
1. Service registry

A service registry is a persistent store that holds all available instances and the routes on which they can be reached.

Four reasons why a MICRO-SERVICE must communicate with the service registry:

  1. Registration – After the service is deployed, it MUST register with the service registry.
  2. Heartbeat – The micro-service must send regular heartbeats to show that it is ready to receive requests.
  3. Service discovery – In order to communicate with other services, a micro-service must call the service registry to get the list of available instances.
  4. De-registration – When a service goes down, it must be removed from the service registry.
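The four registry interactions above can be sketched with a toy in-memory registry (all names are hypothetical; real registries such as Consul or Eureka add persistence, health-check timeouts and replication):

```python
import time

class ServiceRegistry:
    """Toy in-memory service registry (illustrative only)."""

    def __init__(self, heartbeat_ttl=30):
        self.instances = {}           # (name, url) -> last heartbeat time
        self.heartbeat_ttl = heartbeat_ttl

    def register(self, name, url):
        # 1. Registration: a freshly deployed instance announces itself
        self.instances[(name, url)] = time.time()

    def heartbeat(self, name, url):
        # 2. Heartbeat: the instance proves it is still ready for traffic
        self.instances[(name, url)] = time.time()

    def discover(self, name):
        # 3. Service discovery: return instances whose heartbeat is fresh
        now = time.time()
        return [url for (n, url), seen in self.instances.items()
                if n == name and now - seen < self.heartbeat_ttl]

    def deregister(self, name, url):
        # 4. De-registration: remove an instance that is shutting down
        self.instances.pop((name, url), None)

registry = ServiceRegistry()
registry.register("mapService", "http://10.0.0.1:8080")
registry.register("mapService", "http://10.0.0.2:8080")
registry.deregister("mapService", "http://10.0.0.2:8080")
print(registry.discover("mapService"))  # ['http://10.0.0.1:8080']
```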

Example :
service-registry: consul, Eureka etc.
Services are registered with the registry by a registrar (some service registries have one built in).
Which Service-Registry to choose:-
Consistency vs Availability:
Most service registries provide partition tolerance plus either consistency or availability. A service registry cannot provide all three (partition tolerance, consistency, availability) at the same time, because of the CAP theorem.
Banking/finance-related services choose consistency.
Speed-focused services (media, entertainment) choose availability.
For example, Eureka provides availability, while Consul and Apache ZooKeeper provide consistency.

2. Service Invocation

When a service needs to communicate with another service, it uses the information stored in the service registry. Actual calls to a micro-service are made either from the client side or from the server side.

Server-Side micro-service calls:

Server-side communication with the micro-service is performed by using a SERVICE PROXY.
The SERVICE PROXY is either part of the service registry or a separate service.
A micro-service that wants to communicate makes a call to WELL-KNOWN endpoints of the service proxy, which is responsible for interacting with the service registry and forwarding the request.
Load balancing is handled completely on the server side.

Disadvantage of server-side micro-service calls:
The number of hops is increased.
Advantage:
Testing is easy, and the proxy can be mocked and tested.

Client-Side micro-service calls:

There are two parts to this process: getting the instance(s) of a micro-service from the service registry, and then making requests to those instances.
The micro-service locations can be cached on the client side (with a timeout) to avoid the extra round trip to the registry, so that calls do not fail if the registry is briefly down.
The request is handled in one of the following two ways:
1. Client library
2. Sidecar

Client Library

It is part of the micro-service itself and isolates the logic of remote communication. Instead of writing a new one we can use a 3rd-party library (e.g. the Eureka client). Other libraries, such as Netflix Ribbon, provide client-side load balancing.

Sidecar

A sidecar is deployed with the micro-service but runs as a separate process (somewhat like a config project alongside the main project).
A good middle ground between the service proxy and a client library is a sidecar.
Sidecars maintain the service registration with the service registry, and usually also perform client-side load balancing for outbound calls to other micro-services.
Because sidecars run in their own process, they are independent of language.
Netflix Prana is an open source sidecar that provides the capabilities of the Java-based Ribbon and Eureka client libraries to non-Java applications.

3. API gateway

All requests from external clients come to the API gateway endpoints, and the calls are then redirected to the specific micro-service.


UNIT 4: Micro-service communication

Following Topics involved :

  1. Synchronous vs Asynchronous communication
  2. Fault tolerance (resilience)
Synchronous vs Asynchronous communication

Synchronous communication is a request that requires a response, whether that be immediately or after some
amount of time. Asynchronous communication is a message where no response is required.

Previous understanding:
Synchronous communication – one message is delivered at a time, and the others have to wait until the previous one releases the resource (thread).

Async support for JAX-RS

Although JAX-RS requests will always be synchronous (you always require a response), you can take advantage of asynchronous programming models. The asynchronous support in JAX-RS 2.0 allows threads to perform other requests while waiting for the HTTP response.

Another option is using reactive libraries like RxJava. RxJava is a Java implementation of ReactiveX, a library that is designed to aggregate and filter responses from multiple parallel outbound requests.

Asynchronous messaging (EVENTS)

A request from an external client generally must go through several micro-services before returning a response. If each of these calls is made synchronously, then the total time for the call holds up other requests. The more complex the micro service system and the more interactions between micro-services required per external request, the bigger the resultant latency in your system.
If the requests made in the process can be replaced with asynchronous events, then you should implement the event instead.

The microservices pattern requires each microservice to own its own data. This requirement means when a request comes in, there might be several services that need to update their databases. Services should advertise data changes by using EVENTS that other interested parties can subscribe to.

To coordinate the messaging of your application, you can use any one of the available message brokers and matching client libraries. Some examples that integrate with Java are an AMQP broker with the RabbitMQ Java client, Apache Kafka, and MQTT, which is aimed at the Internet of Things space.

In brief:
For better throughput, or better response time for ALL THREADS, we go with async communication.
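The event-driven decoupling described above can be sketched with a tiny in-process publish/subscribe bus (topic and handler names like order.placed are made up; a real system would use a broker such as RabbitMQ or Kafka):

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative sketch;
    a real system would hand events to a message broker)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fire-and-forget: the publisher does not wait for any response
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
updates = []
# inventory and shipping services subscribe to order events (hypothetical)
bus.subscribe("order.placed", lambda e: updates.append(("inventory", e["order_id"])))
bus.subscribe("order.placed", lambda e: updates.append(("shipping", e["order_id"])))
bus.publish("order.placed", {"order_id": 42})
print(updates)  # [('inventory', 42), ('shipping', 42)]
```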

Internal messaging in Micro-service component

Events can be used within a single microservice.
Using internal events can increase the speed of your response to synchronous requests because you do not have to wait for other synchronous requests to complete before returning a response.
Take advantage of the Java EE specification support for events. Internal asynchronous events can be done by using Contexts and Dependency Injection (CDI).

Possible application/conclusion:
To attain responsiveness from the collective APIs, introduce a message broker (ActiveMQ, Kafka) and increase the number of consumer threads to more than 1. When all requests are added to the queue and only 1 is picked at a time, the rest have to wait; the incoming request, however, does not have to wait – it is put in the queue.

This is a kind of partial utilization of the async feature (full async requires two things: a new incoming request should not have to wait for the previous one, and the already-queued requests should be getting responses. Here the new incoming requests do not have to wait because they are queued, but collective responsiveness is still lacking).

When to use Async and Sync ?

Example 1: GameOn!
Suppose a user is authenticated to play the game. He sends a request to update a room (to accommodate a new player) to a service, say mapService. The user is expecting a response indicating whether the update succeeded or not.
In this case synchronous communication (REST) should be used.
Once the DB has been updated for mapService, other micro-services need to be informed of this change. That requires no response, so it can be done via an event (asynchronous communication).

In brief:
Async offers fast processing, but synchronous is easier and simpler to implement.

Example 2: Online retail store.
Suppose a user is authenticated and places an order. The order request to a service needs a response indicating whether it was successful or not, so it is synchronous. The other services that subscribe to this event (like inventory management or shipping) will need to update their DBs; they do not need to respond back, so those updates should be asynchronous communication.

Fault tolerance

A strong case for micro-service architecture is fault tolerance and resilience.

Resilient to change:

Consumer OF API:

As a consumer of an API you should accept unknown attributes.
Do not raise an exception if you receive an unexpected attribute. If the response contains the information you need, it does not matter what else is provided alongside.
But validate the response against the attributes that you need.

Do not validate against attributes just because they are provided. If you are not using them as part of your request, do not rely on them being there.

Producer of API:
  1. Accept unknown attributes as part of the request
  2. Only return attributes that are relevant for the API being invoked
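The consumer and producer rules above amount to being a "tolerant reader"; a minimal Python sketch (the field names are made up):

```python
def parse_order(response):
    """Tolerant reader (sketch): validate only the fields we need,
    ignore anything extra the producer sends alongside."""
    required = ("order_id", "status")
    missing = [f for f in required if f not in response]
    if missing:
        raise ValueError("missing required fields: %s" % missing)
    # Unknown attributes like 'promo_banner' are simply ignored
    return {"order_id": response["order_id"], "status": response["status"]}

# A newer producer added an extra field; the consumer still works unchanged
resp = {"order_id": 7, "status": "PLACED", "promo_banner": "SALE!"}
print(parse_order(resp))  # {'order_id': 7, 'status': 'PLACED'}
```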
Timeout:

The request must have a timeout. Because we expect services to come and go, we should not wait indefinitely for a response.
The asynchronous support in JAX-RS 2.0 has a setTimeout method on the AsyncResponse object. Tools like Failsafe and Netflix Hystrix have built-in support for timeouts.
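The timeout idea is language-independent; here is a small Python sketch using only the standard library (slow_service is a hypothetical stand-in for a remote call):

```python
import concurrent.futures
import time

def slow_service():
    # Stand-in for a remote call that may hang (hypothetical)
    time.sleep(0.5)
    return "response"

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_service)
    try:
        result = future.result(timeout=0.1)   # don't wait indefinitely
    except concurrent.futures.TimeoutError:
        result = "fallback"                   # fail fast, caller stays responsive

print(result)  # fallback
```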

CIRCUIT-BREAKERS:

If the timeout issue occurs every time, it is not good to error out each time; we can give a fallback after a limited number of API calls fail.
The circuit breaker makes a note every time a specific request fails or times out. If this count reaches a certain level, it prevents further calls from occurring and instead returns an error instantly.
It also incorporates a mechanism for retrying the call, either after a certain amount of time or in reaction to an event.
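A stripped-down circuit breaker can be sketched in a few lines (a toy sketch; real implementations like Hystrix or Failsafe add half-open states, timers and metrics):

```python
class CircuitBreaker:
    """Minimal circuit breaker (illustrative sketch)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    def call(self, func, *args):
        if self.failures >= self.failure_threshold:
            # Circuit is OPEN: fail fast instead of calling the service
            raise RuntimeError("circuit open - failing fast")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1     # note every failure or timeout
            raise
        self.failures = 0          # a success resets the count
        return result

def flaky_service():
    raise ConnectionError("service down")  # hypothetical failing call

breaker = CircuitBreaker(failure_threshold=3)
for _ in range(5):
    try:
        breaker.call(flaky_service)
    except ConnectionError:
        print("call failed, counted")      # first 3 attempts reach the service
    except RuntimeError as e:
        print(e)                           # later attempts fail fast
```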

BulkHeads:

If one compartment of a ship fails, it should not sink the whole ship. At each step you should think about removing dependencies.


UNIT 5: HANDLING DATA

Searching Algorithms

  1. Linear search
  2. Binary search

Binary Search

Search a sorted array by repeatedly dividing the search interval (divide and conquer) in half. Begin with an interval covering the whole array. If the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half. Otherwise, narrow it to the upper half. Repeatedly check until the value is found or the interval is empty.

Algorithm – We basically ignore half of the elements just after one comparison.

  1. Compare x(element to be searched for) with the middle element.
  2. If x matches with the middle element, we return the mid index.
  3. Else If x is greater than the mid element, then x can only lie in the right half subarray after the mid element. So we recur for the right half.
  4. Else (x is smaller) recur for the left half.

Two approaches-

  • Iterative – tip – for simplicity of understanding go with this
  • Recursive (ignored as slightly complex to understand and implement )
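Although the notes skip the recursive variant, it is short enough to sketch (Python for brevity):

```python
def binary_search(arr, x, l=0, r=None):
    """Recursive binary search: return the index of x in sorted arr, or -1."""
    if r is None:
        r = len(arr) - 1
    if l > r:                     # interval is empty: not present
        return -1
    m = l + (r - l) // 2          # avoids overflow in fixed-width languages
    if arr[m] == x:
        return m
    if arr[m] < x:                # x can only lie in the right half
        return binary_search(arr, x, m + 1, r)
    return binary_search(arr, x, l, m - 1)  # x is smaller: left half

print(binary_search([2, 3, 4, 10, 40], 10))  # 3
```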

Iterative approach –

int binarySearch(int arr[], int x)
    {
        int l = 0, r = arr.length - 1;
        while (l <= r) {
            int m = l + (r - l) / 2;

            // Check if x is present at mid
            if (arr[m] == x)
                return m;

            // If x greater, ignore left half
            if (arr[m] < x)
                l = m + 1;

            // If x is smaller, ignore right half
            else
                r = m - 1;
        }

        // if we reach here, then element was
        // not present
        return -1;
    }

JVM

Following topics are covered –

  1. JVM
    1. Overall-view picture
    2. Summary of different parts
    3. Internal architecture of JVM
  2. JIT
  3. Garbage Collection
    1. Types of JVM GC
    2. FAQs
      1. What are the different sections of Heap and how are the different sections cleared? (Barclays)
    3. GC algorithms
  4. JVM config
  5. JVM Heap Dump
  6. JMM – Java Memory Model
JVM Architecture

JVM’s Job –Load & Execute

JVM has various sub components internally.

  1. Class loader sub system: JVM’s class loader sub system performs 3 tasks
    a. It loads .class file into memory.
    b. It verifies byte code instructions.
    c. It allots memory required for the program.
  2. Run time data area: This is the memory resource used by JVM and it is divided into 5 parts
    a. Method area: Method area stores class code and method code. (metaspace in Java SE 8)
    b. Heap: Objects are created on heap.
    c. Java stacks: Java stacks are the places where the Java methods are executed. A Java stack contains frames. On each frame, a separate method is executed.
    d. Program counter registers: The program counter registers store memory address of the instruction to be executed by the micro processor.
    e. Native method stacks: The native method stacks are places where native methods (for example, C language programs) are executed. Native method is a function, which is written in another language other than Java.
  3. Native method interface: Native method interface is a program that connects native methods libraries (C header files) with JVM for executing native methods.
  4. Native method library: holds the native libraries information.
  5. Execution engine: The execution engine contains an interpreter and the JIT compiler, which converts byte code into machine code. The JVM uses an optimization technique (the HotSpot profiler) to decide which parts to interpret and which parts to hand to the JIT compiler. A hotspot is a frequently executed block of code, which the JIT compiler compiles.

Python Testing

Python : Choosing a Test Runner

There are many test runners available for Python. The three most popular test runners are:

  1. unittest
  2. nose or nose2
  3. pytest
PyUnit/ unittest

The one built into the Python standard library is called unittest. In this tutorial, you will be using unittest test cases and the unittest test runner. The principles of unittest are easily portable to other frameworks.

unittest requires that:
  • You put your tests into classes as methods
  • You use a series of special assertion methods in the unittest.TestCase class instead of the built-in assert statement

To convert the test to a unittest test case, you would have to:

  1. Import unittest from the standard library
  2. Create a class called TestSum that inherits from the TestCase class
  3. Convert the test functions into methods by adding self as the first argument
  4. Change the assertions to use the self.assertEqual() method on the TestCase class
  5. Change the command-line entry point to call unittest.main()
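Applying the five steps to a trivial test of the built-in sum might look like this (TestSum and the sample values are illustrative):

```python
import unittest  # 1. import unittest from the standard library

class TestSum(unittest.TestCase):          # 2. a class inheriting from TestCase
    def test_sum_list(self):               # 3. tests are methods taking self
        # 4. use the assertion methods of TestCase instead of bare assert
        self.assertEqual(sum([1, 2, 3]), 6, "Should be 6")

    def test_sum_tuple(self):
        self.assertEqual(sum((1, 2, 2)), 5, "Should be 5")

if __name__ == "__main__":
    unittest.main(exit=False)              # 5. command-line entry point
```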

Note: Be careful if you’re writing test cases that need to execute in both Python 2 and 3. unittest2 is a backport of the newer unittest features for Python 2; if you simply import unittest, you will get different feature sets between Python 2 and 3.

Resource: https://docs.python.org/2/library/unittest.html

The Python unit testing framework, sometimes referred to as “PyUnit,” is a Python-language version of JUnit.

unittest module supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.

To achieve this, unittest supports some important concepts:

test fixture

A test fixture represents the preparation needed to perform one or more tests, and any associated cleanup actions.

test case

A test case checks for a specific response to a particular set of inputs. unittest provides a base class, TestCase, which may be used to create new test cases.

test suite

A test suite is a collection of test cases, test suites, or both. It is used to aggregate tests that should be executed together.

test runner

A test runner is a component which orchestrates the execution of tests and provides the outcome to the user. The runner may use a graphical interface, a textual interface, or return a special value to indicate the results of executing the tests.

A test runner is an object that provides a single method, run(), which accepts a TestCase or TestSuite object as a parameter, and returns a result object. The class TestResult is provided for use as the result object. unittest provides the TextTestRunner as an example test runner which reports test results on the standard error stream by default. Alternate runners can be implemented for other environments (such as graphical environments) without any need to derive from a specific class.
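The four concepts above (fixture, case, suite, runner) can be tied together in a few lines (the class and data are illustrative):

```python
import io
import unittest

class TestUpper(unittest.TestCase):
    def setUp(self):
        # test fixture: preparation shared by the tests in this case
        self.text = "hello"

    def test_upper(self):
        # test case: one specific response to one set of inputs
        self.assertEqual(self.text.upper(), "HELLO")

# test suite: aggregate cases that should be executed together
suite = unittest.TestSuite()
suite.addTest(TestUpper("test_upper"))

# test runner: orchestrates execution and reports the outcome
runner = unittest.TextTestRunner(stream=io.StringIO())
result = runner.run(suite)     # returns a TestResult object
print(result.wasSuccessful())  # True
```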

Maintainable code using library

Check your code style

PEP 8 is the Python code style guide, and it sets out rules for things like line length, indentation, multi-line expressions, and naming conventions

1. Pylint

Pylint is a library that checks for PEP 8 style violations and common errors.

can also be run from the command line.

To install, run pip install pylint.

To use Pylint from the command line, run pylint path/to/dir or pylint [options] path/to/module.py. Pylint will output warnings about style violations and other errors to the console.

You can customize what errors Pylint checks for with a configuration file called pylintrc.

Outsource your code style

Remembering to run linters manually from the command line for each file you change is a pain.

A great solution is to use a library that automatically reformats your code into something that passes PEP 8 for you.

1. Autopep8

Autopep8 automatically formats the code in the module you specify. It will re-indent lines, fix indentation, remove extraneous whitespace, and refactor common comparison mistakes (like with booleans and None).

To install, run pip install --upgrade autopep8.

To reformat code in place, run autopep8 --in-place --aggressive --aggressive <filename>. The aggressive flags (and the number of them) indicate how much control you want to give autopep8 over your code style

 

Check your test coverage

You’re writing tests, right? Then you will want to make sure new code committed to your codebase is tested and doesn’t drop your overall amount of test coverage.

For measuring test coverage, we have one recommendation: Coverage.

Coverage has several options for the way it reports your test coverage to you, including outputting results to the console or to an HTML page and indicating which line numbers are missing test coverage

To install, run pip install coverage. To run a program and see its output, run coverage run [path/to/module.py] [args], and you will see your program’s output.

To see a report of which lines of code are missing coverage, run coverage report -m.

Core Python

Following topics are covered

  1. Introduction
  2. Logging
  3. Exception Handling
  4. Searching & Regex
  5. Packaging
  6. File handling & problem solution

Introduction


Logging:

Basics of using the logging module to record the events in a file are very simple.
For that, simply import the module from the library.

  1. Create and configure the logger. It can have several parameters. But importantly, pass the name of the file in which you want to record the events.
  2. Here the format of the logger can also be set. By default, the file works in append mode but we can change that to write mode if required.
  3. Also, the level of the logger can be set which acts as the threshold for tracking based on the numeric values assigned to each level.
    There are several attributes which can be passed as parameters.
  4. The list of all those parameters is given in Python Library. The user can choose the required attribute according to the requirement.

After that, create an object and use the various methods as shown in the example.

# importing module
import logging

# Create and configure the logger; filemode='w' overwrites the file
# (the default is append mode). Fields like %(clientip)s would require
# passing extra= to every logging call, so only standard fields are used.
FORMAT = '%(asctime)-15s %(levelname)s %(message)s'
logging.basicConfig(filename="newfile.log", format=FORMAT, filemode='w')

# Creating an object
logger = logging.getLogger()

# Setting the threshold of logger to DEBUG
logger.setLevel(logging.DEBUG)

# Test messages
logger.debug("Harmless debug Message")
logger.info("Just an information")
logger.warning("Its a Warning")
logger.error("Did you try to divide by zero")
logger.critical("Internet is down")



The above code will generate a file with the provided name

Configuration:

Standard attributes names:

Attribute name | Format | Description
args | (you shouldn’t need to format this yourself) | The tuple of arguments merged into msg to produce message, or a dict whose values are used for the merge (when there is only one argument, and it is a dictionary).
asctime | %(asctime)s | Human-readable time when the LogRecord was created. By default this is of the form ‘2003-07-08 16:49:45,896’ (the numbers after the comma are the millisecond portion of the time).
created | %(created)f | Time when the LogRecord was created (as returned by time.time()).
exc_info | (you shouldn’t need to format this yourself) | Exception tuple (à la sys.exc_info) or, if no exception has occurred, None.
filename | %(filename)s | Filename portion of pathname.
funcName | %(funcName)s | Name of function containing the logging call.
levelname | %(levelname)s | Text logging level for the message (‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’).
levelno | %(levelno)s | Numeric logging level for the message (DEBUG, INFO, WARNING, ERROR, CRITICAL).
lineno | %(lineno)d | Source line number where the logging call was issued (if available).
message | %(message)s | The logged message, computed as msg % args. This is set when Formatter.format() is invoked.
module | %(module)s | Module (name portion of filename).
msecs | %(msecs)d | Millisecond portion of the time when the LogRecord was created.
msg | (you shouldn’t need to format this yourself) | The format string passed in the original logging call. Merged with args to produce message, or an arbitrary object (see Using arbitrary objects as messages).
name | %(name)s | Name of the logger used to log the call.
pathname | %(pathname)s | Full pathname of the source file where the logging call was issued (if available).
process | %(process)d | Process ID (if available).
processName | %(processName)s | Process name (if available).
relativeCreated | %(relativeCreated)d | Time in milliseconds when the LogRecord was created, relative to the time the logging module was loaded.
stack_info | (you shouldn’t need to format this yourself) | Stack frame information (where available) from the bottom of the stack in the current thread, up to and including the stack frame of the logging call which resulted in the creation of this record.
thread | %(thread)d | Thread ID (if available).
threadName | %(threadName)s | Thread name (if available).

Logging Variable Data

1. This can be done directly by using a format string for the message and appending the variable data as arguments:

import logging

name = 'John'

logging.error('%s raised an error', name)

Lazy Logging:

Issue:

log.debug("I found {} and {}".format(getone(), gettwo()))

Seems pretty good – a debug log with two parameters in the string. The catch is that the message string is built eagerly, so the formatting cost is paid even when the DEBUG level is disabled and the message is never emitted.
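Python’s logging module addresses this by deferring %-style formatting until it knows the record will actually be emitted; a small sketch (the Expensive class is made up to make the formatting cost visible):

```python
import logging

logging.basicConfig(level=logging.WARNING)  # DEBUG messages are filtered out
log = logging.getLogger("demo")

class Expensive:
    """Object whose string rendering is costly (illustrative)."""
    renders = 0
    def __str__(self):
        Expensive.renders += 1
        return "value"

# Eager: the message string is built before the call, so __str__ runs
# even though the DEBUG record is discarded
log.debug("found %s" % Expensive())

# Lazy: logging merges args into the format string only if the record
# will actually be emitted - here it won't, so __str__ never runs
log.debug("found %s", Expensive())

print(Expensive.renders)  # 1
```

Note that laziness defers the *formatting*, not the evaluation of the arguments themselves.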

Resources:

  1. https://docs.python.org/3/howto/logging-cookbook.html
  2. https://realpython.com/python-loggin/

Exception Handling:

syntax error-

missing : after ‘if’ or ‘for’

Exception :

Errors detected during execution are called exceptions

Handling Exceptions:

The try statement works as follows.

  • First, the try clause (the statement(s) between the try and except keywords) is executed.
  • If no exception occurs, the except clause is skipped and execution of the try statement is finished.
  • If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.
  • If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.
Syntax –
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops! That was no valid number. Try again...")

Note: The above program expects an integer; any other input makes int() raise a ValueError, which the except clause handles.

A try statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same try statement. An except clause may name multiple exceptions as a parenthesized tuple, for example

 except (RuntimeError, TypeError, NameError):
Order of evaluating except clauses:

A class in an except clause is compatible with an exception if it is the same class or a base class thereof (but not the other way around — an except clause listing a derived class is not compatible with a base class). For example, the following code will print B, C, D in that order:

class B(Exception):
    pass

class C(B):
    pass

class D(C):
    pass

for cls in [B, C, D]:
    try:
        raise cls()
    except D:
        print("D")
    except C:
        print("C")
    except B:
        print("B")

Note that if the except clauses were reversed (with "except B" first), it would have printed B, B, B: the first matching except clause is triggered.

Summary: an except clause catches an exception only if it names that exception’s class or one of its base classes (never a derived class).

Supreme Exception Handler:

The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! It can also be used to print an error message and then re-raise the exception (allowing a caller to handle the exception as well):

import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise

The try … except statement has an optional else clause, which, when present, must follow all except clauses. It is useful for code that must be executed if the try clause does not raise an exception. For example:

for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except OSError:
        print('cannot open', arg)
    else:
        print(arg, 'has', len(f.readlines()), 'lines')
        f.close()

The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try … except statement.



Regular Expression module:

re — Regular expression operations

Both patterns and strings to be searched can be Unicode strings as well as 8-bit strings; however, Unicode strings and 8-bit strings cannot be mixed. To match a literal backslash, one might have to write ‘\\\\’ as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with ‘r’. So r”\n” is a two-character string containing ‘\’ and ‘n’, while “\n” is a one-character string containing a newline

search() vs. match()

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.

Example:

>>> re.match("c", "abcdef")  # No match
>>> re.search("c", "abcdef") # Match

Note, however, that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with ‘^’ will match at the beginning of each line.

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
FINDING all Adverbs

For example, if one was a writer and wanted to find all of the adverbs in some text, he or she might use findall() in the following manner:

>>> text = "He was carefully disguised but captured quickly by police."
>>> re.findall(r"\w+ly", text)
['carefully', 'quickly']

Use case: parsing log file

import re
import sys

logFile = open(sys.argv[1], "r")
outputFileName = "parsedLogOutput.txt"
outputFile = open(outputFileName, "w")
lines = logFile.readlines()
regex = "2019-08-12 03:00:*"
compiledReg = re.compile(regex)
for line in lines:
    if compiledReg.match(line):
        outputFile.write(line + "\n")
        print(line)
outputFile.close()
logFile.close()

Package and making a module deliverable

Each package in Python is a directory which MUST contain a special file called __init__.py. This file can be empty; it indicates that the directory containing it is a Python package, so it can be imported the same way a module can be imported.

Example:

We create a directory called foo, which marks the package name; we can then create a module inside that package called bar. We must not forget to add the __init__.py file inside the foo directory.

To use the module bar, we can import it in two ways:
import foo.bar
or:
from foo import bar
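A runnable sketch of the layout above (the foo/bar names come from the text; the greet function is made up for illustration):

```python
import os
import sys
import tempfile

# Build the package layout on disk:  foo/__init__.py  and  foo/bar.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo"))
open(os.path.join(root, "foo", "__init__.py"), "w").close()  # marks foo as a package
with open(os.path.join(root, "foo", "bar.py"), "w") as f:
    f.write("def greet():\n    return 'hello from foo.bar'\n")

# Import the module both ways
sys.path.insert(0, root)
import foo.bar
from foo import bar

print(foo.bar.greet())  # hello from foo.bar
print(bar is foo.bar)   # True
```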

Tip: Listing the functions of a module:

# re is a module and following is script to list all functions:

import re
find_members = []
for member in dir(re):
    if "find" in member:
        find_members.append(member)
print(sorted(find_members))

Problem Specific code snippets:

1.Run a linux command

import subprocess

# make a command
makecmd = "ls -lrt"
# Execute the command
cwd = subprocess.Popen(makecmd, stdout=subprocess.PIPE, shell=True)
(output, err) = cwd.communicate()
# print the output
print("output: " + str(output))
  1. File Handling
    1. Text File
    2. Excel File

There are different modules for handling Excel files, such as:

XlsxWriter (only for writing), the openpyxl module (read/write for all variations of Excel files), the xlwt module, and the pandas module

Working with xlwt in Python

Step 1 – Install the module:

pip install xlwt

Step 2 – Code snippet to write to an xls file and apply a style:

# importing xlwt module
import xlwt
workbook = xlwt.Workbook() 
sheet = workbook.add_sheet("Sheet Name")
# Applying multiple styles
style = xlwt.easyxf('font: bold 1, color red;')
# Writing on specified sheet
sheet.write(0, 0, 'SAMPLE', style)
workbook.save("sample.xls")
Working with pandas for Excel handling –

Python pandas is a data analysis library. It can read, filter and re-arrange small and large datasets and output them in a range of formats, including Excel.

Note: If the only purpose is writing to Excel, use the native module instead of pandas (which internally uses it), as pandas would be extra overhead.

pandas writes Excel files using the XlsxWriter module, so first install xlsxwriter:

pip install xlsxwriter

Classes and Interface

Memory Map

This section is about Interface vs Class, with in-depth content on the usage of each.

Following rules are described:
  1. Item 15: Minimize the accessibility of classes and members
  2. Item 16: In public classes, use accessor methods, not public fields
  3. Item 17: Minimize mutability
  4. Item 18: Favor composition over inheritance
  5. Item 19: Design and document for inheritance or else prohibit it
  6. Item 20: Prefer interfaces to abstract classes
  7. Item 21: Design interfaces for posterity
  8. Item 22: Use interfaces only to define types
  9. Item 23: Prefer class hierarchies to tagged classes
  10. Item 24: Favor static member classes over non-static
  11. Item 25: Limit source files to a single top-level class

Highlights of section

Terms
Implementation inheritance – when a class extends another class.
Interface inheritance – when a class implements an interface, or an interface extends another interface.

Loopholes in using Inheritance:

  • [WRONG statement] The problems with overriding methods –
    1. Ambiguity in case of lack of documentation – the case of counting the elements of a HashSet<>:
      1. There are 2 methods for adding elements to a HashSet: add() and addAll().
      2. The task is to count the number of elements added to the hash set.
      3. We add 3 elements using addAll(), and then when we call the user-defined counting function it RETURNS 6.
      4. Blunder!!!! Internally addAll() calls add(), so each element is counted twice. [Conclusion] This issue occurred because there is no documentation for the fact that addAll() calls add() internally.
    2. [The case of adding a method rather than overriding one:]
      • The problem with inheritance when a method is added, not overridden: suppose you add a method, and in a later release the Java library adds a method with the same signature to the super class – then your class becomes a fragile class.
Summary:


Use inheritance only after properly analysing that an IS-A relationship exists; then class B can extend class A, i.e. class-B IS-A class-A.
Inherit from a class only if it was made to be inherited from; otherwise ensure proper documentation exists, to avoid blunders like the one we saw in case 1.