| 1 | Amazon Kendra | Text and document | handles document search queries |
| 2 | Amazon Personalize | Product recommendations | Generates real-time recommendations based on item-interaction data. Can be integrated with AWS Lambda, and through Lambda other services (e.g. Amazon Lex for personalised chat) can call it. |
| 3 | SageMaker Model Monitor | Model monitoring – accuracy | Continuously tracks the accuracy of deployed models and can trigger retraining workflows when model drift or decreased accuracy is detected. Notes: 1. Monitoring for issues like data drift, helping maintain model performance over time, is its feature. 2. Not for evaluating training performance. 3. Uses ground-truth ingestion to merge the actual outcomes (ground truth) with the model's predictions to evaluate the model's performance in production. |
| 4 | SageMaker Neo | Model optimisation | Optimizes trained models to run efficiently on specific target hardware (cloud instances and edge devices). |
| 5 | SageMaker Feature Store | | Central repository for storing, sharing, and managing ML features. |
| 6 | SageMaker Autopilot | | Automates the entire ML workflow, from data preprocessing to model training and tuning. |
| 7 | SageMaker Experiments | | Tracks each experiment; captures and organizes metadata for trials to ensure reproducibility. Features: – Offers both an API and a graphical interface in SageMaker Studio, letting users visualize and compare key performance metrics, such as accuracy and loss, across trials to identify the best-performing model. – Trial components represent the stages of a workflow within a trial, such as data preprocessing, model training, and evaluation, so the ML engineer can track each step separately and compare results across trials. |
| 8 | SageMaker Pipelines | | orchestrates complex workflows, ensuring automation and reproducibility. |
| 9 | AWS Glue | ETL | – ETL processes, not real-time stream processing. – AWS Glue FindMatches automatically detects and groups duplicate records in a dataset.
– AWS Glue jobs cannot be directly integrated as processing steps in SageMaker Pipelines, because Glue is an independent service, not a SageMaker-native processing task.
– AWS Glue crawlers infer schemas and available columns: crawlers can analyze .csv files in Amazon S3, automatically infer the schema and structure of the data, and create a table definition in the AWS Glue Data Catalog, enabling the data to be organized and understood.
– AWS Glue DataBrew for data cleaning and feature engineering: a visual data preparation tool that lets the ML engineer clean and preprocess .csv data; tasks such as filling in missing fields, transforming formats, and normalizing data can be performed easily. |
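The schema inference a Glue crawler performs can be illustrated in miniature: given rows from a .csv file, guess a Glue-style type per column. This is a conceptual pure-Python sketch, not the Glue API; the column names and sample data are invented.

```python
import csv
import io

def infer_schema(csv_text):
    """Guess a Glue-style type (bigint, double, string) for each CSV column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    schema = {}
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        if all(v.lstrip("-").isdigit() for v in values):
            schema[name] = "bigint"          # every value is an integer
        else:
            try:
                for v in values:
                    float(v)
                schema[name] = "double"      # every value parses as a float
            except ValueError:
                schema[name] = "string"      # fallback type
    return schema

sample = "user_id,amount,country\n1,9.99,DE\n2,15.50,US\n3,7.25,FR\n"
print(infer_schema(sample))
# {'user_id': 'bigint', 'amount': 'double', 'country': 'string'}
```

A real crawler samples the data, handles more types (timestamps, structs), and writes the result into the Data Catalog rather than returning a dict.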
| 10 | Analyze Lending workflow API (Amazon Textract) | | Specialised for mortgage documents. |
| 11 | AnalyzeExpense API (Amazon Textract) | | For financial documents – focuses on invoices, receipts, etc., not legal contracts. |
| 12 | AnalyzeID API (Amazon Textract) | | For analysing identity documents specifically, not legal contracts or other document types. |
| 13 | AnalyzeDocument API (Amazon Textract) | | Extracts key-value pairs and tables from documents; as an enhanced feature, custom queries can be used to categorize documents based on specific criteria. |
| 14 | Amazon Macie | PII | Automated data classification and sensitive-data discovery jobs (over data in Amazon S3) that can be scheduled to run regularly. |
| 15 | Amazon GuardDuty | Security threats | Continuously monitors AWS accounts and workloads for malicious activity and unauthorized behaviour. |
| 16 | AWS CloudTrail | Auditing | Logs access and actions performed on the model, or on any other AWS service (API-level audit trail). |
| 17 | AWS KMS | Encryption | Used to encrypt both the training data stored in Amazon S3 and the model artifacts, helping ensure compliance with healthcare data-privacy regulations – i.e. it covers training data and deployed models. |
| 18 | SageMaker Ground Truth | Labelling | Features: – Active learning uses a combination of ML models to automatically label simpler cases, while more complex cases are sent to human workers, helping reduce costs. – Private workforce refers to an internal team used for labeling; it does not involve automation. |
| 19 | Kinesis Data Streams | Collect and stream real-time data | For collecting and streaming data from IoT devices; real-time data can be forwarded to AWS Lambda for analysis. |
| | Kinesis Data Analytics | Stream processing | Allows immediate processing of high-throughput streams with SQL or Apache Flink applications. |
| 20 | AWS Shield Standard | DDoS protection | Automatic protection against common layer 3/4 DDoS attacks, included at no extra cost. |
| 21 | AWS Shield Advanced | | Shield Advanced offers enhanced protection, detailed attack diagnostics, and cost protection against scaling charges due to DDoS attacks. |
| 22 | Amazon Transcribe | Audio → text | Analyses audio recordings, e.g. from phone calls. Features: batch transcription is useful for large volumes; custom vocabulary improves transcription accuracy for industry-specific terms/jargon; automatic punctuation keeps transcripts readable; speaker identification distinguishes between speakers. |
| 23 | SageMaker Clarify | Model bias | Features: – Post-training bias detection helps analyze biases that may be present in the model's predictions after training. – Pre-training bias detection allows identification of biases in the input data before model training.
– Uses SHAP (Shapley Additive Explanations) values, a game-theory-based method, to calculate how individual features impact a model's prediction, providing both global and local explanations. This is crucial for explainability and compliance. Note: LIME is another explainability method, but SageMaker Clarify specifically uses SHAP values. |
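The SHAP idea behind Clarify's explanations can be shown on a toy model: a feature's Shapley value is its average marginal contribution over all orderings in which features are switched from a baseline to their actual values. A minimal sketch of the game-theory computation, not the Clarify API; the model and feature values are invented.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which it can be added to the coalition."""
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        current = list(baseline)          # start from the baseline input
        for i in order:
            before = predict(current)
            current[i] = x[i]             # switch feature i to its real value
            phi[i] += predict(current) - before
    return [p / len(perms) for p in phi]

# Toy additive model: prediction = 2*feature0 + 3*feature1
predict = lambda f: 2 * f[0] + 3 * f[1]
phi = shapley_values(predict, x=[5, 4], baseline=[0, 0])
print(phi)  # [10.0, 12.0] -- contributions sum to f(x) - f(baseline) = 22
```

Exact enumeration is exponential in the number of features; SHAP libraries (and Clarify) use sampling and model-specific approximations instead.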
| 24 | QuickSight | | QuickSight is used for data visualization |
| 25 | SageMaker Debugger | | Provides built-in rules to detect vanishing gradients and overfitting, enabling real-time monitoring of the training process. |
| 26 | SageMaker RL estimator | Model Training | The SageMaker RL estimator helps you easily train models in local mode, allowing for quick iteration during development. |
| 27 | Data Wrangler | | Features: – One-hot encoding missing values: one-hot encoding converts categorical data to numerical values; it is not for handling missing values. – Scaling missing values: scaling changes the range of numerical data but does not address missing values. – Imputing missing values using pandas or PySpark: custom transformations in SageMaker Data Wrangler allow using libraries like pandas or PySpark to impute missing values based on the mean, median, or more complex methods. – Random undersampling: one of the techniques in SageMaker Data Wrangler for handling class imbalance by reducing the number of samples in the majority class. – Random oversampling: for class imbalance – adds duplicates of minority-class samples. https://aws.amazon.com/blogs/machine-learning/balance-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/
It does not perform large-scale ETL operations or automate schema inference from on-premises databases.
The corrupt image transform is specifically designed to simulate real-world image imperfections, such as noise, blurriness, or resolution changes, during the preprocessing stage.
The outlier detection transform is useful for identifying and removing anomalous data points in numerical datasets, but it is not designed for handling image-quality variations. |
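The two balancing/cleaning techniques named above (mean imputation and random undersampling) can be sketched in plain Python to make the mechanics concrete. This is a conceptual illustration of the techniques, not Data Wrangler's own implementation; the sample values are invented.

```python
import random
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values
    (what a mean-imputation transform does)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def random_undersample(samples, labels, majority_label, seed=0):
    """Drop majority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    majority = [s for s, l in zip(samples, labels) if l == majority_label]
    minority = [s for s, l in zip(samples, labels) if l != majority_label]
    kept = rng.sample(majority, len(minority))  # keep as many as the minority
    return kept + minority

print(impute_mean([10.0, None, 14.0]))   # [10.0, 12.0, 14.0]
balanced = random_undersample(list(range(10)), [0] * 8 + [1] * 2, majority_label=0)
print(len(balanced))                      # 4 -> two samples per class
```

Random oversampling is the mirror image: duplicate minority-class samples (with `rng.choices`) until the counts match.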
| 28 | Amazon Bedrock | Chatbot | Features: – Fine-tuning models with custom data: fine-tuning lets a company customize a foundation model with its own data, improving relevance and accuracy for domain-specific queries. – RAG for enhanced knowledge-base integration: RAG integrates a knowledge base into the model, allowing more accurate, context-aware responses based on private data. – Data encryption in transit and at rest: Amazon Bedrock encrypts data, including model prompts and responses, both in transit and at rest, ensuring secure handling of sensitive information. |
| 29 | Amazon SageMaker Studio Notebooks | | Features: – Provide persistent storage, allowing users to manage multiple notebooks, store datasets, and access them later, enabling better management of ML projects over time. – Traditional notebooks do not offer this persistent storage. |
| 30 | Amazon Rekognition | Image | Features : – Unsafe content detection, label detection, and face comparison |
| 31 | SageMaker Feature Store | | Features: – Offline store: for batch processing and storage, not for tracking the evolution of features. – Feature versioning: tracks changes to features over time, ensuring models can be reproduced accurately by maintaining a history of the features. – Feature scaling: adjusts the range of values in features. – Lineage tracking: provides transparency into feature creation. |
| 32 | Amazon Comprehend | Text analysis | Features: Amazon Comprehend for general feedback analysis; Amazon Comprehend Medical to process and extract relevant insights from medical feedback. |
| 33 | SageMaker Pipelines | | The relationships between steps in SageMaker Pipelines are defined using a directed acyclic graph (DAG); this structure outlines the dependencies and sequence of each step in the pipeline.
Callback steps are specifically designed to integrate external processes into the pipeline workflow: using a callback step, the pipeline waits until, for example, AWS Glue jobs complete. |
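The DAG idea can be illustrated with a tiny dependency resolver: each step runs only after everything it depends on. This is a conceptual sketch using the standard library, not the SageMaker Pipelines SDK; the step names (including the Glue callback step) are invented for illustration.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, forming a DAG.
steps = {
    "preprocess": set(),
    "glue_callback": {"preprocess"},   # e.g. a callback step waiting on Glue jobs
    "train": {"glue_callback"},
    "evaluate": {"train"},
    "register_model": {"evaluate"},
}

# A valid execution order respects every dependency edge.
order = list(TopologicalSorter(steps).static_order())
print(order)
# ['preprocess', 'glue_callback', 'train', 'evaluate', 'register_model']
```

With the real SDK, the DAG is built the same way in spirit: each step object lists its input dependencies, and the service derives the execution order (running independent branches in parallel).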
| 34 | Amazon A2I -Amazon Augmented AI | | Amazon A2I can be directly integrated with Amazon Textract to route low-confidence predictions to human reviewers, simplifying the review process |
| 35 | SageMaker batch transform job | | Runs inference on large datasets with trained models, without deploying a persistent endpoint. Additional flow: S3 event → CloudWatch Events/EventBridge → SageMaker batch transform job → SageMaker pipeline (automated data prep, training, etc.). |
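The S3 → EventBridge hop in that flow hinges on an event pattern: the rule fires only for events whose fields match the pattern's allowed values. A minimal matcher for a small subset of EventBridge semantics, shown here conceptually; the bucket name and object key are invented, and real delivery is configured in EventBridge, not in application code.

```python
# EventBridge-style pattern: fire when a new object lands in the input bucket.
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["ml-inference-input"]}},
}

def matches(pattern, event):
    """Tiny subset of EventBridge matching: each pattern leaf is a list
    of allowed values; nested dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "ml-inference-input"},
        "object": {"key": "batch/today.csv"},
    },
}
print(matches(pattern, event))  # True
```

A matching event would then invoke the rule's target, e.g. a Lambda function or Step Functions state machine that starts the batch transform job.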
| 36 | Amazon Athena | | used for querying data, not triggering inference jobs. |
| 37 | Amazon Q Business | | Features: – Integration with Jira through Amazon Q Business plugins. – RAG for accuracy. – Natural Language Understanding (NLU), related to understanding text. – Security: 1. Enable data encryption for sensitive information. 2. Access control: integrate Amazon Q Business with AWS IAM Identity Center to manage user permissions. |
| 38 | Amazon Fraud Detector | | (Deprecated 7 Nov 2025.) Features: has built-in scalability. |
| 39 | Amazon SageMaker Lineage Tracking | Artifacts | Tracks the lineage of artifacts (e.g. datasets, models, and experiments) within an ML workflow, providing visibility into the relationships between components, such as which dataset and training job produced a specific model version. |
| 40 | AWS Lake Formation | | Specifically designed for aggregating and managing large datasets from various sources, including Amazon S3, databases, and other on-premises or cloud-based sources. Supports connecting to on-premises PostgreSQL databases and Amazon S3, making it the best choice for aggregating transaction logs, customer profiles, and database tables. Additionally, the centralized data lake can be used for further analysis and ML training. |
| 41 | SageMaker Model Registry | | Key benefits of Model Registry collections: non-disruptive reorganization using collections; better model management and discoverability at scale. |
| 42 | SageMaker ML Lineage Tracking | | Automatically tracks lineage information, including input datasets, model artifacts, and inference endpoints, ensuring compliance and auditability. |