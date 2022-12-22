NOTE: All opinions are my own and do not reflect my employer’s

I recently had the opportunity to take the AWS Machine Learning Specialty exam, and I wanted to share my experience and some tips that I found helpful in preparing for the exam.

10% of the exam focuses on analytics services, so it’s crucial to get a good grasp of data ingestion, data processing, and data visualization technologies. Read more about it here.

First and foremost, it’s important to have a solid foundation in machine learning concepts and techniques. This includes understanding the different types of algorithms, how they work, and when to use them.

In addition to this foundation, it’s also important to have hands-on experience with machine learning on the AWS platform. This includes using AI and ML services such as Amazon SageMaker, Amazon Comprehend, and Amazon Kendra, among others, as well as understanding how to deploy machine learning models in a production environment.

To prepare for the exam, I focused on studying a few core pieces of the material and doing practice exams instead of diving into every minute detail, as I found this approach to be more time-efficient.

(1) Built-In Algorithms

I studied the many built-in algorithms by breaking them down into supervised and unsupervised categories, and then listing out common use cases for each. This approach helped me recognize patterns in the exam questions and see which algorithm applies in which situation. NOTE: The image I made below does not include ALL the built-in algorithms on AWS. Refer here for the full list.

(2) Concept of Underfitting vs Overfitting

Understanding the differences between underfitting and overfitting is critical in the field of machine learning, as it can have a significant impact on the accuracy and effectiveness of a model. Underfitting occurs when a model is too simple to capture the complexity of the data, leading to poor performance on both training and test data. Overfitting, on the other hand, occurs when a model is too complex and fits the training data too closely, noise included, so it performs well on training data but poorly on test data.
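As a rough illustration of the difference (a minimal sketch, not something taken from the exam or from AWS), the snippet below fits polynomials of three different degrees to noisy samples of a sine curve using NumPy. The synthetic data, the noise level, and the degrees 1, 3, and 12 are all assumptions chosen for the demonstration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n_points):
    """Evenly spaced x in [0, 1] with noisy sine targets (synthetic data)."""
    x = np.linspace(0.0, 1.0, n_points)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n_points)
    return x, y


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # larger held-out set

# Degree 1 is too simple, degree 3 is a reasonable fit, degree 12 is too flexible.
results = {
    degree: fit_and_score(degree, x_train, y_train, x_test, y_test)
    for degree in (1, 3, 12)
}

for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

In a run like this, the degree-1 model typically shows high error on both sets (underfitting), while the degree-12 model shows a very low training error alongside a clearly higher test error (overfitting).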

I created the table below as a quick reference, since this topic came up quite a bit during the practice exams!

(3) ML Classification Metrics

There are several common metrics that are used to evaluate the performance of a machine learning classification model:

1. Accuracy: This is the most straightforward metric, and it simply measures the percentage of predictions that the model got right.

2. Precision: This measures the proportion of positive predictions that were actually correct.

3. Recall: This measures the proportion of actual positive cases that the model was able to correctly identify.

4. F1 Score: This is the harmonic mean of precision and recall, and is often used as a single metric to compare classifiers.

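5. AUC-ROC Curve: The ROC curve is a graphical representation of the model’s performance that shows the trade-off between the true positive rate and the false positive rate across classification thresholds. The AUC is the area under that curve, summarizing it as a single number.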

6. Confusion Matrix: This is a table that shows the number of true positive, true negative, false positive, and false negative predictions made by the model.
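To make the definitions above concrete, here is a small sketch in plain Python that derives accuracy, precision, recall, and F1 from the four confusion-matrix counts. The function names and the toy labels are made up for illustration; in practice a library such as scikit-learn would usually do this for you.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_divide(numerator, denominator):
    """Avoid a ZeroDivisionError when a class is never predicted or never present."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_divide(tp, tp + fp)   # of predicted positives, how many were right
    recall = safe_divide(tp, tp + fn)      # of actual positives, how many were found
    return {
        "accuracy": safe_divide(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_divide(2 * precision * recall, precision + recall),
    }


# Toy example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the counts are TP=2, TN=5, FP=1, FN=2, so accuracy is 0.7, precision is 2/3, recall is 0.5, and F1 is 4/7. Note how accuracy looks decent even though the model misses half of the positives, which is why exam questions often steer you toward precision or recall instead.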

(4) Amazon SageMaker

CC: Amazon Science

Amazon SageMaker is a fully managed machine learning platform that provides a range of tools and services for building, training, and deploying machine learning models. I prioritized learning this service the most.

The Amazon SageMaker playlist on YouTube is a great resource for learning about machine learning and how to use Amazon SageMaker to build and deploy models.

The official AWS documentation is also a valuable resource, as it provides detailed information about the various features and capabilities of Amazon SageMaker. If you prefer a more hands-on approach, Stephane Maarek’s course on Udemy is a great option, as it provides a step-by-step guide to building and deploying machine learning models using Amazon SageMaker. Additionally, the Amazon SageMaker Immersion Day is a fantastic opportunity to learn about all of the core features of the platform and get some hands-on experience working with it.

(5) Feature Engineering Techniques

Feature engineering is an important step in the machine learning process because the quality of the features used to train a model can significantly impact its performance. Good features can help a model make more accurate predictions, while poor or irrelevant features can hinder its ability to learn and make effective decisions. By carefully selecting and creating relevant features, we can give our model a better understanding of the problem we are trying to solve and improve its ability to generalize to new, unseen data. In addition, feature engineering can help to reduce the complexity of the model, which can improve its interpretability and make it easier to deploy in real-world applications.
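As a small, hedged illustration, the sketch below shows two widely used techniques in plain Python: min-max scaling for a numeric feature and one-hot encoding for a categorical one. These two were picked as examples only, and the function names and sample values are invented for the demonstration.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no information; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def one_hot_encode(categories):
    """Turn categorical labels into 0/1 indicator rows.

    Returns the sorted vocabulary and one row per input value.
    """
    vocabulary = sorted(set(categories))
    position = {label: i for i, label in enumerate(vocabulary)}
    rows = []
    for label in categories:
        row = [0] * len(vocabulary)
        row[position[label]] = 1
        rows.append(row)
    return vocabulary, rows


scaled_prices = min_max_scale([10, 20, 40])
vocabulary, encoded_colors = one_hot_encode(["red", "blue", "red"])

print(scaled_prices)
print(vocabulary, encoded_colors)
```

Scaling keeps features with large numeric ranges from dominating distance-based or gradient-based algorithms, while one-hot encoding lets a model consume categories without implying an order between them.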

Check out the table I made below, which makes all the techniques more digestible. The main thing is to identify the key words!

(6) AWS AI Services

It is important to understand the various AWS AI services and their capabilities in order to effectively put together a workflow for a specific scenario. For example, if a company does not have access to data science resources but still wants to extract customer sentiment from news articles, Amazon Comprehend may be a good choice. This service uses natural language processing to analyze text and identify sentiment, making it well-suited for this task. However, if the company also wanted to analyze images or videos posted on social media platforms, Amazon Rekognition would be the better fit for that part of the workflow, as it specializes in image and video analysis. Knowing the capabilities of each service allows you to select the most appropriate tools for the task at hand, ensuring that your workflow is efficient and effective.
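For the news-article sentiment scenario, a call to Amazon Comprehend could look roughly like the sketch below. It relies on the `detect_sentiment` operation of the boto3 Comprehend client; the helper name `article_sentiment` and the choice to pass the client in as a parameter are my own assumptions, made so the function can be tried without AWS credentials.

```python
def article_sentiment(comprehend_client, text, language_code="en"):
    """Return (sentiment label, confidence for that label) for a piece of text.

    `comprehend_client` is expected to behave like boto3.client("comprehend").
    This helper is a hypothetical wrapper written for illustration.
    """
    response = comprehend_client.detect_sentiment(
        Text=text,
        LanguageCode=language_code,
    )
    label = response["Sentiment"]        # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"
    scores = response["SentimentScore"]  # keys: "Positive", "Negative", "Neutral", "Mixed"
    confidence = scores[label.capitalize()]
    return label, confidence
```

With credentials configured, usage would be along the lines of creating the client with `boto3.client("comprehend")` and calling `article_sentiment(client, "Some article text")`. No model training or data science team is involved, which is the point of the managed AI services.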