10 study areas for the AWS Certified Data Analytics – Specialty exam

As a solutions architect at AWS, I have spent the past few years providing technical guidance to many AWS customers as they designed and built cloud-based data architectures. Prior to AWS, I held various positions in the data space, ranging from data engineering to machine learning. I considered it my area of depth. But as I continued to work with a greater variety of AWS customers, I also saw an increasing variety of data patterns, sources, tools, and requirements. I pursued the AWS Certified Data Analytics – Specialty certification in order to deepen my knowledge across all analytics domains.

In this blog, I will share how I prepared for the AWS Certified Data Analytics – Specialty exam. Earning the credential will validate your expertise in designing data solutions and using analytics services to derive insights from data. This credential also helps organizations identify and develop talent with critical skills for implementing cloud initiatives.

I have found that I am better prepared to help customers with design considerations when building data architectures on AWS after achieving the certification. If you have experience working with AWS services to build data solutions, keep reading to learn what to expect during the exam and how to prepare.

Areas of study

The AWS Certified Data Analytics – Specialty exam is one of 12 AWS Certifications offered, and one of six at the Specialty level. The exam includes questions that test your understanding of how to use AWS services to design, build, secure, and maintain analytics solutions. You will need to understand how the services integrate with one another as part of the broader data lifecycle of collection, storage, processing, and visualization.

From my experience preparing for and earning this certification, I recommend that candidates for this exam focus on these 10 areas of study. Each study area includes a (non-exhaustive) list of resources that illustrate the concept and what you can expect on the exam.

1. Architecture patterns and design principles

The exam goes beyond service recall and requires you to analyze patterns and select the most appropriate solution. Start by orienting yourself with high-level design recommendations, common architectural patterns, and the logic behind them. Many of the analytics patterns revolve around the Modern Data Architecture framework. The Modern Data Architecture advocates for a centralized data lake, surrounded by purpose-built data services with seamless data movement between them. The Data Analytics Lens of the AWS Well-Architected Framework provides key characteristics and considerations for the most common scenarios, including the modern data architecture, data mesh, batch data processing, streaming ingestion and processing, operational analytics, and data visualization.

Additionally, this re:Invent presentation from my colleague Ben Snively provides a great refresher on frequently encountered architectural patterns and best practices.

Consider the following resources:

2. Concepts and AWS services for the five domains of analytics

The exam classifies questions into five domains: 1) Collection, 2) Storage and Data Management, 3) Processing, 4) Analysis and Visualization, and 5) Security. Often, analytics professionals specialize in one of these domains more than the others. Now is the time to dive deep into the analytics concepts and AWS Analytics services you may not yet know well.

For example, the ‘collection’ domain includes questions on Amazon Kinesis (Data Streams, Data Firehose, and Data Analytics), Amazon Managed Streaming for Apache Kafka (Amazon MSK) and self-hosted Apache Kafka, Amazon DynamoDB Streams, Amazon Simple Queue Service (Amazon SQS), AWS Database Migration Service (AWS DMS), AWS Snowball, and AWS Direct Connect. You should understand the characteristics and use cases for these services and how they differ from one another. You should also understand key data architecture design concepts, such as data ordering, format, and compression.
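
For instance, here is a minimal sketch of a producer writing a record to Kinesis Data Streams with boto3. The stream name, Region, and event fields are hypothetical, and the sketch assumes the stream already exists and AWS credentials are configured locally.

```python
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "42", "action": "page_view", "page": "/pricing"}

response = kinesis.put_record(
    StreamName="clickstream-events",          # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),   # payload must be bytes
    PartitionKey=event["user_id"],            # determines which shard receives the record
)

print(response["ShardId"], response["SequenceNumber"])
```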

To learn about a service, or to deepen your knowledge of it, read the service FAQs and the developer or management guide, and consider a hands-on lab or class from AWS Training. Most guides include tutorials, and AWS also offers self-paced labs and immersion days.

Consider the following resources:

The rest of the study areas below cover key themes that appear on the exam across all AWS Analytics and related services, with links to information that illustrates each concept.

3. Data movement integrations between services

A modern data architecture requires seamless data movement between data producers, processing applications, data lakes, and purpose-built data stores. When choosing a data movement or processing step, it is critical to validate that it supports the required data source and destination(s) at the required cadence. Expect the exam to draw many distinctions between real-time, near-real-time, event-driven, and scheduled data movement. Beyond knowing which integrations exist, the exam will expect you to know how they work and the key considerations when using them.
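
As an illustration, here is a minimal sketch of near-real-time movement into a data lake with Kinesis Data Firehose. The delivery stream name and record contents are assumptions, and the stream is presumed to already be configured with an S3 destination and buffering settings.

```python
import json

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Newline-delimited JSON records; Firehose buffers them and delivers to the
# configured S3 prefix once a size or time threshold is reached.
records = [
    {"Data": (json.dumps({"order_id": i, "amount": 10.0 * i}) + "\n").encode("utf-8")}
    for i in range(1, 4)
]

response = firehose.put_record_batch(
    DeliveryStreamName="orders-to-datalake",  # hypothetical delivery stream
    Records=records,
)
print("Failed records:", response["FailedPutCount"])
```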

Consider the following (non-exhaustive) list of resources:

4. Data access integration between services

AWS advocates for a data architecture that leverages purpose-built data stores, democratizes data access, and uses the right tool for the job. A data platform that implements these principles will also need to enable data access from these various data stores, and for a variety of downstream users. Most tools support Amazon S3 (typically used as the data lake), and many services offer capabilities like federated queries to support “around the perimeter” data access between services. The exam will ask questions about these integrations and how to implement them.
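
For example, here is a sketch of querying data in place with Amazon Athena from boto3; the Glue Data Catalog database, table, and results bucket are hypothetical.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "datalake"},                    # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"}, # hypothetical bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```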

Consider the following resources:

5. Common analytical query scenarios

Ultimately, organizations invest in data infrastructure in order to derive actionable insights from their data. The exam will ask questions about common analytical scenarios, including streaming analytics, log analytics, data visualization, and machine learning. Note that many AWS Analytics services offer built-in machine learning capabilities, and you should know what they are.
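
As one illustration, here is a sketch of running an analytical query through the Amazon Redshift Data API; the workgroup name, database, and sales table are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# A typical analytical query: aggregate sales by region and rank the regions
# with a window function.
sql = """
    SELECT region,
           SUM(amount)                             AS total_sales,
           RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
    FROM sales
    GROUP BY region
"""

response = redshift_data.execute_statement(
    WorkgroupName="analytics-wg",   # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql=sql,
)
print("Statement submitted:", response["Id"])
```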

Consider the following resources:

6. Managing, scaling, and updating applications

The volume and velocity of data that organizations store, process, and query are increasing at an exponential rate. Over time, many organizations that start with terabytes of data need to scale to handle petabytes or even exabytes. Cloud-native analytics approaches offer elasticity to respond to changing scale requirements, along with mechanisms to decrease management overhead and cost, and the exam will expect you to understand how to implement them. AWS has also added a growing list of serverless options in the analytics space; you should know which services offer them and how to use them.
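
For example, here is a sketch of two scaling levers for Kinesis Data Streams, assuming an existing provisioned stream; the stream name and account ARN are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Option 1: scale a provisioned stream by resharding to a higher shard count.
kinesis.update_shard_count(
    StreamName="clickstream-events",   # hypothetical stream
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)

# Option 2: hand capacity management to the service by switching to on-demand mode.
kinesis.update_stream_mode(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/clickstream-events",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```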

Consider the following resources:

7. Data partitioning and distribution strategies

Distributing chunks of data to enable parallel processing is a key scaling concept for almost all data services. Amazon Kinesis Data Streams has shards and partition keys, Amazon OpenSearch Service has indices and shards, big data processing tools like Apache Spark have partitions, Amazon Redshift has distribution keys, Amazon QuickSight has SPICE (Super-fast, Parallel, In-memory Calculation Engine), and so on. For every service, you should be very familiar with the partitioning strategies, recommended sizes, and how to optimize them for performance.
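
To make this concrete, here is a sketch of writing a dataset to Amazon S3 partitioned by a date column with Apache Spark, so that query engines such as Athena can prune partitions instead of scanning everything; the bucket and columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-example").getOrCreate()

orders = spark.createDataFrame(
    [
        ("o-1", "2023-05-01", 19.99),
        ("o-2", "2023-05-01", 5.49),
        ("o-3", "2023-05-02", 42.00),
    ],
    ["order_id", "order_date", "amount"],
)

# Each distinct order_date value becomes its own S3 prefix
# (e.g. .../order_date=2023-05-01/), which query engines use to skip data.
(
    orders.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://my-data-lake/orders/")   # hypothetical bucket
)
```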

Consider the following resources:

8. Security and compliance

Cloud security is the highest priority at AWS. For analytics workloads, security includes classifying sensitive data, protecting data at rest and in transit, controlling data access, controlling infrastructure access, and auditing. Classic AWS security concepts and services are important here, such as encryption, Amazon VPC, AWS IAM policies, and AWS CloudTrail. There are also analytics-specific data governance controls, such as AWS Lake Formation permissions, Amazon Athena workgroups, and Amazon QuickSight user management.
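
For example, here is a sketch of granting column-level access with AWS Lake Formation through boto3; the database, table, columns, and analyst role are hypothetical, and Lake Formation is assumed to already govern the underlying data location.

```python
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

# Grant SELECT on only the non-sensitive columns of the orders table to a
# hypothetical analyst role, rather than broad S3-level access.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "datalake",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_date", "amount"],
        }
    },
    Permissions=["SELECT"],
)
```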

Consider the following resources:

9. Monitoring and troubleshooting analytics workloads

Monitoring is an important part of maintaining the reliability, availability, and performance of AWS Analytics services. Amazon CloudWatch tracks many key metrics for these services. You should know which metrics matter most for each service, as well as common problems and how to fix them. Some services offer additional monitoring dashboards, particularly for Spark-based workloads such as AWS Glue jobs.
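
As an illustration, here is a sketch of retrieving one of the most telling Kinesis Data Streams metrics from Amazon CloudWatch; the stream name is a placeholder. A climbing iterator age usually means consumers are falling behind producers.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "clickstream-events"}],  # hypothetical stream
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                 # 5-minute datapoints
    Statistics=["Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```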

Consider the following resources:

10. Amazon S3

Amazon S3 acts as the foundation for nearly all data platforms built on AWS. Amazon S3 is a flexible, durable, highly available, low-cost, and almost infinitely scalable data store. It is prominently featured in data architectures, and in the exam. As a data architect, you need to understand lifecycle policies, integrations, optimal storage patterns, security, access patterns, and cross-Region data transfer. For example, Amazon Athena cannot directly query data stored in the S3 Glacier storage classes.
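
For example, here is a sketch of a lifecycle rule that tiers a raw-data prefix into cheaper storage classes over time; the bucket name, prefix, and transition days are hypothetical.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake",   # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Glacier Flexible Retrieval is cheap, but Athena cannot
                    # query objects here without restoring them first.
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```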

Consider the following resources:

Get hands-on

Finally, there is no substitute for getting hands-on with AWS services to strengthen your understanding. As part of my preparation for the exam, I built several streaming and batch data ingestion architectures within my AWS account. If you haven’t done it yet, sign up for a training account and take advantage of on-demand digital courses on AWS Skill Builder, virtual/in-person instructor-led classroom training, virtual webinars, and an exam-readiness course. The AWS Certified Data Analytics – Specialty exam page can also help you build a plan to prepare.

The value of AWS Certification

Organizations in every industry want to accelerate decision making in today’s complex and disrupted business landscape. There is a need for technology professionals who understand how to leverage AWS’ elastic data processing services to support these business outcomes. The AWS Certified Data Analytics – Specialty certification presents IT or engineering professionals with the opportunity to validate their knowledge and show that they understand how to design cost-efficient, secure, and high-performance data processing architectures on AWS. Preparing for a certification exam is an excellent way to reinforce your knowledge of any technology. I hope you consider pursuing this exam and experience similar benefits. Best of luck!
