Tag: AWS

Apache Spark Partitions – AWS S3

Reading data from AWS S3: When Apache Spark reads data from Amazon S3 (Simple Storage Service), the process of creating partitions differs from reading data from HDFS. In the case of S3, Spark does not directly align partitions with the concept of HDFS blocks, as there is no block-based storage system like HDFS in S3. …
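For S3 reads through Spark SQL file sources, partition sizing is driven by configuration rather than storage blocks, notably `spark.sql.files.maxPartitionBytes` (default 128 MB) and `spark.sql.files.openCostInBytes` (default 4 MB). The pure-Python sketch below mirrors Spark's split-size heuristic; the config names are real Spark settings, but the helper function itself is an illustration, not Spark's actual code:

```python
# Sketch of how Spark SQL sizes read partitions for file sources such as S3.
# The formula mirrors Spark's FilePartition packing heuristic; the function
# name and signature are illustrative.

def max_split_bytes(total_bytes: int,
                    default_parallelism: int,
                    max_partition_bytes: int = 128 * 1024 * 1024,
                    open_cost_in_bytes: int = 4 * 1024 * 1024) -> int:
    """Target size of each read partition, per Spark's packing heuristic."""
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

# Example: a 1 GiB dataset read on a cluster with 8 cores.
total = 1024 * 1024 * 1024
split = max_split_bytes(total, default_parallelism=8)
num_partitions = -(-total // split)  # ceiling division
print(split // (1024 * 1024), "MB per split ->", num_partitions, "partitions")
```

With these defaults, the 1 GiB example lands on 128 MB splits, so the read produces 8 partitions; lowering `maxPartitionBytes` yields more, smaller partitions.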

Apache Spark Partitions – HDFS

Reading data from HDFS: When Apache Spark reads data from the Hadoop Distributed File System (HDFS), the process of creating partitions is influenced by several factors. Here's an overview of how Spark creates partitions when reading data from HDFS. HDFS blocks: the primary storage unit in HDFS is a block. By default, these blocks are commonly 128 MB in size. …
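When reading HDFS, Spark inherits Hadoop's `FileInputFormat` split rule, which typically yields one partition per block: `splitSize = max(minSize, min(maxSize, blockSize))`. A minimal sketch, assuming the 128 MB default block size (the function names are illustrative, and Hadoop's real implementation also applies a small "slop" factor before creating a final short split):

```python
# Illustrative sketch of Hadoop FileInputFormat's split-size rule, which Spark
# inherits when reading HDFS: splitSize = max(minSize, min(maxSize, blockSize)).

def hdfs_split_size(block_size: int,
                    min_size: int = 1,
                    max_size: int = 2 ** 63 - 1) -> int:
    return max(min_size, min(max_size, block_size))

def num_partitions(file_bytes: int, split_size: int) -> int:
    # One split per full block, plus one for any remainder (ceiling division).
    return -(-file_bytes // split_size)

block = 128 * 1024 * 1024  # common HDFS default: 128 MB blocks
print(num_partitions(600 * 1024 * 1024, hdfs_split_size(block)))
```

A 600 MB file on 128 MB blocks therefore produces 5 partitions: four full-block splits plus one 88 MB remainder.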

AWS Glue Job vs. EMR Spark Job Cost Comparison

Deciding on the most cost-effective option for your Spark jobs can be tricky, as AWS Glue and EMR have distinct pricing models and capabilities. Let's dive into a quick comparison to help you choose. Cost comparison considerations lead to this recommendation: it is essential to evaluate your specific use case, workload characteristics, and preferences to determine the most cost-effective option. …
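The pricing models differ in shape: Glue bills per DPU-hour, while EMR adds a per-instance surcharge on top of the underlying EC2 cost. A back-of-the-envelope comparison can be sketched as below; the dollar rates are illustrative placeholders, not current AWS prices, so check the AWS pricing pages for your region before relying on them:

```python
# Back-of-the-envelope cost comparison for a single Spark job.
# All rates below are assumed placeholder values, not real AWS pricing.

GLUE_PRICE_PER_DPU_HOUR = 0.44          # assumed Glue rate, USD
EMR_SURCHARGE_PER_INSTANCE_HOUR = 0.27  # assumed EMR fee for the instance type
EC2_PRICE_PER_INSTANCE_HOUR = 0.77      # assumed EC2 on-demand rate

def glue_cost(dpus: int, hours: float) -> float:
    return dpus * hours * GLUE_PRICE_PER_DPU_HOUR

def emr_cost(instances: int, hours: float) -> float:
    # EMR bills its surcharge plus the EC2 instances themselves.
    return instances * hours * (EMR_SURCHARGE_PER_INSTANCE_HOUR
                                + EC2_PRICE_PER_INSTANCE_HOUR)

# Example: 10 DPUs for 2 hours on Glue vs. 5 instances for 2 hours on EMR.
print(f"Glue: ${glue_cost(10, 2):.2f}  EMR: ${emr_cost(5, 2):.2f}")
```

Even a rough model like this makes the trade-off concrete: Glue's per-DPU pricing favors short, bursty jobs, while EMR's instance-based pricing can win for long-running or heavily tuned clusters.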
