How to Upload Files to Amazon S3 using Spring Boot and S3 Transfer Manager
By Luis Carbonel
Amazon S3 is a popular cloud storage service that lets you store and retrieve data anytime, from anywhere. Transferring files efficiently and quickly is crucial in the development of cloud applications, and this tutorial will show you how to do it with Amazon S3. We will use S3 Transfer Manager, a high-level file transfer utility in the AWS SDK for Java 2.x that makes uploading and downloading files efficient and fast. On top of that, we will use the AWS CRT-based S3 client, an implementation of S3AsyncClient that further improves the performance and efficiency of file transfer operations. Throughout this tutorial, we will guide you step by step, from the initial setup to practical examples of file uploads using these technologies.
Prerequisites
To follow this tutorial, you will need the following:
- Java Development Kit (JDK) installed on your machine.
- An Integrated Development Environment (IDE) such as IntelliJ IDEA or Eclipse.
- Gradle installed (we will use it to pull in Spring Boot and the AWS SDK as dependencies).
- An AWS account (we will be using LocalStack for local development to avoid any costs).
- Docker and Docker Compose installed on your machine to run LocalStack.
- Familiarity with AWS S3 and Java programming language.
To perform local testing without the need to create an AWS account, we will use LocalStack, a development platform that simulates AWS services in your local environment.
Setting up the Project
Before you begin, you will need to install and configure LocalStack in your development environment. LocalStack provides a local implementation of Amazon S3 and other AWS services, allowing you to perform testing without incurring costs or relying on actual AWS infrastructure.
Installing LocalStack
To install LocalStack, you will need to have Docker and Docker Compose installed on your machine. If you don't have them installed, you can follow the official installation guides for Docker and Docker Compose.
Once you have Docker and Docker Compose installed, you can install LocalStack by running the following command in your terminal:
docker pull localstack/localstack
Configure the LocalStack S3 service using Docker Compose
Create a docker-compose.yml file with the following content:
version: '3'
services:
  localstack:
    image: 'localstack/localstack:latest'
    ports:
      - '4566-4583:4566-4583'
    environment:
      - AWS_DEFAULT_REGION=us-east-1
      - SERVICES=s3
      - EDGE_PORT=4566
    volumes:
      - '${TEMPDIR:-/tmp/localstack}:/tmp/localstack'
      - '/var/run/docker.sock:/var/run/docker.sock'
Running LocalStack
To run LocalStack, you can use the following command:
docker-compose up -d
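On recent LocalStack versions, you can confirm that the S3 service is up by querying the health endpoint (the exact path may differ on older releases):
curl http://localhost:4566/_localstack/health
# The JSON response should list "s3" as available or running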
Install AWS CLI
Install the AWS CLI using Homebrew (macOS). For other platforms, check the official AWS CLI installation docs.
brew install awscli
Configure AWS CLI
Configure AWS CLI to use localstack
aws configure
# If you need a profile use this command: aws configure --profile localstack
AWS Access Key ID [None]: foo
AWS Secret Access Key [None]: bar
Default region name [None]: us-east-1
Default output format [None]: json
Create S3 Bucket
Create a bucket using AWS CLI
aws --endpoint-url=http://localhost:4566 s3 mb s3://devlach-spring-boot-aws-s3-async
# Console output: make_bucket: devlach-spring-boot-aws-s3-async
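To double-check, list the buckets against the LocalStack endpoint; the output should include the bucket we just created:
aws --endpoint-url=http://localhost:4566 s3 ls
# ... devlach-spring-boot-aws-s3-async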
Integrating Amazon S3 Transfer Manager and CRT-based Client
In this section, we are going to integrate the Amazon S3 Transfer Manager and the CRT-based client into a Spring Boot application. This will enable us to efficiently and quickly upload files to Amazon S3.
Creating a Spring Boot Project
First, we need to create a new Spring Boot project. For this, we can use Spring Initializr, an online tool that lets us generate a Spring Boot project with the dependencies we need. In our case, we will select the following options:
- Project type: Gradle
- Language: Java
- Spring Boot version: 3.x.x
- Dependencies: Spring Web (Spring Initializr does not offer the AWS SDK modules, so we will add the AWS SDK for Java 2.x, S3 Transfer Manager, and AWS CRT dependencies manually in the build file)
Once we have selected these options, we click on “Generate” to download the project.
Configuring the build.gradle file
The build.gradle file is where we define our project’s dependencies and how it will be built. In our case, we need the following dependencies:
- Spring Boot Starter: Provides the basic dependencies for building a Spring Boot application.
- AWS SDK for Java: Allows us to interact with AWS services from our Java application.
- S3 Transfer Manager: A high-level tool for uploading and downloading files from Amazon S3.
- AWS CRT: An S3 client based on AWS’s Common Runtime (CRT), which enhances the performance and efficiency of our file transfer operations.
- Lombok: A library that helps us reduce repetitive code in our Java application.
plugins {
    java
    id("org.springframework.boot") version "3.0.6"
    id("io.spring.dependency-management") version "1.1.0"
}

group = "com.devlach"
version = "0.0.1-SNAPSHOT"
java.sourceCompatibility = JavaVersion.VERSION_17

configurations {
    compileOnly {
        extendsFrom(configurations.annotationProcessor.get())
    }
}

repositories {
    mavenCentral()
}

val awsSdkVersion = "2.20.56"
val awsSdkCrtVersion = "0.21.12"

dependencies {
    implementation("org.springframework.boot:spring-boot-starter")
    implementation(platform("software.amazon.awssdk:bom:$awsSdkVersion"))
    implementation("software.amazon.awssdk:s3")
    implementation("software.amazon.awssdk:s3-transfer-manager")
    implementation("software.amazon.awssdk.crt:aws-crt:$awsSdkCrtVersion")
    compileOnly("org.projectlombok:lombok")
    annotationProcessor("org.projectlombok:lombok")
    testImplementation("org.springframework.boot:spring-boot-starter-test")
}

tasks.withType<Test> {
    useJUnitPlatform()
}
Configuring Application Properties
Next, we need to configure our application properties. Our application.properties file will look like this:
# AWS configuration properties
aws.access.key=foo
aws.secret.key=bar
aws.region=us-east-1
- aws.access.key: The access key for our AWS account.
- aws.secret.key: The secret key for our AWS account.
- aws.region: The region where our S3 bucket is located.
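These dummy values are fine for LocalStack, but real credentials should never be hardcoded. As a sketch, Spring's standard ${VAR:default} placeholder syntax lets the same file read the values from environment variables while keeping the LocalStack values as fallbacks:
# AWS configuration properties (environment-aware variant)
aws.access.key=${AWS_ACCESS_KEY_ID:foo}
aws.secret.key=${AWS_SECRET_ACCESS_KEY:bar}
aws.region=${AWS_REGION:us-east-1}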
Creating the AWS Configuration Class
The AwsConfig class is where we configure our S3 client. In this class, we create an S3AsyncClient that uses the AWS credentials and region we have defined in the application.properties file. We also set our S3 client’s endpoint to LocalStack, which is an AWS simulator that allows us to develop and test our application locally without incurring costs.
package com.devlach.springbootawss3async.configuration;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

import java.net.URI;

@Configuration
public class AwsConfig {

    @Value("${aws.access.key}")
    private String accessKey;

    @Value("${aws.secret.key}")
    private String secretKey;

    @Value("${aws.region}")
    private String region;

    private static final String LOCAL_STACK_ENDPOINT = "http://localhost:4566";

    /*
    @Bean
    public S3AsyncClient s3AsyncClient() { // Default builder, not crtBuilder()
        return S3AsyncClient.builder()
                .credentialsProvider(staticCredentialsProvider())
                .endpointOverride(URI.create(LOCAL_STACK_ENDPOINT))
                .region(region())
                .build();
    }
    */

    @Bean
    public S3AsyncClient s3AsyncClient() {
        return S3AsyncClient.crtBuilder()
                .credentialsProvider(staticCredentialsProvider())
                .endpointOverride(URI.create(LOCAL_STACK_ENDPOINT))
                .region(region())
                .forcePathStyle(true)
                .build();
    }

    private StaticCredentialsProvider staticCredentialsProvider() {
        AwsCredentials awsBasicCredentials = AwsBasicCredentials.create(accessKey, secretKey);
        return StaticCredentialsProvider.create(awsBasicCredentials);
    }

    private Region region() {
        return region != null ? Region.of(region) : Region.US_EAST_1;
    }
}
The AwsConfig class is annotated with @Configuration, indicating it's a source of bean definitions. The @Value annotation injects property values from a properties file or the environment directly into the accessKey, secretKey, and region fields.
The staticCredentialsProvider method creates a StaticCredentialsProvider object from AWS basic credentials (accessKey and secretKey). The region() method checks whether a region is specified; if not, it defaults to US_EAST_1.
The s3AsyncClient() method is a bean definition for S3AsyncClient, which is used to interact with the S3 service asynchronously. There are two versions of this method: the commented-out one uses the default builder (S3AsyncClient.builder()), while the active one uses S3AsyncClient.crtBuilder(). The crtBuilder() method creates an S3 async client backed by the AWS Common Runtime (CRT), a set of native libraries that provide a consistent interface and performance improvements across AWS SDKs. This client can take advantage of CRT's performance benefits, especially for high-throughput transfers.
The forcePathStyle(true) call makes the client use path-style access (http://host/bucket/key) rather than the default virtual-hosted-style addressing (http://bucket.host/key). This is needed for LocalStack and other S3-compatible services whose endpoints cannot resolve bucket-name subdomains.
In summary, crtBuilder() provides better performance than the regular builder(), especially for high-throughput transfers, making it the preferred choice for applications that require efficient and quick file transfers.
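For reference, against real AWS (rather than LocalStack) the same bean would typically drop the endpoint override and static credentials in favor of the SDK's default credential chain. A minimal sketch, not part of this tutorial's code, assuming an import of software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider:
@Bean
public S3AsyncClient s3AsyncClient() {
    // DefaultCredentialsProvider resolves credentials from the environment,
    // ~/.aws/credentials, or an attached IAM role. No endpointOverride needed.
    return S3AsyncClient.crtBuilder()
            .credentialsProvider(DefaultCredentialsProvider.create())
            .region(Region.US_EAST_1)
            .build();
}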
Creating the S3 Service Class
- S3Service interface
package com.devlach.springbootawss3async.services;

import java.util.List;

public interface S3Service {

    void createBucket(String bucketName);

    void deleteBucket(String bucketName);

    int deleteObjects(String bucketName, List<String> keys);

    int uploadDirectory(String bucketName, String directoryPath, String prefix);

    List<String> listObjects(String bucketName, String prefix);
}
- S3ServiceImpl class
package com.devlach.springbootawss3async.services;

import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.*;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload;
import software.amazon.awssdk.transfer.s3.model.UploadDirectoryRequest;
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener;

import java.nio.file.Paths;
import java.util.List;
import java.util.function.BiConsumer;

@Service
@Slf4j
public class S3ServiceImpl implements S3Service {

    private final S3AsyncClient s3AsyncClient;

    private static final int DEFAULT_MAX_TICKS = 20;

    public S3ServiceImpl(S3AsyncClient s3AsyncClient) {
        this.s3AsyncClient = s3AsyncClient;
    }

    @Override
    public void createBucket(String bucketName) {
        try {
            s3AsyncClient.createBucket(builder -> builder.bucket(bucketName))
                    .whenComplete(getCreateBucketResponseThrowableBiConsumer(bucketName)).join();
        } catch (S3Exception e) {
            log.error("Error creating bucket {}. {}", bucketName, e.awsErrorDetails().errorMessage());
            throw e; // throw your custom exception here. It will be caught by the controller advice
        }
    }

    @Override
    public void deleteBucket(String bucketName) {
        try {
            s3AsyncClient.deleteBucket(builder -> builder.bucket(bucketName))
                    .whenComplete(getDeleteBucketResponseThrowableBiConsumer(bucketName)).join();
        } catch (S3Exception e) {
            log.error("Error deleting bucket {}. {}", bucketName, e.awsErrorDetails().errorMessage());
            throw e; // throw your custom exception here. It will be caught by the controller advice
        }
    }

    @Override
    public int deleteObjects(String bucketName, List<String> objectKeys) {
        try {
            ObjectIdentifier[] objectIdentifiers = objectKeys.stream()
                    .map(objectKey -> ObjectIdentifier.builder().key(objectKey).build())
                    .toArray(ObjectIdentifier[]::new);
            return s3AsyncClient.deleteObjects(DeleteObjectsRequest.builder()
                    .bucket(bucketName)
                    .delete(builder -> builder.objects(objectIdentifiers).build())
                    .build()).join().deleted().size();
        } catch (S3Exception e) {
            log.error("Error deleting objects from bucket {}. {}", bucketName, e.awsErrorDetails().errorMessage());
            throw e; // throw your custom exception here. It will be caught by the controller advice
        }
    }

    @Override
    public int uploadDirectory(String bucketName, String directoryPath, String prefix) {
        validateS3Key(prefix);
        S3TransferManager s3TransferManager = S3TransferManager.builder()
                .s3Client(s3AsyncClient)
                .build();
        UploadDirectoryRequest.Builder uploadDirectoryBuilder = UploadDirectoryRequest.builder()
                .bucket(bucketName)
                .s3Prefix(prefix)
                .source(Paths.get(directoryPath))
                .uploadFileRequestTransformer(uploadFileRequest -> uploadFileRequest
                        .addTransferListener(LoggingTransferListener.create(DEFAULT_MAX_TICKS))
                        .build());
        CompletedDirectoryUpload completedDirectoryUpload = s3TransferManager.uploadDirectory(uploadDirectoryBuilder.build())
                .completionFuture().join();
        completedDirectoryUpload.failedTransfers()
                .forEach(transfer -> log.warn("Error uploading file {}", transfer.exception().getMessage()));
        return completedDirectoryUpload.failedTransfers().size();
    }

    @Override
    public List<String> listObjects(String bucketName, String prefix) {
        validateS3Key(prefix);
        return s3AsyncClient
                .listObjectsV2(builder -> builder.bucket(bucketName).prefix(prefix))
                .join().contents().stream().map(S3Object::key).toList();
    }

    private static BiConsumer<CreateBucketResponse, Throwable> getCreateBucketResponseThrowableBiConsumer(String bucketName) {
        return (resp, err) -> {
            if (resp != null) {
                log.info("Bucket {} created successfully", bucketName);
            } else {
                log.error("Error creating bucket {}. {}", bucketName, err.getMessage());
            }
        };
    }

    private static BiConsumer<DeleteBucketResponse, Throwable> getDeleteBucketResponseThrowableBiConsumer(String bucketName) {
        return (resp, err) -> {
            if (resp != null) {
                log.info("Bucket {} deleted successfully", bucketName);
            } else {
                log.error("Error deleting bucket {}. {}", bucketName, err.getMessage());
            }
        };
    }

    private static void validateS3Key(String key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("Key must not be null or empty");
        }
        if (key.startsWith("/")) {
            throw new IllegalArgumentException("Key must not start with /");
        }
        if (key.contains("//")) {
            throw new IllegalArgumentException("Key must not contain //");
        }
    }
}
The S3ServiceImpl class is where we implement the logic to interact with Amazon S3. In this class, we use the S3AsyncClient we configured in the AwsConfig class to create and delete buckets, upload directories, and list and delete objects.
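The service only exposes a directory upload, but the same S3TransferManager handles single files too. Below is a sketch of what such a method could look like; uploadFile is our own hypothetical addition (it is not part of the S3Service interface above) and assumes imports of UploadFileRequest and CompletedFileUpload from software.amazon.awssdk.transfer.s3.model:
// Hypothetical helper, not part of the S3Service interface defined above.
public String uploadFile(String bucketName, String key, String filePath) {
    S3TransferManager s3TransferManager = S3TransferManager.builder()
            .s3Client(s3AsyncClient)
            .build();
    UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
            .putObjectRequest(req -> req.bucket(bucketName).key(key))
            .source(Paths.get(filePath))
            .addTransferListener(LoggingTransferListener.create(DEFAULT_MAX_TICKS))
            .build();
    // completionFuture() keeps the call asynchronous; we block here for simplicity.
    CompletedFileUpload completedUpload = s3TransferManager.uploadFile(uploadFileRequest)
            .completionFuture().join();
    return completedUpload.response().eTag();
}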
Creating the Spring Boot Application Main Class
Finally, the SpringBootAwsS3AsyncApplication class is where we start our Spring Boot application. In this class, we use the S3 service we have created to upload a directory to Amazon S3, list the bucket's objects, and then delete the objects and the bucket.
package com.devlach.springbootawss3async;

import com.devlach.springbootawss3async.services.S3Service;
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import java.util.List;

@SpringBootApplication
@Slf4j
public class SpringBootAwsS3AsyncApplication {

    public static final String BUCKET_NAME = "devlach-spring-boot-aws-s3-async";

    private final S3Service s3Service;

    public SpringBootAwsS3AsyncApplication(S3Service s3Service) {
        this.s3Service = s3Service;
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBootAwsS3AsyncApplication.class, args);
    }

    @Bean
    public CommandLineRunner commandLineRunner() {
        return args -> {
            // Create bucket
            s3Service.createBucket(BUCKET_NAME);
            // Upload directory
            ClassLoader classLoader = getClass().getClassLoader();
            String uploadDirectoryPrefix = "uploadDirectory";
            String uploadDirectoryPath = classLoader.getResource("static").getPath() + "/" + uploadDirectoryPrefix;
            int filesNotUploaded = s3Service.uploadDirectory(BUCKET_NAME, uploadDirectoryPath, uploadDirectoryPrefix);
            log.info("Files not uploaded: {}", filesNotUploaded);
            assert filesNotUploaded == 0;
            // Clean up: delete the bucket's objects and then the bucket itself
            List<String> keyObjectsToDelete = s3Service.listObjects(BUCKET_NAME, uploadDirectoryPrefix);
            int objectsDeleted = s3Service.deleteObjects(BUCKET_NAME, keyObjectsToDelete);
            log.info("Objects deleted: {}", objectsDeleted);
            assert objectsDeleted == keyObjectsToDelete.size();
            s3Service.deleteObjects(BUCKET_NAME, List.of("/" + uploadDirectoryPrefix));
            s3Service.deleteBucket(BUCKET_NAME);
        };
    }
}
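For the directory upload to have something to send, the project needs sample files under src/main/resources/static/uploadDirectory. The logs below show three transfers, so a layout along these lines is assumed (the file names are illustrative):
src/main/resources/
  static/
    uploadDirectory/
      file1.txt
      file2.txt
      file3.txt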
Running the Application and Understanding the Logs
After setting up our application and configuring the necessary AWS services, we can now run our Spring Boot application and observe how it interacts with Amazon S3. The logs provided will help us to understand this interaction and the powerful multithreading capabilities of Amazon’s S3 Transfer Manager.
To run our application, we can use the following command:
./gradlew bootRun
This will start our Spring Boot application and establish a connection with the configured AWS S3 service. The logs will show us the following:
2023-05-14T18:38:02.261-03:00 INFO 4178024 --- [ main] c.d.s.SpringBootAwsS3AsyncApplication : Started SpringBootAwsS3AsyncApplication in 1.704 seconds (process running for 2.083)
2023-05-14T18:38:02.739-03:00 INFO 4178024 --- [nc-response-0-0] c.d.s.services.S3ServiceImpl : Bucket devlach-spring-boot-aws-s3-async created successfully
2023-05-14T18:38:02.833-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : Transfer initiated...
2023-05-14T18:38:02.837-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : | | 0.0%
2023-05-14T18:38:02.873-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : Transfer initiated...
2023-05-14T18:38:02.874-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : | | 0.0%
2023-05-14T18:38:02.880-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : Transfer initiated...
2023-05-14T18:38:02.881-03:00 INFO 4178024 --- [fer-manager-2-0] s.a.a.t.s.p.LoggingTransferListener : | | 0.0%
2023-05-14T18:38:03.001-03:00 INFO 4178024 --- [ Thread-8] s.a.a.t.s.p.LoggingTransferListener : |====================| 100.0%
2023-05-14T18:38:03.003-03:00 INFO 4178024 --- [ Thread-7] s.a.a.t.s.p.LoggingTransferListener : |====================| 100.0%
2023-05-14T18:38:03.026-03:00 INFO 4178024 --- [ Thread-8] s.a.a.t.s.p.LoggingTransferListener : |====================| 100.0%
2023-05-14T18:38:03.107-03:00 INFO 4178024 --- [nc-response-0-1] s.a.a.t.s.p.LoggingTransferListener : Transfer complete!
2023-05-14T18:38:03.109-03:00 INFO 4178024 --- [nc-response-0-2] s.a.a.t.s.p.LoggingTransferListener : Transfer complete!
2023-05-14T18:38:03.135-03:00 INFO 4178024 --- [nc-response-0-3] s.a.a.t.s.p.LoggingTransferListener : Transfer complete!
2023-05-14T18:38:03.136-03:00 INFO 4178024 --- [ main] c.d.s.SpringBootAwsS3AsyncApplication : Files not uploaded: 0
2023-05-14T18:38:03.299-03:00 INFO 4178024 --- [ main] c.d.s.SpringBootAwsS3AsyncApplication : Objects deleted: 3
2023-05-14T18:38:03.348-03:00 INFO 4178024 --- [nc-response-0-7] c.d.s.services.S3ServiceImpl : Bucket devlach-spring-boot-aws-s3-async deleted successfully
Process finished with exit code 0
The logs demonstrate the power and efficiency of the Amazon S3 Transfer Manager, particularly its multithreading capabilities. This is clear from the multiple simultaneous transfers initiated, and the fact that these transfers complete nearly concurrently.
When the application starts, it successfully creates a new S3 bucket, as indicated by the log entry at 18:38:02.739. Shortly after, at 18:38:02.833, the Transfer Manager initiates the first file transfer, and subsequent transfers are initiated in rapid succession.
The multithreading capability of the Transfer Manager becomes apparent as it starts multiple uploads, shown by the repeated Transfer initiated... and | | 0.0% entries. The truncated thread names in brackets tell the story: initiations are logged from the transfer manager's thread pool (fer-manager-2-0), progress is reported from worker threads (Thread-7, Thread-8), and the Transfer complete! entries arrive on async response threads (nc-response-0-1 through nc-response-0-3), confirming that the uploads run concurrently.
The rapid sequence of |====================| 100.0% entries shows the file uploads completing almost simultaneously, evidence that the Transfer Manager is effectively using multithreading to handle multiple uploads at once.
After the uploads, the log entry Files not uploaded: 0 shows that all files were successfully uploaded. Then, the application successfully deletes all objects and the bucket itself, as shown in the log entry Bucket devlach-spring-boot-aws-s3-async deleted successfully.
From these logs, we can infer that the Amazon S3 Transfer Manager is an effective tool for managing file uploads to S3, especially when dealing with multiple files. Its multithreading capabilities make it a robust and efficient solution for handling large-scale or concurrent file transfers.
The S3CrtAsyncClient offers several advantages:
- Greater CPU usage efficiency: S3CrtAsyncClient uses the AWS Common Runtime (CRT) library, which is written in C and optimized for CPU efficiency. Its builder also exposes a maxConcurrency property to control how many requests run in parallel.
- Lower latency: S3CrtAsyncClient is designed to reduce the latency of S3 requests, especially when handling large numbers of simultaneous requests. The builder ships with a CRT-optimized HTTP client by default, which we can tune through S3CrtHttpConfiguration.
- Efficient memory handling: Unlike S3AsyncClient, S3CrtAsyncClient performs more efficient memory management during data transfer, which can result in lower memory usage.
- Better performance for large data transfers: If you're transferring large amounts of data, S3CrtAsyncClient can offer better performance than S3AsyncClient. Here we can tune the targetThroughputInGbps property, which defaults to 10 gigabits per second (see the tuning sketch below).
- Supports all S3 operations: S3CrtAsyncClient is a drop-in replacement for S3AsyncClient and supports all S3 operations, making it an attractive option if you're looking to improve your application's performance. This is a crucial point, since we don't have to change any data types when migrating from one client to the other; in the SDK source, the S3CrtAsyncClient interface extends S3AsyncClient.
It's important to note that while S3CrtAsyncClient may offer better performance in many cases, performance can vary depending on specific factors of your application and your environment, such as the number of simultaneous requests you're handling, the size of the data you're transferring, and the network infrastructure you're using. Therefore, it's always recommended to perform performance testing to determine which client provides the best performance for your specific use case.
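As an illustration of those knobs, here is a sketch of a more aggressively tuned CRT client. The class name TunedCrtClientExample is hypothetical, and the values are illustrative starting points to calibrate against your own benchmarks, not recommendations:
package com.devlach.springbootawss3async.configuration;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.crt.S3CrtHttpConfiguration;

import java.time.Duration;

public class TunedCrtClientExample {

    public static S3AsyncClient tunedCrtClient() {
        return S3AsyncClient.crtBuilder()
                .region(Region.US_EAST_1)
                .maxConcurrency(64)                        // cap on concurrent requests
                .targetThroughputInGbps(20.0)              // default is 10 Gbps
                .minimumPartSizeInBytes(16L * 1024 * 1024) // 16 MiB multipart parts
                .httpConfiguration(S3CrtHttpConfiguration.builder()
                        .connectionTimeout(Duration.ofSeconds(10))
                        .build())
                .build();
    }
}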
Conclusion
In this article, we’ve explored how to use Amazon S3 and Spring Boot to create an asynchronous, multi-threaded file transfer application. We’ve seen how to set up and configure the necessary components, and how to run our application and interpret the logs it produces.
The power of Amazon’s S3 Transfer Manager really shines in this context. Its multi-threading capabilities allow for the efficient handling of multiple concurrent file transfers. This is especially useful for applications that need to handle large amounts of data, or that require high-speed, concurrent file transfers.
Moreover, integrating Amazon S3 with Spring Boot creates a powerful and scalable solution for cloud-based storage needs. Spring Boot’s simplicity and ease of use combined with Amazon S3’s robust and scalable storage solutions make for a compelling combination.
While this example is quite simple, it’s easy to see how these concepts could be expanded for more complex applications. With the foundation laid out in this guide, you’re well-equipped to start exploring these possibilities.
Remember, while we’ve used Amazon S3 in this tutorial, many of the concepts we’ve covered are applicable to other cloud storage services as well. The key is understanding how to integrate these services into your Spring Boot application, and how to effectively use their APIs to accomplish your specific needs.
We hope this guide has been helpful for you in understanding how to create an efficient, asynchronous file transfer application with Spring Boot and Amazon S3. Happy coding!
All the code snippets mentioned in the article can be found on GitHub.