@@ -358,4 +358,11 @@ object LakeFSStorageClient {

    branchesApi.resetBranch(repoName, branchName, resetCreation).execute()
  }

  def parsePhysicalAddress(address: String): (String, String) = {
    // expected: "<scheme>://bucket/key..."
    val uri = new java.net.URI(address)
    val bucket = uri.getHost
    val key = uri.getPath.stripPrefix("/")
    (bucket, key)
  }
}
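
For reference, a quick sketch of how the new parser behaves (the address below is a hypothetical example):

    val (bucket, key) =
      LakeFSStorageClient.parsePhysicalAddress("s3://example-bucket/data/object-123")
    // bucket == "example-bucket"
    // key    == "data/object-123"

One caveat worth noting: java.net.URI.getHost returns null when the authority is not a valid hostname (for example, bucket names containing underscores), so callers may want to guard against a null bucket.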

@@ -259,4 +259,59 @@ object S3StorageClient {
      DeleteObjectRequest.builder().bucket(bucketName).key(objectKey).build()
    )
  }

  /**
   * Uploads a single part for an in-progress S3 multipart upload.
   *
   * This method wraps the AWS SDK v2 {@code UploadPart} API:
   * it builds an {@link software.amazon.awssdk.services.s3.model.UploadPartRequest}
   * and streams the part payload via a {@link software.amazon.awssdk.core.sync.RequestBody}.
   *
   * Payload handling:
   *   - If {@code contentLength} is provided, the payload is streamed directly from
   *     {@code inputStream} using {@code RequestBody.fromInputStream(inputStream, len)}.
   *   - If {@code contentLength} is {@code None}, the entire {@code inputStream} is read into
   *     memory ({@code readAllBytes}) and uploaded using {@code RequestBody.fromBytes(bytes)}.
   *     This is convenient but can be memory-expensive for large parts; prefer providing a
   *     known length.
   *
   * Notes:
   *   - {@code partNumber} must be in the valid S3 range (typically 1..10,000).
   *   - The caller is responsible for closing {@code inputStream}.
   *   - This method is synchronous and will block the calling thread until the upload completes.
   *
   * @param bucket        S3 bucket name.
   * @param key           Object key (path) being uploaded.
   * @param uploadId      Multipart upload identifier returned by CreateMultipartUpload.
   * @param partNumber    1-based part number for this upload.
   * @param inputStream   Stream containing the bytes for this part.
   * @param contentLength Optional size (in bytes) of this part; provide it to avoid buffering
   *                      in memory.
   * @return The {@link software.amazon.awssdk.services.s3.model.UploadPartResponse},
   *         including the part ETag used for completing the multipart upload.
   */
  def uploadPart(

Contributor: Add a comment to this function and correct its formatting.

Contributor Author: Done.

      bucket: String,
      key: String,
      uploadId: String,
      partNumber: Int,
      inputStream: InputStream,
      contentLength: Option[Long]
  ): UploadPartResponse = {
    val body: RequestBody = contentLength match {

Contributor: We need streaming here; this just reads all the bytes at once.

Contributor Author: Yes, when the user does not specify a Content-Length, we read all the bytes. (However, that case is forbidden by the uploadPart endpoint.) When the user does specify a Content-Length, as in RequestBody.fromInputStream(inputStream, contentLength /* e.g. 5 GiB */), the SDK does not read and buffer the whole 5 GiB in memory first; for retries (where supported), it tries to rewind the stream using InputStream.reset() with a read limit of 128 KiB.

Contributor Author: @aicam do you agree with this?

      case Some(len) => RequestBody.fromInputStream(inputStream, len)
      case None =>
        // No known length: buffer the whole part in memory before uploading.
        val bytes = inputStream.readAllBytes()
        RequestBody.fromBytes(bytes)
    }

    val req = UploadPartRequest
      .builder()
      .bucket(bucket)
      .key(key)
      .uploadId(uploadId)
      .partNumber(partNumber)
      .build()

    s3Client.uploadPart(req, body)
  }

}
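
To show where uploadPart sits in the overall flow, here is a minimal end-to-end sketch of a multipart upload. It is illustrative only: the bucket, key, client setup, and part payload are hypothetical; the createMultipartUpload / completeMultipartUpload calls go directly through an AWS SDK v2 S3Client rather than through helpers in this PR; and it assumes S3StorageClient is in scope with its internal s3Client configured for the same account and region. The stream is wrapped in a BufferedInputStream so it supports mark/reset, which (per the review thread above) the SDK uses to rewind on retry.

    import java.io.{BufferedInputStream, ByteArrayInputStream}
    import scala.jdk.CollectionConverters._
    import software.amazon.awssdk.services.s3.S3Client
    import software.amazon.awssdk.services.s3.model.{
      CompletedMultipartUpload,
      CompletedPart,
      CompleteMultipartUploadRequest,
      CreateMultipartUploadRequest
    }

    object MultipartUploadExample {
      def main(args: Array[String]): Unit = {
        val bucket = "example-bucket" // hypothetical
        val key    = "big/object.bin" // hypothetical
        val s3: S3Client = S3Client.create()

        // 1. Start the multipart upload and keep its id.
        val uploadId = s3
          .createMultipartUpload(
            CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build()
          )
          .uploadId()

        // 2. Upload one 5 MiB part through the helper. Passing the length keeps
        //    the SDK streaming instead of buffering; BufferedInputStream adds
        //    mark/reset support so the SDK can rewind the stream on a retry.
        val payload = Array.fill[Byte](5 * 1024 * 1024)(0)
        val in      = new BufferedInputStream(new ByteArrayInputStream(payload))
        val resp =
          try {
            S3StorageClient.uploadPart(
              bucket,
              key,
              uploadId,
              partNumber = 1,
              inputStream = in,
              contentLength = Some(payload.length.toLong)
            )
          } finally in.close() // the caller owns the stream

        // 3. Complete the upload with the part numbers and ETags collected so far.
        val part = CompletedPart.builder().partNumber(1).eTag(resp.eTag()).build()
        s3.completeMultipartUpload(
          CompleteMultipartUploadRequest
            .builder()
            .bucket(bucket)
            .key(key)
            .uploadId(uploadId)
            .multipartUpload(
              CompletedMultipartUpload.builder().parts(List(part).asJava).build()
            )
            .build()
        )
      }
    }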