Issue while uploading compressed file to S3 using Alpakka

I tried to upload the contents of a compressed tar file to S3 using Alpakka, but only 1-2 entries were copied; the rest were skipped.
When I increased the chunk size to a large number (double the file size in bytes) it worked, but I suspect it will fail if the tar file is too big. Is this expected, or have I missed something?
Below is my code:

lazy val fileUploadRoutes: Route =
  (withoutRequestTimeout & withoutSizeLimit) {
    pathPrefix("files") {
      post {
        path("uploads") {
          extractMaterializer { implicit materializer =>
            fileUpload("file") {
              case (metadata, byteSource) =>
                val uploadFuture = byteSource.async
                  .via(Compression.gunzip(200000000))
                  .via(Archive.tarReader()).async
                  .runForeach { case (tarMetadata, entrySource) =>
                    // upload each tar entry to S3, keyed by the entry's file path
                    entrySource.runWith(
                      s3AlpakkaService.sink(
                        FileInfo(UUID.randomUUID().toString, tarMetadata.filePath, metadata.getContentType)))
                  }
                onComplete(uploadFuture) {
                  case Success(result) =>
                    log.info("Uploaded file to: " + result)
                    complete(StatusCodes.OK)
                  case Failure(ex) =>
                    log.error(ex, "Error uploading file")
                    complete(StatusCodes.FailedDependency, ex.getMessage)
                }
            }
          }
        }
      }
    }
  }

Hi @vvinod64,

Thanks for that question.

What type is that s3AlpakkaService? Maybe its sink materializes to a result Future that you can check for errors?
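
If it does, the outcome of each entry's upload could be logged instead of being dropped, e.g. like this (rough, untested sketch; entrySource and fileInfo stand in for the values you build inside runForeach):

// Rough sketch: keep the Future that the inner runWith materializes and check it,
// instead of discarding it.
val entryUpload: Future[MultipartUploadResult] =
  entrySource.runWith(s3AlpakkaService.sink(fileInfo))

entryUpload.onComplete {
  case Success(result) => log.info("Uploaded tar entry to " + result.location)
  case Failure(ex)     => log.error(ex, "Failed to upload tar entry")
}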

What is the chunk size?

Johannes

Hi @jrudolph
Thanks for the response.

Here is the code of S3AlpakkaService:

class S3AlpakkaService()(implicit as: ActorSystem, m: Materializer) {
  def sink(fileInfo: FileInfo): Sink[ByteString, Future[MultipartUploadResult]] = {
    val fileName = fileInfo.fileName
    S3.multipartUpload(srcBucketName, fileName)
  }
}

I guess the issue is not in S3AlpakkaService, as I tried logging the file names before calling
s3AlpakkaService.sink and it printed only 1-2 entries. I tried setting the chunk size anywhere from a few KB to 10 MB, but that did not solve the issue. For uploading a .gz file of around 100 MB, I had to set the chunk size to 200 MB.

Interesting. What do you mean by chunk size? The value passed to Compression.gunzip? That shouldn’t actually make a difference, but if it does, it would point to a bug in Archive.tarReader. Are you on the latest version of Alpakka? Can you post your Alpakka version?

Yes, by chunk size I meant the value passed to Compression.gunzip. I’m using Alpakka version 2.0.1.
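
For reference, the Alpakka modules in my build look roughly like this (sbt, versions as above):

// Alpakka 2.0.1 modules used for the tar reading and the S3 upload
libraryDependencies ++= Seq(
  "com.lightbend.akka" %% "akka-stream-alpakka-file" % "2.0.1",
  "com.lightbend.akka" %% "akka-stream-alpakka-s3"   % "2.0.1"
)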

It would be good if you could reproduce this without Akka HTTP by using a FileIO.fromPath source.

So, it could just be

FileIO.fromPath(...)
  .via(Compression.gunzip(200000000))
  .via(Archive.tarReader()).async
  .runForeach(...)
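
A self-contained version of that could look roughly like this (untested sketch; the file path is a placeholder):

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.alpakka.file.scaladsl.Archive
import akka.stream.scaladsl.{Compression, FileIO, Sink}

object TarReadReproducer extends App {
  implicit val system: ActorSystem = ActorSystem("tar-read-reproducer")
  import system.dispatcher

  // Placeholder path: point this at the .gz tar archive that shows the problem.
  val done = FileIO.fromPath(Paths.get("/tmp/sample.tar.gz"))
    .via(Compression.gunzip(200000000))
    .via(Archive.tarReader()).async
    .runForeach { case (metadata, entrySource) =>
      // Print every entry the tar reader emits and drain its bytes,
      // similar to what the S3 sink would do.
      println(s"Tar entry: ${metadata.filePath}")
      entrySource.runWith(Sink.ignore)
    }

  done.onComplete { result =>
    println(s"Stream completed with: $result")
    system.terminate()
  }
}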

If that shows the same problem, could you open an issue at https://github.com/akka/alpakka with the tar file in question and the reproducer code?

Thanks,
Johannes