Issue while uploading compressed file to S3 using Alpakka

I tried to upload the contents of a compressed tar file to S3 using Alpakka, but only 1-2 entries were copied; the rest were skipped.
When I increased the chunk size to a large number (double the file size in bytes) it worked, but I suspect it will fail if the tar file is too big. Is this expected, or have I missed something?
Below is my code:

lazy val fileUploadRoutes: Route =
  (withoutRequestTimeout & withoutSizeLimit) {
    pathPrefix("files") {
      post {
        path("uploads") {
          extractMaterializer { implicit materializer =>
            fileUpload("file") {
              case (metadata, byteSource) =>
                val uploadFuture = byteSource.async
                  .via(Compression.gunzip(200000000))
                  .via(Archive.tarReader()).async
                  .runForeach { case (tarMetadata, entrySource) =>
                    // upload each tar entry to S3, keyed by the entry's file path
                    entrySource.runWith(
                      s3AlpakkaService.sink(
                        FileInfo(UUID.randomUUID().toString, tarMetadata.filePath, metadata.getContentType)))
                  }
                onComplete(uploadFuture) {
                  case Success(result) =>
                    log.info("Uploaded file to: " + result)
                    complete(StatusCodes.OK)
                  case Failure(ex) =>
                    log.error(ex, "Error uploading file")
                    complete(StatusCodes.FailedDependency, ex.getMessage)
                }
            }
          }
        }
      }
    }
  }

Hi @vvinod64,

Thanks for that question.

What type is that s3AlpakkaService? Maybe its sink materializes to a result Future that you can check for errors?
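
If it does, the outcome of each entry's upload could be logged instead of being dropped, e.g. like this (rough, untested sketch; entrySource and fileInfo stand in for the values you build inside runForeach):

// Rough sketch: keep the Future that the inner runWith materializes and check it,
// instead of discarding it.
val entryUpload: Future[MultipartUploadResult] =
  entrySource.runWith(s3AlpakkaService.sink(fileInfo))

entryUpload.onComplete {
  case Success(result) => log.info("Uploaded tar entry to " + result.location)
  case Failure(ex)     => log.error(ex, "Failed to upload tar entry")
}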

What is the chunk size?

Johannes

Hi @jrudolph
Thanks for the response.

Here is the code of S3AlpakkaService:

class S3AlpakkaService()(implicit as: ActorSystem, m: Materializer) {
  def sink(fileInfo: FileInfo): Sink[ByteString, Future[MultipartUploadResult]] = {
    val fileName = fileInfo.fileName
    S3.multipartUpload(srcBucketName, fileName)
  }
}

I guess the issue is not in S3AlpakkaService, as I tried logging the file names before calling
s3AlpakkaService.sink and it printed only 1-2 entries. I tried setting the chunk size anywhere from a few KB to 10 MB, but that did not solve the issue. For uploading a .gz file of around 100 MB, I had to set the chunk size to 200 MB.

Interesting. What do you mean by chunk size? The value passed to Compression.gunzip? That shouldn’t actually make a difference, but if it does, it would point to a bug in Archive.tarReader. Are you on the latest version of Alpakka? Can you post your Alpakka version?

Yes, by chunk size I meant the value passed to Compression.gunzip. I’m using Alpakka version 2.0.1.
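
For reference, the Alpakka modules in my build look roughly like this (sbt, versions as above):

// Alpakka 2.0.1 modules used for the tar reading and the S3 upload
libraryDependencies ++= Seq(
  "com.lightbend.akka" %% "akka-stream-alpakka-file" % "2.0.1",
  "com.lightbend.akka" %% "akka-stream-alpakka-s3"   % "2.0.1"
)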

It would be good if you could reproduce this without Akka HTTP by using a FileIO.fromPath source.

So, it could just be

FileIO.fromPath(...)
  .via(Compression.gunzip(200000000))
  .via(Archive.tarReader()).async
  .runForeach(...)
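
A self-contained version of that could look roughly like this (untested sketch; the file path is a placeholder):

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.alpakka.file.scaladsl.Archive
import akka.stream.scaladsl.{Compression, FileIO, Sink}

object TarReadReproducer extends App {
  implicit val system: ActorSystem = ActorSystem("tar-read-reproducer")
  import system.dispatcher

  // Placeholder path: point this at the .gz tar archive that shows the problem.
  val done = FileIO.fromPath(Paths.get("/tmp/sample.tar.gz"))
    .via(Compression.gunzip(200000000))
    .via(Archive.tarReader()).async
    .runForeach { case (metadata, entrySource) =>
      // Print every entry the tar reader emits and drain its bytes,
      // similar to what the S3 sink would do.
      println(s"Tar entry: ${metadata.filePath}")
      entrySource.runWith(Sink.ignore)
    }

  done.onComplete { result =>
    println(s"Stream completed with: $result")
    system.terminate()
  }
}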

If that shows the same problem, could you open an issue at https://github.com/akka/alpakka with the tar file in question and the reproducer code?

Thanks,
Johannes