Akka Stream Kafka logging configuration to investigate hanging consumers

Hi,

I am wondering if someone can give me some guidance on how to configure logging for Akka Stream Kafka properly and efficiently, so that I can investigate my use case:

The problem I am trying to solve is reported on GitHub as a bug ("Kafka consumer hangs in consumer group issue ?", Issue #899, akka/alpakka-kafka), but it might be that I am doing something wrong, hence this investigation.

In short, whenever I consume from a set of topics with multiple consumers belonging to the same consumer group (all consumers of the group subscribe to the same set of topics), only half of the consumers actually consume; the others simply hang.

The details of the logs are in the ticket.

So I would like to understand why those consumers are not getting any data. What is the exact problem, and why do they keep logging FETCH_SESSION_ID_NOT_FOUND?

I would like to log enough detail to understand that, as opposed to so much detail that it becomes difficult to spot where the problem is coming from.

Weirdly, at this point turning on DEBUG for the akka logger does not make any difference. It is only when I turn it on for org.apache that I get debug output, but then there is far too much of it.

Here is the outline of my logging configuration:

In my application.conf I have

akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = ${?AKKA_LOG_LEVEL}
  loglevel = "INFO"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}
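
One thing I am unsure about in the block above is the order of the two loglevel lines. In HOCON a later assignment to the same key overrides an earlier one, and `${?VAR}` only leaves the previous value in place when the variable is unset, so as written the literal "INFO" would win regardless of AKKA_LOG_LEVEL. The usual idiom puts the default first and the optional override second. A sketch of that ordering, not something I have verified against my deployment:

```hocon
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  # Default first ...
  loglevel = "INFO"
  # ... then the optional override, applied only when AKKA_LOG_LEVEL is set.
  loglevel = ${?AKKA_LOG_LEVEL}
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}
```

As far as I understand, akka.loglevel still caps what Akka publishes even with the Slf4jLoggingFilter, so it would need to be at DEBUG in addition to the akka logger in Logback.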

In my logback.xml I have

<configuration>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <logger name="org.apache" level="${APACHE_LOG_LEVEL:-INFO}"/>
    <logger name="com.elsevier.entellect" level="${APP_LOG_LEVEL:-INFO}"/>
    <logger name="akka" level="${AKKA_LOG_LEVEL:-INFO}"/>

    <root level="${ROOT_LOG_LEVEL:-INFO}">
        <appender-ref ref="STDOUT" />
    </root>

</configuration>

I deploy via Kubernetes and inject the environment variables as necessary.

Setting AKKA_LOG_LEVEL to DEBUG makes literally no difference at all.

However, setting APACHE_LOG_LEVEL to OFF, INFO or DEBUG does make a difference. At INFO I only get the basic Kafka messages shown in the GitHub post; at DEBUG I get far too much output.

Hence I wonder if someone can help me here. More specifically, which loggers do I need to set, and at which level, to at least capture what is happening with the consumers that hang? Are they making requests and not getting anything back, or is there a rebalancing issue?
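
To make the question concrete, this is the kind of narrower setup I have in mind: keep org.apache at INFO and raise only a few specific loggers. The names below are the class names of the consumer group coordinator and fetch session handler in kafka-clients and of the consumer actor in Alpakka Kafka; whether these are the right ones for this problem is my assumption, and I have not confirmed how much output they produce at DEBUG.

```xml
<!-- Sketch only: goes inside <configuration>, next to the existing <logger> entries. -->
<!-- Group membership, rebalances and heartbeats -->
<logger name="org.apache.kafka.clients.consumer.internals.ConsumerCoordinator" level="DEBUG"/>
<logger name="org.apache.kafka.clients.consumer.internals.AbstractCoordinator" level="DEBUG"/>
<!-- Fetch sessions, where FETCH_SESSION_ID_NOT_FOUND is reported -->
<logger name="org.apache.kafka.clients.FetchSessionHandler" level="DEBUG"/>
<!-- Alpakka Kafka consumer actor (polling and assignment handling) -->
<logger name="akka.kafka.internal.KafkaConsumerActor" level="DEBUG"/>
```

Since Logback resolves levels hierarchically, these more specific entries would take precedence over the broader org.apache and akka loggers.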

Note the configuration of my consumers:

import akka.kafka.ConsumerSettings
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer

// `system` is the ActorSystem and `conf` is our application configuration object.
val consumerSettings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers(conf.kafkaBroker.bootstrapServers)
      .withGroupId(conf.kafkaConsumer.groupId)
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, conf.kafkaConsumer.offsetReset)
      .withProperty(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "1800000") // 30 minutes
      .withProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "300000") // 5 minutes
      .withProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "60000") // 1 minute
      .withProperty(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1000000")
      .withProperty(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "10000") // 10 seconds

These settings are specific to the workload of those consumers, which have to perform very long operations.

Many thanks for any help