In my setup, Storm topologies are configured to consume Kafka messages. When the load is around 100 messages per second, I see that the bolts process messages instantaneously and acknowledge Kafka at almost the same rate.
Results for query "challenges while processing Kafka"
I'm trying to implement parallel processing with Spark. I want to create multiple receivers (not just threads) in Spark to receive streaming data from Kafka.
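A minimal sketch of that approach, assuming the old receiver-based spark-streaming-kafka integration and made-up ZooKeeper, group, and topic names; each KafkaUtils.createStream call registers its own receiver, and the resulting streams are unioned before further processing:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("multi-receiver-kafka").setMaster("local[4]")
val ssc  = new StreamingContext(conf, Seconds(5))

// One receiver per createStream call; three receivers pull from Kafka in parallel.
val numReceivers = 3
val topics = Map("events" -> 1)  // hypothetical topic name, 1 consumer thread per receiver
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", topics)
}

// Union the receiver streams into a single DStream for downstream processing.
val unified = ssc.union(streams)
unified.map(_._2).count().print()  // just to show the pipeline runs

ssc.start()
ssc.awaitTermination()

Note that each receiver occupies an executor core, so the application needs more cores than receivers (hence local[4] for three receivers).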
I need to consume messages from different Kafka topics.
Should I create a separate consumer instance per topic and then start one processing thread per partition, or
should I subscribe to all topics from a single consumer instance and then start different processing threads?
Thanks & regards,
This really depends on the logic of your application: does it need to see all messages together in one place, or not?
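For illustration, a rough sketch of the single-consumer option, using the plain Java client from Scala with made-up broker, group, and topic names; one instance subscribes to both topics, and the consumer group balances all partitions of both topics across its members:

import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")   // assumed broker address
props.put("group.id", "multi-topic-group")         // assumed group id
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(List("topicA", "topicB").asJava)  // one instance, several topics

while (true) {
  val records = consumer.poll(Duration.ofMillis(500))
  records.asScala.foreach { r =>
    // Hand the record to whatever per-topic handling the application needs.
    println(s"${r.topic()}[${r.partition()}]@${r.offset()}: ${r.value()}")
  }
}

The consumer-per-topic variant is essentially the same code with each instance subscribing to a single topic and running in its own thread, which keeps the topics isolated at the cost of extra connections and threads.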
Spark streaming job in PySpark has many active jobs, and all of them stay in the processing stage for a while
I have a PySpark streaming job that gets data from Kafka and GCS and writes the data to MySQL. Everything had been running fine for three weeks in production, but all of a sudden we are noticing some weird behaviour: the number of active jobs increases, and all of them are stuck in processing.
I have an application where multiple users can send REST operations to modify the state of shared objects. Not all operations are valid; for example, you cannot Modify an object after it was Deleted.
Kafka messaging uses at-least-once delivery to ensure every message gets processed, and uses a message offset to indicate which message is to be delivered next. When there are multiple consumers, if some deadly message causes a consumer to crash during message processing, will this message be redelivered to other consumers and spread the death?
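A sketch of one defensive pattern (topic, group, and processing logic below are made up): if a consumer crashes mid-processing, its partitions are reassigned and the uncommitted record is indeed redelivered to another member of the group, so catching the failure and letting the offset move past the bad record keeps one poison message from taking down consumer after consumer.

import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

def process(value: String): Unit = {
  // hypothetical business logic that may throw on a malformed ("poison") message
}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")   // assumption
props.put("group.id", "worker-group")              // assumption
props.put("enable.auto.commit", "true")            // offsets advance even past skipped records
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(List("events").asJava)

while (true) {
  consumer.poll(Duration.ofMillis(500)).asScala.foreach { record =>
    try process(record.value())
    catch {
      case e: Exception =>
        // Swallow instead of crashing: a crash here would trigger a rebalance
        // and hand the same record to the next consumer in the group.
        println(s"Skipping bad record at offset ${record.offset()}: ${e.getMessage}")
    }
  }
}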
I am trying to provide only a certain number of new rows to a DB (consumer) using Kafka Connect, for which I've configured the source config file as follows.
This is how the source config looks.
Need suggestions on the best approach to solve this problem:
I am developing a Kafka stream processing application where the source log stream contains two types of messages: employeeInfo and departmentInfo. I do not have any control over this log stream, so I cannot change the schema or how it is written.
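A minimal sketch of one way to split such a mixed stream, assuming a recent kafka-streams-scala DSL and made-up topic names; the contains check is only a placeholder for whatever marker actually distinguishes the two record types, since the real schema is not shown:

import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "log-splitter")        // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // assumption

val builder = new StreamsBuilder()
val source = builder.stream[String, String]("employee-dept-log")      // hypothetical topic

// Route the two record types to separate downstream topics.
val employees   = source.filter((_, v) => v.contains("employeeInfo"))
val departments = source.filter((_, v) => v.contains("departmentInfo"))

employees.to("employee-info")
departments.to("department-info")

new KafkaStreams(builder.build(), props).start()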
We are trying to build real-time streaming using Kafka based on the below url.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/
As part of the evaluation, we tried to push the Kafka stream output to MongoDB using Kafka Connect.
I am trying to process data that is already in a Kafka cluster using a Kafka Streams application (no data is added to Kafka during processing); both are on version 0. I know for sure that there is still data remaining in the cluster to process.
If there is a surge in MS SQL records, Spark processing takes more time than the batch interval, and Spark ends up sending duplicate records to Kafka. As an alternative, I am thinking of using Kafka Connect to read the messages from MS SQL, send the records to a Kafka topic, and maintain the MS SQL CDC in Kafka.
Provided a use case:
A stream processing architecture;
Events go into Kafka and then get processed by a job with a MongoDB sink. The database name is myWebsite,
and the job sinks user records into the users collection.
And while I understand what it is that tools like Hadoop/Cassandra/Kafka etc. do, no one seems to explain how the data gets from these large processing tools to rendering something on a client/webpage.
df = df.select(
What is the best way to post large data for processing within enterprise applications? Can we look at JMS technologies / a Kafka cluster to receive and distribute the data?
getLogger(MyDomain.class).info("I am continuously streaming data from Kafka "
    + "and forwarding it to controller for further processing.");
Kafka stream processing is implemented in our system for transaction processing. The solution is implemented as below:
A Kafka producer publishes events to a Kafka topic, and the stream processor processes the input events and performs the aggregation operation.
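For illustration, a minimal sketch of that shape in a recent kafka-streams-scala DSL, with made-up topic names and a simple per-key count standing in for the real aggregation logic, which is not shown in the question:

import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "txn-aggregator")      // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // assumption

val builder = new StreamsBuilder()

// Consume the input events, group by key, aggregate (here: count), and
// publish the running result to an output topic.
builder.stream[String, String]("transactions")   // hypothetical input topic
  .groupByKey
  .count()
  .toStream
  .to("transaction-counts")                      // hypothetical output topic

new KafkaStreams(builder.build(), props).start()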
We are using Kafka for stream processing in our system. The structure of the input data/message is very complex.
I tested the Kafka CLI and that works,
but for my requirement (Kafka => Spark Streaming) I can see most developers writing a kafkaConsumer.scala (which can use a Spark conf to update / perform the processing logic) and building the jar.
How do I execute this jar?
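Assuming the jar was built as an assembly/uber jar, it is typically launched with spark-submit; the main class name and jar path below are placeholders:

# Hypothetical main class and jar path; adjust to your build output.
spark-submit \
  --class com.example.KafkaConsumer \
  --master "local[*]" \
  target/scala-2.12/kafka-consumer-assembly.jar

On a cluster, --master would point at YARN or a standalone master instead of local[*].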
I want to dive into stream processing with Kafka, and I need some help getting my head around some design principles which are currently not very clear to me. Would you make one topic "price" keyed (and therefore partitioned) by the stock symbol?
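To make the trade-off concrete, a rough sketch of producing to such a topic keyed by the stock symbol (broker address, symbols, and prices are made up); because the default partitioner hashes the record key, all updates for one symbol land in the same partition and therefore stay ordered:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")   // assumption
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)

// Key = stock symbol, value = latest price; same symbol -> same partition.
producer.send(new ProducerRecord[String, String]("price", "AAPL", "187.31"))
producer.send(new ProducerRecord[String, String]("price", "MSFT", "402.15"))

producer.close()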