
Conversation

@divyanshutomar

Article on running background tasks in Python using task queues

@divyanshutomar divyanshutomar changed the title from "added post content and images" to "background tasks in Python using task queues" on Aug 1, 2018
Contributor
@sidhanthp sidhanthp left a comment

I really enjoyed the style of the post and think you did a great job with a starter application, but I don't think the reader is left with a strong understanding of Redis Queues, how to use them, and what's happening internally.

I left you a couple of comments; take a look at them and let me know what you think.

---
# Running background tasks in Python using task queues

We often come across problems in our applications where a compute-intensive time-taking task needs to be performed on the server in response to some user activity or request. For a server-side application exposing a REST API, handling this problem is different from common CRUD endpoints where request-response lifecycle is usually short. In this case, the response for such a request may not be available, as it may be not viable to proceed with its execution immediately on the same process.
Contributor

Rather than starting with "We often come across problems...", it would be more powerful to give an example where this occurs.

Contributor

Additionally, I think this example is a little convoluted for the 1st paragraph.

For a server-side application exposing a REST API, handling this problem is different from common CRUD endpoints where request-response lifecycle is usually short. In this case, the response for such a request may not be available, as it may be not viable to proceed with its execution immediately on the same process.
-->
When dealing with server-side applications, the response from a REST API may not be available immediately. It would be more efficient to run background tasks than waste the idle CPU cycles.

Author

It makes sense, I will rework the introduction.



The execution of such tasks or jobs can be performed in the background by some another process spawned for the sole purpose. These processes are usually called _workers_. They run concurrently with the main process (web server in our case) handling client requests. To list of all the tasks which need to be executed, a job/task queue is maintained to store tasks along with its metadata created by incoming requests on the web server. The worker process then executes these tasks chronologically. This modular approach makes it easier for the web server to accommodate execution of such long-running tasks as it will not get blocked itself in doing so. This also means that the web server can respond to forthcoming client requests.
Contributor

I assume you meant to write "by another process".

Contributor

"To list all the tasks"

Author

Thanks for pointing out the grammatical mistakes. 👍
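
The producer/worker split described above can be sketched with a plain Redis list before any framework enters the picture. This is an illustrative sketch only, not the RQ-based approach the article uses; the queue name and task payload are invented for the example:

```python
import json
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

# Web server side: push a task description onto a list (the "queue").
def enqueue_task(task_name, payload):
    r.rpush('tasks', json.dumps({'name': task_name, 'payload': payload}))

# Worker side: block until a task arrives, then execute it.
def worker_loop():
    while True:
        _, raw = r.blpop('tasks')  # FIFO, so tasks run chronologically
        task = json.loads(raw)
        print('executing', task['name'], 'with', task['payload'])
```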


![architecture of web server and queue](./images/running-background-tasks-python/small-archi.png)

Task queues are quite popular among microservices architecture. They enable each microservice to perform its definite task really well and take care of the complexities of inter-microservice communication.
Contributor

I'm not sure what you mean when you say 'definite task'. If I were to guess, you're referring to the idea that each microservice should perform a small task.

Additionally, if you make a claim that a task queue takes 'care of the complexities of inter-microservice communication', I would explain it a little further (unless I'm missing something).

Author

I think it's not about a small or big task. I feel 'dedicated task' will make more sense here?

I will add more details here explaining the role of messaging queues in microservices architecture.

![health check](./images/running-background-tasks-python/health-check.png)


### Getting to Know the Starter Application
Contributor

I would use this to make the tool names stand out, rather than this

Author

Noted


### Writing the parser

Let's start with writing a simple parser that accepts a Goodreads book page URL. We will be using the requests Python library for making an HTTP request to get the HTML content of the page. BeautifulSoup is a Python library that lets us search, manipulate, and create structured markup languages such as HTML, XML, etc. It will create a searchable tree from the fetched page's HTML. This will allow us to retrieve key information like the book title, author, rating, and description.
Contributor

I would leave out "It will create a searchable tree from the fetched page's HTML."
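
As a rough sketch of the parser this section describes (the element selectors here are assumptions; the real Goodreads markup would need to be inspected):

```python
import requests
from bs4 import BeautifulSoup

def parse_book_link_for_meta_data(bookUrl):
    html = requests.get(bookUrl).text
    soup = BeautifulSoup(html, 'html.parser')  # searchable tree of the page
    # 'bookTitle' and 'authorName' are placeholder selectors.
    title = soup.find(id='bookTitle')
    author = soup.find(class_='authorName')
    return dict(
        title=title.get_text(strip=True) if title else '',
        author=author.get_text(strip=True) if author else '',
    )
```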


### Inspecting task queue

The creators of the Redis Queue (RQ) library have developed another library for checking the state of a Redis queue. It is called *rq-dashboard* and it can be integrated with our Flask web application: it exposes a Flask blueprint for integrating with an existing Flask project. It's a browser-based application which shows the status of queues, the workers listening on those queues, and the queued jobs along with their meta information. It also provides triggers for flushing the queue and re-queuing failed jobs.
Contributor

I think you can make this a little more concise and give a brief description of the benefits.

Contributor

It looks like you've written about it a little below, I think it might be better to move that up.
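
For reference, embedding *rq-dashboard* in a Flask app follows roughly this pattern (the `/rq` URL prefix is an arbitrary choice):

```python
import rq_dashboard
from flask import Flask

app = Flask(__name__)
app.config.from_object(rq_dashboard.default_settings)
app.register_blueprint(rq_dashboard.blueprint, url_prefix='/rq')
```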


Now, we are all set to begin testing our application with some Goodreads URLs. Let's start by making a POST request to the `/parseGoodReads` endpoint. Make sure to provide a valid list of URLs in an array as the request body.
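
Any HTTP client will do; for example, with the requests library (assuming the Flask dev server on its default port, and an illustrative book URL):

```python
import requests

resp = requests.post(
    'http://localhost:5000/parseGoodReads',
    json=['https://www.goodreads.com/book/show/4671.The_Great_Gatsby'],
)
print(resp.status_code, resp.json())
```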

![post request](./images/running-background-tasks-python/post-req.png)
Contributor

What are you using to make the post request?

## Conclusion and Takeaways

The above application demonstrates how queuing frameworks like Redis Queue can be leveraged for solving problems that need more time and computing resources. In our application, if the parsing task is executed on the same process where client requests are being processed and served, it can easily become a performance bottleneck with high traffic. The above approach not only helps in avoiding that but also brings modularity to the table. Following are some key takeaways you can apply to tackle similar problems:
* Queueing frameworks allows more granular control over scaling of different processes. More worker processes can be spawned if there is an accumulation of a large number of tasks in the queue.
Contributor

"frameworks allow more"


* Multiple queues can be used for handling different types of tasks.
Contributor

Did you talk about this earlier in the post?

* Every task can send some meta information about its status or progress so far to Redis. This information can be useful for getting an insight into a task that runs for a long duration.

Thanks for following along and I hope this post would have been useful for you.
Contributor

I don't think this line adds anything.

@sidhanthp sidhanthp self-assigned this Aug 1, 2018
@divyanshutomar
Author

@sidhanthp I have made the required changes. Kindly review and let me know if anything else needs to be worked upon.

Contributor

@sidhanthp sidhanthp left a comment


Great job with the post, I've left you with a few comments that I think make the article a little easier to read and digest.

I found it difficult to follow the longer paragraphs. Some of the paragraphs are big blocks of text, which I think causes your eyes to glaze over the text. If you break up the paragraphs, or make the takeaways obvious using italics, I think it would make the article a little easier to follow.

---
# Running background tasks in Python using task queues

Let's say we have an e-commerce website where users can place orders for various products. Now there's a business requirement of finding out the kind of orders being placed and the most in-demand product in real-time. Also, for every order placed, the buyer should receive a confirmation of the order via an email or a notification through a messaging service. In this case, if we process the order information on the same REST API service handling product requests, it may lead to significant problems. The API service may not be able to respond in a short time as it can be blocked on external services like the messaging service. This synchronous model worsens with a high number of orders being placed in a short span of time as the service may be preoccupied processing previous requests. Thus, compute-intensive, time-taking tasks like processing of orders on an e-commerce platform require an asynchronous approach.
Contributor

I think this is too wordy.

Before writing, I would think through what you want the reader to take away from the paragraph. As the first paragraph, you introduce the reader to the e-commerce website, but they've clicked on the article to learn about 'background tasks in Python' and I think it would be more effective starting with that. You only touch about the importance of background tasks at the end of the paragraph.

Imagine I'm a reader who doesn't know what background tasks are / why I would want them, how would you introduce me with as little background?


Task or message queues are quite popular in microservices architectures. They enable each microservice to perform its dedicated task and work as a medium for inter-microservice communication. These queues store messages or data incoming from _producer_ microservices, which can be processed or consumed by _consumer_ microservices. In the e-commerce example above, the REST API handling orders is a producer microservice which pushes these orders to the queue, whereas a data analysis microservice determining the kind of orders being placed, or the messaging service, can be considered a consumer microservice.

## Queueing frameworks to the rescue
Contributor

I think this section would be more effective at the bottom. First, teach the reader why they need background tasks, about RQ, then point them to the other solutions.



The execution of such tasks or jobs can be performed in the background by another process spawned for the sole purpose. These processes are usually called _workers_. They run concurrently with the main process (web server in our case) handling client requests. To list all the tasks which need to be executed, a job/task queue is maintained to store tasks along with their metadata created by incoming requests on the web server. The worker process then executes these tasks chronologically. This modular approach makes it easier for the web server to accommodate execution of such long-running tasks as it will not get blocked itself in doing so. This also means that the web server can respond to forthcoming client requests.
Contributor

Try to make this more concise. Here's an example -->

Workers can be used to execute these tasks in the background. They run concurrently in the background using a queue, and the worker executes the tasks chronologically.

This modular approach prevents the web server from being blocked from responding to incoming client requests.


## Real World Application

We will be writing a Flask-based web application which retrieves _Goodreads_ book information like title, author, rating and description. The web server exposes an endpoint that accepts book URLs. A function will crawl and parse this URL for meta information of the book. As this function will take time to execute and may lead to blocking of the main thread, we will execute it asynchronously by pushing it to Redis queue (RQ). RQ allows us to enqueue multiple function calls to a queue which can be executed in parallel by a separate worker process. It requires a Redis server as a message broker for performing this operation. Let's get into the code and learn how we can use Redis queue in our web applications.
Contributor

Break the paragraph here. I think it makes it easier to follow.

"""
... pushing it to Redis queue (RQ).

RQ allows us to enqueue ...
"""

```python
    return dict(
        title=title.strip() if title else '',
        author=author.strip() if author else '',
        rating=float(rating.strip() if rating else 0),
        description=description,
    )
```

We can now write a function called `parse_and_persist_book_info` that calls the above parsing function and persists the value to Redis so that it can be retrieved later. This function along with its arguments will be pushed to the queue so that the worker process can execute it. Redis is a key-value store where the key should be unique, else it may lead to overwriting of a previous value. Here `generate_redis_key_for_book` is a function that generates a unique key for a given book URL.
Contributor

Introduce a paragraph break here -->

"""
... process can execute it.

Redis is a key-value ...
"""

```python
#........

# This generates a unique Redis key against a book URL
generate_redis_key_for_book = lambda bookURL: 'GOODREADS_BOOKS_INFO:' + bookURL
```
Contributor

I would explain in a little more detail what you're doing here if this is a major piece of the post.
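
To spell out what the snippet does: the lambda namespaces each book URL under a common prefix, so every book gets a unique, recognizable key in Redis. For instance (the book id is illustrative):

```python
generate_redis_key_for_book = lambda bookURL: 'GOODREADS_BOOKS_INFO:' + bookURL

print(generate_redis_key_for_book('https://www.goodreads.com/book/show/4671'))
# GOODREADS_BOOKS_INFO:https://www.goodreads.com/book/show/4671
```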

```python
redisKey = generate_redis_key_for_book(bookUrl)  # get Redis key for given book URL
bookInfo = parse_book_link_for_meta_data(bookUrl)  # get book meta information from the parsing function above
# Set the value in Redis. Here pickle serializes the dictionary
redisClient.set(redisKey, pickle.dumps(bookInfo))
```
Contributor

I would explain redisClient.set(redisKey,pickle.dumps(bookInfo)) further. I could be missing it, but I don't see where you initialized redisClient.
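
For completeness: `redisClient` would have to be created before the snippet above runs. A typical initialization might look like this (host and port are the redis-py defaults):

```python
import redis

redisClient = redis.Redis(host='localhost', port=6379)
```

Reading the value back later would reverse the serialization with `pickle.loads(redisClient.get(redisKey))`.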


### The endpoint for accepting URLs

Let's set up an endpoint that will accept a list of valid Goodreads book URLs. This is going to support the POST method with URLs accepted as an array in _application/json_ body format. For validating the Goodreads book URLs, we check for unique occurrences of URLs which start with the string `https://www.goodreads.com/book/show/`. After the validation check, all valid URLs are pushed to Redis queue for parsing information. Here the method `enqueue_call` of Redis queue instance takes in a function that will be executed by worker process along with required arguments of the function.
Contributor

I would take out the first couple sentences, and make the paragraph this -->
After the validation check, all valid URLs are pushed to Redis queue for parsing information. Here the method enqueue_call of Redis queue instance takes in a function that will be executed by worker process along with required arguments of the function.

I don't think the first couple sentences add to the explanation of Redis Queues.
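
For context, a sketch of how such an endpoint could be wired up (the queue setup and response shape are assumptions; `parse_and_persist_book_info` is the function defined earlier in the post):

```python
from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue

from tasks import parse_and_persist_book_info  # assumed module layout

app = Flask(__name__)
queue = Queue(connection=Redis())

GOODREADS_BOOK_PREFIX = 'https://www.goodreads.com/book/show/'

@app.route('/parseGoodReads', methods=['POST'])
def parse_goodreads():
    urls = request.get_json()
    valid_urls = {u for u in urls if u.startswith(GOODREADS_BOOK_PREFIX)}
    for url in valid_urls:
        # The worker process, not this request handler, runs the function.
        queue.enqueue_call(func=parse_and_persist_book_info, args=(url,))
    return jsonify(queued=len(valid_urls))
```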

@divyanshutomar
Author

@sidhanthp Updated

You moved the wrong paragraph to the bottom, I fixed it.
Updated structure to 'move up' information dedicated to Redis Queues and removed information that was specific to the starter application.
@divyanshutomar
Author

@sidhanthp Great job with the final touches. I guess we decided to have an example of a background job in the introductory para; that's why I kept the e-commerce example. Nevertheless, if you want to keep it concise, the current state looks good to me.
