Asynchronous Python at Kumparan
1 INTRO
This is a transcript of my lightning talk at PyCon 2018. The alternative, clickbait-y title: “How Kumparan Handles More Than 10 Million Tracking Events on a Daily Basis?”.
2 OVERVIEW
Kumparan is an Indonesia-based news platform. Several systems running on our data platform are built on top of Python asyncio. So, in this 5-minute talk, we would like to share our experiences.
3 ASYNC IN PYTHON
Let’s review some basic concepts of asynchronous programming in Python. There are a bunch of libraries you can use for asynchronous programming in Python. In Python 3.5 or later, you can use the asyncio package, which is available in the Python standard library.
import asyncio
To do asynchronous programming with Python asyncio, you need to understand three basic concepts: Coroutine, Task and Event loop.
The first one is the coroutine: a function with multiple points at which its execution can be suspended and resumed. The second one is the task: a class that we can use to schedule coroutines to run concurrently. The last one is the event loop, which is responsible for scheduling one or more coroutines, suspending and resuming them so that they run concurrently.
4 EVENT LOOP
Let us start with the event loop. The event loop is the core of every asyncio application. You can create a new event loop by calling this function:
# Create new event loop
loop = asyncio.new_event_loop()
or you can get the existing event loop by using this function:
# Get the current event loop
loop = asyncio.get_event_loop()
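Putting these functions together, a minimal sketch of the full event loop lifecycle might look like the following. The coroutine `greet` is hypothetical and exists only to give the loop something to run. On Python 3.7+, `asyncio.run()` performs the same create, run and close steps in a single call and is generally the simpler entry point.

```python
import asyncio


async def greet(name: str) -> str:
    # Hypothetical coroutine used only to drive the loop.
    await asyncio.sleep(0.01)
    return "Hello, {}".format(name)


# Manual lifecycle: create a loop, make it current, run, then close it.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    manual_result = loop.run_until_complete(greet("manual"))
finally:
    asyncio.set_event_loop(None)
    loop.close()

# Python 3.7+: asyncio.run() creates a fresh loop, runs the coroutine
# to completion, and closes the loop afterwards.
auto_result = asyncio.run(greet("asyncio.run"))

print(manual_result)  # Hello, manual
print(auto_result)    # Hello, asyncio.run
```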
5 COROUTINE
A coroutine is defined much like a regular function. You can define a new coroutine using the async def keyword followed by the coroutine name, its parameters and its return type.
async def process(input: str) -> str:
    ...
You may notice that we use type annotations. At Kumparan, we also use mypy as our static type checker.
Inside a coroutine, you should not call a function that blocks the main thread, because the event loop cannot run any other coroutine until that call returns.
import time

async def process(input: str) -> str:
    # Don't do this: time.sleep() blocks the whole event loop
    time.sleep(5)
    return "Processed: {}".format(input)
You can call another coroutine using the await keyword.
async def process(input: str) -> str:
    await asyncio.sleep(5)
    return "Processed: {}".format(input)
A line with the await keyword is a point where the execution of the coroutine can be suspended and later resumed.
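To see the practical difference between the blocking `time.sleep()` version and the `await asyncio.sleep()` version, the sketch below runs two calls of each concurrently with `asyncio.gather()` and measures the wall-clock time. The function names and the 0.2-second delay are illustrative assumptions. The blocking version holds the event loop during each sleep, so the delays add up; the `await` version lets both sleeps overlap. When a blocking call cannot be avoided, `asyncio.to_thread()` (Python 3.9+) runs it in a worker thread so the event loop stays free.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple

DELAY = 0.2  # illustrative delay in seconds


async def blocking_process(input: str) -> str:
    # Anti-pattern: time.sleep() blocks the event loop thread.
    time.sleep(DELAY)
    return "Processed: {}".format(input)


async def async_process(input: str) -> str:
    # await suspends this coroutine so the loop can run others meanwhile.
    await asyncio.sleep(DELAY)
    return "Processed: {}".format(input)


async def offloaded_process(input: str) -> str:
    # Python 3.9+: run the blocking call in a worker thread instead.
    await asyncio.to_thread(time.sleep, DELAY)
    return "Processed: {}".format(input)


async def timed(coro_fn: Callable[[str], Awaitable[str]]) -> float:
    # Run two calls concurrently and measure the wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(coro_fn("a"), coro_fn("b"))
    return time.perf_counter() - start


async def main() -> Tuple[float, float, float]:
    return (
        await timed(blocking_process),
        await timed(async_process),
        await timed(offloaded_process),
    )


blocking_time, async_time, offloaded_time = asyncio.run(main())
print("blocking:  {:.2f}s".format(blocking_time))   # about 2 * DELAY
print("await:     {:.2f}s".format(async_time))      # about DELAY
print("to_thread: {:.2f}s".format(offloaded_time))  # about DELAY
```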
You cannot run a coroutine just by calling it: calling process("test") only creates a coroutine object. You need to create or get an event loop first, then run the coroutine inside that event loop.
loop.run_until_complete(process("test"))
loop.close()
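A small sketch of that point: calling the coroutine function returns a coroutine object without running its body, and the body only executes once an event loop drives it. This `process` mirrors the one above but uses a shorter, illustrative delay.

```python
import asyncio


async def process(input: str) -> str:
    await asyncio.sleep(0.01)  # shorter delay than above, for illustration
    return "Processed: {}".format(input)


# Calling the coroutine function does not run its body yet.
coro = process("test")
is_coroutine = asyncio.iscoroutine(coro)
print(is_coroutine)  # True

# The body only runs when an event loop drives the coroutine object.
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(coro)
finally:
    loop.close()
print(result)  # Processed: test
```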
6 TASK
The last basic concept is the Task. Task is a Python class that wraps a coroutine and schedules it on the event loop, so that several tasks can run concurrently.
def callback(future: asyncio.Future) -> None:
    processed = future.result()
    print("{} is here".format(processed))

async def main() -> None:
    # asyncio.create_task() requires Python 3.7+
    task1 = asyncio.create_task(process("input 1"))
    task2 = asyncio.create_task(process("input 2"))
    task1.add_done_callback(callback)
    task2.add_done_callback(callback)
    # Wait for both tasks instead of sleeping for a fixed time
    await asyncio.gather(task1, task2)

# The loop above was closed, so let asyncio.run() create a fresh one
asyncio.run(main())
The nice thing about a task is that you can attach a callback function, which is executed when the task’s coroutine finishes. This comes in handy when we need to handle errors.
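Because `future.result()` re-raises any exception raised inside the coroutine, an error-handling callback usually checks `future.exception()` first. A minimal sketch, assuming a hypothetical `risky_process` coroutine that fails on empty input; a real service would log or report the error rather than append it to a list.

```python
import asyncio
from typing import List

errors: List[str] = []
results: List[str] = []


async def risky_process(input: str) -> str:
    # Hypothetical coroutine that fails on empty input.
    await asyncio.sleep(0.01)
    if not input:
        raise ValueError("empty input")
    return "Processed: {}".format(input)


def callback(future: asyncio.Future) -> None:
    # Cancelled tasks have neither a result nor an exception.
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        # Handle the error here instead of letting result() re-raise it.
        errors.append(repr(exc))
    else:
        results.append(future.result())


async def main() -> None:
    tasks = [asyncio.create_task(risky_process(s)) for s in ["ok", ""]]
    for task in tasks:
        task.add_done_callback(callback)
    # return_exceptions=True keeps gather() from re-raising the ValueError.
    await asyncio.gather(*tasks, return_exceptions=True)


asyncio.run(main())
print(results)  # ['Processed: ok']
print(errors)   # ["ValueError('empty input')"]
```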
7 ASYNCIO AT KUMPARAN
So how do we use Python asyncio at Kumparan? asyncio is a perfect fit for IO-bound workloads such as high-performance web servers, database connection libraries, distributed task queues, etc.
Here is a list of services that we built on top of Python asyncio:
- Tracker API
- Tracker Transporter
- A/B Test Splitter API
- Trending Stories API
- Personalized Feed API
- And more …
Most of them are API servers.
8 USE CASE
I will show you an example of how we build our services on top of Python asyncio. The use case is a receiver for tracking events.
Our goal is to collect as many tracking events as possible, so we implemented a Fire-and-Forget approach on top of asyncio to reduce the response time.
The implementation is very simple and easy to reason about.
# NOTE: Simplified version
async def track(request: Request) -> Response:
    # ...
    try:
        # Fire
        task = tracker.collect(event)
        # ...
        task.add_done_callback(callback)
        # and Forget (Return the response immediately)
        return api.success()
    except ValueError as e:
        return api.error(status_code=400, error=str(e))
    except Exception as e:
        error = "/v1/track failed"
        return api.error(error=error, exception=e)
First, we define a coroutine called track, which is executed on every HTTP request to the tracker endpoint. Inside this coroutine, we schedule another coroutine, wrapped in an asyncio.Task, to run concurrently. Then we attach a callback function to handle errors, and that’s it. Easy, right?
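A self-contained sketch of the same Fire-and-Forget pattern, outside any web framework, might look like this. `collect`, `handle_request`, `on_done` and `background_tasks` are hypothetical stand-ins for `tracker.collect`, the `track` handler and its callback, whose real implementations are not shown. One detail worth noting: the asyncio documentation warns that the event loop keeps only weak references to tasks, so a fire-and-forget task should be stored somewhere (here, a set) until it finishes, or it may be garbage-collected mid-execution.

```python
import asyncio
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the real event sink.
collected: List[Dict[str, str]] = []
# Strong references to in-flight tasks so they are not garbage-collected.
background_tasks: Set[asyncio.Task] = set()


async def collect(event: Dict[str, str]) -> None:
    # Hypothetical slow I/O, e.g. publishing the event to a queue.
    await asyncio.sleep(0.05)
    collected.append(event)


def on_done(task: asyncio.Task) -> None:
    # Drop the strong reference and surface any error.
    background_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        print("collect failed: {!r}".format(task.exception()))


async def handle_request(event: Dict[str, str]) -> str:
    if "name" not in event:
        return "400 Bad Request"
    # Fire: schedule the work without awaiting it.
    task = asyncio.create_task(collect(event))
    background_tasks.add(task)
    task.add_done_callback(on_done)
    # ...and Forget: respond immediately.
    return "200 OK"


async def main() -> Tuple[str, int]:
    status = await handle_request({"name": "page_view"})
    # The response is ready before the event has been collected.
    collected_at_response = len(collected)
    # Demo only: wait for background work before the loop shuts down.
    await asyncio.gather(*background_tasks)
    return status, collected_at_response


status, collected_at_response = asyncio.run(main())
print(status)                 # 200 OK
print(collected_at_response)  # 0
print(len(collected))         # 1
```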
With this approach, we are able to keep response times below 50 ms.
As you can see, most responses take less than 5 ms, and we are able to collect more than 10,000,000 tracking events on a daily basis.
As you may notice, on October 29 we collected more than 72 million tracking events. This happened because of the breaking news about the Lion Air crash, and with our Fire-and-Forget approach we handled this traffic spike without a problem.
9 LESSONS LEARNED
- Async libraries from the community are not mature yet; sometimes you need to implement them yourself.
- It’s easy to make the mistake of calling a blocking function, and there is little tooling to help developers spot this mistake.
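One partial safeguard for the second lesson is asyncio’s built-in debug mode: with `debug=True`, the event loop logs a warning on the `asyncio` logger whenever a single step takes longer than `loop.slow_callback_duration` (0.1 seconds by default). It only reports slow steps after they happen and does not point at the exact blocking call, so treat it as a hint rather than a complete tool. A minimal sketch, where `ListHandler` and `blocking_process` are hypothetical names used to capture and trigger the warning:

```python
import asyncio
import logging
import time
from typing import List

captured: List[str] = []


class ListHandler(logging.Handler):
    # Hypothetical handler that stores log messages for inspection.
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record.getMessage())


logging.getLogger("asyncio").addHandler(ListHandler())


async def blocking_process() -> None:
    # Blocks the loop for 0.2 s, above the 0.1 s default threshold.
    time.sleep(0.2)


# debug=True turns on asyncio debug mode for this run.
asyncio.run(blocking_process(), debug=True)

# Debug mode reports slow steps as "Executing <...> took N seconds".
slow_warnings = [msg for msg in captured if "took" in msg]
print(slow_warnings)
```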
And thanks, everyone!