Django is a fantastic framework, not least because it includes everything needed to quickly build web apps. But developers should not be the only ones benefiting from this: the app should also be fast for its users.
The official documentation has a chapter on performance and optimization with good advice. In this article I want to build on that and show tools and methods I've used in the past to reduce page load times.
Measure & collect data
Performance benchmarking and profiling are essential to any optimization work. Blindly applying optimizations could add complexity to the code base and maybe even make things worse.
We need performance data to know which parts to focus on and to validate that any changes have the desired effect.
django-debug-toolbar
The django-debug-toolbar is easy to use and has a nice interface. It shows how much time is spent on each SQL query, offers a quick button to get the EXPLAIN output for that query, and surfaces a few other interesting details.
The template-profiler is an extra panel that adds profiling data about the template rendering process.
There are, however, a few drawbacks to the django-debug-toolbar. Because of how it integrates into the site, it only makes sense to use in a development environment where DEBUG = True. It also comes with a huge performance penalty of its own.
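Setting it up for local development is quick; a minimal, development-only configuration (following the standard django-debug-toolbar installation steps, plus the debug_toolbar.urls include in the URLconf) looks roughly like this:
# settings.py -- development only
if DEBUG:
    INSTALLED_APPS += ['debug_toolbar']
    MIDDLEWARE.insert(0, 'debug_toolbar.middleware.DebugToolbarMiddleware')
    INTERNAL_IPS = ['127.0.0.1']  # the toolbar is only shown to these IPs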
DTrace
DTrace doesn't have these limitations. It can be used on production services and provides details far beyond just the Python part of the project. You can look deep into the database, the Python interpreter, the web server and the operating system to get a complete picture of where the time is spent.
Instead of a pretty browser UI this happens on the CLI. DTrace scripts are written in an AWK-like syntax. There is also a collection of useful scripts in the dtracetools package. When using the Joyent pkgsrc repos it can be installed with
pkgin install dtracetools
One of the useful scripts in this package is dtrace-mysql_query_monitor.d, which will show all MySQL queries:
Who Database Query QC Time(ms)
kumquat@localhost kumquat set autocommit=0 N 0
kumquat@localhost kumquat set autocommit=1 N 0
kumquat@localhost kumquat SELECT `django_session`.`session_key`, `django_session`.`session_data`, `django_session`.`expire_date` FROM `django_session` WHERE (`django_session`.`session_key` = 'w4ty3oznpqesvoieh64me1pvdfwjhr2k' AND `django_session`.`expire_date` > '2019-11-18 13:04: N 0
kumquat@localhost kumquat SELECT `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_u Y 0
kumquat@localhost kumquat SELECT `cron_cronjob`.`id`, `cron_cronjob`.`when`, `cron_cronjob`.`command` FROM `cron_cronjob` Y 0
...
To do something similar for PostgreSQL:
dtrace -n '
#pragma D option quiet
#pragma D option switchrate=10hz
#pragma D option strsize=2048
dtrace:::BEGIN {
printf("%-9s %-80s\n", "Time(ms)", "Query");
}
postgres*::query-start {
start = timestamp;
}
postgres*::query-done {
printf("%-9d %-80s\n\n", ((timestamp - start) / 1000 / 1000), copyinstr(arg0));
}
'
The output will look like this:
Time(ms) Query
7 SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."username" = 'wiedi'
...
To look into the Python process itself there are a few very useful dtrace-py_* scripts in the dtracetools package. For example, dtrace-py_cputime.d shows the number of calls to each function as well as its inclusive and exclusive CPU time:
Count,
FILE TYPE NAME COUNT
...
base.py func render 1431
sre_parse.py func get 1607
base.py func render_annotated 1621
functional.py func <genexpr> 1768
base.py func resolve 1888
sre_parse.py func __getitem__ 2011
sre_parse.py func __next 2104
related.py func <genexpr> 2324
__init__.py func <genexpr> 3974
regex_helper.py func next_char 9033
- total - 113741
Exclusive function on-CPU times (us),
FILE TYPE NAME TOTAL
...
base.py func _resolve_lookup 22070
base.py func resolve 22810
base.py func render 22997
related.py func foreign_related_fields 23543
functional.py func wrapper 25928
defaulttags.py func render 26218
base.py func __init__ 33303
sre_parse.py func _parse 42869
regex_helper.py func next_char 44579
regex_helper.py func normalize 71313
- total - 1809937
Inclusive function on-CPU times (us),
FILE TYPE NAME TOTAL
...
wsgi.py func __call__ 1790427
sync.py func handle_request 1804334
sync.py func handle 1806034
sync.py func accept 1806870
loader_tags.py func render 2452085
base.py func _render 2886611
base.py func render_annotated 4563513
base.py func render 6018042
deprecation.py func __call__ 12147994
exception.py func inner 13873367
In this case we can see a fair amount of time being spent on regular expressions, probably related to URL routing.
cProfile
The Python standard library comes with cProfile, which collects precise timings of function calls. Together with the Django test client this can be used to automate performance testing.
Automating the performance data collection step as much as possible allows for quick iterations. For a recent project I created a dedicated manage.py command to profile the most important URLs. It looked similar to this:
from django.core.management.base import BaseCommand
from django.test import Client
from django.contrib.auth.models import User
import io
import pstats
import cProfile

def profile_url(url):
    # log in as an existing user so views behind auth can be profiled
    c = Client()
    c.force_login(User.objects.first())
    pr = cProfile.Profile()
    pr.enable()
    r = c.get(url, follow=True)
    pr.disable()
    assert r.status_code == 200
    # print the 35 most expensive calls, sorted by cumulative time
    s = io.StringIO()
    pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(35)
    print(s.getvalue())

class Command(BaseCommand):
    help = 'run profiling functions'

    def handle(self, *args, **options):
        profile_url("/")
        profile_url("/contacts/")
        profile_url("/events/")
        profile_url("/search/?q=info&type=all")
Instead of just printing the statistics they can also be saved to disk with pr.dump_stats(fn). This allows further processing with flameprof to create flame graphs.
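For example, assuming the stats were dumped to a file called profile.prof, flameprof (installable with pip) can render an SVG along these lines:
pip install flameprof
flameprof profile.prof > profile.svg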
%timeit
Another handy utility from the standard library is timeit. You'll often find examples like this:
$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 30.2 usec per loop
This is useful when experimenting with small statements.
To take this one step further I recommend installing IPython, which will transform the Django manage.py shell into a very powerful development environment. Besides tab completion and a thousand other features you'll get the %timeit magic:
In [1]: from app.models import *
In [2]: e = Events.objects.first()
In [3]: %timeit e.some_model_method()
703 ns ± 7.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Optimize
Once you know which parts of your project are the slowest, you can start improving them. Usually the most time is spent on database queries, followed by template rendering. Although every project may need different optimizations, there are some common patterns.
Prefetch related objects
When you display a list of objects in your template and access a field of a related object, each access triggers an additional database query. This N+1 pattern can easily add up and result in a huge number of queries for a single request.
When you know which related fields you will need, you can tell Django to fetch them in a more efficient way. The two important methods are select_related() and prefetch_related(). While select_related() works by using a SQL JOIN, prefetch_related() issues one extra query per lookup and joins the results in Python. Both are easy to use, require nearly no modifications to existing code and can result in huge improvements.
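As a minimal sketch of the difference, assuming a hypothetical Event model with a venue foreign key and an attendees many-to-many field:
# N+1 pattern: one query for the event list, plus one more per
# event as soon as the template accesses event.venue.name
events = Event.objects.all()

# select_related(): a single query using a SQL JOIN;
# works for ForeignKey and OneToOneField relations
events = Event.objects.select_related('venue')

# prefetch_related(): one additional query in total for the lookup,
# joined in Python; use it for ManyToManyField and reverse relations
events = Event.objects.prefetch_related('attendees')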
Indexes
Another easy-to-apply performance tweak is to make sure you have the right database indexes. Whenever you use a field in filter(), and in some cases in order_by(), you should consider whether it needs an index. Creating one is as easy as adding db_index=True to the model field, then creating and running the resulting migration. Be sure to validate the improvement with SQL EXPLAIN.
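As a minimal sketch, assuming a hypothetical CronJob model that is frequently filtered and ordered by its when field:
from django.db import models

class CronJob(models.Model):
    # db_index=True makes the migration create an index, which speeds
    # up filter(when__lte=...) and order_by('when') on large tables
    when = models.DateTimeField(db_index=True)
    command = models.CharField(max_length=255)
Running manage.py makemigrations afterwards generates the migration that adds the index.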
Cache
Caching is a huge topic and there are many ways to improve Django performance with it. Depending on the environment and performance characteristics, the place, duration and layer at which a cache is used will differ. The Django cache framework is an easy way to leverage Memcached at various layers, and the @cached_property decorator is often helpful for fat model methods.
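A minimal sketch of both, assuming a hypothetical Event model with a related attendees set and a hypothetical expensive compute_stats() function:
from django.core.cache import cache
from django.utils.functional import cached_property

class Event(models.Model):
    @cached_property
    def attendee_count(self):
        # evaluated once per instance, then stored as a plain attribute
        return self.attendees.count()

# low-level cache API: recompute at most once every five minutes
stats = cache.get('expensive-stats')
if stats is None:
    stats = compute_stats()  # hypothetical expensive calculation
    cache.set('expensive-stats', stats, timeout=300)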
Precalculate
Some calculations simply take too long for the usual time budget of an HTTP request.
In these cases I've found it useful to precalculate the needed data in a background process.
This can be done with a task queue like Celery or, with a little less complexity, with a manage.py command that either runs as a long-lived service or is called as a cronjob, as sketched below.
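A minimal sketch of the cronjob variant, reusing the hypothetical compute_stats() helper from above to fill the cache that request handlers read from:
from django.core.cache import cache
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'precalculate expensive data in the background'

    def handle(self, *args, **options):
        # invoked periodically from cron; web requests then read the
        # cached value instead of computing it within their time budget
        cache.set('expensive-stats', compute_stats(), timeout=None)
With timeout=None the value never expires and is simply overwritten on the next run.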
Beyond Django
Beyond these common cases there are many further ways to optimize web projects. Changing the database schema by denormalizing might improve some queries. Other techniques will depend heavily on the circumstances of the project.
There are usually also plenty of opportunities to optimize up the stack as well as below it. Measure performance data in the browser and the time spent inside Django will turn out to be only a small part of the picture. With that new data you can start working on DOM rendering, CSS and JS, reducing request sizes for images, or better network routing.
Looking at lower levels can have huge benefits as well: even small improvements there can yield noticeable gains simply because those parts run so often.
A recent example was a gain of ~120ms per request just from changing how the Python interpreter was compiled. The CPython build in question had the retpoline mitigation enabled, but it was running an isolated internal service whose threat model did not require it. Simply compiling without -mindirect-branch=thunk-inline -mfunction-return=thunk-inline -mindirect-branch-register resulted in a large performance boost.
If you have a web project in need of some performance optimization feel free to reach out!