Cutelyst 0.13.0 released!

September 1, 2016 by Daniel Nicoletti

Cutelyst, the Qt web framework, just got a new release, 0.13.0.

A new release was needed now that we have this nice new logo.

Special thanks to Alessandro Longo (Alex L.) for crafting this cute logo, and a cool favicon for the Cutelyst web site.

But this release ain't only about the logo, it's full of cool things:

When I started Cutelyst, a simple developer Engine (read: HTTP engine) was created. It was very slow and mostly an ugly hack, but it helped me work on the APIs that matter. I then took a look at uWSGI because a friend said it was awesome, and it was great to be able to deal with many protocols without the hassle of writing parsers for them.

Fast forward to the 0.12.0 release: I started to feel that I was reaching a limit on Cutelyst optimizations and that uWSGI was holding us back. It wasn't only about performance; memory usage (scalability) was too high for something that should be rather small, it's written in C after all.

It also has a fixed number of requests it can take: if you start it with 5 threads or processes, that's 5 blocking clients that can be processed at the same time. If you use the async option you then have a fixed number of clients per process, so 5 processes * 5 async clients = 25 clients at the same time, but these 5 async clients are always pre-allocated, which means that each new process will also be bigger right from launch.

Now think about websockets: how can one deal with 5000 simultaneous clients? 50 processes with async = 100? Performance in async mode was also slower due to the complexity of dealing with them.
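To make the arithmetic above concrete, here is a minimal sketch of the fixed-slot model. The helper names maxClients and processesNeeded are made up for illustration, they are not part of uWSGI or Cutelyst, and the model only assumes that every process pre-allocates the same number of async slots.

```cpp
#include <cstdint>

// Hypothetical helper (not a uWSGI or Cutelyst API): the number of clients
// that can be served at the same time when every process has a fixed,
// pre-allocated number of async slots.
constexpr std::uint64_t maxClients(std::uint64_t processes,
                                   std::uint64_t asyncPerProcess)
{
    return processes * asyncPerProcess;
}

// Hypothetical helper: how many processes are needed to serve `clients`
// simultaneous clients, rounding up. Assumes asyncPerProcess > 0.
constexpr std::uint64_t processesNeeded(std::uint64_t clients,
                                        std::uint64_t asyncPerProcess)
{
    return (clients + asyncPerProcess - 1) / asyncPerProcess;
}
```

With these, maxClients(5, 5) gives the 25 clients mentioned above, and processesNeeded(5000, 100) gives the 50 processes from the websockets question. Every one of those slots is paid for at launch, whether a client ever uses it or not.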

So before getting into writing an alternative to uWSGI in Cutelyst I did a simple experiment: I asked uWSGI to load a Cutelyst app and fork 1000 times, and wrote a simple QCoreApplication that would do the same. uWSGI used > 1GB of RAM and took around 10s to start, while the Qt app used < 300MB of RAM and around 3s. A difference of ~700MB is a lot of RAM, and that was enough to get me started.

Cutelyst-wsgi is born. Granted, the command line arguments are very similar to uWSGI's, and I also followed the same separation between socket and protocol handling. Of course in C++ things are more reusable, so our Protocol class has an HTTP subclass and in the future will have FastCGI and uWSGI ones too.

Did I say uWSGI before 2.1 doesn't support keep-alive? And that 2.1 is not released, nor does anyone know when it will be? Cutelyst-wsgi supports keep-alive and HTTP pipelining, is completely async and yes, performs a little better. If you put NGINX in front of uWSGI you can get keep-alive support, but guess what? The uwsgi protocol closes the connection between the front server and uWSGI, so it's quite hard to get very high speeds. Preliminary results of TechEmpower Benchmarks #13 showed Cutelyst hitting these limits, as other frameworks were using keep-alive properly.

Thanks to this new Engine, the Engine API got several improvements and is quite stable now. Besides that, a few other important changes were made as well:

Change internals to take advantage of NRVO (named return value optimization)
Improved speed of Context::uriFor(), making Cutelyst now require Qt 5.6 due to a behavior change in QUrl
Improved speed and memory usage of the URL query parser, 1s faster in 1 million iterations. Using QByteArray::split() is very convenient, but it allocates more memory and a QList for the results; using ::indexOf() and manually getting the parts is both faster and more memory efficient. This is the kind of optimization we do in Cutelyst::Core, where it makes a difference; in application code the extra complexity might not be worth it.
C++11 range-based for loops, all our Q_FOREACH & friends were replaced with range-based for loops
Use of new reverse and equal_range iterators
Use QHash for storing headers. This was done after several benchmarks that showed QHash was faster for all common cases; if it kept the values() in order like QMap does, it would be used in other places as well
Replaced most QList with QVector, and internally std::vector
Multipart/form-data got faster: it doesn't seek() anymore, but it requires a non-sequential QIODevice, as each Upload object points to parts of the body device.
Add a few more unit tests.
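
The query parser change in the list above can be illustrated with a small sketch. This is not the Cutelyst::Core code: it uses std::string_view::find() as a stand-in for QByteArray::indexOf() so it builds without Qt, the name parseQuery is made up for the example, and it skips percent-decoding entirely. The idea is the same though: walk the buffer once and take the parts by position, instead of splitting into a temporary list first.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

using QueryPair = std::pair<std::string_view, std::string_view>;

// Hypothetical sketch, not the Cutelyst implementation.
// Scans the query string once with find() rather than splitting on '&'
// and then on '='. The returned views point into `query`, so the buffer
// behind it must outlive the result. No percent-decoding is done.
std::vector<QueryPair> parseQuery(std::string_view query)
{
    std::vector<QueryPair> result;
    std::size_t pos = 0;
    while (pos < query.size()) {
        // Find the end of the current "key=value" part.
        std::size_t amp = query.find('&', pos);
        if (amp == std::string_view::npos)
            amp = query.size();

        const std::string_view part = query.substr(pos, amp - pos);
        if (!part.empty()) { // skip empty parts such as in "a=1&&b=2"
            const std::size_t eq = part.find('=');
            if (eq == std::string_view::npos)
                result.emplace_back(part, std::string_view{}); // key only
            else
                result.emplace_back(part.substr(0, eq), part.substr(eq + 1));
        }
        pos = amp + 1;
    }
    return result; // returned by name, so the compiler may apply NRVO
}
```

Calling parseQuery("a=1&b=&c") yields three pairs, with empty values for b and c. The only allocation is the result vector itself; no intermediate list of parts is built, which is where the split() approach spends its extra memory.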

Thanks to the above the core library size is also a bit smaller, ~640KB on x64.

I was planning to do a 1.0 after 0.13 but with this new engine I think it's better to have a 0.14 version, and make sure no more changes in Core will be needed for additional protocols.

Download here, enjoy!