Jan Rychter: blog (electronics, programming, technology)

Cloud server CPU performance comparison

2019-12-12

Alternate titles: "The cloud makes no sense", "Intel Xeon processors are slow", "The Great vCPU Heist".

I recently decided to try to move some of my CPU-intensive workload from my desktop into the "cloud". After all, that's supposedly what those big, strong and fast cloud servers are for.

I found that choosing a cloud provider is not obvious at all, even if you only want to consider raw CPU speed. Operators do not post benchmarks, only vague claims about the "fastest CPUs". So I ran my own benchmarks and compiled the results into a very unscientific, yet revealing, comparison.

I was aiming for the fastest CPUs. Most of what I do is interactive development and quick builds, where wall-clock time is what matters, which means CPU performance matters a lot. Luckily, that's what all cloud providers advertise, right?

I decided to write up my experiences because I wish I could have read about all this instead of doing the work myself. I hope this will be useful to other people.

Providers tested

In alphabetical order:

  • Amazon AWS (c5.xlarge, c5.2xlarge, c5d.2xlarge, z1d.xlarge)
  • Digital Ocean (c-8, c-16)
  • IBM Softlayer (C1.8x8)
  • Linode (dedicated 8GB 4vCPU, dedicated 16GB 8vCPU)
  • Microsoft Azure (F4s v2, F8s v2)
  • Vultr (404 4vCPU/16GB, 405 8vCPU/32GB)

Why those? Well, those are the ones I could quickly find and sign up for without too much hassle. Also, those are the ones that at least promise fast CPUs (for example, Google famously doesn't much care about individual CPU speed, so I didn't try their servers).

Setting up and differences between cloud providers

Signing up and trying to run the various virtual machines offered by cloud operators was very telling. In an ideal world, I would sign up on a web site, get an API key, put that into docker-machine and use docker-machine for everything else.

Sadly, this is only possible with a select few providers. I think every cloud operator should contribute their driver to docker-machine, and I don't understand why so few do. You can use Digital Ocean, AWS and Azure directly from within docker-machine. The other drivers are non-existent, flaky or limited, so one has to use vendor-specific tools. This is rather annoying, as one has to learn all the cute names that the particular vendor has invented. What do they call a computer, is it a server, plan, droplet, size, node, horse, beast, or a daemon from the underworld?

One thing I quickly discovered is that what the vendors advertise is often not available. As a new user, you get access to the basic VM types, and have to ask your vendor nicely so that they allow you to spend more money with them. This process can be quick and painless with smaller providers, but can also explode into a major time sink, like it does with Azure. There was a moment when I was spending more time dealing with various tiers of Microsoft support than testing. I find this to be rather silly and I don't understand why in the age of global cloud computing I still have to ask and specify which instances I'd like to use in which particular regions before Microsoft kindly allows me to.

Assuming you can actually get access to VM instances, there is a big difference in how complex the management is. With Digital Ocean, Vultr or Linode you will be up and running in no time, with simple web UIs that make sense. With AWS or Azure, you will be spending hours dealing with resources, resource groups, regions, availability sets, ACLs, network security groups, VPCs, storage accounts and other miscellanea. Some configurations will be inaccessible due to weird limitations and you will have no idea why. A huge waste of time.

The benchmark

I used the best benchmark I possibly could: my own use case. A build task that takes about two and a half minutes on my (slightly overclocked) i7-6700K machine at home. I started signing up at various cloud providers and running the task.

After several tries, I decided to split the benchmark into two: a sequential build and a parallel build. Technically, both builds are parallel and use multiple cores to a certain extent, but the one called "parallel" uses "make -j2" to really load up every core the machine has, so that all cores are busy nearly all of the time.

The build is dockerized for easy and consistent testing. It mounts a volume containing the source code, which is also where the output artifacts go. It requires a fair bit of I/O to store the resulting files, but I wouldn't call it heavily I/O-intensive.

Methodology

A single test consisted of starting a cloud server, provisioning it with Docker (both were sometimes done automatically by docker-machine), copying my source code to the server, pulling all the necessary docker images, and performing a build.

The total wall clock time for the build was measured. The smaller the better. I always did one build to prime the caches and discarded the first result.
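The measurement loop itself is nothing fancy; here is a rough Python sketch of it, where `cmd` stands in for whatever triggers the dockerized build (a hypothetical `docker run …` invocation in my case):

```python
import subprocess
import time

def time_build(cmd):
    """Run one build and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, shell=True, check=True)
    return time.monotonic() - start

def benchmark(cmd, runs=6):
    """One warm-up build to prime the caches (result discarded), then timed runs."""
    time_build(cmd)  # prime caches, discard the first result
    return [time_build(cmd) for _ in range(runs)]
```

The point is only that every number reported below is a wall-clock measurement of a full build, never a synthetic benchmark.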

I tried to get six builds done, over the course of multiple days, to check if there is variance in the results. And yes, there is very significant variance, which was a surprise.

For some cloud providers (Linode and IBM) the build times were so abysmal that I decided to abandon the effort after just two builds. No point in torturing old rust.

I also threw in results for my own local build machine (a PC next to my desk), with no virtualization (but the build was still dockerized), and a dedicated EX62-NVMe server from Hetzner.

Results

I first created rankings for average build times, but then realized that with so much variance, these averages make little sense. What I really care about is the worst build time, because with all the overbooking and over-provisioning going on, this is what I really get. I might get better times if I'm lucky, but I'm paying for the worst case.

The error bars indicate how much better the best case can be. As you can see, in some cases the differences are very significant.
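Concretely, given the measured times for one provider, I rank by the maximum; the minimum only sets the error bar. A tiny sketch (the sample numbers are made up for illustration):

```python
def rank_stats(times):
    """Rank by the worst build time; the best shows how lucky you can get."""
    return {"worst": max(times), "best": min(times),
            "spread": max(times) - min(times)}

# made-up samples, in seconds per build:
stats = rank_stats([150, 210, 172, 305, 188, 294])
# stats == {"worst": 305, "best": 150, "spread": 155}
```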

These are the worst-case results for "sequential" builds (see "The benchmark" above for a description of what "sequential" means):

These are the worst-case results for "parallel" builds:

And this is the best case you can possibly get using a "sequential" build, if you are lucky:

The ugly vCPU story

What cloud providers sell is not CPUs. They invented the term "vCPU": you get a "virtual" CPU with no performance guarantees, while everybody still pretends this somehow corresponds to a real CPU. Names of physical chips are thrown around.

Those "vCPUs" correspond to hyperthreads. This is great for cloud providers, because it lets them sell 2x the number of actual CPU cores. It isn't so great for us. If you try hyperthreading on your machine, you will see that the benefits are on the order of 5-20%. Hyperthreading does not magically double your CPU performance.

If you wondered why everybody was so worried about hyperthreading-related vulnerabilities, it wasn't because of performance loss. It was because if we pressured the cloud providers, they would have to disable hyperthreading, and thus cut the number of "vCPUs" they are selling by a factor of two.

In other words, we now have a whole culture of overselling and overbooking in the cloud, and everybody accepts it as a given. Yes, this makes me angry.

Now, you might get lucky, and your VMs might have neighbors who do not use their "vCPUs" much. In that case, your machines will run at full (single-core) performance and your "vCPUs" will not be much different from actual CPU cores. But that is not guaranteed, and I found that most of the time you will actually get poor performance.

Intel® Xeon® processors are slow

There. I've said it. These processors are slow. Dog slow, in fact. We've been told over the years that the Intel® Xeon® chips are the powerhouses of computing, the performance champions, and cloud providers will often tell you which powerful Xeon® chips they are using. The model numbers are completely meaningless at this point, which I think is intentional confusion, so that even a six-year-old chip branded with the Xeon® name appears to be powerful.

Fact is, Xeon® processors are indeed very good, but for cloud providers. They let them pack lots of slow cores onto a single CPU die, put that into a server, and then sell twice that number of cores as "vCPUs" to us.

Now, if your workload is batch-oriented and embarrassingly parallel, and if you can make 100% use of all the cores, then Xeon® processors might actually make sense. For other, more realistic workloads, they are completely smoked by desktop chips with lower core counts.

Of course, if that were the whole story, everybody would buy desktop chips. Which is why Intel intentionally cripples them, removing ECC RAM support and thus making them less reliable. Desktop chips are also inconvenient for cloud providers, because you can't get as many "vCPUs" out of a single physical server. Still, some providers do offer servers with desktop chips (Hetzner, for example), and those servers come out at the very top of my performance charts, at a fraction of the cost.

In other words, what we actually buy when we order our "Powerful compute-oriented Xeon®-powered VM" is a hyperthread on a dog-slow processor.

Enterprise shmenterprise

But, I can hear you say, this is wrong! Intel® Xeon® processors are for ENTERPRISE workloads! The serious stuff, the real deal, the corporate enterprisey synergistic large-mass cloud computing workloads that Real Enterprises use!

Well, my build is mostly Java execution and Java AOT compilation. Dockerized. That enterprisey enough? There is also some npm/grunt (it's a modern enterprise), with a bunch of I/O. It can make use of multiple cores, although not perfectly. I'd say it's the ideal "enterprise" use case.

Seriously, Xeon® chips are just plain slow. The benchmarks show it, especially in single-threaded CPU performance. They still rank relatively well in the multi-threaded benchmarks, but remember: a) your code is not embarrassingly parallel most of the time, and b) you will be renting 4-8 "vCPUs" (hyperthreads), not the 16 actual cores you're looking at in the GeekBench results.

Takeaways

If you want to spin up a relatively fast developer-friendly cloud server for software development, I'd say that Vultr and Digital Ocean are the top picks.

Digital Ocean is by far the most user- and developer-friendly. If you have little time, just go with them. Things are simple, make sense, and are fun to use. As an example, Digital Ocean lets you configure firewall rules and apply them to servers based on server tags. Any server deployed with a certain tag will then use those firewall rules. Simple, makes sense, quick and easy to use. Now go and try doing the same in Azure, let us know in a week how things are going.

Vultr has some rough edges, but is a very promising provider. Almost as user-friendly as Digital Ocean (but no docker-machine driver!). If you want to use attached storage, you will run into problems (attaching storage reboots the machine, which their support tells me is expected behavior).

You can get slightly faster machines at AWS if you pay a lot more. The z1d instances are advertised as fast. My testing shows them to be only slightly faster, which probably isn't worth the price increase over a c5.2xlarge.

Buying more "vCPUs" often gets you better performance, even for the sequential build case. This is a bit surprising, until you realize that you are buying hyperthreads on an over-provisioned machine. If you buy more hyperthreads, you push out the neighbors and "reserve" more of the real CPU cores for yourself.

The best performance comes from… desktop-class Intel processors. My old i7-6700K is near the top of the charts, as is Hetzner's EX62-NVMe server with an i9-9900K. The EX62-NVMe is 64€/month, so for development it might make sense to just rent one or two and not bother with on-demand cloud servers at all.

Apart from Hetzner's desktop CPU offerings, there seems to be no way to get a cloud server with fast single-core performance.

Another outcome of these benchmarks: I decided to buy an iMac for my development machine, not an iMac Pro. Sure, I would like to have the improved thermal handling of the iMac Pro, as well as better I/O, but I do not want the dog-slow Xeon® processor. Perhaps it makes sense if you load all cores with video encoding/processing, but for interactive development it most definitely does not, and a desktop-class Intel CPU is a much better fit.

Spark E-mail app: why I don't use it anymore

2018-07-20

I've been using the Spark iOS E-mail app for almost as long as it has existed. Better in every way than Apple's built-in Mail, and nicely designed, it was a joy to use.

A couple of months ago Readdle announced Spark 2, "an e-mail experience built for teams". I am not a team, but I wished Readdle all the luck with the new business model. However, all this talk of team functionality made me slightly suspicious, as such features seemed hard to implement without Readdle reading my E-mail server-side.

But I'm not a team, I didn't sign up for anything new, and the app never asked me whether I would allow Readdle servers to access and read my E-mail, right?

As it turns out, if you enter your E-mail login and password into Spark 2 on an iOS device, that login and password will be sent to Readdle's servers, stored there, and used to access your E-mail.

So, what's the problem?

E-mail credentials are the keys to the kingdom. If you want to seriously disrupt somebody's life, get access to their E-mail. Most sites do not implement 2-factor authentication and will happily allow an E-mail password reset, so E-mail access gives any attacker instant access to most online accounts.

A confirmation E-mail is used when signing up for new services. Receipts are stored in E-mail archives. Lots of personal information is in E-mail. Nearly all E-mail is unencrypted and unsigned, and many people will trust an E-mail that they receive without question.

What's more, if my mobile device has my E-mail password, there are certain limits on what it can do. It probably won't train a machine-learning model on all 20GB of my archives, or extract all image attachments to get geo-positioning data from them. But there are no such limits server-side. If Readdle's servers have my password, they are free to download, read and process as much of my E-mail as they want to, whenever and however they want to.

I trusted Spark on iOS with my E-mail password, expecting that the app will keep it to itself on the device. iOS devices are reasonably secure, and there are limits to what a mobile app can do, so it was a compromise I was willing to make.

I never agreed to have my password sent to online servers, stored there, and used to access my E-mail. That's an entirely different implied contract, and I'm not happy with it.

It's worth noting that guarding my password suddenly becomes much more difficult when it's stored on servers, and I think the risk of a breach is too high.

Clarity in communication

Since the app never asked me if I'm fine with my password being sent to and stored on their servers, I looked into the Privacy Policy. Here are the relevant parts:

Email address: As an email client, the core functionality of our Product is based on providing you with the ability to manage your email. For this reason, Spark services access your email account when you start using the App. […]

That sounds entirely reasonable. I don't know what "Spark services" are (they aren't defined in the policy), but I assume they must be parts of the E-mail app that run on iOS, right?

OAuth login or mail server credentials: Spark requires your credentials to log into your mail system in order to receive, search, compose and send email messages and other communication. Without such access, our Product won’t be able to provide you with the necessary communication experience. In order for you to take full advantage of additional App and Service features, such as “send later”, “sync between devices” and where allowed by Apple – “push notifications” we use Spark Services. […]

This also sounds reasonable and doesn't indicate that my credentials are being sent anywhere, right?

Except if you substitute "Spark services" with "online servers in the cloud". Oh, wait.

I do not know if it was Readdle's intention to hide the fact that "Spark services" are really "servers in the cloud". I do not suspect them of ill will, but I consider all this to be a serious lapse in judgment.

Here is what I would expect:

  • Do not force non-team users to share their credentials server-side. There is no reason to.
  • Ask clearly for permission to "SEND AND STORE YOUR PASSWORD ON OUR ONLINE SERVERS, WHICH WILL ACCESS YOUR E-MAIL". It should be very clear to the user what is happening. The wording and presentation should make it difficult to agree accidentally. The user takes on additional risk by agreeing, so be clear about that risk.
  • Replace "Spark services" with "our online servers in the cloud" in the Privacy Policy.

As for me, I stopped using Spark immediately and deleted it from all my devices. I do not trust it anymore. I miss it (Readdle makes really good apps), but trust is important.

Leaving Squarespace

2015-09-28

After several years the time has come to move my blogs from Squarespace. It was a strange relationship: I run my own servers and I'm certainly capable of implementing my own blogging solution, but using Squarespace was just easier. I could never find the time to do something of my own. So, even though I wasn't entirely happy with how Squarespace worked, I kept paying to have my pages hosted there.

The proverbial straw that broke the camel's back came recently. As I was leaving on vacation, I got an E-mail from Squarespace about them being unable to charge my credit card for another month. Not surprising: my credit card had expired a couple of weeks earlier, so I had a new expiration date and CVV code. What was surprising, though, was that Squarespace immediately proceeded to turn off my blogs and pages. I gave them the new expiration date and CVV code, but they said I had to re-register. I asked customer support for one week of grace period, as I was on vacation with poor internet connectivity, but the answer was a definitive "No". My pages went 403.

Think about it. This is a company that takes pride in customer support. I have been a loyal customer for several years. And now they are unable to give me one week of grace? Their first move is to take everything offline and respond with a 403?

This is not intended to be a Squarespace review — but if you're considering hosting your blog/pages with them, you should take these points into account:

  • There is no „relationship“: the moment your credit card can't be charged, Squarespace will take your pages offline. As in "HTTP 403 Forbidden" offline.
  • Customer support, while very responsive and polite, is only useful as an intelligent manual. They will help you find settings, but anything that would require changes to the code is off-limits. As an example, I've been asking for years for a change to the code that generates URLs for blog posts. It removes characters with diacritics instead of replacing them with ASCII lookalikes (so „łódź-2014“ gets transformed into „d-2014“ instead of „lodz-2014“). This is an eyesore and a disaster for SEO, and yet I could never get them to fix it. I first reported it in March 2010.
  • If your site is multi-lingual, or even non-English, you will have a rough road ahead of you.
  • Squarespace will lose some of your data over time. Migration from Squarespace 5 to Squarespace 6, for example, lost high-resolution versions of my images. Only the thumbnails made it through. Some of the formatting was lost, too. It is up to you to write CSS to correct the more glaring problems.
  • Your data is held hostage. The export functionality is poor, broken and Squarespace has no interest in fixing it. While trying to write an importer for their XML export, I encountered a number of issues and reported them. After two months I finally got a definitive answer: the issues will not be fixed (more on this coming soon in a separate blog post). Only one issue got fixed: non-ASCII characters in exported comments are no longer lost (!).

This blog (and all my other pages) has been moved to my own server. I find it disappointing that I have to write my own blogging software (it's 2015, after all), but I'm getting used to it — I recently had to do the same thing to get private photo galleries for sharing with my family.

Goodbye, Squarespace.

System perspective: channels do not necessarily decouple

2015-05-20

Clojure's core.async channels provide a great metaphor for joining components in a system. But I noticed there is a misconception floating around: that channels fully decouple system components, so that components do not need to know anything about one another. As a result, it's easy to go overboard with core.async. I've seen people overuse channels, putting them everywhere, even when a cross-namespace function call would do just fine.

What channels do provide is asynchronous processing. The "degree" of asynchronicity can be controlled — we may choose to always block the caller until someone reads from a channel (thus providing a rendezvous point), or we may choose to introduce asynchronicity, letting the caller proceed without worrying about when the value gets processed.

Since you can put anything onto a channel, it's easy to forget that this "anything" is part of the protocol. Just as functions have an argument signature, channels have value signatures, or protocols: the common data format that all parties agree on.

It isn't true that channels fully decouple components, so that "they don't need to know anything about one another". You still need a wire protocol, just as with directly calling functions in another component. Channels do decouple things in time: you are not forced to a synchronous execution model and can control when things are being processed. But they are not a magic component decoupling wand, so don't use them when a simple synchronous function call will do.
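core.async is Clojure, but the point survives translation. Here is a rough Python analogue (a thread plus a queue; note that Python's `queue.Queue` is always buffered, so there is no true rendezvous as with an unbuffered core.async channel). The dict shape is the value protocol: the consumer breaks the moment the producer changes it, channel or no channel:

```python
import queue
import threading

events = queue.Queue()   # buffered, so the producer never waits
results = []

def consumer():
    while True:
        msg = events.get()   # the consumer still depends on the
        if msg is None:      # message shape: this IS coupling
            break
        results.append(msg["user"] + ":" + msg["action"])

t = threading.Thread(target=consumer)
t.start()
events.put({"user": "jan", "action": "login"})  # must match the agreed shape
events.put(None)                                # sentinel to stop the consumer
t.join()
# results == ["jan:login"]
```

If the producer and consumer live in the same process and you don't need the timing flexibility, a plain function call expresses the same coupling with far less machinery.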

Hard Drive Encryption, revisited

2015-03-03


Several years ago I made a comment on Hacker News (full discussion) about full-disk encryption performed by the hard drives themselves. Basically, the idea is that you give your hard drive a password/key and hope that it transparently encrypts your data before it hits the platters (or flash memory for SSDs).

I wrote:

That kind of encryption is useless, because I can't audit it. How do I know my data really IS encrypted and the key isn't just stored on the drive itself?

Now, Hacker News has a number of well-known people, who have a following. Opposing their opinions is not popular. Notice how my to-the-point response to tptacek gets downvoted.

Anyway — I feel somewhat vindicated by the recent revelations of hard drive firmware hacking by the NSA. I was right: you can’t and shouldn’t trust your hard drives. If you care about encryption at all, your drives should see the data already encrypted.

I2C driver for Freescale Kinetis microcontrollers

2014-12-17

I wrote a tiny driver that allows you to access I2C (IIC, I²C, or I squared C) on Freescale Kinetis microcontrollers. It works asynchronously (interrupt-driven), supports repeated start (restart) and does not depend on any large software framework.

The code is on Github and the driver has its own page, too, with a README.

Even though it isn't fully-featured or well tested, I have a good reason for publishing the code. I wrote this and then had to put my Kinetis-related projects on hold. Several months later I forgot I had written this driver and started searching online for one… only to finally find it with Spotlight, on my own hard drive. This is what happens when you work on too many projects.

To avoid this happening in the future, I now intend to publish most things I write as soon as they are reasonably complete, so that I can find them online when I need them.

Fablitics Launch

2014-11-04

We have just launched Fablitics — our friendly business intelligence and E-commerce analytics solution.

The driving idea behind Fablitics is that only meaningful numbers and graphs should be shown in business intelligence software. We tried to understand which numbers are important and can help decision making, and which bring little value and only confuse.

Early on we noticed that many analytics-type products show lots of data, but most of that data isn’t related to actual business.

Let’s take an example: page views and visits in an online store. They are related to the business, but very remotely. Estimating performance based on page-views could be compared to estimating the performance of a supermarket by the number of cars in the parking lot. Sure, that number is correlated with supermarket sales. Yes, probably the more cars there are in the lot, the better. Yes, the days when there is more traffic will likely bring in more revenue. But the relationship is too weak to allow meaningful conclusions.

So, when designing Fablitics, we decided to focus on fundamental business concepts: customers, products, sales. Instead of showing the number of page views, we count customers that visit the store. We determine which customers enter the store for the first time, and which are returning. We know how much each customer purchased, and we also know how the customer was referred to us, so we can put a monetary revenue value on advertisement campaigns.
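To make the campaign-attribution idea concrete, here is a sketch of the grouping involved; the record shape and the numbers are hypothetical, not Fablitics' actual data model:

```python
from collections import defaultdict

orders = [  # hypothetical order records
    {"customer": "c1", "campaign": "spring-sale", "total": 120.0},
    {"customer": "c2", "campaign": "newsletter",  "total": 45.0},
    {"customer": "c1", "campaign": "spring-sale", "total": 60.0},
]

def revenue_by_campaign(orders):
    """Put a monetary revenue value on each advertisement campaign."""
    revenue = defaultdict(float)
    for order in orders:
        revenue[order["campaign"]] += order["total"]
    return dict(revenue)

# revenue_by_campaign(orders) == {"spring-sale": 180.0, "newsletter": 45.0}
```

The interesting work is upstream of this, in reliably attributing each customer and order to a referral source; once that is done, the metric itself is a simple aggregation over business objects rather than page views.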

All this is based on a rethinking of what analytics software should do. In our opinion, as long as the purpose is to improve the business, it should be strongly rooted in business concepts.

If you run an online store, you can sign up now for a free trial http://fablitics.com/ — no credit card required.

Lsquaredc: accessing I²C in Linux

2014-05-20

It might seem that writing I2C libraries is my favorite activity, but it really isn't. This library is not something I expected to write, but since I had to, I'm releasing it in the hope that it will save others time and frustration.

Lsquaredc is a tiny Linux library that allows you to access I2C (IIC, I²C, or I squared C) from userspace without causing excessive pain and suffering.

When I tried accessing I2C on my BeagleBone Black, I was sure it would be straightforward. Well, I was wrong. It turned out that the usual way is to read/write data using read() and write() calls, which don't support restarts. And if one needs repeated start, there is an ioctl with an entirely different and difficult-to-use interface.

For more information, see the Lsquaredc page and its Github repository.

Designing a High Voltage Power Supply for Nixie Tube Projects

2014-05-04

PCB layout for the switch-mode HV PSU

I've posted a page describing the design of a HV PSU (High-Voltage Power Supply) that generates up to 220V from a 12V input. In addition to that, it also provides 2*Vout (so, up to 440V, for dekatrons), and two outputs for powering digital logic: 5V and 3.3V. The primary HV boost circuit reaches 88% efficiency when going from 12V to 185V at 55mA, with a 3% output ripple.
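Those figures also pin down what the converter asks of its 12V supply. A quick sanity-check calculation, using only the numbers quoted above:

```python
def input_current(v_in, v_out, i_out, efficiency):
    """Input current a boost converter draws for a given output load."""
    p_out = v_out * i_out      # output power, watts
    p_in = p_out / efficiency  # input power, watts
    return p_in / v_in         # amps drawn from the input rail

# 12V in, 185V out at 55mA, 88% efficient:
# input_current(12, 185, 0.055, 0.88) ≈ 0.96 A
```

So the 185V output at full load costs just under an amp at the 12V input, which is worth knowing when sizing the supply feeding the board.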

The version I'm posting online is not perfect, but works quite well in a number of my projects. I decided I'd rather publish it as it is now rather than keep it locked forever.

It is published as Open-Source Hardware, to be used however one likes. All source design files are provided. It's my way of paying back: I learned a lot from looking at other designs and by asking questions, so now it's time to give back.

Dollhouse built from laser-cut plywood

2014-05-01

I wanted a dollhouse for my daughter. But, as it often happens, I couldn't find anything I liked. I wanted it to be tall, with multiple floors connected with stairs. I wanted every room easily accessible, with few external walls. And I also wanted it built in a way that would allow four kids to comfortably play together.

I ended up designing my own dollhouse in a CAD program, then laser-cutting it in 5mm plywood. The structure is held together by mortise-tenon joints, with just a little glue so that it doesn't fall apart when picked up. It's amazing how precisely you can cut plywood with a laser.

This is my second design in laser-cut plywood (the first was an Art-Deco inspired Nixie clock) and I feel I've learned a lot. My main discoveries so far:

  • Plywood will warp. Not as much as wood, but expect large surfaces to eventually warp. You can either ignore it (it might not matter), or design additional support structures.
  • The laser cutter is incredibly precise, but your plywood often isn't. You can't rely on the "official" thickness: I found 5mm plywood to be anything from 4.75mm to 5.25mm (and that is supposedly pretty good). Measure your particular batch and design your structure for the measured thickness. It really helps to use a parametric modelling CAD program, so that you can change the thickness at any time.
  • It is easy to design a structure, but more difficult to design a structure that you can assemble. I discovered the hard way that some designs simply can't be assembled (parts block one another and there is no order of putting things together that will allow you to complete the structure).
  • Your mortise-tenon joints will fit even if you make the hole (mortise) and the peg (tenon) the same size: the laser's kerf (the width of material it burns away) provides enough room. Still, unless you have perfect-quality plywood, it is better to offset your hole edges and make the holes slightly larger.

I'm quite happy with the results so far and will certainly use this method for other projects.