BrytlytDB is an in-GPU-memory database built on top of PostgreSQL. It's operated using many of PostgreSQL's command line utilities, it's wire-protocol compatible so third-party PostgreSQL clients can connect to it, and queries are even parsed, planned and optimised by PostgreSQL's regular codebase before the execution plan is handed off to BrytlytDB's GPU-optimised code.
Clustering works right out of the box, GPU-powered JOINs are supported, stored procedures are fully functional, deep learning and machine learning workloads are supported via Torch, and BI visualisation software in the form of SpotLyt is included with BrytlytDB as well.
The feature list is too long for one blog to cover so in this post I'll just be taking a look at how performant BrytlytDB's OLAP functionality is. In this benchmark I'll see how well 32 Tesla K80 GPUs spread across two EC2 instances perform when querying 1.1 billion taxi trips. I'll be using the same dataset I've used to benchmark Amazon Athena, BigQuery, ClickHouse, Elasticsearch, EMR, kdb+/q, MapD, PostgreSQL, Redshift and Vertica. I have a single-page summary of all these benchmarks for comparison.
Note: Since writing this blog post I've benchmarked the newer, version 2.0 of BrytlytDB. I suggest you read that benchmark as it's more representative of the current version of the software.
A GPU-Powered AWS EC2 Cluster
For this benchmark I'll be using two p2.16xlarge EC2 instances running Ubuntu 16.04.2 LTS in Amazon Web Services' eu-west-1a region in Ireland. Each machine has 8 Nvidia K80 cards which have 2 GPUs each, 64 virtual CPUs and 732 GB of memory. There's also 20 Gbit/s of networking capacity available to each instance.
Below are the specifications of the compute capabilities available on each of the EC2 instances.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 1209.207
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.10
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt ida
This is the layout of the GPUs available on one of the two EC2 instances.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:0F.0 Off | 0 |
| N/A 61C P0 60W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 0000:00:10.0 Off | 0 |
| N/A 47C P0 69W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 0000:00:11.0 Off | 0 |
| N/A 66C P0 59W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 0000:00:12.0 Off | 0 |
| N/A 54C P0 72W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 0000:00:13.0 Off | 0 |
| N/A 61C P0 59W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 0000:00:14.0 Off | 0 |
| N/A 49C P0 70W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 0000:00:15.0 Off | 0 |
| N/A 66C P0 58W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 0000:00:16.0 Off | 0 |
| N/A 51C P0 70W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla K80 Off | 0000:00:17.0 Off | 0 |
| N/A 62C P0 64W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla K80 Off | 0000:00:18.0 Off | 0 |
| N/A 49C P0 72W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla K80 Off | 0000:00:19.0 Off | 0 |
| N/A 63C P0 59W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla K80 Off | 0000:00:1A.0 Off | 0 |
| N/A 52C P0 70W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla K80 Off | 0000:00:1B.0 Off | 0 |
| N/A 65C P0 60W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 13 Tesla K80 Off | 0000:00:1C.0 Off | 0 |
| N/A 52C P0 73W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 14 Tesla K80 Off | 0000:00:1D.0 Off | 0 |
| N/A 64C P0 60W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 15 Tesla K80 Off | 0000:00:1E.0 Off | 0 |
| N/A 52C P0 71W / 149W | 76MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The first machine has an IP address of 52.214.237.134 and has three roles: Global Transaction Manager, Coordinator and host to the first 16 data nodes.
The second machine has an IP address of 34.250.232.38 and has two roles: it is also a coordinator and hosts the second set of 16 data nodes.
TCP port 5432 is open between the two machines for communicating via PostgreSQL's wire protocol, TCP port 7777 is open for global transaction manager communication and ports 20,000 through to 20,031 are open for data node communication.
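I won't go into the VPC setup in detail but, assuming both instances share a single security group, those rules could be added with something along the lines of the following (the security group ID here is a placeholder, not the one I actually used):
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 5432 \
    --source-group sg-0123456789abcdef0
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 7777 \
    --source-group sg-0123456789abcdef0
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 20000-20031 \
    --source-group sg-0123456789abcdef0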
In addition to the default 20 GB EBS volume on each EC2 instance there are six 500 GB General Purpose SSDs attached, which offer a baseline of 1,500 IOPS each (3 IOPS per GB for gp2) and can burst to 3,000 IOPS if need be. These drives are set up in a RAID 0 configuration on each instance.
Before setting up the RAID array I'll install a few dependencies. This was run on both EC2 instances.
$ sudo apt update
$ sudo apt install \
mdadm \
wget \
xfsprogs
Below are the commands used to setup the RAID array on each instance.
$ sudo mdadm \
    --create \
    --verbose \
    --force /dev/md0 \
    --level=0 \
    --raid-devices=6 \
    /dev/xvdb \
    /dev/xvdc \
    /dev/xvdd \
    /dev/xvde \
    /dev/xvdf \
    /dev/xvdg
$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir /raidArray
$ sudo mount /dev/md0 /raidArray
$ sudo chown ubuntu /raidArray
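The array won't reassemble itself automatically after a reboot unless it's recorded. I didn't need this for a one-off benchmark but, if you do, the usual approach on Ubuntu looks something like this:
$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
$ sudo update-initramfs -u
$ echo '/dev/md0 /raidArray ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab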
Here's what the RAID layout looked like after it was set up.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 20G 0 disk
└─xvda1 202:1 0 20G 0 part /
xvdb 202:16 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
xvdc 202:32 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
xvdd 202:48 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
xvde 202:64 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
xvdf 202:80 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
xvdg 202:96 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /raidArray
This RAID 0 setup offers a partition with a capacity of 3.2 TB on each instance:
$ df -H
Filesystem Size Used Avail Use% Mounted on
udev 387G 0 387G 0% /dev
tmpfs 78G 9.2M 78G 1% /run
/dev/xvda1 21G 7.2G 14G 35% /
tmpfs 387G 0 387G 0% /dev/shm
tmpfs 5.3M 0 5.3M 0% /run/lock
tmpfs 387G 0 387G 0% /sys/fs/cgroup
tmpfs 78G 0 78G 0% /run/user/1000
/dev/md0 3.2T 57G 3.0T 2% /raidArray
BrytlytDB Up & Running
I've run the following to download BrytlytDB's install script. BrytlytDB is commercial software so I cannot divulge the URL I pulled this from at this time. The following was run on both EC2 instances.
$ cd ~
$ wget https://<server>/install.sh
I've then edited the install script on the first instance with the following instance-specific values:
$ vi install.sh
GPUS=16
X2DATA=/raidArray/data
COORDHOST=52.214.237.134
COORDSTART=0
GTMHOST=52.214.237.134
And the install script on the second instance was edited with the following instance-specific values:
$ vi install.sh
GPUS=16
X2DATA=/raidArray/data
COORDHOST=34.250.232.38
COORDSTART=1
GTMHOST=52.214.237.134
GTMCOUNT=0
DATASTART=16
With those changes in place I then ran the install script on both EC2 instances.
$ chmod +x install.sh
$ ./install.sh
The install script conducts the following:
- Download the BrytlytDB binaries.
- Install Nvidia's GPU drivers.
- Generate database initialisation bash scripts.
- Configure PostgreSQL with the correct IP addresses and port numbers used throughout the cluster.
- Deploy BrytlytDB-specific cluster configuration to each Data Node and to the individual GpuManagers.
- Generate start-up scripts for the Global Transaction Manager, the GpuManagers, the Data Nodes and the Coordinators (a rough sketch of these is shown below).
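The install script itself is BrytlytDB's so I won't reproduce it here but, given the Postgres-XC heritage the psql banner shows later on, I'd expect the generated start-up scripts to boil down to per-node commands roughly like the following. These are my guesses at the shape of those commands, not lines lifted from the actual scripts.
$ /usr/local/x2/bin/gtm_ctl start -Z gtm -D <gtm data directory>
$ /usr/local/x2/bin/pg_ctl start -Z datanode -D <data node directory>      # repeated for each data node
$ /usr/local/x2/bin/pg_ctl start -Z coordinator -D <coordinator directory>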
Once that's done I can form the cluster with the following registration script, which was run on the first EC2 instance.
$ vi register.sh
#!/bin/bash
for i in {0..15}; do
/usr/local/x2/bin/psql \
-p 5432 \
-h localhost \
postgres \
-c "CREATE NODE d$i WITH (HOST='52.214.237.134',
TYPE='datanode',
PORT=$((20000+$i)));"
done
for i in {16..31}; do
/usr/local/x2/bin/psql \
-p 5432 \
-h localhost \
postgres \
-c "CREATE NODE d$i WITH (HOST='34.250.232.38',
TYPE='datanode',
PORT=$((20000+$i)));"
done
/usr/local/x2/bin/psql \
-p 5432 \
-h localhost \
postgres \
-c "CREATE NODE c1 WITH (HOST='34.250.232.38',
TYPE='coordinator',
PORT=5432);"
$ ./register.sh
The second registration script, run on the other EC2 instance, is identical to the first except that the coordinator line points to 52.214.237.134 instead.
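Assuming BrytlytDB keeps Postgres-XC's system catalogues intact (the psql banner further down suggests it does), the node registration can be double-checked from either coordinator:
$ /usr/local/x2/bin/psql \
    -d postgres \
    -c "SELECT node_name, node_type, node_host, node_port
        FROM pgxc_node
        ORDER BY node_name;"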
Loading 1.1 Billion Trips Into BrytlytDB
For this benchmark I've downloaded one half of the 500 GB of GZIP-compressed CSV data I created in my Billion Taxi Rides in Redshift blog post onto each EC2 instance. The data sits in 56 files spread across the two machines but for the quickest load time I want to create 32 reasonably equally-sized CSV files and load 16 files on each EC2 instance simultaneously. The 32 files will pair up with the 32 GPUs available across the cluster and should allow for the best utilisation of the GPU resources when running queries.
Here is half of the original 500 GB data set on the first EC2 instance:
$ ls -l /raidArray/s3
... 2024092233 ... trips_xaa.csv.gz
... 2023686578 ... trips_xab.csv.gz
... 2022717460 ... trips_xac.csv.gz
... 2023507930 ... trips_xad.csv.gz
... 2024224441 ... trips_xae.csv.gz
... 2030784022 ... trips_xaf.csv.gz
... 2036427328 ... trips_xag.csv.gz
... 2038915704 ... trips_xah.csv.gz
... 2039556774 ... trips_xai.csv.gz
... 2039272131 ... trips_xaj.csv.gz
... 2019988039 ... trips_xak.csv.gz
... 2001054379 ... trips_xal.csv.gz
... 1999737958 ... trips_xam.csv.gz
... 1997898854 ... trips_xan.csv.gz
... 1997471865 ... trips_xao.csv.gz
... 2000533767 ... trips_xap.csv.gz
... 1998974328 ... trips_xaq.csv.gz
... 2009717364 ... trips_xar.csv.gz
... 2016934402 ... trips_xas.csv.gz
... 2022353263 ... trips_xat.csv.gz
... 2013756232 ... trips_xau.csv.gz
... 2013475097 ... trips_xav.csv.gz
... 2020404819 ... trips_xaw.csv.gz
... 2015585302 ... trips_xax.csv.gz
... 2008026925 ... trips_xay.csv.gz
... 2007821692 ... trips_xaz.csv.gz
... 2005814365 ... trips_xba.csv.gz
... 2013779043 ... trips_xbb.csv.gz
And this is the other half on the second instance:
$ ls -l /raidArray/s3
... 2018006900 ... trips_xbc.csv.gz
... 2019260716 ... trips_xbd.csv.gz
... 1893270062 ... trips_xbe.csv.gz
... 1854614596 ... trips_xbf.csv.gz
... 1854948290 ... trips_xbg.csv.gz
... 1855481821 ... trips_xbh.csv.gz
... 2008073682 ... trips_xbi.csv.gz
... 2023698241 ... trips_xbj.csv.gz
... 2028094175 ... trips_xbk.csv.gz
... 2030037816 ... trips_xbl.csv.gz
... 2048044463 ... trips_xbm.csv.gz
... 2031794840 ... trips_xbn.csv.gz
... 2034562660 ... trips_xbo.csv.gz
... 2034332904 ... trips_xbp.csv.gz
... 2036182649 ... trips_xbq.csv.gz
... 2035637794 ... trips_xbr.csv.gz
... 2026865353 ... trips_xbs.csv.gz
... 2028888024 ... trips_xbt.csv.gz
... 2027725347 ... trips_xbu.csv.gz
... 2027090129 ... trips_xbv.csv.gz
... 2024075447 ... trips_xbw.csv.gz
... 2025027115 ... trips_xbx.csv.gz
... 2020889873 ... trips_xby.csv.gz
... 2025885378 ... trips_xbz.csv.gz
... 2023688771 ... trips_xca.csv.gz
... 2024722253 ... trips_xcb.csv.gz
... 2030741659 ... trips_xcc.csv.gz
... 1383149265 ... trips_xcd.csv.gz
On each instance I'll decompress the GZIP files. The following took about four minutes to complete on each instance.
$ cd /raidArray/s3/
$ for filename in trips_x*.csv.gz; do
gzip -d $filename &
done
Then I'll concatenate the ~560 million lines of CSV data on each instance and break it up into files of 35 million lines each.
$ cat trips_x*.csv \
| split --lines=35000000 \
--additional-suffix=.csv
The above completed in 8 minutes on each EC2 instance.
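As a quick sanity check I can confirm each instance ended up with 16 files and count the total number of lines that will be loaded from that machine:
$ ls /raidArray/s3/x*.csv | wc -l
$ wc -l /raidArray/s3/x*.csv | tail -n1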
I'm now left with 16 CSV files on the first EC2 instance:
$ ls -l /raidArray/s3/x*.csv
... 16269790081 ... xaa.csv
... 16256520062 ... xab.csv
... 16260850939 ... xac.csv
... 16300101241 ... xad.csv
... 16328293941 ... xae.csv
... 16308059268 ... xaf.csv
... 15710765508 ... xag.csv
... 15694289596 ... xah.csv
... 15724665907 ... xai.csv
... 15701359090 ... xaj.csv
... 15805197311 ... xak.csv
... 15825215893 ... xal.csv
... 15798558696 ... xam.csv
... 15810367473 ... xan.csv
... 15740445794 ... xao.csv
... 15739487044 ... xap.csv
And 16 files on the other EC2 instance:
$ ls -l /raidArray/s3/x*.csv
... 15832236746 ... xaa.csv
... 15500745803 ... xab.csv
... 15377897035 ... xac.csv
... 15219115615 ... xad.csv
... 15910087929 ... xae.csv
... 16174771007 ... xaf.csv
... 16281792208 ... xag.csv
... 16282698764 ... xah.csv
... 16290969841 ... xai.csv
... 16268493212 ... xaj.csv
... 16268866466 ... xak.csv
... 16265349110 ... xal.csv
... 16252570464 ... xam.csv
... 16281353543 ... xan.csv
... 16241569911 ... xao.csv
... 13355974598 ... xap.csv
I'll then connect to the coordinator node on each EC2 instance and set up BrytlytDB's gpu_manager_fdw extension along with a foreign server that uses it.
$ /usr/local/x2/bin/psql -d postgres
psql (PGXC 1.3devel, based on PG 9.4beta1)
Type "help" for help.
CREATE EXTENSION gpu_manager_fdw;
CREATE SERVER gm_fdw_server
FOREIGN DATA WRAPPER gpu_manager_fdw;
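Before creating any tables it's worth confirming the wrapper and server registered properly. PostgreSQL's standard catalogues are enough for this; nothing below is BrytlytDB-specific.
SELECT fdwname FROM pg_foreign_data_wrapper;
SELECT srvname, srvoptions FROM pg_foreign_server;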
I can then create the trips table.
CREATE FOREIGN TABLE trips (
trip_id INT,
vendor_id VARCHAR(3),
pickup_datetime DATE,
dropoff_datetime DATE,
store_and_fwd_flag VARCHAR(1),
rate_code_id INT,
pickup_longitude DOUBLE PRECISION,
pickup_latitude DOUBLE PRECISION,
dropoff_longitude DOUBLE PRECISION,
dropoff_latitude DOUBLE PRECISION,
passenger_count INT,
trip_distance DOUBLE PRECISION,
fare_amount DOUBLE PRECISION,
extra DOUBLE PRECISION,
mta_tax DOUBLE PRECISION,
tip_amount DOUBLE PRECISION,
tolls_amount DOUBLE PRECISION,
ehail_fee DOUBLE PRECISION,
improvement_surcharge DOUBLE PRECISION,
total_amount DOUBLE PRECISION,
payment_type VARCHAR(10),
trip_type INT,
pickup VARCHAR(50),
dropoff VARCHAR(50),
cab_type VARCHAR(6),
precipitation INT,
snow_depth INT,
snowfall INT,
max_temperature INT,
min_temperature INT,
average_wind_speed INT,
pickup_nyct2010_gid INT,
pickup_ctlabel VARCHAR(10),
pickup_borocode INT,
pickup_boroname VARCHAR(13),
pickup_ct2010 VARCHAR(6),
pickup_boroct2010 VARCHAR(7),
pickup_cdeligibil VARCHAR(1),
pickup_ntacode VARCHAR(4),
pickup_ntaname VARCHAR(56),
pickup_puma VARCHAR(4),
dropoff_nyct2010_gid INT,
dropoff_ctlabel VARCHAR(10),
dropoff_borocode INT,
dropoff_boroname VARCHAR(13),
dropoff_ct2010 VARCHAR(6),
dropoff_boroct2010 VARCHAR(7),
dropoff_cdeligibil VARCHAR(1),
dropoff_ntacode VARCHAR(4),
dropoff_ntaname VARCHAR(56),
dropoff_puma VARCHAR(4)
) SERVER gm_fdw_server OPTIONS (max_size '38000000', index '24')
DISTRIBUTE BY HASH (trip_id);
The above sets the maximum row count per GPU device at 38 million and creates an index on the cab_type column (the index value of 24 appears to be the zero-based offset of that column in the table definition). The cab_type column has very low cardinality so the index will be less beneficial than it would be on a column with a higher cardinality of values.
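If that inference about the index option is right, indexing a higher-cardinality column such as pickup_datetime would presumably just mean pointing the zero-based offset at it instead, along these lines (my assumption, unverified against BrytlytDB's documentation):
...
) SERVER gm_fdw_server OPTIONS (max_size '38000000', index '2')
DISTRIBUTE BY HASH (trip_id);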
Some of the data types used in the above table aren't the types I normally use in my benchmarks. The reason for this is that data type support is still limited in this early version of BrytlytDB. I've had to substitute TIMESTAMP with DATE, which truncates the timestamps in the dataset to just the date rather than the full date and time. There is no SMALLINT support yet so I've had to use the larger INT type as a replacement for those fields, and DECIMAL types aren't yet supported so I'll be using DOUBLE PRECISION in their place.
The above replacements could speed up or slow down the queries I benchmark with so I'm hoping to do another benchmark when data type support is widened in the future.
With the table created I'll launch 32 simultaneous load jobs across the two EC2 instances to load the data into BrytlytDB's trips table.
This is the import script I ran on the first EC2 instance:
$ vi import.sh
count=0
for filename in /raidArray/s3/xa*.csv; do
/usr/local/x2/bin/psql \
-d postgres \
-p 5432 \
-h localhost \
-c "EXECUTE DIRECT ON ( d$count )
'SELECT load_data(''trips'',
''$filename'',
'',
'',
100000,
''0'',
'''',
''append'')'" 2>&1 &
(( count++ ))
done
And this is the load script for the second instance.
$ vi import.sh
count=16
for filename in /raidArray/s3/xa*.csv; do
/usr/local/x2/bin/psql \
-d postgres \
-p 5432 \
-h localhost \
-c "EXECUTE DIRECT ON ( d$count )
'SELECT load_data(''trips'',
''$filename'',
'',
'',
100000,
''0'',
'''',
''append'')'" 2>&1 &
(( count++ ))
done
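While the 32 load jobs run, progress can be kept an eye on from a separate shell using nothing more than standard tools: the data node directories grow as rows land and nvidia-smi reports the GPU memory filling up.
$ watch -n 60 'du -sh /raidArray/data/d*'
$ nvidia-smi --query-gpu=index,memory.used --format=csv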
The first EC2 instance loaded its half of the dataset in 1 hour, 12 minutes and 21 seconds. The second instance loaded its half in 1 hour, 9 minutes and 57 seconds.
After the data was loaded in I could see the PostgreSQL data directory filled up with reasonably evenly-sized data folders. Here's what they look like on the first EC2 instance:
$ du -hs /raidArray/data/*
36M c0
5.4G d0
5.4G d1
5.7G d10
5.7G d11
5.7G d12
5.6G d13
5.7G d14
5.6G d15
5.5G d2
5.7G d3
5.7G d4
5.7G d5
5.7G d6
5.7G d7
5.7G d8
5.7G d9
4.0M g0
I was then able to use PostgreSQL's CLI tool to make sure I could see the table and all 1.1 billion records.
$ /usr/local/x2/bin/psql -d postgres
\d
List of relations
Schema | Name | Type | Owner
--------+-------+---------------+--------
public | trips | foreign table | ubuntu
(1 row)
select count(*) from trips;
count
------------
1113653018
(1 row)
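Since the table is hash-distributed on trip_id the rows should be spread fairly evenly across the data nodes. EXECUTE DIRECT, which the load scripts already use, can ask each data node for its local count; a quick loop over the first instance's 16 nodes could look like this:
$ for i in {0..15}; do
      /usr/local/x2/bin/psql \
          -d postgres \
          -t \
          -c "EXECUTE DIRECT ON ( d$i ) 'SELECT count(*) FROM trips'"
  done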
Benchmarking BrytlytDB
The times quoted below are the lowest query times seen during a series of runs. As with all my benchmarks, I use the lowest query time as a way of indicating "top speed".
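The timings come from psql's \timing readout. For anyone who wants to script a series of runs rather than re-typing queries interactively, a loop along these lines does the job (just a sketch; the run count of five is arbitrary):
$ for run in {1..5}; do
      echo '\timing on
SELECT cab_type, count(*) FROM trips GROUP BY cab_type;' \
          | /usr/local/x2/bin/psql -d postgres \
          | grep 'Time:'
  done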
$ /usr/local/x2/bin/psql -d postgres
\timing on
The following completed in 0.762 seconds.
SELECT cab_type,
count(*)
FROM trips
GROUP BY cab_type;
The following completed in 2.472 seconds.
SELECT passenger_count,
avg(total_amount)
FROM trips
GROUP BY passenger_count;
The following completed in 4.131 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
count(*)
FROM trips
GROUP BY passenger_count,
pickup_year;
The following completed in 6.041 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
cast(trip_distance as int) AS distance,
count(*) AS the_count
FROM trips
GROUP BY passenger_count,
pickup_year,
distance
ORDER BY pickup_year,
the_count desc;
Given that all the features of PostgreSQL are still available, I'm blown away by how fast BrytlytDB is able to aggregate data. The cluster I used costs around $30 / hour to run, yet it outperformed other cloud-based and PostgreSQL-based data warehousing solutions I've benchmarked both in terms of wall clock time and in terms of cost per hour of running the cluster.
That being said, I know Nvidia's K80 chips use extremely fast memory and have thousands of compute cores, so I expect further optimisations from BrytlytDB will drive down these already fast query times.