I examine the performance of ClickHouse on Intel's Core i9-14900K against my 1.1B taxi rides benchmark.
Category: Databases
I examine the performance of DuckDB against my 1.1B taxi rides benchmark.
I investigate how fast DoubleCloud can query 1.1 billion taxi journeys using their managed ClickHouse solution.
I look at the latest way to get ClickHouse running quickly.
I examine the performance of Hydrolix against my 1.1B taxi rides benchmark.
I investigate how fast OmniSciDB can query 1.1 billion taxi journeys using a 16" MacBook Pro.
I compare import times of various formats into ClickHouse.
I analyse material recently published on Google's "Procella" query processing engine which powers YouTube.
I analyse and debate arguments surrounding the "demise" of Hadoop.
I look for faster ways of transferring files between HDFS and AWS S3.
I take a look at Apache Flume and walk through an example using it to connect Kafka to HDFS.
I take a short look at FoundationDB and walk through a leaderboard example using Python.
I investigate how fast ClickHouse 18.16.1 can query 1.1 billion taxi journeys on a 3-node, 108-core AWS EC2 cluster.
I compare the ORC file construction times of Spark 2.4.0, Hive 2.3.4 and Presto 0.214.
I investigate how fast Spark and Presto can query 1.1 billion taxi journeys using a 21-node EMR cluster.
I explore several HDFS interfaces and compare them to the JVM-based Apache Hadoop HDFS CLI.
This tutorial covers importing CSV data into SQL Server 2017, automating data pipeline tasks via Apache Airflow and visualising data using Pandas and Jupyter Notebooks.
I investigate how fast SQLite can query 1.1 billion taxi journeys from Parquet files off of HDFS.
A guide to connecting to five different data stores using Presto.
I investigate how fast Spark and Presto can query 1.1 billion taxi journeys using an i3.8xlarge EC2 instance with 1.7 TB of NVMe storage versus a 21-node EMR cluster.
A simple Hadoop 3 installation guide for Ubuntu 16 that includes Hive, Spark and Presto.
I investigate how fast BrytlytDB 2.1 can query 1.1 billion taxi journeys using five IBM Minsky servers with 20 Nvidia P100 GPUs.
I investigate how fast BrytlytDB 2.0 can query 1.1 billion taxi journeys using two p2.16xlarge AWS EC2 instances.
This tutorial covers importing CSV data into SQLite 3, manipulating data via Python and visualising data using Pandas and Jupyter Notebooks.
I investigate how fast Spark 2.2 can query 1.1 billion taxi journeys using a cluster of three Raspberry Pis.
I investigate how fast BrytlytDB can query 1.1 billion taxi journeys using two p2.16xlarge AWS EC2 instances.
In this tutorial I walk through building MapD from source on an Ubuntu 16.04.2 machine.
I investigate how fast MapD 3.0 can query 1.1 billion taxi journeys using two p2.8xlarge AWS EC2 instances.
I demonstrate how to extract analytical data from petabytes worth of websites collected by Common Crawl.
I investigate how fast ClickHouse can query 1.1 billion taxi journeys on an Intel Core i5 4670K.
I investigate how fast Vertica Community Edition 8.0.1 can query 1.1 billion taxi journeys on an Intel Core i5 4670K.
I investigate how fast an 11-node Spark 2.1.0 cluster can query over a billion records.
I investigate how fast kdb+/q can query 1.1 billion taxi journeys on 4 Intel Xeon Phi 7210 CPUs.
I investigate how fast Amazon Athena can query 1.1 billion taxi journeys.
I walk through installing Alenka, loading data into it and querying it.
I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Pascal-based Titan X cards.
I investigate how fast MapD can query 1.1 billion taxi journeys using 4 g2.8xlarge EC2 instances.
I investigate how fast MapD can query 1.1 billion taxi journeys using 4 Nvidia Titan X cards.
I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Tesla K80 GPU cards.
I investigate how fast a series of graphs can be generated using R across 4 different types of AWS RDS instances.
I investigate how fast a 6-node ds2.8xlarge Redshift Cluster can query over a billion records.
I investigate how fast a single Redshift ds2.xlarge instance can query over a billion records.
I look at ways of fitting every column of the 1.1 billion taxi rides into Elasticsearch on a single, 850 GB SSD.
I investigate how fast a 50-node Dataproc cluster queries the metadata of 1.1 billion taxi trips.
I investigate the performance impact of ORC file sizes on Presto query times using Google Cloud's Dataproc service.
I look at speeding up Presto queries on 1.1 billion records run on a 10-node Dataproc cluster.
I investigate the speed differences between S3 and HDFS when querying over a billion records using Presto on AWS EMR.
I investigate how fast a small Dataproc cluster can query over a billion records using Presto.
I investigate how fast a 50-node AWS EMR cluster can query over a billion records using Presto.
I investigate how fast BigQuery can query the metadata of 1.1 billion NYC taxi journeys.
I look at query speeds on 1.1 billion records on a single PostgreSQL installation running on an SSD.
I investigate how fast a single instance of Elasticsearch can query over a billion records.
I investigate how fast a small AWS EMR cluster can query over a billion records using Spark.
I investigate how fast a small AWS EMR cluster can query over a billion records using Presto.
I look at the relationship between topic counts and producer latency with Kafka.
Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into ORC-formatted, columnar-based files on HDFS and query them using Hive & Presto.
Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into a columnar-based Data Warehouse.
Using Airpal to execute queries on Parquet-formatted data via Presto.
Parallel imports of CSV data from AWS S3 into Redshift.
I explore three ways to get Hadoop installed and running.
GAE strips HTTP body payloads if sent via HTTP GET. Elasticsearch accepts request bodies sent via HTTP GET. Rewriting the HTTP verb fixes the communications problem.
Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.