In this article I will introduce a new command line search tool,
that combines the usability of
The Silver Searcher
ack clone) with the
raw performance of GNU grep.
ripgrep is fast, cross platform (with binaries
available for Linux, Mac and Windows) and written in
ripgrep is available on
We will attempt to do the impossible: a fair benchmark comparison between
several popular code search tools. Specifically, we will dive into a series of
25 benchmarks that substantiate the following claims:
- For both searching single files and huge directories of files, no other
tool obviously stands above
ripgrep in either performance or correctness.
ripgrep is the only tool with proper Unicode support that doesn’t make
you pay dearly for it.
- Tools that search many files at once are generally slower if they use
memory maps, not faster.
As someone who has worked on text search in Rust in their free time for the
last 2.5 years, and as the author of both
the underlying regular expression engine,
I will use this opportunity to provide detailed insights into the performance
of each code search tool. No benchmark will go unscrutinized!
Target audience: Some familiarity with Unicode, programming and some
experience with working on the command line.
NOTE: I’m hearing reports from some people that
rg isn’t as fast as I’ve
claimed on their data. I’d love to help explain what’s going on, but to do
that, I’ll need to be able to reproduce your results. If you
file an issue
with something I can reproduce, I’d be happy to try and explain it.
It turns out that finite state machines are useful for things other than
expressing computation. Finite state machines can also be used to compactly
represent ordered sets or maps of strings that can be searched very quickly.
In this article, I will teach you about finite state machines as a data
structure for representing ordered sets and maps. This includes introducing
an implementation written in Rust called the
It comes with
complete API documentation.
I will also show you how to build them using a simple command line tool.
Finally, I will discuss a few experiments culminating in indexing over
1,600,000,000 URLs (134 GB) from the
July 2015 Common Crawl Archive.
The technique presented in this article is also how
Lucene represents a part of its inverted
Along the way, we will talk about memory maps, automaton intersection with
regular expressions, fuzzy searching with Levenshtein distance and streaming
Target audience: Some familiarity with programming and fundamental data
structures. No experience with automata theory or Rust is required.
NOTE: This blog post has been merged into the Rust Book.
Like most programming languages, Rust encourages the programmer to handle
errors in a particular way. Generally speaking, error handling is divided into
two broad categories: exceptions and return values. Rust opts for return
In this article, I intend to provide a comprehensive treatment of how to deal
with errors in Rust. More than that, I will attempt to introduce error handling
one piece at a time so that you’ll come away with a solid working knowledge of
how everything fits together.
When done naively, error handling in Rust can be verbose and annoying. This
article will explore those stumbling blocks and demonstrate how to use the
standard library to make error handling concise and ergonomic.
Target audience: Those new to Rust that don’t know its error handling
idioms yet. Some familiarity with Rust is helpful. (This article makes heavy
use of some standard traits and some very light use of closures and macros.)
A few weeks ago, I set out to add regular expressions to the
distribution with an implementation and feature set heavily inspired by
Russ Cox’s RE2.
It was just recently added to the
This regex crate includes a syntax extension that compiles a regular expression
to native Rust code when a Rust program is compiled. This can be thought of
as “ahead of time” compilation or
something similar to compile time function
These special natively compiled regexes have the same exact API as regular
expressions compiled at runtime.
In this article, I will outline my implementation strategy—including code
samples on how to write a Rust syntax extension—and describe how I was able
to achieve an identical API between regexes compiled at compile time and
regexes compiled at runtime.
Go’s only method of compile time safe polymorphism is structural subtyping, and
this article will do nothing to change that. Instead, I’m going to present a
ty with facilities to write type parametric functions in Go that
maintain run time type safety, while also being convenient for the
caller to use.
As a programmer and a fantasy football addict, I am embarassed by the means
through which we must expend ourselves to get data in a machine readable form.
This lack of open source software cripples the community with sub-standard
tools, and most importantly, detracts from some really cool and fun things that
could be done with easily available statistics. Many tools are either out-dated
or broken, or if they work, they are closed source and often cost money.
Yesterday I started work on a new library package that I hope will start to
improve this sorry state of affairs.
In sum, Archlinux is working beautifully. What follows is a rough run down of
my notes while installing, configuring, tuning and using Archlinux on the
Lenovo Thinkpad T430.
Go, by its very nature, is multithreaded. This makes a traditional approach of
daemonizing Go programs by forking a bit difficult.
To get around this, you could try something as simple as backgrounding your Go
program and instructing it to ignore the HUP
The X Go Binding (XGB) is a low level
library that provides an API to interact with running X servers. One can only
communicate with an X server by sending data over a network connection;
protocol requests, replies and errors need to be perfectly constructed down to
the last byte. Xlib did precisely this, and then some. As a result, Xlib became
huge and difficult to maintain.
In recent years, the XCB project made things a
bit more civilized by generating C code from XML
files describing the X client
protocol using Python. While the Python to generate said
code is no walk
in the park, it is typically preferred to the alternative: keeping the X core
protocol up to date along with any number of extensions that exist as well.
(There are other benefits to XCB, like easier asynchronicity, but that is
beyond the scope of this post.)
XGB proceeds in a similar vain; Python is used to generate Go
that provides an API to interact with the X protocol. Unlike its sister project
XCB, it is not thread safe. In particular, if X requests are made in parallel,
the best case scenario is a jumbled request or reply and the worst case
scenario is an X server crash. Parallel requests can be particularly useful
when images are being sent to the X server to be painted on the screen; other
useful work could be done in the interim.
Do you have dynamic workspaces in your window manager?
You might be wondering: what in the world are dynamic workspaces?
A dynamic workspace model allows one to add, remove or rename workspaces on the
Comparatively, in a typical window manager (or desktop environment)
configuration, you tell the window manager to have x number of workspaces.
When you start your window manager, you’ll have x workspaces, and you can
typically cycle between them using some variation of “next workspace” or
“previous workspace” commands.
The disadvantage with this model is that it’s difficult to have a large number
of workspaces—else you might forget which window is on each workspace.