Dot syntax and deep_fetch for Ruby hashes
Benchmarks and usability considerations
May 5, 2022 Ā· Felipe Vogel Ā·- Baselines
- Dot syntax
- Dig with errors
- Moral of the story
- Appendix A:
dig
on a hash with error-raising defaults - Appendix B:
Hash#to_struct
- Appendix C: the benchmark code
Recently I heard about this convenient feature of Elixir maps:
To access atom keys, one may also use the
map.key
notation. Note thatmap.key
will raise aKeyError
if themap
doesnāt contain the key:key
, compared tomap[:key]
, that would returnnil
.
Nice! This is something Iāve been wishing for in Ruby. In a current project I have a configuration hash that is passed around and used in a variety of objects. The hash is quite large and several levels deep, so my code abounds with chains of Hash#fetch
such as config.fetch(:item).fetch(:template).fetch(:variants)
. Which, as you can imagine, makes some lines very long and not particularly readable š
Two notes on why Iāve been doing it this way:
- The reason I use
fetch
instead ofconfig[:item][:template][:variants]
orconfig.dig(:item, :template, :variants)
is that theKeyError
raised byfetch
, in case of a missing key, is more helpful than the defaultnil
value from brackets ordig
. In fact, thatnil
could cause a major debugging headache if it results in an error somewhere else far from where thenil
originated. - If youāre wondering why Iām using a raw hash instead of a custom
Config
class with syntactic sugar such asconfig[:item, :template, :variants]
: that could be a great idea in some projects! But in this project, some objects use only a part of the config and I donāt want to pass the entire config into those objects. Also, some objects perform hash operations using parts of the config. So if Iām creating separateConfig
objects just to wrap inner hashes from the mainConfig
, and if Iām converting theseConfig
objects into a hash at various points, then it seems I should simply use a hash to begin with. In this project itās simpler to deal with hashes all the time, so that I donāt have to ask myself, āLetās see, is this aConfig
object here, or has it turned into a hash?ā
So, if we stick with a raw hash, can we hack our way to a more concise alternative to those repeated fetch
es, but with the same safety net of a KeyError
? Of course! This is Ruby, after all, where anything is possible. But whether itās advisableā¦ thatās the real question. In this post, weāll look at two possible syntaxes along with their performance and usability implications:
- Dot syntax:
config.item.template.variants
- Deep fetch:
config.deep_fetch(:item, :template, :variants)
Originally I set out to find a performant approach to dot syntax, but by the end I had changed my mind, for reasons that Iāll explain.
Baselines
First, here are benchmarks on standard syntax (mostly). For the benchmark code, see the end of this post.
- Bracket notation:
hash[:a][:b]
- Dig:
hash.dig(:a, :b)
- Chained
fetch
:hash.fetch(:a).fetch(:b)
- A shorter
fetch
alias:hash > :a > :b
. Because why not.
user system total real
1. brackets : 0.003332 0.000000 0.003332 ( 0.003332)
2. dig : 0.002877 0.000823 0.003700 ( 0.003704)
3. fetch : 0.005040 0.000000 0.005040 ( 0.005044)
4. fetch alias : 0.005012 0.000000 0.005012 ( 0.005012)
Notes:
- These are all very performant. But remember, brackets and
dig
returnnil
where I want aKeyError
, and chainedfetch
is what Iām trying to get away from. - In some runs,
dig
(#2) was faster than brackets (#1), but more often brackets win by a hair. - Chained
fetch
(#3) is consistently slower than brackets here in the benchmarks, but my projectās test suite does not run any faster when I replace all calls tofetch
with brackets. Itās a good reminder that benchmarks donāt always reflect real-life performance. - Even though the
fetch
alias (#4) is just as fast asfetch
itself in the benchmarks, my projectās test suite took 20% longer to run when I replaced all calls tofetch
with an alias. 20% slower is not much, especially since all of my tests run in well under one second. But thereās also the fact that whileconfig > ā¦ > ā¦
looks really cool, it is a bit cryptic (likely to confuse my future forgetful self), and I have to surround it with parentheses every time I want to call a method on the return value. Still, I was curious about the performance hit and thatās why I included thefetch
alias here.
Dot syntax
Here are a few approaches to dot notation for hashes or hash-like structures, benchmarked. Keep in mind that I measured only access (reading) performance, not initialization or writing.
- A Struct. (UPDATE: I also tried the new Data class in Ruby 3.2, but it provides the same speed of access as Struct, so I decided not to add it to the benchmarks.)
- Faux dot notation by flattening a hash and giving it composite keys, as in
config[:"item.template.variants"]
. I copied this approach from here, with the main difference that I use symbols as keys because theyāre more performant than strings. Note that:"string"
is similar to"string".to_sym
but faster because a string is not created every time. Also, this approach uses brackets, but only because that hashās bracket accessor (Hash#[]
) is overridden to usefetch
. - An OpenStruct.
- Augmenting a single hash with methods corresponding to its keys.
- ActiveSupport::OrderedOptions.
- hash_dot gem. Also, my benchmark code is based on the benchmarks in hash_dotās README.
- Hashie gem.
user system total real
1. Struct : 0.002540 0.000000 0.002540 ( 0.002541)
2. flat composite keys: 0.008301 0.000000 0.008301 ( 0.008314)
3. OpenStruct : 0.009731 0.000000 0.009731 ( 0.009772)
4. per-hash dot access: 0.015300 0.000000 0.015300 ( 0.015304)
5. AS::OrderedOptions : 0.070637 0.000000 0.070637 ( 0.070640)
6. hash_dot : 0.163008 0.000000 0.163008 ( 0.163076)
7. hashie : 0.163450 0.000000 0.163450 ( 0.163451)
Notes:
- The Struct is š„fastš„ the OpenStruct is not bad either. These werenāt an option for my config hash because it needs hash functionality, but elsewhere I found a use for Structs built from hashes, and along the way I wrote a
Hash#to_struct
method. Iāve pasted it in an appendix below. Structs are a great option for dot access wherever your data doesnāt need to remain in hash form. - The flattened hash with composite keys (#2) is also fast, but it too wouldnāt work for my present purposes because itās too far from the vanilla nested hash that I began with.
- Per-hash dot access (#4) is the fastest dot notation for a plain hash. (Note that it only works for a hash that doesnāt get any new keys once itās set up, which is just fine for my config hash.) However, when applied in my project, it still made my tests run for 70% longer. Again, thatās not as bad as it sounds for my sub-1-second test suite.
- But then something unexpected happened as soon as I replaced my projectās calls to
fetch
with dot notation. My code looked more messy even though it was now more concise. The reason, I think, is that there was no longer a slew of (syntax-highlighted) symbols at the points where I access the config hash, and so it was a bit harder to see at a glance where config values were being used. Instead of brightly-colored symbols evenly spaced byfetch
, my eyes now saw only a mush of method calls until my brain processed the words and told me whether thatās a place where the config hash is accessed. Hmm. Now I was wondering if there was a way to keep the symbols involved, but in a more concise way than chainingfetch
š¤
Dig with errors
Hash#dig
looks nice: hash.dig(:item, :template, :variants)
. But again, the problem is that it defaults to nil
for nonexistent keys. What if we could make a similar method that raises a KeyError
instead?
This has actually been proposed as an addition to Ruby several times (1, 2, 3) with various names including deep_fetch
, dig!
, and dig_fetch
. But the method seems unlikely to be added in the near future. Soā¦ letās do it ourselves!
Here are a few different implementations, with benchmarks. There are also a couple of gems for it, dig_bang and deep_fetch, but I didnāt include them here because dig_bang
uses reduce
(#4 below) and deep_fetch
uses recursion, which performs the same.
dig
on a hash that has had its defaults set such that it raises an error for nonexistent keys.Hash#case_deep_fetch
, which raises aKeyError
for nonexistent keys. Itās calledcase_deep_fetch
because itās implemented with a simple case statement.Hash#while_deep_fetch
, similar but implemented with awhile
loop.Hash#reduce_deep_fetch
. This is the āmost Rubyā implementation.
user system total real
1. dig, error defaults: 0.003750 0.000000 0.003750 ( 0.003750)
2. case_deep_fetch : 0.007850 0.000000 0.007850 ( 0.007856)
3. while_deep_fetch : 0.014849 0.000000 0.014849 ( 0.014852)
4. reduce_deep_fetch : 0.027889 0.000056 0.027945 ( 0.027950)
Notes:
reduce_deep_fetch
(#4) is the most idiomatic and flexible implementation, so itās what I chose in the end. In my case, there was no noticeable performance difference between this and the other two. For the difference to be noticeable, my Ruby app would have to be doing millions of deep fetches.while_deep_fetch
(#3) is for you if you want tosell your soultrade idiomatic Ruby for a bit of extra benchmark speed.case_deep_fetch
(#2) throws aesthetics and flexibility completely out the window because itās implemented with a case statement, and it can only dig as deep as the case statement is tall.
Moral of the story
In the middle of all this, I seriously considered giving up and just going back to fetch
, because itās the most performant and any other syntax risks making my code more cryptic to my future self. When I see config.fetch(:item)
I know Iām dealing with a hash, unlike when I see config.item
. But in the end I decided that config.deep_fetch(:item, :template)
is pretty self-explanatory, and the better readability that it allows is worth the small pause that my future self might take at seeing non-standard Ruby. But itās surprising that clarity (and not performance) is what made the decision a difficult one.
Which leads into the other surprising takeaway: in this case it wasnāt hard to custom-build a very performant solution for my project. So maybe I should try a DIY mindset more often, rather than immediately reaching for a gem (or ten).
In the end, maybe the real cost of my solution was in the absurd amount of time that I spent on all this benchmarking, hairsplitting, yak shaving, and bikeshedding. Enough! But I hope youāve enjoyed my little adventure as much as Iām enjoying seeing it finished.
Appendix A: dig
on a hash with error-raising defaults
So why is this a bad idea? It comes with no performance penalty because itās just dig
on a regular hash. What could go wrong?
The problem is that I have to modify my config hash in the beginning to give it defaults that raise a KeyError
. Recall that I also had to modify the hash when I tried per-hash dot access (#3 in the benchmarks on dot syntax above). But this time Iām less comfortable with the modification, because this one can āslip outā in less noticeable ways.
For example, if at some point in my code the config hash is operated on in a way that creates a derived hash (e.g. by calling map
on it and using the result), that derived hash would be a fresh new hash without the KeyError
defaults.
That new hash might get passed around, with me thinking itās the original that has the special defaults. I might use dig
on the hash, and dig
would work as in any hash (without my trusty KeyError
) without me ever knowing that anything was missing š
So this approach is too fragile for my liking. Plus, my future self might wonder āWhy did I use dig
and not fetch?ā until future self re-discovers my hack.
Appendix B: Hash#to_struct
This is a Refinement, which you can use by adding using ToStruct
at the top of a class or file where you want to use it.
# Converts a Hash to a Struct.
module ToStruct
refine Hash do
MEMOIZED_STRUCTS = {}
def to_struct
MEMOIZED_STRUCTS[keys] ||= Struct.new(*keys)
struct_class = MEMOIZED_STRUCTS[keys]
struct_values = transform_values { |v|
if v.is_a?(Hash)
v.to_struct
elsif v.is_a?(Array) && v.all? { |el| el.is_a?(Hash) }
v.map(&:to_struct)
else
v
end
}.values
struct_class.new(*struct_values)
end
end
end
Appendix C: the benchmark code
require 'benchmark'
require 'ostruct'
require 'active_support/ordered_options'
require 'hash_dot'
require 'hashie'
require 'active_support/core_ext/object/blank'
#### SETUP
## FOR BASELINE BENCHMARKS
# regular hash
vanilla = { address: { category: { desc: "Urban" } } }
# fetch alias
class Hash
alias_method :>, :fetch
end
## FOR DOT BENCHMARKS
# Struct
s_top = Struct.new(:address)
s_inner = Struct.new(:category)
s_innest = Struct.new(:desc)
struct = s_top.new(
s_inner.new(
s_innest.new(
"Urban"
)
)
)
# a flattened hash with composite keys
# from https://snippets.aktagon.com/snippets/738-dot-notation-for-ruby-configuration-hash
def to_namespace_hash(object, prefix = nil)
if object.is_a? Hash
object.map { |key, value|
if prefix
to_namespace_hash value, "#{prefix}.#{key}".to_sym
else
to_namespace_hash value, "#{key}".to_sym
end
}.reduce(&:merge)
else
{ prefix => object }
end
end
flat = { address: { category: { desc: "Urban" } } }
flat = to_namespace_hash(flat)
def flat.[](key)
fetch(key)
rescue KeyError
possible_keys = keys
.map { |x| x if x.match(/.*?#{key}.*?/i) }
.delete_if(&:blank?).join("\n")
raise KeyError, "Key '#{key}' not found. Did you mean one of:\n#{possible_keys}"
end
flat.freeze
# OpenStruct
ostruct = OpenStruct.new(
address: OpenStruct.new(
category: OpenStruct.new(
desc: "Urban"
)
)
)
# per-hash dot access
def allow_dot_access(vanilla_hash)
vanilla_hash.each do |key, value|
vanilla_hash.define_singleton_method(key) do
fetch(key)
end
allow_dot_access(value) if value.is_a?(Hash)
end
end
my_dot = allow_dot_access({ address: { category: { desc: "Urban" } } }).freeze
# ActiveSupport::OrderedOptions
asoo = ActiveSupport::OrderedOptions.new
asoo.address = ActiveSupport::OrderedOptions.new
asoo.address.category = ActiveSupport::OrderedOptions.new
asoo.address.category.desc = "Urban"
# hash_dot gem
hash_dot = vanilla.to_dot
## FOR DEEP_FETCH BENCHMARKS
# with error defaults
def add_key_error_defaults(vanilla_hash)
vanilla_hash.default_proc = -> (_hash, key) { raise KeyError, "key not found: :#{key}" }
vanilla_hash.values.each do |value|
if value.is_a? Hash
add_key_error_defaults(value)
end
end
vanilla_hash
end
errorful = add_key_error_defaults({ address: { category: { desc: "Urban" } } })
# deep_fetch implementations
class Hash
# ewwwwwwwwwwwww
def case_deep_fetch(key1, key2 = nil, key3 = nil, key4 = nil)
if key4
fetch(key1).fetch(key2).fetch(key3).fetch(key4)
elsif key3
fetch(key1).fetch(key2).fetch(key3)
elsif key2
fetch(key1).fetch(key2)
else
fetch(key1)
end
end
def while_deep_fetch(*keys)
hash = self
while key = keys.shift
hash = hash.fetch(key)
end
hash
end
def reduce_deep_fetch(*keys)
keys.reduce(self) { |memo, key| memo.fetch(key) }
end
end
#### BENCHMARKS
iterations = 50000
Benchmark.bm(8) do |bm|
puts "BASELINES:"
bm.report("1. brackets :") do
iterations.times do
vanilla[:address][:category][:desc]
end
end
bm.report("2. dig :") do
iterations.times do
vanilla[:address][:category][:desc]
end
end
bm.report("3. fetch :") do
iterations.times do
vanilla.fetch(:address).fetch(:category).fetch(:desc)
end
end
bm.report("4. fetch alias :") do
iterations.times do
vanilla > :address > :category > :desc
end
end
puts "DOT:"
bm.report("1. Struct :") do
iterations.times do
struct.address.category.desc
end
end
bm.report("2. flat composite keys:") do
iterations.times do
flat["address.category.desc".to_sym]
end
end
bm.report("3. OpenStruct :") do
iterations.times do
ostruct.address.category.desc
end
end
bm.report("4. per-hash dot access:") do
iterations.times do
my_dot.address.category.desc
end
end
bm.report("5. AS::OrderedOptions :") do
iterations.times do
asoo.address.category.desc
end
end
bm.report("6. hash_dot :") do
iterations.times do
hash_dot.address.category.desc
end
end
class Hash
include Hashie::Extensions::MethodAccess
end
bm.report("7. hashie :") do
iterations.times do
vanilla.address.category.desc
end
end
puts "DEEP_FETCH:"
bm.report("1. dig, error defaults:") do
iterations.times do
errorful.dig(:address, :category, :desc)
end
end
bm.report("2. case_deep_fetch :") do
iterations.times do
vanilla.case_deep_fetch(:address, :category, :desc)
end
end
bm.report("3. while_deep_fetch :") do
iterations.times do
vanilla.while_deep_fetch(:address, :category, :desc)
end
end
bm.report("4. reduce_deep_fetch :") do
iterations.times do
vanilla.reduce_deep_fetch(:address, :category, :desc)
end
end
end