Dot syntax and deep_fetch for Ruby hashes

Benchmarks and usability considerations

May 5, 2022 Ā· Felipe Vogel Ā·

Recently I heard about this convenient feature of Elixir maps:

To access atom keys, one may also use the map.key notation. Note that map.key will raise a KeyError if the map doesn’t contain the key :key, compared to map[:key], that would return nil.

Nice! This is something I’ve been wishing for in Ruby. In a current project I have a configuration hash that is passed around and used in a variety of objects. The hash is quite large and several levels deep, so my code abounds with chains of Hash#fetch such as config.fetch(:item).fetch(:template).fetch(:variants). Which, as you can imagine, makes some lines very long and not particularly readable šŸ˜’

Two notes on why I’ve been doing it this way:

So, if we stick with a raw hash, can we hack our way to a more concise alternative to those repeated fetches, but with the same safety net of a KeyError? Of course! This is Ruby, after all, where anything is possible. But whether it’s advisable… that’s the real question. In this post, we’ll look at two possible syntaxes along with their performance and usability implications:

Originally I set out to find a performant approach to dot syntax, but by the end I had changed my mind, for reasons that I’ll explain.

Baselines

First, here are benchmarks on standard syntax (mostly). For the benchmark code, see the end of this post.

  1. Bracket notation: hash[:a][:b]
  2. Dig: hash.dig(:a, :b)
  3. Chained fetch: hash.fetch(:a).fetch(:b)
  4. A shorter fetch alias: hash > :a > :b. Because why not.
                           user     system      total        real
1. brackets           :  0.003332   0.000000   0.003332 (  0.003332)
2. dig                :  0.002877   0.000823   0.003700 (  0.003704)
3. fetch              :  0.005040   0.000000   0.005040 (  0.005044)
4. fetch alias        :  0.005012   0.000000   0.005012 (  0.005012)

Notes:

Dot syntax

Here are a few approaches to dot notation for hashes or hash-like structures, benchmarked. Keep in mind that I measured only access (reading) performance, not initialization or writing.

  1. A Struct. (UPDATE: I also tried the new Data class in Ruby 3.2, but it provides the same speed of access as Struct, so I decided not to add it to the benchmarks.)
  2. Faux dot notation by flattening a hash and giving it composite keys, as in config[:"item.template.variants"]. I copied this approach from here, with the main difference that I use symbols as keys because they’re more performant than strings. Note that :"string" is similar to "string".to_sym but faster because a string is not created every time. Also, this approach uses brackets, but only because that hash’s bracket accessor (Hash#[]) is overridden to use fetch.
  3. An OpenStruct.
  4. Augmenting a single hash with methods corresponding to its keys.
  5. ActiveSupport::OrderedOptions.
  6. hash_dot gem. Also, my benchmark code is based on the benchmarks in hash_dot’s README.
  7. Hashie gem.
                           user     system      total        real
1. Struct             :  0.002540   0.000000   0.002540 (  0.002541)
2. flat composite keys:  0.008301   0.000000   0.008301 (  0.008314)
3. OpenStruct         :  0.009731   0.000000   0.009731 (  0.009772)
4. per-hash dot access:  0.015300   0.000000   0.015300 (  0.015304)
5. AS::OrderedOptions :  0.070637   0.000000   0.070637 (  0.070640)
6. hash_dot           :  0.163008   0.000000   0.163008 (  0.163076)
7. hashie             :  0.163450   0.000000   0.163450 (  0.163451)

Notes:

Dig with errors

Hash#dig looks nice: hash.dig(:item, :template, :variants). But again, the problem is that it defaults to nil for nonexistent keys. What if we could make a similar method that raises a KeyError instead?

This has actually been proposed as an addition to Ruby several times (1, 2, 3) with various names including deep_fetch, dig!, and dig_fetch. But the method seems unlikely to be added in the near future. So… let’s do it ourselves!

Here are a few different implementations, with benchmarks. There are also a couple of gems for it, dig_bang and deep_fetch, but I didn’t include them here because dig_bang uses reduce (#4 below) and deep_fetch uses recursion, which performs the same.

  1. dig on a hash that has had its defaults set such that it raises an error for nonexistent keys.
  2. Hash#case_deep_fetch, which raises a KeyError for nonexistent keys. It’s called case_deep_fetch because it’s implemented with a simple case statement.
  3. Hash#while_deep_fetch, similar but implemented with a while loop.
  4. Hash#reduce_deep_fetch. This is the ā€œmost Rubyā€ implementation.
                           user     system      total        real
1. dig, error defaults:  0.003750   0.000000   0.003750 (  0.003750)
2. case_deep_fetch    :  0.007850   0.000000   0.007850 (  0.007856)
3. while_deep_fetch   :  0.014849   0.000000   0.014849 (  0.014852)
4. reduce_deep_fetch  :  0.027889   0.000056   0.027945 (  0.027950)

Notes:

Moral of the story

In the middle of all this, I seriously considered giving up and just going back to fetch, because it’s the most performant and any other syntax risks making my code more cryptic to my future self. When I see config.fetch(:item) I know I’m dealing with a hash, unlike when I see config.item. But in the end I decided that config.deep_fetch(:item, :template) is pretty self-explanatory, and the better readability that it allows is worth the small pause that my future self might take at seeing non-standard Ruby. But it’s surprising that clarity (and not performance) is what made the decision a difficult one.

Which leads into the other surprising takeaway: in this case it wasn’t hard to custom-build a very performant solution for my project. So maybe I should try a DIY mindset more often, rather than immediately reaching for a gem (or ten).

In the end, maybe the real cost of my solution was in the absurd amount of time that I spent on all this benchmarking, hairsplitting, yak shaving, and bikeshedding. Enough! But I hope you’ve enjoyed my little adventure as much as I’m enjoying seeing it finished.

Appendix A: dig on a hash with error-raising defaults

So why is this a bad idea? It comes with no performance penalty because it’s just dig on a regular hash. What could go wrong?

The problem is that I have to modify my config hash in the beginning to give it defaults that raise a KeyError. Recall that I also had to modify the hash when I tried per-hash dot access (#3 in the benchmarks on dot syntax above). But this time I’m less comfortable with the modification, because this one can ā€œslip outā€ in less noticeable ways.

For example, if at some point in my code the config hash is operated on in a way that creates a derived hash (e.g. by calling map on it and using the result), that derived hash would be a fresh new hash without the KeyError defaults.

That new hash might get passed around, with me thinking it’s the original that has the special defaults. I might use dig on the hash, and dig would work as in any hash (without my trusty KeyError) without me ever knowing that anything was missing šŸ’€

So this approach is too fragile for my liking. Plus, my future self might wonder ā€œWhy did I use dig and not fetch?ā€ until future self re-discovers my hack.

Appendix B: Hash#to_struct

This is a Refinement, which you can use by adding using ToStruct at the top of a class or file where you want to use it.

# Converts a Hash to a Struct.
module ToStruct
  refine Hash do
    MEMOIZED_STRUCTS = {}

    def to_struct
      MEMOIZED_STRUCTS[keys] ||= Struct.new(*keys)
      struct_class = MEMOIZED_STRUCTS[keys]

      struct_values = transform_values { |v|
        if v.is_a?(Hash)
          v.to_struct
        elsif v.is_a?(Array) && v.all? { |el| el.is_a?(Hash) }
          v.map(&:to_struct)
        else
          v
        end
      }.values

      struct_class.new(*struct_values)
    end
  end
end

Appendix C: the benchmark code

require 'benchmark'
require 'ostruct'
require 'active_support/ordered_options'
require 'hash_dot'
require 'hashie'
require 'active_support/core_ext/object/blank'

#### SETUP

## FOR BASELINE BENCHMARKS

# regular hash
vanilla = { address: { category: { desc: "Urban" } } }

# fetch alias
class Hash
  alias_method :>, :fetch
end

## FOR DOT BENCHMARKS

# Struct
s_top = Struct.new(:address)
s_inner = Struct.new(:category)
s_innest = Struct.new(:desc)

struct = s_top.new(
  s_inner.new(
    s_innest.new(
      "Urban"
    )
  )
)

# a flattened hash with composite keys
# from https://snippets.aktagon.com/snippets/738-dot-notation-for-ruby-configuration-hash
def to_namespace_hash(object, prefix = nil)
  if object.is_a? Hash
    object.map { |key, value|
      if prefix
        to_namespace_hash value, "#{prefix}.#{key}".to_sym
      else
        to_namespace_hash value, "#{key}".to_sym
      end
    }.reduce(&:merge)
  else
    { prefix => object }
  end
end

flat = { address: { category: { desc: "Urban" } } }
flat = to_namespace_hash(flat)

def flat.[](key)
  fetch(key)
rescue KeyError
  possible_keys = keys
    .map { |x| x if x.match(/.*?#{key}.*?/i) }
    .delete_if(&:blank?).join("\n")

  raise KeyError, "Key '#{key}' not found. Did you mean one of:\n#{possible_keys}"
end

flat.freeze

# OpenStruct
ostruct = OpenStruct.new(
  address: OpenStruct.new(
    category: OpenStruct.new(
      desc: "Urban"
    )
  )
)

# per-hash dot access
def allow_dot_access(vanilla_hash)
  vanilla_hash.each do |key, value|
    vanilla_hash.define_singleton_method(key) do
      fetch(key)
    end

    allow_dot_access(value) if value.is_a?(Hash)
  end
end

my_dot = allow_dot_access({ address: { category: { desc: "Urban" } } }).freeze

# ActiveSupport::OrderedOptions
asoo = ActiveSupport::OrderedOptions.new
asoo.address = ActiveSupport::OrderedOptions.new
asoo.address.category = ActiveSupport::OrderedOptions.new
asoo.address.category.desc = "Urban"

# hash_dot gem
hash_dot = vanilla.to_dot

## FOR DEEP_FETCH BENCHMARKS

# with error defaults
def add_key_error_defaults(vanilla_hash)
  vanilla_hash.default_proc = -> (_hash, key) { raise KeyError, "key not found: :#{key}" }
  vanilla_hash.values.each do |value|
    if value.is_a? Hash
      add_key_error_defaults(value)
    end
  end
  vanilla_hash
end

errorful = add_key_error_defaults({ address: { category: { desc: "Urban" } } })

# deep_fetch implementations
class Hash
  # ewwwwwwwwwwwww
  def case_deep_fetch(key1, key2 = nil, key3 = nil, key4 = nil)
    if key4
      fetch(key1).fetch(key2).fetch(key3).fetch(key4)
    elsif key3
      fetch(key1).fetch(key2).fetch(key3)
    elsif key2
      fetch(key1).fetch(key2)
    else
      fetch(key1)
    end
  end

  def while_deep_fetch(*keys)
    hash = self
    while key = keys.shift
      hash = hash.fetch(key)
    end
    hash
  end

  def reduce_deep_fetch(*keys)
    keys.reduce(self) { |memo, key| memo.fetch(key) }
  end
end

#### BENCHMARKS

iterations = 50000

Benchmark.bm(8) do |bm|
  puts "BASELINES:"

  bm.report("1. brackets           :") do
    iterations.times do
      vanilla[:address][:category][:desc]
    end
  end

  bm.report("2. dig                :") do
    iterations.times do
      vanilla[:address][:category][:desc]
    end
  end

  bm.report("3. fetch              :") do
    iterations.times do
      vanilla.fetch(:address).fetch(:category).fetch(:desc)
    end
  end

  bm.report("4. fetch alias        :") do
    iterations.times do
      vanilla > :address > :category > :desc
    end
  end

  puts "DOT:"

  bm.report("1. Struct             :") do
    iterations.times do
      struct.address.category.desc
    end
  end

  bm.report("2. flat composite keys:") do
    iterations.times do
      flat["address.category.desc".to_sym]
    end
  end

  bm.report("3. OpenStruct         :") do
    iterations.times do
      ostruct.address.category.desc
    end
  end

  bm.report("4. per-hash dot access:") do
    iterations.times do
      my_dot.address.category.desc
    end
  end

  bm.report("5. AS::OrderedOptions :") do
    iterations.times do
      asoo.address.category.desc
    end
  end

  bm.report("6. hash_dot           :") do
    iterations.times do
      hash_dot.address.category.desc
    end
  end

  class Hash
    include Hashie::Extensions::MethodAccess
  end

  bm.report("7. hashie             :") do
    iterations.times do
      vanilla.address.category.desc
    end
  end

  puts "DEEP_FETCH:"

  bm.report("1. dig, error defaults:") do
    iterations.times do
      errorful.dig(:address, :category, :desc)
    end
  end

  bm.report("2. case_deep_fetch    :") do
    iterations.times do
      vanilla.case_deep_fetch(:address, :category, :desc)
    end
  end

  bm.report("3. while_deep_fetch   :") do
    iterations.times do
      vanilla.while_deep_fetch(:address, :category, :desc)
    end
  end

  bm.report("4. reduce_deep_fetch  :") do
    iterations.times do
      vanilla.reduce_deep_fetch(:address, :category, :desc)
    end
  end
end
šŸ‘‰ Newer: RVTWS šŸ‘ˆ Older: MacOS for PC users šŸš€ Back to top