Dot syntax and deep_fetch for Ruby hashes

Benchmarks and usability considerations

May 5, 2022 Ā· Felipe Vogel Ā·

Recently I heard about this convenient feature of Elixir maps:

To access atom keys, one may also use the map.key notation. Note that map.key will raise a KeyError if the map doesnā€™t contain the key :key, compared to map[:key], that would return nil.

Nice! This is something Iā€™ve been wishing for in Ruby. In a current project I have a configuration hash that is passed around and used in a variety of objects. The hash is quite large and several levels deep, so my code abounds with chains of Hash#fetch such as config.fetch(:item).fetch(:template).fetch(:variants). Which, as you can imagine, makes some lines very long and not particularly readable šŸ˜’

Two notes on why Iā€™ve been doing it this way:

So, if we stick with a raw hash, can we hack our way to a more concise alternative to those repeated fetches, but with the same safety net of a KeyError? Of course! This is Ruby, after all, where anything is possible. But whether itā€™s advisableā€¦ thatā€™s the real question. In this post, weā€™ll look at two possible syntaxes along with their performance and usability implications:

Originally I set out to find a performant approach to dot syntax, but by the end I had changed my mind, for reasons that Iā€™ll explain.

Baselines

First, here are benchmarks on standard syntax (mostly). For the benchmark code, see the end of this post.

  1. Bracket notation: hash[:a][:b]
  2. Dig: hash.dig(:a, :b)
  3. Chained fetch: hash.fetch(:a).fetch(:b)
  4. A shorter fetch alias: hash > :a > :b. Because why not.
                           user     system      total        real
1. brackets           :  0.003332   0.000000   0.003332 (  0.003332)
2. dig                :  0.002877   0.000823   0.003700 (  0.003704)
3. fetch              :  0.005040   0.000000   0.005040 (  0.005044)
4. fetch alias        :  0.005012   0.000000   0.005012 (  0.005012)

Notes:

Dot syntax

Here are a few approaches to dot notation for hashes or hash-like structures, benchmarked. Keep in mind that I measured only access (reading) performance, not initialization or writing.

  1. A Struct. (UPDATE: I also tried the new Data class in Ruby 3.2, but it provides the same speed of access as Struct, so I decided not to add it to the benchmarks.)
  2. Faux dot notation by flattening a hash and giving it composite keys, as in config[:"item.template.variants"]. I copied this approach from here, with the main difference that I use symbols as keys because theyā€™re more performant than strings. Note that :"string" is similar to "string".to_sym but faster because a string is not created every time. Also, this approach uses brackets, but only because that hashā€™s bracket accessor (Hash#[]) is overridden to use fetch.
  3. An OpenStruct.
  4. Augmenting a single hash with methods corresponding to its keys.
  5. ActiveSupport::OrderedOptions.
  6. hash_dot gem. Also, my benchmark code is based on the benchmarks in hash_dotā€™s README.
  7. Hashie gem.
                           user     system      total        real
1. Struct             :  0.002540   0.000000   0.002540 (  0.002541)
2. flat composite keys:  0.008301   0.000000   0.008301 (  0.008314)
3. OpenStruct         :  0.009731   0.000000   0.009731 (  0.009772)
4. per-hash dot access:  0.015300   0.000000   0.015300 (  0.015304)
5. AS::OrderedOptions :  0.070637   0.000000   0.070637 (  0.070640)
6. hash_dot           :  0.163008   0.000000   0.163008 (  0.163076)
7. hashie             :  0.163450   0.000000   0.163450 (  0.163451)

Notes:

Dig with errors

Hash#dig looks nice: hash.dig(:item, :template, :variants). But again, the problem is that it defaults to nil for nonexistent keys. What if we could make a similar method that raises a KeyError instead?

This has actually been proposed as an addition to Ruby several times (1, 2, 3) with various names including deep_fetch, dig!, and dig_fetch. But the method seems unlikely to be added in the near future. Soā€¦ letā€™s do it ourselves!

Here are a few different implementations, with benchmarks. There are also a couple of gems for it, dig_bang and deep_fetch, but I didnā€™t include them here because dig_bang uses reduce (#4 below) and deep_fetch uses recursion, which performs the same.

  1. dig on a hash that has had its defaults set such that it raises an error for nonexistent keys.
  2. Hash#case_deep_fetch, which raises a KeyError for nonexistent keys. Itā€™s called case_deep_fetch because itā€™s implemented with a simple case statement.
  3. Hash#while_deep_fetch, similar but implemented with a while loop.
  4. Hash#reduce_deep_fetch. This is the ā€œmost Rubyā€ implementation.
                           user     system      total        real
1. dig, error defaults:  0.003750   0.000000   0.003750 (  0.003750)
2. case_deep_fetch    :  0.007850   0.000000   0.007850 (  0.007856)
3. while_deep_fetch   :  0.014849   0.000000   0.014849 (  0.014852)
4. reduce_deep_fetch  :  0.027889   0.000056   0.027945 (  0.027950)

Notes:

Moral of the story

In the middle of all this, I seriously considered giving up and just going back to fetch, because itā€™s the most performant and any other syntax risks making my code more cryptic to my future self. When I see config.fetch(:item) I know Iā€™m dealing with a hash, unlike when I see config.item. But in the end I decided that config.deep_fetch(:item, :template) is pretty self-explanatory, and the better readability that it allows is worth the small pause that my future self might take at seeing non-standard Ruby. But itā€™s surprising that clarity (and not performance) is what made the decision a difficult one.

Which leads into the other surprising takeaway: in this case it wasnā€™t hard to custom-build a very performant solution for my project. So maybe I should try a DIY mindset more often, rather than immediately reaching for a gem (or ten).

In the end, maybe the real cost of my solution was in the absurd amount of time that I spent on all this benchmarking, hairsplitting, yak shaving, and bikeshedding. Enough! But I hope youā€™ve enjoyed my little adventure as much as Iā€™m enjoying seeing it finished.

Appendix A: dig on a hash with error-raising defaults

So why is this a bad idea? It comes with no performance penalty because itā€™s just dig on a regular hash. What could go wrong?

The problem is that I have to modify my config hash in the beginning to give it defaults that raise a KeyError. Recall that I also had to modify the hash when I tried per-hash dot access (#3 in the benchmarks on dot syntax above). But this time Iā€™m less comfortable with the modification, because this one can ā€œslip outā€ in less noticeable ways.

For example, if at some point in my code the config hash is operated on in a way that creates a derived hash (e.g. by calling map on it and using the result), that derived hash would be a fresh new hash without the KeyError defaults.

That new hash might get passed around, with me thinking itā€™s the original that has the special defaults. I might use dig on the hash, and dig would work as in any hash (without my trusty KeyError) without me ever knowing that anything was missing šŸ’€

So this approach is too fragile for my liking. Plus, my future self might wonder ā€œWhy did I use dig and not fetch?ā€ until future self re-discovers my hack.

Appendix B: Hash#to_struct

This is a Refinement, which you can use by adding using ToStruct at the top of a class or file where you want to use it.

# Converts a Hash to a Struct.
module ToStruct
  refine Hash do
    MEMOIZED_STRUCTS = {}

    def to_struct
      MEMOIZED_STRUCTS[keys] ||= Struct.new(*keys)
      struct_class = MEMOIZED_STRUCTS[keys]

      struct_values = transform_values { |v|
        if v.is_a?(Hash)
          v.to_struct
        elsif v.is_a?(Array) && v.all? { |el| el.is_a?(Hash) }
          v.map(&:to_struct)
        else
          v
        end
      }.values

      struct_class.new(*struct_values)
    end
  end
end

Appendix C: the benchmark code

require 'benchmark'
require 'ostruct'
require 'active_support/ordered_options'
require 'hash_dot'
require 'hashie'
require 'active_support/core_ext/object/blank'

#### SETUP

## FOR BASELINE BENCHMARKS

# regular hash
vanilla = { address: { category: { desc: "Urban" } } }

# fetch alias
class Hash
  alias_method :>, :fetch
end

## FOR DOT BENCHMARKS

# Struct
s_top = Struct.new(:address)
s_inner = Struct.new(:category)
s_innest = Struct.new(:desc)

struct = s_top.new(
  s_inner.new(
    s_innest.new(
      "Urban"
    )
  )
)

# a flattened hash with composite keys
# from https://snippets.aktagon.com/snippets/738-dot-notation-for-ruby-configuration-hash
def to_namespace_hash(object, prefix = nil)
  if object.is_a? Hash
    object.map { |key, value|
      if prefix
        to_namespace_hash value, "#{prefix}.#{key}".to_sym
      else
        to_namespace_hash value, "#{key}".to_sym
      end
    }.reduce(&:merge)
  else
    { prefix => object }
  end
end

flat = { address: { category: { desc: "Urban" } } }
flat = to_namespace_hash(flat)

def flat.[](key)
  fetch(key)
rescue KeyError
  possible_keys = keys
    .map { |x| x if x.match(/.*?#{key}.*?/i) }
    .delete_if(&:blank?).join("\n")

  raise KeyError, "Key '#{key}' not found. Did you mean one of:\n#{possible_keys}"
end

flat.freeze

# OpenStruct
ostruct = OpenStruct.new(
  address: OpenStruct.new(
    category: OpenStruct.new(
      desc: "Urban"
    )
  )
)

# per-hash dot access
def allow_dot_access(vanilla_hash)
  vanilla_hash.each do |key, value|
    vanilla_hash.define_singleton_method(key) do
      fetch(key)
    end

    allow_dot_access(value) if value.is_a?(Hash)
  end
end

my_dot = allow_dot_access({ address: { category: { desc: "Urban" } } }).freeze

# ActiveSupport::OrderedOptions
asoo = ActiveSupport::OrderedOptions.new
asoo.address = ActiveSupport::OrderedOptions.new
asoo.address.category = ActiveSupport::OrderedOptions.new
asoo.address.category.desc = "Urban"

# hash_dot gem
hash_dot = vanilla.to_dot

## FOR DEEP_FETCH BENCHMARKS

# with error defaults
def add_key_error_defaults(vanilla_hash)
  vanilla_hash.default_proc = -> (_hash, key) { raise KeyError, "key not found: :#{key}" }
  vanilla_hash.values.each do |value|
    if value.is_a? Hash
      add_key_error_defaults(value)
    end
  end
  vanilla_hash
end

errorful = add_key_error_defaults({ address: { category: { desc: "Urban" } } })

# deep_fetch implementations
class Hash
  # ewwwwwwwwwwwww
  def case_deep_fetch(key1, key2 = nil, key3 = nil, key4 = nil)
    if key4
      fetch(key1).fetch(key2).fetch(key3).fetch(key4)
    elsif key3
      fetch(key1).fetch(key2).fetch(key3)
    elsif key2
      fetch(key1).fetch(key2)
    else
      fetch(key1)
    end
  end

  def while_deep_fetch(*keys)
    hash = self
    while key = keys.shift
      hash = hash.fetch(key)
    end
    hash
  end

  def reduce_deep_fetch(*keys)
    keys.reduce(self) { |memo, key| memo.fetch(key) }
  end
end

#### BENCHMARKS

iterations = 50000

Benchmark.bm(8) do |bm|
  puts "BASELINES:"

  bm.report("1. brackets           :") do
    iterations.times do
      vanilla[:address][:category][:desc]
    end
  end

  bm.report("2. dig                :") do
    iterations.times do
      vanilla[:address][:category][:desc]
    end
  end

  bm.report("3. fetch              :") do
    iterations.times do
      vanilla.fetch(:address).fetch(:category).fetch(:desc)
    end
  end

  bm.report("4. fetch alias        :") do
    iterations.times do
      vanilla > :address > :category > :desc
    end
  end

  puts "DOT:"

  bm.report("1. Struct             :") do
    iterations.times do
      struct.address.category.desc
    end
  end

  bm.report("2. flat composite keys:") do
    iterations.times do
      flat["address.category.desc".to_sym]
    end
  end

  bm.report("3. OpenStruct         :") do
    iterations.times do
      ostruct.address.category.desc
    end
  end

  bm.report("4. per-hash dot access:") do
    iterations.times do
      my_dot.address.category.desc
    end
  end

  bm.report("5. AS::OrderedOptions :") do
    iterations.times do
      asoo.address.category.desc
    end
  end

  bm.report("6. hash_dot           :") do
    iterations.times do
      hash_dot.address.category.desc
    end
  end

  class Hash
    include Hashie::Extensions::MethodAccess
  end

  bm.report("7. hashie             :") do
    iterations.times do
      vanilla.address.category.desc
    end
  end

  puts "DEEP_FETCH:"

  bm.report("1. dig, error defaults:") do
    iterations.times do
      errorful.dig(:address, :category, :desc)
    end
  end

  bm.report("2. case_deep_fetch    :") do
    iterations.times do
      vanilla.case_deep_fetch(:address, :category, :desc)
    end
  end

  bm.report("3. while_deep_fetch   :") do
    iterations.times do
      vanilla.while_deep_fetch(:address, :category, :desc)
    end
  end

  bm.report("4. reduce_deep_fetch  :") do
    iterations.times do
      vanilla.reduce_deep_fetch(:address, :category, :desc)
    end
  end
end
šŸ‘‰ Newer: RVTWS šŸ‘ˆ Older: MacOS for PC users šŸš€ Back to top