Friday, October 26, 2012

When Should I Use Protected Method Visibility in Ruby?

When I'm reading Ruby code and I come across protected methods, I spend a moment taking a deeper look at the interface of the object defining them. Protected method visibility is targeted to a very specific and seemingly rare use-case in Ruby: methods defined as protected are only callable by other objects whose class is of the same defining class or its subclasses. The pickaxe book calls this “keeping it within the family.” The Ruby Programming Language describes protected as “the least commonly defined and also the most difficult to understand” of the method visibility types, so when I do see it I wonder what the author is trying to communicate. Is it a hint about the stability of the methods? Are the objects actually using protected access between instances? Did he or she want to encapsulate some behavior but use explicit self as a matter of style? Was this some unfortunate pattern promulgated by some random Ruby on Rails tutorial in 2007? Why would you reach for protected when the semantics of private seem sufficient to encapsulate an object's behavior?

Let's look at some of the common applications of protected method visibility. Some common patterns I've noticed are: attributes for comparison operations, mutator methods for immutable objects, fulfilling an abstract class' contract, and framework hooks.

Attributes for Comparison Operations

The most common employment of protected is applying it to attributes or methods that are necessary to compare two instances to each other without exposing that information in the object's public interface. Since operators on an object are actually method calls, we can override these methods and provide our own comparison logic for operations on an object. protected allows these objects to expose data needed in a comparison to each other but continue to hide it from the rest of the system.

For example, let's look at a simple Collection object. This object is responsible for management of a collection of items. The Collection doesn't expose the items directly but rather defines an interface for interacting with the internal array.

class Collection
  def initialize(items=[])
    @items = items
  end
end

Two Collection instances are deemed to be equal if they hold the same number of elements and the elements are in the same order. To expose this operation we define a method == on Collection:

class Collection
  def initialize(items=[])
    @items = items
  end

  def ==(other)
    # TODO: Compare our items to the other collection's items
  end
end

In order to compare the arrays we'll need to add an items getter but continue to hide this data from external callers. If we add a getter without specifying access control, a caller could access the contents of the array directly but if we set this method private, nobody – including sibling objects – will be able to access the property. protected does exactly what we want.

class Collection
  attr_reader :items
  protected :items

  def initialize(items=[])
    @items = items
  end

  def ==(other)
    items == other.items
  end
end

Collection instances can now compare themselves to each other while still hiding their data from other callers. Collection objects will only respond to items for sibling Collection instances; calls from other objects will raise a NoMethodError.

Collection.new([1, 2, 3]) == Collection.new([1, 2, 3])
# => true

Collection.new([1, 2, 3]) == Collection.new([3, 2, 1])
# => false

>> Collection.new([1, 2, 3]).items
# NoMethodError: protected method `items' called for #<Collection:0x007fdaf30450a8 @items=[1, 2, 3]>

The usual caveat here is that in Ruby access control is “just a suggestion” and a user of an object can still reach in and access anything regardless of its visibility. For example:

Collection.new([1, 2, 3]).instance_variable_get(:@items)
# => [1, 2, 3]

Collection.new([1, 2, 3]).send(:items)
# => [1, 2, 3]

While this is true, there's still value in signaling your intentions to users of the object. Setting explicit access controls guides users to our defined interface and discourages fiddling with internals.

The Ruby standard library class OpenStruct uses this pattern. OpenStruct allows a user to set arbitrary attributes that can be accessed with dot notation.

require "ostruct"

book = OpenStruct.new
book.title  = "The Art of Fielding"
book.author = "Chad Harbach"

book.title
# => "The Art of Fielding

book.author
# => "Chad Harbach"

An OpenStruct is considered equal to another OpenStruct when they hold the same attributes. Under the hood OpenStruct stores these attributes in an internal hash table. It exposes this table as a protected method which allows other OpenStruct instances to determine equivalence. This is implemented similarly to the Collection example:

# lib/ostruct.rb:224 (ruby 1.9.3-p286)

attr_reader :table # :nodoc:
protected :table

#
# Compares this object and +other+ for equality.  An OpenStruct is equal >
# +other+ when +other+ is an OpenStruct and the two object's Hash tables >
# equal.
#
def ==(other)
  return false unless(other.kind_of?(OpenStruct))
  return @table == other.table
end

Mutator Methods for Immutable Objects

Another use of protected I found in the Ruby standard library is using protected methods to maintain the immutability of a value object. Let's say we've decided our Collection is a value object and should be immutable. There are new requirements that necessitate some operations that require Collection to change during runtime. Let's start with the first: the sum of two Collection instances is a new Collection which holds a superset of the summands' arrays.

class Collection
  attr_reader :items
  protected :items

  def initialize(items=[])
    @items = items
  end

  def ==(other)
    items == other.items
  end

  def +(other)
    # TODO: Add our items to the other collection's items
    # and return a new collection with the sum.
  end
end

Since we've marked items as protected we can reach in from one instance into another, grab these items, add them to our items and instantiate a new collection.

class Collection
  attr_reader :items
  protected :items

  def initialize(items=[])
    @items = items
  end

  def ==(other)
    items == other.items
  end

  def +(other)
    self.class.new(items + other.items)
  end
end

Collection.new([1]) + Collection.new([4])
# => #<Collection:0x007f809207e1c0 @items=[1, 4]>

This simple example is fairly similar to our last – it involves overriding an operator method and using privileged data from the sibling instance in the operations. While state in neither Collection changed, they were able to collaborate and return a new Collection with the desired items.

IPAddr is a class in the Ruby standard library which is a value object that represents an IPv4 or IPv6 address. Under the hood it makes extensive use of this pattern for manipulating the IP address it represent.

Given an IP address (say, 192.168.0.77) and a subnet mask (255.255.255.248), we can use bitwise operations to figure out the upper and lower boundaries of the network of which this host is a member. The lower boundary is referred to as the network address and the upper boundary as the broadcast address.

# For more detail around IP addressing basics see TCP/IP
# Illustrated (Second Edition) pp. 31-43

address = IPAddr.new("192.168.0.77")
# #<IPAddr: IPv4:192.168.0.77/255.255.255.255>

# To calculate an IP address' network address, each bit
# in the address is bitwise ANDed with each corresponding
# bit in the subnet mask.
address & IPAddr.new("255.255.255.248")
# #<IPAddr: IPv4:192.168.0.72/255.255.255.255>

# To calculate an IP address' broadcast address, take
# the inverse of the subnet mask (flip ones to zeroes
# and vice-versa), and perform a bitwise OR against each
# bit in the address.
address | (~ IPAddr.new("255.255.255.248"))
# #<IPAddr: IPv4:192.168.0.79/255.255.255.255>

# So our IP address 192.168.0.77 with subnet mask
# 255.255.255.248 lies on a network whose network address
# is 192.168.0.72 and whose broadcast address is 192.168.0.79.

IPAddr exposes these operations but maintains immutability by cloning itself and calling protected methods on the new instance.

# lib/ipaddr.rb:108 (ruby 1.9.3-p286)

# Returns a new ipaddr built by bitwise AND.                                  
def &(other)                                                                  
 return self.clone.set(@addr & coerce_other(other).to_i)                     
end                                                                           

# Returns a new ipaddr built by bitwise OR.                                   
def |(other)                                                                  
  return self.clone.set(@addr | coerce_other(other).to_i)                     
end

# lib/ipaddr.rb:128 (ruby 1.9.3-p286)

# Returns a new ipaddr built by bitwise negation.                             
def ~                                                                         
  return self.clone.set(addr_mask(~@addr))                                    
end                                                                           

# lib/ipaddr.rb:370 (ruby 1.9.3-p286)

protected

# Set +@addr+, the internal stored ip address, to given +addr+. The
# parameter +addr+ is validated using the first +family+ member,
# which is +Socket::AF_INET+ or +Socket::AF_INET6+.
def set(addr, *family)
  case family[0] ? family[0] : @family
  when Socket::AF_INET
    if addr < 0 || addr > IN4MASK
      raise ArgumentError, "invalid address"
    end
  when Socket::AF_INET6
    if addr < 0 || addr > IN6MASK
      raise ArgumentError, "invalid address"
    end
  else
    raise ArgumentError, "unsupported address family"
  end
  @addr = addr
  if family[0]
    @family = family[0]
  end
  return self
end

Instead of changing its state during these operations, it creates a copy of itself using clone and calls protected methods like set to mutate the instance and return it to the caller.

Fulfilling an Abstract Class's Contract

Another example of the protected keyword is in ActiveSupport's caching layer. ActiveSupport::Cache::Store defines an abstract class that can be inherited to implement a pluggable caching layer. A minimal viable implementation of a cache store involves implementing three methods: read_entry, write_entry and delete_entry. These are called by the public API of the abstract class and implement a specific storage strategy. This separates the concerns of how the cache behaviors from the specifics of how its data is stored.

# activesupport/lib/active_support/cache.rb:441

protected

# activesupport/lib/active_support/cache.rb:461

# Read an entry from the cache implementation. Subclasses must implement
# this method.
def read_entry(key, options) # :nodoc:
  raise NotImplementedError.new
end

# Write an entry to the cache implementation. Subclasses must implement
# this method.
def write_entry(key, entry, options) # :nodoc:
  raise NotImplementedError.new
end

# Delete an entry from the cache implementation. Subclasses must
# implement this method.
def delete_entry(key, options) # :nodoc:
  raise NotImplementedError.new
end

ActiveSupport ships with implementations to store cache data in memory, a file and memcached. Each implementation has its own methods for interacting with its respective store.

# activesupport/lib/active_support/cache/mem_cache_store.rb:121

protected
  # Read an entry from the cache.
  def read_entry(key, options) # :nodoc:
    deserialize_entry(@data.get(escape_key(key), options))
  rescue Dalli::DalliError => e
    logger.error("DalliError (#{e}): #{e.message}") if logger
    nil
  end

# activesupport/lib/active_support/cache/mem_cache_store.rb:153

private

  # Memcache keys are binaries. So we need to force their encoding to binary
  # before applying the regular expression to ensure we are escaping all
  # characters properly.
  def escape_key(key)
    key = key.to_s.dup
    key = key.force_encoding("BINARY")
    key = key.gsub(ESCAPE_KEY_CHARS){ |match| "%#{match.getbyte(0).to_s(16).upcase}" }
    key = "#{key[0, 213]}:md5:#{Digest::MD5.hexdigest(key)}" if key.size > 250
    key
  end

  def deserialize_entry(raw_value)
    if raw_value
      entry = Marshal.load(raw_value) rescue raw_value
      entry.is_a?(Entry) ? entry : Entry.new(entry)
    else
      nil
    end
  end

By marking the abstract interface methods with protected and the implementation methods for the storage mechanism as private there's a demarcation between the concerns. There's no direct reason in the implementation for using protected methods in this case. The calls to these protected methods use an implicit self which means private method calls would work to encapsulate the object's behavior. Using the protected keyword is primarily a matter of convention to call into relief which concerns belong to what components to aid in maintenance.

Framework Hooks

Another conventional use of protected methods is for methods within an object that aren't called directly by the object but are callback hooks that a framework is configured to call. For example, ActionController::Base allows an inheriting class to define filters that are called at specific moments in a request's lifecycle. We'll contrive an example using a Blog application.

BlogApp::Application.routes.draw do
  resources :blogs do
    resources :posts
  end
end

Let's add a PostsController. We want to use strong_parameters to prevent any unauthorized mass-assignment and add an authorization check to ensure the current user's access to create posts on the current blog.

class PostsController < ApplicationController
  before_filter :authorize_user

  def create
    @post = blog.posts.build(post_parameters)

    if @post.save
      redirect_to action: :index, notice: "Post created."
    else
      render :new
    end
  end

  protected

  def authorize_user
    unless blog.authorized?(current_user)
      render nothing: true, status: :unauthorized
    end
  end

  private

  def blog
    @blog ||= Blog.find(params[:blog_id])
  end

  def post_parameters
    params.require(:post).permit(:title, :date, :content)
  end
end

The protected keyword denotes methods that are called by ActionController and the private keyword is used for methods that we call in the controller itself to complete our work. Simliar to the previous example above, there's no implementation reason that we're using protected methods here aside from calling attention to the fact that the methods marked as protected and private are interfaces aimed at different consumers: the external framework and the internal object respectively. It's a hint to future readers of the code that while these methods aren't part of the object's public API, there are users of the interface beyond the object itself.

So, when should I use it?

protected is an odd beast; it accomplishes much of what private does but with the addition of some nuanced complexity and the (dubious) benefit of being able to call methods on self explicitly. There are some conventions around what protected means but they seem to vary from project to project. I could find no project with any guidelines around method visibility. It was not apparent in most of the code I read that had used protected why the original author had chosen to use it.

I talked to several developers while writing this who had committed code to open source projects and had used protected. I received the same response from each: 1) I don't remember why I used protected there 2) I wouldn't use protected if I was writing that code again, private or public would have been better 3) I don't use protected at all today.

In searching ruby-core for conversations about protected methods, it's clear this feature even confuses core contributors. The OpenStruct example above was discussed on the list as a replacement of an instance_eval. The contributor who suggested it was tentative about using it: “From my ruby life for now, here's the only place where protected method lives.”

Protected method visibility could make sense to use in workaday code for the above cases. It was designed for the object interaction patterns shown in the first two examples – attributes for comparison operations and mutator methods for immutable objects. The latter two examples – fulfilling an abstract class' contract and framework hooks – are not universally applied patterns and aren't enforced by the language. If you're going to use protected in this way it's worth a quick discussion with your team to determine if the pattern would be useful or at the very least leaving a paper trail in either the commit message or the RDoc documentation for the methods briefly explaining why they are marked protected.

Thoughts, questions, or feedback? Please share! I'm @jpignata on Twitter and available via email at john@pignata.com. Thanks for reading!