When we originally wrote this book, we had a grand plan (we were
younger then). We wanted to document the language from the top down,
starting with classes and objects, and ending with the nitty-gritty
syntax details. It seemed like a good idea at the time. After all,
most everything in Ruby is an object, so it made sense to talk about
objects first.
Or so we thought.
Unfortunately, it turns out to be difficult to describe a language
that way. If you haven't covered strings, if statements,
assignments, and other details, it's difficult to write examples of
classes. Throughout our top-down description, we kept coming across
low-level details we needed to cover so that the example code would
make sense.
So, we came up with another grand plan (they don't call us pragmatic
for nothing). We'd still describe Ruby starting at the top. But before
we did that, we'd add a short chapter that described all the common
language features used in the examples along with the special vocabulary
used in Ruby, a kind of minitutorial to bootstrap us into the rest of
the book.
Let's say it again. Ruby is a genuine object-oriented language.
Everything you manipulate is an object, and the results of those
manipulations are themselves objects. However, many languages make the
same claim, and they often have a different interpretation of what
object-oriented means and a different terminology for the concepts
they employ.
So, before we get too far into the details, let's briefly look at the
terms and notation that we'll be using.
When you write object-oriented code, you're normally looking to model
concepts from the real world in your code. Typically during this modeling
process you'll discover categories of things that need to be
represented in code. In a jukebox, the concept of a ``song'' might be
such a category. In Ruby, you'd define a class to represent
each of these entities. A class is a combination of state (for
example, the name of the song) and methods that use that state (perhaps
a method to play the song).
Once you have these classes, you'll typically want to create a number
of instances of each. For the jukebox system containing a class
called Song, you'd have separate instances for popular hits such
as ``Ruby Tuesday,'' ``Enveloped in Python,'' ``String of Pearls,''
``Small talk,'' and so on. The word object is used
interchangeably with class instance (and being lazy typists, we'll
probably be using the word ``object'' more frequently).
In Ruby, these objects are created by calling a constructor, a special
method associated with a class. The standard constructor is called
new.
song1 = Song.new("Ruby Tuesday")
song2 = Song.new("Enveloped in Python")
# and so on
These instances are both derived from the same class, but they have
unique characteristics. First, every object has a unique object
identifier (abbreviated as object id). Second, you can
define instance variables,
variables with values that are
unique to each instance. These instance variables hold an object's
state. Each of our songs, for example, will probably have an instance
variable that holds the song title.
Within each class, you can define instance methods.
Each method
is a chunk of functionality which may be called from within the class
and (depending on accessibility constraints) from outside. These
instance methods in turn have access to the object's instance
variables, and hence to the object's state.
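To make these terms concrete, here is a minimal sketch of what such a class might look like. The Song class below is hypothetical: the method names title and play, and the text that play returns, are our assumptions for illustration, not the definition used later in the book.

```ruby
# Hypothetical Song class; the method names and return values are assumptions.
class Song
  def initialize(title)
    @title = title          # instance variable: state unique to each object
  end

  # Instance method that reads the object's state.
  def title
    @title
  end

  # Instance method that uses the state to do some work.
  def play
    "Playing #{@title}"
  end
end

song1 = Song.new("Ruby Tuesday")
song2 = Song.new("Enveloped in Python")

puts song1.play                            # Playing Ruby Tuesday
puts song1.object_id == song2.object_id    # false
```

Both objects share the same methods, but each carries its own @title and its own object id.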
Methods are invoked by sending a message to an object.
The message
contains the method's name, along with any parameters the method may
need. [This idea of expressing method calls in the form of
messages comes from Smalltalk.] When an object receives a message,
it looks into its own class for a corresponding method. If found, that
method is executed. If the method isn't found, ... well,
we'll get to that later.
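One way to see the message idea at work is Ruby's send method, which delivers a message by name. The sketch below assumes nothing beyond the built-in String class; it is only an illustration of the concept, and ordinary dot notation is what you would normally write.

```ruby
# Calling a method with dot notation and sending the equivalent message
# explicitly with send give the same result.
text = "gin joint"

puts text.length                 # 9
puts text.send(:length)          # 9
puts "Rick".send(:index, "c")    # 2

# respond_to? asks whether an object has a method for a given message.
puts text.respond_to?(:length)   # true
puts text.respond_to?(:play)     # false
```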
This business of methods and messages may sound complicated, but in
practice it is very natural. Let's look at some method calls.
(Remember that the arrows in the code examples show the values
returned by the corresponding expressions.)
"gin joint".length   »  9
"Rick".index("c")    »  2
-1942.abs            »  1942
sam.play(aSong)      »  "duh dum, da dum de dum ..."
Here, the thing before the period is called the receiver, and
the name after the period is the method to be invoked.
The first
example asks a string for its length, and the second asks a different
string to find the index of the letter ``c.'' The third line has a
number calculate its absolute value. Finally, we ask Sam to play us
a song.
It's worth noting here a major difference between Ruby and most other
languages. In (say) Java, you'd find the absolute value of some number
by calling a separate function and passing in that number. You might
write
number = Math.abs(number) // Java code
In Ruby, the ability to determine an absolute value is built into
numbers---they take care of the details internally. You simply send
the message abs to a number object and let it do the work.
number = number.abs
The same applies to all Ruby objects: in C you'd write
strlen(name), while in Ruby it's name.length, and so
on. This is part of what we mean when we say that Ruby is a genuine OO
language.
Not many people like to read heaps of boring syntax rules when they're
picking up a new language. So we're going to cheat. In this section
we'll hit some of the highlights, the stuff you'll just have to
know if you're going to write Ruby programs. Later, in Chapter
18, which begins on page 199, we'll go into
all the gory details.
Let's start off with a simple Ruby program. We'll write a method that
takes a person's name and returns a greeting string with that name
appended. We'll then invoke that method a couple of times.
def sayGoodnight(name)
  result = "Goodnight, " + name
  return result
end
# Time for bed...
puts sayGoodnight("John-Boy")
puts sayGoodnight("Mary-Ellen")
First, some general observations. Ruby syntax is clean. You don't
need semicolons at the ends of statements as long as you put each
statement on a separate line. Ruby comments start with a
# character and run to the end of the line. Code layout is
pretty much up to you; indentation is not significant.
Methods are defined with the keyword def, followed by the method
name (in this case, ``sayGoodnight'') and the method's
parameters between parentheses. Ruby doesn't use braces to delimit
the bodies of compound statements and definitions. Instead, you simply
finish the body with the keyword end. Our method's body is
pretty simple. The first line concatenates the literal string
``Goodnight,'' to the parameter name and assigns the result to
the local variable result. The next line returns that result to the
caller. Note that we didn't have to declare the variable result;
it sprang into existence when we assigned to it.
Having defined the method, we call it twice. In both cases we pass the
result to the method puts, which simply outputs its argument
followed by a newline.
Goodnight, John-Boy
Goodnight, Mary-Ellen
The line ``puts sayGoodnight("John-Boy")'' contains two method calls,
one to sayGoodnight and the other to puts. Why does one call
have its arguments in parentheses while the other doesn't? In this
case it's purely a matter of taste. The following lines are all
equivalent.
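As a sketch of what we mean, the four calls below should all print the same line. The method is redefined here only so that the fragment runs on its own.

```ruby
def sayGoodnight(name)
  "Goodnight, " + name
end

# Four ways of writing the same pair of method calls.
puts sayGoodnight "John-Boy"
puts sayGoodnight("John-Boy")
puts(sayGoodnight "John-Boy")
puts(sayGoodnight("John-Boy"))
```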
However, life isn't always that simple, and precedence rules can make
it difficult to know which argument goes with which method invocation,
so we recommend using parentheses in all but the simplest cases.
This example also shows some Ruby string objects. There are many ways
to create a string object, but probably the most common is to use
string literals: sequences of characters between single or double
quotation marks. The difference between the two forms is the amount of
processing Ruby does on the string while constructing the literal. In
the single-quoted case, Ruby does very little. With a few exceptions,
what you type into the string literal becomes the string's value.
In the double-quoted case, Ruby does more work. First, it looks for
substitutions---sequences that start with a backslash character---and
replaces them with some binary value. The most common of these is
``\n'', which is replaced with a newline character.
When a
string containing a newline is output, the ``\n'' forces a
line break.
puts "And Goodnight,\nGrandma"
produces:
And Goodnight,
Grandma
The second thing that Ruby does with double-quoted strings
is expression interpolation. Within the string, the sequence
#{expression} is replaced by the value of
expression. We could use this to rewrite our previous method.
def sayGoodnight(name)
  result = "Goodnight, #{name}"
  return result
end
When Ruby constructs this string object, it looks at the current value
of name and substitutes it into the string. Arbitrarily complex
expressions are allowed in the #{...} construct. As a
shortcut, you don't need to supply the braces when the expression is
simply a global, instance, or class variable. For more
information on strings, as well as on the other Ruby standard types, see
Chapter 5, which begins on page 47.
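The following sketch puts the two kinds of literal side by side. The variable names and the sample text are our own choices for illustration.

```ruby
name = "Grandma"

single = 'Goodnight,\n#{name}'    # single quotes: taken almost literally
double = "Goodnight,\n#{name}"    # double quotes: \n and #{...} are processed

puts single     # prints the backslash, the n, and #{name} as typed
puts double     # prints Goodnight, and then Grandma on the next line

# Any expression may appear inside #{...}.
puts "#{name.upcase} has #{name.length} letters"   # GRANDMA has 7 letters

# The braces are optional when interpolating a global variable.
$greeting = "Goodnight"
puts "#$greeting, #{name}"        # Goodnight, Grandma
```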
Finally, we could simplify this method some more. The value returned
by a Ruby method is the value of the last expression evaluated, so we
can get rid of the return statement altogether.
def sayGoodnight(name)
  "Goodnight, #{name}"
end
We promised that this section would be brief. We've got just one more
topic to cover: Ruby names. For brevity, we'll be using some terms
(such as class variable) that we aren't going to
define here. However, by talking about the rules now, you'll be ahead
of the game when we actually come to discuss instance variables and
the like later.
Ruby uses a convention to help it distinguish the usage of a name: the
first characters of a name indicate how the name is used.
Local variables, method parameters, and method names should all start
with a lowercase letter or with an underscore. Global variables are
prefixed with a dollar sign ($), while instance variables begin with
an ``at'' sign (@). Class variables start with two ``at'' signs (@@). Finally,
class names, module names, and constants should
start with an uppercase letter. Samples of different names are given
in Table 2.1 on page 10.
Following this initial character, a name can be any combination of
letters, digits, and underscores (with the proviso that the character
following an @ sign may not be a digit).
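The short sketch below shows each kind of name in use. Every identifier in it (JukeboxSong, MAX_PLAYS, and so on) is invented for this illustration, and it relies on class features we haven't yet explained, so treat it as a preview of the naming rules only.

```ruby
$debug_level = 1                  # global variable: starts with $

class JukeboxSong                 # class name: starts with an uppercase letter
  MAX_PLAYS = 3                   # constant: starts with an uppercase letter
  @@songs_created = 0             # class variable: starts with @@

  def initialize(song_title)      # method parameter: starts with a lowercase letter
    @title = song_title           # instance variable: starts with @
    @@songs_created += 1
  end

  def self.songs_created          # method name: starts with a lowercase letter
    @@songs_created
  end
end

_scratch = JukeboxSong.new("Ruby Tuesday")   # local variable with a leading underscore
puts JukeboxSong.songs_created               # 1
```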
Ruby's arrays and hashes are indexed collections. Both store
collections of objects, accessible using a key. With arrays, the key
is an integer, whereas hashes support any object as a key. Both
arrays and hashes grow as needed to hold new elements. It's more
efficient to access array elements, but hashes provide more
flexibility. Any particular array or hash can hold objects of
differing types; you can have an array containing an integer, a
string, and a floating point number, as we'll see in a minute.
You can create and initialize a new array using an array literal---a
set of elements between square brackets. Given an array object, you
can access individual elements by supplying an index between
square brackets, as the next example shows.
a = [ 1, 'cat', 3.14 ]   # array with three elements
# access the first element
a[0]   »  1
# set the third element
a[2] = nil
# dump out the array
a   »  [1, "cat", nil]
You can create empty arrays either by using an array literal with no
elements or by using the array object's constructor, Array.new.
empty1 = []
empty2 = Array.new
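Because arrays and hashes grow as needed, an empty collection is often just a starting point. Here is a small sketch of that growth; the sample values are arbitrary.

```ruby
a = []
a[0] = 'ant'
a[3] = 'dog'          # assigning past the end grows the array; gaps hold nil
p a                   # ["ant", nil, nil, "dog"]
a << 3.14             # << appends one more element
puts a.length         # 5

h = {}
h['oboe'] = 'woodwind'   # assigning to a new key adds an entry
puts h.length            # 1
```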
Sometimes creating arrays of words can be a pain, what with all the
quotes and commas. Fortunately, there's a shortcut: %w does just
what we want.
a = %w{ ant bee cat dog elk }
a[0]   »  "ant"
a[3]   »  "dog"
Ruby hashes are similar to arrays. A hash literal uses braces rather than
square brackets. The literal must supply two objects for every
entry: one for the key, the other for the value.
For example, you might want to map musical instruments to their
orchestral sections. You could do this with a hash.
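A sketch of such a hash follows. Only the 'oboe' and 'cello' entries are implied by the lookups in this section; the remaining entries are assumptions added to fill out the example.

```ruby
# Keys are instrument names, values are orchestral sections.
# Entries other than 'oboe' and 'cello' are assumptions for illustration.
instSection = {
  'cello'    => 'string',
  'clarinet' => 'woodwind',
  'drum'     => 'percussion',
  'oboe'     => 'woodwind',
  'trumpet'  => 'brass',
  'violin'   => 'string'
}
```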
Hashes are indexed using the same square bracket notation as arrays.
instSection['oboe']      »  "woodwind"
instSection['cello']     »  "string"
instSection['bassoon']   »  nil
As the last example shows, a hash by default returns nil when
indexed by a key it doesn't contain. Normally this is convenient, as
nil means false when used in conditional expressions.
Sometimes you'll want to change this default. For example, if you're
using a hash to count the number of times each key occurs, it's
convenient to have the default value be zero. This is easily done by
specifying a default value when you create a new, empty
hash.
histogram = Hash.new(0)
histogram['key1']   »  0
histogram['key1'] = histogram['key1'] + 1
histogram['key1']   »  1
Array and hash objects have lots of useful methods: see the discussion
starting on page 33, and the reference sections starting on
pages 278 and 317, for details.
Ruby has all the usual control structures, such as if statements
and while loops. Java, C, and Perl programmers may well get
caught by the lack of braces around the bodies of these
statements. Instead, Ruby uses the keyword end to signify the end
of a body.
if count > 10
  puts "Try again"
elsif tries == 3
  puts "You lose"
else
  puts "Enter a number"
end
Similarly, while statements are terminated with end.
while weight < 100 and numPallets <= 30
  pallet = nextPallet()
  weight += pallet.weight
  numPallets += 1
end
Ruby statement modifiers are a useful shortcut if the body of an
if or while statement is just a single expression. Simply
write the expression, followed by if or while and the
condition.
For example, here's a simple if statement.
if radiation > 3000
  puts "Danger, Will Robinson"
end
Here it is again, rewritten using a statement modifier.
puts "Danger, Will Robinson" if radiation > 3000
Similarly, a while loop such as
while square < 1000
  square = square*square
end
becomes the more concise
square = square*square while square < 1000
These statement modifiers should seem familiar to Perl programmers.
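To try the modifier forms yourself, you need starting values for the variables. The values below are our own choices; any number greater than 1 would do for square.

```ruby
square = 2
square = square*square while square < 1000   # 2, 4, 16, 256, 65536
puts square                                  # 65536

radiation = 3500
puts "Danger, Will Robinson" if radiation > 3000
```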
Most of Ruby's built-in types will be familiar to all programmers. A
majority of languages have strings, integers, floats, arrays, and so
on. However, until Ruby came along, regular expression support was
generally built into only the so-called scripting languages, such as
Perl, Python, and awk. This is a shame: regular expressions, although
cryptic, are a powerful tool for working with text.
Entire books have been written about regular expressions (for example,
Mastering Regular Expressions), so we
won't try to cover everything in just a short section. Instead, we'll
look at just a few examples of regular expressions in action. You'll
find full coverage of regular expressions starting
on page 56.
A regular expression is simply a way of specifying a pattern of
characters to be matched in a string. In Ruby, you typically create a
regular expression by writing a pattern between slash characters
(/pattern/). And, Ruby being Ruby, regular expressions are of
course objects and can be manipulated as such.
For example, you could write a pattern that matches a string
containing the text ``Perl'' or the text ``Python'' using the
following regular expression.
/Perl|Python/
The forward slashes delimit the pattern, which consists of the two
things we're matching, separated by a pipe character (``|'').
You can use parentheses within patterns, just as you can in arithmetic
expressions, so you could also have written this pattern as
/P(erl|ython)/
You can also specify repetition within patterns. /ab+c/ matches a
string containing an ``a'' followed by one or more ``b''s, followed by
a ``c''. Change the plus to an asterisk, and /ab*c/ creates a
regular expression that matches an ``a'', zero or more ``b''s, and a
``c''.
You can also match one of a group of characters within a pattern. Some
common examples are character classes such as ``\s'', which
matches a whitespace character (space, tab, newline, and so on),
``\d'', which matches any digit, and ``\w'', which matches
any character that may appear in a typical word. The single character
``.'' (a period) matches any character.
We can put all this together to produce some useful regular
expressions.
/\d\d:\d\d:\d\d/ # a time such as 12:34:56
/Perl.*Python/ # Perl, zero or more other chars, then Python
/Perl\s+Python/ # Perl, one or more whitespace characters, then Python
/Ruby (Perl|Python)/ # Ruby, a space, and either Perl or Python
Once you have created a pattern, it seems a shame not to use it. The
match operator ``=~'' can be used to match a string against a
regular expression. If the pattern is found in the string, =~
returns its starting position, otherwise it returns nil. This means
you can use regular expressions as the condition in if and
while statements. For example, the following code fragment writes
a message if a string contains the text 'Perl' or 'Python'.
if line =~ /Perl|Python/
puts "Scripting language mentioned: #{line}"
end
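The sketch below shows the values that =~ hands back for a few sample strings of our own choosing, along with the difference between the + and * repetitions described earlier.

```ruby
pattern = /Perl|Python/

puts pattern.class                      # Regexp
p("I like Python" =~ pattern)           # 7
p("Perl and Python" =~ pattern)         # 0
p("I like Ruby" =~ pattern)             # nil

# + needs at least one "b"; * accepts none at all.
p("ac" =~ /ab+c/)                       # nil
p("ac" =~ /ab*c/)                       # 0
```

Note that a match at position 0 still counts as true in a condition, because in Ruby only nil and false are treated as false.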
The part of a string matched by a regular expression can also be
replaced with different text using one of Ruby's substitution methods.
line.sub(/Perl/, 'Ruby') # replace first 'Perl' with 'Ruby'
line.gsub(/Python/, 'Ruby') # replace every 'Python' with 'Ruby'
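Here is a small sketch of both methods applied to a sample line of our own. Each returns a new string and leaves the original untouched.

```ruby
line = "Perl and Python, Python and Perl"

puts line.sub(/Perl/, 'Ruby')       # Ruby and Python, Python and Perl
puts line.gsub(/Python/, 'Ruby')    # Perl and Ruby, Ruby and Perl
puts line                           # Perl and Python, Python and Perl
```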
We'll have a lot more to say about regular expressions as we go
through the book.
This section briefly describes one of Ruby's particular strengths. We're
about to look at code blocks: chunks of code that you can associate
with method invocations, almost as if they were parameters. This is an
incredibly powerful feature. You can use code blocks to implement
callbacks (but they're simpler than Java's anonymous inner classes),
to pass around chunks of code (but they're more flexible than C's
function pointers), and to implement iterators.
Code blocks are just chunks of code between braces or
do...end.
{ puts "Hello" }         # this is a block

do                       #
  club.enroll(person)    # and so is this
  person.socialize       #
end                      #
Once you've created a block, you can associate it with a call to a
method. That method can then invoke the block one or more times
using the Ruby yield statement. The following example shows this
in action. We define a method that calls yield twice. We then
call it, putting a block on the same line, after the call (and after
any arguments to the method). [Some people like to think of
the association of a block with a method as a kind of parameter
passing. This works on one level, but it isn't really the whole
story. You might be better off thinking of the block and the method
as coroutines, which
transfer control back and forth between themselves.]
def callBlock
  yield
  yield
end

callBlock { puts "In the block" }
produces:
In the block
In the block
See how the code in the block (puts "In the block") is executed
twice, once for each call to yield.
You can provide parameters to the call to
yield: these will be passed to the block. Within the block, you
list the names of the arguments to receive these parameters between
vertical bars (``|'').
def callBlock
  yield a, b
end

callBlock { |a, b| ... }
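A runnable sketch of the same idea follows. The values passed to yield and the block's parameter names are our own choices for illustration.

```ruby
def callBlock
  yield "cat", 1      # each yield passes two values to the block
  yield "dog", 2
end

callBlock { |animal, count| puts "#{count}: #{animal}" }
```

This should print "1: cat" followed by "2: dog", one line for each call to yield.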
Code blocks are used throughout the Ruby library to implement
iterators: methods that return successive elements from some kind of
collection, such as an array.
a = %w( ant bee cat dog elk ) # create an array
a.each { |animal| puts animal } # iterate over the contents
produces:
ant
bee
cat
dog
elk
Let's look at how we might implement the Array class's each
iterator that we used in the previous example. The each
iterator loops through every element in the array,
calling yield for each one. In pseudo code, this might look like:
# within class Array...
def each
  for each element
    yield(element)
  end
end
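The pseudo code can be turned into something runnable. The sketch below uses a hypothetical AnimalList class of our own rather than reopening Array, and it is not how the real Array class is implemented; it simply shows a method calling yield once per element.

```ruby
# Hypothetical container class, used here instead of modifying Array.
class AnimalList
  def initialize(animals)
    @animals = animals
  end

  # Calls the associated block once for each stored element.
  def each
    index = 0
    while index < @animals.length
      yield(@animals[index])     # hand the current element to the block
      index += 1
    end
  end
end

list = AnimalList.new([ 'cat', 'dog', 'horse' ])
list.each { |animal| print animal, " -- " }
puts
```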
You could then iterate over an array's elements by calling its
each method and supplying a block. This block would be called for
each element in turn.
[ 'cat', 'dog', 'horse' ].each do |animal|
  print animal, " -- "
end
produces:
cat -- dog -- horse --
Similarly, many looping constructs that are built into languages such
as C and Java are simply method calls in Ruby, with the methods
invoking the associated block zero or more times.
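A minimal sketch of three such calls appears below. The methods times, upto, and each are standard Ruby; the block bodies are our own choices for illustration.

```ruby
5.times { print "*" }
3.upto(6) { |i| print i }
('a'..'e').each { |char| print char }
puts
```

Run together, these should print *****3456abcde on a single line.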
Here we ask the number 5 to call a block five times, then ask
the number 3 to call a block, passing in successive values until
it reaches 6. Finally, the range of characters from ``a'' to ``e''
invokes a block using the method each.
Ruby comes with a comprehensive I/O library. However, in most of the
examples in this book we'll stick to a few simple methods. We've
already come across two methods that do output. puts writes each
of its arguments, adding a newline after each. print also writes
its arguments, but with no newline. Both can be used to write to any
I/O object, but by default they write to the console.
Another output method we use a lot is printf, which prints
its arguments under the control of a format string (just like
printf in C or Perl).
printf "Number: %5.2f, String: %s", 1.23, "hello"
produces:
Number:  1.23, String: hello
In this example, the format string "Number: %5.2f, String: %s"
tells printf to substitute in a floating point number
(allowing five characters in total, with two after the decimal point)
and a string.
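If you want the formatted text as a string rather than as output, the related method format takes the same arguments. The sketch below reuses the format string from the example; the variable name text is our own.

```ruby
text = format("Number: %5.2f, String: %s", 1.23, "hello")
puts text               # Number:  1.23, String: hello

# The field is five characters wide, so 1.23 is padded with one leading space.
puts text.length        # 28
```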
There are many ways to read input into your program. Probably the most
traditional is to use the routine gets, which returns the next
line from your program's standard input stream.
line = gets
print line
The gets routine has a side effect: as well as returning the line
just read, it also stores it into the global variable
$_. This variable is special, in that it is used as the
default argument in many circumstances. If you call print with no
argument, it prints the contents of $_. If you write an if
or while statement with just a regular expression as the
condition, that expression is matched against $_. While viewed
by some purists as a rebarbative barbarism, these abbreviations can
help you write some concise programs. For example, the following
program prints all lines in the input stream that contain the word
``Ruby.''
while gets           # assigns line to $_
  if /Ruby/          # matches against $_
    print            # prints $_
  end
end
The ``Ruby way'' to write this would be to use an iterator.
ARGF.each { |line| print line if line =~ /Ruby/ }
This uses the predefined object ARGF, which represents the
input stream that can be read by a program.
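To experiment with the iterator style without typing input by hand, you can stand in a StringIO object for the input stream. This is only a sketch: the sample lines are invented, and StringIO with each_line is our substitute for ARGF here.

```ruby
require 'stringio'

# A StringIO object behaves like an input stream backed by a string.
input = StringIO.new("Ruby is fun\nPerl is terse\nI write Ruby daily\n")

matches = []
input.each_line { |line| matches << line if line =~ /Ruby/ }
print matches.join      # prints the two lines that mention Ruby
```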
That's it. We've finished our lightning-fast tour of some of the
basic features of Ruby. We've had a brief look at objects, methods,
strings, containers, and regular expressions, seen some simple control
structures, and looked at some rather nifty iterators. Hopefully, this
chapter has given you enough ammunition to be able to attack the rest
of this book.
Time to move on, and up---up to a higher level. Next, we'll be looking
at classes and objects, things that are at the same time both the
highest-level constructs in Ruby and the essential underpinnings of
the entire language.