Learn Ruby and more: Datatypes and Objects

In order to understand a programming language, you have to know what kinds of data it can manipulate and what it can do with that data. This chapter is about the values manipulated by Ruby programs. It begins with comprehensive coverage of numeric and textual values. The chapter then moves on to explain ranges, symbols, and the special values true, false, and nil.

1. Numbers

Ruby includes five built-in classes for representing numbers, and the standard library includes three more numeric classes that are sometimes useful. Figure shows the class hierarchy.

All number objects in Ruby are instances of Numeric. All integers are instances of Integer. The Complex, BigDecimal, and Rational classes are not built-in to Ruby but are distributed with Ruby as part of the standard library. The Complex class represents complex numbers, of course. BigDecimal represents real numbers with arbitrary precision, using a decimal representation rather than a binary representation. And Rational represents rational numbers: one integer divided by another.

All numeric objects are immutable; there are no methods that allow you to change the value held by the object. If you pass a reference to a numeric object to a method, you need not worry that the method will modify the object.
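These claims can be checked in an interpreter. The sketch below is illustrative only: the helper name describe_number is made up here, and it assumes Ruby 2.0 or later, where numeric objects report themselves as frozen.

```ruby
# Hypothetical helper (not part of Ruby): summarize a few facts about a number.
def describe_number(value)
  {
    numeric: value.is_a?(Numeric),  # every number descends from Numeric
    integer: value.is_a?(Integer),  # true only for whole numbers
    frozen:  value.frozen?          # immutable objects report themselves as frozen
  }
end

p describe_number(42)
p describe_number(3.14)
```

Because the objects are frozen, operations such as + or * always return a new number rather than changing an existing one.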

1.1. Integer Literals

An integer literal is simply a sequence of digits:

0
123
12345678901234567890

If an integer value fits within the range of the Fixnum class, the value is a Fixnum. Otherwise, it is a Bignum, which supports integers of any size. (Ruby 2.4 and later fold both classes into Integer.) Underscores may be inserted into integer literals (though not at the beginning or end), and this feature is sometimes used as a thousands separator:

1_000_000_000 # One billion (or 1,000 million in the UK)
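As a small sketch of the underscore rule, the code below shows that separators do not change the value. The helper with_underscores is hypothetical, written only for this illustration, and handles non-negative integers.

```ruby
# Hypothetical helper: render a non-negative integer the way it could be
# written as a literal with underscore separators.
def with_underscores(n)
  raise ArgumentError, "expects a non-negative integer" unless n.is_a?(Integer) && n >= 0
  # Reverse the digits, cut them into groups of up to three, then restore the order.
  n.to_s.reverse.scan(/\d{1,3}/).join("_").reverse
end

puts with_underscores(1000000000)   # 1_000_000_000
puts 1_000_000_000 == 1000000000    # true: underscores do not change the value
puts 12345678901234567890 + 1       # large integers keep every digit
```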

1.2 Floating-Point Literals

A floating-point literal is an optional sign followed by one or more decimal digits, a decimal point (the . character), one or more additional digits, and an optional exponent. An exponent begins with the letter e or E, and is followed by an optional sign and one or more decimal digits. Here are some examples of floating-point literals:

0.0
-3.14
6.02e23 # This means 6.02 × 10^23
1_000_000.01 # One million and a little bit more
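The shape described above can be sketched as a regular expression. This is only an approximation of the paragraph's wording, not Ruby's actual lexer, and the names FLOAT_LITERAL_SHAPE and float_literal_shape? are invented for the example.

```ruby
# Hypothetical pattern: optional sign, digits, a decimal point, more digits,
# and an optional exponent. Underscores are allowed between digits.
FLOAT_LITERAL_SHAPE = /\A[+-]?\d+(?:_\d+)*\.\d+(?:_\d+)*(?:[eE][+-]?\d+)?\z/

# Hypothetical helper: does the text match the shape described above?
def float_literal_shape?(text)
  FLOAT_LITERAL_SHAPE.match?(text)
end

%w[0.0 -3.14 6.02e23 1_000_000.01 1. .5].each do |text|
  puts "#{text}: #{float_literal_shape?(text)}"
end
```

The last two samples are rejected because the description requires digits on both sides of the decimal point.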

Ruby borrows the ** operator from Fortran for exponentiation. Exponents need not be integers:

x**4          # This is the same thing as x*x*x*x
x**-1         # The same thing as 1/x
x**(1/3.0)    # The cube root of x
x**(1/4)      # Oops! Integer division means this is x**0, which is always 1
x**(1.0/4.0)  # This is the fourth-root of x

When multiple exponentiations are combined into a single expression, they are evaluated from right to left. Thus, 4**3**2 is the same as 4**9, not 64**2.
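The following sketch puts the integer-division pitfall and the right-to-left rule side by side. The helper nth_root is a made-up name for illustration.

```ruby
# Hypothetical helper: the nth root of x, using a Float exponent so that
# integer division cannot truncate 1/n to 0.
def nth_root(x, n)
  x ** (1.0 / n)
end

puts nth_root(16, 4)        # approximately 2.0
puts 16 ** (1 / 4)          # 1, because 1/4 is integer division and yields 0
puts 4 ** 3 ** 2            # 262144, evaluated as 4 ** (3 ** 2)
puts((4 ** 3) ** 2)         # 4096, parentheses force the other grouping
```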

2. Text

Text is represented in Ruby by objects of the String class. Strings are mutable objects, and the String class defines a powerful set of operators and methods for extracting substrings, inserting and deleting text, searching, replacing, and so on. Ruby provides a number of ways to express string literals in your programs, and some of them support a powerful string interpolation syntax by which the values of arbitrary Ruby expressions can be substituted into string literals.

2.1 String Literals

Ruby provides quite a few ways to embed strings literally into your programs.

2.1.1 Single-quoted string literals

The simplest string literals are enclosed in single quotes (the apostrophe character). The text within the quote marks is the value of the string:
'This is a simple Ruby string literal'

Single-quoted strings may extend over multiple lines, and the resulting string literal includes the newline characters. It is not possible to escape the newlines with a backslash:
'This is a long string literal \
that includes a backslash and a newline'
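To see what a single-quoted literal really contains, one can list its characters. The sketch below uses a made-up helper, chars_of, and contrasts single quotes with double quotes.

```ruby
# Hypothetical helper: show which characters a string actually contains.
def chars_of(text)
  text.chars
end

p chars_of('a\nb')   # ["a", "\\", "n", "b"]: backslash and n stay separate
p chars_of("a\nb")   # ["a", "\n", "b"]: double quotes turn \n into a newline

multi = 'line one \
line two'
p multi.include?("\\\n")   # true: the backslash and the newline are both kept
```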

2.1.2 Double-quoted string literals

String literals delimited by double quotation marks are much more flexible than single-quoted literals. Double-quoted literals support quite a few backslash escape sequences, such as \n for newline, \t for tab, and \" for a quotation mark that does not terminate the string:

"\t\"This quote begins with a tab and ends with a newline\"\n"
"\\" # A single backslash

2.1.3 Arbitrary delimiters for string literals

When working with text that contains apostrophes and quotation marks, it is awkward to write it as a single- or double-quoted string literal. Ruby supports a generalized quoting syntax for string literals. The sequence %q begins a string literal that follows single-quoted string rules, and the sequence %Q (or just %) introduces a literal that follows double-quoted string rules. The first character following q or Q is the delimiter character, and the string literal continues until a matching (unescaped) delimiter is found. If the opening delimiter is (, [, {, or <, then the matching delimiter is ), ], }, or >. Otherwise, the closing delimiter is the same as the opening delimiter. Here are some examples:

%q(Don't worry about escaping ' characters!)
%Q|"How are you?", he said|
%-This string literal ends with a newline\n- # Q omitted in this one
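A short sketch of the difference between the two rule sets: the same text is written once with %q and once with %Q. The helper quoting_samples and its name parameter are invented for this example.

```ruby
# Hypothetical helper: build the same text with %q and %Q to compare the rules.
def quoting_samples(name)
  {
    single_rules: %q(Hello, #{name}\n),  # no interpolation; \n stays as two characters
    double_rules: %Q(Hello, #{name}\n),  # name is interpolated; \n becomes a newline
    nested:       %q(balanced (parens) are fine)  # paired delimiters may nest
  }
end

quoting_samples("World").each { |label, text| puts "#{label}: #{text.inspect}" }
```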

2.1.4 String literals and mutability

Strings are mutable in Ruby. Therefore, the Ruby interpreter cannot use the same object to represent two identical string literals. (If you are a Java programmer, you may find this surprising.) Each time Ruby encounters a string literal, it creates a new object. If you include a literal within the body of a loop, Ruby will create a new object for each iteration. You can demonstrate this for yourself as follows:

10.times { puts "test".object_id }

For efficiency, you should avoid using literals within loops.
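One way to act on this advice is to create the string once, outside the loop. The sketch below counts distinct objects in both styles; the helper names are hypothetical. It assumes string literals are not frozen, which is not the case in a file that begins with the `# frozen_string_literal: true` comment, where identical literals share one object.

```ruby
# Hypothetical helper: count the distinct String objects created when a
# literal appears inside a loop body.
def count_literal_objects(iterations)
  strings = Array.new(iterations) { "test" }   # the literal is evaluated each time
  strings.map(&:object_id).uniq.size
end

# Hypothetical helper: the same loop with the literal hoisted out of it.
def count_hoisted_objects(iterations)
  text = "test"                                # evaluated once
  strings = Array.new(iterations) { text }     # every element is the same object
  strings.map(&:object_id).uniq.size
end

puts count_literal_objects(3)   # typically 3 when literals are not frozen
puts count_hoisted_objects(3)   # 1
```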

2.2. Character Literals

Single characters can be included literally in a Ruby program by preceding the character with a question mark. No quotation marks of any kind are used:

?A # Character literal for the ASCII character A
?" # Character literal for the double-quote character
?? # Character literal for the question mark character

Although Ruby has a character literal syntax, it does not have a special class to represent single characters. The interpretation of character literals also changed between Ruby 1.8 and Ruby 1.9. In Ruby 1.8, character literals evaluate to the integer encoding of the specified character: ?A, for example, is the same as 65 because the ASCII encoding for the capital letter A is the integer 65, and the syntax only works with ASCII and single-byte characters. In Ruby 1.9 and later, a character literal evaluates to a string of length 1, so ?A is the same as "A".
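The sketch below inspects a character literal and assumes Ruby 1.9 or later, where the literal is a one-character String. The helper describe_char is a made-up name.

```ruby
# Hypothetical helper: describe what a character literal evaluates to.
def describe_char(ch)
  { class: ch.class, value: ch, code: ch.ord }
end

p describe_char(?A)   # String, "A", and the character code 65
p ?A == "A"           # true on Ruby 1.9 and later
```

Calling ord on the result recovers the integer that Ruby 1.8 would have produced directly.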

2.3. String Operators

The String class defines several useful operators for manipulating strings of text. The + operator concatenates two strings and returns the result as a new String object:

planet = "Earth"
"Hello" + " " + planet # Produces "Hello Earth"

The << operator appends its second operand to its first, and should be familiar to C++ programmers. This operator is very different from +; it alters the lefthand operand rather than creating and returning a new object:

greeting = "Hello"
greeting << " " << "World"
puts greeting # Outputs "Hello World"

The * operator expects an integer as its righthand operand. It returns a String that repeats the text specified on the lefthand side the number of times specified by the righthand side:

ellipsis = '.'*3 # Evaluates to '...'
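The difference between + and << is easiest to see by checking object identity. This is a minimal sketch; operator_samples is a hypothetical helper, and String.new is used so the example does not depend on whether string literals are frozen.

```ruby
# Hypothetical helper: compare + (new object) with << (in-place append).
def operator_samples
  greeting = String.new("Hello")          # an explicitly unfrozen string
  combined = greeting + " World"          # + builds a new String
  untouched = (greeting == "Hello")       # greeting is unchanged by +
  returned = greeting << " " << "World"   # << modifies greeting itself
  {
    plus_left_operand_unchanged: untouched,
    plus_result: combined,
    append_returns_receiver: returned.equal?(greeting),
    after_append: greeting,
    repeated: "ab" * 3
  }
end

operator_samples.each { |label, value| puts "#{label}: #{value.inspect}" }
```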

2.4 Accessing Characters and Substrings

Perhaps the most important operator supported by String is the square-bracket array-index operator [], which is used for extracting or altering portions of a string. This operator is quite flexible and can be used with a number of different operand types. It can also be used on the lefthand side of an assignment, as a way of altering string content.

In Ruby 1.8, a string is like an array of bytes or 8-bit character codes. The length of this array is given by the length or size method, and you get or set elements of the array simply by specifying the character number within square brackets:

s = 'hello'       # Ruby 1.8
s[0]              # 104: the ASCII character code for the first character 'h'
s[s.length-1]     # 111: the character code of the last character 'o'
s[-1]             # 111: another way of accessing the last character
s[-2]             # 108: the second-to-last character
s[-s.length]      # 104: another way of accessing the first character
s[s.length]       # nil: there is no character at that index
s[0,2]            # "he"
s[-1,1]           # "o": returns a string, not the character code ?o
s[0,0]            # "": a zero-length substring is always empty
s[0,10]           # "hello": returns all the characters that are available
s[s.length,1]     # "": there is an empty string immediately beyond the end
s[s.length+1,1]   # nil: it is an error to read past that
s[0,-1]           # nil: negative lengths don't make any sense
s[0,1] = "H"              # Replace first letter with a capital letter
s[s.length,0] = " world"  # Append by assigning beyond the end of the string
s[5,0] = ","              # Insert a comma, without deleting anything
s[5,6] = ""               # Delete with no insertion; s == "Hellod"
s = 'hello'       # Start again from the original string
s[2..3]           # "ll": characters 2 and 3
s[-3..-1]         # "llo": negative indexes work, too
s[0..0]           # "h": this Range includes one character index
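For comparison, here is a sketch of the same operations on Ruby 1.9 and later, where a single index returns a one-character string instead of a character code. The helper edit_in_place is hypothetical and simply replays the assignments shown above on an unfrozen copy.

```ruby
# Hypothetical helper: replay the assignments above on a copy of the word.
def edit_in_place(word)
  s = String.new(word)          # work on an unfrozen copy
  s[0, 1] = s[0, 1].upcase      # replace the first character
  s[s.length, 0] = " world"     # append by assigning just past the end
  s[5, 0] = ","                 # insert without deleting anything
  s
end

s = "hello"
p s[0]           # "h": a one-character string rather than 104
p s[0].ord       # 104: the character code is still available through ord
p s[s.length]    # nil
p s[2..3]        # "ll"
p s[-3..-1]      # "llo"
p edit_in_place("hello")   # "Hello, world"
```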

Saturday, September 6, 2014