Elementary Data Structures ~ Arrays

Perhaps the most fundamental data structure is the array, which is defined as a primitive in C and in most other programming languages. We have already seen the use of an array as the basis for the development of an efficient algorithm, in the previous articles; we shall see many more examples in this section.

An array is a fixed collection of same-type data that are stored contiguously and that are accessible by an index. We refer to theth element of an array a  as a]. It is the responsibility of the programmer to store something meaningful in an array position a] before referring to a[ i ]. In C, it is also the responsibility of the programmer to use indices that are nonnegative and smaller than the array size. Neglecting these responsibilities are two of the more common programming mistakes.

Arrays are fundamental data structures in that they have a direct correspondence with memory systems on virtually all computers. To retrieve the contents of a word from memory in machine language, we provide an address. Thus, we could think of the entire computer memory as an array, with the memory addresses corresponding to array indices. Most computer-language processors translate programs that involve arrays into efficient machine-language programs that access memory directly, and we are safe in assuming that an array access such as a[ i ] translates to just a few machine instructions.


Figure 1    Sieve of Eratosthenes

To compute the prime numbers less than 32, we initialize the array to 1 (second column), which indicates that no numbers are known to be nonprime (a[ 0 ] and a[ 1 ] are not used and are not shown). Then, we set array entries whose indices are multiples of 2, 3, and 5 to 0, since we know these multiples to be nonprime. indices corresponding to array entries that remain 1 are prime (rightmost column).

Program 1    Sieve of Eratosthenes

The goal of this program is to set a[ i ] to 1 if i is prime, and to 0 if i is not prime. First, it set to 1 all array elements, to indicate that no numbers are known to be nonprime. Then it sets to 0 array elements corresponding to indices that are known to be nonprime (multiples of known primes). If a[i]  is still 1 after all multiples of smaller primes have been set to 0, then we know it to be prime.

Because the program uses an array consisting of the simplest type of elements, 0-1 values, it would be more space efficient if we explicitly used an array of bits, rather than one of integers. Also, some programming environments might require the array to be global if N is huge, or we can dynamically allocate it (see Program 2).


A simple examples of the use of an array is given by Program 1, which prints out all prime numbers less than 10000. The method used, which dates back to the third century B.C., is called the sieve of Eratosthenes (see Figure 1). It is typical of algorithms that exploit the fact that we can access efficiently any item of an array, given that item’s index. The implementation has four loops, three of which access the items of the array sequentially, from beginning to end; the fourth skips through the array, i items at a time. In some cases, sequential processing is essential; in other cases, sequential ordering is used because it is as good as any other. For example, we could change the first loop in Program 1 to without any effect on the computation.

We could also reverse the order of the inner loop in a similar manner, or we could change the final loop to print out the primes in decreasing order, but we could not change the order of the outer loop in the main computation, because it depends on all the integers less than i being processed before a[ i ] is tested for being prime.

We  will not analyze the running time of Program 1 in detail because that would take us astray into number theory, but it is clear that the running time is proportional to

One of the distinctive features of C is that an array name generates a pointer to the first element of the array. Moreover, simple pointer arithmetic is allowed: if p is a pointer to an object of a certain type, then we can write code that assumes that objects of that type are arranged sequentially, and can use *p to refer to the first object, *(p+1) to refer to the second object, *(p+2) to refer to the third object, and so forth. In other words,

*(a+i)    and    a[ i ]    are equivalent in C.

This equivalence provides an alternate mechanism for accessing objects in arrays that is sometimes more convenient than indexing. This mechanism is most often used for arrays of characters (strings);

Like structures, pointers to arrays are significant because they allow us to manipulate the arrays efficiently as higher-level objects. In particular, we can pass a pointer to an array as an argument to a function, thus enabling that function to access objects in the array without having to make a copy of the whole array. This capability is indispensable when we write programs to manipulate huge arrays.


Program 2    Dynamic memory allocation for an array

To change the value of the maximum prime computed in Program 1, we need to recompile the program. Instead, we can take the maximum desired number from the command line, and use it to allocate space for the array at execution time, using the library function malloc from stdlib.c. For example, if we compile this program and use 1000000 as a command-line argument, then we get all the primes less than 1 million (as long as our computer is big and fast enough to make the computation feasible); we can also debug with 100 (without using much time or space). We will use this idiom frequently, though, for brevity, we will omit the insufficient-memory test.


The implementation in Program 1 assumes that the size of the array must be known beforehand: to run the program for a different value of N, we must change the constant N and recompile the program before executing it. Program 2 shows an alternate approach, where a user of the program can type in the value of N, and it will respond with the primes less than N. It uses two basic C mechanisms, both of which involve passing arrays as arguments to functions. The first is the mechanism by which command-line arguments are passed to the main programs, in an array argv of size argc. The array argv is a compound array made up of objects that are arrays (strings) themselves, so we shall defer discussing it in further detail later section, and shall take on faith for the moment that the variable N gets the number that the user types when executing the program.

The second basic mechanism that we use in Program 2 is malloc, a function that allocates the amount of memory that we need for our array at execution time, and returns, for our exclusive use, a pointer to the array. In some programming languages, it is difficult or impossible to allocate arrays dynamically; in some other programming languages, memory allocation is an automatic mechanism. Dynamic allocation is an essential tool  in programs that manipulate multiple arrays, some of which might have to be huge. In this case, without memory allocation, we would have to predeclare an array as large as any value that the user is allowed to type. In a large program where we might use many arrays, it is not feasible to do so for each array. We will generally use code like Program 2 because of the flexibility that it provides, although in specific applications when the array size is known, simpler versions like Program 1 are perfectly suitable. If the array size is fixed and huge, the array may need to be global in some systems. We discuss several of the mechanisms behind memory allocation and look at a way to use malloc to support an abstract dynamic growth facility for arrays later. As we shall see, however, such mechanisms have associated costs, so we generally regard arrays as having the characteristic property that, once allocated, their sizes are fixed, and cannot be changed.

Not only do arrays closely reflect the low-level mechanisms for accessing data in memory on most computers, but also they find widespread use because they correspond directly to natural methods of organizing data for applications. For example, arrays also correspond directly to vectors, the mathematical term for indexed lists of objects.


Program 3    Coin-flipping simulation

If we flip a coin N times , we expect to get N/2 heads, but could get anywhere from 0 to N heads. This program runs the experiment M times, taking both N and M from the command line. It uses an array f to keep track of the frequency of occurrence of the i heads, then prints out a histogram of the result of the experiment, with one asterisk for each 10 occurrences.

The operation on which this program is based—indexing an array with a computed value—is critical to the efficiency of many computational procedures.Figure 2    Coin-flipping simulation

This table shows the result of running Program 2 with N = 32 and M = 1000, simulating 1000 experiments of flipping a coin 32 times. The number of heads that we should see is approximated by the normal distribution function, which is drawn over the data.


Program 3 is an example of a simulation program that uses an array. It simulates a sequence of Bernoulli trials, a familiar abstract concept from probability theory. If we flip a coin N times, the probability that we see k heads is 

The approximation is known as the normal approximation: the familiar bell-shaped curve. Figure 2 illustrate the output of Program 3 for 1000 trails of the experiment of flipping a coin 32 times. Many more details on the Bernoulli distribution and the normal approximation can be found in any text on probability, and we shall encounter these distributions again later. In the present context, our interest in the computation is that we use the numbers as indices into an array to count their frequency of occurrence. The ability of arrays to support this kind of operation is one of their prime virtues.

Program 1 and Program 3 both compute array indices from the data at hand. In a sense, when we use a computed value to access an array of size N, we are taking N possibilities into account with just a single operation. This gain in efficiency is compelling when we can realize it, and we shall be encountering algorithms throughout the book that make use of arrays in this way.


Program 4    Closest-point computation

This program illustrates the use of an array of structures, and is representative of the typical situation where we save items in an array to process them later, during some computation. It counts the number of pairs of N randomly generated points in the unit square that can be connected by a straight line of length less than d, using the data type for points described in previous section. The running time is O( N² ), so this program cannot be used for huge N. 


We use arrays to organize all different manner of types of objects, not just integers. In C, we can declare arrays of any built-in or user-defined type (i.e., computed objects declared as structures). Program 4 illustrates the use of an array of structures for points in the plane using the structure definition that we considered in previous section. This program also illustrates a common use of arrays: to save data away so that they can be quickly accessed in an organized manner in some computation. Incidentally, Program 4 is also interesting as a prototypical quadratic algorithm, which checks all pairs of a set of N data items, and therefore takes time proportional to N². In this series, we look for improvements whenever we see such an algorithm, because its use becomes infeasible as N grows. In this case, we shall see how to use a compound data structure to perform this computation in linear time, in later section.

We can create compound types of arbitrary complexity in a similar manner: We can have not just arrays of structs, but also arrays of arrays, or structs containing arrays. We will consider these different options in detail in later section. Before doing so, however, we will examine linked lists, which serve as the primary alternative to arrays for organizing collections of objects.

发表评论

电子邮件地址不会被公开。 必填项已用*标注