Life Out of Sequence: A Data-Driven History of Bioinformatics

by Hallam Stevens

Overview

Thirty years ago, the most likely place to find a biologist was standing at a laboratory bench, peering down a microscope, surrounded by flasks of chemicals and petri dishes full of bacteria. Today, you are just as likely to find him or her in a room that looks more like an office, poring over lines of code on computer screens. The use of computers in biology has radically transformed who biologists are, what they do, and how they understand life. In Life Out of Sequence, Hallam Stevens looks inside this new landscape of digital scientific work.

Stevens chronicles the emergence of bioinformatics—the mode of working across and between biology, computing, mathematics, and statistics—from the 1960s to the present, seeking to understand how knowledge about life is made in and through virtual spaces. He shows how scientific data moves from living organisms into DNA sequencing machines, through software, and into databases, images, and scientific publications. What he reveals is a biology very different from the one of predigital days: a biology that includes not only biologists but also highly interdisciplinary teams of managers and workers; a biology that is more centered on DNA sequencing, but one that understands sequence in terms of dynamic cascades and highly interconnected networks. Life Out of Sequence thus offers the computational biology community welcome context for their own work while also giving the public a frontline perspective of what is going on in this rapidly changing field.

Product Details

ISBN-13: 9780226080345
Publisher: University of Chicago Press
Publication date: 11/04/2013
Sold by: Barnes & Noble
Format: eBook
Pages: 272
File size: 6 MB

About the Author

Hallam Stevens is assistant professor at Nanyang Technological University in Singapore, where he teaches classes on the history of the life sciences and the history of information technology.

Read an Excerpt

Life Out of Sequence

A Data-Driven History of Bioinformatics


By Hallam Stevens

THE UNIVERSITY OF CHICAGO PRESS

Copyright © 2013 The University of Chicago
All rights reserved.
ISBN: 978-0-226-08017-8



CHAPTER 1

Building Computers


Before we can understand the effects of computers on biology, we need to understand what sorts of things computers are. Electronic computers were being used in biology even in the 1950s, but before 1980 they remained on the margins of biology—only a handful of biologists considered them important to their work. Now most biologists would find their work impossible without using a computer in some way. It seems obvious—to biologists as well as laypeople—that computers, databases, algorithms, and networks are appropriate tools for biological work. How and why did this change take place?

Perhaps it was computers that changed. As computers got better, a standard argument goes, they were able to handle more and more data and increasingly complex calculations, and they gradually became suitable for biological problems. This chapter argues that it was, in fact, the other way around: it was biology that changed to become a computerized and computerizable discipline. At the center of this change were data, especially sequence data. Computers are data processors: data storage, data management, and data analysis machines. During the 1980s, biologists began to produce large amounts of sequence data. These data needed to be collected, stored, maintained, and analyzed. Computers—data processing machines—provided a ready-made tool.

Our everyday familiarity with computers suggests that they are universal machines: we can use them to do the supermarket shopping, run a business, or watch a movie. But understanding the effects of computers—on biology at least—requires us to see these machines in a different light. The early history of computers suggests that they were not universal machines, but designed and adapted for particular kinds of data-driven problems. When computers came to be deployed in biology on a large scale, it was because these same kinds of problems became important in biology. Modes of thinking and working embedded in computational hardware were carried over from one discipline to another.

The use of computers in biology—at least since the 1980s—has entailed a shift toward problems involving statistics, probability, simulation, and stochastic methods. Using computers has meant focusing on the kinds of problems that computers are designed to solve. DNA, RNA, and protein sequences proved particularly amenable to these kinds of computations. The long strings of letters could be easily rendered as data and managed and manipulated as such. Sequences could be treated as patterns or codes that could be subjected to statistical and probabilistic analyses. They became objects ideally suited to the sorts of tools that computers offered. Bioinformatics is not just using computers to solve the same old biological problems; it marks a new way of thinking about and doing biology in which large volumes of data play the central role. Data-driven biology emerged because of the computer's history as a data instrument.
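
To make the point concrete, here is a minimal sketch (an editorial illustration, not drawn from the book) of what it means to treat a sequence as data: a DNA sequence held as a plain string of letters and subjected to simple statistical analysis. The example sequence and function names are invented.

    # A DNA sequence treated as a plain string of letters, analyzed
    # with nothing more than counting: the kind of problem computers
    # were built to handle.
    from collections import Counter

    def base_frequencies(seq):
        """Relative frequency of each base in a DNA sequence."""
        counts = Counter(seq.upper())
        total = sum(counts.values())
        return {base: counts[base] / total for base in "ACGT"}

    def gc_content(seq):
        """Fraction of bases that are G or C, a classic sequence statistic."""
        seq = seq.upper()
        return (seq.count("G") + seq.count("C")) / len(seq)

    seq = "ATGCGCGATTACAGGCTTAA"  # toy sequence for illustration
    print(base_frequencies(seq))
    print(gc_content(seq))

Trivial as it is, the sketch shows why sequences computerized so readily: the entire analysis reduces to string handling and counting, exactly what data processing machines already did well.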

The first part of this chapter provides a history of early electronic computers and their applications to biological problems before the 1980s. It pays special attention to the purposes for which computers were built and the uses to which they were put: solving differential equations, stochastic problems, and data management. These problems influenced the design of the machines. Joseph November argues that between roughly 1955 and 1965, biology went from being an "exemplar of systems that computers could not describe to exemplars of systems that computers could indeed describe." The introduction of computers into the life sciences borrowed heavily from operations research. It involved mathematizing aspects of biology in order to frame problems in modeling and data management terms—the terms that computers worked in. Despite these adaptations, at the end of the 1970s, the computer still lay largely outside mainstream biological research. For the most part, it was an instrument ill-adapted to the practices and norms of the biological laboratory.

The invention of DNA sequencing in the late 1970s did much to change both the direction of biological research and the relationship of biology with computing. Since the early 1980s, the amount of sequence data has continued to grow at an exponential rate. The computer was a perfect tool with which to cope with the overwhelming flow of data. The second and third parts of this chapter consist of two case studies: the first of Walter Goad, a physicist who turned his computational skills toward biology in the 1960s; and the second of James Ostell, a computationally minded PhD student in biology at Harvard University in the 1980s. These examples show how the practices of computer use were imported from physics into biology and struggled to establish themselves there. These practices became established as a distinct subdiscipline of biology—bioinformatics—during the 1990s.


What Is a Computer?

The computer was an object designed and constructed to solve particular sorts of problems, first for the military and, soon afterward, for Big Physics. Computers were (and are) good at solving certain types of problems: numerical simulations, differential equations, stochastic and statistical problems, and problems involving the management of large amounts of data.

The modern electronic computer was born in World War II. Almost all the early attempts to build mechanical calculating devices were associated with weapons or the war effort. Paul Edwards argues that "for two decades, from the early 1940s until the early 1960s, the armed forces of the United States were the single most important driver of digital computer development." Alan Turing's eponymous machine was conceived to solve a problem in pure mathematics, but its first physical realization at Bletchley Park was as a device to break German ciphers. Howard Aiken's Mark I, built by IBM between 1937 and 1943, was used by the US Navy's Bureau of Ships to compute mathematical tables. The computers designed at the Moore School of Electrical Engineering at the University of Pennsylvania in the late 1930s were purpose-built for ballistics computations at the Aberdeen Proving Ground in Maryland. A large part of the design and the institutional impetus for the Electronic Numerical Integrator and Computer (ENIAC), also developed at the Moore School, came from John von Neumann. As part of the Manhattan Project, von Neumann was interested in using computers to solve problems in the mathematics of implosion. Although the ENIAC did not become functional until after the end of the war, its design—the kinds of problems it was supposed to solve—reflected wartime priorities.

With the emergence of the Cold War, military support for computers would continue to be of paramount importance. The first problem programmed onto the ENIAC (in November 1945) was a mathematical model of the hydrogen bomb. As the conflict deepened, the military found uses for computers in aiming and operating weapons, weapons engineering, radar control, and the coordination of military operations. Computers like MIT's Whirlwind (1951) and SAGE (Semi-Automatic Ground Environment, 1959) were the first to be applied to what became known as C3I: command, control, communications, and intelligence.

What implications did the military involvement have for computer design? Most early computers were designed to solve problems involving large sets of numbers. Firing tables are the most obvious example. Other problems, like implosion, also involved the numerical solution of differential equations. A large set of numbers—representing an approximate solution—would be entered into the computer; a series of computations on these numbers would yield a new, better approximation. A solution could be approached iteratively. Problems such as radar control also involved (real-time) updating of large amounts of data fed in from remote military installations. Storing and iteratively updating large tables of data was the exemplary computational problem.
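
The iterative scheme described above fits in a few lines. The following toy example (an editorial sketch, not from the book) uses Jacobi relaxation on a one-dimensional Laplace equation: a table of numbers standing for an approximate solution is swept repeatedly, and each sweep yields a better approximation.

    # Jacobi relaxation for a 1D Laplace equation with fixed boundary
    # values: the exemplary computational pattern of storing a large
    # table of numbers and iteratively updating it.
    def jacobi_relaxation(n=10, iterations=500):
        u = [0.0] * n           # initial approximate solution
        u[0], u[-1] = 0.0, 1.0  # fixed boundary conditions
        for _ in range(iterations):
            new = u[:]          # each sweep produces a better table
            for i in range(1, n - 1):
                new[i] = 0.5 * (u[i - 1] + u[i + 1])
            u = new
        return u

    print(jacobi_relaxation())  # converges toward a linear ramp from 0 to 1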

Another field that quickly took up the use of digital electronic computers was physics, particularly the disciplines of nuclear and particle physics. The military problems described above belonged strictly to the domain of physics. Differential equations and systems of linear algebraic equations can describe a wide range of physical phenomena such as fluid flow, diffusion, heat transfer, electromagnetic waves, and radioactive decay. In some cases, techniques of military computing were applied directly to physics problems. For instance, missile telemetry involved problems of real-time, multichannel communication that were also useful for controlling bubble chambers. A few years later, other physicists realized that computers could be used to great effect in "logic" machines: spark chambers and wire chambers that used electrical detectors rather than photographs to capture subatomic events. Bubble chambers and spark chambers were complicated machines that required careful coordination and monitoring so that the best conditions for recording events could be maintained by the experimenters. By building computers into the detectors, physicists were able to retain real-time control over their experimental machines.

But computers could be used for data reduction as well as control. From the early 1950s, computers were used to sort and analyze bubble chamber film and render the data into a useful form. One of the main problems for many particle physics experiments was the sorting of the signal from the noise: for many kinds of subatomic events, a certain "background" could be anticipated. Figuring out just how many background events should be expected inside the volume of a spark chamber was often a difficult problem that could not be solved analytically. Again following the lead of the military, physicists turned to simulations using computers. Starting with random numbers, physicists used stochastic methods that mimicked physical processes to arrive at "predictions" of the expected background. These "Monte Carlo" processes evolved from early computer simulations of atomic bombs on the ENIAC to sophisticated background calculations for bubble chambers. The computer itself became a particular kind of object: that is, a simulation machine.
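
The Monte Carlo idea itself is compact enough to sketch. The example below is the textbook illustration (estimating pi from random points), not the physicists' actual background calculations, but it shows the principle: start with random numbers, mimic a process, and read off a statistical prediction.

    # Monte Carlo in miniature: random samples stand in for a process
    # that cannot be solved analytically, and the answer emerges as a
    # statistical estimate.
    import random

    def monte_carlo_pi(n_samples=100_000):
        hits = 0
        for _ in range(n_samples):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:  # point falls inside the quarter circle
                hits += 1
        return 4.0 * hits / n_samples

    print(monte_carlo_pi())  # approaches 3.14159... as n_samples grows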

The other significant use of computers that evolved between 1945 and 1955 was in the management of data. In many ways, this was a straightforward extension of the ENIAC's ability to work with large sets of numbers. The Moore School engineers J. Presper Eckert and John Mauchly quickly saw how their design for the Electronic Discrete Variable Automatic Computer (EDVAC) could be adapted into a machine that could rapidly sort data—precisely the need of commercial work. This insight inspired the inventors to incorporate the Eckert-Mauchly Computer Corporation in December 1948 with the aim of selling electronic computers to businesses. The first computer they produced—the UNIVAC (Universal Automatic Computer)—was sold to the US Census Bureau in March 1951. By 1954, they had sold almost twenty machines to military (the US Air Force, US Army Map Service, Atomic Energy Commission) and nonmilitary customers (General Electric, US Steel, DuPont, Metropolitan Life, Consolidated Edison). Customers used these machines for inventory and logistics. The most important feature of the computer was its ability to "scan through a reel of tape, find the correct record or set of records, perform some process in it, and return the results again to tape." It was an "automatic" information processing system. The UNIVAC was successful because it was able to store, operate on, and manipulate large tables of numbers—the only difference was that these numbers now represented inventory or revenue figures rather than purely mathematical expressions.
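
The tape-era pattern quoted above survives in batch processing today. Here is a hedged sketch of it, with files standing in for reels of tape; the inventory record format and function name are invented for illustration.

    # Sequential record processing in the UNIVAC style: scan the records
    # in order, update the ones that match, write everything back out.
    import csv

    def update_inventory(in_path, out_path, item, delta):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for record in reader:           # sequential scan, like a tape
                if record["item"] == item:  # find the correct record
                    record["quantity"] = str(int(record["quantity"]) + delta)
                writer.writerow(record)     # return the results to "tape"

    # Example use, assuming a CSV with "item" and "quantity" columns:
    # update_inventory("stock.csv", "stock_updated.csv", "widget", -5)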

Between the end of World War II and the early 1960s, computers were also extensively used by the military in operations research (OR). OR and the related field of systems analysis were devoted to the systematic analysis of logistical problems in order to find optimally efficient solutions. OR involved problems of game theory, probability, and statistics. These logical and numerical problems were understood as exactly the sorts of problems computers were good at solving. The use of computers in OR and systems analysis not only continued to couple them to the military, but also continued their association with particular sorts of problems: namely, problems with large numbers of well-defined variables that would yield to numerical and logical calculations.

What were the consequences of all this for the application of computers to biology? Despite their touted "universality," digital computers were not equally good at solving all problems. The ways in which early computers were used established standards and practices that influenced later uses. The design of early computers placed certain constraints on where and how they would and could be applied to biological problems. The use of computers in biology was successful only where biological problems could be reduced to problems of data analysis and management. Bringing computers to the life sciences meant following specific patterns of use that were modeled on approaches in OR and physics and which reproduced modes of practice and patronage from those fields.

In the late 1950s, there were two alternative notions of how computers might be applied to the life sciences. The first was that biology and biologists had to mathematize, becoming more like the physical sciences. The second was that computers could be used for accounting purposes, creating "a biology oriented toward the collation and statistical analysis of large volumes of quantitative data." Both notions involved making biological problems amenable to computers' data processing power. Robert Ledley—one of the strongest advocates of the application of computers in biology and medicine—envisioned the transformation of biologists' research and practices along the lines of Big Science.

In 1965, Ledley published Use of Computers in Biology and Medicine. The foreword (by Lee Lusted of the National Institutes of Health) acknowledged that computer use required large-scale funding and cooperation similar to that seen in physics. Ledley echoed these views in his preface:

Because of an increased emphasis on quantitative detail, elaborate experimentation and extensive quantitative data analysis are now required. Concomitant with this change, the view of the biologist as an individual scientist, personally carrying through each step of his data-reduction processes—that view is rapidly being broadened, to include the biologist as part of an intricate organizational chart that partitions scientific technical and administrative responsibilities.


Physics served as the paradigm of such organization. But the physical sciences also provided the model for the kinds of problems that computers were supposed to solve: those involving "large masses of data and many complicated interrelating factors." Many of the biomedical applications of computers that Ledley's volume explored treated biological systems according to their physical and chemical bases. The examples Ledley describes in his introduction include the numerical solution of differential equations describing biological systems (including protein structures, nerve fiber conduction, muscle fiber excitability, diffusion through semipermeable membranes, metabolic reactions, and blood flow); simulations (Monte Carlo simulation of chemical reactions, enzyme systems, cell division, genetics, and self-organizing neural nets); statistical analyses (medical records, experimental data, evaluation of new drugs, data from electrocardiograms and electroencephalograms, and photomicrographic analysis); real-time experimental and clinical control (automatic respirators; analysis of electrophoresis, diffusion, and ultracentrifuge patterns; and counting of bacterial cultures); and medical diagnosis (including medical records and the distribution and communication of medical knowledge). Almost all the applications were either borrowed directly from the physical sciences or depended on problems involving statistics or large volumes of information.

For the most part, the mathematization and rationalization of biology that Ledley and others believed was necessary for the "computerization" of the life sciences did not eventuate. By the late 1960s, however, the invention of minicomputers and the general reduction in the costs of computers allowed more biologists to experiment with their use. At Stanford University, a small group of computer scientists and biologists led by Edward Feigenbaum and Joshua Lederberg began to take advantage of these changes. After applying computers to the problem of determining the structure of organic molecules, this group began to extend their work into molecular biology.
(Continues...)


Excerpted from Life Out of Sequence by Hallam Stevens. Copyright © 2013 The University of Chicago. Excerpted by permission of THE UNIVERSITY OF CHICAGO PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.

Table of Contents

Introduction
Chapter 1: Building Computers
Chapter 2: Making Knowledge
Chapter 3: Organizing Space
Chapter 4: Following Data
Chapter 5: Ordering Objects
Chapter 6: Seeing Genomes
Conclusion: The End of Bioinformatics
Acknowledgments
Archival Sources
Notes
Bibliography
Index