Introduction To Hadoop for Bioinformatics

by Martin Gollery

Overview

Genomic data production now has the velocity, volume, and variety to be considered ‘Big Data’. As a result, the processing of genomic data can be improved by applying lessons and techniques from other industries that have worked successfully with Big Data. One of these techniques, now in common use within multiple industries--Energy, Financial Services, Online Commerce, and Telecommunications, for example--is the use of a Hadoop-based analysis infrastructure.

Hadoop is an open-source software framework whose roots lie in the distributed storage and processing infrastructure behind the original Google search engine. The framework is a collection of software modules that manage the storage and processing of large datasets on clusters of commodity hardware. Because Hadoop segments and distributes data across an essentially unbounded number of compute nodes, the framework scales out almost without limit and can deliver speed increases of multiple orders of magnitude compared to legacy approaches.
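
To make the programming model concrete, the sketch below is a minimal, illustrative Hadoop MapReduce job in Java that counts k-mer occurrences in sequence reads. It is a generic example of the model rather than code from this book; the class names, the choice of k = 21, and the simplified input handling are assumptions made purely for illustration. Hadoop splits the input files stored in HDFS, runs the map function on each split in parallel across the cluster, then groups the emitted k-mers and sums their counts in the reduce phase.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal illustration of Hadoop's MapReduce model: the framework splits the
// input across the cluster, runs map() on each split in parallel, then groups
// the emitted keys and runs reduce() on each group. The task here is counting
// 21-base k-mers in sequence reads stored as plain text, one read per line.
public class KmerCount {

    public static class KmerMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final int K = 21;                    // illustrative k-mer length
        private final Text kmer = new Text();
        private final LongWritable one = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String seq = line.toString().trim();
            // Keep only lines that look like nucleotide sequence; the sketch
            // assumes reads have already been extracted to plain text.
            if (!seq.matches("[ACGTNacgtn]+")) {
                return;
            }
            for (int i = 0; i + K <= seq.length(); i++) {
                kmer.set(seq.substring(i, i + K));
                context.write(kmer, one);                   // emit (k-mer, 1)
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text kmer, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(kmer, new LongWritable(total));   // emit (k-mer, total)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "k-mer count");
        job.setJarByClass(KmerCount.class);
        job.setMapperClass(KmerMapper.class);
        job.setCombinerClass(SumReducer.class);             // local pre-aggregation
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job of this kind is packaged as a jar and submitted with the standard 'hadoop jar' command, with its input and output arguments pointing at directories in HDFS; the framework handles the distribution of both the data and the computation.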

These developments make it possible, at least in principle, to accelerate the analytical processing of NGS data. The execution speed of NGS data analysis directly affects how quickly research findings become available. Currently, analysis of a large dataset via a complex workflow can take days--in some cases weeks--which is a significant impediment to research productivity. The goal of this project was to reduce these protracted analysis times by optimizing bioinformatics algorithms for Hadoop and executing them with Hadoop-compatible software on a Hadoop-optimized cluster.

Product Details

BN ID: 2940151341912
Publisher: Martin Gollery
Publication date: 03/05/2015
Sold by: Barnes & Noble
Format: eBook
File size: 1 MB

About the Author

Martin Gollery is the CEO of Tahoe Informatics and the Director of Sales and Marketing Support for BioDatomics. He was the Director of Research at TimeLogic, the Director of the Nevada Center for Bioinformatics, a Bioinformaticist at the Desert Research Institute, and Director of Sales and Business Development for Omixon. He has written numerous papers, articles, book chapters, and books, and is the narrator of the audiobook version of ‘It Takes a Genome’. Martin has taught high school science and college-level computer science, and has trained engineers at NASA. He has hiked the Pacific Crest Trail from Mexico to Canada, made first ascents in the Himalaya, and performed with the Nevada Shakespeare Company, and his photographs have been published in several magazines.