Genomic data production now has the velocity, volume, and variety to qualify as ‘Big Data’. Its processing can therefore benefit from lessons and techniques developed in other industries that have worked successfully with Big Data. One such technique, now in common use in industries such as energy, financial services, online commerce, and telecommunications, is a Hadoop-based analysis infrastructure.
Hadoop is an open source software framework whose design derives from Google's published work on the distributed file system and MapReduce processing model behind its search engine. The framework is a collection of software modules that manage the storage and processing of large datasets on clusters of commodity hardware. Because Hadoop splits data into blocks and distributes them across the compute nodes of a cluster, capacity can be expanded by adding nodes. In principle this gives the framework very high scalability and, for workloads that parallelize well, the potential for speed increases of multiple orders of magnitude over legacy approaches.
These developments make it possible, at least in principle, to accelerate the analytical processing of next-generation sequencing (NGS) data. The execution speed of NGS analysis directly affects how quickly research findings become available. Currently, analyzing a large dataset with a complex workflow can take days and in some cases weeks, a significant impediment to research productivity. The goal of this project was to reduce these analysis times by optimizing bioinformatics algorithms for Hadoop and executing them on Hadoop-compatible software and a Hadoop-optimized cluster.
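The split-and-distribute idea in the description can be illustrated with a small sketch. It is not Hadoop code and does not come from the book: it runs in a single Python process and only mimics the map and reduce steps of the MapReduce model on a toy k-mer counting task. The function names, the toy reads, and the choice of k-mer counting are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(reads, n_chunks):
    """Partition reads into chunks, standing in for data blocks on separate nodes."""
    return [reads[i::n_chunks] for i in range(n_chunks)]


def map_count_kmers(chunk, k):
    """Map step: count k-mers in one chunk, independently of every other chunk."""
    counts = Counter()
    for read in chunk:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


def count_kmers(reads, k, n_chunks=2):
    chunks = split_into_chunks(reads, n_chunks)
    # On a real cluster each chunk would be processed on a different node;
    # here the map calls simply run one after another (illustrative only).
    partials = [map_count_kmers(chunk, k) for chunk in chunks]
    return reduce_counts(partials)


# Hypothetical toy reads, not real sequencing data.
reads = ["ACGTAC", "GTACGT", "ACGT"]
total = count_kmers(reads, k=3, n_chunks=2)
single = count_kmers(reads, k=3, n_chunks=1)
```

Because each chunk is counted independently and the partial results are only merged at the end, the total does not depend on how many chunks the reads are split into. That independence is what lets this kind of workload spread across many nodes.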
Introduction To Hadoop for Bioinformatics
Product Details
| BN ID: | 2940151341912 |
|---|---|
| Publisher: | Martin Gollery |
| Publication date: | 03/05/2015 |
| Sold by: | Barnes & Noble |
| Format: | eBook |
| File size: | 1 MB |