
Algorithms for data science / Brian Steele, John Chandler, Swarna Reddy

Material type: Printed and electronic book
Language: English
Publication details: New York, New York, United States: Springer Science+Business Media, 2016
Description: xxiii, 430 pages; 24 centimeters
ISBN:
  • 3319457950
  • 9783319457956
Additional physical formats: Algorithms for data science
Classification:
  • 518.1 S8
Online resource: Additional physical formats available:
  • Available online
Access note: Available to ECOSUR users with their access credentials
Summary (English):

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses. This book has three parts: (a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter. (b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System. (c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.

This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.

System number: 42045
Holdings
Item type; Current library; Collection; Call number; Status; Barcode
Books; Biblioteca Electrónica Recursos en línea (RE); Acervo General; Digital resource; ECO400420459736
Books; Biblioteca San Cristóbal Acervo General (AG); Acervo General; 518.1 S8; Available; ECO010019237

Includes bibliography: pages 419-421 and index: pages 423-430

1 Introduction.. 1.1 What Is Data Science?.. 1.2 Diabetes in America.. 1.3 Authors of the Federalist Papers.. 1.4 Forecasting NASDAQ Stock Prices.. 1.5 Remarks.. 1.6 The Book.. 1.7 Algorithms.. 1.8 Python.. 1.9 R.. 1.10 Terminology and Notation.. 1.10.1 Matrices and Vectors.. 1.11 Book Website.. Part I Data Reduction.. 2 Data Mapping and Data Dictionaries.. 2.1 Data Reduction.. 2.2 Political Contributions.. 2.3 Dictionaries.. 2.4 Tutorial: Big Contributors.. 2.5 Data Reduction.. 2.5.1 Notation and Terminology.. 2.5.2 The Political Contributions Example.. 2.5.3 Mappings.. 2.6 Tutorial: Election Cycle Contributions.. 2.7 Similarity Measures.. 2.7.1 Computation.. 2.8 Tutorial: Computing Similarity.. 2.9 Concluding Remarks About Dictionaries.. 2.10 Exercises.. 2.10.1 Conceptual.. 2.10.2 Computational.. 3 Scalable Algorithms and Associative Statistics.. 3.1 Introduction.. 3.2 Example: Obesity in the United States.. 3.3 Associative Statistics.. 3.4 Univariate Observations.. 3.4.1 Histograms.. 3.4.2 Histogram Construction.. 3.5 Functions.. 3.6 Tutorial: Histogram Construction.. 3.6.1 Synopsis.. 3.7 Multivariate Data.. 3.7.1 Notation and Terminology.. 3.7.2 Estimators.. 3.7.3 The Augmented Moment Matrix.. 3.7.4 Synopsis.. 3.8 Tutorial: Computing the Correlation Matrix.. 3.8.1 Conclusion.. 3.9 Introduction to Linear Regression.. 3.9.1 The Linear Regression Model.. 3.9.2 The Estimator of β.. 3.9.3 Accuracy Assessment.. 3.9.4 Computing R²adjusted.. 3.10 Tutorial: Computing β.. 3.10.1 Conclusion.. 3.11 Exercises.. 3.11.1 Conceptual.. 3.11.2 Computational.. 4 Hadoop and MapReduce.. 4.1 Introduction.. 4.2 The Hadoop Ecosystem.. 4.2.1 The Hadoop Distributed File System.. 4.2.2 MapReduce.. 4.2.3 Mapping.. 4.2.4 Reduction.. 4.3 Developing a Hadoop Application.. 4.4 Medicare Payments.. 4.5 The Command Line Environment.. 4.6 Tutorial: Programming a MapReduce Algorithm.. 4.6.1 The Mapper.. 4.6.2 The Reducer.. 4.6.3 Synopsis

4.7 Tutorial: Using Amazon Web Services.. 4.7.1 Closing Remarks.. 4.8 Exercises.. 4.8.1 Conceptual.. 4.8.2 Computational.. Part II Extracting Information from Data.. 5 Data Visualization.. 5.1 Introduction.. 5.2 Principles of Data Visualization.. 5.3 Making Good Choices.. 5.3.1 Univariate Data.. 5.3.2 Bivariate and Multivariate Data.. 5.4 Harnessing the Machine.. 5.4.1 Building Fig. 5.2.. 5.4.2 Building Fig. 5.3.. 5.4.3 Building Fig. 5.4.. 5.4.4 Building Fig. 5.5.. 5.4.5 Building Fig. 5.8.. 5.4.6 Building Fig. 5.10.. 5.4.7 Building Fig. 5.11.. 5.5 Exercises.. 6 Linear Regression Methods.. 6.1 Introduction.. 6.2 The Linear Regression Model.. 6.2.1 Example: Depression, Fatalism, and Simplicity.. 6.2.2 Least Squares.. 6.2.3 Confidence Intervals.. 6.2.4 Distributional Conditions.. 6.2.5 Hypothesis Testing.. 6.2.6 Cautionary Remarks.. 6.3 Introduction to R.. 6.4 Tutorial: R.. 6.4.1 Remark.. 6.5 Tutorial: Large Data Sets and R.. 6.6 Factors.. 6.6.1 Interaction.. 6.6.2 The Extra Sums-of-Squares F-test.. 6.7 Tutorial: Bike Share.. 6.7.1 An Incongruous Result.. 6.8 Analysis of Residuals.. 6.8.1 Linearity.. 6.8.2 Example: The Bike Share Problem.. 6.8.3 Independence.. 6.9 Tutorial: Residual Analysis.. 6.9.1 Final Remarks.. 6.10 Exercises.. 6.10.1 Conceptual.. 6.10.2 Computational.. 7 Healthcare Analytics.. 7.1 Introduction.. 7.2 The Behavioral Risk Factor Surveillance System.. 7.2.1 Estimation of Prevalence.. 7.2.2 Estimation of Incidence.. 7.3 Tutorial: Diabetes Prevalence and Incidence.. 7.4 Predicting At-Risk Individuals.. 7.4.1 Sensitivity and Specificity.. 7.5 Tutorial: Identifying At-Risk Individuals.. 7.6 Unusual Demographic Attribute Vectors.. 7.7 Tutorial: Building Neighborhood Sets.. 7.7.1 Synopsis.. 7.8 Exercises.. 7.8.1 Conceptual.. 7.8.2 Computational.. 8 Cluster Analysis.. 8.1 Introduction.. 8.2 Hierarchical Agglomerative Clustering.. 8.3 Comparison of States.. 8.4 Tutorial: Hierarchical Clustering of States

8.4.1 Synopsis.. 8.5 The k-Means Algorithm.. 8.6 Tutorial: The k-Means Algorithm.. 8.6.1 Synopsis.. 8.7 Exercises.. 8.7.1 Conceptual.. 8.7.2 Computational.. Part III Predictive Analytics.. 9 k-Nearest Neighbor Prediction Functions.. 9.1 Introduction.. 9.1.1 The Prediction Task.. 9.2 Notation and Terminology.. 9.3 Distance Metrics.. 9.4 The k-Nearest Neighbor Prediction Function.. 9.5 Exponentially Weighted k-Nearest Neighbors.. 9.6 Tutorial: Digit Recognition.. 9.6.1 Remarks.. 9.7 Accuracy Assessment.. 9.7.1 Confusion Matrices.. 9.8 k-Nearest Neighbor Regression.. 9.9 Forecasting the S&P 500.. 9.10 Tutorial: Forecasting by Pattern Recognition.. 9.10.1 Remark.. 9.11 Cross-Validation.. 9.12 Exercises.. 9.12.1 Conceptual.. 9.12.2 Computational.. 10 The Multinomial Naïve Bayes Prediction Function.. 10.1 Introduction.. 10.2 The Federalist Papers.. 10.3 The Multinomial Naïve Bayes Prediction Function.. 10.3.1 Posterior Probabilities.. 10.4 Tutorial: Reducing the Federalist Papers.. 10.4.1 Summary.. 10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers.. 10.5.1 Remark.. 10.6 Tutorial: Customer Segmentation.. 10.6.1 Additive Smoothing.. 10.6.2 The Data.. 10.6.3 Remarks.. 10.7 Exercises.. 10.7.1 Conceptual.. 10.7.2 Computational.. 11 Forecasting.. 11.1 Introduction.. 11.2 Tutorial: Working with Time.. 11.3 Analytical Methods.. 11.3.1 Notation.. 11.3.2 Estimation of the Mean and Variance.. 11.3.3 Exponential Forecasting.. 11.3.4 Autocorrelation.. 11.4 Tutorial: Computing ρτ.. 11.4.1 Remarks.. 11.5 Drift and Forecasting.. 11.6 Holt-Winters Exponential Forecasting.. 11.6.1 Forecasting Error.. 11.7 Tutorial: Holt-Winters Forecasting.. 11.8 Regression-Based Forecasting of Stock Prices.. 11.9 Tutorial: Regression-Based Forecasting.. 11.9.1 Remarks.. 11.10 Time-Varying Regression Estimators.. 11.11 Tutorial: Time-Varying Regression Estimators.. 11.11.1 Remarks.. 11.12 Exercises.. 11.12.1 Conceptual

11.12.2 Computational.. 12 Real-time Analytics.. 12.1 Introduction.. 12.2 Forecasting with a NASDAQ Quotation Stream.. 12.2.1 Forecasting Algorithms.. 12.3 Tutorial: Forecasting the Apple Inc. Stream.. 12.3.1 Remarks.. 12.4 The Twitter Streaming API.. 12.5 Tutorial: Tapping the Twitter Stream.. 12.5.1 Remarks.. 12.6 Sentiment Analysis.. 12.7 Tutorial: Sentiment Analysis of Hashtag Groups.. 12.8 Exercises.. A Solutions to Exercises.. B Accessing the Twitter API.. References.. Index

Adobe Acrobat Professional 6.0 or later