Algorithms for data science / Brian Steele, John Chandler, Swarna Reddy
Material type: Book, print and electronic
Language: English
Publication details: New York, New York, United States: Springer Science+Business Media, 2016
Description: xxiii, 430 pages; 24 cm
ISBN:
- 3319457950
- 9783319457956
Call number: 518.1 S8
Available online
| Item type | Current library | Collection | Call number | Status | Barcode |
|---|---|---|---|---|---|
| Books | Biblioteca Electrónica Recursos en línea (RE) | Acervo General | Digital resource | | ECO400420459736 |
| Books | Biblioteca San Cristóbal Acervo General (AG) | Acervo General | 518.1 S8 | Available | ECO010019237 |
Includes bibliographical references (pages 419-421) and index (pages 423-430).
Contents:
- 1 Introduction
  - 1.1 What Is Data Science?; 1.2 Diabetes in America; 1.3 Authors of the Federalist Papers; 1.4 Forecasting NASDAQ Stock Prices; 1.5 Remarks; 1.6 The Book; 1.7 Algorithms; 1.8 Python; 1.9 R; 1.10 Terminology and Notation; 1.10.1 Matrices and Vectors; 1.11 Book Website
- Part I Data Reduction
- 2 Data Mapping and Data Dictionaries
  - 2.1 Data Reduction; 2.2 Political Contributions; 2.3 Dictionaries; 2.4 Tutorial: Big Contributors; 2.5 Data Reduction; 2.5.1 Notation and Terminology; 2.5.2 The Political Contributions Example; 2.5.3 Mappings; 2.6 Tutorial: Election Cycle Contributions; 2.7 Similarity Measures; 2.7.1 Computation; 2.8 Tutorial: Computing Similarity; 2.9 Concluding Remarks About Dictionaries; 2.10 Exercises; 2.10.1 Conceptual; 2.10.2 Computational
- 3 Scalable Algorithms and Associative Statistics
  - 3.1 Introduction; 3.2 Example: Obesity in the United States; 3.3 Associative Statistics; 3.4 Univariate Observations; 3.4.1 Histograms; 3.4.2 Histogram Construction; 3.5 Functions; 3.6 Tutorial: Histogram Construction; 3.6.1 Synopsis; 3.7 Multivariate Data; 3.7.1 Notation and Terminology; 3.7.2 Estimators; 3.7.3 The Augmented Moment Matrix; 3.7.4 Synopsis; 3.8 Tutorial: Computing the Correlation Matrix; 3.8.1 Conclusion; 3.9 Introduction to Linear Regression; 3.9.1 The Linear Regression Model; 3.9.2 The Estimator of β; 3.9.3 Accuracy Assessment; 3.9.4 Computing R²_adjusted; 3.10 Tutorial: Computing β; 3.10.1 Conclusion; 3.11 Exercises; 3.11.1 Conceptual; 3.11.2 Computational
- 4 Hadoop and MapReduce
  - 4.1 Introduction; 4.2 The Hadoop Ecosystem; 4.2.1 The Hadoop Distributed File System; 4.2.2 MapReduce; 4.2.3 Mapping; 4.2.4 Reduction; 4.3 Developing a Hadoop Application; 4.4 Medicare Payments; 4.5 The Command Line Environment; 4.6 Tutorial: Programming a MapReduce Algorithm; 4.6.1 The Mapper; 4.6.2 The Reducer; 4.6.3 Synopsis; 4.7 Tutorial: Using Amazon Web Services; 4.7.1 Closing Remarks; 4.8 Exercises; 4.8.1 Conceptual; 4.8.2 Computational
- Part II Extracting Information from Data
- 5 Data Visualization
  - 5.1 Introduction; 5.2 Principles of Data Visualization; 5.3 Making Good Choices; 5.3.1 Univariate Data; 5.3.2 Bivariate and Multivariate Data; 5.4 Harnessing the Machine; 5.4.1 Building Fig. 5.2; 5.4.2 Building Fig. 5.3; 5.4.3 Building Fig. 5.4; 5.4.4 Building Fig. 5.5; 5.4.5 Building Fig. 5.8; 5.4.6 Building Fig. 5.10; 5.4.7 Building Fig. 5.11; 5.5 Exercises
- 6 Linear Regression Methods
  - 6.1 Introduction; 6.2 The Linear Regression Model; 6.2.1 Example: Depression, Fatalism, and Simplicity; 6.2.2 Least Squares; 6.2.3 Confidence Intervals; 6.2.4 Distributional Conditions; 6.2.5 Hypothesis Testing; 6.2.6 Cautionary Remarks; 6.3 Introduction to R; 6.4 Tutorial: R; 6.4.1 Remark; 6.5 Tutorial: Large Data Sets and R; 6.6 Factors; 6.6.1 Interaction; 6.6.2 The Extra Sums-of-Squares F-test; 6.7 Tutorial: Bike Share; 6.7.1 An Incongruous Result; 6.8 Analysis of Residuals; 6.8.1 Linearity; 6.8.2 Example: The Bike Share Problem; 6.8.3 Independence; 6.9 Tutorial: Residual Analysis; 6.9.1 Final Remarks; 6.10 Exercises; 6.10.1 Conceptual; 6.10.2 Computational
- 7 Healthcare Analytics
  - 7.1 Introduction; 7.2 The Behavioral Risk Factor Surveillance System; 7.2.1 Estimation of Prevalence; 7.2.2 Estimation of Incidence; 7.3 Tutorial: Diabetes Prevalence and Incidence; 7.4 Predicting At-Risk Individuals; 7.4.1 Sensitivity and Specificity; 7.5 Tutorial: Identifying At-Risk Individuals; 7.6 Unusual Demographic Attribute Vectors; 7.7 Tutorial: Building Neighborhood Sets; 7.7.1 Synopsis; 7.8 Exercises; 7.8.1 Conceptual; 7.8.2 Computational
- 8 Cluster Analysis
  - 8.1 Introduction; 8.2 Hierarchical Agglomerative Clustering; 8.3 Comparison of States; 8.4 Tutorial: Hierarchical Clustering of States; 8.4.1 Synopsis; 8.5 The k-Means Algorithm; 8.6 Tutorial: The k-Means Algorithm; 8.6.1 Synopsis; 8.7 Exercises; 8.7.1 Conceptual; 8.7.2 Computational
- Part III Predictive Analytics
- 9 k-Nearest Neighbor Prediction Functions
  - 9.1 Introduction; 9.1.1 The Prediction Task; 9.2 Notation and Terminology; 9.3 Distance Metrics; 9.4 The k-Nearest Neighbor Prediction Function; 9.5 Exponentially Weighted k-Nearest Neighbors; 9.6 Tutorial: Digit Recognition; 9.6.1 Remarks; 9.7 Accuracy Assessment; 9.7.1 Confusion Matrices; 9.8 k-Nearest Neighbor Regression; 9.9 Forecasting the S&P 500; 9.10 Tutorial: Forecasting by Pattern Recognition; 9.10.1 Remark; 9.11 Cross-Validation; 9.12 Exercises; 9.12.1 Conceptual; 9.12.2 Computational
- 10 The Multinomial Naïve Bayes Prediction Function
  - 10.1 Introduction; 10.2 The Federalist Papers; 10.3 The Multinomial Naïve Bayes Prediction Function; 10.3.1 Posterior Probabilities; 10.4 Tutorial: Reducing the Federalist Papers; 10.4.1 Summary; 10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers; 10.5.1 Remark; 10.6 Tutorial: Customer Segmentation; 10.6.1 Additive Smoothing; 10.6.2 The Data; 10.6.3 Remarks; 10.7 Exercises; 10.7.1 Conceptual; 10.7.2 Computational
- 11 Forecasting
  - 11.1 Introduction; 11.2 Tutorial: Working with Time; 11.3 Analytical Methods; 11.3.1 Notation; 11.3.2 Estimation of the Mean and Variance; 11.3.3 Exponential Forecasting; 11.3.4 Autocorrelation; 11.4 Tutorial: Computing ρ_τ; 11.4.1 Remarks; 11.5 Drift and Forecasting; 11.6 Holt-Winters Exponential Forecasting; 11.6.1 Forecasting Error; 11.7 Tutorial: Holt-Winters Forecasting; 11.8 Regression-Based Forecasting of Stock Prices; 11.9 Tutorial: Regression-Based Forecasting; 11.9.1 Remarks; 11.10 Time-Varying Regression Estimators; 11.11 Tutorial: Time-Varying Regression Estimators; 11.11.1 Remarks; 11.12 Exercises; 11.12.1 Conceptual; 11.12.2 Computational
- 12 Real-time Analytics
  - 12.1 Introduction; 12.2 Forecasting with a NASDAQ Quotation Stream; 12.2.1 Forecasting Algorithms; 12.3 Tutorial: Forecasting the Apple Inc. Stream; 12.3.1 Remarks; 12.4 The Twitter Streaming API; 12.5 Tutorial: Tapping the Twitter Stream; 12.5.1 Remarks; 12.6 Sentiment Analysis; 12.7 Tutorial: Sentiment Analysis of Hashtag Groups; 12.8 Exercises
- A Solutions to Exercises
- B Accessing the Twitter API
- References
- Index
Available to ECOSUR users with their access credentials.
This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary algorithms can be used without modification. Programming fluency and experience with real, challenging data are indispensable, so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will be able to adapt algorithms to new problems and carry out innovative analyses.

This book has three parts:

(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter of Part I introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.

(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of healthcare analytics as an extended example of practical data analytics. The algorithms and analytics will be of particular interest to practitioners who want to use the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.

(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data, and its tutorials use publicly accessible data streams originating from the Twitter API and the NASDAQ stock market.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low: students with one or two courses in probability or statistics, some exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all readers with these prerequisites. Chapters often close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying difficulty. The text is well suited to self-study and is an exceptional resource for practitioners.
Available online
System requirements: Adobe Acrobat Professional 6.0 or later