
The Big R-Book

by Philippe J.S. De Brouwer

Introduces professionals and scientists to statistics and machine learning using the programming language R.

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. In Part 6 we learn to build models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and finally Part 9 introduces the reader to big data and performance computing. It also includes some helpful appendices.

Provides a practical guide for non-experts with a focus on business users.
Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting.
Uses a practical tone and integrates multiple topics in a coherent framework.
Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R.
Shows readers how to visualize results in static and interactive reports.
Supplementary material includes PDF slides based on the book's content, as well as all the extracted R-code, and is available to everyone on a Wiley Book Companion Site.

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.

FORMAT
Hardcover
LANGUAGE
English
CONDITION
Brand New


Back Cover

Introduces professionals and scientists to statistics, machine learning, and big data using the programming language R.

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book: From Data Science to Learning Machines and Big Data includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling and exploring data. In Part 5 we learn to build models, Part 6 introduces the reader to the reality in companies, Part 7 covers reports and interactive applications, and Part 8 introduces the reader to big data and performance computing. The appendices focus on specialist topics such as building your own extension for R, answers to questions that appear throughout the book, etc.

Provides a practical guide for non-experts with a focus on business users.
Contains a unique combination of topics including an introduction to R, machine learning, multi criteria decision analysis, mathematical models, data wrangling, and reporting.
Uses a practical tone and integrates multiple topics in a coherent framework.
Demystifies the hype around machine learning and AI by enabling readers to understand the models and program them in R.
Shows readers how to visualize results in reports and dynamic websites.
Supplementary materials include PDF slides based on the book's content on a Wiley Instructor-only Book Companion Site, as well as all the extracted R-code available to everyone on a Wiley Student Book Companion Site.

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students and graduates who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models or review them.

Author Biography

PHILIPPE J.S. DE BROUWER, PHD, is a director at HSBC, guest professor at four universities and MBA programs (University of Warsaw, Jagiellonian University, Krakow School of Business and AGH University of Science and Technology) and honorary consul for Belgium in Krakow. As a professor, he builds bridges not only between universities and the industry, but also across disciplines. He teaches mathematicians leadership skills and non-mathematicians coding. As a scientist, he tries to combine research on financial markets, psychology, and investments to the benefit of the investor. As an honorary consul, he is passionate about serving the community and helping initiatives grow.

Table of Contents

Foreword
About the Author
Acknowledgements
Preface
About the Companion Site

I Introduction
  1 The Big Picture with Kondratiev and Kardashev
  2 The Scientific Method and Data
  3 Conventions

II Starting with R and Elements of Statistics
  4 The Basics of R
    4.1 Getting Started with R
    4.2 Variables
    4.3 Data Types
      4.3.1 The Elementary Types
      4.3.2 Vectors
      4.3.3 Accessing Data from a Vector
      4.3.4 Matrices
      4.3.5 Arrays
      4.3.6 Lists
      4.3.7 Factors
      4.3.8 Data Frames
      4.3.9 Strings or the Character-type
    4.4 Operators
      4.4.1 Arithmetic Operators
      4.4.2 Relational Operators
      4.4.3 Logical Operators
      4.4.4 Assignment Operators
      4.4.5 Other Operators
    4.5 Flow Control Statements
      4.5.1 Choices
      4.5.2 Loops
    4.6 Functions
      4.6.1 Built-in Functions
      4.6.2 Help with Functions
      4.6.3 User-defined Functions
      4.6.4 Changing Functions
      4.6.5 Creating Function with Default Arguments
    4.7 Packages
      4.7.1 Discovering Packages in R
      4.7.2 Managing Packages in R
    4.8 Selected Data Interfaces
      4.8.1 CSV Files
      4.8.2 Excel Files
      4.8.3 Databases
  5 Lexical Scoping and Environments
    5.1 Environments in R
    5.2 Lexical Scoping in R
  6 The Implementation of OO
    6.1 Base Types
    6.2 S3 Objects
      6.2.1 Creating S3 Objects
      6.2.2 Creating Generic Methods
      6.2.3 Method Dispatch
      6.2.4 Group Generic Functions
    6.3 S4 Objects
      6.3.1 Creating S4 Objects
      6.3.2 Using S4 Objects
      6.3.3 Validation of Input
      6.3.4 Constructor functions
      6.3.5 The Data slot
      6.3.6 Recognising Objects, Generic Functions, and Methods
      6.3.7 Creating S4 Generics
      6.3.8 Method Dispatch
    6.4 The Reference Class, refclass, RC or R5 Model
      6.4.1 Creating RC Objects
      6.4.2 Important Methods and Attributes
    6.5 Conclusions about the OO Implementation
  7 Tidy R with the Tidyverse
    7.1 The Philosophy of the Tidyverse
    7.2 Packages in the Tidyverse
      7.2.1 The Core Tidyverse
      7.2.2 The Non-core Tidyverse
    7.3 Working with the Tidyverse
      7.3.1 Tibbles
      7.3.2 Piping with R
      7.3.3 Attention Points When Using the Pipe
      7.3.4 Advanced Piping
      7.3.5 Conclusion
  8 Elements of Descriptive Statistics
    8.1 Measures of Central Tendency
      8.1.1 Mean
      8.1.2 The Median
      8.1.3 The Mode
    8.2 Measures of Variation or Spread
    8.3 Measures of Covariation
      8.3.1 The Pearson Correlation
      8.3.2 The Spearman Correlation
      8.3.3 Chi-square Tests
    8.4 Distributions
      8.4.1 Normal Distribution
      8.4.2 Binomial Distribution
    8.5 Creating an Overview of Data Characteristics
  9 Visualisation Methods
    9.1 Scatterplots
    9.2 Line Graphs
    9.3 Pie Charts
    9.4 Bar Charts
    9.5 Boxplots
    9.6 Violin Plots
    9.7 Histograms
    9.8 Plotting Functions
    9.9 Maps and Contour Plots
    9.10 Heat-maps
    9.11 Text Mining
      9.11.1 Word Clouds
      9.11.2 Word Associations
    9.12 Colours in R
  10 Time Series Analysis
    10.1 Time Series in R
      10.1.1 The Basics of Time Series in R
    10.2 Forecasting
      10.2.1 Moving Average
      10.2.2 Seasonal Decomposition
  11 Further Reading

III Data Import
  12 A Short History of Modern Database Systems
  13 RDBMS
  14 SQL
    14.1 Designing the Database
    14.2 Building the Database Structure
      14.2.1 Installing a RDBMS
      14.2.2 Creating the Database
      14.2.3 Creating the Tables and Relations
    14.3 Adding Data to the Database
    14.4 Querying the Database
      14.4.1 The Basic Select Query
      14.4.2 More Complex Queries
    14.5 Modifying the Database Structure
    14.6 Selected Features of SQL
      14.6.1 Changing Data
      14.6.2 Functions in SQL
  15 Connecting R to an SQL Database

IV Data Wrangling
  16 Anonymous Data
  17 Data Wrangling in the tidyverse
    17.1 Importing the Data
      17.1.1 Importing from an SQL RDBMS
      17.1.2 Importing Flat Files in the Tidyverse
    17.2 Tidy Data
    17.3 Tidying Up Data with tidyr
      17.3.1 Splitting Tables
      17.3.2 Convert Headers to Data
      17.3.3 Spreading One Column Over Many
      17.3.4 Split One Columns into Many
      17.3.5 Merge Multiple Columns Into One
      17.3.6 Wrong Data
    17.4 SQL-like Functionality via dplyr
      17.4.1 Selecting Columns
      17.4.2 Filtering Rows
      17.4.3 Joining
      17.4.4 Mutating Data
      17.4.5 Set Operations
    17.5 String Manipulation in the tidyverse
      17.5.1 Basic String Manipulation
      17.5.2 Pattern Matching with Regular Expressions
    17.6 Dates with lubridate
      17.6.1 ISO 8601 Format
      17.6.2 Time-zones
      17.6.3 Extract Date and Time Components
      17.6.4 Calculating with Date-times
    17.7 Factors with Forcats
  18 Dealing with Missing Data
    18.1 Reasons for Data to be Missing
    18.2 Methods to Handle Missing Data
      18.2.1 Alternative Solutions to Missing Data
      18.2.2 Predictive Mean Matching (PMM)
    18.3 R Packages to Deal with Missing Data
      18.3.1 mice
      18.3.2 missForest
      18.3.3 Hmisc
  19 Data Binning
    19.1 What is Binning and Why Use It
    19.2 Tuning the Binning Procedure
    19.3 More Complex Cases: Matrix Binning
    19.4 Weight of Evidence and Information Value
      19.4.1 Weight of Evidence (WOE)
      19.4.2 Information Value (IV)
      19.4.3 WOE and IV in R
  20 Factoring Analysis and Principle Components
    20.1 Principle Components Analysis (PCA)
    20.2 Factor Analysis

V Modelling
  21 Regression Models
    21.1 Linear Regression
    21.2 Multiple Linear Regression
      21.2.1 Poisson Regression
      21.2.2 Non-linear Regression
    21.3 Performance of Regression Models
      21.3.1 Mean Square Error (MSE)
      21.3.2 R-Squared
      21.3.3 Mean Average Deviation (MAD)
  22 Classification Models
    22.1 Logistic Regression
    22.2 Performance of Binary Classification Models
      22.2.1 The Confusion Matrix and Related Measures
      22.2.2 ROC
      22.2.3 The AUC
      22.2.4 The Gini Coefficient
      22.2.5 Kolmogorov-Smirnov (KS) for Logistic Regression
      22.2.6 Finding an Optimal Cut-off
  23 Learning Machines
    23.1 Decision Tree
      23.1.1 Essential Background
      23.1.2 Important Considerations
      23.1.3 Growing Trees with the Package rpart
      23.1.4 Evaluating the Performance of a Decision Tree
    23.2 Random Forest
    23.3 Artificial Neural Networks (ANNs)
      23.3.1 The Basics of ANNs in R
      23.3.2 Neural Networks in R
      23.3.3 The Work-flow to for Fitting a NN
      23.3.4 Cross Validate the NN
    23.4 Support Vector Machine
      23.4.1 Fitting a SVM in R
      23.4.2 Optimizing the SVM
    23.5 Unsupervised Learning and Clustering
      23.5.1 k-Means Clustering
      23.5.2 Visualizing Clusters in Three Dimensions
      23.5.3 Fuzzy Clustering
      23.5.4 Hierarchical Clustering
      23.5.5 Other Clustering Methods
  24 Towards a Tidy Modelling Cycle with modelr
    24.1 Adding Predictions
    24.2 Adding Residuals
    24.3 Bootstrapping Data
    24.4 Other Functions of modelr
  25 Model Validation
    25.1 Model Quality Measures
    25.2 Predictions and Residuals
    25.3 Bootstrapping
      25.3.1 Bootstrapping in Base R
      25.3.2 Bootstrapping in the tidyverse with modelr
    25.4 Cross-Validation
      25.4.1 Elementary Cross Validation
      25.4.2 Monte Carlo Cross Validation
      25.4.3 k-Fold Cross Validation
      25.4.4 Comparing Cross Validation Methods
    25.5 Validation in a Broader Perspective
  26 Labs
    26.1 Financial Analysis with quantmod
      26.1.1 The Basics of quantmod
      26.1.2 Types of Data Available in quantmod
      26.1.3 Plotting with quantmod
      26.1.4 The quantmod Data Structure
      26.1.5 Support Functions Supplied by quantmod
      26.1.6 Financial Modelling in quantmod
  27 Multi Criteria Decision Analysis (MCDA)
    27.1 What and Why
    27.2 General Work-flow
    27.3 Identify the Issue at Hand: Steps 1 and 2
    27.4 Step 3: the Decision Matrix
      27.4.1 Construct a Decision Matrix
      27.4.2 Normalize the Decision Matrix
    27.5 Step 4: Delete Inefficient and Unacceptable Alternatives
      27.5.1 Unacceptable Alternatives
      27.5.2 Dominance – Inefficient Alternatives
    27.6 Plotting Preference Relationships
    27.7 Step 5: MCDA Methods
      27.7.1 Examples of Non-compensatory Methods
      27.7.2 The Weighted Sum Method (WSM)
      27.7.3 Weighted Product Method (WPM)
      27.7.4 ELECTRE
      27.7.5 PROMethEE
      27.7.6 PCA (Gaia)
      27.7.7 Outranking Methods
      27.7.8 Goal Programming
    27.8 Summary MCDA

VI Introduction to Companies
  28 Financial Accounting (FA)
    28.1 The Statements of Accounts
      28.1.1 Income Statement
      28.1.2 Net Income: The P&L statement
      28.1.3 Balance Sheet
    28.2 The Value Chain
    28.3 Further, Terminology
    28.4 Selected Financial Ratios
  29 Management Accounting
    29.1 Introduction
      29.1.1 Definition of Management Accounting (MA)
      29.1.2 Management Information Systems (MIS)
    29.2 Selected Methods in MA
      29.2.1 Cost Accounting
      29.2.2 Selected Cost Types
    29.3 Selected Use Cases of MA
      29.3.1 Balanced Scorecard
      29.3.2 Key Performance Indicators (KPIs)
  30 Asset Valuation Basics
    30.1 Time Value of Money
      30.1.1 Interest Basics
      30.1.2 Specific Interest Rate Concepts
      30.1.3 Discounting
    30.2 Cash
    30.3 Bonds
      30.3.1 Features of a Bond
      30.3.2 Valuation of Bonds
      30.3.3 Duration
    30.4 The Capital Asset Pricing Model (CAPM)
      30.4.1 The CAPM Framework
      30.4.2 The CAPM and Risk
      30.4.3 Limitations and Shortcomings of the CAPM
    30.5 Equities
      30.5.1 Definition
      30.5.2 Short History
      30.5.3 Valuation of Equities
      30.5.4 Absolute Value Models
      30.5.5 Relative Value Models
      30.5.6 Selection of Valuation Methods
      30.5.7 Pitfalls in Company Valuation
    30.6 Forwards and Futures
    30.7 Options
      30.7.1 Definitions
      30.7.2 Commercial Aspects
      30.7.3 Short History
      30.7.4 Valuation of Options at Maturity
      30.7.5 The Black and Scholes Model
      30.7.6 The Binomial Model
      30.7.7 Dependencies of the Option Price
      30.7.8 The Greeks
      30.7.9 Delta Hedging
      30.7.10 Linear Option Strategies
      30.7.11 Integrated Option Strategies
      30.7.12 Exotic Options
      30.7.13 Capital Protected Structures

VII Reporting
  31 A Grammar of Graphics with ggplot2
    31.1 The Basics of ggplot2
    31.2 Over-plotting
    31.3 Case Study for ggplot2
  32 R Markdown
  33 knitr and LaTeX
  34 An Automated Development Cycle
  35 Writing and Communication Skills
  36 Interactive Apps
    36.1 Shiny
    36.2 Browser Born Data Visualization
      36.2.1 HTML-widgets
      36.2.2 Interactive Maps with leaflet
      36.2.3 Interactive Data Visualisation with ggvis
      36.2.4 googleVis
    36.3 Dashboards
      36.3.1 The Business Case: a Diversity Dashboard
      36.3.2 A Dashboard with flexdashboard
      36.3.3 A Dashboard with shinydashboard

VIII Bigger and Faster R
  37 Parallel Computing
    37.1 Combine foreach and doParallel
    37.2 Distribute Calculations over LAN with Snow
    37.3 Using the GPU
      37.3.1 Getting Started with gpuR
      37.3.2 On the Importance of Memory use
      37.3.3 Conclusions for GPU Programming
  38 R and Big Data
    38.1 Use a Powerful Server
      38.1.1 Use R on a Server
      38.1.2 Let the Database Server do the Heavy Lifting
    38.2 Using more Memory than we have RAM
  39 Parallelism for Big Data
    39.1 Apache Hadoop
    39.2 Apache Spark
      39.2.1 Installing Spark
      39.2.2 Running Spark
      39.2.3 SparkR
      39.2.4 sparklyr
      39.2.5 SparkR or sparklyr
  40 The Need for Speed
    40.1 Benchmarking
    40.2 Optimize Code
      40.2.1 Avoid Repeating the Same
      40.2.2 Use Vectorisation where Appropriate
      40.2.3 Pre-allocating Memory
      40.2.4 Use the Fastest Function
      40.2.5 Use the Fastest Package
      40.2.6 Be Mindful about Details
      40.2.7 Compile Functions
      40.2.8 Use C or C++ Code in R
      40.2.9 Using a C++ Source File in R
      40.2.10 Call Compiled C++ Functions in R
    40.3 Profiling Code
      40.3.1 The Package profr
      40.3.2 The Package proftools
    40.4 Optimize Your Computer

IX Appendices
  A Create your own R Package
    A.1 Creating the Package in the R Console
    A.2 Update the Package Description
    A.3 Documenting the Functions
    A.4 Loading the Package
    A.5 Further Steps
  B Levels of Measurement
    B.1 Nominal Scale
    B.2 Ordinal Scale
    B.3 Interval Scale
    B.4 Ratio Scale
  C Trademark Notices
    C.1 General Trademark Notices
    C.2 R-Related Notices
      C.2.1 Crediting Developers of R Packages
      C.2.2 The R-packages used in this Book
  D Code Not Shown in the Body of the Book
  E Answers to Selected Questions

Bibliography
Nomenclature
Index

Details

ISBN 1119632722
Language English
ISBN-10 1119632722
ISBN-13 9781119632726
Format Hardcover
Country of Publication United States
Short Title The Big R-Book
Year 2020
DEWEY 005.7
Publication Date 2020-12-03
UK Release Date 2020-12-03
Pages 928
Imprint John Wiley & Sons Inc
Place of Publication New York
NZ Release Date 2020-10-27
Author Philippe J.S. De Brouwer
Publisher John Wiley & Sons Inc
Subtitle From Data Science to Learning Machines and Big Data
Audience Professional & Vocational
US Release Date 2020-12-03
AU Release Date 2020-10-08
