TY - GEN
AU - Acevedo, Miguel F.
TI - Data analysis and statistics for geography, environmental science and engineering
SN - 9781439885017
U1 - 519.5
PY - 2013///
CY - Boca Raton
PB - CRC Press
N1 - CONTENTS Preface xv Acknowledgments xix Author xxi PART I Introduction to Probability, Statistics, Time Series, and Spatial Analysis Chapter 1 Introduction 3 Brief History of Statistical and Probabilistic Analysis 3 Computers 4 Applications 4 Types of Variables 4 Discrete 5 Continuous 5 Discretization 5 Independent vs. Dependent Variables 6 Probability Theory and Random Variables 6 Methodology 6 Descriptive Statistics 7 Inferential Statistics 7 Predictors, Models, and Regression 7 Time Series 8 Spatial Data Analysis 8 Matrices and Multiple Dimensions 8 Other Approaches: Process-Based Models 9 Baby Steps: Calculations and Graphs 9 Mean, Variance, and Standard Deviation of a Sample 9 Simple Graphs as Text: Stem-and-Leaf Plots 10 Histograms 11 Exercises 11 Computer Session: Introduction to R 11 Working Directory 11 Installing R 11 Personalize the R GUI Shortcut 11 Running R 13 Basic R Skills 13 R Console 15 Scripts 15 Graphics Device 16 Downloading Data Files 17 Read a Simple Text Data File 17 Simple Statistics 19 Simple Graphs as Text: Stem-and-Leaf Plots 20 Simple Graphs to a Graphics Window 20 Addressing Entries of an Array 20 Example: Salinity 22 CSV Text Files 23 Store Your Data Files and Objects 24 Command History and Long Sequences of Commands 25 Editing Data in Objects 25 Cleanup and Close R Session 26 Computer Exercises 26 Supplementary Reading 27 Chapter 2 Probability Theory 29 Events and Probabilities 29 Algebra of Events 29 Combinations 31 Probability Trees 32 Conditional Probability 33 Testing Water Quality: False Negative and False Positive 34 Bayes' Theorem 35 Generalization of Bayes' Rule to Many Events 36 Bio-Sensing 36 Decision Making 37 Exercises 39 Computer Session: Introduction to Rcmdr, Programming, and Multiple Plots 40 R Commander 40 Package 
Installation and Loading 40 R GUI SDI Option: Best for R Commander 43 How to Import a Text Data File Using Rcmdr 43 Simple Graphs on a Text Window 45 Simple Graphs on a Graphics Window: Histograms 46 More than One Variable: Reading Files and Plot Variables 47 Using the R Console 48 Using the R Commander 51 Programming Loops 53 Application: Bayes' Theorem 54 Application: Decision Making 55 More on Graphics Windows 55 Editing Data in Objects 56 Clean Up and Exit 56 Additional GUIs to Use R 57 Modifying the R Commander 57 Other Packages to Be Used in the Book 57 Computer Exercises 58 Supplementary Reading 58 Chapter 3 Random Variables, Distributions, Moments, and Statistics 59 Random Variables 59 Distributions 59 Probability Mass and Density Functions (pmf and pdf) 59 Cumulative Functions (cmf and cdf) 62 Histograms 62 Moments 63 First Moment or Mean 63 Second Central Moment or Variance 64 Population and Sample 66 Other Statistics and Ways of Characterizing a Sample 67 Some Important RV and Distributions 68 Application Examples: Species Diversity 72 Central Limit Theorem 72 Random Number Generation 73 Exercises 74 Computer Session: Probability and Descriptive Statistics 75 Descriptive Statistics: Categorical Data, Table, and Pie Chart 75 Using a Previously Generated Object or a Dataset 78 Summary of Descriptive Statistics and Histogram 78 Density Approximation 81 Theoretical Distribution: Example Binomial Distribution 82 Application Example: Species Diversity 86 Random Number Generation 86 3.9.8 Comparing Sample and Theoretical Distributions Example Binomial 89 Programming Application: Central Limit Theorem 90 Sampling: Function Sample 92 Cleanup and Close R Session 92 Computer Exercises 93 Supplementary Reading 93 Chapter 4 Exploratory Analysis and Introduction to Inferential Statistics 95 4.1 Exploratory Data Analysis (EDA) 95 Index Plot 95 Boxplot 95 Empirical Cumulative Distribution Function (ecdf) 96 Quantile-Quantile (q-q) Plots 98 Combining Plots 
for Exploratory Data Analysis (EDA) 98 Relationships: Covariance and Correlation 98 4.2.1 Serial Data: Time Series and Autocorrelation 101 Statistical Inference 102 Hypothesis Testing 103 p-Value 105 Power 105 Confidence Intervals 107 Statistical Methods 109 Parametric Methods 110 Z Test or Standard Normal 110 The t-Test 110 The F Test 111 Correlation 112 4.6 Nonparametric Methods 112 Mann-Whitney or Wilcoxon Rank Sum Test 112 Wilcoxon Signed Rank Test 112 Spearman Correlation 112 Exercises 113 Computer Session: Exploratory Analysis and Inferential Statistics 113 Create an Example Dataset 113 Index Plot 113 Boxplot 114 Empirical Cumulative Plot 114 Functions 115 Building a Function: Example 115 More on the Standard Normal 116 Quantile-Quantile (q-q) Plots 118 Function to Plot Exploratory Data Analysis (EDA) Graphs 119 Time Series and Autocorrelation Plots 120 Additional Functions for the R Console and the R Commander 121 Parametric: One Sample t-Test or Means Test 122 Power Analysis: One Sample t-Test 124 Parametric: Two-Sample t-Test 126 Power Analysis: Two Sample t-Test 128 Using Data Sets from Packages 129 Nonparametric: Wilcoxon Test 130 Bivariate Data and Correlation Test 132 Computer Exercises 135 Supplementary Reading 136 Chapter 5 More on Inferential Statistics: Goodness of Fit, Contingency Analysis, and Analysis of Variance 137 5.1 Goodness of Fit (GOF) 137 Qualitative: Exploratory Analysis 137 χ² (Chi-Square) Test 137 Kolmogorov-Smirnov (K-S) 140 Shapiro-Wilk Test 140 Counts and Proportions 141 Contingency Tables and Cross-Tabulation 141 Analysis of Variance 144 ANOVA One-Way 145 ANOVA Two-Way 148 Factor Interaction in ANOVA Two-Way 149 Nonparametric Analysis of Variance 150 Exercises 151 Computer Session: More on Inferential Statistics 153 GOF: Exploratory Analysis 153 GOF: Chi-Square Test 154 GOF: Kolmogorov-Smirnov Test 155 GOF: Shapiro-Wilk 156 Count Tests and the Binomial 156 Obtaining a Single Element of a Test Result 157 Comparing Proportions: 
prop.test 158 Contingency Tables: Direct Input 159 Contingency Tables: Cross-Tabulation 160 ANOVA One-Way 162 ANOVA Two-Way 166 ANOVA Nonparametric: Kruskal-Wallis 169 ANOVA Nonparametric: Friedman 172 ANOVA: Generating Fictional Data for Further Learning 172 Computer Exercises 175 Supplementary Reading 176 Chapter 6 Regression 177 6.1 Simple Linear Least Squares Regression 177 Derivatives and Optimization 178 Calculating Regression Coefficients 180 Interpreting the Coefficients Using Sample Means, Variances, and Covariance 183 Regression Coefficients from Expected Values 184 Interpretation of the Error Terms 185 Evaluating Regression Models 188 Regression through the Origin 192 ANOVA as Predictive Tool 195 Nonlinear Regression 196 Log Transform 197 Nonlinear Optimization 197 Polynomial Regression 198 Predicted vs. Observed Plots 198 6.4 Computer Session: Simple Regression 200 Scatter Plots 200 Simple Linear Regression 202 Nonintercept Model or Regression through the Origin 206 ANOVA One Way: As Linear Model 208 Linear Regression: Lack-of-Fit to Nonlinear Data 211 Nonlinear Regression by Transformation 214 Nonlinear Regression by Optimization 216 Polynomial Regression 219 Predicted vs. 
Observed Plots 221 Computer Exercises 221 Supplementary Reading 223 Chapter 7 Stochastic or Random Processes and Time Series 225 Stochastic Processes and Time Series: Basics 225 Gaussian 225 Autocovariance and Autocorrelation 227 Periodic Series, Filtering, and Spectral Analysis 231 Poisson Process 238 Marked Poisson Process 241 Simulation 247 Exercises 249 Computer Session: Random Processes and Time Series 250 Gaussian Random Processes 250 Autocorrelation 252 Periodic Process 252 Filtering and Spectrum 253 Sunspots Example 254 Poisson Process 255 Poisson Process Simulation 255 Marked Poisson Process Simulation: Rainfall 256 Computer Exercises 257 Supplementary Reading 258 Chapter 8 Spatial Point Patterns 259 Types of Spatially Explicit Data 259 Types of Spatial Point Patterns 259 Spatial Distribution 259 Testing Spatial Patterns: Cell Count Methods 260 Testing Uniform Patterns 260 Testing for Spatial Randomness 261 Clustered Patterns 263 8.5 Nearest-Neighbor Analysis 264 First-Order Analysis 264 Second-Order Analysis 266 Marked Point Patterns 268 Geostatistics: Regionalized Variables 269 Variograms: Covariance and Semivariance 270 Covariance 271 Semivariance 272 Directions 274 Variogram Models 276 Exponential Model 276 Spherical Model 278 Gaussian Model 278 Linear and Power Models 279 Modeling the Empirical Variogram 280 Exercises 281 Computer Session: Spatial Analysis 284 Packages and Functions 284 File Format 284 Creating a Pattern: Location-Only 285 Generating Patterns with Random Numbers 286 Grid or Quadrat Analysis: Chi-Square Test for Uniformity 288 Grid or Quadrat Analysis: Randomness, Poisson Model 289 Nearest-Neighbor Analysis: G and K Functions 290 Monte Carlo: Nearest-Neighbor Analysis of Uniformity 293 Marked Spatial Patterns: Categorical Marks 294 Marked Spatial Patterns: Continuous Values 298 Marked Patterns: Use Sample Data from sgeostat 301 Computer Exercises 305 Supplementary Reading 306 PART II Matrices, Temporal and Spatial Autoregressive 
Processes, and Multivariate Analysis Chapter 9 Matrices and Linear Algebra 309 Matrices 309 Dimension of a Matrix 309 Vectors 310 Square Matrices 310 Trace 311 Symmetric Matrices: Covariance Matrix 311 Identity 312 9.5 Matrix Operations 312 Addition and Subtraction 312 Scalar Multiplication 313 Linear Combination 313 Matrix Multiplication 313 Determinant of a Matrix 315 Matrix Transposition 316 Major Product 316 Matrix Inversion 317 Solving Systems of Linear Equations 319 Linear Algebra Solution of the Regression Problem 321 Alternative Matrix Approach to Linear Regression 323 Exercises 325 Computer Session: Matrices and Linear Algebra 326 Creating Matrices 326 Operations 327 Other Operations 330 Solving System of Linear Equations 331 Inverse 331 Computer Exercises 332 Supplementary Reading 332 Chapter 10 Multivariate Models 333 10.1 Multiple Linear Regression 333 Matrix Approach 333 Population Concepts and Expected Values 338 Evaluation and Diagnostics 339 Variable Selection 340 Multivariate Regression 342 Two-Group Discriminant Analysis 344 Multiple Analysis of Variance (MANOVA) 349 Exercises 353 10.6 Computer Session: Multivariate Models 355 Multiple Linear Regression 355 Multivariate Regression 359 Two-Group Linear Discriminant Analysis 361 MANOVA 363 Computer Exercises 365 Functions 365 Supplementary Reading 367 Chapter 11 Dependent Stochastic Processes and Time Series 369 11.1 Markov 369 Dependent Models: Markov Chain 369 Two-Step Rainfall Generation: First Step Markov Sequence 371 Combining Dry/Wet Days with Amount on Wet Days 371 Forest Succession 374 Semi-Markov Processes 378 Autoregressive (AR) Process 381 ARMA and ARIMA Models 387 Exercises 389 Computer Session: Markov Processes and Autoregressive Time Series 389 Weather Generation: Rainfall Models 389 Semi-Markov 391 AR(p) Modeling and Forecast 392 ARIMA(p, d, q) Modeling and Forecast 395 Computer Exercises 398 SEEG Functions 400 Supplementary Reading 403 Chapter 12 Geostatistics: Kriging 405 
Kriging 405 Ordinary Kriging 405 Universal Kriging 413 Data Transformations 414 Exercises 414 Computer Session: Geostatistics, Kriging 415 Ordinary Kriging 415 Universal Kriging 417 Regular Grid Data Files 422 Functions 425 Computer Exercises 428 Supplementary Reading 428 Chapter 13 Spatial Auto-Correlation and Auto-Regression 429 Lattice Data: Spatial Auto-Correlation and Auto-Regression 429 Spatial Structure and Variance Inflation 429 Neighborhood Structure 429 Spatial Auto-Correlation 432 Moran's I 432 Transformations m 433 Geary's c 434 Spatial Auto-Regression 434 Exercises 436 Computer Session: Spatial Correlation and Regression 437 Packages 437 Mapping Regions 438 Neighborhood Structure 440 Structure Using Distance 441 Structure Based on Borders 445 Spatial Auto-Correlation 446 Spatial Auto-Regression Models 448 Neighborhood Structure Using Tripack 451 Neighborhood Structure for Grid Data 452 Computer Exercises 453 Supplementary Reading 454 Chapter 14 Multivariate Analysis I: Reducing Dimensionality 455 Multivariate Analysis: Eigen-Decomposition 455 Vectors and Linear Transformation 455 Eigenvalues and Eigenvectors 455 Finding Eigenvalues 457 Finding Eigenvectors 458 14.4 Eigen-Decomposition of a Covariance Matrix 459 Covariance Matrix 459 Bivariate Case 461 Principal Components Analysis (PCA) 465 Singular Value Decomposition and Biplots 469 Factor Analysis 472 Correspondence Analysis 475 Exercises 479 Computer Session: Multivariate Analysis, PCA 480 14.10.1 Eigenvalues and Eigenvectors of Covariance Matrices 480 14.10.2 PCA: A Simple 2 x 2 Example Using Eigenvalues and Eigenvectors 481 PCA: A 2 x 2 Example 483 PCA Higher-Dimensional Example 485 PCA Using the Rcmdr 486 Factor Analysis 490 Factor Analysis Using Rcmdr 493 Correspondence Analysis 495 Computer Exercises 499 Supplementary Reading 500 Chapter 15 Multivariate Analysis II: Identifying and Developing Relationships among Observations and Variables 501 Introduction 501 Multigroup Discriminant Analysis (MDA) 
501 Canonical Correlation 502 Constrained (or Canonical) Correspondence Analysis (CCA) 505 Cluster Analysis 506 Multidimensional Scaling (MDS) 508 Exercises 509 Computer Session: Multivariate Analysis II 509 Multigroup Linear Discriminant Analysis 509 Canonical Correlation 514 Canonical Correspondence Analysis 515 Cluster Analysis 516 Multidimensional Scaling (MDS) 518 Computer Exercises 520 Supplementary Reading 520 Bibliography 521 Index 525
ER -