Goals of the Book
There are many different languages people commonly use to do data analysis and data science. We focus primarily on R, but also use several other domain-specific languages (DSLs) and even touch on languages such as the UNIX shell and C.
This book is not intended to teach the syntax or semantics of the R language, or any of the other languages we use. Nor is it written to list the large number of packages and functions that data scientists commonly use in R. Instead, we wrote the book so that people could experience the thought process involved in solving authentic computational problems related to data analysis.
There are many books that teach programming by introducing the important ideas in a section and illustrating them with one or more examples. These are very useful and essential starting points. However, the code in the examples in these books is the final, polished version written by an expert, as it should be. Such books show the reader only the final result, not the actual process of writing the code.
Our aim is to illustrate the process by which programmers approach a problem and reason about different ways of implementing the solution. This process is very dynamic and iterative. We write some code, test it, change it, refine it, extend it, and generalize it. Often, we “start over,” having learned from the first attempt, or prototype, and develop a more succinct, clearer version. Along the way, we make trade-offs between simplicity, efficiency, generality, reuse, correct and approximate results, and so on.
We try to find ways to minimize changes to the code while making it faster and more flexible. In this book, we try to illustrate this entire process and the often implicit decisions experienced programmers make. Our hope is to complement the textbooks and give students, researchers, and even faculty a glimpse into how professional data scientists think about daily computational tasks.