Birmingham: Packt Publishing, 2016. 416 p.
Linear models have been known to and studied by scholars and practitioners for a long time. Before they were adopted into data science, placed in the syllabi of numerous boot camps, and featured in the early chapters of many practical how-to books, they were already a prominent and relevant part of the body of knowledge of statistics, economics, and many other quantitative fields of study.
Consequently, there is a wealth of monographs, book chapters, and papers about linear regression, logistic regression (its classification variant), and the various types of generalized linear models, in which the formulation of the original linear regression paradigm is adapted to solve more complex problems.
Yet, in spite of such an embarrassment of riches, we have never encountered a book that really conveys how quick and easy it is to implement such linear models when, as a developer or a data scientist, you have to rapidly build an application or API whose response cannot be defined programmatically but instead has to be learned from data.
Of course, we are very well aware of the limitations of linear models (being simple unfortunately has some drawbacks), and we also know that there is no one-size-fits-all solution to every data science problem; however, our experience in the field has taught us that the following advantages of a linear model cannot be easily ignored:
It’s easy to explain how it works to yourself, to management, or to anyone else
It’s flexible with respect to your data problem, since it can handle numeric and probability estimates, ranking, and classification with up to a large number of classes
It’s fast to train, no matter the amount of data you have to process
It’s fast and easy to implement in any production environment
It scales to real-time responses to users
If, for you as for us, it is paramount to deliver value from data in a fast and tangible way, just follow us and discover how far linear models can take you.
Chapter 1, Regression – The Workhorse of Data Science, introduces why regression is indeed useful for data science, shows how to quickly set up Python for data science, and provides an overview of the packages used throughout the book with the help of examples. At the end of this chapter, you will be able to run all the examples contained in the following chapters, and you will have a clear idea of why regression analysis is not just an underrated technique borrowed from statistics but a powerful and effective data science algorithm.
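As a minimal sketch of such a setup (ours, written for this overview, not an excerpt from the chapter), assuming the usual scientific Python stack, a quick import-and-version check might look like this:

    import numpy as np                 # numerical arrays and linear algebra
    import pandas as pd                # tabular data handling
    import matplotlib.pyplot as plt    # plotting
    import sklearn                     # machine learning algorithms

    # Printing the versions helps keep the examples reproducible.
    print("NumPy:", np.__version__)
    print("pandas:", pd.__version__)
    print("scikit-learn:", sklearn.__version__)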
Chapter 2, Approaching Simple Linear Regression, presents simple linear regression, first describing a regression problem and how to fit a regressor to it, then giving some intuition behind the mathematical formulation of the algorithm. You will then learn how to tune the model for higher performance and gain a deep understanding of each of its parameters. Finally, the engine under the hood, gradient descent, will be described.
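To give a flavor of that engine, here is a minimal gradient descent sketch for a single-feature least-squares fit; it is an illustration written for this overview under simple assumptions (synthetic data, fixed learning rate), not the chapter's own code:

    import numpy as np

    # Synthetic single-feature data: y is roughly 3*x + 2 plus noise.
    rng = np.random.RandomState(0)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=100)

    w, b = 0.0, 0.0          # slope and intercept, initialized at zero
    learning_rate = 0.01

    for _ in range(2000):
        error = (w * x + b) - y
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    print("fitted slope %.2f, intercept %.2f" % (w, b))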
Chapter 3, Multiple Regression in Action, extends simple linear regression to extract predictive information from more than one feature and to create models that can solve real-life prediction tasks. The stochastic gradient descent technique, explained in the previous chapter, will be powered up to cope with a matrix of features, and to complete the overview, the topics of multicollinearity, interactions, and polynomial regression will be covered.
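As a rough sketch of what that looks like with scikit-learn (an illustration under our own assumptions, not the chapter's code), a polynomial expansion and a stochastic-gradient-descent regressor can be chained in a pipeline:

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    # Synthetic data with two features and an interaction between them.
    rng = np.random.RandomState(1)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

    # The polynomial expansion adds squared and interaction terms before
    # the stochastic gradient descent regressor is fit.
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          StandardScaler(),
                          SGDRegressor(max_iter=1000, tol=1e-3, random_state=1))
    model.fit(X, y)
    print("R squared on the training data: %.3f" % model.score(X, y))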
Chapter 4, Logistic Regression, continues laying the foundations of your knowledge of linear models. Starting from the necessary mathematical definitions, it demonstrates how to further extend linear regression to classification problems, both binary and multiclass.
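A minimal multiclass sketch, assuming a recent scikit-learn and its bundled iris dataset (again, our illustration rather than the chapter's example):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # The iris dataset provides a small three-class problem.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression(max_iter=1000)   # multiclass is handled automatically
    clf.fit(X_train, y_train)
    print("test accuracy: %.3f" % clf.score(X_test, y_test))
    print("class probabilities, first test row:", clf.predict_proba(X_test[:1]))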
Chapter 5, Data Preparation, discusses the data feeding the model, describing what can be done to prepare it in the best way and how to deal with unusual situations, especially when data is missing and outliers are present.
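A small sketch of the kind of preparation involved, assuming pandas and figures invented purely for illustration: median imputation for missing values and an interquartile-range check for outliers.

    import numpy as np
    import pandas as pd

    # A tiny frame with one missing value per column and an obvious outlier.
    df = pd.DataFrame({"income": [36000, 41000, np.nan, 38000, 950000],
                       "age":    [29, 41, 35, np.nan, 52]})

    # Fill missing entries with the column median, which is robust to the outlier.
    df_filled = df.fillna(df.median())

    # Flag potential outliers with a simple interquartile-range rule.
    q1, q3 = df_filled["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    suspects = df_filled[(df_filled["income"] < q1 - 1.5 * iqr) |
                         (df_filled["income"] > q3 + 1.5 * iqr)]
    print(suspects)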
Chapter 6, Achieving Generalization, introduces the key data science recipes for testing your model thoroughly, tuning it at its best, making it parsimonious, and putting it up against fresh, real data before proceeding to more complex techniques.
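A minimal sketch of that testing recipe, assuming scikit-learn and its bundled diabetes dataset (our illustration under simple assumptions):

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = load_diabetes(return_X_y=True)

    # Keep a hold-out set aside so the final check happens on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    model = LinearRegression()
    # Five-fold cross-validation estimates how well the model generalizes.
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print("cross-validated R squared: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

    model.fit(X_train, y_train)
    print("hold-out R squared: %.3f" % model.score(X_test, y_test))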
Chapter 7, Online and Batch Learning, illustrates the best practices for training classifiers on Big Data; it first focuses on batch learning and its limitations and then introduces online learning. Finally, you will be shown an example on Big Data that combines the benefits of online learning with the power of the hashing trick.
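To hint at how those two ideas fit together, here is a minimal sketch assuming scikit-learn's FeatureHasher and SGDRegressor, with invented mini-batches standing in for a stream of data:

    from sklearn.feature_extraction import FeatureHasher
    from sklearn.linear_model import SGDRegressor

    # The hashing trick maps arbitrary feature names into a fixed-size vector,
    # so the full feature dictionary never has to be known up front.
    hasher = FeatureHasher(n_features=2 ** 10)
    model = SGDRegressor(random_state=0)

    # Pretend these dictionaries arrive in mini-batches from a large file or stream.
    batches = [
        ([{"rooms": 3, "city=London": 1}, {"rooms": 5, "city=Paris": 1}], [210.0, 320.0]),
        ([{"rooms": 2, "city=London": 1}, {"rooms": 4, "city=Rome": 1}], [180.0, 260.0]),
    ]
    for raw_X, targets in batches:
        X = hasher.transform(raw_X)
        model.partial_fit(X, targets)   # the model is updated, never retrained from scratch

    print(model.predict(hasher.transform([{"rooms": 3, "city=Paris": 1}])))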
Chapter 8, Advanced Regression Methods, introduces some advanced methods for regression. Without going too deep into their mathematical formulation, but always keeping an eye on practical applications, we will discuss the ideas behind Least Angle Regression, Bayesian regression, and stochastic gradient descent with hinge loss, and also touch upon bagging and boosting techniques.
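As a taste of how little code these methods require, here is a sketch of ours comparing Least Angle Regression and Bayesian ridge regression on scikit-learn's diabetes data; hinge-loss stochastic gradient descent follows the same fit/predict pattern, only on classification data.

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import BayesianRidge, Lars

    X, y = load_diabetes(return_X_y=True)

    # Least Angle Regression adds predictors one step at a time, while
    # Bayesian ridge regression places priors on the coefficients.
    for model in (Lars(), BayesianRidge()):
        model.fit(X, y)
        print(model.__class__.__name__, "R squared: %.3f" % model.score(X, y))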
Chapter 9, Real-world Applications for Regression Models, comprises four practical examples of real-world data science problems solved by linear models. The ultimate goal is to demonstrate how to approach such problems and how to develop the reasoning around their resolution, so that they can be used as blueprints for similar challenges you’ll encounter.