Spark ML - Linear Regression

Source: R/ml_regression_linear_regression.R

Ridge regression is an extension of linear regression in which the loss function is modified to penalize model complexity, by adding a penalty term proportional to the square of the magnitude of the coefficients: loss = OLS + alpha * sum(squared coefficient values).

Logistic regression: whereas linear regression serves to predict continuous Y variables, logistic regression is used for binary classification. If we use linear regression to model a dichotomous variable (as Y), the resulting model might not restrict the predicted Ys within 0 and 1.

Huber loss: a loss function used in robust regression that is less sensitive to outliers than the squared-error (RMSE) loss. It is quadratic for small residual values and linear for large residual values.
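
As a concrete illustration, a minimal R sketch of the Huber loss described above (the function name huber_loss and the threshold delta are ours, for illustration; delta = 1.35 is a common choice):

    # Huber loss: quadratic for |residual| <= delta, linear beyond it
    huber_loss <- function(residual, delta = 1.35) {
      ifelse(abs(residual) <= delta,
             0.5 * residual^2,                       # quadratic regime (small residuals)
             delta * (abs(residual) - 0.5 * delta))  # linear regime (large residuals)
    }

    huber_loss(c(0.5, 5))  # a large residual is penalized only linearly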

Perform regression using linear regression.

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

formula

Used when x is a tbl_spark. R formula as a character string or a formula. This is used to transform the input dataframe before fitting, see ft_r_formula for details.

fit_intercept

Boolean; should the model be fit with an intercept term?

elastic_net_param

ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.

reg_param

Regularization parameter (aka lambda).

max_iter

The maximum number of iterations to use.

weight_col

The name of the column to use as weights for the model fit.

loss

The loss function to be optimized. Supported options: 'squaredError' and 'huber'. Default: 'squaredError'.

solver

Solver algorithm for optimization.

standardization

Whether to standardize the training features before fitting the model.

tol

Convergence tolerance for iterative algorithms.

features_col

Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by ft_r_formula.

label_col

Label column name. The column should be a numeric column. Usually this column is output by ft_r_formula.

prediction_col

Prediction column name.

uid

A character string used to uniquely identify the ML estimator.

...

Optional arguments; see Details.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of an ml_estimator object. The object contains a pointer to a Spark Predictor object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the predictor appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, a predictor is constructed then immediately fit with the input tbl_spark, returning a prediction model.

  • tbl_spark, with formula specified: When formula is specified, the input tbl_spark is first transformed using an RFormula transformer before being fit by the predictor. The object returned in this case is an ml_model which is a wrapper of an ml_pipeline_model.

Details

When x is a tbl_spark and formula (alternatively, response and features) is specified, the function returns an ml_model object wrapping an ml_pipeline_model which contains data pre-processing transformers, the ML predictor, and, for classification models, a post-processing transformer that converts predictions into class labels. For classification, an optional argument predicted_label_col (defaults to 'predicted_label') can be used to specify the name of the predicted label column. In addition to the fitted ml_pipeline_model, ml_model objects also contain an ml_pipeline object where the ML predictor stage is an estimator ready to be fit against data. This is utilized by ml_save with type = 'pipeline' to facilitate model refresh workflows.
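
For example, a brief sketch of this refresh workflow (the path and table names are hypothetical; sc is an existing Spark connection and mtcars_tbl a tbl_spark):

    # fit via formula: returns an ml_model wrapping a fitted pipeline
    model <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

    # persist the unfit estimator pipeline for later refitting
    ml_save(model, "path/to/lm_pipeline", type = "pipeline")

    # later: reload the pipeline and refit it against fresh data
    pipeline <- ml_load(sc, "path/to/lm_pipeline")
    refreshed <- ml_fit(pipeline, fresh_tbl)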

See also


See http://spark.apache.org/docs/latest/ml-classification-regression.html for more information on the set of supervised learning algorithms.

Other ml algorithms: ml_aft_survival_regression(), ml_decision_tree_classifier(), ml_gbt_classifier(), ml_generalized_linear_regression(), ml_isotonic_regression(), ml_linear_svc(), ml_logistic_regression(), ml_multilayer_perceptron_classifier(), ml_naive_bayes(), ml_one_vs_rest(), ml_random_forest_classifier()

Examples
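
A minimal sketch, assuming a local Spark installation and the built-in mtcars data:

    library(sparklyr)

    sc <- spark_connect(master = "local")
    mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

    partitions <- sdf_random_split(mtcars_tbl, training = 0.7, test = 0.3,
                                   seed = 1111)

    # fit an elastic-net regularized linear model
    lm_model <- ml_linear_regression(
      partitions$training, mpg ~ wt + cyl,
      elastic_net_param = 0.5,  # mix of L1 and L2 penalties
      reg_param = 0.01          # lambda
    )

    # predict on the held-out partition and compute RMSE (the default metric)
    pred <- ml_predict(lm_model, partitions$test)
    ml_regression_evaluator(pred, label_col = "mpg")

    spark_disconnect(sc)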

loess {stats}  R Documentation

Local Polynomial Regression Fitting

Description

Fit a polynomial surface determined by one or more numerical predictors, using local fitting.

Usage
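
(The call below is reconstructed from the argument list that follows; the defaults shown are those of the standard stats::loess signature.)

    loess(formula, data, weights, subset, na.action, model = FALSE,
          span = 0.75, enp.target, degree = 2,
          parametric = FALSE, drop.square = FALSE, normalize = TRUE,
          family = c("gaussian", "symmetric"),
          method = c("loess", "model.frame"),
          control = loess.control(...), ...)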

Arguments

formula

a formula specifying the numeric response and one to four numeric predictors (best specified via an interaction, but can also be specified additively). Will be coerced to a formula if necessary.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which loess is called.

weights

optional weights for each case.

subset

an optional specification of a subset of the data to beused.

na.action

the action to be taken with missing values in the response or predictors. The default is given by getOption('na.action').

model

should the model frame be returned?

span

the parameter α which controls the degree of smoothing.

enp.target

an alternative way to specify span, as the approximate equivalent number of parameters to be used.

degree

the degree of the polynomials to be used, normally 1 or 2. (Degree 0 is also allowed, but see the ‘Note’.)

parametric

should any terms be fitted globally rather than locally? Terms can be specified by name, number or as a logical vector of the same length as the number of predictors.

drop.square

for fits with more than one predictor and degree = 2, should the quadratic term be dropped for particular predictors? Terms are specified in the same way as for parametric.

normalize

should the predictors be normalized to a common scale if there is more than one? The normalization used is to set the 10% trimmed standard deviation to one. Set to false for spatial coordinate predictors and others known to be on a common scale.

family

if 'gaussian' fitting is by least-squares, and if 'symmetric' a re-descending M estimator is used with Tukey's biweight function. Can be abbreviated.

method

fit the model or just extract the model frame. Can be abbreviated.

control

control parameters: see loess.control.

...

control parameters can also be supplied directly (if control is not specified).

Details

Fitting is done locally. That is, for the fit at point x, the fit is made using points in a neighbourhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance). The size of the neighbourhood is controlled by α (set by span or enp.target). For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.
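
To make the weighting concrete, a small R sketch of the tricubic weights described above (the function name tricube is ours, for illustration):

    # tricubic weight for a point at distance d from x, given the
    # maximum distance within the neighbourhood
    tricube <- function(d, maxdist) {
      (1 - (d / maxdist)^3)^3
    }

    tricube(c(0, 0.5, 1), maxdist = 1)  # weight 1 at the fit point, 0 at the edge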


For the default family, fitting is by (weighted) least squares. For family = 'symmetric' a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.

It can be important to tune the control list to achieve acceptable speed. See loess.control for details.

Value

An object of class 'loess'.

Note

As this is based on cloess, it is similar to but not identical to the loess function of S. In particular, conditioning is not implemented.

The memory usage of this implementation of loess is roughly quadratic in the number of points, with 1000 points taking about 10Mb.

degree = 0, local constant fitting, is allowed in this implementation but not documented in the reference. It seems very little tested, so use with caution.

Author(s)


B. D. Ripley, based on the cloess package of Cleveland, Grosse and Shyu.

Source


The 1998 version of the cloess package of Cleveland, Grosse and Shyu. A later version is available as dloess at https://www.netlib.org/a/.

References

W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

See Also

loess.control, predict.loess.

lowess, the ancestor of loess (with different defaults!).

Examples
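
A brief sketch using the built-in cars data set:

    cars.lo <- loess(dist ~ speed, cars)
    predict(cars.lo, data.frame(speed = seq(5, 30, 1)), se = TRUE)

    # robust fitting with Tukey's biweight
    cars.sym <- loess(dist ~ speed, cars, family = "symmetric")

    # a direct surface fit allows extrapolation beyond the data range
    cars.lo2 <- loess(dist ~ speed, cars,
                      control = loess.control(surface = "direct"))
    predict(cars.lo2, data.frame(speed = seq(5, 30, 1)), se = TRUE)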