Spark ML - Linear Regression
R/ml_regression_linear_regression.R
Ridge regression is an extension of linear regression in which the loss function is modified to penalize model complexity. The penalty term is the square of the magnitude of the coefficients, scaled by a tuning parameter alpha:

Loss = OLS loss + alpha * sum(squared coefficient values)

Logistic regression: where linear regression serves to predict continuous Y variables, logistic regression is used for binary classification. If we use linear regression to model a dichotomous variable (as Y), the resulting model might not restrict the predicted Ys to the interval [0, 1].

The Huber loss is a loss function used in robust regression. It is less sensitive to outliers than squared-error loss: it is quadratic for small residual values and linear for large residual values.
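As an illustration of that last point, a minimal sketch of the Huber loss in R; the function name `huber_loss` and the `delta` threshold default are assumptions for this example, not part of the API documented below.

```r
# Huber loss: quadratic within |r| <= delta, linear outside (hypothetical helper)
huber_loss <- function(r, delta = 1) {
  ifelse(abs(r) <= delta,
         0.5 * r^2,                        # quadratic for small residuals
         delta * (abs(r) - 0.5 * delta))   # linear for large residuals
}

huber_loss(c(-0.5, 0.2, 3))  # the outlier at 3 is penalized linearly, not quadratically
```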
Perform regression using linear regression.
Arguments
Argument | Description
---|---
`x` | A `spark_connection`, `ml_pipeline`, or `tbl_spark`.
`formula` | Used when `x` is a `tbl_spark`. R formula as a character string or a formula. This is used to transform the input dataframe before fitting, see `ft_r_formula` for details.
`fit_intercept` | Boolean; should the model be fit with an intercept term?
`elastic_net_param` | ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
`reg_param` | Regularization parameter (aka lambda).
`max_iter` | The maximum number of iterations to use.
`weight_col` | The name of the column to use as weights for the model fit.
`loss` | The loss function to be optimized. Supported options: "squaredError" and "huber". Default: "squaredError".
`solver` | Solver algorithm for optimization.
`standardization` | Whether to standardize the training features before fitting the model.
`tol` | Param for the convergence tolerance for iterative algorithms.
`features_col` | Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by `ft_r_formula`.
`label_col` | Label column name. The column should be a numeric column. Usually this column is output by `ft_r_formula`.
`prediction_col` | Prediction column name.
`uid` | A character string used to uniquely identify the ML estimator.
`...` | Optional arguments; see Details.
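For instance, the ridge penalty described above corresponds to `elastic_net_param = 0` with a nonzero `reg_param`; a hedged sketch, assuming a Spark DataFrame `mtcars_tbl` (see Examples below for setup):

```r
# Ridge-style fit: pure L2 penalty with lambda = 0.1 (illustrative values)
ridge_fit <- ml_linear_regression(
  mtcars_tbl,
  mpg ~ wt + cyl,
  elastic_net_param = 0,
  reg_param = 0.1
)
```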
Value
The object returned depends on the class of `x`.

- `spark_connection`: When `x` is a `spark_connection`, the function returns an instance of a `ml_estimator` object. The object contains a pointer to a Spark `Predictor` object and can be used to compose `Pipeline` objects.
- `ml_pipeline`: When `x` is a `ml_pipeline`, the function returns a `ml_pipeline` with the predictor appended to the pipeline.
- `tbl_spark`: When `x` is a `tbl_spark`, a predictor is constructed then immediately fit with the input `tbl_spark`, returning a prediction model.
- `tbl_spark`, with `formula`: When `formula` is specified, the input `tbl_spark` is first transformed using a `RFormula` transformer before being fit by the predictor. The object returned in this case is a `ml_model` which is a wrapper of a `ml_pipeline_model`.
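To illustrate the `spark_connection` case, a minimal sketch of composing the estimator into a pipeline, assuming a live connection `sc`:

```r
# An untrained pipeline: formula transformer followed by the regression estimator
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(mpg ~ wt + cyl) %>%
  ml_linear_regression()
```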
Details
When `x` is a `tbl_spark` and `formula` (alternatively, `response` and `features`) is specified, the function returns a `ml_model` object wrapping a `ml_pipeline_model` which contains data pre-processing transformers, the ML predictor, and, for classification models, a post-processing transformer that converts predictions into class labels. For classification, an optional argument `predicted_label_col` (defaults to `"predicted_label"`) can be used to specify the name of the predicted label column. In addition to the fitted `ml_pipeline_model`, `ml_model` objects also contain a `ml_pipeline` object where the ML predictor stage is an estimator ready to be fit against data. This is utilized by `ml_save` with `type = "pipeline"` to facilitate model refresh workflows.
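A hedged sketch of that refresh workflow, assuming a fitted `ml_model` named `fit` (as in the Examples below) and a writable local path:

```r
# Save the unfitted pipeline (not the fitted model) for later re-fitting
ml_save(fit, path = "linear_reg_pipeline", type = "pipeline", overwrite = TRUE)
```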
See also
See http://spark.apache.org/docs/latest/ml-classification-regression.html for more information on the set of supervised learning algorithms.
Other ml algorithms: `ml_aft_survival_regression()`, `ml_decision_tree_classifier()`, `ml_gbt_classifier()`, `ml_generalized_linear_regression()`, `ml_isotonic_regression()`, `ml_linear_svc()`, `ml_logistic_regression()`, `ml_multilayer_perceptron_classifier()`, `ml_naive_bayes()`, `ml_one_vs_rest()`, `ml_random_forest_classifier()`
Examples
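A minimal end-to-end sketch, assuming a local Spark installation:

```r
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Fit a linear regression of fuel economy on weight and cylinder count
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```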
loess {stats}
Local Polynomial Regression Fitting
Description
Fit a polynomial surface determined by one or more numerical predictors, using local fitting.
Usage
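A sketch of the standard `stats::loess` signature, consistent with the arguments listed below:

```r
loess(formula, data, weights, subset, na.action, model = FALSE,
      span = 0.75, enp.target, degree = 2,
      parametric = FALSE, drop.square = FALSE, normalize = TRUE,
      family = c("gaussian", "symmetric"),
      method = c("loess", "model.frame"),
      control = loess.control(...), ...)
```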
Arguments
Argument | Description
---|---
`formula` | a formula specifying the numeric response and one to four numeric predictors (best specified via an interaction, but can also be specified additively). Will be coerced to a formula if necessary.
`data` | an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model.
`weights` | optional weights for each case.
`subset` | an optional specification of a subset of the data to be used.
`na.action` | the action to be taken with missing values in the response or predictors. The default is given by `getOption("na.action")`.
`model` | should the model frame be returned?
`span` | the parameter α which controls the degree of smoothing.
`enp.target` | an alternative way to specify `span`, as the approximate equivalent number of parameters to be used.
`degree` | the degree of the polynomials to be used, normally 1 or 2. (Degree 0 is also allowed, but see the 'Note'.)
`parametric` | should any terms be fitted globally rather than locally? Terms can be specified by name, number or as a logical vector of the same length as the number of predictors.
`drop.square` | for fits with more than one predictor and `degree = 2`, should the quadratic term be dropped for particular predictors? Terms are specified in the same way as for `parametric`.
`normalize` | should the predictors be normalized to a common scale if there is more than one? The normalization used is to set the 10% trimmed standard deviation to one. Set to false for spatial coordinate predictors and others known to be on a common scale.
`family` | if `"gaussian"` fitting is by least-squares, and if `"symmetric"` a re-descending M estimator is used with Tukey's biweight function. Can be abbreviated.
`method` | fit the model or just extract the model frame. Can be abbreviated.
`control` | control parameters: see `loess.control`.
`...` | control parameters can also be supplied directly (if `control` is not specified).
Details
Fitting is done locally. That is, for the fit at point x, the fit is made using points in a neighbourhood of x, weighted by their distance from x (with differences in 'parametric' variables being ignored when computing the distance). The size of the neighbourhood is controlled by α (set by `span` or `enp.target`). For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the 'maximum distance' assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.
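For example, a short sketch on the built-in `cars` data, contrasting a large and a small span:

```r
# Larger span -> smoother fit; smaller span -> more local wiggle
fit_smooth <- loess(dist ~ speed, data = cars, span = 0.9)
fit_local  <- loess(dist ~ speed, data = cars, span = 0.3)

plot(dist ~ speed, data = cars)
lines(cars$speed, predict(fit_smooth), col = "blue")
lines(cars$speed, predict(fit_local),  col = "red")
```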
For the default family, fitting is by (weighted) least squares. For `family = "symmetric"` a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.

It can be important to tune the control list to achieve acceptable speed. See `loess.control` for details.
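A sketch combining both points: a robust fit via `family = "symmetric"`, with a `loess.control` tweak to compute the surface directly rather than by interpolation:

```r
# M-estimation with Tukey's biweight; direct surface evaluation
fit_robust <- loess(dist ~ speed, data = cars, family = "symmetric",
                    control = loess.control(surface = "direct"))
summary(fit_robust)
```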
Value
An object of class `"loess"`.
Note
As this is based on cloess, it is similar to but not identical to the `loess` function of S. In particular, conditioning is not implemented.

The memory usage of this implementation of `loess` is roughly quadratic in the number of points, with 1000 points taking about 10Mb.

`degree = 0`, local constant fitting, is allowed in this implementation but not documented in the reference. It seems very little tested, so use with caution.
Author(s)
B. D. Ripley, based on the `cloess` package of Cleveland, Grosse and Shyu.
Source
The 1998 version of the `cloess` package of Cleveland, Grosse and Shyu. A later version is available as `dloess` at https://www.netlib.org/a/.
References
W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
See Also
`loess.control`, `predict.loess`.

`lowess`, the ancestor of `loess` (with different defaults!).