Title: | Copula-Based Estimator for Long-Range Dependent Processes under Missing Data |
---|---|
Description: | Implements the copula-based estimator for univariate long-range dependent processes, introduced in Pumi et al. (2023) <doi:10.1007/s00362-023-01418-z>. Notably, this estimator is capable of handling missing data and has been shown to perform exceptionally well, even when up to 70% of data is missing (as reported in <arXiv:2303.04754>) and has been found to outperform several other commonly applied estimators. |
Authors: | Taiane Schaedler Prass [aut, cre, com] , Guilherme Pumi [aut, ctb] |
Maintainer: | Taiane Schaedler Prass <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2024-11-07 03:10:55 UTC |
Source: | https://github.com/cran/PPMiss |
This function calculates the coefficients corresponding to
,
up to a truncation point
arfima.coefs(ar = NULL, ma = NULL, d = 0, trunc = 1)
arfima.coefs(ar = NULL, ma = NULL, d = 0, trunc = 1)
ar |
the coefficients of the autoregressive polinomial. Default is NULL |
ma |
the coefficients of the moving average polinomial. Default is null |
d |
the long memory parameter. Default is 0. |
trunc |
the truncation point. Default is 1. |
The coefficients values up to order ‘trunc’.
cks <- arfima.coefs(d = 0.3, trunc = 5) cks cks <- arfima.coefs(d = 0.1, trunc = 5, ar = 0.5, ma = 0.6) cks
cks <- arfima.coefs(d = 0.3, trunc = 5) cks cks <- arfima.coefs(d = 0.1, trunc = 5, ar = 0.5, ma = 0.6) cks
Let be the copula parameter associated to
and
be an estimate of
based on pseudo observations. The long memory parameter
is estimated by
.
d.fit(xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1, optim.method = "Brent", method = "mpl", s = 1, m = 24, theta.start = 0.1, empirical = TRUE, r = 2, a = 0, d.interval = c(-0.5, 0.5))
d.fit(xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1, optim.method = "Brent", method = "mpl", s = 1, m = 24, theta.start = 0.1, empirical = TRUE, r = 2, a = 0, d.interval = c(-0.5, 0.5))
xt |
a vector with the observed time series. Missing observations are allowed. |
copula |
an object of class ‘copula’. Readily available options
are |
dCdtheta |
a two parameter function that returns the limit of the copula
derivative, with respect to |
theta.lower |
the lower bound for |
theta.upper |
the upper bound for |
optim.method |
a character string specifying the optimization method.
For all available options see |
method |
a character string specifying the copula parameter estimator used. This can be one of: ‘mpl’, ‘itau’, ‘irho’, ‘itau.mpl’ or ‘ml’. See fitCopula for details. Default is ‘mpl’. |
s |
integer. The smallest lag |
m |
integer. The largest lag |
theta.start |
starting value for the parameter optimization via |
empirical |
logical. If |
r |
the exponent used in the Minkowski distance used to calculate |
a |
the value of |
d.interval |
a vector of size 2 giving the lower and upper bound for the
long memory parameter |
, the estimated value of
.
#------------------------- # ARFIMA(0,d,0) process #------------------------- trunc <- 50000 n = 1000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+n) x <- sapply(1:n, function(t) sum(cks*rev(eps[t:(t+trunc)]))) #---------------------- # Original time series #----------------------- # For Frank copula, -Inf < theta < Inf. However, "Brent" requires # finite lower and upper bounds so we use c(-100, 100) here d_frank <- d.fit(xt = x, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh <- d.fit(xt = x, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm <- d.fit(xt = x, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss <- d.fit(xt = x, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) c(FRANK = d_frank, AMH = d_amh, FGM = d_fgm, GAUSS = d_gauss) #---------------------------- # Adding some missing values #---------------------------- index <- sample(1:n, size = round(0.2*n)) xt <- x xt[index] <- NA d_frank_m <- d.fit(xt = xt, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh_m <- d.fit(xt = xt, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm_m <- d.fit(xt = xt, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss_m <- d.fit(xt = xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) data.frame( series = c("Complete", "Missing"), FRANK = c(d_frank, d_frank_m), AMH = c(d_amh, d_amh_m), FGM = c(d_fgm, d_fgm_m), GAUSS = c(d_gauss, d_gauss_m)) #------------------------- # ARFIMA(1,d,1) process #------------------------- # For a faster algorithm to generate ARFIMA processes, # see the package "arfima" trunc <- 50000 cks <- arfima.coefs(d = 0.35, trunc = trunc, ar = -0.2, ma = 0.4) n = 1000 eps <- rnorm(trunc+n) x <- sapply(1:n, function(t) sum(cks*rev(eps[t:(t+trunc)]))) #---------------------- # Original time series #----------------------- # For Frank copula, -Inf < theta < Inf. However, "Brent" requires # finite lower and upper bounds so we use c(-100, 100) here d_frank <- d.fit(xt = x, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh <- d.fit(xt = x, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm <- d.fit(xt = x, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss <- d.fit(xt = x, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) c(FRANK = d_frank, AMH = d_amh, FGM = d_fgm, GAUSS = d_gauss) #---------------------------- # Adding some missing values #---------------------------- n = 1000 index <- sample(1:n, size = round(0.2*n)) xt <- x xt[index] <- NA d_frank_m <- d.fit(xt = xt, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh_m <- d.fit(xt = xt, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm_m <- d.fit(xt = xt, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss_m <- d.fit(xt = xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) data.frame( series = c("Complete", "Missing"), FRANK = c(d_frank, d_frank_m), AMH = c(d_amh, d_amh_m), FGM = c(d_fgm, d_fgm_m), GAUSS = c(d_gauss, d_gauss_m))
#------------------------- # ARFIMA(0,d,0) process #------------------------- trunc <- 50000 n = 1000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+n) x <- sapply(1:n, function(t) sum(cks*rev(eps[t:(t+trunc)]))) #---------------------- # Original time series #----------------------- # For Frank copula, -Inf < theta < Inf. However, "Brent" requires # finite lower and upper bounds so we use c(-100, 100) here d_frank <- d.fit(xt = x, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh <- d.fit(xt = x, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm <- d.fit(xt = x, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss <- d.fit(xt = x, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) c(FRANK = d_frank, AMH = d_amh, FGM = d_fgm, GAUSS = d_gauss) #---------------------------- # Adding some missing values #---------------------------- index <- sample(1:n, size = round(0.2*n)) xt <- x xt[index] <- NA d_frank_m <- d.fit(xt = xt, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh_m <- d.fit(xt = xt, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm_m <- d.fit(xt = xt, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss_m <- d.fit(xt = xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) data.frame( series = c("Complete", "Missing"), FRANK = c(d_frank, d_frank_m), AMH = c(d_amh, d_amh_m), FGM = c(d_fgm, d_fgm_m), GAUSS = c(d_gauss, d_gauss_m)) #------------------------- # ARFIMA(1,d,1) process #------------------------- # For a faster algorithm to generate ARFIMA processes, # see the package "arfima" trunc <- 50000 cks <- arfima.coefs(d = 0.35, trunc = trunc, ar = -0.2, ma = 0.4) n = 1000 eps <- rnorm(trunc+n) x <- sapply(1:n, function(t) sum(cks*rev(eps[t:(t+trunc)]))) #---------------------- # Original time series #----------------------- # For Frank copula, -Inf < theta < Inf. However, "Brent" requires # finite lower and upper bounds so we use c(-100, 100) here d_frank <- d.fit(xt = x, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh <- d.fit(xt = x, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm <- d.fit(xt = x, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss <- d.fit(xt = x, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) c(FRANK = d_frank, AMH = d_amh, FGM = d_fgm, GAUSS = d_gauss) #---------------------------- # Adding some missing values #---------------------------- n = 1000 index <- sample(1:n, size = round(0.2*n)) xt <- x xt[index] <- NA d_frank_m <- d.fit(xt = xt, copula = frank, dCdtheta = dCtheta_frank, theta.lower = -100, theta.upper = 100) d_amh_m <- d.fit(xt = xt, copula = amh, dCdtheta = dCtheta_amh, theta.lower = -1, theta.upper = 1) d_fgm_m <- d.fit(xt = xt, copula = fgm, dCdtheta = dCtheta_fgm, theta.lower = -1, theta.upper = 1) d_gauss_m <- d.fit(xt = xt, copula = gauss, dCdtheta = dCtheta_gauss, theta.lower = -1, theta.upper = 1) data.frame( series = c("Complete", "Missing"), FRANK = c(d_frank, d_frank_m), AMH = c(d_amh, d_amh_m), FGM = c(d_fgm, d_fgm_m), GAUSS = c(d_gauss, d_gauss_m))
Calculates an estimate for the constant given by
where ,
is such that
(the product copula), and
is a sequence of absolutely continuous distribution
functions.
k1fun(dCdtheta, fun, data, empirical, mean = 0, sd = 1)
k1fun(dCdtheta, fun, data, empirical, mean = 0, sd = 1)
dCdtheta |
a function providing the limit as |
fun |
optionally, a function providing an estimator for the probability density function. |
data |
the observed time series. Only used to obtain the quantile function
when |
empirical |
logical. If |
mean |
the mean of the gaussian distribution.
Only used if |
sd |
the standard deviation of the gaussian distribution.
Only used if |
Here and
are replaced by sample estimators for these
functions or the gaussian density and quantile functions are used, depending
on the context.
The function kdens
is used as sample estimator of and
quantile
is the sample estimator of .
The value of .
trunc = 50000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+1000) x <- sapply(1:1000, function(t) sum(cks*rev(eps[t:(t+trunc)]))) # kernel density function dfun <- kdens(x) # calculating K1 using four copulas and empirical estimates for F' and F^{(-1)} K1_frank_e <- k1fun(dCdtheta = dCtheta_frank, fun = dfun, data = x, empirical = TRUE) K1_amh_e <- k1fun(dCdtheta = dCtheta_amh, fun = dfun, data = x, empirical = TRUE) K1_fgm_e <- k1fun(dCdtheta = dCtheta_fgm, fun = dfun, data = x, empirical = TRUE) K1_gauss_e <- k1fun(dCdtheta = dCtheta_gauss, fun = dfun, data = x, empirical = TRUE) # calculating K1 using four copulas and gaussian marginals K1_frank_g <- k1fun(dCdtheta = dCtheta_frank, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_amh_g <- k1fun(dCdtheta = dCtheta_amh, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_fgm_g <- k1fun(dCdtheta = dCtheta_fgm, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_gauss_g <- k1fun(dCdtheta = dCtheta_gauss, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) # comparing results data.frame(MARGINAL = c("Empirical", "Gaussian"), FRANK = c(K1_frank_e, K1_frank_g), AMH = c(K1_amh_e, K1_amh_g), FGM = c(K1_fgm_e, K1_fgm_g), GAUSS = c(K1_gauss_e, K1_gauss_g))
trunc = 50000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+1000) x <- sapply(1:1000, function(t) sum(cks*rev(eps[t:(t+trunc)]))) # kernel density function dfun <- kdens(x) # calculating K1 using four copulas and empirical estimates for F' and F^{(-1)} K1_frank_e <- k1fun(dCdtheta = dCtheta_frank, fun = dfun, data = x, empirical = TRUE) K1_amh_e <- k1fun(dCdtheta = dCtheta_amh, fun = dfun, data = x, empirical = TRUE) K1_fgm_e <- k1fun(dCdtheta = dCtheta_fgm, fun = dfun, data = x, empirical = TRUE) K1_gauss_e <- k1fun(dCdtheta = dCtheta_gauss, fun = dfun, data = x, empirical = TRUE) # calculating K1 using four copulas and gaussian marginals K1_frank_g <- k1fun(dCdtheta = dCtheta_frank, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_amh_g <- k1fun(dCdtheta = dCtheta_amh, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_fgm_g <- k1fun(dCdtheta = dCtheta_fgm, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) K1_gauss_g <- k1fun(dCdtheta = dCtheta_gauss, fun = NULL, data = NULL, empirical = FALSE, mean = mean(x), sd = sd(x)) # comparing results data.frame(MARGINAL = c("Empirical", "Gaussian"), FRANK = c(K1_frank_e, K1_frank_g), AMH = c(K1_amh_e, K1_amh_g), FGM = c(K1_fgm_e, K1_fgm_g), GAUSS = c(K1_gauss_e, K1_gauss_g))
The probability density function is estimated using a kernel density
approach. More specifically, first
is estimated
using
(default for the function
density
)
equally spaced points ,
, in the interval
, where
is the bandwidth for
the Gaussian kernel density estimator, chosen by applying the Silverman's
rule of thumb (the default procedure in
density
).
A cubic spline interpolation (the default method for spline
)
is then applied to the pairs to obtain
for all
.
kdens(x)
kdens(x)
x |
the data from which the estimate is to be computed. |
a function that approximates the probability density function.
# creating a time series trunc = 50000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+1000) x <- sapply(1:1000, function(t) sum(cks*rev(eps[t:(t+trunc)]))) # kernel density function dfun <- kdens(x) # plot curve(dfun, from = min(x), to = max(x))
# creating a time series trunc = 50000 cks <- arfima.coefs(d = 0.25, trunc = trunc) eps <- rnorm(trunc+1000) x <- sapply(1:1000, function(t) sum(cks*rev(eps[t:(t+trunc)]))) # kernel density function dfun <- kdens(x) # plot curve(dfun, from = min(x), to = max(x))
Implemented copulas and the corresponding derivative limit,
as , where
is such that
.
An estimate for
is obtained based on the copula function used.
and the derivatives are used to obtain an estimate for
.
The functions ‘frank’, ‘amh’, ‘fgm’ and ‘gauss’
are shortcuts for
copula::frankCopula()
,
copula::amhCopula()
, copula::fgmCopula()
and
copula::normalCopula()
from package ‘copula’,
respectively.
frank dCtheta_frank(u, v) amh dCtheta_amh(u, v) fgm dCtheta_fgm(u, v) gauss dCtheta_gauss(u, v)
frank dCtheta_frank(u, v) amh dCtheta_amh(u, v) fgm dCtheta_fgm(u, v) gauss dCtheta_gauss(u, v)
u |
a real number between 0 and 1. |
v |
a real number between 0 and 1. |
An object of class frankCopula
of length 1.
An object of class amhCopula
of length 1.
An object of class fgmCopula
of length 1.
An object of class normalCopula
of length 1.
The constant is given by
where ,
and
is a sequence of absolutely continuous distribution
functions
Archimedean copula objects of class ‘frankCopula’, ‘amhCopula’ or a
Farlie-Gumbel-Morgenstern copula object of class ‘fgmCopula’ or an elliptical
copula object of class ‘normalCopula’. For details, see
archmCopula
, fgmCopula
and
ellipCopula
.
The derivative functions return the limit, as , of the
derivative with respect to
, corresponding to the copula functions.