References:
denoisr is a R package which permits to smooth (remove the noise from) functional data by, first, estimate the Hurst coefficient of the underlying generating process.
Functional data to smooth should be defined on a univariate compact, but can be irregularly sampled. denoisr can also be used only for Hurst parameter estimation.
denoisr have a support for the package funData.
Let \(I\) be a compact interval of \(\mathbb{R}\). We assume \(X^{(1)}, \dots, X^{(N)}\) to be an independent sample of a random process \(X = \left(X_t : t \in I\right)\) with continuous trajectories. For each \(1 \leq n \leq N\), and given a positive integer \(M_n\), let \(T_m^{(n)}, 1 \leq m \leq M_n\) be the random observation times for the curve \(X^{(n)}\). These times are obtained as independent copies of a variable \(T\) taking values in \(I\). The integers \(M_1, \dots, M_N\) represent an independent sample of an integer-valued random variable \(M\) with expectation \(\mu\). We assume that the realizations \(X\), \(M\) and \(T\) are mutually independent. The observations consist of the pairs \(\left(Y_m^{(n)}, T_m^{(n)}\right) \in \mathbb{R} \times I\) where \(Y_m^{(n)}\) is defined as \[Y_m^{(n)} = X^{(n)}(T_m^{(n)}) + \varepsilon_m^{(n)}, \quad 1 \leq n \leq N,~ 1 \leq m \leq M_n,\] and \(\varepsilon_m^{(n)}\) are independent copies of a centered error variable \(\varepsilon\).
Let \(t_0 \in I\) an arbitrarily fixed piont. The aim is to estimate \(X^{(1)}(t_0), \dots, X^{(N)}(t_0)\).
The Canadian temperature data correspond to the daily temperature at \(35\) different locations in Canada averaged over 1960 to 1994. This dataset is included in the package fda.
This section performs the smoothing of the Canadian temperature Weather data.
# Load data
data("canadian_temperature_daily")This package manipulates lists to define functional data. So, functions are provided to convert it as functional data object defined in the package funData.
# Convert list to funData object
data_fd <- list2funData(canadian_temperature_daily)So, we can use all the available function for funData object. As an example, we can plot the data with the ggplot2 style.
# Plot of the data
autoplot(data_fd) +
labs(title = 'Daily Temperature Data',
x = 'Normalized Day of Year',
y = 'Temperature in °C') +
theme_minimal()
The smoothing of the curves can be performed using the function smooth_curves. It accepts the parameter t0_list which correspond to the times where we want to estimate the bandwidth and the parameter k0_list which is the considered neighborhood.
# Smooth the data
smooth_data <- smooth_curves(canadian_temperature_daily,
t0_list = 0.5,
k0_list = 5)Then, we can plot the smoothed data.
# Plot of the smoothed data
smooth_data_fd <- list2funData(smooth_data$smooth)
autoplot(smooth_data_fd) +
labs(title = 'Smooth Daily Temperature Data',
x = 'Normalized Day of Year',
y = 'Temperature in °C') +
theme_minimal()
Finally, we show on particular curve along its smoothed version to look at the impact of smoothing.
# Plot one realisation of the smoothed data
montreal <- tibble(t = canadian_temperature_daily$Montreal$t,
Data = canadian_temperature_daily$Montreal$x,
Smooth = smooth_data$smooth$Montreal$x) %>%
reshape2::melt(id = 't')
ggplot(montreal, aes(x = t, y = value, color = variable)) +
geom_line() +
labs(title = 'Example of Montreal',
x = 'Normalized Day of Year',
y = 'Temperature in °C',
color = '') +
theme_minimal()
We will do the same analysis on some simulated datasets.
First, we will do the analysis on a dataset of fractional brownian motion. A fractional brownian motion is a continuous-time Gaussian process \(B_H(t)\) on \([0, 1]\), such that:
\(B_H(0) = 0\)
\(\mathbb{E}(B_H(t)) = 0, \quad \forall t \in [0, 1]\)
\(\mathbb{E}(B_H(s)B_H(t)) = \frac{1}{2}\left(\lvert s\rvert^{2H} + \lvert t\rvert^{2H} - \lvert t - s\rvert^{2H}\right)\) where \(H\) is a real number in \((0, 1)\), named Hurst parameter.
# Simulate some data
set.seed(42)
fractional_brownian <- generate_fractional_brownian(N = 100, M = 350,
H = 0.5, sigma = 0.05)
# Smooth the data
smooth_data <- smooth_curves(fractional_brownian,
t0_list = 0.5,
k0_list = 14)
# Estimation of the Hurst coefficient
print(smooth_data$parameter$H0)
#> [1] 0.5035138
# Plot a particular observation
obs <- tibble(t = fractional_brownian[[1]]$t,
Noisy = fractional_brownian[[1]]$x,
Truth = fractional_brownian[[1]]$x_true,
Smooth = smooth_data$smooth[[1]]$x) %>%
reshape2::melt(id = 't')
ggplot(obs, aes(x = t, y = value, color = variable)) +
geom_line() +
labs(title = 'Fractional brownian motion',
x = 'Normalized time',
y = 'Value',
color = '') +
theme_minimal()
set.seed(42)
piece_frac_brown <- generate_piecewise_fractional_brownian(N = 150, M = 350,
H = c(0.2, 0.5, 0.8),
sigma = 0.05)
# Smooth the data
smooth_data <- smooth_curves(piece_frac_brown,
t0_list = c(0.15, 0.5, 0.85),
k0_list = c(14, 14, 14))
# Estimation of the Hurst coefficient
print(smooth_data$parameter$H0)
#> [1] 0.1634871 0.6689895 0.7037816
# Plot a particular observation
obs <- tibble(t = piece_frac_brown[[1]]$t,
Noisy = piece_frac_brown[[1]]$x,
Truth = piece_frac_brown[[1]]$x_true,
Smooth = smooth_data$smooth[[1]]$x) %>%
reshape2::melt(id = 't')
ggplot(obs, aes(x = t, y = value, color = variable)) +
geom_line() +
labs(title = 'Piecewise fractional brownian motion',
x = 'Normalized time',
y = 'Value',
color = '') +
theme_minimal()
# Simulate some data
set.seed(42)
inte_fractional_brownian <- generate_integrate_fractional_brownian(N = 100, M = 350, H = 0.5, sigma = 0.025)
# Smooth the data
smooth_data <- smooth_curves_regularity(inte_fractional_brownian,
t0 = 0.5, k0 = 14)
# Estimation of the Hurst coefficient
print(smooth_data$parameter$H0)
#> [1] 1.533345
# Plot a particular observation
obs <- tibble(t = inte_fractional_brownian[[1]]$t,
Noisy = inte_fractional_brownian[[1]]$x,
Truth = inte_fractional_brownian[[1]]$x_true,
Smooth = smooth_data$smooth[[1]]$x) %>%
reshape2::melt(id = 't')
ggplot(obs, aes(x = t, y = value, color = variable)) +
geom_line() +
labs(title = 'Integreted Fractional brownian motion',
x = 'Normalized time',
y = 'Value',
color = '') +
theme_minimal()