The modern study of rodent population fluctuations in the North began with Elton’s pioneering work. Many scientific studies have been designed to collect yearly rodent abundance data, but the resulting time series are generally subject to at least two “problems”: they are short and non-linear. We explore the use of continuous threshold autoregressive (TAR) models for analyzing such data. In the simplest case, continuous TAR models are additive autoregressive models that are piecewise linear in one lag and linear in all other lags. The location of the slope change is called the threshold parameter. The continuous TAR models for rodent abundance data can be derived from a general prey-predator model under some simplifying assumptions. The lag in which the threshold is located offers important insights into the structure of the prey-predator system. We propose to assess the uncertainty about the location of the threshold via a new bootstrap called the nearest block bootstrap (NBB), which combines the moving block bootstrap and the nearest neighbor bootstrap. The NBB assumes an underlying finite-order time-homogeneous Markov process. Essentially, the NBB bootstraps blocks of random block sizes, with each block being drawn from a non-parametric estimate of the future distribution given the realized past bootstrap series. We illustrate the methods by simulations and on a particular rodent abundance time series from Kilpisjärvi, Northern Finland.
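As an illustration of the model class described in the abstract above, the following is a minimal sketch of a continuous TAR conditional mean and a simulator for it. It is not the authors' code: the function names, the Gaussian noise, the zero start values, and all parameter values are assumptions chosen for illustration. The hinge term makes the mean piecewise linear in the threshold lag and linear in the other lags, with no jump at the threshold.

```python
import random


def continuous_tar_mean(lags, intercept, slopes, threshold_lag, threshold, slope_change):
    """Conditional mean of a continuous TAR model (illustrative sketch).

    lags[j] holds y_{t-1-j}. The mean is linear in every lag, plus a hinge
    term slope_change * max(y_{t-d} - threshold, 0) in the threshold lag d,
    so it is piecewise linear in that lag and continuous at the threshold.
    """
    linear = intercept + sum(b * y for b, y in zip(slopes, lags))
    hinge = max(lags[threshold_lag - 1] - threshold, 0.0)
    return linear + slope_change * hinge


def simulate_continuous_tar(n, intercept, slopes, threshold_lag, threshold,
                            slope_change, sigma, seed=0):
    """Simulate n values with Gaussian noise (an assumption of this sketch)."""
    rng = random.Random(seed)
    p = len(slopes)
    series = [0.0] * p  # arbitrary start values, dropped from the output
    for _ in range(n):
        lags = series[-1:-p - 1:-1]  # most recent value first
        mean = continuous_tar_mean(lags, intercept, slopes,
                                   threshold_lag, threshold, slope_change)
        series.append(mean + rng.gauss(0.0, sigma))
    return series[p:]


if __name__ == "__main__":
    # Hypothetical parameters: threshold in lag 2, slope changes from -0.1 to -0.4.
    y = simulate_continuous_tar(50, 0.1, [0.4, -0.1], 2, 0.0, -0.3, sigma=0.3)
    print(y[:5])
```

In this sketch the slope in lag 2 is `slopes[1]` below the threshold and `slopes[1] + slope_change` above it, which is the sense in which the threshold marks the location of the slope change.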
Multivariate failure time data arise frequently in survival analysis. A commonly used technique is the working independence estimator for marginal hazard models. Two natural questions are how to improve the efficiency of the working independence estimator and how to identify the situations under which such an estimator has high statistical efficiency. In this paper, three weighted estimators are proposed based on three different optimality criteria in terms of the asymptotic covariance of weighted estimators. Simplified closed-form solutions are found, which always outperform the working independence estimator. We also prove that the working independence estimator has high statistical efficiency when the asymptotic covariance of derivatives of partial log-likelihood functions is nearly exchangeable or diagonal. Simulations are conducted to compare the performance of the weighted estimators and the working independence estimator. A data set from the Busselton population health surveys is analyzed using the proposed estimators.
Some of the history of soccer/world football in China is presented. Then consideration turns to the 2008 Chinese Super League. It has 16 teams. The results from the first half of the season, i.e. 15 rounds, are studied. The response of interest for a specific game is whether the home team won, tied or lost, who the home team was, and who the opponent was. The response is ordinal-valued. A generalized linear model is fit and then, given the remaining fixtures, used to predict the final standings of the season. Other explanatories, such as round number, are considered for inclusion in the model. Simulation is employed to estimate probabilities of interest.
This paper considers the estimation of an unknown function $h$ that can be characterized as a solution to a nonlinear operator equation mapping between two infinite dimensional Hilbert spaces. The nonlinear operator is unknown but can be consistently estimated, and its inverse is discontinuous, rendering the problem ill-posed. We establish the consistency for the class of estimators that are regularized using general lower semicompact penalty functions. We derive the optimal convergence rates of the estimators under the Hilbert scale norms. We apply our results to two important problems in economics and finance: (1) estimating the parameters of the pricing kernel of defaultable bonds; (2) recovering the volatility surface implied by option prices allowing for measurement error in the option prices and numerical error in the computation of the operator.
Recurrent event time data are common in biomedical follow-up studies, in which a study subject may experience repeated occurrences of an event of interest. In this paper, we evaluate two popular nonparametric tests for recurrent event time data in terms of their relative efficiency. One is the log-rank test for classical survival data and the other is a more recently developed nonparametric test based on comparing mean recurrent rates. We show analytically that, somewhat surprisingly, the log-rank test that only makes use of time to the first occurrence could be more efficient than the test for mean occurrence rates that makes use of all available recurrence times, provided that subject-to-subject variation of recurrence times is large. Explicit formulae are derived for asymptotic relative efficiencies under the frailty model. The findings are demonstrated via extensive simulations.
Advances in nanotechnology enable scientists for the first time to study biological processes on a nanoscale molecule-by-molecule basis. They also raise challenges and opportunities for statisticians and applied probabilists. To exemplify the stochastic inference and modeling problems in the field, this paper discusses a few selected cases, ranging from likelihood inference, Bayesian data augmentation, and semi- and non-parametric inference of nanometric biochemical systems to the utilization of stochastic integro-differential equations and stochastic networks to model single-molecule biophysical processes. We discuss the statistical and probabilistic issues as well as the biophysical motivation and physical meaning behind the problems, emphasizing the analysis and modeling of real experimental data.
The best possible breakdown point robustness is one of the most outstanding features of the univariate median. For this robustness, however, the median pays the price of low efficiency at normal and other light-tailed models. Affine equivariant multivariate analogues of the univariate median with high breakdown points were constructed in the past two decades. Most of them, however, also sacrifice efficiency at normal and other models in exchange for high breakdown robustness. The affine equivariant maximum depth estimator proposed and studied in this paper turns out to be an exception. Like the univariate median, it possesses the highest breakdown point among all its multivariate competitors. Unlike the univariate median, it is also highly efficient relative to the sample mean at normal and various other distributions, overcoming the vital low-efficiency shortcoming of the univariate and other multivariate generalized medians. The paper also studies the asymptotics of the estimator and establishes its limit distribution without symmetry and other strong assumptions that are typically imposed on the underlying distribution.
In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. The robustness and asymptotic behavior of constrained M-estimators are also briefly discussed.
We study the asymptotic distribution of the $L_1$ regression estimator under general conditions with matrix norming and possibly non i.i.d. errors. We then introduce an appropriate bootstrap procedure to estimate the distribution of this estimator and study its asymptotic properties. It is shown that this bootstrap is consistent under suitable conditions and in other situations the bootstrap limit is a random distribution.
Let $X_{1}, \ldots, X_{n}$ be independent and identically distributed random variables and $W_n=W_n(X_{1}, \ldots, X_n)$ be an estimator of a parameter $\theta$. Denote $T_{n}=(W_{n}-\theta_{0})/s_{n}$, where $s_{n}^{2}$ is a variance estimator of $W_n$. In this paper a general result on the limiting distributions of the non-central studentized statistic $T_n$ is given. In particular, when $s_n^2$ is the jackknife estimate of variance, it is shown that the limit could be normal, a weighted $\chi^2$ distribution, a stable distribution, or a mixture of normal and stable distributions. Applications to the power of the studentized $U$- and $L$-tests are also discussed.
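To make the notation in the abstract above concrete, here is a minimal sketch of the studentized statistic $T_n$ with the standard delete-one jackknife variance estimate, $s_n^2 = \frac{n-1}{n}\sum_i (W_{(-i)} - \bar{W}_{(\cdot)})^2$. The function names are hypothetical, and the abstract does not state which jackknife variant the paper uses, so the delete-one form is an assumption.

```python
import math


def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate of estimator(data)."""
    n = len(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((w - mean_loo) ** 2 for w in leave_one_out)


def studentized_statistic(data, estimator, theta0):
    """T_n = (W_n - theta0) / s_n with s_n^2 the jackknife variance estimate."""
    s2 = jackknife_variance(data, estimator)
    return (estimator(data) - theta0) / math.sqrt(s2)


def sample_mean(xs):
    return sum(xs) / len(xs)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(jackknife_variance(x, sample_mean))
    print(studentized_statistic(x, sample_mean, theta0=2.0))
```

For the sample mean, the delete-one jackknife variance reduces algebraically to the usual $s^2/n$, so in that case $T_n$ coincides with the one-sample $t$ statistic.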
For several decades, much attention has been paid to the two-sample Behrens-Fisher (BF) problem which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures. Little work, however, has been done for the $k$-sample BF problem for high dimensional data which tests the equality of the mean vectors of several high-dimensional normal populations with unequal covariance structures. In this paper we study this challenging problem via extending the famous Scheffe's transformation method, which reduces the $k$-sample BF problem to a one-sample problem. The induced one-sample problem can be easily tested by the classical Hotelling's $T^2$ test when the size of the resulting sample is very large relative to its dimensionality. For high dimensional data, however, the dimensionality of the resulting sample is often very large, and even much larger than its sample size, which makes the classical Hotelling’s $T^2$ test not powerful or not even well defined. To overcome this difficulty, we propose and study an $L^2$-norm based test. The asymptotic powers of the proposed $L^2$-norm based test and Hotelling's $T^2$ test are derived and theoretically compared. Methods for implementing the $L^2$-norm based test are described. Simulation studies are conducted to compare the $L^2$-norm based test and Hotelling's $T^2$ test when the latter can be well defined, and to compare the proposed implementation methods for the $L^2$-norm based test otherwise. The methodologies are motivated and illustrated by a real data example.
The generalized Friedman's urn model is a popular urn model which is widely used in many disciplines. In particular, it is extensively used in treatment allocation schemes in clinical trials. In this paper, we show that both the urn composition process and the allocation proportion process can be approximated by a multi-dimensional Gaussian process almost surely for a multi-color generalized Friedman's urn model with both homogeneous and non-homogeneous generating matrices. The Gaussian process is a solution of a stochastic differential equation. This Gaussian approximation is important for the understanding of the behavior of the urn process and is also useful for statistical inferences. As an application, we obtain the asymptotic properties including the asymptotic normality and the law of the iterated logarithm for a multi-color generalized Friedman's urn model as well as the randomized-play-the-winner rule as a special case.
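The randomized-play-the-winner rule mentioned in the abstract above can be sketched as a two-color urn simulation. This is an illustrative sketch rather than the paper's formulation: the function name, the initial composition, the number of balls added per response, and the success probabilities are all assumptions. Each patient is assigned by drawing a ball with replacement; a success adds a ball of the same color and a failure adds a ball of the other color.

```python
import random


def randomized_play_the_winner(n_patients, p_success, initial=1, added=1, seed=0):
    """Simulate a two-arm randomized-play-the-winner urn.

    p_success[k] is the (hypothetical) success probability of arm k.
    Returns the final urn composition and the number of patients per arm.
    """
    rng = random.Random(seed)
    urn = [initial, initial]   # balls of color 0 and color 1
    assigned = [0, 0]
    for _ in range(n_patients):
        # Draw a ball with replacement; its color decides the treatment arm.
        arm = 0 if rng.random() < urn[0] / (urn[0] + urn[1]) else 1
        assigned[arm] += 1
        success = rng.random() < p_success[arm]
        # Success rewards the same arm, failure rewards the other arm.
        urn[arm if success else 1 - arm] += added
    return urn, assigned


if __name__ == "__main__":
    n = 20000
    urn, assigned = randomized_play_the_winner(n, p_success=(0.7, 0.4))
    print(assigned[0] / n)  # allocation proportion for arm 0
```

With failure probabilities $q_0 = 1 - p_0$ and $q_1 = 1 - p_1$, the allocation proportion for arm 0 is known to converge to $q_1/(q_0 + q_1)$, which is $2/3$ for the hypothetical values used here.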
Feature selection with a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. Reducing the dimensionality of the feature space and then applying low-dimensional approaches appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with high dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that the tournament screening has the sure screening property, a necessary property which should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties such as having a higher positive selection rate and a lower false discovery rate than other approaches.
We use functional principal component analysis (FPCA) to model and predict weight growth in children. In particular, we examine how the approach can help discern growth patterns of underweight children relative to their normal counterparts, and whether a commonly used transformation to normality plays any constructive role in a predictive model based on the FPCA. Our work supplements the conditional growth charts developed by Wei and He (2006) by constructing a predictive growth model based on a small number of principal component scores from an individual's past.
Ranked-set sampling (RSS) often provides more efficient inference than simple random sampling (SRS). In this article, we propose a systematic nonparametric technique, RSS-EL, for hypothesis testing and interval estimation with balanced RSS data using empirical likelihood (EL). We detail the approach for interval estimation and hypothesis testing in one-sample and two-sample problems and general estimating equations. In all three cases, RSS is shown to provide more efficient inference than SRS of the same size. Moreover, the RSS-EL method does not require any easily violated assumptions needed by existing rank-based nonparametric methods for RSS data, such as perfect ranking, identical ranking scheme in two groups, and location shift between two population distributions. The merit of the RSS-EL method is also demonstrated through simulation studies.
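For readers unfamiliar with the sampling design in the abstract above, the following is a minimal sketch of how a balanced ranked-set sample is drawn. It is not the RSS-EL method itself. The function name and the `draw` callback are hypothetical, and the sketch ranks units by their true values, which amounts to assuming perfect ranking purely for simplicity (the abstract notes that RSS-EL does not need that assumption).

```python
import random


def balanced_rss(draw, set_size, cycles, rng):
    """Draw a balanced ranked-set sample as a list of (rank, value) pairs.

    In each cycle and for each rank r = 1..set_size, a fresh set of
    set_size units is drawn and ranked, and only the unit with rank r
    is kept as a measurement.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Ranking by the true value: a simplifying assumption of this sketch.
            units = sorted(draw(rng) for _ in range(set_size))
            sample.append((rank + 1, units[rank]))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    rss = balanced_rss(lambda r: r.gauss(0.0, 1.0), set_size=3, cycles=4, rng=rng)
    print(len(rss))  # 12 measured units, 4 for each of the 3 ranks
```

The design is balanced because every rank contributes the same number of measurements, here `cycles` per rank, for `set_size * cycles` measured units in total.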
In this paper we investigate how to employ stochastic regression to hedge risks in finance, where the risk of a security is measured by its quadratic variation process. Mykland and Zhang used this technique to demonstrate how to reduce the risk of a given security by introducing another security. Here we investigate how to further reduce the remaining unhedgeable risk by adding more hedging securities. Some practical guidelines on how to choose those hedging securities are also given.
Recently, the generalized exponential distribution has received considerable attention. In this paper, we deal with Bayesian inference for the unknown parameters of the progressively censored generalized exponential distribution. It is assumed that the scale and the shape parameters have independent gamma priors. The Bayes estimates of the unknown parameters cannot be obtained in closed form. Lindley’s approximation and an importance sampling technique are suggested to compute the approximate Bayes estimates. A Markov chain Monte Carlo method is used to compute the approximate Bayes estimates and also to construct the highest posterior density credible intervals. We also provide different criteria to compare two different sampling schemes and hence to find the optimal sampling scheme. It is observed that finding the optimum censoring procedure is a computationally expensive process, and we recommend using the sub-optimal censoring procedure, which can be obtained very easily. Monte Carlo simulations are performed to compare the performance of the different methods, and one data analysis is performed for illustrative purposes.