
Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data

Publication Type: Journal
Journal Name: Physical Review E
Volume: 76
Article Number: 026209

Commonly used dependence measures, such as linear correlation, the cross-correlogram, or Kendall's tau, cannot capture the complete dependence structure in data unless the structure is restricted to linear, periodic, or monotonic. Mutual information (MI) has been frequently utilized for capturing the complete dependence structure, including nonlinear dependence. Recently, several methods have been proposed for MI estimation, such as kernel density estimators (KDE), k-nearest neighbors (KNN), Edgeworth approximation of differential entropy, and adaptive partitioning of the XY plane. However, outstanding gaps in the current literature have precluded the ability to effectively automate these methods, which, in turn, has limited their adoption by the application communities. This study attempts to address a key gap in the literature, specifically, the evaluation of the above methods to choose the best one, particularly in terms of robustness for short and noisy data, based on comparisons with theoretical MI values, which can be computed analytically, as well as with linear correlation and Kendall's tau. Here we consider smaller data sizes of 50, 100, and 1000 points, treating 50 and 100 points as very short and 1000 as short. We consider a broad class of functions, specifically linear, quadratic, periodic, and chaotic, contaminated with artificial noise at varying noise-to-signal ratios. The case studies presented here are motivated by domain considerations in the earth sciences, where the data are short and noisy. Our results indicate KDE as the best choice for very short data at relatively high noise-to-signal levels, whereas KNN performs best for very short data at relatively low noise levels as well as for short data consistently across noise levels. In addition, the optimal smoothing parameter of a Gaussian kernel appears to be the best choice for KDE, while three nearest neighbors appear optimal for KNN. Thus, in situations where the approximate data sizes are known in advance, and exploratory data analysis and/or domain knowledge can provide a priori insights on the noise-to-signal ratios, the results in this paper point to a way forward for automating the process of MI estimation.
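Since the study benchmarks each estimator against MI values that can be computed analytically, a minimal sketch of that comparison may help. The snippet below is not the paper's code: it draws correlated bivariate Gaussian samples, for which I(X;Y) = -0.5 ln(1 - rho^2) in nats, and compares the closed form against scikit-learn's KNN-based (Kraskov-style) estimator. The sample sizes mirror the study's 50, 100, and 1000 points, and k = 3 matches the three-neighbor setting reported as optimal above; the correlation level and random seed are illustrative assumptions.

```python
# Sketch (not the paper's implementation): KNN MI estimate vs. the
# analytical MI for bivariate Gaussian data, I(X;Y) = -0.5*ln(1 - rho^2).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho = 0.8                                    # assumed correlation level
analytical_mi = -0.5 * np.log(1.0 - rho**2)  # closed form, in nats

for n in (50, 100, 1000):                    # very short, very short, short
    # Draw n samples from a bivariate Gaussian with correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Kraskov-style KNN estimate; k=3 matches the optimum reported above.
    mi_hat = mutual_info_regression(
        xy[:, [0]], xy[:, 1], n_neighbors=3, random_state=0
    )[0]
    print(f"n={n:5d}  KNN MI={mi_hat:.3f}  analytical MI={analytical_mi:.3f}")
```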
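For the KDE approach, one common construction, again a sketch under stated assumptions rather than the paper's implementation, fits Gaussian kernel densities to the joint and marginal samples and averages the log density ratio over the observed points. SciPy's default bandwidth (Scott's rule) stands in for the "optimal smoothing parameter" mentioned above, and the noise-to-signal ratio in the usage example is defined as a variance ratio; both are assumptions.

```python
# Sketch: resubstitution KDE estimate of MI (in nats) from Gaussian kernels.
import numpy as np
from scipy.stats import gaussian_kde

def kde_mutual_information(x, y):
    """Average log(p_xy / (p_x * p_y)) over the sample points."""
    p_xy = gaussian_kde(np.vstack([x, y]))   # joint density p(x, y)
    p_x, p_y = gaussian_kde(x), gaussian_kde(y)
    return np.mean(np.log(p_xy(np.vstack([x, y])) / (p_x(x) * p_y(y))))

# Usage sketch: a periodic relation y = sin(x) + noise, in the spirit of the
# contaminated test functions; NSR is taken as var(noise)/var(signal).
rng = np.random.default_rng(1)
n, nsr = 100, 0.5                            # "very short" series, assumed NSR
x = rng.uniform(0.0, 2.0 * np.pi, size=n)
signal = np.sin(x)
noise = rng.normal(0.0, np.sqrt(nsr * np.var(signal)), size=n)
print(f"KDE MI estimate: {kde_mutual_information(x, signal + noise):.3f}")
```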