Two datasets were used to apply the proposed techniques for addressing multicollinearity, one from the high-dimensional domain and one from the big data domain. These are:
Alon.csv: a cancer classification dataset. The dataset contains the expression levels of 2000 genes measured on 62 colon tissue samples, of which 40 are cancerous.
Airport.csv: data obtained from the U.S. Bureau of Transportation Statistics, which provides information on U.S. transportation systems. For flights operated by major air carriers, the dataset records the origin and destination airports, the scheduled departure and arrival times, the departure and arrival delays, and the time and distance to the destination airport.
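
The dimensions reported for the colon data (62 samples, 2000 gene expression levels) already imply exact linear dependence among the predictors, because the rank of the design matrix cannot exceed the number of samples. The following minimal sketch illustrates this point. It is not part of the analysis itself: it uses randomly generated values as a stand-in with the same shape as Alon.csv, and the variable names are chosen here for illustration only.

```python
import numpy as np

# Synthetic stand-in with the same shape as the colon data described above:
# 62 tissue samples (rows) by 2000 gene expression levels (columns).
# These are random numbers, NOT the real Alon.csv values.
rng = np.random.default_rng(0)
n_samples, n_features = 62, 2000
X = rng.normal(size=(n_samples, n_features))

# The rank of X can never exceed min(n_samples, n_features).
rank = np.linalg.matrix_rank(X)
max_rank = min(n_samples, n_features)

# X^T X is n_features x n_features but has the same rank as X, so it is
# singular whenever rank < n_features. At least this many of its
# eigenvalues are exactly zero:
min_zero_eigenvalues = n_features - max_rank

is_singular = rank < n_features
print(f"rank of X: {rank} (upper bound {max_rank})")
print(f"X^T X is singular: {is_singular}")
print(f"zero eigenvalues of X^T X: at least {min_zero_eigenvalues}")
```

With the real data, the same check could be run after loading the file into a numeric array. The file layout (header row, label column, orientation of genes and samples) is not specified here and would need to be confirmed first.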