isotree - Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Last updated 14 days ago
anomaly-detectionimputationisolation-forestoutlier-detectioncppopenmp
10.34 score 201 stars 6 dependents 115 scripts 1.1k downloadsoutliertree - Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, constrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Last updated 16 days ago
anomaly-detectionoutlier-detectioncppopenmp
7.63 score 56 stars 2 dependents 21 scripts 425 downloadscostsensitive - Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, Langford, & Zadrozny (2008) <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer,Dani, Hayes, Langford, Zadrozny, (2005) <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.
Last updated 16 days ago
cost-sensitive-classificationmulti-label-classification
5.42 score 47 stars 28 scripts 135 downloadspoismf - Factorization of Sparse Counts Matrices Through Poisson Likelihood
Creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes, (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
Last updated 7 months ago
implicit-feedbackpoisson-factorizationopenblasopenmp
4.66 score 46 stars 9 scripts 321 downloadsreadsparse - Read and Write Sparse Matrices in 'SVMLight' and 'LibSVM' Formats
Read and write labelled sparse matrices in text format as used by software such as 'SVMLight', 'LibSVM', 'ThunderSVM', 'LibFM', 'xLearn', 'XGBoost', 'LightGBM', and others. Supports labelled data for regression, classification (binary, multi-class, multi-label), and ranking (with 'qid' field), and can handle header metadata and comments in files.
Last updated 16 days ago
libsvmsparse-matricessvmlightcpp
4.19 score 8 stars 13 scripts 244 downloadsnonneg.cg - Non-Negative Conjugate-Gradient Minimizer
Minimize a differentiable function subject to all the variables being non-negative (i.e. >= 0), using a Conjugate-Gradient algorithm based on a modified Polak-Ribiere-Polyak formula as described in (Li, (2013) <https://www.hindawi.com/journals/jam/2013/986317/abs/>).
Last updated 5 years ago
conjugate-gradientminimizeoptimizationopenblascppopenmp
3.00 score 2 stars 1 scripts 205 downloads