isotree - Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <arXiv:1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <arXiv:2110:13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <arXiv:1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <arXiv:1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <arXiv:2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Last updated 2 months ago
anomaly-detectionimputationisolation-forestoutlier-detection
10.13 score 188 stars 4 packages 92 scripts 1.1k downloadsoutliertree - Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, constrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Last updated 3 months ago
anomaly-detectionoutlier-detection
7.63 score 56 stars 2 packages 21 scripts 509 downloadscostsensitive - Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, Langford, & Zadrozny (2008) <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer,Dani, Hayes, Langford, Zadrozny, (2005) <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.
Last updated 5 months ago
cost-sensitive-classificationmulti-label-classification
5.30 score 47 stars 28 scripts 167 downloadspoismf - Factorization of Sparse Counts Matrices Through Poisson Likelihood
Creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes, (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
Last updated 5 months ago
implicit-feedbackpoisson-factorization
4.65 score 45 stars 9 scripts 344 downloadsreadsparse - Read and Write Sparse Matrices in 'SVMLight' and 'LibSVM' Formats
Read and write labelled sparse matrices in text format as used by software such as 'SVMLight', 'LibSVM', 'ThunderSVM', 'LibFM', 'xLearn', 'XGBoost', 'LightGBM', and others. Supports labelled data for regression, classification (binary, multi-class, multi-label), and ranking (with 'qid' field), and can handle header metadata and comments in files.
Last updated 5 months ago
libsvmsparse-matricessvmlight
4.32 score 8 stars 13 scripts 284 downloadsnonneg.cg - Non-Negative Conjugate-Gradient Minimizer
Minimize a differentiable function subject to all the variables being non-negative (i.e. >= 0), using a Conjugate-Gradient algorithm based on a modified Polak-Ribiere-Polyak formula as described in (Li, (2013) <https://www.hindawi.com/journals/jam/2013/986317/abs/>).
Last updated 5 years ago
conjugate-gradientminimizeoptimization
3.00 score 2 stars 1 scripts 218 downloads