isotree - Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Last updated 11 days ago
anomaly-detection imputation isolation-forest outlier-detection cpp openmp
10.43 score 206 stars 6 dependents 115 scripts 1.9k downloads
outliertree - Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, contrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Last updated 3 months ago
anomaly-detection outlier-detection cpp openmp
7.34 score 58 stars 2 dependents 21 scripts 729 downloads
recometrics - Evaluation Metrics for Implicit-Feedback Recommender Systems
Calculates evaluation metrics for implicit-feedback recommender systems that are based on low-rank matrix factorization models, given the fitted model matrices and data, thus allowing comparison of models from a variety of libraries. Metrics include P@K (precision at k, for top-K recommendations), R@K (recall at k), AP@K (average precision at k), NDCG@K (normalized discounted cumulative gain at k), Hit@K (from which the 'Hit Rate' is calculated), RR@K (reciprocal rank at k, from which the 'MRR' or 'mean reciprocal rank' is calculated), ROC-AUC (area under the receiver-operating characteristic curve), and PR-AUC (area under the precision-recall curve). These are calculated on a per-user basis according to the ranking of items induced by the model, using efficient multi-threaded routines. Also provides functions for creating train-test splits for model fitting and evaluation.
Last updated 3 months ago
implicit-feedback matrix-factorization recommender-systems openblas cpp openmp
5.45 score 28 stars 322 downloads
costsensitive - Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for being classified into each class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, Langford, & Zadrozny (2008) <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, Dani, Hayes, Langford, Zadrozny (2005) <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.
Last updated 3 months ago
cost-sensitive-classification multi-label-classification
5.30 score 47 stars 28 scripts 205 downloads
readsparse - Read and Write Sparse Matrices in 'SVMLight' and 'LibSVM' Formats
Read and write labelled sparse matrices in text format as used by software such as 'SVMLight', 'LibSVM', 'ThunderSVM', 'LibFM', 'xLearn', 'XGBoost', 'LightGBM', and others. Supports labelled data for regression, classification (binary, multi-class, multi-label), and ranking (with 'qid' field), and can handle header metadata and comments in files.
Last updated 1 month ago
libsvm sparse-matrices svmlight cpp
4.48 score 8 stars 15 scripts 430 downloads
poismf - Factorization of Sparse Counts Matrices Through Poisson Likelihood
Creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
Last updated 10 months ago
implicit-feedback poisson-factorization openblas openmp
4.36 score 46 stars 9 scripts 463 downloads
nonneg.cg - Non-Negative Conjugate-Gradient Minimizer
Minimize a differentiable function subject to all the variables being non-negative (i.e. >= 0), using a Conjugate-Gradient algorithm based on a modified Polak-Ribiere-Polyak formula, as described in Li (2013) <https://www.hindawi.com/journals/jam/2013/986317/abs/>.
Last updated 5 years ago
conjugate-gradient minimize optimization openblas cpp openmp
3.00 score 2 stars 1 script 238 downloads