An R package implementing a large-scale k-nearest neighbor (KNN) classifier using the Lucene search engine.
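```r
library(BigKnn)

# Covariates in sparse (long) format: one row per non-zero covariate value
covariates <- data.frame(rowIds = c(1, 1, 1, 2, 2, 3),
                         covariateIds = c(10, 11, 12, 10, 11, 12),
                         covariateValues = c(1, 1, 1, 1, 1, 1))
# One binary outcome per row
outcomes <- data.frame(rowIds = c(1, 2, 3),
                       y = c(1, 0, 0))
dataForPrediction <- Andromeda::andromeda(covariates = covariates,
                                          outcomes = outcomes)

# Folder on the local file system that will hold the Lucene index
indexFolder <- "s:/temp/lucene"

# Build the index from the labeled data
buildKnn(outcomes = dataForPrediction$outcomes,
         covariates = dataForPrediction$covariates,
         indexFolder = indexFolder)

# Predict using the 10 nearest neighbors, with weighting enabled
prediction <- predictKnn(outcomes = dataForPrediction$outcomes,
                         covariates = dataForPrediction$covariates,
                         indexFolder = indexFolder,
                         k = 10,
                         weighted = TRUE)
```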
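To make the roles of `k` and `weighted` concrete, here is a brute-force sketch of weighted KNN on the same toy data. It is only a conceptual illustration, not how BigKnn works: BigKnn finds neighbors through a Lucene index and its scoring. The similarity measure (sum of `covariateValues` over covariates shared with the query) and the weighting rule (each neighbor's outcome weighted by that similarity) are assumptions, and `naiveKnnPredict()` is a hypothetical helper, not part of the BigKnn API.

```r
# Conceptual sketch of weighted KNN on sparse binary covariates.
# NOT BigKnn's implementation: BigKnn finds neighbors through a Lucene index.
# Assumptions: similarity = sum of covariateValues over covariates shared with
# the query; weighted = TRUE weights each neighbor's outcome by that similarity.

# Hypothetical helper, not part of the BigKnn API
naiveKnnPredict <- function(queryCovariateIds, covariates, outcomes,
                            k = 10, weighted = TRUE) {
  # Keep only covariate rows for covariates the query also has
  shared <- covariates[covariates$covariateIds %in% queryCovariateIds, ]
  if (nrow(shared) == 0) return(NA_real_)  # no overlap: no neighbors found

  # Similarity of each candidate row to the query
  similarity <- tapply(shared$covariateValues, shared$rowIds, sum)
  neighbors <- data.frame(rowId = as.numeric(names(similarity)),
                          similarity = as.numeric(similarity))

  # The k most similar rows are the nearest neighbors
  neighbors <- head(neighbors[order(-neighbors$similarity), ], k)
  neighbors$y <- outcomes$y[match(neighbors$rowId, outcomes$rowIds)]

  # Predicted probability of y = 1: (weighted) share of neighbors with the outcome
  w <- if (weighted) neighbors$similarity else rep(1, nrow(neighbors))
  sum(w * neighbors$y) / sum(w)
}

# Same toy data as the example above
covariates <- data.frame(rowIds = c(1, 1, 1, 2, 2, 3),
                         covariateIds = c(10, 11, 12, 10, 11, 12),
                         covariateValues = c(1, 1, 1, 1, 1, 1))
outcomes <- data.frame(rowIds = c(1, 2, 3), y = c(1, 0, 0))

naiveKnnPredict(c(10, 11, 12), covariates, outcomes, k = 10, weighted = TRUE)
# → 0.5  (similarities 3, 2, 1; only row 1 has y = 1, so 3 / 6)
```

With `weighted = FALSE` the same query gives 1/3, since each of the three neighbors then counts equally; weighting lets closer neighbors dominate the vote.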
BigKnn is an R package built on the Java-based Lucene search engine. The data for the KNN (the Lucene index) is stored in a folder on the local file system.
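The example above writes the index to a fixed Windows path (`s:/temp/lucene`). For a portable, throwaway setup, the index folder can instead live under the R session's temporary directory and be deleted afterwards. This is a sketch using only base R; the folder name `lucene` is arbitrary, and whether `buildKnn()` creates the folder itself is not stated here, so the sketch creates it up front.

```r
# Sketch: keep the Lucene index in a throwaway folder under the session's temp directory
indexFolder <- file.path(tempdir(), "lucene")
dir.create(indexFolder, recursive = TRUE, showWarnings = FALSE)

# ... call buildKnn() and predictKnn() with indexFolder = indexFolder here ...

# Delete the index (and everything in it) once predictions are done
unlink(indexFolder, recursive = TRUE)
```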
Running the package requires R with the rJava package installed, as well as Java 1.8 or higher.
See the instructions here for configuring your R environment, including Java.
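To confirm that rJava can start a JVM and that it is recent enough, the Java version can be queried from R. `.jinit()` and `.jcall()` are rJava functions; `javaMajorVersion()` is a hypothetical helper, and its parsing assumes the usual version-string formats (`1.8.0_292` for Java 8, `11.0.2` or `17` for later releases).

```r
# Sketch: check which Java version rJava is using (BigKnn needs Java 1.8 or higher).
# javaMajorVersion() is a hypothetical helper, not part of rJava or BigKnn.
javaMajorVersion <- function(versionString) {
  # Drop suffixes such as "_292" or "-ea", then split "1.8.0" / "11.0.2" / "17"
  parts <- as.integer(strsplit(sub("[^0-9.].*$", "", versionString), ".",
                               fixed = TRUE)[[1]])
  # Old scheme "1.x" means Java x; newer versions start with the major number
  if (parts[1] == 1) parts[2] else parts[1]
}

if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()  # start the JVM used by rJava
  javaVersion <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  message("Java ", javaVersion,
          if (javaMajorVersion(javaVersion) >= 8) " is OK" else " is too old")
} else {
  message("rJava is not installed")
}
```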
Use the following commands in R to install the BigKnn package:
```r
install.packages("remotes")
remotes::install_github("ohdsi/BigKnn")
```
Documentation can be found on the package website.
PDF versions of the documentation are also available:
* Package manual: BigKnn manual
Read here how you can contribute to this package.