Package: textTinyR 1.1.8

textTinyR: Text Processing for Small or Big Data Files

It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Authors: Lampros Mouselimis [aut, cre]

# Install 'textTinyR' in R:
install.packages('textTinyR', repos = c('https://mlampros.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/mlampros/texttinyr/issues

Uses libs:
  • openblas: Optimized BLAS
  • c++: GNU Standard C++ Library v3
  • openmp: GCC OpenMP (GOMP) support library

Topics: bh, boost, cpp11, processing, rcpp, rcpparmadillo, text, openblas, cpp, openmp

7.40 score 39 stars 1 packages 214 scripts 510 downloads 30 exports 7 dependencies

Last updated from: 11c7a8e669. Checks: 13 OK. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | OK | 208
linux-devel-x86_64 | OK | 252
source / vignettes | OK | 319
linux-release-arm64 | OK | 210
linux-release-x86_64 | OK | 226
macos-release-arm64 | OK | 152
macos-release-x86_64 | OK | 310
macos-oldrel-arm64 | OK | 122
macos-oldrel-x86_64 | OK | 396
windows-devel | OK | 231
windows-release | OK | 211
windows-oldrel | OK | 223
wasm-release | OK | 180

Exports: batch_compute, big_tokenize_transform, bytes_converter, cluster_frequency, COS_TEXT, cosine_distance, Count_Rows, dense_2sparse, dice_distance, dims_of_word_vecs, Doc2Vec, JACCARD_DICE, levenshtein_distance, load_sparse_binary, matrix_sparsity, read_characters, read_rows, save_sparse_binary, select_predictors, sparse_Means, sparse_Sums, sparse_term_matrix, TEXT_DOC_DISSIM, text_file_parser, text_intersect, token_stats, tokenize_transform_text, tokenize_transform_vec_docs, utf_locale, vocabulary_parser

Dependencies: BH, data.table, lattice, Matrix, R6, Rcpp, RcppArmadillo

Functionality of the textTinyR package
classes | functions | big_tokenize_transform class | word cloud | word vectors | sparse_term_matrix class | token_stats class | helper functions for sparse_matrices | tokenization | utility functions

Last update: 2021-10-29
Started: 2017-01-04

Word vectors - doc2vec - text clustering
textTinyR - fastTextR - doc2vec - kmeans - cluster_medoids

Last update: 2021-10-29
Started: 2018-04-03

Readme and manuals

Help Manual

Help page | Topics
Compute batches | batch_compute
String tokenization and transformation for big data sets | big_tokenize_transform
Bytes converter of a text file (KB, MB or GB) | bytes_converter
Frequencies of an existing cluster object | cluster_frequency
Cosine similarity for text documents | COS_TEXT
Cosine distance of two character strings (each string consists of more than one word) | cosine_distance
Number of rows of a file | Count_Rows
Convert a dense matrix to a sparse matrix | dense_2sparse
Dice similarity of words using n-grams | dice_distance
Dimensions of a word vectors file | dims_of_word_vecs
Conversion of text documents to word-vector-representation features (Doc2Vec) | Doc2Vec
Jaccard or Dice similarity for text documents | JACCARD_DICE
Levenshtein distance of two words | levenshtein_distance
Load a sparse matrix in binary format | load_sparse_binary
Sparsity percentage of a sparse matrix | matrix_sparsity
Read a specific number of characters from a text file | read_characters
Read a specific number of rows from a text file | read_rows
Save a sparse matrix in binary format | save_sparse_binary
Exclude highly correlated predictors | select_predictors
rowMeans and colMeans for a sparse matrix | sparse_Means
rowSums and colSums for a sparse matrix | sparse_Sums
Term matrices and statistics (document-term-matrix, term-document-matrix) | sparse_term_matrix
Dissimilarity calculation of text documents | TEXT_DOC_DISSIM
Text file parser | text_file_parser
Intersection of words or letters in tokenized text | text_intersect
Token statistics | token_stats
String tokenization and transformation (character string or path to a file) | tokenize_transform_text
String tokenization and transformation (vector of documents) | tokenize_transform_vec_docs
UTF locale for the available languages | utf_locale
Returns the vocabulary counts for small or medium (XML and other) files | vocabulary_parser
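
As a rough illustration of a few of the help topics listed above, the following minimal sketch calls two of the string-dissimilarity helpers and two of the sparse-matrix helpers. The input strings and the small matrix are made up for the example, and the argument names used (n_grams, rowSums) are taken from the package manual as best understood; check manual.pdf for the authoritative signatures.

```r
# Minimal sketch; assumes textTinyR is installed (see the install command above).
library(textTinyR)

# Made-up example words (not from the package documentation).
w1 <- "kitten"
w2 <- "sitting"

# Edit distance between two words.
lev <- levenshtein_distance(w1, w2)

# Dice dissimilarity based on character bigrams.
dice <- dice_distance(w1, w2, n_grams = 2)

# Made-up 3 x 4 dense matrix that is mostly zeros.
dense <- matrix(c(1, 0, 0,
                  0, 2, 0,
                  0, 0, 0,
                  3, 0, 4), nrow = 3)

# Convert to a sparse matrix, then take column sums
# (rowSums = FALSE requests column sums rather than row sums).
sp <- dense_2sparse(dense)
col_sums <- sparse_Sums(sp, rowSums = FALSE)

print(lev)
print(dice)
print(col_sums)
```

The column sums can be compared against base R's colSums(dense) as a quick sanity check on the conversion.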