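The package offers the following main functions:

- `stringdist` computes pairwise distances between two input character vectors (the shorter one is recycled).
- `stringdistmatrix` computes the distance matrix for one or two vectors.
- `stringsim` computes a string similarity between 0 and 1, based on `stringdist`.
- `amatch` is a fuzzy matching equivalent of R’s native `match` function.
- `ain` is a fuzzy matching equivalent of R’s native `%in%` operator.
- `seq_dist`, `seq_distmatrix`, `seq_amatch` and `seq_ain` compute distances between, and perform matching of, integer sequences.

These functions are built upon C code that re-implements some common (weighted) string distance functions. Distance functions include (with the corresponding value of the `method` argument):

- Hamming distance (`"hamming"`)
- Levenshtein distance (`"lv"`)
- Restricted Damerau-Levenshtein distance, also known as optimal string alignment (`"osa"`, the default)
- Full Damerau-Levenshtein distance (`"dl"`)
- Longest common substring distance (`"lcs"`)
- q-gram distance (`"qgram"`)
- Cosine distance between q-gram profiles (`"cosine"`)
- Jaccard distance between q-gram profiles (`"jaccard"`)
- Jaro and Jaro-Winkler distance (`"jw"`)
- Soundex-based distance (`"soundex"`)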
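A minimal sketch of the main functions in use, assuming `stringdist` is installed from CRAN. The input strings are arbitrary illustrations, and `maxDist = 2` is chosen only so that the fuzzy matches succeed; `amatch` and `ain` default to a much stricter `maxDist = 0.1`. The first call also shows how the `method` argument switches between distance functions.

```r
# Assumes the package is installed, e.g. via install.packages("stringdist")
library(stringdist)

# Pairwise distance; the default method is "osa" (optimal string alignment)
d_osa <- stringdist("ca", "abc")
# Same pair with full Damerau-Levenshtein
d_dl <- stringdist("ca", "abc", method = "dl")
print(c(osa = d_osa, dl = d_dl))

# The shorter argument ("leia") is recycled over the longer one
d_vec <- stringdist("leia", c("uhura", "leela"))
print(d_vec)

# Distance matrix: rows follow the first vector, columns the second
m <- stringdistmatrix(c("foo", "bar"), c("fo", "baz", "boo"))
print(m)

# Similarity scaled to [0, 1], where 1 means identical strings
s <- stringsim("leia", "leela")
print(s)

# Fuzzy equivalents of match() and %in%; maxDist = 2 is an illustrative choice
idx <- amatch("leia", c("uhura", "leela"), maxDist = 2)
hit <- ain("leia", c("uhura", "leela"), maxDist = 2)
print(idx)
print(hit)

# The same ideas for integer sequences, passed as lists of integer vectors
sd <- seq_dist(list(c(1L, 2L, 3L)), list(c(1L, 2L, 4L)))
sm <- seq_amatch(list(c(1L, 2L, 4L)), list(c(5L, 6L), c(1L, 2L, 3L)), maxDist = 1)
print(c(seq_dist = sd, seq_amatch = sm))
```

The `"ca"` versus `"abc"` pair shows why the method matters: optimal string alignment may not edit a substring more than once, so it needs three edits, whereas full Damerau-Levenshtein can transpose and then insert, needing only two.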
There are also some utility functions:

- `qgrams()` tabulates the q-grams in one or more character vectors.
- `seq_qgrams()` tabulates the q-grams (sometimes called n-grams) in one or more integer vectors.
- `phonetic()` computes phonetic codes of strings (currently only soundex).
- `printable_ascii()` detects non-printable ASCII or non-ASCII characters.

Some of `stringdist`’s underlying C functions can be called directly from C code in other packages. The API is described on the help page `?stringdist_api` (type it in the R console) and in a vignette, which can be opened directly as follows:
vignette("stringdist_C-Cpp_api", package="stringdist")
Examples of packages that link to `stringdist` can be found here and here.
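The utility functions listed earlier can be tried in the same way. This is a minimal sketch that again assumes `stringdist` is installed; the strings and names are illustrative only.

```r
library(stringdist)

# q-gram profile of a string: one column per distinct q-gram, cells hold counts
qg <- qgrams("hello", q = 2)
print(qg)

# Soundex codes: names that sound alike map to the same code
codes <- phonetic(c("Robert", "Rupert"))
print(codes)

# TRUE where a string contains only printable ASCII characters
ok <- printable_ascii(c("abc", "caf\u00e9"))
print(ok)
```

Robert and Rupert share the soundex code R163, which is what makes `phonetic()` (and the `"soundex"` distance method) useful for matching names that are spelled differently but pronounced alike.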