fastTextR is an R interface to the fastText library. It can be used for word representation learning (Bojanowski et al., 2016) and supervised text classification (Joulin et al., 2016). A particular advantage of fastText over other software is that it was designed for large data sets.

The following example is based on the examples provided with the fastText library and shows how to use fastTextR for word representations. More information about word representations can be found on the fastText homepage.


Load pretrained model

Training these models can be quite time-consuming, so pre-trained models are a good option.

library(fastTextR)
model <- ft_load("cc.en.300.bin")

Printing word vectors

ft_word_vectors(model, c("asparagus", "pidgey", "yellow"))[,1:5]
##                   [,1]          [,2]         [,3]       [,4]         [,5]
## asparagus 0.0292057190 -0.0114405714 -0.003201437 0.03087331  0.127229080
## pidgey    0.0452978685  0.0090015158  0.067562237 0.11123407 -0.008441916
## yellow    0.0007776691 -0.0001886144  0.001824494 0.03869999  0.036413591

Nearest neighbor queries

ft_nearest_neighbors(model, 'asparagus', k = 5L)
##   aspargus broccolini artichokes asparagus.  asparagas 
##  0.7316202  0.6995656  0.6930545  0.6915916  0.6911229
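
To make the scores above easier to read, here is a minimal base R sketch of a nearest neighbor lookup. It assumes the scores are cosine similarities, as in upstream fastText. The vectors are made-up toy values and the helpers `cosine_to_all` and `nearest` are hypothetical names for illustration, not part of fastTextR.

```r
# Toy 3-dimensional "word vectors" (made-up values, not taken from cc.en.300.bin).
vecs <- rbind(
  asparagus = c(0.9, 0.1, 0.0),
  broccoli  = c(0.8, 0.2, 0.1),
  yellow    = c(0.0, 0.1, 0.9)
)

# Cosine similarity between one query vector and every row of a matrix.
cosine_to_all <- function(query, m) {
  as.vector(m %*% query) / (sqrt(rowSums(m^2)) * sqrt(sum(query^2)))
}

# Rank all words except the query itself and keep the k most similar ones.
nearest <- function(word, m, k = 2L) {
  sims <- setNames(cosine_to_all(m[word, ], m), rownames(m))
  sims <- sims[names(sims) != word]
  head(sort(sims, decreasing = TRUE), k)
}

nn <- nearest("asparagus", vecs)
print(nn)
```

In this toy setting "broccoli" ranks above "yellow" because its vector points in nearly the same direction as the one for "asparagus".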

Word analogies

ft_analogies(model, c("berlin", "germany", "france"))
##        paris      france.      avignon  montpellier       paris. 
##    0.6831182    0.6408537    0.6288283    0.6138449    0.6059716 
##       rennes       london       Paris.       toulon montparnasse 
##    0.5884554    0.5832924    0.5743204    0.5727922    0.5715630
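
The analogy query can be read as vector arithmetic: "berlin" minus "germany" plus "france" should land near "paris". The base R sketch below illustrates that idea on made-up toy vectors. It assumes the A - B + C convention of upstream fastText and is not the implementation used by fastTextR; `toy_analogy` is a hypothetical helper.

```r
# Toy vectors: the first three dimensions encode the country,
# the fourth encodes "is a capital" (made-up values for illustration).
vecs <- rbind(
  berlin  = c(1, 0, 0, 1),
  germany = c(1, 0, 0, 0),
  paris   = c(0, 1, 0, 1),
  france  = c(0, 1, 0, 0),
  london  = c(0, 0, 1, 1)
)

# Compute a - b + c and rank the remaining words by cosine similarity.
toy_analogy <- function(a, b, c, m, k = 2L) {
  query <- m[a, ] - m[b, ] + m[c, ]
  sims <- as.vector(m %*% query) / (sqrt(rowSums(m^2)) * sqrt(sum(query^2)))
  names(sims) <- rownames(m)
  # Drop the three input words so they cannot be returned as answers.
  sims <- sims[!(names(sims) %in% c(a, b, c))]
  head(sort(sims, decreasing = TRUE), k)
}

res <- toy_analogy("berlin", "germany", "france", vecs)
print(res)
```

With real embeddings the match is never exact, which is why the output above lists several candidates with scores well below 1.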


[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information, arXiv preprint arXiv:1607.04606, 2016

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, arXiv preprint arXiv:1607.01759, 2016