GPU Accelerated Natural Language Processing

Thursday 17th

from 17:15 to 17:55

Theatre 19

Keynote

Many tasks in Natural Language Processing benefit from the massive parallelism that GPUs bring to the table. Once the text is hashed, which happens while the documents are being read, GPUs can achieve very high performance on most algorithms in the NLP field.
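As an illustration of the hashing step, here is a minimal CPU-side Python sketch. The tokenizer (lowercase, whitespace split), the CRC32 hash, the bucket count and the function names are all assumptions made for this example; the abstract does not say which hashing scheme is actually used. The point is only that each document becomes a list of integers, which is a convenient form to hand to a GPU.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 20  # assumed table size, not taken from the talk


def hash_tokens(text, num_buckets=NUM_BUCKETS):
    """Map each lowercase, whitespace-separated token to an integer bucket id."""
    return [zlib.crc32(tok.encode("utf-8")) % num_buckets
            for tok in text.lower().split()]


def hashed_bag_of_words(text, num_buckets=NUM_BUCKETS):
    """Count how often each bucket id occurs in the document."""
    return Counter(hash_tokens(text, num_buckets))


doc = "GPUs accelerate NLP and GPUs accelerate search"
ids = hash_tokens(doc)
bag = hashed_bag_of_words(doc)
```

Identical words always map to the same id, so after this step the rest of the pipeline can work on fixed-width integers instead of strings.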

At Wavecrafters we use NVIDIA GPUs, from the gaming versions to the professional ones, to handle millions of documents at blazing speed.

For example, entropy or Pointwise Mutual Information (PMI) matrices for millions of documents, with vocabularies of up to a million words, can be computed and assembled as sparse matrices in milliseconds.
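To make the PMI matrix concrete, the following Python sketch builds one on the CPU for a toy corpus. It assumes document-level co-occurrence (two words co-occur if they appear in the same document) and stores only the co-occurring pairs in a dictionary, which plays the role of the sparse matrix. The function name and the toy data are hypothetical, and the GPU implementation described in the talk may define co-occurrence differently.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_sparse(docs):
    """Return {(word_a, word_b): PMI} for word pairs sharing at least one document.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with probabilities estimated
    as document frequencies divided by the number of documents.
    """
    n_docs = len(docs)
    word_df = Counter()   # documents containing each word
    pair_df = Counter()   # documents containing both words of a pair
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return {
        (a, b): math.log((n_ab * n_docs) / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


docs = [
    ["gpu", "cuda", "kernel"],
    ["gpu", "cuda"],
    ["speech", "model"],
    ["speech", "model", "gpu"],
]
pmi = pmi_sparse(docs)
```

Pairs that never share a document, such as ("cuda", "speech") here, are simply absent from the result, which is why the matrix stays sparse even with a very large vocabulary.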

In the retrieval and search phases, NVIDIA GPUs can perform advanced semantic searches (vector embeddings) over millions of documents in a matter of milliseconds, speeds that obviate the need to store indexes for searches.
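Searching without an index amounts to scoring the query against every document vector and keeping the best matches. The Python sketch below shows that brute-force scan on the CPU, assuming cosine similarity as the score; the function names and the toy vectors are hypothetical. On a GPU the same scan would typically be expressed as one large matrix-vector product.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def brute_force_search(query, doc_vectors, k=2):
    """Return the k best (cosine_similarity, doc_index) pairs, best first."""
    q = normalize(query)
    scores = (
        (sum(a * b for a, b in zip(q, normalize(d))), i)
        for i, d in enumerate(doc_vectors)
    )
    return heapq.nlargest(k, scores)


doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
hits = brute_force_search([0.8, 0.2, 0.0], doc_vectors, k=2)
```

Every document is visited on every query, so there is nothing to build or keep up to date when documents are added; the cost is paid in raw throughput instead.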

Another advantage of such speed is that, in practice, there is no limit on the size of the document corpus. Each gigabyte of GPU memory can hold roughly a million documents (depending on the vector size used), and with CUDA streams the loading of partial document matrices can be overlapped with computation. As a result, even cheap GPUs can handle smart queries on datasets of more than a hundred million documents in human-interactive times (less than a second).
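The capacity figure and the chunked processing can be sketched in a few lines of Python. The example assumes dense 32-bit float vectors, and the 256-dimension value is only an illustrative choice that happens to give about a million documents per GiB. The chunk loop runs sequentially on the CPU: it shows how partial top-k results are merged across chunks, not the actual overlap of transfers and kernels that CUDA streams provide. All names are hypothetical.

```python
import heapq


def docs_per_gib(vector_dim, bytes_per_value=4):
    """How many dense document vectors fit in 1 GiB (float32 by default)."""
    return (2 ** 30) // (vector_dim * bytes_per_value)


def chunked_top_k(query, chunks, k=3):
    """Scan the corpus chunk by chunk, keeping only the k best dot-product scores.

    Each chunk stands in for a partial document matrix that would be copied
    to the GPU while the previous chunk is still being scored.
    """
    best = []    # min-heap of (score, global_doc_id)
    offset = 0   # global id of the first row in the current chunk
    for chunk in chunks:
        for row, vec in enumerate(chunk):
            item = (sum(q * x for q, x in zip(query, vec)), offset + row)
            if len(best) < k:
                heapq.heappush(best, item)
            else:
                heapq.heappushpop(best, item)  # drop the current worst
        offset += len(chunk)
    return sorted(best, reverse=True)


capacity = docs_per_gib(256)
chunks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0]],
]
top = chunked_top_k([1.0, 0.0], chunks, k=2)
```

Because only k results survive from each chunk, the memory needed for the search state is independent of corpus size, which is what lets a small GPU work through a corpus much larger than its own memory.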

In Automatic Speech Recognition, Deep Neural Networks work very well for the acoustic part of the process. However, one of the limitations of many current implementations lies on the language model side: using n-gram models severely restricts the size of the vocabulary that we can employ. With a multipass DNN-HMM system, we can adjust the language model so that it covers a much more extensive and contextually relevant vocabulary. After the first pass, the semantic context of the speech is extracted and a new, better adapted language model is calculated on the fly from a very extensive corpus of documents. This second language model is then used to add more specialized vocabulary and rescore the HMM graph. In some areas this can yield significant improvements in the Word Error Rate (WER).

With the new Pascal architecture there has been a big jump in performance as well as in memory size. We will show benchmarks comparing the last three NVIDIA GPU architectures (Kepler, Maxwell and Pascal) on several Natural Language Processing tasks.

We will also give a small demo using the arXiv scientific papers database, to demonstrate that the NLP algorithms used work as intended and that the calculated vectors convey the semantic meaning of the papers.

Guillermo Moliní

Wavecrafters, Lead Developer