Last call for the course on Advanced R programming

Last call for the course on Advanced R programming, scheduled in Leuven, Belgium on February 20-21, 2018. Register at: During that course you'll learn:
- The apply family of functions, basic parallel programming for these functions and commonly needed data manipulation skills
- Making a basic reproducible report using Sweave and knitr, including tables, graphs and literate programming
- How to create an R package
- How S3 programming works: generics, environments and namespaces
- Basic tips on how to organise, develop and test R code
Need other training? Visit
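As a small taste of the first topic, here is a minimal sketch of the apply family and its parallel counterpart, using only base R and the bundled 'parallel' package (the data is a toy example, not course material):

```r
library(parallel)

# sapply: apply a function over a vector, simplifying the result
squares <- sapply(1:5, function(x) x^2)

# apply: apply a function over the rows (MARGIN = 1) of a matrix
row_sums <- apply(matrix(1:6, nrow = 2), 1, sum)

# parLapply: the parallel counterpart of lapply, works on Windows and Unix
cl <- makeCluster(2)
par_squares <- parLapply(cl, 1:5, function(x) x^2)
stopCluster(cl)
```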
Original Post: Last call for the course on Advanced R programming

Log shiny app visitors and R usage to Google Analytics

If you build applications for clients or have open-sourced some Shiny apps, a question that arises is: how is your application being used? One way to find out is to add logging to your code and inspect the logs afterwards. An easier way to track usage of your application, however, is to send page views or application events to Google Analytics. That's exactly what the GAlogger R package does. It allows you to log R events and R usage to Google Analytics and was created with the following use cases in mind:
- Track usage of your application: if someone visits a page in your web application (e.g. Shiny) or web service (e.g. RApache, Plumber), use the GAlogger R package to send the page and title of the…
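To give an idea of what happens under the hood, here is a minimal sketch of a pageview hit for Google Analytics' documented Measurement Protocol; the tracking id and page values are placeholders, and the actual HTTP call is shown commented out (GAlogger wraps this kind of call for you):

```r
tid <- "UA-XXXXXXX-1"   # hypothetical tracking id, replace with your own

# The fields of one Measurement Protocol "pageview" hit
payload <- list(
  v   = 1,            # protocol version
  tid = tid,          # the Google Analytics property to send to
  cid = "555",        # anonymous client id
  t   = "pageview",   # hit type
  dp  = "/home",      # document path
  dt  = "Home page"   # document title
)

# Sending it would be a form-encoded POST, e.g. with the httr package:
# httr::POST("https://www.google-analytics.com/collect",
#            body = payload, encode = "form")
```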
Original Post: Log shiny app visitors and R usage to Google Analytics

Natural Language Processing for non-English languages with udpipe

BNOSAC is happy to announce the release of the udpipe R package, a Natural Language Processing toolkit that provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization', 'morphological feature tagging' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at
Language models: the package provides direct access to language models trained on more than 50 languages. The following languages are directly available: afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian,…
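A minimal sketch of fetching and using one of those pre-trained models (here Dutch, as an example of a non-English language); it assumes the udpipe package is installed and that internet access is available to download the model:

```r
library(udpipe)

# Download and load the pre-trained Dutch model
dl <- udpipe_download_model(language = "dutch")
ud_model <- udpipe_load_model(file = dl$file_model)

# Annotate raw text and turn the result into a data.frame
x <- udpipe_annotate(ud_model, x = "BNOSAC is gevestigd in Leuven.")
x <- as.data.frame(x)
head(x[, c("token", "lemma", "upos")])
```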
Original Post: Natural Language Processing for non-English languages with udpipe

An overview of open data from Belgium

BNOSAC is working on building an application on top of open data from questions and answers given in the parliament in Belgium. It will basically show what our representatives in parliament are busy with. If you are interested in co-developing, feel free to get in touch for a quick chat. For those of you interested in an overview of open data available in Belgium, we've made a presentation showing what open data is available in Belgium for direct use (see below). If you are interested in how open data can be used for your business, get in touch. (Presentation: images/bnosac/blog/open_data_be.pdf)
Original Post: An overview of open data from Belgium

CRAN search based on natural language processing

As of October 2017, CRAN contains more than 11500 R packages. If you wanted to scroll through all of these, you would need to spend a few days, assuming 5 seconds per package and 8 hours in a day. Since R version 3.4, we can also get a dataset with all packages, their dependencies, the package title, the description and even the installation errors of the packages, which makes the CRAN database with all packages an excellent dataset for text mining. If you want to get that dataset, just do as follows in R:

library(tools)
crandb <- CRAN_package_db()

Based on that data the following CRAN NLP searcher app was built as shown below. It's available for inspection at and is a tiny wrapper around the result of annotating the package title and package…
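As a minimal sketch of what searching that metadata can look like, the snippet below filters package titles with a regular expression. It uses a small toy data.frame with the same Package/Title columns that tools::CRAN_package_db() returns (the titles shown are illustrative, not the real CRAN entries), so it runs without internet access:

```r
# crandb <- tools::CRAN_package_db()   # the real dataset; needs internet access
crandb <- data.frame(
  Package = c("udpipe", "ggplot2", "text2vec"),
  Title   = c("Tokenization and Dependency Parsing",
              "Create Elegant Data Visualisations",
              "Modern Text Mining Framework"),
  stringsAsFactors = FALSE
)

# Keep packages whose title mentions text or tokens
hits <- crandb[grepl("text|token", tolower(crandb$Title)), "Package"]
hits
```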
Original Post: CRAN search based on natural language processing

Text Mining with R – upcoming courses in Belgium

We use text mining a lot in our day-to-day data mining operations. In order to share our knowledge, to show that R is an extremely mature platform for business-oriented text analytics, and to give you practical experience with text mining, our course on Text Mining with R is scheduled for the 3rd consecutive year at LStat, the Leuven Statistics Research Center (Belgium), as well as at the Data Science Academy in Brussels. Courses are scheduled twice in November 2017 and again in March 2018. This hands-on course covers the use of text mining tools for the purpose of data analysis: basic text handling, natural language engineering and statistical modelling on top of textual data. The following items are covered:
- Text encodings
- Cleaning of text data, regular expressions
- String distances
- Graphical displays of…
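As a small flavour of two of those topics, here is a minimal sketch of cleaning text with regular expressions and computing a string distance, using only base R (the strings are toy examples):

```r
# Cleaning with regular expressions: strip punctuation, squeeze whitespace
txt <- c("  Text Mining!!", "text  mining")
cleaned <- tolower(gsub("[^a-z ]", "", txt, ignore.case = TRUE))
cleaned <- trimws(gsub(" +", " ", cleaned))

# String distance: adist() computes the generalised Levenshtein distance
adist("text mining", "test mining")  # a single substitution apart
```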
Original Post: Text Mining with R – upcoming courses in Belgium

Is udpipe your new NLP processor for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing

If you work on natural language processing in a day-to-day setting which involves statistical engineering, at a certain point you need to process your text with a number of text mining procedures. The following are steps you must take before you can get useful information out of your text:
- Tokenisation: splitting your full text into words/terms
- Parts of Speech (POS) tagging: assigning each word a syntactical tag, i.e. is the word a verb/noun/adverb/number/…
- Lemmatisation: replacing a term by its lemma, e.g. "are" is replaced by the base form of the verb, "be"; more information:
- Dependency Parsing: finding relationships between words, namely between "head" words and the words which modify those heads, allowing you to look at words which may be far away from each other in the raw text but influence each other
If you do this in R, there aren't many…
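The four steps above can be sketched as a single udpipe call; the snippet assumes the udpipe package is installed and internet access to download the English model, and shows the token (tokenisation), lemma (lemmatisation), upos (POS tag) and head_token_id/dep_rel (dependency parse) columns of the result:

```r
library(udpipe)

# Download and load a pre-trained English model
dl <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(file = dl$file_model)

# One call performs tokenisation, POS tagging, lemmatisation and parsing
x <- as.data.frame(udpipe_annotate(ud_model,
       x = "The economy is weak but the outlook is bright."))
x[, c("token_id", "token", "lemma", "upos", "head_token_id", "dep_rel")]
```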
Original Post: Is udpipe your new NLP processor for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing

Machine Learning with R – upcoming course in Belgium

For R users interested in Machine Learning, you can attend our upcoming course on Machine Learning with R, scheduled on 16-17 October 2017 in Leuven, Belgium. This is the 4th year this course is given at the University of Leuven, so we have made quite a few updates since it was first given 4 years ago. During the course you'll learn the following techniques from a methodological as well as a practical perspective: naive Bayes, trees, feed-forward neural networks, penalised regression, bagging, random forests, boosting and, if time permits, graphical lasso, penalised generalised additive models and support vector machines. Subscribe here: For a full list of training courses provided by BNOSAC (either in-house or in-public) go to For R users interested in text mining with R, applied spatial modelling with R, advanced R programming or…
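As a small taste of one technique from the list, here is a minimal sketch of fitting a classification tree with the 'rpart' package (shipped with R as a recommended package) on the built-in iris data; the other techniques in the course follow a similar fit/predict flow with their respective packages:

```r
library(rpart)

# Fit a classification tree predicting the iris species
fit <- rpart(Species ~ ., data = iris)

# Predict on the training data and inspect the confusion table
pred <- predict(fit, newdata = iris, type = "class")
table(predicted = pred, actual = iris$Species)
```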
Original Post: Machine Learning with R – upcoming course in Belgium

Computer Vision Algorithms for R users

Just before the summer holidays, BNOSAC presented a talk called Computer Vision and Image Recognition algorithms for R users at the useR! conference. In the talk, 6 packages on Computer Vision with R were introduced in front of an audience of about 250 people. The R packages we covered, all developed by BNOSAC, are:
- image.CornerDetectionF9: FAST-9 corner detection
- image.CannyEdges: Canny edge detector
- image.LineSegmentDetector: Line Segment Detector (LSD)
- image.ContourDetector: unsupervised smooth contour line detection
- image.dlib: speeded-up robust features (SURF) and histogram of oriented gradients (FHOG) features
- image.darknet: image classification using darknet with the deep learning models AlexNet, Darknet, VGG-16, GoogleNet and Darknet19, as well as object detection using the state-of-the-art YOLO detection system
For those of you who missed this, you can still watch the video of the presentation and view the pdf of the presentation below. The packages…
Original Post: Computer Vision Algorithms for R users