Rserve to the rescue!
Last week I participated in a two-day hackathon. It was a cool initiative that gave me the opportunity to work with people from different areas and with technology I was not familiar with. I had fun trying to figure out how to solve some tricky problems.
As time is short, there is always some pressure to present a working solution at the end, ideally with some cool extra features. We have to react fast and most of the time we end up cutting some corners, but… hey! it was a hackathon :D. We won’t be maintaining or evolving that code base.
Still, I think a retrospective could be a good exercise to analyse what could have been done differently and what could be improved in the current solution if it were a real, long-term project.
What is Rserve and what does it have to do with this?
The idea for this hackathon was to create some intelligence based on a couple of Twitter feeds. We had an orchestrator service (written in Java) that would gather data from those feeds and from some internal databases and pass it all to an AI component (a bit dumb in the beginning, I must say) that would process this data and return relevant information to our users, so that they could make decisions based on more relevant data points.
The team had a Data Engineer to help us evolve the AI component’s logic, making it a bit more intelligent. The problem, however, was that it was being developed in an R code base (the only language he was comfortable with) on a Windows machine using RStudio. It was not feasible to translate all of it to another language, so the question was how we could use this code as a service that the orchestrator could call.
We had never used R and had no idea whether this would be easy, but after a quick Google search we found Rserve: a TCP/IP server that allows other applications to use R for computing statistical models, plots, etc. Rserve itself is provided as a regular R package, and by starting its executable we basically create a server that can be called to evaluate R expressions.
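Because Rserve is just a TCP server, you can check that it is up before wiring in any client library. The sketch below uses only the JDK: it connects to a host and port (6311 is Rserve’s default port), reads the 32-byte identification string Rserve sends right after accepting a connection (for example "Rsrv0103QAP1" followed by padding), and checks the "Rsrv" marker and the "QAP1" protocol tag. The class and method names are hypothetical; this is a connectivity probe, not a replacement for the Rserve Java client.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks whether an Rserve instance answers on host:port. */
final class RserveProbe {
    /** Rserve's default TCP port. */
    static final int DEFAULT_PORT = 6311;

    private RserveProbe() {}

    /** Connects and reads the 32-byte ID string; throws IOException on refusal, timeout or short read. */
    static String readIdString(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // do not block forever waiting for the greeting
            byte[] id = new byte[32];
            new DataInputStream(socket.getInputStream()).readFully(id);
            return new String(id, StandardCharsets.US_ASCII);
        }
    }

    /** True if the peer greets us like Rserve: "Rsrv", a 4-char version, then "QAP1". */
    static boolean looksLikeRserve(String host, int port, int timeoutMillis) {
        try {
            String id = readIdString(host, port, timeoutMillis);
            return id.startsWith("Rsrv") && id.substring(8, 12).equals("QAP1");
        } catch (IOException e) {
            return false; // refused, timed out, or closed before sending 32 bytes
        }
    }
}
```

For example, `RserveProbe.looksLikeRserve("localhost", RserveProbe.DEFAULT_PORT, 2000)` could be used as a health check after running the start script.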
After setting up R and installing the Rserve package, I started by creating a script so I could easily start and stop the Rserve process at will.
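#!/bin/bash
# Start Rserve through "R CMD" so R's environment is set up,
# reading our configuration (including the R code to load) from Rserve.conf.
start() {
    R CMD /usr/local/lib/R/3.6/site-library/Rserve/libs/Rserve \
        --no-save \
        --RS-encoding utf8 \
        --RS-conf Rserve.conf
    echo "rserve started!"
}
# Stop every running Rserve process.
stop() {
    killall Rserve
    echo "rserve stopped!"
}
RETVAL=0
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|restart|stop}"
        RETVAL=1
        ;;
esac
exit $RETVAL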
Then we added the Rserve client package to our Java application’s pom.xml so we could start testing the evaluation of R expressions.
<dependency>
    <groupId>org.rosuda.REngine</groupId>
    <artifactId>Rserve</artifactId>
    <version>1.8.1</version>
</dependency>
RConnection c = new RConnection(); // connects to Rserve on localhost, default port 6311
REXP x = c.eval("R.version.string");
System.out.println(x.asString()); // R version 3.6.1 (2019-07-05)
c.close(); // release the Rserve session
Now we had to run the real code, which meant loading it into Rserve. There had to be a better option than calling eval with a huge code base as the input string, or chaining multiple evals, which would rapidly become hard to manage. In fact, there is: Rserve can load the code at startup through a source directive in its configuration file, Rserve.conf, whose path is passed with the --RS-conf <path> option.
With this setup we were able to create a clean API that could be used seamlessly from the Java side. Check this simple example with a palindrome function.
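# Rserve.conf
source /path/to/palindrome.r
# palindrome.r
# Returns TRUE if the string p reads the same forwards and backwards.
palindrome <- function(p) {
  n <- nchar(p)
  # seq_len() gives an empty loop for strings shorter than 2 characters,
  # whereas 1:0 would iterate over c(1, 0)
  for (i in seq_len(floor(n / 2))) {
    r <- n - i + 1
    if (substr(p, i, i) != substr(p, r, r)) return(FALSE)
  }
  TRUE
}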
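import static org.junit.Assert.assertEquals;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
public class RserveTest {
    private RConnection c;
    @Before
    public void setUp() throws Exception {
        c = new RConnection();
    }
    @After
    public void tearDown() {
        c.close(); // release the Rserve session after each test
    }
    @Test
    public void test_palindrome() throws RserveException, REXPMismatchException {
        c.assign("a_palindrome", "aba");
        REXP is_palindrome = c.eval("palindrome(a_palindrome)");
        // palindrome() returns an R logical, compared here as 1 (TRUE)
        assertEquals(1, is_palindrome.asInteger());
    }
    @Test
    public void test_non_palindrome() throws RserveException, REXPMismatchException {
        c.assign("a_non_palindrome", "abc");
        REXP not_palindrome = c.eval("palindrome(a_non_palindrome)");
        // ... and FALSE as 0
        assertEquals(0, not_palindrome.asInteger());
    }
}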
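Since the R and Java sides were developed in parallel, a plain-Java version of a small function like this can serve as a reference for expected values in tests. The sketch below mirrors the R loop: compare each character with its mirror from the end, stopping at the middle. The class name Palindrome is hypothetical, and like the R version it compares characters exactly, without ignoring case or spaces; it works on code points so that, as with R’s nchar() and substr(), it counts characters rather than UTF-16 units.

```java
/** Hypothetical plain-Java mirror of the R palindrome() function, e.g. as a test oracle. */
final class Palindrome {
    private Palindrome() {}

    static boolean isPalindrome(String p) {
        // Count characters (code points), not UTF-16 units.
        int[] cps = p.codePoints().toArray();
        int n = cps.length;
        for (int i = 0; i < n / 2; i++) {
            // Compare the i-th character with its mirror from the end.
            if (cps[i] != cps[n - 1 - i]) {
                return false;
            }
        }
        return true; // empty and single-character strings are palindromes
    }
}
```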
Once the test for this simple function passed, we defined the JSON contract between the AI component and the orchestrator, so we could completely decouple the development of the R code from the Java orchestration logic. We created the data models and isolated all the communication logic in a class that we called AiClient.
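import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AiClient {
    private static final Logger log = LoggerFactory.getLogger(AiClient.class);
    private final RConnection connection;
    private final ObjectMapper mapper;
    public AiClient() throws RserveException {
        connection = new RConnection();
        mapper = new ObjectMapper();
    }
    public List<MarketAnalysis> getMarketAnalysis(AiInputData data) {
        try {
            // Serialize the input to JSON and hand it to R as a character variable.
            // "in" is a reserved word in R, so the variable needs a different name.
            String inputData = mapper.writeValueAsString(data);
            connection.assign("input_json", inputData);
            REXP result = connection.eval("marketAnalysis(input_json)");
            // marketAnalysis() returns a JSON string matching List<MarketAnalysis>
            return mapper.readValue(result.asString(), new TypeReference<List<MarketAnalysis>>() {});
        } catch (REXPMismatchException | RserveException | IOException ex) {
            log.warn("Failed to get AI Analysis: {}", ex.getMessage());
        }
        return Collections.emptyList();
    }
}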
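With the JSON contract fixed, the orchestration logic does not need to know that Rserve is involved at all. One way to make that explicit, sketched here as an assumption rather than what we actually built, is to put the contract behind an interface that AiClient implements, and to test the orchestrator against a stub while the R side is still changing. MarketAnalysisSource, StubMarketAnalysisSource, Orchestrator and the record fields (tweets, topic, score) are hypothetical placeholders, since the real data models are not shown in this post.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical placeholder models: the real AiInputData and MarketAnalysis
// fields come from the team's JSON contract, which is not shown here.
record AiInputData(List<String> tweets) {}
record MarketAnalysis(String topic, double score) {}

/** The contract the orchestrator depends on; AiClient would be one implementation. */
interface MarketAnalysisSource {
    List<MarketAnalysis> getMarketAnalysis(AiInputData data);
}

/** Stub returning canned results, so orchestration logic can be tested without Rserve. */
final class StubMarketAnalysisSource implements MarketAnalysisSource {
    private final List<MarketAnalysis> canned;

    StubMarketAnalysisSource(List<MarketAnalysis> canned) {
        this.canned = List.copyOf(canned);
    }

    @Override
    public List<MarketAnalysis> getMarketAnalysis(AiInputData data) {
        return canned; // ignores the input on purpose
    }
}

/** Hypothetical orchestration step that only sees the interface, never Rserve. */
final class Orchestrator {
    private final MarketAnalysisSource ai;

    Orchestrator(MarketAnalysisSource ai) {
        this.ai = ai;
    }

    /** Topic of the highest-scoring analysis, or empty if the AI returned nothing. */
    Optional<String> topTopic(AiInputData data) {
        return ai.getMarketAnalysis(data).stream()
                .max(Comparator.comparingDouble(MarketAnalysis::score))
                .map(MarketAnalysis::topic);
    }
}
```

AiClient’s getMarketAnalysis already has the matching signature, so it would only need to declare `implements MarketAnalysisSource` to be passed to the orchestrator in production, while tests use the stub.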
There could be other solutions to this problem, but this one worked for our not-so-complex use case. Actually… while googling we bumped into plumber, an R package that can expose R code as an HTTP API, which could be an option to consider.
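For comparison, if the R code were exposed with plumber, the orchestrator would talk plain HTTP and JSON instead of the Rserve protocol, and the Rserve client dependency would go away. Below is a minimal sketch using the JDK’s java.net.http.HttpClient (Java 11+). The class name HttpAiClient, the base URL and the /marketAnalysis route are hypothetical; the actual route would depend on how the plumber API is annotated. The returned JSON string could then be mapped with the same ObjectMapper and TypeReference used in AiClient.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical HTTP variant of AiClient, assuming an endpoint that accepts and returns JSON. */
final class HttpAiClient {
    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    private final URI endpoint;

    HttpAiClient(URI baseUri) {
        this.endpoint = baseUri.resolve("/marketAnalysis"); // hypothetical route
    }

    /** POSTs the input JSON and returns the response body; non-2xx statuses become IOExceptions. */
    String marketAnalysisJson(String inputJson) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(inputJson))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("AI service returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```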
In the end, time flew and I had a really good time :D. I got to apply some concepts of clean architecture/code in a context where they are usually not considered that important, and realized that even there they can help the team work together and go faster.