This service provides the Natural Language Processing (NLP) functions used in our system.
GET /docsim: Calculate the similarity between two pieces of text.
Request parameters:
- a: first piece of text
- b: second piece of text
- model: language model in use; currently only 'news' is available.
Response (JSON):
A scalar floating point value representing the similarity, ranging from 0.0 (completely unrelated) to 1.0 (identical).
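
A minimal client sketch for GET /docsim, using only the Python standard library. The base URL, the function names, and the assumption that the response body is a bare JSON number are illustrative and not confirmed by this document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_docsim_url(base_url: str, a: str, b: str, model: str = "news") -> str:
    """Build the GET /docsim URL with URL-encoded query parameters."""
    query = urlencode({"a": a, "b": b, "model": model})
    return f"{base_url.rstrip('/')}/docsim?{query}"


def parse_docsim_response(body: str) -> float:
    """Parse the response body, assumed here to be a bare JSON number."""
    score = json.loads(body)
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError(f"expected a JSON number, got {score!r}")
    return float(score)


def docsim(base_url: str, a: str, b: str, model: str = "news",
           timeout: float = 10.0) -> float:
    """Call the service and return the similarity score."""
    with urlopen(build_docsim_url(base_url, a, b, model), timeout=timeout) as resp:
        return parse_docsim_response(resp.read().decode("utf-8"))
```

With a hypothetical deployment at http://localhost:8000, a call would look like `docsim("http://localhost:8000", "stocks rally", "markets rise")`.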
--
POST /docsim_1ton: Compute the similarity between one piece of text and many others.
The request body should be JSON with the following fields:
- one (string): the pivot
- many (list of strings): the candidates to be compared
- model (string): the same as for the GET /docsim API
Response (JSON):
A list with the same length as the input array many. Each element is a floating point value representing the similarity between the pivot and
the corresponding text from many.
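
A sketch of how a client might build the POST /docsim_1ton request and pair the returned scores with their candidates. The base URL and both helper names are hypothetical. Sorting by score is one possible use of the result, not something the service does.

```python
import json
from typing import List, Tuple
from urllib.request import Request


def build_docsim_1ton_request(base_url: str, one: str, many: List[str],
                              model: str = "news") -> Request:
    """Build a POST request carrying the JSON body described above."""
    payload = json.dumps({"one": one, "many": many, "model": model})
    return Request(
        f"{base_url.rstrip('/')}/docsim_1ton",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rank_candidates(many: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Pair each candidate with its score, most similar first."""
    if len(many) != len(scores):
        raise ValueError("response length does not match the number of candidates")
    return sorted(zip(many, scores), key=lambda pair: pair[1], reverse=True)
```

The request can be sent with `urllib.request.urlopen(request)`, and the decoded JSON list passed to `rank_candidates` together with the original `many` list.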
--
POST /ner: Extract an entity name (usually a company) from a sentence (usually a news title).
Request parameters:
- q: the sentence
- threshold (float [0,1]): defaults to 0.5; the minimum probability required for a single character to be considered part of the entity name. A higher value corresponds to a lower false positive rate, while a lower value corresponds to a lower false negative rate.
- return_raw (true or false, optional): whether to return a list with the probability of each character being part of the entity name
Response (JSON):
- entity (a string or null): the extracted entity
- threshold: same as the value passed in
- positive_confidence (float [0,1], or null): how confident the algorithm is that the entity found is correct; null if no entity is found
- overall_confidence (float [0,1]): how confident the algorithm is that all characters have been classified correctly
- raw (list of pairs): only present when requested with return_raw; each pair contains a character and the probability of that character being part of the entity name
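
A sketch of client-side handling for a POST /ner response, covering the nullable fields and the optional raw list. The class and function names are hypothetical. The `chars_at_or_above` helper only illustrates how the threshold relates to the per-character probabilities; whether the service compares with `>=` or `>`, and how it assembles the final entity, is not stated here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NerResult:
    entity: Optional[str]                  # None when no entity was found
    threshold: float
    positive_confidence: Optional[float]   # None when no entity was found
    overall_confidence: float
    raw: Optional[List[Tuple[str, float]]] = None  # only set with return_raw


def parse_ner_response(body: str) -> NerResult:
    """Decode the JSON response into a typed result."""
    data = json.loads(body)
    raw = data.get("raw")
    return NerResult(
        entity=data["entity"],
        threshold=data["threshold"],
        positive_confidence=data["positive_confidence"],
        overall_confidence=data["overall_confidence"],
        raw=[(ch, p) for ch, p in raw] if raw is not None else None,
    )


def chars_at_or_above(raw: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Characters whose probability meets the threshold (assumes >=)."""
    return [ch for ch, p in raw if p >= threshold]
```

Raising the threshold passed to `chars_at_or_above` shrinks the set of characters that qualify, which mirrors the trade-off described for the threshold parameter.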