users query an entity pair together with a set of context entities, they want to know the exact relations that hold for that entity pair under those context entities, so results should be selected and ranked with a focus on those “constraints” or “subgroups”. Note that although we strongly recommend using context entities or entity modifiers, our system allows any indexed word to be used as a query keyword.
Type-pair-specific priors + Pattern driven ranking

Another observation is that relation phrases and entity instances usually combine into regular patterns, and that the distribution of relation phrases and patterns varies with the entity type pair (see the experimental results in Section 6.1). Moreover, for a specific entity type pair, a relatively small, finite set of patterns occurs much more frequently than the rest, because pattern frequencies follow a long-tail distribution. These frequent patterns also tend to cover higher-quality context vectors. For example, both in practice and intuitively, an (E VP E) sub-pattern inside a long pattern is usually better than (E E VP) or (VP E E). These insights drive us to mine pattern weights across different entity type pairs.
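As a rough illustration of what mining pattern weights per entity type pair could look like, the sketch below uses smoothed relative frequency as the weight. The text does not specify the weighting formula, so this choice, the function name mine_pattern_weights, the smoothing parameter, and the sample type pairs are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def mine_pattern_weights(observations, smoothing=1.0):
    """Estimate a weight for each pattern within each entity type pair.

    observations: iterable of (type_pair, pattern) tuples, for example
                  (("PERSON", "ORG"), "E VP E").
    smoothing:    additive constant applied to every observed pattern
                  (an assumption; it softens the long tail).
    Returns {type_pair: {pattern: weight}}, where the weights of one
    type pair sum to 1.
    """
    counts = defaultdict(Counter)
    for type_pair, pattern in observations:
        counts[type_pair][pattern] += 1

    weights = {}
    for type_pair, counter in counts.items():
        total = sum(counter.values()) + smoothing * len(counter)
        weights[type_pair] = {
            pattern: (count + smoothing) / total
            for pattern, count in counter.items()
        }
    return weights


# Hypothetical observations: (entity type pair, extracted pattern).
sample_observations = [
    (("PERSON", "ORG"), "E VP E"),
    (("PERSON", "ORG"), "E VP E"),
    (("PERSON", "ORG"), "E VP E"),
    (("PERSON", "ORG"), "E E VP"),
    (("ORG", "LOCATION"), "VP E E"),
]
sample_weights = mine_pattern_weights(sample_observations)
```

Under this sketch, a pattern that dominates one type pair receives a high weight only for that pair, which matches the idea of type-pair-specific priors. A ranking function could then combine these weights with other signals.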
Extraction model
In this section, we elaborate on the algorithms we designed for the first step of our system: the extraction task. The section is organized in the order in which the subtasks are implemented.
Verb phrase extraction (zzq)
Relation phrase polarity predict (tx)
To better understand the semantic meaning behind relation phrases and to enrich our clustering knowledge, we classify the sentiment polarity of each relation phrase. Regarding sentiment polarity, we mean the each relation phrase with an entity typ...

... middle of paper ...

...ster to represent such group.
A more sophisticated solution is to cluster relation phrases by their contexts instead of their own content. In this approach, we suggest clustering the experiment context vectors first to obtain word group information. Then, for each relation phrase, we remove the phrase from its context vector and chunk the remainder into runs of consecutive words. For each such substring, we use the group information of its words as the vector representation. For efficiency on a large-scale corpus, we recommend applying MinHash [11] and LSH to those sets of vectors to estimate the Jaccard similarity among relation phrases. In theory, the first approach groups relation phrases that share common or similar terms, while the second approach treats relation phrases as similar when they occur in similar contexts.
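The MinHash and LSH step can be sketched as follows. This is a minimal illustration, assuming each relation phrase is represented by a non-empty set of word-group identifiers taken from its contexts. The helper names, the group identifiers such as "g3", the number of hash functions, and the banding scheme are assumptions for the example and are not taken from the system described here.

```python
import hashlib
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # large prime modulus for the hash family


def stable_hash(token):
    """Map a string to an integer that is stable across runs."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % PRIME


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return the MinHash signature of a non-empty set of string tokens."""
    hashed = [stable_hash(t) for t in tokens]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands):
    """Return pairs of keys whose signatures agree on at least one band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(name)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


# Hypothetical word-group sets for three relation phrases.
phrase_contexts = {
    "acquired": {"g3", "g7", "g12", "g20"},
    "bought": {"g3", "g7", "g12", "g21"},
    "was born in": {"g40", "g41", "g42"},
}
params = make_hash_params(num_hashes=128)
signatures = {p: minhash_signature(c, params) for p, c in phrase_contexts.items()}
candidates = lsh_candidate_pairs(signatures, bands=32)
```

Only the candidate pairs returned by the banding step need an explicit similarity estimate, which avoids comparing every pair of relation phrases. More bands with fewer rows each make the filter more permissive, and fewer bands make it stricter.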

