Bio-Text Mining for Construction of Biomedical Information Networks

2017-12-12

Massive amounts of biomedical text are generated from research literature, web publication portals, experimental reports, and social media. Mining such massive, unstructured, dynamic, noisy, and unintegrated data and turning it into structured knowledge is critical but challenging. We propose to develop effective and scalable methods that automatically integrate and transform such biomedical text data into relatively structured biomedical information networks, and then to develop effective data mining methods that mine these text-rich networks and generate useful knowledge for KnowEnG and other BD2K center projects.

We have been developing multiple innovative and scalable methods for the construction and mining of text-rich biomedical information networks: (i) phrase mining, including the completely unsupervised method ToPMine and the lightly supervised method SegPhrase; (ii) ClusType, a multi-strategy integrated optimization framework for entity recognition and typing that combines relation phrase-based clustering with distant supervision; (iii) meta-path-based similarity search; and (iv) heterogeneous network mining.

We have conducted studies on the construction and mining of biomedical information networks based on PubMed abstracts, with some interesting results. Preliminary studies on other kinds of massive text datasets, such as the New York Times, Yelp, Twitter, and DBLP research publication datasets, have demonstrated the power and promise of the proposed approach. We expect more dedicated work on biomedical text mining in the coming months to benefit multiple NIH BD2K centers.
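To make the idea of meta-path-based similarity search more concrete, the following is a minimal sketch in Python. It assumes a PathSim-style measure over a symmetric Drug-Gene-Drug meta-path, where similarity is twice the number of path instances between two objects divided by the sum of each object's path instances to itself. The measure, the drug and gene names, and the helper functions `path_counts`, `pathsim`, and `rank_similar` are illustrative assumptions, not the project's actual implementation or data.

```python
from collections import defaultdict

# Hypothetical toy network: the genes each drug is linked to.
# These names are placeholders, not results from the project.
drug_to_genes = {
    "drug_a": ["gene_1", "gene_2"],
    "drug_b": ["gene_1", "gene_2", "gene_3"],
    "drug_c": ["gene_4"],
}


def path_counts(adjacency):
    """Count Drug-Gene-Drug path instances for every pair of drugs."""
    # Invert the adjacency so each gene lists the drugs linked to it.
    gene_to_drugs = defaultdict(list)
    for drug, genes in adjacency.items():
        for gene in genes:
            gene_to_drugs[gene].append(drug)

    # Every pair of drugs sharing a gene contributes one path instance.
    counts = defaultdict(int)
    for drugs in gene_to_drugs.values():
        for x in drugs:
            for y in drugs:
                counts[(x, y)] += 1
    return counts


def pathsim(counts, x, y):
    """PathSim-style score: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = counts[(x, x)] + counts[(y, y)]
    return 2 * counts[(x, y)] / denom if denom else 0.0


def rank_similar(adjacency, query):
    """Rank all other drugs by similarity to the query drug, best first."""
    counts = path_counts(adjacency)
    scores = [
        (other, pathsim(counts, query, other))
        for other in adjacency
        if other != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_similar(drug_to_genes, "drug_a"):
        print(name, round(score, 3))
```

The normalization by self-path counts is the design choice worth noting: it favors objects with comparable connectivity instead of simply rewarding highly connected ones. In a real setting the adjacency would come from entities and relations extracted from text such as PubMed abstracts, and sparse matrix products would likely replace the nested loops.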


1. Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han, “Scalable Topical Phrase Mining from Text Corpora”, PVLDB 8(3): 305-316, 2015. (Also in Proc. 2015 Int. Conf. on Very Large Data Bases (VLDB’15), Kohala Coast, Hawaii, Sept. 2015)

2. Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, and Jiawei Han, “ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering”, in Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’15), Sydney, Australia, Aug. 2015.

3. Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han, “Mining Quality Phrases from Massive Text Corpora”, in Proc. 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’15), Melbourne, Australia, May 2015. (won Grand Prize in Yelp Dataset Challenge, 2015)

4. Ming Ji, Qi He, Jiawei Han, and Scott Spangler, “Mining Strong Relevance between Heterogeneous Entities from Unstructured Biomedical Data”, Data Mining and Knowledge Discovery (DMKD), 2015.

5. Chi Wang, Marina Danilevsky, Jialu Liu, Nihit Desai, Heng Ji, and Jiawei Han, “Constructing Topical Hierarchies in Heterogeneous Information Networks”, in Proc. 2013 IEEE Int. Conf. on Data Mining (ICDM’13), Dallas, TX, Dec. 2013. (selected as one of the best papers for special-issue publication in the journal “Knowledge and Information Systems”; Wang’s thesis won the 2015 ACM SIGKDD Dissertation Award)

6. Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu, “Integrating Meta-Path Selection with User Guided Object Clustering in Heterogeneous Information Networks”, in Proc. 2012 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’12), Beijing, China, Aug. 2012. (KDD’12 Best Student Paper Award; invited to the special issue of ACM Transactions on KDD)

7. Yizhou Sun and Jiawei Han, Mining Heterogeneous Information Networks: Principles and Methodologies, Morgan & Claypool Publishers, July 2012. (Sun’s thesis won the 2013 ACM SIGKDD Dissertation Award)


Meng Qu

Jingbo Shang

Jian Peng

Sheng Wang

Xiang Ren

Jialu Liu

Ahmed El-Kishky

Yu Shi

Doris Xin

Henry Lin

Saurabh Sinha

ChengXiang Zhai

Jiawei Han

