News

Work Experience

  • Research Assistant | Data Science Research (DSR) Lab, University of Florida -- Oct 2018 - Present

    • Team Lead & Individual Contributor, DARPA's AIDA Project | 2019-2022

      Active Interpretation of Disparate Alternatives
      University of Florida

      • Project Overview:
        Played a dual role in the DARPA-sponsored "Active Interpretation of Disparate Alternatives" (AIDA) project, focusing on developing an advanced search engine for analyzing alternative hypotheses in event-centric knowledge graphs.
      • Contributions:
        • Engineered a novel two-tier graph searching algorithm, optimizing knowledge graph exploration at both mention and cluster levels. This innovation led to a significant 25% improvement in the final F1 score.
        • Developed a sophisticated graph clustering algorithm. This algorithm enhanced the differentiation of alternative hypotheses by quantifying both structural and semantic distances between candidates, resulting in a 20% improvement in clustering quality (v-measure).
        • The system achieved top results in the NIST TAC SM-KBP 2020 evaluation (Streaming Multimedia Knowledge Base Population).
    • First Author, Textual QA Survey Project | 2020-2021

      "More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering"
      University of Florida

      • Project Overview:
        Authored the survey paper "More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering." The survey analyzes textual question answering (QA), which aims to provide natural-language answers from unstructured data, with a main focus on machine reading comprehension (MRC).
      • Contributions:
        • Developed a new taxonomy for textual QA, considering novel application scenarios.
        • Covered 7 major subtasks of textual QA: classical MRC, four novel MRC tasks (conversational QA, multi-hop QA, long-form QA, and cross-language QA), open-domain QA (ODQA), and commonsense QA.
        • Conducted comparative analysis across multiple facets of benchmark datasets.
        • Provided detailed statistics for each dataset and described evaluation metrics for each benchmark.
        • Addressed current trends and challenges in textual QA, suggesting directions for future research to tackle unsolved tasks and propose more challenging datasets.
    • MythQA Project Lead and First Author | 2021-2022

      Our work was accepted at SIGIR'23
      University of Florida

      • Project Overview:
        Led the "MythQA" project at the University of Florida, which studies large-scale check-worthy claim detection through multi-answer open-domain question answering (QA).
      • Contributions:
        • Led a team in the creation of a unique benchmark dataset, utilizing advanced data collection methods via the Twitter API.
        • Designed new evaluation metrics, a dedicated data schema, and detailed annotation guidelines for the project.
        • Successfully developed and implemented a user-friendly annotation tool and a comprehensive end-to-end pipeline. This pipeline played a crucial role in assessing the impact of various components, such as information retrieval, machine reading comprehension, and distinct answer selection, significantly enhancing the project's overall performance.
    • M3 Project Lead and First Author | 2022-2023

      Open-Domain Multi-Hop Dense Sentence Retrieval (under submission)
      University of Florida

      • Project Overview:
        Contributed to the cutting-edge research paper "M3: A Multi-Task Mixed-Objective Learning Framework for Open-Domain Multi-Hop Dense Sentence Retrieval". This project aimed to enhance dense retrieval performance by merging contrastive learning with multi-task and mixed-objective learning frameworks.
      • Contributions:
        • Developed the M3 system, an advanced, recursive multi-hop dense sentence retrieval system using a new method for dense sentence representation learning.
        • Proposed the M3-DSR method, which substantially outperforms existing methods in sentence-level retrieval.
        • Introduced an innovative heuristic hybrid ranking algorithm that effectively combines single-hop and multi-hop sentence evidence, thereby improving performance.
        • Achieved groundbreaking results in multi-hop retrieval and fact verification on the FEVER dataset, setting a new standard in open-domain fact verification.

  • Machine Learning Intern | Nokia Bell Labs -- Jun 2022 - Aug 2022
    • Project Overview:
      Led the development of a retrieval-based framework to streamline the analysis of complex ticketing systems, where each ticket can contain 10-100 million log lines.
    • Contributions:
      • Proposed and implemented a system that efficiently retrieves relevant log lines from extensive log files based on ticket information.
      • Conducted comprehensive data management tasks, including cleaning, processing, and visualizing large-scale time-series semi-structured log data.
      • Developed a dense log retrieval system by fine-tuning encoders in a contrastive learning framework.
      • The resulting model surpassed the BM25 baseline, achieving a 16.1% improvement in retrieval recall.
  • Deep Learning Researcher | Large Intelligence (Li) Lab, University of Florida -- May 2018 - Oct 2018

    Researched state-of-the-art deep learning techniques for business recommender systems, supervised by Dr. Andy Li.
    Developed a transformer-based sequential recommender system, improving recommendation accuracy by 1.7%.

Publications

Interests

  • Machine Learning
  • Natural Language Processing
  • Information Retrieval
  • Retrieval-Augmented Multi-Modal Generation

Education

  • Ph.D. in Computer Science, 2019-present

    University of Florida, USA

  • M.Sc. in Electrical and Computer Eng., 2016-2018

    University of Florida, USA

  • B.Sc. in Microelectronics, 2012-2016

    Sichuan University, China

Achievements

Gartner Group Graduate Fellowship
        Apr. 2023     |        CISE Department at University of Florida      |      Certificate

Gartner Group Graduate Fellowship
        Apr. 2022     |        CISE Department at University of Florida      |      Certificate

Completed the Graph Analytics for Big Data Course
        Feb. 2019     |        UC San Diego Online, Coursera     |      Certificate

Completed the Deep Learning Specialization Courses
        Jul. 2018     |        deeplearning.ai, Coursera     |      Certificate

Completed the Machine Learning Course
        Jul. 2017     |        Stanford Online, Coursera     |      Certificate

Contact