Amazon Search Mission and Query Understanding

Intelligent Search Experience with Human-Level Understanding

The Mission and Query Understanding team builds the shopping mission understanding pipeline for Amazon's product search engine. We connect customer queries with products through the product knowledge graph. We bring state-of-the-art machine learning algorithms from NLP, Information Retrieval, and Data Mining to improve the shopping experience on Amazon.

Job Opportunity

Contact smu-as-intern-hire@amazon.com if you are interested in our job opportunities. Please contact us if you have more than two first-author papers published in top-tier conferences such as NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, KDD, WWW, AAAI, IJCAI, and SIGIR. We prefer PhD students.

Hiring Research Interns

The Search Mission Query Understanding (SMU) team in Amazon Search is looking for PhD students worldwide in NLP, Information Retrieval, and Data Mining to join us in 2023 as research interns. As a PhD intern, you will solve challenging problems such as knowledge graph construction and mining, entity recognition and linking, unsupervised and weakly supervised learning, multi-task and multilingual learning, user behavior graph mining, large-scale machine learning, and multi-turn user interaction modeling. You will enjoy the internship if you like working on massive data sets, seeing your research results make hundreds of millions of dollars of production impact, and publishing your results in top conferences. Our internship focuses on publications. In addition to highly competitive compensation, you will receive full support for your research and publications during your internship, including mentorship and advice from top scholars in the field. We look for interns in Spring, Summer, and Fall. Contact: smu-as-intern-hire@amazon.com

Why Choose Us?

Scientists and interns love our team culture

Research and publication obsession: every intern has full support to publish a top-conference paper
Massive data and massive hardware support
Strong academia collaboration on research
Research work that matters in production: our research work makes its way to production and real customer experience improvements
Freedom and fun: we support 20% side projects, monthly team building, reading groups, hackathons, career coaching, and mentorship

Image Credit: DALL·E 2

Research Areas

Learning with limited guidance: Our datasets are at massive, trillion scale. Labeling even a small portion of them is cost-prohibitive. Additionally, the volume of human labels varies by NLP task. We are doing active research in active learning, semi/weakly-supervised learning, transfer learning, meta-learning, and multi-task learning to build E-commerce query understanding with limited human guidance.
Knowledge Graph Mining: Products on Amazon form a giant knowledge graph. We are doing research to extract structured knowledge from product textual and image descriptions. Additionally, we have a massive, trillion-scale heterogeneous graph of products, entities, queries, sellers, and customers. We are doing active research on graph mining to extract commonsense knowledge from the graph.
Deep Learning and NLP: We actively work on NLP tasks for language modeling, query rewriting, entity linking, few-shot learning, text generation, and Q&A.
Search and Recommendation: We work on classic information retrieval research for recommendation and multilingual search.
Large Scale Machine Learning: We work on data embedding and compression, model distillation, and large language model pretraining, leveraging trillion-scale data and new AWS hardware.
Conversational Shopping: Shopping is a multi-turn engagement model. How can we use contextual bandits for multi-turn engagement and reinforcement learning to make our search engine act like an intelligent salesperson?

Visiting Scholars

Prof. Yangqiu Song

HKUST

Prof. Suhang Wang

Penn State University

Prof. Chao Zhang

Georgia Institute of Technology

Team Members

Dr. Artur Bekasov

Applied Scientist

Dr. Tianyu Cao

Sr. Applied Scientist

Google Scholar

Dr. Limeng Cui

Applied Scientist

Dr. Yifan Gao

Applied Scientist

Dr. William Headden

Sr. Applied Scientist

Dr. Haoming Jiang

Sr. Applied Scientist

Dr. Adam Kiezun

Principal Applied Scientist

Google Scholar

Dr. Ruirui Li

Applied Scientist

Google Scholar

Dr. Zheng Li

Sr. Applied Scientist

Hanqing Lu

Sr. Applied Scientist

Google Scholar

Dr. Chen Luo

Sr. Applied Scientist

Sreyashi Nag

Sr. Applied Scientist

Google Scholar

Dr. Philipp Schmidt

Applied Science Manager

Google Scholar

Dr. Xinlu Tan

Research Scientist

Dr. Xianfeng Tang

Applied Scientist

Yinghan Wang

Applied Scientist

Dr. Zhengyang Wang

Applied Scientist

Google Scholar

Jingfeng Yang

Applied Scientist

Bing Yin

Senior Applied Science Manager

Dr. Qingyu Yin

Sr. Applied Scientist

Google Scholar

Publications

Query Attribute Recommendation at Amazon Search
Chen Luo, William Headden, Neela Avudaiappan, Haoming Jiang, Tianyu Cao, Qingyu Yin, Yifan Gao, Zheng Li, Rahul Goutam, Haiyang Zhang, Bing Yin (RecSys 2022)
Task-Agnostic Graph Explanations [pdf]
Yaochen Xie, Sumeet Katariya, Xianfeng Tang, Edward Huang, Nikhil Rao, Karthik Subbian, Shuiwang Ji (NeurIPS 2022)
Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graph [pdf][code]
Ruijie Wang, Zheng Li, Dachun Sun, Shengzhong Liu, Jinning Li, Bing Yin, Tarek Abdelzaher (NeurIPS 2022)
Multilingual knowledge graph completion with self-supervised adaptive graph alignment [pdf][code][data]
Zijie Huang, Zheng Li, Haoming Jiang, Tianyu Cao, Hanqing Lu, Bing Yin, Karthik Subbian, Yizhou Sun, Wei Wang (ACL 2022)
RETE: Retrieval-enhanced temporal event forecasting on unified query product evolutionary graph [pdf][code]
Ruijie Wang, Zheng Li, Danqing Zhang, Qingyu Yin, Tong Zhao, Bing Yin, Tarek Abdelzaher (WWW 2022)
ROSE: Robust caches for Amazon product search
Chen Luo, Vihan Lakshman, Anshumali Shrivastava, Tianyu Cao, Sreyashi Nag, Rahul Goutam, Hanqing Lu, Yiwei Song, Bing Yin (WWW 2022)
ALLIE: Active learning on large-scale imbalanced graphs
Limeng Cui, Xianfeng Tang, Sumeet Katariya, Nikhil Rao, Pallav Agrawal, Karthik Subbian, Dongwon Lee (WWW 2022)
Massive text normalization via an efficient randomized algorithm
Nan Jiang, Chen Luo, Vihan Lakshman, Yesh Dattatreya, Yexiang Xue (WWW 2022)
Search filter ranking with language-aware label embeddings
Jacek Golebiowski, Felice Antonio Merra, Ziawasch Abedjan, Felix Biessmann (WWW 2022)
Condensing Graphs via One-Step Gradient Matching [pdf][code]
Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang, Jiliang Tang, Bing Yin (SIGKDD 2022)
Can clicks be both labels and features? Unbiased behavior feature collection and uncertainty-aware learning to rank
Tao Yang, Chen Luo, Hanqing Lu, Parth Gupta, Bing Yin, Qingyao Ai (SIGIR 2022)
CERES: Pretraining of graph-conditioned transformer for semi-structured session data
Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang (NAACL 2022)
Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [pdf][code][data]
Yifan Gao, Qingyu Yin, Zheng Li, Rui Meng, Tong Zhao, Bing Yin, Irwin King, Michael R. Lyu (NAACL 2022 (Findings))
SeqZero: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models
Jingfeng Yang, Haoming Jiang, Qingyu Yin, Danqing Zhang, Bing Yin, Diyi Yang (NAACL 2022 (Findings))
MetaTS: Meta Teacher-Student Network for Multilingual Sequence Labeling with Minimal Supervision
Zheng Li, Danqing Zhang, Tianyu Cao, Ying Wei, Yiwei Song and Bing Yin (EMNLP 2021)
QUEACO: Borrowing treasures from weakly labeled behavior data for query attribute value extraction
Danqing Zhang*, Zheng Li*, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao and Qiang Yang (CIKM 2021, Industry track, * denotes equal contribution)
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data [pdf][code]
Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao (ACL 2021, Long paper, oral)
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning [pdf]
Hui Liu, Danqing Zhang, Bing Yin and Xiaodan Zhu (NAACL 2021)
Graph-based Multilingual Product Retrieval in E-Commerce Search [pdf]
Hanqing Lu, Youna Hu, Tong Zhao, Tony Wu, Yiwei Song and Bing Yin (NAACL 2021, Industry track)
Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages [pdf]
Zheng Li, Mukul Kumar, William Headden, Bing Yin, Ying Wei, Yu Zhang, Qiang Yang (EMNLP 2020, Long paper, oral)
QUEEN: Neural Query Rewriting in E-commerce [pdf]
Yaxuan Wang*, Hanqing Lu*, Yunwen Xu, Rahul Goutam, Yiwei Song and Bing Yin (WWW KMECommerce Workshop 2021)
Unsupervised Synonym Extraction for Document Enhancement in E-commerce Search [pdf]
Hanqing Lu, Yunwen Xu, Qingyu Yin, Tianyu Cao, Boris Aleksandrovsky, Yiwei Song, Xianlong Fan and Bing Yin (WWW KMECommerce Workshop 2021)
Hierarchical Multi-label Classification of Queries to Browse Categories [pdf]
Heran Lin, Pengcheng Xiong, Danqing Zhang, Fan Yang, Ryoichi Kato, Mukul Kumar, William Headden and Bing Yin (SIGIR ECommerce Workshop 2020 Best Paper)

Events

Announcing The First Workshop on Evaluations and Assessments of Neural Conversation Systems (EANCS) Co-located with EMNLP 2021
The June 2021 issue of the IEEE Special Bulletin on Data Engineering, edited by Bing and Sreyashi, is now published
We successfully organized the Knowledge Management in E-Commerce Workshop at The Web Conference 2021

Alumni

Dr. Danqing Zhang

Dr. Sergey Kirshner

Dr. Minghui He

Dr. Jacek Golebiowski

Google Scholar

Dr. Konstantine Arkoudas

Google Scholar

Interns

Upcoming, Current, and Alumni

Changlong Yu, HKUST advised by Prof Yangqiu Song
Jiaxin Bai, HKUST advised by Prof Yangqiu Song
Xin Liu, HKUST advised by Prof Yangqiu Song
Teng Xiao, Penn State advised by Prof Suhang Wang
Zining Zhu, University of Toronto
Jie Huang, UIUC
Yu Wang, UIC
Juan Zha, USC
Peifeng Wang, USC
Enyan Dai, Penn State University
Yaochen Xie, TAMU
Haoyu Wang, Purdue
Yufei Wang,
Wei Jin, MSU advised by Prof Jiliang Tang, 2021 Fall, http://cse.msu.edu/~jinwei2/
Zijie Huang, UCLA advised by Prof Yizhou Sun, 2021 Summer, https://zijieh.github.io/
Ruijie Wang, UIUC advised by Prof Tarek Abdelzaher, 2021 Summer, Google Scholar Page
Sheikh Sarwar, UMass advised by Prof James Allan, 2021 Summer, https://people.cs.umass.edu/~smsarwar/
Chen Liang, GaTech advised by Prof Tuo Zhao, 2021 Fall, https://cliang1453.github.io/
Jingfeng Yang, GaTech advised by Prof Diyi Yang, 2021 Fall, https://jingfengyang.github.io/
Qi Zeng, UIUC advised by Prof Heng Ji, 2022 Summer, https://vickizeng.com/
Zongyue Qin, UCLA advised by Prof Yizhou Sun, 2022 Summer
Tao Yang, Utah advised by Prof Qingyao Ai, 2021 Summer, http://www.cs.utah.edu/~taoyang/
Tong Zhao, Notre Dame advised by Prof Meng Jiang, 2021 Summer, https://tzhao.io/
Rui Feng, GaTech advised by Prof Chao Zhang, 2021 Spring, Google Scholar Page
Yujia Xie, GaTech advised by Prof Tuo Zhao and Hongyuan Zha, 2021 Spring, https://sites.google.com/view/yujia
Simiao Zuo, GaTech advised by Prof Tuo Zhao, 2020 Winter, https://simiaozuo.github.io/
Hui Liu, Queen's University advised by Prof Xiaodan Zhu, 2020 Fall, https://layneins.github.io/
Tao Li, University of Utah advised by Prof Vivek Srikumar, 2019 Summer, https://www.cs.utah.edu/~tli//
