Information Extraction with Weak Supervision
12:15 - 1:00 pm
In-person meeting: Building C Room BC 115
Zoom meeting: https://lehigh.zoom.us/j/
Humans provide knowledge in various forms. One major form is text, such as scientific literature, news articles, and reports. Over the past two decades, curators have made many efforts to extract structured knowledge from unstructured text, with the aims of better knowledge organization, more efficient knowledge search and retrieval, and new knowledge discovery, especially in the life sciences. Automated information extraction systems have been built to assist manual curation. However, with the growing volume of literature and breadth of information, it is increasingly difficult for curators and information extraction systems to keep up. In this talk, I will present my recent work on automated information extraction with less human effort. These methods aim to train high-quality information extractors from noisy labels provided via crowdsourcing or existing resources.
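As a rough illustration of what "noisy labels provided via crowdsourcing" can look like in practice, the sketch below aggregates per-token labels from several annotators by simple majority vote. This is a generic baseline chosen for illustration, not the speaker's method, which the abstract does not describe. The function name `majority_vote`, the label set, and the sample sentence are all hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate per-token labels from several noisy annotators.

    annotations: list of label sequences, one per annotator, all the same length.
    Returns one label per token. Ties go to the label that appears first
    among the annotators for that token.
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all annotators must label the same tokens")
    aggregated = []
    for token_labels in zip(*annotations):
        # most_common keeps first-seen order among equally frequent labels
        aggregated.append(Counter(token_labels).most_common(1)[0][0])
    return aggregated


# Hypothetical example: three annotators label the same three tokens.
tokens = ["Aspirin", "inhibits", "COX-1"]
crowd_labels = [
    ["DRUG", "O", "GENE"],
    ["DRUG", "O", "O"],
    ["O", "O", "GENE"],
]
consensus = majority_vote(crowd_labels)
print(list(zip(tokens, consensus)))
```

The aggregated labels could then serve as training targets for an extractor. Methods in the weak-supervision literature typically go further, for example by weighting annotators according to their estimated reliability.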
Dr. Qi Li is an assistant professor in the Department of Computer Science, Iowa State University. She was a postdoc in the Department of Computer Science at the University of Illinois at Urbana-Champaign after obtaining her Ph.D. in Computer Science and Engineering from SUNY Buffalo in 2017. She has received several awards, including "Rising Star in EECS" in 2018 and the Best Dissertation Award at the Department of Computer Science and Engineering, University at Buffalo. Her research interests lie in the area of data mining, with a focus on information extraction and truth discovery from multiple data sources. Dr. Li has published over 40 papers in major data mining, database, and natural language processing conferences, such as KDD, VLDB, SIGMOD, and EMNLP, with over 2,200 citations and an h-index of 19. She serves as a PC member for data mining conferences, including KDD, WWW, SDM, and PAKDD, and has co-chaired the TrueFact workshop since 2019. Her research is sponsored by NSF, USDA, and DoD.
Data Science Seminar
The Department of Electrical and Computer Engineering will be hosting a seminar by Prof. Narayana Prasad Santhanam from the University of Hawaii. Prasad has worked on the scientific foundations of data science for a number of years, and his talk will be of interest to i-DISC faculty.
Registration in advance is required: https://lehigh.zoom.us/
Abstract: Regularization is often used to match available training sample sizes to model complexity. As training sample sizes increase, regularization constraints are usually relaxed when choosing the model. A natural question then arises: as the constraints relax, does the selected model keep varying or is the procedure stable in the sense that at some point, no further relaxation of constraints changes the selected model substantially?
To understand this, we develop a statistical framework of eventually-almost sure prediction. Using only samples from a probabilistic model, we predict properties of the model and of future observations. The prediction game continues in an online fashion as the sample size grows with new observations. After each prediction, the predictor incurs a binary (0-1) loss. The probability model underlying a sample is otherwise unknown except that it belongs to a known class of models. The goal is to make finitely many errors (i.e. loss of 1) with probability 1 under the generating model, no matter what it may be in the known model class.
We characterize problems that can be predicted with finitely many errors. Our characterization is through regularization, and answers precisely the question of when regularization eventually settles on a model and when it does not. Furthermore, we also characterize when a universal stopping rule can identify (to any given confidence) at what point no further errors will be made. We specialize these general results to a number of problems (online classification, entropy prediction, Markov processes, risk management), of which we will focus on online classification in this talk.
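The prediction game described above can be made concrete with a toy case. The sketch below assumes a model class with only two candidates, a coin with heads probability 0.3 or 0.7, and a predictor that guesses which one generated the data after every new sample, incurring a 0-1 loss per round. By the strong law of large numbers the running mean converges to the true probability almost surely, so this predictor makes only finitely many errors with probability 1. The two-coin class, the threshold rule, and the name `error_rounds` are assumptions made for illustration and are not taken from the talk.

```python
import random


def error_rounds(samples, true_p, low_p=0.3, high_p=0.7):
    """Return the 1-indexed rounds at which the online prediction was wrong.

    After each new sample, guess high_p if the running mean exceeds 0.5,
    otherwise guess low_p. A wrong guess is a 0-1 loss of 1 for that round.
    """
    errors = []
    ones = 0
    for n, x in enumerate(samples, start=1):
        ones += x
        guess = high_p if ones / n > 0.5 else low_p
        if guess != true_p:
            errors.append(n)
    return errors


# Simulate one long run from the p = 0.7 coin (seeded for repeatability).
rng = random.Random(0)
samples = [1 if rng.random() < 0.7 else 0 for _ in range(10_000)]
errs = error_rounds(samples, true_p=0.7)
print("number of errors:", len(errs))
print("last error at round:", errs[-1] if errs else None)
```

In a typical run, any errors occur in the first few rounds and then stop. The harder question raised in the abstract is which richer model classes admit such a predictor, and when a stopping rule can certify that the errors have ended.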
Bio: Narayana Santhanam is an Associate Professor at the University of Hawaii with research interests in the intersection of learning theory, statistics, and information theory, and applications thereof. He obtained his PhD from the University of California, San Diego, and held a postdoctoral position at the University of California, Berkeley, before taking up a faculty position at the University of Hawaii. He is currently an Associate Editor of the IEEE Transactions on Information Theory and a member of the Center for Science of Information (an NSF Science and Technology Center). Among his current pedagogical priorities is developing a robust data science curriculum, grounded in engineering fundamentals, for students in electrical engineering as well as other majors.
DeepLearn 2021 Summer July 26-30, 2021
Las Palmas de Gran Canaria, Spain
Event Website: https://irdta.eu/
REGISTRATION: Registration must be completed at https://irdta.eu/
Early registration deadline: April 25, 2021
DeepLearn 2021 Summer will be a research training event with a global scope, aiming to update participants on the most recent advances in the critical and fast-developing area of deep learning. Previous events were held in Bilbao, Genova and Warsaw.
Deep learning is a branch of artificial intelligence covering a spectrum of current exciting research and industrial innovation that provides more efficient algorithms to deal with large-scale data in neurosciences, computer vision, speech recognition, language processing, human-computer interaction, drug discovery, biomedical informatics, healthcare, recommender systems, learning theory, robotics, games, etc. Renowned academics and industry pioneers will lecture and share their views with the audience.
Most deep learning subareas will be covered, and the main challenges identified, through 24 four-and-a-half-hour courses and 3 keynote lectures, which will tackle the most active and promising topics. The organizers are convinced that outstanding speakers will attract the brightest and most motivated students. Interaction will be a main component of the event.
An open session will give participants the opportunity to present their own work in progress in 5 minutes. Moreover, there will be two special sessions with industrial and recruitment profiles.
QUESTIONS AND FURTHER INFORMATION: firstname.lastname@example.org
Co-organized by Department of Information Engineering, Marche Polytechnic University, Institute for Research Development, Training and Advice – IRDTA, Brussels/London