Noah Smith on NLP at SoCS

Continuing with the SoCS workshop: the afternoon session is a tutorial from Noah Smith at CMU about using NLP for socio-computational kinds of work.

Talking about how NLP people tend to make choices about models, algorithms, data collection, and cleaning, rather than about what “the right answers” are (those are context-dependent), feels like a fruitful way to discuss using NLP for socio-computational work. We’ll start with document classification, since that’s where Noah went.

Much of the story around NLP for document classification is thoughtful annotation/labeling of your data for the categories/attributes of interest. That means having good justifications, theories, and research questions that lead you to create categories appropriate for your text and your goals. And, once you get that dataset created, share it: people love useful datasets and might help you with the work.

Likewise, thinking carefully about how to transform the texts to text features–word counts, stemming, bigrams/trigrams, defining word categories (a la Linguistic Inquiry and Word Count, or LIWC)–is important and requires a thoughtful balance of intuition and justification.
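To make the feature step concrete, here is a minimal sketch of turning a text into word counts, bigram counts, and word-category counts. It is my illustration, not something from the tutorial: the tokenizer, the feature-name prefixes, and `TOY_CATEGORIES` are all assumptions, and the categories are a made-up stand-in in the spirit of LIWC rather than the real lexicon.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(text, categories=None):
    """Map a document to a sparse dictionary of feature counts."""
    tokens = tokenize(text)
    features = Counter()
    for tok in tokens:                      # unigram (word) counts
        features["w=" + tok] += 1
    for first, second in ngrams(tokens, 2):  # bigram counts
        features["bg=" + first + "_" + second] += 1
    # Hypothetical word categories (an assumption, not the LIWC lexicon).
    for name, words in (categories or {}).items():
        features["cat=" + name] = sum(1 for tok in tokens if tok in words)
    return features

TOY_CATEGORIES = {"positive": {"good", "great"}, "negative": {"bad", "awful"}}
feats = featurize("The food was good, the service was not good.", TOY_CATEGORIES)
print(feats["w=good"], feats["bg=not_good"], feats["cat=positive"])
```

Note how the bigram `not_good` captures something the word counts and the "positive" category both miss; that is exactly the kind of choice that needs intuition and justification.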

Question: what are, for NLP folks, for CSCW folks, for social science folks, the “right” or “good” ways to justify choices of category schemes, labeling, feature construction, etc.?

One answer, around choice of ML algorithm, is to say “SVM performs a little better but you need to be able to talk about probabilities, so I’ll trade off a bit of performance for other kinds of interpretability”. And, especially if you choose a linear model from features to categories, the algorithms have relatively small (and predictable) kinds of differences — perhaps more noise than is worth optimizing on, versus spending your time on other stages that require more intuition/justification/art.
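As a rough illustration of the "talk about probabilities" side of that tradeoff, here is a tiny linear model that outputs probabilities. I picked logistic regression for the sketch; the tutorial did not name a specific probabilistic model, and the toy documents, learning rate, and epoch count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, feats):
    """Linear score: weighted sum of the document's feature values."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit weights with plain stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, y in zip(docs, labels):
            p = sigmoid(score(weights, feats))
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * (y - p) * v
    return weights

def predict_proba(weights, feats):
    """Probability that the document belongs to class 1."""
    return sigmoid(score(weights, feats))

# Toy data (an assumption): 1 = positive review, 0 = negative review.
docs = [{"good": 1, "food": 1}, {"bad": 1, "food": 1},
        {"good": 1, "service": 1}, {"bad": 1, "service": 1}]
labels = [1, 0, 1, 0]
weights = train_logistic(docs, labels)
print(weights["good"], weights["bad"], predict_proba(weights, {"good": 1}))
```

The learned weights can be read off per feature, and the output is a probability rather than just a decision, which is the kind of interpretability you might give up a little accuracy for.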

Another answer is that you should pick methods that you can talk sensibly about and that your community gets: if you can’t explain it at all, or to your community, you are in a world of hurt. Practical issues around tool choice that fit your research pipeline and skills and budget also matter.

Performance is only a piece of the tradeoff, and you really want to measure it on held-out data. (You can be very careful about this by taking the files with your test data and making them unreadable.) Likewise, you want to compare to a reasonable baseline; at the very least, against a “predict the most common class” zero-rule baseline. You might also think about the maximum expected performance, perhaps considering inter-coder agreement as an upper bound.
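Here is a small sketch of those two reference points: a zero-rule baseline scored on held-out labels, and agreement between two human coders. The label lists are invented for illustration, and Cohen's kappa is my choice of agreement measure; the tutorial did not specify one.

```python
from collections import Counter

def accuracy(predicted, gold):
    """Fraction of positions where the two label sequences match."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def zero_rule_predict(train_labels, n):
    """Predict the most common training label for every one of n test items."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same items.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    n = len(coder_a)
    observed = accuracy(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented labels: the baseline sees only training labels, never the test set.
train_labels = ["pos", "pos", "neg", "pos"]
test_gold = ["pos", "neg", "pos", "neg"]
baseline_acc = accuracy(zero_rule_predict(train_labels, len(test_gold)), test_gold)

coder_a = ["pos", "pos", "neg", "neg"]
coder_b = ["pos", "neg", "neg", "neg"]
raw_agreement = accuracy(coder_a, coder_b)
kappa = cohen_kappa(coder_a, coder_b)
print(baseline_acc, raw_agreement, kappa)
```

A classifier's held-out accuracy then sits somewhere between `baseline_acc` and, roughly, `raw_agreement`; if it is near the floor, the model has learned little beyond the class distribution.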

If performance is bad, what went wrong? Not enough data, bad labels, meaningless features, home-grown algorithms and implementations, (perhaps) the wrong algorithm, not enough experience or insight into the domain, …

Parsing for parts of speech or entity recognition is like sharing dinners. At dinner, the people around you will influence decisions on what to order. In NLP, the words nearby (and maybe some far away) might influence the classification of the word you’re looking at. The Viterbi algorithm for sequence labeling is a useful way to account for some of these dependencies.

Noah claims that this is going to be the next big idea from NLP that makes it big in the world of computational social science, because lots of important text analysis tasks, including part-of-speech tagging, entity recognition, and translation, can be modeled pretty well as sequence labeling problems. Further, the algorithms for this kind of structured prediction are more or less generalizations of standard ML classification algorithms.
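For the curious, here is a minimal sketch of Viterbi decoding for a first-order tagging model, where each tag depends on the previous tag and each word depends on its tag. The two-tag model and all of its probabilities are invented for illustration; nothing here comes from the tutorial's slides.

```python
import math

def logs(probs):
    """Convert a dictionary of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in probs.items()}

def viterbi(words, tags, log_start, log_trans, log_emit, unk=-20.0):
    """Return the highest-scoring tag sequence for a non-empty word list."""
    # best[i][t]: best log score of any tagging of words[:i+1] that ends in tag t.
    best = [{t: log_start[t] + log_emit[t].get(words[0], unk) for t in tags}]
    back = []  # back[i][t]: the previous tag on that best path
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans[p][t])
            scores[t] = best[-1][prev] + log_trans[prev][t] + log_emit[t].get(word, unk)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy model (all numbers are assumptions): N = noun, V = verb.
TAGS = ["N", "V"]
LOG_START = logs({"N": 0.8, "V": 0.2})
LOG_TRANS = {"N": logs({"N": 0.3, "V": 0.7}), "V": logs({"N": 0.8, "V": 0.2})}
LOG_EMIT = {"N": logs({"fish": 0.5, "dogs": 0.4, "eat": 0.1}),
            "V": logs({"fish": 0.3, "eat": 0.6, "dogs": 0.1})}

tagged = viterbi("dogs eat fish".split(), TAGS, LOG_START, LOG_TRANS, LOG_EMIT)
print(tagged)  # ['N', 'V', 'N']
```

The dinner-table effect shows up in the transition scores: what the neighboring word was tagged changes what is plausible for the current one, and Viterbi finds the best joint assignment without enumerating every possible sequence.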

That said, there are a lot of really tough problems, especially around more semantic goals such as predicting framings, where there’s some in-progress work that is dangerous to rely on but perhaps fun to play with, including some of Noah’s own group’s work.

I’m not going to cover the clustering side, because I need a little break from typing and thinking, but hopefully this was useful/interesting for some folks.

Bonus note: can you predict if a bill will make it out of committee or a paper gets cited? Yes, at least better than chance, according to their paper.
