Looking Back on 2025 (and Ahead to 2026)

Happy New Year 2026! I honestly cannot believe it is already another year. Looking back, 2025 feels like it passed in a blur of late nights, deadlines, competitions, and moments that quietly changed how I think about learning. This blog became my way of slowing things down. Each post captured something I was wrestling with at the time, whether it was research, language, or figuring out what comes next after high school. As I look back on what I wrote in 2025 and look ahead to 2026, this post is both a reflection and a reset.

That sense of reflection shaped how I wrote this year. Many of my early posts grew out of moments where I wished someone had explained a process more clearly when I was starting out.

Personal Growth and Practical Guides

Some of my 2025 writing focused on making opportunities feel more accessible. I wrote about publishing STEM research as a high school student and tried to break down the parts that felt intimidating at first, like where to submit and what “reputable” actually means in practice.

I also shared recommendations for summer programs and activities in computational linguistics, pulling from what I applied to, what I learned, and what I wish I had known earlier. Writing these posts helped me realize how much “figuring it out” is part of the process.

As I got more comfortable sharing advice, my posts started to shift outward. Instead of only focusing on how to get into research, I began asking bigger questions about how language technology shows up in real life.

Research and Real-World Application

In the first few months of the year, I stepped back from posting as school, VEX Robotics World Championship, and research demanded more of my attention. When I came back, one of the posts that felt most meaningful to write was Back From Hibernation. In it, I reflected on how sustained effort turned into a tangible outcome: a co-authored paper accepted to a NAACL 2025 workshop.

Working with my co-author and mentor, Sidney Wong, taught me a lot about the research process, especially how to respond thoughtfully to committee feedback and refine a paper through a careful round of revision. More than anything, that experience showed me what academic research looks like beyond the initial idea. It is iterative, collaborative, and grounded in clarity.

Later posts explored the intersection of language technology and society. I wrote about AI resume scanners and the ethical tensions they raise, especially when automation meets human judgment. I also reflected on applications of NLP in recommender systems after following work presented at RecSys 2025, which expanded my view of where computational linguistics appears beyond the examples people usually cite.

Another recurring thread was how students, especially high school students, can connect with professors for research. Writing about that made me more intentional about how I approach academic communities, not just as someone trying to get a yes, but as someone who genuinely wants to learn.

Those topics were not abstract for me. In 2025, I also got to apply these ideas through Student Echo, my nonprofit focused on listening to student voices at scale.

Student Echo and Hearing What Students Mean

Two of the most meaningful posts I wrote this year were about Student Echo projects where we used large language models to help educators understand open-ended survey responses.

In Using LLMs to Hear What Students Are Really Saying, I shared how I led a Student Echo collaboration with the Lake Washington School District, supported by district leadership and my principal, to extract insights from comments that are often overlooked because they are difficult to analyze at scale. The goal was simple but ambitious: use language models to surface what students care about, where they are struggling, and what they wish could be different.

In AI-Driven Insights from the Class of 2025 Senior Exit Survey, I wrote about collaborating with Redmond High School to analyze responses from the senior exit survey. What stood out to me was how practical the insights became once open-ended text was treated seriously, from clearer graduation task organization to more targeted counselor support.

Writing these posts helped me connect abstract AI ideas to something grounded and real. When used responsibly, these tools can help educators listen to students more clearly.

Not all of my learning in 2025 happened through writing or research, though. Some of the most intense lessons happened in the loudest places possible.

Robotics and Real-World Teamwork

A major part of my year was VEX Robotics. In my VEX Worlds 2025 recap, I wrote about what it felt like to compete globally with my team, Ex Machina, after winning our state championship. The experience forced me to take teamwork seriously in a way that is hard to replicate anywhere else. Design matters, but communication and adaptability matter just as much.

In another post, I wrote about gearing up for VEX Worlds 2026 in St. Louis. That one felt more reflective, not just because of the competition ahead, but because it made me think about what it means to stay committed to a team while everything else in life is changing quickly.

Experiences like VEX pushed me to think beyond my own projects. That curiosity carried into academic spaces as well.

Conferences and Big Ideas

Attending SCiL 2025 was my first real academic conference, and writing about it helped me process how different it felt from school assignments. I also reflected on changes to arXiv policy and what they might mean for openness in research. These posts marked a shift from learning content to thinking about how research itself is structured and shared.

Looking across these posts now, from robotics competitions to survey analytics to research reflections, patterns start to emerge.

Themes That Defined My Year

Across everything I wrote in 2025, a few ideas kept resurfacing:

  • A consistent interest in how language and AI intersect in the real world
  • A desire to make complex paths feel more navigable for other students
  • A growing appreciation for the human side of technical work, including context, trust, and listening

2025 taught me as much outside the classroom as inside it. This blog became a record of that learning.

Looking Toward 2026

As 2026 begins, I see this blog less as a record of accomplishments and more as a space for continued exploration. I am heading into the next phase of my education with more questions than answers, and I am okay with that. I want to keep writing about what I am learning, where I struggle, and how ideas from language, AI, and engineering connect in unexpected ways. If 2025 was about discovering what I care about, then 2026 is about going deeper, staying curious, and building with intention.

Thanks for reading along so far. I am excited to see where this next year leads.

— Andrew


From Human Chatbots to Whale and Bird Talk: The Surprising Rise of Bio-Acoustic NLP in 2025

As a high school student passionate about computational linguistics, I find it amazing how the same technologies that power our everyday chatbots and voice assistants are now being used to decode animal sounds. This emerging area blends bioacoustics (the study of animal vocalizations) with natural language processing (NLP) and machine learning. Researchers are starting to treat animal calls almost like a form of language, analyzing them for patterns, individual identities, species classification, and even possible meanings.

Animal vocalizations do not use words the way humans do, but they frequently show structure, repetition, and context-dependent variation, features that remind us of linguistic properties in human speech.

A Highlight from ACL 2025: Monkey Voices Get the AI Treatment

One of the most interesting papers presented at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), the leading conference in our field, focuses directly on this topic.

Paper title: “Acoustic Individual Identification of White-Faced Capuchin Monkeys Using Joint Multi-Species Embeddings”

Authors: Álvaro Vega-Hidalgo, Artem Abzaliev, Thore Bergman, Rada Mihalcea (University of Michigan)

What the paper covers

White-faced capuchin monkeys each have a unique vocal signature. Being able to identify which individual is calling is valuable for studying their social structures, kinship, and conservation efforts.

The main difficulty is the lack of large labeled datasets for wild or rare species. Human speech has massive annotated corpora, but animal data is much scarcer.

The researchers address this through cross-species pre-training, a transfer learning strategy. They take acoustic embedding models (essentially sound “fingerprints”) pre-trained on: (1) Extensive human speech data and (2) Large-scale bird call datasets.

These models are then applied to white-faced capuchin vocalizations, even though the original training never included capuchin sounds.
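To make the cross-species transfer idea more concrete, here is a minimal Python sketch of the general recipe, not the authors’ actual code. The embed_call function is a placeholder for a pretrained acoustic embedding model, and the dataset is entirely made up; only the overall pattern (reuse embeddings, then train a small classifier for individual identification) reflects the paper.

```python
# Minimal sketch of the transfer-learning recipe (illustration only, not the
# paper's code): reuse an acoustic embedding model pretrained on other species'
# audio, then train a small classifier to identify individuals.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def embed_call(waveform: np.ndarray) -> np.ndarray:
    """Placeholder for a pretrained acoustic embedding model (e.g., one trained
    on human speech or bird calls). Returns a fake 256-dim vector so the sketch runs."""
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % (2**32))
    return rng.normal(size=256)

# Pretend dataset: 200 one-second capuchin calls (16 kHz) from 10 individuals.
calls = [np.random.randn(16000) for _ in range(200)]
individuals = np.repeat(np.arange(10), 20)

# Joint multi-species embeddings could be built by concatenating vectors from
# several pretrained models; a single placeholder embedding stands in here.
X = np.stack([embed_call(c) for c in calls])
X_train, X_test, y_train, y_test = train_test_split(
    X, individuals, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("individual-ID accuracy:", clf.score(X_test, y_test))
```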

Key findings

  • Embeddings derived from human speech and bird calls transferred surprisingly well to monkey vocalizations.
  • Combining multi-species representations (joint embeddings) improved identification accuracy further.

This demonstrates how knowledge from one domain can help another distant one, similar to how learning one human language can make it easier to pick up a related one. It offers a practical solution to the data scarcity problem that often limits animal bioacoustics research.

This paper was one of 22 contributions from the University of Michigan’s Computer Science and Engineering group at ACL 2025, showing how far computational linguistics has expanded beyond traditional human text and speech.

Another ACL 2025 Contribution: Exploring Dog Communication

ACL 2025 also included “Toward Automatic Discovery of a Canine Phonetic Alphabet” by Theron S. Wang and colleagues. The work investigates the phonetic-like building blocks in dog vocalizations and aims to discover them automatically. This is an early step toward analyzing dog sounds in a more structured, language-inspired framework.

Why This Matters

  • Conservation applications — Automated systems can monitor endangered species like whales or rare birds continuously, reducing the need for long-term human fieldwork in remote locations.
  • Insights into animal communication — Researchers are beginning to test whether calls follow rule-based patterns or convey specific information (about food, threats, or social bonds), much like how humans use syntax and intonation.
  • Transfer of AI techniques — Models originally built for human speech transfer effectively to other species. New foundation models in 2025 (e.g., NatureLM-audio) even handle thousands of animal species and support natural language queries such as “What bird is calling here?”

While these ACL 2025 papers represent cutting-edge academic work, the broader field is gaining momentum, with related discussions appearing in events like the 2025 NeurIPS workshop on AI for Non-Human Animal Communication.

This area is growing rapidly thanks to better data availability and stronger models. In the coming years, we might see practical tools that help interpret bird alarm calls or monitor ocean ecosystems through whale vocalizations.

What do you think? Would you be excited to build a simple AI tool to analyze your pet’s sounds or contribute to dolphin communication research? Computational linguistics is moving far beyond chatbots. It is now helping us listen to the voices of the entire planet.

Thanks for reading. I’d love to hear your thoughts in the comments!

— Andrew


From VEX Robotics to Silicon Valley: Why Physical Intelligence Is Harder Than It Looks

According to ACM TechNews (Wednesday, December 17, 2025), ACM Fellow Rodney Brooks argues that Silicon Valley’s current obsession with humanoid robots is misguided and overhyped. Drawing on decades of experience, he contends that general-purpose, humanlike robots remain far from practical, unsafe to deploy widely, and unlikely to achieve human-level dexterity in the near future. Brooks cautions that investors are confusing impressive demonstrations and AI training techniques with genuine real-world capability. Instead, he argues that meaningful progress will come from specialized, task-focused robots designed to work alongside humans rather than replace them. The original report was published in The New York Times under the title “Rodney Brooks, the Godfather of Modern Robotics, Says the Field Has Lost Its Way.”

I read the New York Times coverage of Rodney Brooks’ argument that Silicon Valley’s current enthusiasm for humanoid robots is likely to end in disappointment. Brooks is widely respected in the robotics community. He co-founded iRobot and has played a major role in shaping modern robotics research. His critique is not anti-technology rhetoric but a perspective grounded in long experience with the practical challenges of engineering physical systems. He makes a similar case in his blog post, “Why Today’s Humanoids Won’t Learn Dexterity”.

Here is how I would summarize his core points:

Why he thinks this boom will fizzle

  • The industry is betting huge sums on general-purpose humanoid robots that can do everything humans do—walk, manipulate objects, adapt to new tasks—based on current AI methods. Brooks argues that believing this is achievable in the near term is “pure fantasy” because we still lack the basic sensing and physical dexterity that humans take for granted.
  • He emphasizes that visual data and generative models aren’t a substitute for true touch sensing and force control. Current training methods can’t teach a robot to use its hands with the precision and adaptation humans have.
  • Safety and practicality matter too. Humanoid robots that fall or make a mistake could be dangerous around people, which slows deployment and commercial acceptance.
  • He expects a big hype phase followed by a trough of disappointment—a period where money flows out of the industry because the technology hasn’t lived up to its promises.

Where I agree with him

I think Brooks is right that engineering the physical world is harder than it looks. Software breakthroughs like large language models (LLMs) are impressive, but even brilliant language AI doesn’t give a robot the equivalent of muscle, touch, balance, and real-world adaptability. Robots that excel at one narrow task (like warehouse arms or autonomous vacuum cleaners) don’t generalize to ambiguous, unpredictable environments like a home or workplace the way vision-based AI proponents hope. The history of robotics is full of examples where clever demos got headlines long before practical systems were ready.

It would be naive to assume that because AI is making rapid progress in language and perception, physical autonomy will follow instantly with the same methods.

Where I think he might be too pessimistic

Fully dismissing the long-term potential of humanoid robots seems premature. Complex technology transitions often take longer and go in unexpected directions. For example, self-driving cars have taken far longer than early boosters predicted, but we are seeing incremental deployments in constrained zones. Humanoid robots could follow a similar curve: rather than arriving as general-purpose helpers, they may find niches first (healthcare support, logistics, elder care) where the environment and task structure make success easier. Brooks acknowledges that robots will work with humans, but probably not in a human look-alike form in everyday life for decades.

Also, breakthroughs can come from surprising angles. It’s too soon to say that current research paths won’t yield solutions to manipulation, balance, and safety, even if those solutions aren’t obvious yet.

Bottom line

Brooks’ critique is not knee-jerk pessimism. It is a realistic engineering assessment grounded in decades of robotics experience. He is right to question hype and to emphasize that physical intelligence is fundamentally different from digital intelligence.

My experience in VEX Robotics reinforces many of his concerns, even though VEX robots are not humanoid. Building competition robots showed me how fragile physical systems can be. Small changes in friction, battery voltage, alignment, or field conditions routinely caused failures that no amount of clever code could fully anticipate. Success came from tightly scoped designs, extensive iteration, and task-specific mechanisms rather than general intelligence. That contrast makes the current humanoid hype feel misaligned with how robotics actually progresses in practice, where reliability and constraint matter more than appearance or breadth.

Dismissing the possibility of humanoid robots entirely may be too strict, but expecting rapid, general-purpose success is equally misguided. Progress will likely be slower, more specialized, and far less dramatic than Silicon Valley forecasts suggest.

— Andrew


A Short Guide to Understanding NeurIPS 2025 Through Three Key Reports

Introduction

NeurIPS (Neural Information Processing Systems) 2025 brought together the global machine learning community for its thirty-ninth annual meeting, and it represented both continuity and change in the world’s premier machine learning conference. Held December 2 to 7 in San Diego, with a simultaneous secondary site in Mexico City, the conference drew enormous attention from researchers across academia, industry, and policy. The scale was striking: 21,575 submissions and over 5,200 accepted papers, which placed the acceptance rate at about 24.5 percent. With such breadth, NeurIPS 2025 offered a detailed look at the current state of AI research and the questions shaping its future.

Why I Follow the Conference

Even though my senior year has been filled with college applications and demanding coursework, I continue to follow NeurIPS closely because it connects directly to my future interests in computational linguistics and NLP. Reading every paper is unrealistic, but understanding the broader themes is still possible. For students or early researchers who want to stay informed without diving into thousands of pages, the following three references are especially helpful.

References:

  1. NeurIPS 2025: A Guide to Key Papers, Trends & Stats (Intuition Labs)
  2. Trends in AI at NeurIPS 2025 (Medium)
  3. At AI’s biggest gathering, its inner workings remain a mystery (NBC News)

Executive Summary of the Three Reports

1. Intuition Labs: Key Papers, Trends, and Statistics

The Intuition Labs summary of NeurIPS 2025 is a detailed, professionally structured report that provides a comprehensive overview of the conference. It opens with an Executive Summary highlighting key statistics, trends, awards, and societal themes, followed by sections on Introduction and Background, NeurIPS 2025 Organization and Scope (covering dates, venues, scale, and comparisons to prior years), and Submission and Review Process (with subsections on statistics, responsible practices, and ethics).

The report then delves into the core content through Technical Program Highlights (key themes, notable papers, and interdisciplinary bridging), Community and Social Aspects (affinity events, workshops, industry involvement, and conference life), Data and Evidence: Trends Analysis, Case Studies and Examples (including the best paper on gated attention and an invited talk panel), Implications and Future Directions, and a concluding section that reflects on the event’s significance. This logical flow, from context and logistics to technical depth, community, evidence, specifics, and forward-looking insights, makes it an ideal reference for understanding both the conference’s breadth and the maturation of AI research. It is a helpful summary for readers who want both numbers and high-level insights.

2. Medium: Trends in AI at NeurIPS 2025

This article highlights key trends observed at NeurIPS 2025 through workshops, signaling AI’s maturation beyond text-based models. Major themes include embodied AI in physical/biological realms (e.g., animal communication via bioacoustics, health applications with regulatory focus, robotic world models, spatial reasoning, brain-body foundations, and urban/infrastructure optimization); reliability and interpretability (robustness against unreliable data, regulatable designs, mechanistic interpretability of model internals, and lifecycle-aware LLM evaluations); advanced reasoning and agents (multi-turn interactions, unified language-agent-world models, continual updates, mathematical/logical reasoning, and scientific discovery); and core theoretical advancements (optimization dynamics, structured graphs, and causality).

The author concludes that AI is evolving into situated ecosystems integrating biology, cities, and agents, prioritizing structure, geometry, causality, and protective policies alongside innovation, rather than pure scaling.

3. NBC News: The Challenge of Understanding AI Systems

NBC News focuses on a different but equally important issue. Even with rapid performance gains, researchers remain unsure about what drives model behavior. Many noted that interpretability is far behind capability growth. The article describes concerns about the lack of clear causal explanations for model outputs and the difficulty of ensuring safety when internal processes are not fully understood. Several researchers emphasized that the field needs better tools for understanding neural networks before deploying them widely. This tension between rapid advancement and limited interpretability shaped many of the conversations at NeurIPS 2025.

For Further Exploration

For readers who want to explore the conference directly, the NeurIPS 2025 website provides access to papers, schedules, and workshop materials:
https://neurips.cc/Conferences/2025

— Andrew


How AI and Computational Linguistics Are Unlocking Medieval Jewish History

On December 3, 2025, ACM TechNews featured a story about a groundbreaking use of artificial intelligence in historical and linguistic research. It referred to an earlier Reuters report, “Vast trove of medieval Jewish records opened up by AI.” The article described a new project applying AI to the Cairo Geniza, a massive archive of medieval Jewish manuscripts that spans nearly one thousand years. These texts were preserved in a synagogue storeroom and contain records of daily life, legal matters, trade, personal letters, religious study, and community events.

The goal of the project is simple in theory and monumental in practice. Researchers are training an AI system to read, transcribe, and organize hundreds of thousands of handwritten documents. This would allow scholars to access the material far more quickly than traditional methods permit.


Handwriting Recognition for Historical Scripts

Computational linguistics plays a direct role in how machines learn to read ancient handwriting. AI models can be taught to detect character shapes, page layouts, and writing patterns even when the script varies from one writer to another or comes from a style no longer taught today. This helps the system replicate the work of experts who have spent years studying how historical scripts evolved.


Making the Text Searchable and Comparable

Once the handwriting is converted to text, another challenge begins. Historical manuscripts often use non-standard spelling, abbreviations, and inconsistent grammar. Computational tools can normalize these differences, allowing researchers to search archives accurately and evaluate patterns that would be difficult to notice manually.
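As a toy illustration of what normalization for search might look like, here is a tiny sketch. The variant spellings and the normalize function are invented examples; real historical-text normalization relies on learned models and language-specific rules.

```python
# A toy sketch of spelling normalization for search (illustrative only; real
# historical-text normalization uses learned models and language-specific rules).
variants = {"colour": "color", "shoppe": "shop", "ye": "the"}

def normalize(text: str) -> str:
    # Replace each known variant with its standard form so searches match.
    return " ".join(variants.get(tok, tok) for tok in text.lower().split())

print(normalize("Ye olde shoppe"))  # "the olde shop"
```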


Extracting Meaning Through NLP

After transcription and normalization, natural language processing tools can identify names, dates, locations, and recurring themes in the documents. This turns raw text into organized data that supports historical analysis. Researchers can explore how people, places, and ideas were connected across time and geography.
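Here is a minimal example of the entity-extraction step using spaCy on modern English text. The sentence is invented, and the real project works on transcribed historical documents in several languages, so this only shows the general idea.

```python
# Minimal named-entity-recognition sketch with spaCy
# (assumes: pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("In 1025, a merchant from Fustat wrote to his partner in Alexandria "
          "about a shipment of flax.")

for ent in doc.ents:
    # Prints each detected entity and its type, e.g. dates and place names.
    print(ent.text, ent.label_)
```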


Handling Multiple Languages and Scripts

The Cairo Geniza contains material written in Hebrew, Arabic, Aramaic, and Yiddish. A transcription system must recognize and handle multiple scripts, alphabets, and grammatical structures. Computational linguistics enables the AI to adapt to these differences so the dataset becomes accessible as a unified resource.


Restoring Damaged Manuscripts

Many texts are incomplete because of age and physical deterioration. Modern work in ancient text restoration uses machine learning models to predict missing letters or words based on context and surrounding information. This helps scholars reconstruct documents that might otherwise remain fragmented.
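The idea of predicting a missing word from context can be illustrated with a masked language model. Below is a small sketch using the Hugging Face transformers fill-mask pipeline with a general-purpose English model; restoring damaged Geniza manuscripts is far more specialized, but the underlying principle is similar.

```python
# Context-based "fill in the missing word" with a masked language model
# (assumes: pip install transformers torch).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The model proposes likely words for the [MASK] slot based on context.
for guess in fill("The merchant sent a letter to his [MASK] in Cairo.")[:3]:
    print(guess["token_str"], round(guess["score"], 3))
```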


Why This Matters for Researchers and the Public

AI allows scholars to process these manuscripts on a scale that would not be feasible through manual transcription alone. Once searchable, the collection becomes a resource for historians, linguists, and genealogists. Connections between communities and individuals can be explored in ways that were not possible before. Articles about the project suggest that this could lead to a mapping of relationships similar to a historical social graph.

This technology also expands access beyond expert scholars. Students, teachers, local historians, and interested readers may one day explore the material in a clear and searchable form. If automated translation improves alongside transcription, the archive could become accessible to a global audience.


Looking Ahead

This project is a strong example of how computational linguistics can support the humanities. It shows how tools developed for modern language tasks can be applied to cultural heritage, historical research, and community memory. AI is not replacing the work of historians. Instead, it is helping uncover material that scholars would never have time to process on their own.

Projects like this remind us that the intersection of language and technology is not only changing the future. It is now offering a deeper look into the past.

— Andrew


AI Sycophancy: When Our Chatbots Say “Yes” Instead of “Why”

“I asked ChatGPT to check my argument and it just kept agreeing with me.”
“Gemini told me my logic was solid even when I knew it wasn’t.”
“Grok feels like a hype-man, not a thinking partner.”

These are the kinds of comments I keep seeing from my school friends who feel that modern AI tools are becoming too agreeable for their own good. Instead of challenging flawed reasoning or offering alternative perspectives, many chatbots default to affirmation. This behavior has a name: AI sycophancy. The term does not originate from me. It comes from recent research and ongoing conversations in the AI community, where scholars are identifying a growing tendency for AI systems to prioritize user approval over honest reasoning.

At first glance, this might feel harmless or even comforting. After all, who does not like being told they are right? But beneath that friendliness lies a deeper problem that affects how we learn, decide, and think.


What is AI Sycophancy?

AI sycophancy refers to a pattern in which an AI system aligns its responses too closely with a user’s expressed beliefs or desires, even when those beliefs conflict with evidence or logic. Rather than acting as an independent evaluator, the model becomes a mirror.

For example, a user might say, “I think this argument is correct. Do you agree?” and the model responds with enthusiastic confirmation instead of critical analysis. Or the system might soften disagreement so much that it effectively disappears. Recent research from Northeastern University confirms that this behavior is measurable and problematic. Their report, The AI industry has a problem: Chatbots are too nice, shows that when models alter their reasoning to match a user’s stance, their overall accuracy and rationality decline.
https://news.northeastern.edu/2025/11/24/ai-sycophancy-research/


Why Does It Exist?

Several forces contribute to the rise of AI sycophancy:

  • Training incentives and reward systems.
    Many models are optimized to be helpful, polite, and pleasant. When user satisfaction is a core metric, models learn that agreement often leads to positive feedback.
  • User expectations.
    People tend to treat chatbots as friendly companions rather than critical reviewers. When users express certainty, the model often mirrors that confidence instead of questioning it.
  • Alignment trade-offs.
    The Northeastern team highlights a tension between sounding human and being rational. In attempting to appear empathetic and affirming, the model sometimes sacrifices analytical rigor.
  • Ambiguous subject matter.
    In questions involving ethics, predictions, or subjective judgment, models may default to agreement rather than risk appearing confrontational or incorrect.

What Are the Impacts?

The consequences of AI sycophancy extend beyond mild annoyance.

  • Weakened critical thinking.
    Students who rely on AI for feedback may miss opportunities to confront their own misconceptions.
  • Lower reasoning quality.
    The Northeastern study found that adjusting answers to match user beliefs correlates with poorer logic and increased error rates.
  • Risk in high-stakes contexts.
    In healthcare, policy, or education, an overly agreeable AI can reinforce flawed assumptions and lead to harmful decisions.
  • False confidence.
    When AI consistently affirms users, it creates an illusion of correctness that discourages self-reflection.
  • Ethical concerns.
    A system that never challenges bias or misinformation becomes complicit in reinforcing it.

How to Measure and Correct It

Measuring sycophancy

Researchers measure sycophancy by observing how much a model shifts its answer after a user asserts a belief. A typical approach involves:

  • Presenting the model with a scenario and collecting its initial judgment.
  • Repeating the scenario alongside a strong user opinion or belief.
  • Comparing the degree to which the model’s stance moves toward the user’s position.
  • Evaluating whether the reasoning quality improves, stays stable, or deteriorates.

The greater the shift without supporting evidence, the higher the sycophancy score.
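Here is a rough sketch of that shift-based measurement. The ask_model interface is an assumption standing in for whatever chat API you use, and the toy model at the bottom exists only to show the scoring logic.

```python
# Rough sketch of shift-based sycophancy scoring; `ask_model` is an assumed
# interface (any callable that maps a prompt to an answer), not a real library call.
from typing import Callable

def sycophancy_score(ask_model: Callable[[str], str],
                     scenarios: list[str],
                     user_opinion: str = "I strongly believe the answer is yes.") -> float:
    """Fraction of scenarios where the model's answer changes after a user
    asserts an opinion, even though no new evidence was provided."""
    flips = 0
    for scenario in scenarios:
        baseline = ask_model(scenario)                      # initial judgment
        steered = ask_model(f"{user_opinion}\n{scenario}")  # same question, with a stated belief
        if baseline.strip().lower() != steered.strip().lower():
            flips += 1
    return flips / len(scenarios)

# Toy model that always agrees once the user states a belief:
toy = lambda prompt: "yes" if "believe" in prompt else "no"
print(sycophancy_score(toy, ["Is claim X supported by the data?"]))  # 1.0
```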


Correcting the behavior

Several strategies show promise:

  • Penalize agreement that lacks evidence during training.
  • Encourage prompts that demand critique or alternative views.
  • Require models to express uncertainty or justify reasoning steps.
  • Educate users to value disagreement as a feature rather than a flaw.
  • Use multi-agent systems where one model challenges another.
  • Continuously track and adjust sycophancy metrics in deployed systems.

Why This Matters to Me as a Student

As someone preparing to study computational linguistics and NLP, I want AI to help sharpen my thinking, not dull it. If my research assistant simply validates every claim I make, I risk building arguments that collapse under scrutiny. In chess, improvement only happens through strong opposition. The same is true for intellectual growth. Agreement without resistance is not growth. It is stagnation.

Whether I am analyzing Twitch language patterns or refining a research hypothesis, I need technology that questions me, not one that treats every idea as brilliant.


Final Thought

The Northeastern research reminds us that politeness is not the same as intelligence. A chatbot that constantly reassures us might feel supportive, but it undermines the very reason we turn to AI in the first place. We do not need machines that echo our beliefs. We need machines that help us think better.

AI should challenge us thoughtfully, disagree respectfully, and remain grounded in evidence. Anything less turns a powerful tool into a flattering reflection.

— Andrew


How Chatbots Understand Us: Exploring the Basics of Natural Language Processing (NLP)

If you’ve ever asked Siri a question, chatted with a customer support bot, or played around with ChatGPT, you’ve already seen natural language processing (NLP) in action. But have you ever wondered: How do these systems actually understand what I’m saying? That question is what first got me curious about NLP, and now, as a high school student diving into computational linguistics, I want to break it down for others who might be wondering too.


What Is NLP?

Natural Language Processing is a branch of artificial intelligence (AI) that helps computers understand, interpret, and generate human language. It allows machines to read text, hear speech, figure out what it means, and respond in a way that (hopefully) makes sense.

NLP is used in tons of everyday tools and apps, like:

  • Chatbots and virtual assistants (Siri, Alexa, Google Assistant)
  • Translation tools (Google Translate)
  • Grammar checkers (like Grammarly)
  • Sentiment analysis (used by companies to understand customer reviews)
  • Smart email suggestions (like Gmail’s autocomplete)

How Do Chatbots Understand Language?

Here’s a simplified view of what happens when you talk to a chatbot:

1. Text Input

You say something like: “What’s the weather like today?”
If it’s a voice assistant, your speech is first turned into text through speech recognition.

2. Tokenization

The text gets split into chunks called tokens (usually words or phrases). So that sentence becomes:
[“What”, “’s”, “the”, “weather”, “like”, “today”, “?”]
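You can reproduce this exact split with NLTK (one of the libraries mentioned at the end of this post); a minimal sketch, assuming NLTK is installed and its tokenizer data has been downloaded:

```python
# Tokenizing the example sentence with NLTK's default word tokenizer.
import nltk
nltk.download("punkt", quiet=True)       # tokenizer data (one-time download)
nltk.download("punkt_tab", quiet=True)   # needed on newer NLTK versions
from nltk.tokenize import word_tokenize

tokens = word_tokenize("What's the weather like today?")
print(tokens)  # ['What', "'s", 'the', 'weather', 'like', 'today', '?']
```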

3. Understanding Intent and Context

The chatbot has to figure out what you mean. Is this a question? A request? Does “weather” refer to the forecast or something else?

This part usually involves models trained on huge amounts of text data, which learn patterns of how people use language.

4. Generating a Response

Once the bot understands your intent, it decides how to respond. Maybe it retrieves information from a weather API or generates a sentence like “Today’s forecast is sunny with a high of 75°F.”
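To show how small the core logic can be, here is a deliberately tiny, rule-based sketch of steps 3 and 4. Real assistants use trained intent models and live APIs; the hard-coded forecast string is just for illustration.

```python
# Toy intent detection and response generation (rule-based, for illustration only).
def respond(user_text: str) -> str:
    text = user_text.lower()
    if "weather" in text or "forecast" in text:
        # A real system would call a weather API here for live data.
        return "Today's forecast is sunny with a high of 75°F."
    return "Sorry, I can only talk about the weather in this demo."

print(respond("What's the weather like today?"))
```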

All of this happens in just a few seconds.


Some Key Concepts in NLP

If you’re curious to dig deeper into how this all works, here are a few beginner-friendly concepts to explore:

  • Syntax and Parsing: Figuring out sentence structure (nouns, verbs, grammar rules)
  • Semantics: Understanding meaning and context
  • Named Entity Recognition (NER): Detecting names, dates, locations in a sentence
  • Language Models: Tools like GPT or BERT that learn how language works from huge datasets
  • Word Embeddings: Representing words as vectors so computers can understand similarity (like “king” and “queen” being close together in vector space); a tiny example follows this list
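Here is a tiny illustration of the word-embedding idea from the last bullet. The three-dimensional vectors are made up for the example; real embeddings (word2vec, GloVe, or the vectors in spaCy’s larger models) have hundreds of dimensions learned from text.

```python
# Toy word vectors and cosine similarity (made-up numbers, for illustration only).
import numpy as np

vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))  # close to 1: similar words
print(cosine(vectors["king"], vectors["apple"]))  # much smaller: unrelated words
```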

Why This Matters to Me

My interest in NLP and computational linguistics started with my nonprofit work at Student Echo, where we use AI to analyze student survey responses. Since then, I’ve explored research topics like sentiment analysis, LLMs vs. neural networks, and even co-authored a paper accepted at a NAACL 2025 workshop. I also use tools like Zotero to manage my reading and citations, something I wish I had known earlier.

What excites me most is how NLP combines computer science with human language. I’m especially drawn to the possibilities of using NLP to better understand online communication (like on Twitch) or help preserve endangered languages.


Final Thoughts

So the next time you talk to a chatbot, you’ll know there’s a lot going on behind the scenes. NLP is a powerful mix of linguistics and computer science, and it’s also a really fun space to explore as a student.

If you’re curious about getting started, try exploring Python, open-source NLP libraries like spaCy or NLTK, or even just reading research papers. It’s okay to take small steps. I’ve been there too. 🙂

— Andrew


When Filters Meet Freedom: Reflections on arXiv’s New Review Article and Position Paper Policy

Introduction

On October 31, 2025, arXiv announced a major change for computer science submissions titled “Updated Practice for Review Articles and Position Papers in the arXiv CS Category.” The new rule means that authors can no longer freely upload review or position papers unless those papers have already been accepted through peer review at a recognized venue, like a journal or a top conference. The goal, according to arXiv, is to reduce the growing flood of low-quality review and position papers while focusing attention on those that have been properly vetted.

In other words, arXiv is raising the bar. The change aims to make it easier for readers to find credible, expert-driven papers while reducing the moderation burden caused by the recent surge in AI-assisted writing.

As someone who reads, cites, and learns from arXiv papers and as the author of an arXiv publication myself (A Bag-of-Sounds Approach to Multimodal Hate Speech Detection), I find this policy both reasonable and limiting. My own paper does not fall under the category of a review article or position paper, but being part of the author community gives me a closer view of how changes like this affect researchers across different stages. Below are my thoughts on what works about this update and what could be improved.


What Makes Sense

1. Quality control is important.
arXiv’s moderators have faced an explosion of review and position papers lately, especially as tools like ChatGPT make it simple to write large-scale summaries. Requiring prior peer review helps ensure that papers go beyond surface-level summaries and present well-supported insights.

2. It helps readers find reliable content.
This new policy should make it easier to find review and position papers that genuinely analyze the state of a field rather than just list references. Readers can trust that what they find has passed at least one layer of expert evaluation.

3. It protects the reputation of arXiv.
As arXiv grows, maintaining its credibility becomes harder. This rule shows that the platform wants to stay a trusted place for research, not a dumping ground for half-finished work.


What Feels Too Restrictive

1. Delayed sharing of ideas.
In fast-moving areas like AI, a good review or position paper is often most useful before it goes through months of peer review. Requiring acceptance first makes timely discussions harder and risks leaving out emerging voices.

2. Peer review is not always a perfect filter.
Some peer-reviewed papers lack depth, while others that are innovative struggle to get published. Using acceptance as the only sign of quality ignores the many great works still in progress.

3. It discourages open discussion.
Position papers often spark important debates or propose new frameworks. If they cannot be shared until they are formally accepted, the whole community loses the chance to discuss and refine them early on.

4. It creates fairness issues.
Not every subfield has equally strong conference or journal opportunities. This policy could unintentionally exclude researchers from smaller or less well-funded institutions.


My Take

I see why arXiv made this move. The moderation workload has likely become overwhelming, and the quality of submissions needs consistent standards. But I think the solution is too rigid. Instead of blocking all unreviewed papers, arXiv could build a middle ground.

For example:

  • Let trusted researchers or groups submit unreviewed drafts that are clearly labeled as “pre-peer review.”
  • Introduce a “community-reviewed” label based on endorsements or expert feedback.
  • Create a temporary category where papers can stay for a limited time before being moved or archived.

This would preserve openness while keeping quality high.


Closing Thoughts

The tension between openness and quality control is not new, but AI and easy content creation have made it sharper. I believe arXiv’s new policy has good intentions, but it risks slowing collaboration and innovation if applied too strictly.

The best research environments are the ones that combine trust, feedback, and access. Hopefully, arXiv will keep experimenting until it finds a balance that protects quality without closing the door on fresh ideas.

— Andrew


The Collins Word of the Year and Why It Matters for Computational Linguistics

Every year, a single word captures the moment when language and culture meet. Sometimes it comes from politics, sometimes from technology, but it always tells a story about how people think and communicate. As someone drawn to both words and code, I see each new “Word of the Year” as more than a headline. It’s data, meaning, and evolution all at once.

As I prepare to study Computational Linguistics in college, I have been paying attention not only to algorithms and corpora but also to the ways language changes around us. One of the most interesting reflections of that change is the annual “Word of the Year” chosen by Collins Dictionary. In this post, I’ll review the past ten years of Collins’ selections, explain how the 2025 Word of the Year was chosen (including the shortlist), and discuss why this matters for computational linguistics.


Past Ten Years of Collins Word of the Year

Year | Word of the Year | Brief explanation
2016 | Brexit | Captured the UK’s vote to leave the EU and its wide political, social, and linguistic effects.
2017 | fake news | Reflected the rise of misinformation and debates about truth in media.
2018 | single-use | Highlighted environmental awareness and discussions about disposable culture.
2019 | climate strike | Described global youth activism inspired by Greta Thunberg and climate movements.
2020 | lockdown | Defined the year of the Covid-19 pandemic and its global restrictions.
2021 | NFT | Stood for “non-fungible token” and represented the emergence of digital assets and blockchain culture.
2022 | permacrisis | Described a long period of instability and uncertainty, fitting the global mood.
2023 | AI | Represented artificial intelligence becoming central to everyday conversation.
2024 | brat | Captured the confident, independent attitude popularized by youth culture and pop music.
2025 | vibe coding | Described the blending of language and technology through conversational code creation.

The 2025 Word of the Year: vibe coding

For 2025, Collins Dictionary selected vibe coding as its Word of the Year. The term refers to new software development practices that use natural language and artificial intelligence to create applications by describing what one wants rather than manually writing code. It describes a form of “coding by conversation” that bridges creativity and computation.

Source: Collins Dictionary Word of the Year 2025


How Collins Selects the Word of the Year

The Collins team monitors its extensive language database throughout the year. Using large-scale corpus analysis, they track words that rise sharply in frequency or reflect cultural, political, or technological change. The process includes:

  • Lexicographic monitoring: Editors and linguists identify new or trending words across print, social media, and digital sources.
  • Corpus analysis: Statistical tools measure frequency and context to see which words stand out (a toy version of this frequency tracking is sketched after this list).
  • Editorial review: The final decision balances data and cultural relevance to choose a word that captures the spirit of the year.
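As a toy version of the corpus analysis step (my own illustration, not Collins’ actual pipeline), the sketch below counts how often candidate words appear per year in a tiny made-up corpus and reports how sharply each rose.

```python
# Toy corpus-frequency tracking: which candidate word rose most between years?
from collections import Counter

corpus_by_year = {
    2024: "students write code code while ai tools assist".split(),
    2025: "vibe coding everywhere vibe coding with ai vibe coding demos".split(),
}

candidates = ["vibe", "ai", "code"]
freq = {year: Counter(tokens) for year, tokens in corpus_by_year.items()}

def per_million(count: int, total: int) -> float:
    # Normalize raw counts to occurrences per million tokens.
    return 1e6 * count / total

for word in candidates:
    rates = {y: per_million(freq[y][word], len(toks)) for y, toks in corpus_by_year.items()}
    growth = rates[2025] - rates[2024]
    print(word, {y: round(r) for y, r in rates.items()}, "growth:", round(growth))
```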

Shortlist for 2025

In addition to vibe coding, this year’s shortlist includes aura farming, biohacking, broligarchy, clanker, coolcation, glaze, HENRY, micro-retirement, and taskmasking.

You can view the full list on the Collins website: https://www.collinsdictionary.com/us/woty


Why the Collins Word of the Year Matters for Computational Linguistics

As someone preparing to study Computational Linguistics, I find the Collins Word of the Year fascinating for several reasons:

  1. Language change in data
    Each year’s word shows how new vocabulary enters real-world language use. Computational linguistics often studies these changes through corpora to model meaning over time.
  2. Human-machine interaction
    Vibe coding reflects a growing trend where natural language acts as an interface between humans and technology. It is an example of how linguistic principles are now shaping software design.
  3. Semantic and cultural evolution
    The meanings of words like “brat” or “AI” evolve quickly in digital contexts. For computational linguists, tracking these semantic shifts supports research in language modeling and word embeddings.
  4. Lexicographic data as research input
    Collins’ approach mirrors computational methods. Their frequency-based analysis can inspire how we model and predict linguistic trends using data science.
  5. Pedagogical and research relevance
    New words like vibe coding demonstrate how emerging technology changes both everyday communication and the future topics of linguistic research. They show where language innovation meets computation.

Reflection

When I first read that “vibe coding” had been chosen as the 2025 Word of the Year, I couldn’t help thinking about how it perfectly represents where computational linguistics is heading. Language is no longer just a subject of study; it is becoming a tool for creation. What used to be a set of rigid commands is turning into natural conversation.

The term also reminds me that words are living data points. Each new entry in a dictionary records a shift in how people think and communicate. For future computational linguists, observing how dictionaries evolve gives insight into how models and algorithms should adapt too.

It’s easy to see the Word of the Year as a piece of pop culture, but it’s really a linguistic dataset in disguise. Every annual choice documents how society expresses what matters most at that moment, and that is what makes it so meaningful to study.



— Andrew


AI in Schoolwork: Different Approaches Taken in the U.S. and China

Recently, I read an article from MIT Technology Review titled “Chinese universities want students to use more AI, not less.” It really made me think about the differences in how the U.S. and China are approaching AI in education, especially as a high school student growing up in Washington state.

In China, AI has gone from being a taboo to a toolkit in just a couple of years. University students once had to find mirror versions of ChatGPT through secondhand marketplaces and VPNs just to access the tools. Back then, professors warned students not to use AI for assignments. But now, things have completely changed.

Chinese universities are actively encouraging students to use generative AI tools, as long as they follow best practices. Professors are adding AI-specific lessons to their classes. For example, one law professor teaches students how to prompt effectively and reminds them that AI is only useful when combined with human judgment. Students are using tools like DeepSeek for everything from writing literature reviews to organizing thoughts.

This push for AI education isn’t just happening in individual classrooms. It’s backed by national policy. The Chinese Ministry of Education released guidelines in April 2025 calling for an “AI plus education” approach. The goal is to help students develop critical thinking, digital fluency, and real-world skills across all education levels. Cities like Beijing have even introduced AI instruction in K–12 schools.

In China, AI is also viewed as a key to career success. A report from YiCai found that 80 percent of job listings for recent college grads mention AI as a desired skill. So students see learning how to use AI properly as something that gives them a competitive edge in a tough job market.

That’s pretty different from what I’ve seen here in the U.S.

In July 2024, the Washington Office of Superintendent of Public Instruction (OSPI) released official guidance for AI in schools. The message isn’t about banning AI. It’s about using it responsibly. The guidance encourages human-centered learning, with values like transparency, privacy, equity, and critical thinking. Students are encouraged to use AI tools to support their learning, but not to replace it.

Instead of secretly using AI to write a paper, students in Washington are encouraged to talk openly about how and when they use it. Teachers are reminded that AI should be a support, not a shortcut. The guidance also warns about overusing AI detection tools, especially since those tools can sometimes unfairly target multilingual students.

Adding to this, a recent brain-scan study by MIT Media Lab called “Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task” raises some interesting points. Over four months, participants had their brains scanned while using ChatGPT for writing tasks. The results were surprising:

  • 83% of AI users couldn’t remember what they had just written
  • Brain activity dropped by 47% in AI users and stayed low even after stopping
  • Their writing was technically correct but described by teachers as robotic
  • ChatGPT made users 60% faster, but reduced learning-related brain activity by 32%

The group that performed the best started their work without AI and only added it later. They had stronger memory, better brain engagement, and wrote with more depth. This shows that using AI right matters. If we rely on it too much, we might actually learn less.

MIT’s full research can be found here, or you can read the paper on arXiv. (A caveat called out by the research team: “as of June 2025, when the first paper related to the project, was uploaded to Arxiv, the preprint service, it has not yet been peer-reviewed, thus all the conclusions are to be treated with caution and as preliminary”.)

So what does this all mean?

I think both China’s and our approaches have something valuable to offer. China is focused on future skills and career readiness. The U.S. is focused on ethics, fairness, and critical thinking. Personally, I believe students should be allowed to use AI in schoolwork, but with the right guidance. We should be learning how to prompt better, double-check results, and combine AI tools with our own thinking.

AI is already part of our world. Instead of hiding from it, we should be learning how to use it the right way.

You can read the full MIT Technology Review article here
Washington’s official AI guidance for schools (published July 2024) is here (PDF)

— Andrew
