Xiaol.x
I'm a researcher; this channel is just my reading list.
My current research focuses on RNNs in large language models (LLMs), in particular how to adapt transformer-based models, such as the 671B R1, to use RNN-style attention. You can check out my ongoing work on ARWKV here:
https://huggingface.co/papers/2501.15570
X: https://x.com/xiaolGo
Papers: https://scholar.google.com/citations?user=TPJYxnkAAAAJ

Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Engineering

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

Don't Pay Attention

High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications

pLSTM: parallelizable Linear Source Transition Mark networks

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Low-Rank Thinning

Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention

Spurious Rewards: Rethinking Training Signals in RLVR

How we built our multi-agent research system

Google's Approach for Secure AI Agents

Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

Self-Challenging Language Model Agents

Top AI Papers of the Week from elvis

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Sampling 3D Molecular Conformers with Diffusion Transformers

Inherently Faithful Attention Maps for Vision Transformers

Transformers Learn Faster with Semantic Focus

Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity

RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models

Attention-Only Transformers via Unrolled Subspace Denoising

Pursuing the Nature of Intelligence

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Reinforcement Learning Teachers of Test Time Scaling

Solving Inequality Proofs with Large Language Models

What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

RWKV-IF: Efficient and Controllable RNA Inverse Folding via Attention-Free Language Modeling