We propose TextOp, a novel framework for real-time, interactive, text-driven motion generation and control of humanoid robots. Users instruct the robot in natural language and can change commands on the fly, and the robot responds with smooth whole-body motion in real time. TextOp uses a two-level architecture. At the high level, an autoregressive motion diffusion model conditions on the user's current text command to generate a kinematic motion trajectory. At the low level, a universal motion tracking policy follows that trajectory to produce motor commands. This separation lets TextOp combine fast responsiveness with precise robot control. TextOp supports a wide range of behaviours, from simple gestures to complex motion sequences, without pre-recorded scripts or manual programming. This approach offers a more intuitive human-robot interaction paradigm and a step toward adaptable, easily controllable robots in real-world applications.
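The two-level structure described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in and not the TextOp implementation: `ToyMotionGenerator` replaces the diffusion model with a simple interpolation toward a per-command target pose, `ToyTrackingPolicy` replaces the learned tracking policy with a proportional controller, and the three-joint robot, the command names, and the gains are assumptions chosen only to make the control flow visible.

```python
NUM_JOINTS = 3  # hypothetical toy robot, not a real humanoid


class ToyMotionGenerator:
    """Stand-in for the high-level generator (NOT a diffusion model).

    It is autoregressive in the sense that each chunk is generated from
    the last frame of the previous chunk plus the current text command.
    """

    # Hypothetical command-to-target-pose table, for illustration only.
    TARGETS = {"wave": [1.0, 0.0, 0.0], "stand": [0.0, 0.0, 0.0]}

    def next_chunk(self, command, last_frame, chunk_len=4):
        # Unknown commands hold the last generated pose.
        target = self.TARGETS.get(command, last_frame)
        frame = list(last_frame)
        chunk = []
        for _ in range(chunk_len):
            # Move halfway toward the target on every frame.
            frame = [f + 0.5 * (t - f) for f, t in zip(frame, target)]
            chunk.append(frame)
        return chunk


class ToyTrackingPolicy:
    """Stand-in for the low-level tracking policy (a proportional controller)."""

    def act(self, joint_state, reference, gain=0.8):
        return [gain * (r - q) for q, r in zip(joint_state, reference)]


def run_textop_loop(commands, chunk_len=4):
    """Run one generated chunk per entry in `commands`.

    The command may differ between chunks, which mimics a user changing
    the instruction while the robot is moving.
    """
    generator = ToyMotionGenerator()
    policy = ToyTrackingPolicy()
    state = [0.0] * NUM_JOINTS     # simulated joint positions
    last_ref = [0.0] * NUM_JOINTS  # last generated kinematic frame
    for command in commands:
        # High level: text command + motion history -> kinematic chunk.
        chunk = generator.next_chunk(command, last_ref, chunk_len)
        for ref in chunk:
            # Low level: track each reference frame with a motor action.
            action = policy.act(state, ref)
            # Toy plant: the action is applied as a joint position delta.
            state = [q + a for q, a in zip(state, action)]
        last_ref = chunk[-1]
    return state


if __name__ == "__main__":
    # The user asks for "wave", then switches to "stand" mid-run.
    print(run_textop_loop(["wave"] * 5 + ["stand"] * 5))
```

Because the generator is queried once per chunk with whatever command is current, a change of command takes effect at the next chunk boundary, while the low-level loop keeps tracking frames without interruption. In this sketch the chunk length therefore bounds the response latency to a new command.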