Tokens And Context
Tokens are the pieces of text the AI works with when it reads your messages and generates replies.
Every word (or part of a word), symbol, and emoji uses tokens.
Understanding tokens and the context window helps you get longer, more consistent chats.
What are tokens?
Tokens are the units the AI uses to process text.
Every time you:
send a message
receive a reply
create a character
write a greeting, scenario, or example dialogue
…you are using tokens.
Token limits depend on your chosen AI model and your subscription tier.
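There is no universal formula for counting tokens, and the exact tokenizer depends on the model. As a rough rule of thumb, English text averages about four characters per token. The sketch below is only a back-of-the-envelope estimator built on that assumption, not the real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters-per-token
    rule of thumb for English text. Real tokenizers vary by model,
    so treat the result as an approximation only."""
    return max(1, round(len(text) / 4))

# Illustrative usage: estimate the cost of a short chat message.
message = "Every word, symbol, and emoji uses tokens."
print(estimate_tokens(message), "tokens (approximate)")
```

Emoji, punctuation, and non-English text often cost more tokens per character than plain English words, so real counts can differ noticeably from this estimate.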
How tokens affect character creation
When creating a character, every field uses tokens:
personality
greeting
scenario
background and lore
example dialogues
You can see token usage under each text box.
A common target is keeping character setup concise (often around 800–1,100 tokens). This leaves more room for the conversation.
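The budget check above can be sketched as simple arithmetic. The field sizes below are made-up illustrative numbers (Pixelchat shows the real count under each text box); only the 800–1,100 guideline comes from the text:

```python
# Hypothetical token counts for each character field (illustrative only).
fields = {
    "personality": 320,
    "greeting": 180,
    "scenario": 150,
    "background_and_lore": 250,
    "example_dialogues": 150,
}

TARGET = 1100  # upper end of the common 800-1,100 token guideline
total = sum(fields.values())

print(f"Character setup: {total} tokens")
if total > TARGET:
    print(f"Over the guideline by {total - TARGET} tokens; trim the longest fields.")
else:
    print(f"{TARGET - total} tokens of headroom left under the guideline.")
```

Every token saved in the character setup is a token freed up for chat history.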
What is a context window?
The context window is the amount of text the AI can “see” at one time. It uses that text to generate the next reply.
Think of it like a desk with limited space:
your recent messages are on the desk
the character definition is on the desk
recent replies are on the desk
generation instructions and settings are on the desk
When the desk gets full, older messages are removed to make room.
Why this matters
If an older message no longer fits, the AI may not use it. That can break continuity in long chats.
Older details stay consistent if they are:
repeated
summarized
stored using memory systems
Simple rule: the AI remembers the most recent parts best.
Why older messages get forgotten
The AI does not have unlimited live memory during a chat.
As your chat grows, the system must fit all of this into the context window:
character definition
your messages
character replies
generation settings and instructions
When there is no room left, the oldest messages drop out.
Once a message is outside the current context window, the AI may no longer use it.
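The trimming described above can be sketched as a sliding window. This is a simplified illustration (real systems also reserve space for settings and formatting overhead, and messages are not all the same length):

```python
def visible_history(messages, definition_tokens, context_limit,
                    tokens_per_msg=150):
    """Sketch of context trimming: keep only the most recent messages
    that fit in the budget left after the character definition.
    Illustrative only; assumes a fixed average message size."""
    budget = context_limit - definition_tokens
    keep = budget // tokens_per_msg
    return messages[-keep:] if keep > 0 else []

chat = [f"message {i}" for i in range(1, 41)]  # 40 messages so far
kept = visible_history(chat, definition_tokens=1000, context_limit=4096)
print(len(kept), "recent messages still in context; the oldest dropped out")
```

Note that the oldest messages are simply absent from the result: once dropped, the AI cannot use them unless you repeat or summarize them.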
Why we don’t always use the model’s maximum context
Modern models can support very large context windows.
Larger context also increases:
cost
response time
compute usage
To keep Pixelchat fast and affordable, we set context limits by tier.
How the context window fills up (examples)
Your context window is shared by everything the AI needs for a reply, including:
character definition
recent messages
recent replies
generation instructions and settings
So even if your tier supports 4,096 tokens, not all 4,096 are available for chat history.
Example: Free tier (4k token context)
Let’s say:
character definition = 1,000 tokens
average message (user or bot) = ~150 tokens
That leaves roughly 3,096 tokens for recent conversation content, before formatting and instruction overhead is subtracted.
At ~150 tokens per message, the AI may keep only around 20 recent messages visible. This is a rough estimate; exact numbers vary with message length, character setup size, and generation settings.
Example: I'm All In tier (16k token context)
With a larger context window, more of the recent chat stays visible.
In many chats this means dozens more messages stay “in memory”; in short-message chats, it can sometimes be close to 100 previous messages.
This is an estimate, not a guarantee. Longer messages reduce how many fit.
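Both tier examples follow the same arithmetic. The sketch below reuses the article's illustrative numbers (a 1,000-token character definition and ~150 tokens per message) and ignores instruction and formatting overhead, so real numbers will be somewhat lower:

```python
def estimated_visible_messages(context_limit, definition_tokens=1000,
                               tokens_per_msg=150):
    """Back-of-the-envelope estimate of how many recent messages fit
    in the context window. Ignores instruction/formatting overhead."""
    return (context_limit - definition_tokens) // tokens_per_msg

for tier, limit in [("Free (4k)", 4096), ("I'm All In (16k)", 16384)]:
    print(f"{tier}: roughly {estimated_visible_messages(limit)} messages visible")
```

With the 4,096-token context this works out to about 20 messages; with 16,384 tokens it is just over 100, matching the estimates above.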
Context window by subscription tier
This is the maximum conversation context (the total amount of text the AI can work with at once):
Free Users: Up to 4,096 tokens
Get A Taste Users: Up to 4,096 tokens
True Supporter Users: Up to 8,192 tokens
I’m All In Users: Up to 16,384 tokens
This total is shared across everything needed for the reply. It is not just your most recent messages.
How this affects long conversations
As a conversation gets longer:
the AI keeps the most recent messages
older messages may drop out of the context window
continuity can weaken over time
If an important detail was mentioned much earlier, repeat it or summarize it.
Reply tokens vs context window
These are not the same thing.
1) Context window
How much total text the AI can see and use at once.
2) Reply tokens
How many tokens the AI can spend generating a single reply.
If the reply token limit is too low, responses get shorter or are cut off, even with a large context window.
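The difference can be shown with a toy sketch: generation stops when the per-reply cap is reached, regardless of how much context is available. The numbers below reuse the Free-tier cap of 180 tokens per reply; the token strings are placeholders:

```python
def generate(tokens_wanted, max_reply_tokens):
    """Toy sketch: the model emits tokens one at a time and stops at
    the per-reply cap, even if the context window still has room."""
    emitted = []
    for i in range(tokens_wanted):
        if len(emitted) >= max_reply_tokens:
            break  # reply is cut off here
        emitted.append(f"tok{i}")
    return emitted

# A reply that "wants" 250 tokens under a 180-token-per-reply cap:
reply = generate(tokens_wanted=250, max_reply_tokens=180)
print(len(reply), "tokens emitted; the reply stopped short")
```

Raising the reply token limit in Generation Settings allows longer replies, but does not change how much history the AI can see.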
Reply token limits by tier
Per-reply tokens depend on your settings and subscription tier:
Free and Get A Taste Users: Up to 180 tokens per reply
True Supporter and I’m All In Users: Up to 300 tokens per reply
You can change this in Generation Settings.
Many users confuse reply length with memory.
They are related, but they are not the same thing:
Reply tokens = how long a response can be
Context window = how much the AI can “see” at once
Semantic Memory (subscriber benefit)
Subscriber tiers may also benefit from Semantic Memory 2.0.
Semantic memory helps preserve important details from earlier in the chat, even when the original messages no longer fit in the context window. This improves long-term continuity in long roleplays.
Learn more in Semantic Memory 2.0 and Memory Manager.
Getting the most out of your tokens
Keep character definitions concise and focused
Avoid long greetings and huge example dialogues unless needed
Repeat or summarize key details in long chats
Increase reply tokens in Generation Settings for longer replies
Use higher tiers for a larger context window and better continuity
Quick summary
Tokens measure text
Reply tokens affect response length
Context window affects what the AI can use at once
In long chats, older messages may drop out of active context
Higher tiers allow larger context windows
Semantic memory helps preserve important details beyond active context