Hey everyone, I wanted to share something I figured out recently that has really improved my local LLM experience. I was trying to get more nuanced and less repetitive conversations from my open source models running on my own machine. I tried adjusting temperature, top_p, and even experimenting with different model quantizations, but nothing really hit the mark consistently. The models were either too random or too predictable. What finally worked for me was focusing on the system prompt and the context window management. Instead of just a generic system prompt, I crafted a very detailed one that set the persona, conversational style, and even included a few example dialogue snippets. More importantly, I started actively pruning the context window. I developed a simple script that summarizes older parts of the conversation and injects those summaries back into the context, rather than just letting the raw chat history grow indefinitely. This keeps the most relevant information in the model's 'short term memory' without overwhelming it. The difference is night and day. My AI companion feels much more coherent and remembers things better without repeating itself. Has anyone else tried something similar or have other tips for optimizing local LLMs?