I am tweaking my methodology and system tools for testing AI models.
Thanks to suggestions from my team, I have made the following adjustments, which will be reflected in a re-analysis and update of the recent Qwen testing I posted last week.
Changes:
- Longer response-time allowances for thinking/reasoning models, to accommodate extended reasoning loops, and for Mixture-of-Experts (MoE) models
- Greater tolerance for speed and performance limitations of the testing hardware. My M1 Mac is definitely showing its age, so the grading now takes that into account more.
Because timing is part of the grade, these changes to timing will also shift the overall scores.
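For readers wondering how a response-time allowance could flow through to the overall score, here is a minimal sketch. Everything in it is hypothetical: the post does not describe the actual scoring formula, so the per-model multipliers, the `hardware_factor`, the linear decay past the budget, and the 20% timing weight are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical allowance multipliers per model type (not the author's real values).
MODEL_ALLOWANCE = {"standard": 1.0, "reasoning": 2.0, "moe": 1.5}


@dataclass
class TimingRule:
    base_budget_s: float          # hypothetical baseline time budget per prompt, in seconds
    hardware_factor: float = 1.0  # > 1.0 gives slower or older test machines more slack

    def budget(self, model_kind: str) -> float:
        """Time budget after applying the model-type and hardware allowances."""
        return self.base_budget_s * MODEL_ALLOWANCE[model_kind] * self.hardware_factor

    def score(self, elapsed_s: float, model_kind: str) -> float:
        """1.0 within budget, decaying linearly to 0.0 at twice the budget."""
        b = self.budget(model_kind)
        if elapsed_s <= b:
            return 1.0
        return max(0.0, 1.0 - (elapsed_s - b) / b)


def overall(quality: float, timing: float, timing_weight: float = 0.2) -> float:
    """Weighted blend of answer quality and timing (the weight is an assumption)."""
    return (1 - timing_weight) * quality + timing_weight * timing


old_rule = TimingRule(base_budget_s=30.0)                        # no extra allowances
new_rule = TimingRule(base_budget_s=30.0, hardware_factor=1.25)  # aging-hardware slack

old_timing = old_rule.score(90.0, "standard")
new_timing = new_rule.score(90.0, "reasoning")
print(old_timing, new_timing)
print(overall(0.9, old_timing), overall(0.9, new_timing))
```

With a 30 s base budget, a reasoning model that takes 90 s scores 0 on timing under the untouched rule, but about 0.8 once the reasoning multiplier and a 1.25 hardware factor widen its budget to 75 s. That change in the timing component is what moves the overall score, even though the answer quality (0.9 here) is the same.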