Technopolis Group

Governing AI-assisted evaluation in public innovation funding: where efficiency ends and judgement begins

Artificial Intelligence is already changing how funding systems process information. But when public money, innovation policy and territorial equity are at stake, the central question is not whether AI can evaluate better than humans. It is which parts of evaluation can be supported by AI without weakening fairness, additionality, confidentiality and public accountability.

Public innovation funding systems are under pressure. Calls are increasingly competitive, application volumes are growing, evaluators are scarce, and the reviewer time these processes consume is itself a substantial and rising cost (Aczel, Szaszi & Holcombe, 2021). Public authorities are expected to deliver decisions faster while maintaining procedural rigour. In this context, the appeal of Artificial Intelligence is clear: large language models and machine learning tools can process documents, identify inconsistencies, summarise complex files and support reviewers at a speed no human system can easily match.

Yet innovation funding is also a public policy instrument, and its evaluation procedures cannot be treated as purely administrative exercises. Through funding decisions, the State exercises directionality: it selects, legitimises and accelerates particular technological, economic and territorial pathways. Evaluation, in this sense, does not simply measure the intrinsic quality of applications; it also helps constitute policy direction. Choices about which projects receive support shape technological trajectories, regional development, business competitiveness and the distribution of public resources. The use of AI in evaluation must therefore be assessed not only by reference to efficiency, but also against the standards that give public funding its legitimacy in the first place: transparency, proportionality, contestability, equal treatment and human responsibility.

The debate therefore needs to move beyond the simplistic opposition between “AI versus human evaluators”. The more relevant distinction is between AI that informs a decision and AI that determines a decision.

From eligibility to merit: three layers of evaluation

Evaluation is often described as a two-step process: eligibility first, merit later. In practice, the boundary is more complex. Public innovation funding involves at least three layers of judgement, each with a different degree of codifiability and risk.

The first layer is formal eligibility. This includes verifiable criteria such as deadlines, documentation, company status, location, tax and social security compliance, expenditure thresholds or the presence of mandatory forms. These are largely rule-based checks. AI can provide clear value here, particularly when integrated into controlled administrative systems.
The second layer is interpretative compliance. This is the grey zone between eligibility and merit. It includes questions such as the reasonableness of costs, the coherence of the work plan, alignment with the objectives and priorities of the call, compliance with the “do no significant harm” (DNSH) principle, the demonstration of the incentive effect, the avoidance of double funding, or preliminary alignment with Smart Specialisation Strategies (S3). These tasks are not pure merit assessment, but they already require contextual interpretation. AI can support them, but only as a supervised analytical tool.
The third layer is strategic and substantive merit. This includes novelty, additionality, breakthrough potential, territorial impact, expected externalities, implementation quality, relative ranking and contribution to policy objectives. This is where evaluation becomes normative and comparative. Merit is not an intrinsic property waiting to be detected by a model. It is the result of applying public policy criteria to a specific project, in a specific context, under a specific funding instrument.

This is where AI becomes most fragile. The decisive variable is not the technical complexity of the task but the proximity of the AI output to the administrative act of deciding: risk rises as a system moves from informing a reviewer towards determining an outcome. The operational line to draw is therefore not simply between eligibility and merit, but between support and delegation.

AI can help test, organise and challenge information, but it should not become the mechanism that scores proposals, ranks applicants or determines outcomes.
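
The difference between support and delegation can be made concrete for the first layer. The following is a minimal Python sketch, not a description of any funder's actual system: the `Application` fields, the deadline, the document list and the amount threshold are all hypothetical values chosen for illustration. The point it illustrates is that the check returns flags for a reviewer to verify and never rejects an application itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Application:
    # All field names are hypothetical, chosen only for illustration.
    applicant: str
    submitted_on: date
    documents: set[str] = field(default_factory=set)
    requested_amount: float = 0.0


# Illustrative call rules (assumptions); a real call defines its own.
DEADLINE = date(2025, 6, 30)
REQUIRED_DOCUMENTS = {"application_form", "tax_compliance", "budget"}
MAX_AMOUNT = 500_000.0


def eligibility_flags(app: Application) -> list[str]:
    """Return human-readable flags; an empty list means nothing was found.

    The function informs a reviewer. It does not decide: no application
    is accepted or excluded here.
    """
    flags = []
    if app.submitted_on > DEADLINE:
        flags.append(f"submitted after deadline ({app.submitted_on.isoformat()})")
    missing = sorted(REQUIRED_DOCUMENTS - app.documents)
    if missing:
        flags.append("missing documents: " + ", ".join(missing))
    if app.requested_amount > MAX_AMOUNT:
        flags.append("requested amount above threshold")
    return flags


example = Application(
    applicant="Example Lda",
    submitted_on=date(2025, 6, 28),
    documents={"application_form", "budget"},
    requested_amount=250_000.0,
)
print(eligibility_flags(example))  # ['missing documents: tax_compliance']
```

Keeping the output as a list of reasons, rather than a pass/fail value, leaves the administrative act of deciding with the human reviewer.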

What the evidence tells us

The strongest case for AI in the evaluation of innovation funding applications is operational. Evidence from adjacent domains shows major productivity gains in document-heavy review processes. Large language models can summarise, classify and compare technical documents far more quickly than human reviewers. In real-world Portuguese deployments involving COMPETE/Portugal 2030 and the Environmental Fund, AI-supported review improved processing efficiency across both pilots; in the energy-efficiency reimbursement case specifically, reviewer productivity rose by around 20% with a negligible false-positive rate (Marques et al., 2025).

For managing authorities, the immediate gain is not better judgement but lower administrative friction: fewer missing documents, clearer internal consistency checks and more reviewer time freed for the genuinely discretionary parts of evaluation. AI can help identify missing information, flag internal inconsistencies, compare declared information against requirements and provide reviewers with structured summaries. Used well, it improves consistency and reduces the burden on evaluation teams.

However, the evidence becomes much less reassuring when AI approaches substantive judgement. Studies comparing human and AI reviewers show that AI may align with humans in identifying clearly strong or clearly weak cases, but struggles with fine-grained merit differentiation (Shcherbiak et al., 2024; Al-Ibrahim, 2024). This is particularly problematic in competitive funding calls, where the decisive choices often occur in the middle of the distribution, not at the extremes.

The evidence from implementation also points in the same direction. The “la Caixa” Foundation has piloted AI-assisted pre-screening of biomedical proposals, but with human reviewers checking rejections (Carbonell Cortés et al., 2024). In India, evidence from a national funding programme has shown that automated proposal screening can create false-negative risks, potentially excluding promising proposals before expert review (Nagarajappa et al., 2026). The Swiss National Science Foundation, by contrast, has used machine learning for reviewer matching and the analysis of review reports rather than for merit decisions (Okasa & Jorstad, 2024). The Research on Research Institute’s Funding by Algorithm handbook (Newman-Griffis et al., 2025), based on the experience of funders across several countries, also stresses that responsible use should begin with people and problems, not with the technology.

Across the cases reviewed, AI looks most defensible in administrative verification, in structuring documents and in helping reviewers prepare. Once its outputs start to approach scoring, ranking or exclusion, the evidence thins and the institutional risks climb sharply.

The risk of confusing fluency with quality

One of the most important risks is that AI may reward the quality of the application text rather than the quality of the underlying project. This bias is not unique to AI: human evaluators may also be influenced by fluency, narrative coherence and the professional presentation of a proposal. The difference is that AI systems may reproduce this bias more systematically, at greater scale and with a stronger appearance of objectivity. Well-written, polished and consultancy-supported proposals may appear more credible to language-based systems, even when their substantive innovation is limited. Conversely, projects from smaller firms, less experienced applicants or low-density territories may be penalised because they are less polished, less standardised or less aligned with the vocabulary of the call.

This creates three connected biases.

First, writing bias: fluent proposals may be interpreted as better proposals.

Second, documentation bias: organisations with stronger administrative capacity may appear more robust, regardless of the actual quality of the innovation.

Third, strategic conformity bias: applications that reproduce the language of funding priorities (digital transition, sustainability, resilience, S3 alignment) may be favoured even when the alignment is rhetorical rather than substantive.

For cohesion policy, this is not a technical detail but a core concern, because administrative capacity is never neutral. If AI-assisted evaluation systematically rewards applicants who are better able to produce fluent, standardised and policy-aligned narratives, it may convert differences in proposal-writing capacity into differences in access to public investment — potentially turning instruments designed to reduce territorial and organisational inequalities into instruments that quietly amplify them.

When AI sits on both sides

A further challenge is emerging as AI enters both sides of the funding process. Applicants increasingly use generative AI to structure, polish and optimise their proposals, while funders explore AI tools to screen, summarise and compare them. Evaluation may then become a dialogue between machine-optimised applications and machine-assisted review, leaving humans to check outputs that algorithmic language has already shaped. If evaluation begins to reward the textual optimisation of applications rather than the quality of the projects themselves, automation may end up reinforcing the conformity bias noted above instead of improving the assessment process.

Why innovation judgement cannot be reduced to pattern recognition

Innovation evaluation is inherently forward-looking: it requires judgement about what is new, what is credible, what is feasible and what could generate public value. This is difficult for systems trained on past patterns to get right.

AI models are powerful pattern recognisers, but breakthrough innovation is often valuable precisely because it does not fully resemble the past. A system calibrated on historical decisions may reproduce the types of proposals that were previously funded, rather than identify the projects that should be supported under current policy objectives. And even where large language models show some capacity to produce ideas judged as novel, this concerns the generation of ideas rather than the comparative, policy-bound assessment of merit that evaluation requires (Si, Yang & Hashimoto, 2025).

This is especially important across EU instruments, from Horizon Europe and the European Innovation Council (EIC) to national Recovery and Resilience Plans and cohesion policy programmes funded by the European Regional Development Fund (ERDF). Each has a different understanding of merit. In Horizon Europe, the challenge is scientific excellence, impact and implementation quality; in the EIC, it is breakthrough potential, market creation and entrepreneurial capacity. In Recovery and Resilience Plans, it is systemic impact, milestones, reform logic and structural transformation; and in ERDF-funded programmes, it is additionality, territorial development, regional transformation and contribution to cohesion.

AI can help read, organise and test information across these instruments, but the core of the evaluation should remain human because it is a policy judgement.

Governing the boundary between support and delegation

The relevant policy question is institutional rather than technological: under what conditions does AI use stay compatible with legality, fairness, accountability and expert judgement?

The central governance question is therefore not whether a human remains somewhere in the loop, but where decision rights are located: who interprets the evidence, who weighs trade-offs, who signs the judgement and who remains accountable when the decision is challenged. In this sense, oversight only works when responsibility is clearly allocated. Once it shrinks to a formality, it offers little real protection.

Between administrative support and final decision-making lies a set of intermediate uses (triage, risk flagging, benchmarking and draft synthesis) whose legitimacy depends less on the technology itself than on whether the human reviewer can understand, challenge and override the output. A defensible model for AI-assisted evaluation should therefore distinguish between three levels of use:

Administrative AI can be used for formal eligibility and document validation. This includes checking completeness, identifying missing fields, supporting consistency checks and helping reviewers process large volumes of information. This is the lowest-risk and highest-value use case.
Analytical AI can be used as a non-binding second reader. It may flag inconsistencies, summarise technical content, compare claims against available evidence, identify risks, benchmark information and support reviewers in preparing their analysis. This use is acceptable when outputs are transparent, auditable and subject to human supervision.
Evaluative AI should be treated with extreme caution. This includes scoring, ranking, excluding, selecting or producing final autonomous justifications. In the current evidential, ethical and regulatory context, these functions should remain limited or prohibited in public innovation funding.

A practical governance architecture should therefore include: closed or certified processing environments; guarantees that application content is neither retained nor used for model training; systematic logging of where AI has been used; human review of exceptions; auditing of divergences between AI outputs and expert judgement; monitoring of false negatives; and regular assessment of distributive effects across territories and types of applicants.
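
Two of these safeguards, auditing divergences and monitoring false negatives, reduce to simple arithmetic once AI outputs and expert judgements are recorded side by side. The sketch below is illustrative only. It assumes that experts also review the proposals the AI pre-screen would have set aside (otherwise false negatives cannot be observed), and the `ScreeningRecord` fields and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningRecord:
    # Hypothetical record layout, for illustration only.
    proposal_id: str
    ai_pass: bool      # AI pre-screen suggested forwarding to expert review
    expert_pass: bool  # expert judged the proposal worth full review


def audit(records: list[ScreeningRecord]) -> dict:
    """Summarise how often the AI pre-screen disagrees with experts."""
    divergent = [r for r in records if r.ai_pass != r.expert_pass]
    expert_positive = [r for r in records if r.expert_pass]
    # False negative: experts see merit, but the AI would have set it aside.
    false_negatives = [r for r in expert_positive if not r.ai_pass]
    return {
        "divergence_rate": len(divergent) / len(records) if records else 0.0,
        "false_negative_rate": (
            len(false_negatives) / len(expert_positive) if expert_positive else 0.0
        ),
        "false_negative_ids": [r.proposal_id for r in false_negatives],
    }


# Invented sample data.
sample = [
    ScreeningRecord("P1", ai_pass=True, expert_pass=True),
    ScreeningRecord("P2", ai_pass=False, expert_pass=True),
    ScreeningRecord("P3", ai_pass=False, expert_pass=False),
    ScreeningRecord("P4", ai_pass=True, expert_pass=False),
    ScreeningRecord("P5", ai_pass=True, expert_pass=True),
]
result = audit(sample)
print(result["divergence_rate"])     # 0.4
print(result["false_negative_ids"])  # ['P2']
```

Listing the identifiers of false negatives, and not only the rate, lets the funder re-examine exactly which proposals would have been lost before expert review.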

Within this architecture, confidentiality and model learning must be treated as distinct concerns. For publicly funded programmes the relevant question is not simply whether a generative tool is used, but how application data is handled: whether processing takes place in a closed or certified environment, whether content is retained, whether it is used to train or fine-tune the underlying model, what is logged, whether data crosses borders, and who controls access. The updated ERA Living Guidelines make this concern explicit, drawing attention to the risks posed by third parties and by instructions embedded in documents that may shape AI outputs without the user’s knowledge.

What international experience tells us

The cases reviewed do not point to a single institutional model, but they do suggest a common boundary: AI is being adopted around the evaluation process rather than as the evaluator itself. Public funding organisations that have adopted AI typically use it for:

Document processing
Workflow triage / routing
Reviewer matching
Consistency checks
Portfolio analysis

By contrast, in none of the major cases reviewed is there robust evidence of final merit assessment being fully delegated to AI systems. Experiences from Portugal, Spain, Switzerland and Norway all point in the same direction: AI creates value when supporting administrative and analytical processes, but the final assessment of innovation remains a human responsibility.

Emerging public-funder practice points the same way. The Research Council of Finland, for instance, allows applicants to use generative AI when preparing their proposals, but prohibits staff, reviewers and decision-makers from entering application or review information into generative AI tools, and states that its own machine-learning and natural-language-processing tools support analysis and optimisation but do not review applications or make funding decisions (Research Council of Finland, n.d.).

Use case	AI suitable	Human oversight
Document processing	✔	—
Workflow triage / routing	✔	✔ light oversight
Exclusion-oriented triage	⚠	✔ mandatory review
Reviewer matching	✔	—
Consistency checks	✔	—
Portfolio analysis	✔	—
Final merit assessment	✘	✔ human-only decision

The regulatory context: high-risk or high-impact?

The EU AI Act does not settle the specific governance question of AI-assisted funding evaluation (Regulation (EU) 2024/1689). Its risk-based approach, however, and its concern with systems that may affect access to important opportunities, support a precautionary interpretation for public innovation funding. Even where a specific AI system used in funding evaluation is not formally classified as “high-risk”, it may still be high-impact in administrative and policy terms.

The distinction matters because a tool used in public innovation funding may affect much more than the internal management of an evaluation procedure. It may influence which firms or universities obtain support, which projects are allowed to progress, and how public investment is distributed across regions and markets. Public funders should therefore approach such systems with caution from the outset, ensuring that risk assessment, human oversight, transparency, contestability and proportionality are not added after deployment, but built into the basic design of the system.

In this context, contestability requires more than a general statement that AI was used somewhere in the process. Where AI has played a role in the analysis of an application, the applicant should be able to understand the nature of that role, even if access to the model itself or to internal prompts is neither possible nor appropriate. This requirement is closely connected to familiar principles of public administration, including the existence of an audit trail, the duty to give reasons and the principle of good administration. A decision cannot be meaningfully challenged if its informational basis is opaque. Nor can a funding body demonstrate the legality of an award if it is unable to explain how the relevant assessment was produced.

For funding schemes that may later be subject to audit or judicial review, administrative accountability depends on documenting the use of AI with sufficient granularity. Records should indicate the function for which the tool was used, the part of the application or supporting document that was analysed, the version of the tool, the human reviewer responsible, and whether the AI output was accepted, amended or rejected. Such recording is not simply a technical safeguard; it is part of what makes the decision administratively accountable.
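
As an illustration of the level of granularity described above, the following Python sketch writes one audit entry per AI use. It is an assumption-laden example, not a prescribed schema: the field names, the `log_ai_use` helper and the sample values are hypothetical, and a real system would follow the funder's own record-keeping rules.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class ReviewerAction(Enum):
    ACCEPTED = "accepted"
    AMENDED = "amended"
    REJECTED = "rejected"


@dataclass(frozen=True)
class AIUseRecord:
    # Hypothetical schema mirroring the elements listed in the text.
    application_id: str
    function: str          # what the tool was used for
    analysed_section: str  # part of the application or document analysed
    tool_version: str
    reviewer: str          # human reviewer responsible
    action: str            # accepted, amended or rejected
    recorded_at: str       # UTC timestamp, ISO 8601


def log_ai_use(application_id, function, analysed_section,
               tool_version, reviewer, action) -> str:
    """Build one audit entry and return it as a JSON line."""
    record = AIUseRecord(
        application_id=application_id,
        function=function,
        analysed_section=analysed_section,
        tool_version=tool_version,
        reviewer=reviewer,
        # Raises ValueError for anything other than the three allowed values.
        action=ReviewerAction(action).value,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)


line = log_ai_use(
    application_id="APP-0001",
    function="consistency check",
    analysed_section="budget annex",
    tool_version="example-tool 1.2",
    reviewer="reviewer-17",
    action="amended",
)
print(line)
```

Restricting `action` to three values forces every AI output to be explicitly dispositioned by a named reviewer, which is what makes the trail usable in an audit or appeal.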

Recent European guidance points in the same direction, although it does so through different institutional vocabularies. The European Research Council has stressed that evaluation tasks must not be delegated to AI systems and that confidentiality must be protected (European Research Council, 2026). Similarly, the ERA Living Guidelines on the responsible use of generative AI in research underline the need for responsible, transparent and accountable use (European Commission, 2026). Together, these materials suggest that the question is no longer whether AI may assist administrative and evaluative work, but under what conditions its use can remain compatible with legality, fairness and public trust.

For a managing authority, these concerns come down to five practical questions.

The five tests for legitimate AI-assisted evaluation
Purpose test — what precise function is the AI system performing?
Delegation test — does it inform judgement, or determine the outcome?
Traceability test — can the use of AI be reconstructed afterwards?
Contestability test — can an applicant understand and challenge the informational basis of the decision?
Distributional test — does it affect territories, sectors or types of applicant unequally?
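
The distributional test is the one most easily turned into a routine check. A minimal Python sketch, assuming the funder records for each application a group label (for example a territory type) and whether the AI tool flagged it: the group names and data below are invented for illustration.

```python
from collections import defaultdict


def flag_rates_by_group(cases):
    """Share of applications flagged by the AI tool, per applicant group.

    `cases` is an iterable of (group, was_flagged) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in cases:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the highest and lowest group flag rate."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0


# Invented sample data: four applications per group.
cases = [
    ("urban", True), ("urban", False), ("urban", False), ("urban", False),
    ("low-density", True), ("low-density", True),
    ("low-density", False), ("low-density", False),
]
rates = flag_rates_by_group(cases)
print(rates)               # {'urban': 0.25, 'low-density': 0.5}
print(largest_gap(rates))  # 0.25
```

A gap of this kind is a prompt for investigation rather than proof of bias: groups may differ for legitimate reasons, and small samples produce large swings.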

A hybrid future, but not an automated one

AI will become a normal part of public administration and funding management. The question is whether it will make evaluation systems more robust, transparent and equitable, or simply faster. The evidence base remains uneven: operational gains are increasingly documented, but there is still limited evidence on distributive effects, applicant behaviour, appeal outcomes and the long-term influence of AI-supported evaluation on the composition of funded portfolios.

In public innovation funding, the value of faster evaluation must be weighed against the conditions that make evaluation legitimate in the first place: the quality of judgement, the ability to explain decisions, and confidence that public resources are being allocated in accordance with policy objectives rather than the textual optimisation of applications.

The most promising path is hybrid: AI as supporting infrastructure, not a stand-in for the evaluator. Used well, it can ease administrative overload, improve consistency and free reviewer time for the parts of the process where human judgement actually matters. But the final responsibility for merit, directionality and public value must remain human.

Bibliography

Aczel, B., Szaszi, B., & Holcombe, A. O. (2021). A billion-dollar donation: estimating the cost of researchers’ time spent on peer review. Research Integrity and Peer Review, 6, 14.

Al-Ibrahim, H. (2024). The potential for artificial intelligence assistance in funding research. RAND Corporation / Pardee RAND Graduate School.

Carbonell Cortés, C., Parra-Rojas, C., Pérez-Lozano, A., Arcara, F., Vargas-Sánchez, S., Fernández-Montenegro, R., Casado-Marín, D., Rondelli, B., & López-Verdeguer, I. (2024). AI-assisted prescreening of biomedical research proposals: ethical considerations and the pilot case of ‘la Caixa’ Foundation. Data & Policy, 6, e49. doi:10.1017/dap.2024.41

European Commission — Directorate-General for Research and Innovation. (2026). Living guidelines on the responsible use of generative AI in research. ERA Forum.

European Research Council — Scientific Council. (2026). Guidelines on the use of artificial intelligence in the evaluation of grant proposals. ERC.

Marques, J. D. S., Duarte, A. V., Carvalho, A., Rocha, G., Martins, B., & Oliveira, A. L. (2025). Leveraging LLMs to streamline the review of public funding applications. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, 2041–2060. doi:10.18653/v1/2025.emnlp-industry.143

Nagarajappa, C. G., Koley, M., Kumar, A., Panigrahy, R., & Arya, P. K. (2026). Recall, risk, and governance in automated proposal screening for research funding: Evidence from a national funding programme. arXiv:2602.07869.

Newman-Griffis, D., Buckley Woods, H., Wu, Y., Thelwall, M., & Holm, J. (2025). Funding by algorithm: A handbook for responsible uses of AI and machine learning by research funders. Research on Research Institute.

Okasa, G., & Jorstad, A. (2024). The Value of Pre-training for Scientific Text Similarity: Evidence from Matching Grant Proposals to Reviewers. Proceedings of the 9th Swiss Text Analytics Conference, 89–101.

Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). OJ L, 2024/1689, 12.7.2024.

Research Council of Finland. (n.d.). Artificial intelligence. A–Z index of application guidelines. Research Council of Finland. Accessed 21 June 2026.

Shcherbiak, A., Habibnia, H., Böhm, R., & Fiedler, S. (2024). Evaluating science: A comparison of human and AI reviewers. Judgment and Decision Making, 19, e21. doi:10.1017/jdm.2024.24

Si, C., Yang, D., & Hashimoto, T. (2025). Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers. International Conference on Learning Representations (ICLR 2025). arXiv:2409.04109.

Alexandre Almeida
Managing Partner
