It would only import all records or none; so if you wanted to filter, you would have to do it after the temporary collection has been created. Furthermore, to reduce drag on the network, you could restrict this specific function to queue up or a quota like 100 runs per second.
Rationale: If we can collect the entire contents of a datasource to a temporary collection, we can operate on the temporary collection instead. This would allow for all aggregate functions and other non-delegable functions to work on the temporary collection. Then from time to time, we can reload the datasource to have the most updated data in the temporary collection.
Could you create a function that would pull in 500 records at a time until all are pulled in? It would work like my solution, but it would be automated with fewer lines of code.
I am suggesting these functions as temporary solutions to not having all functions delegable.