Although the voiceover has perfect timing when saved in VideoScribe, the speed changes in the downloaded version (the text and voice no longer match). What could be the reason for this?

Yes, we haven't managed to fix this bug yet, but it has now been selected for development and is due to be fixed in version 3.1.1. We are currently working on 3.1, and 3.1.1 will follow shortly after.

The bug is:

If a text element with a 0 second Animate time is immediately followed by any other element that also has a 0 second Animate time, any pause and transition times set on the first element are ignored in the rendered video.
A workaround for now is to pick either of the two elements, add 0.1 seconds to its Animate time, and remove its hand.
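
If it helps to picture which elements are affected, the condition above can be written as a small check. This is only an illustrative sketch in Python: the `Element` class, its fields, and `affected_indices` are made-up names for this example, not anything VideoScribe provides, and the timeline values are invented.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical model of a timeline element, for illustration only.
    kind: str                 # e.g. "text" or "image"
    animate: float            # Animate time in seconds
    pause: float = 0.0        # Pause time in seconds
    transition: float = 0.0   # Transition time in seconds


def affected_indices(timeline):
    """Return indices of elements whose pause/transition would be ignored.

    An element is affected when it is a text element with a 0 second
    Animate time and the element right after it also has a 0 second
    Animate time.
    """
    hits = []
    for i in range(len(timeline) - 1):
        first, following = timeline[i], timeline[i + 1]
        if first.kind == "text" and first.animate == 0 and following.animate == 0:
            hits.append(i)
    return hits


# Invented example timeline.
timeline = [
    Element("text", 0.0, pause=2.0, transition=1.0),  # affected: next element is also 0 s
    Element("image", 0.0),
    Element("text", 0.1, pause=1.0),                  # not affected: Animate time is 0.1 s
    Element("image", 0.0),
]

print(affected_indices(timeline))
```

In this model, applying the workaround means changing the Animate time of either element in an affected pair to 0.1, after which the pair no longer matches the condition.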

