I met up with Sean Oxspring this evening to discuss the mechanics of my game level, and anything that will need re-evaluating. Sean is a recent graduate of the University of Lincoln in Games Computing, and I've worked with him on a couple of small video game projects in the past year, including my 2nd year final project. In this year's project he will be assisting me with the programming aspect of the game level.
So what is achievable?
My introduction scene, as it stands, is doable. In fact, it will even work within the game engine, as opposed to having a video file that plays before going into the game (which is what I originally thought I might have to do). What I would do is create some 'speed bump' objects on my road surface in the scene, and then create two transparent colliders to keep the carriage from going off course. The camera is placed inside the carriage, which gives the option of letting the player look around the carriage interior as well as out of the window; alternatively, it can be fixed to show only the window view.

 
The next aspect of the programming is the random location generation for the key which must be found. Apparently this is quite simple to achieve. I will create 7 empty 'objects' in the level and place one at each of the locations I have planned. I can then create a condition that causes my key model object to appear at any one of these empty objects each time the game starts. The engine will generate a number from 1 to 7, and this will determine the origin of the key object.
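To make the idea concrete, here is a minimal sketch of that 1-to-7 roll in plain Python rather than in any particular engine's scripting language. The spawn point names and the `choose_key_spawn` function are made up for illustration; in the actual level the list would hold the seven empty objects.

```python
import random

# Hypothetical names standing in for the seven empty objects placed in the level.
SPAWN_POINTS = [f"KeySpawn_{i}" for i in range(1, 8)]

def choose_key_spawn(spawn_points, rng=random):
    """Roll a number from 1 to the number of spawn points and return
    the roll together with the spawn point it selects."""
    roll = rng.randint(1, len(spawn_points))  # randint is inclusive at both ends
    return roll, spawn_points[roll - 1]       # list indices start at 0, rolls at 1

# Run once when the game starts; the key model would then be moved to this point.
roll, spawn = choose_key_spawn(SPAWN_POINTS)
print(f"Rolled {roll}: key appears at {spawn}")
```

Because the roll only happens at start-up, the key stays put for the whole play session and moves only when the game is restarted.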
Now for the bad news: there is an issue with my idea of having the on-screen hands actually interact with objects in the scene. Apparently this is quite difficult to achieve; at the very least, I would need a much longer production period and a small team of programmers to work out the interactions.
The alternative I am considering, which shouldn't demand as much programming work, is a mechanic like the one in The Chinese Room/Frictional Games' "Amnesia: A Machine for Pigs". The UI consists only of a very small dot in the center of the screen that indicates your line of sight. When the player's view lines up with an interactive object, a hand icon appears over it, letting you know that you can interact with that particular object. As you click and drag, the object's animation plays, and in some cases this will trigger another action.
I would consider having a similar system in my own project, where the player has a line of sight that indicates which objects are interactive, and when they click, the object's animation will play.
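A rough sketch of how such a line of sight check could work, written in plain Python with simple vector maths. A game engine would normally do this with its own raycast against colliders, so everything here is an assumption for illustration: the `Interactable` class, the bounding sphere standing in for a collider, and the `max_reach` distance are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Interactable:
    name: str
    position: tuple  # (x, y, z) world position
    radius: float    # rough bounding sphere standing in for a collider

def normalise(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def object_in_sight(camera_pos, look_dir, objects, max_reach=2.0):
    """Return the nearest interactable crossed by the view ray, or None.

    Distance is measured to the point on the ray closest to the object's
    centre, which is an approximation that is good enough for a sketch.
    """
    d = normalise(look_dir)
    best, best_dist = None, max_reach
    for obj in objects:
        to_obj = tuple(p - c for p, c in zip(obj.position, camera_pos))
        along = sum(a * b for a, b in zip(to_obj, d))  # distance along the ray
        if along < 0 or along > best_dist:
            continue  # behind the camera, out of reach, or farther than the current best
        # Squared distance from the object's centre to the ray.
        miss_sq = sum(c * c for c in to_obj) - along * along
        if miss_sq <= obj.radius ** 2:
            best, best_dist = obj, along
    return best

# The player stands at head height looking straight down the z axis.
door = Interactable("door_handle", (0.0, 1.0, 1.5), 0.2)
lamp = Interactable("lamp", (3.0, 1.0, 0.0), 0.2)
target = object_in_sight((0.0, 1.0, 0.0), (0.0, 0.0, 1.0), [door, lamp])
show_hand_icon = target is not None  # the UI would swap the dot for a hand icon
```

Each frame the game would run this check, show the hand icon whenever `target` is not `None`, and play that object's animation if the player clicks while it is targeted.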