[ EC2 run_instances: Many instances, slightly different startup scripts? ]

I'm doing an embarrassingly parallel operation on Amazon Web Services, in which I spin up a large number of EC2 instances that each have a slightly different script to run on startup. Currently, I start each instance individually in a for loop, like so (I'm using the Python boto package to talk to AWS):

for parameters in parameter_list:
    # Create this instance's startup script
    user_data = startup_script % parameters

    # Run this instance
    reservation = ec2.run_instances(ami,
                                    key_name=key_name,
                                    security_groups=group_name,
                                    instance_type=instance_type,
                                    user_data=user_data)

However, this takes too long. ec2.run_instances can start many instances at once via the max_count keyword. I would like to create many instances simultaneously, passing each its own unique startup script (user_data). Is there any way to do this? One cannot just pass a list of scripts to user_data.

One option would be to pass the same startup script to every instance, but have the script reference another piece of data associated with that instance. EC2's tag system could work, but I don't know of a way to assign tags in a similarly parallel fashion. Is there any kind of instance-specific data I can assign to a set of instances in parallel?

Answer 1


AFAIK, there is no simple solution. How about using Simple Queue Service (SQS)?

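  1. Add the per-instance start-up scripts (what you were passing as user-data) to SQS, one per message.
  2. Write a single user-data script, shared by all instances, that will:
    • read a start-up script from SQS and run it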
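
The two steps above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: it assumes a boto3-style SQS client (with `send_message`, `receive_message`, and `delete_message`) rather than the older boto package used in the question, and the function names `enqueue_scripts`, `run_script`, and `fetch_and_run_one` are hypothetical.

```python
import os
import subprocess
import tempfile


def enqueue_scripts(sqs, queue_url, scripts):
    """Step 1 (run once, from your own machine): one message per script."""
    for script in scripts:
        sqs.send_message(QueueUrl=queue_url, MessageBody=script)


def run_script(body):
    """Write the script text to a temp file, run it with sh, return the exit code."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(body)
        path = f.name
    try:
        return subprocess.call(["/bin/sh", path])
    finally:
        os.remove(path)


def fetch_and_run_one(sqs, queue_url, runner=run_script):
    """Step 2 (run on each instance): take one script off the queue and run it.

    Returns the runner's result, or None if the queue was empty.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None
    msg = messages[0]
    result = runner(msg["Body"])
    # Delete after running so the message is not handed to another instance.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return result


# Hypothetical usage with boto3 (queue_url is a placeholder you must supply):
#   import boto3
#   sqs = boto3.client("sqs")
#   enqueue_scripts(sqs, queue_url, [startup_script % p for p in parameter_list])
```

Because the message is deleted only after the script finishes, the queue's visibility timeout would need to be longer than the script's run time; otherwise SQS may hand the same script to a second instance.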

If a script is larger than 256 KB, you cannot add it to SQS directly. In that case, try this procedure instead.

  1. Upload the per-instance start-up scripts to S3.
  2. Add the S3 URL of each script to SQS.
  3. Write a single user-data script, shared by all instances, that will:
    • read a URL from SQS
    • download the script from S3
    • run it
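
As a rough illustration of this S3 variant, the sketch below uploads a script, queues its location, and later fetches it back. It assumes boto3-style S3 and SQS clients (`put_object`, `get_object`, `send_message`, `receive_message`, `delete_message`) and an `s3://bucket/key` URL convention; the helper names `parse_s3_url`, `publish_script`, and `fetch_script` are hypothetical, and running the downloaded script is left to the caller.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parts = urlparse(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError("not an s3:// URL: %r" % url)
    return parts.netloc, parts.path.lstrip("/")


def publish_script(s3, sqs, queue_url, bucket, key, script):
    """Steps 1 and 2: upload the script to S3, then enqueue its URL."""
    s3.put_object(Bucket=bucket, Key=key, Body=script.encode("utf-8"))
    url = "s3://%s/%s" % (bucket, key)
    sqs.send_message(QueueUrl=queue_url, MessageBody=url)
    return url


def fetch_script(s3, sqs, queue_url):
    """Step 3 (on each instance): read a URL from SQS and download the script.

    Returns the script text, or None if the queue was empty.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None
    msg = messages[0]
    bucket, key = parse_s3_url(msg["Body"])
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Remove the message so no other instance picks up the same script.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return body
```

Keeping only the URL in the queue means the message stays tiny regardless of script size, at the cost of one extra S3 round trip per instance.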

Sorry, it's rather complicated. Hope this helps.

Answer 2


Simple. Fork just before you initialize each node.

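import logging
import os

newPid = os.fork()
if newPid == 0:
    # Child process: launch one instance
    is_master = False

    # Create the instance
    ...blah blah blah...

    # Exit so the child does not carry on with the parent's work
    os._exit(0)
else:
    # Parent process: log and move on to the next host
    logging.info('Launched host %s ...' % hostname)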
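
To show how the fork approach might look end to end, here is a minimal sketch that forks one child per parameter set and then waits for all of them. The names `launch_all` and `launch` are hypothetical; `launch` stands in for whatever instance-creation call you use (for example the `ec2.run_instances` call from the question). Note that `os.fork` is only available on POSIX systems.

```python
import os


def launch_all(parameter_list, launch):
    """Fork one child per parameter set; return {pid: exit_code}."""
    pids = []
    for parameters in parameter_list:
        pid = os.fork()
        if pid == 0:
            # Child: do one launch, then exit immediately so it never
            # continues the parent's loop and forks children of its own.
            code = 1
            try:
                launch(parameters)
                code = 0
            finally:
                os._exit(code)
        # Parent: remember the child and keep going without waiting.
        pids.append(pid)

    # Parent: collect every child so none is left as a zombie.
    results = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        results[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return results
```

An exit code of 0 means `launch` returned normally in that child, and 1 means it raised. Each child is a separate process, so anything `launch` returns (such as a reservation object) is not visible to the parent unless you pass it back some other way.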