17 Oct 2018

Planet Python

The No Title® Tech Blog: Haiku R1/beta1 review - revisiting BeOS, 18 years after its latest official release

Having experimented with and used BeOS R5 Pro back in the early 2000s, when the company that created it was going under, I have followed the development of Haiku with some interest over the years. While one can argue that both the old BeOS and Haiku lack some important features expected of a modern OS, the fact is that a lightweight operating system can always be, for instance, an excellent way to bring new life into old, or new but less powerful, hardware.

17 Oct 2018 7:30pm GMT

Stack Abuse: Python Dictionary Tutorial

Introduction

Python comes with a variety of built-in data structures, capable of storing different types of data. A Python dictionary is one such data structure that can store data in the form of key-value pairs. The values in a Python dictionary can be accessed using the keys. In this article, we will be discussing the Python dictionary in detail.

Creating a Dictionary

To create a Python dictionary, we enclose a comma-separated sequence of items in curly braces {}. Each item has a key and a value expressed as a "key:value" pair.

The values can belong to any data type and they can repeat, but the keys must be unique and of an immutable (hashable) type, such as a string, number, or tuple.

The following examples demonstrate how to create Python dictionaries:

Creating an empty dictionary:

dict_sample = {}  

Creating a dictionary with integer keys:

dict_sample = {1: 'mango', 2: 'pawpaw'}  

Creating a dictionary with mixed keys:

dict_sample = {'fruit': 'mango', 1: [4, 6, 8]}  

We can also create a dictionary by explicitly calling the built-in dict() function:

dict_sample = dict({1:'mango', 2:'pawpaw'})  

A dictionary can also be created from a sequence of key-value pairs, as shown below:

dict_sample = dict([(1,'mango'), (2,'pawpaw')])  

Dictionaries can also be nested, which means that we can create a dictionary inside another dictionary. For example:

dict_sample = {1: {'student1' : 'Nicholas', 'student2' : 'John', 'student3' : 'Mercy'},  
        2: {'course1' : 'Computer Science', 'course2' : 'Mathematics', 'course3' : 'Accounting'}}
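
To read a value out of a nested dictionary, chain the keys, one pair of square brackets per level. Continuing with the dict_sample defined just above:

print(dict_sample[1]['student2'])   # John
print(dict_sample[2]['course1'])    # Computer Science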

To print the dictionary contents, we can use Python's print() function and pass the dictionary name as the argument. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
print(dict_sample)  

Output:

{'Company': 'Toyota', 'model': 'Premio', 'year': 2012}

Accessing Elements

To access dictionary items, pass the key inside square brackets []. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
x = dict_sample["model"]  
print(x)  

Output:

Premio  

We created a dictionary named dict_sample. A variable named x was then created, and its value was set to the value for the key "model" in the dictionary.

Here is another example. Note that we name the variable student rather than dict, since dict would shadow the built-in type:

student = {'Name': 'Mercy', 'Age': 23, 'Course': 'Accounting'}  
print("Student Name:", student['Name'])  
print("Course:", student['Course'])  
print("Age:", student['Age'])  

Output:

Student Name: Mercy  
Course: Accounting  
Age: 23  

The dictionary object also provides the get() function, which can be used to access dictionary elements as well. We call it on the dictionary using the dot operator and pass the name of the key as the argument. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
x = dict_sample.get("model")  
print(x)  

Output:

Premio  
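
The practical difference between the two access styles shows up with missing keys: square brackets raise a KeyError, while get() returns None, or a default value that you supply as a second argument. A quick sketch:

dict_sample = {"Company": "Toyota", "model": "Premio", "year": 2012}
print(dict_sample.get("color"))          # None -- no exception raised
print(dict_sample.get("color", "Gray"))  # Gray -- the supplied default
# dict_sample["color"] would raise a KeyError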

Now we know how to access dictionary elements using a few different methods. In the next section we'll discuss how to add new elements to an already existing dictionary.

Adding Elements

There are numerous ways to add new elements to a dictionary. We can simply assign a value to a new key. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
dict_sample["Capacity"] = "1800CC"  
print(dict_sample)  

Output:

{'Company': 'Toyota', 'model': 'Premio', 'year': 2012, 'Capacity': '1800CC'}

The new element has "Capacity" as the key and "1800CC" as its corresponding value. In Python 3.7 and later, dictionaries preserve insertion order, so it appears as the last element; in earlier versions the order of elements when printed was arbitrary.

Here is another example. First let's first create an empty dictionary:

MyDictionary = {}  
print("An Empty Dictionary: ")  
print(MyDictionary)  

Output:

An Empty Dictionary:  
{}

The dictionary prints as an empty pair of braces, since nothing is stored in it yet. Let us add some elements to it, one at a time:

MyDictionary[0] = 'Apples'  
MyDictionary[2] = 'Mangoes'  
MyDictionary[3] = 20  
print("\n3 elements have been added: ")  
print(MyDictionary)  

Output:

3 elements have been added:  
{0: 'Apples', 2: 'Mangoes', 3: 20}

To add the elements, we specified keys as well as the corresponding values. For example:

MyDictionary[0] = 'Apples'  

In the above example, 0 is the key while "Apples" is the value.

It is even possible for us to assign multiple values to one key. For example:

MyDictionary['Values'] = 1, "Pairs", 4  
print("\n1 element has been added: ")  
print(MyDictionary)  

Output:

1 element has been added:  
{0: 'Apples', 2: 'Mangoes', 3: 20, 'Values': (1, 'Pairs', 4)}

In the above example, the key is "Values", while everything after the = sign makes up the value for that key, stored as a single tuple.

Other than adding new elements to a dictionary, dictionary elements can also be updated/changed, which we'll go over in the next section.

Updating Elements

After adding a value to a dictionary, we can modify an existing element by assigning a new value to its key. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

dict_sample["year"] = 2014

print(dict_sample)  

Output:

{'year': 2014, 'model': 'Premio', 'Company': 'Toyota'}

In this example you can see that we have updated the value for the key "year" from the old value of 2012 to a new value of 2014.

Removing Elements

The removal of an element from a dictionary can be done in several ways, which we'll discuss one-by-one in this section:

The del keyword can be used to remove the element with the specified key. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
del dict_sample["year"]  
print(dict_sample)  

Output:

{'Company': 'Toyota', 'model': 'Premio'}

We used the del keyword followed by the dictionary name. Inside the square brackets that follow the dictionary name, we passed the key of the element we wanted to delete, which in this example was "year". The entry for "year" in the dictionary was then deleted.

Another way to delete a key-value pair is to use the pop() function and pass the key of the entry to be deleted as the argument to the function. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
dict_sample.pop("year")  
print(dict_sample)  

Output:

{'Company': 'Toyota', 'model': 'Premio'}

We invoked the pop() function on the dictionary using the dot operator. Again, in this example the entry for "year" will be deleted.
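
One detail worth knowing: pop() also returns the value it removes, and it accepts an optional default that is returned when the key is missing, instead of raising a KeyError:

dict_sample = {"Company": "Toyota", "model": "Premio", "year": 2012}
print(dict_sample.pop("year"))          # 2012 -- the removed value
print(dict_sample.pop("color", "n/a"))  # n/a -- key missing, default returned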

The popitem() function removes the last item inserted into the dictionary (in Python versions before 3.7, it removes an arbitrary item instead), without needing to specify the key. Take a look at the following example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
dict_sample.popitem()  
print(dict_sample)  

Output:

{'Company': 'Toyota', 'model': 'Premio'}

The last entry into the dictionary was "year". It has been removed after calling the popitem() function.
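
Note that popitem() also returns the removed pair as a (key, value) tuple, so you can capture it directly:

dict_sample = {"Company": "Toyota", "model": "Premio", "year": 2012}
key, value = dict_sample.popitem()
print(key, value)   # year 2012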

But what if you want to delete the entire dictionary? It would be difficult and cumbersome to use one of these methods on every single key. Instead, you can use the del keyword to delete the entire dictionary. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
del dict_sample  
print(dict_sample)  

Output:

NameError: name 'dict_sample' is not defined  

The code returns an error. The reason is that we are trying to access a dictionary which doesn't exist since it has been deleted.

However, your use-case may require you to just remove all dictionary elements and be left with an empty dictionary. This can be achieved by calling the clear() function on the dictionary:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
dict_sample.clear()  
print(dict_sample)  

Output:

{}

The code returns an empty dictionary since all the dictionary elements have been removed.

Other Common Methods

The len() Function

With the built-in len() function, you can count the number of elements in a dictionary. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
print(len(dict_sample))  

Output:

3  

There are three entries in the dictionary, hence the function returned 3.

The copy() Method

This method returns a copy of the existing dictionary. For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}
x = dict_sample.copy()

print(x)  

Output:

{'Company': 'Toyota', 'model': 'Premio', 'year': 2012}

We created a copy of the dictionary named dict_sample and assigned it to the variable x. If x is printed to the console, you will see that it contains the same elements as those stored in the dict_sample dictionary.

Note that this is useful because modifications made to the copy's top-level entries won't affect the original. Keep in mind, though, that copy() produces a shallow copy: mutable values nested inside the dictionary are shared between the two.
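
A small sketch of the difference:

original = {"model": "Premio", "owners": ["Mercy"]}
duplicate = original.copy()

duplicate["model"] = "Mark X"        # top-level change: original unaffected
duplicate["owners"].append("John")   # nested list is shared!

print(original)    # {'model': 'Premio', 'owners': ['Mercy', 'John']}
print(duplicate)   # {'model': 'Mark X', 'owners': ['Mercy', 'John']}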

The items() Method

When called, this method returns a view object containing the dictionary's key-value pairs as tuples. This method is primarily used when you want to iterate through a dictionary.

The method is simply called on the dictionary object name as shown below:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

for k, v in dict_sample.items():  
  print(k, v)

Output:

Company Toyota
model Premio
year 2012

The object returned by items() can also be used to show the changes that have been implemented on the dictionary. This is demonstrated below:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

x = dict_sample.items()

print(x)

dict_sample["model"] = "Mark X"

print(x)  

Output:

dict_items([('Company', 'Toyota'), ('model', 'Premio'), ('year', 2012)])  
dict_items([('Company', 'Toyota'), ('model', 'Mark X'), ('year', 2012)])  

The output shows that when you change a value in the dictionary, the items object is also updated to reflect this change.

The fromkeys() Method

This class method returns a new dictionary with the specified keys, each mapped to the same value. It takes the syntax given below:

dict.fromkeys(keys, value)  

The required keys parameter is an iterable specifying the keys for the new dictionary. The optional value parameter specifies the value assigned to every key; it defaults to None.

Suppose we need to create a dictionary of three keys all with the same value. We can do so as follows:

name = ('John', 'Nicholas', 'Mercy')  
age = 25

dict_sample = dict.fromkeys(name, age)


print(dict_sample)  

Output:

{'John': 25, 'Nicholas': 25, 'Mercy': 25}

In the script above, we specified the keys and one value. The fromkeys() method was able to pick the keys and combine them with this value to create a populated dictionary.

The keys parameter is mandatory. The following example demonstrates what happens when the value parameter is not specified:

name = ('John', 'Nicholas', 'Mercy')

dict_sample = dict.fromkeys(name)


print(dict_sample)  

Output:

{'John': None, 'Nicholas': None, 'Mercy': None}

The default value, which is None, was used.
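
Be careful when the shared value is mutable: fromkeys() assigns the very same object to every key, so mutating it through one key is visible through all of them. A short sketch:

names = ('John', 'Nicholas', 'Mercy')

scores = dict.fromkeys(names, [])   # every key shares ONE list
scores['John'].append(90)
print(scores)   # {'John': [90], 'Nicholas': [90], 'Mercy': [90]}

# Give each key its own list with a dict comprehension instead:
scores = {name: [] for name in names}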

The setdefault() Method

This method is useful when we need to get the value of the element with the specified key. If the key is not found, it will be inserted into the dictionary along with the specified value.

The method takes the following syntax:

dictionary.setdefault(keyname, value)  

In this function, the keyname parameter is required; it is the key of the item whose value you want returned. The value parameter is optional. If the dictionary already has the key, this parameter has no effect. If the key doesn't exist, the value given here becomes the key's value. It defaults to None.

For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

x = dict_sample.setdefault("color", "Gray")

print(x)  

Output:

Gray  

The dictionary doesn't have a key for color. The setdefault() method inserted this key with the specified value, "Gray", and returned that value.

The following example demonstrates how the method behaves if the value for the key does exist:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

x = dict_sample.setdefault("model", "Allion")

print(x)  

Output:

Premio  

The value "Allion" has no effect on the dictionary since we already have a value for the key.

The keys() Method

This method also returns an iterable object. The object returned is a view of all keys in the dictionary. And just like with the items() method, the returned object reflects any later changes made to the dictionary.

To use this method, we only call it on the name of the dictionary, as shown below:

dictionary.keys()  

For example:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

x = dict_sample.keys()

print(x)  

Output:

dict_keys(['Company', 'model', 'year'])  

Often this method is used to iterate through each key in your dictionary, like so:

dict_sample = {  
  "Company": "Toyota",
  "model": "Premio",
  "year": 2012
}

for k in dict_sample.keys():  
  print(k)

Output:

Company  
model  
year  
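
As a side note, iterating over a dictionary directly yields its keys as well, so the explicit keys() call in a for loop is optional:

for k in dict_sample:   # equivalent to dict_sample.keys()
    print(k)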

Conclusion

This marks the end of this tutorial on Python dictionaries. Dictionaries store data in "key:value" pairs, where the "key" acts as the identifier for the item and the "value" is the item's data. The Python dictionary comes with a variety of functions that can be applied to retrieve or manipulate data. In this article, we saw how Python dictionaries can be created, modified, and deleted, along with some of the most commonly used dictionary methods.

17 Oct 2018 2:15pm GMT

Real Python: Python, Boto3, and AWS S3: Demystified

Amazon Web Services (AWS) has become a leader in cloud computing. One of its core components is S3, the object storage service offered by AWS. With its impressive availability and durability, it has become the standard way to store videos, images, and data. You can combine S3 with other services to build infinitely scalable applications.

Boto3 is the name of the Python SDK for AWS. It allows you to directly create, update, and delete AWS resources from your Python scripts.

If you've had some AWS exposure before, have your own AWS account, and want to take your skills to the next level by starting to use AWS services from within your Python code, then keep reading.

By the end of this tutorial, you'll know how to configure Boto3 and use it to create S3 buckets, upload, download, copy, and delete objects, and work with features such as ACLs, encryption, storage classes, and versioning.

Before exploring Boto3's characteristics, you will first see how to configure the SDK on your machine. This step will set you up for the rest of the tutorial.


Installation

To install Boto3 on your computer, go to your terminal and run the following:

$ pip install boto3

You've got the SDK. But you won't be able to use it right away, because it doesn't know which AWS account it should connect to.

To make it run against your AWS account, you'll need to provide some valid credentials. If you already have an IAM user that has full permissions to S3, you can use that user's credentials (their access key and their secret access key) without needing to create a new user. Otherwise, the easiest way is to create a new AWS user and then store the new credentials.

To create a new user, go to your AWS account, then go to Services and select IAM. Then choose Users and click on Add user.

Give the user a name (for example, boto3user). Enable programmatic access. This will ensure that this user will be able to work with any AWS supported SDK or make separate API calls:

[Screenshot: adding an AWS IAM user]

To keep things simple, choose the preconfigured AmazonS3FullAccess policy. With this policy, the new user will be able to have full control over S3. Click on Next: Review:

[Screenshot: attaching the AmazonS3FullAccess policy]

Select Create user:

[Screenshot: finishing IAM user creation]

A new screen will show you the user's generated credentials. Click on the Download .csv button to make a copy of the credentials. You will need them to complete your setup.

Now that you have your new user, create a new file, ~/.aws/credentials:

$ touch ~/.aws/credentials

Open the file and paste the structure below. Fill in the placeholders with the new user credentials you have downloaded:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

Save the file.

Now that you have set up these credentials, you have a default profile, which will be used by Boto3 to interact with your AWS account.
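
As a side note, if you keep more than one profile in your credentials file, you can point Boto3 at a specific one through a Session object. A minimal sketch, assuming a profile named dev exists in ~/.aws/credentials:

import boto3

# "dev" is a hypothetical profile name; Boto3 uses [default] unless told otherwise.
dev_session = boto3.session.Session(profile_name='dev')
s3_client = dev_session.client('s3')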

There is one more configuration to set up: the default region that Boto3 should interact with. You can check out the complete table of the supported AWS regions. Choose the region that is closest to you. Copy your preferred region from the Region column. In my case, I am using eu-west-1 (Ireland).

Create a new file, ~/.aws/config:

$ touch ~/.aws/config

Add the following and replace the placeholder with the region you have copied:

[default]
region = YOUR_PREFERRED_REGION

Save your file.

You are now officially set up for the rest of the tutorial.

Next, you will see the different options Boto3 gives you to connect to S3 and other AWS services.

Client Versus Resource

At its core, all that Boto3 does is call AWS APIs on your behalf. For the majority of the AWS services, Boto3 offers two distinct ways of accessing these abstracted APIs:

- Client: low-level service access
- Resource: higher-level, object-oriented service access

You can use either to interact with S3.

To connect to the low-level client interface, you must use Boto3's client(). You then pass in the name of the service you want to connect to, in this case, s3:

import boto3
s3_client = boto3.client('s3')

To connect to the high-level interface, you'll follow a similar approach, but use resource():

import boto3
s3_resource = boto3.resource('s3')

You've successfully connected to both versions, but now you might be wondering, "Which one should I use?"

With clients, there is more programmatic work to be done. The majority of the client operations give you a dictionary response. To get the exact information that you need, you'll have to parse that dictionary yourself. With resource methods, the SDK does that work for you.

With the client, you might see some slight performance improvements. The disadvantage is that your code becomes less readable than it would be if you were using the resource. Resources offer a better abstraction, and your code will be easier to comprehend.

Understanding how the client and the resource are generated is also important when you're considering which one to choose:

Boto3 generates the client and the resource from different definitions. As a result, you may find cases in which an operation supported by the client isn't offered by the resource. Here's the interesting part: you don't need to change your code to use the client everywhere. For that operation, you can access the client directly via the resource like so: s3_resource.meta.client.

One such client operation is .generate_presigned_url(), which enables you to give your users access to an object within your bucket for a set period of time, without requiring them to have AWS credentials.
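
As an illustration, here's a minimal sketch of generating such a URL through the client exposed by the resource; the bucket and key names are placeholders:

url = s3_resource.meta.client.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'my-key'},
    ExpiresIn=3600)  # the URL stays valid for one hour
print(url)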

Common Operations

Now that you know about the differences between clients and resources, let's start using them to build some new S3 components.

Creating a Bucket

To start off, you need an S3 bucket. To create one programmatically, you must first choose a name for your bucket. Remember that this name must be unique across all of AWS, since bucket names live in a global namespace and form part of DNS-compliant hostnames. If you try to create a bucket, but another user has already claimed your desired bucket name, your code will fail. Instead of success, you will see the following error: botocore.errorfactory.BucketAlreadyExists.

You can increase your chance of success when creating your bucket by picking a random name. You can write your own function that does that for you. In this implementation, you'll see how the uuid module helps you achieve that. A UUID4's string representation is 36 characters long (including hyphens), and you can add a prefix to specify what each bucket is for.

Here's a way you can achieve that:

import uuid
def create_bucket_name(bucket_prefix):
    # The generated bucket name must be between 3 and 63 chars long
    return ''.join([bucket_prefix, str(uuid.uuid4())])

You've got your bucket name, but now there's one more thing you need to be aware of: unless you're working in the default us-east-1 region, you'll need to define the region explicitly when you create the bucket. Otherwise you will get an IllegalLocationConstraintException.

To exemplify what this means when you're creating your S3 bucket in a non-US region, take a look at the code below:

s3_resource.create_bucket(Bucket=YOUR_BUCKET_NAME,
                          CreateBucketConfiguration={
                              'LocationConstraint': 'eu-west-1'})

You need to provide both a bucket name and a bucket configuration where you must specify the region, which in my case is eu-west-1.

This isn't ideal. Imagine that you want to take your code and deploy it to the cloud. Your task will become increasingly difficult because you've now hardcoded the region. You could refactor the region into an environment variable, but then you'd have one more thing to manage.

Luckily, there is a better way to get the region programmatically, by taking advantage of a session object. Boto3 will create the session from your credentials. You just need to take the region and pass it to create_bucket() as its LocationConstraint configuration. Here's how to do that:

def create_bucket(bucket_prefix, s3_connection):
    session = boto3.session.Session()
    current_region = session.region_name
    bucket_name = create_bucket_name(bucket_prefix)
    bucket_response = s3_connection.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
        'LocationConstraint': current_region})
    print(bucket_name, current_region)
    return bucket_name, bucket_response

The nice part is that this code works no matter where you want to deploy it: locally/EC2/Lambda. Moreover, you don't need to hardcode your region.

As both the client and the resource create buckets in the same way, you can pass either one as the s3_connection parameter.

You'll now create two buckets. First create one using the client, which gives you back the bucket_response as a dictionary:

>>> first_bucket_name, first_response = create_bucket(
...     bucket_prefix='firstpythonbucket', 
...     s3_connection=s3_resource.meta.client)
firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304 eu-west-1

>>> first_response
{'ResponseMetadata': {'RequestId': 'E1DCFE71EDE7C1EC', 'HostId': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=', 'x-amz-request-id': 'E1DCFE71EDE7C1EC', 'date': 'Fri, 05 Oct 2018 15:00:00 GMT', 'location': 'http://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/', 'content-length': '0', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'Location': 'http://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/'}

Then create a second bucket using the resource, which gives you back a Bucket instance as the bucket_response:

>>> second_bucket_name, second_response = create_bucket(
...     bucket_prefix='secondpythonbucket', s3_connection=s3_resource)
secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644 eu-west-1

>>> second_response
s3.Bucket(name='secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644')

You've got your buckets. Next, you'll want to start adding some files to them.

Naming Your Files

You can name your objects by using standard file naming conventions. You can use any valid name. In this article, you'll look at a more specific case that helps you understand how S3 works under the hood.

If you're planning on hosting a large number of files in your S3 bucket, there's something you should keep in mind. If all your file names have a deterministic prefix that gets repeated for every file, such as a timestamp format like "YYYY-MM-DDThh:mm:ss", then you will soon find that you're running into performance issues when you're trying to interact with your bucket.

This will happen because S3 takes the prefix of the file and maps it onto a partition. The more files you add, the more will be assigned to the same partition, and that partition will be very heavy and less responsive.

What can you do to keep that from happening?

The easiest solution is to randomize the file name. You can imagine many different implementations, but in this case, you'll use the trusted uuid module to help with that. To make the file names easier to read for this tutorial, you'll take the first six characters of the generated number's hex representation and concatenate them with your base file name.

The helper function below allows you to pass in the number of bytes you want the file to have, the file name, and a sample content for the file to be repeated to make up the desired file size:

def create_temp_file(size, file_name, file_content):
    random_file_name = ''.join([str(uuid.uuid4().hex[:6]), file_name])
    with open(random_file_name, 'w') as f:
        f.write(str(file_content) * size)
    return random_file_name

Create your first file, which you'll be using shortly:

first_file_name = create_temp_file(300, 'firstfile.txt', 'f')   

By adding randomness to your file names, you can efficiently distribute your data within your S3 bucket.

Creating Bucket and Object Instances

The next step after creating your file is to see how to integrate it into your S3 workflow.

This is where the resource's classes play an important role, as these abstractions make it easy to work with S3.

By using the resource, you have access to the high-level classes (Bucket and Object). This is how you can create one of each:

first_bucket = s3_resource.Bucket(name=first_bucket_name)
first_object = s3_resource.Object(
    bucket_name=first_bucket_name, key=first_file_name)

The reason you have not seen any errors with creating the first_object variable is that Boto3 doesn't make calls to AWS to create the reference. The bucket_name and the key are called identifiers, and they are the necessary parameters to create an Object. Any other attribute of an Object, such as its size, is lazily loaded. This means that for Boto3 to get the requested attributes, it has to make calls to AWS.
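
As a hedged illustration of that lazy loading (assuming the object has actually been uploaded to S3, which happens in a later section): the first access to such an attribute makes Boto3 issue a request to AWS, after which the loaded attributes are cached on the instance:

print(first_object.content_length)   # first access: Boto3 calls AWS here
print(first_object.last_modified)    # attributes already loaded, no new call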

Understanding Sub-resources

Bucket and Object are sub-resources of one another. Sub-resources are methods that create a new instance of a child resource. The parent's identifiers get passed to the child resource.

If you have a Bucket variable, you can create an Object directly:

first_object_again = first_bucket.Object(first_file_name)

Or if you have an Object variable, then you can get the Bucket:

first_bucket_again = first_object.Bucket()

Great, you now understand how to generate a Bucket and an Object. Next, you'll get to upload your newly generated file to S3 using these constructs.

Uploading a File

There are three ways you can upload a file:

- From an Object instance
- From a Bucket instance
- From the client

In each case, you have to provide the Filename, which is the path of the file you want to upload. You'll now explore the three alternatives. Feel free to pick whichever you like most to upload the first_file_name to S3.

Object Instance Version

You can upload using an Object instance:

s3_resource.Object(first_bucket_name, first_file_name).upload_file(
    Filename=first_file_name)

Or you can use the first_object instance:

first_object.upload_file(first_file_name)

Bucket Instance Version

Here's how you can upload using a Bucket instance:

s3_resource.Bucket(first_bucket_name).upload_file(
    Filename=first_file_name, Key=first_file_name)

Client Version

You can also upload using the client:

s3_resource.meta.client.upload_file(
    Filename=first_file_name, Bucket=first_bucket_name,
    Key=first_file_name)

You have successfully uploaded your file to S3 using one of the three available methods. In the upcoming sections, you'll mainly work with the Object class, as the operations are very similar between the client and the Bucket versions.

Downloading a File

To download a file from S3 locally, you'll follow steps similar to those you used when uploading. But in this case, the Filename parameter will map to your desired local path. This time, it will download the file to the /tmp directory:

s3_resource.Object(first_bucket_name, first_file_name).download_file(
    f'/tmp/{first_file_name}') # Python 3.6+

You've successfully downloaded your file from S3. Next, you'll see how to copy the same file between your S3 buckets using a single API call.

Copying an Object Between Buckets

If you need to copy files from one bucket to another, Boto3 offers you that possibility. In this example, you'll copy the file from the first bucket to the second, using .copy():

def copy_to_bucket(bucket_from_name, bucket_to_name, file_name):
    copy_source = {
        'Bucket': bucket_from_name,
        'Key': file_name
    }
    s3_resource.Object(bucket_to_name, file_name).copy(copy_source)

copy_to_bucket(first_bucket_name, second_bucket_name, first_file_name)

Note: If you're aiming to replicate your S3 objects to a bucket in a different region, have a look at Cross Region Replication.

Deleting an Object

Let's delete the new file from the second bucket by calling .delete() on the equivalent Object instance:

s3_resource.Object(second_bucket_name, first_file_name).delete()

You've now seen how to use S3's core operations. You're ready to take your knowledge to the next level with more complex characteristics in the upcoming sections.

Advanced Configurations

In this section, you're going to explore more elaborate S3 features. You'll see examples of how to use them and the benefits they can bring to your applications.

ACL (Access Control Lists)

Access Control Lists (ACLs) help you manage access to your buckets and the objects within them. They are considered the legacy way of administering permissions to S3. Why should you know about them? If you have to manage access to individual objects, then you would use an Object ACL.

By default, when you upload an object to S3, that object is private. If you want to make this object available to someone else, you can set the object's ACL to be public at creation time. Here's how you upload a new file to the bucket and make it accessible to everyone:

second_file_name = create_temp_file(400, 'secondfile.txt', 's')
second_object = s3_resource.Object(first_bucket.name, second_file_name)
second_object.upload_file(second_file_name, ExtraArgs={
                          'ACL': 'public-read'})

You can get the ObjectAcl instance from the Object, as it is one of its sub-resource classes:

second_object_acl = second_object.Acl()

To see who has access to your object, use the grants attribute:

>>> second_object_acl.grants
[{'Grantee': {'DisplayName': 'name', 'ID': '24aafdc2053d49629733ff0141fc9fede3bf77c7669e4fa2a4a861dd5678f4b5', 'Type': 'CanonicalUser'}, 'Permission': 'FULL_CONTROL'}, {'Grantee': {'Type': 'Group', 'URI': 'http://acs.amazonaws.com/groups/global/AllUsers'}, 'Permission': 'READ'}]

You can make your object private again, without needing to re-upload it:

>>> response = second_object_acl.put(ACL='private')
>>> second_object_acl.grants
[{'Grantee': {'DisplayName': 'name', 'ID': '24aafdc2053d49629733ff0141fc9fede3bf77c7669e4fa2a4a861dd5678f4b5', 'Type': 'CanonicalUser'}, 'Permission': 'FULL_CONTROL'}]

You have seen how you can use ACLs to manage access to individual objects. Next, you'll see how you can add an extra layer of security to your objects by using encryption.

Note: If you're looking to split your data into multiple categories, have a look at tags. You can grant access to the objects based on their tags.

Encryption

With S3, you can protect your data using encryption. You'll explore server-side encryption using the AES-256 algorithm where AWS manages both the encryption and the keys.

Create a new file and upload it using ServerSideEncryption:

third_file_name = create_temp_file(300, 'thirdfile.txt', 't')
third_object = s3_resource.Object(first_bucket_name, third_file_name)
third_object.upload_file(third_file_name, ExtraArgs={
                         'ServerSideEncryption': 'AES256'})

You can check the algorithm that was used to encrypt the file, in this case AES256:

>>> third_object.server_side_encryption
'AES256'

You now understand how to add an extra layer of protection to your objects using the AES-256 server-side encryption algorithm offered by AWS.

Storage

Every object that you add to your S3 bucket is associated with a storage class. All the available storage classes offer high durability. You choose how you want to store your objects based on your application's performance access requirements.

At present, you can use the following storage classes with S3:

- STANDARD: the default, for frequently accessed data
- STANDARD_IA: for infrequently accessed data that still needs to be retrieved rapidly when requested
- ONEZONE_IA: the same use case as STANDARD_IA, but stores the data in one Availability Zone instead of three
- REDUCED_REDUNDANCY: for frequently used, noncritical data that is easily reproducible

If you want to change the storage class of an existing object, you need to recreate the object.

For example, reupload the third_object and set its storage class to Standard_IA:

third_object.upload_file(third_file_name, ExtraArgs={
                         'ServerSideEncryption': 'AES256', 
                         'StorageClass': 'STANDARD_IA'})

Note: If you make changes to your object, you might find that your local instance doesn't show them. What you need to do at that point is call .reload() to fetch the newest version of your object.

Reload the object, and you can see its new storage class:

>>> third_object.reload()
>>> third_object.storage_class
'STANDARD_IA'

Note: Use LifeCycle Configurations to transition objects through the different classes as you find the need for them. They will automatically transition these objects for you.

Versioning

You should use versioning to keep a complete record of your objects over time. It also acts as a protection mechanism against accidental deletion of your objects. When you request a versioned object, Boto3 will retrieve the latest version.

When you add a new version of an object, the total storage that the object consumes is the sum of the sizes of all its versions. So if you're storing a 1 GB object and create 10 versions of it, you have to pay for 10 GB of storage.

Enable versioning for the first bucket. To do this, you need to use the BucketVersioning class:

def enable_bucket_versioning(bucket_name):
    bkt_versioning = s3_resource.BucketVersioning(bucket_name)
    bkt_versioning.enable()
    print(bkt_versioning.status)

>>> enable_bucket_versioning(first_bucket_name)
Enabled

Then create two new versions for the first file Object, one with the contents of the original file and one with the contents of the third file:

s3_resource.Object(first_bucket_name, first_file_name).upload_file(
   first_file_name)
s3_resource.Object(first_bucket_name, first_file_name).upload_file(
   third_file_name)

Now reupload the second file, which will create a new version:

s3_resource.Object(first_bucket_name, second_file_name).upload_file(
    second_file_name)

You can retrieve the latest available version of your objects like so:

>>> s3_resource.Object(first_bucket_name, first_file_name).version_id
'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'

In this section, you've seen how to work with some of the most important S3 attributes and add them to your objects. Next, you'll see how to easily traverse your buckets and objects.

Traversals

If you need to retrieve information from or apply an operation to all your S3 resources, Boto3 gives you several ways to iteratively traverse your buckets and your objects. You'll start by traversing all your created buckets.

Bucket Traversal

To traverse all the buckets in your account, you can use the resource's buckets attribute alongside .all(), which gives you the complete list of Bucket instances:

>>> for bucket in s3_resource.buckets.all():
...     print(bucket.name)
...
firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

You can use the client to retrieve the bucket information as well, but the code is more complex, as you need to extract it from the dictionary that the client returns:

>>> for bucket_dict in s3_resource.meta.client.list_buckets().get('Buckets'):
...     print(bucket_dict['Name'])
...
firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

You have seen how to iterate through the buckets you have in your account. In the upcoming section, you'll pick one of your buckets and iteratively view the objects it contains.

Object Traversal

If you want to list all the objects from a bucket, the following code will generate an iterator for you:

>>> for obj in first_bucket.objects.all():
...     print(obj.key)
...
127367firstfile.txt
616abesecondfile.txt
fb937cthirdfile.txt

The obj variable is an ObjectSummary. This is a lightweight representation of an Object. The summary version doesn't support all of the attributes that the Object has. If you need to access them, use the Object() sub-resource to create a new reference to the underlying stored key. Then you'll be able to extract the missing attributes:

>>> for obj in first_bucket.objects.all():
...     subsrc = obj.Object()
...     print(obj.key, obj.storage_class, obj.last_modified,
...           subsrc.version_id, subsrc.metadata)
...
127367firstfile.txt STANDARD 2018-10-05 15:09:46+00:00 eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv {}
616abesecondfile.txt STANDARD 2018-10-05 15:09:47+00:00 WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6 {}
fb937cthirdfile.txt STANDARD_IA 2018-10-05 15:09:05+00:00 null {}

You can now iteratively perform operations on your buckets and objects. You're almost done. There's one more thing you should know at this stage: how to delete all the resources you've created in this tutorial.

Deleting Buckets and Objects

To remove all the buckets and objects you have created, you must first make sure that your buckets have no objects within them.

Deleting a Non-empty Bucket

To be able to delete a bucket, you must first delete every single object within the bucket, or else the BucketNotEmpty exception will be raised. When you have a versioned bucket, you need to delete every object and all its versions.

If you find that a LifeCycle rule that will do this automatically for you isn't suitable for your needs, here's how you can programmatically delete the objects:

def delete_all_objects(bucket_name):
    res = []
    bucket = s3_resource.Bucket(bucket_name)
    for obj_version in bucket.object_versions.all():
        res.append({'Key': obj_version.object_key,
                    'VersionId': obj_version.id})
    print(res)
    bucket.delete_objects(Delete={'Objects': res})

The above code works whether or not you have enabled versioning on your bucket. If you haven't, the version of the objects will be null. You can batch up to 1000 deletions in one API call, using .delete_objects() on your Bucket instance, which is more cost-effective than individually deleting each object.
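
If a bucket holds more than 1000 object versions, a single .delete_objects() call will be rejected, so you'd need to batch the keys. Here's a minimal sketch of that chunking, under the same assumptions as the function above:

def delete_all_objects_chunked(bucket_name, chunk_size=1000):
    # Same idea as delete_all_objects(), but keeps every
    # delete_objects() call within the 1000-key API limit.
    bucket = s3_resource.Bucket(bucket_name)
    batch = []
    for obj_version in bucket.object_versions.all():
        batch.append({'Key': obj_version.object_key,
                      'VersionId': obj_version.id})
        if len(batch) == chunk_size:
            bucket.delete_objects(Delete={'Objects': batch})
            batch = []
    if batch:  # delete any remaining objects
        bucket.delete_objects(Delete={'Objects': batch})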

Run the new function against the first bucket to remove all the versioned objects:

>>> delete_all_objects(first_bucket_name)
[{'Key': '127367firstfile.txt', 'VersionId': 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'}, {'Key': '127367firstfile.txt', 'VersionId': 'UnQTaps14o3c1xdzh09Cyqg_hq4SjB53'}, {'Key': '127367firstfile.txt', 'VersionId': 'null'}, {'Key': '616abesecondfile.txt', 'VersionId': 'WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6'}, {'Key': '616abesecondfile.txt', 'VersionId': 'null'}, {'Key': 'fb937cthirdfile.txt', 'VersionId': 'null'}]

As a final test, you can upload a file to the second bucket. This bucket doesn't have versioning enabled, and thus the version will be null. Apply the same function to remove the contents:

>>> s3_resource.Object(second_bucket_name, first_file_name).upload_file(
...     first_file_name)
>>> delete_all_objects(second_bucket_name)
[{'Key': '9c8b44firstfile.txt', 'VersionId': 'null'}]

You've successfully removed all the objects from both your buckets. You're now ready to delete the buckets.

Deleting Buckets

To finish off, you'll use .delete() on your Bucket instance to remove the first bucket:

s3_resource.Bucket(first_bucket_name).delete()

If you want, you can use the client version to remove the second bucket:

s3_resource.meta.client.delete_bucket(Bucket=second_bucket_name)

Both operations succeeded because you emptied each bucket before attempting to delete it.

You've now run some of the most important operations that you can perform with S3 and Boto3. Congratulations on making it this far! As a bonus, let's explore some of the advantages of managing S3 resources with Infrastructure as Code.

Python Code or Infrastructure as Code (IaC)?

As you've seen, most of the interactions you've had with S3 in this tutorial were object-related. You didn't see many bucket-level operations, such as adding policies to the bucket, adding a LifeCycle rule to transition your objects through the storage classes (archiving them to Glacier or deleting them altogether), or enforcing that all objects be encrypted by configuring Bucket Encryption.

Manually managing the state of your buckets via Boto3's clients or resources becomes increasingly difficult as your application starts adding other services and grows more complex. To monitor your infrastructure in concert with Boto3, consider using an Infrastructure as Code (IaC) tool such as CloudFormation or Terraform to manage your application's infrastructure. Either one of these tools will maintain the state of your infrastructure and inform you of the changes that you've applied.

If you decide to go down this route, keep the following in mind:

Conclusion

Congratulations on making it to the end of this tutorial!

You're now equipped to start working programmatically with S3. You now know how to create objects, upload them to S3, download their contents and change their attributes directly from your script, all while avoiding common pitfalls with Boto3.

May this tutorial be a stepping stone in your journey to building something great using AWS!

Further Reading

If you want to learn more, check out the following:



17 Oct 2018 2:00pm GMT

10 Nov 2011

Planet Java

OSDir.com - Java: Oracle Introduces New Java Specification Requests to Evolve Java Community Process

From the Yet Another dept.:

To further its commitment to the Java Community Process (JCP), Oracle has submitted the first of two Java Specification Requests (JSRs) to update and revitalize the JCP.

10 Nov 2011 6:01am GMT

OSDir.com - Java: No copied Java code or weapons of mass destruction found in Android

From the Fact Checking dept.:

ZDNET: Sometimes the sheer wrongness of what is posted on the web leaves us speechless. Especially when it's picked up and repeated as gospel by otherwise reputable sites like Engadget. "Google copied Oracle's Java code, pasted in a new license, and shipped it," they reported this morning.



Sorry, but that just isn't true.

10 Nov 2011 6:01am GMT

OSDir.com - Java: Java SE 7 Released

From the Grande dept.:

Oracle today announced the availability of Java Platform, Standard Edition 7 (Java SE 7), the first release of the Java platform under Oracle stewardship.

10 Nov 2011 6:01am GMT

28 Oct 2011

Planet Ruby

O'Reilly Ruby: MacRuby: The Definitive Guide

Ruby and Cocoa on OS X, the iPhone, and the Device That Shall Not Be Named

28 Oct 2011 8:00pm GMT

14 Oct 2011

Planet Ruby

Charles Oliver Nutter: Why Clojure Doesn't Need Invokedynamic (Unless You Want It to be More Awesome)

This was originally posted as a comment on @fogus's blog post "Why Clojure doesn't need invokedynamic, but it might be nice". I figured it's worth a top-level post here.

Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.

First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.

Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.

For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.

Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.

It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.

OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.

As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.

Now, what about negative impacts to Clojure itself...

#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.

#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.

For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.

#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that "lambda", the support for closures in Java 7, will use invokedynamic under the covers to implement "function-like" constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it's so far away from Java...as I put it (and he agreed) it's going to "keep Oracle honest" about optimizing for non-Java languages. Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.

As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.

14 Oct 2011 2:40pm GMT

07 Oct 2011

Planet Ruby

Ruby on Rails: Rails 3.1.1 has been released!

Hi everyone,

Rails 3.1.1 has been released. This release requires at least sass-rails 3.1.4.

CHANGES

ActionMailer

ActionPack

ActiveModel

ActiveRecord

ActiveResource

ActiveSupport

Railties

SHA-1

You can find an exhaustive list of changes on GitHub, along with the closed issues marked for v3.1.1.

Thanks to everyone!

07 Oct 2011 5:26pm GMT

21 Mar 2011

Planet Perl

Perl NOC Log: Planet Perl is going dormant

Planet Perl is going dormant. This will be the last post there for a while.

[Image from planet.perl.org]

Why? There are better ways to get your Perl blog fix these days.

You might enjoy some of the following:

Will Planet Perl awaken again in the future? It might! The universe is a big place, filled with interesting places, people and things. You never know what might happen, so keep your towel handy.

21 Mar 2011 2:04am GMT

Ricardo Signes: improving on my little wooden "miniatures"

A few years ago, I wrote about cheap wooden discs as D&D minis, and I've been using them ever since. They do a great job, and cost nearly nothing. For the most part, we've used a few for the PCs, marked with the characters' initials, and the rest for NPCs and enemies, usually marked with numbers.

With D&D 4E, we've tended to have combats with more and more varied enemies. (Minions are wonderful things.) Numbering has become insufficient. It's too hard to remember what numbers are what monster, and to keep initiative order separate from token numbers. In the past, I've colored a few tokens in with the red or green whiteboard markers, and that has been useful. So, this afternoon I found my old paints and painted six sets of five colors. (The black ones I'd already made with sharpies.)

D&D tokens: now in color

I'm not sure what I'll want next: either I'll want five more of each color or I'll want five more colors. More colors will require that I pick up some white paint, while more of those colors will only require that I re-match the secondary colors when mixing. I think I'll wait to see which I end up wanting during real combats.

These colored tokens should work together well with my previous post about using a whiteboard for combat overview. Like-type monsters will get one color, and will all get grouped to one slot on initiative. Last night, for example, the two halfling warriors were red and acted in the same initiative slot. The three halfling minions were unpainted, and acted in another, later slot. Only PCs get their own initiative.

I think that it did a good amount to speed up combat, and that's even when I totally forgot to bring the combat whiteboard (and the character sheets!) with me. Next time, we'll see how it works when it's all brought together.

21 Mar 2011 12:47am GMT

20 Mar 2011

Planet Perl

Dave Cross: Perl Vogue T-Shirts

Is Plack the new Black?

In Pisa I gave a lightning talk about Perl Vogue. People enjoyed it and for a while I thought that it might actually turn into a project.

I won't though. It would just take far too much effort. And, besides, a couple of people have pointed out to me that the real Vogue are rather protective of their brand.

So it's not going to happen, I'm afraid. But as a subtle reminder of the ideas behind Perl Vogue I've created some t-shirts containing the article titles from the talk. You can get them from my Spreadshirt shop.

20 Mar 2011 12:02pm GMT