With pandas you can create a one-dimensional array called a Series.
These one-dimensional arrays can be labeled and can hold any data type (strings, objects, integers, etc.).
However, they are homogeneous: every value in a Series shares a single dtype. A string and an integer cannot each keep their own dtype in the same Series.
You can create a Series from a list, dictionary, array, and more.
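As a small sketch of the list and dictionary cases, assuming made-up values and the illustrative names from_list and from_dict (neither appears in the original text):

```python
import pandas as pd

# From a list: the labels default to 0, 1, 2
from_list = pd.Series([10, 20, 30])

# From a dictionary: the keys become the labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

print(from_list)
print(from_dict)
```

The dictionary version shows what "labeled" means in practice: from_dict["b"] looks up a value by its label rather than by its position.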
Here is a simple Series from an array:
When we print firstSeries we get this output:
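A minimal sketch of such a Series, assuming a NumPy array of three integers. The values are illustrative; only the name firstSeries comes from the text.

```python
import numpy as np
import pandas as pd

# Illustrative values (an assumption, not the original example's data)
firstSeries = pd.Series(np.array([1, 2, 3]))

print(firstSeries)
```

The printout lists the index labels on the left, the values on the right, and the dtype on the last line.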
❗ It is possible to create a Series from more than one data type thanks to type coercion[1].
DataFrames
DataFrames are pandas' two-dimensional arrays. They arrange data in a table of labeled rows and columns.
Just like Series, DataFrames can hold any data type and can be created from lists, dictionaries, arrays, and more.
However, unlike Series, DataFrames are heterogeneous. Each column can have its own dtype, so one DataFrame can hold more than one data type.
Here is how to create a DataFrame from two different Series:
The output of our df looks like this:
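As a rough sketch of this construction, assuming two made-up Series named fruits and prices (neither comes from the original example; only the name df does):

```python
import pandas as pd

# Two illustrative Series with different data types
fruits = pd.Series(["apple", "banana", "cherry"])
prices = pd.Series([0.5, 0.25, 3.0])

# Each Series becomes a column; the dictionary keys become the column names
df = pd.DataFrame({"fruit": fruits, "price": prices})

print(df)
print(df.dtypes)  # one dtype per column
```

Printing df.dtypes shows that each column keeps its own dtype, which is what makes the DataFrame heterogeneous.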
Both DataFrames and Series have an index. By default, this index goes from 0 to the length of the array minus 1.
A single column of any DataFrame is a Series. So, if a DataFrame has 10 columns, each of those 10 columns is an individual Series.
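Both points can be checked with a short sketch, assuming a small made-up DataFrame (the column names fruit and quantity are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "quantity": [3, 5, 7],
})

# The default index runs from 0 to len(df) - 1
print(list(df.index))

# Selecting one column returns a Series that shares the DataFrame's index
column = df["quantity"]
print(type(column))
print(column.index.equals(df.index))
```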
In the next few insights, we will show you different ways to use these structures to import and manipulate datasets.
Footnotes
[1:Coercion]
Coercion is the implicit conversion of a value from one type to another so that an operation can proceed. In Python it happens automatically in certain cases, such as arithmetic on mixed numeric types.
For instance, you can add a float (3.1) and an integer (2) without any errors.
The Python interpreter sees that one operand is a float and the other is an int. It converts the int into a float and then adds them.
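The addition described above can be tried directly. This sketch uses the same two numbers; the variable name total is illustrative.

```python
# Adding a float and an int: the int is converted to a float first
total = 3.1 + 2

print(total)
print(type(total))  # the result is a float, not an int
```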
As for the Series, take this example:
It would run without any problems. This is because the dtype of this Series is coerced to object.
If we evaluate s, we would get:
The output shows that its dtype is object.
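A minimal sketch of a mixed-type Series, assuming illustrative values (the original example's data is not known; only the name s comes from the text):

```python
import pandas as pd

# A string, an integer, and a float in one Series (illustrative values)
s = pd.Series(["one", 2, 3.0])

print(s)
print(s.dtype)  # object

# With the object dtype, each element keeps its own Python type
print([type(v).__name__ for v in s])
```

The Series is still homogeneous in the pandas sense, since it has exactly one dtype. That dtype is simply the general-purpose object, which can refer to any Python value.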