Python Basics

Python syntax

Variable assignment and naming convention

You can assign value to a variable. In this example, we assign the value 'hello world' to the variable x.

x='hello world'
print("The variable has the value:", x)

You can overwrite the value of a variable if you assign another value to the same variable

x=3
print(x)

Warning

Be careful when you assign different values to the same variable as in Jupyter Notebook you can mix up the assignment if you don't execute cells in the right order.

To assign a variable we need to find a variable name that should describe the data we will store. A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume).

Rules for Python variables:

A variable name must start with a letter or the underscore character
A variable name cannot start with a number
A variable name can only contain alpha-numeric characters and underscores (A-Z, 0-9, and _ )
A variable name is case-sensitive (age, Age, and AGE are three different variables)
A variable name cannot be any of the Python keywords.

Note

As a naming convention, you can either use mixed case naming (camel case) or lowercase with words separated by underscores(snake case).

camelCase = 'mixed case'
snake_case = 'words separated by underscore'

Python data type

Boolean

Boolean type may have one of two values True or False. It aims to evaluate an expression through those values.

t_boolean = True
print(type(t_boolean))
compute_boolean = 1 + 1 == 1
print(compute_boolean)

Integers

Integers are whole numbers on which you can perform arithmetic operations. It can either be positive or negative without length restrictions.

t_integer = 12
print(type(t_integer))
compute_integer =40 + 6 - 2
print(compute_integer)

Float

Float, or "floating point number" is a number, positive or negative, containing one or more decimals.

t_float = 12.58
print(type(t_float))
compute_float = 1.5 + 10.7 + 4.9
print(compute_float)

String

Strings in Python are arrays of bytes representing Unicode characters. It is used to store 'text' and can be accessed through their position (It works more or less like a list of characters). It is possible to use arithmetics operation on string with a behavioral change compared to integers.

t_string = 'Hello'
print(type(t_string))
compute_string= '400' + '4' + ' Error'
print(compute_string)

Note

Strings in Python are surrounded by either single quotation marks or double quotation marks. 'hello' is the same as "hello". You can also assign a multiline string to a variable using three quotes before and after.

single_quote = 'Hello'
doulbe_quote = "Hello"
multilign_string = '''This is
a multiline string
If you need to store
long arrays of characters.
'''

Python data structures

List

Python Lists are a way to store multi-values within a variable. It is very flexible as the items in a list do not need to be of the same type. We can access list values and we can delete/append new values easily.

t_list=['Chapter', 1, 'This', 'Is', 'a', 'list', 'of', 'strings', '.']
print(t_list)

Dictionary

Dictionaries are unordered collections of data values, used to store data values like a map, it holds the Key:Value pair. We can store different types of data and store nested data to be accessed at different levels.

t_dictionary = {'Peugeot': 306, "Mercedes-Benz": "C220", "Audi": "A4"}
print(t_dictionary)

You can store complex data within a dictionary with multiple levels of data.

Example of a nested dictionary

t_nested_dict = [

                    {
                        "state": "Florida",
                        "shortname":"FL",
                        "Info": {"governor": "Rick Scott"},
                        "Counties": [ 
                            {"name": "Dade", "population": 12345}, 
                            {"name": "Broward", "population": 40000},
                            {"name": "Palm Beach", "population": 60000},
                        ],
                    },
                    {
                        "state": "Ohio",
                        "shortname": "OH",
                        "info": {"governor":"John Kasich"},
                        "counties": [
                            {"name": "Summit", "population": 1234}, 
                            {"name": "Cuyahoga", "population": 1337},
                        ],
                    },
                ]
print(t_nested_dict)

Tuple

Python Tuple is a collection of Python objects much like a list but Tuples are immutable. The elements in the tuple cannot be added or removed once created, like a List, a Tuple can also contain elements of various types.

t_tuple = ('This', 'is', 'a', 'tuple.') 
print(t_tuple)

DataFrame

A Dataframe is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.

import pandas as pd
data = {
    "City": ['Paris', 'Lyon', 'Bordeaux'],
    "Population": [2161000, 513275, 249712]
}
t_dataframe = pd.DataFrame(data) 
print(t_dataframe)

Data manipulation

For loop

A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

cartes = ['Infinite', 'Platinium', 'Gold', 'Classic']
for carte in cartes:
    print(carte)

While

With the while loop, we can execute a set of statements as long as a condition is true.

montant = 0
plafond = 100
while montant < plafond:
    print(montant) 
    montant += 10

Python supports the usual logical conditions from mathematics:

Equals is a == b
Not Equals is a != b
Less than is a < b
Less than or equal to is a <= b
Greater than: a > b
Greater than or equal to is a >= b These conditions can be used in several ways, most commonly in "if statements" and loops. An "If statement" is used with the "if" function.

montant_compte = 50 
if montant_compte < -500:
    print("Compte en contentieux")
elif montant_compte < 0:
    print("Compte débiteur")
else: 
    print("Compte créditeur")

Function

A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result. Information can be passed into function through arguments. Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with a coma.

def check_account_statut (montant_compte):
    if montant_compte < -500: 
        statut='Compte en contentieux'
    elif montant_compte < 0:
        statut="Compte débiteur"
    else:
        statut='Compte créditeur'
    return statut

Example - Generate and merge multiple files

Import Python packages

import pandas as pd
import os
import glob

Nasdaq data exampleCAC40 data example

Create a dictionary of Nasdaq top companies data

nasdaq_top_companies = [
    {
        "Ticker": "AAPL",
        "Company": "Apple Inc.",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 150.00,
        "Description": "Share price for Apple Inc. in September 2021"
    },
    {
        "Ticker": "MSFT",
        "Company": "Microsoft Corporation",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 300.00,
        "Description": "Share price for Microsoft Corporation in September 2021"
    },
    {
        "Ticker": "AMZN",
        "Company": "Amazon.com Inc.",
        "Industry": "E-commerce",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 3400.00,
        "Description": "Share price for Amazon.com Inc. in September 2021"
    },
    {
        "Ticker": "GOOGL",
        "Company": "Alphabet Inc. (Google)",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 2700.00,
        "Description": "Share price for Alphabet Inc. in September 2021"
    },
    {
        "Ticker": "FB",
        "Company": "Meta Platforms, Inc. (Facebook)",
        "Industry": "Social Media",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 330.00,
        "Description": "Share price for Meta Platforms, Inc. in September 2021"
    }
]
# Load dict to DataFrame
df_nasdaq=pd.DataFrame(nasdaq_top_companies)

Create a dictionnary of CAC40 top companies data

cac40_data = [
    {
        "Ticker": "AC",
        "Company": "Accor",
        "Share_Price": 35.60
    },
    {
        "Ticker": "AI",
        "Company": "Air Liquide",
        "Share_Price": 147.80
    },
    {
        "Ticker": "MT",
        "Company": "ArcelorMittal",
        "Share_Price": 28.90
    },
    {
        "Ticker": "BNP",
        "Company": "BNP Paribas",
        "Share_Price": 53.20
    },
    {
        "Ticker": "CAP",
        "Company": "Capgemini",
        "Share_Price": 140.45
    },
]
# Load dict to DataFrame
cac40_data=pd.DataFrame(cac40_data)

Create the folder Market_Data within the repository

os.mkdir(r'./MARKET_DATA/')

Create the function CreateCsvFile

def CreateCsvFile(df, PrefixName: str):
    '''
    The function CreateCsvFile enables to create multiple csv files by tickers 
    from a dataframe and a prefix as the filename.
    '''
    for ticker in df['Ticker']:
        pathName = r'./MARKET_DATA/'+ str(PrefixName) +str(ticker)+'.csv'
        df[df['Ticker']==ticker].to_csv(pathName, index=False)
    return None

Use the function CreateCsvFile with nasdaq and cac40 DataFrames

CreateCsvFile(df_nasdaq, 'NASDAQ_')
CreateCsvFile(cac40_data, 'CAC40_')

Import and concat files with the same name structure

# Assign the folder to parse
pathFolder = r'.\MARKET_DATA'
# Parse the folder
walkFolder = [path[0] for path in os.walk (pathFolder)]
# Initialize list
listOfFile = []
# Loop on the folder
for folder in walkFolder:
    # Assign folder path
    folderName = os.path.basename(folder)
    # Assign folder path joined to a file type searched
    files = os.path.join(folder, 'NASDAQ_*.csv')
    # Store the file path matching the file type searched
    listOfFile = glob.glob(files)

Parse a list of file and concatenate the data

# Initialize list
full_list = []
# Loop on all files
for file in listOfFile:
    # Import data to a Dataframe
    full_file = pd.read_csv(file, dtype = str).reset_index(drop = True)
    # Assign the file path to the file_name columns
    full_file['file_name']=os.path.basename(file)
    # Append the DataFrame to the list initialized
    full_list.append(full_file)
# Merge the list of DataFrames to a single DataFrame
fileConcat = pd.concat(full_list)