Python Basics
Python syntax
Variable assignment and naming convention
You can assign a value to a variable. For example, the statement x = 'hello world' assigns the value 'hello world' to the variable x.
You can overwrite the value of a variable by assigning another value to the same name.
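A minimal sketch of assignment and overwriting; the second value given to x is only an illustrative choice.

```python
# Assign the string 'hello world' to the variable x
x = 'hello world'
print(x)

# Assigning again to the same name replaces the previous value
x = 'goodbye'  # illustrative value, not taken from the text above
print(x)
```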
Warning
Be careful when you assign different values to the same variable: in a Jupyter Notebook, a variable holds whatever value was assigned by the most recently executed cell, so running cells out of order can leave it with an unexpected value.
To assign a variable, choose a name that describes the data it will store. A variable name can be short (like x and y) or more descriptive (age, carname, total_volume).
Rules for Python variables:
- A variable name must start with a letter or the underscore character
- A variable name cannot start with a number
- A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _); Python 3 also accepts non-ASCII letters, although ASCII names are the usual convention
- A variable name is case-sensitive (age, Age, and AGE are three different variables)
- A variable name cannot be any of the Python keywords.
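The rules above can be checked programmatically. The sketch below uses the built-in str.isidentifier method and the standard keyword module; the helper is_valid_variable_name and the candidate names are hypothetical, chosen only for illustration.

```python
import keyword

# Candidate names to check (illustrative values)
candidates = ['age', '_total', 'total_volume', '2fast', 'car-name', 'class']


def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a Python keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


results = {name: is_valid_variable_name(name) for name in candidates}
for name, ok in results.items():
    print(name, ok)

# Names are case-sensitive: these are two different variables
age = 30
Age = 31
```

Here '2fast' fails because it starts with a digit, 'car-name' because of the hyphen, and 'class' because it is a keyword.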
Note
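As a naming convention, you can use either mixed case (camel case, such as totalVolume) or lowercase words separated by underscores (snake case, such as total_volume). PEP 8, the Python style guide, recommends snake case for variable and function names.
Python data types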
Boolean
The Boolean type has exactly two values, True and False. Comparisons and other logical expressions evaluate to one of these two values.
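As a small illustration, the sketch below shows a Boolean literal, a comparison that evaluates to a Boolean, and the and/not operators. The variable names and the thresholds are assumptions made for the example.

```python
is_adult = True
print(type(is_adult))  # <class 'bool'>

age = 20  # illustrative value
comparison = age >= 18  # a comparison evaluates to True or False
print(comparison)

# Booleans can be combined with and, or, not
both = comparison and (age < 65)
negated = not comparison
print(both, negated)
```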
Integers
Integers are whole numbers on which you can perform arithmetic operations. They can be positive or negative, and Python places no fixed limit on their size.
Float
A float, or "floating point number", is a positive or negative number with a fractional part, written with a decimal point.
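The following sketch contrasts integers and floats; all values are illustrative. Note that the / operator always returns a float, while // performs floor division.

```python
whole = 7
negative = -3
big = 2 ** 100  # integers are not limited to 32 or 64 bits

ratio = 7 / 2      # true division gives a float
floor = 7 // 2     # floor division gives an integer
remainder = 7 % 2  # remainder of the division

print(type(whole), type(ratio))
print(big, ratio, floor, remainder)
```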
String
Strings in Python are immutable sequences of Unicode characters. They are used to store text, and individual characters can be accessed by position, much like the items of a list. Some arithmetic operators also work on strings but behave differently than on integers: + concatenates two strings and * repeats a string.
t_string = 'Hello'
print(type(t_string))  # <class 'str'>
# '+' concatenates strings instead of adding numbers
compute_string = '400' + '4' + ' Error'
print(compute_string)  # 4004 Error
Note
Strings in Python are surrounded by either single or double quotation marks: 'hello' is the same as "hello". You can also assign a multiline string to a variable by enclosing it in three quotes (''' or """).
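A short sketch of the points in this note, plus access by position; the variable names and sample text are illustrative.

```python
single = 'hello'
double = "hello"
same = single == double  # both quoting styles produce the same string

# Three quotes allow a string to span several lines
multiline = """first line
second line"""
line_count = len(multiline.splitlines())

# Characters are accessed by position, starting at 0
first_char = single[0]
last_char = single[-1]
sliced = single[1:4]

# '*' repeats a string
repeated = 'ab' * 3

print(same, line_count, first_char, last_char, sliced, repeated)
```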
Python data structures
List
A Python list stores multiple values in a single variable. Lists are very flexible: the items do not need to be of the same type, and you can easily access, append, and delete values.
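A minimal sketch of these list operations, using an illustrative list of mixed types:

```python
# Items in a list can have different types
mixed = [1, 'two', 3.0, True]

# Access a value by position (positions start at 0)
first = mixed[0]

# Append a new value at the end
mixed.append('five')

# Delete by value, then by position
mixed.remove('two')
del mixed[0]

print(first, mixed)
```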
Dictionary
A dictionary is a collection of key:value pairs, used to look up a value by its key, like a map. Since Python 3.7, dictionaries preserve insertion order. Values can be of different types and can themselves be dictionaries or lists, so nested data can be accessed at different levels.
You can store complex data with multiple levels by nesting dictionaries and lists.
# A list of dictionaries, each holding a nested dictionary and a nested list
t_nested_dict = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
            {"name": "Palm Beach", "population": 60000},
        ],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {"name": "Summit", "population": 1234},
            {"name": "Cuyahoga", "population": 1337},
        ],
    },
]
print(t_nested_dict)
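To show how nested data is reached level by level, here is a sketch on a trimmed, self-contained copy of the structure. The variable name states and the lowercase keys are choices made for this example.

```python
# Trimmed copy of the nested structure, for illustration only
states = [
    {
        "state": "Florida",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
        ],
    },
]

# Chain indexes and keys to go one level deeper each time
first_state = states[0]
governor = first_state["info"]["governor"]
second_county = first_state["counties"][1]["name"]

# Aggregate a value stored in the nested list
total_population = sum(c["population"] for c in first_state["counties"])

# .get returns None for a missing key instead of raising KeyError
missing = first_state.get("capital")

print(governor, second_county, total_population, missing)
```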
Tuple
A Python tuple is a collection of objects much like a list, but tuples are immutable: elements cannot be added, removed, or replaced once the tuple is created. Like a list, a tuple can contain elements of various types.
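The sketch below creates a tuple with mixed types, unpacks it, and shows that trying to modify it raises a TypeError. The values are illustrative.

```python
# A tuple can mix types, like a list
point = (48.8566, 2.3522, 'Paris')

# Unpack the elements into separate variables
lat, lon, city = point

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 0.0
    mutated = True
except TypeError:
    mutated = False

print(city, mutated)
```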
DataFrame
A DataFrame is a two-dimensional data structure from the pandas library, like a two-dimensional array or a table with rows and columns.
import pandas as pd

# Each key becomes a column, each list holds the values of that column
data = {
    "City": ['Paris', 'Lyon', 'Bordeaux'],
    "Population": [2161000, 513275, 249712]
}
t_dataframe = pd.DataFrame(data)
print(t_dataframe)
Data manipulation
For loop
A for loop iterates over a sequence or other iterable (a list, a tuple, a dictionary, a set, or a string) and runs its body once per item.
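A minimal sketch of for loops over a list, a dictionary, and a string; the data is illustrative.

```python
# Loop over a list
cities = ['Paris', 'Lyon', 'Bordeaux']
lengths = []
for city in cities:
    lengths.append(len(city))

# Loop over the key/value pairs of a dictionary
populations = {'Paris': 2161000, 'Lyon': 513275}
pairs = []
for city, population in populations.items():
    pairs.append((city, population))

# Loop over the characters of a string
letters = []
for letter in 'abc':
    letters.append(letter)

print(lengths, pairs, letters)
```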
While
With the while loop, we can execute a set of statements as long as a condition is true.
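As a sketch, the loop below adds the numbers 1 to 5. The condition is checked before each pass, so the body must eventually make it false, otherwise the loop never ends.

```python
count = 0
total = 0
while count < 5:
    count += 1      # move towards the exit condition
    total += count  # accumulate 1 + 2 + 3 + 4 + 5

print(count, total)
```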
Python supports the usual logical conditions from mathematics:
- Equals is a == b
- Not Equals is a != b
- Less than is a < b
- Less than or equal to is a <= b
- Greater than is a > b
- Greater than or equal to is a >= b
These conditions can be used in several ways, most commonly in if statements and loops. An if statement is written with the if keyword, optionally followed by elif and else branches.
# montant_compte is the account balance
montant_compte = 50
if montant_compte < -500:
    print("Account in dispute")
elif montant_compte < 0:
    print("Account overdrawn")
else:
    print("Account in credit")
Function
A function is a block of code that only runs when it is called. You can pass data into a function through arguments, which are received by the parameters listed after the function name, inside the parentheses. You can define as many parameters as you want; just separate them with commas. A function can return data as a result.
def check_account_statut(montant_compte):
    # Return the account status for a given balance (montant_compte)
    if montant_compte < -500:
        statut = 'Account in dispute'
    elif montant_compte < 0:
        statut = 'Account overdrawn'
    else:
        statut = 'Account in credit'
    return statut
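A function does nothing until it is called. The sketch below defines a hypothetical helper, classify_balance, with the same kind of logic and a second parameter that has a default value, then calls it with one and with two arguments. The function name, the parameter names, and the returned labels are assumptions made for this example.

```python
def classify_balance(balance, dispute_threshold=-500):
    """Return a status label for an account balance (hypothetical helper)."""
    if balance < dispute_threshold:
        return 'in dispute'
    elif balance < 0:
        return 'overdrawn'
    return 'in credit'


# Call with one argument: dispute_threshold keeps its default value
statuses = [classify_balance(b) for b in (50, -20, -800)]

# Call with two arguments, the second one passed by name
custom = classify_balance(-200, dispute_threshold=-100)

print(statuses, custom)
```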
Example - Generate and merge multiple files
nasdaq_top_companies = [
    {
        "Ticker": "AAPL",
        "Company": "Apple Inc.",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 150.00,
        "Description": "Share price for Apple Inc. in September 2021"
    },
    {
        "Ticker": "MSFT",
        "Company": "Microsoft Corporation",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 300.00,
        "Description": "Share price for Microsoft Corporation in September 2021"
    },
    {
        "Ticker": "AMZN",
        "Company": "Amazon.com Inc.",
        "Industry": "E-commerce",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 3400.00,
        "Description": "Share price for Amazon.com Inc. in September 2021"
    },
    {
        "Ticker": "GOOGL",
        "Company": "Alphabet Inc. (Google)",
        "Industry": "Technology",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 2700.00,
        "Description": "Share price for Alphabet Inc. in September 2021"
    },
    {
        "Ticker": "FB",
        "Company": "Meta Platforms, Inc. (Facebook)",
        "Industry": "Social Media",
        "Market Cap": "Trillion Dollars",
        "Share_Price": 330.00,
        "Description": "Share price for Meta Platforms, Inc. in September 2021"
    }
]
# Load the list of dictionaries into a DataFrame
df_nasdaq = pd.DataFrame(nasdaq_top_companies)
cac40_data = [
    {
        "Ticker": "AC",
        "Company": "Accor",
        "Share_Price": 35.60
    },
    {
        "Ticker": "AI",
        "Company": "Air Liquide",
        "Share_Price": 147.80
    },
    {
        "Ticker": "MT",
        "Company": "ArcelorMittal",
        "Share_Price": 28.90
    },
    {
        "Ticker": "BNP",
        "Company": "BNP Paribas",
        "Share_Price": 53.20
    },
    {
        "Ticker": "CAP",
        "Company": "Capgemini",
        "Share_Price": 140.45
    },
]
import os

# Load the list of dictionaries into a DataFrame
df_cac40 = pd.DataFrame(cac40_data)

def CreateCsvFile(df, PrefixName: str):
    '''
    Create one CSV file per ticker from a DataFrame.
    Each file is named <PrefixName><ticker>.csv and written to ./MARKET_DATA/.
    '''
    # Create the output folder if it does not exist yet
    os.makedirs('./MARKET_DATA', exist_ok=True)
    for ticker in df['Ticker']:
        pathName = './MARKET_DATA/' + str(PrefixName) + str(ticker) + '.csv'
        # Keep only the rows of the current ticker and write them to the file
        df[df['Ticker'] == ticker].to_csv(pathName, index=False)
    return None

CreateCsvFile(df_nasdaq, 'NASDAQ_')
CreateCsvFile(df_cac40, 'CAC40_')
import glob
import os

# Assign the folder to parse
pathFolder = './MARKET_DATA'
# Parse the folder and its subfolders
walkFolder = [path[0] for path in os.walk(pathFolder)]
# Initialize list
listOfFile = []
# Loop on the folders
for folder in walkFolder:
    # Build the search pattern for the file type searched in this folder
    files = os.path.join(folder, 'NASDAQ_*.csv')
    # Add the file paths matching the pattern, keeping those already found
    listOfFile.extend(glob.glob(files))
# Initialize list
full_list = []
# Loop on all files
for file in listOfFile:
    # Import data to a DataFrame
    full_file = pd.read_csv(file, dtype=str).reset_index(drop=True)
    # Assign the file name to the file_name column
    full_file['file_name'] = os.path.basename(file)
    # Append the DataFrame to the list initialized
    full_list.append(full_file)
# Merge the list of DataFrames into a single DataFrame
fileConcat = pd.concat(full_list, ignore_index=True)