In this post, we will dive into the datetime-related modules of Python and Pandas. Handling dates and times is often seen as a boring part of any programming language, and many times we can meet most of our requirements without delving much into these modules. But once we understand them structurally, they are not that boring, and they make life much easier when handling a time series dataset.
We will try to develop a mindmap along with this post. We will cover,
- Datetime objects in Python
- Operations and Arithmetic on Python Datetime object
- Read DateTime from String and format back to String
- Datetime objects in Pandas
- Learning to operate TimeSeries data based on Datetime Index
- Understanding and applying Delta, Offsets, Timezone
Python Date and Time ecosystem
This post covers the modules mentioned above. We will dive deep into the datetime module of Python and the datetime-related modules of Pandas. These are enough for all our DateTime needs.
Python Internal packages/modules
time is the first module that we will discuss. You may not need it often, because the datetime module covers everything that is available in this module.
Create a Time object
There are 3 ways we can input the information for a time
- epoch - Seconds since a reference instant, known as the epoch. Midnight, UTC, of January 1, 1970, is a popular epoch used on both Unix and Windows platforms.
- As a tuple - An alternative to seconds since the epoch, a time instant can be represented by a tuple of nine integers, called a timetuple, as shown below:
tm_year=2005, tm_mon=8, tm_mday=7, tm_hour=23, tm_min=21, tm_sec=29, tm_wday=6, tm_yday=219, tm_isdst=0
This is an intuitive approach since we have the option to input all the relevant values with a keyword argument. This approach is common across different modules but with different names of the underlying Class.
struct_time is the name of this class in the time module.
- From String - We can also read from strings like '2020-11-18 23:59:59'
Let's see the functions that are required to achieve the above methods.
import time

tm = time.gmtime(1123456889.5)  # epoch --> time.struct_time object (in UTC)
time.mktime(tm)                 # time.struct_time object --> epoch (tm is read as local time)
time.struct_time((2005, 8, 7, 23, 21, 29, 6, 219, 0))  # Create struct_time explicitly
time.time()                     # Current time in epoch

# Get the individual attributes
print(tm.tm_year, tm.tm_mon, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec)
We have simply used three functions of the time module: gmtime, mktime, and time.
All other parts of the code are quite trivial and self-explanatory.
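One detail worth a quick check is the round trip between the epoch and struct_time. The sketch below is only an illustration under one assumption: it brings in calendar.timegm from the standard library, which the post itself does not use, as the UTC counterpart of time.gmtime. time.mktime does the same job for local time.

```python
import calendar
import time

epoch = 1123456889  # seconds since 1970-01-01 00:00:00 UTC

# epoch --> struct_time expressed in UTC
tm_utc = time.gmtime(epoch)

# struct_time (UTC) --> epoch; calendar.timegm is the inverse of time.gmtime
roundtrip_utc = calendar.timegm(tm_utc)

# epoch --> struct_time in the machine's local time, and back with time.mktime
tm_local = time.localtime(epoch)
roundtrip_local = int(time.mktime(tm_local))

print(tm_utc.tm_year, tm_utc.tm_mon, tm_utc.tm_mday)  # 2005 8 7
print(roundtrip_utc == epoch)                          # True
```

Pairing gmtime with timegm, and localtime with mktime, keeps the conversion consistent regardless of the timezone of the machine.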
With the above code snippet, we are equipped to read and save time data. Let's read from a string and format back to a string.
read_time = time.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')  # string --> struct_time
str_time = time.strftime('%d-%b-%Y %H:%M:%S', read_time)               # struct_time --> string
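We have two functions at our service: strptime (parse a string) and strftime (format to a string).
The meaning of each format code (%Y, %b, and so on) can be checked in the Python documentation of strftime and strptime.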
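To get a feel for the format codes, here is a small sketch that parses the same string as above and formats it in a few ways. The variable names are only illustrative, and the codes chosen are numeric ones so the output does not depend on the locale.

```python
import time

parsed = time.strptime("2018-04-02 23:59:50", "%Y-%m-%d %H:%M:%S")

numeric = time.strftime("%Y/%m/%d", parsed)    # year/month/day as numbers
clock = time.strftime("%H:%M:%S", parsed)      # 24-hour clock
day_of_year = time.strftime("%j", parsed)      # zero-padded day of the year

print(numeric, clock, day_of_year)  # 2018/04/02 23:59:50 092
print(parsed.tm_wday)               # 0, i.e. Monday

# A string that does not match the format raises ValueError
mismatch_detected = False
try:
    time.strptime("2018-04-02", "%d-%m-%Y")
except ValueError:
    mismatch_detected = True
print(mismatch_detected)            # True
```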
The datetime module has all the functionality of the time module and has many APIs on top of it. So, you can usually ignore the time module.
The datetime module has classes for date, time, and datetime. The first two are for dates and times respectively, and the last one is the superset of the two. Hence the last one, i.e. the datetime class, is sufficient for all of our tasks.
Why do we need datetime when we have the time module?
The high-level reason is that the time module handles time as a float. It is not designed with humans in mind. datetime has all the APIs needed to handle dates and times the way a human does. Check this Reddit answer.
Let's check the datetime module with the required code. Be mindful that the object which stores the values is an instance of the datetime class. Also, take note that the name of the top-level module is also datetime.
from datetime import datetime  # The module and the class are both named datetime

dtm = datetime(2000, 5, 23, hour=0, minute=0, second=0, microsecond=0, tzinfo=None)  # Like a time tuple
dtm = datetime.fromtimestamp(1123456889.5)  # epoch --> datetime (local time). Similar to time.localtime
datetime.now()  # Current time

# Read from String
datetime.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')  # string --> datetime

# Back to String
d = datetime.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')
datetime.strftime(d, '%d-%b-%Y %H:%M:%S')  # datetime --> string

# Individual attributes of datetime
d.year, d.month, d.day, d.minute, d.second

# Weekday names are not directly available as attributes
d.strftime("%A"), d.strftime("%a")
The code is quite intuitive to understand. In addition, we now have an option for the timezone (the tzinfo parameter). We will use it later.
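The date and time classes mentioned earlier can also be used on their own and then merged into a datetime. The following is a minimal sketch of that relationship; the values are arbitrary, and note that importing time from datetime shadows the standalone time module inside this snippet.

```python
from datetime import date, datetime, time  # here `time` is the class, not the time module

d = date(2000, 5, 23)   # calendar date only
t = time(14, 30, 15)    # wall-clock time only

# Merge the two into a full datetime
combined = datetime.combine(d, t)

# Split a datetime back into its date and time parts
print(combined.date(), combined.time())  # 2000-05-23 14:30:15
print(combined.isoformat())              # 2000-05-23T14:30:15
```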
Datetime arithmetic and Timedelta module
We now know the approach to input, format, and print formatted datetime. So, let's learn how to do Arithmetic with datetime.
timedelta is the class used to create and manage the difference between two datetime objects. We can also calculate a future date if the delta is known.
Instances of the timedelta class represent time intervals with three read-only integer attributes days, seconds, and microseconds.
Let's check the timedelta class with the required code.
from datetime import timedelta, datetime

d1 = datetime.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')
d2 = datetime.strptime("2019-05-03 23:57:12", '%Y-%m-%d %H:%M:%S')
d2 - d1  # >>> datetime.timedelta(days=395, seconds=86242), a timedelta object

delta = timedelta(days=395, seconds=86242)  # The same timedelta, created explicitly
delta.days, delta.seconds                   # Check attributes

# Add the delta to d1
datetime.strftime(d1 + delta, '%d-%b-%Y %H:%M:%S')  # Same as d2
str(d1 + delta)  # str() of the resulting datetime
As mentioned above, a timedelta is expressed in only 3 attributes: days, seconds, and microseconds.
The difference of two datetime objects is a timedelta object.
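The constructor accepts more convenient units (weeks, hours, minutes) and normalizes them into the three stored attributes. The sketch below, with arbitrary values, shows that normalization and the future-date calculation mentioned earlier.

```python
from datetime import datetime, timedelta

start = datetime(2018, 4, 2, 23, 59, 50)

# 1 week + 36 hours + 90 minutes is stored as days and seconds only
delta = timedelta(weeks=1, hours=36, minutes=90)
print(delta.days, delta.seconds)  # 8 48600
print(delta.total_seconds())      # 739800.0

# A future date when the delta is known
future = start + delta
print(future)                     # 2018-04-11 13:29:50

# Negative deltas are normalized too: days goes negative, seconds stays positive
back = timedelta(hours=-1)
print(back.days, back.seconds)    # -1 82800
```

Because of this normalization, total_seconds() is usually the safer way to compare or report a duration than reading the seconds attribute alone.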
pytz is a third-party module to handle timezone-related manipulations. Timezone handling can be prone to bugs and issues. Here are the words of wisdom from "Python in a Nutshell"
The best way to program around the traps and pitfalls of time zones is to always use the UTC time zone internally, converting from other time zones on input, and to other time zones only for display purposes.
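A rough sketch of that advice is shown below. To keep it free of third-party dependencies, it assumes a fixed-offset timezone from the standard library (India is UTC+05:30 with no daylight saving) instead of pytz. The helper names to_utc and for_display are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset stands in for a full timezone database entry
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

def to_utc(aware_dt):
    """Convert an aware datetime to UTC on input."""
    return aware_dt.astimezone(timezone.utc)

def for_display(utc_dt, tz):
    """Convert a UTC datetime to a display timezone on output."""
    return utc_dt.astimezone(tz)

user_input = datetime(2021, 11, 11, 9, 0, tzinfo=IST)
stored = to_utc(user_input)           # keep UTC internally
later = stored + timedelta(hours=2)   # do the arithmetic in UTC
shown = for_display(later, IST)       # convert only for display

print(stored)  # 2021-11-11 03:30:00+00:00
print(shown)   # 2021-11-11 11:00:00+05:30
```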
Let's check a quick code snippet to handle timezone with datetime.
# Install first with: pip install pytz
from datetime import datetime
import pytz

# Get the list of commonly used timezones
pytz.common_timezones  #1

# Timezones for a particular country
# Use the ISO format of the country code
pytz.country_timezones('IN')  # >>> ['Asia/Kolkata']  #2

# Return a datetime with New York time.
# localize is used because passing a pytz timezone as tzinfo to the
# datetime constructor picks up the zone's historical local mean time offset
inp_ny = pytz.timezone('America/New_York').localize(datetime(2021, 11, 11))  #3

# Use the astimezone method of the datetime object
out_ind = inp_ny.astimezone(pytz.timezone('Asia/Kolkata'))  #4

#1 - Fetch the list of commonly used timezones (pytz.all_timezones holds the complete list)
#2 - Fetch the list of all timezones for a country [India here]
#3 - Use the localize method of the pytz timezone to attach it to a naive datetime
#4 - Convert to the desired timezone
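When we create a datetime without a tzinfo, it is a naive datetime, i.e. just a datetime without any timezone attached. Once a timezone is attached, it becomes an aware datetime for that timezone. With pytz, attach the zone through the timezone's localize method; passing a pytz timezone directly to the tzinfo parameter of the constructor picks up the zone's historical local mean time offset instead of the current one.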
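The difference between naive and aware objects shows up as soon as the two are mixed. Here is a minimal sketch using only the standard library (timezone.utc rather than pytz, as an assumption to keep it dependency-free).

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2021, 11, 11)                       # no timezone attached
aware = datetime(2021, 11, 11, tzinfo=timezone.utc)  # pinned to UTC

print(naive.tzinfo is None)                 # True
print(aware.utcoffset() == timedelta(0))    # True

# Arithmetic between naive and aware datetimes is rejected
mixing_failed = False
try:
    aware - naive
except TypeError as exc:
    mixing_failed = True
    print("Cannot mix naive and aware:", exc)
```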
Let's do a small exercise and create two datetime with the same values but pinned to different timezones. Then calculate the timedelta of the two.
# localize attaches the offset that is valid on that date (EST and NZDT here)
time_1 = pytz.timezone('America/New_York').localize(datetime(2021, 11, 11))
time_2 = pytz.timezone('Pacific/Auckland').localize(datetime(2021, 11, 11))
time_1 - time_2  # >>> datetime.timedelta(seconds=64800) | Equivalent to 18 hours
This was all for this post. If you keep these few snippets in mind, datetime will never haunt you. We will continue in a follow-up post that adds the Pandas library. That post will focus not just on the core of the pandas datetime objects but also on time series data.
You may try,
- The dateutil module - a third-party package that offers modules to manipulate dates.
- The calendar module - supplies calendar-related functions.
- The arrow library - offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.
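As a small taste of the calendar module from the list above, here is a short sketch of three of its functions. It is only a starting point, not a tour of the module.

```python
import calendar

# Leap year check
print(calendar.isleap(2020))  # True

# Weekday of the first day of the month (0 = Monday) and number of days in it
first_weekday, days_in_month = calendar.monthrange(2021, 2)
print(first_weekday, days_in_month)  # 0 28

# Weekday index of a given date (0 = Monday ... 6 = Sunday)
weekday_index = calendar.weekday(2021, 11, 11)
print(weekday_index)  # 3, i.e. Thursday
```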