Python has incredibly scalable options for exploring data. With Pandas or Dask, you can scale Jupyter up to big data. But what about small data? Personal data? Private data?
JupyterLab and Jupyter Notebook provide a great environment to scrutinize my laptop-based life.
My exploration is powered by the fact that almost every service I use has a web application programming interface (API). I use many such services: a to-do list, a time tracker, a habit tracker, and more. But there is one that almost everyone uses: a calendar. The same ideas can be applied to other services, but calendars have one cool feature: most web calendars support the same open standard, CalDAV.
Parsing your calendar with Python in Jupyter
Most calendars provide a way to export into the CalDAV format. You may need some authentication for accessing this private data. Following your service's instructions should do the trick. How you get the credentials depends on your service, but eventually, you should be able to store them in a file. I store mine in my home directory in a file called .caldav:
import os
with open(os.path.expanduser("~/.caldav")) as fpin:
    username, password = fpin.read().split()
Never put usernames and passwords directly in notebooks! They could easily leak when a notebook is shared or committed.
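As a minimal sketch of the credentials step, the helper below reads a username and password from a whitespace-separated file. The name `read_credentials` and the sample values are hypothetical, and a temporary file stands in for a real `~/.caldav`.

```python
import os
import tempfile


def read_credentials(path):
    """Return (username, password) from a whitespace-separated file."""
    with open(os.path.expanduser(path)) as fpin:
        username, password = fpin.read().split()
    return username, password


# Demo with a throwaway file standing in for ~/.caldav (made-up values).
with tempfile.NamedTemporaryFile("w", suffix=".caldav", delete=False) as tmp:
    tmp.write("someone@example.com\nnot-a-real-password\n")
demo_credentials = read_credentials(tmp.name)
os.remove(tmp.name)
```

Keeping the secret in a file outside the notebook means the notebook itself can be shared without editing.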
The next step is to use the convenient PyPI caldav library. I looked up the CalDAV server for my email service (yours may be different):
import caldav
client = caldav.DAVClient(url="https://caldav.fastmail.com/dav/", username=username, password=password)
CalDAV has a concept called the principal. It is not important to get into right now, except to know it is the thing you use to access the calendars:
principal = client.principal()
calendars = principal.calendars()
Calendars are, really, all about time. Before accessing events, you need to decide on a time range. One week should be a good default:
import datetime
from dateutil import tz
now = datetime.datetime.now(tz.tzutc())
since = now - datetime.timedelta(days=7)
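The same range calculation can be wrapped in a small function so the window is easy to change. This is a sketch only: `time_range` is a hypothetical name, and it uses the standard library's `datetime.timezone.utc` in place of `dateutil` so that it runs without extra packages.

```python
import datetime


def time_range(days=7, now=None):
    """Return (since, now) as timezone-aware UTC datetimes."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - datetime.timedelta(days=days), now


# A fixed "now" makes the result reproducible for the demo.
fixed_now = datetime.datetime(2020, 8, 25, 22, 0, tzinfo=datetime.timezone.utc)
demo_since, demo_now = time_range(days=7, now=fixed_now)
```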
Most people use more than one calendar, and most people want all their events together. The itertools.chain.from_iterable function makes this easy:
import itertools
raw_events = list(itertools.chain.from_iterable(
    calendar.date_search(start=since, end=now, expand=True)
    for calendar in calendars
))
Reading all the events into memory is important, and doing so in the API's raw, native format is an important practice. This means that when fine-tuning the parsing, analyzing, and displaying code, there is no need to go back to the API service to refresh the data.
But "raw" is not an understatement. The events come through as strings in a specific text format.
Luckily, PyPI comes to the rescue again with another helper library, vobject:
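To see the flattening on its own, here is a small sketch with no network access. `FAKE_SERVER` and `fetch_events` are hypothetical stand-ins for the calendars and the per-calendar search call; the strings are placeholders for raw event data.

```python
import itertools

# Hypothetical stand-in for a server: calendar name -> raw event strings.
FAKE_SERVER = {
    "work": ["work-event-1", "work-event-2"],
    "home": ["home-event-1"],
    "birthdays": [],
}


def fetch_events(calendar_name):
    """Pretend API call returning one calendar's raw events."""
    return FAKE_SERVER[calendar_name]


# One flat list, fetched once and kept in memory for later parsing.
demo_raw_events = list(
    itertools.chain.from_iterable(fetch_events(name) for name in FAKE_SERVER)
)
```

Because the result is a plain list, the parsing code can be rerun as often as needed without calling the service again.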
import io
import vobject
def parse_event(raw_event):
    data = raw_event.data
    parsed = vobject.readOne(io.StringIO(data))
    return parsed.vevent.contents
'dtend': [<DTEND2020-08-25 23:00:00+00:00>],
'dtstamp': [<DTSTAMP2020-08-25 18:19:15+00:00>],
'dtstart': [<DTSTART2020-08-25 22:00:00+00:00>],
Well, at least it is a little better.
There is still some work to do to convert it to a reasonable Python object. The first step is to have a reasonable Python object. The attrs library provides a nice start:
from __future__ import annotations
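The class definition itself is not shown here. As a hedged stand-in, the sketch below defines a class with the four fields the conversion code passes in: start, end, summary, and timezone. It deliberately swaps in the standard library's `dataclasses` for attrs so it runs without extra packages; the field types are assumptions.

```python
import dataclasses
import datetime
from typing import Any


@dataclasses.dataclass(frozen=True)
class Event:
    # Field names follow the keyword arguments used by the conversion code;
    # the type annotations are assumptions.
    start: datetime.datetime
    end: datetime.datetime
    timezone: Any
    summary: str


utc = datetime.timezone.utc
example_event = Event(
    start=datetime.datetime(2020, 8, 25, 22, 0, tzinfo=utc),
    end=datetime.datetime(2020, 8, 25, 23, 0, tzinfo=utc),
    timezone=utc,
    summary="Busy",
)
```

Freezing the instances does not prevent attaching properties to the class afterward, since that assigns to the class rather than to an instance.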
Time to write the conversion code!
The first abstraction gets the value from the parsed dictionary without all the decorations:
def get_piece(contents, name):
    return contents[name][0].value
# Result for the "dtstart" piece of the event above:
datetime.datetime(2020, 8, 25, 22, 0, tzinfo=tzutc())
Calendar events always have a start, but they sometimes have an "end" and sometimes a "duration." Some careful parsing logic can harmonize both into the same Python objects:
def from_calendar_event_and_timezone(event, timezone):
    contents = parse_event(event)
    start = get_piece(contents, "dtstart")
    summary = get_piece(contents, "summary")
    try:
        end = get_piece(contents, "dtend")
    except KeyError:
        # No explicit end: derive it from the duration instead.
        end = start + get_piece(contents, "duration")
    return Event(start=start, end=end, summary=summary, timezone=timezone)
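The end-or-duration fallback can be tried without a calendar server. The sketch below assumes the parsed contents map each name to a list of objects with a `.value` attribute; `fake_contents` and `start_and_end` are hypothetical helpers built for this demo.

```python
import datetime
from types import SimpleNamespace


def get_piece(contents, name):
    # Assumed shape: contents[name] is a list of objects with a .value.
    return contents[name][0].value


def start_and_end(contents):
    """Return (start, end), deriving end from duration when dtend is absent."""
    start = get_piece(contents, "dtstart")
    try:
        end = get_piece(contents, "dtend")
    except KeyError:
        end = start + get_piece(contents, "duration")
    return start, end


def fake_contents(**pieces):
    """Build a dictionary shaped like the assumed parsed contents."""
    return {name: [SimpleNamespace(value=value)] for name, value in pieces.items()}


demo_start = datetime.datetime(2020, 8, 25, 22, 0, tzinfo=datetime.timezone.utc)
one_hour = datetime.timedelta(hours=1)
with_end = fake_contents(dtstart=demo_start, dtend=demo_start + one_hour)
with_duration = fake_contents(dtstart=demo_start, duration=one_hour)
```

Both inputs describe the same one-hour event, so both should produce the same start and end.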
Since it is useful to have the events in your local time zone rather than UTC, this uses the local timezone:
my_timezone = tz.gettz()
Event(start=datetime.datetime(2020, 8, 25, 22, 0, tzinfo=tzutc()), end=datetime.datetime(2020, 8, 25, 23, 0, tzinfo=tzutc()), timezone=tzfile('/etc/localtime'), summary='Busy')
Now that the events are real Python objects, they really should have some additional information. Luckily, it is possible to add methods retroactively to classes.
But figuring out which day an event happens is not that obvious. You need the day in the local timezone:
def day(self):
    offset = self.timezone.utcoffset(self.start)
    fixed = self.start + offset
    return fixed.date()
Event.day = property(day)
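A short sketch shows why the offset matters: an event late in the evening local time already falls on the next day in UTC. The function name `local_day` and the fixed UTC-7 offset are assumptions for the demo. A real local timezone also changes with daylight saving time.

```python
import datetime


def local_day(start, timezone):
    """Shift a UTC start time by the timezone's offset and return the date."""
    offset = timezone.utcoffset(start)
    fixed = start + offset
    return fixed.date()


utc = datetime.timezone.utc
# Fixed UTC-7 offset, chosen only for illustration.
minus_seven = datetime.timezone(datetime.timedelta(hours=-7))

# 02:00 UTC on the 26th is 19:00 on the 25th at UTC-7.
late_start = datetime.datetime(2020, 8, 26, 2, 0, tzinfo=utc)
utc_date = late_start.date()
local_date = local_day(late_start, minus_seven)
```

The result matches `late_start.astimezone(minus_seven).date()`, which is another way to express the same conversion.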
Events are always represented internally as start/end, but knowing the duration is a useful property. Duration can also be added to the existing class:
def duration(self):
    return self.end - self.start
Event.duration = property(duration)
Now it is time to convert all events into useful Python objects:
all_events = [from_calendar_event_and_timezone(raw_event, my_timezone)
for raw_event in raw_events]
All-day events are a special case and probably less useful for analyzing life. For now, you can ignore them:
# ignore all-day events (their start is a plain date, not a datetime)
all_events = [event for event in all_events if type(event.start) is not datetime.date]
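The filter compares the exact type instead of using `isinstance`, and there is a reason for that: `datetime.datetime` is a subclass of `datetime.date`, so an `isinstance` check would match timed events too. A small sketch with made-up start values:

```python
import datetime

timed_start = datetime.datetime(2020, 8, 25, 22, 0)
all_day_start = datetime.date(2020, 8, 25)
starts = [timed_start, all_day_start]

# isinstance treats datetimes as dates, so this drops everything.
kept_with_isinstance = [s for s in starts if not isinstance(s, datetime.date)]

# An exact type comparison drops only the plain dates.
kept_with_exact_type = [s for s in starts if type(s) is not datetime.date]
```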
Events have a natural order, and knowing which one happened first is probably useful for analysis:
all_events.sort(key=lambda ev: ev.start)
Now that the events are sorted, they can be broken into days:
import collections
events_by_day = collections.defaultdict(list)
for event in all_events:
    events_by_day[event.day].append(event)
And with that, you have calendar events with dates, duration, and sequence as Python objects.
Reporting on your life in Python
Now it is time to write reporting code! It is fun to have eye-popping formatting with proper headers, lists, important things in bold, etc.
This means HTML and some HTML templating. I like to use Chameleon:
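The sort-then-group steps can be checked on a few made-up entries. `Appointment` is a hypothetical stand-in for the event class, and the grouping key here is simply the date of the start time.

```python
import collections
import datetime

# Hypothetical stand-in for the event objects.
Appointment = collections.namedtuple("Appointment", ["start", "summary"])

appointments = [
    Appointment(datetime.datetime(2020, 8, 26, 9, 0), "standup"),
    Appointment(datetime.datetime(2020, 8, 25, 15, 0), "review"),
    Appointment(datetime.datetime(2020, 8, 25, 8, 30), "planning"),
]

# Sort first, so each day's list ends up in chronological order.
appointments.sort(key=lambda ev: ev.start)

grouped = collections.defaultdict(list)
for appointment in appointments:
    grouped[appointment.start.date()].append(appointment)
```

Because dictionaries keep insertion order, sorting before grouping also leaves the days themselves in chronological order.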
template_content = """
<div tal:repeat="item items">
<h2 tal:content="item[0]">Day</h2>
<ul>
    <li tal:repeat="event item[1]"><span tal:replace="event">Thing</span></li>
</ul>
</div>
"""
One cool feature of Chameleon is that it will render objects using their __html__ method. I will use it in two ways:
- The summary will be in bold
- For most events, I will remove the summary (since this is my personal information)
def __html__(self):
    offset = my_timezone.utcoffset(self.start)
    fixed = self.start + offset
    start_str = str(fixed).split("+")[0]
    summary = self.summary
    if summary != "Busy":
        # Escaped so the browser shows the literal text instead of a tag.
        summary = "&lt;REDACTED&gt;"
    return f"<b>{summary[:30]}</b> -- {start_str} ({self.duration})"
Event.__html__ = __html__
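Since the returned string is inserted as HTML, anything that looks like a tag has to be escaped to show up as text. Here is a hedged sketch of the same formatting as a standalone function; `render_event` and its `visible` parameter are hypothetical, and `html.escape` from the standard library does the escaping.

```python
import datetime
import html


def render_event(summary, start, duration, visible=("Busy",)):
    """Format one report line, hiding summaries that are not in `visible`."""
    if summary not in visible:
        summary = "<REDACTED>"
    # Escape after truncating so an entity is never cut in half.
    safe_summary = html.escape(summary[:30])
    return f"<b>{safe_summary}</b> -- {start} ({duration})"


demo_start = datetime.datetime(2020, 8, 25, 8, 30)
redacted_line = render_event("Dentist", demo_start, datetime.timedelta(minutes=45))
busy_line = render_event("Busy", demo_start, datetime.timedelta(hours=1))
```

Escaping at render time also protects against summaries that happen to contain `<` or `&`.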
In the interest of brevity, the report will be sliced into one day's worth.
import chameleon
from IPython.display import HTML
template = chameleon.PageTemplate(template_content)
html = template(items=itertools.islice(events_by_day.items(), 3, 4))
HTML(html)
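The slice picks exactly one day: `itertools.islice(iterable, 3, 4)` skips the first three items and yields only the fourth. A sketch with a made-up dictionary in place of the grouped events:

```python
import itertools

# Made-up stand-in for the events grouped by day.
demo_by_day = {
    "2020-08-22": ["a"],
    "2020-08-23": ["b", "c"],
    "2020-08-24": [],
    "2020-08-25": ["d"],
    "2020-08-26": ["e"],
}

# Skips indexes 0-2 and stops before index 4, leaving one (day, events) pair.
one_day = list(itertools.islice(demo_by_day.items(), 3, 4))
```

Each item is a (day, events) pair, which is the shape a template looping over the items has to unpack.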
When rendered, it will look something like this:
- <REDACTED> -- 2020-08-25 08:30:00 (0:45:00)
- <REDACTED> -- 2020-08-25 10:00:00 (1:00:00)
- <REDACTED> -- 2020-08-25 11:30:00 (0:30:00)
- <REDACTED> -- 2020-08-25 13:00:00 (0:25:00)
- Busy -- 2020-08-25 15:00:00 (1:00:00)
- <REDACTED> -- 2020-08-25 15:00:00 (1:00:00)
- <REDACTED> -- 2020-08-25 19:00:00 (1:00:00)
- <REDACTED> -- 2020-08-25 19:00:12 (1:00:00)
Endless options with Python and Jupyter
This only scratches the surface of what you can do by parsing, analyzing, and reporting on the data that various web services have on you.
Why not try it with your favorite service?