Institutional investment managers that oversee more than $100M in certain securities are required to file a form known as the 13F every quarter, which discloses the positions they hold.

Some people think that this data is interesting because it gives you a clue as to what some of the smartest investors are doing. This article contains no investment advice. Instead, let’s just assume for the sake of argument that the data is interesting and potentially useful.

How do we get it?

The good news is that the SEC has a system called Edgar where all of these reports are filed. The bad news is that there’s no straightforward way to just go and get the data in an automated way.

So – we are going to build our own.

The ideas here are language-agnostic, but for the example I’ll be using Elixir. The Elixir site has a great walkthrough on getting things installed and how the basics of the language work, so I’ll leave that reading to you.

Getting Set Up

I typically like to start something like this by jumping into the terminal and running some commands to get my head around the problem. But first, we need to install some packages. We’re going to be using a library called crawly to help us out. Let’s create a new project and install it:

$ mix new form_13f_reader --sup
$ cd form_13f_reader

Open up the project in your text editor of choice and edit the deps section of the mix.exs file to include the following:

defp deps do
  [
    { :crawly, "~> 0.1" }
  ]
end

Once you add that, back in the terminal run:

$ mix deps.get

Now we should be good to go.

Reading URLs

Now that we have everything installed, we can start running some commands to see what happens. To launch an Elixir console with our dependencies (and eventually our code) preloaded, run:

$ iex -S mix

This kicks off an interactive console with Crawly preloaded. Let’s pull in some data.

We’re going to start by getting the raw data from a single filing in this part of the series. In later parts, we’ll step back and look at how to automatically pull data from multiple filings across multiple investment companies. For now though, let’s just focus on getting a single filing from Berkshire Hathaway.

iex> url = "https://www.sec.gov/Archives/edgar/data/1067983/000095012319005436/xslForm13F_X01/form13fInfoTable.xml"
iex> { _code, response } = Crawly.fetch(url)
=> {:ok, %HTTPoison.Response{ ... }}

We are making use of Elixir’s pattern matching feature to destructure the tuple that Crawly.fetch returns and bind the response struct to a variable.
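If the curly-brace syntax on the left of the = looks unusual, here’s a quick, unrelated illustration of how tuple pattern matching works (the values are made up just to show the shape):

iex> {status, body} = {:ok, "some html"}
=> {:ok, "some html"}
iex> status
=> :ok
iex> body
=> "some html"

Back to our response: we can get the raw HTML from the page by calling response.body.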

iex> response.body
=>"<!DOCTYPE html PUBLIC ... />

We can use a library called Floki, which should already be installed alongside Crawly’s other dependencies, to extract specific HTML elements.

iex> Floki.find(response.body, "tr")
=> [{"tr", ... }, {"tr", ... }, ...]

or alternatively:

iex> table_rows = response.body |> Floki.find("tr")
=> [{"tr", ... }, {"tr", ... }, ...]

This version leans on Elixir’s pipe operator and reads a bit more like idiomatic Elixir.
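If the |> operator is new to you, it simply takes the value on its left and passes it in as the first argument of the function on its right, which lets you write nested calls as a left-to-right pipeline. A quick, unrelated example:

iex> String.slice(String.upcase("hello"), 0, 3)
=> "HEL"
iex> "hello" |> String.upcase() |> String.slice(0, 3)
=> "HEL"

Both lines return the same thing; the piped version just reads in the order the data flows.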

If you inspect the result carefully, you’ll see that the rows we actually care about start at index 11 and run to the end. We can grab just those rows like this:

iex> raw_positions = Enum.drop(table_rows, 11)
=> [{"tr", ... }, {"tr", ... }, ...]

Inspecting this data, you should see only the rows from the actual filings table. Looking closer at each tr, the children list (the last element of each td tuple) holds the text we want. What we’d like to do is push that into some kind of map for each row.
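To make that concrete, each row Floki hands back is a {tag, attributes, children} tuple, with the td tuples nested inside. A single row looks roughly like this (extra columns and attributes omitted; the issuer, value, and share numbers are purely illustrative, not data from the real filing):

{"tr", [],
 [
   {"td", [], ["APPLE INC"]},
   {"td", [], ["COM"]},
   {"td", [], ["037833100"]},
   {"td", [], ["1,000"]},
   {"td", [], ["10,000"]}
 ]}

The text we want always sits inside that innermost children list, which is what we’ll exploit in a moment.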

Putting It All Together

Back in the text editor, let’s create a file called scraper.ex inside of lib/form_13f_reader. Inside that file, we’ll create a function called run that encapsulates what we’ve done so far.

# lib/form_13f_reader/scraper.ex
defmodule Scraper do
  def run do
    url = "https://www.sec.gov/Archives/edgar/data/1067983/000095012319008356/xslForm13F_X01/form13fInfoTable.xml"
    {_code, response} = Crawly.fetch(url)
    table_rows = response.body |> Floki.find("tr")
    raw_positions = Enum.drop(table_rows, 11)
    # TODO: Pull actual data out of raw positions into a map for each row
  end
end

Now we need to actually get the data out of this raw format into something more workable. We can start by roughing in a function kind of like this (insert at the TODO comment):

Enum.map(raw_positions, fn raw_position -> get_data(raw_position) end)
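Equivalently, you could use Elixir’s capture syntax, which is a little more compact:

Enum.map(raw_positions, &get_data/1)

Either form works; use whichever reads better to you.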

(If you need to learn more about Enum.map or maps in general, the official Elixir docs cover both.)

Obviously we will need to define the get_data function, so let’s do that now:

def get_data({_tag, _attrs, row}) do
  %{
    issuer: get_text_from(row, 0),
    class: get_text_from(row, 1),
    cusip: get_text_from(row, 2),
    # the filing reports value in thousands of dollars, so scale up to dollars
    value: get_integer_from(row, 3) * 1000,
    shares: get_integer_from(row, 4)
  }
end

First of all, we are using pattern matching in the function head to tell Elixir that we only care about the third item of the tuple (the tag name and attribute list get ignored). We extract it and bind it to a variable called row.

We also stubbed in some helper functions for extracting the data we are interested in. (To be fair, I worked out the column positions earlier.) The arguments 0 through 4 correspond to the issuer, class, CUSIP, value, and shares columns of the table. This will probably make more sense if we go ahead and implement those functions.

def get_text_from(row, element_number) do
  # td tuple at the given index -> its children (third element) -> the text node
  Enum.at(row, element_number) |> elem(2) |> Enum.at(0)
end

def get_integer_from(row, element_number) do
  # strip the thousands separators, then parse what's left into an integer
  get_text_from(row, element_number)
  |> String.replace(",", "")
  |> Integer.parse()
  |> elem(0)
end

These functions are essentially just drilling down through the data structure that comes back from Floki to the final data points we care about. The second function takes the result of the first and converts it to an integer (so we can do math later).
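To see the drilling in action, here’s roughly what the helpers do when handed a single illustrative row (same made-up shape as before, extra columns omitted):

row = [
  {"td", [], ["APPLE INC"]},
  {"td", [], ["COM"]},
  {"td", [], ["037833100"]},
  {"td", [], ["1,000"]},
  {"td", [], ["10,000"]}
]

get_text_from(row, 0)    # => "APPLE INC"
get_text_from(row, 2)    # => "037833100"
get_integer_from(row, 3) # => 1000 ("1,000" with the comma stripped, then parsed)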

Conclusion

Here is the entire file you should have at this point.

defmodule Scraper do
  def run do
    url = "https://www.sec.gov/Archives/edgar/data/1067983/000095012319008356/xslForm13F_X01/form13fInfoTable.xml"
    {_code, response} = Crawly.fetch(url)
    table_rows = response.body |> Floki.find("tr")
    raw_positions = Enum.drop(table_rows, 11)

    Enum.map(raw_positions, fn raw_position -> get_data(raw_position) end)
  end

  def get_data({_tag, _attrs, row}) do
    %{
      issuer: get_text_from(row, 0),
      class: get_text_from(row, 1),
      cusip: get_text_from(row, 2),
      # the filing reports value in thousands of dollars, so scale up to dollars
      value: get_integer_from(row, 3) * 1000,
      shares: get_integer_from(row, 4)
    }
  end

  def get_text_from(row, element_number) do
    # td tuple at the given index -> its children (third element) -> the text node
    Enum.at(row, element_number) |> elem(2) |> Enum.at(0)
  end

  def get_integer_from(row, element_number) do
    # strip the thousands separators, then parse what's left into an integer
    get_text_from(row, element_number)
    |> String.replace(",", "")
    |> Integer.parse()
    |> elem(0)
  end
end

You should now be able to restart your iex session and do the following:

$ iex -S mix
iex> Scraper.run

This should give you a list of maps that contain the actual data we care about.
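Each map has the shape we built in get_data, something like this (the values here are made up; yours will come from the actual filing):

%{
  issuer: "APPLE INC",
  class: "COM",
  cusip: "037833100",
  value: 1_000_000,
  shares: 10_000
}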

In the coming parts of this series, we’ll look at how to pull in data from multiple filings by crawling the list of filings for a given company, how to repeat this process for a list of companies, and probably how to clean and organize the data.

Need Help?

We are a full service software engineering firm. We can help you with everything from design to project management to development.

Got a project you’d like to talk about? Get in touch at info@centralstandard.tech.