sec

Utilities for working with EDGAR filings

Example: Extracting Data from an XOM 10-Q

The first step is to run xbrl_retreiver.py, which collects all of the 10-Q and 10-K filings for a particular ticker.

For example:
```
 $ xbrl_retreiver.py XOM
```
Next, create a file of fields you would like to extract from the XBRL filings.

For example:
```
 $ echo Assets > fields
 $ echo Liabilities >> fields
```

Then run xbrl_tuple_generator.py.

For example:

 $ xbrl_tuple_generator.py 2013-11-05T17:08:04+00:00_10-Q_xom-20130930 fields xom
 Please enter the label for Assets or press 's' to skip: total assets
 The following tag IDs have been found:

 	(0) us-gaap:Assets
 	(1) us-gaap:Assets
 	(2) us-gaap:Assets

 Please choose.
 Valid choices are 0, 1, 2: 0
 You chose 'us-gaap:Assets' as the tag for 'Assets'
 Is this correct? y
 Please enter the label for Liabilities or press 's' to skip: total liabilities
 The following tag IDs have been found:

 	(0) us-gaap:Liabilities
 	(1) us-gaap:Liabilities
 	(2) us-gaap:Liabilities

 Please choose.
 Valid choices are 0, 1, 2: 0
 You chose 'us-gaap:Liabilities' as the tag for 'Liabilities'
 Is this correct? y

This produces a pickled list of tuples stored in xom_fields. Each tuple associates one of the fields you specified with an XML tag in the XBRL data file.
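If you want to check what was saved before moving on, the pickle can be opened from Python. This is a minimal sketch: `load_field_tuples` is a hypothetical helper, and the `(field, tag)` layout of the stand-in data is an assumption, since the exact contents of each tuple are not described here.

```python
import os
import pickle
import tempfile


def load_field_tuples(path):
    """Load the pickled list of tuples from `path`."""
    # Only unpickle files you created yourself: unpickling untrusted
    # data can execute arbitrary code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Stand-in data: the (field, tag) layout is an assumption made for
    # this sketch, not the documented format of xom_fields.
    sample = [("Assets", "us-gaap:Assets"),
              ("Liabilities", "us-gaap:Liabilities")]
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "xom_fields")
        with open(path, "wb") as handle:
            pickle.dump(sample, handle)
        for entry in load_field_tuples(path):
            print(entry)
```

To inspect a real file, call `load_field_tuples("xom_fields")` from the directory where the generator was run.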

From here, run xbrl_tuple_reader.py to extract the fields of interest and print them to STDOUT.

For example:

 $ xbrl_tuple_reader.py 2013-11-05T17:08:04+00:00_10-Q_xom-20130930 xom_fields
 CIK,Reporting Period End Date,Submission Time,Segments,Submission Period Focus,Period Start,Period End,BoP Assets,BoP Liabilities,EoP Assets,EoP Liabilities
 34088,2013-09-30,2013-11-05T17:08:04+00:00,,2013Q3,2013-01-01,2013-09-30,333795000000,162135000000,347564000000,172086000000
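Because the reader prints plain CSV with a header row, its output can be consumed directly with Python's csv module. The sketch below parses the two lines shown above and computes the change in each field over the period. The helper names are hypothetical, and reading "BoP" and "EoP" as beginning and end of period is an assumption based on the column names.

```python
import csv
import io

# The two lines printed by xbrl_tuple_reader.py in the example above.
READER_OUTPUT = """\
CIK,Reporting Period End Date,Submission Time,Segments,Submission Period Focus,Period Start,Period End,BoP Assets,BoP Liabilities,EoP Assets,EoP Liabilities
34088,2013-09-30,2013-11-05T17:08:04+00:00,,2013Q3,2013-01-01,2013-09-30,333795000000,162135000000,347564000000,172086000000
"""


def parse_reader_output(text):
    """Return one dict per data row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def period_change(row, field):
    """Difference between the EoP and BoP columns for `field`."""
    return int(row["EoP " + field]) - int(row["BoP " + field])


rows = parse_reader_output(READER_OUTPUT)
print(period_change(rows[0], "Assets"))       # 13769000000
print(period_change(rows[0], "Liabilities"))  # 9951000000
```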

You can use your favorite shell-scripting language to extract data from multiple filings. For example:

extractor.bash:

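 #!/bin/bash
 # Usage: extractor.bash <pickled_tuple_file>
 PTUPLE=$1
 # Start with an empty list of per-filing CSV files
 : > csvs
 # Each filing has an .xsd file; strip everything after the first dot to get its base name
 for xsd in *.xsd
 do
 	base=${xsd%%.*}
 	xbrl_tuple_reader.py "$base" "$PTUPLE" > "$base.$PTUPLE.csv"
 	echo "$base.$PTUPLE.csv" >> csvs
 done
 # Merge the per-filing CSV files into a single file
 merge_csvs csvs -s "$PTUPLE.csv"
 # Remove the intermediate files
 while read -r f
 do
 	rm "$f"
 done < csvs
 rm csvs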

(merge_csvs can be found in the http://github.com/gazzman/data_cleaning repo.)
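If you would rather stay in Python, the same loop can be sketched with the standard library. This is an illustration under assumptions, not a replacement for the script above: it assumes xbrl_tuple_reader.py is on your PATH, the function names are hypothetical, and `merge_csv_texts` is a simple stand-in that keeps the first header and appends the remaining rows. It does not claim to reproduce what merge_csvs or its -s option does.

```python
import csv
import io
import os
import subprocess
import sys


def filing_bases(directory="."):
    """Base names of the filings, taken from the .xsd files in `directory`."""
    return sorted(name.split(".")[0]
                  for name in os.listdir(directory)
                  if name.endswith(".xsd"))


def merge_csv_texts(texts):
    """Combine CSV texts, keeping the header row of the first one only."""
    merged = []
    for text in texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue
        if not merged:
            merged.append(rows[0])  # header, taken once
        merged.extend(rows[1:])
    return merged


def extract_all(ptuple, directory="."):
    """Run xbrl_tuple_reader.py on every filing and merge the output."""
    texts = []
    for base in filing_bases(directory):
        result = subprocess.run(
            ["xbrl_tuple_reader.py", base, ptuple],
            cwd=directory, capture_output=True, text=True, check=True)
        texts.append(result.stdout)
    return merge_csv_texts(texts)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write <pickled_tuple_file>.csv next to the filings.
    with open(sys.argv[1] + ".csv", "w", newline="") as out:
        csv.writer(out).writerows(extract_all(sys.argv[1]))
```

The rest of this example continues with the bash script.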

Then run:

 $ extractor.bash xom_fields

to generate a file called xom_fields.csv that looks like this:

 $ cat xom_fields.csv
 CIK,Reporting Period End Date,Submission Time,Segments,Submission Period Focus,Period Start,Period End,BoP Assets,BoP Liabilities,EoP Assets,EoP Liabilities
 34088,2010-03-31,2010-05-06T17:53:44+00:00,,2010Q1,2010-01-01,2010-03-31,233323000000,117931000000,242748000000,125082000000
 34088,2010-06-30,2010-08-04T19:04:53+00:00,,2010Q2,2010-01-01,2010-06-30,233323000000,117931000000,291068000000,145701000000
 34088,2010-09-30,2010-11-03T19:42:58+00:00,,2010Q3,2010-01-01,2010-09-30,233323000000,117931000000,299994000000,149394000000
 34088,2010-12-31,2011-02-25T21:07:35+00:00,,2010FY,2010-01-01,2010-12-31,233323000000,117931000000,302510000000,149831000000
 34088,2010-12-31,2011-02-28T22:01:32+00:00,,2010FY,2010-01-01,2010-12-31,233323000000,117931000000,302510000000,149831000000
 34088,2011-03-31,2011-05-05T16:53:46+00:00,,2011Q1,2011-01-01,2011-03-31,302510000000,149831000000,319533000000,162002000000
 34088,2011-06-30,2011-08-04T16:19:05+00:00,,2011Q2,2011-01-01,2011-06-30,302510000000,149831000000,326204000000,164369000000
 34088,2011-09-30,2011-11-03T15:41:58+00:00,,2011Q3,2011-01-01,2011-09-30,302510000000,149831000000,323227000000,161015000000
 34088,2011-12-31,2012-02-24T21:08:32+00:00,,2011FY,2011-01-01,2011-12-31,302510000000,149831000000,331052000000,170308000000
 34088,2012-03-31,2012-05-03T18:56:03+00:00,,2012Q1,2012-01-01,2012-03-31,331052000000,170308000000,345152000000,181035000000
 34088,2012-06-30,2012-08-02T17:10:52+00:00,,2012Q2,2012-01-01,2012-06-30,331052000000,170308000000,329645000000,161660000000
 34088,2012-09-30,2012-11-06T17:14:21+00:00,,2012Q3,2012-01-01,2012-09-30,331052000000,170308000000,335191000000,162836000000
 34088,2012-12-31,2013-02-27T21:05:06+00:00,,2012FY,2012-01-01,2012-12-31,331052000000,170308000000,333795000000,162135000000
 34088,2013-03-31,2013-05-02T15:50:47+00:00,,2013Q1,2013-01-01,2013-03-31,333795000000,162135000000,339639000000,166562000000
 34088,2013-06-30,2013-08-06T15:54:46+00:00,,2013Q2,2013-01-01,2013-06-30,333795000000,162135000000,341615000000,170027000000
 34088,2013-09-30,2013-11-05T17:08:04+00:00,,2013Q3,2013-01-01,2013-09-30,333795000000,162135000000,347564000000,172086000000

The result is a nicely formatted CSV file, ready for importing into your favorite analysis application.
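One thing to watch for when importing: the sample output contains two 2010FY rows with different submission times. The sketch below shows one possible way to handle that, keeping only the most recent submission for each period and then computing assets minus liabilities. The function names are hypothetical, and treating the later submission as the one to keep is an assumption you may want to revisit for your own analysis.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the xom_fields.csv sample above.
SAMPLE = """\
CIK,Reporting Period End Date,Submission Time,Segments,Submission Period Focus,Period Start,Period End,BoP Assets,BoP Liabilities,EoP Assets,EoP Liabilities
34088,2010-09-30,2010-11-03T19:42:58+00:00,,2010Q3,2010-01-01,2010-09-30,233323000000,117931000000,299994000000,149394000000
34088,2010-12-31,2011-02-25T21:07:35+00:00,,2010FY,2010-01-01,2010-12-31,233323000000,117931000000,302510000000,149831000000
34088,2010-12-31,2011-02-28T22:01:32+00:00,,2010FY,2010-01-01,2010-12-31,233323000000,117931000000,302510000000,149831000000
"""


def latest_per_period(rows):
    """Keep the most recently submitted row for each period focus."""
    latest = {}
    for row in rows:
        key = row["Submission Period Focus"]
        submitted = datetime.fromisoformat(row["Submission Time"])
        if key not in latest or submitted > latest[key][0]:
            latest[key] = (submitted, row)
    # dicts preserve insertion order, so periods stay in first-seen order
    return [row for _, row in latest.values()]


def net_assets(row):
    """End-of-period assets minus end-of-period liabilities."""
    return int(row["EoP Assets"]) - int(row["EoP Liabilities"])


rows = latest_per_period(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["Submission Period Focus"], net_assets(row))
# 2010Q3 150600000000
# 2010FY 152679000000
```

To run this on the full file, replace `io.StringIO(SAMPLE)` with an open handle on xom_fields.csv.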
