Basics
Input
- from a file, as .csv or .html
- from a URL (if there is more than one table at the URL, use the table_number argument to choose which one to read)
- from a Python list object
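For the list input, the table is given as a nested list of cell strings, one inner list per row. The sketch below writes out the example table used in this tutorial in that form, copying the values from the printed raw table further down. Representing the blank top-left header cells as empty strings is an assumption. The Table call only runs when the tabledataextractor package can be found, so the snippet also works without it.

```python
import importlib.util

# The example table as a nested list of cell strings, one inner list per row.
# Values are copied from the printed raw table; the blank top-left header
# cells are assumed to be empty strings.
raw = [
    ['', '', 'Rutile', 'Rutile', 'Rutile', 'Anatase', 'Anatase', 'Anatase'],
    ['', '', 'a = b (Å)', 'c (Å)', 'u', 'a = b (Å)', 'c (Å)', 'u'],
    ['Computational', 'This study', '4.64', '2.99', '0.305', '3.83', '9.62', '0.208'],
    ['Computational', 'GGA [25]', '4.67', '2.97', '0.305', '3.80', '9.67', '0.207'],
    ['Computational', 'GGA [26]', '4.63', '2.98', '0.305', '-', '-', '-'],
    ['Computational', 'HF [27]', '-', '-', '-', '3.76', '9.85', '0.202'],
    ['Experimental', 'Expt. [23]', '4.594', '2.958', '0.305', '3.785', '9.514', '0.207'],
]

# Sanity check: the table is rectangular (every row has the same number of cells).
assert all(len(row) == len(raw[0]) for row in raw)

# Build the Table only if TableDataExtractor is installed.
if importlib.util.find_spec('tabledataextractor') is not None:
    from tabledataextractor import Table
    table_from_list = Table(raw)
```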
[1]:
from tabledataextractor import Table
# Relative path to the example table used throughout this tutorial
table_path = '../examples/tables/table_example.csv'
table = Table(table_path)
First, let's look at the original table, which is now stored as table.raw_table. We can print it with the print_raw_table() method:
[2]:
table.print_raw_table()
Rutile Rutile Rutile Anatase Anatase Anatase
a = b (Å) c (Å) u a = b (Å) c (Å) u
Computational This study 4.64 2.99 0.305 3.83 9.62 0.208
Computational GGA [25] 4.67 2.97 0.305 3.80 9.67 0.207
Computational GGA [26] 4.63 2.98 0.305 - - -
Computational HF [27] - - - 3.76 9.85 0.202
Experimental Expt. [23] 4.594 2.958 0.305 3.785 9.514 0.207
The main result of TableDataExtractor is the category table, in which each row corresponds to a single data point together with its row and column categories. We can simply print the table to see it:
[3]:
print(table)
+-------+---------------------------------+--------------------------+
| Data | Row Categories | Column Categories |
+-------+---------------------------------+--------------------------+
| 4.64 | ['Computational', 'This study'] | ['Rutile', 'a = b (Å)'] |
| 2.99 | ['Computational', 'This study'] | ['Rutile', 'c (Å)'] |
| 0.305 | ['Computational', 'This study'] | ['Rutile', 'u'] |
| 3.83 | ['Computational', 'This study'] | ['Anatase', 'a = b (Å)'] |
| 9.62 | ['Computational', 'This study'] | ['Anatase', 'c (Å)'] |
| 0.208 | ['Computational', 'This study'] | ['Anatase', 'u'] |
| 4.67 | ['Computational', 'GGA [25]'] | ['Rutile', 'a = b (Å)'] |
| 2.97 | ['Computational', 'GGA [25]'] | ['Rutile', 'c (Å)'] |
| 0.305 | ['Computational', 'GGA [25]'] | ['Rutile', 'u'] |
| 3.80 | ['Computational', 'GGA [25]'] | ['Anatase', 'a = b (Å)'] |
| 9.67 | ['Computational', 'GGA [25]'] | ['Anatase', 'c (Å)'] |
| 0.207 | ['Computational', 'GGA [25]'] | ['Anatase', 'u'] |
| 4.63 | ['Computational', 'GGA [26]'] | ['Rutile', 'a = b (Å)'] |
| 2.98 | ['Computational', 'GGA [26]'] | ['Rutile', 'c (Å)'] |
| 0.305 | ['Computational', 'GGA [26]'] | ['Rutile', 'u'] |
| - | ['Computational', 'GGA [26]'] | ['Anatase', 'a = b (Å)'] |
| - | ['Computational', 'GGA [26]'] | ['Anatase', 'c (Å)'] |
| - | ['Computational', 'GGA [26]'] | ['Anatase', 'u'] |
| - | ['Computational', 'HF [27]'] | ['Rutile', 'a = b (Å)'] |
| - | ['Computational', 'HF [27]'] | ['Rutile', 'c (Å)'] |
| - | ['Computational', 'HF [27]'] | ['Rutile', 'u'] |
| 3.76 | ['Computational', 'HF [27]'] | ['Anatase', 'a = b (Å)'] |
| 9.85 | ['Computational', 'HF [27]'] | ['Anatase', 'c (Å)'] |
| 0.202 | ['Computational', 'HF [27]'] | ['Anatase', 'u'] |
| 4.594 | ['Experimental', 'Expt. [23]'] | ['Rutile', 'a = b (Å)'] |
| 2.958 | ['Experimental', 'Expt. [23]'] | ['Rutile', 'c (Å)'] |
| 0.305 | ['Experimental', 'Expt. [23]'] | ['Rutile', 'u'] |
| 3.785 | ['Experimental', 'Expt. [23]'] | ['Anatase', 'a = b (Å)'] |
| 9.514 | ['Experimental', 'Expt. [23]'] | ['Anatase', 'c (Å)'] |
| 0.207 | ['Experimental', 'Expt. [23]'] | ['Anatase', 'u'] |
+-------+---------------------------------+--------------------------+
If we want to further process the category table, we can access it as a list of lists:
[4]:
print(table.category_table)
[['4.64', ['Computational', 'This study'], ['Rutile', 'a = b (Å)']], ['2.99', ['Computational', 'This study'], ['Rutile', 'c (Å)']], ['0.305', ['Computational', 'This study'], ['Rutile', 'u']], ['3.83', ['Computational', 'This study'], ['Anatase', 'a = b (Å)']], ['9.62', ['Computational', 'This study'], ['Anatase', 'c (Å)']], ['0.208', ['Computational', 'This study'], ['Anatase', 'u']], ['4.67', ['Computational', 'GGA [25]'], ['Rutile', 'a = b (Å)']], ['2.97', ['Computational', 'GGA [25]'], ['Rutile', 'c (Å)']], ['0.305', ['Computational', 'GGA [25]'], ['Rutile', 'u']], ['3.80', ['Computational', 'GGA [25]'], ['Anatase', 'a = b (Å)']], ['9.67', ['Computational', 'GGA [25]'], ['Anatase', 'c (Å)']], ['0.207', ['Computational', 'GGA [25]'], ['Anatase', 'u']], ['4.63', ['Computational', 'GGA [26]'], ['Rutile', 'a = b (Å)']], ['2.98', ['Computational', 'GGA [26]'], ['Rutile', 'c (Å)']], ['0.305', ['Computational', 'GGA [26]'], ['Rutile', 'u']], ['-', ['Computational', 'GGA [26]'], ['Anatase', 'a = b (Å)']], ['-', ['Computational', 'GGA [26]'], ['Anatase', 'c (Å)']], ['-', ['Computational', 'GGA [26]'], ['Anatase', 'u']], ['-', ['Computational', 'HF [27]'], ['Rutile', 'a = b (Å)']], ['-', ['Computational', 'HF [27]'], ['Rutile', 'c (Å)']], ['-', ['Computational', 'HF [27]'], ['Rutile', 'u']], ['3.76', ['Computational', 'HF [27]'], ['Anatase', 'a = b (Å)']], ['9.85', ['Computational', 'HF [27]'], ['Anatase', 'c (Å)']], ['0.202', ['Computational', 'HF [27]'], ['Anatase', 'u']], ['4.594', ['Experimental', 'Expt. [23]'], ['Rutile', 'a = b (Å)']], ['2.958', ['Experimental', 'Expt. [23]'], ['Rutile', 'c (Å)']], ['0.305', ['Experimental', 'Expt. [23]'], ['Rutile', 'u']], ['3.785', ['Experimental', 'Expt. [23]'], ['Anatase', 'a = b (Å)']], ['9.514', ['Experimental', 'Expt. [23]'], ['Anatase', 'c (Å)']], ['0.207', ['Experimental', 'Expt. [23]'], ['Anatase', 'u']]]
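Because each entry has the form [data, row_categories, column_categories], the category table can be processed with plain Python. The following sketch uses a few entries copied from the output above and a hypothetical helper, lattice_a(), that collects the numeric 'a = b (Å)' values for one phase and skips the '-' placeholders:

```python
# A few entries copied from table.category_table above; each entry is
# [data, row_categories, column_categories].
category_table = [
    ['4.64', ['Computational', 'This study'], ['Rutile', 'a = b (Å)']],
    ['3.83', ['Computational', 'This study'], ['Anatase', 'a = b (Å)']],
    ['-', ['Computational', 'GGA [26]'], ['Anatase', 'a = b (Å)']],
    ['3.76', ['Computational', 'HF [27]'], ['Anatase', 'a = b (Å)']],
    ['3.785', ['Experimental', 'Expt. [23]'], ['Anatase', 'a = b (Å)']],
]


def lattice_a(entries, phase):
    """Hypothetical helper: map source -> float for the 'a = b (Å)' column of one phase.

    Entries whose data cell is the '-' placeholder are skipped.
    """
    result = {}
    for data, row_cats, col_cats in entries:
        if col_cats == [phase, 'a = b (Å)'] and data != '-':
            # The last row category identifies the source (e.g. 'HF [27]').
            result[row_cats[-1]] = float(data)
    return result


print(lattice_a(category_table, 'Anatase'))
# {'This study': 3.83, 'HF [27]': 3.76, 'Expt. [23]': 3.785}
```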
We may wish to access other elements of the table, such as the title row, the row or column headers, and the data:
[5]:
print("Title row: \n", table.title_row)
print("Row header: \n", table.row_header)
print("Column header: \n", table.col_header)
print("Data: \n", table.data)
Title row:
0
Row header:
[['Computational' 'This study']
['Computational' 'GGA [25]']
['Computational' 'GGA [26]']
['Computational' 'HF [27]']
['Experimental' 'Expt. [23]']]
Column header:
[['Rutile' 'Rutile' 'Rutile' 'Anatase' 'Anatase' 'Anatase']
['a = b (Å)' 'c (Å)' 'u' 'a = b (Å)' 'c (Å)' 'u']]
Data:
[['4.64' '2.99' '0.305' '3.83' '9.62' '0.208']
['4.67' '2.97' '0.305' '3.80' '9.67' '0.207']
['4.63' '2.98' '0.305' '-' '-' '-']
['-' '-' '-' '3.76' '9.85' '0.202']
['4.594' '2.958' '0.305' '3.785' '9.514' '0.207']]
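The category table printed earlier can be read in terms of these three arrays: the data cell in row i and column j is paired with row i of the row header and column j of the column header. The sketch below reproduces that mapping with plain lists copied from the output above. It only illustrates the result; it is not TableDataExtractor's own implementation, which first has to locate the headers in the raw table, and build_category_table() is a hypothetical name.

```python
# Header and data arrays copied from the output above (as plain lists).
row_header = [
    ['Computational', 'This study'],
    ['Computational', 'GGA [25]'],
    ['Computational', 'GGA [26]'],
    ['Computational', 'HF [27]'],
    ['Experimental', 'Expt. [23]'],
]
col_header = [
    ['Rutile', 'Rutile', 'Rutile', 'Anatase', 'Anatase', 'Anatase'],
    ['a = b (Å)', 'c (Å)', 'u', 'a = b (Å)', 'c (Å)', 'u'],
]
data = [
    ['4.64', '2.99', '0.305', '3.83', '9.62', '0.208'],
    ['4.67', '2.97', '0.305', '3.80', '9.67', '0.207'],
    ['4.63', '2.98', '0.305', '-', '-', '-'],
    ['-', '-', '-', '3.76', '9.85', '0.202'],
    ['4.594', '2.958', '0.305', '3.785', '9.514', '0.207'],
]


def build_category_table(row_header, col_header, data):
    """Pair every data cell (i, j) with row-header row i and column-header column j."""
    table = []
    for i, data_row in enumerate(data):
        for j, value in enumerate(data_row):
            # Read column j from each level of the column header.
            col_categories = [level[j] for level in col_header]
            table.append([value, list(row_header[i]), col_categories])
    return table


category_table = build_category_table(row_header, col_header, data)
print(len(category_table))  # 30
print(category_table[0])    # ['4.64', ['Computational', 'This study'], ['Rutile', 'a = b (Å)']]
```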
If needed, we can transpose the whole table. The resulting category table contains the same data points, with the row and column categories interchanged:
[6]:
table.transpose()
print(table)
+-------+--------------------------+---------------------------------+
| Data | Row Categories | Column Categories |
+-------+--------------------------+---------------------------------+
| 4.64 | ['Rutile', 'a = b (Å)'] | ['Computational', 'This study'] |
| 4.67 | ['Rutile', 'a = b (Å)'] | ['Computational', 'GGA [25]'] |
| 4.63 | ['Rutile', 'a = b (Å)'] | ['Computational', 'GGA [26]'] |
| - | ['Rutile', 'a = b (Å)'] | ['Computational', 'HF [27]'] |
| 4.594 | ['Rutile', 'a = b (Å)'] | ['Experimental', 'Expt. [23]'] |
| 2.99 | ['Rutile', 'c (Å)'] | ['Computational', 'This study'] |
| 2.97 | ['Rutile', 'c (Å)'] | ['Computational', 'GGA [25]'] |
| 2.98 | ['Rutile', 'c (Å)'] | ['Computational', 'GGA [26]'] |
| - | ['Rutile', 'c (Å)'] | ['Computational', 'HF [27]'] |
| 2.958 | ['Rutile', 'c (Å)'] | ['Experimental', 'Expt. [23]'] |
| 0.305 | ['Rutile', 'u'] | ['Computational', 'This study'] |
| 0.305 | ['Rutile', 'u'] | ['Computational', 'GGA [25]'] |
| 0.305 | ['Rutile', 'u'] | ['Computational', 'GGA [26]'] |
| - | ['Rutile', 'u'] | ['Computational', 'HF [27]'] |
| 0.305 | ['Rutile', 'u'] | ['Experimental', 'Expt. [23]'] |
| 3.83 | ['Anatase', 'a = b (Å)'] | ['Computational', 'This study'] |
| 3.80 | ['Anatase', 'a = b (Å)'] | ['Computational', 'GGA [25]'] |
| - | ['Anatase', 'a = b (Å)'] | ['Computational', 'GGA [26]'] |
| 3.76 | ['Anatase', 'a = b (Å)'] | ['Computational', 'HF [27]'] |
| 3.785 | ['Anatase', 'a = b (Å)'] | ['Experimental', 'Expt. [23]'] |
| 9.62 | ['Anatase', 'c (Å)'] | ['Computational', 'This study'] |
| 9.67 | ['Anatase', 'c (Å)'] | ['Computational', 'GGA [25]'] |
| - | ['Anatase', 'c (Å)'] | ['Computational', 'GGA [26]'] |
| 9.85 | ['Anatase', 'c (Å)'] | ['Computational', 'HF [27]'] |
| 9.514 | ['Anatase', 'c (Å)'] | ['Experimental', 'Expt. [23]'] |
| 0.208 | ['Anatase', 'u'] | ['Computational', 'This study'] |
| 0.207 | ['Anatase', 'u'] | ['Computational', 'GGA [25]'] |
| - | ['Anatase', 'u'] | ['Computational', 'GGA [26]'] |
| 0.202 | ['Anatase', 'u'] | ['Computational', 'HF [27]'] |
| 0.207 | ['Anatase', 'u'] | ['Experimental', 'Expt. [23]'] |
+-------+--------------------------+---------------------------------+
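Comparing this output with the original category table, the transposed version walks through the data column by column and swaps the two category lists. A small sketch of that reordering on a 2 × 2 excerpt of the example table follows; transpose_category_table() is a hypothetical name, and this is not how the library implements transpose().

```python
def transpose_category_table(data, row_header, col_header):
    """Category table of the transposed table: iterate column by column
    and swap the roles of the row and column categories."""
    n_rows, n_cols = len(data), len(data[0])
    return [
        [data[i][j], [level[j] for level in col_header], list(row_header[i])]
        for j in range(n_cols)   # outer loop over the original columns
        for i in range(n_rows)   # inner loop over the original rows
    ]


# A 2 x 2 excerpt of the example table (values copied from the outputs above).
data = [['4.64', '2.99'],
        ['4.67', '2.97']]
row_header = [['Computational', 'This study'],
              ['Computational', 'GGA [25]']]
col_header = [['Rutile', 'Rutile'],
              ['a = b (Å)', 'c (Å)']]

for entry in transpose_category_table(data, row_header, col_header):
    print(entry)
```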
Output & Pandas
- as a .csv file
- as a Pandas DataFrame
To store the table as a .csv file:
[7]:
table.to_csv('./saved_table.csv')
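to_csv() stores the table itself. If you would rather store the flattened category table, one option outside TableDataExtractor is Python's standard csv module. In this sketch the helper category_table_to_csv() is hypothetical, the categories of each data point are joined with ' / ' (an arbitrary choice), and an in-memory buffer stands in for a real file:

```python
import csv
import io

# Two entries copied from the category table shown earlier.
category_table = [
    ['4.64', ['Computational', 'This study'], ['Rutile', 'a = b (Å)']],
    ['2.99', ['Computational', 'This study'], ['Rutile', 'c (Å)']],
]


def category_table_to_csv(entries, stream):
    """Hypothetical helper: write one CSV row per data point, joining categories with ' / '."""
    writer = csv.writer(stream)
    writer.writerow(['Data', 'Row Categories', 'Column Categories'])
    for data, row_cats, col_cats in entries:
        writer.writerow([data, ' / '.join(row_cats), ' / '.join(col_cats)])


# For a real file: open('categories.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
category_table_to_csv(category_table, buffer)
print(buffer.getvalue())
```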
The table can also be converted to a Pandas DataFrame object:
[8]:
import pandas
df = table.to_pandas()
df
[8]:
| | | Computational | | | | Experimental |
|---|---|---|---|---|---|---|
| | | This study | GGA [25] | GGA [26] | HF [27] | Expt. [23] |
| Rutile | a = b (Å) | 4.64 | 4.67 | 4.63 | - | 4.594 |
| | c (Å) | 2.99 | 2.97 | 2.98 | - | 2.958 |
| | u | 0.305 | 0.305 | 0.305 | - | 0.305 |
| Anatase | a = b (Å) | 3.83 | 3.80 | - | 3.76 | 3.785 |
| | c (Å) | 9.62 | 9.67 | - | 9.85 | 9.514 |
| | u | 0.208 | 0.207 | - | 0.202 | 0.207 |
We can now use all the powerful features of Pandas to interpret the content of the table. Let's say that we are interested in the experimental values for ‘Anatase’:
[9]:
df.loc['Anatase','Experimental']
[9]:
| | Expt. [23] |
|---|---|
| a = b (Å) | 3.785 |
| c (Å) | 9.514 |
| u | 0.207 |
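The values shown suggest that the cells are still strings (note '3.80' and the '-' placeholders). A minimal sketch of a next step, assuming you want numbers: convert the cells with pd.to_numeric(errors='coerce'), which turns '-' into NaN, and then subtract the experimental value from each computed Anatase value. To keep the snippet self-contained, the Anatase rows of the DataFrame are rebuilt by hand from the values shown above.

```python
import pandas as pd

# The Anatase rows of the DataFrame above, rebuilt by hand (cells are strings).
index = pd.MultiIndex.from_product([['Anatase'], ['a = b (Å)', 'c (Å)', 'u']])
columns = pd.MultiIndex.from_arrays([
    ['Computational'] * 4 + ['Experimental'],
    ['This study', 'GGA [25]', 'GGA [26]', 'HF [27]', 'Expt. [23]'],
])
df = pd.DataFrame(
    [['3.83', '3.80', '-', '3.76', '3.785'],
     ['9.62', '9.67', '-', '9.85', '9.514'],
     ['0.208', '0.207', '-', '0.202', '0.207']],
    index=index, columns=columns,
)

# Convert every column to floats; the '-' placeholders become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Difference between each computed value and the experimental one, row by row.
expt = numeric[('Experimental', 'Expt. [23]')]
deviation = numeric['Computational'].sub(expt, axis=0)
print(deviation.round(3))
```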
The most powerful feature of TableDataExtractor is that it automatically creates a MultiIndex for the Pandas DataFrame, a step that would traditionally be done by hand for every individual table:
[10]:
print(df.index)
print(df.columns)
MultiIndex(levels=[['Anatase', 'Rutile'], ['a = b (Å)', 'c (Å)', 'u']],
labels=[[1, 1, 1, 0, 0, 0], [0, 1, 2, 0, 1, 2]])
MultiIndex(levels=[['Computational', 'Experimental'], ['Expt. [23]', 'GGA [25]', 'GGA [26]', 'HF [27]', 'This study']],
labels=[[0, 0, 0, 0, 1], [4, 1, 2, 3, 0]])
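For comparison, this is roughly what building the same DataFrame by hand looks like, with the labels and values copied from the outputs above. pd.MultiIndex.from_arrays sorts the values of each level, which is why 'Anatase' comes before 'Rutile' in the levels while the integer codes keep the original row order. Newer pandas versions print a MultiIndex differently and call these integer positions codes rather than labels.

```python
import pandas as pd

# Row index: phase (level 0) and lattice parameter (level 1).
index = pd.MultiIndex.from_arrays([
    ['Rutile'] * 3 + ['Anatase'] * 3,
    ['a = b (Å)', 'c (Å)', 'u'] * 2,
])
# Column index: type of study (level 0) and source (level 1).
columns = pd.MultiIndex.from_arrays([
    ['Computational'] * 4 + ['Experimental'],
    ['This study', 'GGA [25]', 'GGA [26]', 'HF [27]', 'Expt. [23]'],
])
# Cell values copied from the DataFrame shown above ('-' means no value).
values = [
    ['4.64', '4.67', '4.63', '-', '4.594'],
    ['2.99', '2.97', '2.98', '-', '2.958'],
    ['0.305', '0.305', '0.305', '-', '0.305'],
    ['3.83', '3.80', '-', '3.76', '3.785'],
    ['9.62', '9.67', '-', '9.85', '9.514'],
    ['0.208', '0.207', '-', '0.202', '0.207'],
]
df_manual = pd.DataFrame(values, index=index, columns=columns)

print(df_manual.index)
print(df_manual.columns)
```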
Or, we might be interested only in the ‘c (Å)’ values from the table. In the query string, ilevel_1 refers to the unnamed index level 1 (the second level of the row MultiIndex), which holds the labels ‘a = b (Å)’, ‘c (Å)’ and ‘u’:
[11]:
df.query('ilevel_1 == "c (Å)"')
[11]:
| | | Computational | | | | Experimental |
|---|---|---|---|---|---|---|
| | | This study | GGA [25] | GGA [26] | HF [27] | Expt. [23] |
| Rutile | c (Å) | 2.99 | 2.97 | 2.98 | - | 2.958 |
| Anatase | c (Å) | 9.62 | 9.67 | - | 9.85 | 9.514 |
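The same rows can also be selected without a query string. The sketch below, again on a hand-built excerpt of the DataFrame (only the experimental column, values copied from above), compares df.query() with a boolean mask on index level 1 and with df.xs(), which additionally drops the selected level from the index:

```python
import pandas as pd

# Unnamed index levels: level 0 = phase, level 1 = lattice parameter.
index = pd.MultiIndex.from_arrays([
    ['Rutile'] * 3 + ['Anatase'] * 3,
    ['a = b (Å)', 'c (Å)', 'u'] * 2,
])
df = pd.DataFrame(
    [['4.594'], ['2.958'], ['0.305'], ['3.785'], ['9.514'], ['0.207']],
    index=index,
    columns=pd.MultiIndex.from_tuples([('Experimental', 'Expt. [23]')]),
)

by_query = df.query('ilevel_1 == "c (Å)"')             # string expression, as in the tutorial
by_mask = df[df.index.get_level_values(1) == 'c (Å)']  # explicit boolean mask on level 1
by_xs = df.xs('c (Å)', level=1)                        # cross-section; level 1 is dropped

print(by_query)
print(by_xs)
```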