Welcome to TableDataExtractor!
Input a table as an unstructured .csv file, a Python list, an .html file, or a URL, and get back a standardized table in which each row corresponds to a single data entry of the original table. TableDataExtractor handles complicated header structures in the row and column headers, including:
- spanning cells
- nested column/row headers
- titles within the table
- note cells
- footnotes
- prefixing of row and column headers if non-unique
- multiple tables within one
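The idea of "one row per data entry" can be sketched in a few lines of plain Python. The helper `flatten_table` below is hypothetical and is not TableDataExtractor's algorithm: it assumes the first `n_header_rows` rows hold the column headers, the first column holds the row headers, and any spanning header cell has already been repeated in every column it covers. The values are a subset of the Springer example table on this page.

```python
from typing import List, Tuple

# One standardized entry: (data value, row categories, column categories)
Entry = Tuple[str, List[str], List[str]]


def flatten_table(raw: List[List[str]], n_header_rows: int) -> List[Entry]:
    """Hypothetical helper: flatten a table with nested column headers.

    Assumes the first `n_header_rows` rows are column headers, the first
    column holds the row headers, and spanning header cells have already
    been repeated in every column they cover.
    """
    header_rows = raw[:n_header_rows]
    entries: List[Entry] = []
    for row in raw[n_header_rows:]:
        row_category = [row[0]]
        for col, value in enumerate(row[1:], start=1):
            # Stack the header cells above this column, top to bottom
            col_categories = [header[col] for header in header_rows]
            entries.append((value, row_category, col_categories))
    return entries


raw_table = [
    ["", "Rutile", "Rutile", "Anatase", "Anatase"],
    ["", "a = b (Å)", "c (Å)", "a = b (Å)", "c (Å)"],
    ["This study", "4.64", "2.99", "3.83", "9.62"],
    ["Expt. [23]", "4.594", "2.958", "3.785", "9.514"],
]

for entry in flatten_table(raw_table, n_header_rows=2):
    print(entry)
```

Each printed tuple pairs one value with its row and column categories; the first one is ('4.64', ['This study'], ['Rutile', 'a = b (Å)']).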
Tip
TableDataExtractor can output to pandas and automatically creates complex MultiIndex DataFrame structures.
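As a rough illustration of what a MultiIndex provides, the sketch below builds a small DataFrame by hand with pandas, using values from the example table. It is not the output of TableDataExtractor's pandas export, whose exact index layout may differ; it only shows how a two-level column header maps onto a `pd.MultiIndex`.

```python
import pandas as pd

# Two-level column header (phase, lattice parameter), from the example table
columns = pd.MultiIndex.from_tuples(
    [
        ("Rutile", "a = b (Å)"),
        ("Rutile", "c (Å)"),
        ("Anatase", "a = b (Å)"),
        ("Anatase", "c (Å)"),
    ]
)

# Row header: the source of each set of values
index = pd.Index(["This study", "Expt. [23]"])

df = pd.DataFrame(
    [
        [4.64, 2.99, 3.83, 9.62],
        [4.594, 2.958, 3.785, 9.514],
    ],
    index=index,
    columns=columns,
)

# Selecting a top-level label returns every sub-column beneath it
print(df["Anatase"])

# A full (phase, parameter) tuple selects a single column
print(df[("Rutile", "c (Å)")])
```

The hand-built structure mirrors the nested column headers, so a whole header group can be selected with one label.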
Example
Importing Table 2 from ‘https://link.springer.com/article/10.1007%2Fs10853-012-6439-6’:
from tabledataextractor import Table
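# The second argument selects which table on the page to read (here, the second one)
table = Table('https://link.springer.com/article/10.1007%2Fs10853-012-6439-6', 2)
# Print the standardized table: one row per data entry
print(table)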
+---------+------------------+-------------------------------------+
| Data    | Row Categories   | Column Categories                   |
+---------+------------------+-------------------------------------+
| 4.64    | [' This study '] | [' Rutile ', ' a = b (Å) ']         |
| 2.99    | [' This study '] | [' Rutile ', ' c (Å) ']             |
| 0.305   | [' This study '] | [' Rutile ', ' u ']                 |
| 3.83    | [' This study '] | [' Anatase ', ' a = b (Å) ']        |
| 9.62    | [' This study '] | [' Anatase ', ' c (Å) ']            |
| 0.208   | [' This study '] | [' Anatase ', ' u ']                |
| 4.67    | [' GGA [25] ']   | [' Rutile ', ' a = b (Å) ']         |
| 2.97    | [' GGA [25] ']   | [' Rutile ', ' c (Å) ']             |
| 0.305   | [' GGA [25] ']   | [' Rutile ', ' u ']                 |
| 3.80    | [' GGA [25] ']   | [' Anatase ', ' a = b (Å) ']        |
| 9.67    | [' GGA [25] ']   | [' Anatase ', ' c (Å) ']            |
| 0.207   | [' GGA [25] ']   | [' Anatase ', ' u ']                |
| 4.63    | [' GGA [26] ']   | [' Rutile ', ' a = b (Å) ']         |
| 2.98    | [' GGA [26] ']   | [' Rutile ', ' c (Å) ']             |
| 0.305   | [' GGA [26] ']   | [' Rutile ', ' u ']                 |
| –       | [' GGA [26] ']   | [' Anatase ', ' a = b (Å) ']        |
| –       | [' GGA [26] ']   | [' Anatase ', ' c (Å) ']            |
| –       | [' GGA [26] ']   | [' Anatase ', ' u ']                |
| –       | [' HF [27] ']    | [' Rutile ', ' a = b (Å) ']         |
| –       | [' HF [27] ']    | [' Rutile ', ' c (Å) ']             |
| –       | [' HF [27] ']    | [' Rutile ', ' u ']                 |
| 3.76    | [' HF [27] ']    | [' Anatase ', ' a = b (Å) ']        |
| 9.85    | [' HF [27] ']    | [' Anatase ', ' c (Å) ']            |
| 0.202   | [' HF [27] ']    | [' Anatase ', ' u ']                |
| 4.594   | [' Expt. [23] '] | [' Rutile ', ' a = b (Å) ']         |
| 2.958   | [' Expt. [23] '] | [' Rutile ', ' c (Å) ']             |
| 0.305   | [' Expt. [23] '] | [' Rutile ', ' u ']                 |
| 3.785   | [' Expt. [23] '] | [' Anatase ', ' a = b (Å) ']        |
| 9.514   | [' Expt. [23] '] | [' Anatase ', ' c (Å) ']            |
| 0.207   | [' Expt. [23] '] | [' Anatase ', ' u ']                |
+---------+------------------+-------------------------------------+
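Because every output row carries its full set of categories, the result is easy to post-process. The sketch below copies a few rows from the printed table into plain Python tuples (it does not call TableDataExtractor) and builds a lookup keyed by the row and column labels. The helper `build_lookup` is hypothetical; it assumes the spaces around each label can be stripped and that the dash `–` marks a missing value in this particular table.

```python
from typing import Dict, List, Optional, Tuple

# (data, row categories, column categories), copied from a few rows of the
# printed output, including the spaces around each extracted label
Entry = Tuple[str, List[str], List[str]]
Key = Tuple[str, ...]

rows: List[Entry] = [
    ("4.64", [" This study "], [" Rutile ", " a = b (Å) "]),
    ("3.83", [" This study "], [" Anatase ", " a = b (Å) "]),
    ("–", [" GGA [26] "], [" Anatase ", " a = b (Å) "]),
    ("4.594", [" Expt. [23] "], [" Rutile ", " a = b (Å) "]),
    ("3.785", [" Expt. [23] "], [" Anatase ", " a = b (Å) "]),
]


def build_lookup(entries: List[Entry]) -> Dict[Key, Optional[float]]:
    """Hypothetical helper: map stripped (row, column...) labels to values.

    Assumes the dash used in this table marks a missing value; such
    entries are stored as None instead of being converted to float.
    """
    lookup: Dict[Key, Optional[float]] = {}
    for data, row_cats, col_cats in entries:
        # Strip the padding spaces so keys are easy to type
        key = tuple(label.strip() for label in row_cats + col_cats)
        value = data.strip()
        lookup[key] = None if value == "–" else float(value)
    return lookup


lookup = build_lookup(rows)
print(lookup[("This study", "Anatase", "a = b (Å)")])  # prints 3.83
print(lookup[("GGA [26]", "Anatase", "a = b (Å)")])    # prints None
```

Keying on the full label tuple keeps the lookup independent of where a value sat in the original table layout.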