thanasisn/BBand_LAP

Broad Band LAP

Developed in the Laboratory of Atmospheric Physics of Thessaloniki, Greece.

Processes the data from the broadband instruments of LAP.

Some plots and reports can be found here.

This is partially used in operational procedures (github.com/thanasisn/CS_id is still in use).

Data status overview

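| Name             |     Rows | Vars |    Values |      Size |   Fill | Bytes/Value |
|------------------|---------:|-----:|----------:|----------:|-------:|------------:|
| BBDB             | 16847272 |   81 | 533786332 |   2.6 GiB | 39.12% |        5.33 |
| BBDB meta        |    11702 |   82 |    505108 |   2.2 MiB | 52.64% |        4.52 |
| TrackerDB        |  8770526 |   23 |  78860710 | 169.4 MiB | 39.09% |        2.25 |
| TrackerDB meta   |     3208 |    9 |     28120 | 368.0 KiB |  97.4% |        13.4 |
| Raw files hashes |   812419 |    4 |   3249676 |   4.1 MiB |   100% |        1.32 |
| Total            | 26445127 |  199 | 616429946 |   2.8 GiB |     NA |        4.91 |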

Table: Dataset sizes on 2025-01-12
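
The derived columns of the table can be checked by hand. The sketch below assumes that "Values" counts non-missing cells, that "Fill" is the share of non-missing cells in a Rows × Vars table, and that "Bytes/Value" divides the size on disk (in binary units) by "Values". These definitions are not stated in this README; they are inferred because they reproduce the listed numbers. The function names are illustrative only.

```python
# Sketch: reproduce the derived columns of the data status table.
# Assumption: "Values" counts non-missing cells and sizes use binary units (MiB).

def fill_percent(rows: int, variables: int, values: int) -> float:
    """Share of non-missing cells in a rows x variables table, in percent."""
    return 100.0 * values / (rows * variables)


def bytes_per_value(size_mib: float, values: int) -> float:
    """Average number of bytes on disk per non-missing value."""
    return size_mib * 1024 * 1024 / values


# TrackerDB row from the table
rows, variables, values, size_mib = 8770526, 23, 78860710, 169.4
print(f"Fill: {fill_percent(rows, variables, values):.2f}%")       # Fill: 39.09%
print(f"Bytes/Value: {bytes_per_value(size_mib, values):.2f}")     # Bytes/Value: 2.25
```

Under these assumptions a low "Fill" simply means that many variables are empty for many rows, which is expected when not every instrument or process contributes to every record.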

What it does

For CHP-1

  • Digest raw data
    • Signal from CHP-1
    • Tracker "async"
    • CHP-1 internal temperature from thermistor
  • Bad data ranges flagging
    • From manually set exclusion ranges
    • From acquisition signal physical limits
  • Converts signal to radiation
    • Computes temperature correction when possible
  • Plots
    • Overview of Clean/Dirty signal
    • Daily signal with and without dark
    • Overview of Direct radiation measurements
    • Daily Direct radiation measurements

For CM-21

  • Digest raw data
    • Signal from CM-21
  • Bad data ranges flagging
    • From manually set exclusion ranges
    • From acquisition signal physical limits
  • Converts signal to radiation
  • Plots
    • Overview of Clean/Dirty signal
    • Daily signal with and without dark
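
The flagging and conversion steps listed for both instruments can be sketched roughly as follows. This is a minimal illustration and not the project's code: the limits, the sensitivity, the manually set range, and all names are hypothetical placeholders, and the temperature correction for CHP-1 is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# All numeric values are hypothetical placeholders, not LAP calibration data.
SIGNAL_LIMITS_V = (-0.05, 0.5)    # assumed physical limits of the acquisition signal [V]
SENSITIVITY_V_PER_WM2 = 8.0e-6    # assumed instrument sensitivity [V per W/m^2]

# Assumed manually set ranges (start, end) of known bad data, e.g. maintenance
MANUAL_BAD_RANGES = [(datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 10, 30))]


@dataclass
class Sample:
    time: datetime
    signal_v: float


def bad_data_flag(sample: Sample) -> Optional[str]:
    """Return the reason a sample is flagged as bad, or None if it passes."""
    for start, end in MANUAL_BAD_RANGES:
        if start <= sample.time <= end:
            return "manual_range"
    low, high = SIGNAL_LIMITS_V
    if not (low <= sample.signal_v <= high):
        return "signal_limits"
    return None


def to_radiation(sample: Sample, dark_v: float) -> Optional[float]:
    """Subtract the dark signal and convert to W/m^2; None for flagged samples."""
    if bad_data_flag(sample) is not None:
        return None
    return (sample.signal_v - dark_v) / SENSITIVITY_V_PER_WM2


noon = Sample(datetime(2024, 5, 1, 12, 0), 0.004)
print(to_radiation(noon, dark_v=0.0))
```

Flagging before conversion keeps bad samples out of the radiation values while the raw signal stays available for the clean/dirty overview plots.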

Other processes

  • Quality Check of radiation data (QCRad)
    • Flags data mainly using the algorithm of C. N. Long and Y. Shi (2006)
  • Imports data from github.com/thanasisn/TSI
    • Sun_Dist_Astropy: Sun - LAP distance
    • TSI_TOA: TSI at TOA at LAP
    • TSI_1au: TSI
    • TSI_source: TSI data source
  • Imports atmospheric pressure data from proxies
    • Pressure: Atmospheric pressure at LAP
    • Pressure_source: Data source
  • Keeps an md5sum of all input files to check for bit rot and other data corruption.
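
As an illustration of how the imported TSI values and the QCRad flags relate, the sketch below scales TSI at 1 au to the actual Sun distance with the inverse-square law and then applies "physically possible" limits of the kind used in QCRad. The thresholds are given as commonly quoted for that test and should be checked against Long and Shi before use; the project may apply different or additional tests. The function names are hypothetical, and treating the distance as a value in astronomical units is an assumption.

```python
import math


def tsi_at_distance(tsi_1au: float, sun_dist_au: float) -> float:
    """Scale TSI at 1 au to the actual Sun distance (inverse-square law)."""
    return tsi_1au / sun_dist_au ** 2


def physically_possible_flags(global_wm2: float, direct_wm2: float,
                              tsi_toa: float, sza_deg: float) -> dict:
    """Flag values outside assumed 'physically possible' limits.

    Assumed limits (to be verified against the QCRad publication):
      global: -4 <= G   <= 1.5 * S * mu0**1.2 + 100
      direct: -4 <= DNI <= S
    where S is TSI at TOA and mu0 is the cosine of the solar zenith angle.
    """
    mu0 = max(math.cos(math.radians(sza_deg)), 0.0)  # no negative cosine at night
    global_max = 1.5 * tsi_toa * mu0 ** 1.2 + 100.0
    return {
        "global_bad": not (-4.0 <= global_wm2 <= global_max),
        "direct_bad": not (-4.0 <= direct_wm2 <= tsi_toa),
    }


tsi_toa = tsi_at_distance(tsi_1au=1361.0, sun_dist_au=0.9833)  # near perihelion
print(physically_possible_flags(800.0, 700.0, tsi_toa, sza_deg=30.0))
```

Using the distance-corrected TSI instead of a fixed solar constant lets the upper limits follow the annual variation of the Sun - Earth distance.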

Tools

  • inspect_days_DB.R: interactive plot of some data in the DB
  • inspect_days_Lap.R: interactive plot of some data from source files
  • inspect_days_Lap_sirena.R: interactive plot of some data from source files

TODO

  • Fully port all to duckdb
  • Replace and compare processes from "CM_21_GLB"
    • All the major stages have been replaced
    • Secondary processes are to be ported
  • Process more instruments
  • Import libRadtran data
  • May import CSid
  • Import other references

Details

Development and Design

Some aspects of the implementation of this project.

  • We use a dataset of parquet files as a database for all measurements and additional data.
  • We are migrating the original parquet dataset scheme to DuckDB to improve overall efficiency.
  • The parquet dataset uses one file per month, which facilitates:
    • Syncing of the data between different computers.
    • Partial processing when needed without using the dataset function.
  • It should be easy to migrate to a pure database like DuckDB or SQLite.
  • There are some files with extra metadata for the data in the database and the analysis performed.
  • We use features of the arrow library, and also data.table when it is more suitable or clear to code.
  • The analysis should run with under 8 GB of RAM, but this is not guaranteed.
  • There is a trade-off with disk usage and wear, especially when starting from scratch.
  • New data should be easy to add on a daily basis at all levels.
  • New processes and analyses should be easy to add for all data.
  • The goal is to become a framework for the analysis and manipulation of data from all broadband instruments.
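
The one-file-per-month layout can be illustrated with a small sketch. It only shows how records map to monthly files, so that a new day of data touches a single file. The file naming scheme and the function names are hypothetical, and the project itself works with parquet files through R rather than with this code.

```python
from collections import defaultdict
from datetime import datetime


def month_file(timestamp: datetime) -> str:
    """Hypothetical name of the monthly file a record belongs to."""
    return f"{timestamp:%Y-%m}.parquet"


def group_by_month_file(records):
    """Group (timestamp, value) records by the monthly file they would go to."""
    groups = defaultdict(list)
    for timestamp, value in records:
        groups[month_file(timestamp)].append((timestamp, value))
    return dict(groups)


new_records = [
    (datetime(2024, 12, 31, 23, 59), 0.1),
    (datetime(2025, 1, 1, 0, 0), 0.2),
    (datetime(2025, 1, 1, 0, 1), 0.3),
]
for name, rows in sorted(group_by_month_file(new_records).items()):
    print(name, len(rows))
```

Only the files named in the result need to be rewritten and synced, which is what makes partial processing and syncing between computers cheap with this layout.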

Documentation and usage

There is no centralized documentation for the project, but you can refer to:

  • Readme.md or other markdown files for a relevant overview
  • Summary notes at the start of each script
  • Comments inside each script
  • Compiled reports from each script