Conversation
kgururaj
left a comment
There was a problem hiding this comment.
Good code.
General comment: Let's replace "meta" with "mapping". "meta" is overloaded so many times in the genomics area that it causes a lot of confusion.
CMakeLists.txt
Outdated
| find_library(LIBRT_LIBRARY rt) | ||
|
|
||
| #librt | ||
| find_library(LIBRT_LIBRARY rt) |
CMakeLists.txt
Outdated
|
|
||
| #pgsql driver and dbi libs | ||
| find_package(libdbi REQUIRED) | ||
| include_directories(${LIBDBI_INCLUDE_DIR}) |
There was a problem hiding this comment.
Let's make this an optional dependency - if LIBDBI_FOUND, then a preprocessor macro -DLIBDBI_FOUND should be passed (see add_definitions() for libcsv in the CMakeLists.txt).
In the source code,
#ifdef LIBDBI
//All libdbi related code
#endif
CMakeLists.txt
Outdated
| if(LIBCSV_FOUND) | ||
| target_link_libraries(${target} ${LIBCSV_LIBRARY}) | ||
| endif() | ||
| target_link_libraries(${target} ${LIBDBI_DEV_LIBRARY} ${LIBDBI1_LIBRARY} ${LIBPGSQL_DRIVER_LIBRARY}) |
cmake/Modules/Findlibdbi.cmake
Outdated
| find_path(LIBDBI_INCLUDE_DIR NAMES dbi.h HINTS "/usr/include/dbi") | ||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd") |
There was a problem hiding this comment.
These functions must have a variable LIBDBI_ROOT (similar to LIBCSV_DIR) which the user can set to point to the libdbi libraries and headers. This allows users to install the headers/libraries in non-standard locations and still compile GenomicsDB
cmake/Modules/Findlibdbi.cmake
Outdated
| find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd") | ||
| include(FindPackageHandleStandardArgs) | ||
| find_package_handle_standard_args(libcsv "Could not find libcsv headers and/or libraries ${DEFAULT_MSG}" LIBDBI_INCLUDE_DIR LIBDBI_DEV_LIBRARY LIBDBI1_LIBRARY LIBPGSQL_DRIVER_LIBRARY) |
|
|
||
| ~SQLBasedVidMapper(); | ||
| protected: | ||
| int num_contigs; |
There was a problem hiding this comment.
No need to store this in the object - use local variable in the function. Once data is loaded into the VidMapper objects m_contig_idx_to_info.size() will give you the number of contigs anyway.
| protected: | ||
| int num_contigs; | ||
| dbi_conn conn; | ||
| dbi_inst instance; |
There was a problem hiding this comment.
Generally, we distinguish members of a class from local variables with a naming convention.
- Google coding style recommends adding "_" at the end of variable name.
- In GenomicsDB, I have been using the Hungarian notation - prefix every member variable name with "m_"
- Snake case for variable names.
| dbi_conn_set_option(conn, "username", "postgres"); | ||
| dbi_conn_set_option(conn, "password", "postgres"); | ||
| dbi_conn_set_option(conn, "dbname", "gendb"); | ||
| dbi_conn_set_option(conn, "encoding", "UTF-8"); |
There was a problem hiding this comment.
These should be arguments to the constructor. Other arguments include:
- Workspace (string)
- Array (string)
| ##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads"> | ||
| ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"> | ||
| ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"> | ||
| ##reference=file:///seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta |
There was a problem hiding this comment.
Why was this file deleted?
There was a problem hiding this comment.
Strange. I find it in my local repository. Let me check.
|
Thank you Karthik! @kgururaj |
kgururaj
left a comment
There was a problem hiding this comment.
Probably a good time to start adding CI tests in Travis. Let's start by installing the pre-requisites (libdbi-dev and libdbd-pgsql modules) and compile on Travis
example/CMakeLists.txt
Outdated
| build_GenomicsDB_executable(example_libtiledb_variant_driver) | ||
| build_GenomicsDB_executable(test_genomicsdb_bcf_generator) | ||
| build_GenomicsDB_executable(test_genomicsdb_importer) | ||
| build_GenomicsDB_executable(test_mapping_data_loader) |
There was a problem hiding this comment.
if(LIBDBI_FOUND)
build_GenomicsDB_executable(test_mapping_data_loader)
endif()
| cpp/src/utils/lut.cc | ||
| cpp/src/utils/known_field_info.cc | ||
| cpp/src/utils/vid_mapper.cc | ||
| cpp/src/utils/vid_mapper_sql.cc |
There was a problem hiding this comment.
This is not necessary. There is a macro enveloping the code which prevents code from compiling
if libdbi is not installed. Should we remove that check and place this check instead?
| dbi_conn_set_option(m_conn, "encoding", "UTF-8"); | ||
|
|
||
| if (dbi_conn_connect(m_conn) < 0) { | ||
| VERIFY_OR_THROW(1 < 0); |
There was a problem hiding this comment.
throw exception with a descriptive error message "Could not connect to DB : with "
| contig_name = dbi_result_get_string(result, "name"); | ||
|
|
||
| /** | ||
| * There is probably a constraint which prevents the following |
| class SQLBasedVidMapper : public VidMapper { | ||
| public: | ||
| SQLBasedVidMapper(const SQLVidMapperRequest&); | ||
|
|
There was a problem hiding this comment.
We should probably delete the copy constructor since there is no way to do deep copy of the libdbi structures.
If you wish to create a move constructor, you can probably copy the libdbi fields to the current object and zero out the other structure.
To delete the copy and move constructors, do:
SQLBasedVidMapper(const SQLBasedVidMapper& other) = delete;
SQLBasedVidMapper(SQLBasedVidMapper&& other) = delete;
There was a problem hiding this comment.
That is not a copy constructor. We are passing a request object.
There was a problem hiding this comment.
I realize that.
The compiler will always initialize a default copy constructor if you do not explicitly define one or delete the copy constructor. The default copy constructor will likely be incorrect since we have no idea whether the dbi* structures will be copied 'deeply' by an assignment operator. Hence, I'm suggesting to delete the copy and move constructors
.travis.yml
Outdated
| - sudo apt-get update -q | ||
| - sudo apt-get install g++-4.9 -y | ||
| - sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 60 | ||
| - sudo apt-get install libdbi-dev libdbd-pgsql |
|
|
||
| #pgsql driver and dbi libs | ||
| find_package(libdbi) | ||
| if(LIBDBI_FOUND) |
There was a problem hiding this comment.
shouldn't the libdbi dependence be optional?
There was a problem hiding this comment.
It is optional. If libdbi is not installed in an environment, the dependent files won't be compiled.
SQLVidMapper in this instance.
| MappingDataLoaderTester tester(request); | ||
|
|
||
| tester.print_contig_info(); | ||
|
|
There was a problem hiding this comment.
a test code with any assertion is meaningless. Instead of printing these values, why not check whether they match with known values?
|
|
||
| SQLBasedVidMapper::SQLBasedVidMapper(const SQLVidMapperRequest& request) : VidMapper() { | ||
| dbi_initialize_r(NULL, &m_instance); | ||
| m_conn = dbi_conn_new_r("pgsql", m_instance); |
There was a problem hiding this comment.
don't hardcode the driver name! This goes to the header file as a constant
| continue; | ||
| } | ||
|
|
||
| int64_t tiledb_column_offset = dbi_result_get_longlong(result, "tiledb_column_offset"); |
There was a problem hiding this comment.
@kgururaj, these field names are hard-coded in many places. better to use macros for all occurrences
| #vcf_metadata_and_tile | ||
| vcf_only_tile | ||
| echo "--------------------------------------" | ||
| # ----------------------------------------------------------------------------- |
kgururaj
left a comment
There was a problem hiding this comment.
Good one - let's add callset mapping query
| int load_mapping_data_from_db(); | ||
|
|
||
| std::vector<ContigInfo> get_contigs() { return(m_contig_idx_to_info); } | ||
| std::vector<ContigInfo>& get_contigs() { return(m_contig_idx_to_info); } |
There was a problem hiding this comment.
Could we move this to VidMapper?
| std::vector<std::pair<int64_t, int>>& get_contig_end_offsets() { | ||
| return(m_contig_end_2_idx); | ||
| } | ||
|
|
| ss <<"and (b.name = '" <<m_array_name <<"') and "; | ||
| ss <<"(a.reference_set_id = b.reference_set_id))"; | ||
| ss <<"(a.reference_set_id = b.reference_set_id)) "; | ||
| ss <<"order by a.tiledb_column_offset"; |
|
|
||
| SQLBasedVidMapper&& operator=(SQLBasedVidMapper&&) = delete; | ||
|
|
||
| int load_mapping_data_from_db(); |
There was a problem hiding this comment.
@rsm0001 add documentation to the class functions/members in the header file.
| ss <<"order by a.tiledb_column_offset"; | ||
|
|
||
| std::string query = ss.str(); | ||
| std::cout <<"QUERY: <" <<query <<">\n"; |
There was a problem hiding this comment.
delete hanging print statements like this one.
| @@ -0,0 +1,170 @@ | |||
| #! /bin/bash | |||
There was a problem hiding this comment.
Remove author name. Add Intel MIT license
tests/load_genomics_metadata.sh
Outdated
|
|
||
| # ----------------------------------------------------------------------------- | ||
|
|
||
| create_load_tile_cfg() { |
tests/load_genomics_metadata.sh
Outdated
| # ----------------------------------------------------------------------------- | ||
|
|
||
| create_load_tile_cfg() { | ||
| outfile="${INITDIR}/load_to_tile.cfg" |
kgururaj
left a comment
There was a problem hiding this comment.
Next review phase completed
cmake/Modules/Findlibdbi.cmake
Outdated
|
|
||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd") | ||
|
|
||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi) |
There was a problem hiding this comment.
Should the same HINTS be passed for libdbi also?
There was a problem hiding this comment.
DEV_LIB is being found without any issues. It was necessary for PGSQL driver because the
library is in a subdirectory (dbd).
There was a problem hiding this comment.
I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that
There was a problem hiding this comment.
Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think?
|
|
||
| while (dbi_result_next_row(result)) { | ||
| int64_t field_id = dbi_result_get_longlong(result, DBTABLE_FIELD_COLUMN_ID.c_str()); | ||
| int64_t field_idx = (field_id - 1); |
There was a problem hiding this comment.
field_id returned from the DB could be much larger than the num_fields for this specific array when the mapping DB has information about multiple arrays. Computing field_idx from the field_id will cause writes to un-allocated memory in the lines below. Use an incrementing counter for field_idx.
| auto known_field_enum = 0u; | ||
| auto is_known_field = KnownFieldInfo::get_known_field_enum_for_name(field_name, known_field_enum); | ||
|
|
||
| if ((0 == length_type.compare("ERROR")) || length_type.empty()) { |
There was a problem hiding this comment.
With the std::string, you can use length_type == "ERROR"
There was a problem hiding this comment.
True. Let us leave it this way.
| get_input("pass_word", "postgres", request.pass_word); | ||
| get_input("db_name", "gendb", request.db_name); | ||
|
|
||
| get_input("work_space", "/home/rmantrix/git_repos/NomixDB/INST_GenomicsDB/tests/workspace", request.work_space); |
There was a problem hiding this comment.
Hard-coded path :(
Even if this is default - no hard-coding for workspace, array, please. localhost for mapping DB is fine.
CMakeLists.txt
Outdated
| build_GenomicsDB_executable_common(${target}) | ||
| endif() | ||
| endfunction() | ||
|
|
There was a problem hiding this comment.
Are the tests being called in the CI somewhere? if you use add_test() in CMake, they will be called when "make test" is invoked.
There was a problem hiding this comment.
Done. However, CLASSPATH and LD_LIBRARY_PATH are expected in the environment.
There was a problem hiding this comment.
Can you modify these environment variables in the .travis.yml file as needed? See the section:
env:
global:
You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.
There was a problem hiding this comment.
Also, you should add install(TARGETS ${target} RUNTIME DESTINATION bin) for the gtest executable inside the if block (after line 216)
There was a problem hiding this comment.
Added gtest installation in .travis.yml
LD_LIBRARY_PATH and CLASSPATH already exist
install TARGET is on line 203
| } | ||
| } | ||
|
|
||
| std::string combine_op = dbi_result_get_string(result, DBTABLE_FIELD_COLUMN_COMBOP.c_str()); |
There was a problem hiding this comment.
Can the result be null? Based on your PR Intel-HLS/GenomicsSampleAPIs#33, it looks like it can be
kgururaj
left a comment
There was a problem hiding this comment.
Final review - if the CI tests pass, can be merged
CMakeLists.txt
Outdated
| build_GenomicsDB_executable_common(${target}) | ||
| endif() | ||
| endfunction() | ||
|
|
There was a problem hiding this comment.
Can you modify these environment variables in the .travis.yml file as needed? See the section:
env:
global:
You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.
cmake/Modules/Findlibdbi.cmake
Outdated
|
|
||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd") | ||
|
|
||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi) |
There was a problem hiding this comment.
I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that
|
Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think? |
|
Merge with latest master commit and I will merge the PR into the master |
…csDB into metadata_server
Karthik,
pl. review without merging. Pl. notify Kushal also, if an extra review helps. Any
suggestions to conform to current project coding standards would be appreciated.
Thanks.
Ramesh